Do you have a robots.txt file?
Posted by PetLvr • 1/18/08
Topics: robots.txt
I'm just curious. What are the contents of your robots.txt file?
I keep seeing these blocked URLs in my AdSense account under site diagnostics. I've got all my sitemaps under control (so I think), but I'm still getting error messages.
Do I want a robots.txt file?
Do I need a robots.txt file?
The main reason I haven't bothered with one is that I would have to create a separate copy for each of my domains, so I did nothing instead.
User Comments
-
I do not have one.
The robots.txt file asks web spiders to stay out of certain parts of your site.
Without this file, spiders are free to crawl those parts and list them in search results. (The file only restrains well-behaved spiders; it does not make anything private.)
This is a cool tutorial:
www.outfront.net/tutorials_02/adv_tech/robots.htm
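As a rough illustration of how a well-behaved spider uses the file, here is a minimal Python sketch built on the standard library's urllib.robotparser. The rules, the bot name ExampleBot, and the paths are all made up for the example; they are not taken from anyone's real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, invented for this example.
RULES = """\
User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, path: str, rules: str = RULES) -> bool:
    """Return True if the robots.txt rules let user_agent crawl path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # A polite crawler asks this question before every fetch.
    # Nothing forces a rude one to ask at all.
    print(allowed("ExampleBot", "/private/notes.html"))
    print(allowed("ExampleBot", "/blog/post.html"))
```

The check happens entirely on the crawler's side, which is why robots.txt works as a request rather than as access control.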
-
robots.txt is a good SEO tool; every blog should have one! The robots.txt file allows you to tell the search engine crawlers which parts of your site they can and can’t crawl.
-
You can look at an article that johncow did:
www.johncow.com/how-important-is-robotstxt-for-google/
Well, I allow Googlebot-Image, since I receive lots of traffic from Google Images. I also allow Technorati, so they can ping my blog anytime.
-
Yes, you should have one.
www.robotstxt.org/orig.html
In the sample below, you'll notice that /cgi-bin/ is the one directory that bots are not allowed to enter (the rule sits under User-agent: *, so it covers every bot without its own record).
The empty Disallow: lines are like an invitation to the rest of your site.
You can disallow any directory you want.
Let's say you have adult content: you would disallow it under Googlebot, so that your AdSense account wouldn't be closed.
Here is a sample:
# robots.txt
User-agent: Googlebot
Disallow:

User-agent: Googlebot-Image
Disallow:

User-agent: MSNBot
Disallow:

User-agent: Slurp
Disallow:

User-agent: Teoma
Disallow:

User-agent: Gigabot
Disallow:

User-agent: Scrubby
Disallow:

User-agent: Robozilla
Disallow:

User-agent: ia_archiver
Disallow:

User-agent: yahoo-mmcrawler
Disallow:

User-agent: psbot
Disallow:

User-agent: asterias
Disallow:

User-agent: yahoo-blogs/v3.9
Disallow:

User-agent: *
Disallow: /cgi-bin/

Sitemap: http://www.yoursite.com/sitemap.xml
-
I have about 4 robots.txt files, each for a different site. Plus I have a few 301s.
Dale
dzrbenson.com/blog/
-
Also use a sitemap generator to go with your robots.txt:
www.arnebrachhold.de/projects/wordpress-plugins/google-xml-sitemaps-generat...
-
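You're not blocking Google.
As I said,
User-agent: Googlebot
Disallow:
means Google will come to your site, while
User-agent: *
Disallow: /adult_content/
means no bot, Google included, will enter that directory, as long as the file has no separate Googlebot record (a bot follows its own record when there is one and ignores the * record).
See the difference?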
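The difference between the two records can be checked with a short Python sketch. It uses the standard library's urllib.robotparser as a stand-in for a real crawler, which is an assumption: actual bots run their own parsers. The bot name SomeOtherBot and the page path are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Variant 1: only the catch-all record exists.
CATCH_ALL_ONLY = """\
User-agent: *
Disallow: /adult_content/
"""

# Variant 2: Googlebot also has its own record with an empty Disallow.
WITH_GOOGLEBOT_RECORD = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /adult_content/
"""


def can_crawl(rules: str, user_agent: str, path: str) -> bool:
    """Return True if the given robots.txt text lets user_agent crawl path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    page = "/adult_content/page.html"  # hypothetical page
    variants = [
        ("catch-all only", CATCH_ALL_ONLY),
        ("with Googlebot record", WITH_GOOGLEBOT_RECORD),
    ]
    for label, rules in variants:
        for bot in ("Googlebot", "SomeOtherBot"):
            print(label, bot, can_crawl(rules, bot, page))
```

In the first variant every bot is kept out of the directory. In the second, Googlebot matches its own record, so the empty Disallow applies to it and the directory is open to Google again. To keep Google out when a Googlebot record exists, the Disallow line has to go under that record as well.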
Here is an easy sitemap generator:
www.xml-sitemaps.com/
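For anyone curious what such a generator produces, here is a minimal Python sketch of a sitemap file. It is not how the tool above works internally. It simply writes the basic urlset/url/loc structure from the sitemaps.org protocol, and the function name and page URLs are placeholders made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) listing the given absolute URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder URLs; swap in your own pages.
    pages = [
        "http://www.yoursite.com/",
        "http://www.yoursite.com/about.html",
    ]
    print(build_sitemap(pages))
```

Save the output as sitemap.xml in the site root and point the Sitemap: line of robots.txt at its full URL.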