Discussions

I'm just curious. What are the contents of your robots.txt file?

I keep seeing blocked URLs in my AdSense account under Site Diagnostics. I've got all my sitemaps under control (or so I think), but I'm still getting error messages.

Do I want a robots.txt file?
Do I need a robots.txt file?

The main reason I haven't bothered with one is that I would have to create a separate copy for each of my domains, so I did nothing instead.
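
If the only obstacle is repeating the file for every domain, a short script can write the same robots.txt into each site's document root. This is only a sketch: the function name `write_robots` and the directory names are made up for illustration, and the file content is the minimal "allow everything" form.

```python
import tempfile
from pathlib import Path

# Minimal robots.txt: every crawler may fetch everything.
ROBOTS_TXT = "User-agent: *\nDisallow:\n"


def write_robots(doc_roots, content=ROBOTS_TXT):
    """Write the same robots.txt into each document root and return the paths."""
    written = []
    for root in doc_roots:
        target = Path(root) / "robots.txt"
        target.write_text(content, encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    # Demo with throwaway directories standing in for real document roots.
    with tempfile.TemporaryDirectory() as base:
        roots = [Path(base) / name for name in ("site-one", "site-two")]
        for root in roots:
            root.mkdir()
        for path in write_robots(roots):
            print(path.parent.name, "->", path.name)
```

In real use, the list passed to `write_robots` would be the actual document roots of each domain on the server.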

User Comments

  1. flamingpoodle
    I do not have one.
    The robots.txt file prevents web spiders from accessing certain parts of your site.
    Without this file, those parts would be publicly viewable.

    This is a cool tutorial:
    www.outfront.net/tutorials_02/adv_tech/robots.htm
    1. PetLvr
      Thanks for the link .. But, is this 2002 tutorial still valid today?
  2. markstoneman
    No I don't, though I'd never considered the issue. I do have a site map file on my personal site (not my blogs, which are not self-hosted), which I believe fulfills the function FP is talking about here.
  3. urikalish
    Just a small correction: it does not prevent web spiders from doing anything. It's more like a polite request.
    1. flamingpoodle
      Yes, it doesn't actually prevent crawlers from entering your site.
      It is a file that asks robots not to list certain areas of your site. Although it is supported, there is no official standard.

      Thank you.
  4. xtremer
    Most sites have a robots.txt file. The only difference is that sometimes you cannot access this file yourself. In that case, the crawler instructions are set using robots meta tags instead...
  5. Kiwipulse
    robots.txt is a good SEO tool, and every blog should have one! The robots.txt file allows you to tell the search engine crawlers what they can and can’t put on file.
    1. PetLvr
      kiwi .. does that mean you have a robots.txt file? How does yours look? or rather, what type of data are you denying robots?
    2. Kiwipulse
      You can look at an article that johncow did.
      www.johncow.com/how-important-is-robotstxt-for-google/

      Well, I allow Googlebot-Image, since I receive lots of traffic from Google Images. I also allow Technorati, so they can ping my blog anytime.
  6. Rozie818
    Yes, you should have one.
    www.robotstxt.org/orig.html
    You'll notice that cgi-bin is the one directory that bots are not allowed to enter;
    the other entries are like an invitation to the rest of your site.
    You can disallow any directory you want.
    Let's say you have adult content: you would disallow it under the Googlebot entry, so that your AdSense account wouldn't be closed.

    here is a sample
    # robots.txt
    User-agent: Googlebot
    Disallow:

    User-agent: Googlebot-Image
    Disallow:

    User-agent: MSNBot
    Disallow:

    User-agent: Slurp
    Disallow:

    User-agent: Teoma
    Disallow:

    User-agent: Gigabot
    Disallow:

    User-agent: Scrubby
    Disallow:

    User-agent: Robozilla
    Disallow:

    User-agent: ia_archiver
    Disallow:

    User-agent: yahoo-mmcrawler
    Disallow:

    User-agent: psbot
    Disallow:

    User-agent: asterias
    Disallow:

    User-agent: yahoo-blogs/v3.9
    Disallow:

    User-agent: *
    Disallow: /cgi-bin/

    Sitemap: http://www.yoursite.com/sitemap.xml
    1. PetLvr
      That was extremely helpful Rozie! Thanks
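
One way to check what a file like the sample above actually permits is Python's standard-library `urllib.robotparser`. The sketch below uses a shortened version of the sample; `SomeOtherBot` and the page paths are made-up names, the Sitemap line is written as a full URL, and `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Shortened version of the sample; www.yoursite.com is a placeholder domain.
SAMPLE = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /cgi-bin/

Sitemap: http://www.yoursite.com/sitemap.xml
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(SAMPLE)

# An empty Disallow line means "nothing is disallowed" for that bot.
print(rules.can_fetch("Googlebot", "/blog/post.html"))       # True
# Bots without their own entry fall back to the * group.
print(rules.can_fetch("SomeOtherBot", "/cgi-bin/form.cgi"))  # False
print(rules.can_fetch("SomeOtherBot", "/blog/post.html"))    # True
# Sitemap lines are collected separately from the per-bot rules.
print(rules.site_maps())  # ['http://www.yoursite.com/sitemap.xml']
```

Note that this only reports what a well-behaved crawler would do with the file; nothing here enforces the rules.
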
  7. dzrbenson
    I have about 4 robots.txt files, each for a different site. Plus I have a few 301s.

    Dale
    dzrbenson.com/blog/
    1. PetLvr
      Hi Dale .. is it just the two-liner recommended in the articles above? Or is it a comprehensive robots.txt file like Rozie's?
  8. jungl
    User-agent: *
    Disallow:

  9. dzrbenson
    Mine are huge, but I am able to edit mine through cPanel much more easily...

    But why would you want to block Google's spiders like Rozie does? Do you want to lose traffic?
    1. PetLvr
      I've been using the Google Sitemap plugin for ages now - v3.0.3 now ..

      I just noticed that Robots.txt line/field was UNTICKED ... I ticked it, and let's see if this helps!

      Thanks kiwi

      PS ... do you also have that Yahoo ticked?
  10. Rozie818
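    You're not blocking Google,
    as I said.

    User-agent: Googlebot
    Disallow:
    That means Google can go anywhere on your site.

    User-agent: Googlebot
    Disallow: /adult_content/
    That means Google won't enter that directory. The rule has to sit under Googlebot itself, because a bot that has its own group ignores the * group.

    See the difference?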

    Here is an easy sitemap generator:
    www.xml-sitemaps.com/
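
Which group a crawler obeys can be checked with Python's standard-library `urllib.robotparser`. In this sketch, `OtherBot`, the helper `allowed`, and the page path are made-up names. It suggests that a bot with its own User-agent group follows only that group, so a Disallow placed under `*` does not apply to it. Google documents the same behaviour for Googlebot.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, agent, path):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# The directory is disallowed only in the catch-all group.
RULE_IN_STAR_GROUP = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /adult_content/
"""

# The directory is disallowed in Googlebot's own group.
RULE_IN_GOOGLEBOT_GROUP = """\
User-agent: Googlebot
Disallow: /adult_content/

User-agent: *
Disallow:
"""

PAGE = "/adult_content/page.html"

print(allowed(RULE_IN_STAR_GROUP, "Googlebot", PAGE))       # True: own group wins
print(allowed(RULE_IN_STAR_GROUP, "OtherBot", PAGE))        # False: falls back to *
print(allowed(RULE_IN_GOOGLEBOT_GROUP, "Googlebot", PAGE))  # False: blocked in own group
```

So to keep Googlebot out of a directory while it also has its own entry, the Disallow line goes under `User-agent: Googlebot`.
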
  11. godsofchaos
    Of course I have a robots.txt file... can't imagine life without it!
  12. XIII
    I think it's useless. It'll only direct the 'good' spiders, which you want to index your site anyway. The million 'bad' spiders from all the unknowns, and all the spam spiders, won't give a rat's ... anyway.
