googlebot ignoring robots.txt

  1. #1
    Senior Member
    Real Name
    Doug Nelson
    Join Date
    Sep 2006
    Posts
    141
    Liked
    0 times

    googlebot ignoring robots.txt

    I look at my bot crawler log and I see googlebot is crawling thousands of pages that I've specifically forbidden via robots.txt. But then I remember that all those page names are being rewritten by vbseo, so how would googlebot know?

    Am I correct in assuming that, for example, forbidding /forums/member.php in robots.txt is useless, because googlebot will never see a page named member.php?

  2. #2
    Senior Member
    Real Name
    Michael
    Join Date
    Oct 2005
    Posts
    1,755
    Liked
    1 times
    Blog Entries
    1
    Quote: Originally Posted by Doug Nelson
    Am I correct in assuming that, for example, forbidding /forums/member.php in robots.txt is useless, because googlebot will never see a page named member.php?

    That is correct.

    Depending on how you have your rewrites set up, you can disallow the rewritten directory, or turn off those specific rewrites so it goes back to the vbulletin default.
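To make the point above concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes plain prefix matching (which is how robots.txt Disallow rules are normally interpreted); Python's parser stands in for Googlebot's real one, and `example.com` plus both URLs are hypothetical stand-ins for the actual forum.

```python
from urllib.robotparser import RobotFileParser


def parser_for(rules: str) -> RobotFileParser:
    """Build a parser from robots.txt text, with no network access."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


# Hypothetical URLs: the vBulletin default and its rewritten form.
DEFAULT_URL = "http://example.com/forums/member.php?u=30"
REWRITTEN_URL = "http://example.com/forums/members/keith-cohen/"

# Rule written against the default file name.
old_rules = parser_for("User-agent: *\nDisallow: /forums/member.php\n")
# Rule written against the rewritten directory.
new_rules = parser_for("User-agent: *\nDisallow: /forums/members/\n")

old_blocks_default = not old_rules.can_fetch("Googlebot", DEFAULT_URL)
old_blocks_rewritten = not old_rules.can_fetch("Googlebot", REWRITTEN_URL)
new_blocks_rewritten = not new_rules.can_fetch("Googlebot", REWRITTEN_URL)

print("old rule blocks member.php?u=30:   ", old_blocks_default)
print("old rule blocks /members/ URL:     ", old_blocks_rewritten)
print("new rule blocks /members/ URL:     ", new_blocks_rewritten)
```

The old rule only matches paths that begin with `/forums/member.php`, so the rewritten profile URL slips past it; the directory rule catches it.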

  3. #3
    Senior Member
    Real Name
    Keith Cohen
    Join Date
    Jul 2005
    Location
    Raleigh, NC USA
    Posts
    6,147
    Liked
    12 times
    Correct. If there are pages you do not wish to have indexed, turn off rewrites for those pages and exclude them via robots.txt. It'll also save a little CPU load by cutting out some un-needed rewrites.

  4. #4
    Senior Member
    Real Name
    Doug Nelson
    Join Date
    Sep 2006
    Posts
    141
    Liked
    0 times
    So it's a choice between the two? We can't have them rewritten and not indexed?

  5. #5
    Senior Member
    Real Name
    Keith Cohen
    Join Date
    Jul 2005
    Location
    Raleigh, NC USA
    Posts
    6,147
    Liked
    12 times
    There's really no need to rewrite URLs that will not be indexed. But, for members for example, you could just exclude /members/ assuming you're using the default rewrite rules.

  6. #6
    Senior Member
    Real Name
    Michael
    Join Date
    Oct 2005
    Posts
    1,755
    Liked
    1 times
    Blog Entries
    1
    It depends on how you have them rewritten. If you rewrote them so that they show up as forum/members/member-name/, you could block them by blocking /forum/members/

  7. #7
    Senior Member
    Real Name
    Doug Nelson
    Join Date
    Sep 2006
    Posts
    141
    Liked
    0 times
    I like the rewrites for reasons beyond bot indexing. http://www.vbseo.com/members/keith-cohen/ is so much friendlier than http://www.vbseo.com/forum/member.php?u=30

  8. #8
    Senior Member
    Real Name
    Doug Nelson
    Join Date
    Sep 2006
    Posts
    141
    Liked
    0 times
    So how would I edit this section of robots.txt to take into consideration the rewrites?

    Disallow: /forums/attachment.php
    Disallow: /forums/calendar.php
    Disallow: /forums/cron.php
    Disallow: /forums/editpost.php
    Disallow: /forums/global.php
    Disallow: /forums/image.php
    Disallow: /forums/inlinemod.php
    Disallow: /forums/joinrequests.php
    Disallow: /forums/login.php
    Disallow: /forums/member.php
    Disallow: /forums/memberlist.php
    Disallow: /forums/misc.php
    Disallow: /forums/moderator.php
    Disallow: /forums/newattachment.php
    Disallow: /forums/newreply.php
    Disallow: /forums/newthread.php
    Disallow: /forums/online.php
    Disallow: /forums/poll.php
    Disallow: /forums/postings.php
    Disallow: /forums/printthread.php
    Disallow: /forums/private.php
    Disallow: /forums/profile.php
    Disallow: /forums/register.php
    Disallow: /forums/report.php
    Disallow: /forums/reputation.php
    Disallow: /forums/search.php
    Disallow: /forums/sendmessage.php
    Disallow: /forums/showgroups.php
    Disallow: /forums/showpost.php
    Disallow: /forums/subscription.php
    Disallow: /forums/threadrate.php
    Disallow: /forums/usercp.php
    Disallow: /forums/usernote.php

  9. #9
    Senior Member
    Real Name
    Michael
    Join Date
    Oct 2005
    Posts
    1,755
    Liked
    1 times
    Blog Entries
    1
    Disallow: /forums/members/

    And you can take out: Disallow: /forums/member.php

  10. #10
    Senior Member
    Real Name
    Doug Nelson
    Join Date
    Sep 2006
    Posts
    141
    Liked
    0 times
    Is that the only one? showpost.php seems to be my big one, but I'd definitely want those kept rewritten. printthread.php is next.

  11. #11
    Senior Member
    Real Name
    Keith Cohen
    Join Date
    Jul 2005
    Location
    Raleigh, NC USA
    Posts
    6,147
    Liked
    12 times
    For printthread you will need to disable rewriting.

    Same for showpost. And most people use the Permalink option to replace the Post Number Link with the Permalink, since individual showpost pages have never indexed well anyway.

  12. #12
    Senior Member
    Real Name
    Code Monkey
    Join Date
    Aug 2006
    Posts
    780
    Liked
    0 times
    On any page you do not want to be indexed, or you want the crawlers to be shooed away from, just add this into the <head> section of that template.

    Code:
    
    <meta name="ROBOTS" content="NOINDEX, NOFOLLOW">
    

  13. #13
    Senior Member
    Real Name
    Doug Nelson
    Join Date
    Sep 2006
    Posts
    141
    Liked
    0 times
    An excellent tip, thanks. But it doesn't really cut down on traffic, since the bot has to load the page to know not to index it.
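A rough sketch of why the meta tag cannot save bandwidth: a crawler only learns about the directive by parsing HTML it has already downloaded. The snippet uses Python's standard `html.parser`; the class name, the helper, and the sample page are made up for illustration and are not how any real crawler is implemented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # Tag and attribute names arrive lowercased; values keep their case.
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html: str) -> set:
    """Return the robots directives found in an already-fetched page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page body; in practice this is the response to a full request.
page = (
    "<html><head>"
    '<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">'
    "</head><body>profile page</body></html>"
)
directives = robots_directives(page)
print(sorted(directives))
```

`robots_directives` needs the page body as input, which means the request has already hit the server. A robots.txt rule, by contrast, is checked before any request for the page is made.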

  14. #14
    Senior Member
    Real Name
    Michael
    Join Date
    Oct 2005
    Posts
    1,755
    Liked
    1 times
    Blog Entries
    1
    If you want to continue to rewrite posts and still be able to add them to your robots.txt, you can make your showpost URL rewrite custom and change it to this:
    Code:
    showpost/[post_id]-[post_count]
    Then add this to your robots.txt file:
    Code:
    Disallow: /forums/showpost/
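As a quick sanity check of the idea above, the sketch below expands the custom format into a path and tests it against the new rule with Python's standard `urllib.robotparser`. How vBSEO really fills in `[post_id]` and `[post_count]` is an assumption here (plain string substitution), and the ids, the thread path, and `example.com` are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Custom rewrite format quoted in the post above.
SHOWPOST_FORMAT = "showpost/[post_id]-[post_count]"


def rewritten_showpost_path(post_id: int, post_count: int,
                            base: str = "/forums/") -> str:
    """Expand the format by simple substitution (assumed, not vBSEO's code)."""
    path = SHOWPOST_FORMAT.replace("[post_id]", str(post_id))
    path = path.replace("[post_count]", str(post_count))
    return base + path


rules = RobotFileParser()
rules.parse(["User-agent: *", "Disallow: /forums/showpost/"])

post_path = rewritten_showpost_path(12345, 7)  # hypothetical ids
thread_path = "/forums/general-discussion/999-some-thread.html"  # hypothetical

post_blocked = not rules.can_fetch("Googlebot", "http://example.com" + post_path)
thread_blocked = not rules.can_fetch("Googlebot", "http://example.com" + thread_path)

print(post_path, "blocked:", post_blocked)
print(thread_path, "blocked:", thread_blocked)
```

Putting every rewritten post under one dedicated directory is what makes a single Disallow line sufficient, while thread URLs outside that directory stay crawlable.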

  15. #15
    Senior Member
    Real Name
    Code Monkey
    Join Date
    Aug 2006
    Posts
    780
    Liked
    0 times
    You can also add rel="nofollow" to the links to each post.
