robots.txt and vBSEO

  1. #1
    Senior Member Silmarillion's Avatar
    Real Name
    Christian
    Join Date
    Jul 2005
    Location
    Germany
    Posts
    412
    Liked
    1 times

    robots.txt and vBSEO

    Does anyone have practical recommendations for using robots.txt with vBSEO? That would be great!

    (Which locations/URLs should be spidered, and which should not?)

    Yours, Silmarillion

  2. #2
    vBSEO Staff Juan Muriente's Avatar
    Real Name
    Juan Carlos Muriente
    Join Date
    Jun 2005
    Location
    Puerto Rico
    Posts
    14,267
    Liked
    558 times
    Hi Silmarillion,

    Actually, I discussed this previously with another tester. I apologize for not making it available to the vBSEO forums earlier. Here is an excerpt from one of my PMs regarding the use of robots.txt with vBSEO. Please let me know if you have any questions.

    Quote Originally Posted by Juan Muriente
    Using robots.txt to block the crawling of *some* of your vB .php files will work, but you have to be careful.

    Since vBSEO converts everything to static CRUs (depending on your configuration), a robots.txt disallow will not affect links that Google discovers while crawling your forums directly. Note: The issue arises when other websites link back to you, from their forums or pages, using the old dynamic format.

    Example:

    Site A, B, and C all link to threads on your forums in the dynamic format (showthread.php).

    (i) Whenever a user clicks one of the links to your site on site A, B, or C they will be 301 redirected to the new vBSEO static CRU.
    (ii) If Google finds the link on site A, B, or C, it will also be 301 redirected to the new location. Note: This is where a potential problem arises.

    - Google always checks the robots.txt file before crawling your website.
    - If your robots.txt disallows showthread.php, then Google will most likely not crawl the dynamic linkbacks that it finds on site A, B, or C.
    - You will lose potential PageRank value that would otherwise have been passed to the new static CRU.
    - If your static URL was not already indexed by Google, then the dynamic link will be dropped from the index, and you will have to wait for Google to find the static version by crawling your forums directly. Of course, they will eventually find it, but you always want them to find you as quickly as possible.

    For the above reasons, robots.txt disallows for newreply.php and newthread.php (and all other less important .php files) are OK. However, I would recommend not disallowing the following .php files:

    (1) showthread.php *
    (2) forumdisplay.php *

    (3) announcement.php **
    (4) member.php **
    (5) showpost.php **
    (6) printthread.php **
    (7) poll.php **

    * Most critical.
    ** If you do not want any of these pages indexed, then it is fine to add them to robots.txt as disallows.
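
A quick way to check a draft robots.txt against the list above is Python's standard-library `urllib.robotparser`. This is only a sketch: the `blocked_scripts` helper, the `www.example.com` base URL, the `Googlebot` agent string, and the sample rules are assumptions for illustration, and a real crawler may interpret rules differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Script names taken from the list above.
CRITICAL = ["showthread.php", "forumdisplay.php"]
OPTIONAL = ["announcement.php", "member.php", "showpost.php",
            "printthread.php", "poll.php"]


def blocked_scripts(robots_lines, scripts,
                    base="http://www.example.com/", agent="Googlebot"):
    """Return the scripts that the given robots.txt lines disallow for `agent`.

    `base` is a placeholder host; replace it with your own forum URL.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return [s for s in scripts if not parser.can_fetch(agent, base + s)]


# Hypothetical draft that blocks one critical and one optional script.
sample_rules = [
    "User-agent: *",
    "Disallow: /newreply.php",
    "Disallow: /member.php",
    "Disallow: /showthread.php",
]

print(blocked_scripts(sample_rules, CRITICAL))  # ['showthread.php']
print(blocked_scripts(sample_rules, OPTIONAL))  # ['member.php']
```

An empty result for `CRITICAL` means dynamic linkbacks to threads and forums can still be requested, so the crawler gets to see the 301 redirect.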

  3. #3
    Senior Member Silmarillion's Avatar
    OK, thanks. Maybe it's better not to use a robots.txt at all. ^^

    Yours, Silmarillion

  4. #4
    Senior Member Silmarillion's Avatar
    Suggested robots.txt:

    Code:
    User-agent: *
    Disallow: /calendar.php
    Disallow: /editpost.php
    Disallow: /member.php
    Disallow: /memberlist.php
    Disallow: /misc.php
    Disallow: /newreply.php
    Disallow: /newthread.php
    Disallow: /printthread.php
    Disallow: /private.php
    Disallow: /register.php
    Disallow: /report.php
    Disallow: /search.php
    Disallow: /showgroups.php
    Disallow: /usercp.php
    Disallow: /impressum.php
    Disallow: /admincp/
    Disallow: /modcp/
    Disallow: /online.php
    Disallow: /subscription.php
    Disallow: /sendtofriend.php
    Disallow: /threadrate.php
    Disallow: /poll.php
    Disallow: /attachment.php
    Disallow: /avatar.php
    Disallow: /faq.php
    Disallow: /profile.php
    Yours, Silmarillion
    Last edited by Silmarillion; 08-27-2005 at 07:08 PM.
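
Hand-maintained lists like this one tend to pick up repeated entries over time. A repeated Disallow line is harmless, but it adds noise. Below is a minimal sketch for spotting repeats, assuming a file made of simple `User-agent` and `Disallow` lines; the `duplicate_disallows` helper and the sample text are hypothetical.

```python
def duplicate_disallows(robots_text):
    """Return Disallow values that repeat within the same User-agent group."""
    duplicates = []
    seen = set()       # Disallow values seen in the current group
    in_rules = False   # True once the current group has at least one rule
    for raw in robots_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # A User-agent line that follows rules starts a new group.
            if in_rules:
                seen = set()
                in_rules = False
        elif field == "disallow":
            in_rules = True
            if value in seen and value not in duplicates:
                duplicates.append(value)
            seen.add(value)
    return duplicates


sample = """User-agent: *
Disallow: /usercp.php
Disallow: /faq.php
Disallow: /usercp.php
"""
print(duplicate_disallows(sample))  # ['/usercp.php']
```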

  5. #5
    Junior Member Katrina's Korner's Avatar
    Join Date
    Aug 2005
    Posts
    14
    Liked
    0 times
    I am definitely looking forward to purchasing this mod. I too am wondering about a proper robots.txt file. From what I can see Juan (the admin) is extremely professional and understands that a lot of us don't want to necessarily think about what needs to be done. So I am assuming at some point, a "recommended" robots.txt file will arise here...

    * patiently waiting for the full release and a suggested robots.txt file *

  6. #6
    Senior Member Silmarillion's Avatar
    You can use the robots.txt from my last posting.

    Yours, Silmarillion

  7. #7
    Junior Member Katrina's Korner's Avatar
    Quote Originally Posted by Silmarillion
    You can use the robots.txt from my last posting.

    Yours, Silmarillion
    I was kind of hoping for a suggestion from the developer, since he probably knows what changes will have positive and negative effects.

    Your post before the last one says "It is probably better to not have a robots.txt file then". But now you recommend one. So, are you really sure if you should have one and what should be in it?

  8. #8
    Senior Member
    Real Name
    Joseph Ward
    Join Date
    Jun 2005
    Posts
    23,847
    Liked
    32 times
    Blog Entries
    9
    @All:

    In general, vBSEO does not require a robots.txt file. Our 301 system along with our rel="nofollow" usage takes care of all issues.

    However, a robots.txt file can be used for redundancy if you choose. We'll look into providing something standard.

    When using robots.txt there are some issues to watch out for though. See Juan's post above for a list of dynamic scripts that should not be excluded in robots.txt.

  9. #9
    Senior Member I, Brian's Avatar
    Join Date
    Sep 2005
    Location
    Scotland
    Posts
    120
    Liked
    1 times
    Apparently robots.txt does not stop a robot from *indexing* content in such folders; it only stops robots from *downloading* the contents. See this thread:
    http://forums.searchenginewatch.com/...ead.php?t=7839

  10. #10
    Member
    Real Name
    Patrick
    Join Date
    Sep 2007
    Posts
    75
    Liked
    0 times
    I was wondering if I could use this thread to ask about mine. I didn't feel like bugging everyone with a new thread, and this one seems just right. I hope that's OK.

    User-agent: Titan
    Disallow: /

    User-agent: EmailCollector
    Disallow: /

    User-agent: EmailSiphon
    Disallow: /

    User-agent: EmailWolf
    Disallow: /

    User-agent: ExtractorPro
    Disallow: /

    User-agent: *
    Disallow: /forum/admincp/
    Disallow: /forum/cgi-bin/
    Disallow: /forum/clientscript/
    Disallow: /forum/includes/
    Disallow: /forum/install/
    Disallow: /forum/modcp/
    Disallow: /forum/sitemap/
    Disallow: /forum/tags/
    Disallow: /forum/printthread.php
    Disallow: /forum/subscription.php
    Disallow: /forum/profile.php
    Disallow: /forum/faq.php
    Disallow: /forum/calendar.php
    Disallow: /forum/private.php
    Disallow: /forum/sendmessage.php
    Disallow: /forum/showgroups.php
    Disallow: /forum/reputation.php
    Disallow: /forum/report.php
    Disallow: /forum/threadrate.php
    Disallow: /forum/postings.php
    Disallow: /forum/newthread.php
    Disallow: /forum/search.php
    Disallow: /forum/newreply.php
    Disallow: /forum/register.php
    Disallow: /forum/login.php
    Disallow: /forum/image.php
    Disallow: /forum/cron.php
    Disallow: /forum/joinrequests.php
    Disallow: /forum/usercp.php
    Someone made this for us and I'd like to know if it is correct. I notice your examples don't put /forum/ before each path, so either I need a robots.txt inside my forum directory or you guys just have your forum as the main page. Maybe it doesn't really matter.

    My major concern is the first five user agents. They're OK to block, correct? I don't even know what/who they are.
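
On the /forum/ question: crawlers request robots.txt only from the root of the host, and each Disallow value is matched as a prefix of the URL path starting at that root. So a forum installed under /forum/ needs the /forum/ prefix in a root-level robots.txt, while a forum installed at the site root does not. Here is a small sketch using Python's standard-library `urllib.robotparser`; the `is_allowed` helper, the example.com URLs, and the shortened rule lists are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, url, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# The same rule written without and with the /forum/ prefix.
without_prefix = ["User-agent: *", "Disallow: /printthread.php"]
with_prefix = ["User-agent: *", "Disallow: /forum/printthread.php"]

# Hypothetical forum installed in a /forum/ subdirectory.
url = "http://www.example.com/forum/printthread.php?t=1"

print(is_allowed(without_prefix, url))  # True: the rule does not match /forum/...
print(is_allowed(with_prefix, url))     # False: the prefix matches

# A named group applies only to that agent; everyone else falls under "*".
named = [
    "User-agent: EmailSiphon",
    "Disallow: /",
    "User-agent: *",
    "Disallow: /forum/printthread.php",
]
thread = "http://www.example.com/forum/showthread.php?t=1"
print(is_allowed(named, thread, agent="EmailSiphon"))  # False
print(is_allowed(named, thread, agent="Googlebot"))    # True
```

Keep in mind that a named group such as `EmailSiphon` only affects robots that choose to obey robots.txt.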
