
robots.txt

This is a discussion on robots.txt within the Troubleshooting forums, part of the vBSEO Google/Yahoo Sitemap category.

  1. #1
    Senior Member KURTZ
    Real Name
    Christian
    Join Date
    May 2008
    Location
    Italy
    Posts
    287
    Liked
    2 times
    Blog Entries
    6

    robots.txt

    I've just created a new robots.txt and put it in the main folder (/public_html). This is the code:

    Code:
    User-agent: *
    Allow: /
    Sitemap: http://www.fniv.it/board/sitemap_index.xml.gz
    Is it correct? Any suggestions? (I grabbed the sitemap's URL from the ACP.)

    But I noticed one thing in GTools: when I run a test against my board domain (FNIV - Federazione Italiana Videogiocatori), I get this result:

    Allowed by line 2: Allow: /
    Detected as a directory; specific files may have different restrictions
    So if I change the sitemap's URL in the robots.txt to:

    Code:
    Sitemap: http://www.fniv.it/sitemap_index.xml.gz
    and run the test again, I now get this result:

    Allowed by line 2: Allow: /
    Detected as a directory; specific files may have different restrictions
    and (Parse Results)

    Line 3: Sitemap: http://www.fniv.it/sitemap_index.xml.gz Valid Sitemap reference detected
    So I'm wondering: what's the correct sitemap URL to put in robots.txt?
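To see what a parser actually reads from the Sitemap line, a minimal sketch with Python's standard urllib.robotparser can help (site_maps() needs Python 3.8 or later). The helper name sitemap_urls is made up for illustration. It only lists the declared entries and does not check that a file is really served at that address.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post above.
ROBOTS_TXT = """\
User-agent: *
Allow: /
Sitemap: http://www.fniv.it/board/sitemap_index.xml.gz
"""


def sitemap_urls(robots_txt):
    """Return the Sitemap URLs declared in a robots.txt body (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(sitemap_urls(ROBOTS_TXT))
```

Whichever of the two URLs goes on that line, it should presumably be the address where sitemap_index.xml.gz is actually reachable.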

  2. #2
    Member Peter_Rosado
    Real Name
    Peter Anthony
    Join Date
    May 2006
    Location
    Puerto Rico
    Posts
    59
    Liked
    0 times
    Hello KURTZ,


    It seems there is a problem with the "/"

    Could you try it with no Allow: / line?

    Also, trying to submit http://www.fniv.it/board/xmlsitemap.php via GTools would be good since it's easier and really effective. (Tried it myself!)

    Does that work?
    Last edited by Peter_Rosado; 07-23-2010 at 08:29 AM. Reason: syntax mistake by my part, blame it on waking up so early :P

  3. #3
    Senior Member KURTZ
    Sitemap: /board/sitemap_index.xml.gz
    result:

    Allowed by line 2: Allow: /
    Detected as a directory; specific files may have different restrictions

    and

    Invalid sitemap URL detected; syntax not understood

    Allowed by line 2: Allow: /
    Detected as a directory; specific files may have different restrictions

    (check the pic, i have a different URL)

    same thing with this:

    Attached Thumbnails: welcome-vbulletin-admin-control-panel-fniv-federazione-italiana-videogiocatori-vbul.png
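The "Invalid sitemap URL" message fits the sitemaps protocol, which expects the complete URL on the Sitemap line rather than a path. A rough sketch of that check in Python follows; the function name is hypothetical and the test is deliberately simple (scheme plus host), not what GTools itself runs.

```python
from urllib.parse import urlsplit


def is_absolute_sitemap_url(value):
    """True if the value looks like a full http(s) URL (illustrative check only)."""
    parts = urlsplit(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


for candidate in (
    "/board/sitemap_index.xml.gz",
    "http://www.fniv.it/board/sitemap_index.xml.gz",
):
    print(candidate, is_absolute_sitemap_url(candidate))
```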

  4. #4
    Member Peter_Rosado
    Sorry, I edited my previous post a bit late.

    It seems there is a problem with the "/".

    Could you try deleting Allow: / and keeping the full sitemap URL (http://www.fniv.it/board/sitemap_index.xml.gz)?
    Code:
    User-agent: *
    Sitemap: http://www.fniv.it/board/sitemap_index.xml.gz
    There was really no problem in the first place. That's just Google's way of saying the directory is available, but other specific directories/files can have different restrictions (allows or disallows).
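A quick way to convince yourself that Allow: / changes nothing here is to run both variants through Python's standard urllib.robotparser. This is only a sketch: the helper name allowed and the "Googlebot" agent string are illustrative, and Google's own parser is a separate implementation.

```python
from urllib.robotparser import RobotFileParser

SITEMAP = "Sitemap: http://www.fniv.it/board/sitemap_index.xml.gz\n"
WITH_ALLOW = "User-agent: *\nAllow: /\n" + SITEMAP
WITHOUT_ALLOW = "User-agent: *\n" + SITEMAP


def allowed(robots_txt, url, agent="Googlebot"):
    """Parse a robots.txt body and ask whether the agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# With no Disallow rules at all, both files leave the board crawlable.
print(allowed(WITH_ALLOW, "http://www.fniv.it/board/"))
print(allowed(WITHOUT_ALLOW, "http://www.fniv.it/board/"))
```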

  5. #5
    vBSEO Staff Brian Cummiskey
    Real Name
    Brian Cummiskey
    Join Date
    Jul 2009
    Location
    btwn NYC and Boston
    Posts
    12,789
    Liked
    675 times
    Blog Entries
    2
    Brian Cummiskey / Crawlability Inc.
    Security bulletin - Patch Level for all supported versions released

    Unveiling the NEW vBSEO Sitemap Generator 3.0. - available NOW for vBSEO Customers!


  6. #6
    Senior Member KURTZ
    just tried this:

    Code:
    User-agent: *
    Sitemap: http://www.fniv.it/board/sitemap_index.xml.gz
    result:

    Allowed
    Detected as a directory; specific files may have different restrictions

  7. #7
    Senior Member KURTZ
    Quote Originally Posted by Brian Cummiskey View Post
    Thanks Brian, I just tested this:

    Code:
    # Allow Archive.org to save snapshots of everything
    User-agent: ia_archiver
    Allow: /
    
    # Tame yahoo... it tends to eat a ton of resources without a delay
    User-agent: Slurp
    Crawl-delay: 60
    
    
    # List individual pages and files here that all bots should ignore, as well as groups of extensions.
    # If you rewrite everything to .html, you can disallow *.php, but note that if you don't have a CRR for custom pages, those will be blocked.
    
    User-agent: *
    Disallow: *.js
    Disallow: /board/clientscript/
    Disallow: /board/cpstyles/
    Disallow: /board/customavatars/
    Disallow: /board/customprofilepics/
    Disallow: /board/images/
    Disallow: /board/ajax.php
    Disallow: /board/attachment.php
    Disallow: /board/calendar.php
    Disallow: /board/cron.php
    Disallow: /board/editpost.php
    Disallow: /board/global.php
    Disallow: /board/image.php
    Disallow: /board/inlinemod.php
    Disallow: /board/joinrequests.php
    Disallow: /board/login.php
    Disallow: /board/member.php
    Disallow: /board/memberlist.php
    Disallow: /board/misc.php
    Disallow: /board/moderator.php
    Disallow: /board/newattachment.php
    Disallow: /board/newreply.php
    Disallow: /board/newthread.php
    Disallow: /board/online.php
    Disallow: /board/poll.php
    Disallow: /board/postings.php
    Disallow: /board/printthread.php
    Disallow: /board/private.php
    Disallow: /board/profile.php
    Disallow: /board/register.php
    Disallow: /board/report.php
    Disallow: /board/reputation.php
    Disallow: /board/search.php
    Disallow: /board/sendmessage.php
    Disallow: /board/showgroups.php
    Disallow: /board/subscription.php
    Disallow: /board/threadrate.php
    Disallow: /board/usercp.php
    Disallow: /board/usernote.php
    
    #Finally, list the path to your sitemap:
    Sitemap: http://www.fniv.it/board/sitemap_index.xml.gz
    and got this:

    Allowed
    Detected as a directory; specific files may have different restrictions
    and this for parse:

    Line 7: Crawl-delay: 60 Rule ignored by Googlebot
    Is that fine?

    Last question: what about this rule?

    Code:
    Disallow: *.js
    Should I put /board/ before the *.js?
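Since the tester only reports on the directory ("specific files may have different restrictions"), individual files can be checked locally. Below is a sketch using Python's standard urllib.robotparser on a shortened copy of the file above. The *.js line is left out because this parser only does plain prefix matching, and the agent names and sample URLs are just examples.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the robots.txt above (only three Disallow lines kept).
ROBOTS_TXT = """\
User-agent: ia_archiver
Allow: /

User-agent: Slurp
Crawl-delay: 60

User-agent: *
Disallow: /board/images/
Disallow: /board/login.php
Disallow: /board/search.php

Sitemap: http://www.fniv.it/board/sitemap_index.xml.gz
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Per-file checks for a crawler that falls under the "*" group.
for path in ("/board/", "/board/showthread.php", "/board/login.php"):
    print(path, parser.can_fetch("Googlebot", "http://www.fniv.it" + path))

# Crawl-delay belongs to the Slurp group only (crawl_delay() needs Python 3.6+).
print(parser.crawl_delay("Slurp"), parser.crawl_delay("Googlebot"))
```

One thing this parser makes visible: a crawler with its own group (Slurp, ia_archiver) reads only that group, so the Disallow lines under * are not applied to it here. Real crawlers may behave differently, so treat this as a local sanity check.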

  8. #8
    vBSEO Staff Brian Cummiskey
    1) The warning is OK.
    2) This targets Slurp, aka Yahoo. Google will ignore it.
    3) This keeps your JavaScript files out of the index. You can keep it at root level (that will block all JavaScript sitewide from being indexed).
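To illustrate point 3, here is a small sketch of wildcard matching as Google documents it for robots.txt ('*' matches any run of characters, a trailing '$' anchors the end of the URL). The function rule_matches is a hypothetical stand-in written for this thread, not Google's code, and the sample paths are made up.

```python
import re


def rule_matches(pattern, path):
    """Approximate Google-style matching of a Disallow pattern against a URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and let each '*' stand for any run of characters.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # Rules match from the start of the path, like a prefix.
    return re.match(regex, path) is not None


paths = ("/js/app.js", "/board/clientscript/vbulletin_global.js", "/board/showthread.php")
for pattern in ("*.js", "/board/*.js"):
    for path in paths:
        print(pattern, path, rule_matches(pattern, path))
```

Under these assumptions the root-level *.js covers JavaScript anywhere on the site, while /board/*.js would only cover files under /board/.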


  9. #9
    Senior Member KURTZ
    well done Brian!
