Page 1 of 2
Results 1 to 15 of 23

Do I need a robots.txt?

This is a discussion on "Do I need a robots.txt?" within the Ad Networks forums, part of the Monetizing category.

  1. #1
    Member
    Real Name
    Sean
    Join Date
    Jan 2008
    Location
    Honolulu, HI
    Posts
    32
    Liked
    0 times

    Do I need a robots.txt?

    I'm just wondering, will it help or hurt my site? Do I really need it?

  2. #2
    Senior Member
    Real Name
    dave
    Join Date
    Jun 2006
    Posts
    348
    Liked
    0 times
    Blog Entries
    1
    From my understanding, it'll help reduce duplicate content.

  3. #3
    Member
    Real Name
    Sean
    Join Date
    Jan 2008
    Location
    Honolulu, HI
    Posts
    32
    Liked
    0 times
    Is duplicate content a bad thing?

  4. #4
    Senior Member
    Real Name
    Michael Biddle
    Join Date
    Jan 2007
    Location
    Southern California
    Posts
    7,097
    Liked
    4 times
    Yes, very much so.

  5. #5
    Member
    Real Name
    Sean
    Join Date
    Jan 2008
    Location
    Honolulu, HI
    Posts
    32
    Liked
    0 times
    OK, so when I go to create one, what is the best thing to put in the txt file itself? I'd like to just copy/paste what the masters have put in theirs. :P

  6. #6
    Member
    Real Name
    Sean
    Join Date
    Jan 2008
    Location
    Honolulu, HI
    Posts
    32
    Liked
    0 times
    I found a post from the guy who made something like 1400 in a day and put this in my robots.txt:

    User-agent: *
    Disallow: /admincp/
    Disallow: /cgi-bin/
    Disallow: /clientscript/
    Disallow: /includes/
    Disallow: /install/
    Disallow: /modcp/
    Disallow: /subscription.php
    Disallow: /payments.php
    Disallow: /profile.php
    Disallow: /faq.php
    Disallow: /calendar.php
    Disallow: /search.php
    Disallow: /private.php
    Disallow: /online.php
    Disallow: /sendmessage.php
    Disallow: /sendmessage.php?do=
    Disallow: /showgroups.php
    Disallow: /reputation.php
    Disallow: /report.php
    Disallow: /threadrate.php
    Disallow: /postings.php
    Disallow: /newthread.php
    Disallow: /newreply.php
    Disallow: /register.php
    Disallow: /login.php
    Disallow: /image.php
    Disallow: /cron.php
    Disallow: /joinrequests.php
    Disallow: /printthread.php
    Disallow: /showpost.php
    Disallow: /archive/
    Now just waiting for Google to update its 404 message.
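
A quick way to see what a rule list like the one above actually blocks is to run a few URLs through Python's standard `urllib.robotparser`. This is only a minimal sketch: the host `example.com` and the sample URLs are assumptions made up for illustration, and only a handful of the rules are copied in.

```python
from urllib.robotparser import RobotFileParser

# A few rules copied from the list above; the full file behaves the same way.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admincp/
Disallow: /search.php
Disallow: /printthread.php
Disallow: /showpost.php
Disallow: /archive/
"""

SITE = "http://example.com"  # hypothetical host, not from the thread


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical sample URLs: one full thread page and two alternate views.
for path in ("/showthread.php?t=123",
             "/showpost.php?p=456",
             "/archive/index.php/t-123.html"):
    print(path, "allowed" if is_allowed(ROBOTS_TXT, SITE + path) else "blocked")
```

Rules are matched as path prefixes, so `Disallow: /sendmessage.php` already covers `/sendmessage.php?do=`.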

  7. #7
    Senior Member curriertech's Avatar
    Real Name
    Josh
    Join Date
    Feb 2006
    Location
    NH
    Posts
    106
    Liked
    0 times
    IMO the main purpose of robots.txt is to keep spiders from indexing content that doesn't matter, leaving them with more time to index the content that does matter. It's about indexing efficiency more than anything, but restricting them from showpost.php and /archive/ does reduce duplicate content and strengthens the validity of your indexed pages by essentially only indexing whole threads in your forum content.

    I don't think any of this would have much impact on the AdSense spider though. It's going to hit new pages and index them regardless of what other pages it's trying to look at, because it's called by the ad script as a user loads that page, I think.

    Your mileage may vary.

  8. #8
    Member
    Real Name
    Chris
    Join Date
    Dec 2006
    Posts
    45
    Liked
    0 times
    I'm thinking of copying seangworld's list in my own robots.txt file. Anyone here not think that's a good idea? I don't know much about this kind of thing so I'm looking for advice.

  9. #9
    Member REVHEAD's Avatar
    Real Name
    David
    Join Date
    Jan 2008
    Posts
    72
    Liked
    0 times
    The one I found is different:
    Code:
    User-agent: *
    Disallow: /forum/admincp/
    Disallow: /forum/clientscript/
    Disallow: /forum/cpstyles/
    Disallow: /forum/customavatars/
    Disallow: /forum/customprofilepics/
    Disallow: /forum/images/
    Disallow: /forum/modcp/
    Disallow: /forum/ajax.php
    Disallow: /forum/attachment.php
    Disallow: /forum/calendar.php
    Disallow: /forum/cron.php
    Disallow: /forum/editpost.php
    Disallow: /forum/global.php
    Disallow: /forum/image.php
    Disallow: /forum/inlinemod.php
    Disallow: /forum/joinrequests.php
    Disallow: /forum/login.php
    Disallow: /forum/misc.php
    Disallow: /forum/moderator.php
    Disallow: /forum/newattachment.php
    Disallow: /forum/newreply.php
    Disallow: /forum/newthread.php
    Disallow: /forum/online.php
    Disallow: /forum/poll.php
    Disallow: /forum/postings.php
    Disallow: /forum/printthread.php
    Disallow: /forum/private.php
    Disallow: /forum/profile.php
    Disallow: /forum/register.php
    Disallow: /forum/report.php
    Disallow: /forum/reputation.php
    Disallow: /forum/search.php
    Disallow: /forum/sendmessage.php
    Disallow: /forum/subscription.php
    Disallow: /forum/threadrate.php
    Disallow: /forum/usercp.php
    Disallow: /forum/usernote.php
    Can we have a definitive robots.txt from someone high up here, please? They all seem to be different.
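
To see exactly where two rule lists differ, one option is to strip the directory prefix and compare the paths as sets. The sketch below uses short excerpts of the two lists posted in this thread rather than the full files; the helper name `disallowed_paths` is made up for illustration.

```python
def disallowed_paths(robots_txt: str, prefix: str = "") -> set:
    """Collect Disallow paths, dropping an optional leading directory prefix."""
    paths = set()
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() != "disallow":
            continue
        path = value.strip()
        if prefix and path.startswith(prefix):
            path = path[len(prefix):]
        if path:
            paths.add(path)
    return paths


# Excerpts only, to keep the example short.
FIRST = """\
User-agent: *
Disallow: /admincp/
Disallow: /search.php
Disallow: /showpost.php
Disallow: /archive/
"""

SECOND = """\
User-agent: *
Disallow: /forum/admincp/
Disallow: /forum/search.php
Disallow: /forum/usercp.php
Disallow: /forum/images/
"""

first = disallowed_paths(FIRST)
second = disallowed_paths(SECOND, prefix="/forum")

only_first = sorted(first - second)
only_second = sorted(second - first)
print("only in first list: ", only_first)
print("only in second list:", only_second)
```

On these excerpts the first list is the only one blocking `/showpost.php` and `/archive/`, while the second is the only one blocking `/usercp.php` and `/images/`.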

  10. #10
    Senior Member briansol's Avatar
    Real Name
    Brian
    Join Date
    Apr 2006
    Location
    Central CT, USA
    Posts
    6,981
    Liked
    8 times
    robots.txt ONLY works in the site root.

    If your forum lives in /forum, use the second version, with the /forum/ prefix on every path.
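
Both points can be sketched in a few lines of Python: crawlers look for robots.txt at the root of the host no matter where the forum is installed, so the paths inside it need the install directory in front. The function names and the `example.com` URL below are assumptions for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt location a crawler would check for this page."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def prefix_rules(robots_txt: str, prefix: str) -> str:
    """Prepend an install directory such as /forum to every Allow/Disallow path."""
    out = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        path = value.strip()
        if field.strip().lower() in ("allow", "disallow") and path.startswith("/"):
            out.append(f"{field.strip()}: {prefix.rstrip('/')}{path}")
        else:
            out.append(line)  # User-agent lines, blanks, and comments stay as-is
    return "\n".join(out)


# The forum lives in /forum, but the file is still fetched from the root.
print(robots_url("http://example.com/forum/showthread.php?t=1"))

root_style = "User-agent: *\nDisallow: /admincp/\nDisallow: /search.php"
print(prefix_rules(root_style, "/forum"))
```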

  11. #11
    Senior Member briansol's Avatar
    Real Name
    Brian
    Join Date
    Apr 2006
    Location
    Central CT, USA
    Posts
    6,981
    Liked
    8 times
    Quote Originally Posted by curriertech View Post
    I don't think any of this would have much impact on the AdSense spider though. It's going to hit new pages and index them regardless of what other pages it's trying to look at, because it's called by the ad script as a user loads that page, I think.
    The adsense spider is a totally different bot/service from the indexer.

  12. #12
    Member
    Real Name
    Sean
    Join Date
    Jan 2008
    Location
    Honolulu, HI
    Posts
    32
    Liked
    0 times
    grr, that makes sense. i believe i used the first one.
    correcting this now...

    i took out 2 things from it tho: the poll and profile.

  13. #13
    Member REVHEAD's Avatar
    Real Name
    David
    Join Date
    Jan 2008
    Posts
    72
    Liked
    0 times
    thanks guys

  14. #14
    Senior Member curriertech's Avatar
    Real Name
    Josh
    Join Date
    Feb 2006
    Location
    NH
    Posts
    106
    Liked
    0 times
    Quote Originally Posted by briansol View Post
    The adsense spider is a totally different bot/service from the indexer.
    Yep, I was just saying that with that one there's no delay in 'crawling' since it's called by the ad script. So is the adsense spider not governed by robots.txt at all?
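
For what it's worth, the AdSense crawler identifies itself with the user agent `Mediapartners-Google`, so it can be given its own group in robots.txt. The sketch below only shows how Python's `urllib.robotparser` evaluates such a file, where a group naming a specific user agent is used instead of the `*` group. It says nothing definite about how Google's crawlers behave, and the file contents, the `SomeOtherBot` name, and the URL are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: the AdSense crawler gets its own group with an empty
# Disallow (nothing blocked); every other bot falls under the * group.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search.php
Disallow: /showpost.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://example.com/showpost.php?p=1"  # hypothetical URL
adsense_ok = parser.can_fetch("Mediapartners-Google", url)
other_ok = parser.can_fetch("SomeOtherBot", url)

print("Mediapartners-Google:", adsense_ok)
print("SomeOtherBot:", other_ok)
```

With this file the parser allows the AdSense user agent to fetch the page while other bots are blocked from it.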

  15. #15
    Senior Member Hendricius's Avatar
    Real Name
    Hendrik Kleinwaechter
    Join Date
    Jun 2007
    Location
    Hamburg
    Posts
    173
    Liked
    3 times
    Blog Entries
    5
    Quote Originally Posted by seangworld View Post
    I found a post from the guy who made something like 1400 in a day and put this in my robots.txt:



    Now just waiting for Google to update its 404 message.
    That guy is forbidding the archive? That sounds very stupid to me...
