Page 3 of 5
Results 31 to 45 of 65

robots.txt

This is a discussion on robots.txt within the General Discussion forums, part of the vBulletin SEO Discussion category.

  1. #31
    Senior Member MentaL's Avatar
    Real Name
    MentaL
    Join Date
    Oct 2005
    Location
    Wales
    Posts
    425
    Liked
    8 times
    Here's mine:

    User-agent: TurnitinBot
    Disallow: /
    User-agent: Black Hole
    Disallow: /
    User-agent: Titan
    Disallow: /
    User-agent: WebStripper
    Disallow: /
    User-agent: NetMechanic
    Disallow: /
    User-agent: CherryPicker
    Disallow: /
    User-agent: EmailCollector
    Disallow: /
    User-agent: EmailSiphon
    Disallow: /
    User-agent: WebBandit
    Disallow: /
    User-agent: EmailWolf
    Disallow: /
    User-agent: ExtractorPro
    Disallow: /
    User-agent: CopyRightCheck
    Disallow: /
    User-agent: Crescent
    Disallow: /
    User-agent: Wget
    Disallow: /
    User-agent: SiteSnagger
    Disallow: /
    User-agent: ProWebWalker
    Disallow: /
    User-agent: CheeseBot
    Disallow: /
    User-agent: Alexibot
    Disallow: /
    User-agent: Teleport
    Disallow: /
    User-agent: TeleportPro
    Disallow: /
    User-agent: MIIxpc
    Disallow: /
    User-agent: Telesoft
    Disallow: /
    User-agent: Website Quester
    Disallow: /
    User-agent: WebZip
    Disallow: /
    User-agent: moget/2.1
    Disallow: /
    User-agent: WebZip/4.0
    Disallow: /
    User-agent: WebSauger
    Disallow: /
    User-agent: WebCopier
    Disallow: /
    User-agent: NetAnts
    Disallow: /
    User-agent: Mister PiX
    Disallow: /
    User-agent: WebAuto
    Disallow: /
    User-agent: TheNomad
    Disallow: /
    User-agent: WWW-Collector-E
    Disallow: /
    User-agent: RMA
    Disallow: /
    User-agent: libWeb/clsHTTP
    Disallow: /
    User-agent: asterias
    Disallow: /
    User-agent: httplib
    Disallow: /
    User-agent: turingos
    Disallow: /
    User-agent: spanner
    Disallow: /
    User-agent: InfoNaviRobot
    Disallow: /
    User-agent: Harvest/1.5
    Disallow: /
    User-agent: Bullseye/1.0
    Disallow: /
    User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
    Disallow: /
    User-agent: CherryPickerSE/1.0
    Disallow: /
    User-agent: CherryPickerElite/1.0
    Disallow: /
    User-agent: WebBandit/3.50
    Disallow: /
    User-agent: NICErsPRO
    Disallow: /
    User-agent: Microsoft URL Control - 5.01.4511
    Disallow: /
    User-agent: DittoSpyder
    Disallow: /
    User-agent: Foobot
    Disallow: /
    User-agent: WebmasterWorldForumBot
    Disallow: /
    User-agent: SpankBot
    Disallow: /
    User-agent: BotALot
    Disallow: /
    User-agent: lwp-trivial/1.34
    Disallow: /
    User-agent: lwp-trivial
    Disallow: /
    User-agent: Wget/1.6
    Disallow: /
    User-agent: BunnySlippers
    Disallow: /
    User-agent: Microsoft URL Control - 6.00.8169
    Disallow: /
    User-agent: URLy Warning
    Disallow: /
    User-agent: Wget/1.5.3
    Disallow: /
    User-agent: LinkWalker
    Disallow: /
    User-agent: cosmos
    Disallow: /
    User-agent: moget
    Disallow: /
    User-agent: hloader
    Disallow: /
    User-agent: humanlinks
    Disallow: /
    User-agent: LinkextractorPro
    Disallow: /
    User-agent: Offline Explorer
    Disallow: /
    User-agent: Mata Hari
    Disallow: /
    User-agent: LexiBot
    Disallow: /
    User-agent: Web Image Collector
    Disallow: /
    User-agent: The Intraformant
    Disallow: /
    User-agent: True_Robot/1.0
    Disallow: /
    User-agent: True_Robot
    Disallow: /
    User-agent: BlowFish/1.0
    Disallow: /
    User-agent: JennyBot
    Disallow: /
    User-agent: MIIxpc/4.2
    Disallow: /
    User-agent: BuiltBotTough
    Disallow: /
    User-agent: ProPowerBot/2.14
    Disallow: /
    User-agent: BackDoorBot/1.0
    Disallow: /
    User-agent: toCrawl/UrlDispatcher
    Disallow: /
    User-agent: WebEnhancer
    Disallow: /
    User-agent: TightTwatBot
    Disallow: /
    User-agent: suzuran
    Disallow: /
    User-agent: VCI WebViewer VCI WebViewer Win32
    Disallow: /
    User-agent: VCI
    Disallow: /
    User-agent: Szukacz/1.4
    Disallow: /
    User-agent: QueryN Metasearch
    Disallow: /
    User-agent: Openfind data gathere
    Disallow: /
    User-agent: Openfind
    Disallow: /
    User-agent: Xenu's Link Sleuth 1.1c
    Disallow: /
    User-agent: Xenu's
    Disallow: /
    User-agent: Zeus
    Disallow: /
    User-agent: RepoMonkey Bait & Tackle/v1.01
    Disallow: /
    User-agent: RepoMonkey
    Disallow: /
    User-agent: Zeus 32297 Webster Pro V2.9 Win32
    Disallow: /
    User-agent: Webster Pro
    Disallow: /
    User-agent: EroCrawler
    Disallow: /
    User-agent: LinkScan/8.1a Unix
    Disallow: /
    User-agent: Keyword Density/0.9
    Disallow: /
    User-agent: Kenjin Spider
    Disallow: /
    User-agent: Cegbfeieh
    Disallow: /

    #ALL BOTS


    User-agent: *
    Disallow: /forum/arcade.php
    Disallow: /forum/ajax.php
    Disallow: /forum/attachment.php
    Disallow: /forum/calendar.php
    Disallow: /forum/cron.php
    Disallow: /forum/editpost.php
    Disallow: /forum/global.php
    Disallow: /forum/image.php
    Disallow: /forum/inlinemod.php
    Disallow: /forum/joinrequests.php
    Disallow: /forum/login.php
    Disallow: /forum/member.php
    Disallow: /forum/memberlist.php
    Disallow: /forum/misc.php
    Disallow: /forum/moderator.php
    Disallow: /forum/newattachment.php
    Disallow: /forum/newreply.php
    Disallow: /forum/newthread.php
    Disallow: /forum/online.php
    Disallow: /forum/poll.php
    Disallow: /forum/postings.php
    Disallow: /forum/printthread.php
    Disallow: /forum/private.php
    Disallow: /forum/profile.php
    Disallow: /forum/register.php
    Disallow: /forum/report.php
    Disallow: /forum/reputation.php
    Disallow: /forum/search.php
    Disallow: /forum/sendmessage.php
    Disallow: /forum/showgroups.php
    Disallow: /forum/subscription.php
    Disallow: /forum/threadrate.php
    Disallow: /forum/usercp.php
    Disallow: /forum/usernote.php
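One way to sanity-check a block list like this is to feed it to a parser and ask what a given crawler may fetch. Below is a minimal sketch using Python's standard `urllib.robotparser` on a shortened excerpt of the file above. The user agents and paths being tested are illustrative assumptions, and the result only describes what a compliant crawler would do, since a badly behaved bot can ignore robots.txt entirely.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the robots.txt above:
# three of the blocked agents plus the catch-all record.
ROBOTS_EXCERPT = """\
User-agent: WebZip
Disallow: /
User-agent: Wget
Disallow: /
User-agent: EmailSiphon
Disallow: /

User-agent: *
Disallow: /forum/login.php
Disallow: /forum/search.php
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_EXCERPT)

# Illustrative checks: these (user agent, path) pairs are made up for the sketch.
for agent, path in [
    ("WebZip/4.0", "/forum/showthread.php"),
    ("Googlebot", "/forum/showthread.php"),
    ("Googlebot", "/forum/login.php"),
]:
    verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
    print(f"{agent:12} {path:24} {verdict}")
```

Agents that have their own record are shut out of everything, while every other crawler only loses the listed scripts.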

  2. #32
    Senior Member Sysop's Avatar
    Real Name
    Toni
    Join Date
    Oct 2007
    Location
    Italy
    Posts
    176
    Liked
    0 times
    Hello, I'm using this robots.txt list:

    User-agent: *
    Disallow: /ajax.php
    Disallow: /clientscript/
    Disallow: /includes/
    Disallow: /install/
    Disallow: /cron.php
    Disallow: /global.php
    Disallow: /inlinemod.php
    Disallow: /joinrequests.php
    Disallow: /calendar.php
    Disallow: /editpost.php
    Disallow: /login.php
    Disallow: /member.php
    Disallow: /memberlist.php
    Disallow: /misc.php
    Disallow: /moderator.php
    Disallow: /newreply.php
    Disallow: /newthread.php
    Disallow: /printthread.php
    Disallow: /private.php
    Disallow: /register.php
    Disallow: /report.php
    Disallow: /search.php
    Disallow: /showgroups.php
    Disallow: /usercp.php
    Disallow: /impressum.php
    Disallow: /admincp/
    Disallow: /modcp/
    Disallow: /online.php
    Disallow: /subscription.php
    Disallow: /sendtofriend.php
    Disallow: /sendmessage.php
    Disallow: /spiders.php
    Disallow: /subscription.php
    Disallow: /threadrate.php
    Disallow: /poll.php
    Disallow: /attachment.php
    Disallow: /avatar.php
    Disallow: /faq.php
    Disallow: /usercp.php
    Disallow: /usernote.php
    Disallow: /profile.php
    Disallow: /vbseocp.php
    Disallow: /disclaimer.php
    Disallow: /billspaypal.php
    Disallow: /privacy.html
    Disallow: /customavatars/
    Disallow: /xmlrpc.php
    Disallow: /archive/
    Disallow: /sitemap/
    Disallow: /cgi-bin/

    Sitemap: http://www.lottosqueeze.org/sitemap_index.xml.gz
    I checked Google Webmaster Tools today and it reports 170 duplicate pages with the same meta description, which all seem to come from /calendar/
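A possible explanation for the calendar report: Disallow rules are matched as path prefixes, so `Disallow: /calendar.php` does not cover URLs that start with `/calendar/`. The sketch below checks this with Python's standard `urllib.robotparser`. The sample calendar URL and the extra `Disallow: /calendar/` line are assumptions for illustration, not part of the file above.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, path, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# The rule as posted: only the calendar script itself is blocked.
CURRENT_RULES = "User-agent: *\nDisallow: /calendar.php\n"

# Hypothetical addition: block the rewritten calendar directory as well.
WITH_DIRECTORY_RULE = CURRENT_RULES + "Disallow: /calendar/\n"

# Hypothetical rewritten calendar URL, used only for this check.
CALENDAR_URL = "/calendar/2008/"

print(allowed(CURRENT_RULES, "/calendar.php"))      # the script is blocked
print(allowed(CURRENT_RULES, CALENDAR_URL))         # the directory is still crawlable
print(allowed(WITH_DIRECTORY_RULE, CALENDAR_URL))   # blocked once the directory rule exists
```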

  3. #33
    Senior Member
    Real Name
    Michael Biddle
    Join Date
    Jan 2007
    Location
    Southern California
    Posts
    7,097
    Liked
    4 times
    The calendar would be because of vBulletin.

  4. #34
    nfn
    Senior Member
    Real Name
    Nuno
    Join Date
    Feb 2008
    Location
    Portugal
    Posts
    276
    Liked
    1 times
    I'm using a simplified version of robots.txt:

    Code:
    # Allow Archiver
    User-agent: ia_archiver
    Allow: /
    
    # All Agents
    User-agent: *
    Allow: /forum/portal.php
    Allow: /forum/showthread.php
    Allow: /forum/forumdisplay.php
    Allow: /forum/external.php
    Disallow: /forum/archive/
    Disallow: /forum/clientscript/
    Disallow: /forum/cpstyles/
    Disallow: /forum/customavatars/
    Disallow: /forum/customprofilepics/
    Disallow: /forum/images/
    Disallow: /forum/includes/
    Disallow: /forum/info/
    Disallow: /forum/install/
    Disallow: /forum/signaturepics/
    Disallow: /forum/*.php
    
    # Sitemap
    Sitemap: http://www.portaldasviagens.com/forum/sitemap_index.xml.gz
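To see how a crawler that understands wildcards might read this file, here is a rough sketch of a matcher. It assumes Google-style precedence (the longest matching pattern wins, and Allow wins a tie), which is an assumption about crawler behaviour rather than something stated in this thread. The function names and test paths are hypothetical. Python's own `urllib.robotparser` does not expand `*`, hence the hand-written matcher.

```python
import re

# The Allow/Disallow lines from the robots.txt above, as (directive, pattern) pairs.
RULES = [
    ("Allow", "/forum/portal.php"),
    ("Allow", "/forum/showthread.php"),
    ("Allow", "/forum/forumdisplay.php"),
    ("Allow", "/forum/external.php"),
    ("Disallow", "/forum/archive/"),
    ("Disallow", "/forum/clientscript/"),
    ("Disallow", "/forum/cpstyles/"),
    ("Disallow", "/forum/customavatars/"),
    ("Disallow", "/forum/customprofilepics/"),
    ("Disallow", "/forum/images/"),
    ("Disallow", "/forum/includes/"),
    ("Disallow", "/forum/info/"),
    ("Disallow", "/forum/install/"),
    ("Disallow", "/forum/signaturepics/"),
    ("Disallow", "/forum/*.php"),
]


def rule_to_regex(pattern):
    """Turn a path pattern with '*' wildcards and an optional trailing '$' into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Apply the assumed precedence: longest matching pattern wins, Allow wins ties."""
    best = None  # (pattern length, is_allow) of the strongest match so far
    for directive, pattern in rules:
        if rule_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    # A path that no rule matches is crawlable by default.
    return True if best is None else best[1]


for path in ["/forum/showthread.php?t=123", "/forum/login.php", "/forum/images/logo.gif"]:
    print(path, "->", "allowed" if is_allowed(RULES, path) else "blocked")
```

Under these assumptions the four Allow lines carve the thread, forum, portal and feed pages out of the blanket `/forum/*.php` rule, which is why the file stays so short.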

  5. #35
    Senior Member Sysop's Avatar
    Real Name
    Toni
    Join Date
    Oct 2007
    Location
    Italy
    Posts
    176
    Liked
    0 times
    Quote Originally Posted by Michael Biddle
    The calendar would be because of vBulletin.
    Yeah, I need to exclude it completely somehow.

  6. #36
    Senior Member Sysop's Avatar
    Real Name
    Toni
    Join Date
    Oct 2007
    Location
    Italy
    Posts
    176
    Liked
    0 times
    What's that?
    # Allow Archiver
    User-agent: ia_archiver
    Allow: /

  7. #37
    nfn
    Senior Member
    Real Name
    Nuno
    Join Date
    Feb 2008
    Location
    Portugal
    Posts
    276
    Liked
    1 times
    ia_archiver is the bot from Alexa and Internet Archive.
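As for what that record does: a compliant crawler follows the record that names it and only falls back to `User-agent: *` when no record does. So `ia_archiver` gets `Allow: /` and skips the general restrictions. Here is a small sketch with Python's standard `urllib.robotparser`, using a trimmed copy of the file with one non-wildcard Disallow line. The `Googlebot` agent and the test path are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Trimmed copy of the robots.txt discussed above.
ROBOTS_TXT = """\
# Allow Archiver
User-agent: ia_archiver
Allow: /

# All Agents
User-agent: *
Disallow: /forum/archive/
"""


def may_fetch(agent, path):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


# Hypothetical page inside the blocked directory.
ARCHIVE_PAGE = "/forum/archive/index.php"

print("ia_archiver:", may_fetch("ia_archiver", ARCHIVE_PAGE))
print("Googlebot:  ", may_fetch("Googlebot", ARCHIVE_PAGE))
```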

  8. #38
    Senior Member
    Real Name
    Martyn Day
    Join Date
    Dec 2005
    Location
    Kent - UK
    Posts
    650
    Liked
    0 times
    Blog Entries
    1

  9. #39
    Senior Member Sysop's Avatar
    Real Name
    Toni
    Join Date
    Oct 2007
    Location
    Italy
    Posts
    176
    Liked
    0 times
    Quote Originally Posted by nfn
    ia_archiver is the bot from Alexa and Internet Archive.
    Ah OK, thanks. Anyway, the 'Allow' directive isn't part of the original robots.txt standard; only 'Disallow' is. You can check whether your robots.txt is valid here:
    New Robots.txt Syntax Checker: a validator for robots.txt files

    By default, a bot crawls every page you haven't blocked with Disallow, so 'Allow' is not only non-standard but also unnecessary.

  10. #40
    Senior Member Sysop's Avatar
    Real Name
    Toni
    Join Date
    Oct 2007
    Location
    Italy
    Posts
    176
    Liked
    0 times
    Quote Originally Posted by Martyn
    Your robots.txt isn't 100% valid either; check it.

  11. #41
    nfn
    Senior Member
    Real Name
    Nuno
    Join Date
    Feb 2008
    Location
    Portugal
    Posts
    276
    Liked
    1 times
    Hi Sysop,

    I know that. It's not a standard robots.txt, but it's 100% readable by the three major crawlers:

    Pattern matching - Webmaster Help Center
    How do I prevent my site or certain subdirectories from being crawled? - Yahoo! Search Help

    Yahoo!, Google, Microsoft Clarify Robots.txt Support

    I don't know whether I'll stick with this version or go back to a standard robots.txt.
    This one is easier to maintain.

  12. #42
    Senior Member Sysop's Avatar
    Real Name
    Toni
    Join Date
    Oct 2007
    Location
    Italy
    Posts
    176
    Liked
    0 times
    But for those of us who use the archive as a sitemap, isn't it better not to use

    Disallow: /sitemap/

    Isn't the best choice to remove this line from the robots.txt file?

    For example, I'm using http://www.lottosqueeze.org/sitemap/

    but it's blocked in the robots.txt file. I think that is really bad for search engines.

  13. #43
    Member
    Real Name
    kev
    Join Date
    Aug 2008
    Posts
    40
    Liked
    0 times
    Hi folks,

    I've added a link in my robots.txt file. Is it okay?

    User-agent: *
    Disallow: /sitemap/
    Disallow: /archive/

    Sitemap: http://www.yoliverpool.com/forum/sitemap_index.xml.gz
    Is the 'Disallow: /sitemap/' OK? I think vBSEO added that.

  14. #44
    Senior Member
    Real Name
    Joseph Ward
    Join Date
    Jun 2005
    Posts
    23,847
    Liked
    32 times
    Blog Entries
    9
    If you have changed your vBulletin archive URL to /sitemap/, but you block /sitemap/ in robots.txt, the search engines will not crawl the archive.

    You should either remove that line from your robots.txt or disable the archive completely:
    [How To] Determine How Much Traffic is From Your vBulletin Archive Pages with Google Analytics
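To make the difference concrete, the sketch below parses the robots.txt from post #43 with Python's standard `urllib.robotparser`. It shows that `Disallow: /sitemap/` blocks pages under `/sitemap/`, while the file named on the `Sitemap:` line sits at a different path and stays fetchable. The archive URL being tested and the `Googlebot` agent are assumptions for illustration, and `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from post #43.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sitemap/
Disallow: /archive/

Sitemap: http://www.yoliverpool.com/forum/sitemap_index.xml.gz
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical archive URL for a site whose archive was renamed to /sitemap/.
archive_page = "http://www.yoliverpool.com/sitemap/"
# The XML sitemap index declared on the Sitemap: line.
sitemap_file = "http://www.yoliverpool.com/forum/sitemap_index.xml.gz"

print("archive page crawlable:", parser.can_fetch("Googlebot", archive_page))
print("sitemap file crawlable:", parser.can_fetch("Googlebot", sitemap_file))
print("declared sitemaps:", parser.site_maps())
```

The `Sitemap:` line and the `/sitemap/` rule are independent: one tells crawlers where the XML sitemap is, the other blocks a directory that happens to share the name.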

  15. #45
    Senior Member
    Real Name
    Joseph Ward
    Join Date
    Jun 2005
    Posts
    23,847
    Liked
    32 times
    Blog Entries
    9
    Note: vBSEO does NOT modify the robots.txt file.
