Baiduspider, how to tame spiders and bots

  1. #1
    Member gonumber6's Avatar
    Real Name
    Lisa
    Join Date
    Oct 2010
    Location
    Sunny Arizona
    Posts
    36
    Liked
    2 times

    I will admit I am lousy at trying to figure this stuff out. But I have at least 100 Baiduspiders on my site at any given time.

    How do I block them, or at least TAME them, via robots.txt without blocking the good spiders?

    Any help would be appreciated. Thanks

  2. #2
    vBSEO Staff Brian Cummiskey's Avatar
    Real Name
    Brian Cummiskey
    Join Date
    Jul 2009
    Location
    btwn NYC and Boston
    Posts
    12,789
    Liked
    657 times
    Blog Entries
    2
    What do you want to do: block them or slow them down?

  3. #3
    Member gonumber6's Avatar
    I think for the time being I want to block them, at least for a little while, so I can see how many real guests I have on my site. Any way you could tell me how to do both, so I can change it in the future if I decide to?

    Thank you

  4. #4
    vBSEO Staff Brian Cummiskey's Avatar
    Add

    Code:
    User-agent: Baiduspider
    Disallow: /
    to your robots.txt file,

    and

    Code:
    deny from .baidu.com
    to your .htaccess file, directly after RewriteEngine On.
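
    A quick way to sanity-check the robots.txt part before uploading it is to run the rules through Python's standard urllib.robotparser module. This is only a sketch: the ROBOTS_TXT string and the is_allowed helper are names made up for illustration, and it shows how one parser reads the rules, not how Baiduspider itself will behave. It does not test the .htaccess line.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: just the two lines suggested above.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt lets user_agent fetch path (illustrative helper)."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Compare the blocked spider against another well-known crawler name.
    for agent in ("Baiduspider", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "/forums/"))
```

    With only these two lines in the file, the blocked spider should come back as not allowed and any other user agent as allowed, since agents with no matching group are allowed by default.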

  5. #5
    vBSEO Staff Brian Cummiskey's Avatar
    To tame them, you would use the Crawl-delay setting in robots.txt.
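
    As a rough illustration, a Crawl-delay rule could look like the one embedded in the sketch below. The value 10 (seconds between requests) is an arbitrary example, and Crawl-delay is a nonstandard directive, so whether a particular crawler honors it is up to that crawler. The ROBOTS_TXT_SLOW string and the crawl_delay_for helper are hypothetical names; the sketch only confirms that the rule parses as intended, using Python 3.6 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: slow Baiduspider down instead of blocking it.
# The delay of 10 seconds is an example value, not a recommendation.
ROBOTS_TXT_SLOW = """\
User-agent: Baiduspider
Crawl-delay: 10
"""


def crawl_delay_for(robots_txt: str, user_agent: str):
    """Return the Crawl-delay that applies to user_agent, or None if none is set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print("Baiduspider:", crawl_delay_for(ROBOTS_TXT_SLOW, "Baiduspider"))
    print("Googlebot:", crawl_delay_for(ROBOTS_TXT_SLOW, "Googlebot"))
```

    To switch between blocking and slowing down later, swap the Disallow: / line for the Crawl-delay line (or back again) under the same User-agent: Baiduspider group.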

  6. #6
    Member gonumber6's Avatar
    Thanks, Brian. Where do I find my .htaccess file again? The only one I found is in my forum root (the one that was provided by vBSEO), but it still says 'mysite.com'. Was I supposed to change that to englishbulldognews.com?

  7. #7
    vBSEO Staff Brian Cummiskey's Avatar

  8. #8
    Member gonumber6's Avatar
    Here are my robots.txt and .htaccess. Are they correct?

    robots.txt:
    User-agent: Baiduspider
    Disallow: /

    User-agent: *
    Allow: /

    User-agent: *
    Disallow: /forums/clientscript/
    Disallow: /forums/cpstyles/
    Disallow: /forums/customavatars/
    Disallow: /forums/customprofilepics/
    Disallow: /forums/ajax.php
    Disallow: /forums/arcade.php
    Disallow: /forums/attachment.php
    Disallow: /forums/awards.php
    Disallow: /forums/calendar.php
    Disallow: /forums/cron.php
    Disallow: /forums/editpost.php
    Disallow: /forums/faq.php
    Disallow: /forums/global.php
    Disallow: /forums/image.php
    Disallow: /forums/inlinemod.php
    Disallow: /forums/joinrequests.php
    Disallow: /forums/login.php
    Disallow: /forums/market.php
    Disallow: /forums/market_bank.php
    Disallow: /forums/market_convert.php
    Disallow: /forums/member.php
    Disallow: /forums/memberlist.php
    Disallow: /forums/misc.php
    Disallow: /forums/moderator.php
    Disallow: /forums/newattachment.php
    Disallow: /forums/newreply.php
    Disallow: /forums/newthread.php
    Disallow: /forums/online.php
    Disallow: /forums/poll.php
    Disallow: /forums/postings.php
    Disallow: /forums/printthread.php
    Disallow: /forums/private.php
    Disallow: /forums/profile.php
    Disallow: /forums/register.php
    Disallow: /forums/report.php
    Disallow: /forums/reputation.php
    Disallow: /forums/sendmessage.php
    Disallow: /forums/showgroups.php
    Disallow: /forums/subscription.php
    Disallow: /forums/threadrate.php
    Disallow: /forums/usercp.php
    Disallow: /forums/usernote.php
    Disallow: /forums/vbactivity.php
    Disallow: *.js
    Disallow: /forums/includes/
    Disallow: /forums/install/
    Disallow: /forums/customavatars/
    Disallow: /css/



    sitemap: http://www.englishbulldognews.com/fo...p_index.xml.gz

    .htaccess:
    # Comment the following line (add '#' at the beginning)
    # to disable mod_rewrite functions.
    # Please note: you still need to disable the hack in
    # the vBSEO control panel to stop url rewrites.
    RewriteEngine On

    # Some servers require the Rewritebase directive to be
    # enabled (remove '#' at the beginning to activate)
    # Please note: when enabled, you must include the path
    # to your root vB folder (i.e. RewriteBase /forums/)
    #RewriteBase /

    RewriteCond %{HTTP_HOST} !^englishbulldognews\.com$
    RewriteRule ^(.*)$ http://englishbulldognews.com/$1 [L,R=301]

    RewriteRule ^((urllist|sitemap_).*\.(xml|txt)(\.gz)?)$ vbseo_sitemap/vbseo_getsitemap.php?sitemap=$1 [L]

    RewriteCond %{REQUEST_URI} !(admincp/|modcp/|cron|vbseo_sitemap)
    RewriteRule ^((archive/)?(.*\.php(/.*)?))$ vbseo.php [L,QSA]

    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME} !/(admincp|modcp|clientscript|cpstyles|images)/
    RewriteRule ^(.+)$ vbseo.php [L,QSA]

    RewriteEngine On
    RewriteRule ^((urllist|sitemap).*\.(xml|txt)(\.gz)?)$ vbseo_sitemap/vbseo_getsitemap.php?sitemap=$1 [L]

    RewriteEngine On
    deny from .baidu.com

  9. #9
    vBSEO Staff Brian Cummiskey's Avatar
    Delete
    User-agent: *
    Allow: /

    as it is unnecessary; that is the default behavior.

  10. #10
    Member gonumber6's Avatar
    Okay, thank you. So far the bots are still there, though. Does it take a day or so to go through?

  11. #11
    vBSEO Staff Brian Cummiskey's Avatar
    They will still hit your server. They just see a fail message now. Hopefully, the bot gets the hint eventually.

    If you don't want it to hit your server, you need to block it at the firewall level, which you likely don't have access to configure.

  12. #12
    Member gonumber6's Avatar
    Yes, but I was seeing them on my forums, which are not attached to my home page. However, they seem to be gone this morning. Now I just got a new spider instead, and I don't know who or what it is: 124.115.0.*
