Page 4 of 4
Results 46 to 53 of 53

How to create a vBulletin bot and scraper trap


  1. #46
    Senior Member Lee G
    Real Name: Lee
    Join Date: Sep 2006
    Location: Costa Blanca
    Posts: 690
    Liked: 40 times
    Blog Entries: 4

    Here's a trick with Baidu: try blocking it in your robots.txt.
    Or, if you don't want to lose it altogether, try slowing it down there instead.
    It's supposed to work on a self-regulating crawl system.
    If you're getting rammed by an out-of-control search engine, it slows page loads, and in some cases the site can crash, even when the one ramming you is a good search engine like Google.
    Try contacting Baidu.
    It's one of the reasons so many webmasters hate that search engine.
    The Chinese make good search engines, good at one thing, and that's obtaining data, at whatever cost to the webmaster.
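A minimal sketch of both robots.txt options (block Baidu outright, or keep it and ask it to slow down), checked with Python's standard `urllib.robotparser`. The `Baiduspider` token and the 10-second value are assumptions for illustration. `Crawl-delay` is a non-standard directive, so whether a crawler honours it is entirely up to the crawler; the parser only shows how a compliant bot would read the file.

```python
from urllib.robotparser import RobotFileParser

# Option 1: block Baiduspider from the whole site; everyone else may crawl.
BLOCK_BAIDU = """\
User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow:
"""

# Option 2: keep Baiduspider, but ask it to wait 10 seconds between requests.
# Crawl-delay is non-standard, so the crawler may simply ignore it.
SLOW_BAIDU = """\
User-agent: Baiduspider
Crawl-delay: 10
Disallow:

User-agent: *
Disallow:
"""


def load(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the way a well-behaved crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    blocked = load(BLOCK_BAIDU)
    slowed = load(SLOW_BAIDU)
    # robotparser matches on the product token before the first "/".
    print(blocked.can_fetch("Baiduspider/2.0", "/forum/"))  # False
    print(blocked.can_fetch("Googlebot/2.1", "/forum/"))    # True
    print(slowed.can_fetch("Baiduspider/2.0", "/forum/"))   # True
    print(slowed.crawl_delay("Baiduspider/2.0"))            # 10
```

Note that `robotparser` compares only the part of the user agent before the first `/`, so pass a token like `Baiduspider/2.0` rather than a full `Mozilla/5.0 (compatible; ...)` string when testing.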

    Yandex is Russian; say no more.
    Monitoring my own logs, I see a lot of referrer spam coming in from Russia.
    They are also some of the best at it, with one of the better-known scraper/spam tools on the market.
    I have only ever seen one person better, and they hit from a Spanish IP. They must have spent hours, if not days, finding user agents to spin on each hit. That guy in particular hit with about 200 user agents.
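User-agent spinning like that shows up clearly if you count distinct user agents per IP in your access log. A minimal sketch, assuming Apache/nginx "combined" log lines; the regex, the sample data and the threshold of 20 are illustrative assumptions, not part of any mod.

```python
import re
from collections import defaultdict

# Apache/nginx "combined" format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def ua_spinners(lines, threshold=20):
    """Map each IP that used at least `threshold` distinct user agents to that count."""
    agents_by_ip = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # silently skip lines that are not in combined format
            agents_by_ip[match["ip"]].add(match["ua"])
    return {
        ip: len(agents)
        for ip, agents in agents_by_ip.items()
        if len(agents) >= threshold
    }


# Made-up sample: one address rotating through 200 agents, one ordinary visitor.
SAMPLE = [
    f'203.0.113.7 - - [10/Oct/2010:13:55:36 +0000] "GET /t{i}.html HTTP/1.1" '
    f'200 512 "-" "Agent{i}/1.0"'
    for i in range(200)
] + ['198.51.100.9 - - [10/Oct/2010:13:55:40 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"']

if __name__ == "__main__":
    print(ua_spinners(SAMPLE))  # {'203.0.113.7': 200}
```

Shared proxies and big NATs can legitimately show several agents per IP, so the threshold is a starting point for a manual look, not an automatic ban.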

    If you run a Twitter account, you will get heavily hit by the Twitter idiots' bots.
    Some of these damn things are PR8.
    Now, if something that's PR8 hits you and has nofollow on all its links, you might as well cut off a leg and enter an arse-kicking contest.
    There are one or two tools on the org that publish your feeds to Twitter.
    You can't choose what you do and don't want published.
    These morons hit those links, and you're in that arse-kicking contest.
    The better way, IMO, is to use a tool like Twitterfeed and only publish the forums you have set up with external RSS.
    I'm in two minds about Twitter; some say it's good for bringing in extra hits.
    The place is trawled by moron scrapers running on high octane.
    It looks good to see 40 or so hits on a forum after another link is published, but it's nothing more than Twitter bots ripping the balls out of you.

    One I don't think is listed above is Purebot/1.1.
    Clever setup. Look at their search site and it all looks good. Then click on any link there and it's nothing more than a one-page wonder, disguised to fool people into thinking it's a real website.
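A user-agent ban of this kind boils down to case-insensitive substring matching. A minimal sketch; `BANNED_TOKENS` and `is_banned_agent` are hypothetical names, and the list holds only the bot named above. Matching the fragment `purebot` rather than the exact `Purebot/1.1` also catches later version numbers.

```python
# Hypothetical ban list: lowercase fragments taken from user agents seen in the logs.
BANNED_TOKENS = ("purebot",)


def is_banned_agent(user_agent: str, tokens=BANNED_TOKENS) -> bool:
    """True if any banned fragment occurs anywhere in the user agent, ignoring case."""
    ua = user_agent.lower()
    return any(token in ua for token in tokens)


if __name__ == "__main__":
    print(is_banned_agent("Purebot/1.1"))                                # True
    print(is_banned_agent("Mozilla/5.0 (Windows NT 6.1) Firefox/4.0"))   # False
```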

    I need to get back out of the Google sandbox I have been stuck in for about three months now; then I will get back into search-and-destroy mode. It's a shame it's all gone quiet with the Bad Behavior mod. The last release of Bad Behavior was in August, but none of the mods on the org were updated to that release.

  2. #47
    Senior Member Lee G
    Real Name: Lee
    Join Date: Sep 2006
    Location: Costa Blanca
    Posts: 690
    Liked: 40 times
    Blog Entries: 4

    A bit of irony, in a dry British humor way:
    the spam post on here was for medical equipment. If only laser treatment was an option.
    If anyone wonders whether I have just given up my fight and stopped posting information on here, the answer is that I have had to take an unscheduled break.
    I'm at the stage of needing surgery to restore my eyesight.
    This week, I have my first operation.
    Touch wood, I will be back wreaking havoc on the morons before Christmas.

  3. #48
    Senior Member Lee G
    Real Name: Lee
    Join Date: Sep 2006
    Location: Costa Blanca
    Posts: 690
    Liked: 40 times
    Blog Entries: 4

    One to block (only block them if you don't use the deal that comes with vBSEO):

    I uninstalled it ages ago.

    Gravity insight

    IP range: 74.123.148.0 - 74.123.151.255

    WHOIS for 74.123.148.141: DomainTools.com

    More than 600 hits in the last 24 hours on my forums.
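The quoted range is exactly one /22: four consecutive third-octet values (148 to 151) × 256 addresses = 1,024 = 2^10, which leaves 22 network bits. A minimal sketch with Python's standard `ipaddress` module that derives the block and counts how many logged hits came from it; the function names are illustrative, and pulling client IPs out of your own logs is assumed to happen elsewhere.

```python
import ipaddress

# The range quoted above, collapsed into CIDR notation.
RANGE_START = ipaddress.IPv4Address("74.123.148.0")
RANGE_END = ipaddress.IPv4Address("74.123.151.255")
BLOCKS = list(ipaddress.summarize_address_range(RANGE_START, RANGE_END))


def in_blocked_range(ip: str) -> bool:
    """True if the address lies inside any of the summarised blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in block for block in BLOCKS)


def hits_from_range(client_ips) -> int:
    """Count hits (one client IP per request) that came from the blocked range."""
    return sum(1 for ip in client_ips if in_blocked_range(ip))


if __name__ == "__main__":
    print([str(b) for b in BLOCKS])            # ['74.123.148.0/22']
    print(in_blocked_range("74.123.148.141"))  # True
```

The CIDR form is usually what firewalls and `.htaccess`-style deny rules expect, so it is worth computing once rather than listing 1,024 addresses.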

  4. #49
    Junior Member
    Real Name: Matt
    Join Date: May 2009
    Posts: 19
    Liked: 0 times

    Quote, originally posted by Lee G:
    On my forums there is a copy of the blacklist file I use with the Bad Behavior mod. I'm running version 2.0.42, and it's simply a case of overwriting the blacklist.inc file with the one from my forums: http://www.thespainforum.com/f379/ba...59/#post328858
    I registered on your site to download the latest file you had, and I don't have permission to :( Between this thread and some of my previous experience dealing with spam on phpBB and SMF sites, I've managed to drastically reduce the amount of unwanted traffic on my site. Thanks, Lee G!

  5. #50
    Senior Member Lee G
    Real Name: Lee
    Join Date: Sep 2006
    Location: Costa Blanca
    Posts: 690
    Liked: 40 times
    Blog Entries: 4

    Sorry about that. I'm setting up a new system and have even managed to lock myself out of my admin area at the moment.
    My security is excellent.
    PM me your username and I will sort it out.

  6. #51
    Junior Member
    Real Name: Matt
    Join Date: May 2009
    Posts: 19
    Liked: 0 times

    PM sent.

    I also thought I'd add this site to the thread. It's very helpful for breaking down user agents to see what's legit and what isn't, and for narrowing down very specific parts of user agents so you can figure out what to add to the "Ban Spiders" mod without banning a large number of valid users too.

    http://www.useragentstring.com/index.php

    That site, in conjunction with botsvsbrowsers (Bots vs Browsers: Public Bots and User Agents Database and Commentary), helped me figure out who and what I should be banning.
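Before adding a fragment to a ban list, it helps to check what else it would catch. A minimal sketch, assuming you keep a small sample of user agents you know are legitimate; `collateral` and `KNOWN_GOOD` are hypothetical names, not part of the "Ban Spiders" mod.

```python
def collateral(token: str, known_good_agents) -> list:
    """Return the known-good user agents that a candidate ban fragment would also catch."""
    needle = token.lower()
    return [ua for ua in known_good_agents if needle in ua.lower()]


# A small, hypothetical sample of agents you want to keep.
KNOWN_GOOD = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0",
]

if __name__ == "__main__":
    print(collateral("bot", KNOWN_GOOD))      # catches the Googlebot agent: too broad
    print(collateral("Purebot", KNOWN_GOOD))  # []: specific enough
```

A fragment that returns anything here is too broad; narrow it until the list comes back empty.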

    I never had an issue with spam, since I made the registration form nearly bot-proof (no bot has successfully registered to date), but we did have an issue with content stealing and annoying bots running wild, indexing pages with no regard for throttle limits or server resources. For those, I exclusively used the "Ban Spiders" mod to narrow down the list and get rid of all the idiots and obvious ones. Once that was more manageable, I had to start banning IP addresses for the smarter ones that use legitimate user agents. Between those two methods, I've reduced my site traffic by about half, and at least the half that still uses the site is most likely legitimate users, so I'm happy with the results.

    If there's interest, I'm willing to share my user agent ban list and IP ban list once I get them to a point I'm happy with.

  7. #52
    Senior Member Lee G
    Real Name: Lee
    Join Date: Sep 2006
    Location: Costa Blanca
    Posts: 690
    Liked: 40 times
    Blog Entries: 4

    I've not posted much on this for a long time.
    Anyone who says eye surgery doesn't hurt is right. It's the recovery period that's a bitch:
    dry eye, feeling the lump of plastic they inserted, eye drops, cream, flatulence caused by the drops, etc.

    By chance, I found the proof of why you should limit your external feeds.
    Limit as in only allowing them on a forum-by-forum basis.
    Limiting the characters won't help.
    There is an abundance of free RSS readers on the net these days that turn a partial feed into a full feed.
    You can even buy one system for a whopping 20€ from one developer to run on your own server.
    A simple Google search for the term "full text rss feeds" will turn them up.
    For a long time I had been looking for some kind of WordPress add-on rather than a free online reader.
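Limiting feeds on a forum-by-forum basis amounts to an allowlist applied before anything is published. A minimal sketch; the `FeedItem` structure, the `forum_id` field and the IDs are hypothetical, and in the thread this is done through per-forum external RSS settings rather than custom code.

```python
from dataclasses import dataclass


@dataclass
class FeedItem:
    forum_id: int
    title: str
    link: str


# Hypothetical: the only forums whose posts may appear in the external feed.
PUBLIC_FORUM_IDS = {12, 37}


def publishable(items, allowed=PUBLIC_FORUM_IDS):
    """Drop every item whose forum is not on the allowlist before it is published."""
    return [item for item in items if item.forum_id in allowed]


# Made-up items: one from a public forum, one that should stay off the feed.
ITEMS = [
    FeedItem(12, "Site news", "https://forum.example/f12/t1.html"),
    FeedItem(99, "Members' guide", "https://forum.example/f99/t2.html"),
]

if __name__ == "__main__":
    for item in publishable(ITEMS):
        print(item.title)  # Site news
```

Truncating the description would not change anything here: a full-text RSS tool simply follows `link` and scrapes the page, so the only protection is never putting that link in the feed.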

  8. #53
    Senior Member Lee G
    Real Name: Lee
    Join Date: Sep 2006
    Location: Costa Blanca
    Posts: 690
    Liked: 40 times
    Blog Entries: 4

    I've not posted much on this for a long time, due to the reason above.
    I'm back and fighting hard again after the second operation to restore my eyesight.
    I let my own forums get plagued by a deluge of Twitter bots, hoping that automated Twitter posting and the resulting frenzy of hits would be good for backlinks. What an idiot I was to assume that.

    An easy way to determine whether a bot hitting you is any good is simply to visit their site.
    Right-click, view the source code, and then check the outbound links.
    A good 90% will have the magic "nofollow" attribute on those links.
    You have a PR5 site, they scrape all the content that goes to Twitter, and they reward you with a nofollow in return.
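The view-source check can be automated. A minimal sketch using Python's standard `html.parser`; it only looks at `rel="nofollow"` on `<a>` tags (not a page-wide robots meta tag), fetching the page is left out so it runs offline, and the class, function and sample hostnames are illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class OutboundLinkAudit(HTMLParser):
    """Count outbound <a href> links and how many carry rel="nofollow"."""

    def __init__(self, own_host: str):
        super().__init__()
        self.own_host = own_host.lower()
        self.total = 0
        self.nofollow = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        host = urlparse(attrs.get("href") or "").netloc.lower()
        if not host or host == self.own_host:
            return  # relative or same-site link, so not outbound
        self.total += 1
        if "nofollow" in (attrs.get("rel") or "").lower().split():
            self.nofollow += 1


def nofollow_share(html: str, own_host: str) -> float:
    """Fraction of outbound links marked nofollow (0.0 when there are none)."""
    audit = OutboundLinkAudit(own_host)
    audit.feed(html)
    audit.close()
    return audit.nofollow / audit.total if audit.total else 0.0


# Made-up page from a hypothetical aggregator at bot-site.example.
SAMPLE_PAGE = """
<a href="http://forum.example/t1.html" rel="nofollow">scraped thread</a>
<a href="http://other.example/page" rel="nofollow noopener">another</a>
<a href="http://third.example/">followed link</a>
<a href="/about">internal</a>
<a href="http://bot-site.example/faq">also internal</a>
"""

if __name__ == "__main__":
    print(nofollow_share(SAMPLE_PAGE, "bot-site.example"))
```

A share close to 1.0 is the "nofollow in return" pattern described above. `www.` and bare-domain variants of the bot's own host are treated as different hosts in this sketch.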

    One of the worst offenders I have come across is a forum search engine called Omgili.
    All links on there are nofollow. They hit loads of pages each day; in my case they were taking content from over 200 pages every single day. Over time, that amounts to a lot of content they can outrank the original forums with.
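To measure how many distinct pages a crawler takes per day, group its requests by date. A minimal sketch, assuming "combined" log lines; `ExampleCrawler` is a placeholder token, so check your own logs for the exact string a given crawler sends.

```python
import re
from collections import defaultdict

# Combined log format; captures the day, request path and user agent.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<day>[^:\]]+):[^\]]*\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def pages_per_day(lines, agent_token):
    """Distinct paths fetched per day by user agents containing agent_token."""
    token = agent_token.lower()
    pages = defaultdict(set)
    for line in lines:
        m = LOG_LINE.match(line)
        if m and token in m["ua"].lower():
            pages[m["day"]].add(m["path"])  # a repeat fetch of a page counts once
    return {day: len(paths) for day, paths in pages.items()}


# Made-up sample log for a hypothetical crawler.
SAMPLE_LOG = [
    '192.0.2.5 - - [10/Oct/2010:13:55:36 +0000] "GET /f5/t1.html HTTP/1.1" 200 512 "-" "ExampleCrawler/1.0"',
    '192.0.2.5 - - [10/Oct/2010:13:56:01 +0000] "GET /f5/t2.html HTTP/1.1" 200 512 "-" "ExampleCrawler/1.0"',
    '192.0.2.5 - - [10/Oct/2010:14:02:11 +0000] "GET /f5/t1.html HTTP/1.1" 200 512 "-" "ExampleCrawler/1.0"',
    '192.0.2.5 - - [11/Oct/2010:09:00:00 +0000] "GET /f7/t9.html HTTP/1.1" 200 512 "-" "ExampleCrawler/1.0"',
    '198.51.100.9 - - [10/Oct/2010:13:55:40 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    print(pages_per_day(SAMPLE_LOG, "examplecrawler"))
```

For scale: 200 pages a day adds up to 200 × 365 = 73,000 page fetches a year.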
