We have a forum that gets millions of unique visitors a month. We are ranked #1-#5 for our industry for almost all important keywords/phrases, but the majority of our traffic comes from random keyword searches that lead to forum posts. If I remember correctly, the keyword with the highest inbound traffic rate still only drives 6% of our traffic -- we are very hesitant to lose all of the 0.000001%-of-your-traffic keywords, as added together they make the bulk of our traffic. (Hope that makes sense.)
With that said, we currently sitemap our showpost threads with the idea that the more content we can get indexed, the better (the whole "cast a bigger net" thing -- see above paragraph). Our sitemap currently has 1.2M URLs, and Google has 146k of those URLs indexed (doesn't seem like a lot, does it?). Searching for showpost.php in Analytics shows nothing, but it looks like the permalinked showpost pages (the ones that use -post) were about 0.21% of our traffic in the last 30 days.
With showpost removed the number of URLs drops to ~300k.
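To put those figures side by side, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted above; the variable names are mine, and the ~300k figure is approximate, so treat the results as rough.

```python
# Figures quoted in the post; variable names are just for illustration.
sitemap_urls = 1_200_000           # URLs currently in the sitemap
indexed_urls = 146_000             # URLs Google reports as indexed
urls_without_showpost = 300_000    # approximate sitemap size with showpost removed

indexed_share = indexed_urls / sitemap_urls
showpost_urls = sitemap_urls - urls_without_showpost
showpost_share = showpost_urls / sitemap_urls

print(f"indexed share of sitemap: {indexed_share:.1%}")   # 12.2%
print(f"showpost URLs in sitemap: {showpost_urls:,} ({showpost_share:.0%})")  # 900,000 (75%)
```

So roughly three quarters of the sitemap is showpost URLs, while only about 12% of the sitemap as a whole is indexed.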
Now, previously we had tried to drop the showpost threads from our sitemap and our indexed pages plummeted. We went from 130k indexed URLs to 80k and then to 60k and then to 30k -- needless to say it was a very scary time. Around that same time, though, one of our developers had accidentally replaced one of our sites with an old version of the main site -- and obviously Google doesn't like duped content. Since everything was happening at the same time, we weren't sure what was causing the problem. My gut tells me the duped content was the obvious cause, but before I go rocking the boat again I wanted to check here.
One thing to note is that we get a number of errors when Google scans the sitemaps. In fact, I've had to drop the number of URLs per map from 10,000 to 1,000 because I've found that this greatly reduces the number of good links that get tossed out with the bad.
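For reference, the splitting itself amounts to something like the sketch below. This is not what our sitemap generator actually runs; it is a minimal stand-alone illustration, and the `forum.example.com` URLs, the `thread-N` paths, and the `sitemap_N.xml` file names are made up. The only hard limits I know of come from the sitemaps.org protocol: at most 50,000 URLs per sitemap file and at most 50,000 sitemap files per index, so 1.2M URLs at 1,000 per file (1,200 files) is still within bounds.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'


def chunk(urls, size):
    """Yield consecutive slices of `urls`, each at most `size` long."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap(urls):
    """Return one <urlset> document listing the given page URLs."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return f'{XML_HEADER}<urlset xmlns="{SITEMAP_NS}">\n{entries}</urlset>\n'


def build_index(sitemap_locations):
    """Return a <sitemapindex> document pointing at each sitemap file."""
    entries = "".join(
        f"  <sitemap><loc>{escape(loc)}</loc></sitemap>\n"
        for loc in sitemap_locations
    )
    return f'{XML_HEADER}<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}</sitemapindex>\n'


# Hypothetical data: 2,500 made-up thread URLs, split 1,000 per file.
base = "https://forum.example.com"
urls = [f"{base}/thread-{n}" for n in range(1, 2501)]
files = [build_sitemap(part) for part in chunk(urls, 1000)]
index = build_index(f"{base}/sitemap_{i}.xml" for i in range(1, len(files) + 1))

print(len(files))  # 3 (two files of 1,000 URLs and one of 500)
```

Changing the second argument of `chunk` is the only thing that differs between the 10,000 and 1,000 setups.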
So I guess my questions are:
1) Is it bad practice to use smaller sitemap files? Is 1,000 URLs too small? We just had vbseo support help us fix some errors, and they said to set it to 10,000... but when Google finds the bad URLs (no clue how to fix them either, there are just too many of them) it dumps the whole block of 10k URLs -- I've found I can "save" good URLs by making Google throw out chunks of 1k instead of 10k. Is this smart? I know it would be better to fix the problem, but so far this is the easiest solution... but I can't help but wonder if Google prefers crawling sites with 10 sitemap files instead of 1,000.
2) Do you think the SEO we gain from dropping the showpost pages will make up for the loss of 0.21% of our traffic? 0.21% doesn't seem like a lot, but it is thousands and thousands of hits. Since Google has already indexed them, we'll still be retaining the already-indexed -post pages... but no new ones will be added to the sitemap. Is it worth it in a case like ours, where we already have great SEO for the "main" keywords but rely on all the random words that are in posts for the bulk of our traffic?
3) If Google crawls the site anyways, how does the sitemap generator "hurt" SEO by including showpost threads? The fact that it takes up Googlebot's time is pretty self-explanatory, but I'm failing to see how showpost threads = duplicate content... wouldn't those pages get crawled anyways, eventually?
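To make the reasoning behind question 1 concrete, here is a toy model of the "chunk gets thrown out" effect. It assumes what I think I'm seeing, namely that one bad URL causes the whole sitemap file containing it to be discarded; I can't confirm that is how Google actually behaves. The function name and the bad-URL positions are invented for the example.

```python
from collections import Counter


def good_urls_lost(total_urls, bad_positions, urls_per_file):
    """Count good URLs discarded, ASSUMING any file with a bad URL is dropped whole."""
    # Map each bad URL (by its position in the full list) to the file it lands in.
    bad_per_file = Counter(p // urls_per_file for p in set(bad_positions))
    lost = 0
    for file_index, bad_count in bad_per_file.items():
        start = file_index * urls_per_file
        file_size = min(urls_per_file, total_urls - start)  # last file may be short
        lost += file_size - bad_count
    return lost


# Hypothetical positions of four bad URLs in a 1.2M-URL sitemap.
bad = [5, 12_345, 12_999, 700_000]

print(good_urls_lost(1_200_000, bad, 10_000))  # 29996
print(good_urls_lost(1_200_000, bad, 1_000))   # 2996
```

Under that assumption, smaller files lose about a tenth as many good URLs in this example, which is the whole reason I dropped to 1,000 per file.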
Anyways, sorry for writing a book. I'm assuming there is probably no cut-and-dried answer to my questions; I'm just terrified to make any major changes to a site this big. The last thing I want is a phone call from my boss wondering why we dropped to page 2 and only have 17k pages indexed.


