Have done our site with this today, let's see what happens!
This is a discussion on "Robots.txt now support the sitemap file" within the General Discussion forums, part of the vBSEO Google/Yahoo Sitemap category.
So if I understand correctly...it doesn't matter which one you put in your robots.txt because the .htaccess rewrite will direct it to the sitemap_index.xml.gz in the /data folder, correct?
Yes it does matter. If your forums are in a subdirectory and you want to include your front page you need to have the link point to the file in root not the subdirectory. Pages above the sitemap location will not be recognized. That's why the smoke and mirrors to make it look like it is somewhere else.
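To make the "pages above the sitemap location" point concrete, here is a rough Python sketch of the scope rule as I understand it from sitemaps.org: a sitemap only covers URLs at or below its own directory. The function name `in_sitemap_scope` is made up for illustration, and this is an approximation of the rule, not how any search engine actually implements it.

```python
import posixpath
from urllib.parse import urlsplit

def in_sitemap_scope(sitemap_url, page_url):
    """Return True if page_url sits at or below the directory holding the sitemap."""
    s, p = urlsplit(sitemap_url), urlsplit(page_url)
    # Scheme and host must match before the path is even considered.
    if (s.scheme, s.netloc) != (p.scheme, p.netloc):
        return False
    base_dir = posixpath.dirname(s.path)
    if not base_dir.endswith("/"):
        base_dir += "/"
    return (p.path or "/").startswith(base_dir)

front_page = "http://www.example.com/"
thread = "http://www.example.com/forums/showthread.php"

# Sitemap appears to live in the subdirectory: the front page is out of scope.
print(in_sitemap_scope("http://www.example.com/forums/sitemap_index.xml.gz", front_page))
# Sitemap appears to live in root: both the front page and the threads are in scope.
print(in_sitemap_scope("http://www.example.com/sitemap_index.xml.gz", front_page))
print(in_sitemap_scope("http://www.example.com/sitemap_index.xml.gz", thread))
```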
So if your forums are in the forums/ directory then you would move the htaccess code for the sitemap (not the vbseo code) to a htaccess in your root and add forums/ to the links structure. Like this.
Code:
RewriteRule ^sitemap(\.txt(\.gz)?)$ forums/vbseo_sitemap/vbseo_getsitemap.php?sitemap=urllist$1 [L]
RewriteRule ^((urllist|sitemap).*\.(xml|txt)(\.gz)?)$ forums/vbseo_sitemap/vbseo_getsitemap.php?sitemap=$1 [L]
Then you would use
Code:
http://www.example.com/sitemap_index.xml.gz
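If it helps to see what those two rules do to a request, here is a small Python sketch that replays the same patterns with the `re` module. It only approximates mod_rewrite (per-directory context, no leading slash, first match wins because of `[L]`); the `rewrite` helper is a made-up name, not part of vBSEO or Apache.

```python
import re

# The two RewriteRule patterns from the post, in order.
# Apache's $1 back-reference is written as \1 in Python templates.
RULES = [
    (r"^sitemap(\.txt(\.gz)?)$",
     r"forums/vbseo_sitemap/vbseo_getsitemap.php?sitemap=urllist\1"),
    (r"^((urllist|sitemap).*\.(xml|txt)(\.gz)?)$",
     r"forums/vbseo_sitemap/vbseo_getsitemap.php?sitemap=\1"),
]

def rewrite(path):
    """Return the rewritten target for a request path, or None if no rule matches."""
    for pattern, target in RULES:
        match = re.match(pattern, path)
        if match:
            # Mimic [L]: stop at the first rule that matches.
            return match.expand(target)
    return None

print(rewrite("sitemap_index.xml.gz"))
print(rewrite("sitemap.txt.gz"))
print(rewrite("index.php"))
```

So a request for the root-level sitemap_index.xml.gz is quietly served by the script under forums/, while ordinary pages such as index.php are left alone.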
Last edited by Code Monkey; 04-13-2007 at 11:47 PM.
No problem. Once you get your head around it then it's smooth sailing from there. The point is to keep the sitemaps themselves in a secure directory that is writable yet make them appear to be at the top level of your site. I'm glad you got it straightened out. Get ready for the bot explosion.
I would suggest not using .gz for the file in the robots.txt. Google supports .gz and we can handle that from the webmaster tools, but the robots.txt entry is for the other search engines, and we are not sure they support it.
But the filename has .gz, so it has to be referenced that way. Otherwise, it won't find the file.
All the major search engines recognize it, so it's not a problem. Google, Yahoo, and MSN have all agreed to support the Google sitemap standard. I believe Ask.com has climbed on board as well.
I just wasn't sure, but I found this on sitemaps.org:
sitemaps.org - FAQ
Q: Can I zip my Sitemaps or do they have to be gzipped?
Please use gzip to compress your Sitemaps. Remember, your Sitemap must be no larger than 10MB (10,485,760 bytes), whether compressed or not.
So .gz files must be supported by all major search engines.
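For anyone who wants to check their own file against that limit, a minimal Python sketch follows. It takes the gzipped bytes and compares both the compressed and the uncompressed size with the 10,485,760 figure quoted from the FAQ; checking both is my reading of "whether compressed or not", and the function names are made up for illustration.

```python
import gzip

MAX_SITEMAP_BYTES = 10_485_760  # limit quoted from the sitemaps.org FAQ

def sitemap_sizes(gz_bytes):
    """Return (compressed_size, uncompressed_size) for gzipped sitemap bytes."""
    return len(gz_bytes), len(gzip.decompress(gz_bytes))

def within_limit(gz_bytes):
    """True if neither the compressed nor the uncompressed size exceeds the limit."""
    compressed, uncompressed = sitemap_sizes(gz_bytes)
    return compressed <= MAX_SITEMAP_BYTES and uncompressed <= MAX_SITEMAP_BYTES

# Tiny stand-in for a real sitemap, compressed in memory.
sample = gzip.compress(b"<urlset></urlset>")
print(sitemap_sizes(sample), within_limit(sample))
```

For a file on disk you would read it in binary mode first, e.g. `open(path, "rb").read()`, and pass the bytes in.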
Disallow: /forum/showpost.php
Mike, when you disallow showpost.php, doesn't that prevent them from caching the pages?
And how come you don't have showthread.php disallowed? Just trying to understand the whole robots.txt stuff.
Last thing... My forums are at the root (mysite.com/index.php), so for the robots.txt file should I put "sitemap_index.xml.gz" at the top, or at the bottom after all the Disallows?
I currently have it looking like this. Thx in advance.
User-agent: *
Disallow: /clientscript/
Disallow: /includes/
Disallow: /install/
Disallow: /customavatars/
Disallow: /subscription.php
Disallow: /payments.php
Disallow: /profile.php
Disallow: /faq.php
Disallow: /calendar.php
Disallow: /private.php
Disallow: /poll.php
Disallow: /sendmessage.php
Disallow: /sendmessage.php?do=
Disallow: /showgroups.php
Disallow: /reputation.php
Disallow: /report.php
Disallow: /threadrate.php
Disallow: /postings.php
Disallow: /online.php
Disallow: /search.php
Disallow: /newthread.php
Disallow: /newreply.php
Disallow: /register.php
Disallow: /login.php
Disallow: /image.php
Disallow: /cron.php
Disallow: /joinrequests.php
Disallow: /usercp.php
Disallow: /member.php
Sitemap: http://www.mysite.com/sitemap_index.xml.gz
Last edited by mikeinjersey; 04-27-2007 at 11:28 PM.
I don't think it matters where you put the sitemap file. I just put mine at the bottom because that was the last thing I had added to the robots.txt file.
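A quick way to convince yourself of this is Python's standard `urllib.robotparser`, which treats the Sitemap line as independent of the User-agent groups. The sketch below parses the same rules with the Sitemap line first and last and gets the same result; it assumes Python 3.8 or newer for `site_maps()`, and the shortened robots.txt bodies are just stand-ins for the full list above.

```python
from urllib.robotparser import RobotFileParser

SITEMAP_LINE = "Sitemap: http://www.mysite.com/sitemap_index.xml.gz"
RULES = "User-agent: *\nDisallow: /search.php\nDisallow: /member.php"

def sitemaps_in(robots_txt):
    """Parse robots.txt text and return the sitemap URLs it declares (or None)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps()

at_top = sitemaps_in(SITEMAP_LINE + "\n" + RULES)
at_bottom = sitemaps_in(RULES + "\n" + SITEMAP_LINE)
print(at_top == at_bottom, at_bottom)
```

This only shows how one parser behaves; each search engine has its own, but the sitemaps.org protocol describes the directive the same way.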
I disallow showpost because I don't want spiders going into my individual posts. (I also don't rewrite the showpost URLs, to cut down on server load, since I block them from indexing anyway.)
I do want them indexing my threads though, which is why I don't have showthread.php in there.
I'm trying to understand this... Posts are where all the rich content is, so why block them from reading posts? I can understand the server load part, but geez. Do others do this as well?
So you don't allow the engines to index your posts at all? (Neither the archive nor the standard way?)
The blocking is for the "view single post" pages, not the threads. The content of the posts is already contained in the thread pages, so the individual post views can be considered duplicate content.
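If you want to test that a rule like this blocks only the single-post view, here is a small sketch using Python's standard `urllib.robotparser`. The host name and the query strings (`p=123`, `t=45`) are made-up examples; only the Disallow line comes from the thread.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/showpost.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(url):
    """True if a generic crawler ("*") may fetch the URL under these rules."""
    return parser.can_fetch("*", url)

# Single-post view: blocked. Thread view: still crawlable.
print(allowed("http://www.example.com/forum/showpost.php?p=123"))
print(allowed("http://www.example.com/forum/showthread.php?t=45"))
```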
Great info. Thank you for sharing.
I guess this means urllist.txt.gz will be going bye bye?
Last edited by jw00dy; 05-01-2007 at 06:35 AM.
I know this is a bit late, but wouldn't it be better to write it as:
Sitemap: /forums/sitemap_index.xml.gz
instead of putting the whole url?