Possible to use Disallow: /*?pp=10 in robots.txt?

This is a discussion within the General Discussion forums, part of the vBulletin SEO Discussion category.

  1. #1
    Member
    Join Date
    Oct 2005
    Posts
    70
    Liked
    0 times

    My forum has lots of showthread pages in Google with a posts-per-page value of 10, e.g. /showthread.php?t=5274&page=5&pp=10. Some time ago I changed that value to 15, so Google will probably apply some kind of duplicate content filter to this forum (it seems it already has).

    I haven't installed vBSEO yet, but when I do, I'd like to add the line "Disallow: /*?pp=10" to the robots.txt file so that URLs with the old per-page value aren't crawled (vBSEO produces URLs like /thread.html?pp=10). My question is this: is that syntax valid? I think only Google supports it, but they haven't explicitly mentioned a pattern like that on their page: http://www.google.com/webmasters/rem...#exclude_pages . Does anyone have experience with this?

    Edit:
    Another option would be to add "<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">" to any page whose URL contains a question mark (?). Since I'm not a programmer, I would still need the code for this. Anyone?
    Last edited by PageUp; 10-30-2005 at 08:44 AM.
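
The decision described in the edit above (emit the robots meta tag only when the URL contains a question mark) can be sketched in a few lines. This is only an illustration of the logic in Python, not vBulletin or vBSEO code: the function name `robots_meta_for` is hypothetical, and a real forum would need the equivalent check in its own template or plugin system. It also assumes rewritten URLs such as /thread.html, since without rewriting every showthread.php URL carries a query string and would be excluded.

```python
# Hypothetical helper: decide whether a page should carry the robots meta tag.
NOINDEX_TAG = '<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'


def robots_meta_for(request_url: str) -> str:
    """Return the meta tag for URLs containing '?', otherwise an empty string."""
    # The rule from the post: any URL with a question mark is excluded.
    return NOINDEX_TAG if "?" in request_url else ""


if __name__ == "__main__":
    for url in ("/thread.html", "/thread.html?pp=10"):
        print(url, "->", robots_meta_for(url) or "(no tag)")
```

Note that this rule is deliberately blunt: it would also exclude any other parameterised URL, not just those carrying pp=10.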

  2. #2
    Senior Member
    Real Name
    Joseph Ward
    Join Date
    Jun 2005
    Posts
    23,847
    Liked
    32 times
    Blog Entries
    9
    http://www.robotstxt.org/wc/exclusion.html#robotstxt

    Robots.txt does not support regular expressions or wild cards (*).

    Note: * used in User-agent: * is the one apparent exception. It means "all robots".

    HTML Code:
    <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
    http://www.robotstxt.org/wc/meta-user.html

    You could probably set up a custom 301 redirect, but I'm not sure if this would be advisable.
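
For what it's worth, here is a rough sketch of how the target of such a 301 redirect might be computed. It is Python used purely to show the arithmetic, not something vBulletin or vBSEO provides; the function name `redirect_target` and the defaults (old value 10, new value 15, taken from the first post) are assumptions. Dropping pp alone is not enough when a page number is present, because page 5 at 10 posts per page does not show the same posts as page 5 at 15, so the sketch remaps the page to the one that contains the first post of the old page.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redirect_target(url, old_pp=10, new_pp=15):
    """Return the URL to redirect to, or None if the URL has no pp=<old_pp>."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if ("pp", str(old_pp)) not in params:
        return None  # nothing to redirect

    new_params = []
    for key, value in params:
        if key == "pp":
            continue  # drop the old per-page value entirely
        if key == "page" and value.isdigit():
            # Zero-based index of the first post shown on the old page.
            first_post = (int(value) - 1) * old_pp
            # Page that contains that post under the new per-page value.
            value = str(first_post // new_pp + 1)
        new_params.append((key, value))

    return urlunsplit(parts._replace(query=urlencode(new_params)))


if __name__ == "__main__":
    print(redirect_target("/showthread.php?t=5274&page=5&pp=10"))
    print(redirect_target("/thread.html?pp=10"))
```

The remapped page only approximates the old one, since the page boundaries no longer line up, which is one reason a redirect like this may not be advisable.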

  3. #3
    Member
    Join Date
    Oct 2005
    Posts
    70
    Liked
    0 times
    Quote Originally Posted by Joe Ward
    Robots.txt does not support regular expressions or wild cards (*).
    You're right, it's not a standard, BUT... Google does support wild cards:

    Additionally, Google has introduced increased flexibility to the robots.txt file standard through the use of asterisks. Disallow patterns may include "*" to match any sequence of characters, and patterns may end in "$" to indicate the end of a name.

    http://www.google.com/webmasters/rem...#exclude_pages
    Last edited by PageUp; 11-02-2005 at 04:11 PM.
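
To see what the quoted rules would mean for the pattern in question, here is a minimal Python sketch that translates such a pattern into a regular expression: "*" matches any sequence of characters, a trailing "$" anchors the end, and everything else is a literal prefix match. The helper names `pattern_to_regex` and `is_disallowed` are made up for this illustration, and the sketch only follows the two sentences quoted above; it is not Google's actual matcher.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a Disallow pattern with '*' and optional trailing '$'."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces (including '?') and join them with '.*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(pattern: str, path: str) -> bool:
    """True if the path (including query string) matches the pattern."""
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    tests = [
        ("/*?pp=10", "/thread.html?pp=10"),
        ("/*?pp=10", "/showthread.php?t=5274&page=5&pp=10"),
        ("/*pp=10", "/showthread.php?t=5274&page=5&pp=10"),
        ("/*?pp=10", "/thread.html?pp=100"),
        ("/*?pp=10$", "/thread.html?pp=100"),
    ]
    for pattern, path in tests:
        print(pattern, path, is_disallowed(pattern, path))
```

Under this reading, "/*?pp=10" covers /thread.html?pp=10 but not the old /showthread.php?t=5274&page=5&pp=10 URLs, where pp is preceded by "&" rather than "?". A pattern such as "/*pp=10" would cover both, and a trailing "$" keeps it from also matching values like pp=100.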

  4. #4
    vBSEO Staff Oleg Ignatiuk's Avatar
    Real Name
    Oleg Ignatiuk
    Join Date
    Jun 2005
    Location
    Belarus
    Posts
    25,689
    Liked
    157 times
    @PageUp,

    this potential problem will be resolved by vBSEO now: Possible duplicate content
