“Crawl Caching Proxy” (BigDaddy Update) Discussed by Matt Cutts
Matt Cutts talks about the “Crawl Caching Proxy” Google has introduced with the BigDaddy update. With this new caching proxy, they will reduce bandwidth consumption (for both themselves and site owners).
How does it work?
- Google has multiple independent bots that crawl your site (main index, AdSense, News Search, Blog Search, etc.).
- With a “crawl caching proxy”, a page crawled by any one of these bots can be shared with the other services without hitting the website again, which naturally consumes less bandwidth.
Conspiracy theorists should note what Matt Cutts emphasizes:
FYI - Having a page stored in the crawl caching proxy will not help it get prioritized for crawling by the other Google crawl services. Apparently, each service will still determine its crawling list independently.
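To make the idea concrete, here is a minimal sketch of a cache shared by several crawl services. It is only an illustration of the concept described above, not Google's implementation: the class name, the `fake_site` function, the bot names, and the example URL are all made up, and the cache never expires entries because the source does not say how long pages are kept.

```python
from typing import Callable, Dict


class CrawlCachingProxy:
    """Toy cache shared by several crawl services (hypothetical names)."""

    def __init__(self, fetch_from_site: Callable[[str], str]) -> None:
        self._fetch_from_site = fetch_from_site  # the call that costs bandwidth
        self._cache: Dict[str, str] = {}
        self.origin_hits = 0  # how many times the website itself was contacted

    def get(self, url: str) -> str:
        # Only the first request for a URL reaches the website;
        # later requests from any bot are answered from the cache.
        if url not in self._cache:
            self.origin_hits += 1
            self._cache[url] = self._fetch_from_site(url)
        return self._cache[url]


def fake_site(url: str) -> str:
    # Stand-in for a real HTTP fetch (assumption for this example).
    return f"<html>content of {url}</html>"


proxy = CrawlCachingProxy(fake_site)

# Each service still decides on its own that it wants this URL;
# the proxy only avoids fetching it from the site more than once.
for bot in ("main-index", "adsense", "news-search", "blog-search"):
    page = proxy.get("http://example.com/thread-1.html")

print(proxy.origin_hits)  # 1
```

Four services asked for the page, but the site served it once. Note that the proxy has no say in which URLs each service requests, which matches the point about independent crawling lists.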
Quick Notes
- robots.txt directives for each bot type will still be respected, even if the page is pulled from the caching proxy instead of directly from the site.
- The caching proxy is not to be confused with Google’s “Cached” links in the SERPs.
Notes on vBSEO
- vBSEO also focuses on reducing bandwidth consumption for faster and more efficient indexing.
- vBSEO includes an HTML comment stripping feature that helps to reduce bandwidth consumption by a significant amount.
- vBSEO includes gzip compression compatibility.
- vBSEO works similarly to the Google caching proxy. Our focus on 1-URL-Per-Resource helps to eliminate redundant URLs to the same content, and therefore duplicate content as well. In addition to its other SEO advantages, this is a major bandwidth saver.
If Google agreed to crawl 100 of your pages per day, would you prefer:
(a) it crawled 100 unique content pages, or
(b) it crawled 100 pages with a significant level of redundant/duplicate content?
The answer is obvious. With vBSEO, option (a) is a reality. Without vBSEO, a vBulletin forum is loaded with redundant URLs to the same content.
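A rough way to see how redundant URLs eat into a crawl allowance is to collapse a list of crawled URLs to one URL per resource and count what is left. The sketch below is not how vBSEO works internally; the `canonical_url` helper, the example URLs, and the choice of `s` and `highlight` as parameters that do not change the page content are assumptions made purely for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameters assumed (for illustration only) not to change page content.
IGNORED_PARAMS = {"s", "highlight"}


def canonical_url(url: str) -> str:
    """Drop ignored parameters and sort the rest, so duplicates compare equal."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Hypothetical URLs a bot might spend its daily crawl allowance on.
crawled = [
    "http://forum.example.com/showthread.php?t=42",
    "http://forum.example.com/showthread.php?t=42&highlight=proxy",
    "http://forum.example.com/showthread.php?s=abc123&t=42",
    "http://forum.example.com/showthread.php?t=43",
]

unique_pages = {canonical_url(url) for url in crawled}
print(len(crawled), len(unique_pages))  # 4 2
```

In this made-up sample, four crawled URLs yield only two unique pages, which is the difference between options (a) and (b) above.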
For Discussion
- Why didn’t they have such a mechanism in place a long time ago?
- There is no specific mention of how the freshness of crawled content will be affected when it is pulled from the caching proxy, or of how long a page will be stored there.
- One might hope that, in addition to saving bandwidth, they would also try to eliminate processing redundancy across the multiple crawling services.
Source:
Matt Cutts: Gadgets, Google, and SEO » Crawl caching proxy



