Hi,

I have a list of sitemap files, which I generate programmatically.
These sitemap files are huge, with thousands of URLs each.

It is very difficult to check each and every URL manually.

So I have written a utility that parses each sitemap file and, using Apache Commons HttpInvoker, checks whether each URL is valid.
  • Some invalid URLs return a 404 response, so I can detect the problem.


  • But in some cases an exception occurs and an error page is shown, so the URL is not valid, yet it does not return a 404 response.
    The response code is 200.
    So there is no way for me to tell whether the URL is valid or not.
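
To make the second case concrete, here is a rough sketch of the kind of check I have in mind. It is not my actual utility: it assumes Java 11+ and uses the JDK's `java.net.http.HttpClient` instead of the Commons library, and the class name `SoftErrorCheck`, the labels it returns, and the marker phrases are all made up for illustration. It treats a 200 response as suspect when the body contains a phrase that my error page might show.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.Locale;

public class SoftErrorCheck {

    // Hypothetical phrases an error page might contain; these are assumptions
    // and would have to be replaced with text from the real error page.
    static final List<String> ERROR_MARKERS =
            List.of("an error has occurred", "page not found");

    // Classifies a response using only its status code and body text.
    static String classify(int status, String body) {
        if (status >= 400) {
            return "BROKEN";          // the server itself reports a failure
        }
        String lower = (body == null) ? "" : body.toLowerCase(Locale.ROOT);
        for (String marker : ERROR_MARKERS) {
            if (lower.contains(marker)) {
                return "SOFT_ERROR";  // non-error status, but the body looks like an error page
            }
        }
        return "OK";
    }

    // Fetches one URL and classifies it; network failures propagate as exceptions.
    static String check(HttpClient client, String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return classify(response.statusCode(), response.body());
    }

    public static void main(String[] args) throws Exception {
        // Follow redirects so that the final page is the one being classified.
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        for (String url : args) {
            System.out.println(url + " -> " + check(client, url));
        }
    }
}
```

Matching on body text is only a heuristic: a valid page that happens to contain one of the phrases would be flagged, and an error page with different wording would be missed.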


I am not sure, but I have heard that the webmaster tool does the same kind of checking, so there must be some way to identify the valid URLs.

Any help on this is appreciated.

Thanks in advance.

Leena