Google has extended robots.txt to support Allow lines, but I don’t see why they would even bother looking for a sitemap entry there. It is not in the robots.txt specification, which normally contains only exclusions.
The extensions that Google supports are not supported by anyone else, and if you have already signed up with Google for the sitemap info, even Google won’t use it.
“For instance, Googlebot supports an extended definition of the standard. It understands Allow: lines, as well as * and $ pattern matching. So while the tool shows lines that include these extensions as understood, remember that this applies only to Googlebot and not necessarily to other bots that may crawl your site.”
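For illustration, here is what those extensions look like in a robots.txt (the paths are hypothetical; only Googlebot is documented to honor the Allow line and the wildcards):

```
User-agent: Googlebot
Disallow: /private/
Disallow: /*.pdf$
Allow: /private/readme.html
```

The `*` matches any sequence of characters and `$` anchors the end of the URL, so `/*.pdf$` blocks any URL ending in .pdf. Other crawlers that only know the original standard may ignore the Allow line entirely.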
Leading whitespace
We’ve detected that your Sitemap file begins with whitespace. We’ve accepted the file, but you may want to remove the whitespace so that the file adheres to the XML standard.
I’ve tried changing a few things, but nothing changes that error.
What is generating your sitemap? A program, or are you creating it manually?
The error is accurate: there is a space at the top of the file, before the opening XML declaration.
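An easy way to fix this is to strip anything before the XML declaration so the file starts with `<?xml`, as the XML standard requires. A minimal Python sketch (the `lstrip` approach is mine, not from the thread):

```python
def strip_leading_whitespace(xml_text: str) -> str:
    """Remove whitespace before the XML declaration so the document
    starts with '<?xml', as the XML standard requires."""
    return xml_text.lstrip()

# Example: a sitemap that begins with a stray space and newline
fixed = strip_leading_whitespace(' \n<?xml version="1.0" encoding="UTF-8"?>')
```

If the sitemap is generated by a program, the real fix is to find where the program emits that leading space; stripping it afterwards is just a workaround.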
I am currently trying out GSiteCrawler and it is up to 4,378 URLs and counting! 8O I think it is trying to include the entire WU through this page: http://www.cavecountryweather.com/wx13.php I have paused it for now until I can figure out how to make it stop at my page.
Click the “URL list” tab, then click the “Refresh Table” button.
Highlight all of your http://www.cavecountryweather.com/wx13.php URLs (you can highlight many at once by clicking the first one, scrolling to the last one, then holding down the Shift key and clicking the last one; they all turn yellow).
Click the “Delete” button on the GSiteCrawler screen.
Now click (Re)Crawl - This Project, wait until the crawlers are idle again, and then click the “Refresh Table” button.
Make sure all is well.
If you have the FTP settings set up in GSiteCrawler, you can now click “Generate - Google Sitemap-File”, let it save (overwriting the old file), upload via FTP, and submit to Google, all automatically.
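The “Submit to Google” step works by pinging Google’s sitemap endpoint with your sitemap’s URL. A rough sketch of how that request URL is built (the endpoint path is the one Google documented at the time; the sitemap URL is from this thread):

```python
from urllib.parse import quote

# Percent-encode the sitemap URL so it is safe as a query parameter
sitemap_url = "http://www.cavecountryweather.com/sitemap.xml"
ping_url = ("http://www.google.com/webmasters/sitemaps/ping?sitemap="
            + quote(sitemap_url, safe=""))
# An HTTP GET to ping_url asks Google to re-fetch the sitemap.
```

GSiteCrawler does this for you, so this is only useful if you want to resubmit by hand after editing the file.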
OK, I think I did it correctly. I created and uploaded a sitemap.xml file, then changed my robots.txt file to read as follows…
User-agent: *
Disallow: /log
Disallow: /xxx

Sitemap: http://www.evansville-weather.com/sitemap.xml
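Note that the sitemap reference belongs on its own `Sitemap:` line, not appended to a Disallow rule. For comparison, the sitemap.xml file itself only needs the basic urlset structure to validate; a minimal example (single hypothetical entry):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.evansville-weather.com/</loc>
  </url>
</urlset>
```

The XML declaration must be the very first thing in the file, which is why the leading-whitespace warning discussed earlier matters.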