Just like every website owner should have a favicon and set up their robots.txt and .htaccess, you should now also have a sitemap.
The Sitemap protocol allows you to inform search engines about URLs on your website that are available for crawling. More importantly, Google uses it to display sublinks for your site.
Just do a search on Google for sitemap.xml and you’ll find tools and generators so you can create your own.
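For reference, a minimal sitemap.xml looks something like this (the URL and date below are just placeholders, not from any real site):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2007-01-01</lastmod>
    <changefreq>weekly</changefreq>
  </url>
</urlset>
```

Each page you want crawled gets its own url entry; loc is required, while lastmod and changefreq are optional hints.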
Tom, since I don’t know what I am doing :?, I looked at my robots.txt file on my web site. I assume it should have keywords in it for search engines to find? Or is it supposed to have web addresses for each of my web pages? Anyway, this is all that’s in it at this time…
User-agent: *
Disallow: /log
Disallow: /xxx
I have no idea what it means.
Thanks,
Bill
I have used the Google sitemap generator and it works well. One note, though: this week I added a page with the WU historical data, then went to make a new sitemap. The generator wanted to map over 500 pages. I think it was trying to get all the info from WU, so I just manually added WX13.php (the history page) to my existing sitemap.
Niko,
I think a site needs to have many pages that can be “categorized” in order to have the sitelinks as you showed.
Google has not generated any sitelinks for your site. Sitelinks are completely automated, and we show them only if we think they’ll be useful to the user.
Google has extended robots.txt to support Allow lines, but I don’t see where they would even bother looking for a sitemap entry. It is not in the robots.txt specification, which normally only contains excludes.
The extensions that Google supports are not supported by anyone else, and if you have signed up with Google for the sitemap info, even Google won’t use it.
“For instance, Googlebot supports an extended definition of the standard. It understands Allow: lines, as well as * and $ pattern matching. So while the tool shows lines that include these extensions as understood, remember that this applies only to Googlebot and not necessarily to other bots that may crawl your site.”
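For what it’s worth, the sitemaps.org protocol does define an autodiscovery line so crawlers can find your sitemap from robots.txt; the Sitemap directive sits on its own line, outside any User-agent block (the domain below is a placeholder):

```
User-agent: *
Disallow:

Sitemap: http://www.example.com/sitemap.xml
```

Whether a given bot honors it is up to that bot, but the major engines that back sitemaps.org do.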
Leading whitespace
We’ve detected that your Sitemap file begins with whitespace. We’ve accepted the file, but you may want to remove the whitespace so that the file adheres to the XML standard.
I’ve tried changing a few things, but nothing changes with that error.
What is generating your sitemap? A program, or are you doing it manually?
The error is correct: there is a space at the top of the file before the opening XML tag.
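If you want to fix it programmatically rather than by hand, a quick sketch in Python (the sample strings are hypothetical; a UTF-8 BOM would need separate handling since lstrip only removes whitespace):

```python
# Strip any spaces/newlines that precede the XML declaration,
# so the file starts with "<?xml" as the XML standard expects.
def strip_leading_whitespace(xml_text: str) -> str:
    return xml_text.lstrip()

# Example: a sitemap that begins with a stray newline and spaces.
bad = "\n  <?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<urlset></urlset>"
fixed = strip_leading_whitespace(bad)
print(fixed.startswith("<?xml"))  # True
```

Read the sitemap file in, run it through this, and write it back out; a file that is already clean passes through unchanged.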
I am currently trying out GSiteCrawler and it is up to 4378 URLs and counting! 8O I think it is trying to include the entire WU through this page: http://www.cavecountryweather.com/wx13.php I paused it for now until I can figure out how to make it stop at my page.
Click the “URL list” tab, then click the “Refresh Table” button.
Highlight all your http://www.cavecountryweather.com/wx13.php URLs (you can highlight many at once by clicking on the first one, scrolling to the last one, then holding down the Shift key and clicking the last one; they all turn yellow).
Click the “Delete” button on the GSiteCrawler screen.
Now click (Re)Crawl - This Project, wait until the crawlers are idle again, and then click the “Refresh Table” button.
Make sure all is well.
If you have the FTP settings set up in GSiteCrawler, you can now click “Generate - Google Sitemap-File”, let it save (overwrite), then upload to FTP, then submit to Google, all automatically.
OK, I think I did it correctly. I created and uploaded a sitemap.xml file. I then changed my robots.txt file to read as follows…
User-agent: *
Disallow: /log
Disallow: /xxx
Sitemap: http://www.evansville-weather.com/sitemap.xml