sitemap.xml

I think this needs its own thread.

Just like every website owner should have a favicon and should set up a robots.txt and .htaccess file, you should now also have a sitemap.

The Sitemap protocol allows you to inform search engines about URLs on your website that are available for crawling. But more importantly, Google uses it to display sitelinks for your site.
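For reference, a minimal sitemap.xml follows the sitemaps.org format; something like this (the URL and date are placeholders, not a real site):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2008-09-01</lastmod>
    <changefreq>daily</changefreq>
  </url>
</urlset>
```

Each page you want crawled gets its own &lt;url&gt; entry; only &lt;loc&gt; is required.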

Just do a search on Google for sitemap.xml and you’ll find tools and generators so you can create your own.

http://www.google.com/search?hl=en&q=sitemap.xml

Google also has free webmaster tools to allow you to validate your sitemap.

http://www.google.com/webmasters/
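Even before submitting to Webmaster Tools, you can do a rough local sanity check that the file is at least well-formed XML. A sketch in Python (the filename is an example, and this only checks parseability and counts entries, not the full sitemap schema):

```python
# Parse a sitemap file and count its <url> entries. ET.parse raises
# ParseError if the XML is malformed, which catches most hand-editing
# mistakes before Google sees them.
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def count_sitemap_urls(path):
    tree = ET.parse(path)  # raises xml.etree.ElementTree.ParseError on bad XML
    return len(tree.getroot().findall(SITEMAP_NS + "url"))
```

If this parses cleanly and the count matches what you expect, the validator in Webmaster Tools should not complain about the XML itself.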

(One note: if you use www.mysite.com for everything, be sure to sign up your site with the www. prefix, or Google will invalidate a lot of your sitemap links.)

You should also add:

Sitemap: http://www.carterlake.org/sitemap.xml

to your robots.txt file so the three major search engines can find your sitemap.

Tom, since I don’t know what I am doing :?, I looked at the robots.txt file on my web site. I assume it should have keywords in it for search engines to find? Or is it supposed to have web addresses for each of my web pages? Anyway, this is all that’s in it at this time…
User-agent: *
Disallow: /log
Disallow: /xxx
I have no idea what it means.
Thanks,
Bill

One result of adding the sitemap is that Google search can display your site like this:

Robots.txt only contains instructions that tell the search engine where it can, and can’t, look for content to index.

Yours is telling them not to index your log and xxx directories; by default, they can crawl anywhere else.

I have used the Google sitemap generator and it works well. One note though: this week I added a page with the WU historical data, then went to make a new sitemap. The generator wanted to map over 500 pages. I think it was trying to pull in all the info from WU, so I just manually added WX13.php (the history page) to my existing sitemap.

Niko,
I think a site needs to have many pages that can be “categorized” in order to get the sitelinks you showed.

Google has not generated any sitelinks for your site. Sitelinks are completely automated, and we show them only if we think they’ll be useful to the user.

You just add:

Sitemap: http://www.carterlake.org/sitemap.xml

to your robots.txt file (after you’ve made your sitemap, of course).

Google has extended robots.txt to support allows, but I don’t see why they would even bother looking for the Sitemap entry. It is not in the robots.txt specification, which normally only has excludes in it.

The extensions that Google supports are not supported by anyone else, and if you have signed up with Google for the sitemap info, even Google won’t use it.

“For instance, Googlebot supports an extended definition of the standard. It understands Allow: lines, as well as * and $ pattern matching. So while the tool shows lines that include these extensions as understood, remember that this applies only to Googlebot and not necessarily to other bots that may crawl your site.”

Google webmaster tools gives an error on that syntax for the robots.txt file.

Parsing results

sitemap: http://www.example.com/sitemap.xml

Syntax not understood

Try capitalizing the S in Sitemap… mine validates in Webmaster Tools:
http://www.carmosaic.com/robots.txt

Sitemap: http://www.carmosaic.com/sitemap.xml

… more info:

I keep getting this error on my sitemap…

  • Leading whitespace
    We’ve detected that your Sitemap file begins with whitespace. We’ve accepted the file, but you may want to remove the whitespace so that the file adheres to the XML standard.

I’ve tried changing a few things, but nothing fixes that error.

http://www.snoqualmieweather.com/sitemap.xml

What is generating your sitemap: a program, or are you doing it by hand?
The error is accurate; there is a space at the top of the file before the opening XML declaration.
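If you can’t find where the stray whitespace is coming from, one option is to clean the file after generating it. A rough sketch in Python (the function name is mine, and it also drops a UTF-8 byte-order mark, another common cause of this error):

```python
# Strip a UTF-8 BOM and any leading whitespace from a sitemap file so it
# begins with the XML declaration, as the XML standard expects. Rewrites
# the file in place only if something was removed.
def clean_sitemap(path):
    with open(path, "rb") as f:
        original = f.read()
    data = original
    if data.startswith(b"\xef\xbb\xbf"):  # UTF-8 byte-order mark
        data = data[3:]
    data = data.lstrip()  # leading spaces, tabs, newlines
    if data != original:
        with open(path, "wb") as f:
            f.write(data)
    # True if the file now starts with the XML declaration
    return data.startswith(b"<?xml")
```

Run it on sitemap.xml before uploading and the “Leading whitespace” warning should go away.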

Here is a free sitemap generator I use on some of my sites.

GSiteCrawler
http://gsitecrawler.com/

I’ll give that a try.

I’m using my web program to make the .txt file, then uploading it and renaming it to .xml.

I am currently trying out GSiteCrawler and it is up to 4378 URLs and counting! 8O I think it is trying to include the entire WU site through this page: http://www.cavecountryweather.com/wx13.php — I paused it for now until I can figure out how to make it stop at my page.

lmfao… I had the same problem. Go into the .xml file and remove each one manually, then upload the .xml file back to your site. That fixed mine.

Easy fix for GSiteCrawler:
Click the “Filter” tab, click “Add”,
and add your URL http://www.cavecountryweather.com/wx13.php

Click the “URL list” tab and click the “Refresh Table” button.
Highlight all of your http://www.cavecountryweather.com/wx13.php URLs (you can highlight many at once by clicking the first one, scrolling to the last one, then holding down the Shift key and clicking the last one; they all turn yellow).
Click the “Delete” button on the GSiteCrawler screen.

Now click (Re)Crawl - This Project, wait until the crawlers are idle again, and then click the “Refresh Table” button.
Make sure all is well.
If you have the FTP settings set up in GSiteCrawler, you can now click “Generate - Google Sitemap-File”, let it save (overwrite), then upload to FTP and submit to Google, all automatically.

I use an automatic sitemap creator; it’s very easy to install, and I downloaded it from http://gadelkareem.com/

Maurice

Ok, I think I did it correctly. I created and uploaded a sitemap.xml file, then changed my robots.txt file to read as follows…
User-agent: *
Disallow: /log
Disallow: /xxx
http://www.evansville-weather.com/sitemap.xml

Is that all I need to do?
Bill

The link shows me this, in both IE7 and FF.


2008-09-01_141358.gif

Bill,

Your XML file is not in the proper format. How did you create it?