Topic: sitemap.xml

Offline carterlake

  • Tom Chaplin
  • Posts: 2,273
  • Carter Lake, Iowa USA
    • Carter Lake, Iowa Weather
sitemap.xml
« on: August 30, 2008, 03:22:36 PM »
I think this needs its own thread.

Just as every website owner should have a favicon and should set up his or her robots.txt and .htaccess, you should now also have a sitemap.

The Sitemaps protocol lets you inform search engines about URLs on your website that are available for crawling. More importantly, Google appears to use it when displaying sublinks (sitelinks) for your site.
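For anyone who would rather script it than use a generator, here is a minimal sketch of what a sitemap.xml contains, written in Python. The function name build_sitemap and the example.com URLs are made up for illustration; the namespace is the one published at sitemaps.org. It only emits the required loc element for each page, not the optional fields.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap.xml document (as a str) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' when serializing.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    # The XML declaration must be the very first thing in the file.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical page list; substitute your own site's pages.
    print(build_sitemap([
        "http://www.example.com/",
        "http://www.example.com/wx13.php",
    ]))
```

Save the output as sitemap.xml in your site's root directory, then validate it with the webmaster tools before relying on it.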

Just do a search on Google for sitemap.xml and you'll find tools and generators so you can create your own.

http://www.google.com/search?hl=en&q=sitemap.xml

Google also has free webmaster tools to allow you to validate your sitemap.

http://www.google.com/webmasters/

(One note: if you use www.mysite.com for everything, be sure to register your site with the www. prefix, or Google will invalidate a lot of your sitemap links.)

You should also add:

Sitemap: http://www.carterlake.org/sitemap.xml

to your robots.txt file so the three major search engines can find your sitemap.
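If you maintain robots.txt with a script, a rough sketch of adding that line could look like the following. The helper name add_sitemap_line and the example.com address are assumptions for illustration; it works on the text of the file and leaves reading and writing the file to you.

```python
def add_sitemap_line(robots_text, sitemap_url):
    """Return robots.txt content with a Sitemap line for sitemap_url.

    The text is returned unchanged if such a line is already present.
    """
    for line in robots_text.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_text  # already listed, nothing to do
    # Make sure the new directive starts on its own line.
    if robots_text and not robots_text.endswith("\n"):
        robots_text += "\n"
    return robots_text + "Sitemap: " + sitemap_url + "\n"


if __name__ == "__main__":
    existing = "User-agent: *\nDisallow: /log\nDisallow: /xxx\n"
    print(add_sitemap_line(existing, "http://www.example.com/sitemap.xml"))
```

The existing User-agent and Disallow lines are left exactly as they were; the Sitemap line simply goes at the end.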

WD; Davis VP2 6153; Quickcam for Notebooks Pro; Boltek w/ Nexstorm; GRLevel3; Live NOAA radio

Offline Billw69

  • Posts: 720
  • If you work hard, you will have time to play hard.
  • Evansville, IN U.S.A.
    • Evansville-Weather
Re: sitemap.xml
« Reply #1 on: August 30, 2008, 03:30:06 PM »
Tom, since I don't know what I am doing :?, I looked at the robots.txt file on my web site. I assume it should have keywords in it for search engines to find? Or is it supposed to have web addresses for each of my web pages? Anyway, this is all that's in it at this time....
User-agent: *
Disallow: /log
Disallow: /xxx
I have no idea what it means.
Thanks,
Bill

Offline niko

  • syzygy
  • Global Moderator
  • Posts: 27,525
  • Crystal Ball broken! Please post the URL.
  • Northern California, U.S.A.
Re: sitemap.xml
« Reply #2 on: August 30, 2008, 03:31:25 PM »
One result of adding the sitemap is that Google search can display your site like this:

« Last Edit: August 30, 2008, 04:52:38 PM by niko »

Offline niko

  • syzygy
  • Global Moderator
  • Posts: 27,525
  • Crystal Ball broken! Please post the URL.
  • Northern California, U.S.A.
Re: sitemap.xml
« Reply #3 on: August 30, 2008, 03:37:03 PM »
Quote
Tom, since I don't know what I am doing :?, I looked at the robots.txt file on my web site. I assume it should have keywords in it for search engines to find? Or is it supposed to have web addresses for each of my web pages? Anyway, this is all that's in it at this time....
User-agent: *
Disallow: /log
Disallow: /xxx
I have no idea what it means.
Thanks,
Bill

Robots.txt only contains instructions that tell search engines where they can, and can't, look for content to index.

Yours tells all crawlers (User-agent: *) not to crawl any path beginning with /log or /xxx; by default they can crawl anywhere else.
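To see how those three lines are interpreted, here is a small sketch using Python's standard urllib.robotparser module. The helper name allowed and the www.example.com host are made up for the example; the robots.txt content is Bill's.

```python
from urllib.robotparser import RobotFileParser

# Bill's robots.txt, as posted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /log
Disallow: /xxx
"""


def allowed(path, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if a crawler named `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # The host is a placeholder; only the path is matched against the rules.
    return parser.can_fetch(agent, "http://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/index.php", "/log/today.txt", "/xxx/", "/logbook.html"):
        print(path, allowed(path))
```

Note that the rules are prefix matches, so with this parser a page such as /logbook.html is also blocked by Disallow: /log. Writing Disallow: /log/ would restrict the rule to the directory.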
« Last Edit: August 30, 2008, 03:40:27 PM by niko »

Offline drobbins

  • Posts: 1,694
  • Kentucky, USA
    • Cave Country Weather
Re: sitemap.xml
« Reply #4 on: August 30, 2008, 04:09:35 PM »
I have used the Google sitemap generator and it works well. One note though: this week I added a page with the WU historical data, then went to make a new sitemap. The generator wanted to map over 500 pages. I think it was trying to get all the info from WU, so I just manually added WX13.php (the history page) to my existing sitemap.

Niko,
I think a site needs to have many pages that can be "categorized" in order to have the sitelinks as you showed.
Quote
Google has not generated any sitelinks for your site. Sitelinks are completely automated, and we show them only if we think they'll be useful to the user.

Offline carterlake

  • Tom Chaplin
  • Posts: 2,273
  • Carter Lake, Iowa USA
    • Carter Lake, Iowa Weather
Re: sitemap.xml
« Reply #5 on: August 30, 2008, 04:44:09 PM »
Quote
Tom, since I don't know what I am doing :?, I looked at the robots.txt file on my web site. I assume it should have keywords in it for search engines to find? Or is it supposed to have web addresses for each of my web pages? Anyway, this is all that's in it at this time....
User-agent: *
Disallow: /log
Disallow: /xxx
I have no idea what it means.
Thanks,
Bill

You just add:

Sitemap: http://www.carterlake.org/sitemap.xml

to your robots.txt file, using your own site's address (and, of course, AFTER you have made your sitemap).


Offline TNETWeather

  • Kevin Reed (KrelvinAZ)
  • Posts: 5,939
  • Gremlins are at work...
  • Mesa, AZ
    • TNET Weather Station - Mesa AZ
Re: sitemap.xml
« Reply #6 on: August 30, 2008, 05:40:48 PM »
Quote
Sitemap: http://www.carterlake.org/sitemap.xml

to your robots.txt file so the three major search engines can find your sitemap.

Google has extended robots.txt to support Allow lines, but I don't see why they would even bother looking for a Sitemap entry. It is not in the robots.txt specification, which normally only has excludes in it.

The extensions that Google supports are not supported by anyone else, and if you have signed up with Google for the sitemap info, even Google won't use the robots.txt entry.

"For instance, Googlebot supports an extended definition of the standard. It understands Allow: lines, as well as * and $ pattern matching. So while the tool shows lines that include these extensions as understood, remember that this applies only to Googlebot and not necessarily to other bots that may crawl your site."


All you need is Time, Aptitude and Desire ... and you can build just about anything...

Offline TNETWeather

  • Kevin Reed (KrelvinAZ)
  • Posts: 5,939
  • Gremlins are at work...
  • Mesa, AZ
    • TNET Weather Station - Mesa AZ
Re: sitemap.xml
« Reply #7 on: August 30, 2008, 05:48:24 PM »

Google webmaster tools gives an error on that syntax for the robots.txt file.

Parsing results

sitemap: http://www.example.com/sitemap.xml

Syntax not understood


Offline MCHALLIS

  • Posts: 2,156
  • Long Beach, WA USA
    • Weather for Long Beach, WA USA
Re: sitemap.xml
« Reply #8 on: August 30, 2008, 08:25:03 PM »
Quote
Google webmaster tools gives an error on that syntax for the robots.txt file.

Parsing results

sitemap: http://www.example.com/sitemap.xml

Syntax not understood


Try capitalizing the S in Sitemap... Mine validates in Webmaster Tools:
http://www.carmosaic.com/robots.txt

Sitemap: http://www.carmosaic.com/sitemap.xml

... more info:
http://www.sitemaps.org/
« Last Edit: August 30, 2008, 08:30:43 PM by MCHALLIS »

Offline ALITTLEweird1

  • Mark
  • Global Moderator
  • Posts: 5,140
  • North Bend, WA
    • North Bend Weather
Re: sitemap.xml
« Reply #9 on: August 30, 2008, 09:45:14 PM »
I keep getting this error on my sitemap....

- Leading whitespace
We've detected that your Sitemap file begins with whitespace. We've accepted the file, but you may want to remove the whitespace so that the file adheres to the XML standard. 

I've tried changing a few things, but the error doesn't go away.

http://www.snoqualmieweather.com/sitemap.xml
"Nature can do without man, but man cannot do without nature."

Davis VP2 + VP2 Solar + VP2 UV + Lightning Detector + Logitech Webcam

Offline MCHALLIS

  • Posts: 2,156
  • Long Beach, WA USA
    • Weather for Long Beach, WA USA
Re: sitemap.xml
« Reply #10 on: August 30, 2008, 10:31:58 PM »
Quote
I keep getting this error on my sitemap....

- Leading whitespace
We've detected that your Sitemap file begins with whitespace. We've accepted the file, but you may want to remove the whitespace so that the file adheres to the XML standard. 

I've tried changing a few things, but the error doesn't go away.

http://www.snoqualmieweather.com/sitemap.xml

What is generating your sitemap? A program, or are you doing it manually?
The error is accurate: there is whitespace at the top of the file before the opening XML tag.
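If the tool that writes the file keeps putting that whitespace back, one option is to clean the file afterwards. Below is a rough Python sketch, with hypothetical helper names, that strips anything blank before the first tag and checks that the result parses as XML. It works on the raw bytes of the file.

```python
import xml.etree.ElementTree as ET


def strip_leading_whitespace(data: bytes) -> bytes:
    """Drop spaces, tabs and newlines that precede the XML declaration."""
    return data.lstrip()


def is_well_formed(data: bytes) -> bool:
    """Return True if `data` parses as XML."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    # A tiny sitemap with a stray newline and space at the top.
    raw = (b'\n <?xml version="1.0" encoding="UTF-8"?>\n'
           b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
           b'<url><loc>http://www.example.com/</loc></url></urlset>\n')
    fixed = strip_leading_whitespace(raw)
    print(is_well_formed(raw), is_well_formed(fixed))
```

To apply it to a real file, read sitemap.xml in binary mode, pass the bytes through strip_leading_whitespace, and write them back before uploading.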

Offline MCHALLIS

  • Posts: 2,156
  • Long Beach, WA USA
    • Weather for Long Beach, WA USA
Re: sitemap.xml
« Reply #11 on: August 30, 2008, 10:34:05 PM »
Here is a free sitemap generator I use on some of my sites.

GSiteCrawler
http://gsitecrawler.com/

Offline ALITTLEweird1

  • Mark
  • Global Moderator
  • Posts: 5,140
  • North Bend, WA
    • North Bend Weather
Re: sitemap.xml
« Reply #12 on: August 30, 2008, 10:35:37 PM »
I'll give that a try.

I'm using my web program to make the .txt file, then I upload it and rename it to .xml.

Offline drobbins

  • Posts: 1,694
  • Kentucky, USA
    • Cave Country Weather
Re: sitemap.xml
« Reply #13 on: August 31, 2008, 03:20:37 AM »
I am currently trying out GSiteCrawler and it is up to 4378 URLs and counting! 8O I think it is trying to include the entire WU through this page: http://www.cavecountryweather.com/wx13.php I have paused it for now until I can figure out how to make it stop at my page.

Offline ALITTLEweird1

  • Mark
  • Global Moderator
  • Posts: 5,140
  • North Bend, WA
    • North Bend Weather
Re: sitemap.xml
« Reply #14 on: August 31, 2008, 03:42:41 AM »
lmfao... I had the same problem. Go into the .xml file and remove each one manually, then upload the .xml file back to your site. That fixed mine.
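With thousands of entries, removing them by hand gets tedious. Here is a rough Python sketch of filtering a URL list down to your own host, assuming the unwanted entries really are on another site. The function name keep_own_urls and the hosts are made up for the example. If the extra URLs are variations of your own page (for example, different query strings), you would need a different filter.

```python
from urllib.parse import urlsplit


def keep_own_urls(urls, own_host):
    """Keep only the URLs whose host is own_host (case-insensitive)."""
    own_host = own_host.lower()
    return [u for u in urls
            if (urlsplit(u).hostname or "").lower() == own_host]


if __name__ == "__main__":
    # Hypothetical crawl results: two local pages and one external page.
    crawled = [
        "http://www.example.com/",
        "http://www.example.com/wx13.php",
        "http://other.example.net/history/2008/8/30",
    ]
    for url in keep_own_urls(crawled, "www.example.com"):
        print(url)
```

The filtered list can then be written back out as the sitemap, so only pages on your own site are submitted.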