SUBMIT  A  SITEMAP  XML  FILE  TO  GOOGLE

When setting up a Google Analytics account you are first asked to create a standard Google Account. The Google Analytics account is then assigned to that standard Google Account. After logging in to your Google Analytics account you will normally want to view your website's statistics. However, what many people do not realise when first opening a Google Analytics account is that Google has many more free tools to help you with website promotion. One of these tools is called Webmaster Tools. It has a feature that allows you to submit a sitemap file, and I will be showing you how to create a sitemap file later in this section.

A sitemap file is an XML formatted (.xml) file that contains one or more of your web page links. Submitting a sitemap.xml file to a search engine, such as Google or Yahoo, informs it of your web page links, and more precisely of web page links it has not crawled (spidered) yet, for whatever reason. This does not mean it will crawl your web page links just because you submitted a sitemap.xml file. You are only informing it of your web page links, giving it a nudge. A sitemap file also allows you to include additional information about each URL (web page link): when it was last updated, how often it changes and how important it is, priority-wise, in relation to the other URLs in the list. Just to clarify, web page links here are the addresses of your own web pages (e.g. http://www.???.com/index.htm, http://www.???.com/about_us.htm, etc). External links (e.g. a link to microsoft.com) do not belong in the sitemap. You are creating a map file of your own site and not a map file of links to other websites.

Webmaster Tools is accessible, after logging in to your Google Analytics account, by clicking on the MY ACCOUNT link in the top corner of the Google Account page. The Google Account page shows you what you have previously signed up for, as well as offers for Google AdWords, Google AdSense and so on. You can also edit your account details from the Google Account page. In this example though you need to click on the WEBMASTER TOOLS link (Fig 1.0) to take you to the Google Webmaster Tools page (Fig 1.1). From there, click on the link belonging to the website you want to submit a sitemap file for.



Fig 1.0  Click on the WEBMASTER TOOLS link to continue




Fig 1.1  Click on the link belonging to the website you want to submit a sitemap file for




Fig 1.2  Expand the SITE CONFIGURATION links and then click on the SITEMAPS sub-link to continue

After clicking on the link belonging to the website (Fig 1.1 above) you will see a new set of links on the left side of the Webmaster Tools window. Heading this set of links is SITE CONFIGURATION. Click on it to expand it and then select the SITEMAPS link (Fig 1.2 above) to use the sitemap submission tool.



Fig 1.3  Click on the + SUBMIT A SITEMAP button in order to see and use the sitemap submission tool

When you arrive on the Sitemaps page (above) the first thing you need to do is click on the + SUBMIT A SITEMAP button so that you can see and use the sitemap submission tool (edit box). The sitemap submission tool requires you to enter the name of a sitemap xml file into its URL Edit Box. In this example I have used the generic file name sitemap.xml, although I could have used any name (e.g. weblinks.xml). This file is expected to be in your public_html (website) folder. If it is not, you will receive an error message. So make sure you have uploaded it to your public_html folder before you click on the SUBMIT SITEMAP button to proceed.



Fig 1.4  Make sure you have uploaded your sitemap xml file to your public_html folder before submitting it to Google

After clicking on the SUBMIT SITEMAP button (Fig 1.3 above) the sitemap xml file is looked for in your public_html folder. If it is found, the Sitemaps page will refresh and show a link for your sitemap xml file (below). This means the submission succeeded, in which case click on the sitemap.xml file link to continue (below).



Fig 1.5  Click on the sitemap.xml file link to continue - Note TOTAL URLS and INDEXED URLS




Fig 1.6  Click on the GO BACK link to continue




Fig 1.7  The sitemap.xml file has been submitted successfully - Note TOTAL URLS and INDEXED URLS

Clicking on the sitemap.xml file link in Fig 1.5 above, and then clicking on the GO BACK link in Fig 1.6 above, simply refreshes the status of the sitemap for you (Fig 1.7 above). This saves you from signing out and back in, for example, in order to see the changes made to the TOTAL URLS and INDEXED URLS status. TOTAL URLS is the total number of urls (web page links) found in the sitemap.xml file and INDEXED URLS is the number of those urls actually added to Google's index so far. In this example I have 41 urls indexed out of 66. One thing to remember here is that Google does not guarantee or promise to index any urls, and if it does index any urls it could take weeks or months to do so. Therefore, when your sitemap.xml file has been successfully submitted, JUST WAIT to see if there is any improvement to your overall web page listings. If not, it could be that Google has not indexed your website, for whatever reason, or that something is wrong with your website structure (e.g. missing files, broken links, etc). In which case you need to investigate further.

CREATE  A  SITEMAP  XML  FILE

Now that you know how to submit a sitemap xml file I will teach you how to create one manually. If you want to create one automatically, use the free, popular, online sitemap file generator at http://www.xml-sitemaps.com/. It will scan your website for web pages and then put their urls (e.g. http://www.???.com/index.htm, http://www.???.com/contact_us.htm, etc) into a downloadable sitemap.xml file or sitemap.html file. And if you want to create a sitemap file from a folder on your computer that contains web pages, use the free, popular program called GSitemap at http://www.vigos.com/products/gsitemap/. It creates a sitemap.xml file that you can save into a folder and then upload to your public_html (website) folder.

An html sitemap file (sitemap.html) is good if you want to create an html web page with your sitemap (urls) inside it. Ideally, you would then have a link to the sitemap.html web page at the bottom of one or more of your other web pages. For example, at the bottom of your index web page you might have these links: contact us, about us, sitemap, terms/conditions.

Fig 2.0 below shows a very simple sitemap.xml file consisting of two URLs only. I have highlighted one of the URL code blocks to show you the main parts you can edit. The rest of the file is standard and does not need editing. This sitemap file was created by xml-sitemaps.com (see above). The structure of a sitemap file is as follows.



Fig 2.0  A very simple sitemap.xml file with two URLs inside it
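Because Fig 2.0 is an image, here is a text sketch of a two-URL sitemap.xml file along the same lines. It is not a copy of the figure: the www.example.com addresses, the dates and the changefreq/priority values are placeholders assumed purely for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- First URL block: the home page (placeholder address) -->
  <url>
    <loc>http://www.example.com/index.htm</loc>
    <lastmod>2009-08-17</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <!-- Second URL block: another page of the same website -->
  <url>
    <loc>http://www.example.com/about_us.htm</loc>
    <lastmod>2009-08-17</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
```

Each part of this layout is explained in turn below.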

<?xml version="1.0" encoding="UTF-8"?>

This specifies what version of the XML language you are using and more precisely that you are declaring your sitemap file as an XML formatted/coded file. The encoding part declares which character encoding is used for the text in this sitemap file. The sitemap protocol expects UTF-8. UTF-8 can represent any Unicode character, including accented characters such as the French ê, è and é, which makes it ideal for text in more than one language. Do not worry if all this programming talk isn't making sense! Just leave this code line as it is.

<urlset>

URLSET is an xml tag, just like an html tag, that is enclosed by the Less Than (<) and Greater Than (>) opening/closing brackets. Within the opening tag you can insert Schemas and Attributes. As with the above XML line, unless you want to learn programming just leave this urlset line as it is (how it was automatically created). At the bottom of your sitemap file you should have a closing urlset tag (</urlset>). Only one set of urlset tags is required in a sitemap.xml file. Below are two variations of the urlset opening tag, with schemas and attributes, that you can use. They were created with two different online sitemap generators.

<urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">


<urlset
xmlns="http://www.google.com/schemas/sitemap/0.84"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84 http://www.google.com/schemas/sitemap/0.84/sitemap.xsd">


As you can see, both urlsets use roughly the same schemas and attributes. The first refers to the sitemaps.org 0.9 schema and the second to the older Google 0.84 schema. You will find this if you experiment with different sitemap file generators. The line that might follow a urlset tag is a comment line. In the above example (Fig 2.0) <!-- created with Free Online Sitemap Generator www.xml-sitemaps.com --> was used to acknowledge the website that created the sitemap.xml file. As said above though, just leave urlset as it is. The comment line can be deleted if you wish, but it is nice to leave it there as a Thank You and to keep the popularity of that website alive.

<url>

The code in between the opening (<url>) and closing (</url>) URL tags is the actual code that makes up each URL listing. For example, in Fig 2.0 above the url tags contain the <loc> (URL/Web Page Location) tags, the <lastmod> (Date Last Modified) tags, the <changefreq> (Change/Update Frequency) tags and the <priority> (Priority) tags. One of these blocks of code is inserted into your sitemap for each web page you want the Google search engine to crawl. So if a generated sitemap file contains a web page url you do not want included, simply highlight that url block of code (as in Fig 2.0 above) and delete it. Just to clarify, the block of code starts with the <url> tag and ends with the </url> tag, and includes the code (tags) in between those url tags.

<loc>

The code in between the opening <loc> and closing </loc> LOC tags contains the actual url (web page address) of the web page you want the Google search engine to crawl (e.g. http://www.websitecreationhelp.com/index.htm). LOC tags are the only required tags within the URL tags. The other tags within the URL tags are optional.
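As a rough sketch, the smallest possible URL block holds just the LOC tags. The address below is a made-up placeholder, not one from this tutorial. It also shows one thing to watch for when editing by hand: because the file is XML, an ampersand inside a web page address has to be written as &amp;.

```xml
<!-- A URL block with only the required loc tags (placeholder address) -->
<url>
  <loc>http://www.example.com/products.htm?type=books&amp;page=2</loc>
</url>
```

This fragment belongs in between the opening and closing urlset tags, alongside any other URL blocks.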

<lastmod>

The code in between the opening <lastmod> and closing </lastmod> LASTMOD tags contains the date/time of when the web page in question was last modified. In this example the date/time was automatically set by the sitemap generator, but you can make your own combination of date/time. Here are the date/time formats:

YYYY - Example: 2009

YYYY-MM - Example: 2009-08

YYYY-MM-DD - Example: 2009-08-17

YYYY-MM-DDThh:mmTZD - Example: 2009-08-17T19:20+01:00      T marks the start of the Time (here Hrs/Mins) and TZD is the Time Zone Designator (Z or +hh:mm or -hh:mm)

YYYY-MM-DDThh:mm:ssTZD - Example: 2009-08-17T19:20:33+01:00      T marks the start of the Time (here Hrs/Mins/Secs) and TZD is the Time Zone Designator (Z or +hh:mm or -hh:mm)

YYYY-MM-DDThh:mm:ss.sTZD - Example: 2009-08-17T19:20:33.46+01:00      T marks the start of the Time (here including a decimal fraction of a second after the dot) and TZD is the Time Zone Designator (Z or +hh:mm or -hh:mm)
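To show how these formats sit inside the LASTMOD tags, here is a small sketch of two URL blocks, one using a date only and one using a date and time with a time zone. The page addresses are assumed placeholders, and the dates simply reuse the examples above.

```xml
<url>
  <loc>http://www.example.com/contact_us.htm</loc>
  <!-- Date only: YYYY-MM-DD -->
  <lastmod>2009-08-17</lastmod>
</url>
<url>
  <loc>http://www.example.com/news.htm</loc>
  <!-- Date and time to the minute, one hour ahead of UTC -->
  <lastmod>2009-08-17T19:20+01:00</lastmod>
</url>
```

For most websites the date-only format is enough. The time is only worth adding for pages that change several times a day.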

<changefreq>

The code in between the opening <changefreq> and closing </changefreq> CHANGEFREQ tags is a word that describes how often you normally update this web page's content (the web page mentioned in the url tags). The word can be:

always - The web page's content changes each time it is viewed.

hourly, daily, weekly, monthly or yearly - The web page's content changes each hour, day, week, month or year.

never - The web page's content never changes. It has static content.
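As an illustration of the words above, here is a sketch of two URL blocks with different CHANGEFREQ values. The addresses and the choice of daily and never are assumptions for the example only. Choose whichever word matches how your own pages really change.

```xml
<url>
  <loc>http://www.example.com/latest_offers.htm</loc>
  <!-- This page is normally updated every day -->
  <changefreq>daily</changefreq>
</url>
<url>
  <loc>http://www.example.com/archive/2008_newsletter.htm</loc>
  <!-- An archived page with static content -->
  <changefreq>never</changefreq>
</url>
```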

<priority>

The code in between the opening <priority> and closing </priority> PRIORITY tags indicates the priority value of this url (the web page address mentioned in the url tags) relative to other urls in this sitemap file. Values range from 0.0 to 1.0. This value does not affect how your web pages are compared against other websites' web pages (e.g. your competitors' web pages). It simply tells the Google search engine which web pages you consider most important amongst your own web pages. The default priority value for a web page is 0.5, but you can change each web page's priority value to anything between 0.0 and 1.0. Do not give every web page (url) the same priority value though, because then none of them stands out as more important than the others. You are better off using a sitemap generator to calculate each web page's priority value for you.
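If you do set priority values by hand, one possible approach is sketched below: the home page gets the highest value, a main page a slightly lower one, and a minor page is left without PRIORITY tags so that it takes the default of 0.5. The addresses and the exact values are assumptions for illustration, not recommended settings.

```xml
<url>
  <loc>http://www.example.com/index.htm</loc>
  <!-- The page considered most important on this website -->
  <priority>1.0</priority>
</url>
<url>
  <loc>http://www.example.com/about_us.htm</loc>
  <priority>0.8</priority>
</url>
<url>
  <!-- No priority tags, so the default value of 0.5 applies -->
  <loc>http://www.example.com/terms.htm</loc>
</url>
```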

Note well: The above tags and their settings (attributes/values) are there to help search engines better crawl your website, based on the urls in your sitemap.xml file. They will not directly influence the position of those urls (web pages) in a search engine's results. Also note: The changefreq word you specify for a web page (e.g. hourly) does not mean a search engine will crawl that web page within that time frame. It may crawl it earlier or later than that time frame. It may even crawl a web page with a NEVER changefreq, just so it can handle unexpected changes to that web page. More information about the sitemap protocol (the XML format of the sitemap file) can be found at http://www.sitemaps.org/ (click on their PROTOCOL link).

So to sum up: if you want an easy life, use a sitemap generator to automatically build your sitemap.xml file for you. If you need to manually edit it thereafter, for whatever reason, use a text editor such as Notepad. When the sitemap.xml file has been created, and edited if need be, upload it to your public_html (website) folder and then submit it to Google through your Google Account. You can create a sitemap.xml file from scratch, as shown above, but it is easier just to use a sitemap generator. Note: In these examples I have dealt with Google, but the sitemap.xml file can be used by other search engines as well.
