How To Create A Sitemap For Google To Index
Help Google To Properly Index The Web Pages In Your Website Folder
When setting up a Google Analytics account you are first asked to create a standard Google Account. The Google Analytics account is then assigned to that
standard Google Account. After logging into your Google Analytics account it is normal to want to view your website's statistics, but what many people do
not realise is that the Google Account offers more than just analytics. It has many more free tools to help you with your website promotion.
One of these tools is called Webmaster Tools. It has a feature that allows you to submit a Sitemap file, which I will be showing you how to
create later in this section. By submitting a sitemap file you inform certain search engines about web pages on your website that they have not
crawled (spidered) yet.
WHAT IS A SITEMAP?
A sitemap file is an XML formatted (.xml) file that contains one or more of your web page links. Submitting a sitemap.xml file to a search engine, such as
Google or Yahoo, informs them of your web page links, and more precisely of web page links they have not crawled (spidered) yet for whatever
reason(s).
This does not mean they will crawl your web page links just because you submitted a sitemap.xml file. It means you are only informing them of your web
page links, giving them a nudge. A sitemap file also allows you to include additional information about each URL (web page link): when it was last updated,
how often it changes and how important it is, priority-wise, in relation to the other URLs in the list.
Just to clarify, web page links here are the addresses of your own web pages (e.g. http://www.???.com/index.htm, http://www.???.com/about_us.htm, etc). External links (e.g. a link from one of your web pages to microsoft.com) do not belong in the sitemap file. You are creating a map file of your own site and not a map file of links to other websites.
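To illustrate the difference between your own web page links and external links, here is a minimal sketch in Python (my choice of language for illustration only; nothing in Webmaster Tools requires it). The function name own_site_urls and the example.com addresses are made up for this example.

```python
from urllib.parse import urlparse


def own_site_urls(site_address, candidate_urls):
    """Keep only the URLs that live on the same host as site_address."""
    site_host = urlparse(site_address).netloc.lower()
    return [url for url in candidate_urls
            if urlparse(url).netloc.lower() == site_host]


links = [
    "http://www.example.com/index.htm",
    "http://www.example.com/about_us.htm",
    "http://www.microsoft.com/",  # external link: does not belong in your sitemap
]
print(own_site_urls("http://www.example.com/", links))
```

Only the two example.com addresses are kept. Note that this simple comparison treats www.example.com and example.com as different hosts.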
GETTING STARTED
Webmaster Tools is accessible, after logging in to your Google Analytics account, by clicking on the MY ACCOUNT link in the top corner of the Google Account page or by going straight to the Webmaster Tools web page: www.google.com/webmasters/tools/ (if you do not have a Google Analytics account but do have a Google Account). Either way, the ADD A SITE link should be available to you once you have logged into one of these accounts.
Fig 1.0 Click on ADD A SITE, enter your website address and then click on CONTINUE
You need to add a website in order to inform the search engine which website you are submitting a sitemap file for. So click on the expandable ADD A SITE button to display its Website Address edit box and then enter your website address into it. Clicking on the CONTINUE button will then take you to the Verify Ownership page where you need to Copy & Paste the unique meta-tag code into the HEAD section of your INDEX web page (e.g. index.html or index.htm). DO NOT CLICK ON THE VERIFY BUTTON YET.
Fig 1.1 Select ALL of the META-TAG Code, COPY it and then PASTE it into your Index web page
Before you click on the VERIFY button (above) you must first Copy & Paste the unique meta-tag code into your INDEX web page and then upload that index
web page to your web space (public_html website folder). Only then should you click on the VERIFY button.
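If you want to double-check your index web page before clicking on VERIFY, the following is a rough Python sketch that looks for a verification meta-tag inside the HEAD section of a page. The tag name google-site-verification is an assumption on my part, so use whatever name appears in the code Google gives you. The class and function names, the sample page and the PLACEHOLDER-CODE value are made up for illustration.

```python
from html.parser import HTMLParser


class VerificationTagFinder(HTMLParser):
    """Records whether a verification meta tag appears inside <head>."""

    def __init__(self, meta_name):
        super().__init__()
        self.meta_name = meta_name
        self.in_head = False
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            name = dict(attrs).get("name") or ""
            if name.lower() == self.meta_name:
                self.found = True

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def has_verification_tag(html_text, meta_name="google-site-verification"):
    """Return True if html_text has the named meta tag inside <head>."""
    finder = VerificationTagFinder(meta_name)
    finder.feed(html_text)
    return finder.found


# A stand-in for your index web page; the content value is a placeholder.
sample_page = """<html><head><title>Home</title>
<meta name="google-site-verification" content="PLACEHOLDER-CODE" />
</head><body><p>Welcome</p></body></html>"""
print(has_verification_tag(sample_page))
```

This only checks that a tag with that name is present in the HEAD section. It cannot tell you whether the code inside it is the one Google issued for your website.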
Once your website has been verified you should be taken to the Dashboard. If not, go back to the main webmaster tools web page and click on your verified
website address link. On the dashboard page you then click on the expandable SUBMIT A SITEMAP link which takes you to the Sitemap page (Fig 1.3 below).
Fig 1.2 The Dashboard page - Click on the SUBMIT A SITEMAP link to continue
When you arrive on the Sitemap page the next thing to do is enter the name of your sitemap xml file into the SITEMAP edit box - The name of the sitemap xml file should be generic, such as sitemap.xml. When you have entered the file name click on the SUBMIT SITEMAP button to continue.
Fig 1.3 Enter the name of your sitemap xml file and then click on the SUBMIT SITEMAP button
After clicking on the SUBMIT SITEMAP button the sitemap xml file is looked for inside your public_html folder. If it is found the sitemaps page will refresh and create a link for your sitemap xml file (below). This means success, in which case click on the sitemap.xml file link to continue (below).
Fig 1.4 Click on the sitemap.xml file link to continue - Note TOTAL URLS and INDEXED URLS
Clicking on the sitemap.xml file link and then clicking on the GO BACK link (below) simply refreshes the status of the sitemap for you (Fig 1.6 below). This just saves you from signing out, for example, in order to see/notice the changes made to the TOTAL URLS and INDEXED URLS status which may have been estimated (or plain incorrect!) by Google upon submission.
Fig 1.5 Click on the GO BACK link to continue
Fig 1.6 The sitemap.xml file has been submitted successfully - Note TOTAL URLS and INDEXED URLS
TOTAL URLS is the total number of urls (web page links) found in the sitemap.xml file and INDEXED URLS is the number of urls (web page links) actually
indexed (crawled/searched for) so far. In this example I have 41 urls indexed out of 52, which stands to reason as I have only just created and uploaded
the other urls (web pages).
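The TOTAL URLS figure can be cross-checked against your own file. The following Python sketch counts the url entries in a sitemap using the standard library. It assumes the file uses the sitemaps.org 0.9 schema, and the function name and sample text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemap uses the sitemaps.org 0.9 schema.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def count_sitemap_urls(sitemap_text):
    """Return how many <url> entries a sitemap document contains."""
    root = ET.fromstring(sitemap_text.encode("utf-8"))
    return len(root.findall(f"{{{SITEMAP_NS}}}url"))


sample_sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/index.htm</loc></url>
  <url><loc>http://www.example.com/about_us.htm</loc></url>
</urlset>"""
print(count_sitemap_urls(sample_sitemap))
```

The count tells you what TOTAL URLS should be. INDEXED URLS is decided by Google and cannot be worked out from the file.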
One thing to remember here is that Google does not guarantee or promise to index any urls, and if they do index any urls it could take weeks/months
to do so. Therefore, when your sitemap.xml file has been successfully submitted JUST WAIT.....to see if there is any improvement to your overall web page
listings. If not, it could be that Google has not indexed your website, for whatever reason(s), or that something is wrong with your website structure
(e.g. missing files, broken links, etc). In which case you need to investigate further.
CREATE A SITEMAP XML FILE
Now that you know how to submit a sitemap xml file I will teach you how to create one manually. If you want to create one automatically use the free, popular, online sitemap file generator at http://www.xml-sitemaps.com/. It will scan your website for web pages and then put their urls (e.g. http://www.???.com/index.htm, http://www.???.com/contact_us.htm, etc) into a downloadable sitemap.xml file or sitemap.html file.
And if you want to create a sitemap file from a folder on your computer that contains web pages use the free, popular, program called GSitemap at http://www.vigos.com/products/gsitemap/. It creates a sitemap.xml file that you can then save into a folder, which in turn you can upload to your public_html (website) folder.
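To give you an idea of what building a url list from a folder involves, here is a minimal Python sketch. It is not how GSitemap works internally (I do not know that). It is simply one way of turning the web page files in a folder into urls. The function name, the throwaway folder and the example.com address are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def urls_from_folder(folder, site_address):
    """Map every .htm/.html file under folder to a URL on site_address."""
    base = site_address.rstrip("/")
    root = Path(folder)
    urls = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in (".htm", ".html"):
            relative = path.relative_to(root).as_posix()
            urls.append(f"{base}/{relative}")
    return urls


# Demonstration with a throwaway folder standing in for your website folder.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "index.htm").write_text("<html></html>")
    Path(folder, "about_us.htm").write_text("<html></html>")
    Path(folder, "logo.png").write_bytes(b"")  # not a web page, so it is skipped
    demo_urls = urls_from_folder(folder, "http://www.example.com/")

print(demo_urls)
```

Image files and other non-web-page files are ignored, and web pages inside sub-folders keep their sub-folder names in the url.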
HTML Sitemap / XML Sitemap
Do not confuse a sitemap.xml file with a html sitemap file (sitemap.html). They use two different formats - xml and html. A html sitemap file is good if you want to create a web page that lists your sitemap urls for visitors (you could take the urls from the sitemap.xml file). Ideally, you would then have a link to that sitemap.html web page at the bottom of one or more of your other web pages. For example, at the bottom of your index web page you might have these links: contact us, about us, sitemap, terms/conditions.
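As a rough illustration of taking the urls from a sitemap.xml file and turning them into html links, here is a Python sketch. It assumes the sitemap uses the sitemaps.org 0.9 schema, and it produces only a bare list of links that you would paste into your own sitemap.html web page. The function name and sample text are made up for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumption: the sitemap uses the sitemaps.org 0.9 schema.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def html_sitemap_from_xml(sitemap_text):
    """Build a simple HTML list of links from the <loc> values of a sitemap."""
    root = ET.fromstring(sitemap_text.encode("utf-8"))
    items = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = escape(loc.text.strip())  # make the address safe to place in HTML
        items.append(f'<li><a href="{url}">{url}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


sample_sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/index.htm</loc></url>
  <url><loc>http://www.example.com/contact_us.htm</loc></url>
</urlset>"""
html_list = html_sitemap_from_xml(sample_sitemap)
print(html_list)
```

In practice you would probably replace the link text with friendlier page names such as Contact Us.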
The Sitemap XML File
Fig 2.0 below shows a very simple sitemap.xml file consisting of two URLs only. I have highlighted one of the URL code blocks to show you the main parts you can edit. The rest of the file is standard and does not need editing. This sitemap file was created by xml-sitemaps.com (see above). The structure of a sitemap file is as follows.
Fig 2.0 A very simple sitemap.xml file with two URLs inside it
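For readers who prefer to see the structure produced step by step, here is a Python sketch that builds a two-URL sitemap of the same general shape. It is not a copy of the file in the figure. The addresses, dates and values are made up, the function name build_sitemap is my own, and it assumes Python 3.9 or later for the indentation helper.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as text) for a list of page dictionaries."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:  # only <loc> is required; the rest are optional
                ET.SubElement(url, tag).text = str(page[tag])
    ET.indent(urlset)  # pretty-print; needs Python 3.9 or later
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Made-up pages for illustration only.
pages = [
    {"loc": "http://www.example.com/index.htm", "lastmod": "2009-08-17",
     "changefreq": "monthly", "priority": "1.0"},
    {"loc": "http://www.example.com/about_us.htm", "lastmod": "2009-08-17",
     "changefreq": "yearly", "priority": "0.5"},
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

The printed text starts with the XML declaration line, followed by one urlset containing one url block per page. Each of these parts is explained in turn below.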
<?xml version="1.0" encoding="UTF-8"?>
This specifies what version of the XML language you are using and, more precisely, declares your sitemap file as an XML formatted (coded) file. The encoding part declares which character encoding is used for the text in this sitemap file. Sitemap files should be saved as UTF-8, so leave it set to UTF-8. UTF-8 is a Unicode encoding, which means it can represent accented characters such as the French ê, è and é. This makes it ideal for multilingual text in general. Do not worry if all this programming talk isn't making sense! Just leave this code line as it is.
<urlset>
URLSET is an xml tag, just like a html tag, that is encased by the Less Than (<) and Greater Than (>) opening/closing brackets. Within the opening urlset
tag you can insert Schemas and Attributes - As with the above XML line, unless you want to learn programming just leave this urlset line as it is (how it
was automatically created).
At the bottom of your sitemap file you should have a closing urlset tag (</urlset>). Only one set of urlset tags is required in a sitemap.xml
file. Below are two variations of the urlset opening tag, with schemas and attributes, you can use. Both were created with different online sitemap
generators.
<urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<urlset
xmlns="http://www.google.com/schemas/sitemap/0.84"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84 http://www.google.com/schemas/sitemap/0.84/sitemap.xsd">
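As you can see, both urlsets use roughly the same schemas and attributes, and you will find this if you experiment with different sitemap file generators. The first variation uses the sitemaps.org 0.9 schema, which is the one described by the protocol at sitemaps.org. The Google 0.84 schema in the second variation is an older version.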
The line that might follow a urlset is a comment line. In the above example (Fig 2.0) <!-- created with Free Online Sitemap Generator www.xml-sitemaps.com -->
was used to acknowledge the website that created the sitemap.xml file. As said above though, just leave urlset as it is. The comment line can be deleted
if you wish, but it is nice to leave it there as a Thank You and to keep the popularity of that website alive.
<url>
The code in between the opening (<url>) and closing (</url>) URL tags is the actual code that makes up each URL listing. For
example, in Fig 2.0 above the url tags contain the <loc> (URL/Web Page Location) tags, the <lastmod> (Date Last Modified) tags,
the <changefreq> (Change/Update Frequency) tags and the <priority> (Priority) tags.
These blocks of code (tags) are inserted into your sitemap for each web page you want the search engine to crawl. So if a generated sitemap file contained
a web page url you did not want included, simply highlight that url block of code (as in Fig 2.0 above) and delete it. Just to clarify, the block of code
starts with the <url> tag and ends with the </url> tag, and includes the code (tags) in between those url tags.
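If you have many url blocks to delete, doing it by hand in a text editor gets tedious. The following Python sketch removes the whole url block for a given address. It assumes the sitemaps.org 0.9 schema, and the function name and the private.htm address are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemap uses the sitemaps.org 0.9 schema.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)  # keep tag names free of "ns0:" prefixes


def remove_url(sitemap_text, unwanted_url):
    """Return the sitemap text without the <url> block for unwanted_url."""
    root = ET.fromstring(sitemap_text.encode("utf-8"))
    for url in root.findall(f"{{{SITEMAP_NS}}}url"):
        loc = url.find(f"{{{SITEMAP_NS}}}loc")
        if loc is not None and loc.text.strip() == unwanted_url:
            root.remove(url)  # removes everything from <url> to </url>
    # The XML declaration line is not included in this output.
    return ET.tostring(root, encoding="unicode")


sample_sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/index.htm</loc></url>
  <url><loc>http://www.example.com/private.htm</loc></url>
</urlset>"""
cleaned = remove_url(sample_sitemap, "http://www.example.com/private.htm")
print(cleaned)
```

The result still contains the index.htm block but no longer mentions private.htm. Remember to put the XML declaration line back at the top before saving the file.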
<loc>
The code in between the opening <loc> and closing </loc> LOC tags contains the actual url (address) of the web page you want a search engine to crawl (e.g. http://www.websitecreationhelp.com/index.htm). LOC tags are the only required tags within the URL tags. The other tags within the URL tags are optional.
<lastmod>
The code in between the opening <lastmod> and closing </lastmod> LASTMOD tags contains the date/time of when the web page in
question was last modified. In this example the date/time was automatically set by the sitemap generator, but you can make your own combination of
date/time. Here are the date/time formats:
YYYY - Example: 2009
YYYY-MM - Example: 2009-08
YYYY-MM-DD - Example: 2009-08-17
YYYY-MM-DDThh:mmTZD - Example: 2009-08-17T19:20+01:00 - T marks the start of the Time (Hrs/Mins) and TZD is the Time Zone Designator (Z or +hh:mm or -hh:mm)
YYYY-MM-DDThh:mm:ssTZD - Example: 2009-08-17T19:20:33+01:00 - As above, with Secs added to the Time
YYYY-MM-DDThh:mm:ss.sTZD - Example: 2009-08-17T19:20:33.46+01:00 - As above, with a decimal fraction of a second added after a full stop
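If you edit lastmod values by hand it is easy to mistype one. Here is a Python sketch that checks whether a value has one of the shapes used by the W3C date/time format, in which a fraction of a second is written after a full stop and a time zone designator is required whenever a time is given. It only checks the shape, so an impossible date such as month 13 would still pass. The names W3C_DATETIME and looks_like_lastmod are my own.

```python
import re

# Shape check only: it does not verify that the date itself exists.
W3C_DATETIME = re.compile(
    r"^\d{4}"                   # YYYY
    r"(-\d{2}"                  # -MM
    r"(-\d{2}"                  # -DD
    r"(T\d{2}:\d{2}"            # Thh:mm
    r"(:\d{2}(\.\d+)?)?"        # optional :ss and optional .s fraction
    r"(Z|[+-]\d{2}:\d{2})"      # TZD, required once a time is given
    r")?)?)?$"
)


def looks_like_lastmod(value):
    """Return True if value has a W3C date/time shape."""
    return W3C_DATETIME.match(value) is not None


for example in ("2009", "2009-08-17", "2009-08-17T19:20+01:00",
                "2009-08-17T19:20:33.46+01:00", "17/08/2009"):
    print(example, looks_like_lastmod(example))
```

The first four examples are accepted and the last one (a day/month/year date with slashes) is rejected.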
<changefreq>
The code in between the opening <changefreq> and closing </changefreq> CHANGEFREQ tags is a word that relates to the frequency
of how often you normally update this web page's content (the web page mentioned in the url tags). The word can be:
always - The web page's content changes each time it is viewed.
hourly, daily, weekly, monthly or yearly - The web page's content changes each hour, day, week, month or year.
never - The web page's content never changes. It has static content.
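A mistyped changefreq word (e.g. "dayly") is easy to miss by eye. This small Python sketch checks a word against the seven words listed above. I have assumed the words must be written in lower case exactly as listed, and the names used are my own.

```python
# The seven changefreq words listed above, assumed to be lower case only.
ALLOWED_CHANGEFREQ = (
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
)


def is_valid_changefreq(word):
    """Return True if word is one of the allowed changefreq words."""
    return word in ALLOWED_CHANGEFREQ


for word in ("weekly", "dayly", "Never"):
    print(word, is_valid_changefreq(word))
```

Only "weekly" passes here. "dayly" is a typo and "Never" fails because of its capital letter.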
<priority>
The code in between the opening <priority> and closing </priority> PRIORITY tags indicates the priority value of this url
(the url (web page address) mentioned in the url tags) relative to other urls in this sitemap file. Values range from 0.0 to 1.0.
This value does not affect how your web pages are compared against other websites' web pages (e.g. your competitors' web pages). It simply tells a search
engine what web pages you consider most important amongst your own web pages. The default priority value for a web page is 0.5, but you can change each
web page's priority value between 0.0 and 1.0.
Do not give each web page (url) the same priority value though, otherwise the values tell the search engine nothing about which web pages matter most to
you. You are better off using a sitemap generator to calculate each web page's priority value for you.
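I do not know exactly how any particular sitemap generator calculates its priority values, so treat the following Python sketch as a made-up rule of thumb rather than the real method. It gives the home page 1.0 and takes 0.2 off for each level a web page sits below it, never going lower than 0.1. The function name and the step and floor values are my own assumptions.

```python
from urllib.parse import urlparse


def priority_by_depth(url, step=0.2, floor=0.1):
    """Hypothetical rule of thumb: the deeper the page, the lower the value."""
    path = urlparse(url).path
    segments = [part for part in path.split("/") if part]
    # Treat index pages as belonging to their folder rather than a level below.
    if segments and segments[-1].lower().startswith("index."):
        segments = segments[:-1]
    value = max(floor, 1.0 - step * len(segments))
    return f"{value:.1f}"


for address in ("http://www.example.com/index.htm",
                "http://www.example.com/about_us.htm",
                "http://www.example.com/products/widgets/blue.htm"):
    print(address, priority_by_depth(address))
```

You would then adjust individual values by hand for any web page you consider more (or less) important than its position suggests.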
Note well: The above tags and their settings (attributes/values) are there to help search engines better crawl your website, based on the urls in your
sitemap.xml file. They will not directly influence the position of those urls (web pages) in a search engine's result.
Also note: The changefreq word you specify for a web page (e.g. hourly) does not mean a search engine will crawl that web page within that time frame. It
may crawl it earlier or later than that time frame. It may even crawl a web page with a NEVER changefreq, just so it can handle unexpected changes
to that web page. More information about the Sitemap protocol (the XML format of the sitemap file) can be found at
http://www.sitemaps.org/ (click on their PROTOCOL link).
So to sum up, if you want an easy life use a sitemap generator to automatically build your sitemap.xml file for you. And if you need to manually edit it
thereafter, for whatever reason(s), use a text editor such as Notepad. When the sitemap.xml file has been created, and edited if need be, upload it to
your public_html (website) folder and then submit it to Google through your Google Account.
You can create a sitemap.xml file from scratch, as shown above, but it is easier just to use a sitemap generator. Note: In these examples I have dealt
with Google, but the sitemap.xml file can be used by other search engines too.