The Info List - Sitemaps

The Sitemaps protocol allows a webmaster to inform search engines about URLs on a website that are available for crawling. A Sitemap is an XML file that lists the URLs for a site. It allows webmasters to include additional information about each URL: when it was last updated, how often it changes, and how important it is in relation to other URLs in the site. This allows search engines to crawl the site more intelligently. Sitemaps are a URL inclusion protocol and complement robots.txt, a URL exclusion protocol. Sitemaps are particularly beneficial on websites where:

- some areas of the website are not available through the browsable interface;
- webmasters use rich Ajax, Silverlight, or Flash content that is not normally processed by search engines;
- the site is very large and there is a chance that web crawlers will overlook some of the new or recently updated content;
- the site has a huge number of pages that are isolated or not well linked together; or
- the site has few external links.


1 History
2 File format
  2.1 Element definitions
3 Other formats
  3.1 Text file
  3.2 Syndication feed
4 Search engine submission
  4.1 Limitations for search engine indexing
5 Sitemap limits
6 Multilingual and multinational Sitemaps
7 See also
8 References
9 External links

History

Google first introduced Sitemaps 0.84 in June 2005 so web developers could publish lists of links from across their sites. Google, MSN and Yahoo announced joint support for the Sitemaps protocol in November 2006. The schema version was changed to "Sitemap 0.90", but no other changes were made. In April 2007, Ask.com and IBM announced support for Sitemaps. Also, Google, Yahoo and Microsoft announced auto-discovery for sitemaps through robots.txt. In May 2007, the state governments of Arizona, California, Utah and Virginia announced they would use Sitemaps on their web sites. The Sitemaps protocol is based on ideas[1] from "Crawler-friendly Web Servers,"[2] with improvements including auto-discovery through robots.txt and the ability to specify the priority and change frequency of pages.

File format

The Sitemap protocol format consists of XML tags. The file itself must be UTF-8 encoded. Sitemaps can also be just a plain text list of URLs. They can also be compressed in .gz format. A sample Sitemap that contains just one URL and uses all optional tags is shown below.

<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>http://example.com/</loc>
    <lastmod>2006-11-18</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
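A file like the sample above can also be produced programmatically. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name build_sitemap and the dictionary-based entry layout are assumptions made for illustration and are not part of the protocol.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML bytes for an iterable of entry dicts.

    Each dict needs "loc"; "lastmod", "changefreq" and "priority"
    are optional, mirroring the optional tags in the sample.
    """
    ET.register_namespace("", SITEMAP_NS)  # emit xmlns="..." without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                child = ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}")
                child.text = str(entry[tag])
    # encoding="utf-8" returns bytes, matching the UTF-8 requirement.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


sitemap_bytes = build_sitemap([
    {
        "loc": "http://example.com/",
        "lastmod": date(2006, 11, 18).isoformat(),  # YYYY-MM-DD
        "changefreq": "daily",
        "priority": 0.8,
    },
])
print(sitemap_bytes.decode("utf-8"))
```

ElementTree escapes characters such as ampersands in element text on its own, so URLs can be passed in unescaped. The sketch omits the optional xsi:schemaLocation attribute shown in the sample.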

The Sitemap XML protocol is also extended to provide a way of listing multiple Sitemaps in a 'Sitemap index' file. The maximum Sitemap size of 50 MiB or 50,000 URLs[3] means this is necessary for large sites. An example of a Sitemap index referencing one separate sitemap follows.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap1.xml.gz</loc>
    <lastmod>2014-10-01T18:23:17+00:00</lastmod>
  </sitemap>
</sitemapindex>
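For a large site, the URL list has to be split before an index can reference the pieces. The sketch below shows one way to do that, assuming a hypothetical site with 120,000 pages and hypothetical file names of the form sitemapN.xml.gz. It enforces only the 50,000-URL limit; the 50 MiB size limit would need a separate check on the serialized files.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit stated in the text


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_locations):
    """Return Sitemap index XML bytes listing the given sitemap URLs."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for location in sitemap_locations:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = location
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# Hypothetical site with 120,000 pages, for illustration only.
urls = [f"http://www.example.com/page{i}" for i in range(120_000)]
chunks = chunk_urls(urls)
locations = [
    f"http://www.example.com/sitemap{n}.xml.gz"
    for n in range(1, len(chunks) + 1)
]
index_bytes = build_sitemap_index(locations)
print(len(chunks), [len(chunk) for chunk in chunks])
```

Each chunk would then be written out as its own Sitemap file at the matching location.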

Element definitions

The definitions for the elements are shown below:[3]

| Element | Required? | Description |
|---|---|---|
| <urlset> | Yes | The document-level element for the Sitemap. The rest of the document after the '<?xml version>' element must be contained in this. |
| <url> | Yes | Parent element for each entry. |
| <sitemapindex> | Yes | The document-level element for the Sitemap index. The rest of the document after the '<?xml version>' element must be contained in this. |
| <sitemap> | Yes | Parent element for each entry in the index. |
| <loc> | Yes | Provides the full URL of the page or sitemap, including the protocol (e.g. http, https) and a trailing slash, if required by the site's hosting server. This value must be shorter than 2,048 characters. Note that ampersands in the URL need to be escaped as &amp;. |
| <lastmod> | No | The date that the file was last modified, in ISO 8601 format. This can display the full date and time or, if desired, may simply be the date in the format YYYY-MM-DD. |
| <changefreq> | No | How frequently the page may change: always, hourly, daily, weekly, monthly, yearly, never. "Always" is used to denote documents that change each time that they are accessed. "Never" is used to denote archived URLs (i.e. files that will not be changed again). This is used only as a guide for crawlers, and is not used to determine how frequently pages are indexed. Does not apply to <sitemap> elements. |
| <priority> | No | The priority of that URL relative to other URLs on the site. This allows webmasters to suggest to crawlers which pages are considered more important. The valid range is from 0.0 to 1.0, with 1.0 being the most important. The default value is 0.5. Rating all pages on a site with a high priority does not affect search listings, as it is only used to suggest to the crawlers how important pages in the site are to one another. Does not apply to <sitemap> elements. |
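The constraints in these definitions lend themselves to a simple pre-publication check. The following is a minimal sketch; the helper names escape_loc and check_entry are hypothetical, and measuring the 2,048-character limit on the unescaped URL is an assumption, since the text does not say which form is measured.

```python
from xml.sax.saxutils import escape

VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly",
                    "monthly", "yearly", "never"}
MAX_LOC_LENGTH = 2048  # <loc> must be shorter than this many characters


def escape_loc(url):
    """Entity-escape a URL for use as XML text (&, <, > and both quotes)."""
    return escape(url, {"'": "&apos;", '"': "&quot;"})


def check_entry(loc, changefreq=None, priority=None):
    """Return a list of problems found in one <url> entry (empty if none)."""
    problems = []
    # Assumption: the limit is applied to the raw, unescaped URL.
    if len(loc) >= MAX_LOC_LENGTH:
        problems.append("loc must be shorter than 2,048 characters")
    if changefreq is not None and changefreq not in VALID_CHANGEFREQ:
        problems.append(f"unknown changefreq: {changefreq!r}")
    if priority is not None and not 0.0 <= priority <= 1.0:
        problems.append(f"priority out of range: {priority}")
    return problems


print(escape_loc("http://www.example.com/view?item=12&desc=vacation"))
print(check_entry("http://www.example.com/", changefreq="sometimes", priority=1.5))
```

Manual escaping like escape_loc is only needed when the XML is assembled by string concatenation; XML libraries normally escape text content themselves.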

Support for the elements that are not required can vary from one search engine to another.[3]

Other formats

Text file

The Sitemaps protocol allows the Sitemap to be a simple list of URLs in a text file. The file specifications of XML Sitemaps apply to text Sitemaps as well; the file must be UTF-8 encoded, and cannot be larger than 50 MiB or contain more than 50,000 URLs,[4] but can be compressed as a gzip file.[3]

Syndication feed

A syndication feed is a permitted method of submitting URLs to crawlers; this is advised mainly for sites that already have syndication feeds. One stated drawback is that this method might only provide crawlers with more recently created URLs, but other URLs can still be discovered during normal crawling.[3] It can be beneficial to have a syndication feed as a delta update (containing only the newest content) to supplement a complete sitemap.

Search engine submission

If Sitemaps are submitted directly to a search engine (pinged), the search engine will return status information and any processing errors. The details involved with submission will vary with the different search engines. The location of the sitemap can also be included in the robots.txt file by adding the following line:

Sitemap: <sitemap_location>

The <sitemap_location> should be the complete URL to the sitemap.
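As an illustration with a hypothetical location, http://www.example.com/sitemap-index.xml, the sketch below shows how a crawler-side script might collect "Sitemap:" records from a robots.txt body. The robots.txt content is invented for the example, and matching the field name case-insensitively is an assumption about crawler behaviour rather than something the text specifies.

```python
def find_sitemaps(robots_txt):
    """Return the URLs from every "Sitemap:" line in a robots.txt body."""
    locations = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Split on the first colon only, so the URL's own colons survive.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            locations.append(value.strip())
    return locations


# Hypothetical robots.txt content, for illustration only.
robots_txt = """\
User-agent: *
Disallow: /private/
Sitemap: http://www.example.com/sitemap-index.xml
sitemap: http://www.example.com/news-sitemap.xml
"""
print(find_sitemaps(robots_txt))
```

Python 3.8 and later also offer urllib.robotparser.RobotFileParser.site_maps(), which returns the same kind of list after a robots.txt file has been parsed.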


This directive is independent of the user-agent line, so it doesn't matter where it is placed in the file. If the website has several sitemaps, multiple "Sitemap:" records may be included in robots.txt, or the URL can simply point to the main sitemap index file. The following table lists the sitemap submission URLs for several major search engines:

| Search engine | Submission URL | Help page | Market |
|---|---|---|---|
| Baidu | http://zhanzhang.baidu.com/dashboard/index | Baidu Webmaster Dashboard | China, Hong Kong, Singapore |
| Bing (and Yahoo!) | http://www.bing.com/webmaster/ping.aspx?siteMap= | Bing Webmaster Tools | Global |
| Google | http://www.google.com/webmasters/tools/ping?sitemap= | Submitting a Sitemap | Global |
| Yandex | http://webmaster.yandex.com/site/map.xml | Sitemaps files | Russia, Ukraine, Belarus, Kazakhstan, Turkey |
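Submission URLs that end in a query parameter, such as the Bing and Google entries, take the sitemap location as a percent-encoded value. The following is a minimal sketch of building such a URL with Python's standard urllib.parse.quote; the helper name build_ping_url and the sitemap location are hypothetical, and the code only constructs a string without sending any request.

```python
from urllib.parse import quote


def build_ping_url(submission_prefix, sitemap_url):
    """Append a fully percent-encoded sitemap URL to a submission prefix."""
    # safe="" makes quote() encode ":" and "/" too (as %3A and %2F).
    return submission_prefix + quote(sitemap_url, safe="")


ping_url = build_ping_url(
    "http://www.google.com/webmasters/tools/ping?sitemap=",
    "http://www.example.com/sitemap.xml",  # hypothetical sitemap location
)
print(ping_url)
```

Passing safe="" matters here because quote() leaves "/" unencoded by default.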

Sitemap URLs submitted using the sitemap submission URLs need to be URL-encoded, for example by replacing : (colon) with %3A and / (slash) with %2F.[3]

Limitations for search engine indexing

Sitemaps supplement and do not replace the existing crawl-based mechanisms that search engines already use to discover URLs. Using this protocol does not guarantee that web pages will be included in search indexes, nor does it influence the way that pages are ranked in search results. Specific examples are provided below.

- Google - Webmaster Support on Sitemaps: "Using a sitemap doesn't guarantee that all the items in your sitemap will be crawled and indexed, as Google processes rely on complex algorithms to schedule crawling. However, in most cases, your site will benefit from having a sitemap, and you'll never be penalized for having one."[5]
- Bing - Bing uses the standard sitemaps.org protocol and is very similar to the one mentioned below.
- Yahoo - After the search deal commenced between Yahoo! Inc. and Microsoft, Yahoo! Site Explorer merged with Bing Webmaster Tools.

Sitemap limits

Sitemap files have a limit of 50,000 URLs and 50 MiB per sitemap. Sitemaps can be compressed using gzip, reducing bandwidth consumption. Multiple sitemap files are supported, with a Sitemap index file serving as an entry point. Sitemap index files may not list more than 50,000 Sitemaps, must be no larger than 50 MiB (52,428,800 bytes), and can be compressed. A site can have more than one Sitemap index file.[3]

As with all XML files, any data values (including URLs) must use entity escape codes for the characters ampersand (&), single quote ('), double quote ("), less than (<), and greater than (>).

Multilingual and multinational Sitemaps

In December 2011, Google announced the annotations for sites that want to target users in many languages and, optionally, countries. A few months later Google announced on its official blog[6] that it was adding support for specifying the rel="alternate" and hreflang annotations in Sitemaps. Compared with HTML link elements, which until then were the only option, the Sitemaps option offered advantages including a smaller page size and easier deployment for some websites.

As an example of a multilingual Sitemap, consider a site that targets English-language users through http://www.example.com/en and Greek-language users through http://www.example.com/gr. Until then, the only option was to add the hreflang annotation either in the HTTP header or as HTML elements on both URLs, like this:

<!-- hreflang takes a language code; "el" is the ISO 639-1 code for Greek -->
<link rel="alternate" hreflang="en" href="http://www.example.com/en" >
<link rel="alternate" hreflang="el" href="http://www.example.com/gr" >

But now, one can alternatively use the following equivalent markup in Sitemaps:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <!-- hreflang takes a language code; "el" is the ISO 639-1 code for Greek -->
  <url>
    <loc>http://www.example.com/en</loc>
    <xhtml:link rel="alternate" hreflang="el" href="http://www.example.com/gr" />
    <xhtml:link rel="alternate" hreflang="en" href="http://www.example.com/en" />
  </url>
  <url>
    <loc>http://www.example.com/gr</loc>
    <xhtml:link rel="alternate" hreflang="el" href="http://www.example.com/gr" />
    <xhtml:link rel="alternate" hreflang="en" href="http://www.example.com/en" />
  </url>
</urlset>
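Because every URL entry repeats the full set of alternates, this markup is tedious to maintain by hand. The sketch below generates it from a single mapping with Python's standard xml.etree.ElementTree module. The function name build_hreflang_sitemap is hypothetical, and the example uses "el", the ISO 639-1 language code for Greek, as the hreflang value for the /gr URL.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_sitemap(alternates):
    """Build a sitemap where every URL lists all language alternates.

    `alternates` maps an hreflang value to the URL serving that language.
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url in alternates.values():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        # Each <url> repeats the full set of alternates, itself included.
        for lang, href in alternates.items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                {"rel": "alternate", "hreflang": lang, "href": href},
            )
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


hreflang_bytes = build_hreflang_sitemap({
    "en": "http://www.example.com/en",
    "el": "http://www.example.com/gr",  # "el" = Greek (ISO 639-1)
})
print(hreflang_bytes.decode("utf-8"))
```

Adding a third language then means adding one entry to the mapping rather than editing every url element.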

See also

- Biositemap
- Metadata
- Resources of a Resource
- Yahoo! Site Explorer
- Google Webmaster Tools


References

1. M.L. Nelson; J.A. Smith; del Campo; H. Van de Sompel; X. Liu (2006). "Efficient, Automated Web Resource Harvesting" (PDF). WIDM'06.
2. O. Brandman, J. Cho, Hector Garcia-Molina, and Narayanan Shivakumar (2000). "Crawler-friendly web servers". Proceedings of ACM SIGMETRICS Performance Evaluation Review, Volume 28, Issue 2. doi:10.1145/362883.362894.
3. "Sitemaps XML format". Sitemaps.org. 2016-11-21. Retrieved 2016-12-01.
4. https://support.google.com/webmasters/bin/answer.py?hl=en&answer=183668
5. "About Google Sitemaps". Google.com. 2016-12-01. Retrieved 2016-12-01.
6. "Multilingual and multinational site annotations in Sitemaps". Google Webmaster Central Blog. Pierre Far. May 24, 2012.

External links

- Official website
- "Major Search Engines Unite to Support a Common Mechanism for Website Submission". Google. Nov 16, 2006.
- Google news groups: Sitemaps (archived); Webmaster help - Sitemap
