Sitemap.xml: where it is located. A detailed guide to the Sitemap file and its technical details

Typically, a sitemap.xml is created using one of the following options: online services, CMS modules, specialized programs, or manually. Below we will look at each of these tools in detail.

How to create a sitemap online

There are a lot of services on the Internet that allow you to create a sitemap for search robots. Here are the most popular ones:

These services work well and perform their functions. However, as a rule, they have a limit on the number of pages taken into account (usually 500). In addition, if the site has poor navigation and some documents are quite difficult for the crawler to reach, then most likely these pages will not be included in the sitemap.xml.

How to make sitemap.xml using CMS add-ons

Most popular content management systems have add-ons that allow you to create a sitemap automatically or manually. This is the most convenient way of working with a sitemap for large resources where new materials are constantly published. As practice shows, you can find several suitable add-ons; you just have to choose the one that best suits your goals.

For example, for WordPress such a plugin is Google XML Sitemaps, and for Joomla it is the Xmap component. In addition, on many engines the ability to create a sitemap file is present out of the box (for example, in 1C-Bitrix or DataLife Engine).

How to make a sitemap using Xenu

Xenu is one of the most popular software products created for SEO specialists. This program can not only create a sitemap for a website; it also has many useful functions - checking for broken links, identifying redirects and much more.

It should be noted that Xenu is not the only program that allows you to create a sitemap.

How to create a sitemap for a website manually

The most labor-intensive, but at the same time the most correct way is to create the sitemap manually. It is used when other options are not suitable: for example, when automatic tools add too many unnecessary pages to the sitemap, or when a site with poor navigation does not use a CMS.


After you create the sitemap.xml, be sure to check the resulting file. This can be done using the sitemap validation service in the Yandex webmaster panel.

The robots.txt and sitemap.xml files make it possible to organize site indexing. These two files complement each other well, although at the same time they solve opposite problems. If robots.txt serves to prohibit indexing of entire sections or individual pages, then sitemap.xml, on the contrary, tells search robots which URLs need to be indexed. Let's analyze each of the files separately.

Robots.txt file

robots.txt is a file in which rules are written that restrict search robots’ access to directories and site files in order to avoid their contents being included in the search engine index. The file must be located in the root directory of the site and be available at: site.ru/robots.txt.

In robots.txt, you need to block all duplicate and service pages of the site from indexing. Public CMSs often create duplicates: an article can be accessible at several URLs at the same time, for example, in categories (site.ru/category/post-1/), tags (site.ru/tag/post-1/) and the archive (site.ru/arhive/post-1/). To avoid duplicates, indexing of tags and the archive must be prohibited; only the categories will remain in the index. By service pages I mean pages of the administrative part of the site and automatically generated pages, for example, on-site search results.

Getting rid of duplicates is essential, as they deprive the site’s pages of uniqueness. After all, if the index contains several pages with the same content accessible at different URLs, the content of none of them will be considered unique. As a result, search engines will lower the positions of such pages in the search results.

Robots.txt directives

Directives are rules or, one might say, commands for search robots. The most important one is User-agent: it lets you set rules for all robots or for a specific bot. This directive is written first, and all other rules are indicated after it.

# For all robots
User-agent: *

# For Yandex robot
User-agent: Yandex

Another mandatory directive is Disallow, which closes sections and pages of the site from indexing; its opposite is the Allow directive, which explicitly allows the specified sections and pages of the site to be indexed.

# Prohibit indexing of the section
Disallow: /folder/

# Allow indexing of the subsection with pictures
Allow: /folder/images/

To indicate the main mirror of the site (for example, with or without www), the Host directive is used. Note that the main mirror is specified without the http:// protocol, but the https:// protocol must be specified explicitly. Host is understood only by the Yandex and Mail.ru bots, and the directive needs to be entered only once.

# If the main mirror works over the http protocol without www
Host: site.ru

# If the main mirror works over the https protocol with www
Host: https://www.site.ru

Sitemap is a directive indicating the path to the sitemap.xml file. The path must be specified in full, including the protocol; this directive can be written anywhere in the file.

# Specify the full path to the sitemap.xml file
Sitemap: http://site.ru/sitemap.xml

To simplify writing rules, there are special symbolic operators:

  • * - denotes any sequence of characters, including none;
  • $ - means that the character before the dollar sign is the last one;
  • # - denotes a comment; everything on the line after this operator is ignored by search robots.

After familiarizing yourself with the basic directives and special operators, you can already sketch out the contents of a simple robots.txt file.

User-agent: *
Disallow: /admin/
Disallow: /arhive/
Disallow: /tag/
Disallow: /modules/
Disallow: /search/
Disallow: *?s=
Disallow: /login.php

User-agent: Yandex
Disallow: /admin/
Disallow: /arhive/
Disallow: /tag/
Disallow: /modules/
Disallow: /search/
Disallow: *?s=
Disallow: /login.php
# Allow the Yandex robot to index images in the modules section
Allow: /modules/*.png
Allow: /modules/*.jpg
Host: site.ru

Sitemap: http://site.ru/sitemap.xml

A detailed description of all directives, with examples of their use, can be found in the help section on the Yandex website.

Sitemap.xml file

sitemap.xml is a so-called site map for search engines. The sitemap.xml file contains information for search robots about the site pages that need to be indexed. The file must contain the URL addresses of the pages; indicating the priority of the pages, the frequency of re-crawling, and the date and time of the last modification is optional.
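As an illustration, here is a minimal sketch of a single entry following the sitemaps.org protocol (the site.ru URL is a placeholder reused from the examples above):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- loc is the only required element: the full URL of the page -->
    <loc>http://site.ru/category/post-1/</loc>
    <!-- the elements below are optional -->
    <lastmod>2016-05-10</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

changefreq accepts the values always, hourly, daily, weekly, monthly, yearly and never, and priority ranges from 0.0 to 1.0 (0.5 is assumed by default).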

It should be noted that sitemap.xml is not mandatory, and search engines may not take it into account, but at the same time all search engines say that having the file is desirable and helps to index the site correctly, especially if pages are created dynamically or the site has a complex nesting structure.

There is only one conclusion: the robots.txt and sitemap.xml files are necessary. Correctly configured indexing is one of the factors that helps place site pages higher in search results, and this is the goal of any more or less serious site.

Hello, dear readers of this blog. I decided to summarize in one article everything that I have already written about the sitemap (Sitemap xml), which is primarily needed to indicate to search engines the pages they should index first. It is a very important and in fact mandatory attribute of any web project, but many either do not know this or do not attach much importance to the sitemap.

Let’s dot all the i’s right away and try to separate two concepts - site maps in xml format and in Html format (there is also a geographical interpretation of this word, which I wrote about in the article “”). The second option is a regular list of all the materials of the web resource, available to any visitor via the corresponding menu item. This option is also useful and helps speed up and improve the indexing of your resource by search engines.

Sitemap in xml format - why it is needed

But the main tool designed to directly tell search engines which pages of the resource need to be indexed is a file called Sitemap.xml (this is its most common name, but in theory it can be called anything, it doesn’t matter), and it will not be visible to visitors of your web project.

It is compiled using a special syntax understandable to search engines, in which all pages to be indexed are listed along with their degree of importance, date of last update and approximate update frequency.

There are two main files that any web project should have - robots.txt and sitemap.xml. If your project does not have them, or they are not filled out correctly, then with a high degree of probability you are seriously harming your resource and not allowing it to reach its full potential.

You may, of course, choose not to listen to me (since I am not an authority, given the relatively small amount of factual material I have accumulated), but I think you will not blindly argue with specialists who have statistics from tens of thousands of projects at hand.

For this occasion, I had a “grand piano in the bushes” in stock. Right before writing this article, I came across a publication by specialists from a well-known automatic promotion system with the unusual name “Hands” (an analogue of MegaIndex, which I wrote about).

It is clear that any such system is interested in the successful promotion of its clients’ projects, but it can only pump up clients’ resources with link mass; unfortunately, it cannot influence the content or the correct technical setup of the websites.

Therefore, a very interesting and revealing study was conducted, designed to identify the 10 most common reasons that make it difficult to promote projects and to present these data directly to clients...

In first place, of course, was non-unique content (either you copied it or it was copied from you, which doesn’t change the essence). But in second place was precisely the sitemap in xml format, or rather its absence or inconsistency with the accepted format. Well, in third place was the previously mentioned robots.txt file (its absence or incorrect creation):

When you assert, without evidence, that your project must have a sitemap (otherwise it’s a waste), it does not sound as convincing as when the statement is supported by real facts from a fairly representative study.

Okay, let’s assume that I’ve convinced you and let’s see how you can create a sitemap yourself (format syntax), how to make it for Joomla and WordPress, and also see how you can create it using special online generators.

But simply creating a sitemap is not enough to be sure that your project will be correctly indexed by search engines. It will also be necessary to make sure that search engines (in our case, Google and Yandex) find out about this very sitemap. This can be done in two ways, but we will talk about this a little later (there must be at least some kind of intrigue that holds the attention of readers).

Why do you need a sitemap and a robots.txt file?

Let's first try to justify the logical necessity of using both a robots.txt file, which prohibits indexing of certain elements of your web project, and a sitemap file, which prescribes the indexing of certain pages. To do this, let’s go back five or ten years, when most resources on the Internet were simply a set of Html files containing the texts of all the articles.

A Google or Yandex search robot simply visited such an Html project and indexed everything it could get its hands on, since almost everything it found was the project’s content. And what is happening now, when CMSs (content management systems) are in general use? Actually, even immediately after installing the engine, a search robot will already find several thousand files on your site, even though you may not have any content yet (you haven’t written a single article).

And in general, content in modern CMSs is, as a rule, stored not in files but in a database, which a search robot naturally cannot index directly (for working with databases, I recommend it for free).

It is clear that after poking around here and there, the search robots of Yandex and Google will still find your content and index it, but how quickly this will happen, and how complete the indexing of your project will be, is a very big question.

It is precisely to simplify and speed up the indexing of projects by search engines, in the context of the widespread use of CMSs, that you should definitely create robots.txt and sitemap.xml. With the first file, you tell search robots which files they should not waste time indexing (engine objects, for example), and you can also use it to block some pages from indexing in order to eliminate duplicate content, which is inherent in many CMSs (read more about this in the article about).

And with the help of a sitemap file, you clearly tell the Yandex and Google robots where exactly your project’s content is located, so that they do not poke around in vain in the corners of the file structure of the engine. Do not forget that bots have certain limits on the time and number of documents crawled: a bot may wander through your engine files and leave, while the content remains unindexed for a long time. That’s how it is.

Remember how in a famous comedy one colorful character said: “Don’t go there, go here, otherwise...”. This is exactly the function that robots.txt and the sitemap with the xml extension perform, regulating the movement of search bots through the nooks and crannies of your web project. It is clear that bots may balk, but most likely they will obediently follow your well-written prohibiting (robots.txt) and prescriptive (sitemap) instructions.

Is that clear? Then let’s proceed directly to the question of how to create sitemap.xml in various ways, and how to inform the two pillars of search in RuNet - Google and Yandex - of its existence, so that they do not fumble around your project in vain (and do not create an additional load on your hosting server into the bargain); this, however, is secondary - the main thing is indexing, fast and comprehensive.

Unlike robots.txt, which you will most likely have to write yourself, a sitemap file in xml format is, as a rule, created automatically in one way or another. This is understandable, because with a large number of pages on a frequently updated project, creating it manually could drive the webmaster out of his mind.

And this is not necessary at all, because for almost every CMS there is an extension that will let you create a sitemap file and re-create it when new materials appear. Or you can always use some online sitemap generator as a ready-made solution.

But still, it seems to me that it would be useful to familiarize yourself with the simple (the simplest, really) syntax used to create a sitemap. Besides, on small and rarely updated projects you can sketch it out manually, as in the example below.
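As a rough sketch (the URLs are placeholders), the bare minimum such a hand-written file needs is the urlset wrapper with one loc entry per page:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://site.ru/</loc>
  </url>
  <url>
    <loc>http://site.ru/about/</loc>
  </url>
</urlset>

Optional lastmod, changefreq and priority elements can be added inside each url entry to pass the date of last update, the update frequency and the relative importance of the page.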

How to create Sitemap.xml yourself in Joomla and WordPress

Usually the Sitemap directive is written at the very end of robots.txt. The next time search robots visit your web project, they will definitely look at the contents of robots.txt and download your map for study. However, in this way all sorts of scoundrels who would gladly use the sitemap to steal your content can also learn of its existence.

But there is another way to pass information about the location of the sitemap directly to search engines, without the mediation of robots.txt: through the Yandex Webmaster interface and the Google tools panel (although the robots.txt method can be used as well). Are you already familiar with these search engine tools?

If not, be sure to add your project to both Yandex Webmaster and the Google panel, and then indicate the path to your sitemap in Xml format in the appropriate tabs.

This is what the form for adding a sitemap for Yandex Webmaster looks like:

And this is what a similar form for entering a path looks like in the Google toolbar:

Online generators: Sitemap Generator and XML Sitemaps

If you don’t want to look for extensions for your CMS that automatically create a sitemap, you can use online generators. However, there is one drawback compared to creating the map automatically in the CMS itself: after adding new materials, you will have to go to the online service again, re-create the file, and then upload it to your server.

Probably one of the most famous online sitemap generators is Sitemap Generator. It has quite a lot of functionality and will allow you to generate a sitemap for 1500 pages for free, which is quite a lot.

Sitemap Generator will take into account the contents of your robots.txt file so that pages prohibited from indexing are not included in the map. This in itself is not critical, because a ban in robots.txt will in any case have a higher priority, but it will save the created sitemap file from unnecessary entries. To make a map, you just need to specify the URL of the home page and provide your e-mail, after which you will be put in a queue for generation:

When it’s your turn, you’ll receive an e-mail notification, and by following the link in the letter you can download the file that Sitemap Generator made for you. All you have to do is upload it to the right place on your server. You will have to repeat this procedure from time to time to keep your sitemap up to date.

There is a similar English-language online generator service, which you can find at this link - XML Sitemaps. It has a limit of 500 pages, but otherwise everything is almost the same as described above.

Good luck to you! See you soon on the pages of this blog.


In Yandex.Webmaster, a Sitemap file is added as follows:

    Select a site from the list.

    In the field, enter the URL where the file is available. For example, https://example.com/sitemap.xml.

    Click the Add button.

After adding the file, it is queued for processing. The robot will download it within two weeks. Each added file, including those attached to the Sitemap index file, is processed by the robot separately.

After downloading, next to each file you will see one of the statuses:

"OK" — The file is formed correctly and has been loaded into the robot's database. The date of the last download is displayed next to the file; indexed pages will appear in search results within two weeks.

"Redirect" — The specified URL redirects to another address. Remove the redirect and notify the robot about the update.

"Error" — The file is not formed correctly. Click the Error link for details; after making changes to the file, notify the robot about the update.

"Not indexed" — When accessing the Sitemap, the server returns an HTTP code other than 200. Check whether the file is accessible to the robot using the Check Server Response tool, specifying the full path to the file. If the file is not available, contact the administrator of the site or server on which it is located.

"Not indexed" — Access to the file is denied in robots.txt by the Disallow directive. Allow access to the Sitemap in robots.txt and notify the robot about the update.

Sitemap update

If you have changed the Sitemap file added to Yandex.Webmaster, you do not need to delete it and upload it again - the robot regularly checks the file for updates and errors.

To speed up crawling a file, click the icon. If you are using a Sitemap index file, you can start processing each file listed in it. The robot will download the data within three days. You can use the function up to 10 times for one host.

Once you have used up all the attempts, the next one will be available 30 days after the first. The exact date is displayed in the Webmaster interface.

Removing Sitemap

In the Yandex.Webmaster interface, you can delete the files that were added on the Sitemap Files page. If a Sitemap directive was added in the robots.txt file, delete it as well. After making the changes, information about the Sitemap will disappear from the robot's and Yandex.Webmaster's databases within a few weeks.

Sitemap, or site map, is a special file (usually with the xml extension) that contains information about all pages existing on the site. With the help of this file you can make it clear to the search engine which pages of the site should be indexed first, how regularly the data on the pages is updated, and how important it is to index individual pages of the site. This greatly simplifies indexing for search robots. A Sitemap file should be present on all sites consisting of fifty pages or more.

How to create a SiteMap file online and add it to your site

Since the sitemap is an xml file, you can create it in text form using any editor and save it with the xml extension. However, doing it by hand is not necessary at all: there are special services on the Internet with which you can generate - automatically create - a sitemap.xml file online for free and add it to any site. You can watch a video about the process of creating a sitemap.xml file in more detail:

[yt=QT21XhPmSSQ]

To create a sitemap automatically, you need to enter the address of the desired site in the appropriate field, select the appropriate file format, determine the order in which the site's pages should be indexed, indicate the page update frequency and set any other parameters that interest you. After all these operations, click the “execute” button, and after a short time the code of the created sitemap will appear in the window below. You just need to copy this code, paste it into the sitemap.xml file you created in the editor, save it, and upload it to the root directory of your site.

But for this file to have the expected effect, it is not enough just to create a Sitemap and add it to your website; you also need to convey information about its availability to the search robot. To do this, write the path to it in the robots.txt file by adding the following line to it:

Sitemap: http://YOUR_SITE.ru/sitemap.xml

After this, all operations are completed and your sitemap is ready to perform its functions. Just remember that one xml file should not contain more than 50,000 pages, and its size should not exceed 10 megabytes. Otherwise, you will need to create another such file.
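If you do have to split the map into several files, the sitemaps.org protocol provides a Sitemap index file that lists the individual parts; here is a sketch (the URLs are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://YOUR_SITE.ru/sitemap-1.xml</loc>
    <lastmod>2016-05-10</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://YOUR_SITE.ru/sitemap-2.xml</loc>
  </sitemap>
</sitemapindex>

The Sitemap line in robots.txt (and the URL submitted to the webmaster panels) can then point to the index file instead of listing every part separately.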