XML Basics for Beginners: XML Language Theory and Practice

Today we will begin looking at XML, a very popular and convenient markup language. Because this data format is flexible and universal, it can be used almost anywhere. A novice programmer will therefore have to deal with it sooner or later, whether in web programming or in database administration: XML is used very widely, and you will end up using it for your own tasks as well.

We will start, as usual, with theory: what kind of language XML is, why it is good, how to use it, and where it is used.

XML Language Definition

XML (eXtensible Markup Language) is a universal, extensible data markup language that does not depend on the operating system or the processing environment. XML is used to present data as a structure, and you can design this structure yourself or adapt it to a particular program or service. That is why the language is called extensible, and this is its main advantage.

There are quite a lot of markup languages, HTML being one example, but most of them depend on a specific processor. HTML, whose code is parsed by the browser, is standardized and not extensible: it has a fixed set of tags and a syntax that cannot be violated. In XML you can create your own tags, i.e. your own markup. The main difference is that HTML describes markup for displaying data, while XML describes an abstract data structure that can be processed and displayed however and wherever you wish. For this reason there is little point in comparing the two languages: they have completely different purposes.

As noted above, XML is a very common and universal language. Many applications, both web and desktop, use it to exchange information, because it makes it easy to pass data between applications or services, even ones written in different languages. For this reason, every novice programmer, whatever kind of programming they do, should have an understanding of XML. If you want to become a web master, you simply must know XML, and we have already discussed how to become a WEB Master and what you need to know for this.

For example, I once had the task of writing a service that had to return data in XML form on request, i.e. the server part of an application. I had no idea what language the client that would process this data was written in. I wrote a service that returned the data as XML, and that was all that was needed: the application worked perfectly. This is just one example that I had to deal with, but imagine how many different organizations develop software jointly and exchange data; I would not be surprised if much of this data is in XML form.

I also once had to store XML data in an MS SQL 2008 database in order to represent this data more conveniently and exchange it between the server and client parts of an application; we discussed this in the article - Transact-sql - working with xml.

The XML language itself is very simple, and it is hard to get confused by it. The complexity arises in processing XML and in its interaction with other applications and technologies, i.e. in everything that surrounds XML; that is where it is easy to get lost.

Today we are talking only about the basics of XML. We will not focus on the technologies for processing and interacting with this language, since that is a very large topic, but I think we will get acquainted with related technologies in the future.

Let's move on to practice. I will write all the examples in Notepad++ simply because it is very convenient; we will not go into this now, since we have already discussed it in the article - What is Notepad++ good for a novice developer.

XML tags

XML uses tags (tag names are case sensitive), but these are not a fixed set of tags as in HTML: you come up with them yourself. An XML document nevertheless has a clear structure: there is an opening tag and a closing tag, tags can be nested, and the tags contain values. For a basic knowledge of XML, all you need is to follow these rules. Together, the opening tag, the closing tag and the value are called an element, and the entire XML document consists of elements that together form a data structure. An XML document can have only one root element; remember this, because writing two root elements is an error.
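The rules above (matching opening and closing tags, case sensitivity, a single root element) can be checked mechanically. The following is a minimal sketch, assuming Python and its standard-library `xml.etree.ElementTree` parser; the helper name `is_well_formed` and the sample documents are made up for illustration and are not part of the original article.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as one XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per rule.
ok = "<library><book>Title</book></library>"
two_roots = "<library></library><library></library>"  # second root element
case_mismatch = "<Book>Title</book>"                   # Book and book differ
unclosed = "<library><book>Title</library>"            # book is never closed

for sample in (ok, two_roots, case_mismatch, unclosed):
    print(is_well_formed(sample), sample)
```

Only the first sample is accepted; each of the others breaks exactly one of the rules listed above.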

Now it is time for an example of XML markup. The first example only illustrates the syntax:

<Start of element> <Start of nested element>Nested element value</End of nested element> </End of element>

As you can see, everything is quite simple, and there can be a lot of such elements nested within each other.

Now let's give an example of a real xml document:

As you can see, this is an example of a simple book catalog, but I did not declare the document, i.e. I did not write the XML declaration that tells the processing application that this is XML data and which encoding it uses. You can also write comments and attributes, so let's give an example of such a document:

[XML listing: markup not preserved. Text content of the listing:] Book 1 Ivan Just book 1 Book 2 Sergey Just book 2 Book 3 Novel Just book 3

Where the first line is the declaration that this is an XML document and must be read in UTF-8 encoding.

Without any processing, this data is displayed in a browser (for example, Mozilla Firefox) as a collapsible tree of elements (the screenshot is not preserved).

I hope it is clear that catalog is the root element here. It consists of book elements, each of which in turn consists of the name, author and comment elements. For the sake of the example, I also set several attributes on the catalog element and on the book elements.
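Since the markup of the catalog listing did not survive, here is a hedged reconstruction that can be run, assuming Python's standard-library `xml.etree.ElementTree`. The element names (catalog, book, name, author, comment) and the text values come from the description above; the attribute names `title` and `id` and their values are assumptions made only for illustration, because the original attributes are unknown.

```python
import xml.etree.ElementTree as ET

# Reconstructed catalog. The attributes "title" and "id" are hypothetical;
# the original article's attributes were lost.
catalog_xml = """<?xml version="1.0" encoding="UTF-8"?>
<!-- A simple book catalog -->
<catalog title="Demo catalog">
    <book id="1">
        <name>Book 1</name>
        <author>Ivan</author>
        <comment>Just book 1</comment>
    </book>
    <book id="2">
        <name>Book 2</name>
        <author>Sergey</author>
        <comment>Just book 2</comment>
    </book>
    <book id="3">
        <name>Book 3</name>
        <author>Novel</author>
        <comment>Just book 3</comment>
    </book>
</catalog>
"""

# Parse from bytes so that the encoding named in the declaration is honoured.
root = ET.fromstring(catalog_xml.encode("utf-8"))

print(root.tag, root.attrib)
for book in root.findall("book"):
    print(book.get("id"), book.findtext("name"),
          book.findtext("author"), book.findtext("comment"))
```

The first line of the string is the XML declaration; the comment and the attributes are ordinary parts of the document that the parser either skips (comment) or exposes on the element (attributes).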

For the basics, I think that's enough, because if we dive deeper and deeper into XML, and into all the technologies that are associated with this language, then this article will never end. So that's all for today. Bye!

XML is a very popular and flexible format nowadays. Every programmer should understand it; it is simply a must-have. Many technologies, including modern ones, actively use it.

Introduction

Hello, dear readers. I want to say right away that this is only the first article in a series of three. The main goal of the series is to introduce each reader to XML and to give, if not a complete explanation and understanding, then at least a good push towards it by explaining the main points. The entire series is entered in one nomination, “Attention to detail”, and it is split into three articles in order to fit the character limit for posts and to divide a large amount of material into smaller portions that are easier to digest. The first article is devoted to XML itself and to one of the ways of creating a schema for XML files, the DTD. To begin with, a short preface for those who are not yet familiar with XML: there is no need to be scared. XML is not very complicated, and any programmer should understand it, since it is a flexible, efficient and popular file format for storing whatever information you want. XML is used in Ant, Maven and Spring. Now that you have gathered your strength and motivation, let's start studying. I will try to lay out the material as simply as possible, keeping only the most important points and not going into the weeds.

XML

For a clearer explanation, it is best to look at XML through an example.

    <?xml version="1.0" encoding="UTF-8"?>
    <company>
        <name>IT Heaven</name>
        <offices>
            <office floor="1" room="1">
                <employees>
                    <employee>
                        <name>Maxim</name>
                        <job>Middle Software Developer</job>
                    </employee>
                    <employee>
                        <name>Ivan</name>
                        <job>Junior Software Developer</job>
                    </employee>
                    <employee>
                        <name>Franklin</name>
                        <job>Junior Software Developer</job>
                    </employee>
                </employees>
            </office>
            <office floor="1" room="2">
                <employees>
                    <employee>
                        <name>Herald</name>
                        <job>Middle Software Developer</job>
                    </employee>
                    <employee>
                        <name>Adam</name>
                        <job>Middle Software Developer</job>
                    </employee>
                    <employee>
                        <name>Leroy</name>
                        <job>Junior Software Developer</job>
                    </employee>
                </employees>
            </office>
        </offices>
    </company>

HTML and XML are similar in syntax because they have a common parent, SGML. However, HTML has only the fixed tags of a specific standard, while in XML you can create your own tags and attributes and, in general, store data in whatever way suits you. With well-chosen names, an XML file like this one can be read by anyone who knows English.

This example can be depicted as a tree. The root of the tree is company. It is the root element, from which all other elements descend. Each XML file can have only one root element. It must come after the XML declaration (the first line in the example) and contain all other elements.

A little about the declaration: in XML 1.0 it is optional but recommended (XML 1.1 requires it), and it identifies the document as XML. It has three pseudo-attributes (special predefined attributes): version (1.0 in the example), encoding, and standalone (if it is yes while the document relies on declarations from an external schema, that is an error; the default is no).

Elements are entities that store data using other elements and attributes.
Attributes are additional information about an element, specified when the element is added. To put the explanation in OOP terms: we have a car; each car has characteristics (color, capacity, brand, etc.), which are attributes, and there are entities inside the car (doors, windows, engine, steering wheel), which are other elements. You can store properties either as individual elements or as attributes, as you prefer. After all, XML is an extremely flexible format for storing information about anything.

After these explanations, we just need to look at the example above for everything to fall into place. In the example we described a simple company structure: there is a company that has a name and offices, and the offices have employees. The employees and offices elements are wrapper elements: they collect elements of the same type, essentially combining them into one set for ease of processing. The floor and room attributes deserve special attention. They are attributes of the office (its floor and room number), in other words, its properties. If we had an "image" element, we could pass its dimensions the same way. You may notice that company does not have a name attribute but does have a name element. You can describe structures the way you want. Nobody obliges you to write all the properties of elements as attributes; you can use just elements and write the data inside them. Conversely, we can record the name and position of our employees as attributes:

    <?xml version="1.0" encoding="UTF-8"?>
    <company>
        <name>IT Heaven</name>
        <offices>
            <office floor="1" room="1">
                <employees>
                    <employee name="Maksim" job="Middle Software Developer"></employee>
                    <employee name="Ivan" job="Junior Software Developer"></employee>
                    <employee name="Franklin" job="Junior Software Developer"></employee>
                </employees>
            </office>
            <office floor="1" room="2">
                <employees>
                    <employee name="Herald" job="Middle Software Developer"></employee>
                    <employee name="Adam" job="Middle Software Developer"></employee>
                    <employee name="Leroy" job="Junior Software Developer"></employee>
                </employees>
            </office>
        </offices>
    </company>

As you can see, the name and position of each employee are now attributes. You can also notice that there is nothing inside the employee element: all employee elements are empty. In that case you can write employee as an empty element, closing it immediately after declaring the attributes. This is done by adding a slash before the closing angle bracket:

    <?xml version="1.0" encoding="UTF-8"?>
    <company>
        <name>IT Heaven</name>
        <offices>
            <office floor="1" room="1">
                <employees>
                    <employee name="Maksim" job="Middle Software Developer" />
                    <employee name="Ivan" job="Junior Software Developer" />
                    <employee name="Franklin" job="Junior Software Developer" />
                </employees>
            </office>
            <office floor="1" room="2">
                <employees>
                    <employee name="Herald" job="Middle Software Developer" />
                    <employee name="Adam" job="Middle Software Developer" />
                    <employee name="Leroy" job="Junior Software Developer" />
                </employees>
            </office>
        </offices>
    </company>

As you can see, by closing the empty elements we preserved all of the information and shortened the document considerably, making it more concise and readable.

To add a comment (text that is skipped when the file is parsed), XML has the following syntax:

    <!-- Ivan recently resigned; he only has to work out one more week. Don't forget to remove him from the list afterwards. -->

The last construct is CDATA, which stands for "character data". It lets you write text that will not be interpreted as XML markup. This is useful if an element inside the XML file stores XML markup as its content. Example:

    <?xml version="1.0" encoding="UTF-8"?>
    <bean>
        <information>
            <![CDATA[<name>Ivan</name><age>26</age>]]>
        </information>
    </bean>

The thing about XML is that you can extend it however you want: use your own elements and your own attributes, and structure it as you wish. You can use both attributes and elements to store data, as the earlier examples showed. However, although you can invent elements and attributes on the fly, consider a project where another programmer decides to move the name element into attributes while your entire program logic assumes that name is an element. How can you define your own rules about which elements exist, which attributes they have, and so on, so that you can validate XML files and be sure that the rules become a standard in your project that no one violates? There are special tools for writing down the rules of your own XML markup. The best known are DTD and XML Schema. This article covers only the first.
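To see how the choice between elements and attributes, and the CDATA construct, look from the side of a program reading the file, here is a small sketch assuming Python's standard-library `xml.etree.ElementTree`. The fragments are shortened versions of the examples above; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The same employee, once with child elements and once with attributes.
as_elements = ("<employee><name>Ivan</name>"
               "<job>Junior Software Developer</job></employee>")
as_attributes = '<employee name="Ivan" job="Junior Software Developer"/>'

e1 = ET.fromstring(as_elements)
e2 = ET.fromstring(as_attributes)

# Child elements are read with find/findtext, attributes with get.
print(e1.findtext("name"), "|", e1.findtext("job"))
print(e2.get("name"), "|", e2.get("job"))

# The empty element has no children and no text.
print(len(e2), e2.text)  # 0 None

# Inside CDATA the angle brackets are plain characters, not markup.
cdata_doc = ("<bean><information>"
             "<![CDATA[<name>Ivan</name><age>26</age>]]>"
             "</information></bean>")
info = ET.fromstring(cdata_doc).find("information")
print(info.text)  # <name>Ivan</name><age>26</age>
print(len(info))  # 0, because no child elements were created
```

Both employee variants carry the same information; the difference is only in how the consuming code has to ask for it, which is why a project usually fixes one convention with a schema.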

DTD

A DTD (Document Type Definition) describes types of documents. DTD is already becoming obsolete and is being actively abandoned, but many XML files still use it and, in general, it is useful to understand. A DTD is a technology for validating XML documents. It declares specific rules for a document type: its elements, which elements may appear inside an element, its attributes and whether they are required, how many times elements may repeat, and also entities. As with XML, a DTD is easiest to explain with an example.

    <!-- Declaration of possible elements -->
    <!ELEMENT employee EMPTY>
    <!ELEMENT employees (employee+)>
    <!ELEMENT office (employees)>
    <!ELEMENT offices (office+)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT company (name, offices)>

    <!-- Adding attributes for the employee and office elements -->
    <!ATTLIST employee
        name CDATA #REQUIRED
        job CDATA #REQUIRED
    >
    <!ATTLIST office
        floor CDATA #REQUIRED
        room CDATA #REQUIRED
    >

    <!-- Adding entities -->
    <!ENTITY M "Maksim">
    <!ENTITY I "Ivan">
    <!ENTITY F "Franklin">

In this simple example we declared the entire hierarchy from the XML example: employee, employees, office, offices, name, company. DTD files are built from three main constructs, which are enough to describe any XML file: ELEMENT (to describe elements), ATTLIST (to describe the attributes of elements) and ENTITY (to substitute text for abbreviated forms).

ELEMENT

Used to describe an element. The elements that may appear inside the described element are listed in parentheses. You can use quantifiers to indicate quantity (they are similar to quantifiers in regular expressions):

    + means 1 or more
    * means 0 or more
    ? means 0 or 1

If no quantifier is given, exactly one element is expected. If we needed one element out of a group, we could write it like this:

    <!ELEMENT company ((name | offices))>

Then only one of the elements could be used, name or offices; if both were present inside company, validation would fail. You can also notice the word EMPTY in employee: it means that the element must be empty. There is also ANY, which allows any elements. #PCDATA means text data.

ATTLIST

Used to add attributes to elements. ATTLIST is followed by the name of the element and then by a list of "attribute name - attribute type" pairs; each pair can end with #IMPLIED (optional) or #REQUIRED (required). CDATA means text data. There are other types, but they are all string-based.

ENTITY

ENTITY is used to declare abbreviations and the text that will be substituted for them. In XML we can then write, instead of the full text, just the name of the entity with & before it and ; after it. For example, to distinguish markup from plain characters, the left angle bracket is often escaped as &lt;. The parser then treats it not as markup but simply as the character <.

As you can see, everything is quite simple: you declare elements, specify which elements the declared elements can contain, add attributes to these elements and, if you wish, add entities to shorten some text. And here you should be asking: how do we use our rules in our XML file? After all, we have only declared the rules; we have not used them in XML yet. There are two ways to use them in XML:

1. Embedding: write the DTD rules inside the XML file itself. After the DOCTYPE keyword, write the name of the root element and enclose the DTD in square brackets.

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE company [
        <!-- Declaration of possible elements -->
        <!ELEMENT employee EMPTY>
        <!ELEMENT employees (employee+)>
        <!ELEMENT office (employees)>
        <!ELEMENT offices (office+)>
        <!ELEMENT name (#PCDATA)>
        <!ELEMENT company (name, offices)>

        <!-- Adding attributes for the employee and office elements -->
        <!ATTLIST employee
            name CDATA #REQUIRED
            job CDATA #REQUIRED
        >
        <!ATTLIST office
            floor CDATA #REQUIRED
            room CDATA #REQUIRED
        >

        <!-- Adding entities -->
        <!ENTITY M "Maksim">
        <!ENTITY I "Ivan">
        <!ENTITY F "Franklin">
    ]>
    <company>
        <name>IT Heaven</name>
        <!-- Ivan recently resigned; he only has to work out one more week. Don't forget to remove him from the list afterwards. -->
        <offices>
            <office floor="1" room="1">
                <employees>
                    <employee name="&M;" job="Middle Software Developer" />
                    <employee name="&I;" job="Junior Software Developer" />
                    <employee name="&F;" job="Junior Software Developer" />
                </employees>
            </office>
            <office floor="1" room="2">
                <employees>
                    <employee name="Herald" job="Middle Software Developer" />
                    <employee name="Adam" job="Middle Software Developer" />
                    <employee name="Leroy" job="Junior Software Developer" />
                </employees>
            </office>
        </offices>
    </company>

2. Import: write all the rules in a separate DTD file, then use the DOCTYPE construct from the first method in the XML file, but instead of the square brackets write SYSTEM followed by an absolute path to the DTD file or a path relative to the current file.

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE company SYSTEM "dtd_example1.dtd">
    <company>
        <name>IT Heaven</name>
        <!-- Ivan recently resigned; he only has to work out one more week. Don't forget to remove him from the list afterwards. -->
        <offices>
            <office floor="1" room="1">
                <employees>
                    <employee name="&M;" job="Middle Software Developer" />
                    <employee name="&I;" job="Junior Software Developer" />
                    <employee name="&F;" job="Junior Software Developer" />
                </employees>
            </office>
            <office floor="1" room="2">
                <employees>
                    <employee name="Herald" job="Middle Software Developer" />
                    <employee name="Adam" job="Middle Software Developer" />
                    <employee name="Leroy" job="Junior Software Developer" />
                </employees>
            </office>
        </offices>
    </company>

You can also use the PUBLIC keyword instead of SYSTEM, but it is unlikely to be useful to you. If you are interested, both keywords are worth reading about in more detail.

Now we cannot use other elements without declaring them in the DTD, and the whole XML document is subject to our rules. You can try writing this code in IntelliJ IDEA in a separate file with the .xml extension, then add some new elements or remove an element from the DTD, and see how the IDE reports an error.

However, DTD has its disadvantages:
  • It has its own syntax, different from xml syntax.
  • A DTD has no data type checking and can only contain strings.
  • There is no namespace in a DTD.
About the problem of a separate syntax: you must understand two syntaxes at once, XML and DTD. They are different, and this can be confusing. Because of this, it is also harder to track down errors in huge XML files used together with DTD schemas. If something does not work, you have to check a huge amount of text written in different syntaxes. It is like reading two books at the same time, one in Russian and one in English: if your knowledge of one language is worse, understanding the text will be just as difficult.

About the problem of data type checking: attributes in DTDs do have different types, but at their core they are all string representations of something, lists or links. You cannot require that a value be a number, let alone a positive or negative one. And you can forget about object types completely.

The last problem will be discussed in the next article, which is devoted to namespaces and XML schemas, since discussing it here is pointless.

Thank you all for your attention. I have done a lot of work and continue to do it in order to finish the entire series of articles on time. Basically, I just have to figure out XML schemas and come up with a clearer explanation of them to finish the second article. Half of it is already done, so you can expect it soon. The last article will be entirely devoted to working with XML files using Java. Good luck to everyone and success in programming :)
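Returning to the ENTITY construct and the embedded DTD shown earlier, here is a short sketch of how a parser expands entities. It assumes Python's standard-library `xml.etree.ElementTree`, which is a non-validating parser: it reads the internal DTD subset and substitutes declared entities, but it does not enforce the ELEMENT and ATTLIST rules. The shortened DTD and the undeclared element `intern` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# A shortened version of the embedded-DTD example.
doc = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE company [
  <!ELEMENT company (name, employee+)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT employee EMPTY>
  <!ATTLIST employee name CDATA #REQUIRED job CDATA #REQUIRED>
  <!ENTITY M "Maksim">
]>
<company>
  <name>IT Heaven</name>
  <employee name="&M;" job="Middle Software Developer"/>
</company>
"""

root = ET.fromstring(doc.encode("utf-8"))
# The entity reference &M; has been replaced by its declared text.
print(root.find("employee").get("name"))  # Maksim

# This parser does not validate: an element that the DTD never declares
# (the hypothetical <intern/>) is still accepted.
extra = doc.replace("</company>", "<intern/></company>")
root2 = ET.fromstring(extra.encode("utf-8"))
print(root2.find("intern") is not None)  # True
```

To actually reject the second document, a validating parser is needed, such as the one built into an IDE like IntelliJ IDEA mentioned above.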

Today, it has become obvious to all specialists in the field of web technologies that existing standards for data transmission over the Internet are not enough. The HTML format, having once become a breakthrough in the field of displaying the content of Internet sites, no longer satisfies all the currently necessary requirements. It allows you to describe how data should be displayed on the end user's screen, but does not provide any means to effectively describe and manage the data being transmitted.

In addition, a stumbling block for many software development companies is the need to share different components, ensure their interaction, and the ability to exchange data between them.

Until recently, there was no standard that provided tools for intelligent information retrieval, data exchange, and adaptive processing of received data.

The solution to all the problems described above was the XML language, approved in 1998 by the international organization W3C. XML (eXtensible Markup Language) is an extensible markup language designed to describe structured data in text form. This text-based format, which resembles HTML in many ways, is designed specifically for storing and transmitting data.

XML allows you to describe and transmit structured data such as:

  • separate documents;
  • metadata describing the content of an Internet site;
  • objects that contain data and methods for working with it (for example, ActiveX controls or Java objects);
  • individual records (for example, the results of executing database queries);
  • all kinds of web links to information and human resources on the Internet (email addresses, hypertext links, etc.).

Creating XML Documents

Data described in XML is called an XML document. XML is easy to read and simple enough to understand. If you are familiar with HTML, learning how to compose XML documents will not be difficult.

The source text of an XML document consists of a set of XML elements, each of which has a start tag and an end tag. Each pair of tags represents a piece of data. That is, like HTML, XML uses tags to describe data. But unlike HTML, XML allows an unlimited set of tag pairs, each of which describes not what the data it contains should look like, but what it means.

[XML listing: markup not preserved. Text content of the listing:] Good morning NEWS TV series Gentle Poison Field of Miracles (repeat) M. f. Health NEWS Enjoy Your Bath! M. f. Together NEWS Finest hour NEWS Weather Good night kids TIME Sight

This text can be created in plain text format and saved in a file with an XML extension.

Any element of an XML document can have attributes that specify its characteristics. An attribute is a name="value" pair written in the start tag of the element. In the example above, one element has a date="December 25" attribute, and another (the one describing the channel) has a name="ORT" attribute.

The principle of extensibility of the XML language is the ability to use an unlimited number of tag pairs defined by the creator of the XML document. For example, the above description of the TV program schedule can be expanded to include information about the broadcast region and the program schedule of the RTR channel. In this case, the XML description will take the form:

[XML listing: markup not preserved. Text content of the listing:] Russia Saint Petersburg Good morning NEWS TV series Gentle Poison Field of Miracles (repeat) M. f. Health NEWS Enjoy Your Bath! M. f. Together NEWS Finest hour NEWS Weather Good night kids TIME Sight M. f. Weather RTR Mail Good morning Country! My own director Purple Haze GOLDEN KEY Federation Secret agents Boyarsky Dvor My family Full house NEWS ASTEROID (USA) DINNER AT FRED'S (USA) Weather

Now the program schedule of the ORT and RTR channels for December 25 in the city of St. Petersburg, Russia, can be extracted from this XML description.

The principle of keeping the definition of a document's internal structure independent of the way this information is presented means separating data from its processing and display. The data obtained can thus be used according to the client's needs: the client can select the desired design and apply the necessary processing methods.

You can control the display of elements in the client program window (for example, in a browser window) using special instructions: XSL (eXtensible Stylesheet Language) style sheets. XSL style sheets allow you to define the appearance of an element depending on its location within the document, meaning that two elements with the same name can have different formatting rules applied. Additionally, the underlying language of XSL is XML, which makes XSL style sheets more versatile, and DTDs or data schemas, discussed below, can be used to check the correctness of such style sheets.

The XML format, compared to HTML, has a small set of simple parsing rules that allows you to parse XML documents without resorting to any external descriptions of the XML elements used. In general, XML documents must satisfy the following requirements:

  • Each opening tag that defines some part of the data in the document must be accompanied by a closing tag, that is, unlike HTML, closing tags cannot be omitted.
  • The nesting of tags in XML is strictly controlled, so it is necessary to monitor the order of opening and closing tags.
  • XML is case sensitive.
  • All information between the start and end tags is treated as data in XML, and therefore all formatting characters are taken into account (that is, spaces, newlines, tabs are not ignored, as in HTML).
  • XML has a set of reserved characters that must be written in a specific way in an XML document. These characters and the entity references that represent them are:
    < &lt;
    & &amp;
    > &gt;
    " &quot;
    ' &apos;
  • Every XML document must have exactly one root element. In our example, this is the outermost element, which encloses the whole TV program.
  • All attribute values used in tag definitions must be enclosed in quotation marks.

If an XML document does not violate the above rules, it is called well-formed (formally correct).

Today, there are two ways to control the correctness of an XML document: DTD (Document Type Definition) and data schemas. If an XML document is well-formed and also conforms to the DTD or schema it refers to, it is called valid.

A schema is a way of creating rules for constructing XML documents, that is, of specifying the valid names, types, attributes and relationships of elements in an XML document. Schemas are an alternative to DTDs for this purpose. Compared to DTD descriptions, schemas have more powerful tools for defining complex data structures, provide a clearer way to describe the grammar of a language, and can be easily modernized and extended. An undoubted advantage of schemas is that they allow you to describe the rules for an XML document using XML itself. From this point of view, XML can be called self-describing.
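The rule about reserved characters is the one most often broken when XML is assembled by hand from arbitrary strings. A minimal sketch, assuming Python and its standard-library helpers `xml.sax.saxutils.escape` and `quoteattr`; the element name `note`, the attribute `title` and the sample strings are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw_text = 'if a < b && c > d then say "hi"'
raw_title = 'Tom & "Jerry"'

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
print(escape(raw_text))
# if a &lt; b &amp;&amp; c &gt; d then say "hi"

# Quotes can be escaped too by passing extra replacements.
print(escape(raw_title, {'"': "&quot;", "'": "&apos;"}))
# Tom &amp; &quot;Jerry&quot;

# quoteattr() escapes the value and adds suitable surrounding quotes.
doc = f"<note title={quoteattr(raw_title)}>{escape(raw_text)}</note>"

# After parsing, the original characters come back unchanged.
note = ET.fromstring(doc)
print(note.text == raw_text, note.get("title") == raw_title)  # True True
```

Without the escaping step, the same strings would produce a document that is not well-formed, because the parser would read < and & as the start of markup.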

Because XML elements used in the same document may come from different XML schemas, element naming conflicts may occur. Namespaces solve this problem. Namespaces allow you to distinguish between elements that have the same name but different meanings. However, they do not define how such elements are processed; that is done by the XML parsers discussed below.

To understand more clearly the purpose and possibilities of XML schemas, consider a schema for the TV program example discussed above (the schema listing itself is not preserved).

This XML schema must be saved in the TV-ProgramSchema.xml file. The root element of this XML file is the Schema element, whose attributes are the name of the schema, TV-ProgramSchema, and a reference to the namespace that defines the built-in data types used in this schema: xmlns="urn:schemas-microsoft-com:xml-data". The minOccurs and maxOccurs attributes of the elements of this schema set, respectively, the minimum and maximum possible number of such elements in a document. For example, an element declaration for item with minOccurs set to 0 and no upper limit in maxOccurs means that the number of elements of the item type (that is, the TV shows themselves) can be from 0 to infinity.
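As an illustration of how namespaces keep identically named elements apart, here is a small sketch assuming Python's standard-library `xml.etree.ElementTree`. The document, the prefixes `f` and `h`, and the namespace URIs `urn:example:furniture` and `urn:example:html` are invented for the example.

```python
import xml.etree.ElementTree as ET

# Two different "table" elements in one document, told apart by namespace.
doc = """<order xmlns:f="urn:example:furniture" xmlns:h="urn:example:html">
  <f:table><f:legs>4</f:legs></f:table>
  <h:table><h:tr><h:td>cell</h:td></h:tr></h:table>
</order>"""

root = ET.fromstring(doc)

# The parser reports each tag together with its namespace URI.
for child in root:
    print(child.tag)
# {urn:example:furniture}table
# {urn:example:html}table

# Searching requires saying which "table" is meant.
ns = {"f": "urn:example:furniture", "h": "urn:example:html"}
furniture = root.find("f:table", ns)
markup = root.find("h:table", ns)
print(furniture.findtext("f:legs", namespaces=ns))  # 4
print(markup.findtext("h:tr/h:td", namespaces=ns))  # cell
```

The prefixes are only local shorthand; what distinguishes the two table elements is the URI each prefix is bound to.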

If you use the above scheme to control the correctness of the XML description of a TV program, then you must indicate the scheme used in the header of the XML document. Then the XML description of the ORT channel TV program will look like this:

Russia Saint Petersburg Good morning NEWS TV series Gentle Poison Field of Miracles (repeat) M. f. Health NEWS Enjoy Your Bath! M. f. Together NEWS Finest hour NEWS Weather GOOG night kids TIME Sight

Now the root element This XML description has an attribute xmlns="x-schema:TV-ProgramSchema.xml", which is a link to the XML schema used.

Parsing XML Documents

Extracting data from an XML document and checking the document for correctness are both done by XML parsers (analyzers). If an XML document is formally correct (well-formed), any parser designed for XML documents will be able to work with it correctly.
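Both jobs of a parser can be sketched in a few lines of Python with the standard xml.etree.ElementTree module. The helper name parse_or_report and the sample documents are hypothetical; any conforming parser would behave in a comparable way.

```python
import xml.etree.ElementTree as ET

def parse_or_report(text):
    """Return (root, None) for a well-formed document, else (None, message)."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as err:
        return None, str(err)

good = "<program><item>NEWS</item><item>Weather</item></program>"
bad = "<program><item>NEWS</program>"  # <item> is never closed

# Job 1: extract data from a correct document.
root, error = parse_or_report(good)
titles = [item.text for item in root.findall("item")]

# Job 2: detect that a document is not formally correct.
broken_root, broken_error = parse_or_report(bad)
print(titles, broken_error)
```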

Since the use of DTD in XML is not mandatory, any formally correct document can be recognized and parsed by a program designed to parse XML documents. For example, any XML description given in this document is formally correct, so any XML parser will recognize it correctly.

If the input to the XML parser is an XML document that uses an XML schema, the document is parsed and checked both for formal correctness and for compliance with the schema. For example, an XML description of a TV program on the RTR channel that uses the TV-ProgramSchema.xml schema will be considered formally correct and valid.

If the constructs in a document are syntactically correct, the XML parser extracts the document elements they define and passes them to the application program, which performs the necessary actions, such as display. In most cases, after parsing, the application receives an object model that represents the contents of the XML document, together with the tools needed to work with it (for example, to traverse the tree of elements).
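As an illustration of traversing the tree of elements, the sketch below walks a small document recursively. The element names, the attributes and the times are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

# Hypothetical TV program fragment.
DOC = """<program channel="ORT">
  <day name="Monday">
    <item time="09:00">NEWS</item>
    <item time="21:00">TIME</item>
  </day>
</program>"""

def walk(element, depth=0):
    """Yield (depth, tag, text) for the element and all its descendants."""
    text = (element.text or "").strip()
    yield depth, element.tag, text
    for child in element:
        yield from walk(child, depth + 1)

tree = list(walk(ET.fromstring(DOC)))
for depth, tag, text in tree:
    print("  " * depth + tag, text)
```

The application never touches the raw text of the document; it works only with the element objects the parser hands over.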

Since XML, unlike HTML, does not define how the document elements it describes are displayed or used, the XML parser, or the application built on top of it, is free to choose the desired presentation.

As mentioned, you can use XSL style sheets to define the appearance of XML elements. XML documents are processed with style sheets as follows: when parsing an XSL document, the parser processes the instructions of this language and assigns to each element found in the XML tree a set of tags that determine its formatting. In other words, an XSL style sheet specifies a formatting template for XML elements, and this template can itself have the structure of the corresponding fragment of the XML document. XSL instructions identify the exact location of an XML element in the tree, so the same elements can be styled differently depending on the context in which they are used.

Some parsers represent the document structure according to the Document Object Model (DOM) specification, which lets programs work with XML documents as a strict hierarchy of objects.
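Python's standard library includes a small DOM implementation, xml.dom.minidom, which is enough to sketch the idea. The document content here is invented for the example.

```python
from xml.dom.minidom import parseString

dom = parseString("<program><item>NEWS</item></program>")

# Every part of the document is a node object in a strict hierarchy.
root = dom.documentElement
first_item = root.getElementsByTagName("item")[0]
first_text = first_item.firstChild.data

# The same interface is used to build new content.
new_item = dom.createElement("item")
new_item.appendChild(dom.createTextNode("Weather"))
root.appendChild(new_item)

serialized = root.toxml()
print(serialized)
```

Because the DOM interface is standardized, code written this way looks much the same in other languages that offer a DOM parser.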

An example of an XML parser is MSXML, the parser built into Microsoft Internet Explorer 5.0. It can read data from an XML file, process it, build a tree of elements, display the data using XSL style sheets and, through the DOM, represent all data elements as objects.

Using XML

Many experts view XML as a new technology for integrating software components. The main benefits of using XML are:

  • Integration of data from various sources. XML can be used to combine heterogeneous structured data at the middle tier of three-tier web systems and databases.
  • Local data processing. The received data in XML format can be parsed, processed and displayed directly on the client without additional calls to the server.
  • Viewing and manipulating data from various perspectives. The received data can be processed and viewed by the client in various ways depending on the needs of the end user.
  • Possibility of partial data update. With XML, you can update only the portion of structured data that has changed, rather than the entire structure.
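The last point, partial data update, can be sketched as follows. The id attribute, the patch format and the apply_patch helper are assumptions for illustration; a real system would define its own update protocol.

```python
import xml.etree.ElementTree as ET

# Data already held by the client (hypothetical structure).
local = ET.fromstring(
    '<program>'
    '<item id="1">NEWS</item>'
    '<item id="2">Weather</item>'
    '</program>'
)

# A small fragment carrying only the entry that changed.
patch = ET.fromstring('<item id="2">Weather (updated)</item>')

def apply_patch(root, changed):
    """Replace the text of the item whose id matches the patch; return True if found."""
    for item in root.iter("item"):
        if item.get("id") == changed.get("id"):
            item.text = changed.text
            return True
    return False

applied = apply_patch(local, patch)
result = ET.tostring(local, encoding="unicode")
print(result)
```

Only the changed item travels over the network; the rest of the structure stays on the client untouched.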

All these advantages make XML an indispensable tool for developing flexible database retrieval tools, powerful three-tier web applications, and transaction-enabled applications. In other words, using XML, you can form queries against databases of various structures, which allows you to search for information in numerous databases that are incompatible with each other. The use of XML at the middle tier of three-tier web applications allows for efficient data exchange between clients and servers of e-commerce systems.

In addition, XML can be used as a means to describe the grammar of other languages and control the correctness of documents.

Tools for processing data received in XML format can be developed in Visual Basic, Java or C++.

Lucinda Dykes, Ed Tittel

XML is a markup language for describing structured data, which can also be used to build web pages. Before you start using XML, learn the difference between a valid and a well-formed document, how to write DTD (Document Type Definition) element declarations, and the basic schema declarations used to build an XML Schema document. You'll also want to know the commonly used reserved characters and which web browsers best support XML and style sheets.

Valid vs. Well Formed XML Document

In XML, a valid document must follow the rules in its DTD (Document Type Definition) or schema, which defines what elements can appear in the document and how elements can nest within each other. A document that is not well-formed doesn't get very far in the XML world, so you need to play by some very simple rules when creating an XML document. A well-formed document must satisfy the following rules:

    All start and end tags match. In other words, the opening and closing parts must always contain the same name in the same case: <tag>...</tag> or <TAG>...</TAG>, but not <tag>...</TAG>.

    Empty elements follow special XML syntax, for example <empty_element/>.

    All attribute values occur within single or double quotes: <element id="value"> or <element id='value'>.
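These rules are easy to check with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the sample tag names are made up, and is_well_formed is a hypothetical helper, not a library function.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "matching tags": "<tag>text</tag>",
    "case mismatch": "<tag>text</TAG>",
    "empty element": "<doc><empty_element/></doc>",
    "quoted attribute (double)": '<element id="value"/>',
    "quoted attribute (single)": "<element id='value'/>",
    "unquoted attribute": "<element id=value/>",
}

results = {name: is_well_formed(xml) for name, xml in samples.items()}
for name, ok in results.items():
    print(name, "->", "well-formed" if ok else "rejected")
```

Note that this checks well-formedness only. Validity against a DTD or schema needs a validating parser, which Python's standard library does not provide.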

Rules for Creating Document Type Definition (DTD) Elements

Basically, you prepare and use a Document Type Definition (DTD) to add structure and logic, making it easier to ensure that all the required elements are present, in the correct order, in your XML document. A DTD can contain many rules that control how elements may be used in an XML document.

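Symbol                    Meaning                                                                      Example
#PCDATA                   Contains parsed character data or text                                       <!ELEMENT element-name (#PCDATA)>
#PCDATA, element-name     Contains text and another element; #PCDATA always appears first in the rule  <!ELEMENT element-name (#PCDATA | child)*>
, (comma)                 Must be used in this order                                                   <!ELEMENT element-name (child1, child2, child3)>
| (pipe)                  Use only one element from the options provided                               <!ELEMENT element-name (child1 | child2 | child3)>
element-name (by itself)  Use exactly once                                                             <!ELEMENT element-name (child)>
element-name?             Use once or not at all                                                       <!ELEMENT element-name (child1, child2?, child3?)>
element-name+             Use one or more times                                                        <!ELEMENT element-name (child1+, child2?, child3)>
element-name*             Use once, many times or not at all                                           <!ELEMENT element-name (child1*, child2+, child3)>
( )                       Indicates groups; can be nested                                              <!ELEMENT element-name (#PCDATA | child)*> or <!ELEMENT element-name ((child1*, child2+, child3)* | child4)>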

Basic XML Schema Declarations

An XML Schema document is built from a series of declarations that provide very detailed information and ensure that the information contained in the XML document is in the correct form.

Declaration              Syntax                                                      Purpose
Schema                   <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">   Specifies the language that the schema uses
Element                  <xsd:element name="name">                                   Defines an element
Attribute                <xsd:attribute name="name" type="type">                     Defines an attribute
Complex type             <xsd:complexType>                                           Defines an element that contains other elements, contains attributes or contains mixed content (elements and text)
Simple type              <xsd:simpleType>                                            Creates a restricted data type for an element or attribute value
Sequence compositor      <xsd:sequence>                                              Indicates that the elements in a complex type must be listed in order
Choice compositor        <xsd:choice>                                                Indicates that any one of the elements in a complex type can be used
All compositor           <xsd:all>                                                   Indicates that any or all of the elements in a complex type can be used
Annotation               <xsd:annotation>                                            Contains documentation and/or appinfo elements that provide additional information and comments on the schema document
Documentation            <xsd:documentation>                                         Provides human-readable information within the annotation
Application information  <xsd:appinfo>                                               Provides machine-readable information within the annotation
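Because an XML Schema document is itself XML, an ordinary parser can read it. The sketch below builds a small schema from several of these declarations and lists what it declares. The element and attribute names (program, item, channel) are hypothetical, and the code only reads the schema; it does not validate documents against it, since Python's standard library has no schema validator.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# A hypothetical schema: a program holds any number of item strings.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="program">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="item" type="xsd:string"
                     minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
      <xsd:attribute name="channel" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

schema = ET.fromstring(SCHEMA)

# A generic parser can list the declarations like any other XML data.
elements = [e.get("name") for e in schema.iter(f"{{{XSD_NS}}}element")]
attributes = [a.get("name") for a in schema.iter(f"{{{XSD_NS}}}attribute")]
print(elements, attributes)
```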

Common reserved characters in XML

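Some characters are reserved for internal use in XML and must be replaced by entity references in your content. These five commonly used entities are already defined as part of XML and are ready to use:

  • &lt; for < (less than)
  • &gt; for > (greater than)
  • &amp; for & (ampersand)
  • &quot; for " (double quote)
  • &apos; for ' (apostrophe)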
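In practice you rarely write these replacements by hand. As a minimal sketch, Python's standard xml.sax.saxutils module can do the substitution in both directions; the sample string is invented for the example.

```python
from xml.sax.saxutils import escape, unescape
import xml.etree.ElementTree as ET

raw = 'Tom & Jerry <"classic">'

# escape() handles &, < and > by default; quotes are passed as extra entities.
quotes = {'"': "&quot;", "'": "&apos;"}
escaped = escape(raw, quotes)

# unescape() reverses the substitution.
restored = unescape(escaped, {"&quot;": '"', "&apos;": "'"})

# A parser performs the same reverse substitution when it reads the document.
parsed_text = ET.fromstring(f"<title>{escaped}</title>").text
print(escaped)
```

The ampersand is escaped first and unescaped last, which is what keeps text such as &lt; from being converted twice.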

CSS1?

XSLT 1.0?YesYesNoNo
Internet Explorer 6.0 Yes Yes Yes Yes
Mozilla 1.7.5 Yes Yes Yes Yes
Mozilla Firefox 1.0 Yes Yes Yes Yes
Netscape Navigator 7 Yes Yes Yes Yes
Opera 7 Yes Yes Yes No