Goals of Distributed Operations

In the previous chapter we looked at tightly coupled multiprocessor systems that share memory, kernel data structures, and a common pool of processes available for execution. Frequently, however, it is desirable to share resources among processors that are otherwise autonomous in their operating environment and operating conditions. Suppose, for example, that a user of a personal computer needs to access files that reside on a larger machine but wants to retain control of the personal computer. Although programs such as uucp support file transfer and other functions across a network, their use is not hidden from the user: the user is aware of working over a network, and programs such as text editors cannot work with remote files as they do with local ones. Users should have the standard UNIX system facilities and, except for a possible performance penalty, should not perceive machine boundaries. For example, the open and read system calls should behave the same for files on remote machines as for files on local systems.
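As a minimal illustration of this goal, a user program that reads a file should not need to know where the file lives; the sketch below uses an ordinary pathname that might be local or remote (the path is only an example):

#include <fcntl.h>
#include <unistd.h>

/* A user program should not need to know whether the file it reads is
 * local or remote; the pathname below is only an example. */
int main(void)
{
    char buf[512];
    int fd = open("/usr/src/cmd/login.c", O_RDONLY);   /* may be local or remote */
    if (fd >= 0) {
        ssize_t n = read(fd, buf, sizeof buf);          /* same call either way */
        (void) n;                                       /* result ignored in this sketch */
        close(fd);
    }
    return 0;
}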

The architecture of a distributed system is shown in Figure 13.1. Each computer in the figure is an autonomous unit consisting of a CPU, memory, and peripherals. A machine still conforms to the model even if it has no local file system: it must have peripheral devices for communicating with other machines, and the files it uses may reside on another computer. The physical memory available to each machine is independent of processes running on other machines. This property distinguishes distributed systems from the tightly coupled multiprocessor systems discussed in the previous chapter. Accordingly, the kernel on each machine operates independently of the external conditions of the distributed environment.

Figure 13.1. Model of a system with a distributed architecture


Distributed systems, well described in the literature, are traditionally divided into the following categories:

Peripheral systems, which are tightly clustered groups of machines centered on one (usually larger) machine. The peripheral processors take part of the workload off the central processor and forward all system calls to it. The purpose of a peripheral system is to increase overall throughput and, possibly, to dedicate a processor to a single process in a UNIX environment. The system operates as a single unit; unlike the other distributed models, peripheral systems have no real autonomy except for process scheduling and local memory allocation.

Newcastle-type distributed systems, in which remote access is provided through remote file names recognized in the C library (the name comes from the paper "The Newcastle Connection"). Remote files are designated by a pathname that contains special characters or by an additional pathname component preceding the file system root. Implementing this method requires no changes to the system kernel; it is therefore simpler than the other methods discussed in this chapter, but less flexible.

Fully "transparent" distributed systems, in which standard compound pathnames suffice to access files located on other machines; it is the kernel's responsibility to recognize that these files are remote. The search paths specified by full pathnames cross machine boundaries at mount points, much as they cross the mount points formed when file systems are mounted on local disks.

In this chapter we consider the architecture of each model; the descriptions are based not on particular products but on information published in various technical papers. They assume that addressing, routing, flow control, and error detection and correction are handled by protocol modules and device drivers; in other words, each model is independent of the underlying network. The examples of system call usage given in the next section for peripheral systems apply equally to Newcastle-type systems and to fully transparent systems, discussed later; we therefore examine them in detail once, and in the sections devoted to the other types of systems we concentrate mainly on the features that distinguish those models from the others.

13.1 PERIPHERAL PROCESSORS

The architecture of a peripheral system is shown in Figure 13.2. The purpose of this configuration is to improve overall system throughput by distributing running processes between the central and peripheral processors. Each peripheral processor has no local peripherals other than those it needs to communicate with the central processor; the file system and all devices are at the disposal of the central processor. Assume that all user processes run on a peripheral processor and do not migrate between peripheral processors; once assigned to a processor, a process remains there until it exits. The peripheral processor contains a stripped-down version of the operating system that handles local system calls, interrupt handling, memory allocation, the network protocols, and the device driver for communicating with the central processor.

When the central processor is initialized, its kernel downloads the local operating system to each peripheral processor over the communication lines. Every process running on a peripheral processor has an associated satellite process on the central processor; when a process on a peripheral processor makes a system call that requires services only the central processor can provide, the peripheral process communicates with its satellite, and the request is sent to the central processor for processing. The satellite process executes the system call and sends the results back to the peripheral processor. The relationship between a peripheral process and its satellite resembles the client-server relationship discussed in detail in Chapter 11: the peripheral process is a client of its satellite, which supports the file system functions. Here the remote server process has only one client; in Section 13.4 we will look at server processes that have several clients.


Figure 13.2. Peripheral System Configuration


Figure 13.3. Message formats

When a peripheral process makes a system call that can be handled locally, the kernel need not send a request to the satellite process. For example, to obtain additional memory a process may call sbrk and have it executed locally. However, if the services of the central processor are required, for example to open a file, the kernel encodes the parameters of the call and the process's execution environment into a message that it sends to the satellite process (Figure 13.3). The message includes a token indicating which system call the satellite is to execute on behalf of the client, the parameters of the call, and data about the execution environment (for example, user and group IDs), which vary from call to call. The rest of the message is variable-length data, such as a full pathname or the data to be written by the write call.
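The exact message layout is not given in the text; the following C structure is only a sketch of what a request of the kind shown in Figure 13.3 might contain, and the field names are assumptions:

#include <stddef.h>
#include <sys/types.h>

/* Hypothetical layout of a request sent from a peripheral processor to its
 * satellite process; the field names are illustrative, not from the text. */
struct remote_request {
    int    syscall_num;   /* token identifying the system call (open, write, ...) */
    uid_t  uid;           /* user ID of the calling process                        */
    gid_t  gid;           /* group ID of the calling process                       */
    long   params[4];     /* fixed-size parameters: flags, counts, descriptors     */
    size_t datalen;       /* length of the variable-length part that follows       */
    char   data[];        /* pathname, or data to be written                       */
};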

The satellite process waits for requests from its peripheral process; when a request arrives, it decodes the message, determines the type of system call, executes it, and encodes the results into a reply that it sends to the peripheral process. The reply includes, besides the results of the system call, an error indication (if any), a signal number, and a variable-length data array containing, for example, data read from a file. The peripheral process sleeps until it receives the reply; on receiving it, it decodes it and returns the results to the user. This is the general scheme for handling system calls; let us now turn to a more detailed look at individual calls.

To explain how a peripheral system works, we consider several system calls: getppid, open, write, fork, exit, and signal. The getppid call is straightforward because it involves simple request and response forms exchanged between the peripheral and central processors. The kernel on the peripheral processor builds a message indicating that the requested call is getppid and sends the request to the central processor. The satellite process on the central processor reads the message from the peripheral processor, decodes the system call type, executes it, and obtains its parent process ID. It then builds a reply and transmits it to the peripheral process waiting at the other end of the communication line. When the peripheral processor receives the reply, it passes the result to the process that issued the getppid call. If the peripheral process caches such data (like the parent process ID) in local memory, it need not communicate with its satellite at all.
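A sketch of the caching idea mentioned above; the cache variable, its sentinel value, and the remote_syscall_getppid() helper are assumptions, not names from the text:

#include <sys/types.h>

/* Illustrative only: the peripheral kernel can cache values that rarely need
 * a round trip, such as the parent process ID (ignoring, for simplicity,
 * reparenting after the parent exits).  remote_syscall_getppid() is a
 * hypothetical helper that sends the request to the satellite process. */
extern pid_t remote_syscall_getppid(void);

static pid_t cached_ppid = -1;      /* -1 means "not yet known" (assumed convention) */

pid_t peripheral_getppid(void)
{
    if (cached_ppid == -1)
        cached_ppid = remote_syscall_getppid();   /* one message to the satellite */
    return cached_ppid;                           /* later calls answered locally */
}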

When the open system call is made, the peripheral process sends a corresponding message to its satellite, including the file name and the other parameters. If the call succeeds, the satellite process allocates an inode and a file table entry, allocates an entry in its own user file descriptor table, and returns the file descriptor to the peripheral process. All this time, at the other end of the communication line, the peripheral process waits for the reply. It keeps no structures that record information about the file being opened; the descriptor returned by open is an index into the user file descriptor table of the satellite process. The results of the call are shown in Figure 13.4.


Figure 13.4. Calling the open function from a peripheral process

When the write system call is made, the peripheral processor builds a message consisting of the write call token, the file descriptor, and the amount of data to be written. Then it copies the data from the peripheral process's address space over the communication line. The satellite process decodes the message, reads the data from the communication line, and writes it to the appropriate file (the descriptor contained in the message is used to locate the file table entry and the inode); all of these actions take place on the central processor. When it finishes, the satellite sends the peripheral process a reply acknowledging the message and giving the number of bytes successfully copied to the file. The read call is similar; the satellite tells the peripheral process the number of bytes actually read (when reading from a terminal or a pipe, this number does not always match the count specified in the request). Either call may require several informational messages to be sent over the network, depending on the amount of data and the size of the network packets.
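A sketch of how the peripheral kernel might split write data into packet-sized pieces; the packet size and the net_send() driver entry are assumptions introduced only for illustration:

#include <stddef.h>

/* Illustrative only: send the user's buffer to the satellite in
 * network-packet-sized pieces.  net_send() is a hypothetical driver entry,
 * and PKT_DATA_MAX is an assumed maximum payload per packet. */
extern int net_send(int link, const void *data, size_t len);

#define PKT_DATA_MAX 1024

static int send_write_data(int link, const char *buf, size_t count)
{
    size_t sent = 0;
    while (sent < count) {
        size_t chunk = count - sent;
        if (chunk > PKT_DATA_MAX)
            chunk = PKT_DATA_MAX;
        if (net_send(link, buf + sent, chunk) < 0)
            return -1;                 /* transmission error */
        sent += chunk;
    }
    return 0;                          /* all data handed to the network driver */
}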

The only system call that requires modification for processes running on the central processor is fork. When a process executes fork on the central processor, the kernel selects a peripheral processor for the child and sends a message to a special server process there, informing it that it is about to download the current process. Assuming the server accepts the request, the kernel performs fork to create a new peripheral process, allocating a process table entry and an address space. The central processor downloads a copy of the process that called fork to the peripheral processor, overwriting the newly allocated address space, spawns a local satellite process to communicate with the new peripheral process, and sends a message to the peripheral processor to initialize the program counter of the new process. The satellite process (on the central processor) is a child of the process that called fork; the peripheral process is technically a child of the server process, but logically it is a child of the process that called fork. The server process has no logical connection with the child after fork completes; its only job is to assist in downloading the child. Because of the tight coupling between system components (peripheral processors have no autonomy), the peripheral process and its satellite have the same process ID. The relationships between the processes are shown in Figure 13.5: the solid lines show parent-child relationships, the dashed line shows the relationship between peers.


Figure 13.5. Executing a fork on the CPU

When a process executes fork on a peripheral processor, it sends a message to its satellite on the central processor, which then carries out the whole sequence of actions just described. The satellite selects a new peripheral processor and makes the preparations for downloading the image of the old process: it asks the parent peripheral process to send its image, and the requested data begin arriving at the other end of the communication channel. The satellite reads the transmitted image and copies it to the peripheral child. When the image download is complete, the satellite executes fork, creating its own child on the central processor, and passes the program counter value to the peripheral child so that the latter knows at which address to begin execution. It would clearly have been preferable to make the satellite's child the parent of the peripheral child, but in our scheme spawned processes can run on other peripheral processors, not only on the one on which they were created. The relationships between the processes after fork completes are shown in Figure 13.6. When a peripheral process exits, it sends a corresponding message to its satellite process, and the satellite exits too. A satellite cannot initiate termination on its own.


Figure 13.6. A fork on a peripheral processor

In both multiprocessor and uniprocessor systems a process must respond to signals in the same way: either it completes the system call before checking for signals, or it wakes up prematurely from its sleep upon receiving the signal and aborts the system call, depending on the priority at which it slept. Because a satellite process executes system calls on behalf of a peripheral process, it must react to signals in coordination with the latter. If, on a uniprocessor system, a signal would cause a process to abort a system call, the satellite process on the multiprocessor system must behave in the same way. The same holds when a signal causes a process to terminate via exit: the peripheral process exits and sends a corresponding message to its satellite, which, of course, exits as well.

When a peripheral process calls the signal system call, it stores the relevant information in local tables and sends a message to its satellite telling it whether the specified signal should be ignored or not. It makes no difference to the satellite whether the signal is to be caught or handled by default. How a process reacts to a signal depends on three factors (Figure 13.7): whether the signal arrives while the process is executing a system call, whether the signal call was used to specify that the signal is to be ignored, and whether the signal originates on the same peripheral processor or on another processor. Let us consider the various possibilities.


algorithm sighandle            /* handle signals */
{
    if (current process is somebody's satellite or has a remote counterpart)
    {
        /* satellite process */
        if (signal is to be ignored)
            return;            /* the client process handles its own signals */
        if (signal arrived while a system call was being executed)
            set the signal against the satellite process;
        else
            send a signal message to the peripheral process;
    }
    else                       /* peripheral process */
    {
        /* whether in the middle of a system call or not */
        send the signal to the satellite process;
    }
}

algorithm satellite_end_of_syscall   /* finish a system call executed for a peripheral process */
input:  none
output: none
{
    if (interrupted while the system call was executing)
        send an interrupt/signal message to the peripheral process;
    else                       /* the system call was not interrupted */
        send the reply: set a flag indicating that a signal arrived;
}

Figure 13.7. Signal handling in a peripheral system


Suppose a peripheral process has suspended itself while its satellite process executes a system call on its behalf. If the signal originates elsewhere, the satellite process detects it before the peripheral process does. Three cases arise.

1. If the satellite process, while awaiting some event, is not asleep at a priority from which it would be awakened by a signal, it executes the system call to completion, sends the results to the peripheral process, and indicates which signal it received.

2. If the process has specified that signals of this type be ignored, the satellite continues to follow the system call algorithm without doing a longjmp out of its sleep. The reply sent to the peripheral process contains no indication that a signal arrived.

3. If, on receiving the signal, the satellite process aborts the system call (via longjmp), it informs the peripheral process of this and tells it the signal number.

The peripheral process examines the reply for indications of signals and, if it finds any, handles the signals before returning from the system call. Thus the behavior of a process in the peripheral system is exactly the same as on a uniprocessor system: it either dies without leaving kernel mode, or calls a user-defined signal handler, or ignores the signal and completes the system call normally.


Figure 13.8. Interrupt while executing a system function

Suppose, for example, that a peripheral process issues a read from a terminal attached to the central processor and sleeps while its satellite process executes the call (Figure 13.8). If the user presses the break key, the kernel on the central processor sends the corresponding signal to the satellite process. If the satellite was asleep waiting for terminal input, it wakes up immediately and aborts the read call. In its reply to the peripheral process's request, the satellite reports an error code and the signal number corresponding to the interrupt. The peripheral process examines the reply and, since the reply indicates that an interrupt signal arrived, sends the signal to itself. Before returning from the read call, the peripheral kernel checks for signals, finds the interrupt signal sent on behalf of the satellite, and handles it in the usual way. If the interrupt signal causes the peripheral process to terminate via exit, that call takes care of killing the satellite process. If the peripheral process catches interrupt signals, it calls its user-level signal handling function and, on return from the read call, reports the error code to the user. On the other hand, if the satellite is executing the stat system call on behalf of the peripheral process, it does not abort the call when it receives the signal (stat is guaranteed to wake up from any sleep because it waits for resources only for a bounded time). The satellite runs the call to completion and returns the signal number to the peripheral process. The peripheral process sends the signal to itself and receives it on return from the system call.

If a signal arises on the peripheral processor while a system call is in progress, the peripheral process cannot know whether control will soon return to it from its satellite or whether the satellite will sleep indefinitely. The peripheral process sends the satellite a special message informing it of the signal. The kernel on the central processor decodes the message and sends the signal to the satellite, whose reaction is as described in the preceding paragraphs (aborting the call or running it to completion). The peripheral process cannot send the message to the satellite directly, because the satellite is busy executing the system call and is not reading the communication line.

Returning to the read example, the peripheral process has no idea whether its satellite is waiting for terminal input or performing some other work. The peripheral process sends the satellite a signal message: if the satellite is asleep at an interruptible priority, it wakes up immediately and aborts the system call; otherwise the call runs to normal completion.

Finally, consider a signal that arrives when no system call is in progress. If the signal originated on another processor, the satellite receives it first and sends a signal message to the peripheral process, whether or not the signal concerns the peripheral process. The peripheral kernel decodes the message and posts the signal to the process, which reacts in the usual way. If the signal originates on the peripheral processor, the process handles it in the standard way without recourse to its satellite.

When a peripheral process sends signals to other peripheral processes, it encodes a kill message and sends it to its satellite, which executes the kill call locally. If some of the target processes reside on other peripheral processors, their satellites receive the signal (and react to it in the manner described above).

13.2 NEWCASTLE COMMUNICATION

In the previous section we considered a tightly coupled configuration in which every file subsystem call arising on a peripheral processor is sent to the remote (central) processor. We now turn to more loosely coupled systems, consisting of machines that access files located on other machines. In a network of personal computers and workstations, for example, users often access files on a larger machine. In the next two sections we consider configurations in which all system calls are executed in local subsystems but it is nevertheless possible to access files on other machines through the file subsystem calls.

These systems identify remote files in one of two ways. In some systems a special character is added to the compound file name: the component preceding the character identifies the machine, and the rest of the name identifies the file on that machine. Thus, for example, the compound name


"sftig!/fs1/mjb/rje"


identifies the file "/fs1/mjb/rje" on the machine "sftig". This naming scheme follows the convention established by the uucp program for transferring files between UNIX systems. In the other scheme, remote files are identified by adding a special prefix to the name, for example:


/../sftig/fs1/mjb/rje


where "/../" is a prefix indicating that the file is remote, and the second component of the name is the name of the remote machine. This scheme uses the familiar UNIX file name syntax, so, unlike the first scheme, user programs do not have to adapt to names with an unusual structure.
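As a sketch (not taken from any particular implementation), a library routine might split such a name into a machine part and a remote path roughly as follows; the function and structure names are assumptions:

#include <string.h>

/* Hypothetical result of parsing a "/../machine/path" style name. */
struct remote_name {
    char machine[64];        /* name of the remote machine         */
    const char *path;        /* remaining pathname on that machine */
};

/* Return 1 and fill *rn if the name designates a remote file, else 0. */
static int parse_remote(const char *name, struct remote_name *rn)
{
    if (strncmp(name, "/../", 4) != 0)
        return 0;                              /* ordinary local name */
    name += 4;                                 /* skip the "/../" prefix */
    const char *slash = strchr(name, '/');
    if (slash == NULL)
        return 0;                              /* no path after the machine name */
    size_t len = (size_t)(slash - name);
    if (len >= sizeof rn->machine)
        return 0;                              /* machine name too long */
    memcpy(rn->machine, name, len);
    rn->machine[len] = '\0';                   /* e.g. "sftig" */
    rn->path = slash;                          /* e.g. "/fs1/mjb/rje" */
    return 1;
}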


Figure 13.9. Formulating requests to a file server (processor)


In the rest of this section we examine a model that uses the Newcastle connection, in which the kernel does not recognize remote files; that task is left entirely to routines in the standard C library, which here play the role of the system call interface. These routines analyze the first component of a file name, which in both identification schemes described above indicates whether the file is remote. This departs from the usual situation, in which library routines do not parse file names. Figure 13.9 shows how requests to a file server are formulated. If the file is local, the local system kernel handles the request in the usual way. Consider the opposite case:


open("/../sftig/fs1/mjb/rje/file", O_RDONLY);


The C library open routine parses the first two components of the compound file name and learns that the file is to be found on the remote machine "sftig". The routine keeps a data structure recording whether the process has already established a connection to that machine; if it has not, the routine establishes a connection with a file server running on the remote machine. When the process makes its first request for remote service, the remote server acknowledges the request, records the user and group ID fields as needed, and creates a satellite process to act on behalf of the client process.

To satisfy client requests, the satellite must have the same file access rights on the remote machine as the client; in other words, the user "mjb" must have the same access rights to remote files as to local ones. Unfortunately, the client user ID for "mjb" may coincide with the user ID of a different user on the remote machine. System administrators of machines on the network must therefore either ensure that each user is assigned a user ID unique across the whole network, or perform ID mapping at the time a request for network service is made. Otherwise the satellite process would have the rights of some other user on the remote machine.

A more delicate issue is superuser permission for remote files. On the one hand, the client superuser should not have superuser rights on the remote system, lest it circumvent the remote system's security controls. On the other hand, some programs simply do not work without superuser rights. An example is mkdir (see Chapter 7), which creates a new directory. The remote system would not let the client create the directory, because superuser rights are not honored remotely. The problem of creating remote directories is a strong argument for revising the mkdir system call so that it can establish all the directories a user needs automatically. Nevertheless, the fact that setuid programs (such as mkdir) fail for remote files remains a general problem requiring a solution. Perhaps the best solution would be to give files an additional set of permissions governing access by remote superusers; unfortunately, this would require changes to the disk inode structure (to add the new fields) and would cause too much upheaval in existing systems.

If the open routine succeeds, the local library records the result in a user-level structure containing the network address, the satellite process ID, the file descriptor, and similar information. The read and write library routines determine from the descriptor whether the file is remote and, if so, send a message to the satellite. The client process interacts with its satellite for every system call that requires the services of the remote machine. If a process accesses two files on the same remote machine, it uses one satellite; if the files are on different machines, it uses two satellites, one on each machine. Two satellites are also used when two processes access a file on a remote machine. When invoking a system call through a satellite, the process builds a message containing the call number, the pathname, and other necessary information, similar to the message structure used in the system with peripheral processors.
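A minimal sketch of the bookkeeping the library might do, assuming a small per-process table indexed by file descriptor; the names, sizes, and helper routines are illustrative, not from the text:

/* Illustrative per-descriptor table kept by the C library (not by the kernel).
 * remote_read() and syscall_read() are hypothetical helpers standing for
 * "send a read request to the satellite" and "trap to the local kernel". */
extern long remote_read(int conn, int remote_fd, void *buf, unsigned long count);
extern long syscall_read(int fd, void *buf, unsigned long count);

struct rfile {
    int is_remote;        /* nonzero if the descriptor refers to a remote file */
    int sat_conn;         /* connection to the satellite on the remote machine */
    int remote_fd;        /* descriptor as known to the satellite process      */
};

static struct rfile rtab[256];    /* assumed limit on open descriptors */

long lib_read(int fd, void *buf, unsigned long count)
{
    if (fd >= 0 && fd < 256 && rtab[fd].is_remote)
        return remote_read(rtab[fd].sat_conn, rtab[fd].remote_fd, buf, count);
    return syscall_read(fd, buf, count);      /* local file: ordinary system call */
}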

The mechanism for operations on the current directory is more complex. When a process chooses a remote directory as its current directory, the library routine sends a message to the satellite, which changes the current directory there, and the routine remembers that the current directory is remote. From then on, whenever a pathname does not begin with a slash (/), the routine sends the name to the remote machine, where the satellite process resolves it starting from the current directory. If the current directory is local, the routine simply passes the pathname to the local kernel. The chroot system call behaves similarly for a remote directory, but its execution goes unnoticed by the local kernel; strictly speaking, the process could skip the local chroot, since only the library records its execution.

When a process calls fork, the corresponding library routine sends messages to all its satellites. The satellite processes fork and send the IDs of their children back to the client parent. The client process then issues the fork system call itself, which passes control to the child it creates; the local child converses with the remote satellite children, whose addresses are saved by the library routine. This interpretation of fork makes it easier for the satellite processes to keep track of open files and current directories. When a process that works with remote files exits (via the exit call), the routine sends messages to all its remote satellites so that they do the same on receiving the message. Certain aspects of the implementation of the exec and exit system calls are taken up in the exercises.

The advantage of Newcastle-type communication is that a process's access to remote files becomes "transparent" (invisible to the user), and no changes need to be made to the kernel. However, the approach also has a number of drawbacks. First, it may reduce system performance. Because of the larger C library, every process uses more memory, even if it never accesses remote files; the library duplicates kernel functions and requires more space. Larger processes take longer to start up and create more contention for memory, so that processes are swapped and paged more often. Local requests run more slowly because each trip through the library to the kernel takes longer, and remote requests may also slow down because of the cost of sending them across the network. The additional user-level processing of remote requests increases the number of context switches, swapping operations, and paging operations. Finally, programs must be recompiled with the new libraries in order to access remote files; old programs and object modules supplied without source cannot work with remote files until this is done. All these drawbacks are absent from the system described in the next section.

13.3 "TRANSPARENT" DISTRIBUTED FILE SYSTEMS

The term "transparent distribution" means that users on one machine can access files on another machine without being aware that they are crossing machine boundaries, just as on their own machine they cross mount points when moving from one file system to another. The names by which processes access files on remote machines look like the names of local files: they contain no distinguishing characters. In the configuration shown in Figure 13.10, the directory "/usr/src" belonging to machine B is mounted on the directory "/usr/src" belonging to machine A. This configuration is convenient if the machines are to share the same system source code, traditionally kept in "/usr/src". Users on machine A can access files on machine B using the normal file name syntax (for example, "/usr/src/cmd/login.c"), and the kernel itself decides whether the file is remote or local. Users on machine B have access to their local files (and are unaware that the same files may also be accessed from machine A), but they in turn have no access to files located on machine A. Other arrangements are of course possible, in particular ones in which all remote systems are mounted at the root of the local system, so that users can access all files on all systems.


Figure 13.10. File systems after remote mount

The similarity between mounting local file systems and making remote file systems accessible has led to adapting the mount system call to remote file systems. The kernel keeps a mount table with an extended format. When it executes mount, the kernel establishes a network connection with the remote machine and stores the information characterizing that connection in the mount table.
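A sketch of what an extended mount table entry might hold; the layout and the remote-related fields are assumptions used only for illustration, not a description of any particular kernel:

/* Hypothetical extended mount table entry for remote mounts. */
struct inode;                          /* in-core inode, defined elsewhere */

struct mount_entry {
    int           busy;                /* entry in use                             */
    struct inode *mounted_on;          /* local inode covered by this mount        */
    struct inode *root;                /* root inode (meaningful for local mounts) */
    int           is_remote;           /* nonzero for a remote file system         */
    char          machine[64];         /* name of the remote machine               */
    int           net_conn;            /* handle of the network connection         */
};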

Pathnames containing ".." pose an interesting problem. If a process makes a directory on a remote file system its current directory, subsequent use of ".." in a pathname should return the process to the local file system rather than allow it to reach files above the mount point on the remote file system. Returning to Figure 13.10: when a process on machine A, having selected the remote directory "/usr/src/cmd" as its current directory, executes the command

cd ../../..

the current directory becomes the root directory belonging to machine A, not machine B. The namei algorithm in the kernel of the remote system, on seeing the character sequence "..", checks whether the calling process is an agent of a client process and, if so, whether the client treats the current working directory as the root of the remote file system.

Communication with a remote machine takes one of two forms: remote procedure call or remote system call. In the first form, each kernel procedure that deals with inodes checks whether the inode refers to a remote file and, if so, sends a request to the remote machine to perform the specified operation. This scheme fits naturally into the abstract support for file systems of different types described at the end of Chapter 5. Thus an access to a remote file may trigger the transmission of several messages over the network, their number determined by the number of implied operations on the file, with a corresponding increase in response time due to network latency. Each set of remote operations includes, at a minimum, locking the inode, manipulating reference counts, and so on. Various optimizations have been proposed to improve this model, based on combining several operations into one request (message) and on caching the most important data.
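In the spirit of the file-system-switch idea mentioned above, each inode operation might be dispatched on a "remote" flag roughly as sketched below; the IREMOTE flag, the structure layout, and the helper names are assumptions for illustration only:

/* Illustrative only: a kernel routine in the remote procedure call form.
 * The IREMOTE flag and the helpers below are assumptions, not real names. */
#define IREMOTE 0x8000                 /* assumed flag: inode refers to a remote file */

struct inode { int i_flag; /* ... other in-core inode fields ... */ };

extern void local_lock_inode(struct inode *ip);   /* ordinary in-core inode lock     */
extern void rfs_lock_inode(struct inode *ip);     /* hypothetical: send a lock       */
                                                  /* request to the owning machine   */
void lock_inode(struct inode *ip)
{
    if (ip->i_flag & IREMOTE)
        rfs_lock_inode(ip);            /* operation carried out on the remote machine */
    else
        local_lock_inode(ip);
}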


Figure 13.11. Opening a remote file


Consider a process that opens the remote file "/usr/src/cmd/login.c", where "src" is the mount point. Parsing the file name (by the namei-iget scheme), the kernel discovers that the file is remote and sends a request to the machine on which it resides to obtain a locked inode. On receiving the reply, the local kernel creates an in-core copy of the inode corresponding to the remote file. The kernel then checks for the necessary access permissions (to read, for example) by sending another message to the remote machine. The open algorithm continues exactly as laid out in Chapter 5, sending messages to the remote machine as needed, until the algorithm completes and the inode is released. The relationships between the kernel data structures at the end of the open algorithm are shown in Figure 13.11.

If the client then calls the read system call, the client kernel locks the local inode, sends a request to lock the remote inode, sends a read request, copies the data into local memory, sends a request to release the remote inode, and releases the local inode. This scheme matches the semantics of the existing uniprocessor kernel, but the frequency of network use (several messages per system call) hurts the performance of the whole system. To reduce network traffic, several operations can be combined into one request. In the read example, the client can send the server a single general "read" request, and the server itself decides when to lock and release the inode while executing it. Network traffic can also be reduced by using remote buffer caches (as discussed above), but care must be taken that the file system calls that use those buffers still behave correctly.

In the second form of communication with a remote machine (remote system call), the local kernel detects that a system call refers to a remote file and sends the parameters given in the call to the remote system, which executes the call and returns the results to the client. The client machine receives the results and returns from the call. Most system calls can be carried out with a single network request and a reply received within a reasonable time, but not all calls fit this model. For example, on receipt of certain signals the kernel creates a file named "core" for the process (Chapter 7). Creating this file is not tied to a single system call but amounts to several operations: creating the file, checking access permissions, and performing a series of writes.

For the open system call, the request sent to the remote machine contains the remaining part of the pathname (the portion left after the components identifying the remote file system have been stripped off) and various flags. In the earlier example of opening "/usr/src/cmd/login.c", the kernel sends the name "cmd/login.c" to the remote machine. The message also includes identification data, such as the user and group IDs, needed to check file access permissions on the remote machine. If a reply arrives from the remote machine indicating that the open succeeded, the local kernel picks a free in-core inode on the local machine, marks it as the inode of a remote file, saves information about the remote machine and the remote inode, and allocates a new file table entry in the usual way. Compared with the real inode on the remote machine, the inode held by the local machine is a placeholder, and the overall configuration does not differ from the one used with remote procedure calls (Figure 13.11). If a call made by the process later refers to the remote file by its descriptor, the local kernel sees from the (local) inode that the file is remote, builds a request containing the requested call, and sends it to the remote machine. The request carries a pointer to the remote inode, by which the satellite process identifies the remote file itself.

On receiving the results of a system call, the kernel may have to pass them to special handling code before completing the call, because the local handling of results used on a uniprocessor system is not always appropriate in a multi-machine system. Consequently the semantics of the system algorithms may change somewhat to support execution of remote system calls. In return, however, only a minimal stream of messages flows through the network, keeping the system's response time to requests to a minimum.

13.4 A DISTRIBUTED MODEL WITHOUT SATELLITE PROCESSES

Using satellite processes in a transparent distributed system makes it easier to keep track of remote files, but the process table of the remote system becomes cluttered with satellite processes that are idle most of the time. Other schemes use special server processes to handle remote requests. The remote system keeps a pool of server processes and assigns them as needed to handle arriving remote requests. After handling a request, a server process returns to the pool and is ready to handle other requests. The server keeps no user context between requests, since it may handle requests from several processes in turn. Consequently every message arriving from a client process must carry information about its execution environment, namely user IDs, current directory, signal dispositions, and so on; satellite processes, by contrast, acquire this data when they are created or while executing system calls.

When a process opens a remote file, the remote kernel assigns an inode for subsequent references to the file. The local machine keeps the usual user file descriptor table, file table, and inode table, with the inode table entry identifying the remote machine and the remote inode. Whenever a system call (such as read) uses the file descriptor, the kernel sends a message naming the previously assigned remote inode and passes along process-related information: the user ID, the maximum file size, and so on. If the remote machine has a server process available, the interaction with the client takes the form described earlier; here, however, the connection between client and server lasts only for the duration of the system call.
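Because the server keeps no state between requests, each message must carry the client's context. A sketch of such a request follows; the field names and sizes are assumptions:

#include <sys/types.h>

/* Hypothetical request sent to a pooled server process.  Every request is
 * self-contained because the server keeps no per-client state between calls. */
struct server_request {
    int    opcode;          /* which system call to perform (read, write, ...)  */
    long   remote_inum;     /* inode assigned by the remote kernel at open time */
    uid_t  uid;             /* client's user ID                                 */
    gid_t  gid;             /* client's group ID                                */
    long   offset;          /* file offset for this operation                   */
    long   count;           /* byte count requested                             */
    long   ulimit;          /* client's maximum file size                       */
    char   cwd[256];        /* client's current directory, for name resolution  */
};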

Using servers instead of satellite processes can make flow control, signal handling, and remote devices harder to manage. If large numbers of requests arrive at a remote machine and there are not enough servers, the requests must be queued; this requires a higher-level protocol than the one used on the underlying network. In the satellite model, on the other hand, this kind of overload cannot occur, because client requests are handled synchronously: a client can have at most one outstanding request.

Handling signals that interrupt a system call is also harder with servers, since the remote machine must first find the server that is servicing the call; it may even be that, all servers being busy, the request is still waiting to be processed. Race conditions can also arise when a server is returning the result of a system call to the caller while a signal message for that call is traveling through the network. Each message must be tagged so that the remote system can recognize it and, if necessary, interrupt the appropriate server process. With satellites, the process handling a client's request is identified automatically, and when a signal arrives it is easy to check whether the request has been handled or not.

Finally, if a system call made by a client causes a server to sleep indefinitely (for example, while reading data from a remote terminal), that server cannot handle other requests, and the server pool is depleted. If several processes access remote devices at once and the number of servers is bounded, a quite noticeable bottleneck results. This does not happen with satellites, because a satellite is dedicated to each client process. Another problem with using servers for remote devices is examined in Exercise 13.14.

Despite the advantages of satellite processes, the demand for free process table entries becomes so acute in practice that most implementations nevertheless use pools of server processes to handle remote requests.


Figure 13.12. Conceptual diagram of interaction with remote files at the kernel level

13.5 CONCLUSIONS

In this chapter we have examined three schemes for working with files on remote machines that treat remote file systems as an extension of the local one. The architectural differences between them are shown in Figure 13.12. All of them differ from the multiprocessor systems of the previous chapter in that the processors do not share physical memory. A peripheral processor system consists of a tightly coupled set of processors that share the file resources of the central processor. A Newcastle-type connection provides hidden ("transparent") access to remote files, not through the operating system kernel but through a special C library; for this reason all programs that are to use this type of communication must be recompiled, which is in general a serious drawback of the scheme. A file's remoteness is indicated by a special sequence of characters naming the machine on which it resides, which is another factor limiting program portability.

Transparent distributed systems use a modified mount system call to gain access to remote files. Inodes on the local system are marked as referring to remote files, and the local kernel sends the remote system a message describing the requested system call, its parameters, and the remote inode. Communication in a transparent distributed system takes two forms: remote procedure call (a message containing the list of inode operations is sent to the remote machine) and remote system call (the message describes the requested call). The final part of the chapter discussed issues in handling remote requests with satellite processes and with servers.

13.6 EXERCISES

*1. Describe the implementation of the exit system function on a system with peripheral processors. What is the difference between this case and when the process terminates upon receiving an uncaught signal? How should the kernel dump the contents of memory?

2. Processes cannot ignore SIGKILL signals; explain what happens in the peripheral system when a process receives such a signal.

*3. Describe the implementation of the exec system function on a system with peripheral processors.

*4. How should the central processor distribute processes among peripheral processors in order to balance the overall load?

*5. What happens if the peripheral processor does not have enough memory to accommodate all the processes downloaded to it? How should swapping and paging be handled over the network?

6. Consider a system in which requests to a remote file server are sent when a special prefix is detected in the file name. Suppose a process calls execl("/../sftig/bin/sh", "sh", 0); The executable file resides on the remote machine but must be executed on the local system. Explain how the remote executable is transferred to the local system.

7. If an administrator needs to add new machines to an existing Newcastle-connected system, what is the best way to inform the C library modules about this?

*8. When exec is executed, the kernel overwrites the process's address space, including the library tables used by the Newcastle connection to keep track of references to remote files. After the call, the process must still be able to access these files by their old descriptors. Describe how this could be implemented.

*9. As shown in Section 13.2, calling the exit system call on systems with a Newcastle connection sends a message to the satellite process, causing it to terminate; this is done at the level of the library routines. What happens when a local process receives a signal that terminates it while it is in kernel mode?

*10. How, on a system with a Newcastle link where remote files are identified by adding a special prefix to the filename, can a user specify a ".." (parent directory) component of the filename to traverse a remote mount point?

11. From Chapter 7 we know that various signals cause a process to dump a core image in its current directory. What should happen if the current directory belongs to a remote file system? What answer would you give if the system uses a Newcastle-type connection?

*12. What would be the impact on local processes if all satellite processes or servers were removed from the system?

*13. Consider how a transparent distributed system would implement the link algorithm, which can take two remote file names as parameters, and the exec algorithm, which performs several internal read operations. Consider the two forms of communication: remote procedure call and remote system call.

*14. When accessing a device, a server process may go to sleep, to be awakened later by the device driver. Naturally, if the number of servers is limited, the system may then be unable to satisfy further requests from the local machine. Devise a reliable scheme in which not all server processes can be asleep waiting for device I/O to complete, so that a system call does not stall just because all servers are busy.


Figure 13.13. Configuration with terminal server

*15. When a user logs in, the terminal line discipline records that the terminal is the control terminal of a process group, so that when the user presses the "break" key all processes in the group receive the interrupt signal. Consider a configuration in which all terminals are physically attached to one machine but logins are logically handled on other machines (Figure 13.13). In each case the system creates a getty process for the remote terminal. If remote requests are handled by a pool of server processes, note that a server blocks while performing the open of the terminal line, and that when the open completes the server returns to the pool, dropping its connection to the terminal. How can the interrupt signal produced by pressing the "break" key be delivered to the processes of the corresponding process group?

*16. Memory sharing is naturally a local-machine feature. Logically, however, allocation of a common region of physical memory (local or remote) could also be provided for processes belonging to different machines. Describe how this might be implemented.

*17. The process swapping and demand paging algorithms discussed in Chapter 9 assume the use of a local swap device. What changes would be needed in these algorithms to support remote swap devices?

*18. Let's assume that a fatal failure occurred on a remote machine (or network) and the local network layer protocol recorded this fact. Develop a recovery scheme for a local system accessing a remote server with requests. In addition, develop a plan for restoring a server system that has lost contact with clients.

*19. When a process accesses a remote file, the search for the file may pass through several machines. As an example, take the name "/usr/src/uts/3b2/os", where "/usr" is a directory on machine A, "/usr/src" is a mount point for the root of machine B, and "/usr/src/uts/3b2" is a mount point for the root of machine C. Travelling through several machines to the final destination is called a "multihop". However, if a direct network connection exists between machines A and C, sending the data through machine B would be inefficient. Describe how multihop is handled in a Newcastle-connected system and in a "transparent" distributed system.


I joined Uber two years ago as a mobile developer with some backend experience. There I built the payment functionality in the app, rewriting the app itself along the way. Later I moved into engineering management and led the team. That gave me much more exposure to the backend, since my team is responsible for many of the backend systems that make payments possible.

Before Uber I had no experience with distributed systems. I have a traditional Computer Science education, followed by ten years of full-stack development. So while I could draw diagrams and talk about trade-offs in systems, I did not understand or appreciate distributed concepts such as consistency, availability, or idempotency well enough.

In this post I am going to talk about a few of the concepts I had to learn and put into practice while building the large-scale, highly available, distributed payment system that Uber runs today. It is a system handling a load of up to several thousand requests per second, in which critical payment functionality must keep working correctly even when parts of the system fail.

Is this a complete list? Probably not. But if I had learned about these concepts earlier, my life would have been much easier.

So let's begin our dive into SLA, consistency, data durability, message persistence, idempotency and some other things I needed to learn in my new job.

SLA

In large systems that process millions of events a day, some things are bound to go wrong. That is why, before diving into system planning, the most important step is deciding what a "healthy" system means to us. The degree of "health" must be something that can actually be measured. A common way to measure the health of a system is with SLAs (service level agreements). Here are some of the most common types of SLAs I have encountered in practice:
  • Availability: the percentage of time the service is operational. While it is tempting to aim for 100% availability, achieving it can be genuinely hard and is also very expensive. Even large, critical systems such as the VISA card network, Gmail, or internet providers do not have 100% availability; over the years they accumulate seconds, minutes, or hours of downtime. For many systems, four nines of availability (99.99%, or roughly 50 minutes of downtime per year) is considered high availability, and just getting to that level takes serious work (see the short calculation after this list).
  • Accuracy: is it acceptable to lose or corrupt some data? If so, what percentage is acceptable? For the payment system I worked on, accuracy had to be 100%, since losing data was not an option.
  • Capacity: what load should the system support? This is usually expressed in requests per second.
  • Latency: how quickly should the system respond? How long should it take to serve 95% and 99% of requests? Systems like this typically have a lot of "noisy" requests, so the p95 and p99 latencies are more useful in practice.
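For intuition, here is the arithmetic behind the "four nines" figure above: a year has 365 × 24 × 60 = 525,600 minutes, so 99.99% availability allows (1 − 0.9999) × 525,600 ≈ 52.6 minutes of downtime per year, while 99.999% ("five nines") allows only about 5.3 minutes per year.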
Why do SLAs matter when building a large payment system? We were building a new system to replace an existing one. To make sure we were doing it right and that the new system would be "better" than its predecessor, we used SLAs to define our expectations of it. Availability was one of the most important requirements. Once the goals were defined, we had to understand the trade-offs in the architecture needed to meet them.

Horizontal and vertical scaling

As the business using the newly built system grows, the load on it will only increase. At some point the existing setup will no longer cope with further growth, and we will need to add capacity. The two common scaling strategies are vertical and horizontal scaling.

Horizontal scaling means adding more machines (nodes) to the system to increase its capacity. It is the most popular way to scale distributed systems.

Vertical scaling is essentially "buy a bigger/beefier machine": a (virtual) machine with more cores, more processing power, and more memory. With distributed systems, vertical scaling is usually less popular because it can be more expensive than scaling horizontally. However, some well-known large sites, such as Stack Overflow, have successfully scaled vertically to meet their load.

Why does a scaling strategy matter when building a large payment system? We decided early on that we would build a system that scales horizontally. Although vertical scaling is acceptable in some cases, our payment system had already reached its projected load, and we were pessimistic that a single super-expensive mainframe could handle it today, let alone in the future. In addition, several people on our team had worked for large payment providers and had negative experiences trying to scale vertically even on the most powerful machines money could buy at the time.

Consistency

Availability matters for any system. Distributed systems are often built from machines whose individual availability is lower than the availability required of the whole system. Suppose our goal is to build a system with 99.999% availability (roughly 5 minutes of downtime per year) out of machines/nodes with an average availability of 99.9% (roughly 8 hours of downtime per year). The straightforward way to reach the required availability is to add more such machines/nodes to the cluster: even if some nodes are down, others remain operational, and the overall availability of the system ends up higher than the availability of its individual components.

Consistency is a key issue in highly available systems. A system is consistent if all nodes see and return the same data at the same time. Unlike our previous model, where we added more nodes to achieve greater availability, making sure the system remains consistent is not nearly as trivial. To ensure that each node contains the same information, they must send messages to each other to stay in sync at all times. However, messages sent by them to each other may not be delivered - they may be lost and some of the nodes may be unreachable.

Consistency is the concept that took me the longest to understand and appreciate. There are several types of consistency; the ones most widely used in distributed systems are strong consistency, weak consistency, and eventual consistency. You can read a useful practical analysis of the advantages and disadvantages of each model in this article. Typically, the weaker the required level of consistency, the faster the system can run - but the more likely it is to return stale data.

Why is consistency worth considering when building a large payment system? Data in the system must be consistent - but how consistent? For some parts of the system only strongly consistent data will do: for example, the fact that a payment has been initiated needs to be stored in a strongly consistent form. For other, less critical parts, eventual consistency can be a reasonable trade-off.

A good illustration is listing recent transactions: this can be implemented with eventual consistency. The latest transaction may appear in some parts of the system only after a short delay, but in return the list query returns its result with lower latency or uses fewer resources.

Data durability

Durability means that once data has been successfully written to a data store, it will be available in the future. This holds even if nodes in the system go offline, crash, or have their data corrupted.

Different distributed databases provide different levels of data durability. Some support durability at the machine/node level, some at the cluster level, and some do not provide it out of the box at all. Some form of replication is usually used to increase durability: if the data is stored on multiple nodes and one of them goes down, the data is still available. Achieving durability in distributed systems, however, can be a real challenge.

Why does data durability matter when building a payment system? If the data is critical (for example, payments), we cannot afford to lose it in many parts of our system. The distributed data stores we built on needed to support cluster-level data durability, so that even if instances failed, completed transactions would be preserved. These days most distributed storage services - Cassandra, MongoDB, HDFS, DynamoDB - support durability at various levels and can be configured to provide it at the cluster level.

Message persistence and durability

Nodes in distributed systems perform computations, store data, and send messages to each other. A key property of message sending is how reliably those messages arrive. For critical systems there is often a requirement that no message ever be lost.

In distributed systems, messaging is usually performed through a distributed messaging service such as RabbitMQ or Kafka. These message brokers can support (or be configured to support) different levels of message delivery reliability.

Message persistence means that when the node processing the message fails, the message will still be available for processing after the problem is resolved. Message durability is typically used at the message queue level. With a durable message queue, if the queue (or node) goes offline when a message is sent, it will still receive the message when it comes back online. A good detailed article on this issue is available here.


Why do message persistence and durability matter when building large payment systems? We had messages we could not afford to lose, such as the message that a person had initiated a payment for a trip. This meant the messaging system we used had to be lossless: every message had to be delivered. However, building a system that delivers every message exactly once and one that delivers every message at least once are problems of very different difficulty. We decided to implement at-least-once delivery and chose a messaging bus on top of which to build it (we chose Kafka, configuring a lossless cluster for our case).
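As a rough sketch of what an at-least-once producer configuration can look like, here is an example using the confluent-kafka Python client; the broker addresses and the "payment-initiated" topic are made up for illustration and are not Uber's actual setup:

```python
# Sketch of an at-least-once Kafka producer (confluent-kafka client;
# broker addresses and topic name are hypothetical).
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "kafka-1:9092,kafka-2:9092,kafka-3:9092",
    "acks": "all",                # wait for all in-sync replicas to acknowledge
    "enable.idempotence": True,   # broker deduplicates producer-side retries
})

def on_delivery(err, msg):
    if err is not None:
        # With at-least-once semantics the application must notice and
        # retry messages that could not be delivered.
        print(f"delivery failed, will retry: {err}")

producer.produce("payment-initiated",
                 key=b"trip-42",
                 value=b'{"amount": 1050, "currency": "USD"}',
                 callback=on_delivery)
producer.flush()  # block until the broker has acknowledged the message
```

With acks set to "all" and producer idempotence enabled, broker-side retries do not create duplicates, but consumers must still be idempotent, because a message can be redelivered end to end.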

Idempotency

In distributed systems anything can go wrong: connections can drop mid-request and requests can time out. Clients will often retry those requests. An idempotent system guarantees that no matter what happens, and no matter how many times a particular request is received, the corresponding operation is actually performed only once. A good example is making a payment. If a client creates a payment request, the request succeeds, but the client times out, the client may repeat the same request. With an idempotent system the person making the payment is not charged twice; with a non-idempotent system, a double charge is entirely possible.

Designing idempotent distributed systems requires some kind of distributed locking strategy, and this is where the concepts discussed earlier come into play. Say we intend to implement idempotency using optimistic locking to avoid concurrent updates. To use optimistic locking, the system must be strongly consistent, so that at the time of an operation we can check whether another operation has already started, using some form of versioning.
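Here is a minimal sketch of that idea: an idempotency key identifies the request, and a version check stands in for optimistic locking. The in-memory dictionary is only a stand-in for a strongly consistent store, and all names are illustrative:

```python
# Minimal sketch of idempotency via an idempotency key plus optimistic locking.
# The dict stands in for a strongly consistent store; names are illustrative.

class VersionConflict(Exception):
    pass

store = {}  # idempotency_key -> {"status": ..., "version": int}

def charge_once(idempotency_key: str, amount: int) -> dict:
    record = store.get(idempotency_key)
    if record is not None:
        # The request was already processed (or is in flight): do not charge again.
        return record

    # Reserve the key before performing the side effect.
    store[idempotency_key] = {"status": "pending", "version": 1}
    expected_version = 1
    # ... call the payment provider exactly here ...

    current = store[idempotency_key]
    if current["version"] != expected_version:
        # Another worker finished first; discard our result.
        raise VersionConflict(idempotency_key)
    store[idempotency_key] = {"status": "charged", "amount": amount,
                              "version": expected_version + 1}
    return store[idempotency_key]

# A client that times out and retries reuses the same key, so the second call
# returns the stored result instead of charging the customer twice.
print(charge_once("trip-42-payment", 1050))
print(charge_once("trip-42-payment", 1050))
```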

There are many ways to achieve idempotency, and the specific choice depends on the system's constraints and the type of operation. Designing idempotent approaches is a worthy challenge for a developer - see Ben Nadel's posts about the different strategies he has used, which include both distributed locks and database constraints. When designing a distributed system, idempotency can easily be one of the parts you overlook; in our practice my team got burned by not ensuring correct idempotency for some key operations.

Why does idempotency matter when building a large payment system? Most importantly, to avoid double charges and double refunds. Given that our messaging system provides at-least-once, lossless delivery, we must assume that any message may be delivered multiple times, and the systems must guarantee idempotency. We chose to handle this with versioning and optimistic locking, where our systems implement idempotent behavior on top of a strongly consistent store as their source of data.

Sharding and quorum

Distributed systems often need to store far more data than a single node can hold. So how do we store a data set across the right number of machines? The most popular technique is sharding: the data is partitioned horizontally, with a hash used to assign each record to a partition. Although many distributed databases implement sharding under the hood, it is an interesting topic worth studying in its own right, especially resharding. In 2010 Foursquare suffered a 17-hour outage because of a sharding edge case; the company later published a report that sheds light on the root of the problem.
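A common way to assign records to shards is a consistent hash ring, so that adding or removing a node remaps only a fraction of the keys. A minimal sketch, with made-up shard names:

```python
# Sketch of hash-based sharding with a consistent hash ring, so adding a node
# only remaps a fraction of the keys (shard names are illustrative).
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node), with virtual nodes
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["shard-a", "shard-b", "shard-c"])
print(ring.node_for("payment:12345"))   # the same key always maps to the same shard
```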

Many distributed systems have data or computations replicated across multiple nodes. To ensure operations are performed consistently, a voting approach is used: a certain number of nodes must obtain the same result for the operation to be considered successful. This is called a quorum.

Why do quorum and sharding matter when building a large payment system at Uber? Both concepts are simple and are used almost everywhere. I encountered them when setting up replication in Cassandra. Cassandra (like other distributed systems) uses quorum and local quorum to ensure consistency across clusters.
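As an illustration of how a client asks for quorum-level consistency, here is a sketch using the DataStax Cassandra driver for Python; the keyspace, table, and contact points are hypothetical:

```python
# Sketch of quorum reads/writes with the DataStax Cassandra driver
# (keyspace, table, and host names are hypothetical).
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement
from cassandra import ConsistencyLevel

cluster = Cluster(["cassandra-1", "cassandra-2", "cassandra-3"])
session = cluster.connect("payments")

write = SimpleStatement(
    "INSERT INTO transactions (id, amount) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM)  # majority of replicas in the local data center
session.execute(write, ("tx-42", 1050))

read = SimpleStatement(
    "SELECT amount FROM transactions WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM)        # majority across the cluster
row = session.execute(read, ("tx-42",)).one()
```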

Actor model

The usual vocabulary we use to describe programming practice - variables, interfaces, method calls - assumes a single machine. When we talk about distributed systems we need different approaches. A common way to describe such systems is the actor model, in which the code is viewed in terms of communication. This model is popular because it matches the mental model of, for example, how people interact in an organization. Another, no less popular, way to describe distributed systems is CSP, communicating sequential processes.

The actor model is based on actors that send messages to each other and react to them. Each actor can do only a limited set of things: create other actors, send messages to other actors, or decide what to do with the next message. With a few simple rules we can describe quite complex distributed systems, and such systems can recover after an actor crashes. If you are not familiar with this approach, I recommend reading the article on the topic.
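To make the idea concrete, here is a toy actor in Python: a mailbox plus a loop that reacts to one message at a time. Real actor runtimes (Erlang/OTP, Akka) add supervision, addressing, and distribution on top of this:

```python
# Toy actor: a mailbox (queue) plus a loop that reacts to one message at a time.
# A sketch of the idea only, not a production actor runtime.
import queue
import threading

class CounterActor:
    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0
        threading.Thread(target=self._run, daemon=True).start()

    def send(self, message):               # the only way to interact with the actor
        self._mailbox.put(message)

    def _run(self):
        while True:
            message = self._mailbox.get()  # react to messages one at a time
            if message == "increment":
                self._count += 1
            elif message == "stop":
                break

counter = CounterActor()
counter.send("increment")
counter.send("stop")
```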

This type of system is more complex in terms of organization. The essence of such distributed systems is that each node stores local copies of the important data it works with.

Schematically this architecture can be represented as shown in Fig. 5.6.

Fig. 5.6. Distributed systems architecture

More than 95% of the data used in enterprise management can be placed on a single personal computer, allowing it to operate independently. The stream of corrections and additions generated on that computer is negligible compared with the amount of data used. Therefore, if continuously used data is stored on the computers themselves and the exchange between them is limited to corrections and additions to the stored data, the total transmitted traffic drops sharply. This makes it possible to lower the requirements for the communication channels between computers and to use asynchronous communication more often, and consequently to build reliably functioning distributed information systems over unstable links such as the Internet, mobile networks, and commercial satellite channels. Minimizing the traffic between elements also keeps the cost of operating such links affordable. Of course, implementing such a system is not trivial and requires solving a number of problems, one of which is timely data synchronization.

Each workstation is independent, containing only the information it needs to work with, and the relevance of data throughout the system is ensured through continuous exchange of messages with other workstations. Messaging between workstations can be implemented in various ways, from sending data via email to transmitting data over networks.

Another advantage of this operating scheme and system architecture is that it makes personal responsibility for data safety possible. Since the data available at a specific workplace resides only on that computer, the use of encryption tools and personal hardware keys excludes access to the data by third parties, including IT administrators.

Such an architecture also allows distributed computing to be organized across the client machines. For example, a task requiring heavy computation can be distributed among neighboring workstations, since they generally hold the same information in their databases, thereby achieving maximum system performance.

Distributed systems with replication

Data is transferred between workstations and a centralized data store by replication (Fig. 5.7). When information is entered at a workstation, the data is first recorded in the local database and only then synchronized.

Fig. 5.7. Architecture of distributed systems with replication
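A minimal sketch of the "write locally, then synchronize" pattern described in this section; the conflict rule (last writer wins) and all names are illustrative assumptions:

```python
# Sketch of "write locally, then synchronize": each workstation applies a change
# to its local store and appends it to an outbox that is later exchanged with
# other nodes. Names and the conflict rule are illustrative.
import time
import uuid

class Workstation:
    def __init__(self, name):
        self.name = name
        self.local_db = {}   # local copy of the shared data
        self.outbox = []     # corrections/additions not yet sent to peers

    def write(self, key, value):
        change = {"id": str(uuid.uuid4()), "ts": time.time(),
                  "key": key, "value": value, "origin": self.name}
        self.local_db[key] = value     # 1. record in the local database
        self.outbox.append(change)     # 2. queue the change for replication
        return change

    def apply_remote(self, change):
        # Last-writer-wins is only one possible conflict-resolution rule.
        self.local_db[change["key"]] = change["value"]

a, b = Workstation("A"), Workstation("B")
change = a.write("customer:7", {"name": "Ivanov"})
b.apply_remote(change)   # the synchronization step (e.g. over e-mail or a network link)
```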

Distributed systems with elements of remote execution

There are certain features that cannot be implemented efficiently in a conventional distributed replicated system. These include:

    using data from entities that are stored on a remote server (node);

    partial use of data from entities stored on different servers (nodes);

    use of separate functionality on a dedicated server (node).

Each of the described types uses a common principle: the client program either accesses a dedicated (remote) server directly or accesses a local database, which encapsulates access to a remote server (Fig. 5.8).

Fig. 5.8. Architecture of distributed systems with remote execution

Open source software has become a core building block in the creation of some of the world's largest websites. With the growth of these websites, best practices and guidelines for their architecture have emerged. This chapter aims to cover some of the key issues to consider when designing large websites, as well as some of the basic components used to achieve these goals.

The focus of this chapter is on the analysis of web-based systems, although some of the material may be extrapolated to other distributed systems.

1.1 Principles of building distributed web systems

What exactly does it mean to create and manage a scalable website or application? At a basic level it is simply connecting users to remote resources over the Internet; what makes it scalable is that the resources, or access to those resources, are distributed across many servers.

Like most things in life, time spent planning up front before building a web service pays off later; understanding some of the considerations and trade-offs behind large websites leads to smarter decisions when building smaller ones. Below are some of the key principles that influence the design of large-scale web systems:

  • Availability: Website uptime is critical to the reputation and functioning of many companies. For some of the larger online retailers, being unavailable for even a few minutes can mean thousands or millions of dollars in lost revenue. Designing systems to be always available and resilient to failure is therefore both a fundamental business and a technology requirement. High availability in distributed systems requires careful thought about redundancy of key components, rapid recovery from partial system failures, and graceful degradation when problems arise.
  • Performance: Website performance has become an important metric for most websites. Website speed impacts user experience and satisfaction, as well as search engine rankings—a factor that directly impacts audience retention and revenue. As a result, the key is to create a system that is optimized for fast responses and low latency.
  • Reliability: The system must be reliable, so that a request for data consistently returns the same data. If the data is changed or updated, the same request should return the new data. Users need to know that if something has been written or stored in the system, it will remain in place and can be retrieved later.
  • Scalability: When it comes to any large distributed system, size is just one item on a list that needs to be considered. Equally important are efforts to increase the throughput to handle large volumes of workload, which is usually referred to as system scalability. Scalability can refer to various parameters of a system: the amount of additional traffic it can handle, how easy it is to add storage capacity, or how many more other transactions can be processed.
  • Manageability: Designing a system that is easy to operate is another important factor. The manageability of a system equates to the scalability of its "maintenance" and "update" operations. Ensuring manageability means considering how easy it is to diagnose and understand problems when they occur, how easy it is to make updates or modifications, and how simple the system is to operate (that is, does it routinely work without failures or exceptions?).
  • Cost: Cost is an important factor. It obviously includes hardware and software costs, but it is also important to consider the other aspects needed to deploy and maintain the system: the developer time required to build it, the operational effort required to run it, and even the necessary training. Cost is the total cost of ownership.

Each of these principles provides a basis for decisions when designing a distributed web architecture. However, they can also conflict with each other, because achieving the goals of one may come at the expense of another. A simple example: choosing to address performance (scalability) by simply adding more servers increases the cost of management (you have to operate an additional server) and of purchasing the hardware.

When developing any kind of web application, it is important to consider these key principles, even if it is to confirm that the project can sacrifice one or more of them.

1.2 Basics

When considering system architecture, there are several issues that need to be addressed, such as which components are worth using, how they fit together, and what trade-offs can be made. Investing money in scaling without a clear need for it is not a smart business decision. However, some forethought in planning can save significant time and resources in the future.

This section focuses on some basic factors that are critical to almost all large web applications: services, redundancy, segmentation, and failure handling. Each of these factors involves choices and trade-offs, especially in the context of the principles described in the previous section. To clarify, let's give an example.

Example: Image Hosting Application

You've probably posted images online before. For large sites that store and deliver many images, there are challenges in creating a cost-effective, highly reliable architecture that has low response latencies (fast retrieval).

Imagine a system where users have the ability to upload their images to a central server, and where images can be requested through a site link or API, similar to Flickr or Picasa. To simplify the description, let's assume that this application has two main tasks: the ability to upload (write) images to the server and request images. Of course, efficient loading is an important criterion, but fast delivery when requested by users (for example, images may be requested for display on a web page or by another application) will be a priority. This functionality is similar to what a web server or Content Delivery Network (CDN) edge server can provide. A CDN server typically stores data objects in multiple locations, thus bringing them geographically/physically closer to users, resulting in improved performance.

Other important aspects of the system:

  • The number of images stored can be unlimited, so storage scalability must be considered from this point of view.
  • There should be low latency for image downloads/requests.
  • If a user uploads an image to the server, its data must always remain intact and accessible.
  • The system must be easy to maintain (manageability).
  • Since image hosting does not generate much profit, the system must be cost-effective.

Another potential problem with this design is that a web server such as Apache or lighttpd usually has an upper limit on the number of simultaneous connections it can service (the default is approximately 500, but it can be much higher), and under high traffic writes can quickly consume this limit. Since reads can be asynchronous or take advantage of other performance optimizations such as gzip compression or chunked transfer, the web server can serve reads faster and switch between clients, handling far more requests than the maximum number of connections (with Apache and a connection limit of 500, it is quite possible to serve several thousand read requests per second). Writes, on the other hand, tend to keep the connection open for the entire duration of the upload: transferring a 1 MB file to the server can take more than a second on most home networks, so such a web server could only handle 500 simultaneous writes.


Figure 1.2: Read/Write Separation

Anticipating this potential problem suggests separating image reads and writes into independent services, as shown in Figure 1.2. This lets us not only scale each of them individually (since we will likely always do far more reads than writes), but also keep track of what is happening in each service separately. Finally, it separates future concerns, making it easier to diagnose and evaluate problems such as slow read access.

The advantage of this approach is that we can solve problems independently of each other, without having to worry about writing and retrieving new images in the same context. Both services still operate on the global corpus of images, but they are free to optimize their own performance with service-appropriate techniques (for example, queuing requests or caching popular images - more on this below). From both a maintenance and a cost perspective, each service can be scaled independently as needed. This is a good thing, since combining and mixing them could inadvertently affect their performance, as in the scenario described above.

Of course, the above model works best when there are two distinct endpoints (in fact, this is very similar to several implementations of cloud storage providers and Content Delivery Networks). There are many ways to solve problems of this kind, and in each case a different trade-off can be found.

For example, Flickr solves this read/write problem by distributing users across different pods, so that each pod serves only a limited set of users; as the number of users grows, more pods are added to the cluster (see Flickr's scaling presentation, http://mysqldba.blogspot.com/2008/04/mysql-uc-2007-presentation-file.html). In the first example it is easier to scale the hardware based on the actual usage load (the number of reads and writes across the whole system), whereas Flickr scales according to its user base (although this assumes equal usage across users, so capacity has to be planned with some headroom). In the first example, an outage or a problem with one of the services brings down functionality across the whole system (for instance, no one can write files), whereas an outage of one Flickr pod affects only the users it serves. In the first example it is easier to perform operations over the entire data set - for example, updating the write service to include new metadata or searching across all image metadata - whereas with the Flickr architecture each pod would have to be updated or searched (or a search service would have to be created to collate the metadata intended for that purpose).

There is no panacea for such systems; you should always start from the principles described at the beginning of this chapter: determine the system's needs (read load, write load, or both; level of parallelism; queries over the data set; ranges; sorting; etc.), benchmark the alternatives, understand the potential failure modes of the system, and develop a comprehensive plan for handling failures.

Redundancy

To handle failure gracefully, a web architecture must have redundancy of its services and data. For example, if there is only one copy of a file, stored on a single server, losing that server means losing the file. Such a situation can hardly be considered acceptable, and it can usually be avoided by creating multiple, redundant copies.

The same principle applies to services. If a piece of functionality is integral to the application, ensuring that several copies or versions of it run simultaneously protects against the failure of a single node.

Creating redundancy in the system allows you to get rid of weak points and provide backup or redundant functionality in case of an emergency. For example, if there are two instances of the same service running in production, and one of them fails completely or partially, the system can overcome the failure by switching to a working copy.
Switching may occur automatically or require manual intervention.


Another key role of service redundancy is to create a shared-nothing architecture. In such an architecture each node can operate independently, with no central "brain" managing state or coordinating the other nodes. This aids scalability, because new nodes can be added without special conditions or knowledge. Most importantly, such systems have no single point of failure, which makes them far more resilient.


For example, in our image server application, all images would have redundant copies somewhere on separate hardware (ideally in a different geographic location in case of a disaster such as an earthquake or a data-center fire), and the services providing access to the images would also be redundant, with all of them potentially serving requests. (See Figure 1.3.)
Looking ahead, load balancers are a great way to make this possible, but more on that below.


Figure 1.3: Redundant Image Hosting Application

Segmentation

Data sets can be so large that they do not fit on a single server. It may also happen that computational operations require too much computing resource, degrading performance and making additional capacity necessary. Either way, you have two options: vertical or horizontal scaling.

Vertical scaling means adding more resources to a single server. For a very large data set this might mean adding more (or larger) hard drives so that the entire data set fits on one server. For compute operations it means moving the computation to a larger server with a faster CPU or more memory. In each case, vertical scaling makes a single resource capable of handling more on its own.

Horizontal scaling, on the other hand, involves adding more nodes. In the case of a large data set, this would mean adding a second server to store part of the total data volume, and for a computing resource, this would mean dividing the work or load across some additional nodes. To take full advantage of the potential of horizontal scaling, it must be implemented as an internal design principle of the system architecture. Otherwise, changing and isolating the context needed for horizontal scaling can be problematic.

The most common approach to horizontal scaling is to divide services into segments or shards. These can be distributed so that each logical set of functionality runs separately, split along geographic boundaries or by other criteria such as paying versus non-paying users. The advantage of such schemes is that the service or data store gains additional capacity.

In our image server example, the single file server used to store images could be replaced by multiple file servers, each containing its own unique set of images. (See Figure 1.4.) Such an architecture lets the system fill each file server with images and add more servers as disk space runs out. The design requires a naming scheme that ties an image file's name to the server holding it. The image name could be generated from a consistent hashing scheme mapped onto the servers, or each image could be given an incremental ID, so that when an image is requested the delivery service only needs to know the range of IDs associated with each server (like an index).


Figure 1.4: Image hosting application with redundancy and segmentation
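As a sketch of the incremental-ID variant mentioned above, the delivery service only needs an ordered list of range boundaries to find the server holding an image; the boundaries and server names are made up:

```python
# Sketch of range-based placement: each file server owns a contiguous range of
# incremental image IDs (boundaries and server names are illustrative).
import bisect

# Upper bound (inclusive) of the ID range stored on each server, ascending.
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]
SERVERS = ["img-server-1", "img-server-2", "img-server-3"]

def server_for_image(image_id: int) -> str:
    idx = bisect.bisect_left(RANGE_UPPER_BOUNDS, image_id)
    if idx >= len(SERVERS):
        raise ValueError("image id beyond provisioned ranges; add a new server")
    return SERVERS[idx]

print(server_for_image(1_500_042))   # -> img-server-2
```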

Of course, distributing data or functionality across many servers brings difficulties. One of the key questions is data locality: in distributed systems, the closer the data is to the point of operation or computation, the better the performance. Spreading data across multiple servers is therefore potentially problematic: whenever data is needed, there is a risk that it is not where it is required, and the server has to perform a costly fetch over the network.

Another potential problem comes in the form of inconsistency. When different services read from and write to a shared resource, potentially another service or data store, a race condition may occur - some data is believed to have been updated to its latest state but is actually read before the update - and the data becomes inconsistent. For example, in the image hosting scenario, a race condition could occur if one client sent a request to change the title of a dog image from "Dog" to "Gizmo" while another client was reading the image. In that situation it is unclear which title, "Dog" or "Gizmo", the second client would receive.


There are certainly obstacles to segmenting data, but segmentation lets you isolate problems from one another - by data, by load, by usage pattern, and so on - into manageable blocks. This helps with scalability and manageability, though risk remains. There are many ways to reduce the risk and handle failures, but in the interests of brevity they are not covered in this chapter. If you want more information on this topic, see the blog post on fault tolerance and monitoring.

1.3. Building blocks for fast and scalable data access

Having looked at some basic principles in distributed system development, let's now move on to a more complex issue - scaling data access.

The simplest web applications, such as LAMP-stack applications, look like the one in Figure 1.5.


Figure 1.5: Simple web applications

As an application grows, two main challenges arise: scaling access to the application server and to the database. In a highly scalable application design, the web or application server is typically minimized and often embodies a shared-nothing architecture. This makes the application-server layer of the system horizontally scalable. As a result, the hard work is pushed down the stack to the database server and supporting services; this is the layer where the real scaling and performance problems come into play.

The rest of this chapter covers some of the most common strategies and techniques for improving the performance and scalability of these types of services by providing fast access to data.


Figure 1.6: Simplified web application

Most systems can be simplified to the scheme in Figure 1.6, which is a good place to start. If you have a lot of data, you want it to be as easy and quick to reach as the box of candy in the top drawer of your desk. Although this comparison is overly simplistic, it points at two hard problems: scalability of data storage and fast access to data.

For the purposes of this section, let's assume you have many terabytes (TB) of data and you want to let users access small parts of that data in random order. (See Figure 1.7.) A similar task is locating an image file somewhere on a file server in the sample image hosting application.


Figure 1.7: Access to specific data

This is particularly difficult because loading terabytes of data into memory is very expensive, and it directly affects disk I/O. Reading from disk is many times slower than reading from RAM - memory access is as fast as Chuck Norris, while disk access is slower than the queue at the clinic. This speed difference really adds up for large data sets: in raw numbers, memory access is about 6 times faster than disk for sequential reads and about 100,000 times faster for random reads (see "Pathologies of Big Data", http://queue.acm.org/detail.cfm?id=1563874). Moreover, even with unique identifiers, locating a small piece of data can be as hard as trying to pick the last chocolate-filled candy out of a box of hundreds of other candies without looking.

Fortunately, there are many approaches that can be taken to simplify things, with the four most important approaches being the use of caches, proxies, indexes, and load balancers. The remainder of this section discusses how each of these concepts can be used to make data access much faster.

Caches

Caching takes advantage of a basic principle: recently requested data is likely to be needed again. Caches are used at almost every level of computing: hardware, operating systems, web browsers, web applications, and more. A cache is like short-term memory: limited in size but faster than the original data source, and holding the most recently accessed items. Caches can exist at all levels of an architecture, but they are often found at the level closest to the front end, where they can return data quickly without putting load on the backend.

So how can a cache be used to speed up data access within our example API? In this case there are several suitable places for a cache. One option is to place it on the request-level node, as shown in Figure 1.8.


Figure 1.8: Cache placement on a query-level node

Placing the cache directly on the request-level node allows local storage of response data. Each time a service request is made, the node quickly returns the locally cached data if it exists; if it is not in the cache, the node fetches the data from disk. The cache on a single request-level node can live either in memory (which is very fast) or on the node's local disk (still faster than going to network storage).
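A minimal sketch of such a request-node cache: the node answers from local memory when it can and falls back to the slower backing store on a miss. The DISK dictionary stands in for disk or network storage:

```python
# Sketch of a request-node cache: check local memory first, fall back to the
# slower backing store on a miss (the DISK dict stands in for real storage).
from collections import OrderedDict

DISK = {"image:42": b"...jpeg bytes..."}   # stand-in for disk / network storage

class NodeCache:
    def __init__(self, max_items=1024):
        self._items = OrderedDict()
        self._max_items = max_items

    def get(self, key):
        if key in self._items:             # cache hit: answer from local memory
            self._items.move_to_end(key)
            return self._items[key]
        value = DISK.get(key)              # cache miss: go to the backing store
        if value is not None:
            self._items[key] = value
            if len(self._items) > self._max_items:
                self._items.popitem(last=False)   # evict the least recently used item
        return value

cache = NodeCache()
cache.get("image:42")   # first call reads from "disk"
cache.get("image:42")   # second call is served from the node's memory
```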


Figure 1.9: Cache systems

What happens when you spread caching across many nodes? As you can see, if the request level includes many nodes, then it is likely that each node will have its own cache. However, if your load balancer randomly distributes requests among nodes, then the same request will go to different nodes, thus increasing cache misses. Two ways to overcome this obstacle are global and distributed caches.

Global cache

The meaning of a global cache is clear from the name: all nodes share one single cache space. In this case, you add a server or file storage of some kind that is faster than your original storage and that will be available to all request level nodes. Each of the request nodes queries the cache in the same way as if it were local. This type of caching scheme can cause some problems, since a single cache can be very easily overloaded if the number of clients and requests increases. At the same time, this scheme is very effective on certain architectures (especially those associated with specialized hardware that makes this global cache very fast, or that has a fixed set of data that must be cached).

Two standard forms of global cache are shown in the following diagrams. In Figure 1.10, when a cached response is not found, the cache itself is responsible for retrieving the missing data from the underlying storage. In Figure 1.11, the request nodes are responsible for retrieving any data not found in the cache.


Figure 1.10: Global cache, where the cache is responsible for retrieval



Figure 1.11: Global cache, where request nodes are responsible for retrieval

Most applications that use global caches tend to use the first type, where the cache itself manages eviction and data fetching, to prevent clients from flooding it with requests for the same data. However, there are cases where the second implementation makes more sense. For example, if the cache is used for very large files, a low cache hit rate would cause the cache buffer to be overwhelmed by misses; in this situation it helps to keep a large percentage of the overall data set (or the hot data set) in the cache. Another example is an architecture where the files stored in the cache are static and should never be evicted (this can happen because of performance requirements around data latency - certain parts of the data may need to be very fast for large data sets - where the application logic understands the eviction strategy or the hot spots better than the cache does).

Distributed cache

These indexes are often stored in memory or somewhere very local to the incoming client request. Berkeley DB (BDB) and tree data structures, which are typically used to store data in ordered lists, are ideal for indexed access.

There are often many layers of indexes that act as a map, moving you from one location to the next, and so on, until you get the exact piece of data you need. (See Figure 1.17.)


Figure 1.17: Multi-level indexes

Indexes can also be used to create multiple other views of the same data. For large data sets, this is a great way to define different filters and views without having to create many additional copies of the data.

For example, suppose the image hosting system mentioned above actually hosts images of book pages, and the service lets clients query the text in those images, searching all the book content on a given topic the way search engines let you search HTML content. In that case, all these book images occupy many, many servers, and finding the single page to present to the user can be quite involved. First, inverted indexes for querying arbitrary words and word sets need to be readily accessible; then there is the task of navigating to the exact page and location within the book and retrieving the right image for the search results. So in this case the inverted index would map a word to a location (such as book B), and B could then contain an index with all the words, their locations, and the number of occurrences in each part.

An inverted index, which Index1 in the diagram above might represent, would look something like this: each word or set of words serves as an index pointing to the books that contain it.

The intermediate index will look similar, but will only contain the words, location, and information for book B. This layered architecture allows each index to take up less space than if all this information were stored in one large inverted index.
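A small sketch of this layered scheme: a top-level inverted index maps each word to the books containing it, and a per-book index maps the word to page numbers. The books and text are invented for illustration:

```python
# Sketch of the layered index described above: a top-level inverted index maps a
# word to the books containing it, and a per-book index maps the word to pages.
# The books and their contents are made-up examples.
from collections import defaultdict

books = {
    "book-B": {1: "the quick brown fox", 2: "jumps over the lazy dog"},
    "book-C": {1: "the fox returns"},
}

word_to_books = defaultdict(set)                            # top-level inverted index
book_word_to_pages = defaultdict(lambda: defaultdict(set))  # per-book index

for book_id, pages in books.items():
    for page_no, text in pages.items():
        for word in text.split():
            word_to_books[word].add(book_id)
            book_word_to_pages[book_id][word].add(page_no)

# Query: first find candidate books, then the exact pages inside each book.
for book_id in sorted(word_to_books["fox"]):
    print(book_id, sorted(book_word_to_pages[book_id]["fox"]))
```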

This is a key point in large-scale systems, because even compressed, these indexes can become quite large and expensive to store. Let's assume we have many of the world's books in this system - 100,000,000 (see the blog post "Inside Google Books") - and that each book is only 10 pages long (to simplify the calculation) with 250 words per page: that gives 250 billion words. If we take the average word to be 5 characters, and encode each character with 8 bits (1 byte, even though some characters actually take 2 bytes), thus spending 5 bytes per word, then an index containing each word only once would require more than a terabyte of storage. So you can see that indexes that also hold other information, such as word sets, data locations, and occurrence counts, can grow very quickly.

Creating such intermediate indexes and presenting the data in smaller chunks makes big-data problems tractable: data can be spread across many servers and still be accessed quickly. Indexes are the cornerstone of information retrieval and the basis of today's modern search engines. This section only scratches the surface of the topic; a great deal of research has gone into making indexes smaller, faster, richer in information (such as relevance), and easier to update. (There are issues with managing concurrency, as well as with the number of updates required to add new data or change existing data, especially where relevance or scoring is involved.)

Being able to find your data quickly and easily is important, and indexes are the simplest and most effective tool for achieving this goal.

Load Balancers

Finally, another critical piece of any distributed system is the load balancer. Load balancers are a core part of any architecture, because their role is to distribute load across the nodes responsible for serving requests. This allows multiple nodes to serve the same function in the system transparently. (See Figure 1.18.) Their main purpose is to handle many simultaneous connections and route them to one of the request nodes, allowing the system to scale by simply adding nodes to serve more requests.


Figure 1.18: Load Balancer

There are many different algorithms for serving requests, including picking a random node, round robin, or even selecting a node based on criteria such as CPU or RAM usage. Load balancers can be implemented as hardware devices or in software; among open-source software load balancers, HAProxy is the most widely used.
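As a sketch of the simplest of these strategies, round robin, here is a toy balancer that hands requests to backend nodes in turn; the node names are illustrative:

```python
# Sketch of round-robin balancing: requests are handed to backend nodes in turn
# (node names are illustrative; a real balancer also proxies the traffic).
import itertools

class RoundRobinBalancer:
    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._cycle = itertools.cycle(self._nodes)

    def pick(self) -> str:
        return next(self._cycle)

    def remove(self, node: str):
        # A health check would call this when a node stops responding.
        self._nodes.remove(node)
        self._cycle = itertools.cycle(self._nodes)

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([lb.pick() for _ in range(4)])   # ['app-1', 'app-2', 'app-3', 'app-1']
```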

In a distributed system, load balancers are often placed at the very "front edge" of the system, so that all incoming requests pass through them. In a complex distributed system it is quite likely that a request will pass through several balancers, as shown in Figure 1.19.


Figure 1.19: Multiple load balancers

Like proxies, some load balancers can also route requests differently depending on the type of request. They are also known as reverse proxies.

Managing data specific to a particular user session is one of the challenges of using load balancers. On an e-commerce site, when everything runs on a single node, it is easy to let users put items in their shopping cart and keep its contents between visits (which matters, because the likelihood of a sale rises significantly if the product is still in the cart when the user returns). However, if a user is directed to one node for the first session and to a different node on the next visit, inconsistencies may arise, because the new node may know nothing about the contents of that user's cart. (Wouldn't you be upset if you put a pack of Mountain Dew in your cart and it wasn't there when you came back?) One solution is to make sessions "sticky", so that the user is always directed to the same node, but then it becomes much harder to take advantage of reliability features such as automatic failover. With sticky sessions the user's cart always has its contents, but if the sticky node becomes unavailable, a special approach is needed and the assumption about the cart's contents no longer holds (though hopefully that assumption is not baked into the application). This problem can, of course, be addressed with other strategies and tools described in this chapter, such as services, and with others besides (browser caches, cookies, and URL rewriting).

If the system has only a few nodes, a technique such as round-robin DNS is likely to be more practical than a load balancer, which can be expensive and add an unnecessary layer of complexity. Of course, in large systems there are all sorts of scheduling and load-balancing algorithms, from simple ones such as randomization or round robin to more sophisticated mechanisms that take into account the system's usage pattern and performance characteristics. All of these algorithms distribute traffic and requests, and they can provide useful reliability features such as automatic failover or automatic removal of a failed node (for example, when it stops responding). These advanced features can, however, make diagnosing problems cumbersome. For example, under heavy load a load balancer will remove nodes that are slow or timing out (because of the barrage of requests), which only makes things worse for the remaining nodes. Extensive monitoring is important in such cases, because overall system traffic and load may appear to be dropping (since the nodes are serving fewer requests) while individual nodes are stretched to their limits.

Load balancers are a simple way to increase system capacity and, like the other techniques described in this chapter, play an essential role in distributed system architecture. Load balancers also provide the critical function of checking the health of nodes: if a node is not responding or is overloaded, it can be removed from the pool handling requests and, thanks to the redundancy of the system, the load will be redistributed among the remaining working nodes.

Queues

So far we have looked at many ways to read data quickly. Another important part of scaling the data layer is effective management of writes. When systems are simple, with minimal processing load and small databases, writes can be predictably fast. In more complex systems, however, they may take an indeterminately long time. For example, data may have to be written to several places on different servers or indexes, or the system may simply be under high load. In cases where writes, or indeed any task, take a long time, achieving performance and availability requires building asynchrony into the system. A common way to do this is to introduce a request queue.


Figure 1.20: Synchronous request

Imagine a system in which each client requests that a task be processed remotely. Each client sends its request to the server, the server completes the tasks as quickly as possible, and it returns the results to the corresponding clients. In small systems, where a single server (or logical service) can serve incoming clients as fast as they arrive, this works fine. However, when the server receives more requests than it can handle, each client has to wait for the other clients' requests to finish before its own response is produced. This is an example of a synchronous request, shown in Figure 1.20.

This kind of synchronous behavior can severely degrade client performance: the client is forced to sit idle, waiting for a response. Adding more servers to absorb the load does not really solve the problem; even with effective load balancing in place, it is extremely difficult to ensure the even, fair distribution of work needed to maximize client throughput. Moreover, if the server handling the request becomes unavailable (or crashes), the client connected to it fails as well. Solving this problem effectively requires an abstraction between the client's request and the actual work performed to serve it.


Figure 1.21: Using Queues to Manage Requests

Enter queues. A queue works very simply: a task arrives, is added to the queue, and "workers" pick up the next task as soon as they are able to process it. (See Figure 1.21.) These tasks may be simple writes to a database or something as complex as generating a preview image for a document. When a client submits task requests to a queue, it no longer needs to wait for the results; it only needs confirmation that the request was properly received. This confirmation can later serve as a reference to the results when the client asks for them.

Queues allow clients to work in an asynchronous manner, providing a strategic abstraction of the client request and response. On the other hand, in a synchronous system, there is no differentiation between the request and the response, and therefore they cannot be managed separately. In an asynchronous system, the client submits a task, the service responds with a message confirming that the task has been received, and the client can then periodically check the status of the task, only requesting the result once it has completed. While the client is making an asynchronous request, it is free to do other work, and even make asynchronous requests from other services. The latter is an example of how queues and messages work in distributed systems.
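A minimal sketch of this pattern: clients enqueue tasks and immediately receive an identifier, a worker drains the queue, and results are stored for later retrieval:

```python
# Sketch of the queue pattern: clients enqueue tasks and get an id back at once;
# a worker drains the queue and records results for later retrieval.
import queue
import threading
import uuid

tasks = queue.Queue()
results = {}          # task_id -> result, filled in asynchronously

def worker():
    while True:
        task_id, payload = tasks.get()
        results[task_id] = f"processed {payload}"   # e.g. a DB write or a thumbnail
        tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

def submit(payload) -> str:
    task_id = str(uuid.uuid4())
    tasks.put((task_id, payload))
    return task_id    # acknowledgement: a handle to the work, not the result itself

tid = submit("resize image 42")
tasks.join()          # a real client would instead poll for the result later
print(results[tid])
```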

Queues also provide some protection against service interruptions and failures. For example, it is quite easy to create a very resilient queue that can retry service requests that fail due to momentary server failures. It is preferable to use a queue to enforce quality of service guarantees rather than expose clients to temporary service interruptions, requiring complex and often inconsistent error handling on the client side.

Queues are fundamental to managing distributed communication between the various parts of any large-scale distributed system, and there are many ways to implement them. There are quite a few open-source queue implementations, such as RabbitMQ, ActiveMQ, and BeanstalkD, and some systems rely on hosted queue services instead.

Distributed automated information systems have become an everyday reality. Numerous corporate automated information systems (AIS) use distributed databases. Methods of data distribution and distributed data management have been worked out, as have architectural approaches that ensure system scalability, implementing the principles of multi-tier client-server architecture and of middleware-based architecture.

Mobile architectures are beginning to be put into practice. This applies to both database systems and Web applications.

An approach to building distributed systems is being revived, based on a peer-to-peer architecture (Peer-to-Peer), in which, unlike the client-server architecture dominant in distributed systems today, the roles of the interacting parties in the network are not fixed. They are assigned depending on the situation in the network and the load on its nodes.

Thanks to the intensive development of communication technologies, mobile AIS are developing actively, and the hardware and software for creating them have matured. This has allowed mobile database systems to emerge. Many research groups study the specific features of such systems, and a variety of prototypes have been created. Java technologies have become an essential tool for developing mobile software.

A protocol standard for wireless access to Web applications has been created (the Wireless Application Protocol, WAP), and it is already supported by some models of cell phones. Based on WAP and XML, the W3C consortium developed WML (Wireless Markup Language), a markup language for wireless communications.

In AIS development, more attention is now paid to metadata. Steps are being taken in two directions: standardizing the representation of metadata and ensuring its support in systems.

AIS uses a variety of methods and means of presenting metadata (various types of metadata repositories). The lack of unification in this area greatly complicates the solution of application mobility problems, reuse and integration of information resources and information technologies, as well as AIS reengineering.

To overcome these difficulties, metadata standards focused on various information technologies are being actively developed. In this area, a number of international, national and industrial standards already exist that define the presentation of metadata and the exchange of metadata in AIS. Some of them have already acquired the status of de facto standards. We will limit ourselves here to mentioning only the most significant of them.

Probably the first de facto standard in this category was the CODASYL data description language for network structure databases. More recent standards include: the SQL query language standard for relational databases, which contains the definition of the so-called information schema - a set of representations of relational database schemas; a component of the ODMG object database standard that describes object schema repository interfaces; international standard IRDS (Information Resource Dictionary Systems), which describes systems for creating and maintaining directories of organizational information resources.

Next, we should mention the CWM (Common Warehouse Metamodel) standard developed by the OMG consortium for representing data warehouse metadata, based on the OIM (Open Information Model) standard previously created for broader purposes by the MDC (Meta Data Coalition) consortium.

The new XML technology platform for the Web also includes standards for representing metadata. Metadata support is one of the most important innovations of the Web, radically changing the technology for managing its information resources. While metadata support was intrinsically necessary in database technologies, the first-generation Web did not support metadata at all.

Web metadata standards include a subset of XML used to describe the logical structure of XML documents of a given type; such a description is called a DTD (Document Type Definition). In addition, the XML platform includes the XML Schema standard, which offers richer facilities for describing XML documents. The RDF (Resource Description Framework) standard defines a simple knowledge representation language for describing the content of XML documents. Finally, the evolving OWL (Web Ontology Language) standard defines a formal ontology description language intended for the Semantic Web.

The UML (Unified Modeling Language) standard, developed by the OMG consortium, provides a metadata representation for CASE tools for visual object-oriented analysis and design. This language is supported by many CASE software products. The OMG consortium has also created the XMI (XML Metadata Interchange) standard for exchanging metadata between CASE tools that use UML.

It is also worth mentioning here the Dublin Core (DC) standard - a set of metadata elements for describing the content of documents of various natures. This standard quickly gained popularity and found, in particular, wide application in the Web environment (see Section 3.3).

Work on developing existing and creating new standards for presenting metadata for AIS continues. More detailed information about the standards in question can be found in the encyclopedia.