Architecture of distributed information systems and Web applications. Management of infrastructure services. Web Services Management Standard

Web services are a new development in distributed systems technology. Sun Microsystems' Open Net Environment (ONE) specification and Microsoft's .NET initiative provide the infrastructure for writing and deploying Web services. There are currently several definitions of a Web service. In the broadest sense, a Web service can be any application that is accessible over the Web, such as a Web page with dynamic content. In a narrower sense, a Web service is an application that exposes an open interface suitable for use by other applications on the Web. Sun's ONE specification requires that Web services be accessible over HTTP and other Web protocols, exchange information via XML messages, and be discoverable through special search services. A dedicated protocol, the Simple Object Access Protocol (SOAP), has been developed for accessing Web services; it provides XML-based interoperability for a wide range of Web services. Web services are particularly attractive because they can provide a high degree of compatibility between different systems.

A hypothetical Web service designed according to Sun's ONE architecture might take the following form: a service registry publishes the description of the Web service in the form of a document.

The enormous potential of Web services is not determined by the technologies used to create them: HTTP, XML, and the other protocols involved are not new. Rather, the interoperability and scalability of Web services mean that developers can quickly assemble larger applications and larger Web services from smaller ones. The Sun Open Net Environment specification describes an architecture for creating intelligent Web services. Intelligent Web services leverage a shared operating context; by sharing context, they can, for example, perform standard authentication for financial transactions, or provide recommendations and guidance based on the geographic location of the companies involved in an e-business transaction.

To create an application that is a Web service, a number of technologies must be applied.

The relationship between these technologies is shown schematically in Fig. 10.1.


Fig. 10.1.

In fact, Web services are one implementation of the component architecture, in which an application is viewed as a collection of interacting components. As has already been noted many times, the interaction of components running on different platforms is a difficult task; in particular, it requires a communication protocol that takes into account the peculiarities of data transfer between platforms. One of the key ideas underlying Web services technology is the rejection of a binary communication protocol: system components exchange messages in XML. Since XML messages are text files, the transport protocol can vary widely; XML messages can be transmitted over HTTP, SMTP, or FTP, and the choice of transport protocol is transparent to the applications. As already mentioned, the protocol that allows Web services to interact is SOAP (Simple Object Access Protocol). It is defined on top of XML and enables the interaction of distributed systems regardless of the object model or platform in use. Data within SOAP is transmitted as XML documents of a special format. SOAP does not mandate any specific transport protocol; however, in real applications SOAP messages are most often transmitted over HTTP, and SMTP, FTP, and even raw TCP are also used as transports. Thus, SOAP defines a mechanism by which Web services can call each other's functions. In a sense, the operation of this protocol resembles a remote procedure call: the caller knows the name of the Web service, the name of its method, and the parameters the method accepts; it formats the call to this method as a SOAP message and sends it to the Web service.
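To make the remote-procedure-call analogy concrete, here is a minimal sketch of a SOAP 1.1 request; the service namespace, method name, and parameter are hypothetical and chosen only for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <!-- the method name and its parameters appear as XML elements -->
    <getFrequencyResponse xmlns="http://example.org/modservice">
      <circuitDescription>R1 1 2 1000</circuitDescription>
    </getFrequencyResponse>
  </soap:Body>
</soap:Envelope>
```

Such an envelope is typically carried in the body of an HTTP POST request, but exactly the same document could be sent over SMTP or FTP, which is why the transport is transparent to the applications.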

However, the described approach is only suitable when the "signatures" of the methods implemented by the Web service are known in advance. What if they are not? To solve this problem, the Web service model introduces an additional layer for describing service interfaces. This layer is represented by a WSDL description.

As defined by the W3C, "WSDL is an XML format for describing network services as a set of endpoints operating on messages containing either document-oriented or procedure-oriented information." A WSDL document fully describes a Web service's interface to the outside world: it provides information about the services that can be obtained through the service's methods and about how to access those methods. Thus, if the signature of a Web service method is not known exactly (for example, it has changed over time), the target Web service can be queried for its WSDL description, the file in which this information is contained.
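A WSDL 1.1 document has a fixed overall structure (types, messages, port types, bindings, services). A minimal skeleton, with hypothetical names and with the binding and service sections omitted for brevity, looks like this:

```xml
<definitions name="ModService"
             targetNamespace="http://example.org/modservice"
             xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:tns="http://example.org/modservice"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- abstract messages exchanged with the service -->
  <message name="getInfoRequest"/>
  <message name="getInfoResponse">
    <part name="result" type="xsd:string"/>
  </message>
  <!-- the port type groups operations, i.e. callable methods -->
  <portType name="ModServicePort">
    <operation name="getInfo">
      <input message="tns:getInfoRequest"/>
      <output message="tns:getInfoResponse"/>
    </operation>
  </portType>
</definitions>
```

A client that retrieves this document learns that the service exposes an operation `getInfo` taking no parameters and returning a string, without any prior knowledge of the method signature.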

The next technology layer is the Universal Description, Discovery and Integration (UDDI) service. This technology maintains a registry of Web services. By connecting to this registry, a consumer can find the Web services that best suit its needs. UDDI allows the required service to be published and searched for, and these operations can be performed by a person, by another Web service, or by a special client program. UDDI, in turn, is itself a Web service.

Thus, Web services are another implementation of middleware. A distinctive feature of this technology is its independence from the underlying software and hardware, as well as its use of widely adopted open standards (such as XML) and standard communication protocols.

Currently, Web services are a very actively promoted technology and are positioned as a means of solving a number of problems.

It should be noted that so-called “standard” applications can also be built using Web services, with the server part designed as a Web service.

Simple Object Access Protocol (SOAP)

The basic protocol that ensures interaction in a Web services environment is the Simple Object Access Protocol (SOAP).


As a manuscript

Anisimov Denis Andreevich

RESEARCH AND DEVELOPMENT OF METHODS FOR BUILDING DISTRIBUTED COMPUTER-AIDED DESIGN SYSTEMS BASED ON WEB SERVICE TECHNOLOGY

Specialty: 05.13.12 – Design automation systems

Abstract of a dissertation for the academic degree of Candidate of Technical Sciences

St. Petersburg 2013

The work was carried out at the Federal State Budgetary Educational Institution of Higher Professional Education "St. Petersburg State Electrotechnical University 'LETI' named after V. I. Ulyanov (Lenin)", Department of Computer-Aided Design Systems.

Scientific supervisor: Doctor of Technical Sciences, Professor Dmitrevich Gennady Daniilovich

Official opponents:

Doctor of Technical Sciences, Professor Kutuzov Oleg Ivanovich, St. Petersburg State Electrotechnical University "LETI" named after V. I. Ulyanov (Lenin), Department of Automated Information Processing and Control Systems;

Candidate of Technical Sciences Pakhomenkov Yuri Mikhailovich, head of laboratory, Open Joint Stock Company "Concern 'Research and Production Association "Aurora"'".

Leading organization: Federal State Budgetary Educational Institution of Higher Professional Education "St. Petersburg National Research University of Information Technologies, Mechanics and Optics".

The defense of the dissertation will take place on May 23, 2013 at 16:30 at a meeting of dissertation council D212.238.02 of St. Petersburg State Electrotechnical University "LETI" named after V. I. Ulyanov (Lenin) at the address: 197376, St. Petersburg, ul. Professora Popova, 5.

The dissertation can be found in the library of St. Petersburg State Electrotechnical University "LETI". The abstract was sent out “___”__________ 2013.

Scientific Secretary of the Dissertation Council D212.238.02 N. M. Safyannikov

GENERAL DESCRIPTION OF WORK

Relevance of the research. The widespread introduction of computer-aided design (CAD) systems into engineering practice is significantly limited by the high cost of licensed software. At the same time, creating one's own CAD system involves an enormous expenditure of resources and cannot be accomplished in a short time, since the development of a modern CAD system requires hundreds of man-years. The problem is further complicated by the fact that, in real operating conditions, multifunctional integrated CAD systems are as a rule used extremely inefficiently: when solving specific tasks, each department often uses no more than 10-20% of the software in these systems, namely the part most specific to its work.

The solution to this pressing problem may lie in decentralizing the CAD architecture through a transition to distributed design systems built on Internet technologies that implement communication and information exchange between applications.

Such independently managed applications are autonomous and can interact with each other in the course of performing a common task.

Internet technology protocols provide a reliable basis for linking subsystems and do not require coordinated use of resources located in different network nodes, which significantly simplifies the process of building and operating a distributed CAD system. The main requirement for implementing such a distributed system is the consistency of the interfaces through which the individual subsystems are connected. If this requirement is met, individual distributed CAD components can be created by different developers and maintained at different sites, from which they can be delivered (possibly on a commercial basis) to customers.

The most effective method of combining subsystems into a distributed application is the organization of remote procedure calls based on a service-oriented architecture using web services. Integration based on web services in the development of decentralized CAD systems makes it possible to move to interface and interaction descriptions based on XML, providing the ability to modify and develop the software while preserving the chosen interface. Owing to the loose coupling of the individual subsystems, this ensures interaction between services on arbitrary platforms and allows existing applications to be adapted to changing design conditions.

With such an architecture, the main burden of performing computational operations falls on the web services, which solve all the problems of modeling the systems being designed; client applications are assigned only the simplest functions of preparing data and displaying modeling results.

When developing CAD software using web services, the following types of client applications can be used: console applications, window applications, and web applications.

A feature of console applications is the absence of a graphical user interface; however, they may be useful when implementing simple CAD systems for handheld computers with small screens.

Window applications provide the richest graphical capabilities and are best suited for developing distributed systems based on web services. For any web service, several client applications with different ways of implementing dialog interaction can be built.

Web applications make it possible to place all CAD software entirely online. The advantage of this structure is open access to the distributed CAD system through any browser; the disadvantage is the increased time needed to describe the components of the designed system due to waiting for a response at individual data entry steps.

Web services are called in the same way from any type of client application, and for each web service any way of implementing client applications, written in different languages, can be used. If necessary, such client applications can easily be modified to suit changing design conditions, and a web service can likewise be extended with additional methods.

Goal of the work and the main objectives of the research. This dissertation is devoted to the research and development of methods for constructing platform-independent distributed CAD systems using web services. For the concrete implementation, the task of developing a distributed circuit design automation system was chosen.

To achieve this goal, the following tasks should be solved:

1. Develop a general methodology for building Java web services, testing them offline, and deploying them to a selected server.

2. Investigate general methods for building Java web services software for a distributed circuit design automation system.

3. Research and develop a methodology for building Java web services using data compression technology.

4. Conduct research and development of a general methodology for building templates for client applications of console and window types, as well as client web applications.

5. Develop a methodology for implementing the functioning of web services and client applications in heterogeneous environments.

Research methods. The work draws on the foundations of the general theory of CAD, the theory of modeling systems, and the basic theory of matrices and graphs.

The reliability of scientific results is confirmed by the basic provisions of the general theory of CAD, modeling theory, the correctness of the mathematical apparatus used, and the results obtained from testing the created software for web services and client applications.

New scientific results

1. A service-oriented architecture for distributed CAD using web services is proposed.

2. A general methodology has been developed for the implementation, offline testing, and deployment of Java web services on a distributed CAD server.

3. Methods for building Java web services software to solve typical electronic circuit design problems have been researched and developed.

5. A general methodology for building console and window client applications, as well as client web applications, has been developed.

6. A methodology has been developed for implementing distributed CAD software for organizing interaction in heterogeneous environments of web services and client applications.

Basic provisions submitted for defense

1. Architecture of distributed service-oriented CAD based on web services.

2. General methodology for bottom-up design of Java web services

3. Methodology for implementing Java web services software based on data compression.

Practical value

1. The proposed distributed CAD structure provides the ability to organize interaction between various web services on the selected platform and adapt applications to changing design conditions.

2. The built library of auxiliary functions based on data compression increases the efficiency of creating Java web services software for circuit design automation systems

3. The developed methodology for implementing client-server interaction ensures the operation of distributed CAD systems in heterogeneous environments.

4. The software of the developed distributed automation system for circuit design contains an invariant kernel for organizing both symbolic and numerical stages of Java programs, which can be used as a basis for building design systems for a wide range of objects.

Implementation of results. The distributed CAD system developed in the thesis using web services was implemented in Java on the WTP (Web Tools Platform). The practical result is a platform-independent distributed circuit design CAD system that performs multivariate modeling of nonlinear circuits in stationary and dynamic modes, calculates frequency characteristics, and also computes the sensitivity of transfer functions and the sensitivity of stationary mode variables to parameter variations.

The results of the dissertation work were used in state-budget research on the topic "Development of models and methods for the analysis and synthesis of intelligent decision support systems for managing complex distributed objects" (code CAD-47, SPbGETU subject plan, 2011) and on the topic "Mathematical and logical foundations of the construction of virtual instrument environments" (code CAD-49, SPbGETU subject plan, 2012). The results of the dissertation have been introduced into the engineering practice of the scientific-production company "Modem" and are used in the educational process of the CAD department of SPbGETU to study the methodology for constructing software for circuit design automation systems in the preparation of bachelors and masters in "Informatics and Computer Science".

Approbation of the work. The main provisions of the dissertation were reported and discussed at the following conferences:

1. 9th conference of young scientists “Navigation and traffic control”. – St. Petersburg;

2. 5th international conference “Instrument making in ecology and human safety”. – St. Petersburg, SUAI;

3. XIII, XIV, XVII international conferences "Modern education: content, technology, quality". – St. Petersburg, St. Petersburg Electrotechnical University;

4. 60th, 61st, and 63rd scientific and technical conferences of the teaching staff of SPbGETU.

Publications. The main theoretical and practical content of the dissertation has been published in 16 scientific papers, including 4 articles in leading peer-reviewed publications on the current list of the Higher Attestation Commission and 1 certificate of official registration of a computer program, registered with the Federal Service for Intellectual Property, Patents and Trademarks.

Structure and scope of the dissertation The dissertation contains an introduction, four chapters of main content, a conclusion and a bibliography containing 69 sources. The work is presented on 154 pages of text, and contains 21 figures and one table.

In the introduction a justification is given for the relevance of the dissertation topic, the objectives of the research are formulated, and a list of tasks to be solved in the work is given.

In the first chapter the issues of constructing the architecture of distributed applications are considered; this architecture determines the general structure, the functions performed, and the interrelations of the individual components of the system.

It is shown that the architecture of a distributed application covers both its structural and behavioral aspects, as well as the rules of integration and use, functionality, flexibility, reliability, performance, reusability, technological limitations, and user interface issues. The main task of integrating autonomous applications (subsystems) into a distributed application is to provide functional connections that supply the required interactions with minimal dependence between subsystems.

The dissertation shows that such a mechanism is provided most effectively by an architecture based on interaction between subsystems via remote procedure calls, which are used both for data interchange and for performing certain actions. If an application needs to retrieve or change information maintained by another application, it accesses that information through a function call.

To build distributed CAD systems, the thesis proposes a service-oriented architecture (SOA) based on a modular software structure and standardized interfaces. SOA relies on unification of the basic operational processes, reuse of functional elements, and organization around an integration platform. Although SOA is not tied to any specific remote procedure call technology, software subsystems designed in accordance with SOA are typically implemented as a collection of web services linked by the underlying protocols (SOAP, WSDL).

Systems based on service-oriented architecture belong to the class of multi-agent systems (MAS), which are formed by several interacting intelligent agents that ensure autonomy, limited representation and decentralization of individual subsystems of a distributed information computing system.

Web services are based on the XML standard and enable users to interact with external system tools over the Internet, being loosely coupled components of a software system that are available for use through Internet protocols. The dissertation work shows that in the practical implementation of distributed CAD systems using web services, significant attention should be paid to the correct division of functional responsibilities assigned to the main client application and the web service that interacts with this application.

The specific methodology for implementing web services depends significantly on the chosen programming language. The work shows that preference when choosing a programming language for building web services should be given to the Java language, which most fully ensures the platform independence of the implemented solutions. An important factor in favor of this choice is also the availability of powerful tool support for the development of web-based applications in Java, which is provided by the WTP (Web Tools Platform) environment.

The dissertation carried out a comparative analysis of the two main methods for building Java web services: bottom-up, when the Java class of the web service is created first and the WSDL document is then generated from it, and top-down, when the required WSDL document is created first and the web service implementation code is then generated from it. Based on this comparison, it is shown that web services should be designed bottom-up, since in this case the WSDL document is formed from a previously created Java class that describes all the parameters passed to the web service method and the values returned by it. All the information available in the Java class is automatically converted into the corresponding WSDL document, whose content exactly matches the basic structure of the WSDL specification and the main characteristics of the called web service method, which ensures the complete reliability of the information contained in the WSDL document.

To make bottom-up design of web services practical, the dissertation proposes a methodology for constructing a dynamic web project and the Java class that implements the web service it contains, with a description of the called methods. In addition to the main working methods, this class must contain an auxiliary method without arguments that returns a string variable holding all the information about the main methods of the web service, together with a description of the formats of the parameters passed and the data returned. This makes the web service self-documenting and allows client applications to be created and continuously improved independently of the web service developer.
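The shape of such a class can be sketched as follows. The class and method names are hypothetical, and the working method is a stub; in a real JAX-WS project the class would additionally carry annotations such as @WebService, from which the WSDL document is generated automatically.

```java
// Sketch of a bottom-up web service class (names are hypothetical).
class ModService {

    // Main working method: receives a circuit description and
    // calculation directives, returns an array of results.
    public double[] calcFrequencyResponse(String circuit, String directives) {
        // ... actual modeling code would go here; a stub is returned ...
        return new double[] {0.0};
    }

    // Auxiliary no-argument information method: returns a string
    // describing every working method, its parameters and its return
    // format, which makes the service self-documenting.
    public String getInfo() {
        return "calcFrequencyResponse(String circuit, String directives): "
             + "returns double[]; element 0 is a status/convergence flag, "
             + "elements 1..n are the computed frequency-response values";
    }
}
```

A client that knows only the service URL can call getInfo() first and learn how to invoke the working methods, which is exactly the self-documentation property described above.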

The dissertation provides a technique for calling the information method of a web service directly from an integrated Java development environment, in which, using the web service URL, one can obtain the SOAP response of the information method and the contents of its return value.

In the second chapter methods are considered for constructing the distributed CAD web services that calculate nonlinear circuits in stationary mode, compute the circuit functions of linear and linearized circuits in the frequency domain, and calculate nonlinear circuits in dynamic modes. In addition, the subsystems included in the distributed system comprise a web service for calculating the sensitivity of circuit functions in the frequency domain and a web service for calculating the sensitivity of the steady-state variables of nonlinear circuits to parameter variations. The components of circuits designed on the basis of the developed web services include two-terminal devices of type R, C, and L, linear frequency-dependent controlled sources, nonlinear controlled sources, transformers, bipolar and unipolar transistors, operational amplifiers, and independent current and voltage sources.

The basis for such methods is the general structure of the mathematical description of circuit design systems. The dissertation provides a comparative assessment of possible coordinate bases for forming the description of linearized circuits, with preference given to the extended basis of nodal potentials. It is noted that, along with its undoubted advantages, a significant limitation of this basis is the impossibility of mathematically describing circuit components with equations in implicit form, which often complicates, and sometimes precludes, its practical use. To make it possible to describe circuit components by equations in implicit form, the work provides a modified version of the extended basis of nodal voltages, which is taken as the basis for constructing the web services software.

When considering the construction of web services for calculating the frequency properties of electronic circuits, it is noted that problems of this type fall into two groups. The first group comprises problems of calculating linear circuits whose component parameters have fixed values that do not depend, in the course of solving the problem, on the coordinates of the components' operating points. The second group involves calculating the frequency characteristics of linearized circuits whose parameters depend on the coordinates of the components' operating points; these coordinates, as well as the values of the corresponding linearized parameters, must be computed in advance.

To solve the first group of problems of calculating the frequency properties of electronic circuits, the ModService_Java web service was built in the dissertation work. To work with complex numbers, a custom Complex class was created, since at the time of this work no such class was part of the standard Java API. The Complex class contains constructors and helper functions for processing complex data, as well as all the functions needed to perform arithmetic and logical operations on complex numbers, since Java does not support operator overloading. The web service receives a description of the circuit components and calculation directives as arguments and returns an array describing the results of the frequency characteristic calculation.
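A minimal version of such a class might look as follows. This is an illustrative sketch; the actual Complex class in the dissertation is richer, and the method names here are assumptions.

```java
// Minimal sketch of a complex-number class. Java has no operator
// overloading, so arithmetic is expressed as ordinary methods.
class Complex {
    public final double re;
    public final double im;

    public Complex(double re, double im) { this.re = re; this.im = im; }

    public Complex add(Complex b) { return new Complex(re + b.re, im + b.im); }
    public Complex sub(Complex b) { return new Complex(re - b.re, im - b.im); }

    // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    public Complex mul(Complex b) {
        return new Complex(re * b.re - im * b.im, re * b.im + im * b.re);
    }

    // division via multiplication by the conjugate of b over |b|^2
    public Complex div(Complex b) {
        double d = b.re * b.re + b.im * b.im;
        return new Complex((re * b.re + im * b.im) / d,
                           (im * b.re - re * b.im) / d);
    }

    public double abs() { return Math.hypot(re, im); }
}
```

Immutable value semantics (final fields, every operation returning a new object) keep such a class safe to share between the methods of a web service.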



To calculate the stationary mode of nonlinear systems, the dissertation proposes a general methodology for constructing software for the corresponding web services, implemented when creating the StaticService_Java web service. The web service also receives as arguments a description of the circuit components and calculation directives and returns an array describing the results of calculating the basis variables and steady-state coordinates for all nonlinear components (diodes, bipolar transistors, unipolar transistors, operational amplifiers, nonlinear controlled sources). The zero element of the returned array is reserved for transferring information to the client side in the event of a lack of convergence of the computational process, which requires changing the calculation directives and re-calling the web service method.
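The convention of reserving the zero element of the returned array for status information can be sketched on the client side as follows; the exact flag encoding (0.0 for convergence, nonzero for failure) is a hypothetical illustration, not the dissertation's actual format.

```java
// Hypothetical client-side decoding of a result array whose zero
// element is reserved for status information: 0.0 is taken to mean
// the computation converged, any other value signals failure, after
// which the client changes the calculation directives and calls the
// web service method again.
class ResultDecoder {
    static boolean converged(double[] result) {
        return result.length > 0 && result[0] == 0.0;
    }

    // Strip the status element, leaving only the computed values.
    static double[] values(double[] result) {
        double[] v = new double[result.length - 1];
        System.arraycopy(result, 1, v, 0, v.length);
        return v;
    }
}
```

Packing the status flag into the same array keeps the web service method's signature to a single returned type, which simplifies the generated WSDL description.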

The dissertation examines possible approaches to developing a methodology for constructing web services for calculating the frequency characteristics of linearized circuits, the parameters of which depend on the coordinates of the operating points of the components. As a result of the comparative assessment, a path was chosen to build a web service based on an integrated system, which includes software for the linearization of nonlinear components at the calculated operating points and for the subsequent calculation of the frequency properties of the linearized circuit. The work provides a general methodology for solving such a problem, the implementation of which was carried out in the StFrqService_Java web service. The web service receives as arguments a description of the frequency-dependent and nonlinear components of the circuit, as well as calculation directives, and as a result of its work, an array is returned describing the results of calculating the frequency characteristics. In the same way as in the steady-state calculation, the zero element of the returned array is used to transfer information to the client side in the event that the process does not converge.

When developing a methodology for constructing a web service for calculating the dynamic modes of nonlinear systems, a mathematical description of the circuit in a modified extended basis of nodal potentials is used, which makes it possible to obtain a system of equations of algebraic-differential type in the most general form. Elimination of derivatives from component equations is performed on the basis of correction formulas that follow from multi-step implicit methods of higher orders, while the second-order Gear method is adopted as the main one with the possibility of increasing its order. The components from the equations of which derivatives are excluded are two-terminal networks of type C and L, diodes, transformers, bipolar and unipolar transistors, operational amplifiers, as well as frequency-dependent controlled sources. To calculate the values ​​of autonomous sources that preserve the values ​​of the corresponding variables in the previous steps, auxiliary sampling functions dis_cmp are built for all listed cmp components with frequency-dependent properties.
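For reference, the second-order Gear method mentioned above coincides with the standard two-step backward differentiation formula (BDF2); for a system $\dot{y} = f(t, y)$ integrated with step $h$ it reads:

```latex
y_{n+1} = \frac{4}{3}\, y_{n} - \frac{1}{3}\, y_{n-1} + \frac{2}{3}\, h\, f(t_{n+1},\, y_{n+1})
```

Because $y_{n+1}$ appears inside $f$, the method is implicit, which is what allows the derivatives to be eliminated from the component equations by means of such correction formulas.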

The developed methodology was implemented in the construction of the Dyn2Service_Java web service, which returns to the client side an array describing the results of calculating dynamic characteristics.

In the third chapter the issues of building web services using data compression methods are considered. The relevance of these issues stems from the fact that the structure of real systems is characterized by weak coupling between components, which results in mathematical descriptions in the form of sparse matrices, in which only a small fraction of the elements carry meaningful information.

This circumstance poses the task of changing generally accepted approaches to the formation and solution of equations in order to save memory and increase performance, which is crucial for the functioning of web-based systems.

The dissertation analyzed the effectiveness of possible methods for converting data into compact arrays and concluded that it is advisable to choose a method based on Sherman compression, which requires a two-stage procedure of symbolic and numerical data processing. A significant advantage of the adopted procedure is its division into two independent parts: the symbolic and numerical stages. Since almost all real circuit design problems involve multivariate calculation of a circuit of unchanged structure, the symbolic stage is performed only once per structure, while the numerical stage is executed tens, hundreds, and sometimes thousands of times.

However, the two-stage procedure is characterized by rather complex program logic, and the transition to a description based on data compression requires significant changes to the previously created complete description of the problem.

The dissertation examines a block diagram of two-stage data processing for Java applications. At the symbolic analysis stage, an integer index matrix is formed and its symbolic LU factorization is carried out, with rows (columns) ordered so as to minimize the number of newly appearing nonzero elements. At the final step of the symbolic stage, coordinate matrices containing information about the structure of the index matrix are constructed, after which the index matrix itself can be deleted.

At the numerical stage, compact matrices are formed in accordance with the adopted description format, and their virtual numerical LU factorization is performed using the algorithm constructed in the work. After the numerical LU factorization is completed, all system variables are calculated and recoded according to the row (column) permutations carried out at the symbolic stage. In the general LU factorization technique this task is usually solved by forward and backward substitution over the rows of the original matrix; but since no complete matrix exists when data compression is used, both substitutions are performed by special algorithms that operate directly on the compressed data.
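
Forward and backward substitution over compressed data can be sketched as below (Java; the storage convention, a unit-diagonal L and U merged into one compressed-row structure, is an assumption for illustration and not the dissertation's actual format):

```java
public class CompactSolve {
    // Solve L*U*x = b where the LU factors are stored together in one
    // CSR structure: entries with colIdx < i belong to the strictly
    // lower part of L (unit diagonal implied), entries with colIdx >= i
    // belong to U (diagonal stored explicitly).
    public static double[] solve(int n, int[] rowPtr, int[] colIdx,
                                 double[] val, double[] b) {
        double[] x = b.clone();
        // forward substitution: L*y = b, traversing only stored entries
        for (int i = 0; i < n; i++)
            for (int k = rowPtr[i]; k < rowPtr[i + 1]; k++)
                if (colIdx[k] < i) x[i] -= val[k] * x[colIdx[k]];
        // backward substitution: U*x = y
        for (int i = n - 1; i >= 0; i--) {
            double diag = 0;
            for (int k = rowPtr[i]; k < rowPtr[i + 1]; k++) {
                int j = colIdx[k];
                if (j > i) x[i] -= val[k] * x[j];
                else if (j == i) diag = val[k];
            }
            x[i] /= diag;
        }
        return x;
    }

    public static void main(String[] args) {
        // A = [[2,1],[4,5]] = L*U with L = [[1,0],[2,1]], U = [[2,1],[0,3]]
        double[] x = solve(2, new int[]{0, 2, 4}, new int[]{0, 1, 0, 1},
                           new double[]{2, 1, 2, 3}, new double[]{4, 14});
        System.out.println(x[0] + " " + x[1]); // solution of A*x = [4,14]
    }
}
```

Note that no complete matrix ever appears: both passes visit only the stored non-zero entries.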

The thesis shows that two different approaches to developing web-service software based on data compression are possible. The first reworks existing software, based on a complete mathematical description in the form of sparse initial matrices, into a modified method that uses compact arrays. Having a prototype greatly simplifies the creation of a compression-based method, but to make the most effective use of the available material a methodology for developing such modified versions of web services is needed. This methodology is built in the dissertation, and all the web services discussed above were modified on its basis. The result is a web-services framework containing two main working methods: one based on the full description of the modeled circuit, and the other using compact data-processing technology.

The second approach is used when there is no prototype from which to develop a compression-based method. In this case both the symbolic and numerical stages are implemented without a complete description of the simulated circuit as a sparse matrix, which significantly complicates programming. In the dissertation the second approach is used to build web services that calculate the sensitivity of circuit transfer functions and of stationary-mode variables to variations in the parameters of circuit components.

To calculate the sensitivity of the frequency characteristics of circuit functions, the VaryService web service was built; it contains a method based on differentiation of the equations and a method based on adjoint circuits.

The VaryService method based on differentiation of equations calculates the absolute and relative vector sensitivity of circuit functions in the frequency domain with respect to a selected variable parameter for the entire set of basic variables. The variable parameters can be the resistance, capacitance, or inductance of an arbitrary two-terminal element of type R, C, or L, and the transmission parameters of controlled frequency-dependent sources (VCCS, VCVS, CCCS, or CCVS).

The VaryService method based on the adjoint circuit calculates the absolute and relative scalar sensitivity of circuit functions in the frequency domain with respect to all possible variable parameters for a selected analyzed variable. The software block diagram proposed in the work allows the compact arrays formed for the main circuit to be reused in calculating the adjoint circuit. The variable parameters in the adjoint-circuit method can be the same as in the method based on differentiation of equations.

To calculate the sensitivity of the variables defining the stationary mode of nonlinear circuits to variations of their parameters, the StVaryService web service was developed; it likewise contains two methods, one based on differentiation of equations and the other on the adjoint circuit. In both methods the variable parameters can be resistor values and the transmission parameters of controlled sources (VCCS, VCVS, CCCS, or CCVS).

The algorithm for calculating the absolute sensitivity of the basic stationary-mode variables by the method of differentiation of equations differentiates the nonlinear circuit equation with respect to the basic variables and the variable parameters. This yields a sensitivity equation whose solution gives the desired vector sensitivity of the stationary-mode variables.
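
In generic notation (assumed here, since the abstract gives no formulas), writing the nonlinear circuit equation as F(x, p) = 0 with basic variables x and parameter p, differentiation yields the sensitivity equation:

```latex
% x^* : stationary-mode solution of F(x,p) = 0.
% The Jacobian on the left is the same matrix factorized during the
% stationary-mode calculation, so its compact LU factors can be reused.
\left.\frac{\partial F}{\partial x}\right|_{x^{*}} \frac{dx}{dp}
    = -\left.\frac{\partial F}{\partial p}\right|_{x^{*}}
```

Its solution dx/dp is the vector of absolute sensitivities of all basic variables to the parameter p.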

The practical implementation of the method is based on differentiating the equations of the varied components using the previously calculated basic variables of the stationary mode of the nonlinear circuit.

The algorithm of the method that calculates the scalar sensitivity of stationary-mode variables using an adjoint circuit first calculates the basic stationary-mode variables of the main circuit and then the basic variables of the linearized adjoint circuit, the latter computed from the compact arrays previously generated for the main circuit. The result of this second method is an array of absolute and relative sensitivities of the selected circuit variable with respect to all variable component parameters.

The fourth chapter discusses methods for building custom client applications that interact with web services which, once built in a Java application development tool, must be deployed on a distributed CAD server. To deploy a web service, its main characteristics must be known, including the service name, class name, method names, and WSDL document type.

The relevant information about the web services developed above for the distributed circuit design system is given in the thesis and in informational methods named getInf that are included in all the developed web services. The work proposes a simple technique for deploying web services directly on a server and discusses possible ways of importing the WSDL file to the client side. A comparative analysis shows that correct delivery of a WSDL file from a remote web service to a client application is most effectively ensured by the Web Services Explorer tool, and the most suitable sequence for importing a WSDL file into the initial framework of the client application is established.

Once the WSDL file has been delivered to the client application project, the further transformation of the initial project skeleton into a complete client application is carried out in two stages. The first stage creates a proxy object in the initial project framework; the second forms the classes containing the methods that support the proxy object and the interaction of the remote service with the client application. The first stage reduces to supplementing the project with operators that create the proxy object; the second is carried out with the Web Services tool of the WTP platform, the most effective uses of which are given in the dissertation.

The initial project skeleton can be finalized into a completed client application in different ways for different application types. The simplest client applications to run are console applications without a graphical interface. The work proposes a generalized structural scheme for implementing a distributed CAD console application for calculating electronic circuits that can be used with any web service, despite the variety of possible specific solutions.

In the course of the thesis, console client applications were implemented for all the distributed CAD web services listed above. Their source files can be delivered to the client by standard means via the Internet and used both to build the simplest console application for interacting with distributed CAD services and as a methodological guide for developing more capable windowed applications.

Windowed client applications offer the greatest opportunities in distributed CAD, since they make the fullest use of graphical elements. For each web service, different versions of client applications can be built, with different ways of organizing the dialogue with the user and displaying calculation results. The work establishes a minimum set of dialog tools that ensures full interaction with a service: a window menu and a set of dialog boxes for entering data, displaying calculation results, and managing calculation directives.

The dissertation proposes a methodology for building client web applications, on the basis of which JSP page templates have been developed that select and move to other pages, enter a set of variables and return to the home page, cyclically enter a set of variables and move to the next page according to their values, call the required web service, and output the results of its work. Client web applications allow the entire program code of the distributed system to be placed on the network, and, depending on the adopted placement of the client and server applications, a service can be called from one or several web pages.

A positive feature of this architecture is that the distributed CAD system can be accessed through an arbitrary web browser without building a dedicated client application; its disadvantage is the inevitable increase in the time needed to describe system components during interactive work.

The dissertation builds a methodology for organizing the deployment of client applications, the purpose of which is to enable launching a client application with system-wide tools, without an application development tool: console and windowed client applications are launched from the command line, and web applications from the browser. Information about the location of the web service is transmitted through a proxy-class object, which must first be configured with the appropriate URL.

The work notes that when deploying a console application the project code page must first be changed: from the code page in which Cyrillic text displays correctly in the integrated tools working with Java code to a code page in which Cyrillic text displays correctly in the command-line window.

The dissertation shows that with client web applications, depending on the chosen communication structure between the client and server computers, information about the location of the web service can be transmitted either through a proxy-class object or through a URL entered in the browser. The corresponding methods of client interaction with the service for the selected communication structure are also discussed in the work.

SOAP standardization makes it possible to interconnect loosely coupled applications regardless of their implementation platform, allowing efficient use of a wide range of heterogeneous, loosely coupled resources in distributed applications. The dissertation provides a general methodology for building software in which a proxy-class object of a .NET application interacts with a service in the Java/J2EE environment. On this basis, interaction was implemented between the developed Java web services and Windows client applications built in the .NET environment in C#.

The ability to operate distributed CAD in heterogeneous environments significantly expands the scope of its application.

The conclusion formulates the main scientific and practical results obtained from the research carried out in the dissertation.

Main results of the work

1. An architecture for distributed service-oriented CAD based on web services has been developed, characterized by a decentralized structure, platform independence and the ability to carry out continuous modernization of individual subsystems to adapt their properties to changing design conditions.

2. A general methodology has been implemented for constructing Java web services and corresponding WSDL documents using a bottom-up method, as well as delivering them to a distributed CAD server after offline testing in the development environment.

3. A methodology has been developed for building Java web-service software to solve typical problems of modeling continuous systems in the automated design of electronic circuits.

4. A library of auxiliary functions has been built to implement Java web services software based on data compression.

5. A general methodology for constructing templates for console and window client applications for a distributed circuit design automation system has been developed and the organization of the functioning of a distributed CAD system with client web applications has been implemented.

6. A methodology has been developed for constructing distributed CAD systems that ensures the interaction of Java web services and client applications of any type in heterogeneous environments.

Publications on the topic of the dissertation

1. Anisimov D.A., Gridin V.N., Dmitrevich G.D. Construction of computer-aided design systems based on Web services [Text] // Automation in Industry. 2011. No. 1. P. 9-12.

2. Gridin V.N., Dmitrevich G.D., Anisimov D.A. Construction of computer-aided design systems based on Web technologies [Text] // Information Technologies. 2011. No. 5. P. 23-27.

3. Gridin V.N., Dmitrevich G.D., Anisimov D.A. Construction of web services for circuit design automation systems [Text] // Information Technologies and Computing Systems. 2012. No. 4. P. 79-84.

4. Anisimov D.A. Methods for building automation systems for circuit design based on web services [Text] // News of St. Petersburg Electrotechnical University "LETI". 2012. No. 10. St. Petersburg: Publishing House of St. Petersburg Electrotechnical University "LETI". P. 56-61.

5. Laristov D.A., Anisimov D.A. Access to Web resources in CAD navigation and control systems [Text] // Gyroscopy and Navigation. 2007. No. 2. P. 106.

Architecture of distributed information systems and Web applications

A distributed system is a set of independent computers that appears to its users as a single unified system: although all the computers are autonomous, users perceive them as one system.

The main characteristics of distributed systems:

1. The differences between computers and methods of communication between them are hidden from users. The same applies to the external organization of distributed systems.

2. Users and applications interact with a distributed system in a consistent and uniform way, regardless of where and when the interaction takes place.

Distributed systems should also be relatively easy to expand, or scale. This characteristic is a direct consequence of having independent computers, but at the same time does not indicate how these computers are actually combined into a single system.

To maintain a unified view of the system, distributed systems often include an additional layer of software that sits between the top level, where users and applications reside, and the lower level, consisting of the operating systems (Figure 1.11).

Accordingly, such a distributed system is usually called a middleware system. Note that the middleware layer is distributed among many computers.

Features of the functioning of distributed systems include:

· the presence of a large number of objects;

· request execution delays (for example, while local calls take a few hundred nanoseconds, requests to an object in a distributed system take from 0.1 to 10 ms);

· some objects may not be used for a long time;

· distributed components are executed in parallel, which leads to the need to coordinate execution;

· requests in distributed systems have a high probability of failure;

· increased security requirements.

Because of this increased latency, interfaces in a distributed system must be designed to reduce query execution time. This can be achieved by reducing the frequency of access and by enlarging the functions performed per call.

To cope with failures, clients must check whether the server is actually executing their requests. Security in distributed applications can be increased by controlling communication sessions (authentication, authorization, data encryption).

The architecture of Web applications (Web services) is widely used today. A Web service is an application accessible via the Internet. It provides services whose form does not depend on the provider, since it uses a universal operating platform and a universal data format (XML). Web services are based on standards that define the formats and language of requests, as well as protocols for locating these services on the Internet. The scheme for accessing a database via the Internet is shown in Fig. 1.12.


Figure 1.12 – Scheme of access to the DBMS server via the Internet

There are currently three different technologies supporting the concept of distributed object systems: EJB, DCOM, and CORBA.

The main idea behind EJB (Enterprise Java Beans) technology is to create an infrastructure for components so that they can easily be installed on and removed from servers, thereby increasing or decreasing the server's functionality. EJB components are Java classes and can run on any EJB-compatible server, even without recompilation. The main goals of EJB technology are:

1. Make it easier for developers to create applications by relieving them of the need to implement services such as transactions, threading, and load handling from scratch. Developers can concentrate on describing the logic of their applications, delegating data storage, transfer, and security to the EJB system.

2. Describe the main structures of the EJB system and the interfaces for interaction between its components.

3. Free the developer from implementing EJB objects due to the presence of a special code generator.

Thanks to the Java model used, EJB is a relatively simple and fast way to create distributed systems.

DCOM (Distributed Component Object Model) is a software architecture developed by Microsoft for distributing applications across multiple computers on a network. A software component on one computer can use DCOM to pass messages to a component on another computer. DCOM automatically establishes a connection, transmits the message, and returns the response from the remote component. DCOM's ability to interconnect components allowed Microsoft to provide Windows with a number of additional capabilities, in particular Microsoft Transaction Server, which is responsible for executing database transactions over the Internet.

We call a service a resource that implements a business function and has the following properties:

    is reusable;

    defined by one or more explicit technology-independent interfaces;

    is loosely coupled to other similar resources and can be invoked through communication protocols that allow resources to interact with each other.

A Web service is a software system identified by a URI string, whose interfaces and bindings are defined and described in XML. The description of this software system can be discovered by other software systems, which can then interact with it according to this description through XML-based messages transmitted over Internet protocols.

1.1 Web Services Basics

Web services are a new, promising architecture providing a new level of distribution. Instead of developing or purchasing components and embedding them in an IS, it is proposed to buy their operating time, forming a software system that calls methods of components owned and supported by independent providers. Thanks to Web services, the functions of any program on the network can be made available via the Internet. The simplest example of a Web service is the Passport system on Hotmail, which lets you implement user authentication on your own site.

Web services are based on the following universal technologies:

    TCP/IP – a universal protocol understood by all network devices, from mainframes to mobile phones and PDAs;

    HTML – a universal markup language used to display information on user devices;

    XML (Extensible Markup Language) – a universal language for working with any type of data.

The universality of these technologies is the basis for understanding web services: they rely only on generally accepted, open, and formally vendor-neutral technologies. Only in this way can the main advantage of web services as a concept for building distributed IS be achieved, namely their universality, i.e., the ability to be used with any operating system, programming language, application server, and so on.

Thus, web services solve the original problem: integrating applications of various natures and building distributed IS. This is the main fundamental difference between web services and their predecessors.

Web services are XML applications that link data with programs, objects, databases, or entire business transactions. XML documents formatted as messages are exchanged between the web service and the program. Web-service standards define the format of such messages, the interface to which a message is sent, the rules for binding the message content to the application implementing the service and vice versa, and the mechanisms for publishing and discovering interfaces.

XML (eXtensible Markup Language) is a recommendation of the World Wide Web Consortium (W3C). The XML specification describes XML documents and partially describes the behavior of XML processors (programs that read XML documents and provide access to their content). XML was designed as a language with a simple formal syntax, convenient for creating and processing documents by programs and at the same time convenient for humans to read and write, with an emphasis on use on the Internet. The language is called extensible because it does not fix the markup used in documents: the developer is free to create markup according to the needs of a particular domain, limited only by the syntactic rules of the language. The combination of a simple formal syntax, human readability, extensibility, and Unicode-based encoding of document contents has led to the widespread use of XML itself and of many derivative specialized XML-based languages in a wide variety of software.

Standard XML Applications

XML can be used for more than describing a single document. An individual, a company, or a standards committee can define the required set of XML elements and document structure to be used for a particular class of documents. Such a set of elements and document-structure description is called an XML application or XML vocabulary. For example, an organization might define an XML application for documents describing molecular structures, multimedia presentations, or vector graphics.

Web services can be used in many applications. Regardless of whether web services run from customers' desktops or laptops, they can be used to access Internet applications such as pre-ordering or order tracking.

Web services are suitable for B2B (business-to-business) integration, joining applications run by different organizations into one production process. Web services can also address the broader problem of enterprise application integration (EAI), linking multiple applications within one enterprise to multiple other applications. In all these cases, web-service technologies are the "glue" that connects the various pieces of software.

As can be seen from Fig. 1, Web services are a wrapper that provides a standard way of interacting with application software environments such as database management systems (DBMS), .NET, J2EE (Java 2 Platform, Enterprise Edition), CORBA (Common Object Request Broker Architecture), enterprise resource planning (ERP) packages, integration brokers, etc.

Fig.1. Web services interact with application systems

Web-service interfaces receive standard XML messages from the network environment, transform the XML data into a format "understood" by a specific application software system, and send a response message (the latter is optional). The software implementation of web services (the base, lower-level software) can be created in any programming language, on any operating system, and with any middleware.

A simple example: searching for information

Currently, most services are invoked over the Internet by entering data into HTML forms and sending this data to the service by appending it to the Uniform Resource Locator (URL) string:

http://www.google.com/search?q=Skate+boots&btnG=Google+Search

This example illustrates the simplicity of web interactions (such as searching, buying stocks, or requesting driving directions) in which parameters and keywords are embedded directly in the URL. Here, a simple search request for skate boots is presented in the query string sent to the Google search engine. The search keyword identifies the service being accessed, and the Skate+boots parameter is the search string that was entered into the HTML form on the Google page. Google passes the request to its search machinery, which returns a list of URLs of pages matching the Skate+boots parameter. This inefficient method of searching the Internet is based entirely on matching the given text string against indexed HTML pages.
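
A sketch of how such a request URL is assembled in Java (the parameter names q and btnG come from the example above):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class QueryUrl {
    // URLEncoder applies the form encoding used in query strings:
    // the space in "Skate boots" becomes '+', giving "Skate+boots".
    public static String searchUrl(String query) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return "http://www.google.com/search?q=" + q + "&btnG=Google+Search";
    }

    public static void main(String[] args) {
        System.out.println(searchUrl("Skate boots"));
    }
}
```

Everything the service learns about the request must fit into this one encoded string, which is precisely the limitation the XML approach below removes.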

XML is a better way of sending data, offering significant advantages when transmitting data over the Internet. The previous query can now be represented as an XML document:

<s:SearchRequest xmlns:s="www.xmlbus.com/SearchService">
    <s:item>Skate boots</s:item>
    <s:size>7.5</s:size>
</s:SearchRequest>

Sending the request as an XML document has the following advantages: the ability to define data types and structures, greater flexibility, and extensibility. XML can represent structured or typed data (for example, the value of the size field can be given as a string of digits or as a floating-point number) and can carry more information than a URL allows.

This example is formatted as the body of a SOAP (Simple Object Access Protocol) message, the standard form of XML message exchange and one of the technologies underlying web services. In a SOAP message, the name of the requested service and the input parameters are represented as separate XML elements. The example also illustrates XML namespaces (xmlns:), another important element of web services. Because XML documents support multiple data types, complex structures, and schema aggregation, web-service technologies provide significant advantages over accessing software applications through HTML and URLs.
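
The same message body can also be produced programmatically with the JDK's DOM API. A minimal sketch follows; the element and namespace names are the illustrative ones from the example above, not a real service contract.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SearchRequestBuilder {
    static final String NS = "www.xmlbus.com/SearchService";

    // Build the namespaced request element with its two parameters.
    public static Document build(String item, String size) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element req = doc.createElementNS(NS, "s:SearchRequest");
        Element itemEl = doc.createElementNS(NS, "s:item");
        itemEl.setTextContent(item);
        Element sizeEl = doc.createElementNS(NS, "s:size");
        sizeEl.setTextContent(size);
        req.appendChild(itemEl);
        req.appendChild(sizeEl);
        doc.appendChild(req);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document d = build("Skate boots", "7.5");
        System.out.println(d.getDocumentElement().getNodeName());
    }
}
```

In a real client this document would be wrapped in a SOAP envelope and posted over HTTP; the sketch stops at the message body itself.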

Open source software has become a core building block in the creation of some of the world's largest websites. With the growth of these websites, best practices and guidelines for their architecture have emerged. This chapter aims to cover some of the key issues to consider when designing large websites, as well as some of the basic components used to achieve these goals.

The focus of this chapter is on the analysis of web-based systems, although some of the material may be extrapolated to other distributed systems.

1.1 Principles of building distributed web systems

What exactly does it mean to create and manage a scalable website or application? At its most basic, it is simply connecting users to remote resources through the Internet; it is the distribution of those resources, or of access to them, across many servers that makes the website scalable.

Like most things in life, time spent planning before building a web service pays off later; understanding the considerations and trade-offs behind large websites leads to smarter decisions when building smaller ones. Below are some of the key principles that influence the design of large-scale web systems:

  • Availability: Website uptime is critical to the reputation and functionality of many companies. For some large online retailers, being unavailable for even a few minutes can mean thousands or millions of dollars in lost revenue, so designing their systems to be always available and resilient to failure is both a fundamental business and a technology requirement. High availability in distributed systems requires careful consideration of redundancy for key components, quick recovery after partial failures, and graceful degradation when problems arise.
  • Performance: Website performance has become an important metric for most sites. Speed affects user experience and satisfaction as well as search-engine rankings, a factor that directly impacts audience retention and revenue. As a result, the key is to create a system optimized for fast responses and low latency.
  • Reliability: The system must be reliable in the sense that a given data request consistently returns the same data. If the data is changed or updated, the same request should return the new data. Users need to know that whatever is written to or stored in the system will remain in place for later retrieval.
  • Scalability: In any large distributed system, size is just one item on the list of things to consider. Equally important is the effort required to increase throughput to handle larger workloads, which is usually referred to as system scalability. Scalability can refer to various parameters of a system: how much additional traffic it can handle, how easily storage capacity can be added, or how many additional transactions can be processed.
  • Manageability: Designing a system that is easy to operate is another important factor. System manageability equates to the scalability of "maintenance" and "update" operations. It requires considering how easy the system is to diagnose and understand when problems arise, how easily it can be updated or modified, and how simple it is to operate (i.e., does it routinely work without failures or exceptions?).
  • Cost: Cost is an important factor. It obviously includes hardware and software costs, but it is also important to consider the other aspects needed to deploy and maintain the system: the developer time required to build it, the operational effort required to run it, and even the necessary training. Cost is the total cost of ownership.

Each of these principles provides a basis for decision-making in the design of a distributed web architecture. However, they can conflict with one another, because achieving the goals of one may come at the expense of another. A simple example: choosing to address performance (scalability) by simply adding more servers increases both management costs (an additional server must be operated) and hardware purchase costs.

When developing any kind of web application, it is important to consider these key principles, even if it is to confirm that the project can sacrifice one or more of them.

1.2 Basics

When considering system architecture, there are several issues that need to be addressed, such as which components are worth using, how they fit together, and what trade-offs can be made. Investing money in scaling without a clear need for it is not a smart business decision. However, some forethought in planning can save significant time and resources in the future.

This section focuses on some core factors that are central to almost all large web applications: services, redundancy, segmentation, and failure handling. Each of these factors involves choices and trade-offs, particularly in the context of the principles described in the previous section. To clarify these, let's look at an example.

Example: Image Hosting Application

You've probably posted images online before. For large sites that store and deliver many images, there are challenges in creating a cost-effective, highly reliable architecture that has low response latencies (fast retrieval).

Imagine a system where users can upload their images to a central server, and the images can be requested via a site link or API, much like Flickr or Picasa. To keep the description simple, let's assume that this application has two key tasks: the ability to upload (write) images to the server, and to query for images. Of course, upload efficiency is an important criterion, but priority goes to fast delivery when a user requests an image (for example, images can be requested for display on a web page or by another application). This functionality is similar to what a web server or Content Delivery Network (CDN) edge server provides. A CDN server typically stores data objects in many locations, thereby bringing them geographically/physically closer to users, resulting in improved performance.

Other important aspects of the system:

  • The number of images stored can be unlimited, so storage scalability needs to be considered.
  • There should be low latency for image downloads/requests.
  • If a user uploads an image to the server, its data must always remain intact and accessible.
  • The system must be easy to maintain (manageability).
  • Since image hosting tends to have low profit margins, the system must be cost-effective.

Another potential problem with this design is that a web server like Apache or lighttpd typically has an upper limit on the number of simultaneous connections it can service (the default is around 500, but it can go much higher), and under high traffic, writes can quickly consume that limit. Since reads can be asynchronous, or take advantage of other performance optimizations like gzip compression or chunked transfer encoding, the web server can serve reads faster and switch between clients, handling many more requests than the maximum number of connections (with Apache and a maximum connection count of 500, it is quite possible to serve several thousand read requests per second). Writes, on the other hand, tend to hold the connection open for the duration of the upload, so transferring a 1 MB file to the server could take more than 1 second on most home networks, meaning the web server could only handle 500 such simultaneous writes.


Figure 1.2: Read/Write Separation

Anticipating this potential problem suggests splitting image reads and writes into independent services, as shown in Figure 1.2. This lets us not only scale each of them individually (since it is likely we will always do more reads than writes), but also stay aware of what is happening in each service. Finally, it separates future concerns, making it easier to diagnose and evaluate problems such as slow read access.

The advantage of this approach is that we can solve problems independently of one another, without having to think about writing and retrieving new images in the same context. Both of these services still operate on the global corpus of images, but with service-appropriate techniques each is free to optimize its own performance (for example, by queuing requests, or caching popular images; more on this below). From both a maintenance and a cost perspective, each service can be scaled independently as needed. And this is a good thing, since combining and mixing them could inadvertently affect their performance, as in the scenario described above.
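To make the split concrete, here is a minimal sketch of read/write separation over a shared image corpus. All names (`ImageCorpus`, `WriteService`, `ReadService`) are hypothetical stand-ins; in a real deployment each service would run on its own fleet behind its own endpoint.

```python
class ImageCorpus:
    """Shared backing store; a stand-in for the global image storage."""
    def __init__(self):
        self._blobs = {}

    def put(self, name, data):
        self._blobs[name] = data

    def get(self, name):
        return self._blobs.get(name)


class WriteService:
    """Handles slow, connection-holding uploads; scaled for write load."""
    def __init__(self, corpus):
        self.corpus = corpus

    def upload(self, name, data):
        self.corpus.put(name, data)
        return "ok"


class ReadService:
    """Handles fast retrievals; scaled (and cached) independently."""
    def __init__(self, corpus):
        self.corpus = corpus

    def fetch(self, name):
        return self.corpus.get(name)


corpus = ImageCorpus()
writer = WriteService(corpus)
reader = ReadService(corpus)
writer.upload("dog.jpg", b"\xff\xd8...")
print(reader.fetch("dog.jpg"))
```

Because the two services only share the corpus, each can be replicated or tuned on its own without touching the other's code path.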

Of course, the above model will work optimally if there are two different endpoints (in fact, this is very similar to several implementations of cloud storage providers and Content Delivery Networks). There are many ways to solve such problems, and in each case a compromise can be found.

For example, Flickr solves this read/write problem by distributing users across different shards ("pods"), so that each pod only serves a certain set of users, and as the number of users grows, more pods are added to the cluster (see the Flickr scaling presentation, http://mysqldba.blogspot.com/2008/04/mysql-uc-2007-presentation-file.html). In the first example, it is easier to scale hardware based on actual usage (the number of reads and writes across the whole system), whereas Flickr scales with its user base (though this assumes equal usage across users, so capacity has to be planned with headroom). With the former, an outage or problem in one of the services brings down functionality of the whole system (no one can write files, for instance), whereas an outage of one Flickr pod affects only the users associated with it. In the first example, it is easier to perform operations across the whole data set, for example updating the write service to include new metadata, or searching across all image metadata, whereas with the Flickr architecture each pod has to be updated or searched (or a search service has to be created to collate the metadata, which is in fact what is done).

When it comes to these systems there is no panacea, but you should always proceed from the principles described at the beginning of this chapter: determine the system's needs (heavy reads or writes or both, level of concurrency, queries across the data set, ranges, sorts, etc.), benchmark different alternatives, understand how the system will fail, and have a solid plan in place for when failure happens.

Redundancy

To gracefully handle failure, a web architecture must have redundancy of its services and data. For example, if there is only one copy of a file stored on a single server, then losing that server means losing the file. Losing data is rarely a good thing, and the common way to avoid it is to create multiple, or redundant, copies.

The same principle applies to services. If a piece of functionality is integral to the application, you can protect against single-node failure by ensuring that multiple copies or versions of it run simultaneously.

Creating redundancy in a system removes single points of failure and provides a backup or spare functionality in an emergency. For example, if two instances of the same service are running in production and one fails completely or partially, the system can fail over to the healthy copy. Failover can happen automatically or require manual intervention.


Another key role of service redundancy is enabling a shared-nothing architecture. In such an architecture, each node can operate independently of the others, with no central "brain" managing state or coordinating activities for the other nodes. This helps with scalability, since new nodes can be added without special conditions or knowledge. Most importantly, such systems have no single point of failure, making them much more resilient to it.


For example, in our image server application, all images would have redundant copies on another piece of hardware somewhere (ideally in a different geographic location, in case of a disaster such as an earthquake or a fire in the data center), and the services providing access to the images would be redundant, all of them potentially serving requests. (See Figure 1.3.)
Looking ahead, load balancers are a great way to make this possible, but more on that below.


Figure 1.3: Redundant Image Hosting Application

Segmentation

Data sets may be so large that they cannot fit on a single server. It may also happen that an operation requires too many computing resources, degrading performance and making it necessary to add capacity. Either way, you have two options: vertical or horizontal scaling.

Vertical scaling means adding more resources to an individual server. For a very large data set, this might mean adding more (or bigger) hard drives so the entire data set can live on one server. For computing operations, it could mean moving the computation to a bigger server with a faster CPU or more memory. In each case, vertical scaling is accomplished by making the individual resource capable of handling more on its own.

Horizontal scaling, on the other hand, means adding more nodes. In the case of a large data set, this might be a second server to store part of the total data; for a computing resource, it means splitting the operation or load across additional nodes. To take full advantage of horizontal scaling, it should be built in as an intrinsic principle of the system's architecture; otherwise, changing and isolating the context needed for horizontal scaling can be quite cumbersome.

The most common technique of horizontal scaling is to break services up into segments, or shards. Segments can be distributed such that each logical set of functionality is separate; this can be done by geographic boundaries, or by other criteria such as paying versus non-paying users. The advantage of these schemes is that they provide a service or data store with added capacity.

In our image server example, it is possible to replace the single file server used to store images with multiple file servers, each containing its own unique set of images. (See Figure 1.4.) Such an architecture would allow the system to fill each file server with images, adding additional servers as disk space fills up. The design requires a naming scheme that ties an image's filename to the server holding it. An image's name could be derived from a consistent hashing scheme mapped across the servers. Alternatively, each image could be assigned an incremental ID, so that when a client requests an image, the retrieval service only needs to know the range of IDs stored on each server (like an index).
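The consistent-hashing scheme mentioned above can be sketched as a tiny hash ring: each server owns several points on a ring of hash values, and an image lands on the first server point at or after its own hash. This is a toy illustration, not a production implementation (real rings add replication and rebalancing), and the server names are made up.

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring mapping image names to file servers."""
    def __init__(self, servers, vnodes=100):
        ring = []  # (hash, server) pairs; vnodes smooths the distribution
        for s in servers:
            for v in range(vnodes):
                ring.append((self._hash(f"{s}#{v}"), s))
        ring.sort()
        self._ring = ring
        self._keys = [h for h, _ in ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, image_name):
        # First point on the ring at or after the key's hash, wrapping around.
        i = bisect.bisect(self._keys, self._hash(image_name)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["files-1", "files-2", "files-3"])
print(ring.server_for("dog_gizmo.jpg"))  # deterministic server choice
```

The point of the scheme is that any retrieval node can compute the owning server from the name alone, with no central lookup table.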


Figure 1.4: Image hosting application with redundancy and segmentation

Of course, there are challenges in distributing data or functionality across many servers. One of the key issues is data locality; in distributed systems, the closer the data is to the operation or point of computation, the better the system performs. Therefore, distributing data across multiple servers is potentially problematic: whenever data is needed, there is a risk it will not be where it is required, and the server will have to perform a costly fetch of the information over the network.

Another potential problem comes in the form of inconsistency. When different services read from and write to a shared resource, potentially another service or data store, there is a chance of race conditions, where some data is supposed to be updated but is read before the update has happened; in those cases the data is inconsistent. For example, in the image hosting scenario, a race condition could occur if one client sent a request to update an image of a dog, changing its title from "Dog" to "Gizmo", while another client was reading the image. In that situation, it is unclear which title, "Dog" or "Gizmo", the second client would receive.


There are certainly obstacles associated with segmenting data, but segmentation allows each problem to be split, by data, by load, by usage patterns, etc., into manageable chunks. This can help with scalability and manageability, but it is not without risk. There are many ways to mitigate risk and handle failures; however, in the interest of brevity they are not covered in this chapter. If you want more information on this topic, take a look at the blog post on fault tolerance and monitoring.

1.3. Building blocks for fast and scalable data access

Having looked at some basic principles in distributed system development, let's now move on to a more complex issue - scaling data access.

The simplest web applications, for example LAMP-stack applications, look something like Figure 1.5.


Figure 1.5: Simple web applications

As an application grows, two main challenges arise: scaling access to the application server and to the database. In a highly scalable application design, the web or application server is typically minimized and often embodies a shared-nothing architecture. This makes the application-server layer of the system horizontally scalable. As a result of this design, the heavy lifting is pushed down the stack to the database server and supporting services; it is at this layer where the real scaling and performance challenges come into play.

The rest of this chapter covers some of the most common strategies and techniques for improving the performance and scalability of these types of services by providing fast access to data.


Figure 1.6: Simplified web application

Most systems can be simplified to the scheme in Figure 1.6, which is a good place to start. If you have a lot of data, you want fast and easy access to it, like a box of candy kept in the top drawer of your desk. Though this analogy is greatly oversimplified, it points to two difficult problems: scalability of data storage and fast access to data.

For the purposes of this section, let's assume you have many terabytes (TB) of data, and you want to let users access small portions of that data at random. (See Figure 1.7.) This is similar to locating an image file somewhere on a file server in the sample image hosting application.


Figure 1.7: Access to specific data

This is particularly challenging because loading terabytes of data into memory can be very expensive and directly impacts disk I/O. Reading from disk is many times slower than reading from memory; you could say that memory access is as fast as Chuck Norris, while disk access is slower than the queue at the clinic. This speed difference really adds up for large data sets: in raw numbers, memory access is 6 times faster than disk for sequential reads, and 100,000 times faster for random reads (see Pathologies of Big Data, http://queue.acm.org/detail.cfm?id=1563874). Moreover, even with unique identifiers, finding the location of a small piece of data can be as hard as trying to pull the last chocolate-filled candy out of a box of hundreds of other candies without looking.

Fortunately, there are many approaches that can be taken to simplify things, with the four most important approaches being the use of caches, proxies, indexes, and load balancers. The remainder of this section discusses how each of these concepts can be used to make data access much faster.

Caches

Caches take advantage of the locality-of-reference principle: recently requested data is likely to be requested again. They are used in almost every layer of computing: hardware, operating systems, web browsers, web applications, and more. A cache is like short-term memory: limited in size, but faster than the original data source, and containing the most recently accessed items. Caches can exist at all levels of an architecture, but are often found at the level nearest the front end, where they are implemented to return data quickly without taxing the backend.

So how can a cache be used to make data access faster within our API example? In this case, there are a couple of places where a cache fits. One option is to place it on your request-layer nodes, as shown in Figure 1.8.


Figure 1.8: Cache placement on a query-level node

Placing a cache directly on a request-layer node enables local storage of response data. Each time a request is made to the service, the node will quickly return locally cached data if it exists. If it is not in the cache, the request node will fetch the data from disk. The cache on one request-layer node could be located both in memory (which is very fast) and on the node's local disk (faster than going to network storage).
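The read path of such a node-local cache can be sketched with a small LRU ("least recently used") structure; a hit returns immediately, a miss falls through to the slow storage path and may evict the oldest entry. The `NodeCache` class and the `load_from_disk` callback are hypothetical names for this sketch.

```python
from collections import OrderedDict

class NodeCache:
    """Tiny LRU cache a request node might keep in front of slow storage."""
    def __init__(self, capacity=2):
        self.capacity = capacity
        self._items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load_from_disk):
        if key in self._items:
            self.hits += 1
            self._items.move_to_end(key)        # mark as recently used
            return self._items[key]
        self.misses += 1
        value = load_from_disk(key)             # slow path: origin storage
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)     # evict least recently used
        return value

cache = NodeCache(capacity=2)
disk = {"a.jpg": b"A", "b.jpg": b"B", "c.jpg": b"C"}
cache.get("a.jpg", disk.__getitem__)   # miss: goes to "disk"
cache.get("a.jpg", disk.__getitem__)   # hit: served locally
print(cache.hits, cache.misses)        # 1 1
```

The hit/miss counters make the trade-off visible: the cache only pays off when the same node sees the same keys again, which is exactly the property a random load balancer destroys.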


Figure 1.9: Cache systems

What happens when you spread caching across many nodes? As you can see in Figure 1.9, if the request layer includes many nodes, each node is likely to have its own cache. However, if your load balancer randomly distributes requests across the nodes, the same request may go to different nodes, increasing cache misses. Two ways to overcome this hurdle are global caches and distributed caches.

Global cache

The meaning of a global cache is clear from its name: all nodes share a single cache space. This involves adding a server or file store of some kind that is faster than your original store and accessible by all request-layer nodes. Each request node queries the cache the same way it would a local one. This kind of caching scheme can cause some problems, since a single cache is very easy to overwhelm as the number of clients and requests grows. At the same time, it is very effective in certain architectures (particularly ones with specialized hardware that makes the global cache very fast, or with a fixed data set that needs to be cached).

There are two standard forms of global cache, shown in the diagrams. In the first (Figure 1.10), when a cached response is not found in the cache, the cache itself becomes responsible for retrieving the missing piece of data from the underlying store. In the second (Figure 1.11), it is the responsibility of the request nodes to retrieve any data not found in the cache.


Figure 1.10: Global cache, where the cache is responsible for retrieval



Figure 1.11: Global cache, where request nodes are responsible for retrieval

Most applications leveraging global caches tend to use the first type, where the cache itself manages eviction and fetching of data to prevent a flood of requests for the same data from the clients. However, there are some cases where the second implementation makes more sense. For example, if the cache is being used for very large files, a low cache hit rate would cause the cache buffer to be thrashed by misses; in this situation it helps to keep a large percentage of the total data set (the hot data set) in cache. Another example is an architecture where the files stored in the cache are static and should never be evicted. (This could be due to application requirements around data latency; certain pieces of data might need to be very fast for large data sets, where the application logic understands the eviction strategy or hot spots better than the cache does.)
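The first form, where the cache itself does the fetching, is often called a read-through cache. A minimal sketch, assuming a `storage` callable standing in for the real backing store (all names here are illustrative):

```python
class ReadThroughCache:
    """Global cache of the first form: on a miss, the cache itself fetches
    from underlying storage, so request nodes never hit storage directly."""
    def __init__(self, storage):
        self._storage = storage    # callable: key -> value (the slow store)
        self._data = {}

    def get(self, key):
        if key not in self._data:              # miss: cache does the fetch
            self._data[key] = self._storage(key)
        return self._data[key]

calls = []
def slow_storage(key):
    calls.append(key)                          # record every storage hit
    return f"<bytes of {key}>"

cache = ReadThroughCache(slow_storage)
cache.get("a.jpg")
cache.get("a.jpg")
print(calls)  # ['a.jpg']  (storage was hit only once)
```

Because all misses funnel through the cache, concurrent requests for the same key can be coalesced into one storage fetch, which is the main reason this form is the common default.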

Distributed cache

Indexes

Indexes are often stored in memory or somewhere very local to an incoming client request. Berkeley DB (BDB) and tree-like data structures, which are commonly used to store data in ordered lists, are ideal for indexed access.

There are often many layers of indexes that serve as a map, moving you from one location to the next, and so on, until you get the specific piece of data you want. (See Figure 1.17.)


Figure 1.17: Multi-level indexes

Indexes can also be used to create multiple other views of the same data. For large data sets, this is a great way to define different filters and views without having to create many additional copies of the data.

For example, imagine that the image hosting system mentioned above is actually hosting images of book pages, and the service allows clients to query the text in those images, searching all the text content on a given topic, the same way search engines allow you to search HTML content. In this case, all those book images take many, many servers to store the files, and finding one page to render to the user can be quite involved. First, inverse indexes to query for arbitrary words and word tuples need to be easily accessible; then there is the challenge of navigating to the exact page and location within that book and retrieving the right image for the results. So in this case the inverted index would map to a location (such as book B), and then B might contain an index with all the words, their locations, and the number of occurrences in each part.

An inverted index, which might represent Index1 in the diagram above, would look something like this: each word or tuple of words serves as an index to the books that contain them.

The intermediate index would look similar, but would contain just the words, locations, and information for book B. This nested-index architecture allows each index to take up less space than if all of that information had to be stored in one big inverted index.
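The two-layer lookup described above can be sketched in a few lines: a top-level inverted index maps words to books, and each book keeps its own intermediate index of word locations. The book names and page numbers are made-up sample data.

```python
# Intermediate (per-book) indexes: word -> list of page locations.
book_indexes = {
    "B": {"important": [1, 3], "scaling": [2]},
    "C": {"scaling": [7]},
}

# Top-level inverted index: word -> set of books containing it,
# built from the per-book indexes.
top_index = {}
for book, words in book_indexes.items():
    for w in words:
        top_index.setdefault(w, set()).add(book)

def lookup(word):
    """Resolve a word through both layers: first books, then pages."""
    for book in sorted(top_index.get(word, ())):
        yield book, book_indexes[book][word]

print(list(lookup("scaling")))  # [('B', [2]), ('C', [7])]
```

Only the small top index has to be consulted globally; the bulkier location data stays sharded per book, which is what keeps each index small enough to serve quickly.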

And this is a key point in large-scale systems, because even compressed, these indexes can get quite big and expensive to store. Let's assume we have a lot of the world's books in this system, 100,000,000 of them (see the "Inside Google Books" blog post), and that each book is only 10 pages long (to keep the math simple) with 250 words per page: that gives us 250 billion words. If we assume an average of 5 characters per word, and encode each character with 8 bits (or 1 byte, even though some characters actually take 2 bytes), so 5 bytes per word, then an index containing each word only once would require over a terabyte of storage. So you can see how indexes that also contain other information, such as word tuples, data locations, and occurrence counts, can grow very quickly.
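As a back-of-the-envelope check of the arithmetic above:

```python
books = 100_000_000       # 100 million books
pages_per_book = 10       # simplified
words_per_page = 250

total_words = books * pages_per_book * words_per_page
print(total_words)        # 250 billion words

bytes_per_word = 5        # 5 characters at 1 byte each
index_bytes = total_words * bytes_per_word
print(index_bytes / 10**12)  # 1.25 (terabytes)
```

So the word data alone is about 1.25 TB, consistent with the "over a terabyte" figure in the text, before adding tuples, locations, or counts.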

Creating these intermediate indexes and representing the data in smaller sections makes big-data problems tractable. Data can be spread across many servers and still accessed quickly. Indexes are a cornerstone of information retrieval and the basis for today's modern search engines. Of course, this section only scratches the surface of the topic, and there is a lot of research on how to make indexes smaller, faster, contain more information (such as relevance), and update seamlessly. (There are challenges with managing concurrency, as well as with the number of updates required to add new data or change existing data, particularly when relevance or scoring is involved.)

Being able to find your data quickly and easily is important, and indexes are the simplest and most effective tool for achieving this goal.

Load Balancers

Finally, another critical piece of any distributed system is the load balancer. Load balancers are a principal part of any architecture, as their role is to distribute load across the set of nodes responsible for servicing requests. This allows multiple nodes to transparently serve the same function in the system. (See Figure 1.18.) Their main purpose is to handle a lot of simultaneous connections and route those connections to one of the request nodes, allowing the system to scale by simply adding nodes to serve more requests.


Figure 1.18: Load Balancer

There are many different algorithms for servicing requests, including random node selection, round robin, or even selecting nodes based on certain criteria such as CPU or memory utilization. Load balancers can be implemented as hardware appliances or as software. Among open source software load balancers, HAProxy is the most widely used.
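Two of the policies just mentioned, round robin and criteria-based selection, can be sketched in a few lines. This is a toy model (the node names and the "active connections" criterion are illustrative), not how HAProxy itself is implemented.

```python
import itertools

class RoundRobinBalancer:
    """Cycles through nodes in order: the simplest balancing policy."""
    def __init__(self, nodes):
        self._cycle = itertools.cycle(nodes)

    def pick(self):
        return next(self._cycle)

class LeastLoadedBalancer:
    """Picks by a resource criterion; here, fewest active connections."""
    def __init__(self, nodes):
        self.active = {n: 0 for n in nodes}

    def pick(self):
        node = min(self.active, key=self.active.get)
        self.active[node] += 1      # this request now loads that node
        return node

rr = RoundRobinBalancer(["node-1", "node-2", "node-3"])
print([rr.pick() for _ in range(4)])  # ['node-1', 'node-2', 'node-3', 'node-1']
```

Round robin assumes all requests cost the same; a criteria-based policy like `LeastLoadedBalancer` relaxes that assumption at the price of tracking per-node state.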

In a distributed system, load balancers are often found at the very front of the system, so that all incoming requests pass through them directly. In a complex distributed system, it is quite likely that a request will pass through multiple balancers, as shown in Figure 1.19.


Figure 1.19: Multiple load balancers

Like proxies, some load balancers can also route requests differently depending on the type of request. (Technically these are also known as reverse proxies.)

One of the challenges with load balancers is managing data specific to a user's session. On an e-commerce site, when all you have is one client, it is very easy to let users put things in their shopping cart and persist the contents between visits (which matters, because the likelihood of selling a product goes up considerably if, when the user returns to the site, the product is still in their cart). However, if a user is routed to one node for one session and to a different node on their next visit, inconsistencies can arise, since the new node may know nothing about that user's cart contents. (Wouldn't you be upset if you put a package of Mountain Dew in your cart and came back to find it gone?) One solution is to make sessions "sticky" so that the user is always routed to the same node. However, this makes it much harder to take advantage of reliability features like automatic failover. With sticky sessions, the user's cart always has its contents, but if their sticky node becomes unavailable, a special case is needed and the assumption about the cart's contents no longer holds (though hopefully that assumption isn't baked into the application). Of course, this problem can also be solved with other strategies and tools described in this chapter, such as services, and others besides (like browser caches, cookies, and URL rewriting).

If a system has only a couple of nodes, techniques like round-robin DNS are likely to be more practical than load balancers, which can be expensive and add an unneeded layer of complexity. Of course, in larger systems there are all sorts of scheduling and load-balancing algorithms, from simple ones like randomization or round robin to more sophisticated mechanisms that take into account the system's usage pattern and performance characteristics. All of these algorithms distribute traffic and requests, and can provide helpful reliability tools like automatic failover or automatic removal of a bad node (for example, when it becomes unresponsive). However, these advanced features can make problem diagnosis cumbersome. For example, in high-load situations, load balancers will remove nodes that may be slow or timing out (because of a barrage of requests), which only exacerbates the situation for the other nodes. Extensive monitoring is important in these cases, because overall system traffic and load may appear to be decreasing (since the nodes are serving fewer requests), while individual nodes are being stretched to their limits.

Load balancers are an easy way to increase system capacity and, like the other techniques described in this article, play an essential role in distributed system architecture. Load balancers also provide the critical function of testing the health of nodes: if, as a result of such a check, a node is unresponsive or over-loaded, it can be removed from the pool handling requests, and, thanks to the redundancy in your system, the load will be redistributed across the remaining working nodes.

Queues

So far we have looked at a lot of ways to read data quickly. Another important part of scaling the data layer is effective management of writes. When systems are simple, with minimal processing load and small databases, writes can be predictably fast; in more complex systems, however, writes can take an almost indeterminately long time. For example, data may have to be written to several places on different servers or indexes, or the system could simply be under high load. In cases where writes, or any task for that matter, may take a long time, achieving performance and availability requires building asynchrony into the system. A common way to do that is with queues.


Figure 1.20: Synchronous request

Imagine a system where each client requests that a task be processed remotely. Each of these clients sends its request to the server, which performs the tasks as quickly as possible and returns the results to the corresponding clients. In small systems where one server (or logical service) can serve incoming clients just as fast as they come in, this kind of situation should work fine. However, when the server receives more requests than it can handle, each client is forced to wait for the other clients' requests to finish before a response to its own can be generated. This is an example of a synchronous request, depicted in Figure 1.20.

This kind of synchronous behavior can severely degrade client performance: the client is forced to wait, effectively doing no work, until its request is answered. Adding additional servers to address system load does not actually solve the problem either; even with effective load balancing in place, it is extremely difficult to ensure the even and fair distribution of work required to maximize client performance. Furthermore, if the server handling a request is unavailable or fails, the client upstream will fail too. Solving this problem effectively requires an abstraction between the client's request and the actual work performed to service it.


Figure 1.21: Using Queues to Manage Requests

Enter queues. The way a queue works is very simple: a task comes in, is added to the queue, and then "workers" pick up the next task as soon as they have capacity to process it. (See Figure 1.21.) These tasks could be simple writes to a database, or something as complex as generating a thumbnail preview image for a document. When a client submits task requests to a queue, it no longer needs to wait for the results; instead, it needs only an acknowledgment that the request was properly received. This acknowledgment can later serve as a reference to the results of the work when the client requires them.

Queues enable clients to work in an asynchronous manner, providing a strategic abstraction of a client's request and its response. In a synchronous system, on the other hand, there is no differentiation between request and reply, so they cannot be managed separately. In an asynchronous system, the client requests a task, the service responds with a message acknowledging that the task was received, and then the client can periodically check the status of the task, requesting the result only once it is complete. While the client is waiting for an asynchronous request to complete, it is free to perform other work, even making asynchronous requests of other services. The latter is an example of how queues and messages are leveraged in distributed systems.
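The submit-acknowledge-poll pattern just described can be sketched with Python's standard library queue and a worker thread. The job IDs, the `submit` helper, and the "work" itself (upper-casing a string) are all illustrative stand-ins for a real task system.

```python
import queue
import threading

tasks = queue.Queue()
results = {}

def worker():
    """Pulls tasks off the queue as capacity allows and records results."""
    while True:
        job_id, payload = tasks.get()
        results[job_id] = payload.upper()   # stand-in for real work
        tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

def submit(job_id, payload):
    """Enqueue work; return immediately with an acknowledgment (the job id)."""
    tasks.put((job_id, payload))
    return job_id

ticket = submit("job-1", "resize dog.jpg")
# The client is now free to do other work; here we just wait for the
# queue to drain (a real client would poll for the result by ticket).
tasks.join()
print(results[ticket])  # RESIZE DOG.JPG
```

The key property is that `submit` returns before the work is done: the request (enqueue) and the response (result lookup by ticket) are fully decoupled.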

Queues also provide some protection from service outages and failures. For instance, it is quite easy to create a highly robust queue that can retry service requests that fail due to transient server failures. It is preferable to use a queue to enforce quality-of-service guarantees than to expose clients directly to intermittent service outages, which would require complicated and often inconsistent error handling on the client side.

Queues are fundamental to managing distributed communication between different parts of any large-scale distributed system, and there are lots of ways to implement them. There are quite a few open source queue implementations, such as RabbitMQ, ActiveMQ, and BeanstalkD, though some systems use dedicated services as well.

Kate Matsudaira