Operational analytical processing. On-Line Analytical Processing (OLAP)

For many years, information technology has focused on building systems to support the processing of corporate transactions. Such systems must be virtually fault-tolerant and provide fast response. An effective solution was provided by OLTP (On-Line Transaction Processing), which focused on a distributed relational database environment.

A more recent development in this area was the addition of a client-server architecture. Many tools have been released for developing OLTP applications.

Access to data is often required by both OLTP applications and decision support information systems. Unfortunately, trying to serve both types of requests from one database can be problematic. Therefore, some companies have chosen to split their databases into an OLTP part and an OLAP part.

OLAP (On-Line Analytical Processing) is an information process that lets the user query the system, conduct analysis, and so on in online (interactive) mode. Results are generated within seconds.

On the other hand, in an OLTP system, huge volumes of data are processed as quickly as they are received as input.

OLAP systems are made for end users, while OLTP systems are made for professional IS users. OLAP includes activities such as generating queries, querying ad hoc reports, performing statistical analysis, and building multimedia applications.

Providing OLAP requires working with a data warehouse (or multidimensional warehouse) as well as a set of tools, typically with multidimensional capabilities. These tools can be query tools, spreadsheets, data mining tools, data visualization tools, etc.

The OLAP concept is based on the principle of multidimensional data representation. E. Codd examined the shortcomings of the relational model, first of all pointing out the inability to combine, view and analyze data from the point of view of multiple dimensions, that is, in the most understandable way for corporate analysts, and identified general requirements for OLAP systems that expand the functionality of relational DBMSs and include multidimensional analysis as one of its characteristics.

In a large number of publications, the acronym OLAP denotes not only a multidimensional view of data, but also the storage of the data itself in a multidimensional database. Generally speaking, this is not true, since Codd himself notes that relational databases were, are and will be the most suitable technology for storing enterprise data. The need is not for new database technology, but rather for analysis tools that complement the functionality of existing DBMSs and are flexible enough to accommodate and automate the various types of mining inherent in OLAP.

According to Codd, a multidimensional conceptual view is a multiple perspective consisting of several independent dimensions along which specific sets of data can be analyzed. Simultaneous analysis across several dimensions is called multidimensional analysis. Each dimension includes directions of data consolidation, each consisting of a series of successive levels of generalization, where each higher level corresponds to a greater degree of aggregation of the data for that dimension. Thus, the Performer dimension can be defined by a consolidation direction consisting of the levels “enterprise - division - department - employee”. The Time dimension can even include two consolidation directions, “year - quarter - month - day” and “week - day”, since counting time by months and by weeks is incompatible. This makes it possible to choose the desired level of detail independently for each dimension. The drill-down operation corresponds to moving from higher levels of consolidation to lower ones; conversely, the roll-up operation means moving from lower levels to higher ones.

Codd defined 12 rules that an OLAP-class software product must satisfy. These rules are:

1. Multidimensional conceptual view of data.

2. Transparency.

3. Accessibility.

4. Consistent reporting performance.

5. Client-server architecture.

6. Generic dimensionality (equality of dimensions).

7. Dynamic handling of sparse matrices.

8. Multi-user support.

9. Unrestricted cross-dimensional operations.

10. Intuitive data manipulation.

11. Flexible reporting.

12. Unlimited number of dimensions and aggregation levels.

The set of these requirements, which served as the actual definition of OLAP, should be considered as recommendations, and specific products should be assessed according to the degree of closeness to ideal full compliance with all requirements.

Data mining.

Data mining is a term used to describe knowledge discovery in databases, knowledge extraction, data exploration, data pattern processing, and data cleaning; it also covers the accompanying software. All these actions are carried out automatically and allow even non-programmers to get quick results.

The request is made by the end user, possibly in natural language. The request is converted to SQL format. The SQL request is sent over the network to the DBMS, which manages the database or data storage. The DBMS finds the answer to the request and delivers it back. The user can then design the presentation or report as per their requirements.

Many important decisions in almost any area of business and public life are based on the analysis of large and complex databases. Data mining can be very helpful in these cases.

Data mining methods are closely related to OLAP technologies and data warehouse technologies. Therefore, the best option is an integrated approach to their implementation.

For existing data warehouses to support management decision-making, the information must be presented to the analyst in the required form; that is, the analyst must have well-developed tools for accessing and processing warehouse data.

Very often, information and analytical systems built for direct use by decision makers turn out to be extremely easy to use but severely limited in functionality. Such static systems are called Executive Information Systems (EIS). They contain predefined sets of queries and, while sufficient for everyday review, cannot answer all the questions about the available data that may arise when making decisions. The output of such a system is, as a rule, a multi-page report, and after studying it carefully the analyst has a new series of questions. However, each new query that was not foreseen when the system was designed must first be formally described, then coded by a programmer, and only then executed. The waiting time in this case can amount to hours or days, which is not always acceptable. Thus, the outward simplicity of static decision support systems, which most customers of information and analytical systems actively push for, comes at the cost of flexibility.

Dynamic decision support systems, on the contrary, are focused on processing unregulated (ad hoc) analyst requests for data. The work of analysts with these systems consists of an interactive sequence of forming queries and studying their results.

But dynamic decision support systems can operate not only in the field of online analytical processing (OLAP). Support for making management decisions based on accumulated data can be performed in three basic areas.

1. Scope of detailed data. This is the scope of most information retrieval systems. In most cases, relational DBMSs cope well with the tasks that arise here. The generally accepted standard for the language for manipulating relational data is SQL. Information retrieval systems that provide an end-user interface in tasks of searching for detailed information can be used as add-ons both over individual databases of transactional systems and over a general data repository.

2. The scope of aggregate indicators. A comprehensive view of the information collected in a data warehouse, its generalization and aggregation, and multidimensional analysis are the tasks of OLAP systems. Here you can either rely on special multidimensional DBMSs or remain within the framework of relational technologies. In the second case, pre-aggregated data can be collected in a star-schema database, or aggregation can be performed on the fly while scanning the detailed tables of a relational database.

3. The scope of patterns. Intelligent processing is carried out using data mining methods, whose main objectives are to search for functional and logical patterns in the accumulated information and to build models and rules that explain the anomalies found and/or predict the development of certain processes.
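The relational route described in item 2 can be illustrated with a minimal sketch. It assumes a tiny star schema with hypothetical table and column names (`fact_sales`, `dim_time`, `dim_product`) and invented figures, and it aggregates on the fly by scanning the detailed fact table; it is not a description of any particular OLAP product.

```python
import sqlite3

# In-memory star schema: one fact table and two dimension tables.
# All names and values here are hypothetical, chosen for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (time_id INTEGER, product_id INTEGER, amount REAL);
INSERT INTO dim_time VALUES (1, 2023, 1), (2, 2023, 2);
INSERT INTO dim_product VALUES (10, 'tools'), (11, 'paint');
INSERT INTO fact_sales VALUES (1, 10, 100.0), (1, 11, 40.0), (2, 10, 60.0);
""")

# Whitelist of grouping columns, so the SQL text is never built from free input.
LEVEL_COLUMNS = {"year": "t.year", "month": "t.month", "category": "p.category"}

def sales_by(level):
    """Aggregate the detailed fact rows by one dimension attribute."""
    column = LEVEL_COLUMNS[level]
    sql = f"""
        SELECT {column}, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_time AS t ON t.time_id = f.time_id
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY {column}
        ORDER BY {column}
    """
    return dict(conn.execute(sql).fetchall())

by_category = sales_by("category")
by_month = sales_by("month")
```

A pre-aggregated variant would store the result of such a query in its own summary table, trading storage and load time for faster reads.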

The complete structure of an information and analytical system built on the basis of a data warehouse is shown in Fig. 3.2. In specific implementations, individual components of this structure are often missing.

Fig.3.2. Structure of the corporate information and analytical system.

The OLAP concept is based on the principle of multidimensional data representation. In a 1993 article, E. F. Codd addressed the shortcomings of the relational model, primarily pointing out the inability to “combine, view and analyze data in terms of multiple dimensions, that is, in the most understandable way for enterprise analysts,” and defined general requirements for OLAP systems that extend the functionality of relational DBMSs and include multidimensional analysis as one of their characteristics.

In a large number of publications, the acronym OLAP denotes not only a multidimensional view of data, but also the storage of the data itself in a multidimensional database. Generally speaking, this is not true, as Codd himself notes that “Relational databases were, are and will be the most suitable technology for storing enterprise data. The need is not for new database technology, but rather for analysis tools that complement the functions of existing DBMSs and are sufficiently flexible to enable and automate the various types of mining inherent in OLAP.” Such confusion leads to oppositions like “OLAP or ROLAP”, which is not entirely correct, since ROLAP (relational OLAP) at the conceptual level supports all the functionality defined by the term OLAP. It seems preferable to use the special term MOLAP for OLAP based on multidimensional DBMSs, as is done in.

According to Codd, a multidimensional conceptual view is a multiple perspective consisting of several independent dimensions along which specific sets of data can be analyzed. Simultaneous analysis across several dimensions is called multidimensional analysis. Each dimension includes directions of data consolidation, each consisting of a series of successive levels of generalization, where each higher level corresponds to a greater degree of aggregation of the data for that dimension. Thus, the Performer dimension can be defined by a consolidation direction consisting of the levels “enterprise - division - department - employee”. The Time dimension can even include two consolidation directions, “year - quarter - month - day” and “week - day”, since counting time by months and by weeks is incompatible. This makes it possible to choose the desired level of detail independently for each dimension. The drill-down operation corresponds to moving from higher levels of consolidation to lower ones; conversely, the roll-up operation means moving from lower levels to higher ones (Fig. 2).

Fig. 2. Dimensions and directions of data consolidation
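Roll-up and drill-down along the “year - quarter - month - day” consolidation direction can be sketched as follows. The daily figures and the names `daily`, `LEVELS` and `consolidate` are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical daily facts keyed by (year, month, day).
daily = {
    (2023, 1, 15): 10,
    (2023, 2, 3): 5,
    (2023, 4, 20): 7,
    (2024, 1, 9): 4,
}

def quarter_of(month):
    """Map a month number (1-12) to its quarter (1-4)."""
    return (month - 1) // 3 + 1

# Each consolidation level maps a day to the key of its ancestor at that level.
LEVELS = {
    "day": lambda y, m, d: (y, quarter_of(m), m, d),
    "month": lambda y, m, d: (y, quarter_of(m), m),
    "quarter": lambda y, m, d: (y, quarter_of(m)),
    "year": lambda y, m, d: (y,),
}

def consolidate(facts, level):
    """Aggregate the daily facts to the requested level of generalization."""
    totals = defaultdict(int)
    for (y, m, d), value in facts.items():
        totals[LEVELS[level](y, m, d)] += value
    return dict(totals)

by_year = consolidate(daily, "year")        # roll-up to the highest level
by_quarter = consolidate(daily, "quarter")  # drill-down one level from year
```

The second direction, “week - day”, would need its own mapping, because weeks do not nest inside months.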

Corporate databases of economic information systems

3. On-Line Analytical Processing (OLAP)

The technology for complex multidimensional data analysis is called OLAP (On-Line Analytical Processing). OLAP is a key component of data warehousing. The OLAP concept was described in 1993 by Edgar Codd and has the following requirements for multidimensional analysis applications:

multidimensional conceptual representation of data, including full support for hierarchies and multiple hierarchies (a key requirement of OLAP);

providing the user with analysis results in an acceptable time (usually no more than 5 s), at the cost of a less detailed analysis;

the ability to perform any logical and statistical analysis specific to a given application and save it in a form accessible to the end user;

multi-user access to data with support for appropriate locking mechanisms and authorized access means;

the ability to access any necessary information, regardless of its volume.

An OLAP system consists of many components. At the highest level of presentation, the system includes a data source, a multidimensional database (MDB), which provides the ability to implement a reporting mechanism based on OLAP technology, an OLAP server and a client. The system is built on the client-server principle and provides remote and multi-user access to the MDB server.

Let's look at the components of an OLAP system.

Sources. The source in OLAP systems is the server that supplies data for analysis. Depending on the use of the OLAP product, the source may be a data warehouse, a legacy database containing common data, a set of tables that aggregate financial data, or any combination of the above.

Data warehouse. Source data is collected and stored in a warehouse designed according to data warehousing principles. The data warehouse is a relational database (RDB). The main data table (fact table) contains the numerical values of the indicators for which statistical information is collected.

Multidimensional database. The data warehouse supplies information to a multidimensional database, which is a collection of objects. The main classes of these objects are dimensions and measures. Dimensions are sets of values (parameters) by which data is indexed, for example time, region, type of institution, and so on. Each dimension is populated with values from the corresponding dimension tables of the data warehouse. The set of dimensions defines the space of the process under study. Measures are multidimensional data cubes (hypercubes). A hypercube contains the data itself as well as aggregate totals over the dimensions included in the measure. Measures constitute the main content of the MDB and are populated from the fact table. Along each axis of a hypercube, data can be organized into a hierarchy representing different levels of detail. This makes it possible to create hierarchical dimensions, which are then used to aggregate or drill down the data during analysis. A typical example of a hierarchical dimension is a list of territorial objects grouped by districts and regions.

Server. The application part of the OLAP system is the OLAP server. This component does all the work (depending on the system model), and stores all the information to which active access is provided. Server architecture is governed by various concepts. In particular, the main functional characteristic of OLAP products is the use of MDB or RDB for data storage.

Client application. Data structured in this way and stored in the MDB is available for analysis through a client application. The user can access data remotely, formulate complex queries, generate reports, and obtain arbitrary subsets of data. Producing a report comes down to selecting specific dimension values and constructing a slice of a hypercube. The slice is determined by the selected dimension values. Data along the other dimensions is summed.

The main concepts of a multidimensional data model are: Data Hypercube, Dimension, Members, Cell and Measure.

A data hypercube contains one or more dimensions and is an ordered collection of cells. Each cell is defined by one and only one set of dimension values—labels. The cell can contain data - a measure or be empty.

A dimension is a set of labels that form one of the faces of a hypercube. An example of a time dimension is a list of days, months, quarters. An example of a geographical dimension could be a list of territorial objects: settlements, districts, regions, countries, etc.

To access the data, the user must specify one or more cells by selecting the dimension values that correspond to the desired cells. The process of selecting dimension values is called fixing labels, and the set of selected dimension values is called a set of fixed labels.
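The cell-and-label model can be sketched with a plain dictionary whose keys are tuples of labels, one label per dimension. The dimension names, labels and measure values below are hypothetical; the point is only to show how fixing labels selects cells and how the remaining dimensions are summed.

```python
# Order of dimensions in every cell key (hypothetical example cube).
DIMENSIONS = ("time", "region", "product")

# Each cell is addressed by exactly one set of labels; absent keys are empty cells.
cube = {
    ("2023-Q1", "North", "bolts"): 120,
    ("2023-Q1", "South", "bolts"): 80,
    ("2023-Q2", "North", "bolts"): 50,
    ("2023-Q2", "North", "nuts"): 30,
}

def select_cells(cube, fixed):
    """Return the cells whose labels match the set of fixed labels."""
    positions = {DIMENSIONS.index(name): label for name, label in fixed.items()}
    return {
        labels: measure
        for labels, measure in cube.items()
        if all(labels[i] == label for i, label in positions.items())
    }

def total(cube, fixed):
    """Sum the measure over all dimensions that were not fixed."""
    return sum(select_cells(cube, fixed).values())

north_total = total(cube, {"region": "North"})
one_cell = select_cells(
    cube, {"time": "2023-Q1", "region": "South", "product": "bolts"}
)
```

Fixing a label in every dimension addresses a single cell, while fixing fewer labels yields a slice whose remaining dimensions can be summed.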

Server OLAP tools have advantages over client OLAP tools: with server tools, aggregate data is calculated and stored on the server, and the client application receives only the results of queries against it. This generally reduces network traffic, query execution time, and the resources required by the client application.

1. Multidimensional data representation - end-user tools that provide multidimensional visualization and manipulation of data; The multidimensional representation layer abstracts from the physical structure of the data and treats the data as multidimensional.

2. Multidimensional processing - a means (language) for formulating multidimensional queries (the traditional relational language SQL is unsuitable here) and a processor that can process and execute such a query.

3. Multidimensional storage - means of physical organization of data, ensuring the effective execution of multidimensional queries.

The first two levels are mandatory in all OLAP tools. The third level, although widespread, is not necessary, since data for a multidimensional representation can be extracted from ordinary relational structures.

In any data warehouse - both regular and multidimensional - along with detailed data extracted from operational systems, aggregated indicators (total indicators), such as the sum of sales volumes by month, by product category, etc., are also stored.

The main disadvantages of storing aggregates are the growth in the volume of stored information (as new dimensions are added, the volume of data that makes up the cube grows exponentially) and the time it takes to load it.

The degree of increase in data volume when calculating aggregates depends on the number of dimensions of the cube and on the structure of these dimensions, i.e. the ratio of the number of “parents” and “descendants” at different levels of a dimension. To solve the problem of storing aggregates, complex schemes are used that achieve a significant increase in query performance without calculating every possible aggregate.
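A rough way to see this growth is to count the possible aggregate tables. The sketch below assumes that each dimension has a single linear hierarchy and that an aggregate is defined by choosing, for every dimension, either one of its levels or the grand total; under that assumption the count is the product of (levels + 1) over all dimensions. The function name `aggregate_count` is hypothetical.

```python
from math import prod

def aggregate_count(levels_per_dimension):
    """Number of distinct aggregate tables: one level (or 'all') per dimension."""
    return prod(levels + 1 for levels in levels_per_dimension)

# Flat dimensions (one level each): the count doubles with every new dimension.
flat = [aggregate_count([1] * n) for n in range(1, 6)]

# Three dimensions with 4, 3 and 2 hierarchy levels respectively.
with_hierarchies = aggregate_count([4, 3, 2])
```

This counts tables only; the number of cells in each table additionally depends on how many members each level has, which is why the parent-to-descendant ratio matters.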

Both raw and aggregate data can be stored in either relational or multidimensional structures. In this regard, three methods of storing multidimensional data are currently used:

MOLAP (Multidimensional OLAP) - source and aggregate data are stored in a multidimensional database. Storing data in multidimensional structures allows you to manipulate the data as a multidimensional array, so that aggregate values are calculated equally fast for any of the dimensions. However, in this case the multidimensional database is redundant, since the multidimensional data entirely contains the original relational data.

These systems provide a full cycle of OLAP processing. They either include, in addition to the server component, their own integrated client interface, or use external spreadsheet programs to communicate with the user.

ROLAP (Relational OLAP) - the original data remains in the same relational database where it was originally located. Aggregate data is placed in service tables specially created for storing it in the same database.

HOLAP (Hybrid OLAP) - the original data remains in the same relational database where it was originally located, and the aggregate data is stored in a multidimensional database.

Some OLAP tools support storing data only in relational structures, some only in multidimensional ones. However, most modern server OLAP tools support all three data storage methods. The choice of storage method depends on the volume and structure of the source data, requirements for the speed of query execution and the frequency of updating OLAP cubes.


Tools of the OLAP (On-Line Analytical Processing) class are popular analytical tools today, and it is almost impossible to imagine an information and analytical system without them. The term OLAP itself was coined in 1993 by Codd, who discussed the shortcomings of the relational model from the point of view of corporate analysts. The concept of OLAP was meant to correct these shortcomings. To be fair, an approach similar to OLAP (namely, multidimensional data representation) was used before the term was introduced, but it was Codd’s article that gave the impetus for the wide dissemination of the technology and its implementation in many analytical products.

Among the disadvantages of the relational model and relational DBMSs for analysis tasks, Codd noted the following. First, analytical queries are quite complex and involve a large number of relatively slow relational join operations. Second, composing queries to relational databases is beyond the reach of corporate analysts (from now on we will call them “decision makers”). The second drawback leads to a rather long cycle before the decision maker obtains the necessary information: it is necessary, for example, to contact the information service, which prepares a report form with the relevant information, and only then to use reports of this form. Codd saw a solution to these problems in an analytical tool that supports a multidimensional model, which the decision maker can readily understand. That is, several dimensions are identified, and various indicators of the enterprise’s performance are considered in terms of them. Such a model, being clear and intuitive, should allow decision makers to access the necessary information themselves. In addition, responses to queries must be generated quickly enough (this requirement accounts for the “On-Line” part of the OLAP acronym).

Codd also formulated 12 rules that an OLAP system must satisfy. Later, these rules were reworked into 18 properties, divided into 4 groups. This set of rules is not popular, perhaps because, unlike Codd's well-known 1970 paper describing the relational data model, the 1993 paper contained much less fundamental justification and was less theoretically grounded. In addition, it was published under the auspices of a reputable supplier of analytical systems, so the rules formulated in it may not be universal and may instead reflect the specifics of that supplier's products. One way or another, the so-called FASMI test is more popular, and it can be taken as a definition of OLAP. FASMI is an acronym that stands for:

Fast - system response time should be measured in seconds. Independent studies show that a user will wait about 20 seconds for a response from a computer; after that, the user begins to feel discomfort. Answering arbitrary queries over large amounts of information within seconds is undoubtedly a difficult task for OLAP tool manufacturers. In fact, this is one of the main directions of development in this area. However, as some surveys show, unsatisfactory speed is still one of the main complaints of users about tools of this class.

Analysis - the system is designed for a comprehensive study of data, and this study may contain elements of business logic, support user-defined dependencies, and so on.

Shared (multi-user) - the system must support multi-user work while ensuring the necessary level of confidentiality. If users are allowed to modify data, this must be controlled by known locking mechanisms at the required level.

Multidimensional - data must be presented in multidimensional form. This is the main part of the definition of OLAP.

Information - this component hints that the result of the analysis is information (as opposed to the data stored in a relational database).

The FASMI test, like Codd's rules, sets a certain standard - the “ideal OLAP tool”. In fact, different products can be compared based on how well they satisfy these provisions. There are currently no products that would fully satisfy them.

Connection between OLAP and data storage

Data warehouses reflect the modern trend towards collecting and cleaning data from transactional systems and storing it for analysis purposes. The emergence of data warehouse technology is partly due to the same prerequisites as OLAP - the difference in analytical queries and typical queries to accounting systems. In addition, the desire to collect data from all sources in the enterprise to create a more holistic information picture turned out to be very relevant.

A type of data warehouse is the data mart. Data marts differ from data warehouses mainly in size. Whereas data from the whole enterprise flows into the data warehouse, a data mart holds data related to only one division, service, or branch. A data mart can be created either independently or as a subset of a corporate data warehouse.

Collected from different sources, consistent and sometimes aggregated data is ideal for analysis. Therefore, in most cases, OLAP tools are deployed specifically on the basis of a warehouse or data mart, and are designed to analyze the data contained there. This is such a general trend that in some sources the concepts of Data Warehouse (data mart) and OLAP are not distinguished. However, out of methodological necessity, a distinction still needs to be made. Data warehouse technology is more focused on collecting, cleaning, and storing data, and OLAP is more focused on their processing and presentation.




The term online analytical processing (On-Line Analytical Processing - OLAP) was first mentioned in a report prepared for Arbor Software Corp. in 1993, although the definition of this term, as with data warehouses, was formulated much later. The concept denoted by this term can be defined as “the interactive process of creating, maintaining, analyzing data and issuing reports.” In addition, it is usually added that the data in question should be perceived and processed as if it were stored in a multidimensional array. But before we discuss the multidimensional view itself, let's look at the relevant ideas in terms of traditional SQL tables.

The first feature is that analytical processing necessarily requires some aggregation of data, usually performed in several different ways at once or, in other words, according to many different grouping criteria. In essence, one of the main problems of analytical processing is that the number of possible groupings very quickly becomes too large, yet users need to consider all or most of them. Of course, the SQL standard now supports such aggregation, but any given SQL query produces only one table as its result, and all rows in that result table have the same form and the same interpretation (see footnote 10); at least, that was the case before the SQL:1999 standard. Therefore, to implement N different groupings, you need to perform N separate queries and produce N separate tables as a result. For example, consider the following sequence of queries performed on a suppliers-and-parts database.

9 Here's some advice from a book on data warehousing: "[Abandon] normalization... Trying to normalize any of the tables in a multidimensional database solely to save disk space [that's right!] is a waste of time... Dimension tables should not be normalized... Normalized dimension tables exclude the possibility of viewing."

10 Unless this result table includes any undefined values, or NULL values (see Chapter 19, Section 19.3, "Additional Information about Predicates"). In fact, the SQL:1999 constructs that are to be described in this section can be characterized as "based on the use" of this highly discouraged SQL feature (?); in fact, they highlight the fact that NULL values can carry different meanings in their different occurrences, and therefore allow many different predicates to be represented in a single table (as will be shown below).

1. Determine the total quantity of deliveries.

2. Determine the total quantity of deliveries by supplier.

3. Determine the total quantity of deliveries by part.

4. Determine the total quantity of deliveries by supplier and part.

(Of course, the "total" quantity for a given supplier and a given part is simply the actual quantity for that supplier and that part. The example would be more realistic if a database of suppliers, parts and projects were used. But to keep the example simple, we have settled on the usual database of suppliers and parts.)
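The four queries can be sketched as four separate SQL statements, each yielding its own result table. The table name `sp`, its columns `sno`, `pno`, `qty`, and the three sample rows are hypothetical and are not the table used in the text; SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER);
-- Hypothetical sample rows, invented for this sketch.
INSERT INTO sp VALUES ('S1', 'P1', 10), ('S1', 'P2', 20), ('S2', 'P1', 5);
""")

# One query per grouping: four groupings need four queries and four result tables.
QUERIES = {
    "total": "SELECT SUM(qty) FROM sp",
    "by_supplier": "SELECT sno, SUM(qty) FROM sp GROUP BY sno ORDER BY sno",
    "by_part": "SELECT pno, SUM(qty) FROM sp GROUP BY pno ORDER BY pno",
    "by_supplier_and_part": (
        "SELECT sno, pno, SUM(qty) FROM sp GROUP BY sno, pno ORDER BY sno, pno"
    ),
}

results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
```

Each entry of `results` has a different row shape, which is exactly why a single plain `GROUP BY` query cannot return all four groupings at once.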

Now let's assume that there are only two parts, with numbers P1 and P2, and the supply table looks like this.

Multidimensional Databases

So far we have assumed that OLAP data is stored in a conventional SQL database (although we have occasionally touched on the terminology and concepts of multidimensional databases). In fact, without saying so explicitly, we have been describing what is called a ROLAP (relational OLAP) system. However, many believe that a MOLAP (multidimensional OLAP) system is the more promising path. This subsection discusses the principles of constructing MOLAP systems in more detail.

A MOLAP system maintains multidimensional databases, in which data is conceptually stored in the cells of a multidimensional array.

Note. Although we spoke above of a conceptual way of organizing storage, in fact the physical organization of data in MOLAP is very similar to its logical organization.

The DBMS that supports such databases is called multidimensional. A simple example is a three-dimensional array whose dimensions represent products, customers, and time periods, respectively. The value of each cell can represent the total quantity of the specified product sold to the specified customer during the specified time period. As noted above, the crosstabs from the previous subsection can also be regarded as such arrays.

If the structure of a data collection is understood sufficiently well, then all the relationships among the data can be known. Moreover, the variables of such a collection (not in the sense of ordinary programming languages) can, roughly speaking, be divided into dependent and independent ones. In the previous example, product, customer, and time period can be considered independent variables, and quantity the only dependent variable. In general, independent variables are variables whose values together determine the values of the dependent variables (just as, in relational terminology, a candidate key is a set of columns whose values determine the values of the remaining columns). Consequently, the independent variables define the dimensions of the array in which the data is organized, and also form an addressing scheme11 for that array. The values of the dependent variables, which represent the actual data, are stored in the array cells.
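As a minimal sketch of this idea, the array can be modeled as a mapping from symbolic addresses to values. The product, customer, and period names below are invented for illustration, and a Python dictionary is only a stand-in for whatever storage structure a real multidimensional DBMS uses.

```python
# Hypothetical sales facts: (product, customer, period) -> quantity sold.
# The three independent variables form the address; quantity is the content.
cube = {
    ("widget", "acme", "2024-Q1"): 5,
    ("widget", "acme", "2024-Q2"): 7,
    ("gadget", "zenith", "2024-Q1"): 3,
}

def cell(cube, product, customer, period):
    """Return the dependent value at a symbolic address, or None if empty."""
    return cube.get((product, customer, period))

def dimensions(cube):
    """Collect the distinct values that occur along each of the three axes."""
    products, customers, periods = (sorted(set(axis)) for axis in zip(*cube))
    return products, customers, periods

print(cell(cube, "widget", "acme", "2024-Q2"))
print(dimensions(cube))
```

The address plays the same role as a candidate key: one combination of product, customer, and period identifies at most one quantity.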

Note. The difference between the values of the independent, or dimensional, variables and the values of the dependent, or non-dimensional, variables is sometimes characterized as the difference between location and content.

11 Array cells are therefore addressed symbolically, rather than by the numeric indices that are usually used for working with arrays.

Unfortunately, the above characterization of multidimensional databases is too simplistic, since most data collections are initially not fully understood. For this reason, we usually begin by analyzing the data in order to understand it better. Often the lack of understanding is so significant that it is impossible to determine in advance which variables are independent and which are dependent. The independent variables are then selected according to the current understanding of the data (i.e., on the basis of some hypothesis), and the resulting array is examined to determine how well they were chosen (see Section 22.7). This approach leads to many trial-and-error iterations. Therefore, the system usually allows dimensional and non-dimensional variables to be swapped, an operation called pivoting (changing the coordinate axes). Other supported operations include array transposition and reordering of dimensions. There should also be a way to add dimensions.
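The following sketch shows one possible reading of two of these operations on a dictionary-based array: reordering dimensions, and swapping a dimensional variable with the non-dimensional one. The function names reorder_dimensions and swap_with_content and the sample data are assumptions made for illustration; actual products define these operations in their own ways.

```python
from collections import defaultdict

# Hypothetical array: (product, customer, period) -> quantity.
cube = {
    ("widget", "acme", "Q1"): 5,
    ("widget", "acme", "Q2"): 5,
    ("gadget", "zenith", "Q1"): 3,
}

def reorder_dimensions(cube, order):
    """Permute the axes: order=(2, 0, 1) puts old axis 2 first, then 0, then 1."""
    return {tuple(key[i] for i in order): value for key, value in cube.items()}

def swap_with_content(cube, axis):
    """Exchange one dimension with the cell content.

    The old content becomes a coordinate and the old coordinate becomes the
    content. Values are gathered in lists because the new coordinates need
    not identify a single cell.
    """
    swapped = defaultdict(list)
    for key, value in cube.items():
        new_key = key[:axis] + (value,) + key[axis + 1:]
        swapped[new_key].append(key[axis])
    return dict(swapped)

by_period_first = reorder_dimensions(cube, (2, 0, 1))
qty_as_dimension = swap_with_content(cube, 2)
print(by_period_first)
print(qty_as_dimension)
```

In the swapped array, the address (widget, acme, 5) holds two periods, which shows that product, customer, and quantity together do not determine the period. This is the kind of evidence that tells the analyst a particular choice of independent variables was a poor hypothesis.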

Incidentally, it should be clear from the previous description that array cells often turn out to be empty (and the more dimensions there are, the more often this happens). In other words, arrays are usually sparse. Suppose, for example, that product p was not sold to customer c at any time during period t. Then the cell [c, p, t] will be empty (or at best will contain zero). Multidimensional DBMSs support various methods for storing sparse arrays in a more efficient, compressed representation12. It should be added that empty cells correspond to missing information, and therefore systems need to provide some computational support for empty cells. Such support is indeed usually available, but its style, unfortunately, is similar to that adopted in SQL. Note that an empty cell may mean that the information is unknown, was not entered, is not applicable, or is missing for other reasons (see Chapter 19).
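To make the sparsity concrete, the sketch below stores only the non-empty cells and compares their number with the size of the full array. It is a simple dictionary-of-keys illustration with invented axis values; the compression techniques used by real multidimensional DBMSs are more elaborate and are not shown here.

```python
from itertools import product as cartesian

# Hypothetical axis values; the names are placeholders.
customers = ["c1", "c2", "c3", "c4"]
products = ["p1", "p2", "p3"]
periods = ["t1", "t2"]

# Only non-empty cells are stored, keyed by (customer, product, period).
stored = {
    ("c1", "p1", "t1"): 12,
    ("c2", "p3", "t2"): 4,
    ("c4", "p2", "t1"): 0,  # a recorded zero, distinct from an empty cell
}

# Size of the full (dense) array and the addresses that hold nothing.
dense_cells = len(customers) * len(products) * len(periods)
empty_cells = [
    address
    for address in cartesian(customers, products, periods)
    if address not in stored
]

def quantity(address):
    """Return the stored value, or None when the cell is empty."""
    return stored.get(address)

print(f"{len(stored)} of {dense_cells} cells stored, {len(empty_cells)} empty")
```

The sketch deliberately distinguishes a recorded zero from an empty cell, since an empty cell says only that no information is present, not why.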

Independent variables are often related in hierarchies, which define the paths along which the dependent data can be aggregated. For example, there is a time hierarchy that links seconds to minutes, minutes to hours, hours to days, days to weeks, weeks to months, and months to years. Or, as another example, there may be a composition hierarchy that links parts to sets of parts, sets of parts to units, units to modules, and modules to products. Often the same data can be aggregated in many different ways, i.e., the same independent variable can belong to many different hierarchies. The system provides operators for drilling up and drilling down along such a hierarchy. Drilling up means moving from a lower level of aggregation to a higher one, and drilling down means moving in the opposite direction. There are other operations for working with hierarchies as well, such as an operation for reordering hierarchy levels.

12 Note the difference from relational systems. In a true relational analogue of this example, the row (c, p, t) would not contain an empty quantity "cell", because the row (c, p, t) would simply be absent. Therefore, when the relational model is used, unlike multidimensional arrays, there is no need to support "sparse arrays", or rather "sparse tables", and hence no sophisticated compression techniques are required to work with such tables.

Note. There is a subtle difference between the drill up and roll up operations: roll up is the operation of implementing the required grouping and aggregation methods, whereas drill up is the operation of accessing the results of those methods. As an example of drill down, consider the following query: "The total number of deliveries is known; obtain the totals for each individual supplier." Of course, to answer this request, the more detailed levels of data must be available (or computable).
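The distinction can be sketched as follows for a time hierarchy of day, month, and year. The function names roll_up and drill_down, the levels dictionary, and the sample data are assumptions made for illustration only: roll_up creates the aggregated levels, while drilling up or down merely accesses levels that already exist.

```python
from collections import defaultdict

# Hypothetical detail data: (supplier, day) -> quantity, days as "YYYY-MM-DD".
detail = {
    ("S1", "2024-01-05"): 10,
    ("S1", "2024-01-20"): 20,
    ("S1", "2024-02-03"): 5,
    ("S2", "2024-01-11"): 30,
}

def roll_up(cells, to_parent):
    """Create an aggregation: map each key to its parent level and sum."""
    totals = defaultdict(int)
    for key, qty in cells.items():
        totals[to_parent(key)] += qty
    return dict(totals)

# Roll up twice along the hierarchy day -> month -> year.
by_month = roll_up(detail, lambda k: (k[0], k[1][:7]))
by_year = roll_up(by_month, lambda k: (k[0], k[1][:4]))
levels = {"day": detail, "month": by_month, "year": by_year}

def drill_down(levels, level, supplier, prefix):
    """Access a finer level: the cells of `level` lying under one coarser cell."""
    return {
        key: qty
        for key, qty in levels[level].items()
        if key[0] == supplier and key[1].startswith(prefix)
    }

# Drill up: read the yearly total that roll_up has already produced.
print(levels["year"][("S1", "2024")])
# Drill down: from that yearly total to its monthly components.
print(drill_down(levels, "month", "S1", "2024"))
```

Drill down works here only because the monthly and daily levels were kept, which matches the requirement that the more detailed data be available or computable.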

Multidimensional database products also provide a number of statistical and other mathematical functions that help in formulating and testing hypotheses (that is, hypotheses about presumed relationships). In addition, visualization and reporting tools are provided to help with such tasks. Unfortunately, there is as yet no standard query language for multidimensional databases, although research is under way to develop a calculus on which such a standard could be based. Nor is there anything like the relational theory of normalization that could serve as a scientific basis for the design of multidimensional databases.

To conclude this section, we note that some products combine both approaches, ROLAP and MOLAP. Such a hybrid OLAP system is called HOLAP. There is considerable debate about which of these three approaches is better, so it is worth saying a few words on this issue13. In general, MOLAP systems provide faster calculations but support smaller amounts of data than ROLAP systems; that is, they become less efficient as data volumes grow. ROLAP systems, for their part, provide better scalability, concurrency, and management capabilities than MOLAP systems. In addition, the SQL standard has recently been extended to include many statistical and analytical functions (see Section 22.8). It follows that ROLAP products are now capable of providing advanced functionality as well.