Internet publication about high technologies. Relational database management systems: basic concepts and features. User Defined Functions.

A database (DB) is a structured, organized set of data that describes the characteristics of a physical or virtual system.

A database is an organized structure designed to store information.

A DBMS (database management system) is instrumental software designed to organize and maintain a database.

Based on the type of data model used, DBMSs are divided into:

    Hierarchical databases

Hierarchical DBMSs are based on a fairly simple data model that can be pictured as a tree, a special kind of acyclic directed graph. A tree consists of vertices, each of which, except one (the root), has a single parent vertex and any number of children (including none).

    Network DBMS

Like the hierarchical one, the network model can also be thought of as a directed graph. But in this case, the graph may contain cycles, i.e. a vertex can have several parents.

    Relational DBMS

Relational DBMSs are currently the most common. The relational model focuses on organizing data in the form of two-dimensional tables. A relational table is a two-dimensional array and has the following properties:

Each table element is one data element;

All columns in the table are homogeneous, i.e. all elements in a column have the same type (numeric, character, etc.) and length;

Each column has a unique name.

A significant role in the success of relational DBMSs is also played by the SQL language (Structured Query Language), designed specifically for querying relational databases. It is a fairly simple and at the same time expressive language with which you can formulate quite sophisticated database queries.
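To give a flavor of the language, here is a small query of the kind SQL was designed for (a sketch only; the table and column names are illustrative):

    -- Customers located in Berlin, sorted by company name
    SELECT CustomerID, CompanyName, City
    FROM Customers
    WHERE City = 'Berlin'
    ORDER BY CompanyName;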

    Object-oriented

An object-oriented database is one in which data is represented as object models, including application program code, controlled by external events. In its most general and classical setting, the object-oriented approach is based on the concepts of object and object identifier; attributes and methods; classes; and class hierarchy and inheritance.

    Multidimensional

OLAP software is used in processing data from various sources. These software products allow the implementation of many different data representations and are characterized by three main features: multidimensional data representation; complex calculations on data; calculations related to changes in data over time.

9. Programming languages. Machine code. Translators. Binary coding of information.

Programming language- a formal sign system designed to describe algorithms in a form that is convenient for the performer (for example, a computer). A programming language defines a set of lexical, syntactic, and semantic rules used to compose a computer program. It allows the programmer to determine exactly what events the computer will react to, how data will be stored and transmitted, and what actions should be performed on this data under various circumstances.

Processor machine code

The computer processor receives all commands and data in the form of electrical signals. They can be represented as collections of zeros and ones, that is, numbers. Different numbers correspond to different commands. Therefore, in reality, the program that the processor works with is a sequence of numbers called machine code.

Levels of programming languages

If a programming language is tied to a specific type of processor and takes its features into account, it is called a low-level programming language. This means that the language's statements are close to machine code and oriented toward specific processor instructions.

The lowest-level language is assembly language, which simply represents each machine-code instruction not as a number but with a symbolic notation called a mnemonic.

High-level programming languages are much closer to and more understandable by a person than by a computer. They do not take the features of specific computer architectures into account, so programs written in them are, at the source-code level, easily portable to any other platform for which a translator of the language exists.

Popular programming languages today are:

Pascal. It was created in the early 70s by Niklaus Wirth, the originator of many modern programming ideas, and lends itself well to building large projects.

Basic (BASIC). Both compilers and interpreters exist for this language, and in terms of popularity it ranks first in the world. It was created in the 60s as an educational language and is very easy to learn. Its modern descendant, Visual Basic, compatible with Microsoft Office, allows you to extend the capabilities of the Excel and Access packages.

C. This language was created at Bell Labs and was not initially intended as a mass-market language. It was designed to replace assembly language so that equally efficient and compact programs could be created without depending on a particular type of processor. In the 70s, many application and system programs and a number of well-known operating systems (Unix) were written in it.

Some languages, such as Java and C#, fall between compiled and interpreted: the program is compiled not into machine language but into low-level, machine-independent code called bytecode. The bytecode is then executed by a virtual machine. Interpretation is usually used to execute bytecode, although individual parts of it can be translated into machine code directly during program execution using just-in-time (JIT) compilation to speed up the program. For Java the bytecode is executed by the Java Virtual Machine (JVM); for C#, by the Common Language Runtime (CLR).

With this article we begin a new series devoted to databases and modern technologies for accessing and processing data. During this series we plan to review the most popular desktop and server database management systems (DBMS), data access mechanisms (OLE DB, ADO, BDE, etc.) and utilities for working with databases (administration tools, report generators, graphical data presentation tools). In addition, we plan to pay attention to methods of publishing data on the Internet, as well as such popular methods of processing and storing data as OLAP (On-Line Analytical Processing) and the creation of data warehouses (Data Warehousing).

In this article we will look at the basic concepts and principles underlying database management systems. We will discuss the relational data model, the concept of referential integrity and principles of data normalization, as well as data design tools. Then we will explain what DBMSs are, what objects can be contained in databases, and how queries are made against these objects.

Basic Relational Database Concepts

Let's start with the basic DBMS concepts and a brief introduction to the theory of relational databases, today's most popular method of data storage.

Relational data model

The relational data model was proposed by Dr. E. F. Codd, a renowned database researcher, in 1969, while he was an IBM employee. The basic concepts of this model were first published in 1970 ("A Relational Model of Data for Large Shared Data Banks", CACM, 1970, Vol. 13, No. 6).

A relational database is a data store containing a set of two-dimensional tables. The set of tools for managing such a store is called a relational database management system (RDBMS). An RDBMS may include utilities, applications, services, libraries, application development tools, and other components.

Any relational database table consists of rows (also called records) and columns (also called fields). In this series we will use both pairs of terms.

The rows of the table contain information about the facts it represents (or documents, or people; in short, about objects of the same type). At the intersection of a column and a row lie the specific data values contained in the table.

The data in the tables meets the following principles:

  1. Each value contained at the intersection of a row and a column must be atomic (that is, not divisible into several values).
  2. Data values in the same column must belong to the same type, one available in the given DBMS.
  3. Each record in the table is unique; that is, no two records have a completely matching set of field values.
  4. Each field has a unique name.
  5. The sequence of fields in the table is not important.
  6. The sequence of entries is also immaterial.

Despite the fact that table rows are considered unordered, any database management system allows you to sort rows and columns in selections from it in the way the user needs.

Since the sequence of columns in a table is not important, they are referred to by name, and these names are unique for a given table (but do not have to be unique for the entire database).

So now we know that relational databases are made up of tables. To illustrate some theoretical points and to create examples, we need to select some kind of database. In order not to “reinvent the wheel”, we will use the NorthWind database included with Microsoft SQL Server and Microsoft Access.

Now let's look at the relationships between tables.

Keys and connections

Let's take a look at a snippet of the Customers table from the NorthWind database (we've removed fields that aren't essential to illustrating the relationships between the tables).

Because the rows in a table are unordered, we need a column (or set of columns) to uniquely identify each row. Such a column (or set of columns) is called the primary key. The primary key of any table must contain unique, non-empty values for each row.

If the primary key consists of more than one column, it is called a composite primary key.
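As an illustration, primary keys are usually declared when a table is created. The following DDL sketch (column lists are abbreviated, and syntax details vary slightly between DBMSs) shows both a single-column and a composite primary key:

    -- A single-column primary key
    CREATE TABLE Customers (
        CustomerID  CHAR(5)     NOT NULL PRIMARY KEY,
        CompanyName VARCHAR(40) NOT NULL,
        Address     VARCHAR(60)
    );

    -- A composite primary key: a row is identified by the pair (OrderID, ProductID)
    CREATE TABLE OrderDetails (
        OrderID   INT      NOT NULL,
        ProductID INT      NOT NULL,
        Quantity  SMALLINT NOT NULL,
        PRIMARY KEY (OrderID, ProductID)
    );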

A typical database usually consists of several related tables. Consider a fragment of the Orders table.

The CustomerID field of this table contains the ID of the customer who placed the order. If we want to know the name of the company that placed an order, we must look for the same customer ID value in the CustomerID field of the Customers table and read the value of the CompanyName field in the row found. In other words, we need to link the two tables, Customers and Orders, by the CustomerID field. A column that points to a record in another table related to the given record is called a foreign key. As you can see, in the case of the Orders table the foreign key is the CustomerID column (Fig. 1).

In other words, a foreign key is a column or set of columns whose values match existing values of the primary key of another table. This link between tables is called a relationship. A relationship between two tables is established by assigning the foreign key values of one table to the primary key values of the other. If each customer in the Customers table can place only one order, the two tables are said to be related by a one-to-one relationship. If each customer in the Customers table can place zero, one, or many orders, the two tables are said to be related by a one-to-many relationship, also known as a master-detail relationship. Relationships of this kind are used most often. In this case, the table containing the foreign key is called the detail table, and the table whose primary key defines the possible values of the foreign key is called the master table.

A group of related tables is called a database schema. Information about tables, their columns (names, data types, field lengths), primary and foreign keys, as well as other database objects, is called metadata.
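The lookup described above is exactly what a join does. As a sketch (the SQL dialect details may vary), the two tables can be linked on the foreign key in a single query:

    -- For every order, fetch the name of the company that placed it
    SELECT o.OrderID, o.OrderDate, c.CompanyName
    FROM Orders o
    INNER JOIN Customers c ON c.CustomerID = o.CustomerID;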

Any manipulation of data in a database, such as selecting, inserting, deleting, or updating data, or changing or selecting metadata, is called a query to the database. Typically, queries are formulated in some language, which may be either standard across different DBMSs or specific to a particular DBMS.

Referential integrity

We already said above that the primary key of any table must contain unique, non-empty values for that table. This statement is one of the rules of referential integrity. Some (but not all) DBMSs can enforce the uniqueness of primary keys. If the DBMS enforces it, then an attempt to assign a primary key a value that already exists in another record will make the DBMS generate a diagnostic message, usually containing the phrase "primary key violation". This message can later be passed on to the application through which the end user manipulates the data.

If two tables are related by a master-detail relationship, the foreign key of the detail table must contain only values that exist among the primary key values of the master table. If the correctness of foreign key values is not controlled by the DBMS, we can speak of a violation of referential integrity. In that case, if we delete a record from the Customers table that has at least one associated detail record in the Orders table, the Orders table will end up containing orders placed by someone unknown. If the DBMS does control the correctness of foreign key values, then an attempt to assign a foreign key a value that is not among the primary key values of the master table, or a deletion or modification of master table records that would violate referential integrity, will make the DBMS generate a diagnostic message, usually containing the phrase "foreign key violation", which can later be passed to the user application.

Most modern DBMSs, such as Microsoft Access 97, Microsoft Access 2000, and Microsoft SQL Server 7.0, can enforce referential integrity rules if they are described in the database. For this purpose such DBMSs use various database objects (we will discuss them a little later). In this case, all attempts to violate referential integrity rules are rejected, accompanied by diagnostic messages or exceptions (database exceptions).
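Where a DBMS supports declarative referential integrity, the rule is usually expressed as a foreign key constraint. A sketch (assuming the Customers table shown earlier already exists):

    CREATE TABLE Orders (
        OrderID    INT     NOT NULL PRIMARY KEY,
        CustomerID CHAR(5) NOT NULL,
        OrderDate  DATE,
        -- The DBMS will now reject orders whose CustomerID has no match in
        -- Customers, as well as deletions of customers that still have orders.
        FOREIGN KEY (CustomerID) REFERENCES Customers (CustomerID)
    );

    -- Rejected with a "foreign key violation" if customer 'XXXXX' does not exist:
    -- INSERT INTO Orders (OrderID, CustomerID) VALUES (1, 'XXXXX');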

Introduction to Data Normalization

The data design process is the definition of metadata in accordance with the objectives of the information system in which the future database will be used. The details of how to analyze a subject area and create entity-relationship diagrams (ERD) and data models are beyond the scope of this series. Those interested in these issues can refer, for example, to C. J. Date's book "An Introduction to Database Systems" (Dialectics, Kyiv, 1998).

In this article we will discuss only one of the basic principles of data design, the principle of normalization.

Normalization is the process of reorganizing data by eliminating repeating groups and other contradictions in data storage in order to bring tables to a form that allows consistent and correct editing of data.

Normalization theory is based on the concept of normal forms. A table is said to be in a given normal form if it satisfies a certain set of requirements. In theory there are five normal forms, but in practice only the first three are usually used. Moreover, the first two normal forms are essentially intermediate steps toward bringing the database into third normal form.

First normal form

Let's illustrate the normalization process with an example using data from the NorthWind database. Let's assume we record all ordered products in the following table. The structure of this table is as follows (Fig. 2).

For a table to comply with first normal form, all its field values must be atomic and all its records unique. Therefore, any relational table, including the OrderedProducts table, is by definition already in first normal form.

However, this table contains redundant data; for example, the same customer information is repeated in the record for each product that customer ordered. Data redundancy leads to modification anomalies: problems that arise when adding, changing, or deleting records. For example, when editing data in the OrderedProducts table, the following problems may occur:

  • A specific customer's address can only be contained in the database when the customer has ordered at least one product.
  • When you delete a record of an ordered product, information about the order itself and about the customer who placed it are simultaneously deleted.
  • If the customer suddenly changes his address, all records about the products he ordered will have to be updated.

Some of these problems can be solved by bringing the database to second normal form.

Second normal form

A relational table is said to be in second normal form if it is in first normal form and all its non-key fields are fully dependent on the entire primary key.

The OrderedProducts table is in first normal form, but not second normal form, because the CustomerID, Address, and OrderDate fields depend only on the OrderID field, which is part of the composite primary key (OrderID, ProductID).

To move from first normal form to second normal form, you need to follow these steps:

  1. Determine what parts the primary key can be broken into so that some of the non-key fields depend on one of these parts (these parts do not have to consist of a single column!).
  2. Create a new table for each such part of the key and the group of fields that depend on it, and move them to this table. The corresponding part of the former primary key becomes the primary key of the new table.
  3. Remove fields from the source table that have been moved to other tables, except those that will become foreign keys.

For example, to bring the OrderedProducts table to second normal form, you need to move the fields CustomerID, Address and OrderDate to a new table (let's call it OrdersInfo), and the OrderID field will become the primary key of the new table (Fig. 3).

As a result, the new tables will look like this. However, tables that are in second normal form but not third normal form still contain data modification anomalies. Here they are, for example, for the OrdersInfo table:

  • A specific customer's address can still only be contained in the database when the customer has ordered at least one product.
  • Deleting an order entry in the OrdersInfo table will result in deleting the entry for the customer itself.
  • If the customer has changed the address, several records will have to be updated (although, as a rule, there are fewer of them than in the previous case).

These anomalies can be eliminated by moving to third normal form.

Third normal form

A relational table is said to be in third normal form, if it is in second normal form and all its non-key fields depend only on the primary key.

The OrderDetails table is already in third normal form: its non-key field Quantity is entirely dependent on the composite primary key (OrderID, ProductID). However, the OrdersInfo table is not in third normal form, since it contains a dependency between non-key fields (called a transitive dependency): the Address field depends on the CustomerID field.

To go from second normal form to third normal form, you need to follow these steps:

  • Define all fields (or groups of fields) on which other fields depend.
  • Create a new table for each such field (or group of fields) and the group of fields that depend on it and move them to this table. The field (or group of fields) on which all other moved fields depend will become the primary key of the new table.
  • Remove the moved fields from the original table, leaving only those that will become foreign keys.

To bring the OrdersInfo table to third normal form, create a new Customers table and move the CustomerID and Address fields to it. We will remove the Address field from the source table, and leave the CustomerID field - now it is a foreign key (Fig. 4).

So, after bringing the original table to third normal form, we end up with three tables: Customers, Orders, and OrderDetails.
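Note that normalization loses no information: the original denormalized rows can always be reassembled by joining the three tables on their keys. A sketch (table and column names as in the figures above):

    -- One row per ordered product, as in the original OrderedProducts table
    SELECT o.OrderID, o.OrderDate, c.CustomerID, c.Address,
           d.ProductID, d.Quantity
    FROM Customers c
    INNER JOIN Orders       o ON o.CustomerID = c.CustomerID
    INNER JOIN OrderDetails d ON d.OrderID    = o.OrderID;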

Benefits of Normalization

Normalization eliminates data redundancy, which allows you to reduce the amount of stored data and get rid of the data change anomalies described above. For example, after reducing the database discussed above to third normal form, the following improvements are evident:

  • Customer address information can be stored in the database even for a merely potential customer who has not yet placed any orders.
  • You can delete information about an ordered product without fear of deleting customer and order information.

Changing a customer's address or order registration date now only requires changing one record.

How databases are designed

Typically, modern DBMSs contain tools that allow you to create tables and keys. There are also utilities supplied separately from the DBMS (and even servicing several different DBMSs simultaneously) that allow you to create tables, keys and relationships.

Another way to create tables, keys and relationships in a database is to write a so-called DDL script (DDL - Data Definition Language; we'll talk about it a little later).

Finally, there is another way that is becoming more and more popular: the use of special tools called CASE tools (CASE stands for Computer-Aided System Engineering). There are several types of CASE tools, but for creating databases the most commonly used are entity-relationship (E/R) diagramming tools. With their help, a so-called logical data model is built, describing the facts and objects to be recorded (in such models, table prototypes are called entities and fields are called their attributes). After establishing relationships between entities, defining attributes, and performing normalization, a so-called physical data model is created for a specific DBMS, in which all tables, fields, and other database objects are defined. You can then generate either the database itself or a DDL script to create it.

A list of the currently most popular CASE tools is given in the accompanying table.

Tables and fields

Tables are supported by all relational DBMSs, and their fields can store data of different types. The most common data types are listed in the accompanying table.

Indexes

Above we talked about the role of primary and foreign keys. In most relational DBMSs, keys are implemented using objects called indexes, which can be described as lists of record numbers indicating the order in which the records should be retrieved.

We already know that records in relational tables are unordered. However, any record at a particular point in time has a very specific physical location in the database file, although this may change during the process of editing data or as a result of the “internal activities” of the DBMS itself.

Suppose at some point in time the records in the Customers table were stored in this order.

Let's say we need to get this data ordered by the CustomerID field. Omitting the technical details, we can say that the index on this field is the sequence of record numbers in the order in which they should be returned, that is:

1,6,4,2,5,3

If we want to sort the records by the Address field instead, the sequence of record numbers will be different:

5,4,1,6,2,3

If we need to find the customers whose CustomerID begins with the characters "BO", we can use the index to find the locations of these records (in this case 2 and 5; note that in the index the numbers of these records are adjacent) and then read exactly the second and fifth records instead of scanning the entire table. Thus, the use of indexes reduces data retrieval time.

We have already said that the physical location of records can change as users edit data, as well as through manipulations of the database files performed by the DBMS itself (for example, data compaction, garbage collection, etc.). If the corresponding changes are also made to the index, it is called a maintained index; such indexes are used in most modern DBMSs. Maintaining indexes means that any change to the data in a table entails a change to the indexes associated with it, which increases the time the DBMS needs for such operations. Therefore, when using such DBMSs, you should create only the indexes that are really needed, guided by which queries will occur most often.
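In SQL-based DBMSs an index is typically created with a CREATE INDEX statement; a sketch (uniqueness, clustering and other options are DBMS-specific):

    -- An index on Address supports searching and ordering by address
    CREATE INDEX idx_customers_address ON Customers (Address);

    -- With an index on CustomerID (usually created automatically for the primary
    -- key), a search like this no longer needs to scan the whole table:
    SELECT * FROM Customers WHERE CustomerID LIKE 'BO%';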

Restrictions and rules

Most modern server DBMSs contain special objects called constraints or rules. These objects hold information about the restrictions placed on possible field values. For example, using such an object you can set a maximum or minimum value for a given field, after which the DBMS will not allow a record that violates this condition to be saved in the database.

In addition to constraints that restrict the range of data values, there are also referential constraints (for example, the master-detail relationship between the Customers and Orders tables can be implemented as a constraint requiring that the value of the CustomerID field (the foreign key) in the Orders table equal one of the existing values of the CustomerID field of the Customers table).
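In SQL terms such restrictions are usually expressed as CHECK constraints (and the referential restriction as a FOREIGN KEY, shown earlier). A sketch with an illustrative range:

    -- The DBMS will refuse to store a row whose Quantity falls outside the range
    ALTER TABLE OrderDetails
        ADD CONSTRAINT chk_quantity_range CHECK (Quantity BETWEEN 1 AND 1000);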

Note that not all DBMSs support constraints. In that case you can either use other objects (for example, triggers) to implement similar functionality or keep these rules in the client applications that work with the database.

Views

Almost all relational DBMSs support views. This object is a virtual table that provides data from one or more real tables. In reality, it does not contain any data, but only describes their source.

Often such objects are created to store complex queries in the database. In essence, a view is a stored query.

The creation of views in most modern DBMSs is carried out by special visual tools that allow you to display the necessary tables on the screen, establish connections between them, select displayed fields, introduce restrictions on records, etc.

Views are also often used to provide data security, for example by allowing data to be viewed through them without granting access to the underlying tables directly. In addition, some views can return different data depending, for example, on the user name, so that each user receives only the data that concerns them.
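A typical view definition simply stores a query under a name and exposes only selected columns, which is also how views are used for security purposes. A sketch:

    -- A virtual table exposing only non-sensitive customer columns
    CREATE VIEW CustomerDirectory AS
        SELECT CustomerID, CompanyName
        FROM Customers;

    -- Users can be allowed to query the view without access to the base table
    SELECT * FROM CustomerDirectory;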

Triggers and stored procedures

Triggers and stored procedures, supported in most modern server DBMSs, are used to store executable code.

A stored procedure is a special type of procedure that is executed by the database server. Stored procedures are written in a procedural language that depends on the specific DBMS. They can call each other, read and modify data in tables, and can be called from a client application working with the database.

Stored procedures are typically used to perform frequently occurring tasks (for example, reconciling a balance sheet). They can have arguments, return values, error codes, and sometimes sets of rows and columns (this data set is sometimes called a dataset). However, the latter type of procedures is not supported by all DBMSs.
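The procedural language differs from DBMS to DBMS. As an illustration, a minimal stored procedure in the Transact-SQL dialect of Microsoft SQL Server might look like this (a sketch, assuming the Orders table described above):

    CREATE PROCEDURE GetCustomerOrders
        @CustomerID CHAR(5)          -- input argument
    AS
    BEGIN
        -- Returns a set of rows and columns to the calling application
        SELECT OrderID, OrderDate
        FROM Orders
        WHERE CustomerID = @CustomerID;
    END
    -- Called from a client or another procedure, e.g.: EXEC GetCustomerOrders 'ALFKI'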

Triggers also contain executable code, but unlike procedures they cannot be called from a client application or a stored procedure. A trigger is always associated with a specific table and is executed when the event with which it is associated (for example, inserting, deleting, or updating a record) occurs in that table.

In most DBMSs that support triggers, you can define multiple triggers that execute when the same event occurs, and determine the order of execution.
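Again in Transact-SQL as an illustration, a trigger tied to deletions from the Orders table might look like this (a sketch; the event syntax varies between DBMSs):

    CREATE TRIGGER trg_Orders_Delete
    ON Orders
    FOR DELETE
    AS
    BEGIN
        -- When order rows are deleted, remove their detail rows as well;
        -- the deleted rows are available in the special "deleted" table.
        DELETE FROM OrderDetails
        WHERE OrderID IN (SELECT OrderID FROM deleted);
    END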

Objects for generating primary keys

Very often, primary key values are generated by the DBMS itself. This is more convenient than generating them in the client application, since in multi-user work generating keys on the DBMS side is the only way to avoid duplicate keys and obtain consistent key values.

Different DBMSs use different objects for generating keys. Some of these objects store an integer and the rules by which the next value is obtained; filling the key field is usually done with triggers. Such objects are supported, for example, in Oracle (where they are called sequences) and in InterBase (where they are called generators).

Other DBMSs support special field types for primary keys. When records are added, such fields are filled automatically with sequential values (usually integers). In Microsoft Access and Microsoft SQL Server such fields are called Identity fields; in Corel Paradox they are called Autoincrement fields.
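For illustration, the two approaches look roughly like this (a sketch; Identity syntax as in Microsoft SQL Server, the sequence variant as in Oracle):

    -- An Identity (autoincrement) field: filled in automatically with 1, 2, 3, ...
    CREATE TABLE Categories (
        CategoryID   INT IDENTITY(1, 1) PRIMARY KEY,
        CategoryName VARCHAR(40) NOT NULL
    );
    INSERT INTO Categories (CategoryName) VALUES ('Beverages');

    -- Oracle-style alternative (separate sketch): a sequence object whose next
    -- value is requested explicitly on insert
    -- CREATE SEQUENCE category_seq START WITH 1 INCREMENT BY 1;
    -- INSERT INTO Categories (CategoryID, CategoryName)
    -- VALUES (category_seq.NEXTVAL, 'Beverages');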

Users and roles

Preventing unauthorized access to data is a serious problem that can be solved in different ways. The simplest is password protection of either the entire table or some of its fields (this mechanism is supported, for example, in Corel Paradox).

Currently, another method of protecting data is more popular - creating a list of users with user names and passwords. In this case, any database object is owned by a specific user, and that user grants permission to other users to read or modify data from that object, or to modify the object itself. This method is used in all server and some desktop DBMSs (for example, Microsoft Access).

Some DBMSs, mainly server ones, support not only a list of users but also roles. A role is a set of privileges. A specific user can be granted one or more roles, and with them all the privileges defined for those roles.
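In SQL, privileges and roles are managed with GRANT statements. A sketch (user and role names are purely illustrative; the syntax is close to the SQL standard, and details vary by DBMS):

    -- Give one user read-only access to a single view
    GRANT SELECT ON CustomerDirectory TO sales_clerk;

    -- Collect a set of privileges into a role and grant the role to a user
    CREATE ROLE order_entry;
    GRANT SELECT, INSERT, UPDATE ON Orders TO order_entry;
    GRANT order_entry TO sales_clerk;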

Database queries

Modification and selection of data, changing metadata and some other operations are carried out using queries. Most modern DBMSs (and some application development tools) contain tools for generating such queries.

One way to formulate queries is called query by example (QBE). QBE is a tool for visually linking tables and selecting the fields to display in the query result.

In most DBMSs (with the exception of some desktop ones), visual construction of a query with QBE leads to the generation of query text in a special query language, SQL (Structured Query Language). You can also write the query directly in SQL.

Cursors

Often the result of a query is a set of rows and columns (dataset). Unlike a relational table, the rows in such a set are ordered, and their order is determined by the original query (and sometimes by the presence of indexes). Therefore, we can define the current row in such a set and a pointer to it, which is called a cursor.

Most modern DBMSs support so-called bi-directional cursors, which allow you to navigate through the resulting data set both forward and backward. However, some DBMSs only support unidirectional cursors, which allow only forward navigation through a data set.

SQL language

Structured Query Language (SQL) is a non-procedural language used to formulate database queries in most modern DBMSs and is currently an industry standard.

The non-procedural nature of the language means that it specifies what needs to be done with the database rather than the algorithm for doing it. The algorithms for processing SQL queries are generated by the DBMS itself and do not depend on the user. The SQL language consists of a set of statements that can be divided into several categories:

  • Data Definition Language (DDL) - a data definition language that allows you to create, delete and change objects in databases
  • Data Manipulation Language (DML) - a data manipulation language that allows you to modify, add and delete data in existing database objects
  • Data Control Language (DCL) - the language used to control user privileges
  • Transaction Control Language (TCL) - a language for managing changes made by groups of operators
  • Cursor Control Language (CCL) - statements for defining cursors, preparing SQL statements for execution, and performing some other operations.
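For orientation, here is one short example from each of the first four categories (a sketch; DCL and TCL syntax varies slightly between DBMSs, and names are illustrative):

    -- DDL: define an object
    CREATE TABLE Shippers (ShipperID INT PRIMARY KEY, CompanyName VARCHAR(40));

    -- DML: change data in an existing object
    INSERT INTO Shippers (ShipperID, CompanyName) VALUES (1, 'Speedy Express');
    UPDATE Shippers SET CompanyName = 'Speedy Express Ltd.' WHERE ShipperID = 1;

    -- DCL: control privileges
    GRANT SELECT ON Shippers TO sales_clerk;

    -- TCL: make the changes above permanent (or ROLLBACK to undo them)
    COMMIT;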

We will tell you more about the SQL language in one of the following articles in this series.

User Defined Functions

Some DBMSs allow the use of user-defined functions (UDFs). These functions are typically stored in external libraries and must be registered in the database before they can be used in queries, triggers, and stored procedures.

Because user-defined functions are contained in libraries, they can be created using any development tool that allows you to create libraries for the platform on which the DBMS runs.
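The registration syntax is DBMS-specific. For example, in InterBase/Firebird a function exported by an external library is declared roughly as follows (a sketch based on the standard ib_udf library shipped with those DBMSs):

    /* Register the function exported by the external library ib_udf under the
       name ABS, so that it can be used in SQL expressions. */
    DECLARE EXTERNAL FUNCTION ABS
        DOUBLE PRECISION
        RETURNS DOUBLE PRECISION BY VALUE
        ENTRY_POINT 'IB_UDF_abs' MODULE_NAME 'ib_udf';

    /* Once registered, the UDF can be called from queries, triggers and
       stored procedures: */
    SELECT ABS(-5.0) FROM RDB$DATABASE;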

Transactions

A transaction is a group of operations on data that are either performed together or cancelled together.

Committing a transaction means that all operations included in the transaction have been successfully completed and the result of their work has been saved in the database.

Rollback of a transaction means that all already completed operations that were part of the transaction are canceled and all database objects affected by these operations are returned to their original state. To implement the ability to roll back transactions, many DBMSs support writing to log files, which allow you to restore the original data during a rollback.

A transaction can consist of several nested transactions.
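In SQL a transaction is framed by explicit start and end statements. A sketch (the statement names vary slightly between DBMSs: BEGIN TRANSACTION, START TRANSACTION, etc.; the order data is illustrative):

    BEGIN TRANSACTION;

    -- Both operations must succeed together: register the order and its first line
    INSERT INTO Orders (OrderID, CustomerID, OrderDate)
    VALUES (10001, 'ALFKI', '1998-01-15');

    INSERT INTO OrderDetails (OrderID, ProductID, Quantity)
    VALUES (10001, 11, 2);

    COMMIT;       -- make both changes permanent
    -- ROLLBACK;  -- used instead of COMMIT, would undo both changes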

Some DBMSs support two-phase commit, a process that allows transactions to be carried out on multiple databases belonging to the same DBMS.

To support distributed transactions (that is, transactions over databases managed by different DBMSs), special tools called transaction monitors are used.

Conclusion

In this article, we discussed the basic concepts of building relational DBMSs, the basic principles of data design, and also talked about what objects can be created in databases.

In the next article we will introduce our readers to the most popular desktop DBMSs: dBase, Paradox, Access, Visual FoxPro, Works and discuss their main features.

ComputerPress 3"2000

In this chapter we will highlight and characterize the main classes of DBMS.

The main classification of DBMS is based on the database model used. Based on this criterion, several classes of DBMS are distinguished: hierarchical, network, relational, object and others. Some DBMSs can simultaneously support multiple data models.

Earlier DBMSs, such as hierarchical and network ones, have a tree structure and are built on the “Ancestor - Descendant” principle. But such systems have already become obsolete and are used less and less.

Relational DBMSs replaced hierarchical and network ones.

Characteristics of relational DBMS

The first theoretical developments in the field of relational DBMSs were obtained back in the 70s, at the same time the first prototypes of relational DBMSs appeared. For a long time it was considered impossible to achieve effective implementation of such systems. However, the gradual accumulation of methods and algorithms for organizing and managing relational databases led to the fact that already in the mid-80s, relational systems practically replaced early DBMSs from the world market.

The relational approach to organizing a DBMS assumes the presence of a set of interconnected relations (two-dimensional tables). A relationship in this case is an association between two or more relations (tables). A database that has no relationships between its relations has a very limited structure and cannot be called relational. Queries to such databases return a table that can be reused in the next query. The data in some tables, as we said, is related to the data in other tables, which is where the name "relational" comes from.

The relational approach to building a DBMS has a number of advantages (Baydak A.Ya., Bulgakov A.A. Modern DBMS and their application in the energy sector [Electronic resource]. Access mode: http://masters.donntu.edu.ua/2010/etf/baydak/library/article2.htm):

The presence of a small set of abstractions that make it possible to model most common problem domains in a relatively simple manner and allow for precise formal definitions while remaining intuitive;

The presence of a simple and at the same time powerful mathematical apparatus, based mainly on set theory and mathematical logic and providing the theoretical basis for the relational approach to organizing databases;

Possibility of non-navigational data manipulation without the need to know the specific physical organization of databases in external memory.

The relational model has a strong theoretical basis. This theory contributed to the creation of the declarative language SQL, which has now become the standard language for defining and manipulating relational databases. Other strengths of the relational model are simplicity, suitability for online transaction processing (OLTP) systems, and data independence. However, the relational data model, and relational DBMSs in particular, also have certain disadvantages.

The main disadvantage of relational DBMSs is their inherently limited applicability in areas that require fairly complex data structures. One of the main aspects of the traditional relational data model is the atomicity (singularity and indivisibility) of the data stored at the intersection of the rows and columns of a table. This rule was the basis of relational algebra when it was developed as a mathematical data model. In addition, specific implementations of the relational model do not always adequately reflect the real connections between objects in the subject area being described. These restrictions significantly hinder the effective implementation of modern applications, which require somewhat different approaches to data organization.

The core principle of the relational model is to eliminate duplicate fields and groups through a process called normalization. Flat normalized tables are universal, easy to understand, and theoretically sufficient for representing data in any subject area. They are well suited to data storage and display applications in traditional industries such as banking or accounting systems, but their application in systems based on more complex data structures is often difficult. This is mainly due to the primitiveness of the data storage mechanisms underlying the relational model (Nikitin M. Has the era of relational DBMSs ended? [Electronic resource]. Access mode: http://www.cnews.ru/reviews/free/marketBD/articles/articles2.shtml).

Today the well-known vendors of relational DBMSs include ORACLE, Informix, IBM (DB2), Sybase, Microsoft (MS SQL Server), Progress and others. In their products, DBMS vendors target various types of computers (from mainframes to laptops) and various operating systems (OS). DBMS vendors have also not ignored products that run on desktop computers, such as dBase, FoxPro, Access and the like. These DBMSs are designed to work on a PC and to solve local problems on a single PC or a small group of PCs. Often such DBMSs are used as a mirror of a small part of the overall corporate database, to minimize the hardware and resource costs of solving small problems.

Different DBMSs run under different operating systems and on different hardware. The best known of these operating systems are UNIX, VAX, Solaris, and Windows. Depending on the volume of data stored, the number of users accessing the data simultaneously, and the complexity of the tasks, different DBMSs are used on different platforms. For example, the Oracle DBMS on Unix, installed on a multiprocessor server, can solve the problem of providing data to hundreds of thousands of users (Ponomarev I.S. Database management systems [Electronic resource]. Access mode: http://mathmod.aspu.ru/images/File/Ponomareva/TM10_About%20BD.pdf. P. 2).

Currently, DBMSs oriented toward the Windows operating system on the Intel platform are of greatest interest.

Recently there has been talk of a possible paradigm shift from relational to post-relational DBMSs. However, according to analysts, it is still relational DBMSs that are used in the vast majority of large projects involving the implementation of database management systems. The market clearly adheres to traditional approaches to solving such problems.

Database management systems (DBMS) are one of the fundamental software components of computerized information processes and the basis for building most modern information systems. The main function of a DBMS is the efficient storage and provision of data in the interests of specific application tasks.

Commercial DBMSs date back to the mid-60s, when IBM released the first product of this class, the hierarchical DBMS IMS. In the early 70s Edgar Codd laid the foundations of the relational data model, on whose basis the structured query language SQL was developed, and in the 80s industrial relational DBMSs were created and soon took a dominant position. Currently the top three players, Microsoft, Oracle and IBM, control the market, with their flagship products Microsoft SQL Server, Oracle Database and IBM DB2 together holding a market share of about 90%. The DBMS market is growing actively and, according to Forrester analysts, by 2013 its total volume will reach $32 billion.

The main disadvantage of relational DBMSs is their inherently limited applicability in areas that require fairly complex data structures. One of the main aspects of the traditional relational data model is the atomicity (singularity and indivisibility) of the data stored at the intersection of the rows and columns of a table. This rule was the basis of relational algebra when it was developed as a mathematical data model. In addition, specific implementations of the relational model do not always adequately reflect the real connections between objects in the subject area being described. These limitations significantly hinder the effective implementation of modern applications, which require somewhat different approaches to data organization.

The core principle of the relational model is to eliminate duplicate fields and groups through a process called normalization. Flat normalized tables are universal, easy to understand, and theoretically sufficient for presenting data in any subject area. They are well suited for data storage and display applications in traditional industries such as banking or accounting systems, but their use in systems based on more complex data structures is often difficult. This is mainly due to the primitiveness of the data storage mechanisms underlying the relational model.

Experience in developing applied information systems has shown that abandoning the atomicity of values leads to a qualitatively useful extension of the data model. Introducing into the relational model the ability to use multivalued fields as independent nested tables, provided that the nested table satisfies general criteria, naturally extends the capabilities of relational algebra. In the classical sense, this type of data model is called post-relational.

Since the post-relational model uses multidimensional structures that allow other tables to be stored in table fields, it is also called "not first normal form" or "multidimensional database". The query language used in this model is an extended SQL, which allows complex objects to be retrieved from a single table without join operations. We can say that relational and post-relational DBMSs differ in the ways they store and index data, but in all other respects they are similar. The first post-relational DBMSs to become quite famous were Ardent's Universe (later acquired by Informix, which in turn was acquired by IBM) and Software AG's ADABAS.

Object-relational DBMS

In addition to avoiding normalization, post-relational DBMSs allow you to store data of abstract, user-defined types in relation fields. This makes it possible to solve problems of a new level, store objects and data arrays focused on specific subject areas, and also makes post-relational DBMSs similar to another class - object-oriented DBMSs. The introduction of the object approach into the traditional relational model gave rise to the emergence of another direction - object-relational DBMS. The first representative of this class of systems is considered to be the Informix Universal Server system of the company of the same name.

As you know, the object-oriented approach to domain modeling is based on such concepts as an object and the properties of encapsulation, inheritance and polymorphism. Unlike relational DBMSs, when designing object-oriented databases, decomposition and normalization of objects allocated at the conceptual design stage are not required. Objects are presented in the same form in which they exist in reality, which gives object-oriented structures visibility and can significantly reduce the time for their design and development.

One of the most famous post-relational DBMSs is the Postgres system, created in the mid-80s of the last century under the leadership of one of the leading DBMS developers, Michael Stonebraker. Stonebraker had (and continues to have) a huge influence on the DBMS industry, having a hand in almost all promising developments in this area. Postgres extended the traditional relational model by introducing object management mechanisms that allowed non-traditional data types to be stored and efficiently managed. Postgres also supported a multidimensional temporal model of data storage and access. All the main ideas and developments of Postgres were continued and developed in the freely distributed PostgreSQL DBMS, which is currently the most developed open DBMS.

Object-relational DBMSs, which allow data to be presented both as relational tables and as object classes, are often also called post-relational DBMSs. A typical representative of this type of DBMS is the Caché system from InterSystems. According to its developers, this system combines the relational and object approaches most effectively, based respectively on the SQL-92 and ODMG 2.0 standards. The mechanisms for working with objects and with relational tables sit at the same logical level, which provides higher speed of access to and work with data as well as functional completeness. Caché also uses a multidimensional data storage model and is optimized for transaction processing in systems with large and ultra-large databases (hundreds of gigabytes to terabytes) and a large number (thousands to tens of thousands) of concurrent users, while delivering very high performance.

Development prospects

Modern industrial DBMSs are complex systems consisting of various elements, technologies and approaches. These components are combined and improved to provide the best possible conditions for managing large volumes of data in a variety of settings. At the same time, all developers carry out large-scale research work. Many years of DBMS development experience have shown that it takes a lot of time to ensure efficient, reliable and error-free operation of new functionality. Fierce competition in the DBMS market forces manufacturers to carefully monitor competitors' products and identify new trends, and the emergence of an important new capability in one vendor's product forces the others to implement similar functionality in their own.

In turn, the needs of modern database developers are growing. First of all, this is due to the rapid development of the Internet, the active use of multimedia and the need to process semi-structured data.

According to the results of research by IDC published at the end of 2009, traditional relational DBMSs are used in the vast majority of large projects involving the implementation of database management systems. Only about 7% of projects use non-relational DBMSs. This balance of power in the real implementation market reflects the general situation: developers still actively adhere to traditional approaches to solving problems associated with the use of DBMSs.

All of the above suggests that the development strategy chosen by the leading players in the DBMS market will allow them to maintain their leadership positions. Their main products will continue to be improved and to acquire new functionality, and developers will continue to choose universal, time-tested traditional solutions.

Maxim Nikitin

A data bank is understood as a set of databases, as well as software, language and other tools intended for the centralized accumulation of data and their use using electronic computers.

The data bank includes one or more databases, a database directory, a database management system (DBMS), as well as libraries of queries and application programs.

A data bank is designed for storing large amounts of information and for quickly finding the necessary information and documents.

A data bank can be created in a subscriber system of any capacity, from a personal computer to a supercomputer. But even the largest data bank is limited in its capabilities, so data banks specialize, collecting information in particular areas of science, technology, or products. The core of a data bank is its databases and knowledge bases. A database is an organized structure designed to store information. Data and information are interrelated but not identical concepts, so this definition is somewhat inconsistent. Today most database management systems (DBMS) allow you to place in their structures not only data but also methods (that is, program code) through which interaction with the consumer or with other software and hardware systems takes place. Thus we can say that modern databases store not only data but also information.

A data bank (BnD) has special tools that make it easier for users to work with data (the DBMS).

Centralized data management has advantages over a conventional file system:

— reduction of data storage redundancy;

— reducing the labor intensity of development, operation and modernization of information systems;

— ensuring convenient access to data both for data-processing professionals and for end users.

Basic requirements for BnD:

— adequacy of representation of the subject area (completeness, integrity and consistency of data, relevance of information);

— the ability to interact between users of different categories, high efficiency of data access;

— friendly interfaces, short training time;

— ensuring secrecy and restricting access to data for different users;

— reliability of storage and data protection.

The definition of the term Data Bank is given in the Temporary Regulations on State Accounting and Registration of Databases and Data Banks, approved by Decree of the Government of the Russian Federation dated February 28, 1996 No. 226, clause 2 (SZ RF, 1996, No. 12, Art. 1114)

Originally (in the early 60s), file-based storage was used. To solve primarily engineering problems, characterized by a small amount of data and a significant amount of calculation, the data was stored directly in the program. A sequential way of organizing data was used; there was high redundancy, identical logical and physical structures, and complete data dependence. With the advent of economic and managerial tasks (management information systems, MIS), characterized by large volumes of data and a small proportion of calculation, this organization of data turned out to be ineffective. Ordering of data was required, which, as it turned out, could be done according to two criteria: use (information arrays) or storage (databases). Initially information arrays were used, but the superiority of databases soon became clear. The use of files for storing only data was proposed by McGee in 1959. Methods of access (including random access) to such files were developed; the physical and logical structures were now distinct, and the physical location of the data could be changed without changing its logical representation.

In 1963, C. Bachman built the first industrial database system, IDS, with a network data model, which was still characterized by data redundancy and use by only one application. Data was accessed through the appropriate software. In 1969, a group was formed that created the CODASYL set of standards for the network data model.

In fact, the modern database architecture began to be used. Architecture here means a type (generalization) of structure in which any element can be replaced by another element whose input and output characteristics are identical to those of the first. A significant leap in the development of database technology came with the relational data model paradigm proposed by E. Codd in 1970. A paradigm is understood as a scientific theory embodied in a system of concepts reflecting the essential features of reality. Now different logical structures could be obtained from the same physical data, i.e. the same physical data could be accessed by different applications through different paths. It became possible to ensure data integrity and independence.

At the end of the 70s, modern DBMSs appeared, providing physical and logical independence, data security, and having developed database languages. The last decade is characterized by the emergence of distributed and object-oriented databases, the characteristics of which are determined by the applications of automated design tools and database intellectualization.

Closely related to the concept of a database is the concept of a database management system - this is a set of software tools designed to create the structure of a new database, fill it with content, edit content and visualize information. Visualization of database information means the selection of displayed data in accordance with a given criterion, their ordering, design and subsequent output to an output device or transmission via communication channels.

There are many database management systems in the world. Although they may work differently with different objects and provide the user with various functions and tools, most DBMSs rely on a single, well-established set of basic concepts. This gives us the opportunity to consider one system and generalize its concepts, techniques and methods to the entire class of DBMS.

The DBMS organizes the storage of information in such a way that it is convenient:

browse,

replenish,

change,

look for the information you need,

make any selections

sort in any order.

Database classification:

a) according to the nature of the stored information:

Factual (card indexes),

Documentary (archives)

b) according to the method of data storage:

Centralized (stored on one computer),

Distributed (used in local and global computer networks).

c) according to the data organization structure:

Tabular (relational),

Hierarchical,

Network.

Modern DBMSs make it possible to include not only textual and graphic information, but also sound fragments and even video clips.

The ease of use of the DBMS allows you to create new databases without resorting to programming, but using only built-in functions. DBMS ensure the correctness, completeness and consistency of data, as well as easy access to them.

Popular DBMSs include FoxPro, Access for Windows, and Paradox. For less complex applications, instead of a DBMS, information retrieval systems (IRS) are used, which perform the following functions:

storing a large amount of information;

quick search for required information;

adding, deleting and changing stored information;

output it in a form convenient for humans.

Information in databases is structured into individual records, which are a group of related data elements. The nature of the relationship between records determines two main types of database organization: hierarchical and relational.

If there is no data in the database (empty database), then it is still a full-fledged database. This fact has methodological significance. Although there is no data in the database, there is still information in it - this is the structure of the database. It defines methods for entering data and storing it in the database. The simplest “non-computer” version of a database is a business diary, in which each calendar day is allocated a page. Even if not a single line is written in it, it does not cease to be a diary, since it has a structure that clearly distinguishes it from notebooks, workbooks and other stationery products.

Databases can contain various objects, but the main objects of any database are its tables. The simplest database has at least one table. Accordingly, the structure of the simplest database is identical to the structure of its table.

Currently there is rapid growth in the number of electronic commerce systems (ECS). E-commerce has a number of distinctive features that sharply set it apart from all previously known methods of classical commerce, thanks to the exceptional communication capabilities of the Internet.

E-commerce systems must be able to coordinate business transactions across multiple business applications, extract individual pieces of information from various sources, and deliver the necessary information to the customer in a timely and seamless manner, all on the basis of a single user Web request.

ECSs have a set of specific properties that distinguish them from classical commerce systems (regular stores, supermarkets, stock exchanges, etc.). These properties must be taken into account when constructing and analyzing process models in an ECS, since the classical formulation of the optimal control problem for a discrete system does not apply. The properties of an ECS are as follows. The operating time is unlimited, unlike classical systems, which have a strictly regulated work schedule; we can say that the flow of visitors is distributed evenly over time. Unlike classical systems, in an ECS (this is especially typical for B2C-class systems) visitors come not only to make purchases but also to obtain information: to get acquainted with the assortment, prices, and the terms of payment and delivery of goods.

Classical systems, by contrast, are characterized by the fact that a visitor is very likely to become a buyer. It is therefore possible to consider various models and methods for assessing the efficiency of ECS operation: the ratio of the number of buyers to the number of visitors, and the influence of the ECS's operation and feedback on the input flow of requests.

It is typical for an ECS that many visitors come several times to obtain information and make a purchase only once they are satisfied with all the conditions.

An ECS can serve a fairly large number of visitors simultaneously; this is limited only by its software and hardware capabilities. From the user's point of view, there are therefore no queues waiting for service, which is especially true for fully or partially automated ECSs.

In an ECS, it is possible for a visitor who has placed products in a virtual cart to leave the system without making a purchase (naturally, all the products remain in the system, since they simply cannot be stolen). Drawing an analogy with classical shopping systems, it is hard to imagine a visitor who, upon entering a store, first loads a full cart with goods and then unloads everything and leaves. In an ECS, this case is possible if the set of control factors is not optimal (or only suboptimal).

Database management systems allow you to combine large amounts of information and process them, sort them, make selections according to certain criteria, etc.


1.2 Relational databases

A relational DBMS (RDBMS, Relational Database Management System) is a DBMS that manages relational databases.

The concept of relation is associated with the developments of the famous English specialist in the field of database systems, Edgar Codd.

These models are characterized by simplicity of data structure, user-friendly tabular representation and the ability to use the formal apparatus of relational algebra and relational calculus for data processing.

The relational model focuses on organizing data in the form of two-dimensional tables. Each relational table is a two-dimensional array and has the following properties:

– each table element is one data element;

– all columns in the table are homogeneous, that is, all elements in the column have the same type (numeric, character, etc.);

– each column has a unique name;

– there are no identical rows in the table;

– the order of rows and columns can be arbitrary.

The basic concepts of a relational DBMS are: 1) attribute; 2) relation; 3) tuple.

A database, then, is nothing more than a collection of tables. RDBMSs and record-oriented systems are organized on the basis of the B-Tree standard or the Indexed Sequential Access Method (ISAM) and are the standard systems used in most modern software products. To make it possible to combine tables and define relationships between data, which is almost completely absent in most B-Tree and ISAM software implementations, languages such as SQL (IBM), Quel (Ingres) and RDO (Digital Equipment) are used, with SQL now being the industry standard supported by all relational DBMS vendors.

The original version of SQL was an interpreted language designed to perform operations on databases. SQL was created in the early 1970s as an interface for interacting with databases based on the then-new relational theory. Real applications are usually written in other languages that generate SQL code and pass it to the DBMS as ASCII text. It should also be noted that almost all real relational (and not only relational) systems, in addition to implementing the ANSI SQL standard, now known in its latest edition as SQL2 (or SQL-92), include additional extensions, for example support for the client-server architecture or application development tools.
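For illustration, a minimal Python sketch using the standard sqlite3 module, in which the application builds an SQL statement as plain text and hands it to the DBMS; the table and data are invented for the example:

    import sqlite3

    # The application composes SQL as ordinary text; the DBMS parses and executes it.
    conn = sqlite3.connect(":memory:")  # throw-away in-memory database
    conn.execute("CREATE TABLE works (name TEXT, price REAL)")
    conn.execute("INSERT INTO works VALUES (?, ?)", ("Wallpapering", 120.0))

    query = "SELECT name, price FROM works WHERE price > 100"  # SQL passed as a string
    for name, price in conn.execute(query):
        print(name, price)
    conn.close()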

Table rows consist of fields that are known to the database in advance; most systems do not allow new data types to be added. Each row in the table corresponds to one record. The position of a given row can change as rows are deleted or inserted.

To uniquely identify an element, it must be associated with a field or set of fields that guarantees its uniqueness within the table. Such a field or set of fields is called the table's primary key and is often numeric. If one table contains the primary key of another, this makes it possible to organize a relationship between elements of different tables; such a field is called a foreign key.

Since all rows of one table must contain a fixed number of fields of predefined types, additional tables that account for the individual characteristics of elements have to be created and linked through foreign keys. This approach greatly complicates the creation of complex relationships in the database. Another major disadvantage of relational databases is the high complexity of manipulating information and changing relationships.
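A minimal sketch of this arrangement, executed here through Python's built-in sqlite3 module; the table and column names are invented for the example:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    -- each work is identified by a numeric primary key
    CREATE TABLE works (
        work_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    -- individual characteristics go into an additional table that refers
    -- back to works through a foreign key
    CREATE TABLE work_details (
        detail_id INTEGER PRIMARY KEY,
        work_id   INTEGER NOT NULL REFERENCES works(work_id),
        unit      TEXT,
        price     REAL
    );
    """)
    conn.close()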

Despite the considered disadvantages of relational databases, they have a number of advantages:

sharing of tables between different programs;

expanded “return code” for errors;

high speed of query processing (SQL SELECT command; the result of the selection is a table that contains fields that satisfy the specified criterion);

the very concept of object databases, by contrast, is quite complex and requires serious and lengthy training of programmers;

relatively high speed when working with large amounts of data.

In addition, significant amounts of money have already been invested in relational DBMSs around the world. Many organizations are not confident that the costs associated with migrating to object databases will be worth it.

Therefore, many users are interested in a combined approach that would allow them to take advantage of the benefits of object databases without completely abandoning their relational databases. Such solutions do exist. If the transition from a relational database to an object database is too expensive, then using the latter as an extension and addition to relational DBMS is often a more cost-effective alternative. Compromise solutions allow you to maintain a balance between objects and relational tables.

Object-relational adapters. This method involves using a so-called object-relational adapter, which automatically extracts program objects and stores them in relational databases. An object-oriented application then works like an ordinary DBMS user. Although there is some performance penalty, this option allows programmers to concentrate entirely on object-oriented development. In addition, all applications in the enterprise can still access data stored in relational form.
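A highly simplified Python sketch of the idea; the Work and ObjectRelationalAdapter classes are hypothetical and only illustrate how such an adapter might map object attributes onto relational rows:

    import sqlite3

    class Work:                                   # hypothetical application object
        def __init__(self, name, price):
            self.name, self.price = name, price

    class ObjectRelationalAdapter:                # hypothetical adapter, not a real product
        def __init__(self, conn):
            self.conn = conn
            conn.execute("CREATE TABLE IF NOT EXISTS works (name TEXT, price REAL)")

        def save(self, obj):
            # map object attributes onto relational columns
            self.conn.execute("INSERT INTO works VALUES (?, ?)", (obj.name, obj.price))

        def load_all(self):
            # rebuild objects from relational rows
            return [Work(n, p) for n, p in self.conn.execute("SELECT name, price FROM works")]

    conn = sqlite3.connect(":memory:")
    adapter = ObjectRelationalAdapter(conn)
    adapter.save(Work("Parquet flooring", 150.0))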

Some object DBMSs, such as GemStone from GemStone Systems, can themselves act as a powerful object-relational adapter, allowing object-oriented applications to access relational databases.

Object-relational adapters, such as Hewlett-Packard's Odapter for Oracle, can be useful in many areas, such as middleware that integrates object-oriented applications with relational databases.

Object-relational gateways. When using this method, the user interacts with the database in the language of the OODBMS, and the gateway replaces all object-oriented elements of this language with their relational components. The price for this is again performance. For example, the gateway must transform objects into a set of relations, generate object identifiers (OIDs), and pass them to the relational database. Each time the RDBMS interface is used, the gateway must then convert the OIDs found in the database into the corresponding objects.

Performance in the two approaches considered depends on the method of accessing the relational database. Each RDBMS consists of two layers: the data manager layer and the storage manager layer. The first processes SQL statements, and the second maps the data onto the database. A gateway or adapter can interact either with the data layer (that is, access the RDBMS using SQL) or with the storage layer (low-level procedure calls). Performance in the first case is much lower (for example, the Hewlett-Packard OpenODB system, which can act as a gateway, supports interaction only at the upper level).

Hybrid DBMSs. Another solution is to create hybrid object-relational DBMSs that can store both traditional tabular data and objects. Many analysts believe that the future lies with such hybrid databases. Leading relational DBMS vendors are beginning (or planning) to add object-oriented capabilities to their products; in particular, Sybase and Informix intend to introduce object support in the next versions of their DBMSs. Independent companies also intend to conduct similar developments: for example, the Shores company is preparing to equip the Oracle8 DBMS, scheduled for release at the end of 1996, with object-oriented tools.

On the other hand, object DBMS vendors such as Object Design are aware that object-oriented databases will not replace relational DBMSs in the foreseeable future. This forces them to create gateways to support relational and hierarchical databases or various kinds of interfaces, a typical example of which is the Ontos Integration Server object-relational interface from Ontos, used in combination with its Ontos/DB OODB.

1.3 Multidimensional databases

A multidimensional database is a powerful database with a special storage organization (cubes) that allows users to analyze large volumes of data. It provides high-speed work with data stored as a collection of facts, dimensions and pre-computed aggregates.

In specialized DBMSs based on a multidimensional representation of data, data is organized not in the form of relational tables, but in the form of ordered multidimensional arrays:

Hypercubes - all cells stored in the database must have the same dimensionality, that is, be defined over the most complete basis of dimensions

Polycubes - each variable is stored with its own set of dimensions, and all associated processing complexities are transferred to the internal mechanisms of the system.
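A minimal Python sketch of the two storage schemes above, with invented dimensions and values; a real multidimensional DBMS of course uses its own optimized block storage:

    # Hypercube: one dense array over the full basis of dimensions
    works = ["painting", "plumbing"]
    months = ["Jan", "Feb", "Mar"]
    cost_cube = [[0.0 for _ in months] for _ in works]  # every cell exists, even if unused

    # Polycube: each variable keeps only its own set of dimensions
    cost_by_work_month = {("painting", "Jan"): 120.0}   # sparse, per-variable storage
    headcount_by_month = {"Jan": 5, "Feb": 6}           # another variable with different dimensions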

The use of multidimensional databases in online analytical processing systems has the following advantages:

high performance. Products of this class usually have a multidimensional database server. During analysis, data is selected exclusively from the multidimensional structure, and in this case searching and retrieving data is much faster than with a multidimensional conceptual view of a relational database, since the multidimensional database is denormalized, contains pre-aggregated indicators and provides optimized access to the requested cells

searching and retrieving data is carried out much faster than with a multidimensional conceptual view of a relational database - the average response time to an ad hoc query when using a multidimensional DBMS is usually one to two orders of magnitude less than in the case of a relational DBMS with a normalized data schema

the structure and interfaces correspond best to the structure of analytical queries. This is closer to the human mental model, since the analyst is used to working with flat tables: by cutting the cube with a two-dimensional plane in one direction or another, it is easy to obtain the interdependence of any pair of quantities relative to the chosen measure. For example, how the cost of manufacturing a product (the measure) changed over time (one dimension), broken down by sections, workshops and production facilities (another dimension); a sketch of such a slice is given after this list

multidimensional DBMSs easily cope with the tasks of including various built-in functions in the information model, while objectively existing limitations of the SQL language make performing these tasks based on relational DBMSs quite difficult and sometimes impossible.
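As a rough illustration of the slicing and pre-aggregation described in the list above, a minimal Python sketch with invented dimension values and costs:

    # facts: (work, month, workshop) -> cost
    cube = {
        ("painting", "Jan", "shop 1"): 100.0,
        ("painting", "Feb", "shop 1"): 110.0,
        ("painting", "Jan", "shop 2"): 90.0,
    }

    # "cutting" the cube with a plane: fix the work, keep month x workshop
    cost_slice = {(m, s): v for (w, m, s), v in cube.items() if w == "painting"}

    # the kind of pre-computed aggregate a multidimensional server would store
    total_by_month = {}
    for (w, m, s), v in cube.items():
        total_by_month[m] = total_by_month.get(m, 0.0) + v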

MOLAP works only with its own multidimensional databases and is based on proprietary multidimensional DBMS technologies, and is therefore the most expensive approach. These systems provide a full cycle of OLAP processing and either include, in addition to the server component, their own integrated client interface, or use external spreadsheet programs to communicate with the user. Maintaining such systems requires a dedicated staff to install and support the system and to build data views for end users.

Other disadvantages of MOLAP models include:

they do not allow working with large databases; today their real limit is 10-20 gigabytes. Moreover, due to denormalization and pre-computed aggregation, 20 gigabytes in a multidimensional database correspond, according to Codd's estimate, to 2.5-100 times less original detailed data, that is, at best a few gigabytes.

compared with relational DBMSs, they use external memory very inefficiently. Hypercube cells are stored in them as logically ordered arrays (fixed-length blocks), and such a block is the minimum indexed unit. Although multidimensional DBMSs do not store blocks that contain no defined values, this only partially solves the problem. Since the data is stored in a fixed order, undefined values cannot always be removed completely, but only when the sort order allows the data to be organized into the largest possible contiguous groups. The sort order most often used in queries, however, may not be the order that maximizes the elimination of non-existent values. Thus, when designing a multidimensional database, one often has to sacrifice either performance (one of its first advantages and the main reason for choosing a multidimensional DBMS) or external memory (although, as noted, the maximum size of a multidimensional database is limited in any case)

there are no uniform standards for the interface or for the data definition and manipulation languages

do not support data replication, which is often used as a loading mechanism. Therefore, the use of multidimensional DBMS is justified only under the following conditions:

the volume of source data for analysis is not too large (no more than a few gigabytes), that is, the level of data aggregation is quite high.

the set of information dimensions is stable (since any change in their structure almost always requires a complete restructuring of the hypercube).

the system's response time to ad hoc queries is the most critical parameter.

complex built-in functions are widely required for performing cross-dimensional calculations on hypercube cells, including the possibility of writing custom formulas and functions.

2. Practical part

2.1 Problem statement

2.1.1 Purpose of solving the problem

The management of the company Stroy-design LLC, which carries out work on the repair of premises, wants to automate the calculation of the cost of work performed in order to promptly provide an invoice to the client. This will reduce settlement time, avoid human errors and increase customer satisfaction with the services provided. It was therefore decided to calculate the cost of work performed and create an invoice for payment, which should contain the name of the work, the volume of work performed, the price per unit of production, and the cost of the work. The task, to be solved monthly in the MS Excel software environment, is called “Calculating the cost of completed work.”

The goal of solving this problem is to timely calculate the cost of work for the prompt provision of a detailed invoice to clients.

2.1.2 Problem condition

The document “Calculation of the cost of work performed” serves as the input operational information; it contains the following details: name of the work, volume of work performed, price per unit of production (rub.) and cost of work (rub.), of which the last two details are to be calculated. Based on it, the following screen form is created:

Name of work | Unit of measurement | Volume of work performed | Price per unit of production, rub. | Cost of work, rub.
             |                     | Qi                       | Ci                                 | Si

Conditionally permanent (reference) information is provided by the organization's price list, which contains the following details (conditional form): name of the work, price per unit of production (rub.). Based on it, the following screen form is created:

Price list

Job title | Price per unit of production, rub.

Latin letters in the table indicate the elements of the corresponding calculation formulas.

As a result, you should receive an invoice containing the following details: name of the work, price per unit of production (rub.), volume of work performed, cost of work (rub.), and the invoice number (filled in automatically). The client's name and date are entered manually. The information is presented in the following document:

Structure of the resulting document “Invoice”

Stroyservis LLC

ACCOUNT No. ___        Date __.__.20__
Client's full name: _______________

No. p/p | Name of work | Unit of measurement | Volume of work performed | Price per unit of production, rub. | Cost of work, rub.
        | Replacing batteries | pcs. | | |
        | Wallpaper sticker | m2 | | |
        | Pipe replacement | m | | |
        | Parquet flooring | m2 | | |
TOTAL: ΣSi
VAT: N
VALUE WITH VAT: SN

Ch. accountant _______________

In addition, the information contained in the tables must be presented in the form of charts for analysis.

In the solution technology, organize inter-table links using the VLOOKUP or LOOKUP functions for the automatic generation of the “Invoice” document.

2.2. Computer model for solving the problem

2.2.1. Information model for solving a problem

The information model, which reflects the relationship between the source and resulting documents, is shown in Fig. 2.


2.2.2. Analytical model for solving the problem

To obtain the document “Calculation of the cost of work performed”, the following indicators must be calculated:

    cost of work, rub.;

    VAT, rub.;

    amount including VAT, rub.

    Calculations are performed using the following formulas:

    Si = Ci · Qi ,

    N = ΣSi · 0.18 ,

    SN = ΣSi + N ,

    where Si is the cost of the i-th work; Ci is the price per unit of production for the i-th work; Qi is the volume of the i-th work performed; N is the VAT; SN is the value including VAT.
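    A minimal Python sketch of these formulas, with invented volumes and unit prices (the real values come from the worksheet):

        # illustrative volumes Q_i and unit prices C_i
        works = [("Battery replacement", 4, 250.0),
                 ("Wallpaper sticker", 20, 70.0)]

        costs = [q * c for _, q, c in works]  # S_i = C_i * Q_i
        total = sum(costs)                    # sum of S_i
        vat = total * 0.18                    # N = sum of S_i * 0.18
        total_with_vat = total + vat          # SN = sum of S_i + N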

    2.2.3. Technology for solving MS Excel problems

    Solving the problem using MS Excel

    Call Excel:

    click the "Start" button;

    select the “Programs” command in the main menu;

    From the Microsoft Office menu, select MS Excel.

    Rename “Sheet 1” to “Price List”:

    Enter the table title "Price List":

    type “Price list” on the keyboard;

    4. Format the title:


    Fig. 2. Example of selecting a group of cells

    On the toolbar in the “Home” tab, select the “Alignment” section and click the button.

    5. Format cells A2:B2 for entering long headings:

    select cells A2:B2;

    execute the “Alignment” command in the “Format Cells” section of the “Home” menu on the toolbar;

    select the “Alignment” tab;

    in the “Display” option group, check the “word wrap” option (Fig. 3);


    Fig. 3. Setting up word wrapping when entering long headers into a cell

    Click OK.

    6. Enter in cells A2:B2 the information presented in Fig. 4.


    Fig. 4. Names of fields in the “Price list” table

    7. Format cells A3:A8 to enter text characters:

    select cells A3:A8;

    on the toolbar in the “Home” menu, select “Cells”, where in the “Format” item, select the “Format Cells” command;

    select the “Number” tab;

    select the “Text” format (Fig. 5);

    Click OK.


    Fig. 5. Selecting the cell format

    8. Repeat step 7 for the range of cells B3:B8, selecting the “Numeric” format.

    9. Enter the initial data (Fig. 6).


    Fig. 6. View of the “Price list” table

    10. Give the group of cells a name:

    select cells A3:B8;

    select the “Assign a name” command in the “Defined names” section of the “Formula” menu (Fig. 7);


    Fig. 7. View of the “Name Creation” window

    Click the "OK" button.

    11. Rename “Sheet 2” to “Calculation of the cost of work” (similar to steps in step 2).

    12. Create a table “Calculation of the cost of work performed” (similar to steps 3 - 7, 8) (Fig. 8).


    Fig. 8. View of the “Calculation of the cost of work” table

    13. Fill in the columns “Name of work” and “Price per unit of product, rub.”:

    make cell A3 active;

    in the “Data” menu, select the “Data Validation” command, in the “Data Type” field of which select “List”;

    enter the value in the “Source” field, highlighting the range A3:A8 in the “Price List” (Fig. 9);


    Fig. 9. Setting up the list of works

    click "OK" button;

    in order to enter the name of a job from the list in each cell of column A (“Job Name”), make cell A3 active and, placing the cursor on the marker in the lower right corner, left-click and drag it to cell A6 (Fig. 10 );


    Fig. 10. View of the “Calculation of the cost of work” sheet when setting up the list

    in the “Select function” field, click “VLOOKUP” (Fig. 11);


    Fig. 11. View of the first window of the function wizard

    click "OK" button;

    enter the name of the work in the “Searched_value” field by clicking on cell A3;

    press "Enter";

    enter information in the “Table” field;

    use the “Use in formula” command of the “Formulas” menu, selecting “Insert names”;

    select “Name:” “Price_list” (Fig. 12);


    Fig. 12. Entering an array name as a formula argument

    click "OK" button;

    press "Enter";

    enter the information - number 2 in the "Column_Number" field;

    enter the information - number 0 in the “Interval_viewing” field (Fig. 13);


    Fig. 13. View of the second window of the function wizard

    Click "OK" button;
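    The wizard steps above effectively build a formula of the form =VLOOKUP(A3; Price_list; 2; 0), where the argument separator depends on the regional settings. A minimal Python sketch of the same exact-match lookup, with an invented price list:

        price_list = {"Battery replacement, pcs.": 250.0,   # invented values
                      "Wallpaper sticker, m2": 70.0}

        def vlookup(value, table, default=None):
            # exact match, like VLOOKUP with the last argument set to 0 (FALSE)
            return table.get(value, default)

        price = vlookup("Battery replacement, pcs.", price_list)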

    14. Fill in the column “Scope of work performed.”

    15. Enter the names of the works in cells A4:A6:

    Make cell A4 active;

    Click on the button next to cell A4 and from the proposed list select the name of the work - Battery replacement, pcs. Cell C4 - “Price per unit of production, rub.” will be filled in automatically (Fig. 14);


    Fig. 14. Automatic filling of the price per unit of production by the work name

    similarly fill in cells A5:A6, cells C5:C6 will also be filled in automatically.

    16. Fill in the column “Cost of work, rub.” of the table “Calculation of the cost of work performed”. To do this:

    enter the formula =B3*C3 into cell D3;

    propagate the formula entered in cell D3 to the remaining cells D4:D6 of this column (using the autofill function).

    Thus, a loop is executed whose control parameter is the row number, as in the sketch below.
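    A minimal Python sketch of that row-by-row loop, with invented column values:

        volumes = [4, 20, 10, 12]             # column B: volume of work performed
        prices = [250.0, 70.0, 90.0, 150.0]   # column C: price per unit of production

        # the "loop" whose control parameter is the row number
        costs = []
        for row in range(len(volumes)):
            costs.append(volumes[row] * prices[row])  # D = B * C for each row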

    17. The completed table looks like this (Fig. 15).


    Fig. 15. Result of filling out the “Calculation of the cost of work” table

    18. Rename “Sheet 3” to “Account” (similarly to step 2).

    19. On the “Account” worksheet, create the necessary table, following the previous paragraphs.

    20. Use the LOOKUP() function to create inter-table links. Before doing this, sort the values of the table “Calculation of the cost of work performed” in ascending order by the column “Name of work”. To do this:

    select the range of cells A2:D6;

    on the “Home” tab, select “Sort & Filter”, and then “Custom Sort”;

    in the window that pops up, select “Sort by” “Name of works”;

    Click OK.

    use the “Insert Function” command in the “Formula” menu;

    in the “Select a function” field, click “LOOKUP”;

    click "OK" button;

    enter the name of the work in the “Searched_value” field by clicking on cell C9;

    press "Enter";

    enter the information in the “Vector to be viewed” field, namely ‘Calculation of the cost of work’!$A$3:$A$6;

    press "Enter";

    enter the information in the “Required vector” field, namely ‘Calculation of the cost of work’!$C$3:$C$6;

    press “Enter” (Fig. 16);


    Fig. 16. View of the second window of the LOOKUP function wizard

    click "OK" button;

    22. Repeat similar steps for cells D9:D12 and E9:E12.

    23. Fill in the “TOTAL” row of the table as follows:

    enter the formula =SUM(F9:F12) in cell F13.

    24. Fill in the “VAT” row. To do this, enter the formula =F13*0.18 in cell F14.

    25. Fill in the “VALUE WITH VAT” row. To do this, enter the formula =F13+F14 in cell F15.

    26. As a result, you should get the table shown in Fig. 17.


    Fig. 17. Form of the invoice for payment for the work performed

    27. To analyze information about the cost of each type of work for a received order:

    make the “Account” sheet active;

    select range C9:F12;

    select the “Histogram” command in the “Charts” section of the “Insert” menu;

    select the required histogram type;

    rename the histogram “Cost of each type of work” (Fig. 18).


    Fig. 18. Histogram “Cost of each type of work”

    2.3. Results of a computer experiment and their analysis

    2.3.1. Results of a computer experiment

    To test the correctness of the problem solution, fill out the input documents and then calculate the results.

    Price list

    Job title | Price per unit of production, rub.
    Bathtub replacement, pcs.
    Replacement of pipes, m
    Wallpaper sticker, m2
    Parquet flooring, m2
    Whitewashing of the ceiling, m2

    Calculation of the cost of work performed

    Job title | Scope of work performed | Price per unit of production, rub. | Cost of work, rub.
    Battery replacement, pcs.      1000
    Replacement of pipes, m
    Wallpaper sticker, m2          1400
    Parquet flooring, m2           1200

    Stroy-design LLC

    ACCOUNT No. ___        Date __.__.20__
    Client's full name: _______________

    No. | Job title | Scope of work performed | Price per unit of production, rub. | Cost of work, rub.
    Battery replacement, pcs.      1000
    Wallpaper sticker, m2          1400
    Replacement of pipes, m
    Parquet flooring, m2           1200
    TOTAL:                         4560
    VAT:                           820.8
    VALUE WITH VAT:                5380.8

    As a result of solving the problem, the documents obtained using the computer coincide with the test calculations.

    2.3.2. Analysis of the results obtained

    Thus, the formation of the resulting document (table) “Invoice” solves the stated problem: it reduces the time for calculating the cost of work, eliminates errors caused by the human factor and increases customer satisfaction. Creating various charts (histograms, graphs) based on table data in MS Excel makes it possible not only to present the results of information processing visually for analysis and decision-making, but also to quickly adjust their construction so as to present the visualization results in the form most convenient for the user (analyst).

    Basic ideas of modern information technology are based on the concept that data must be organized into databases in order to adequately reflect the changing real world and meet the information needs of users. These databases are created and operate under the control of special software systems called database management systems (DBMS).

    The increase in the volume and structural complexity of stored data and the expansion of the circle of users of information systems have led to the widespread use of the most convenient and relatively easy-to-understand relational (tabular) DBMS. To ensure simultaneous access to data by many users, often located quite far from each other and from the place where databases are stored, network multi-user versions of databases based on a relational structure have been created. In one way or another, they solve specific problems of parallel processes, integrity (correctness) and security of data, as well as access authorization.

    The DBMS must provide access to data for any users, including those who have practically no idea of (and/or no desire to know about): the physical placement of data and their descriptions in memory; the mechanisms for searching for the requested data; the problems that arise when many users (application programs) request the same data simultaneously; the ways of protecting data from incorrect updates and/or unauthorized access; keeping databases up to date; and many other DBMS functions.

    Today, relational databases remain the most common due to their simplicity and clarity both during the creation process and at the user level.

    The main advantage of relational databases is compatibility with the most popular query language SQL. With a single query in this language, you can join several tables into a temporary table and cut out the required rows and columns from it (selection and projection). Since the tabular structure of a relational database is intuitive for users, the SQL language is simple and easy to learn. The relational model has a solid theoretical foundation on which the evolution and implementation of relational databases were based. Riding the wave of popularity generated by the success of the relational model, SQL became the primary language for relational databases.

    In the process of analyzing the above information, the following disadvantages of the considered database model were identified: since all rows of one table must contain a fixed number of fields of predefined types, additional tables accounting for the individual characteristics of elements have to be created using foreign keys, which greatly complicates the creation of any complex relationships in the database; manipulating information and changing connections also remains highly complex.

    In the practical part, using MS Excel 2010, the task was solved for a fictitious enterprise, the company Stroy-design LLC, which carries out work on the renovation of premises. Tables were built based on the data given in the task. The cost of work for the received order was calculated and entered into the table. Inter-table links were organized using the VLOOKUP and LOOKUP functions to automatically generate the invoice issued to the client for payment for the work performed. The document “Invoice for payment for work performed” was generated and filled in. The results of calculating the cost of each type of work for the received order are presented in graphical form.

