Three-tier database architecture. Components of an information system with a database and the three-tier architecture of a database management system

The third level of the architecture is the internal level. An internal view is a low-level representation of the entire database; it consists of many instances of each type of internal record. The term "internal record" belongs to ANSI/SPARC terminology and refers to a construct called a stored record. Like the external and conceptual representations, the internal representation is still detached from the physical layer, because it does not consider physical areas of the storage device such as cylinders and tracks. In other words, the internal representation assumes an infinite linear address space; the details of how that address space is mapped onto physical device storage are highly system-dependent and are deliberately left out of the overall architecture.

The internal representation is described by an internal schema, which defines not only the various types of stored records but also the existing indexes, the ways stored fields are represented, the physical sequence of stored records, and so on. The internal schema is written in yet another data definition language, the internal one.

In conclusion, we note that in some exceptional situations application programs, in particular so-called utilities, can perform operations directly at the internal level rather than at the external one. This practice is, of course, not recommended: it creates risks to security (security rules are ignored) and integrity (integrity rules are ignored as well), and in addition the program becomes dependent on the loaded data. Sometimes, however, this may be the only way to obtain the required function or performance, just as a user of a high-level language sometimes has to resort to assembly language for the same reasons.

Applications that use databases are usually classified under one of several software architectures, each of which has its own pros and cons.

Databases and the software for their creation and maintenance (DBMSs) have a multi-level architecture.

There are conceptual, internal and external levels of representing database data, to which models of corresponding purpose correspond.

The conceptual level corresponds to the logical aspect of representing the data of the subject area in an integrated form. The conceptual model consists of many instances of different data types, structured according to the DBMS's requirements for the logical structure of the database.

The internal level represents the required organization of the data in the storage environment and corresponds to the physical aspect of data representation. The internal model consists of individual record instances physically stored on external media.

The external level supports private views of the data required by specific users. The external model is a subset of the conceptual model, and external models may overlap. The private logical data structure for a separate application (task) or user corresponds to an external model, or database subschema. External models support authorized access to the database data of an application: the composition and structure of the conceptual-model data available to the application are restricted, and the permissible modes of processing those data are specified (input, editing, deletion, search).

The appearance of new applications, or changes in the information needs of existing ones, requires that suitable external models be defined for them, while nothing changes at the level of the conceptual and internal data models. Changes in the conceptual model caused by the appearance of new kinds of data or changes in their structure need not affect all applications, i.e. a certain independence of programs from data is ensured. Changes in the conceptual model must be reflected in the internal model; if the conceptual model remains unchanged, the internal database model can be modified independently in order to improve its characteristics (data access time, consumption of external storage, etc.). Thus, the database implements the principle of relative independence of the logical and physical organization of data.

When discussing what software as complex as a DBMS must be like, it is first necessary to clearly define the basic concept of the system, which determines all subsequent stages of its development.

The DBMS architecture must, first of all, ensure a distinction between the user and system levels;

It is necessary to give each user the opportunity to have their own view, different from the others', of the properties of the stored data.

Therefore, the initial stage of designing any specific information system should be an abstract description of the information needs of each user group, from which an abstract description of the stored data structures, common to the entire organization, is then generated; the DBMS through which this IS will be created and maintained must provide the corresponding capabilities.

REVIEW LECTURE NOTES

For specialty students
T1002 “Information technology software”

(L.V. Rudikova, Ph.D., Associate Professor)

Question 31. DBMS ARCHITECTURE. RELATIONAL DATA MODEL

1. The concept of a database.

2. Three-tier database architecture.

3. Database life cycle.

4. DBMS architecture.

5. Relational data model.

6. Design of relational databases.

7. Normal forms of relations.

8. Relational algebra.

1. The concept of a database.

A database system is any computer-based information system in which data can be shared among many applications.

Information system – an automated system that organizes data and produces information.

Information and management system – a system that provides information support for management.

Data – scattered facts.

Information – organized and processed data.

A database is understood as a set of interconnected elementary groups of data (information) that can be processed by one or more application systems. A database system consists of a database; general-purpose software, called a database management system (DBMS), that serves to manage the database; the appropriate equipment; and people.

Each DBMS must meet the following requirements:

· provide the user with the ability to create new databases and define their schemas (the logical structure of the data) using a special language, the data definition language; support multiple views of the same data;

· let users "query" data and change it using a query language, or data manipulation language; allow integration and sharing of data from various applications;

· support the storage of very large amounts of data, measured in gigabytes or more, over a long period, protecting it from accidental damage and unauthorized use, and also provide for database modification and access to data through queries, i.e. guarantee the security and integrity of the data;

· control access to data by many users at once; keep one user's query from affecting another's, and prevent simultaneous accesses that could corrupt the data, i.e. ensure concurrency control over data access.
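The first two requirements can be seen in miniature in any DBMS. Below is an illustrative sketch (not part of the lecture): it uses SQLite through Python's sqlite3 module as a stand-in DBMS, with a made-up student table, to show a schema being defined with a data definition statement and the data then being "requested" with a query:

```python
import sqlite3

# In-memory database; a real system would use a file or a server.
conn = sqlite3.connect(":memory:")

# Data definition language: create a new schema (logical data structure).
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, year INTEGER)")

# Data manipulation language: add data, then "request" it back.
conn.executemany("INSERT INTO student (name, year) VALUES (?, ?)",
                 [("Ivanov", 2), ("Petrova", 3)])
conn.commit()

rows = conn.execute("SELECT name FROM student WHERE year > 2").fetchall()
print(rows)  # [('Petrova',)]
```

The same CREATE/INSERT/SELECT division into definition and manipulation languages holds in any relational DBMS, not just SQLite.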

The database system consists of the following components:

· Users, i.e. people who use data.

· Applications, i.e. user programs that require data from the system.

· DBMS – software that manages access to the data and provides the specified functionality of the database system.

· Data, i.e. the rows stored in files.

· Host system – the computer system on which the files are stored. Data rows are accessed through the host system. The role of the DBMS is to generate queries that use the host system's file management functionality to serve various applications. A DBMS is an additional layer of software built on top of the host system software.

Thus, a system with a database can be represented as the following sequence of levels:

At the lowest level are the data stored in physical files (the physical memory of the database). At the top level are applications with their own views of the same physical data. Each database view is a particular logical structure built from the underlying physical data. To provide an interface between the database's physical memory and its various logical versions (the set of supported views), the DBMS must in turn consist of several levels.

2. Three-tier database architecture.

The distinction between the logical and physical representation of data was formally recognized in 1978, when the ANSI/SPARC committee proposed a generalized structure for database systems. This structure is called the three-tier architecture; its three levels are internal, conceptual and external.

The internal level is the level that determines the physical view of the database; it is closest to physical storage and is concerned with how information is stored on physical storage devices. Disk drives, physical addresses, indexes, pointers, etc. belong to this level. This level is the responsibility of physical database designers, who decide which physical devices will store the data, what access methods will be used to retrieve and update the data, and what measures should be taken to maintain or improve the performance of the DBMS. Users do not deal with this level.

The conceptual level is the structural level that defines the logical schema of the database. At this level conceptual database design is performed, which includes analyzing the users' information needs and identifying the data elements they require. The result of conceptual design is the conceptual schema: a logical description of all data elements and the relationships between them.

The external level is the structural level of the database that defines user views of the data. Each user group receives its own view of the data in the database. Each such data view provides a user-centered description of the data elements it comprises and the relationships between them, and can be derived directly from the conceptual schema. The collection of such user data views forms the external level.
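In SQL-based systems the external level is implemented by views. The following hedged sketch (SQLite via sqlite3; the employee table and hr_view are hypothetical names) gives one user group a restricted picture of the conceptual schema, hiding both rows and a column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, "
             "dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
                 [("Ivanov", "IT", 900), ("Petrova", "HR", 800)])

# External model for the HR user group: only HR rows, and no salary column.
conn.execute("CREATE VIEW hr_view AS "
             "SELECT name, dept FROM employee WHERE dept = 'HR'")

hr_rows = conn.execute("SELECT * FROM hr_view").fetchall()
print(hr_rows)  # [('Petrova', 'HR')]
```

Granting a user group access only to such views, rather than to the base tables, is how the composition, structure and processing modes of the visible data are limited in practice.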

Fig. DBMS levels: user and application views (the external level) are mapped onto the conceptual schema (the conceptual level), which in turn is mapped onto the internal level, where the host system manages the stored data.

3. Database life cycle.

The process of designing, implementing and maintaining a database system is called the database life cycle (DBLC). The procedure for creating the system is called the system life cycle (SLC).

Understanding and taking the right approach to the DBLC is very important and requires detailed consideration, since the DBLC is based on a data-centric approach. Data elements are more stable than the functions a system performs. Creating a correct data structure requires a complex analysis of the classes of data units and the relationships between them. Once a logical database schema has been built, any number of functional systems that use it can be created later. The function-oriented approach is suitable only for temporary systems designed for a short period of operation.

The DBLC consists of the following stages:

1. Pre-planning – database planning carried out while developing a strategic database plan. The following information is collected during planning:

· what application programs are used and what functions they perform;

· what files are associated with each of these applications;

· what new applications and files are under development.

This information helps determine how application information is used and determine future requirements for the database system.

The information from this stage is documented in the form of a generalized data model.

2. Feasibility check. Here the technological, operational and economic feasibility of the database plan is determined:

· technological feasibility – is the technology needed to implement the planned database available?

· operational feasibility – are the funds and experts needed to successfully implement the database plan available?

· economic feasibility – will the planned system pay for itself? Can the costs and benefits be estimated?

3. Defining requirements includes selecting the database goals and clarifying the information, hardware and software requirements for the system. Thus, at this stage of data collection and requirements definition a general information model is created, expressed in the following tasks:

· The goals of the system are determined by analyzing information needs. This analysis also establishes what kind of database should be created (distributed, integrated) and what communication tools are needed. The output document is a statement describing the goals of the system.

· Determining user requirements: documentation in the form of generalized information (comments, reports, surveys, questionnaires, etc.); fixing the system functions and identifying the application systems that will fulfil these requirements. The data are presented in the form of the relevant documents.

· Determining the general hardware and software requirements related to maintaining the desired level of performance (finding out the number of system users, the number of input messages per day, the number of printouts). This information is used to choose the types of computers and DBMS, the disk capacity and the number of printers. The data from this stage are presented in a report containing sample hardware and software configurations.

· Developing a plan for the phased creation of the system, including the selection of initial applications.

4. Conceptual design – creation of the conceptual database schema. Specifications are developed to the extent necessary for moving on to implementation.

The main output document is a single information model (or database schema at the conceptual level). This model is developed using the information and functions the system must perform that were determined at the stage of collecting and defining system requirements. At this stage it is also desirable to define: 1) rules for the data; 2) rules for the processes; 3) rules for the interface.

5. Implementation – the process of turning the conceptual model into a functional database. It includes the following steps.

1) Selecting and purchasing the necessary DBMS.

2) Converting the conceptual (infological) database model into a logical and physical data model:

· Based on the infological data model, a data schema is built for a specific DBMS; if necessary, the database is denormalized in order to speed up query processing in all time-critical applications;

· it is determined which application processes need to be implemented in the data schema as stored procedures;

· implement constraints designed to ensure data integrity and enforce data rules;

· design and generate triggers to implement all centrally defined data rules and data integrity rules that cannot be specified as constraints;

· develop an indexing and clustering strategy; estimate the sizes of all tables, clusters and indexes;

· determine user access levels, develop and implement security and audit rules. Create roles and synonyms to provide multi-user access with consistent levels of access permissions.

· develop a database network topology and a mechanism for seamless access to remote data (replicated or distributed database).

3) Construction of a data dictionary that stores the definitions of the database data structures. The data dictionary also contains information about access permissions, data protection rules and data control.

4) Filling the database.

5) Creating application programs and management controls.

6) User training.
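Step 3 above can be made concrete. In SQLite, for instance, the role of the data dictionary is played by the system catalog sqlite_master, which stores the definition of every table and index; the sketch below (the book table and index are invented) queries it for the stored structure definitions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT)")
conn.execute("CREATE INDEX idx_title ON book (title)")

# The sqlite_master catalog holds the definitions of all tables and indexes;
# internal entries (names starting with 'sqlite_') are filtered out here.
entries = conn.execute(
    "SELECT type, name FROM sqlite_master "
    "WHERE name NOT LIKE 'sqlite_%' ORDER BY name").fetchall()
print(entries)  # [('table', 'book'), ('index', 'idx_title')]
```

Other DBMSs expose the same idea under different names (e.g. an information schema); SQLite is used here only as a compact stand-in.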

6. Evaluation and improvement of the database schema. This involves surveying users to identify unmet functional needs. Changes are made as necessary, with new programs and data elements added as needs change and expand.

Thus, the DBLC includes:

· Studying the subject area and producing the relevant documentation (stages 1-3).

· Constructing an information model (stage 4).

· Implementation (stage 5).

· Performance evaluation and database support (stage 6).

4. DBMS architecture.



Fig. Main components of the DBMS

Data, metadata – the database contains not only the data itself but also information about its structure (metadata). In a relational DBMS, metadata includes system tables (relations) listing the names of relations, the names of the attributes of those relations, and the data types of those attributes.

Often the DBMS also maintains indexes on the data. An index is a data structure that helps find data elements quickly given a part of their value (for example, an index that finds the tuples of a particular relation having a given value of one of the attributes). Indexes are part of the stored data; the descriptions indicating which attributes are indexed are part of the metadata.
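As a small illustration (SQLite again; the table and index names are made up), creating an index changes how the DBMS locates tuples with a given attribute value, which the query plan makes visible:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany("INSERT INTO t (city) VALUES (?)",
                 [("Minsk",), ("Grodno",), ("Brest",)])

# With this index the DBMS can find tuples by city without a full table scan.
conn.execute("CREATE INDEX idx_city ON t (city)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE city = 'Grodno'").fetchall()
print(plan)  # the plan text mentions idx_city rather than a table scan
```

The exact wording of the plan differs between SQLite versions, but the index name appears in it once the index is used.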

Memory manager – obtains the required information from the place where the data is stored and changes the information there at the request of higher levels of the system.

In simple database systems the memory manager can be the operating system's file system. To improve efficiency, however, the DBMS usually exercises direct control over memory. The memory manager consists of two components:

· The file manager keeps track of the location of files on disk and, at the buffer manager's request, obtains the block or blocks containing the files (the disk is generally divided into disk blocks: adjacent memory areas of 4,000 to 16,000 bytes).

· The buffer manager manages main memory. It receives blocks of data from disk through the file manager and chooses a main-memory page in which to store each block. It may keep a disk block in main memory for a while, but it writes the block back to disk when the page is needed for another block. Pages are also written back to disk at the request of the transaction manager.
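The buffer manager's page-replacement behaviour can be sketched in a few lines. This is a toy model, not a real DBMS component: a dict stands in for the file manager, and the eviction policy is assumed to be least-recently-used (LRU):

```python
from collections import OrderedDict

class BufferManager:
    """Toy buffer manager: keeps at most `capacity` disk blocks in main
    memory and evicts the least recently used one when space runs out."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk            # stands in for the file manager
        self.pages = OrderedDict()  # block number -> block contents

    def get_block(self, n):
        if n in self.pages:                     # already buffered: touch it
            self.pages.move_to_end(n)
        else:
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)  # evict the LRU block
            self.pages[n] = self.disk[n]        # fetch via the file manager
        return self.pages[n]

disk = {0: "block-0", 1: "block-1", 2: "block-2"}
buf = BufferManager(capacity=2, disk=disk)
buf.get_block(0); buf.get_block(1); buf.get_block(2)  # block 0 is evicted
print(list(buf.pages))  # [1, 2]
```

Real buffer managers also track dirty pages and pin counts; those details are omitted here.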

Query processor – handles queries and requests for changes to data or metadata. It finds the best way to perform the requested operation and issues the appropriate commands to the memory manager.

The query processor (manager) turns a high-level query or database action (for example, one expressed in SQL) into a sequence of requests for stored data, such as individual tuples of a relation or parts of an index on a relation. Often the most difficult part of query processing is its organization, i.e. choosing a good query plan: a sequence of requests to the memory system that answers the query.

Transaction manager – is responsible for the integrity of the system. It must ensure the simultaneous processing of many queries, the absence of interference between queries (additions, min, max), and the protection of data in the event of a system failure. It interacts with the query manager, since it needs to know which data are affected by the current queries (to avoid conflicts), and it may defer some queries and operations for the same reason. The transaction manager also interacts with the memory manager, because data protection schemes usually involve keeping a log of data changes. If the log file operations are performed in the correct order, the log will contain a record of the changes, so that even changes that never reached the disk because of a system failure can be re-executed.

Typical DBMSs allow the user to group multiple queries and/or changes into a single transaction. Transaction is a group of operations that must be performed sequentially as one whole.

Typically, a database system supports several transactions simultaneously. It is the transaction manager that ensures the correct execution of all of them. Correct execution of transactions is defined by the ACID properties (atomicity, consistency, isolation, durability):

· atomicity – either all the operations of a transaction are executed or none of them (for example, withdrawing money from an ATM and the corresponding debit to the client's account must form a single atomic transaction; neither operation may be performed without the other);

· consistency – a state in which the data meet all possible expectations (for example, a consistency condition for an airline database is that no seat on a plane is reserved for two passengers);

· isolation – when two or more transactions are executed in parallel, their results must be isolated from each other. Executing two transactions at the same time must not produce a result that could not have occurred if they were executed sequentially (for example, when two agents simultaneously request the last free seat on the same flight, one request must be fulfilled and the other must not);

· durability – after a transaction completes, its result must not be lost in the event of a system failure, even if the failure occurs immediately after the transaction completes.
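Atomicity can be demonstrated directly with SQLite's transaction support. This is a sketch with a made-up account table; the mid-transfer "failure" is simulated by raising an exception before the second update:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 100)])
conn.commit()

try:
    # A money transfer must be atomic: both updates happen, or neither does.
    conn.execute("UPDATE account SET balance = balance - 50 WHERE id = 1")
    raise RuntimeError("simulated failure before the second update")
    conn.execute("UPDATE account SET balance = balance + 50 WHERE id = 2")
    conn.commit()
except RuntimeError:
    conn.rollback()  # the partial debit is undone

balances = conn.execute("SELECT balance FROM account ORDER BY id").fetchall()
print(balances)  # [(100,), (100,)]
```

The rollback restores the state before the transaction began, so no money disappears from the accounts.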

Let us also consider three types of access to the DBMS:

1. Queries – questions about the data, which can be posed in two ways:

a) using a common query interface (for example, a relational DBMS accepts SQL queries, which are passed to the query processor, and returns the answers to them);

b) using application program interfaces – queries are passed through a special interface (arbitrary queries cannot be passed through such an interface);

2. Modifications – operations that change the data. They too can be executed either through the common interface or through an application program interface;

3. Schema modifications – commands of the database administrator, who has the right to change the database schema or create a new database.

Client/server architecture. Much modern software implements the client/server architecture: one process (the client) sends a request to another process (the server) for execution. Typically, a DBMS is split into a server process and several client processes.

In the simplest client/server architecture the entire DBMS is the server, except for the query interfaces, which interact with the user and send queries or other commands to the server. For example, a relational DBMS often uses SQL to express requests from client to server; the database server then returns the answer to the client in the form of a table (relation). There is a tendency to shift load onto the client, because with many simultaneously working database users the server can become a bottleneck.

5. Relational data model.

The relational data model (RDM) of a certain subject area is a set of relations that change over time. When creating an information system, this set of relations makes it possible to store data about the objects of the subject area and to model the connections between them.

A relation is a two-dimensional table containing some data. Mathematically, an n-ary relation R is understood as a subset of the Cartesian product D1 × D2 × … × Dn of the (not necessarily distinct) sets (domains) D1, D2, …, Dn:

R ⊆ D1 × D2 × … × Dn,

where D1 × D2 × … × Dn is the complete Cartesian product, i.e. the set of all possible combinations of n elements, one taken from each domain.

A domain is a semantic concept. A domain can be thought of as a subset of the values of some data type that have a specific meaning. A domain is characterized by the following properties:

· The domain has a unique name (within the database).

· The domain is defined on some simple data type or on another domain.

· A domain may have a logical condition that describes the subset of data valid for it.

· The domain carries a certain semantic load.

A relation attribute is a pair of the form <Attribute_name: Domain_name>. Attribute names must be unique within a relation. Often the attribute names of a relation coincide with the names of the corresponding domains.

A relation defined on a set of domains contains two parts: a header and a body.

The relation header is a fixed set of relation attributes:

{ <A1: D1>, <A2: D2>, …, <An: Dn> }

The relation header describes the Cartesian product of the domains on which the relation is defined. The header is static: it does not change while the database is in use. If attributes are changed, added or deleted in a relation, the result is a different relation (even if it keeps the same name).

The relation body contains the set of tuples of the relation. Each tuple of the relation is a set of pairs of the form <Attribute_name: Attribute_value>:

{ <A1: V1>, <A2: V2>, …, <An: Vn> },

such that the value Vi of the attribute Ai belongs to the domain Di. The body of the relation is a set of tuples, i.e. a subset of the Cartesian product of the domains; thus, the body of a relation is in fact a relation in the mathematical sense of the word. The body of the relation can change while the database is in use: tuples can be changed, added and deleted.

A relation is usually written as:

R(<A1: D1>, <A2: D2>, …, <An: Dn>),

or, more briefly,

R(A1, A2, …, An),

or simply R.
The number of attributes in a relation is called the degree (or arity) of the relation. The cardinality of the set of tuples of a relation is called the cardinality of the relation.

The relation schema is a list of the attribute names of a given relation, indicating the domains to which they belong:

SR = (<A1: D1>, <A2: D2>, …, <An: Dn>).

If attributes take values from the same domain, they are called θ-comparable, where θ is the set of comparison operations allowed on that domain. For example, if a domain contains numeric data, all comparison operations are allowed on it: θ = {=, ≠, <, ≤, >, ≥}. For domains containing character data, however, more than just the equality and inequality comparisons can be specified: if a lexicographic ordering is defined on such a domain, it too has the full range of comparison operations.

The schemas of two relations are called equivalent if they have the same degree and the attribute names in the schemas can be ordered so that comparable attributes, i.e. attributes taking values from the same domain, occupy the same positions.

Let SR1 = (A1, A2, …, An) be the schema of relation R1, and SR2 = (B1, B2, …, Bn) the schema of relation R2 after its attribute names have been ordered. Then

SR1 ~ SR2 if Ai and Bi are defined on the same domain for every i = 1, …, n.

Thus, for equivalent relations the following conditions are satisfied:

· The tables have the same number of columns.

· The tables contain columns with the same names.

· Columns with the same names contain data from the same domains.

· The tables have the same rows, but the order of the columns may vary.

All such tables are different images of the same relation.

Properties of relations. The properties of relations follow directly from the definition of a relation given above. These properties are the main differences between relations and tables.

· There are no identical tuples in a relation.

· Tuples are not ordered (top to bottom).

· Attributes are not ordered (left to right).

· All attribute values are atomic.

Fig. Schematic representation of a relation
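The set-theoretic character of a relation body can be mimicked with Python's built-in set type. This is a toy sketch (the Student tuple type is invented): duplicates are not stored twice and no row order exists, just as the properties above require:

```python
from collections import namedtuple

# The attribute names play the role of the relation header.
Student = namedtuple("Student", ["name", "year"])

# The body of a relation is a mathematical set of tuples, so a duplicate
# tuple collapses into a single element and the elements carry no order.
body = {Student("Ivanov", 2), Student("Petrova", 3), Student("Ivanov", 2)}

print(len(body))  # 2: the duplicate tuple is stored only once
```

Real SQL tables, by contrast, do allow duplicate rows unless a key constraint forbids them, which is one of the differences between relations and tables mentioned above.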

The relational model is a database in the form of a set of interconnected relations. In each connection, one relation acts as the main relation and the other as a subordinate one, so that one tuple of the main relation can be associated with several tuples of the subordinate relation. To support such a connection, both relations must contain the set of attributes over which they are linked. In the main relation this is the primary key of the relation, which uniquely identifies a tuple of the main relation. To model the connection, the subordinate relation must have a set of attributes matching the primary key of the main relation; there, however, this set of attributes is a secondary key, or foreign key, i.e. it identifies the set of tuples of the subordinate relation associated with a single tuple of the main relation.
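A primary/foreign key link between a main and a subordinate relation can be sketched as follows (SQLite with hypothetical dept and emp tables; note that SQLite enforces foreign keys only after the PRAGMA is issued):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default

# Main relation: the primary key uniquely identifies a department tuple.
conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT)")
# Subordinate relation: dept_id is a foreign key into the main relation.
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, "
             "dept_id INTEGER REFERENCES dept(id))")

conn.execute("INSERT INTO dept VALUES (1, 'IT')")
conn.execute("INSERT INTO emp VALUES (1, 'Ivanov', 1)")  # valid reference

rejected = False
try:
    conn.execute("INSERT INTO emp VALUES (2, 'Petrova', 99)")  # no dept 99
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

The rejected insert shows the DBMS protecting referential integrity: a subordinate tuple may only reference an existing tuple of the main relation.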

6. Design of relational databases.

When designing a relational database, the following problems must be solved:

1) Taking into account the semantics of the subject area, the objects of the subject area must be represented as well as possible in an abstract data model (data design). That is, a decision is made about the database schema: what relations the database should consist of, what attributes those relations should have, and what the connections between the relations are.

2) Ensure the efficiency of executing database queries (physical database design).

Upon completion of the datalogical design stage, the following resulting documents should be obtained:

· Building a correct data schema based on the relational data model.

· Description of the database schema in terms of the selected DBMS.

· Description of external models in terms of the selected DBMS.

· Description of declarative rules for maintaining database integrity.

· Development of procedures for maintaining the semantic integrity of the database.

So, the task of designing a relational database is to select a database schema from many alternative options.

A database schema is correct if it contains no undesirable dependencies between relation attributes. The process of developing a correct database schema is called logical design.

Designing a database schema can be done in two ways:

· Decomposition (partitioning) method: the original set of relations making up the database schema is replaced by another set of relations that are projections of the original ones. The number of relations thereby increases.

· Synthesis method: composing a database schema from the given initial elementary dependencies between objects of the subject area.

Classical database design is associated with normalization theory, which is based on the analysis of functional dependencies between relation attributes. Functional dependencies define stable relationships between objects and their properties in the subject area under consideration.

The decomposition method is a process of successive normalization of relation schemas: each new iteration corresponds to a normal form of higher order and has better properties than the previous one. The existence of a universal relation containing all the attributes of the database is assumed initially; then, based on an analysis of the connections between the attributes, a decomposition of the universal relation is carried out (or attempted), i.e. a transition to several relations of lower dimension such that the original relation can be restored by a natural join operation.
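The lossless character of such a decomposition can be checked mechanically. The sketch below (SQLite; the emp relation and its dependencies are made up) decomposes a small "universal" relation into two projections and restores it with a natural join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A "universal" relation with a transitive dependency: id -> dept -> head.
conn.execute("CREATE TABLE emp (id INTEGER, name TEXT, dept TEXT, head TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?)",
                 [(1, "Ivanov", "IT", "Sidorov"),
                  (2, "Petrova", "IT", "Sidorov")])

# Decomposition: replace emp by two projections of lower dimension.
conn.execute("CREATE TABLE emp1 AS SELECT id, name, dept FROM emp")
conn.execute("CREATE TABLE dept1 AS SELECT DISTINCT dept, head FROM emp")

# A natural join of the projections restores the original relation.
restored = conn.execute("SELECT id, name, dept, head "
                        "FROM emp1 NATURAL JOIN dept1 ORDER BY id").fetchall()
original = conn.execute("SELECT * FROM emp ORDER BY id").fetchall()
print(restored == original)  # True
```

Here the join attribute (dept) functionally determines head, which is exactly the condition that makes this particular decomposition lossless.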

Thus, each normal form corresponds to a specific set of restrictions, and a relation is in a given normal form if it satisfies the set of restrictions inherent in that form.

In the theory of relational databases, the following normal forms are usually distinguished:

· first normal form (1NF);

· second normal form (2NF);

· third normal form (3NF);

· Boyce-Codd normal form (BCNF);

· fourth normal form (4NF);

· fifth normal form, or projection-join normal form (5NF or PJ/NF).

Basic properties of normal forms:

· each successive normal form is in some sense better than the previous one;

· when moving to the next normal form, the properties of the previous normal forms are preserved.

Database schemas are called equivalent if the contents of the source database can be obtained by a natural join of the relations included in the resulting schema, with no new tuples appearing in the source database.

7. Normal forms of relationships.

The normalization process is based on adequately reflecting the subject area in the form of tables containing data about the modeled object, and on the ability of the database state to change over time. As a rule, anomalies may arise from a mismatch between the subject-area model and the data model, and they show up when the corresponding operations are performed:

· Insertion anomalies (INSERT) – caused by storing heterogeneous information in a single relation.

· Update anomalies (UPDATE) – caused by the redundancy of relation data that heterogeneous storage brings.

· Deletion anomalies (DELETE) – likewise caused by storing heterogeneous information in a single relation.

Undefined (NULL) values must also be taken into account. In different DBMSs, when various operations are performed (comparison, union, sorting, grouping, etc.), two NULL values may or may not be equal to each other, and they affect differently the results of operations that compute averages or count values. To eliminate errors, many DBMSs make it possible to replace NULL values with zero when performing calculations, to declare all NULL values equal to each other, and so on.
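SQLite's behaviour illustrates these points concretely (a sketch with an invented grade table; other DBMSs may treat some of these cases differently):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grade (student TEXT, mark INTEGER)")
conn.executemany("INSERT INTO grade VALUES (?, ?)",
                 [("Ivanov", 8), ("Petrova", None), ("Sidorov", 10)])

# NULL = NULL does not evaluate to true, so this comparison matches nothing.
eq_null = conn.execute("SELECT student FROM grade WHERE mark = NULL").fetchall()
print(eq_null)  # []

# Aggregates skip NULLs: only the two known marks are counted and averaged.
count, avg = conn.execute("SELECT COUNT(mark), AVG(mark) FROM grade").fetchone()
print(count, avg)  # 2 9.0

# Replacing NULL with zero (COALESCE) changes the result of AVG.
avg0 = conn.execute("SELECT AVG(COALESCE(mark, 0)) FROM grade").fetchone()[0]
print(avg0)  # 6.0
```

The shift of the average from 9.0 to 6.0 shows why the treatment of NULLs must be a deliberate design decision, not an afterthought.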

Normalization is the splitting of a table into several tables that have better properties when data is updated, inserted, and deleted. That is, normalization is the process of successively replacing a table with its complete decompositions until all of them are in 5NF; in practice, however, it is usually sufficient to bring the tables to BCNF.

The normalization procedure is based on the premise that the only functional dependencies in any table should be dependencies of the form K → F, where K is the primary key and F is some other field. Therefore, during normalization you should get rid of all "other" functional dependencies, i.e. those whose form differs from K → F.
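Whether a dependency K → F holds in a given table instance can be tested directly: no two rows may agree on K while differing on F. A sketch (data-based, so it can only refute a dependency, never prove it in general; the employee table and all names are invented):

```python
def holds_fd(rows, lhs, rhs):
    # lhs -> rhs holds in this instance if no two rows share the same
    # lhs values but differ in their rhs values.
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False
        seen[key] = val
    return True

emp = [
    {"emp_id": 1, "dept": "IT", "city": "Kyiv"},
    {"emp_id": 2, "dept": "IT", "city": "Kyiv"},
    {"emp_id": 3, "dept": "HR", "city": "Lviv"},
]
print(holds_fd(emp, ["emp_id"], ["dept"]))  # True: emp_id -> dept
print(holds_fd(emp, ["dept"], ["emp_id"]))  # False: two IT rows differ in emp_id
```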

When primary (and foreign) keys are involved, two cases should be considered during normalization:

1. The table has a composite primary key, for example (K1, K2), and a field F that functionally depends on part of this key, for example on K2 (and does not depend on the full key). It is recommended to create another table containing K2 and F (with K2 as its primary key) and to delete F from the original table:

Replace T(K1, K2, F), primary key (K1, K2), FD K2 → F,

with T1(K1, K2), primary key (K1, K2),

and T2(K2, F), primary key K2.

2. The table has a primary (candidate) key K, a field F1 that is not a candidate key but functionally depends on K, and another non-key field F2 that functionally depends on F1: K → F1 → F2. It is recommended to create a table containing F1 and F2 (with F1 as its primary key) and to delete F2 from the original table. It should be noted that to carry out such operations one should initially have some "large" (universal) relations as input data.
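Case 1 can be sketched as a mechanical decomposition step (relations as lists of dicts; the order/product table and all names are invented for illustration):

```python
def split_partial_dependency(rows, determinant, dependent):
    # Move determinant + dependent into a new relation (the determinant
    # becomes its primary key) and drop the dependent fields from the
    # original, eliminating the dependency on part of the composite key.
    extracted = {tuple(sorted((a, row[a]) for a in determinant + dependent))
                 for row in rows}
    remainder = [{a: v for a, v in row.items() if a not in dependent}
                 for row in rows]
    return remainder, [dict(t) for t in extracted]

# R(order_id, product_id, product_name) with FD product_id -> product_name:
# product_name depends on only part of the key (order_id, product_id).
orders = [
    {"order_id": 1, "product_id": "P1", "product_name": "Bolt"},
    {"order_id": 2, "product_id": "P1", "product_name": "Bolt"},
    {"order_id": 2, "product_id": "P2", "product_name": "Nut"},
]
rest, products = split_partial_dependency(orders, ["product_id"], ["product_name"])
print(rest[0])  # {'order_id': 1, 'product_id': 'P1'}
print(sorted(p["product_name"] for p in products))  # ['Bolt', 'Nut']
```

Each product name is now stored once, so the redundancy (and with it the update anomaly) disappears.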

Def.1. A relation is in first normal form (1NF) if and only if each of its fields contains exactly one (atomic) value in every row and none of the key fields of the relation is empty.

According to Definition 1, any relation that satisfies the basic properties of relations will be in 1NF: the relation contains no identical tuples; tuples are not ordered; attributes are not ordered and are distinguished by name; all attribute values are atomic.

Def.2. A relation is in second normal form (2NF) if and only if it is in 1NF and has no non-key attributes that depend on a part of a composite key (i.e., all fields not included in the primary key are fully functionally dependent on the primary key).

If the candidate key is simple (consists of a single attribute), then the relation is automatically in 2NF.

To eliminate the dependence of attributes on part of a composite key, the relation must be decomposed into several relations: the attributes that depend on part of the composite key are moved to a separate relation.

Attributes of a relation are called mutually independent if neither of them is functionally dependent on the other.

Def.3. A relation is in third normal form (3NF) if and only if it is in 2NF and all its non-key attributes are mutually independent (that is, no non-key field of the relation is functionally dependent on any other non-key field).

To eliminate dependencies between non-key attributes, the relation must be decomposed into several relations; the dependent non-key attributes are placed in a separate relation.
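The 3NF condition can be probed on a table instance: look for pairs of non-key fields where one determines the other. As before, data can only refute a dependency, never prove it, so the result is evidence of a possible violation; the employee table and all names are invented:

```python
def fd_holds(rows, lhs, rhs):
    # True if no two rows agree on lhs but differ on rhs.
    seen = {}
    for row in rows:
        k = tuple(row[a] for a in lhs)
        v = tuple(row[a] for a in rhs)
        if seen.setdefault(k, v) != v:
            return False
    return True

def possible_3nf_violations(rows, key):
    # Pairs of non-key attributes (a, b) where a -> b holds in this instance.
    non_key = [a for a in rows[0] if a not in key]
    return [(a, b) for a in non_key for b in non_key
            if a != b and fd_holds(rows, [a], [b])]

emps = [
    {"emp_id": 1, "dept": "IT", "dept_phone": "111"},
    {"emp_id": 2, "dept": "IT", "dept_phone": "111"},
    {"emp_id": 3, "dept": "HR", "dept_phone": "222"},
]
print(possible_3nf_violations(emps, ["emp_id"]))
```

Here dept and dept_phone determine each other in the sample, signalling the transitive dependency emp_id → dept → dept_phone: the (dept, dept_phone) pair belongs in its own relation.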

When relations are reduced to 3NF by the normalization algorithm, it is assumed that every relation contains a single candidate key. This is not always true: there are cases when a relation contains multiple keys.

Def.4. A relation is in Boyce-Codd normal form (BCNF) if and only if the determinants of all its functional dependencies are candidate keys (or, equivalently, if every functional dependence between its attributes reduces to a full functional dependence on a candidate key).

If a relation is in BCNF, then it is automatically in 3NF, as follows from Definition 4. To eliminate dependence on determinants that are not candidate keys, decomposition should be carried out, placing these determinants and the attributes that depend on them in a separate relation.

There are cases when a relation contains no functional dependencies at all; that is, the relation is all-key: the key of the relation is the entire set of its attributes. Even then a multivalued dependence may be present, because there is still a relationship between the attributes.

Def.5. A relation is in fourth normal form (4NF) if and only if it is in BCNF and contains no non-trivial multivalued dependencies.

Relations with non-trivial multivalued dependencies arise, as a rule, as the result of a natural join of two relations over a common field that is not a key in either relation. In effect, this amounts to storing information about two independent entities in one relation.

To eliminate non-trivial multivalued dependencies, you can decompose the original relation into several new ones.
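The classic illustration: an all-key relation recording, for one teacher, both the courses they teach and the languages they speak (two independent facts; the data is invented). Decomposing it into two projections removes the multivalued dependency, and their natural join restores the original, so the decomposition is lossless:

```python
# All-key relation R(teacher, course, language) with the MVD
# teacher ->> course | language: courses and languages vary independently,
# so every course must be paired with every language.
r = {
    ("Smith", "DB",   "English"),
    ("Smith", "DB",   "French"),
    ("Smith", "Math", "English"),
    ("Smith", "Math", "French"),
}

# 4NF decomposition into two projections.
teaches = {(t, c) for t, c, _ in r}
speaks  = {(t, l) for t, _, l in r}

# Lossless: the natural join of the projections rebuilds R exactly.
rebuilt = {(t, c, l) for (t, c) in teaches for (t2, l) in speaks if t == t2}
print(rebuilt == r)  # True
```

After the split, adding one new language means inserting one tuple into speaks instead of one tuple per course in the original relation.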

Def.6. A relation is in fifth normal form (5NF) if and only if every join dependency present in it is trivial.

The following definition is equivalent to Def.6.

Def.7. A relation is not in 5NF if it has a non-trivial join dependency.

Thus, if in every complete decomposition all projections of the original relation contain a candidate key, we can conclude that the relation is in 5NF. A relation that has no complete decomposition at all is also in 5NF.

Without knowing what the candidate keys of a relation are and how its attributes are interrelated, one cannot say whether the relation is in 5NF or in any other normal form.

A candidate key of a relation is a set of its attributes that completely and uniquely (fully functionally) determines the values of all other attributes of the relation. In general, a relation can have several candidate keys. Among all the candidate keys of a relation, one is usually chosen as the main one and is called the primary key of the relation.

Mutually independent attributes are attributes that do not depend on one another. If a relation has several functional dependencies, then each attribute or set of attributes on which another attribute depends is called a determinant of the relation.
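Candidate keys can be searched for on a table instance by brute force: the minimal attribute sets whose value combinations are unique across all rows. Again this yields only evidence from the data, not proof for all possible states; the table and attribute names are invented:

```python
from itertools import combinations

def candidate_keys(rows):
    # Minimal attribute sets whose value combinations are unique
    # over every row of this instance.
    attrs = list(rows[0])
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if any(k <= set(combo) for k in keys):
                continue  # a subset is already a key: combo is not minimal
            values = [tuple(row[a] for a in combo) for row in rows]
            if len(values) == len(set(values)):
                keys.append(set(combo))
    return keys

people = [
    {"passport": "A1", "email": "a@x.io", "name": "Ann"},
    {"passport": "B2", "email": "b@x.io", "name": "Bob"},
    {"passport": "C3", "email": "c@x.io", "name": "Ann"},
]
print(candidate_keys(people))  # [{'passport'}, {'email'}]
```

Both passport and email are candidate keys here; name is not (it repeats), and any superset of a key is skipped as non-minimal.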

9. Relational algebra.

Relational algebra provides a framework for accessing relational data. The main purpose of the algebra is to provide expressions that can be written down. These expressions can be used for:

· defining the scope of a selection, i.e. the data to be selected as the result of a sampling operation;

· defining the scope of an update, i.e. the data to be inserted, modified, or deleted as the result of an update operation;

· defining (named) virtual relations, i.e. presenting data for visualization through views;

· defining a snapshot, i.e. the data to be stored as a "snapshot" of a relation;

· defining security rules, i.e. the data over which access control is exercised;

· defining stability requirements, i.e. the data included in the scope of certain concurrency-control operations;

· defining integrity rules, i.e. special rules that the database must satisfy along with the general rules that are part of the relational model and apply to every database.

In current implementations of relational DBMSs, neither relational algebra nor relational calculus is used in pure form. The de facto standard for accessing relational data has become the SQL language (Structured Query Language).

Relational algebra, defined by Codd, consists of 8 operators comprising 2 groups:

  • traditional set operations (union, intersection, subtraction, Cartesian product);
  • special relational operations (selection, projection, join, division).

In addition, the algebra includes an assignment operation, which allows the results of evaluating algebraic expressions to be saved in the database, and an attribute renaming operation, which makes it possible to correctly form the header (schema) of the resulting relation.

A brief overview of relational algebra operators.

Selection – returns a relation containing all tuples of a given relation that satisfy certain conditions. The selection operation is also called the restriction operation (restrict; nowadays the term SELECT is more common).

Projection – returns a relation containing all the tuples (more precisely, sub-tuples) of a given relation after certain attributes have been excluded from it.

Product – returns a relation containing all possible tuples that are combinations of two tuples belonging respectively to the two given relations.

Union – returns a relation containing all tuples that belong to either or both of the two given relations.

Intersection – returns a relation containing all tuples that belong to both of the two given relations simultaneously.

Subtraction – returns a relation containing all tuples that belong to the first of the two given relations but not to the second.

Join (natural) – returns a relation whose tuples are combinations of two tuples (belonging respectively to the two given relations) that have a common value for one or more common attributes of the two relations (such common values appear only once in the resulting tuple, not twice).

Division – for two relations, one binary and one unary, returns a relation containing all the values of one attribute of the binary relation that match (in the other attribute) all the values in the unary relation.
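The operators above can be sketched over relations modelled as sets of frozensets of (attribute, value) pairs, so that union, intersection, and subtraction come for free as Python set operators. An illustrative encoding with invented supplier/part data:

```python
def rel(*rows):
    # Build a relation from dict rows.
    return {frozenset(r.items()) for r in rows}

def select(r, pred):                 # restriction (SELECT)
    return {t for t in r if pred(dict(t))}

def project(r, attrs):               # projection
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in r}

def product(r1, r2):                 # Cartesian product (disjoint headers)
    return {t1 | t2 for t1 in r1 for t2 in r2}

def natural_join(r1, r2):            # join on all common attributes
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

def divide(r, divisor, keep, by):
    # Division: values of `keep` in the binary relation r that are
    # related to every `by` value appearing in the unary divisor.
    needed = {dict(t)[by] for t in divisor}
    groups = {}
    for t in r:
        d = dict(t)
        groups.setdefault(d[keep], set()).add(d[by])
    return {v for v, bs in groups.items() if needed <= bs}

# Union, intersection, and subtraction are the plain set operators |, &, -.
supplies = rel({"s": "S1", "p": "P1"}, {"s": "S1", "p": "P2"},
               {"s": "S2", "p": "P1"})
parts = rel({"p": "P1"}, {"p": "P2"})
print(divide(supplies, parts, "s", "p"))   # {'S1'}: supplies every part
print(len(project(supplies, ["s"])))       # 2 distinct suppliers
```

Division answers the classic query "which suppliers supply all parts": here only S1 qualifies.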

LITERATURE

1. Date C.J. An Introduction to Database Systems, 6th edition: Trans. from English. – K.; M.; St. Petersburg: Williams Publishing House, 2000. – 848 p.

2. Connolly T., Begg C., Strachan A. Database Systems: Design, Implementation and Maintenance. Theory and Practice, 2nd ed.: Trans. from English. – M.: Williams Publishing House, 2000. – 1120 p.

3. Karpova T.S. Databases: Models, Development, Implementation. – St. Petersburg: Piter, 2001. – 304 p.

4. Faronov V.V., Shumakov P.V. Delphi 4. Database Developer's Guide. – M.: Znanie, 1999. – 560 p.

5. Groff J., Weinberg P. SQL: The Complete Guide: Trans. from English. – K.: BHV Publishing Group, 2001. – 816 p.

6. Getz K., Litwin P., Gilbert M. Access 2000 Developer's Handbook. Vols. 1, 2: Trans. from English. – K.: BHV Publishing Group, 2000. – 1264 p., 912 p.

7. Maklakov S.V. BPwin and ERwin: CASE Tools for Information Systems Development. – M.: DIALOG-MEPhI, 2001. – 304 p.

8. Ullman J., Widom J. Introduction to Database Systems / Trans. from English. – M.: Lori, 2000. – 374 p.

9. Khomonenko A.D., Tsygankov V.M., Maltsev M.G. Databases: Textbook for Higher Educational Institutions / Ed. by Prof. A.D. Khomonenko. – St. Petersburg: CORONA print, 2000. – 416 p.

DBMS architecture

The database architecture proposed by the ANSI/SPARC research group includes three levels: internal, conceptual, and external. In general outline, they are as follows:

External level

The external level is the level of the individual user. Each user has their own language for communicating with the database.

For the application programmer it is one of the common programming languages.

For the end user it is either a special query language or a special-purpose language, possibly based on forms and menus, built to suit the user's requirements and supported by some online application.

Conceptual level

A conceptual view is a representation of the entire information content of the database in a form that is somewhat more abstract (as is also the case with an external view) than the way the data is physically stored. However, a conceptual representation also differs significantly from the way the data is seen by any individual user. Generally speaking, a conceptual view is a representation of the data "as it really is," rather than as users are forced to see it within the framework of, for example, a particular language or the hardware being used.

A conceptual representation consists of multiple instances of each type of conceptual record. For example, it might consist of a set of record instances containing information about individuals, plus a set of instances containing information about parts, and so on. A conceptual record need not coincide either with an external record, on the one hand, or with a stored record, on the other.

A conceptual view is defined by means of a conceptual schema, which includes definitions of each type of conceptual record. The conceptual schema is written in another data definition language, the conceptual one.

A conceptual view is a representation of the entire contents of the database, and a conceptual schema is the definition of that representation. However, it would be a mistake to assume that a conceptual schema is nothing more than a set of definitions resembling the simple record declarations found in a program.

Internal level

An internal view is a low-level view of the entire database; it consists of many instances of each internal record type.

The term "internal record" belongs to ANSI/SPARC terminology and refers to a construct called a stored record. The internal representation, like the external and conceptual representation, is not related to the physical layer because it does not consider the physical areas of the storage device, such as cylinders and tracks. In other words, the internal representation assumes an infinite linear address space; the details of how the address space is mapped to the physical storage device are very system-specific and are intentionally not included in the overall architecture.

The internal representation is described using an internal schema, which defines not only the different types of records being stored, but also the indexes that exist, the way stored fields are represented, the physical sequence of stored records, etc. The internal schema is written using another data definition language - internal.

Applications that use databases are usually built according to one of several software architectures, each with its own pros and cons.

Local architecture.

Both the program and the database are located on the same computer. Most desktop applications run on this architecture.

File-server architecture.

The database is located on a powerful dedicated computer (the server), and personal computers are connected to it by a local network. Client programs installed on these computers access the database over the network. The advantage of this architecture is that several users can work with one database simultaneously.

The disadvantage of this approach is the large volume of information transmitted over the network. All processing is performed on the client workstations, where a copy of the database is effectively formed. This limits the maximum possible number of users and causes long delays when working with the database. The delays arise because simultaneous access at the level of a specific table is impossible: until a program on one of the client workstations finishes working with the table (for example, modifying records), other programs cannot access that table. This is called table-level locking, and it prevents confusion over the table's contents.

Client-server architecture.

In this architecture, the server not only stores the database but also runs the DBMS program, which processes user requests and returns sets of records to them. User programs no longer work with the database as a set of physical files; instead they address the DBMS, which performs the operations. This removes load from the client workstations, since most of the work happens on the server. The DBMS automatically monitors the integrity and security of the database and also controls access to information using a password service. Client-server DBMSs allow locks at the record level and even at the level of an individual field. This means that any number of users can work with a table, but only one of them at a time can change a specific record or one of its fields.

The main disadvantage of this architecture is its limited reliability: if the server goes down, all work stops.

Distributed architecture.

Several servers run on the network, and the database tables are distributed among them to increase efficiency. Each server runs its own copy of the DBMS. In addition, such an architecture usually employs special programs, the so-called application servers, which make it possible to optimize the processing of requests from a large number of users and to distribute the load evenly among the computers on the network.

The drawback of the distributed architecture lies in the rather complex and expensive process of creating and maintaining (administering) it, as well as in the high requirements placed on the servers.

Internet architecture.

The database and DBMS (located on one computer or distributed over a network) are accessed from a browser via a standard protocol. This places minimal requirements on the client equipment. Such programs are called "thin clients" because they can run even on weak PCs. For example, instead of setting up a dedicated local network, the server can be accessed via the Internet or over a local network (in the latter case one speaks of intranet technologies). There is no need to develop special client programs or invent proprietary specifications for exchanging data between the server and the clients; it is enough to use ready-made browsers and software solutions.

The DBMS architecture must, first of all, ensure a separation between the user and system levels. The most commonly supported architecture at present is the three-level architecture of database description, with three levels of abstraction at which the database can be viewed: the external level, the conceptual level, and the internal level.

The main purpose of the three-level architecture is to ensure data independence, meaning that changes at the lower levels do not affect the upper levels. Two types of data independence are distinguished: logical (complete protection of the external schemas from changes made to the conceptual schema) and physical (protection of the conceptual schema from changes made to the internal schema).

The external level is how users perceive the data: each group of users has its own view of the database. Each type of user can use their own language to communicate with the database. End users use either a query language or a special language supported by applications, which invokes user-specific screen forms and custom menus.

The conceptual level is the intermediate level in the three-level architecture and provides an abstract representation of all the information in the database. The description of the database at this level is called the conceptual schema; it includes the objects and their attributes, the relationships between objects, the constraints imposed on the data, semantic information about the data, and provisions for security and data integrity.

The internal schema describes the physical implementation of the database and is intended to achieve optimal performance and economical use of disk space. At the internal level, the DBMS interacts with the access methods of the operating system in order to place data on storage devices, create indexes, retrieve data, and so on. The following information is stored at the internal level: the allocation of disk space for storing data and indexes, descriptions of the details of record storage, information about record placement, and information about data compression and the chosen encryption methods. Below the internal level lies the physical level, which is controlled by the operating system under the guidance of the DBMS. The physical level takes into account how the data is represented in the machine.

Implementing the three-level database architecture requires the DBMS to transfer information from one level to another, that is, to transform addresses and pointers into the corresponding logical names and relations and vice versa. The benefit of this translation is the independence of the logical and physical representations of the data, but the price of this independence is considerable: a significant system delay.

According to their architecture, DBMSs are divided into one-, two-, and three-tier [19]. In a single-tier architecture (Fig. 1.11, a) a single tier (the client) provides all the necessary logic for data management and visualization. In a two-tier architecture (Fig. 1.11, b) a significant part of the data-management logic is implemented by the database server, while the client tier is mainly occupied with displaying data in a user-friendly form. In a three-tier DBMS (Fig. 1.11, c) an intermediate tier is used: an application server, which acts as an intermediary between the client and the database server. The application server makes it possible to relieve the client completely of the functions of data management and communication with the database server.

Fig. 1.11. DBMS architectures: a – single-tier; b – two-tier; c – three-tier

Depending on the location of the individual parts of the DBMS, local and network DBMSs are distinguished. All parts of a local DBMS reside on the computer of the user accessing the database. For several users to work with the same database simultaneously, each user's computer must have access to its own copy of the local database. A significant problem for DBMSs of this type is synchronizing the contents of the data copies (data replication); for this reason, local DBMSs are unsuitable for tasks that require the collaboration of several users.

Network DBMSs include file-server, client-server, and distributed DBMSs. An indispensable attribute of these systems is a network that provides hardware communication between computers and makes it possible for multiple users to work together with the same database.

In file-server DBMSs, the entire database is usually located on one or more storage devices of a sufficiently powerful machine specially dedicated for this purpose and permanently connected to the network. Such a computer is called a file server. The undoubted advantage of a DBMS of this type is the relative simplicity of its creation and maintenance: in essence, everything comes down to setting up a local network and installing network operating systems on the computers connected to it. There are no particular differences between the local and file-server versions of a DBMS, since in both all parts of the DBMS are concentrated on the user's computer. They usually have a single-tier architecture, although in some cases an application server may be used. The disadvantage of file-server systems is the significant load they place on the network. For example, if a user working at a client computer needs to find information about one of the books in a library, the entire file containing information about all the books is first transferred over the network, and only then is the required information found in the local copy of the data created in this way. Under intensive use by several dozen users, the network bandwidth may prove insufficient, and users will be annoyed by significant delays in the DBMS's responses to their requests. File-server DBMSs can be used successfully in relatively small organizations with up to several dozen client seats.

Client-server (two-tier) systems significantly reduce the load on the network, since the client communicates with the data through a specialized intermediary, the database server, which resides on the machine holding the database. The database server receives a request from the client, finds the required record in the data, and transmits it to the client. Thus, only a relatively short request and the single required record are transmitted over the network, even if the database contains hundreds of thousands of records. As a rule, a request to the server is formulated in the special query language SQL, which is why database servers are often called SQL servers. Database servers are relatively complex programs developed by various companies, for example: Microsoft SQL Server from Microsoft Corporation, Sybase Adaptive Server from Sybase Corporation, Oracle from the corporation of the same name, DB2 from IBM Corporation, InterBase from Borland Corporation, and others. Client-server DBMSs scale to hundreds and thousands of client seats.

Distributed DBMSs can contain several tens or hundreds of database servers, and the number of client seats in them can reach tens or hundreds of thousands. Typically, such DBMSs support the work of state-level organizations (for example, the Central Election Commission of the Russian Federation) whose individual divisions are dispersed over a large territory. In distributed DBMSs, some servers can duplicate each other in order to achieve an extremely low probability of failures and faults that could corrupt vital information.

The relevance of distributed DBMSs has increased with the rapid development of the Internet. Building on the capabilities of the Internet, distributed systems are created not only by state-level organizations but also by relatively small commercial enterprises, allowing their employees to work with corporate data from home and on business trips.