Development of a relational database. Designing relational databases based on normalization principles: the first steps of normalization

Send your good work in the knowledge base is simple. Use the form below

Students, graduate students, young scientists who use the knowledge base in their studies and work will be very grateful to you.

Posted on http://www.allbest.ru/

Test

Relational Database Design

Normalization of relations
Functional dependencies
Normal Boyce-Codd form
Literature

Relational Database Design

The main goal of database design is to reduce the redundancy of stored data, and therefore save the amount of memory used, reduce the cost of multiple update operations of redundant copies and, first of all, eliminate the possibility of inconsistencies due to storing information about the same volume in different places same object. Redundancy means that some data or groups of data may be repeated many times.

During the database design process, the following problems may arise:

Update anomalies - Due to data redundancy, all data must be reviewed when updating it, however, a situation may arise where not all data is updated (potential data inconsistency).

Inclusion anomalies - it is possible that data cannot be entered into the database until some additional information has been received and entered.

Deletion anomalies - the opposite problem may occur when some data is deleted (possible loss of useful information).

The number of Null values is not minimized. Just like redundancy, null values are a source of potential problems in relational databases because it is impossible to determine what they mean. Therefore, it is advisable to minimize their use.

The first three problems are resolved in the process of normalizing relations.

relational base functional dependence

Normalization of relations

Normalization is a partition (or decomposition) of a table into two or more that have better properties for adding, changing and deleting data. The ultimate goal of normalization is to achieve a "clean" database design in which " every fact stored only V one place" , i.e. data redundancy is eliminated. This is done not so much to save memory as to eliminate possible inconsistencies in the stored data.

Every table in a relational database satisfies the condition that there is always a single atomic value at the intersection of each row and column of the table, and there can never be multiple such values. Any table that satisfies this condition is called normalized. In fact, unnormalized tables, that is, tables containing repeating groups, are not even considered in a relational database.

The normalized table corresponds to first normal form, abbreviated 1NF. Thus, "normalized" and "in 1NF" mean the same thing for a table. However, in practice, the term "normalized" is often used in a narrower sense - "fully normalized", which means that the design does not violate any normalization principles.

Now, in addition to 1NF, further levels of normalization can be defined - second normal form (2NF), third normal form (3NF), etc. A table is considered to be in 2NF if it is in 1NF and, in addition, satisfies some additional condition, the essence of which will be discussed below. A table is in 3NF if it is in 2NF and also satisfies another additional condition, etc.

Thus, each normal form is in some sense more limited, but also more desirable, than the previous one. This is because (n+1)-n normal form does not have some of the unattractive features of nth normal form. The general meaning of the additional condition imposed on the (n+1)th normal form with respect to the nth normal form is to eliminate these unattractive features.

The procedure for normalizing relations is reversible. For example, a set of relations in 3NF can be converted to relations in 2NF. This very important property of normalization means that no information is lost during the normalization process.

Normalization theory is based on the presence of certain dependencies between table fields. Emphasis is placed on functional, multivalued, and connection dependencies.

Functional dependencies

Let X and Y be arbitrary subsets of a set of attributes of a relation R. Y is functionally dependent on X if and only if every value of X is associated with exactly one value of Y. Notation: XY (read "X functionally determines Y"). The left and right parts of a symbolic notation are called the determinant and dependent part, respectively.

Rice. 1. PIC supply table

In other words, if two tuples of a relation R coincide in the value of X, then they also coincide in the value of Y. For clarification, consider the figure shown in Fig. 1 is a slightly modified version of the supply table shown in Fig. 2.

All tuples of the PIC relation with the same value of the P№ attribute have the same values of the Hor attribute. This means that the attributes of Mountains functionally depend on the attributes of P№: (P№)(Gor). Moreover, in this regard, there are other constant functional dependencies: (P№, D№)(Kol), (P№, D№)(Gor), (P№, D№)(Gor, Col), (P No., DNo.)(PNo.), (PNo., DNo.)(PNo., DNo., Gor, Kol), as well as dependencies that are functional at any given moment, but not all the time, for example, (P No. (Qty).

Note that if X is a candidate key of a relation R, then all attributes Y of the relation R must be functionally dependent on X (this is a consequence of the definition of a candidate key). In fact, if the relation R satisfies the functional dependence of AB and A is not a potential key, then R will be characterized by some redundancy. For example, with respect to PIC, the information that each supplier is located in a certain city will be repeated many times.

Functional dependencies are integrity constraints, so they must be checked every time the database is updated. An obvious way to reduce many functional dependencies is to eliminate trivial dependencies, i.e. those that cannot fail to be fulfilled. For example, (P№, D№)(P№). Functional dependence is trivial if and only if the right side of the symbolic notation is a subset of the left side. Such dependencies are of no interest from a practical point of view.

When analyzing relationships, a special role is given to irreducible dependencies. An attribute B is irreducibly dependent on a composite attribute A if it functionally depends on A and does not functionally depend on any subset of the attribute A. In early publications, instead of the term irreducible addiction term used full functional addiction.

Functional dependencies can be depicted using diagrams. For the supplier and parts database (Fig. 1), the functional dependency diagram is shown in Fig. 2.

Each arrow in the diagram begins with the primary key of the corresponding relation. Other arrows are possible on the diagram. In this case, the normalization procedure can be informally characterized as a procedure for eliminating arrows that do not start on the primary key.

Normal forms justified by functional dependencies

We mentioned first normal form (1NF). Let us give a more strict definition of it, as well as definitions of other normal forms.

A table is in first normal form (1NF) if and only if none of its rows contains more than one value in any of its fields and none of its key fields is empty.

For example, the table shown in Fig. 3 does not satisfy these requirements (the data in the D№ field is not atomic):

Rice. 3. Example of a table that is not a relational relationship

Such tables are not even considered in relational models.

If we are developing a relational database, then in the first stage a table can be created that combines all the data under consideration, for example, Suppliers, Parts, Supplies. The table in Fig. 3 represents a correct relational relationship. He is called universal attitude the designed database. One universal relation includes all the attributes of interest and can contain all the data that is expected to be stored in the database in the future. For small databases, the universal relation can be used as a starting point for database design. The primary key of a table is a combination of the fields P№ and D№. This table satisfies all the requirements of 1NF.



				Dosimeter
				Radiometer




				Dosimeter
				Dosimeter
				Dosimeter

Rice. 4. Relation in first normal form

The diagram of functional dependencies of such a relationship has the form shown in Fig. 4 (we will assume that the status of the supplier is determined by the city).

The relation in question, which is in 1NF, has a structure that for some reason is not entirely desirable. For example, there is obvious redundancy of information. This leads not only to an increase in the size of the database, but also to various anomalies:

Insert. You cannot insert data about the supplier (P5) without specifying the part (Null value in the key field is not allowed).

Delete. When you delete a certain tuple, you have to delete too much other information (deleting the supply information deletes the supplier information).

Update. Excessive information can lead to inconsistent results. If supplier P1 moved to another city, and the update was not made in all tuples, then the database will contain conflicting information.

These anomalies can be eliminated by reducing the relation to second normal form, breaking it into two.

A table is in second normal form (2NF) if it satisfies the definition of 1NF and all of its fields that are not part of the primary key are irreducibly dependent on the primary key (or are fully functionally dependent on the primary key).

The functional dependencies of the relations in our database reduced to 2NF are shown in Fig. 4, and the corresponding tables are in Fig. 5.

Now you can enter information about suppliers into the database without information about their product; when you delete information about a product, the remaining data remains (about suppliers, for example), information about the city appears only once, and this eliminates the problem associated with redundancy of information. That is, thanks to decomposition, we got rid of many of the problems that were present in the relation in 1NF. At the same time, the relations shown in Fig. 5 can be combined and then we will return to the relation shown in Fig. 3 - which means the decomposition was carried out without loss of data.

Thus, first stage procedures normalization relationship is Creation projections For exceptions " cited" functional dependencies.

Rice. 7. Relationships in 2NF

However, the structure of the relationships shown in Figure 7 may create some problems associated with a Supplier relationship in which non-key attributes are not mutually independent. The dependence of the Status attribute on the P№ attribute is functional and irreducible, but this dependence is also transitive through the City attribute - each P№ value determines the City value, and each City value determines the Status value. But if dependencies AB and BC are satisfied, then dependency AC is also satisfied. Transitive dependencies can again lead to update anomalies:

Insert - you cannot include data about a certain city and its status while there is no supplier in it.

Deletion - when a supplier is deleted, information about the status of the city is lost (obviously, the cause of this problem is joint information - the table contains information about both suppliers and the city).

Update - the status of cities is repeated several times. When changing the status of a city, you have to look through many lines to avoid getting inconsistent results, but the possibility of error remains.

The problem is solved by bringing the Supplier relation to third normal form through its decomposition:

This procedure eliminates transitive dependence and resolves all difficulties.

Attitude located V third normal form (3NF) Then And only Then, When it located V 2NF And every non-key attribute intransitive depends from primary key.

In other words: table located V third normal form (3NF), If she located V 2NF And neither one from her non-key fields Not depends functional from any another non-key fields.

Thus, second stage normalization is Creation projections For exceptions transitive dependencies.

During the normalization procedure, situations often arise when a relation can be subjected to decomposition in several ways. For example, the Supplier relationship (Fig. 7) with the functional dependencies P#City and CityStatus and, therefore, the transitive dependency P#Status. There are possible options for decomposing this relation into two projections located in 3NF:

A: (P№, City) and (City, Status) (as was proposed earlier) and B: (P№, City) and (P№, Status)

The third option of decomposition into projections (P№, Status) and (City, Status) cannot be applied, since it is carried out with a loss of information - several cities can have the same status, then information about the city where the supplier is located will be lost.

For some reasons, decomposition B is less desirable than decomposition A. For example, after performing decomposition B, it is not possible to insert information that a certain city has a certain status without specifying the supplier from that city.

In decomposition A, both projections are independent of each other in the sense that updates in each of the projections can be performed completely independently of each other. In decomposition B, updates to either of the two projections must be controlled so as not to break the original CityStatus dependency. That is, the projections of decomposition B are not independent of each other.

Concept independence projections provides a criterion for choosing one of several possible decompositions. The projections R1 and R2 of the relation R are independent in the sense mentioned above if and only if

Each functional dependence in relation to R is a logical consequence of the functional dependences in the projections R1 and R2;

The common attributes of the projections R1 and R2 form a candidate key for at least one of them.

In the example under consideration in decomposition A, the two projections are independent, since their common attribute City is a potential key for the second projection and each functional dependence of the original relationship is preserved in the projections. On the contrary, in decomposition B the two projections are not independent, because the CityStatus dependency cannot be derived from the functional dependencies of these projections, although their common attribute P№ is a potential key for both projections.

The idea of normalization with decomposition into independent projections was proposed by Rissanen and is called decomposition With preservation dependencies.

Normal Boyce-Codd form

So far, we have assumed for simplicity that each relationship has only one candidate key - the primary key. The above definition of 3NF is not entirely suitable if

- the relation has two or more candidate keys;

- two candidate keys are complex and they overlap (have at least one common attribute).

Therefore, the definition of 3NF was supplemented normal shape Boyce-Codd ( Boyce-Codd) - NFBC. It can be formulated like this:

Attitude located V normal form Boyce-Codd Then And only Then, When determinants are potential keys.

In other words, in a functional dependency diagram, the arrows should only start with candidate keys.

A combination of such conditions is not often found in practice, so for relationships without such conditions, 3NF and BCNF are equivalent.

Let's give one more definition: Table located V normal form Boyce-Codd (NFBC), Then And only Then, When any functional addiction between her fields comes down to To irreducible functional dependencies from potential key.

Consider an example involving two non-overlapping candidate keys:

Supplier (P№, Name_P, Status, City),

where the attributes P№ and Name_P are potential keys, and the attributes Status and City are completely independent. The diagram of functional dependencies is shown in Fig. 8. This relationship is found in the BCNF. Here, all determinants are potential keys, and all arrows begin with potential keys.

Here are examples of relationships in which potential keys overlap.

First example: Delivery relation (P№, Name_P, D№, Qty).

There is some redundancy in this regard that causes update anomalies. The potential keys here are (P№, D№) and (Name_P, D№), and P№ and Name_P mutually define each other. This relation is not in second normal form and can be divided into two projections (P№, Name_P) and (P№, D№, Quantity) to obtain irreducible functional dependencies. But the same decomposition can be proposed based on the fact that the relation is not in BCNF, because contains two determinants that are not potential keys (P№ and Name_P are determinants, since they define each other):

Supplier (P#, Name_P) and Supplies 1 (P#, D#, Qty).

Second example: Ratio SDP (S, D, P),

where attributes stand for Students, Disciplines and Teachers. The tuple of the relation SDP means that some student C is studying some discipline D from some teacher P. There are restrictions:

- Each student studies a given subject with one teacher;

- Each teacher teaches only one subject (but each subject can be taught by several teachers).

The first constraint implies the dependence (C, D)P, and the second – PD. Figure 9 shows an example of a table and diagram of the functional dependencies of such a relationship. In the example under consideration, there are two overlapping potential keys - (S, D) and (S, P). The relation is in 3NF (the transitive dependency present here is on the key attribute), but is not in BNFB and has some update anomalies. For example, if we delete the information that Oleg is studying physics, then we will lose the information that Petrov teaches physics. This problem is caused by the fact that P is a determinant but is not a candidate key. To solve this problem, the initial relationship must be divided into two projections: SP and PD.

Thus, the concept of NFBC allows us to get rid of some of the problems inherent in relationships in 3NF. The definition of BCNF is simpler than the definition of 3NF, because it does not use the concepts of normal forms, primary key, and transitive dependency. In addition, the concept of a candidate key can be replaced by the introduction of a more fundamental concept of functional dependence. But on the other hand, the concepts of primary key, transitive dependency, etc. are useful in practice because they introduce the idea of an incremental process performed by a developer to reduce an arbitrary relation to an equivalent set of relations in BCNF.

Normal forms justified by more complex dependencies

Rice. 10. Non-normalized DPU ratio

The following normal forms (4NF and 5NF) take into account not only functional, but also multivalued and connection dependencies between relation attributes. To get acquainted with them, consider the non-normalized ratio shown in Fig. 10. Each relation tuple contains the name of a discipline, a group of teacher names, and a set of textbooks. This means that each course can be taught by any teacher using any textbook. Let's transform this ratio into an equivalent normalized one. No functional dependencies have been defined for the data presented. Therefore, there is no formal basis for the decomposition of this relation and the normalized relation is depicted in Fig. eleven.


		Mechanics

		Mechanics

Mathematics		Geometry
Mathematics		Mat. analysis

Rice. 11. Normalized DPU ratio

Rice. 12. Projections (D,P) and (D,U) of the relation DPU

Obviously, the DPU relation is characterized by significant redundancy and leads to update anomalies, for example, when adding a new teacher, you need to enter a tuple for each textbook. However, attitude is entirely key and is therefore in the BCNF. The problems that arise are caused by the fact that teachers and textbooks are completely independent of each other. The problem of the normalized DPU ratio would not arise if all independent repeat groups were initially separated. In our case, it was possible to improve the situation by replacing the DPU ratio with projections (D, P) and (D, U) (Fig. 12). In this case, both projections are completely key and are located in the BCNF, and their connection gives the original table, that is, the decomposition is performed without loss. This decomposition cannot be done based on functional dependencies, which are not present in this example. It can be implemented on the basis of a multivalued relationship. Multivalued dependencies are a generalization of functional dependencies in the sense that every functional dependency is a multivalued one whose dependent part is a singleton set.

In relation to DPU, there are two multivalued dependencies: DP and DU.

The first of these multi-valued dependencies means that although for each discipline there is not one teacher corresponding only to this discipline, i.e. the functional dependence of the DP is not fulfilled; however, each discipline has a certain set of teachers, regardless of the name of the textbook.

The second multivalued dependence is interpreted similarly.

Let A, B, C are arbitrary subsets sets attributes relationship R. IN ambiguous depends from A (A IN) Then And only Then, When a bunch of values IN, appropriate given couple values (A, WITH) relationship R, depends only from A, But Not depends from WITH.

Obviously, the multivalued dependence AB is satisfied only when the multivalued dependence AC is satisfied. Multivalued dependencies always form related pairs: AB||C.

Returning to the problems of the DPU relation, we can say that they are associated with the existence of multivalued dependencies that are not functional (it is the presence of such dependencies that requires inserting two tuples when it is necessary to add data about another physics teacher). Projections (D, P) and (D, U) do not contain multivalued dependencies and are therefore more desirable. Before defining the fourth normal form, let's get acquainted with R. Fagin's theorem:

Let A, B, C be sets of attributes of the relation R(A, B, C). The relation R will be equal to the combination of its projections (A, B) and (A, C) if and only if the multivalued dependencies AB and AC are satisfied for the relation R.

Attitude R located V fourth normal form (4NF) Then And only Then, When V case existence polysemantic dependencies A B All rest attributes R functional depend from A.

In other words:

Attitude R located V 4NF, If it located V NFBC And All polysemantic dependencies relationship R actually are functional dependencies from potential keys.

The DPU relation is not in 4NF because it contains a multivalued dependency that is not a functional dependency. However, both projections (D, P) and (D, U) are in 4NF, which, compared to BCNF, allows you to create an improved structure.

Note that Rissanen's concept of independent projections, based on functional dependencies (the relation R(A,B,C), satisfying the functional dependencies A>B and B>C, should be divided into projections (A,B) and (B,C), and not (A,B) and (A,C)), is also applicable to the choice of decomposition path, if instead of functional dependencies there are multivalued dependencies A>>B and A>>C. In this case, decomposition into relations (A,B) and (A,C) should be carried out.

In all the normalization procedures considered up to this point, the decomposition of one relation into two was carried out. Sometimes this cannot be done, but decomposition into a larger number of relations is possible, each of which has better properties. Such a relation is called an n-decomposed relation, for which n>2.

Consider, for example, the relationship P-D-Pr (Suppliers-Parts-Projects) (Fig. 13). The same supplier may supply multiple types of parts for different projects. The primary key of this relationship is the complete set of its attributes; there are no functional and multi-valued dependencies (there is no multi-valued dependency, since for P1 the set of parts depends on the project). Therefore the relation is in 4NF. However, there may be anomalies in it (not always obvious), which can be eliminated by decomposition into three relations (decomposition into two relations is impossible, since the reverse operation does not allow returning to the original relation). Moreover, the degree of decomposition depends on the tuples. For example, if in the original relation one of the first three tuples is removed or the tuple (P2, D1, P2) is added, then it can be divided into two projections. If in the original relation the last tuple is removed or replaced with a tuple (P2, D1, Pr2), then it cannot be divided into two or three projections without violating the integrity of the data. The decomposibility of this relation can be a fundamental and time-independent property if an additional constraint is added.

The statement that PDPr is equal to the combination of three projections PD, DPr, PrP is equivalent to the following statement:

IFpair (P1, D1) belongs to the PD relation

Ipara (D1, Pr1) belongs to the relation Dpr

Ipara (Pr,1P1) belongs to the relation PrP,

TOTriple (P1, D1, Pr1) belongs to the relation PDPr.

This is obvious, since the triple P1, D1, Pr1 is located in the connection of the projections PD, Dpr, PrP. The converse statement is also always true.

On the other hand, it is true that the pair (P1, D1) is present in relation to PD if the triple (P1, D1, Pr2) is present in relation to PDP, the pair (P1, D1) is present in relation to PPr if (P1, D2, Pr1) is in the DPR, and the pair (D1, Pr1) is in relation to the DPR, if (P2, D1, Pr1) is in the DDP. Then, if we take into account our first statement, then the tuple (P1, D1, Pr1) must be present in such a relation! This means that in order to ensure the correctness of the DPR ratio at any time, it is necessary to introduce the following restriction:

If tuples (P1, D1 , Pp2), (P2, D1 , Pp1) And (P1, D2 , Pp1) belong attitude PDPr, That And motorcade (P1, D1 , Pp1) Also belongs this attitude.

If this statement is always true, that is, for all possible additional tuples of the PDPr relation, then a time-independent constraint on this relation will be obtained, which is called a 3D constraint. Since a 3D constraint is satisfied when a relation is equivalent to a connection of some of its projections, such a constraint is called a connection constraint.

You can pay attention to the fact that in the example we are considering there is some cyclicity in the data. The criterion for n-decomposition of a relation for n>2 is some cyclic constraint. What does cyclic constraint mean? In our example, let the last tuple mean that Smith is supplying wrenches for the Manhattan Project. The first three tuples carry the information that Smitt supplies wrenches, Smitt is a supplier to the Manhattan Project, and wrenches are used in the Manhattan Project. But these statements do not imply that it is Smith who supplies the keys for this project. If we decompose the relation PDPr, consisting of these three tuples, into three projections, then their connection will not be equal to the original one - an “extra” fourth tuple (P1, D1, Pr1) will appear, as mentioned above. To avoid such a discrepancy, an additional constraint is introduced, which can be easily implemented by decomposing the relationship. Such decomposition is possible without loss of information only if there is a connection dependency:

Attitude R (X,Y,. , Z) satisfies dependencies connections * (X,Y,. , Z) V volume And only V volume case, When R is being restored without losses by connections their projections on X, Y,. , Z.

Let's look at two examples of anomalies that exist in a relationship that is subject to a 3D constraint.

2. In the relation shown in Fig. 15, the tuple (P2, D1, Pr1) can be deleted without problems. But if you delete (P1, D1, Pr1), then you need to delete one of the remaining ones so that there is no cyclicity in the data.

Now Feigin’s theorem can be formulated as follows:

Attitude R (A, B, C) satisfies dependencies connections * (AB, AWITH) Then And only Then, When it satisfies ambiguous dependencies A IN And A WITH.

A join dependency is a generalization of the concept of a multivalued dependency. Moreover, it is the most common form of addiction.

Returning to the Suppliers-Parts-Projects relationship, you will find that it contains a join dependency PDPr * (PD, DPr, PrP), which is neither a functional nor a multi-valued dependency and is not implied by its only potential key - a combination of all attributes. It is recommended to decompose such a relationship into projections specified by the connection dependency. This decomposition process can be repeated until all resulting relationships are in fifth normal form (5NF).

Attitude R located V fifth normal form V volume And only V volume case, When any addiction connections V R should from existence some possible key V R.

A less strict definition of 5NF:

Table located V fifth normal form (5NF) Then And only Then, When V each her full decomposition All projections contain possible key. Table, Not having neither one full decomposition, Also located V 5NF.

Now we can say that after the 3-decomposition of the relations PDPr its projections PD, DPR and PPR are in 5 normal form, since for them there is no connection dependence at all.

Fourth normal form (4NF) is a special case of 5NF, where the complete decomposition must be a union of exactly two projections. It is not easy to find a real table that would be in 4NF, but not in 5NF.

For a given relation R, we can say that it is in 5NF, provided that all potential keys and all connection dependencies are known. However, there is no algorithm that allows you to determine all the dependencies of a connection. But such relationships are extremely rare in practice.

Fifth normal form is the last normal form that can be obtained by decomposition. Its conditions are quite non-trivial, but it is practically not used.

Normalization and design procedure

We looked at lossless decomposition technology used for database design. The basic idea of this technology is to systematically reduce the original relation, found in 1NF, to a set of smaller relations, which in some given sense is equivalent to the original relation, but is more preferable. Each stage of the reduction process consists of partitioning into projections the relations obtained at the previous stage. In this case, the specified restrictions are used at each step of the normalization procedure to select projections at the next stage. Normalization is the partitioning of a relation (table) into several relations that have better properties when updating, including, and deleting data. This process of successively replacing a table with its complete decompositions is carried out until all of them are in 5NF (in practice, they are usually limited to reducing the relation to Boyce-Codd normal form). In general, the following goals of the normalization process can be distinguished:

eliminating certain types of redundancy;

fixes some update, enable and delete anomalies;

designing a database layout that is a “good” representation of the real world, is intuitive, and provides a good basis for further development;

simplifying the process of imposing integrity constraints.

Let us list the basic rules that are used in the normalization procedure.

1. The unified relation must be reduced to 1NF.

2. Relations in 1NF should be divided into projections to exclude all functional dependencies that are not irreducible.

In other words, if a relation has a composite primary key of the form (K1, K2) and also includes a field F, which functionally depends on part of this key, for example, on K2, but not on the full key, then in this case it is recommended to form another relation containing K2 and F (primary key - K2), and remove F from the original relation:

As a result of this action, a set of relations in 2NF will be obtained.

3. Relations in 2NF should be divided into projections to exclude any transitive functional dependencies. In other words, if a relation has a candidate key K, a non-candidate key attribute F1 that is functionally dependent on K, and another non-key attribute F2 that is functionally dependent on F1, then it is recommended to remove the F2 attribute from the original relation and form another relation containing F1 and F2, with the primary key F1.

The result will be a set of relations in 3NF.

5. Relationships in BCNF should be broken down into projections to eliminate all multivalued dependencies that are not functional dependencies. The result will be a set of relations in 4NF (in practice, such multivalued dependencies are usually excluded when creating the original relations, separating independent repeating groups).

6. Relationships should be partitioned into projections to eliminate any join dependencies that are not implied by candidate keys, if they can be identified. This will result in a set of relations in 5NF (complete decomposition of relations).

When following the proposed rules, it is necessary to remember that partitioning into projections must be performed without data loss and while preserving functional and multi-valued dependencies.

The suggested normalization guidelines are merely guidelines, and there may be situations where normalization should not be completed from start to finish. There are several reasons for this assumption. First, normalization can help capture some integrity constraints in a simple form, but in addition to functional, multivalued, and join dependencies, other types of dependencies may exist in practice. Secondly, there are few criteria for choosing a preferred decomposition. Third, the process of normalization and the persistence of addiction are not always compatible. Fourth, not all redundancy can be eliminated in the normalization process.

The design of database systems begins with the construction of an informational data model, i.e. entity identification. The following design procedure steps must then be completed:

1. Represent each independent entity as a database table (base table) and define the primary key of this base table.

2. Represent each association (relationship between entities) as a base table. Use foreign keys in this table to identify the members of the association and specify the constraints associated with each of these foreign keys.

3. Represent entity properties as base tables with a foreign key identifying the corresponding entities. Specify constraints on the foreign keys of these tables and their primary keys.

4. In order to exclude unintentional violations of any normalization principles in the project, perform the normalization procedure.

5. If during the normalization process any tables were divided, then the information model of the database should be modified and the above steps should be repeated.

6. Indicate the integrity constraints of the designed database and give (if necessary) a brief description of the resulting tables and their fields.

To visually represent the structure of the designed system, the information modeling language “Table-relationship”, used in the most common relational databases, can be used. In it, all entities are represented as single-column tables with headings consisting of the entity name. The rows of the table are a list of entity attributes, and those that make up the primary key are highlighted. Relationships between entities are indicated by arrows directed from the primary keys or their components.

7. Database design example

Purpose and subject area

The database is designed to store information about the personnel of a certain company. The company has several departments. Each department has several employees, several projects and several offices. Each employee has several tasks. For each task, there is a statement with a list of amounts of money received by the employee for performing this work. Each office has several telephones.

The following information should be stored in the database:

For each department: a unique department number, budget and a unique number of the department head;

for each employee: a unique employee number, current project number, office number, telephone number, as well as the name of the work being performed, along with the dates and amounts of all payments received for performing this work;

for each project: unique project number and budget;

for each office: unique office number, area, all telephone numbers.

Semantic statements (restrictions): No employee is simultaneously the head of several departments; no employee works in more than one department at the same time; no employee works simultaneously on more than one project; no employee has more than one office at a time; no employee has more than one telephone at a time; no employee has more than one task at a time; no project is given to more than one department at a time; no office belongs to more than one department at the same time.

Database design

Analysis of the objects and attributes defined above allows us to identify the entities of the designed database and build its information model in the form of a “Link Table” (Fig. 16).

Rice. 16. Information about the company that should be stored in the database

The original hierarchical structure can be viewed as a non-normalized relation:

DEPARTMENTS (DEPARTMENT NO., BUDGET_O, MANAGEMENT NO., EMPLOYEES, PROJECTS, OFFICES) CANDIDATE KEY (DEPARTMENT NO.) CANDIDATE KEY (MANAGEMENT NO.)

Here, the meaning of the attributes DTD# (unique department number), BUDGET_O, RUK# (manager number) is clear from the names, and the attributes EMPLOYEES, PROJECTS, OFFICES consist of relationship values. We can describe their nested attributes:

DEPARTMENTS (DEPARTMENT No., BUDGET, MANAGEMENT No., EMPLOYEES (STEAM No., PROJECT No., OFFICE No., TEL No., WORK (TOPIC, PAYMENT (DATE, AMOUNT))), PROJECTS (PROJECT No., BUDGET_P), OFFICES (OFFICE NO., AREA, TELEPHONE (TEL#))) CANDIDATE KEY (DEPARTMENT#) CANDIDATE KEY (RUNK#)

Now we can reduce this relation to a set of relations in 1NF. At the same time, considering each value-relationship separately, we exclude all multivalued dependencies that are not functional dependencies.

DEPARTMENTS1 (DEPARTMENT#, BUDGET_O, MANAGEMENT#) PRIMARY KEY (DEPARTMENT#) ALTERNATE KEY (MANAGEMENT#)

EMPLOYEE1 (COTR No., DEPARTMENT No., PROJECT No., CABIN No., TEL No.) PRIMARY KEY (EMPLOYEE NO.)

WORK1 (SUBJECT, COPYER NO.) PRIMARY KEY (SUBJECT, COPYRATE NO.)

PAYMENT1 (CORPORATION NO., SUBJECT, DATE, AMOUNT) PRIMARY KEY (CORPORATION NO., SUBJECT, DATE)

PROJECTS1 (PROJECT#, BUDGET_P, DEPARTMENT#) PRIMARY KEY (PROJECT#)

OFFICES1 (ROOM NO., AREA, DEPARTMENT No.) PRIMARY KEY (ROOM NO.)

TELEPHONES1 (TEL#, room#) PRIMARY KEY (TEL#)

The relationships DEPARTMENTS1, EMPLOYEE1, PAYMENT1, PROJECTS1, OFFICES1 and PHONES1 are already in 2NF.

The JOB1 relation is a projection of the PAYMENT1 relation, therefore it carries redundant information and can be deleted without data loss. At the same time, the PHONES1 relation is a projection of the EMPLOYEES1 relation, but if it is deleted, update anomalies will appear - data about phones will not exist without data about specific employees.

Let us now show the structure of a database, the relationships of which are reduced to 2NF, using the “Table-Relationship” modeling language, which is used in the MS ACCESS DBMS:

Further, excluding transitive dependencies, we can reduce the relations to an equivalent set of relations in 3NF. The only relation that is not in 3NF is the relation COTRUDN, in which the attributes KABNo. and DTDNo. transitively depend on the primary key COTRNO. - KABNo. through TELNo., and DTDNo. through PROJECTNo. and, in addition, through KABNo. and TEL NO. Then the relation COTRUDN can be replaced by a set of projections located in 3NF:

X (TEL#, CAB#) PRIMARY KEY (TEL#)

Y (PROJECT No., DEPARTMENT No.) PRIMARY KEY (PROJECT No.)

Z (KAB#, OTD#) PRIMARY KEY (KAB#)

But the X relation is an analogue of the PHONE2 relation, Y is a projection of the PROJECT2 relation, Z is a projection of OFFICE2 and, therefore, can be removed from the database model. Therefore, a database model whose relations are reduced to 3NF will look like this:

DEPARTMENTS3 (DEPARTMENT#, BUDGET_O, MANAGEMENT#) PRIMARY KEY (DEPARTMENT#) ALTERNATE KEY (MANAGEMENT#)

EMPLOYEE3 (COTRNO., PROJECT NO., TEL NO.) PRIMARY KEY (COTRNO.)

PAYMENT3 (CORPORATION NO., SUBJECT, DATE, AMOUNT) PRIMARY KEY (CORPORATION NO., SUBJECT, DATE)

PROJECTS3 (PROJECT#, BUDGET_P, DEPARTMENT#) PRIMARY KEY (PROJECT#)

OFFICES3 (ROOM NO., AREA, DEPARTMENT No.) PRIMARY KEY (ROOM NO.)

TELEPHONES3 (TEL#, room#) PRIMARY KEY (TEL#)

Each of these relationships is in the BCNF. Moreover, they are in 4NF - we got rid of possible multi-valued dependencies at the stage of bringing the model to 1NF. All relationships contain no visible anomalies and therefore it can be assumed that the database is constructed correctly.

Literature

1. Beck, Kent Enterprise Application Implementation Patterns; M.: Williams, 2008. - 369 p.

2. Weimayer, R.; Sawtel, R. Master Microsoft SQL Server 2000 on your own in 21 days (+ CD-ROM); M.: Williams, 2013. - 549 p.

3. Ganderloy, Mike; Harkins, Susan Sales Automating Microsoft Access with VBA; M.: Williams, 2013. - 416 p.

4. Goetz, Ken; Ginbert, Michael; Litvin, Paul Access 2000. Developer's Guide. Volume 1. Desktop Applications. volume 1; Kyiv: BHV, 2008. - 576 p.

5. Golitsyna, O.L. and other databases; Forum; Infra-M, 2013. - 399 p.

6. Grinchenko, N.N. etc. Database design. Microsoft Access DBMS; Hot Line Telecom, 2012. - 613 p.

7. Date, K. J. Introduction to Database Systems; K.: Dialectics; 6th edition, 2012. - 360 p.

8. Davidson, Louis database design on SQL Server 2000; Beanom, 2009. - 631 p.

9. Duval, Paul M. Continuous integration. Improving software quality and reducing risk; M.: Williams, 2008. - 497 p.

10. Karatygin, S.; Tikhonov, A. Working in Paradox for Windows 5.0 with examples; M.: Binom, 2011. - 512 p.

11. Karatygin, Sergey Access 2000 with examples. User's Guide with examples; M.: Laboratory of Basic Knowledge, 2012. - 376 p.

12. Kaufeld, John Microsoft Office Access 2003 for Dummies; M.: Dialectics, 2013. - 439 p.

13. Couchman, Jason; Schwinn, Ulrike Oracle 8i CertifiedProfessionaql DBA Database administrator training; LORI, 2009. - 510 p.

Information Systems

Humanity today is experiencing an information explosion. The volume of information reaching a person through all information media is constantly growing. Therefore, for every person living in an information society, it is very important to master the means of optimally solving the problem of accumulating, organizing and rationally using information.

Human information processing capabilities have increased dramatically with the use of computers. In the use of computers to solve information service problems, two periods can be distinguished:

 the initial period, when a small circle of people - system programmers - were engaged in solving problems of information processing and organizing data. This period is characterized by the fact that software tools were created to solve a specific data processing problem. At the same time, to solve another problem in which the same data was used, it was necessary to create new programs;

 period of systemic use of computers. To solve a set of problems on a computer, software tools are created that operate on the same data and use a single information model of the object. These tools do not depend on the nature of the object or its model; they can be used for information services for various tasks. Humanity has come to organize information in information systems.

Information systems (IS) are large amounts of data together with software and hardware for processing them. The following types of information systems are distinguished: factual, documentary and expert systems.

Factual IP - this is an array of facts - specific data values about objects of the real world.

Information in the factual IS is stored in a clearly structured form, so it is capable of giving unambiguous answers to the questions posed, for example: “Who is the winner of the Russian Gymnastics Championship in 1999?”, “Who owns the AUDI 80 car with registration number PA899P77?”, “What is the telephone number in the accounting department of Moscow State University?”, “Who became the President of Russia in the elections in March 2002?” etc. Factual information systems are used in literally all spheres of human activity - in science, material production, transport, medicine, government and public life, trade, criminology, art, sports.

Documentary information systems serve a fundamentally different class of tasks that do not require a clear answer to the question posed. The database of such systems is formed by a set of unstructured text documents (articles, books, abstracts, texts of laws) and graphic objects, equipped with one or another formalized search apparatus. The purpose of the system, as a rule, is to provide, in response to a user request, a list of documents or objects that to some extent satisfy the conditions formulated in the request. For example: display a list of all articles in which the word “Pushkin” appears. The fundamental feature of the documentary system is its ability, on the one hand, to produce documents unnecessary to the user (for example, where the word “Pushkin” is used in a different sense than intended), and on the other hand, not to produce the necessary ones (for example, if the author used some synonym or misspelled). The documentary system must be able to determine the meaning of a particular term based on the context, for example, distinguish between “daisy” (plant), “daisy” (type of printer print head).

Expert systems (ES) - intelligent systems designed to play the role of an “adviser” are built on the basis of the formalized experience and knowledge of an expert. The core of the ES are knowledge bases that contain the knowledge of experts (specialists) in a certain field, on the basis of which the ES allows one to model the reasoning of specialists from a given subject area.

The specified classification and assignment of information systems to one type or another are outdated, since modern factual systems often work with unstructured blocks of information (texts, graphics, sound, video) equipped with structured descriptors.

The simplicity and efficiency of databases based on the relational model continues to determine their dominant position in software applications. Currently, the use of relational databases in object-oriented software systems is considered the norm. This appears to be a long-term and sustainable trend.

A relational database consists of many two-dimensional tables. Tables store various data. For example, the database may contain tables of customers, goods, accounts, etc. The typical structure of a relational database table is shown in Fig. 1.2.

Rice. 1.2. Typical structure of a relational database table

The table rows are called records or in tuples. The columns are called attributes. At the intersection of a row and a column is the indivisible (atomic) value of the data item. The set of valid values for an attribute (column) is determined by its domain. The domain can be very small. So, the attribute values Size The table of sports suits are L, XL and XXL. Conversely, the attribute domain Surname very big. In a database, a domain is implemented using a domain constraint. Every time a value is written to the database, its compliance with the domain recorded for a given attribute is checked. Thus, the database is protected from entering invalid values, for example, the date May 32.

The virtual analogue of the table is performance, which behaves like a regular table from the client's point of view, but does not exist on its own. A regular table contains data. The view does not contain any data, but only specifies its sources (one or more regular tables, selectable rows, selectable columns). In fact, the view is stored in the database as a request to create a specific set of data. The result of this query is the content of the view. When the data in the source tables changes, the content of the view also changes.

A key is used to identify an individual record in a table. Primary key(PrimaryKey, RK) each table has. This is a column that uniquely identifies each record in the table. In our example, as RK could be a column Surname. This is correct until, for example, another Bender appears. To ensure the uniqueness of the primary key value, two techniques are used. Firstly, it can be used composite primary key (CompositePrimaryKey), formed by several columns (natural attributes) of a table. Secondly, as RK you can enter an additional column into the table that does not make sense from the point of view of the subject area. He is called surrogate key. For example, a surrogate key could be Customer number or Numberorder.

Another key plays an important role in relational databases - External key. External key (ForeignKey,FK) is a column in one table that references the primary key of another table. Using foreign keys, connections are established between different database tables (an example is shown in Fig. 1.3) - accelerating access to the table using an index.

Rice. 1.3. External key

This example shows that the invoice and customer tables are linked by the key Customer number. If we look at the table of accounts, then Account number will be the primary key and Customer number - foreign key.

To ensure the integrity of database data, foreign keys must satisfy the constraint referential integrity. It means that every foreign key value in one table must have a corresponding value in an existing primary key in another table. This is the most important of all constraints because it ensures consistency of cross-references between tables. If the values are correct foreign key do not check, the referential integrity of the database data may be violated. For example, deleting a row from the customers table can lead to the fact that in the orders table there will remain records of orders made by a now unknown customer (and who will pay for the order?). Referential integrity constraints must be automatically maintained. Every time database data is entered or changed, controls check the constraints and ensure that they are met. If restrictions are violated, data modification is prohibited.

In addition, the table may contain secondary keys - indexes. They are used as a subject index in the book. To find a specific term in a book, you don’t need to flip through all the pages in a row - just look in the index and find the desired page number. For example, you can create an index on a column Surname(Fig. 1.4).

Rice. 1.4. Speed up table access using an index

As a result, a small table will be formed that stores only last names and links to the record position in the main table. Now you don't have to look through the entire large table to search for records. As a result, we get a gain in performance. However, when adding and deleting records (in the main table), the index table must be created anew. This slows down operations.

Operational processing of data in relational databases is performed stored processesfools. A type of stored procedure is triggers. A trigger is always associated with a specific table and is called automatically when a specific event occurs (for example, inserting, deleting, or updating a record).

Let's discuss the relationships between tables. After the tables are formed, they decide how to combine their data when extracting from the database. The first step is to define the relationships between the tables. After this, it is possible to create queries, forms and reports that display data from several tables at once. For example, to print an invoice, you need to take data from different tables and combine them. A table relationship establishes relationships between matching values in key fields. Let's look at the types of relationships.

One-to-one relationship. In this case, each row (record) of one table is associated with a row of another table (Fig. 1.5).

Rice. 1.5. One-to-one relationship

An example would be the relationship between a table of employees and a table of their addresses. This relationship is rare because the relevant data can easily be placed in one table.

One-to-many relationship. One record of the first table is associated with several records in the second table (Fig. 1.6).

Rice. 1.6. One-to-many relationship

Each record in the second table cannot have more than one corresponding record in the first table.

Many-to-many relationship. One record of the first table can correspond to several records in the second table, and one record of the second table can correspond to several records of the first (Fig. 1.7).

Rice. 1.7. Many-to-many relationship

Typically, to organize such relationships, an auxiliary table is required, which consists of the primary keys of two main tables. This relationship occurs between orders and products. One order may include several product names, and one product name may be included in several orders. Thus, there must be an order table, a product table, and a table with order-product pairs.

Let us now briefly consider the normalization of relational databases. Normalization ensures optimization of the database structure. It leads to the elimination of redundancy in data sets. Database normalization is performed sequentially, step by step. Normalization rules are presented in the form of normal forms.

First normal form (1NF) requires that the values of all data elements in columns be atomic. Second normal form (2NF) requires that every non-key column be completely dependent on the primary key. Third normal form (3NF) requires that all non-key columns (attributes) be mutually independent and fully dependent on the primary key. A dependency exists if, for example, the values of one column are calculated from data in other columns. The result of normalization is an optimal database structure in which there is the necessary duplication of data, but there is no redundancy.

REVIEW LECTURE NOTES

For specialty students
T1002 “Information technology software”

(L.V. Rudikova, Ph.D., Associate Professor)

Question 31. DBMS ARCHITECTURE. RELATIONAL DATA MODEL

1. The concept of a database.

2. Three-tier database architecture.

3. Database life cycle.

4. DBMS architecture.

5. Relational data model.

6. Design of relational databases.

7. Normal forms of relationships.

8. Relational algebra.

1. The concept of a database.

A database system is any computer-based information system in which data can be shared among many applications.

Information system – an automatic system that organizes data and provides information.

Information and management system – a system that provides information support to management.

Data – scattered facts.

Information – organized and processed data.

Under database refers to a set of interconnected elementary groups of data (information) that can be processed by one or more application systems. Database system consists of a database; general purpose software called database management system (DBMS) , and serves to manage the database; appropriate equipment and people.

Each DBMS must meet the following requirements:

· provide the user with the ability to create new databases and define them schema (logical data structure) using a special language - data definition language; support multiple views of the same data;

· let " request» data and change them using query language, or data manipulation language; allow integration and sharing of data across applications;

· support the storage of very large amounts of data, measured in gigabytes or more, for a long time, protecting them from accidental damage and unauthorized use, and also provide database modification and access to data through queries, i.e. guarantee the security and integrity of data;

· control access to data simultaneously for many users; exclude the influence of one user’s request on another’s request and prevent simultaneous access, which could corrupt the data, i.e. ensure concurrency control of data access.

The database system consists of the following components:

· Users, i.e. people who use data.

· Applications, i.e. user programs that require data from the system.

· DBMS is software that manages access to data and provides the specified functionality of a database system.

· Data, i.e. strings stored in files.

· Host system is the computer system on which files are stored. Data rows are accessed by the host system. The role of the DBMS is to generate queries that enable the host system's file management functionality to be used to serve various applications. A DBMS is an additional layer of software built on top of the host system software.

Thus, a system with a database can be represented as the following sequence of levels:

At the lowest level there is data stored in physical files (physical database memory). At the top level - applications with their own representations of the same physical data. Each database view is a specific logical structure built from the underlying physical data. To provide an interface between the physical memory of the database and its various logical versions (multiple supported views), the DBMS, in turn, must consist of several layers.

2. Three-level database architecture.

The distinction between logical and physical representation of data was formally recognized in 1978 when the committee ANSI/SPARC proposed a generalized structure of database systems. This structure is called three-tier architecture. The three levels of architecture are: internal, conceptual and external.

Internal level – this is the level that determines the physical appearance of the database, closest to physical storage and is associated with methods of storing information on physical storage devices. Associated with this layer are disk drives, physical addresses, indexes, pointers, etc. This layer is the responsibility of physical database designers who decide which physical devices will store data, what access methods will be used to retrieve and update data, and what measures should be taken to maintain or improve the performance of the database management system. Users do not touch this level.

Conceptual level – structural level that defines the logical design of the database. At this level, the conceptual design of the database is performed, which includes analyzing the information needs of users and identifying the data elements they need. The result of conceptual design is a conceptual diagram, a logical description of all data elements and the relationships between them.

External level – the structural level of the database, which defines user views of data. Each user group receives its own view of the data in the database. Each such data view provides a user-centered description of the data elements that make up the data view and the relationships between them. It can be directly derived from the conceptual framework. The collection of such user data views provides the external level.

User and Application Views	External level

Displays

Conceptual diagram	Conceptual level

Display	Internal level

Host system

Stored data

Rice. DBMS levels

3. Database life cycle.

The process of designing, implementing and maintaining a database system is called life cycle of the database (LDC). The procedure for creating a system is called system life cycle (SLC).

Understanding and the correct approach to LCBD is very important and requires detailed consideration, since it is based on the approach data-centric. Data elements are more stable than the system functions performed. Creating the right data structure requires complex analysis of classes of data units and the relationships between them. If you build a logical database schema, then in the future you can create any number of functional systems that use this schema. The function-oriented approach can only be used to create temporary systems that are designed for a short period of operation.

LCBD consists of the following stages:

1. Pre-planning – database planning, carried out in the process of developing a strategic database plan. During the planning process, the following information is collected:

· what application programs are used and what functions they perform;

· what files are associated with each of these applications;

· what new applications and files are in the works.

This information helps determine how application information is used and determine future requirements for the database system.

The information from this stage is documented in the form of a generalized data model.

2. Feasibility check . Here the technological, operational and economic feasibility of the database creation plan is determined, i.e.:

· technological feasibility – is the technology available to implement the planned database?

· operational feasibility – are there the funds and experts needed to successfully implement the database plan?

· economic feasibility – can conclusions be determined? Will the planned system pay for itself? Is it possible to estimate costs and benefits?

3. Defining Requirements includes the selection of database goals, clarification of information requirements for the system and requirements for hardware and software. Thus, at this stage of data collection and requirements definition, a general information model, expressed in the following tasks:

· The goals of the system are determined by analyzing information needs. It also necessarily indicates what kind of database should be created (distributed, holistic) and what communication tools are needed. The output document is a comment describing the goals of the system.

· Determination of user requirements: documentation in the form of generalized information (comments, reports, surveys, questionnaires, etc.); fixing system functions and identifying application systems that will fulfill these requirements. The data is presented in the form of relevant documents.

· Determination of general hardware and software requirements related to maintaining the desired level of performance. (Find out the number of system users, the number of input messages per day, the number of printouts). This information is used to select types of computers and DBMS, disk capacity, and number of printers. The data from this stage is presented in a report containing sample hardware and software configurations.

· Developing a plan for the phased creation of the system, including the selection of initial applications.

4. Conceptual design – creation of a conceptual database diagram. Specifications are developed to the extent necessary to move to implementation.

The main output document is a single information model(or database schema at the conceptual level). When developing this model, information and functions that the system must perform, determined at the stage of collecting and determining system requirements, are used. At this stage, it is also desirable to define: 1) rules for the data; 2) rules for processes; 3) rules for the interface.

5. Implementation – the process of turning a conceptual model into a functional database. It includes the following steps.

1) Selecting and purchasing the necessary DBMS.

2) Converting the conceptual (infological) database model into a logical and physical data model:

· Based on the infological data model, a data schema is built for a specific DBMS; if necessary, the database is denormalized in order to speed up query processing in all time-critical applications;

· it is determined which application processes need to be implemented in the data schema as stored procedures;

· implement constraints designed to ensure data integrity and enforce data rules;

· design and generate triggers to implement all centrally defined data rules and data integrity rules that cannot be specified as constraints;

· develop an indexing and clustering strategy; estimate the sizes of all tables, clusters and indexes;

· determine user access levels, develop and implement security and audit rules. Create roles and synonyms to provide multi-user access with consistent levels of access permissions.

· develop a network topology of the database and a mechanism for seamless access to remote data (replicated or distributed database).

3) Construction of a data dictionary that defines the storage of database data structure definitions. The data dictionary also contains information about access permissions, data protection rules and data control.

4) Filling the database.

5) Creation of application programs, management control.

6) User training.

6. Evaluation and improvement of the database schema. Involves surveying users to identify functional unmet needs. Changes are made as necessary, adding new programs and data elements as needs change and expand.

Thus, the LCBD includes:

· Study the subject area and provide relevant documentation (1-3).

· Construction of an information model (4).

· Implementation (5).

· Performance evaluation and database support (6).

4. DBMS architecture.

Rice. Main components of the DBMS

Data, metadata - contain not only data, but also information about the structure of the data ( metadata). In a relational DBMS, metadata includes system tables (relationships), names of relationships, names of attributes of those relationships, and data types of those attributes.

Often the DBMS supports indices data. Index is a data structure that helps quickly find data elements given part of their value (for example, an index that finds tuples of a particular relation that have a given value of one of the attributes). Indexes are part of the stored data, and descriptions indicating which attributes the indexes have are part of the metadata.

Memory Manager -receives the required information from the data storage location and changes the information in it at the request of higher levels of the system.

In simple database systems, the memory manager can be the operating system's file system. However, to improve efficiency, the DBMS usually performs direct memory control. The memory manager consists of two components:

· File manager monitors the location of files on the disk and obtains the block or blocks containing the files when requested by the buffer manager (the disk is generally divided into disk blocks- adjacent memory areas containing from 4000 to 16000 bytes).

· Buffer manager manages main memory. It receives blocks of data from disk through a file manager and selects a main memory page to store a particular block. It can temporarily store a disk block in main memory, but returns it to disk when a main memory page is needed for another block. Pages are also returned to disk when requested by the transaction manager.

"Request" processor - processes requests and requests changes to data or metadata. It suggests the best way to perform the required operation and issues appropriate commands to the memory manager.

The query processor (manager) turns a query or database action that can be executed at a very high level (for example, as a query SQL ), into a sequence of requests for stored data such as individual tuples of a relation or parts of an index on a relation. Often the most difficult part of processing request is his organization, i.e. choosing good query plan or a sequence of requests to the memory system responding to the request.

Transaction Manager - is responsible for the integrity of the system and must ensure the simultaneous processing of many requests, the absence of interference of requests (addition, min, max ) and data protection in case of system failure. It interacts with the query manager because it needs to know what data is affected by current queries (to avoid conflicts) and can defer some queries and operations to avoid conflicts. The transaction manager also interacts with the memory manager because data protection schemes typically involve storing a data change log. If the operation is performed correctly, the file registration will contain a record of changes, so you can re-execute even those changes that did not reach the disk due to a system failure.

Typical DBMSs allow the user to group multiple queries and/or changes into a single transaction. Transaction is a group of operations that must be performed sequentially as one whole.

Typically, a database system supports multiple transactions simultaneously. It is the correct execution of all such transactions that ensures transaction manager. Correct execution of transactions is ensuredACID -properties (atomicity, consistency, isolation, durability):

· atomicity- execution of either all transactions or none of them (for example, withdrawing money from an ATM and making a corresponding debit to the client’s account must be a single atomic transaction; each of these operations is not allowed to be performed separately);

· consistency - a state in which the data meets all possible expectations (for example, the consistency condition for an airline database is that no seat on the plane is reserved for two passengers);

· insulation- when two or more transactions are executed in parallel, their results must be isolated from each other. The simultaneous execution of two transactions at the same time should not lead to a result that would not have occurred if they were executed sequentially (for example, when selling tickets for the same flight in the event of a free last seat when two agents request simultaneously, the request of one must be fulfilled, the other - No);

· longevity - after the transaction is completed, the result should not be lost in the event of a system failure, even if this failure occurs immediately after the completion of the transaction.

Let's also consider 3 types of access to the DBMS:

1. Requests - Questions about data can be generated in two ways:

a)by using common query interface(for example, a relational DBMS allows queries SQL , which are transmitted to the request processor, and also receives responses to them);

b) with the help application program interfaces- requests are transmitted through a special interface (arbitrary requests cannot be transmitted through this interface);

2. Modifications - These are operations to change data. They can also be executed either through a common interface or through an application program interface;

3. Circuit modifications - These are database administrator teams that have the right to change the database schema or create a new database.

Client/server architecture. Many versions of modern software implement the architecture client/server: One process (the client) sends a request to another process (the server) to execute. Typically, a database is often split into a server process and several client processes.

In the simplest client/server architecture, the entire DBMS is a server, except for the query interfaces, which interact with the user and send queries or other commands to the server. For example, a relational DBMS often uses the language SQL to represent requests from client to server. The database server then provides the client with a response in the form of a table (relationship). There is a tendency to increase the load on the client, since if there are many simultaneously working database users, problems may arise with the server.

5. Relational data model.

The RMD of a certain subject area is a set of relationships that change over time. When creating an information system, a set of relationships allows you to store data about objects of the subject area and model the connections between them.

Attitude is a two-dimensional table containing some data. Mathematically underN -ary relation R understand the Cartesian product set D 1 D 2 … D n sets ( domains) D 1, D 2, …, D n (), optionally different:

R D 1 D 2 … D n ,

where D 1 D 2 … D n – complete Cartesian product, i.e. a set of all possible combinations of n elements each, where each element is taken from its own domain.

Domain is a semantic concept. A domain can be thought of as a subset of values of some data type that have a specific meaning. The domain is characterized by the following properties:

· The domain has unique name(within the database).

· The domain is defined at some simple data type or on a different domain.

· A domain may have some logical condition, which allows you to describe the subset of data that is valid for a given domain.

· The domain carries a certain semantic load.

Relationship attribute there are a couple of the kind<Имя_атрибута: Имя_домена>. Attribute names must be unique within the relationship. Often the attribute names of a relationship are the same as the names of the corresponding domains.

Ratio , defined on multiple domains, contains two parts: a header and a body.

Relationship header is a fixed number of relation attributes:

The relation head describes the Cartesian product of domains on which the relation is defined. The header is static; it does not change while working with the database. If attributes are changed, added or deleted in a relation, then the result will be other relationship (even with the same name).

Body relationship contains many tuples relationship. Every tuple relation represents a set of pairs of the form<Имя_атрибута: Значение_атрибута>:

such that the attribute value belongs to the domain . The body of the relation is a set of tuples, i.e. a subset of the Cartesian product of domains. Thus, the body of a relation is actually a relation in the mathematical sense of the word. The body of the relationship can change while working with the database - tuples can be changed, added and deleted.

The relationship is usually written as:

or shorter

or simply

The number of attributes in a relation is called degree (or -arity ) relationship. The cardinality of a set of tuples of a relation is called power relationship.

Relationship diagram is a list of attribute names of a given relationship indicating the domain to which they belong:

If attributes take values from the same domain, then they are called -comparable, where is the set of valid comparison operations specified for a given domain. For example, if a domain contains numeric data, then all comparison operations are valid for it, then . However, for domains containing character data, not only comparison operations for equality and inequality of values can be specified. If a given domain has a lexicographic ordering, then it also has a full range of comparison operations.

Schemes of two relations are called equivalent , if they have the same degree and it is possible to order the attribute names in the schemas in such a way that comparable attributes, that is, attributes that take values from the same domain, will be in the same places:

Let – relation diagram. – schema of the relationship after ordering the attribute names. Then

Thus, for equivalent relations the following conditions are satisfied:

· The tables have the same number of columns.

· The tables contain columns with the same names.

· Columns with the same names contain data from the same domains.

· The tables have the same rows, but the order of the columns may vary.

All such tables are different Images the same relationship.

Properties of relationships. The properties of relations follow directly from the above definition of relation. These properties are the main differences between relationships and tables.

· There are no identical tuples in a relation .

· Tuples are not ordered (top to bottom) .

· Attributes are not ordered (from left to right) .

· All attribute values are atomic .

Rice. Schematic representation of the relationship

Relational model is a database in the form of a set of interconnected relationships. In each connection, one relationship can act as the main one, and another relationship acts as a subordinate one. Thus, one tuple of a main relation can be associated with several tuples of a subordinate relation. To support these relationships, both relationships must contain the sets of attributes by which they are related. Basically this is primary key of the relationship , which uniquely defines the tuple of the main relation. To model a relationship, a subrelation must have a set of attributes that matches the primary key of the main relationship. However, here this set of attributes is already secondary key or foreign key , i.e. it defines a set of relation tuples that are associated with a single tuple of the main relation.

6. Design of relational databases.

When designing a relational database, the following problems must be solved:

1) Taking into account the semantics of the subject area, it is necessary to best represent the objects of the subject area in the form of an abstract data model (data design). Those. - decide on the database schema: what relationships the database should consist of, what attributes these relationships should have, what are the connections between the relationships.

2) Ensure the efficiency of executing database queries (physical database design).

After the datalogical design stage, the following resulting documents should be obtained:

· Building a correct data schema based on the relational data model.

· Description of the database schema in terms of the selected DBMS.

· Description of external models in terms of the selected DBMS.

· Description of declarative rules for maintaining database integrity.

· Development of procedures for maintaining the semantic integrity of the database.

So, the task of designing a relational database is to select a database schema from many alternative options.

Correct is a database schema in which there are no unwanted dependencies between relation attributes. The process of developing a correct database schema is called logical design .

Designing a database schema can be done in two ways:

· Decomposition (partition) method – the original set of relations included in the database schema is replaced by another set of relations that are projections of the original relations! At the same time, the number of relationships increases.

· Synthesis method – layout of a database schema from given initial elementary dependencies between objects of the subject area.

Classical database design is associated with theory normalization , which is based on the analysis of functional dependencies between relationship attributes. Functional dependencies define stable relationships between objects and their properties in the subject area under consideration.

The decomposition method is a process of sequential normalization of relation schemes: each new iteration corresponds to a higher order normal form and has better properties compared to the previous one. Thus, the existence of a universal relation containing all the attributes of the database is initially assumed, then, based on the analysis of connections between attributes, a decomposition of the universal relation is carried out (or an attempt is made), i.e. transition to several relations of lower dimension, and the original relation must be restored using a natural join operation.

So, each normal form corresponds to a certain set of constraints, and a relation is in a certain normal form if it satisfies its inherent set of constraints.

In the theory of relational databases, the following normal forms are usually distinguished:

first normal form (1 NF);

· second normal form (2 NF);

· third normal form (3 NF);

· Bays-Codd normal form ( BCNF);

· fourth normal form (4 NF);

· fifth normal form or projection form - compounds (5 NF or PYNF).

Basic properties of normal forms:

· each successive normal form is in some sense better than the previous one;

· when moving to the next normal form, the properties of the previous normal properties are preserved.

Database schemas are called equivalent, if the contents of the source database can be obtained by a natural connection of the relations included in the resulting schema, and no new tuples appear in the source database.

7. Normal forms of relationships.

The normalization process is based on an adequate reflection of the subject area in the form of tables containing data about the modeled object, and the ability to change the state of the database over time. As a rule, due to a mismatch between the domain data model, anomalies may occur that appear when performing the corresponding operations:

· Insertion Anomalies (INSERT) – storage of heterogeneous information in one respect.

· Update anomalies (UPDATE) –Redundancy of relationship data due to heterogeneous storage.

· Deletion anomalies (DELETE) – storage of heterogeneous information in one relation.

It is also necessary to take into account the emerging undefined ( NULL) values. In different DBMSs, when performing various operations (comparing, merging, sorting, grouping, etc.) two NULL -values may or may not be equal to each other, have different effects on the result of performing operations to determine average values and find the number of values. To eliminate errors in many DBMSs it is possible to replace NULL -values are zero when performing calculations, declaring all NULL -values equal to each other, etc.

Normalization – splitting a table into several, which have better properties when updating, inserting and deleting data. Those. normalization is the process of sequentially replacing a table with its complete decompositions until they are all in 5NF; however, in practice it is sufficient to convert the tables to BCNF.

The normalization procedure is based on the fact that the only functional dependencies in any table should be dependencies of the form , where is the primary key and is some other field. Therefore, during the normalization process, you should get rid of all “other” functional dependencies, i.e. from those that have a different appearance than .

If we replace the codes of primary (foreign) keys during normalization, then we should consider 2 cases:

1. The table has a composite primary key, for example, and a field that functionally depends on part of this key, for example, on (it does not depend on the full key). It is recommended to create another table containing and ( – primary key) and delete from the original table:

Replace, primary key, federal law

on , primary key

and , primary key .

2. The table has a primary (possible) key, a field that is not a possible key, but functionally depends on, and another non-key field that functionally depends on:. It is recommended to create a table containing both ( - primary key) and - delete from the original table: It should be noted that to carry out such operations, one should initially have some “large” (universal) relations as input data.

Def.1. The relationship is in first normal form (1NF) if and only if none of its rows contains a single value in any of its fields and none of the key fields of the relation are empty.

According to definition 1, any relation will be in 1NF, i.e. a relation that satisfies the properties of relations: there are no identical tuples in the relation; tuples are not ordered; attributes are not ordered and differ by name; all attribute values are atomic.

Def.2. The relationship is in second normal form (2NF) if and only if the relation is in 1NF and there are no non-key attributes that depend on part of the complex key (that is, all fields not included in the primary key are related by full functional dependence to the primary key).

If the candidate key is prime, then the relation is automatically in 2NF.

To eliminate the dependence of attributes on part of a complex key, it is necessary to perform decomposition multi-relationship relationships. Attributes that depend on part of a complex key are placed in a separate relation.

The attributes of a relationship are called mutually independent , if neither of them is functionally dependent on the other.

Def.3. The relationship is in third normal form (3NF) if and only if the relation is in 2NF and all non-key attributes are mutually independent (that is, none of the non-key fields of the relation depend functionally on any other non-key field).

To eliminate the dependency of non-key attributes, you need to decompose the relationship into several relationships. In this case, those non-key attributes that are dependent are placed in a separate relation.

When reducing relations using the normalization algorithm to relations in 3NF, it is assumed that all relations contain one candidate key. This is not always true. There are times when a relation may contain multiple keys.

Def.4. The relationship is in Bays-Codd normal form (NFBK) if and only if the determinants of all functional dependencies are potential keys (or if any functional dependence between its pals is reduced to a complete functional dependence on a possible key).

If a relation is in BCNF, then it is automatically in 3NF, as follows from Definition 4. To eliminate the dependence on determinants that are not potential keys, decomposition should be carried out, placing these determinants and the parts that depend on them into a separate relation.

There are times when a relation does not contain any functional dependencies. Those. attitude is completely key, i.e. the key of a relationship is the entire set of attributes. Thus, we have a multivalued dependence, because There is still a relationship between the attributes.

Def.5. The relationship is in fourth normal form (4NF) if and only if the relation is in BCNF and does not contain non-trivial multivalued dependencies.

Relations with non-trivial multivalued dependencies arise, as a rule, as a result of a natural connection of two relations over a common field, which is not key in any of the relations. In reality, this leads to storing information about two independent entities in one relation.

To eliminate non-trivial multivalued dependencies, you can decompose the original relation into several new ones.

Def.6. The relationship is in fifth normal form (5NF) if and only if any connection dependency present is trivial.

Def.6. identically also follows the definition.

Def.7. A relation is not in 5NF if the relation has a non-trivial join dependency.

That. If in every complete decomposition all projections of the original relation contain a possible key, we can conclude that the relation is in 5NF. A relation that does not have any complete decomposition is also in 5NF.

Without knowing anything about what potential keys are in a relation and how the attributes are interconnected, one cannot say that a given relation is in 5NF or other normal forms.

Possible clue relation is a set of relation attributes that completely and uniquely (functionally completely) determine the values of all other attributes of the relation. In general, a relation can have multiple possible keys. Among all possible keys of a relationship, one is usually selected, which is considered the main one and which is called the primary key of the relationship.

Mutually independent attributes – these are attributes that do not depend on one another. If there are several physical laws in a relation, then each attribute or set of attributes on which another attribute depends is called a determinant of the relation.

9. Relational algebra.

Relational algebra provides a framework for accessing relational data. The main purpose of algebra is to provide expressions that can be written down. Expressions can be used for:

· area definitions samples, i.e. defining the data for selection as a result of the sampling operation;

· area definitions updates, i.e. defining data to be inserted, modified, or deleted as a result of an update operation;

· definition (named) virtual relations, i.e. presentation of data for visualization through views;

· snapshot definition, i.e. defining the data to be stored as a “snapshot” of the relationship;

· defining safety rules, i.e. determination of data for which access control is carried out;

· determination of sustainability requirements, i.e. determining the data that is included in the scope for certain concurrency control operations;

· defining integrity rules, i.e. some specific rules that a database must satisfy, along with general rules that represent part of the relational model and apply to every database.

Implementations of specific relational DBMSs currently do not use pure relational algebra or relational calculus in their pure form. The de facto standard for accessing relational data has become SQL (Structured Query Language).

Relational algebra, defined by Codd, consists of 8 operators comprising 2 groups:

traditional set operations (union, intersection, subtraction, Cartesian product);
special relational operations (selection, projection, connection, division).

In addition, the algebra includes an assignment operation, which allows you to save the results of calculating algebraic expressions in the database, and an attribute renaming operation, which makes it possible to correctly form the header (schema) of the resulting relationship.

A brief overview of relational algebra operators.

Sample– returns a relation that contains all tuples of a certain relation that satisfy some conditions. The sampling operation is also called the restriction operation ( restrict - limitation, now sampling is more often accepted - SELECT ).

Projection– returns a relation containing all the tuples (i.e. - sub-tuples) of a particular relation after excluding some attributes from it.

Work– returns a relation containing all possible tuples that are a combination of two tuples belonging to two defined relations, respectively.

An association– returns a relation containing all tuples that belong to either or both of two defined relations.

Intersection –returns a relation containing all tuples that belong simultaneously to two defined relations.

Subtraction –returns a relation containing all tuples that belong to the first of two defined relations and not to the second.

Connection (natural) – returns a relation whose tuples are a combination of two tuples (belonging respectively to two defined relations) that have a common value for one or more common attributes of the two relations (and such common values appear only once in the resulting tuple, not twice).

Division –for two relations, binary and unary, returns a relation containing all the values of one attribute of the binary relation that match (in the other attribute) all the values in the unary relation.

LITERATURE

1. Date K.J. Introduction to Database Systems, 6th edition: Trans. from English - TO.; M.; St. Petersburg: Williams Publishing House, 2000. – 848 p.

2. Connolly T., Begg K., Strachan A. Databases: design, implementation and maintenance. Theory and practice, 2nd ed.: Trans. from English – M.: Williams Publishing House, 2000. – 1120 p.

3. Karpova T.S. Databases: models, development, implementation. – St. Petersburg: Peter, 2001. – 304 p.

4. Faronov V.V., Shumakov P.V. Delphi 4. Database Developer's Guide. – M.: “Knowledge”, 1999. – 560 p.

5. J. Groff, P. Weinberg. SQL: Complete Guide: Per. from English – K.: BHV Publishing Group, 2001. – 816 p.

6. Ken Goetz, Paul Litwin, Mike Gilbert. Access 2000. Developer's Guide. T.1, 2. Per. from English – K.: BHV Publishing Group, 2000. – 1264 p., 912 p.

7. Maklakov S.V BPwin and EPwin. CASE-tools for information systems development. – M.: DIALOG-MEPhI, 2001. – 304 p.

8. Ullman D., Widom D. Introduction to database systems / Transl. from English – M.: “Lori”, 2000. – 374 p.

9. Khomonenko A.D., Tsygankov V.M., Maltsev M.G. Databases: Textbook for higher educational institutions / Ed. Prof. A.D. Khomonenko. – St. Petersburg: CORONA print, 2000. – 416 p.

Development of a relational database. Designing relational databases based on normalization principles: the first steps of normalization

Send your good work in the knowledge base is simple. Use the form below

Relational Database Design

During the database design process, the following problems may arise:

Update anomalies - Due to data redundancy, all data must be reviewed when updating it, however, a situation may arise where not all data is updated (potential data inconsistency).

Inclusion anomalies - it is possible that data cannot be entered into the database until some additional information has been received and entered.

Deletion anomalies - the opposite problem may occur when some data is deleted (possible loss of useful information).

The number of Null values ​​is not minimized. Just like redundancy, null values ​​are a source of potential problems in relational databases because it is impossible to determine what they mean. Therefore, it is advisable to minimize their use.

The first three problems are resolved in the process of normalizing relations.

Normalization of relations

The procedure for normalizing relations is reversible. For example, a set of relations in 3NF can be converted to relations in 2NF. This very important property of normalization means that no information is lost during the normalization process.

Normalization theory is based on the presence of certain dependencies between table fields. Emphasis is placed on functional, multivalued, and connection dependencies.

Functional dependencies

Normal forms justified by functional dependencies

We mentioned first normal form (1NF). Let us give a more strict definition of it, as well as definitions of other normal forms.

Normal Boyce-Codd form

So far, we have assumed for simplicity that each relationship has only one candidate key - the primary key. The above definition of 3NF is not entirely suitable if

- the relation has two or more candidate keys;

- two candidate keys are complex and they overlap (have at least one common attribute).

Therefore, the definition of 3NF was supplemented normal shape Boyce-Codd ( Boyce-Codd) - NFBC. It can be formulated like this:

Attitude located V normal form Boyce-Codd Then And only Then, When determinants are potential keys.

In other words, in a functional dependency diagram, the arrows should only start with candidate keys.

A combination of such conditions is not often found in practice, so for relationships without such conditions, 3NF and BCNF are equivalent.

Let's give one more definition: Table located V normal form Boyce-Codd (NFBC), Then And only Then, When any functional addiction between her fields comes down to To irreducible functional dependencies from potential key.

Consider an example involving two non-overlapping candidate keys:

Supplier (P№, Name_P, Status, City),

Here are examples of relationships in which potential keys overlap.

First example: Delivery relation (P№, Name_P, D№, Qty).

Supplier (P#, Name_P) and Supplies 1 (P#, D#, Qty).

Second example: Ratio SDP (S, D, P),

where attributes stand for Students, Disciplines and Teachers. The tuple of the relation SDP means that some student C is studying some discipline D from some teacher P. There are restrictions:

- Each student studies a given subject with one teacher;

- Each teacher teaches only one subject (but each subject can be taught by several teachers).

Normal forms justified by more complex dependencies

Normalization and design procedure

eliminating certain types of redundancy;

fixes some update, enable and delete anomalies;

designing a database layout that is a “good” representation of the real world, is intuitive, and provides a good basis for further development;

simplifying the process of imposing integrity constraints.

Let us list the basic rules that are used in the normalization procedure.

1. The unified relation must be reduced to 1NF.

2. Relations in 1NF should be divided into projections to exclude all functional dependencies that are not irreducible.

Literature

Similar documents

Database Design Guide.

1. Introduction.

Relational model.

Examples.

Design independent of RDBMS.

2. History.

Database tables.

History of the relational model.

3. Characteristics of relational databases.

Using keys.

No data redundancy.

Input limitation.

Maintaining data integrity.

Assignment of rights.

Structured Query Language (SQL).

Portability.

Information Systems

Rice. Main components of the DBMS

The number of Null values is not minimized. Just like redundancy, null values are a source of potential problems in relational databases because it is impossible to determine what they mean. Therefore, it is advisable to minimize their use.