Data models. using screen forms specially created for this by the user. Hierarchical database model

As noted, the information model displays real world into some human-readable concepts that are completely independent of the parameters of the data storage environment. There are many approaches to building such models: graph models, semantic networks, entity-relationship model, etc. The most popular of these has proven to be the entity-relationship model, which will be discussed in Chapter 2.

The information model must be mapped into a computer-oriented datalogical model that is “understandable” by the DBMS. In the process of developing theory and practical use databases, as well as tools computer technology DBMSs were created that supported various datalogical models.

First, hierarchical datalogical models began to be used. Ease of organization, availability in advance given connections between entities, similarity with physical data models made it possible to achieve acceptable performance of hierarchical DBMSs on slow computers with very limited amounts of memory. But, if the data did not have a tree structure, then a lot of difficulties arose when building a hierarchical model and the desire to achieve the desired performance.

Network models were also created for low-resource computers. These are quite complex structures consisting of “sets” - named two-level trees. “Sets” are connected using “link records”, forming chains, etc. When developing network models, many “little tricks” were invented that made it possible to increase the performance of the DBMS, but significantly complicated the latter. Application programmer must know a lot of terms, study several internal DBMS languages, present in detail logical structure databases for navigating among different instances, sets, records, etc. One of the developers operating system UNIX said " Network base- this is the most the right way lose data."

The complexity of the practical use of hierarchical and network DBMSs forced us to look for other ways to present data. At the end of the 60s, DBMSs based on inverted files appeared, characterized by ease of organization and the presence of very convenient data manipulation languages. However, such DBMSs have a number of restrictions on the number of files for storing data, the number of connections between them, the length of the record and the number of its fields.

The most common models today are relational models, which will be discussed in detail in Chapter 3.

The physical organization of data has a major impact on the operational characteristics of the database. DBMS developers are trying to create the most productive physical models data, offering users one or another tool for customizing the model for a specific database. The variety of ways to adjust the physical models of modern industrial DBMSs does not allow us to consider them in this section.

Database organization models

1. Hierarchical approach to organizing databases. Hierarchical databases have the form of trees with arc-links and nodes-data elements. The hierarchical structure implied inequality between data - some were strictly subordinate to others. Such structures, of course, clearly satisfy the requirements of many, but not all, real-life problems.

2. Network data model. In network databases, along with vertical connections, horizontal connections are also implemented. However, many hierarchical shortcomings have been inherited, and the main one is the need to clearly define physical level data connections and just as clearly follow this connection structure when querying the database.

3. Relational model. The relational model emerged from the desire to make the database as flexible as possible. This model provided a simple and effective mechanism for maintaining data links.

Firstly, all data in the model is presented in the form of tables and only tables. The relational model is the only one that ensures uniformity of data presentation. Both entities and the connections of these very entities are represented in the model in exactly the same way - tables . True, this approach complicates the understanding of the meaning of the information stored in the database, and, as a result, the manipulation of this information.

Allows you to avoid the difficulties of manipulation second element models – a relationally complete language (note that language is an integral part of any data model, without it the model does not exist). The completeness of a language when applied to a relational model means that it must perform any operation of relational algebra or relational calculus (the completeness of the latter has been proven mathematically by E.F. Codd). Moreover, the language must describe any query in terms of operations on tables, not on their rows. One such language is SQL.

Third element relational model requires the relational model to maintain some integrity constraints. One such constraint states that each row in a table must have a certain unique identificator, called primary key . The second limitation is imposed on the integrity of links between tables. It states that table attributes that reference the primary keys of other tables must have one of those primary key values.

4. Object-oriented model. New areas of computing use, such as scientific research, computer-aided design and automation of institutions, required databases to be able to store and process new objects - text, audio and video information, as well as documents. The main difficulties of object-oriented data modeling stem from the fact that such a developed mathematical apparatus on which a general object-oriented data model could be based does not exist. This is largely why there is still no basic object-oriented model. On the other hand, some authors argue that a general object-oriented data model in the classical sense cannot be defined because the classical concept of a data model is unsuitable for the object-oriented paradigm. Despite the advantages of object-oriented systems - implementation of complex data types, communication with programming languages, etc. - superiority in the near future relational DBMS guaranteed.

5.3.3 Data models and conceptual modeling

It was already mentioned above that a schema is created using some data definition language. In fact, it is created based on the data definition language of a specific target DBMS, which is a relatively low level; with its help, it is difficult to describe the data requirements so that the created diagram is understandable to users of the most different categories. To achieve such an understanding, it is necessary to create a description of the circuit in some, more high level, which we will call the data model. In this case, by a data model we will understand an integrated set of concepts for describing data, connections between them and restrictions imposed on data within a certain subject area.

A model is a representation of objects and events in a subject area, as well as the relationships that exist between them. A data model can be thought of as a combination of the following three components.

· Structural part, i.e. a set of rules by which a database can be built.

· The control part, which determines the types of permissible operations with data (this includes operations of updating and retrieving data, as well as operations of changing the database structure).

· A set of data integrity constraints that guarantee the correctness of the data used.

The purpose of building a data model is to present data in an understandable way. If such a representation is possible, then the data model can be easily applied when designing a database. To represent the ANSI-SPARC architecture, the following three can be defined: related models data:

· external model data that displays the views of each user type existing in the organization;

· conceptual model data, displaying a logical (or generalized) view of the data, independent of the type of DBMS selected;

· internal data model displaying the conceptual diagram in a certain way, understandable by the selected target DBMS.

Quite a number of data models have been proposed and published in the literature. They are divided into three categories: object-based data models, record-based data models, and physical data models. The first two are used to describe data at the conceptual and external levels, and the last one at the internal level.

Object data models. When building object models data uses concepts such as entities, attributes and relationships. An entity is a separate element (employee, product, concept or event) of a subject area that must be represented in the database. An attribute is a property that describes some aspect of an object and whose value should be captured, and a relationship is an associative relationship between entities. Listed below are some of the most common types object data models.

    • Entity-Relationship model or ER model.
    • Semantic model.
    • Functional model.
    • Object-oriented model.

Currently, the ER model has become one of the main methods for conceptual database design. The object-oriented model extends the definition of an entity to include not only the attributes that describe the state of the object, but also the actions that are associated with it, i.e. his behavior. In this case, the object is said to encapsulate state and behavior.

Record-based data models. In a record-based model, the database consists of several fixed-format records that can be different types. Each record type defines a fixed number of fields, each of which has a fixed length. There are three main types of record-based logical data models: relational model relational data model, network data model and hierarchical model data (hierarchical data model).

There are many ways to implement elementary data units, and therefore a variety of data models are known. The data model provides rules for structuring it. As a rule, operations on data are related to their structure. Varieties existing models data correspond to a variety of applications and user preferences.

To represent data, models based on forms of information representation are used. Such models are called syntactic.

In the specialized literature there is a description of quite large quantity data models. Hierarchical, network and relational methods are widely used. In addition to them, the most famous are the binary model and the semantic network.

The classic, longest used data model is considered to be based on hierarchical tree type structure(a fragment is shown in Fig. 10). The "ordered tree" model is also often used, in which the relative order of the subtrees is significant. In such a model, each subsequent unit of information is associated with only one previous one, and previous ones can be associated with several subsequent ones.


Network data model is based on such a representation of information in which one unit of information can be associated with an arbitrary number of others (Fig. 11).


Relational data model based on tabular methods and means of presenting and manipulating data. In such a model, information about the subject area is displayed in a table called a “relationship” (Fig. 12). A row in such a table is called a tuple, and a column is called an attribute. Each attribute can take a subset of values ​​from a specific area - domain.

The vast majority of DBMSs focused on personal computers, are systems built on the basis of a relational data model - relational DBMS.

Binary data model is a graph model in which vertices reflect representations of simple, unambiguous attributes, and arcs represent representations of binary relationships between attributes (Fig. 13).


The binary model is not widely used, but in some cases it finds practical use. Yes, in the area artificial intelligence Research has long been conducted to represent information in the form of binary relations.

Semantic networks were proposed as data models by researchers working on various problems artificial intelligence. Just like in the network and binary models, basic structures semantic web can be represented by a graph, the set of vertices and arcs of which forms a network. However, semantic networks are designed to represent and systematize knowledge of a very general nature.

Thus, any graph model (for example, a labeled binary graph) can be considered a semantic network if it is initially clearly stated what the vertices and arcs mean and how they are used.

Semantic networks are a rich source of data modeling ideas that are extremely useful for solving the problem of representing complex situations. They can be used independently or in conjunction with ideas underlying other data models. Their interesting feature It is that the distance measured on the network (semantic distance, or metric) plays an important role in determining the proximity of interrelated concepts. At the same time, it is possible to explicitly emphasize that the semantic distance is large. As shown in Fig. 14, STORE correlates with the personality of the SELLER, and at the same time, the SELLER has WEIGHT. The relationship between personality and specialty is obvious, but this does not necessarily mean the relationship between STORE and WEIGHT.


It must be said that data models such as the semantic network, for all the richness of their capabilities in modeling complex situations, are characterized by complexity and some inefficiency in conceptual terms.

Let us dwell in more detail on relational, hierarchical and network models data.

Relational model data is characterized by the following components:

– information construction: a relationship with a two-level structure;

– valid operations: projection, selection, connection and some others;

– restrictions: functional dependencies between attributes of a relationship.

Each class of objects R the material world is put into correspondence with a certain set of attributes, for example A 1 , A 2 , ..., A n. Separate object class R described by a string of values ​​( a 1, a 2, ..., a n), Where a i– attribute value A i.

Line ( a 1, a 2, ..., a n) is called a tuple. An entire object class has a corresponding set of tuples called a relation. Let us denote the relation describing the class of objects R, also through R.

Expression R(A 1, A 2, ..., A n) is called a relation schema R.

For each tuple component, its relationship to the corresponding attribute must be specified. In the relational data model, to ensure this connection, the order of the components of the tuple coincides with the order of the attributes in the relation schema.

Each relationship reflects the state of a class of objects at some point in time. Therefore, one relationship scheme in different moments different relationships may correspond to time.

The set of relation values ​​can be represented in the form of a table in which the following correspondences are observed:

– the name of the table and the list of column names correspond to the relation diagram;

– a table row corresponds to a relation tuple;

– all rows of the table (and therefore all tuples) are different;

– the order of rows and columns is arbitrary (in particular, the relational data model does not require special sorting of rows).

The description of relationship processing processes can be done in two ways:

– indicating a list of operations, the implementation of which leads to the required result (procedural approach),

– a description of the properties that the resulting relation must satisfy (declarative approach).

Let's consider procedural approach. A set of relations and operations on them forms relational algebra. Typically, the list of operations contains projection, selection, union, intersection, subtraction, and join.

A projection is an operation that transfers the columns of the original relation specified in the operation condition to the resulting relation. The algebraic representation of the projection has the form

T = R[X],

Where R– initial attitude; T– resulting relation; X– list of attributes in the relationship structure T(projection condition).

Consider the relation O 1, containing information on product sales in 2010 (Table 1).

Table 1

Attitude T 1, containing information only about actual release products, is obtained as a result of performing the projection

T 1 = O 1[Store, Products, Fact]

and looks like a table. 2.

table 2

Columns can be specified in any order:

T 1 = ABOUT 1 [Products, Store, Fact].

A selection is an operation that transfers to the resulting relation those rows from the original relation that satisfy the selection condition. The selection condition is tested on each row of the relation individually and cannot span information across multiple rows. There are two simplest types of sampling conditions:

1) Attribute name 1<знак сравнения>Value where comparison signs =, #, >, ≥, are allowed<, ≤. Например: Цена > 100.

The attribute names must be contained in the source relation structure. The algebraic notation for the sample has the form

T = R[p],

Where R– initial attitude; T– resulting relation; R– sampling condition.

For example, we get the values T 2 = ABOUT 1 [Product = “P 1”] (Table 3).

Table 3

The operations of union, intersection and subtraction are performed on two original relations with the same structure.

Let us denote the initial relations by R 1 and R 2 , the resultant – through T.

An association T = U(R 1 , R 2) contains rows present either in relation R 1, or in R 2 .

Intersection T = I(R 1 , R 2) contains the rows present in the relations R 1 and R 2 at the same time.

Subtraction T = M(R 1 , R 2) contains those lines from R 1 that are missing in R 2 .

A relation join operation is performed on two source relations and creates one resultant relation. Each row of the first source relation is compared in turn with all the rows of the second relation, and if the join condition is met for this pair of rows, then they are concatenated and form the next row in the resulting relation. The connection condition has the form

Attribute name 1<знак сравнения>Attribute name 2,

where Attribute Name 1 is in one source relation and Attribute Name 2 is in another. We will use the following notation for the join operation:

T = R l [ p] R 2 ,

Where R 1 and R 2 – initial relations; T– resulting relation; R– connection condition.

One of the most important special cases of connection is called natural connection and has following features:

– the comparison sign in the connection condition is “=”;

– Attribute Name 1 and Attribute Name 2 must match, or rather, contain the intersection of the attribute lists of the source relations;

– the list of attributes of the resulting relation is formed as a result of combining the lists of attributes of the original relations.

The designation for a natural compound does not contain a compound condition and has the form T = R l* R 2 .

Declarative approach for processing relational databases data is based on the interpretation of concepts and methods of mathematical logic. In particular, relational calculus is based on predicate calculus. Let us name the concepts of mathematical logic necessary for relational calculus.

1. Symbols of variables and constants. In the language constructs of relational calculus, they correspond to the names of attributes and variables, as well as constants.

2. Logical connectives “and”, “or”, “not” and comparison signs =, # (not equal), >,<, ≥, ≤.

3. Terms, i.e. any constants and variables, as well as functions whose arguments are terms.

4. Elementary formulas are predicates whose arguments are terms. Predicates connected by the operations “and”, “or”, “not” are also elementary formulas. Elementary formulas are, for example, the expressions Surname = “Petrov” and Amount ≤ Total.

5. Formulas, i.e., the result of applying quantifiers of generality or existence to elementary formulas. The formula corresponds to a query to a relational database, expressed in terms of relational calculus.

The main task of designing an EIS database is to determine the number of relationships (or other constituent units of information) and their attribute composition.

The problem of grouping attributes into relationships, the set of which is not fixed in advance, allows for many different solutions. Rational grouping options must take into account the following requirements:

– the set of relations should ensure minimal redundancy in the presentation of information;

– adjustment of relationships should not lead to ambiguity or loss of information;

– restructuring of the set of relations when adding new attributes to the database should be minimal.

Normalization is one of the most studied methods of transforming relations, allowing to improve the characteristics of the database according to the listed criteria.

There are many restrictions on the values ​​stored in a relational database. Compliance with these restrictions in specific respects is associated with the presence of so-called normal forms. The process of converting database relationships to one or another normal form is called relationship normalization . Normal forms are numbered sequentially from 1 in ascending order. The larger the normal form number, the more restrictions on the stored values ​​must be observed in the relevant respect.

Constraints typical of the relational data model are functional and multivalued dependencies, as well as their generalizations. In principle, the set of additional restrictions can grow and, therefore, the number of normal forms will increase. The applied restrictions are aimed at reducing redundant information in the relational database.

A relation in first normal form (1NF) is an ordinary relation with a two-level structure. The next normal forms (second and third) use restrictions associated with the concept of functional dependence. Functional dependencies are defined for attributes that are in the same relation that satisfies 1NF.

The simplest case of functional dependence involves two attributes. In a relationship R(A, B, ..., J) attribute A functionally defines an attribute IN, if at any time each value A IN(denoted AIN).

In other words, IN functionally depends on A (IN = f(A)). The first designation turns out to be more convenient when the number of functional dependencies grows and their relationships become difficult to discern; it will be used in the future. The absence of functional dependence is denoted as AIN.

For attributes A And IN some relation, the following situations are possible:

– lack of functional dependence;

- Availability AIN(or INA), but not both dependencies together;

– presence of one-to-one correspondence AIN.

The concept of functional dependence extends to situations with three or more attributes in the following form. Attribute group ( A, IN, WITH) functionally defines the attribute D in a relationship T(A, B, C, D, ..., J), if each combination of values<a, b, With> matches a single value d (A- meaning A; b- meaning IN; With- meaning WITH; d- meaning D). The presence of such a functional dependence will be denoted by A, IN, WITHD.

The existence of functional dependencies is associated with the attribute encoding methods used. Thus, for many institutions, it can be argued that each department (as an object of the subject area) belongs to a single institution. However, this is not enough to prove the functional dependence of Department → Institution. If in each institution the departments are numbered sequentially, starting with 1, then the functional relationship is incorrect. If the department code, in addition to the number, also contains the institution code (or the uniqueness of the codes is ensured in some other way), then the functional relationship Department → Institution is valid.

For an indicator with many attribute attributes R = (R 1 , R 2 , ..., P n) and the base attribute Q functional dependence is valid RQ, although it cannot be said that this is the only dependence on the specified attributes.

Probable clue relation is such a set of attributes, each combination of values ​​of which occurs only in one row of the relation, and no subset of attributes has this property. There may be several possible keys in a relation. Their importance in data processing is determined by the fact that sampling by a known value of a probable key results in one relation row or none.

In practice, the attributes of the probable key of a relation are associated with the properties of those objects and events about which information is stored in the relation. If, as a result of adjusting the relationship, the names of the attributes that form the key have changed, then the information will be seriously distorted. Therefore, systematically checking the properties of a probable key makes it possible to control the reliability of information in a relation.

When there are multiple candidate keys in a relation, observing them simultaneously is very difficult. It is advisable to choose one of them as the main (primary) one. Primary key A relationship is called a probable key, the values ​​of which are used to control the reliability of information in the relationship.

With regard to economic information, in the vast majority of cases, relations obtained from existing economic documents contain a single probable key, which is also the primary key. This is because the content of economic documents is understood equally by all users. In what follows we will keep in mind only such relations. The presence of two or more probable clues in a relationship with meaningful information can be explained by the presence of several possible ways of interpreting the same data. The primary key is often simply called a key.

In relationships with a large number of rows, finding the primary key by directly applying the definition is quite difficult. In addition, at the EIS design stage, the values ​​of many relations are simply unknown, so practically the primary key of a relation is calculated based on the existing functional dependencies.

Each primary key value appears in only one row of the relationship. The value of any attribute in this line is also unique. If through TO denote the attributes of the primary key in the relation R(A, B, C, ..., J), then the following functional dependencies are valid: TOA, TOIN, TOWITH, ..., TOJ. A set of primary key attributes functionally defines any attribute of a relationship. The converse is also true: if a group of attributes is found that functionally defines all the attributes of a relationship individually, and this group cannot be reduced, then the primary key of the relationship is found.

For the initial set of functional dependencies, there are a number of patterns, knowledge of which allows one to obtain derived dependencies. Let's note some of them:

- If A, INA, then A, ININ;

- AIN And AWITH then and only when ASun;

- If AIN And INWITH, That AWITH;

- If AIN, That ACIN (WITH arbitrarily);

- If AIN, That ACSun (WITH arbitrarily);

- If AIN And SunD, That ACD.

If it is known in advance that there is only one probable key in a relation, then it can be found in a simple way. A probable key (if it is unique, i.e., matches the primary key) is a set of attributes that do not occur on the right sides of all functional dependencies. In other words, from the complete list of relation attributes, it is necessary to delete the attributes found on the right sides of all functional dependencies. The remaining attributes form the primary key.

A relation is in second normal form (2NF) if it corresponds to 1NF and does not contain incomplete functional dependencies.

An incomplete functional dependency consists of two dependencies:

– a probable relation key functionally defines some non-key attribute,

– part of the probable key functionally defines the same non-key attribute.

A relation that does not comply with 2NF is characterized by redundancy of stored data. A database is in 2NF if all of its relations are in 2NF.

A relation corresponds to 3NF if it corresponds to 2NF and there are no transitive functional dependencies (FD) among its attributes.

Transitive Federal Law includes two Federal Laws:

– a probable relation key functionally defines a non-key attribute;

– This attribute functionally defines another non-key attribute.

If TO– the key of relationship, A, IN– non-key attributes and TOA, AIN are fair federal laws, then they are transitive. A special case of a transitive federal law is an incomplete federal law, when TO = WITH, E And TOE, EA.

A database is in 3NF if all of its relations are in 3NF.

A relational database, generally corresponding to the third normal form, is characterized by a number of properties, knowledge of which facilitates and streamlines data processing processes. The implementation of database queries using relational algebra operators can be described by the following rules.

1. In the verbal formulation of the request, highlight the names of the attributes that make up the shell, the input and output of the request, as well as the selection conditions.

2. Fix many shell attributes. If all the necessary attributes are in any one relation, then subsequent selection and projection operations are carried out only with it. If the required attributes are distributed across several relationships, then these relationships must be connected. Each pair of relations is connected by the condition of equality of attributes with matching names (or defined on a common domain). After each connection, using projection, you can cut off attributes that are unnecessary for subsequent operations.

4. If a query can be divided into parts (subqueries), then its implementation is also divided into parts, where the result of each subquery is a separate relation.

5. This sequence of actions is standard, but may create intermediate relationships that are too large. This disadvantage can be compensated for by performing some selections and projections on the original relations (before making the join) and changing the relative order of the required joins.

Network database is represented as a set of relations and fan relations. Relationships are divided into primary and dependent.

Fan relation W(R, S) is a pair of relations consisting of one main ( R), one dependent relationship ( S) and the connections between them, provided that each value of the dependent relation is associated with a single value of the main relation. This condition is a limitation characteristic of the network data model as a whole. The way to implement this limitation in computer memory is different for different network DBMSs.

The operations allowed in the network data model represent different sampling options.

Network databases, depending on the restrictions on the entry of relations into fan relationships, are divided into two-level and multi-level networks.

A limitation of two-level networks is that each relationship can exist in one of the following roles:

– outside of any fan relationships;

– as the main relationship in any number of fan relationships;

– as a dependent relationship in any number of fan relationships.

It is prohibited for a relationship to exist as a main one in one context and at the same time as a dependent one in another.

Multi-level networks do not provide for any restrictions on the interconnection of fan relationships; in some network DBMSs even cyclic network structures are allowed.

For two-level network DBMSs, two more restrictions are introduced (from a theoretical point of view, optional):

– the primary key of the main relation can only be one-attribute;

– a fan relation exists if the primary key of the main relation is part of the primary key of the dependent relation.

To organize a fan-shaped relationship in the computer memory, an additional attribute called the communication address is introduced into the structure of the main and dependent relationships. The communication address values ​​work together to ensure that each dependent relation value corresponds in a fan relation S to a single value of the main relation R.

The value of a ratio when stored in computer memory is often called a record. The communication address is an attribute within a record that stores the starting address or number of the next record to be processed.

The connection between the values ​​of the dependent relation and the single value of the main relation in the simplest case is ensured as follows. The link address of a certain main relation record points to one of the dependent relation records (the value of the main relation link address is the starting address of that dependent relation record), the link address of the specified dependent relation record points to the next dependent relation record associated with the same main relation record, and etc. The last dependent relation record in this chain addresses the main relation record named above. This results in a ring structure of communication addresses called like a fan, where the role of the “handle” of the fan is played by the recording of the main relationship. In graphic illustrations, the communication address is depicted by an arrow directed from the communication address of a given record to the record whose starting address (number) serves as the value of this communication address.

There are standard conventions for how to include and exclude data in a fan relationship. The activation method can be characterized as automatic or non-automatic.

The automatic method indicates that when a new value of the main relation appears, it is immediately put into correspondence with some value of the dependent relation and forms a new element of the fan relation. Failure to comply with this rule is typical for the non-automatic method.

Exclusion methods can be mandatory or optional. In the mandatory method, once a value is included in the main relation, it becomes a permanent member of the main relation. It can be updated, but cannot be removed from the relationship. The optional method allows you to remove any value from the underlying relationship.

From the analogy of the definitions of a fan relation and functional dependence, the statement follows: if there is a fan relation, then the key of the dependent relation functionally determines the key of the main relation, and vice versa, if the key of one relation functionally determines the key of the second relation, then the first relation can be dependent, and the second – main in some fan-like way.

In a network database diagram, relationships and fan relationships are often treated as files and connections, which allows the network structure to be viewed as a set of files

F = {F l ( X 1), F 2 (X 2), ..., F i(X i), ..., Fn(Xn)},

Where X i– key attributes in the file F i.

Additionally, a network structure graph is introduced IN with vertices ( X l, X 2 , ..., X i, ..., Xn). Arc<X i, Xj> in the column IN exists if X i is a part Xj And Fj[X i] is a subset F i. The last condition has the same meaning as the syntactic inclusion of relations in the relational data model. This assumes that the main file key is contained in the dependent file. Graph IN similar to the connection graph for a relational database.

Database DBA called acyclic, if between any two vertices on the graph IN there is at most one path. Two-layer networks are always acyclic .

For many files F acyclic database DBA operation is quite applicable

m(DBA) = F 1 & F 2 & ... & F i & ...& Fn,

called maximum intersection. Its analogue can be a sequence of connections in a relational database.

In network DBMSs, the number of sampling operations is quite large. The projection operation functions for a network DBMS perform the description of a network database subschema. A network database schema is a description of all relationships indicating the attribute composition and keys of each relationship, as well as fan relationships. In the application program, it is possible to declare part of the relationships of a network database, in each relationship - a certain subset of attributes (with the obligatory retention of key attributes) and only some fan relationships. The corresponding data description is called a subschema. Relationships, fan relationships, and attributes not specified in the subschema become inaccessible to the application program. Unlike the projection operation, the database corresponding to the subschema is not created physically, but by restricting access to the source database, which is defined in the schema.

The results of valid connections are actually recorded in the network DBMS using chains of communication addresses. Access to the results of a possible join starts from some main relation to a fan of values ​​in the corresponding dependent relation, the achieved key values ​​in the dependent relation are remembered and used for searching in some other main relation; from this main relationship a transition to a new dependent one is possible, etc.

Hierarchical model data has many similarities with the network data model; chronologically it appeared even earlier. Valid information structures in a hierarchical data model are relation, fan relation and hierarchical database. Unlike the previously discussed data models, in which it was assumed that the information mapping of one subject area is one database, the hierarchical model allows the mapping of one subject area into several hierarchical databases.

The concepts of relation and fan relation in the hierarchical data model do not change.

A hierarchical database is a set of relationships and fan relationships for which two restrictions are met:

1) there is a single relation, called the root, which is not dependent in any fan relation;

2) all other relations (except for the root) are dependent relations in only one fan relation.

The hierarchical database scheme is identical in composition to the network database. The above restrictions are supported by hierarchical DBMSs.

The constraint that is maintained in the hierarchical data model is that it is impossible to violate the requirements that appear in the definition of the hierarchical database. This limitation is ensured by a special arrangement of relation values ​​in the computer memory. Below we will look at one of the simplest implementations of laying out a hierarchical database.

It should be noted that there are various possibilities for passing through hierarchically organized values ​​in a linear sequence. The principle used for hierarchical databases is called end passage. Let's list its rules.

1. Starting from the first value of the root relation, the first values ​​of the corresponding relations at each level are listed, up to the last.

2. All values ​​in the fan relation where step 1 stopped are listed.

3. The values ​​of all fans of this fan relation are listed.

4. From the achieved level, the rise to the previous level occurs, and if it is possible to apply step 1, then the process is repeated.

A hierarchical database record is a set of values ​​containing one value of the root relation and all fans extending from it in accordance with the structure of the hierarchical database. In our example, one record consists of data related to one faculty (see Fig. 11).

For fan relations within a hierarchical database, the already known pattern is valid: if there is a fan relation, then the key of the dependent relation functionally determines the key of the main relation. And vice versa: if the key of one relation functionally determines the key of the second relation, then the first relation can be dependent, and the second – the main one in some fan relation.

In addition, the constraint that there is a single root relation in a hierarchical database translates into the requirement that the primary key of each non-root relation must functionally define the primary key of the root relation.

The algorithm for obtaining the structure of a hierarchical database was compiled by A.I. Mishenin.

When comparing data models, it is very difficult to separate factors that characterize the fundamental features of the model from factors associated with the implementation of these data models using specific DBMSs.

Considering the advantages and disadvantages of the most famous data models, it is worth noting a number of undoubted advantages of the relational approach:

– simplicity: in the relational model there is only one information structure, which formalizes the tabular presentation of data, familiar to economist users;

– theoretical justification: the presence of theoretically reasoned methods for normalizing relations and checking the acyclicity of the structure makes it possible to obtain databases with the required characteristics;

– data independence: changing the structure of a relational database, as a rule, leads to minimal changes in application programs.

Among the disadvantages of the relational data model are the following:

– low speed when performing a connection operation;

– high memory consumption for representing a relational database. Although design in 3NF is designed for minimal redundancy (each fact is represented in the database once), other data models under the same conditions provide less memory consumption. For example, the length of a communication address is usually much shorter than the length of an attribute value.

The advantages of a hierarchical data model are:

– simplicity: although the model uses three information structures, the hierarchical principle of subordination of concepts is natural for many economic tasks (for example, for organizing statistical reporting);

– minimal memory consumption: for tasks that can be implemented using any of the three data models, the hierarchical model allows you to obtain a representation with the minimum required memory.

Disadvantages of the hierarchical model:

– non-universality: many important options for data interconnection cannot be implemented using a hierarchical model without increasing redundancy in the database;

– admissibility of only the navigation principle of access to data;

– data is accessed only through the root relation.

The following advantages of the network data model should be noted:

– universality: the expressive capabilities of the network data model are the most extensive in comparison with other models;

– the ability to access data through the values ​​of several relations (for example, through any main relations).

The disadvantages of the network data model include:

– complexity, i.e. the abundance of concepts, variants of their relationships and implementation features;

– admissibility of only the navigation principle of access to data.

The results obtained for acyclic databases suggest that acyclic relational databases, two-level network databases and a hierarchical database without logical connections have equivalent information presentation capabilities.

The analysis of data models did not address the problem of ordering values ​​in database relationships. For the relational model, this ordering is optional from a theoretical point of view, but in the other two models it is widely used to improve the efficiency of query implementation.

The final choice of a data model is influenced by many additional factors, for example, the availability of well-proven DBMSs, the qualifications of application programmers, the size of the database, etc.

Recently, relational DBMSs have taken a predominant position as a means of developing electronic information systems. The disadvantages of the relational model are compensated by the increase in speed and memory resources of modern computers. Due to the processes of decentralization of management in the economy, many EIS databases have a simple structure that is easily transformed into understandable systems of tables (relations).

Test questions and assignments

1. List the most famous types of data models.

2. Explain hierarchical and network data models. What are their similarities and differences?

3. Describe the relational model.

4. Describe the binary model and its scope.

5. What are the specifics of semantic networks and their purpose?

6. List information constructs for various technologies.

7. Name the components of the relational data model.

8. Define tuple and relation.

9. In what ways can relational processing be described?

10. Reveal the essence of the procedural description of data processing processes.

11. Explain the declarative approach to processing relational databases.

12. What is normalization of relations?

13. How many attributes are there in the simplest functional dependency?

14. Define the functional dependence of attributes in terms of the relational approach.

15. What is a probable relation key?

16. What is a primary key? What is another name for it?

17. Explain the patterns for many functional dependencies.

18. Describe the second and third normal forms of relationships.

19. Explain access to a relational database.

20. Name the information structures in the network model.

21. What is a “fan attitude”?

22. Give a definition of two-level networks.

23. Define multi-level networks.

24. What is a “communication address”?

25. What is called a “fan”?

26. What components does a network database schema contain?

27. What standard conventions do you know about how to include and exclude data in a fan relationship?

28. What are files and connections?

29. What is “maximum intersection”?

30. Name the information structures in the hierarchical model.

31. Define a hierarchical database.

32. Tell us about the rules of end passage.

33. Define a hierarchical base record.

34. Name the advantages and disadvantages of the relational approach.

35. List the advantages and disadvantages of the hierarchical model.

36. Describe the strengths and weaknesses of the network data model.

38. Complete tasks 2.1–2.20 on operations on relations from the workshop.

39. Complete tasks 2.21–2.32 on the topic “Functional dependencies and keys” from the workshop.

40. Complete tasks 2.33–2.60 on the topic “Normal forms of relationships” from the workshop.

41. Complete tasks 2.61–2.71 on the topic “Acyclic Databases” from the workshop.

42. Complete tasks 2.72–2.93 on the topic “Network and hierarchical data models” from the workshop.

Types of database data models

Data organization models. Network, relational, hierarchical models.

The core of any database is the data model. Using a data model, domain objects and the relationships between them can be represented.

Data model is a set of data structures and their processing operations. Let's look at three main types of data models: hierarchical, network and relational.

Types of database data models

Hierarchical The database model is depicted as a tree. Tree nodes represent a collection of data, such as logical records.

Hierarchical model is a collection of elements arranged in the order of their subordination from general to specific and forming a tree (graph) inverted in structure.

The basic concepts of a hierarchical structure include level, node, and relationship. Knot is a collection of data attributes that describe an object. In a hierarchical tree diagram, nodes are represented as vertices in the graph. Each node at a lower level is connected to only one node at a higher level. A hierarchical tree has only one vertex, not subordinate to any other vertex and located at the topmost - first level. Dependent (slave) nodes are at the second, third, etc. levels. The number of trees in the database is determined by the number of root records. For each database record, there is only one hierarchical path from the root record.

Network DB models correspond to a wider class of management objects, although they require additional costs for their organization.

In the network structure with the same basic concepts (level, node, connection), each element can be connected to any other element.

Relational The database model represents objects and relationships between them in the form of tables, and all operations on data are reduced to operations on these tables. Almost all modern DBMSs are based on this model. This model is more understandable and “transparent” for the end user of the data organization.

Relational model presents the data objects and connections between them in the form of tables, while the connections are also considered as objects. All rows that make up a table in a relational database must have a primary key. All modern DBMS tools support the relational data model.

This model is characterized by simplicity of the data structure, user-friendly tabular representation and the ability to use the formal apparatus of relational algebra and relational calculus for data processing.

Each relational table is a two-dimensional array and has the following properties:

1. Each table element corresponds to one data element.

2. All columns in the table are homogeneous, i.e. all elements in a column have the same type and length.

3. Each column has a unique name.

4. There are no identical rows in the table;

5. The order of rows and columns can be arbitrary.

The classification of data models is based on concepts of the interconnection of objects. There can be four types of different relationships between database tables: “one to one”; "one to many"; "many to many".

With respect "one to one » At each moment of time, one record of table “1” corresponds to no more than one record of table “2”. For example, one client corresponds to only one hotel room. This type of relationship is not used very often, since such data can be placed in one table. This relationship is used to divide very wide tables, for example, to divide a table with information about a company’s employees into two - official and personal information.

Connection with the attitude " one to many» characterizes the fact that one instance of information object “1” corresponds to 0,1,2 or more instances of object “2”. Such a relationship exists, for example, between the “Suppliers” and “Products” tables, i.e. Each supplier may sell different products, but each product has a single supplier.

Attitude " many to many» assumes that at any given time one record of table “1” corresponds to several instances of table “2” and vice versa. An example is the connection between the “Client” and “Bank” information objects. One client stores funds in many banks. One bank serves many clients. The relationship is implemented using a third (linking) table, the key of which consists of at least two fields that are foreign key fields in the source tables.

There are three main types of data models.

Hierarchical model. It assumes the organization of data in the form of a tree structure. A tree is a hierarchy of elements. At the topmost level of the structure is the root of the tree. One tree can have only one root, the rest are nodes called child nodes. Each node has a source node above it.

A hierarchical database represents both a set of relations and fan relations for which two restrictions are observed: there is a single relation, called the root, which is not dependent on any fan relation; all other relations (except the root) are dependent relations in only one fan relation.

A hierarchical database record is a set of values ​​containing one root relation value and all the fans accessible from it. In our example, the record consists of data related to one faculty.

Network model. The model is based on network structures in which any element can be connected to any other element. Information structures in the model are relationships and fan relationships. The latter are divided into basic and dependent. Fan attitude W(R,S) called a relationship pair R And S and the relationship between them, provided that each value S associated with a single meaning R. Attitude R is called the original (basic), and S- generated (dependent).

An additional attribute called the link address is introduced into the structure of the main and dependent relations, which ensures that each value of the dependent relation corresponds S with a single value of the main relation R. The communication address stores the starting address or number of the next record to be processed. The ring structure of communication addresses is called like a fan. The role of the “handle” of the fan is played by the recording of the main relationship.

The disadvantage of the data models discussed above is that when adding new vertices or establishing new connections, problems arise in unloading data from the database and loading it into a new structure. This may result in loss of data or occurrence of undefined data values.

Relational model. The data structure of this model is based on the apparatus of relational algebra and normalization theory. The model assumes the use of two-dimensional tables (relations).

Limitations on relational model relationships : each table element is a simple data element; there are no identical rows in the table; columns (fields) are assigned unique names; all rows of the table have the same structure; In a table, the order of rows and columns is arbitrary.

The relationship between tables is carried out through the values ​​of one or more matching fields. Each row of a table in a relational database is unique. To ensure row uniqueness, keys are used that contain one or more table fields. Keys are stored in an organized manner, allowing direct access to table records during searches.