Database design. Database structure: building blocks. Basic concepts about databases and subdatabases

Before you start creating a database, you need to spend some time working on it. design.

The main goal of database design is to reduce the redundancy of stored data, and therefore save the amount of memory used, reduce the cost of multiple update operations of redundant copies and eliminate the possibility of inconsistencies due to storing information about the same object in different places . A so-called “pure” database project (“every fact in one place”) can be created using the relationship normalization methodology. Normalization should be used at the final verification stage of database design.

Poor design of the database structure almost always leads to wasted time spent on its further processing. Experienced developers spend as much time designing databases as they do creating them. In general, database development includes the following stages:

1. Determine the purpose of the database.

2. Deciding what input data the database should contain.

3. Determine the source database tables.

4. Determining the fields that will be included in the tables and selecting fields that contain unique values.

5. Assigning relationships between tables and final review of the resulting structure.

6. Creating tables, linking them together and experimentally filling the database with trial data.

7. Creation of forms, reports and queries for operations with entered data.

Determining the purpose of the database

The development of every database begins with an examination of the problem it is intended to solve or the need it is intended to satisfy.

As an example, let's try to create the simplest database of a fiction library “Library”. The database is designed to store data about books purchased by the library, information about the location of individual copies of each publication and information about readers.

Selecting information to include in the database

To maintain library catalogs, organize searches for required books and library statistics, information must be stored in the database, most of which is placed in annotated catalog cards. An analysis of requests for literature shows that to search for suitable books (by topic, author, publisher, etc.) and select the right one (for example, by abstract), the following should be highlighted attributes index card:

2. Title of the book.

3. Place of publication (city).

4. Publisher (name of publishing house).

5. Year of manufacture.

6. Abstract.

Attributes that make it possible to characterize the storage locations of individual copies of books include:

1. Room number (room for storing books).

2. Shelf number in the room.

3. Shelf number on the rack.

4. Number (inventory number of the book).

5. Date of purchase.

6. The date of placement of a specific book in a specific place.

7. Date of removal of the book from the designated place.

Attributes that help characterize readers include:

1. Library card (form) number.

2. Reader's last name.

3. Reader's name.

4. Reader's middle name.

5. Reader's address.

6. Reader's phone number.

7. Date of issue of a specific book to the reader.

8. The period for which a particular book is issued to the reader.

9. Book return date.

Defining Source Tables

Analysis of the objects and attributes defined above allows us to determine the following tables for database construction for the designed database:

2. Books. The table is designed to store information about books.

3. Publishers.The table is intended to store information about publishers.

4. Storage. The table is intended to describe where books are stored.

5. Issue.The table is intended to store information about issued books.

6. Readers.The table is intended to store information about library readers.

Selecting required table fields

Having determined the set of tables included in the database, you need to think about what information about each object will be included in each of the tables. Each field must belong to one separate table. At the same time, the information in each field must be structurally elementary, that is, it must be stored in the fields in the form of the smallest logical components.

Based on the above, we determine fields in the selected tables and type stored data.

Books:

· book code– a numeric field designed to uniquely identify each specific book in the database;

· book title

· annotation- text field;

· publication date;

· date of admission to the library;

· storage.
Publishers:

· publisher code– a numeric field designed to uniquely identify each specific publisher in the database;

· publishing house name– character field, no more than 256 characters;

· city where the publishing house is located– character field, no more than 25 characters.

Storage:

· location code– a numeric field designed to uniquely identify each specific shelf in the database;

· room number– numeric field;

· rack number– numeric field;

· shelf number– numeric field.

Issue:

· issue code– a numeric field designed to uniquely identify each specific issue in the database;

· issued book number– numeric field;

· reader code– numeric field;

· date of issue;

· date of issue(amount of days);

· return date.

Readers:

· library card number– a numeric field designed to uniquely identify each specific reader in the database;

· surname

· Name– character field, no more than 50 characters;

· surname– character field, no more than 50 characters;

· address– character field, no more than 256 characters;

· telephone– character field, no more than 20 characters.

Selecting unique fields

In a relational database, tables can be related to each other. This relationship is established using unique fields. Unique fields– these are fields in which values cannot be repeated. For example, the series and number of a passport uniquely identifies any person holding a passport. A field (or combination of fields) that uniquely identifies a record in a table is called primary key.The primary key field can also be the serial number of an entry in the catalogue, the personnel number of an enterprise employee, or the article number of a product in retail trade.

For our database, the primary keys are the following fields:

· Books – book code.

· Publishers – publisher code.

· Storage – location code.

· Issue – issue code.

· Readers – number of ticket.

Assigning relationships between tables

Inter-table relationships link two tables using a common field that exists in both tables. There are three types of such connections:

· one to one– each record of table A cannot be associated with more than one record of table B;

· one-to-many– one record in table A can be associated with many records in table B (for example, there can be many students in each class);

· many-to-many– each record in table A can be associated with many records in table B, and each record in table B can be associated with many records in table A (for example, each student can have several teachers, and each teacher can have many students).

Relational databases do not allow many-to-many relationships to be created directly. However, in real life Such relationships occur very often, so they are implemented through auxiliary tables, linking several tables with one-to-many relationships.

In order to link one table to another, you need to enter the primary key field from the first table into the second table, i.e. enter into second table external key. The relationship between two tables is performed by connecting the primary key of the main table (located on the “one” side of the relationship) to the same foreign key field of the related table (located on the “many” side of the relationship). A foreign key field in a related table must be the same data type as the primary key in the parent table, with one exception. If the main table's primary key has a Counter data type, then the foreign key field in the related table must have a Numeric data type.

In our database we will establish the following types of relationships between tables:

1. Authors - Books. Here's the connection many-to-many, any author can have more than one book, and any book can be written by more than one author. Therefore, we introduce an auxiliary table “Authors–books” with the following fields:

· book code.

2. Books – Publishers. Here's the connection many-to-many, any book can be published by several publishing houses, and any publishing house publishes more than one book. Therefore, we introduce another auxiliary table “Books-publishing houses” with the following fields:

· book code;

· publisher code.

3. Storage - Books. Here's the connection one-to-many, many books can be placed on one shelf, but any book can only be on one shelf in storage. Therefore, we define the “Storage Location” field in the “Books” table as a foreign key, and connect the “Storage” and “Books” tables with the primary key “Location Code” and the foreign key “Storage Location”.

4. Books – Issue. Here's the connection one-to-many, i.e. the same book can be issued several times per year different dates different readers. Therefore, we define the “Issue book number” field in the “Issue” table as a foreign key, and connect the “Books” and “Issue” tables with the primary key “Book code” and the foreign key “Issue book number”.

5. Readers - Issue. Here's the connection one-to-many, i.e. the same book can be issued several times to different readers at different times. Therefore, we define the “Reader Code” field in the “Issue” table as a foreign key, and connect the “Readers” and “Issue” tables with the primary key “Library Card Number” and the foreign key “Reader Code”.

Normalization of relations

Having finished designing tables and identifying the relationships that exist between them, you need to carefully double-check the resulting structure before starting to create tables and enter information. Normalization of relations allows you to significantly reduce the amount of stored information and eliminate anomalies in the organization of data storage.

Rule 1: Each table field must represent a unique type of information.

The database we designed has no fields in different tables containing the same information (except for foreign keys).

Rule 2: Each table must have a unique identifier, or primary key, which can consist of one or more fields.

In the database we designed, all tables (with the exception of the auxiliary “Authors - books” and “Publishers - books”) contain a primary key.

Rule 3: For each primary key value, the values in the data columns must relate to the table object and fully describe it.

This rule is used in two ways. First, the table should not contain data that is not related to the object defined by the primary key. For example, although each book requires information about its author, the author is an independent object, and data about him should be in the appropriate table. Secondly, the data in the table must fully describe the object.

Rule 4: It should be possible to change the values of any field (not included in the primary key) without affecting the data of other fields.

The last rule allows you to check whether problems will arise when changing data in tables. Since in the database we designed, the data contained in different fields of the tables is not repeated anywhere, we have the opportunity to adjust the values of any fields (with the exception of primary keys).

Filling the database, creating forms and reports

To determine how well the database structure corresponds to the task at hand and how convenient it is to work with this database, you need to enter a few simple entries. Typically, after this you have to return to the structure of the database and adjust it in accordance with what results were obtained during such a test.

On final stage create forms for entering information into the database, reports for displaying information and queries with the help of which information is retrieved from several tables. If the database is intended to be transferred to other users, then, most likely, it is necessary for someone from outside to check how convenient it is to work with forms and reports.

Received data schema The developed database in MS Access is shown in Fig. 4.1.

Rice. 4.1. Data schema of the developed database in Microsoft Access

Control questions

1. Define an information system.

2. Explain the concept of a database.

3. What is a subject area?

4. Define DBMS.

5. What is a data model?

6. Explain the basic principles of the relational data model.

7. Explain the features Microsoft DBMS Access.

8. What are the main objects of the base Access data?

9. Explain the structure of an Access table.

10. Explain the concepts: request, form, report, data access page, macro, module.

11. What are the main stages of database design?

12. How is the information included in the database selected?

13. Explain the concepts: primary key, foreign key.

14. What is the purpose of relationships between tables?

15. Explain the main types of relationships between tables.

16. What is the normalization of database relationships?

Database Design

Basic concepts about databases and DBMS

Information system (IS) is a system built on the basis of computer technology, designed for storing, searching, processing and transmitting significant amounts of information, having a certain practical scope of application.

Database- This is IP that is stored electronically.

Database (DB)– an organized collection of data intended for long-term storage external memory COMPUTER, constant update and use.

Databases are used to store and search large amounts of information. Database examples: Notebook, dictionaries, reference books, encyclopedias, etc.

Database classification:

1. According to the nature of the stored information:

- Factual – contain brief information about the described objects, presented in a strictly defined format (card files, for example: database of the library’s book collection, database of the institution’s personnel),

- Documentary – contain documents (information) of various types: text, graphic, audio, multimedia (archives, for example: reference books, dictionaries, databases of legislative acts in the field of criminal law, etc.)

2. By data storage method:

- Centralized (stored on one computer),

- Distributed (used in local and global computer networks).

3. According to the data organization structure:

- Relational (tabular),

- Non-relational.

The term “relational” (from the Latin relatio – relationship ) indicates that such a data storage model is built on the relationship of its constituent parts. Relational the database is essentially a two-dimensional table. Each row of such a table is called a record. The columns of the table are called fields: each field is characterized by its name and data type. A database field is a table column containing the values of a specific property.

Properties of the relational data model:

Each table element is one data element;

All table fields are homogeneous, i.e. have one type;

There are no identical entries in the table;

The order of records in the table can be arbitrary and can be characterized by the number of fields and data type.

Hierarchical is called a database in which information is ordered as follows: one element is considered the main one, the rest are subordinates. IN hierarchical In the database, records are arranged in a certain sequence, like the steps of a ladder, and data can be searched by sequentially “descent” from step to step. This model is characterized by such parameters as levels, nodes, connections. The principle of operation of the model is such that several nodes of a lower level are connected using a connection with one node of a higher level.

Node – information model of an element located on this level hierarchy.

Properties of the hierarchical data model:

Several lower-level nodes are connected to only one higher-level node;

A hierarchical tree has only one vertex (the root) and is not subordinate to any other vertex;

Each node has its own name (identifier);

There is only one path from the root record to the more private data record.

The hierarchical database is the Catalog Windows folders, which you can work with by launching Explorer. The top level is occupied by the Desktop folder. At the second level there are the folders My Computer, My Documents, Network Neighborhood and Recycle Bin, which are descendants of the Desktop folder, being twins. In turn, the My Computer folder is an ancestor in relation to third-level folders, disk folders (Disk 3.5 (A:), C:, D:, E:, F:) and system folders (Printers, Control Panel, etc. .).

Network is called a database in which horizontal links are added to the vertical hierarchical relationships. Any object can be a master and a slave.

The network database is actually the World Wide Web computer network Internet. Hyperlinks link hundreds of millions of documents together into a single distributed network database.

Software designed to work with databases is called database management system(DBMS). DBMS are used for orderly storage and processing of large volumes of information.

Database management system(DBMS) is a system that provides search, storage, correction of data, and generation of responses to queries. The system ensures data safety, confidentiality, movement and communication with other software.

The main actions that a user can perform using the DBMS:

Creating a database structure;

Filling the database with information;

Changing (editing) the structure and content of the database;

Searching for information in the database;

Data sorting;

Database protection;

Checking the integrity of the database.

Modern DBMS make it possible to include in them not only textual and graphic information, but also sound fragments and even video clips.

The ease of use of the DBMS allows you to create new databases without resorting to programming, but using only built-in functions. DBMS ensure the correctness, completeness and consistency of data, as well as easy access to them.

Popular DBMS - FoxPro, Access for Windows, Paradox.

Thus, it is necessary to distinguish between databases themselves (DBs) - ordered sets of data, and database management systems (DBMS) - programs that manage the storage and processing of data. For example, Access application included in office suite Microsoft programs Office is a DBMS that allows the user to create and process tabular databases.

Principles of designing control systems databases follow from the requirements that a database organization must satisfy:

- Productivity and availability. Requests from the user by the database are satisfied at the speed required to use the data. The user quickly receives data whenever he needs it.

- Minimum costs. Low cost of storing and using data, minimizing the cost of making changes.

- Simplicity and ease of use. Users can easily find out and understand what data is available to them. Access to data should be simple, exclusive possible mistakes from the user's side.

- Easy to make changes. The database can grow and change without disrupting existing uses of the data.

- Possibility of search. A database user can make a variety of queries regarding the data stored in it. To implement this, a so-called query language is used.

- Integrity. Modern databases can contain data shared by many users. It is very important that during the work the data elements and connections between them are not broken. In addition, hardware errors and various types of random failures should not lead to irreversible data loss. This means that the data management system must contain a data recovery mechanism.

- Security and privacy. Data security means the protection of data from accidental or intentional access to it by unauthorized persons, from unauthorized modification (change) of data or its destruction. Privacy is defined as the right of individuals or organizations to decide when, how, and how much information can be shared with other individuals or organizations.

Below is an example of one of the most common database management systems - Microsoft Access is part of the popular Microsoft package Office - We will learn about basic data types, how to create databases, and how to work with databases.

Database Design

Like anyone software, the database has its own life cycle (LCD). The main component in life cycle DB is the creation of a unified database and the programs necessary for its operation.

LCBD includes the following main stages:

1. Planning for database development;

2. Determination of system requirements;

3. Collection and analysis of user requirements:

4. Database design:

Conceptual Database Design - Creation conceptual model data, that is information model. Such a model is created without focusing on any specific DBMS and data model. Most often, the conceptual database model includes: description information objects, or concepts of the subject area and connections between them; description of integrity constraints, i.e. requirements for acceptable data values and relationships between them;

Logical database design – creating a logical data model; creating a database schema based on a specific data model, such as a relational data model. For a relational data model, a logical model is a set of relationship diagrams, usually specifying primary keys, as well as "links" between relationships, which are foreign keys.

The transformation of a conceptual model into a logical model is usually carried out according to formal rules. This stage may be to a large extent automated.

At the logical design stage, the specifics of a specific data model are taken into account, but the specifics of a specific DBMS may not be taken into account.

Physical database design - creating a database schema for a specific DBMS, creating a description of the DBMS. The specifics of a particular DBMS may include restrictions on the naming of database objects, restrictions on supported data types, etc. In addition, the specifics of a particular DBMS during physical design include the choice of solutions related to the physical data storage environment (choice of disk memory management methods, division of the database into files and devices, data access methods, development of data protection tools), creation of indexes, etc. .;

5. Application development:

Transaction design (group SQL statements(a set of commands) executed as a whole);

Design user interface;

6. Implementation;

8. Testing;

9. Operation and maintenance:

Functional analysis and support original version DB;

Adaptation, modernization and support for redesigned options.

Database Design– the process of creating a database schema and defining necessary restrictions integrity (compliance of the information in the database with its internal logic, structure and all explicitly specified rules).

Main tasks of database design:

Ensuring that all necessary information is stored in the database.

Ensuring the ability to obtain data for all necessary requests.

Reduce data redundancy and duplication.

Ensuring database integrity.

Database design steps

The design process includes the following stages:

1. Infological design.
2. Determining the requirements for the operating environment in which the information system will operate.
3. Selecting a database management system (DBMS) and other tools software.
4. Datalogical (logical) database design.
5. Physical design of the database.

In the first stage, the developer (database administrator) combines private views of the database contents obtained from user interviews with his own views of the data that may be needed in future applications, creating generalized informal description of a database. This description is done using natural language, mathematical formulas, tables, graphs and other tools that are understandable to all people working on database design. This description of the subject area is called infological data model.

The infological data model is a human-oriented model and is completely independent of the physical parameters of the data storage environment. Such a data storage medium may be human memory rather than a computer. Therefore, the information model does not change until some changes in the real world require appropriate changes to be made to it so that this model continues to reflect the subject area.

The remaining models, datalogical and physical, are computer-oriented. With their help, the DBMS allows programs and users to access stored data only by their names, without worrying about the physical location of this data. The necessary data is found by the DBMS on external storage devices using physical data model.

Since the specified access is carried out using a specific DBMS, the models must be described in the data description language of this DBMS. This description is called datalogical data model.

Three-level architecture (infological, datalogical and physical layers) allows you to ensure the independence of stored data from the programs that use them. The developer can, if necessary, rewrite the stored data to other storage media or reorganize its physical structure, changing only physical model data. The DBA can connect any number of new users (new applications) to the system, adding, if necessary, to the datalogical model. These changes to the physical and datalogical models will not be noticed by existing users of the system (they will be “transparent” to them), just as new users will not be noticed. Therefore, data independence enables the development of a database system without disrupting existing applications.

Infological (information-logical) model. The goal of the infological design stage is to obtain semantic (conceptual) models that reflect the subject area and information needs users. Therefore, this stage is also called semantic modeling. Semantic modeling is a modeling of data structure based on the meaning of this data.

Concept "Subject area"- basic in database theory and does not have a strict definition. It follows from the concepts of “object” and “subject”. Subject area(BY)- Part real world, to be studied with the aim of organizing management and, ultimately, automation. Software seems to be a lot fragments, which are characterized by many objects, many processes using objects, as well as many users, characterized by a single view of the subject area.

Object called phenomenon outside world. This is either something that really exists - a person, a product, a product, or a process - birth registration, receipt of goods, production of products. Each object has a huge amount properties.

Examples.

An object " Human"has the properties: height, name, date of birth...,

an object - " Product"has properties: quality, date of manufacture, appearance….

There are numerous connections between objects. For example:

· Human buys, sells, produces Product
· Product created, bought, sold Human.

Item - a model of a real object, in which only the properties and connections allocated for the IS are recorded. The totality of selected items forms object kernel subject area, and the totality of their relationships - structure of a fragment of reality . That. the concept of “Subject Domain” corresponds to the consumer’s point of view on the object core: it identifies only those objects, properties of objects and connections between objects that are valuable for the IS and should be stored in the database.

All actions to identify the core of the subject area are carried out at the stage of IS analysis.

The object core of the system does not remain constant during the life cycle of an IS: objects disappear and appear, their properties and relationships change. The chains of these changes recorded over time are called trajectories of the subject area, and the set of general properties of the trajectory is domain semantics

Available whole line domain modeling techniques. One of the currently most popular techniques is based on the use of graphical diagrams that include a small number of heterogeneous ERD (Entity-Relationship Diagrams) components. In Russian-language literature these diagrams are called “object - relationship” or “essence - connection”.

The ERD model was proposed in 1976. Peter Ping-Sheng Chen. Subsequently, many authors developed their own versions of such models: Martin notation, IDEF1X notation, Barker notation), but they are all based on graphic charts, proposed by Chen.

Most modern design approaches are based on the use of variations of the ER model. relational databases data.

In fact, all variants of entity-relationship diagrams come from the same idea - a drawing is always clearer text description. All such diagrams use a graphical representation of domain entities, their properties (attributes), and relationships between entities.

We will get acquainted with ER diagrams in Barker notation, which is quite easy to understand the main ideas.

Basic concepts of ER diagrams. The main concepts of the ER model are entity, relationship and attribute.

For greater expressiveness and better understanding, the entity name can be accompanied by examples of specific objects of this type.

Definition 1. Essence is a real or imaginable object, information about which must be stored and accessible. Entities can be people, places, planes, flights, taste, color, etc.

Each entity must have a name expressed by a singular noun. In this case, the name of the entity is the name of the type, and not of some specific instance of this type. The concept of entity type refers to a set of homogeneous individuals, objects, events or ideas acting as a whole.

Examples of entities can be such object classes as “Supplier”, “Employee”, “Invoice”.

Each entity in the model is depicted as a rectangle containing the entity name:

Definition 2. Entity instance is a specific representative of a given entity.

For example, a representative of the “Employee” entity can be “Employee Ivanov”.

Entity instances must be distinguishable, i.e. entities must have some properties that are unique to each instance of that entity.

Definition 3. Entity attribute is a named characteristic of an entity. Its name must be unique for specific type entity, but can be the same for different types of entities (for example, COLOR can be defined for many entities: DOG, CAR, PAINT, etc.). Attributes are used to define what information should be collected about an entity. Examples of attributes for the CAR entity are TYPE, MAKE, LICENSE PLATE, COLOR, etc.

Here too there is a distinction between attribute type and instance. The COLOR attribute type has many instances or values: Red, Blue, Banana, White Night, etc., but each entity instance is assigned only one attribute value.

There is no absolute difference between entity types and attributes. An attribute is such only in relation to the entity type. In another context, an attribute may act as an independent entity. For example, for an automobile plant, color is only an attribute of the production product, but for a paint and varnish factory, color is an entity type.

Each attribute is provided with a name that is unique within the entity. The name of the attribute must be expressed as a singular noun (possibly with characterizing adjectives).

Examples of attributes of the “Employee” entity can be such attributes as “Personnel number”, “Last name”, “First name”, “Patronymic”, “Position”, “Salary”, etc.

Attributes are depicted within a rectangle defining the entity:

Attributes can be classified into one of three different types: descriptive, indicative, or auxiliary.

Descriptive attributes represent facts intrinsic to each instance of an entity.

Pointing attributes are used to give a name or designation to instances of an entity.

Auxiliary attributes are used to associate an instance of one entity with an instance of another. Attributes are subject to strictly defined rules.

Definition 4. Entity Key - minimum set attributes whose values can be used to uniquely find the required instance of an entity. Minimality means that excluding any attribute from the set does not allow the entity to be identified by the remaining ones.

For example, for an entity Schedule the key is the attribute Flight_number or set: Point of departure, Departure time And Destination(provided that one plane flies from point to point at any given time).

An entity can have several different keys.

Key attributes are shown underlined on the diagram:

Definition 5. Connection - this is some kind of association between two entities. One entity can be connected to another entity or to itself. Relationships allow one entity to find other entities related to it.

If the purpose of the database was only to store individual, unrelated data, then its structure could be very simple. However, one of the main requirements for organizing a database is to ensure the possibility of finding some entities by the values of others, for which it is necessary to establish between them certain connections. And since real databases often contain hundreds or even thousands of entities, theoretically more than a million connections can be established between them. The presence of such a multitude of connections determines the complexity of information models.

For example, connections between entities can be expressed by the following phrases - “An EMPLOYEE can have several CHILDREN”, “Each EMPLOYEE must be enrolled in exactly one DEPARTMENT”.

Graphically, the relationship is depicted by a line connecting two entities:

Each link has two ends and one or two names. The name is usually expressed in an indefinite verbal form: “to have”, “to belong”, etc. Each name refers to its own end of the connection. Sometimes names are not written because they are obvious.

Each link can have one of the following types of communication :

Communication type one to one means that one instance of the first entity (left) is associated with one instance of the second entity (right). A one-to-one relationship most often indicates that we actually have only one entity, incorrectly divided into two.

Communication type one-to-many means that one instance of the first entity (left) is associated with several instances of the second entity (right). This is the most commonly used type of communication. The left entity (on the "one" side) is called parental , right (from the “many” side) - subsidiary . (see pic. for graphical representation of the connection)

Communication type many-to-many means that each instance of the first entity can be associated with multiple instances of the second entity, and each instance of the second entity can be associated with multiple instances of the first entity. The many-to-many type of relationship is temporary type of communication acceptable in the early stages of model development. In the future, this type of relationship must be replaced by two one-to-many relationships by creating an intermediate entity.

Each connection can have one of two communication modalities :

Modality" Maybe may be related with one or more instances of another entity, or maybe not related not a single copy.

Modality" must " means that an instance of one entity must be associated with at least one an instance of another entity.

Communication may have different modality from different ends.

The described graphical syntax allows definitely read the diagrams using the following phrase structure:

<Каждый экземпляр СУЩНОСТИ 1> <МОДАЛЬНОСТЬ СВЯЗИ> <НАИМЕНОВАНИЕ СВЯЗИ> <ТИП СВЯЗИ> <экземпляр СУЩНОСТИ 2>.

Each link can be read either from left to right or from right to left. For example, the relationship presented in Figure 4 above reads like this:

From left to right: "each employee can have several children."

From right to left: “Each child must belong to exactly one employee.”

Normal forms of ER circuits. As in relational database diagrams, ER diagrams introduce the concept of normal forms, and their meaning closely matches the meaning of relational normal forms. We will give only very brief and informal definitions of the first three normal forms.

IN first normal form of the ER diagram, duplicate attributes or groups of attributes are eliminated, i.e. implicit entities “disguised” as attributes are identified.

In second normal form, attributes that depend only on part of the unique identifier (entity key) are eliminated. This part of the unique identifier identifies an individual entity.

IN third normal The form eliminates attributes that depend on attributes that are not included in the unique identifier (entity key). These attributes are the basis of a single entity.

At correct definition entities, the resulting tables will immediately be in 3NF. The main advantage of the method is that the model is built by successive refinements of the initial diagrams.

Getting the relational schema from the ER schema:

Step 1. Every simple entity turns into a table. A simple entity is an entity that is not a subtype and has no subtypes. The entity name becomes the table name.

Step 2. Each attribute becomes a possible column with the same name; a more precise format can be selected. Columns matching not mandatory attributes, may contain undefined values; columns corresponding to required attributes cannot.

Step 3. The components of the entity's unique identifier become the table's primary key. If there are several possible unique identifiers, the most used one is chosen. If the unique identifier includes relationships, a copy of the unique identifier of the entity at the far end of the relationship is added to the number of primary key columns (this process can continue recursively). These columns are named using relationship end names and/or entity names.

Step 4. Many-to-one (and one-to-one) relationships become foreign keys. Those. A copy of the unique identifier from the "one" end of the relationship is made, and the corresponding columns constitute the foreign key. Optional relationships correspond to nullable columns; mandatory relationships - for columns that do not allow null values.

Step 5. Indexes are created on the primary key (a unique index), foreign keys, and those attributes on which queries are primarily intended to be based.

Step 6. If there were subtypes in the conceptual scheme, then two ways are possible:

· all subtypes in one table(s)
· for each subtype - separate table(b)

Method (a) creates a table for the outermost supertype, and views can be created for subtypes. At least one column containing the TYPE code is added to the table; it becomes part of the primary key.

When using method (b), for each subtype of the first level (for lower ones - representations), the supertype is recreated using the UNION representation (from all subtype tables, common columns- supertype columns).

Everything in one table	Table - per subtype
Advantages
Everything is kept together Easy access to supertype and subtypes Fewer tables required	Subtyping rules are clearer Programs work only with the necessary tables
Flaws
Too general a solution Requires additional logic to handle different sets of columns and different constraints Potential bottleneck (due to blocking) Subtype columns should be optional Some DBMSs require additional memory to store null values	Too many tables Confusing columns in UNION view Potential performance loss when working through UNION Modifications are not possible on a supertype.

Step 7 There are two ways to work with exclusive relationships:

· generic domain(A)
· explicit foreign keys (b)

If the remaining foreign keys are all in the same domain, i.e. have a common format (method(a)), then two columns are created: the relationship identifier and the entity identifier. The link ID column is used to distinguish the links covered by the exclusion arc. The entity identifier column is used to store the unique identifier values of the entity at the far end of the corresponding relationship.

If the resulting foreign keys are not in the same domain, then explicit foreign key columns are created for each relationship covered by the exclusion arc; all of these columns can contain null values.

An example of developing a simple ER model. When developing ER models, we must obtain the following information about the subject area:

1. List of domain entities.
2. List of entity attributes.
3. Description of the relationships between entities.

ER diagrams are convenient because the process of identifying entities, attributes and relationships is iterative. Having developed the first approximate version of the diagrams, we refine them by interviewing subject matter experts. At the same time, the documentation in which the results of the conversations are recorded are the ER diagrams themselves.

Let's assume that we are faced with the task of developing an information system for a certain wholesale trading company. First of all, we must study the subject area and the processes occurring in it. To do this, we interview company employees, read documentation, study order forms, invoices, etc.

For example, during a conversation with a sales manager, it turned out that he (the manager) believes that the system being designed should perform the following actions:

· Store customer information.
· Print invoices for goods released.
· Monitor the availability of goods in the warehouse.

Let's select all the nouns in these sentences - these will be potential candidates for entities and attributes, and analyze them (we will highlight unclear terms with a question mark):

· Buyer
· Invoice is a clear candidate for the entity.
· Product- a clear candidate for the entity
· (?)Stock- In general, how many warehouses does the company have? If there are several, then it will be a candidate for a new entity.
· (?)Product availability- this is most likely an attribute, but an attribute of what entity?

An obvious connection immediately arises between the entities - “buyers can buy many goods” and “goods can be sold to many buyers.” The first version of the diagram looks like this:

After asking additional questions to the manager, we found out that the company has several warehouses. Moreover, each product can be stored in several warehouses and be sold from any warehouse.

Where should I place the entities “Invoice” and “Warehouse” and what should I link them to? Let us ask ourselves, how are these entities related to each other and to the entities “Buyer” and “Product”?

· Buyers purchase goods and receive invoices containing data on the quantity and price of the purchased goods.
· Each buyer can receive several invoices.
· Each invoice must be issued to one buyer.
· Each invoice must contain several goods (there are no empty invoices). Each product, in turn, can be sold to several buyers through several invoices.
· In addition, each invoice must be issued from a specific warehouse, and many invoices can be issued from any warehouse.

Thus, after clarification, the diagram will look like this:

infological attribute information display

It's time to think about entity attributes. Speaking with employees of the company, we found out the following:

· Each buyer is a legal entity and has a name, address, and bank details.
· Each product has a name, price, and is also characterized by units of measurement.
· Each invoice has a unique number, date of issue, a list of goods with quantities and prices, as well as the total amount of the invoice. The invoice is issued from a specific warehouse and to a specific buyer.
· Each warehouse has its own name.

Let's write down all the nouns that will be potential attributes again and analyze them:

· Entity- the term is rhetorical, we do not work with individuals. We don't pay attention.
· Buyer's name
· Address- a clear characteristic of the buyer.
· Bank details - a clear characteristic of the buyer.
· Name of product
· (?)The price of the product- it seems that this is a characteristic of the product. Does this characteristic differ from the price on the invoice?
· Unit- a clear characteristic of the product.
· Invoice number- a clear unique characteristic of the invoice.
· Invoice date- a clear characteristic of the invoice.
· (?)List of goods in the invoice- a list cannot be an attribute. You probably need to separate this list into a separate entity.
· (?)Quantity of goods in the invoice- this is an obvious characteristic, but a characteristic of what? This is a characteristic of not just a “product”, but a “product in the invoice”.
· (?)The price of the goods in the invoice- again, this should not just be a description of the product, but a description of the product in the invoice. But the price of the product has already been seen above - is it the same thing?
· Invoice amount- a clear characteristic of the invoice. This characteristic is not independent. The amount of the invoice is equal to the sum of the costs of all goods included in the invoice.
· Warehouse name- a clear characteristic of the warehouse.

During an additional conversation with the manager, it was possible to clarify various concepts prices It turned out that each product has a certain current price. This is the price at which the product is sold in this moment. Naturally, this price may change over time. The price of the same product in different invoices issued in different time, may be different. Thus there is two prices- the price of the goods in the invoice and the current price of the goods.

With the emerging concept of “List of goods in the invoice” everything is quite clear.

The entities "Invoice" and "Product" are related to each other by a relationship of type many-to-many. Such a connection, as we noted earlier, should be split into two one-to-many relationships. This requires additional essence.

This entity will be the “List of goods in the invoice” entity. Its connection with the entities “Invoice” and “Product” is characterized by the following phrases

- “each invoice must have several entries from the list of goods in the invoice”,

- “each entry from the list of goods in the invoice must be included in exactly one invoice”,
- "each product can be included in several records from the list of goods in the invoice",
- “each entry from the list of goods in the invoice must be associated with exactly one product.”

The attributes "Quantity of goods in the invoice" and "Price of the goods in the invoice" are attributes of the entity "List of goods in the invoice".

We will do the same with the connection connecting the entities “Warehouse” and “Product”. Let's introduce additional entity "Item in warehouse". The attribute of this entity will be “Quantity of goods in stock”. Thus, the product will be listed in any warehouse and its quantity in each warehouse will be different.

Now you can put all this into a diagram:

Conceptual and physical ER models. The example ER diagram developed above is an example concept diagram. This means that the diagram does not take into account features of a specific DBMS. From this conceptual diagram you can construct physical diagram, which will already take into account such features of the DBMS as permissible types and names of fields and tables, integrity restrictions, etc. A physical version of the above diagram might look, for example, like this:

In this diagram, each entity represents a database table, each attribute becomes a column of the corresponding table. Please note that in many tables, for example, "CUST_DETAIL" and "PROD_IN_SKLAD", corresponding to the entities "Invoice list record" and "Item in warehouse", new attributes have appeared that were not in the conceptual model - these are the key attributes of the parent tables , migrated into child tables in order to provide relationships between tables using foreign keys.

The resulting tables are in 3NF.

Entity-relationship diagrams allow you to use visual graphic symbols for modeling entities and their relationships.

Distinguish conceptual And physical ER diagrams. Conceptual diagrams do not take into account the specific features of specific DBMSs. Physical diagrams are built on conceptual ones and represent a prototype of a specific database. Entities defined in the conceptual diagram become tables, attributes become table columns (taking into account the data types and column names allowed for a given DBMS), connections are implemented by migration key attributes of parent entities and creating foreign keys.

More complex elements of the ER model. We focused only on the most basic and most obvious concepts of the ER data model. More complex elements of the model include the following:

· Subtypes and supertypes of entities. As in programming languages with developed type systems (for example, in object-oriented programming languages), the possibility of inheriting an entity type based on one or more supertypes is introduced.

An entity can be split into two or more mutually exclusive subtypes, each of which includes common attributes and/or relationships. These common attributes and/or relationships are explicitly defined once over high level. Subtypes can define their own attributes and/or relationships. In principle, subtyping can continue for more low levels, but experience shows that in most cases two or three levels are sufficient.

The entity on the basis of which subtypes are defined is called a supertype. Subtypes must form a complete set, i.e. any instance of a supertype must belong to some subtype. Sometimes, for completeness, it is necessary to define an additional subtype OTHERS.

Example: Supertype AIRCRAFT

How are you supposed to read this? From supertype: AIRCRAFT, which must be an AIRPLANE, HELICOPTER, BIRD CLICKER or OTHER AIRCRAFT. From the subtype: HELICOPTER, which belongs to the type of AIRCRAFT. From a subtype, which is also a supertype: AIRPLANE, which belongs to the type of AIRCRAFT and must be a GLIDER or MOTOR PLANE.

Sometimes it is convenient to have two or more different subtypes of an entity. For example, the essence of PERSON can be divided into subtypes based on professional characteristics (PROGRAMMER, MILKmaid, etc.), or perhaps based on gender (MAN, WOMAN).

· Many-to-many connections. Sometimes it is necessary to link entities in such a way that there may be multiple instances of the entity at both ends of the link (for example, all members of a cooperative jointly own the property of the cooperative). To do this, a type of “many-to-many” relationship is introduced.
· Specified degrees of connection. Sometimes it is useful to define the possible number of entity instances participating in a given relationship (for example, an employee is allowed to participate in no more than three projects at a time). To express this semantic constraint, it is allowed to indicate at the end of the connection its maximum or mandatory degree.
· Cascade deletions of entity instances. Some relationships are so strong (in the case of a one-to-many relationship, of course) that when you delete the reference entity instance (corresponding to the one end of the relationship), you must also delete all entity instances corresponding to the many end of the relationship. The corresponding requirement for "cascading deletion" can be formulated when defining an entity.
· Domains. As with the relational data model, it is useful to be able to define a potentially valid set of values for an entity (domain) attribute.

The most correct intuitive interpretation of the concept of a domain is to understand the domain as an admissible potential set of values of a given type. For example, the domain "Names" is defined on the base type of character strings, but its values can only include strings that can represent a name (in particular, such strings cannot begin with a soft character).

It should also be noted the semantic load of the domain concept: data are considered comparable only if they belong to the same domain. In our example, the domain values "Gap Numbers" and "Group Numbers" are of type integer, but are not comparable.

These and other more complex elements of the Entity-Relationship data model make it significantly more powerful, but at the same time make it somewhat more difficult to use.

Database Design Stages

It is impossible to create a database without a detailed description of it, just as it is not possible to make any complex product without a drawing and a detailed description of the technologies for its production. In other words, we need a project. Project It is generally accepted to consider a sketch of some device, which will later be translated into reality.

The database design process is a process of transitions from informal verbal description information structure subject area to a formalized description of subject area objects in terms of a certain model. The ultimate goal of design is to build a specific database. Obviously, the design process is complex and therefore it makes sense to divide it into logically completed parts - stages.

There are five main stages of database design:

1. Collection of information and system analysis of the subject area.

2. Infological design.

3. Selecting a DBMS.

4. Datalogical design.

5. Physical design.

Collection of information and system analysis of the subject area- this is the first and the most important stage when designing a database. It is necessary to carry out a detailed verbal description of the objects of the subject area and the real connections present between real objects. It is desirable that the description defines the relationships between objects in the subject area.

In general, there are two approaches to choosing the composition and structure of a subject area:

· Functional approach – is used when the functions of a certain group of persons and the sets of tasks for which this database is created are known in advance, i.e. minimal is clearly visible necessary set objects of the subject area for description.

· Subject approach– when the information needs of database customers are not clearly recorded and can be multidimensional and dynamic. IN in this case It is difficult to select a minimum set of domain objects. The description of the subject area includes such objects and relationships that are most characteristic and essential for it. At the same time, the database becomes subject-specific and is suitable for solving many problems (which seems the most tempting). However, the difficulty of universal coverage of the subject area and the impossibility of specifying user needs leads to redundant complex scheme A database that will be ineffective for some tasks.

System analysis must end detailed description information about objects of the subject area that should be stored in the database, the formulation of specific tasks that will be solved using this database with brief description algorithms for their solution, description of output and input documents when working with the database.

Infological design– a partially formalized description of objects in the subject area in terms of a certain semantic model.

Why is an information model needed, and how does it benefit designers? The fact is that the design process is lengthy and requires discussions with the customer and subject matter experts. In addition, when developing serious corporate information systems, the database project is the foundation on which the entire system is built, and the question of the possibility of lending is often decided by bank experts on the basis of a well-made infological database project. Consequently, the information model should include such a formalized description of the subject area that will be easily perceived not only by database specialists. The description should be so capacious that one can assess the depth and correctness of the development of the database project.

Today, Chen's Entity Relationship model has become the most widely used; it has become the de facto standard in information modeling, and is called the ER model.

Selecting a DBMS is carried out on the basis of various requirements for the database and, accordingly, the capabilities of the DBMS, as well as depending on the existing experience of the developers.

Datalogical design there is a description of the database in terms of the accepted data logical data model. In relational databases, datalogical or logical design leads to the development of the database schema, i.e. sets of relationship schemes that adequately model objects of the subject area and semantic relationships between objects. The basis for analyzing the correctness of a circuit is functional dependencies between database attributes. In some cases, unwanted dependencies may appear between relationship attributes, which cause side effects and anomalies when modifying the database. Under modification understand adding new data to the database, deleting data from the database, as well as updating the values of some attributes. To eliminate possible anomalies, it is planned to normalize database relations.

The logical design stage is not just about designing the relationship diagram. As a result of this stage, as a rule, the following resulting documents should be obtained:

· Description of the conceptual schema of the database in terms of the selected DBMS.

· Description of external models in terms of the selected DBMS.

· Description of declarative rules for maintaining database integrity.

· Development of procedures for maintaining the semantic integrity of the database.

Physical design consists in linking the logical structure of the database and the physical storage environment in order to most efficiently place data, i.e. mapping the logical structure of the database into the storage structure. The issue of placing stored data in memory space and selecting effective methods access to various components“physical” database, issues of ensuring data security and integrity are resolved. Constraints in the logical data model are implemented by various means DBMS, for example, using indexes, declarative integrity constraints, triggers, stored procedures. In this case, again, decisions made at the level of logical modeling determine some boundaries within which the physical data model can be developed. Likewise, within these boundaries one can accept various solutions. For example, the relationships contained in the logical data model must be converted into tables, but various indexes can optionally be declared on each table to improve the speed of accessing the data.

Additionally, features can be used to improve performance. parallel processing data. As a result, the database can be located on several network computers. On the other hand, the advantages of multiprocessor systems can be used.

To ensure the safety and security of data, issues of recovery after failures are resolved, Reserve copy information, setting up protection systems to suit the selected security policy, etc.

It should be noted that some modern relational DBMSs mainly use physical structures and access methods based on file design technology, which essentially eliminates the issue of physical design.

Thus, it is clear that decisions made at each stage of modeling and database development will affect subsequent stages. That's why acceptance plays a special role right decisions in the early stages of modeling.

Federal Agency for Education

State educational institution higher professional education

AMUR STATE UNIVERSITY

(GOUVPO "AmSU")

TEST

in the discipline "Information systems in economics"

on the topic: “Principles of construction and stages of database design”

Executor

student of group C – 81 N.A. Vokhmyanina

Supervisor

Associate Professor, Ph.D. D. G. Shevko

Blagoveshchensk 2010

Introduction

1. Principles of database construction

2. Database Construction Concepts

3. Database design stages

Bibliography

INTRODUCTION

The perception of the real world can be correlated with a sequence of different, although sometimes interrelated, phenomena. Since ancient times, people have tried to describe these phenomena (even when they could not understand them). This description is called data.

Traditionally, data is captured using a specific means of communication, for example, using natural language in a specific medium.

Currently, the successful functioning of various firms, organizations and enterprises is simply not possible without a developed information system that allows you to automate the collection and processing of data. Typically, a database is created to store and access data containing information about a certain subject area.

Database (DB)- a named collection of data reflecting the state of objects and their relationships in the subject area under consideration.

A subject area is usually understood as a certain area of human activity or an area of the real world that is subject to study for organizing management and automation, for example, an enterprise, a university, etc.

Database management system (DBMS)- a set of language and software tools designed to create, fill, update and delete databases.

Programs with which users work with the database are called applications.

1. PRINCIPLES OF DATABASE CONSTRUCTION

The following basic requirements apply to modern databases, and, consequently, to the DBMS on which they are built.

1. High performance (short response time to a request).

Response time is the time interval from the moment of a request to the database to the actual receipt of data. A similar term is access time - the time interval between the issuance of a write (read) command and the actual receipt of data. Access refers to the operation of searching, reading or writing data. Often the operations of writing, deleting and modifying data are called update operations.

2. Easy to update data.

3. Data independence.

4. Sharing data from many users.

5. Data security - protecting data from intentional or unintentional breach of confidentiality, distortion or destruction.

6. Standardization of the construction and operation of the database (in fact, a DBMS).

8. Friendly user interface.

The most important are the first two contradictory requirements: increased performance requires simplification of the database structure, which, in turn, complicates the procedure data updates, increases their redundancy.

Data independence- the ability to change the logical and physical structure of the database without changing user perceptions.

Data independence implies invariance to the nature of data storage, software and technical means. It ensures minimal changes to the database structure when the data access strategy and the structure of the source data themselves change. This is achieved by “pushing” all changes to the conceptual and logical design stages with minimal changes to the physical design stage.

Data Security includes their integrity and protection.

Data integrity - the resistance of stored data to damage and destruction associated with faults technical means, system errors and erroneous user actions.

It assumes:

1. absence of inaccurately entered data or two identical entries about the same fact;

2. protection against errors when updating the database;

3. inability to delete (or cascade delete) related data from different tables;

4. non-distortion of data when working in multi-user mode and in distributed databases data;

5. data safety in case of equipment failures (data recovery).

Integrity is ensured by integrity triggers - special application programs that operate under certain conditions. Data protection from unauthorized access involves restricting access to confidential data and can be achieved:

1. introducing a password system;

2. obtaining permissions from the database administrator (DBA);

4. formation of views - tables derived from the original ones and intended for specific users.

The last three procedures are easily performed within the Structured Query Language - SQL, often called SQL2.

Standardization ensures the continuity of DBMS generations and simplifies the interaction of databases of the same generation of DBMS with the same and different data models. Standardization (ANSI/SPARC) has been carried out to a large extent in terms of the DBMS user interface and SQL language. This made it possible to successfully solve the problem of interaction between various relational DBMSs both using the SQL language and using the Open DataBase Connection (ODBC) application. In this case, both local and remote access to data (client/server technology or network option).

2. CONCEPT OF BUILDING A DATABASE

There are two approaches to building a database, based on two approaches to creating an automated control system (ACS).

The first of them, widely used in the 80s and therefore called classical (traditional), associated with the automation of document flow (a set of documents moving during the operation of an enterprise). The source and output coordinates were documents, as can be seen from example 1.

The following thesis was used. Data is less mobile than algorithms, so you should create a universal database that can then be used for any algorithm. However, it soon became clear that creating a universal database was problematic. The concept of data integration, which was dominant until recently, turned out to be untenable with a sharp increase in its volume. Moreover, applications began to appear (for example, text, graphic editor), based on widely used standard algorithms.

By the 90s, a second one had formed, modern approach related to control automation. It involves the initial identification of standard application algorithms (business algorithms in foreign terminology), under which the data, and therefore the database, are defined. Object-oriented programming has only increased the importance of this approach.

The database can operate in single and multi-user modes (several users connect to one computer through different ports).

They use bottom-up and top-down database design. The first is used in distributed databases when integrating designed local databases, which can be implemented using various data models. More typical for centralized databases is top-down design.

3. STAGES OF DATABASE DESIGN

Database design occurs in four stages.

At the stage formulation and analysis of requirements The goals of the organization are established and the requirements for the database are determined. They consist of general requirements, defined in section 1, and specific requirements. To formulate specific requirements, the technique of interviewing personnel is usually used different levels management. All requirements are documented in a form accessible to the end user and the database designer.

Stage conceptual design consists of describing and synthesizing user information requirements into the initial database design. The source data can be a set of user documents in the classical approach or application algorithms (business algorithms) in modern approach. The result of this stage is a high-level representation (in the form of a system of database tables) of user information requirements based on various approaches.

First, the database model is selected. Then a database structure is created, which is filled with data using menu systems, screen forms or in the database table viewing mode. Here, the protection and integrity (including referential integrity) of data is ensured using a DBMS or by constructing triggers.

In progress logical design the high-level representation of the data is converted into the structure of the DBMS used. The main goal of this stage is to eliminate data redundancy using special normalization rules. The purpose of normalization is to minimize data repetition and possible structural changes in the database during update procedures. This is achieved by dividing (decomposing) one table into two or more and then using navigation operations in queries. Note that navigation search reduces the performance of the database, i.e. increases request response time. The resulting logical structure of the database can be quantified using various characteristics (number of accesses to logical records, volume of data in each application, total volume of data). Based on these assessments, the logical structure can be improved to achieve greater efficiency.