Relational algebra in databases: operations, examples. Unary renaming operation. During the laboratory work, operations with databases in general were studied. acquired skills in using the "ibexpert" application to create, delete, register
The relationship between an owner record and a member record is also 1:N.The main difference between these models is that in the network model, a record can be a member of more than one group relationship. According to this model, each group relation is named and a distinction is made between its type and instance. A group relationship type is specified by its name and defines properties common to all instances of this type. A group relationship instance is represented by an owner record and a set of (possibly empty) subordinate records. There is the following limitation: record instance cannot be a member of two instances of group relationships of the same type (i.e., the employee from the example in paragraph 1, for example, cannot work in two departments).
- trees (a) and (b) shown in Fig. 4.2 are replaced by one network structure in which the EMPLOYEE record is included in two group relationships;
- to display the M:N type, enter the EMPLOYEE_CONTRACT record, which has no fields and serves only to link the CONTRACT and EMPLOYEE records (see Fig. 4.3). Note that this record can also store useful information, for example, the share of a given employee in the total remuneration under a given contract.
Rice. 4.3.
Each instance of a group relationship is characterized by the following characteristics:
How to arrange subordinate records:
- arbitrary,
- chronological /queue/,
- reverse chronological /stack/,
- assorted.
If a record is declared subordinate to several group relationships, then each of them can have its own ordering method assigned.
Mode for enabling subordinate records:
- automatic - it is impossible to enter a record into the database without it being immediately assigned to a certain owner;
- manual - allows you to remember a subordinate record in the database and not immediately include it in an instance of a group relationship. This operation is later initiated by the user.
Exception mode.
It is customary to distinguish three classes of membership of subordinate records in group relations:
- Fixed. A subordinate record is tightly linked to an owner record and can only be removed from the group relationship by deleting it. When you delete an owner record, all subordinate records are automatically deleted. In the example above, fixed membership assumes a group relationship of CONCLUSION between the CONTRACT and CUSTOMER records, since a contract cannot exist without a customer.
- Mandatory. It is possible to switch a subordinate record to another owner, but it cannot exist without an owner. To delete an owner record, it must have no subordinate records with required membership. The records “EMPLOYEE” and “DEPARTMENT” are connected by this relationship. If a department is disbanded, all of its employees must either be transferred to other departments or fired.
- Optional. You can exclude a record from a group relationship, but keep it in the database without assigning it to a different owner. When an owner record is deleted, its subordinate records - optional members - are retained in the database, no longer participating in a group relationship of this type. An example of such a group relationship is “PERFORM” between “EMPLOYEES” and “CONTRACT”, since there may be employees in the organization whose activities are not related to the fulfillment of any contractual obligations to customers.
Operations on data in a network database model
Add | - make a record in the database and, depending on the inclusion mode, either include it in a group relationship, where it is declared subordinate, or not include it in any group relationship. |
Include in group relation | - link an existing subordinate record to the owner record. |
Switch | - link an existing subordinate record to another owner record in the same group relationship. |
Update | - change the value of the elements of a previously extracted record. |
Extract | - extract records sequentially by key value, as well as using group relationships - you can go from the owner to member records, and from a subordinate record to the owner of the set. |
Delete | - remove a record from the database. If this record is the owner of a group relationship, then the membership class of the subordinate records is analyzed. Mandatory members must first be excluded from the group relationship, fixed members must be deleted along with the owner, optional members will remain in the database. |
Exclude from group relationship | - break the connection between the owner record and the member record. |
Integrity Constraints
As in hierarchical model Only the maintenance of referential integrity is ensured (the owner of the relationship is a member of the relationship).
Advantages and disadvantages of early DBMSs
Advantages of early DBMS:
- advanced means of managing data in external memory at a low level;
- the ability to manually build effective application systems;
- the ability to save memory by separating subobjects (in network systems)
Disadvantages of early DBMS:
- difficulty of use;
- high level of knowledge requirements about the physical organization of the database;
- dependence of application systems on the physical organization of the database;
- overloading the logic of application systems with details of organizing access to the database.
Both hierarchical and network data model requires the presence of highly qualified programmers. And even in such cases, the implementation of user requests is often delayed for a long time.
Object-oriented DBMS
The emergence of object-oriented DBMSs was caused by the needs of programmers in OO languages, who needed tools for storing objects that did not fit in the computer's RAM. Also important was the task of saving the state of objects between repeated launches of the application program. Therefore, most OODBMSs are a library whose data management procedures are included in the application program. Examples of OODBMS implementations like dedicated server databases are extremely rare.
It should immediately be noted that the generally accepted definition of " object-oriented data model" does not exist. Now we can only talk about a certain “object” approach to the logical representation of data and about various object-oriented ways of implementing it.
We know that any data model must include three aspects: structural, holistic and manipulation. Let's see how they are implemented on an object-oriented basis programming paradigms.
Structure
The structure of the object model is described using three key concepts:
encapsulation | - each object has some internal state (stores a record of data inside itself), as well as a set of methods - procedures, with the help of which (and only in this way) you can access the data that determines the internal state of the object, or change them. Thus, objects can be considered as independent entities, separated from the external world; |
inheritance | - implies the ability to create new classes of objects from classes of objects that inherit the structure and methods of their ancestors, adding to them features that reflect their own individuality. Inheritance can be simple (one ancestor) or multiple (several ancestors); |
polymorphism | - different objects can react differently to the same external events depending on how their methods are implemented. |
Data integrity
To maintain integrity, the object-oriented approach suggests using the following tools:
- automatic maintenance of inheritance relationships; the ability to declare some data fields and methods of an object as “hidden”, not visible to other objects; such fields and methods are used only by methods of the object itself; creating integrity control procedures within the object
Data manipulation tools
Unfortunately, object-oriented programming lacks common data manipulation tools such as relational algebra or relational calculus. Data is processed using one of the general-purpose object-oriented programming languages, usually SmallTalk, C++ or Java.
Let us now summarize some results
Object-oriented databases, unlike relational ones, store objects rather than records. The object-oriented approach provides a more advanced means of representing the real world than the relational model, a natural representation of data. In the relational model, all relationships belong to the same level, which is what complicates the transformation of the hierarchical relationships of the entity-relationship model into the relational model. The OO model can be viewed layer by layer, at different levels of abstraction. It is possible to define new data types and operations with them.
At the same time, OO - The model also has a number of disadvantages:
- There are no powerful non-procedural means of retrieving objects from the database. All queries have to be written in procedural languages, the problem of optimizing them is assigned to the programmer;
- instead of purely declarative integrity constraints(such as explicitly declaring primary and foreign keys of relational tables using keywords PRIMARY KEY And REFERENCES) or semi-declarative triggers, you have to write procedural code to ensure internal integrity.
Obviously, both of these shortcomings are associated with the lack of developed means of data manipulation. This problem is solved in two ways - expanding OO languages towards data management (ODMG standard), or adding object properties to relational DBMSs (SQL-3, as well as so-called object-relational DBMSs).
Each operation includes data selection (selection) and the actions that will be performed on the selected data. The main operations in a relational database are database update operations and relationship processing operations.
TO database update operations include those operations that insert new tuples, remove unnecessary ones, and adjust the attribute values of existing tuples, namely: these are operations Turn on, Delete, Update.
Operation Turn on requires specifying the name of the relationship and preliminary generation of the attribute values of the new tuple. The tuple key must be specified.
Operation Delete requires the name of the relation, as well as the identification of the tuple or group of tuples to be deleted.
Operation Update is executed for the named relation and can correct both one and several tuples. For example, if the company’s management decided to increase all employee salaries by the same amount, then several tuples will be adjusted at once with one Update operation.
Concerning processing operations, then they are borrowed from relational algebra. According to E. Codd’s approach, relational algebra includes eight operations, five of which are basic: Sample, Projection, Multiplication, Union, Subtraction.
Sample- select from the relation only those tuples that satisfy the given condition.
At Projections relation to a given set of its attributes, a new relation is obtained, created by extracting tuples containing the specified attributes from the original relation.
At Multiplication(Cartesian product) of two relations, a new relation is obtained, the tuples of which are a concatenation of the tuples of the first and second relations.
As a result Associations two relations, a third one is obtained, including tuples included in at least one relation, that is, containing all the elements of the original relations.
At Subtraction Only those tuples of the first relation are returned that remain after subtracting the second relation, that is, all tuples of the second relation are discarded from the first relation.
The remaining three operations are derivatives; they can be obtained from the main operations; they are called additional: Connection, Intersection, Division.
Operation Compound applies to two relationships that have a common attribute. The result of this operation for two relations under some condition is a relation of tuples that are a combination of the first and second relations that satisfy the specified condition.
Intersection two relations is a relation that includes all tuples included in both relations.
Operation Divisions assumes that there are two relations: one is binary (containing two attributes), the other is unary (containing one attribute). The result is a relation consisting of tuples that include the values of the first attribute of the tuples of the first relation, but only those for which the set of values of the second attribute of the first relation coincides with the set of values of the attributes of the second relation.
A distinctive feature of relation processing operations is that the unit of processing in them is not tuples, but relations: one or two relations are used at the input of each operation, and the result of the operations is a new relation.
Let's look at some of the most commonly used operations of relational algebra in more detail.
Operation An association- the input is given two compatible relations of the same dimension: A and B. The result is a relation of the same structure, containing all tuples A and all tuples B
Intersection assumes the presence at the input of two relations of the same dimension: A and B. At the output, a relation of the same structure is created, containing only those tuples A that are in B.
Division. At the input of the operation, two relations are used: A and B. Let relation A, called divisible, contain attributes (A 1, A 2, A 3,..., A n). Relation B is a divisor and contains a subset of attributes A: (A 1, A 2, ..., A k), where k In general, the operations of the relational data model provide the ability to manipulate relationships, allowing you to update the database, as well as select subsets of stored data and present them in the desired form. When designing and working with databases, these eight operations are usually not enough. Therefore, operations such as renaming attributes, creating new calculated attributes, assignment operations, comparisons, etc. are added. Modification Anomalies
Modification Anomalies manifest themselves in the fact that a change in the value of one data may entail scanning the entire table and a corresponding change in some other table records. Deletion anomalies are that when you delete any data from the table, other information that is not directly related to the deleted data may also disappear. Addition anomalies arise in cases where information cannot be placed in a table until it is incomplete, or inserting a new record requires additional scanning of the table Let there be a relation that stores information about students, the courses they take, and the cost of these courses. From this relation, a tuple is removed that contains (in addition to information about the student) information about the name and cost of the course attended by this student. If information about the name and cost of the course was stored in a single copy only in this tuple, it will irrevocably disappear from the relation. This situation is called deletion anomaly. Performing a delete operation results in the loss of information about two entities. This same relationship can be illustrated by an example insertion anomaly. Let's say we need to add information about the name and cost of a certain course, but we will not be able to add this information until no students are enrolled in the course. You can get rid of both anomalies by splitting the existing relationship into two, each of which will contain data from only one entity. Then deleting student information will not affect course data. When splitting a relationship into two, problems also arise. For example, is it possible to enroll a student in a course that does not yet exist? These problems must be resolved by discussing business rules. If the business rules require that course and cost information be available when a student enrolls in that course, then when the student enrolls in the course, a check will be made to ensure that the required course exists. These types of checks are called referential integrity constraints or foreign key integrity constraints. Integrity of entities - no primary key value must contain null. Design stages:
Conceptual design - the database development process begins with requirements analysis. The designer at this stage of development must find answers to the following questions: what data elements should be stored, who will access them and how. A Spanish model is created. Information that does not depend on physical aspects, target database and programming languages Boolean - the logical structure of the database is created. To do this, determine how the data will be grouped logically. The structure of the database at this stage is expressed in terms of application objects and relationships between them. Depends on the target DBMS, redundancy checking, normalization. Physical - The logical structure of the database is converted into a physical structure taking into account performance aspects. Data elements at this stage receive attributes and are defined as columns in the tables of the DBMS chosen for the implementation of the database. Basic file and index organization relationships, integrity constraints and security measures. Just in case - transactions- an indivisible aftereffect of operations that transfer the database from one stable state to another. Properties - atomicity (indivisibility), consistency (from one state to another), isolation (user transactions do not interfere with each other), durability (the result must be recorded in the database after the release, even if it crashed at the next moment). Entity-relationship method.
Entity-relationship modeling method gives an abstract model of a problem domain using the following basic concepts: essence(entities), relationships(relationships) between entities and attributes(attributes) to represent properties of entities and relationships. Any fragment of the subject area can be represented as set of entities, between which there is some many connections. Let's give definitions: Essence is an object that can be identified in some way that distinguishes it from other objects. Examples: a specific person, enterprise, event, etc. Entity set- a set of entities of the same type (having the same properties). Examples: all people, businesses, holidays, etc. Entity sets do not have to be disjoint. For example, an entity that belongs to the set MAN also belongs to the set PEOPLE. An entity is actually a set attributes, which describe the properties of all members of a given entity set. Domain was already higher. Entity Key- is one or more attributes that uniquely define a given entity. Connection is an association established between several entities. Examples: Unfortunately, there are no general rules for determining what is considered an entity and what is a relationship. In the example discussed above, we assumed that “leading” is a connection. However, we can consider the entity “manager”, which has connections “manages” with the entity “department” and “is” with the entity “employee”. A relationship can also have attributes. For example, for the DEPARTMENT-EMPLOYEE relationship, you can set the attribute WORK_TERRENCE_IN_DEPARTMENT. The role of the entity in the relationship- the function that an entity performs in a given connection. For example, in a PARENT-CHILD relationship, PERSON entities may have the roles "parent" and "child". Specifying roles in the entity-relationship model is optional and serves to clarify the semantics of the relationship. Set of connections- this is the relationship between n(and n no less than 2) entities, each of which belongs to a certain set of entities. Although, strictly speaking, the concepts of “connection” and “set of connections” are different (the first is an element of the second), they are nevertheless very often confused. When n=2, i.e. when a relationship combines two entities, it is called binary. It has been proven that n-ary set of connections ( n>2) can always be replaced by many binary ones, but the former better reflect the semantics of the subject area. The number of entities that can be associated through a set of connections with another entity is called degree of connection. Considering degrees is especially useful for binary relationships. The following degrees of binary bonds can exist: Another important characteristic of a connection besides its degree is membership class entities included in it or cardinality communications. "EMPLOYEE" has mandatory membership class(this fact is also indicated by indicating the interval of the number of possible occurrences of the entity in a relationship, in this case it is 1.1), and the entity "DEPARTMENT" has optional membership class(0,1). Now we can describe this relationship as 0,1:1,1
. This figure further illustrates the fact that multiple sets of relationships can be defined between two entities. In this case, for completely obvious reasons (each contract is concluded with a specific customer, and each customer has at least one contract, otherwise it would not be such), each entity has a mandatory membership class. If the existence of an entity x depends on the existence of an entity y, then x is called dependent entity(sometimes entity x is called "weak" and "entity" y is called strong). As an example, consider the relationship between the previously described entities WORKING_GROUP and CONTRACT. The working group is created only after the contract is signed with the customer, and ceases to exist upon completion of the contract. Then WORKING_GROUP is dependent on the CONTRACT entity. We will denote a dependent entity by a double rectangle, and its connection with a strong entity by a line with an arrow ( we had an oval for the dependent) The cardinality of the connection for a strong entity will always be (1,1). The membership class and degree of relationship for a dependent entity can be anything. 12. Hierarchical and network data models. Hierarchical model is a set of elements arranged in the order of their subordination from general to specific and forming a tree (graph) inverted in structure. The basic concepts of a hierarchical structure include level, node, and relationship. Knot is a set of data attributes that describe an object. In a hierarchical tree diagram, nodes are represented as vertices in the graph. Each node at a lower level is connected to only one node at a higher level. A hierarchical tree has only one vertex, not subordinate to any other vertex and located at the topmost - the first level. Dependent (slave) nodes are at the second, third, etc. levels. The number of trees in the database is determined by the number of root records. Each database record has only one hierarchical path from the root record. The organization of data in a hierarchical type DBMS is defined in terms of: element, aggregate, record (group), group relation, database. The root record of each tree must contain a key with a unique value. The keys of non-root records must have a unique value only within the group relationship. Each record is identified by a complete concatenated key, which is the set of keys of all records from the root along the hierarchical path. When depicted graphically, group relationships are represented by arcs of a directed graph, and record types are represented by vertices. For group relationships in the hierarchical model, automatic inclusion mode and fixed membership are provided. This means that for any non-root record to be remembered in the database, its parent record must exist. When a parent record is deleted, all subordinate records are automatically deleted. Example: An enterprise consists of departments in which employees work. Each department can have multiple employees, but an employee cannot work in more than one department. Therefore, for a personnel management information system, it is necessary to create a group relationship consisting of a parent record DEPARTMENT (NAME OF DEPARTMENT, NUMBER OF EMPLOYEES) and a child record EMPLOYEE (LAST NAME, POSITION, SALARY). (For simplicity, we assume that there are only two child records.) - rice a (further) To automate the accounting of contracts with customers, it is necessary to create another hierarchical structure: customer - contracts with him - employees involved in working on the contract. This tree will include the records CUSTOMER(CUSTOMER_NAME, ADDRESS), CONTRACT(NUMBER, DATE, AMOUNT), CONTRACTOR (SURNAME, POSITION, DEPARTMENT_NAME) - fig. b. Flaws hierarchical databases: Advantages: The simplest is a fairly efficient use of memory and good time performance for performing operations on data. However, this model is convenient mainly for working with hierarchically organized information. Operations on data defined in the hierarchical model: In the EXTRACT operation, it is possible to specify selection conditions. All modification operations are applied only to one “current” record (which was previously retrieved from the database). This approach to data manipulation is called “navigation”. Integrity constraints. Only the integrity of the relationships between the owners and members of the group relationship is maintained (no descendant can exist without an ancestor). The correspondence of paired records belonging to different hierarchies is not automatically maintained. The first database management systems, which appeared in the mid-60s, made it possible to work with a hierarchical database. The most famous was the hierarchical IMS system from IBM. Other systems are also known: PC/Focus, Team-Up, Data Edge and ours: Oka, INES, MIRIS. Network data model. Network model- a structure in which any element can be associated with any other element. A network database consists of sets of records that are related to each other so that records can contain explicit links to other sets of records. Thus, sets of records form a network. Relationships between records can be arbitrary, and these relationships are explicitly present and stored in the database. A network data model is defined in the same terms as a hierarchical one. It consists of many records that can be owners or members of a group relationship. The relationship between an owner record and a member record is also of the form 1:N. The main difference between these models is that in the network model, a record can be a member more than one group relationship. According to this model, each group relation is named and a distinction is made between its type and its instance. A group relationship type is specified by its name and defines properties common to all instances of this type. A group relationship instance is represented by an owner record and a set of (possibly empty) subordinate records. However, there is the following limitation: a record instance cannot be a member of two instances of group relationships of the same type (an employee cannot work in two departments) Hierarchical structure from the picture above. is converted to network as follows Trees (a) and (b) are replaced by a single network structure in which the EMPLOYEE entry is included in two group relationships; to display the M:N type, enter the EMPLOYEE_CONTRACT record, which has no fields and serves only to link the CONTRACT and EMPLOYEE records Each instance of a group relationship is characterized by the following characteristics: arbitrary, chronological /queue/, reverse chronological /stack/, assorted. If a record is declared subordinate to several group relationships, then each of them can have its own ordering method assigned. automatic - it is impossible to enter a record into the database without it being immediately assigned to a certain owner; manual - allows you to remember a subordinate record in the database and not immediately include it in an instance of a group relationship. This operation is later initiated by the user). Fixed. A subordinate record is tightly linked to an owner record and can only be removed from the group relationship by deleting it. When you delete an owner record, all subordinate records are automatically deleted. In the example, fixed membership assumes the group relationship "CONCLUDES" between the records "CONTRACT" and "CUSTOMER" because a contract cannot exist without a customer. Mandatory. It is possible to switch a subordinate record to another owner, but it cannot exist without an owner. To delete an owner record, it must have no subordinate records with required membership. The records “EMPLOYEE” and “DEPARTMENT” are connected by this relationship. If a department is disbanded, all of its employees must either be transferred to other departments or fired. Optional. You can exclude a record from a group relationship, but keep it in the database without assigning it to a different owner. When an owner record is deleted, its subordinate records - optional members - are retained in the database, no longer participating in a group relationship of this type. An example of such a group relationship is “PERFORM” between “EMPLOYEES” and “CONTRACT”, since the organization may have employees whose activities are not related to the fulfillment of any contractual obligations to customers. Operations on data. ADD- make a record in the database and, depending on the inclusion mode, either include it in a group relationship, where it is declared subordinate, or not include it in any group relationship. INCLUDE INTO GROUP RELATIONSHIP- link an existing subordinate record to the owner record. SWITCH- link an existing subordinate record to another owner record in the same group relationship. UPDATE- change the value of the elements of a previously extracted record. EXTRACT- extract records sequentially by key value, as well as using group relationships - you can go from the owner to member records, and from a subordinate record to the owner of the set. DELETE- remove a record from the database. If this record is the owner of a group relationship, then the membership class of the subordinate records is analyzed. Mandatory members must first be excluded from the group relationship, fixed members must be deleted along with the owner, optional members will remain in the database. Integrity constraints. As in the hierarchical model, only the maintenance of referential integrity is ensured (the owner of the relationship is a member of the relationship). Basics dignity network model is high memory efficiency and efficiency. Flaw– the complexity and rigidity of the base scheme, as well as the difficulty of understanding. In addition, integrity control is weakened in this model, since it allows arbitrary connections to be established between records. The complexity of the DBMS implementation, the complexity of the data access mechanism, and the need to clearly define data connections at the physical level To well-known network database management systems include: DBMS, IDMS, TOTAL, VISTA, NETWORK, SETOR, COMPASS, etc. Comparing hierarchical and network databases, we can say the following. In general, hierarchical and network models provide fairly fast access to data. But since in network databases the main structure of information presentation has the form of a network in which each vertex (node) can have a connection with any other, then the data in the network database is more equal than in the hierarchical one, since access to information can be carried out starting from any node. Graph (hierarchical and network) models are implemented as data models in database management systems running on mainframe computers. For personal computers, relational databases are more common, although there are also database management systems that support the network model. Relational algebra, as you might guess, this is a special type of algebra in which all operations are performed on relational data models, i.e., on relationships. In tabular terms, a relationship includes rows, columns, and a row—the column header. Therefore, natural unary operations are the operations of selecting certain rows or columns, as well as changing column headings - renaming attributes. ^
The first unary operator we'll look at is fetch operation– the operation of selecting rows from a table representing a relationship according to which or principle, i.e. selection of rows tuples that satisfy a certain condition or conditions. ^
Sampling operator
denoted by σ
<P >, sampling condition – P <S>, i.e., operator σ
is always taken by a certain condition on tuples P, and the condition itself P is written depending on the relation scheme S. Considering all this myself fetch operation over the relation scheme S in relation to the relation r σ
<P >r (S) ≡
σ
<P >r = {t (S) |t ∈ r & P <S >t } = {t (S) |t ∈ r & IfNull (P <S >t , False }; The result of this operation will be a new relation with the same relation schema S, consisting of those tuples t (S) original relation operands that satisfy the sampling condition P then the condition for the tuple is to substitute the values of the tuple’s attributes instead of the attribute names. To better understand the principle of this operation, let's give an example. Let the following relationship diagram be given: ^S: Session (grade book number, Last name, Subject, Grade). Let's take the following sampling condition: P <S> = (Subject = ‘Computer Science’ and Grade > 3). We need from the original relation operand, select those tuples that contain information about students who passed the “Informatics” subject with at least three points. Let also be given the following tuple from this relation: t 0 (S) ∈ r (S Applying our selection condition to the tuple t 0 , we get: P On this particular tuple, the selection condition is not satisfied. In general, the result of this particular sample σ
<Предмет = "Информатика" and Оценка >3 > Session there will be a “Session” table in which rows that satisfy the selection condition are left. ^
Another standard unary operation we will study is the projection operation. Projection operation is the operation of selecting columns from a table representing a relationship according to which or sign. Namely, the machine selects those attributes (i.e., literally those columns) of the original relation operands that were specified in the projection. ^
Projection operator
denoted by [ S"] or π t (S) [S' ] = {t (a)|a ∈ def (t) ∩ S ’}, S " ⊆S . It is important to note that duplicate tuples are excluded from the result, i.e., there will be no duplicate rows in the table representing the new resulting relation. Taking into account all of the above, the projection operation in terms of database management systems will look like this: π
<S" >r (S) ≡ π
<S' >r ≡ r (S) [S ’] ≡ r [S" ] = {t (S) [S' ] | t ∈ r }; Let's look at an example illustrating the principle of the sampling operation. Let the relation “Session” and the schema of this relation be given: S: Session (grade book number, Last name, Subject, Grade); We will be interested in only two attributes from this scheme, namely “grade book number” and “Last name” of the student, so the subschema S" will look like this: ^S": (grade book number, last name). Need an initial attitude r (S) project onto the subcircuit S" . t 0 (S) ∈ r (S): ((grade book number: 100), (Last name: ‘Ivanov’), (Subject: ‘Databases’), (Score: 5)); This means that the projection of this tuple onto this subcircuit ^S" will look like this: t 0 (S) S": ((grade book number: 100), (Last name: ‘Ivanov’)); If we talk about the projection operation in terms of tables, then the projection Session [Gradebook No., Last Name] of the original relation is the Session table, from which all columns have been deleted except two: Gradebook No. and Last Name. In addition, all duplicate lines have also been removed. ^
The last unary operator we'll look at is attribute renaming operation. If we talk about a relation as a table, then the renaming operation is needed in order to change the names of all or some columns. ^
Rename operator
as follows: ρ<φ
>, here φ –
rename function . This function sets mutually one-to-one correspondence between schema attribute names S And Ŝ,
where respectively S – scheme of the original relation, and Ŝ –
relationship diagram with renamed attributes. So the operator ρ
<φ>
in relation to r (S) gives a new relationship with the schema Ŝ
, consisting of tuples of the original relation with only the renamed attributes. Let's write the operation of renaming attributes in terms of database management systems: ρ
<φ
> r (S) ≡ ρ
<φ
>r = {ρ
<φ
> t (S)| t ∈ r }; Here is an example of using this operation: Let's consider the already familiar Session relation, with the diagram: S: Session (grade book number, Last name, Subject, Grade); Let's introduce a new relation schema Ŝ, with other attribute names that we would like to see instead of the existing ones: Ŝ :
(ZK No., Last name, Subject, Point); For example, the database customer wanted to see other names in your ready-made relation. To implement this order, you need to design the following rename function: φ
: (grade book number, last name, subject, grade) → (grade book number, last name, subject, score); In fact, only two attributes need to be renamed, so it would be legal to write the following renaming function instead of the existing one: φ
: (grade book number, grade) →
(No. ZK, Point); t 0 (S) ∈
r (S):
((grade book number: 100), (Last name: ‘Ivanov’), (Subject: ‘Databases’), (Score: 5)); Let's apply the rename operator to this tuple: ρ<φ>t 0 (S):
((ZK No.: 100), (Last name: ‘Ivanov’), (Subject: ‘Databases’), (Score: 5)); So, this is one of the tuples of our relation that has had its attributes renamed. In tabular terms, the relation ρ
< № зачетной книжки, Оценка →
“ZK No., Point > Session – this is a new table obtained from the Session relationship table by renaming the specified attributes. ^
Unary operations, like any other, have certain properties. Let's look at the most important of them. The first property of the unary operations of selection, projection and renaming is a property that characterizes the ratio of cardinalities of relations. (Recall that cardinality is the number of tuples in a particular relation.) It is clear that here we are considering, respectively, the original relation and the relation obtained as a result of applying a particular operation. Note that all properties of unary operations follow directly from their definitions, so they can be easily explained and even, if desired, deduced independently. 1) power ratio: a) for the sampling operation: | σ
<P >r |≤ |r |; b) for the projection operation: | r [S" ] | ≤ |r |; c) for the renaming operation: | ρ
<φ
>r | = |r |; In total, we see that for two operators, namely the sampling operator and the projection operator, the power of the original relations - operands is greater than the power of the relations obtained from the original ones using the corresponding operations. This is because the selection process associated with these two selection and projection operations eliminates some rows or columns that do not satisfy the selection conditions. In the case when all rows or columns satisfy the conditions, the cardinality (i.e., the number of tuples) does not decrease, so the inequality in the formulas is not strict. In the case of a renaming operation, the power of the relation does not change, due to the fact that when changing names, no tuples are excluded from the relation; 2) idempotency property: a) for the sampling operation: σ
<P > σ
<P >r = σ
<P >; b) for the projection operation: r [S' ] [S' ] = r [S" ]; c) for the renaming operation, in the general case, the idempotency property is not applicable. This property means that twice sequential application of the same operator to any or relation is equivalent to its single use. For the operation of renaming relation attributes, generally speaking, this property can be applied, but always with special reservations and conditions. The idempotency property is very often used to simplify the form of an expression and bring it to a more economical, relevant form. The last property we will consider is the property of monotonicity. It is interesting to note that under any conditions all three operators are monotonic; 3) property of monotonicity: a) for the sampling operation: r 1 ⊆
r 2 ⇒ σ
<P > r 1 ⇒
σ
<P >r 2 ; b) for the projection operation: r 1 ⊆
r 2 ⇒
r 1 [S" ] ⊆
r 2 [S" ]; c) for the renaming operation: r 1 ⊆
r 2 ⇒
ρ
<φ
>r 1 ⊆ ρ
<φ
>r 2 ; The concept of monotonicity in relational algebra is similar to the same concept from ordinary, general algebra. Let us explain: if initially the relationship r 1 and r 2 were connected in such a way that r ⊆ r 2 ,
then even after applying any of the three sampling, projection or renaming operators, this relationship will remain the same. Relational algebra is a language of operations performed on relationships - tables of a relational database. The operations of relational algebra allow you to create another relation based on one or more relations without changing the original relations themselves. The resulting other relation is usually not written to the database, but exists as a result of the SQL query - an array created by functions for working with databases in programming languages. For each relational algebra operation, its implementation will be given in the form of queries in the SQL language. Let's consider the operations of relational algebra. So that you are not distracted by the contents of tables that are not your databases, such as “Products”, “Drivers”, “plums”, “pears”, “tea”, “coffee”, Vladimir, Sergei, etc. We will perform operations on relationships (tables) with abstract data, such as R1, R2 (names of tables - relationships), etc. and A1, A2, A3 (names of attributes - columns) and h15, w11, etc. (contents of database table records). Priorities for performing relational algebra operations (in descending order of list items, and in one item - operations with equal priorities): The select operation operates on one relation and determines the resulting relation R, which contains only those tuples (or rows, or records), relations that satisfy a given condition (predicate P
). Thus, the fetch operation is a unary operation and is written as follows: Where P- predicate (logical condition). SQL Query
Now let's see what happens when we run this relational algebra operation and its corresponding SQL query. The table below shows one relation that this operation works on. We look at column A3 and establish that the predicate A3>"d0" is satisfied by the entries in the first and third rows of the original relation (since the number of the letter y in the alphabet is greater than the number of the letter d). As a result, we get the following new relation, which contains two lines: This material will help you combine all kinds of logical conditions for selections. "Boolean algebra (algebra of logic)" . SQL Query
SELECT A1, A2, A3 from R1 UNION SELECT A1, A2, A3 from R2 Now let's see what happens when we run this relational algebra operation and its corresponding SQL query. Now two relations are given, since the union operation is a binary operation: We combine the lines of the first and second relation and see that the third line, which is the third in both the first and second relation, is identical, so we include it in the new relation only once. We get the following relation: The following is important: a join operation can be performed only when two relations have the same number and names of attributes (columns), or, formally speaking, are join compatible. The result of the intersection of two sets (relations) A and B () will be a set (relation) C that includes those and only those elements that are in both set A and set B. The operation of intersection of relational algebra is identical to the operation. SQL Query
SELECT A1, A2, A3 from R1 INTERSECT SELECT A1, A2, A3 from R2 Some SQL dialects do not have the INTERSECT keyword. Its replacement, for example, in MySQL and others, is INNER JOIN. How the SQL JOIN operator works in general and its varieties INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN and FULL OUTER JOIN - in the lesson SQL JOIN - joining database tables . MySQL Query
Now let's see what happens when we run this relational algebra operation and its corresponding SQL query. Again, two relations R1 and R2 are given: We look through all the records in two relations, and find that in both the first and second relations there is one row - the one that is third in both the first and second relations. We get a new relation: The difference of two relations R1 and R2 () consists of tuples (or records, or rows) that are present in relation R1, but not in relation R2. The relations R1 and R2 must be join compatible. The relational algebra difference operation is identical to the operation. SQL Query
SELECT A1, A2, A3 from R2 EXCEPT Let's establish what happens as a result of executing this relational algebra operation and the corresponding SQL query. Again, two relations R1 and R2 are given: From the relation R2 we exclude the row that also exists in the relation R2 - the third - and we obtain a new relation: The Cartesian product operation () defines a new relation R, which is the result of concatenating each tuple of relation R1 with each tuple of relation R2. SQL Query
SELECT * from R3, R4 Let's establish what happens as a result of executing this relational algebra operation and the corresponding SQL query. Given two relations R3 and R4: The new relation must contain all the attributes (columns) of the two relations. First, the first row of relation R3 is concatenated with each of the two rows of relation R4, then the second row of relation R3, then the third. The result should be 3 X 2 = 6 tuples (rows). We get this new relation: Relational algebra, as you might guess, this is a special type of algebra in which all operations are performed on relational data models, i.e., on relationships. In tabular terms, a relationship includes rows, columns, and a row—the column header. Therefore, natural unary operations are the operations of selecting certain rows or columns, as well as changing column headings - renaming attributes. The first unary operator we'll look at is fetch operation– the operation of selecting rows from a table representing a relationship according to some principle, i.e. selecting tuple rows that satisfy a certain condition or conditions. Sampling operator denoted by ?
<P>, sampling condition – P<S>, i.e., operator ?
is always taken with a certain condition on tuples P, and the condition itself P is written depending on the relation scheme S. Considering all this myself fetch operation over the relation scheme S in relation to the relation r ?
<P>r(S) ?
?
<P>r = {t(S) |t ? r & P<S>t} = {t(S) |t ? r & IfNull(P<S>t, False}; The result of this operation will be a new relation with the same relation schema S, consisting of those tuples t(S) of the original operand relation that satisfy the sampling condition P To better understand the principle of this operation, let's give an example. Let the following relationship diagram be given: S: Session (grade book number, Last name, Subject, Grade). Let's take the following sampling condition: P<S> = (Subject = ‘Computer Science’ and Grade > 3). We need to select from the original operand relation those tuples that contain information about students who passed the Computer Science subject with at least three points. Let also be given the following tuple from this relation: t 0 (S) ? r(S Applying our selection condition to the tuple t 0 , we get: P On this particular tuple, the selection condition is not satisfied. In general, the result of this particular sample ?
<Предмет = "Информатика" and Оценка >3 > Session there will be a “Session” table in which rows that satisfy the selection condition are left. Another standard unary operation we will study is the projection operation. Projection operation is the operation of selecting columns from a table representing a relationship based on some characteristic. Namely, the machine selects those attributes (that is, literally those columns) of the original operand relation that were specified in the projection. Projection operator denoted by [ S"] or ? t(S) [S'] = {t(a)|a ? def(t) ? S’}, S" ?S. It is important to note that duplicate tuples are excluded from the result, i.e., there will be no duplicate rows in the table representing the new resulting relation. Taking into account all of the above, the projection operation in terms of database management systems will look like this: ?
<S">r(S) ? ?
<S'>r ? r(S) [S’] ? r [S" ] = {t(S) [S'] | t ? r }; Let's look at an example illustrating the principle of the sampling operation. Let the relation “Session” and the schema of this relation be given: S: Session (grade book number, Last name, Subject, Grade); We will be interested in only two attributes from this scheme, namely “grade book number” and “Last name” of the student, so the subschema S" will look like this: S": (grade book number, last name). Need an initial attitude r(S) project onto the subcircuit S". t 0 (S) ? r(S): ((grade book number: 100), (Last name: ‘Ivanov’), (Subject: ‘Databases’), (Score: 5)); This means that the projection of this tuple onto this subcircuit S" will look like this: t 0 (S) S": ((grade book number: 100), (Last name: ‘Ivanov’)); If we talk about the projection operation in terms of tables, then the projection Session [Gradebook No., Last Name] of the original relation is the Session table, from which all columns have been deleted except two: Gradebook No. and Last Name. In addition, all duplicate lines have also been removed. The last unary operator we'll look at is attribute renaming operation. If we talk about a relation as a table, then the renaming operation is needed in order to change the names of all or some columns. Rename operator as follows: ?
>, here ? -
rename function. This function establishes a one-to-one mapping between schema attribute names S And S, where respectively S- schema of the original relation, and S -
relationship diagram with renamed attributes. So the operator ?
<?>
in relation to r(S) gives a new relationship with the schema S, consisting of tuples of the original relation with only the renamed attributes. Let's write the operation of renaming attributes in terms of database management systems: ?
<?
> r(S) ? ?
<?
>r = {?
<?
> t(S)| t ? r}; Here is an example of using this operation: Let's consider the already familiar Session relation, with the diagram: S: Session (grade book number, Last name, Subject, Grade); Let's introduce a new relation schema S, with different attribute names that we would like to see instead of the existing ones: S: For example, the database customer wanted to see other names in your ready-made relation. To implement this order, you need to design the following rename function: ?
: (grade book number, last name, subject, grade) > (grade book number, last name, subject, score); In fact, only two attributes need to be renamed, so it would be legal to write the following renaming function instead of the existing one: ?
: (grade book number, grade) >
(No. ZK, Point); t 0 (S) ? r(S):
((grade book number: 100), (Last name: ‘Ivanov’), (Subject: ‘Databases’), (Score: 5)); Let's apply the rename operator to this tuple: ?>t 0 (S):
((ZK No.: 100), (Last name: ‘Ivanov’), (Subject: ‘Databases’), (Score: 5)); So, this is one of the tuples of our relation that has had its attributes renamed. In tabular terms, the relation ?
< № зачетной книжки, Оценка >
“ZK No., Point > Session - this is a new table obtained from the Session relationship table by renaming the specified attributes. Unary operations, like any other, have certain properties. Let's look at the most important of them. The first property of the unary operations of selection, projection and renaming is a property that characterizes the ratio of cardinalities of relations. (Recall that cardinality is the number of tuples in a particular relation.) It is clear that here we are considering, respectively, the original relation and the relation obtained as a result of applying a particular operation. Note that all properties of unary operations follow directly from their definitions, so they can be easily explained and even, if desired, deduced independently. 1) power ratio: a) for the sampling operation: | ?
<P>r |? |r|; b) for the projection operation: | r[S"] | ? |r|; c) for the renaming operation: | ?
<?
>r | = |r|; In total, we see that for two operators, namely the sampling operator and the projection operator, the power of the original relations - operands is greater than the power of the relations obtained from the original ones using the corresponding operations. This is because the selection process associated with these two selection and projection operations eliminates some rows or columns that do not satisfy the selection conditions. In the case when all rows or columns satisfy the conditions, the cardinality (i.e., the number of tuples) does not decrease, so the inequality in the formulas is not strict. In the case of a renaming operation, the power of the relation does not change, due to the fact that when changing names, no tuples are excluded from the relation; 2) idempotency property: a) for the sampling operation: ?
<P> ?
<P>r = ?
<P>; b) for the projection operation: r [S'] [S'] = r [S"]; c) for the renaming operation, in the general case, the idempotency property is not applicable. This property means that double sequential application of the same operator to any relation is equivalent to its single application. For the operation of renaming relation attributes, generally speaking, this property can be applied, but always with special reservations and conditions. The idempotency property is very often used to simplify the form of an expression and bring it to a more economical, relevant form. The last property we will consider is the property of monotonicity. It is interesting to note that under any conditions all three operators are monotonic; 3) property of monotonicity: a) for the sampling operation: r 1 ?
r 2 ? ?
<P> r 1 ?
?
<P>r 2 ; b) for the projection operation: r 1 ?
r 2 ?
r 1 [S"] ?
r 2 [S"]; c) for the renaming operation: r 1 ?
r 2 ?
?
<?
>r 1 ? ?
<?
>r 2 ; The concept of monotonicity in relational algebra is similar to the same concept from ordinary, general algebra. Let us explain: if initially the relationship r 1 and r 2 were connected in such a way that r ? r 2 ,
then even after applying any of the three sampling, projection or renaming operators, this relationship will remain the same.
EXCLUDE FROM GROUP RELATIONSHIP- break the connection between the owner record and the member record.Lecture No. 4. Relational algebra. Unary operations
1. Unary fetch operation
t. It is clear that in order to apply whatt 0 = (‘Databases’ = ‘Computer Science’ and 5 > 3);2. Unary projection operation
. Here S"– subcircuit of the original relation schema S, i.e. some of its columns. What does this mean? This means that S' has fewer attributes than S, because in S" Only those for which the projection condition was satisfied remained. And in the table representing the relation r (S"), there are as many rows as there are in the table r (S), and there are fewer columns, since only those corresponding to the remaining attributes remain. So the projection operator π< S">
in relation to r (S) results in a new relation with a different relation schema r (S"), consisting of projections t (S) [S"] tuples of the original relation. How are these tuple projections determined? Projection any tuple t (S) original relation r (S) per subcircuit S" is determined by the following formula:3. Unary renaming operation
4. Properties of unary operations
Sampling operation
R3
A1 A2 A3 A4
3
hh yl ms
4
pp a1 sr
1
rr yl ms
R
A1 A2 A3 A4
3
hh yl ms
1
rr yl ms
R1
R2
A1 A2 A3 A1 A2 A3
Z7 aa w11 X8 pp k21
B7 hh h15 Q2 ee h15
X8 pp w11 X8 pp w11
R
A1 A2 A3
Z7 aa w11
B7 hh h15
X8 pp w11
X8 pp k21
Q2 ee h15
Intersection operation
R1
R2
A1 A2 A3 A1 A2 A3
Z7 aa w11 X8 pp k21
B7 hh h15 Q2 ee h15
X8 pp w11 X8 pp w11
R
A1 A2 A3
X8 pp w11
Difference operation
SELECT A1, A2, A3 from R1R1
R2
A1 A2 A3 A1 A2 A3
Z7 aa w11 X8 pp k21
B7 hh h15 Q2 ee h15
X8 pp w11 X8 pp w11
R
A1 A2 A3
X8 pp w11
Q2 ee h15
Cartesian product operation
R3
R4
A1 A2 A3 A4 A5 A6
3
hh yl ms 3
hh
4
pp a1 sr 4
pp
1
rr yl ms
R
A1 A2 A3 A4 A5 A6
3
hh yl ms 3
hh
3
hh yl ms 4
pp
4
pp a1 sr 3
hh
4
pp a1 sr 4
pp
1
rr yl ms 3
hh
1
rr yl ms 4
pp
1. Unary fetch operation
t. It is clear that in order to apply some condition to a tuple, it is necessary to substitute the values of the tuple’s attributes instead of the attribute names.t 0 = (‘Databases’ = ‘Computer Science’ and 5 > 3);2. Unary projection operation
. Here S"– subcircuit of the original relation schema S, i.e. some of its columns. What does this mean? This means that S' has fewer attributes than S, because in S" Only those for which the projection condition was satisfied remained. And in the table representing the relation r(S"), there are as many rows as there are in the table r(S), and there are fewer columns, since only those corresponding to the remaining attributes remain. So the projection operator ?< S">
in relation to r(S) results in a new relation with a different relation schema r(S"), consisting of projections t(S) [S"] tuples of the original relation. How are these tuple projections determined? Projection any tuple t(S) original relation r(S) per subcircuit S" is determined by the following formula:3. Unary renaming operation
4. Properties of unary operations