The relational model is concept based. Structure of the relational data model

The main data structure in the relational model (RMD) is attitude, Therefore, the model is called relational.

In RMD data organized in tables, which must satisfy certain restrictions, and can be considered as relationship.

The tabular representation was proposed by an American mathematician E.F. Codd in the early 70s and implemented by IBM in the early 80s. A two-dimensional table is one of the most simple ways presentation of data to the user or programmer.

Tabular representation of the "Employees" relationship is given in Table 1.

Table 1. Relationship "Employees" - table "Employees"

RMD uses formal relational terms that can be replaced by informal ones. The correspondence between formal and informal terms is presented in Table 2.

Table 2. Correspondence between informal and formal RMD terms

The model has a serious theoretical basis - set theory and mathematical logic.

Basic concepts of the relational model:

- data type, value;

- domain;

- attribute;

- motorcade;

- key;

- attitude.

Concept data type is equivalent to the concept of data type in algorithmic languages. Atomic value data is the smallest indivisible unit of data in the RMD.

Domain is a potential set of valid atomic values ​​of the same type. Data are considered comparable only when they refer to the same domain. For example, the group number and student ID number have the same type - integer, but belong to different domains, so comparing them does not make sense.

Attribute is a named relation column, the values ​​of the column elements must be from the same domain.

Let a set of domains D 1 , D 2 ,…,D N (not necessarily different) be given.

Cortege relations are a set of N values ​​arranged in a strictly defined order (d 1, d 2,...,d N), such that d 1 belongs to D 1, d 2 belongs to D 2, and d N belongs to D N. The number of elements in a tuple is called arity.



Complete Cartesian product domains D 1 x D 2 x…x D N is a set of all possible different tuples of arity N, where each element of the tuple is taken from its own domain (example: D1(Nogr), D2(Fam_stud), D3(Type)).

Ratio, defined on these N domains, is a subset of the full Cartesian product of the original domains.

Relationship diagram name a list of relation attribute names indicating domains or attribute types. When the number of attributes is equal to N, they say N-ary relation. Relation schema – table header row.

A tuple corresponding to a given relation schema is a set of (attribute name, value) pairs that contains one occurrence of each attribute name belonging to the relation schema.

The primary key of the relationship is is an attribute or set of attributes that uniquely identifies each tuple in a relation and ensures that the rows are unique. The key can be simple (from one attribute) or complex (composed of several attributes).

The properties of a key are uniqueness and minimality. For each relationship, at least the full set of its attributes has the uniqueness property. The key should be minimal, i.e. no fields are redundant to uniquely identify a row in a table; removing any field from a key breaks the identification.

Many objects have identifiers, then we can talk about a natural key containing information about the object, for example, personal passport data. An artificial or surrogate key is generated by the system and usually does not contain significant information, such as the student's school number, i.e. outside the system it makes no sense.

Limitations of the relational model(table properties):

1. Each table is uniquely named, represents one real object or defines a relationship between objects, and consists of rows of the same type corresponding to the tuples of the relation, and columns corresponding to the attributes of the relation.

2. Each table element must be atomic, i.e. indivisible.

3. All rows have the same structure with a fixed number of fields and values. There cannot be two identical rows in a table (follows from the definition of the relationship). The rows differ from each other in at least one value.

4. Each column is given a unique name. Elements of the same column must be homogeneous, extracted from one domain.

5. In table operations, rows and columns can be viewed in any order. There is no ordering of tuples, but domains are ordered within a relation. But the order in which named columns are accessed also becomes irrelevant.

6. Each table must have primary key. The key allows you to find a single row in the table. Tuples are indexed by key, which speeds up data processing. Keys are used to logically link tables.

A shorter form of the “Employees” table entry or relation diagram“Employees” (the primary key of the relationship is underlined):

Employees( Tab_number, Last name, Department, Date of birth, Position, Salary)

IN general view the relation diagram is written as

R(a 1 , a 2 , ..., a N), where a i is the name of the i-th attribute of the relation R.

RBD is a set of interconnected relationships. RDB circuit refer to the set of named relationship schemas used to represent information and the current values ​​of the relationships as a database. In the physical representation, each relationship corresponds to a database file, an instance of a relationship corresponds to a database file record, and relation attributes correspond to fields of a database file record.

A subschema, being a projection of the database schema, contains only those relationships, and only those attributes of these relationships that are used by the application programmer to solve a specific task (query).

When describing a database subschema, all necessary restrictions can be specified, for example: an access password to a relation or its attributes, a processing mode for a relation or its attributes - “read only” or “read and write” and others.

The database subschema may also contain virtual attributes - these are not relationship attributes stored in the database, but calculated algorithmically for a specific task (query). Also in the database subschema, the methods of data ordering can be changed (that is, other keys are assigned), data types (digital - replaced with symbolic ones and vice versa). At the same time, the DBMS provides automatic execution all these processes.

The peculiarity of RMD is that both objects and the relationships between them are presented in a single form, in the form of tables.

In contrast to the hierarchical and network models communications between relationships are maintained implicitly. The RMD supports binary connections such as “1:1”, “1:M”, “M:1”. In each connection, one relationship acts as the main one (from side 1), and the other as subordinate (from side M). This means that one tuple of a main relation can be associated with several tuples of a subordinate relation.

The relationship is maintained by a connecting set of attributes, in the main relation this is the primary key (primary key), which must be present in the subordinate relation as a secondary key. This set of attributes in a subordinate relationship is called a foreign key (foreign key).

Foreign keys are used to establish logical connections between relationships. The primary key identifies a single tuple in the primary relation, and the secondary key identifies the set of tuples in the subordinate relation that are related to a single tuple in the primary relation.

Let's consider an example of maintaining a “1:M” type connection between the “Group” and “Students” relationships. The main relation is the “Group” relation, the primary key of the relation is the “Group_Number” attribute. The subordinate relation is the “Students” relation, the primary key of the relation is “Credit_book_number”. The “Students” relation must have the “Group_Number” attribute as a foreign key to link to the main relation. It is this attribute that allows you to determine information about the group in which the student is studying.

Scheme of the main relationship “Group”:

Group ( Group No., Speciality)

Scheme of subordinate relationship “Students”:

Students ( No._gradebook, Last name, Date of birth, Group No.)

(In relationship diagrams, primary keys are underlined and secondary keys are in italics).

Examples of RMD DBMS: DBase, FoxPro, Clipper, Paradox, MS Access, Informix, Oracle, etc.

Any data structure can be reduced to two-dimensional tables. Let's consider an example of converting a hierarchical MD, shown in Fig., into a relational database, i.e. into a set of interconnected relationships.

For each node of the tree structure, a separate relation is created, i.e. relationships “Departments”, “Employees”, “Labor_Activities”, “Children”, “Works” are created. Each relation corresponding to a child node is appended with the identifier of the source node. As a result, we obtain a relational database consisting of the following relations:

Departments ( Department_number, Name)

Employees ( Tab_number, Last name, Salary, Department_number)

Labor_activity ( Order_number, Contents_of the order, Tab_number)

Children ( Child_name, Tab_number , Date of Birth)

Works ( Job_code, Name, Completion date, Department_number)

Advantages relational model:

– simplicity and clarity of the model;

– homogeneity of structures for representing objects and connections;

– the ability to manipulate data without the need to know the specific physical organization of the database in external memory;

– the ability to use non-procedural query languages ​​by untrained database users.

The considered data logic models differ in the way they allow the user to represent and process relationships.

In the hierarchical and network communication model, physical data is interconnected by physical pointers. This is an advantage and a disadvantage. The intended physical connections are well realized, but other relationships are difficult to extract. In the relational model, connections are logical.

In the relational approach, connections between objects are represented in the same way as the objects themselves, by tuples in relations. The uniformity of data presentation simplifies the software and language tools of the DBMS.

In mathematical disciplines, the concept of “table” corresponds to the concept of “relation”. The table reflects a real world object – essence, and each line reflects a specific instance of the entity. Each column has a name unique to the table.

The lines have no names, their order is not defined, and their number is logically unlimited. One of the main advantages of RMD is uniformity (each row of the table has the same format). The user himself decides whether the corresponding entities have homogeneity. This solves the problem of model suitability. The main elements of the RMD are shown in Fig. 17.

A relation is a two-dimensional table containing some data.

Essence– an object of any nature, data about which is stored in the database.

Attributes– properties characterizing the entity (columns). The degree of the relationship is the number of columns.

Relationship diagram– a list of attribute names, for example, EMPLOYEE (No., Full Name, Year of Birth, Position, Department).

Domain a set of relation attribute values ​​(data type). More strictly speaking, domain There is potential set of values.

Domain Properties: A domain is a set, although in general its values ​​cannot be simply enumerated. The following properties are thus inherited from the set:

    Limitation: the domain has a boundary, data is divided into possible and impossible.

    As with a set, this does not mean that the number of elements is finite. Uniqueness:

you can compare some elements with others and avoid duplicates. For one single domain this goes without saying. The concept of a domain helps correctly

simulate attributes must be domain-specific, or "defined on a domain". Multiple attributes can be set on one domain.

Atomicity of values: attribute values ​​must be simple, atomic, not composite.

Naturalness of domains. D Omens must carry a semantic load. It is more useful to treat a domain as a certain group of description parameters subject area, to some semantic concept.

As a rule, many subject areas are already quite formalized and have ready-made concepts and reference books.

Limit unnecessary comparisons between attributes – main purpose of domains.

Cortege– table row.

Cardinality (power)– number of rows in the table.

Full Name

Year of birth

Job title

Department code

Ivanov I.I.

Petrov P.P.

Professor

Sidorov S.S.

Vasiliev V.V.

Vovkin V.V.

teacher

Stepanov S.S.

Professor

Figure 17. Elements of the relational model

The concept of a database is closely related to such concepts of structural elements as field, record, file (table),

    field , - an elementary unit of logical organization of data, which corresponds to an indivisible unit of information - a detail. The following is used to describe the field: characteristics :

    Name , For example. Last name, first name, patronymic, date of birth;

    type , for example, symbolic, numeric, calendar;

    length , for example, 15 bytes, and will be determined by the maximum possible number of characters

    accuracy , for numeric data, such as two decimal places to show the fractional part of a number.

Record- a set of logically related fields. A record instance is a separate implementation of a record containing specific values ​​for its fields.

File(table) - a collection of instances of records of the same structure.

A table in a relational data model can be considered as a class of objects of the same type .

Thus, for objects of the same class the set of properties will be the same, although the values ​​of these properties for each object, of course, may be different.

Data types allowed in the relational data model.

Main data types used in data models:

Short Integer – short integer;

Long Integer – long integer;

Float – real number (floating point number);

Double – real number (floating point number) double precision;

Text – text data type;

Logical - logical (yes/no);

Data - temporary. The value is defined as a delimited date in the specified format;

Blob – large binary objects (binary large object - BLOB), which can store data of unlimited size. Fields of this type allow you to store dimensionless arbitrary binary information.

Key element (primary key) of data is an element from which the values ​​of other data elements can be determined. An object can be uniquely identified by two or more data values.

Primary key This is an attribute that uniquely identifies the rows of a relationship. A primary key made up of multiple attributes is called a composite key. The primary key cannot be completely or partially empty (have the value null).

Practical meaning The primary key is obvious: a domain object is uniquely described by a set of table attributes. The primary key captures the most important thing about an object, its unique essence. The remaining fields can be called “just attributes”.

Keys that can be used as primary keys are called potential or alternativekeys.

External key - serves to link tables. These are values ​​from one table that can be uniquely linked to another. More precisely, for a relationship, a foreign key is a set of predefined attributes

External key - is an attribute(s) of one table that can serve as the primary key of another table. It is a link to the primary key of another table (Fig. 18).

Fig 18. Relationship connection

Relationship STUDENT (Name, Group, Specialty) And ITEM (Name Pr, Clock) connected by relationship STUDENT_SUBJECT (Name, Name of Pr, Grade), in which foreign keys Full name And Name_Pr form a composite key.

databases if they are based on this model. Having a data model allows you to compare specific implementations using one common language.

Although the concept of a data model is general, and one can talk about hierarchical, network, semantic and other data models, it should be noted that in the field of databases this concept was introduced by Edgar Codd in relation to relational systems and is most effectively used in this context. Attempts to directly apply similar models to pre-relational organizations show that relational model too “big”, and for post-relational organizations it turns out to be “small”.

general characteristics

Although the concept relational data model the founder was the first to introduce relational approach Edgar Codd, the most common interpretation relational data model, apparently, belongs to the famous popularizer of Codd’s ideas, Christopher Date, who reproduces it (with various clarifications) in almost all of his books (see, for example, K. Date. Introduction to Database Systems. 6th ed., M. ; St. Petersburg: Williams. – 2000). According to Data's interpretation, relational model consists of three parts describing different aspects relational approach: structural part, manipulation part and integral part.

In the structural part of the model it is fixed that the only generic structure is 7 This is the second time in this lecture that the normalized n -ary relation is the only generic data structure used in relational databases. It's time to clarify what we mean by the term generic structure. Programming languages ​​with extensive type systems usually have constructs called generic types, parameterizable types, type constructors, type generators etc., allowing the generation of a specific data type based on its abstract (usually predefined) specification. The peculiarity of such types is that the main operations of a particular type are defined at the level of this abstract specification. One of the most famous examples is set type, for example, in the Pascal language. When relational data model We don't say explicitly that a relation is a generic type, but essentially that is what it is. Operations of relational algebra are defined at the abstract relation level and apply to any relation-values ​​with concrete headers. The data used in relational databases is the normalized n-ary relation. The concepts of domains, attributes, tuples, header, body and relation variable. In fact, in the two previous sections of this lecture we considered precisely the concepts and properties of the structural component relational model.

In the manipulation part of the model, two fundamental mechanisms for manipulating relational databases are defined - relational algebra and relational calculus. The first mechanism is based mainly on classical set theory (with some refinements and additions), and the second is based on the classical logical apparatus of first-order predicate calculus. We will consider these mechanisms in more detail in the following lectures, but for now we will only note that the main function of the manipulation part relational model is to provide a measure of the relationality of any specific relational database language: a language is called relational if it has no less expressiveness and power than relational algebra or relational calculus.

Entity and reference integrity

Finally, in the integral part relational data model two basic integrity requirements are fixed that must be supported in any relational DBMS. The first requirement is called entity integrity requirement. An object or entity of the real world in relational databases corresponds to tuples of relations. Specifically, the requirement is that any tuple of any value-relationship of any relation variable must be distinguishable from any other tuple of this ratio values by composite values ​​of a predetermined set of attributes relation variable, i.e., in other words, any relation variable must have primary key. As we saw in the previous section, this requirement is automatically satisfied if the system does not violate the basic properties of relations.

In fact, the requirement integrity of the entity completely sounds like this: any relation variable there must be a primary key and no value primary key in tuples the values ​​are relations relation variable must not contain undefined values. For this formulation to be fully understandable, we must at least briefly discuss the concept undefined(NULL).

Of course, theoretically, any tuple entered into a persisted relation should contain all the characteristics of the real-world entity it models that we want to store in the database. However, in practice, not all of these characteristics may be known by the time the entity needs to be recorded in the database. A simple example there may be a procedure for hiring a person whose salary has not yet been determined. In this case, an employee of the HR department who enters a tuple describing a new employee into the EMPLOYEES relation simply cannot provide the value of the SLU_ZARP attribute (any value in the SIZE_PAYMENT domain will incorrectly characterize the salary of the new employee).

Edgar Codd suggested using in such cases undefined values. Undefined value does not belong to any data type and can be present among the values ​​of any attribute defined on any data type (unless explicitly prohibited when the attribute is defined). If a is a value of some data type or NULL , op is any two-place "arithmetic" operation of that data type (for example, + ), and lop is an operation of comparing values ​​of this type (for example, = ), then by definition:

a op NULL = NULL NULL op a = NULL a lop NULL = unknown NULL lop a = unknown

Here unknown is the third boolean value, or Boolean, type, having the following properties:

NOT unknown = unknown true AND unknown = unknown true OR unknown = true false AND unknown = false false OR unknown = unknown

(remember that the AND and OR operations are commutative) 8 As the author's experience shows, not all students remember basic logical operations. To guarantee, we present truth tables for the operations AND (& – conjunction), OR ( – disjunction) and NOT ( – negation):

AND true false OR true false NOT true false
true true false true true true false true
false false false false true false

. In this lecture, the following brief introduction to undefined values, but in the following lectures we will return to this topic several times.

So, the first of the requirements is the requirement integrity of the entity- means that the primary key must fully identify each entity, and therefore as part of any value primary key presence is not allowed undefined values. (In classical relational model this requirement also applies to possible keys; as will be shown in the following lectures, in SQL-oriented DBMSs such a requirement for possible keys not supported.)

The second requirement, which is called referential integrity requirement, is more complex. Obviously, if the relations are normalized, complex entities of the real world are represented in a relational database in the form of several tuples of several relations. For example, imagine that you want to represent in relational database DEPARTMENT entity with the attributes DEPARTMENT_NUMBER (department number), DEPARTMENT_SIZE (number of employees) and DEPARTMENT_SLU (set of department employees). For each employee, you need to store SLU_NUMBER (employee number), SLU_NAME (employee name) and SLU_SARP (employee salary). As we will see in Chapter 7, if the appropriate database is properly designed, two relationships will appear in it: DEPARTMENTS (DEPARTMENT_NUMBER, DEPARTMENT_SIZE)(primary key – (DOT_NUMBER)) and EMPLOYEES (SERV_NUMBER, SLU_NAME, SLU_ZARP, SLU_DEPARTMENT_NOM)(primary key – (SLN_NUMBER) ).

As you can see, the SLU_DEPARTMENT_NOM attribute is introduced into the EMPLOYEES relation not because the department number is the employee’s own property, but only in order to be able, if necessary, to restore the full DEPARTMENT entity. The value of the SLU_DEPARTMENT_NOM attribute in any tuple of the EMPLOYEES relation must correspond to the value of the DEPARTMENT_NOM attribute in some tuple of the DEPARTMENTS relation. An attribute of this kind (possibly composite) is called foreign key, since its values ​​uniquely characterize entities represented by tuples of some other relation (i.e., they specify the values ​​of their primary key). Of course, a foreign key can be composite, that is, it can consist of several attributes. A relation in which a foreign key is defined is said to refer to a corresponding relation in which the same attribute is primary key.

Requirement referential integrity, or foreign key integrity requirement, is that for every foreign key value appearing in the referencing relation's value-tuple relation variable, or in meaning-relation relation variable, to which the link points, there must be a tuple with the same value primary key, or the foreign key value must be completely undefined (i.e. point to nothing) The SQL language allows several options for defining a foreign key, of which only one fully corresponds to the classical approach. We will discuss this in more detail in future lectures.. For our example, this means that if a department number is specified for an employee, then that department must exist.

Note that, like the primary key,

The vast majority of modern information systems are based on data presented in the form of a relational model. The main concepts of the relational data model are:

Subject area- this is a part of the real world (a class or a set of classes of real objects), considered from a certain point of view, subject to model reflection for the purpose of its automation. The subject area is infinite and contains both concepts and data that are essential for the development of an information system, as well as insignificant or non-significant data. In general, a domain model describes information processes,occurring in the domain and the data used by these processes. The subject area is represented by many structural units (for example, an enterprise - workshops, administration, accounting, etc.). Each structural unit of a domain is characterized by a multitude of objects and processes that use objects, as well as a multitude of users, characterized by different views on the domain. The result of application development and the success of the information system depend on how correctly the subject area is modeled.

For example, as a subject area you can choose the accounting department of an enterprise, human resources department, bank, store, etc. So, if you choose accounting of goods in a warehouse as the subject area, then the concepts “invoice” and “invoice” are essentially important concepts, and the fact that the employee accepting invoices has two children is not important for accounting for goods. However, from the HR department's point of view, data on the presence of children is essential. Thus, the importance of data depends on the choice of domain.

Data Model – it is a set of data structures and their processing operations.

Relational model– a data representation model for the subject area, built on relational relationships. According to K. J. Date, the relational data model describes three aspects: structural, holistic manipulation:

· Structural - the data in the model is a set of relationships.

· Holistic - relationships meet certain conditions of integrity. (declarative integrity constraints at the domain (data type) level, relation level and database level).

· Manipulation (processing) - the model supports relation manipulation operators (relational algebra, relational calculus).

Attitude– a set of subject area objects that are described by common (general) characteristics and properties. A relationship is a fundamental concept in the relational data model. A relation is an abstract concept; a table can serve as a visual representation of a relation in relational theory on paper or a screen.

Attribute– an information display (characteristic, property) of a subject area object, used to describe it, taking a specific value from a set of valid values. Each attribute has a name that is used to refer to the data. Attribute names are unique within a relationship. That. a relation represents a set of attributes. At a practical level, an attribute is a table field.

Relationship diagram – a complete list of relation attribute names.

Tuple – is a relation element containing an unambiguous representation of a real world object, in accordance with highlighted attributes. At a practical level, a tuple is a record in a table.

Relationship key– an attribute or set of relation attributes that uniquely identifies every motorcade. If a relation key satisfies the conditions of uniqueness and minimality, then such a key is called primary. A relation attribute used to store the primary key values ​​of another relation for the purpose of creating a relationship between those relations is called external .

In a relational model of a subject area, as a rule, the data necessary for the operation of an information system is presented in the form set of interconnected relationships . Foreign keys are used to establish logical connections between relationships. So that the data in information system were unambiguous and consistent, restrictive conditions must be established in the relational model - integrity constraints , which allow minimizing errors during system operation. The most important integrity constraints are: categorical and referential integrity.

When establishing logical connections between relationships, we use four types of connections :

· One to one– established between primary keys of relationships. In this case, each tuple of one relation will correspond to only one tuple of another relation.

· One to many- established between the primary key of one relationship and the foreign key of another relationship. In this case, one tuple of one relation will correspond to several tuples of another relation.

· Many to one- established between the foreign key of one relation and the primary key of another relation. In this case, several tuples of one relation will correspond to only one tuple of another relation.

· Many to many - established between foreign key relationships. Moreover, any tuples of one relation can correspond to several tuples of another relation.

Lecture 12

Relational data model.

Normalization. Normal forms.

Technology mapping a conceptual database model onto a relational data model

1. Basic concepts of the relational data model

As was shown in the previous lecture, to define a relational data model, it is necessary to declare the structure of the data, how to manipulate it and integrity constraints (slide 2).

1.1. Structural component of the relational model

From the point of view of data structure, the relational model is a convenient and most common form of presenting data in the form of a table. The concept of “table” corresponds to the concept of “relation”. This is where the name of the model comes from – relational. That is, in relation to databases, the concepts “relational database” and “tabular database” are essentially synonymous. Unlike the hierarchical and network model, this way of representing

1) understandable to a non-programmer user;

2) allows you to easily change the schema - attach new data elements and records without changing the corresponding subschemas;

3) provides the necessary flexibility when handling unexpected requests.

In addition, any network or hierarchical diagram can be represented by two-dimensional relationships.

One of the main advantages of the relational model is its homogeneity. All data is considered to be stored in tables in which each row has the same format. Each row in the table represents some real-world object or relationship between objects. The user of the model must decide for himself whether the corresponding real-world entities are homogeneous. This solves the problem of the suitability of the model for the intended application.

The main concepts by which the relational model is defined are the following: domain, relation, tuple, cardinality, attributes, degree, primary key. The relationship between the concepts is illustrated on the slide (slide 3).

Domainis a set of values ​​from which the values ​​of the corresponding attributes of a certain relationship are taken. In programming terms, a domain is a system-defined (standard) or user-defined data type.

Primary keyis a column or some subset of columns that is unique, i.e. uniquely defines the rows. A primary key that includes more than one column is called a multiple key, or a primary key, or a primary key, or a primary key. The Object Integrity Rule states that a primary key cannot be completely or partially empty, i.e. be null.

The remaining keys, which can also be used as primary keys, are called potential or alternative keys.

Let us formulate the rules for assigning primary keys to entities:

1).The primary key must uniquely identify any instance of an entity.

2).If possible, the primary key should be the most compact fromof all potential keys, the best data type for the primary key is integer.

3).The primary key can be composite, but increasing the numbercolumns included in it contradicts the requirement of compactness. The compactness requirement also cannot be met if, for example, in aselect the string attribute as the primary keylong data type.

4). Primary key values should not be subject to frequent modifications. Ideally, the business logic of the domain is such that these values ​​are not supposed to change at all.

5). Primary key modification rules must be controlled internal business logic subject area, and not solutions thatare taken over her. For example, in a database developed forneeds of the dean's office, for the STUDENT entityYou should not choose the series and passport number as the primary keystudent. Although these data, in principle, have the properties of an obligationvigor and uniqueness, but their change can be initiated by studentsom, and not by the faculty administration.

5). If among the information collected about the entity, it is not possible to identify data that satisfies the above requirements, then we recommendit is recommended to consider the possibility of creating surrogate primary key,which, without carrying any semantic load, simply serves identifier of a specific entity instance. Typically, all sorts of options are chosen as a surrogate primary key.nal codes or identifiers. The surrogate key is most often hidden on external level of relational database modeling.

External keyis a column or subset of one table that can serve as a primary key for another table. External key table is a reference to the primary key of another table. The referential integrity rule states that a foreign key can either be empty or match the value of the primary key it refers to. Foreign keys are an integral part of the relational model because they implement relationships between database tables.

A foreign key, like a primary key, can also be a combination of columns. In practice, a foreign key will always be a composite (consisting of multiple columns) if it references a composite primary key in another table. Obviously, the number of columns and their data types in the primary and foreign keys are the same.

If a table is related to several other tables, it can have multiple foreign keys.

The concepts of the relational model represent special terminology introduced by the authors theoretical foundations, however, they also have more familiar analogues (but not equivalent in all respects!), the correspondence of which is given in the following table ( slide 4 ).

12.1.2. Control component of the relational model

The set of permissible operations on data presented as a set of relations is specified by relational algebra. In addition to relational data manipulation operations, the control component must include a data definition; definition of views; integrity conditions; identification of access rights; transaction boundaries (start, completion and cancellation).

12.1.3. Data integrity ( slide 5)

Domain-level integrity

In relational theory, it is generally accepted that all values ​​of the attributes of a relation are atomic. This follows from the interpretation of the concept of domain. A domain can be thought of as a subset of values ​​of some data type that have a specific meaning. The relational model requires that the data types used be simple (scalar), that is, without internal structure.

A domain has a unique name within the database, defined on a simple data type or on another domain. Actually, for the relational data model, the type of data used is not important. The requirement that the data type be simple, it must be understood that Relational operations should not take into account the internal data structure.

The main purpose of domains is that they limit comparisons. It is logically incorrect to compare values ​​from different domains, even if they are of the same type. So the concept of domain helps correctly For one single domain this goes without saying. subject area.

Integrity at the Relationship Level

Potential keys serve as the only means of addressing at the tuple level in a relationship. Only knowing the value of a tuple's candidate key allows one to accurately specify that tuple.

From a semantic data modeling perspective, potential keys serve means of identification domain objects – instances of entities, data about which is stored in the relation. Because these instances must be distinguishable by definition, their identifiers cannot contain unknown values.

Typically, for the situation of unknown or incomplete data, data types supplemented with the so-calledNULL- care.

NULL -value is some kind of indicator that the value is unknown. Usage problem NULL -values ​​in the theory of relational databases have not been completely resolved. Almost all implementations of modern relational DBMSs allow the use NULL -values, despite their insufficient theoretical justification.

Relationship Integrity Rule reads: every relation must have at least one candidate key whose constituent attributes cannot accept null - values. This candidate key is best declared to be the primary key of the table corresponding to the relationship.

It should be noted that most DBMSs allow you to create tables without primary keys. However, violation of the rule of integrity of relationships in practice immediately makes itself felt. For example, for a DBMS MS It will become impossible for the SQL server to access data using technology OLE DB Provider.

Foreign key integrity (database level integrity)

Various domain objects, information about which is stored in the database, are always interconnected. The most typical way to communicate this type of relationship between relationships is described by a foreign key constraint ( FK, Foreign Key).

A foreign key is usually does not have the property of uniqueness. This is as it should be, since a child relation can have several tuples referencing the same tuple in the parent relation, which, in fact, gives the type of relationship between the relations “one-to-many” . This is a standard type of link that preserves referential integrity. If the foreign key still has the property of uniqueness, then the relationship between the relations is of the “one-to-one” type. .

Although every foreign key value must match the candidate key values ​​in some tuple of the parent relation, the reverse is generally not true. There may be values ​​in a parent table's relationship field that are not referenced by any of the foreign key values.

NULL -values ​​for foreign key attributes are valid only when the foreign key attributes are not part of any candidate key.

Since foreign keys actually serve as references to tuples in a different (or the same) relationship, these references should not point to non-existent objects.

The considerations formulated above determine foreign key integrity rule or referential integrity relational database: foreign keys must not be inconsistent, i.e. for each foreign key value there must be a corresponding value in the relationship field in the parent relation.

12.1.4. Codd's Rules

In general, the concept of the relational model is defined by the following twelve Codd rules ( slide 6 ):

1.Information Rule. All information in the database must be provided exclusively at the logical level and in only one way - in the form of values ​​​​contained in tables.

2.Guaranteed access rule. Logical access to each and every data element (atomic value) in a relational database must be provided by using a combination of table name, primary key, and column name.

3.Invalid value support rule. A relational database must support invalid values ​​that are different from a zero-length character string, a whitespace string, zero, or any other number and are used to represent missing data regardless of the type of that data.

4.Dynamic directory rule based on relational model. The logical-level description of the database must be presented in the same form as the master data so that users with appropriate rights can work with it using the same relational language that they use to work with the master data.

5.Exhaustive Data Sublanguage Rule . A relational system can support different languages ​​and user interaction modes (for example, question and answer mode). However, there must be at least one language whose statements can be represented as character strings according to some well-defined syntax and which fully supports data definition; definition of views; data processing (interactive and programmatic); integrity conditions; identification of access rights; transaction boundaries (start, completion and cancellation).

6. View update rule. All views that can theoretically be updated should be available for updating.

7. Rule for adding, updating and deleting. The ability to work with a relation as a single operand must exist not only when reading data, but also when adding, updating, and deleting data.

8. Physical data independence rule. Application programs and utilities for working with data must remain intact at the logical level regardless of any changes in the way data is stored or accessed.

9. Logical data independence rule. Application programs and data utilities must remain logically intact when any changes are made to the underlying tables that theoretically preserve the data contained in those tables intact.

10. Rule of independence of integrity conditions. It should be possible to define integrity conditions specific to a particular relational database in the relational database sublanguage and store them in a directory rather than in an application program.

11. Independence of propagation rule. A relational DBMS should not depend on the needs of a particular client.

12. Rule of uniqueness. If a relational system has a low-level language (processing one record at a time), then it must not be possible to use it to bypass the integrity rules and conditions expressed in a high-level relational language (processing multiple records at a time).

Rule 2 indicates the role of primary keys when searching for information in a database. The table name allows you to find the desired table, the column name allows you to find the desired column, and the primary key allows you to find the row containing the data item you are looking for.

Rule 3 requires that missing data can be represented using invalid values ​​( NULL) .

Rule 4 states that a relational database must describe itself. In other words, the database must contain a set system tables, describing the structure of the database itself.

Rule 5 requires the DBMS to use a relational database language, e.g. SQL . Such a language must support all the main functions of a DBMS - creating a database, reading and entering data, implementing database security, etc.

Rule 6 concerns ideas, that are virtual tables, allowing different users to see different parts of the database structure. This is one of the most difficult rules to implement in practice.

Rule 7 emphasizes that databases are set-oriented by nature. It requires that add, delete, and update operations can be performed on sets of rows. This rule is intended to prohibit implementations that only support single-string operations.

Rules 8 and 9 mean separating the user and application program from the low-level database implementation. They argue that the specific storage or access implementations used in the DBMS, and even changes in the structure of database tables, should not affect the user's ability to work with the data.

Rule 10 states that the database language must support the restrictive conditions imposed on the data entered and the actions that can be performed on the data.

Rule 11 states that the database language must be capable of working with distributed data located on other computer systems.

Rule 12 prevents the use of database capabilities other than the database language, which could compromise database integrity.

12.2. Normalization.

When working with relationships that contain redundant data, problems that are called update anomalies and are divided into insertion anomalies, deletion anomalies and modification anomalies. Consider, for example, the relation presented on the slide ( slide 7 ).

Insertion anomalies. You cannot add, for example, information about a discipline that no student has yet taken to a relational table. On the other hand, adding a new discipline for a student will require duplication of information about the student, which leads to potential data incompatibility (in case of input errors).

Deletion anomalies. When deleting information from a relational table about students who have passed an exam or test in a certain discipline, information about the discipline itself will be completely deleted.

Modification Anomalies. Cause potential data inconsistency that occurs when entering duplicate data (in the event of an erroneous entry in one or more values), as well as when editing duplicate data.

The above anomalies can be avoided by normalizing the initial ratio.

The process of normalization is the decomposition of a table into two or more to eliminate duplication of data and potential inconsistency. The ultimate goal of normalization is to achieve a database design in which “every fact appears in only one place.”

12.2.1. Functional dependencies

The normalization process is based on the concept of functional dependence. Functional dependence describes the relationship between the attributes of a relation: if in a relation R containing attributes A and B, attribute B functionally depends on attribute A, then each individual value of attribute A is associated with only one value of attribute B (and groups of attributes can act as A and B) . The attribute or group of attributes A is called determinant functional dependence ( slide 8 ).

Thus, in the presence of a functional dependence A →B, tuples (rows) having the same value of attribute A also coincide in the value of attribute B. However, the converse is not true: the same value of attribute B can correspond to different values ​​of attribute A. For example, from the functional relationship Employee → Position it follows that wherever the employee “Eremeev V.K.” is indicated, the position “Professor” will correspond to him, but other employees may also have the position “Professor”.

The functional dependence A →B is full functional dependency if removing any attribute from attribute group A results in the loss of that dependency. The functional dependence A →B is partial a functional dependency if attribute group A contains one or more attributes that, when removed, retain the dependency.

If for attributes A, B and C of some relation there are functional dependencies A →B, B →C, attribute C is said to be related transitive dependence with attribute A through attribute B (in this case, attribute A should not functionally depend on either attribute B or attribute C).

Multivalued dependency. One attribute of a table is said to multivaluedly define another attribute of the same table if, for each value of the first attribute, there is a well-defined set of corresponding values ​​of the second attribute ( slide 9 ).

As an example, consider a fragment of the table “Acceptance of exams (tests)”. The table reflects the relationship between the discipline and reporting form with the teacher’s name. In this table there is a multi-valued relationship “Discipline - Teacher”: the discipline “Mathematical Analysis” is taught by several teachers (Rakov I. I., Rybin K. K., Karpov K. Yu.) and, accordingly, all of them can participate in taking exams (tests). Another multivalued relationship is “Discipline - Reporting Form”: both an exam and a test can be conducted in the same discipline. At the same time, the Reporting Form and the Teacher are not connected by functional dependence, which leads to redundancy (to add the name of another teacher, you will have to enter two new rows into the table).

12.2.2. Normal forms

At each stage of normalization, each of the relations is in one of the so-called normal forms. Normal forms (from the lowest to the highest) are related by the inclusion operation, i.e. the older normal form has the properties of all the previous ones and additionally has its own distinctive features.

Normalization is a formal method for analyzing relationships by identifying the primary key and existing functional dependencies. Sequential removal of partial functional dependencies and transitive dependencies is carried out by decomposing relations and transferring them to the next (higher) normal form.

The relational table is in first normal form (1NF), If ( slide 10 )

Every value of any attribute is atomic;

There are no identical rows in the table;

Each column is uniquely named by an attribute name and contains the current value of that attribute;

Each attribute is associated with a specific domain (data type).

A relational table in 1NF has primary key – an attribute or collection of attributes whose values ​​uniquely characterize each record.

The relational table is in second normal form (2NF), if it satisfies the definition of 1NF and all its attributes that are not included in the primary key are fully functionally dependent on the primary key.

The relational table is in third normal form (3NF)(slide 11 ), if it satisfies the definition of 2NF and none of its non-key attributes have a transitive functional dependency on the primary key (that is, none of the non-key attributes have a functional dependency on any other non-key attribute).

The table is in Boyce-Codd third normal form (BCNF)(strong third normal form) if and only if any functional dependence between its attributes reduces to a complete functional dependence on possible primary key (that is, all determinants of the relationship are potential primary keys).

Usually in practice they are content with converting a relational database to 3NF or BCNF, so we will not consider higher normal forms here.

The following normal forms (4NF and 5NF) take into account not only functional, but also multivalued dependencies between attributes. In order to give definitions of these normal forms, we introduce the concept of complete decomposition of a table ( slide 12 ).

Full table decomposition call such a collection of an arbitrary number of its projections, the connection of which completely coincides with the contents of the table.

The table is in fifth normal form (5NF) if and only if in each of its complete decompositions all projections contain a possible key. A table that does not have any complete decomposition is also in 5NF.

Fourth normal form (4NF) is a special case of 5NF, when the complete decomposition must be a union of exactly two projections. In practice, it is not easy to find a real table that would be in 4NF, but not in 5NF.

12.3. Normalization procedure ( slide 14)

The procedure for converting tables to 3NF is based on the fact that the only functional dependencies in any table should be dependencies of the form AK, Where K is the primary key, and A- some attribute. The purpose of normalization is to remove other functional dependencies.

There are two possible cases:

1. The table has a composite primary key, for example, ( K1,K2), and also includes the attribute A, which functionally depends on part of that key (for example, on K2), but not from the full key. In this case, it is recommended to create another table containing the attributes K2 And A(primary key - K2), and remove the attribute A from the original table ( slide 15 ).

2. The table has a primary (possible) key TO, attribute A1, which is not a possible key, but functionally depends on TO, and another non-key attribute A2, which functionally depends on A1. The solution here is essentially the same as before - another table is formed containing the attributes A1 And A2, with primary key A1, and the attribute A2 is removed from the original table (slide 16).

Thus, by repeating the application of the two rules discussed, for any given table in almost all real practical situations, one can ultimately obtain a set of tables that are in 3NF or BCNF and do not contain any functional dependencies of the form other than ATO.

12.4. Deriving a relational schema from an ER diagram ( slide 17)

1. Each simple entity turns into a table (relationship). The entity name becomes the table name.

2. A many-to-many relationship is considered as an entity relationship and turns into a table (relationship). Thus, the many-to-many relationship is transformed into two many-to-one relationships.

3. Each attribute becomes a possible column with the same name. Columns corresponding to optional attributes may contain null values; columns corresponding to required attributes cannot. If the attribute is multiple, then a separate relation is built for it.

4. The components of the entity's unique identifier are turned into a primary key. If there are several possible unique identifiers, the most used one is chosen. If the unique identifier includes relationships, then a copy of the unique identifier of the entity at the far end of the relationship is added to the number of primary key columns (this process can continue recursively). These columns are named using relationship end names and/or entity names.

5. Many-to-one and one-to-one relationships become foreign keys. Those. A copy of the unique identifier from the "one" end of the relationship is created, and the corresponding columns constitute the foreign key.

6. Indexes are created on the primary key (unique index), as well as foreign keys and those attributes that will be frequently used in queries.

7. If there are subtypes in the conceptual scheme, then two options are possible.

All subtypes are stored in one table, which is created for the outermost supertype, and views are created for the subtypes. At least one column containing the TYPE code is added to the table and becomes part of the primary key.

In the second case, a separate table is created for each subtype and for each first-level subtype (for lower ones - views), the supertype is recreated using the UNION view (common columns - supertype columns - are selected from all subtype tables).

8. If the remaining foreign keys all belong to the same domain, i.e. have a common format, two columns are created: the relationship identifier and the entity identifier. The relationship ID column is used to differentiate relationships. The entity identifier column is used to store the unique identifier values ​​of the entity at the far end of the corresponding relationship.

If the resulting foreign keys do not belong to the same domain, then explicit foreign key columns are created for each relationship covered by the exclusion arc.

Publications on the topic