Shannon's first theorem. Shannon's fundamental theorem on coding for an interference-free channel. Coding of speech information

  • 5. The concept of conditional probability
  • 6. General formula for the probability of events occurring
  • 7. General formula for the probability of the sum of events
  • Lecture 3. The concept of entropy
  • 1. Entropy as a measure of uncertainty
  • 2. Properties of entropy
  • 3. Conditional entropy
  • Lecture 4. Entropy and information
  • 1. Volumetric approach to measuring the amount of information
  • 2. Entropy approach to measuring the amount of information
  • Lecture 5. Information and alphabet
  • Lecture 6. Statement of the coding problem. Shannon's first theorem.
  • Lecture 7. Methods for constructing binary codes. Alphabetical non-uniform binary coding with signals of equal duration. Prefix codes.
  • 1. Statement of the problem of optimization of non-uniform coding
  • 2. Uneven code with delimiter
  • 3. Codes without separator. Fano condition
  • 4. Shannon–Fano prefix code
  • 5. Huffman prefix code
  • Lecture 8. Methods for constructing binary codes. Other options
  • 1. Uniform alphabetic binary coding. Byte code
  • 2. International byte coding systems for text data. Universal text data encoding system
  • 3. Alphabetic coding with unequal duration of elementary signals. Morse code
  • 4. Block binary coding
  • 5. Graphic data encoding
  • 6. Encoding of audio information
  • Lecture 9. Number systems. Representation of numbers in different number systems. Part 1
  • 1. Number systems
  • 2. Decimal number system
  • 3. Binary number system
  • 4. Octal and hexadecimal number systems
  • 5. Mixed number systems
  • 6. The concept of economy of a number system
  • Lecture 10. Number systems. Representation of numbers in different number systems. Part 2.
  • 1. The task of converting a number from one number system to another
  • 2. Converting q → p integers
  • 3. Converting p → q integers
  • 4. Converting p → q fractions
  • 6. Converting numbers between the binary, octal and hexadecimal number systems
  • Lecture 11. Coding numbers in a computer and operations on them
  • 1. Normalized numbers
  • 2. Converting a number from its natural form to its normalized form
  • 3. Converting normalized numbers
  • 4. Encoding and processing unsigned integers
  • 5. Encoding and processing signed integers
  • 6. Coding and processing of real numbers
  • Lecture 12. Transmitting information over communication lines
  • 1. General scheme for transmitting information in a communication line
  • 2. Communication channel characteristics
  • 3. Effect of noise on channel capacity
  • Lecture 13. Ensuring the reliability of information transfer.
  • 1. Statement of the problem of ensuring transmission reliability
  • 2. Codes that detect a single error
  • 3. Codes that correct a single error
  • Lecture 14. Methods of transmitting information in computer communication lines
  • 1. Parallel data transfer
  • 2. Serial data transmission
  • 3. Communication of computers via telephone lines
  • Lecture 15. Data classification. Representation of data in computer memory
  • 1. Data classification
  • 2. Representation of elementary data in RAM
  • Lecture 16. Classification of data structures
  • 1. Classification and examples of data structures
  • 2. The concept of logical notation
  • Lecture 17. Organization of data structures in RAM and on external media
  • 1. Organization of data structures in RAM
  • 2. Hierarchy of data structures on external media
  • 3. Features of storage devices
  • Control questions
  • Bibliography
  • Lecture 6. Statement of the coding problem. Shannon's first theorem.

    The theory of information coding is one of the branches of theoretical computer science. The main tasks solved in this branch of computer science include the following:

      development of principles for the most economical coding of information;

      coordination of the parameters of the transmitted information with the characteristics of the communication channel;

      development of techniques that ensure the reliability of information transfer over communication channels, that is, the absence of information loss.

    As noted when considering the initial concepts of computer science, a certain alphabet is used to represent discrete messages. However, there is often no one-to-one correspondence between the information contained in a message and its (message) alphabet.

    In a number of practical applications, there is a need to translate a message from one alphabet to another, and such a conversion should not lead to loss of information.

    Let us introduce a number of definitions.

    We will assume that the source of information represents information in the form of a discrete message, using an alphabet that we will agree to call the primary alphabet. Next, this message enters a device that converts it and presents it in another alphabet - we will call this alphabet the secondary alphabet.

    A code is a rule that describes the correspondence of characters or combinations of characters of the primary alphabet to characters or combinations of characters of the secondary alphabet.

    However, there is another definition:

    A code is a set of characters or combinations of characters of the secondary alphabet used to represent characters or combinations of characters of the primary alphabet.

    Coding is the translation of the information represented by a message in the primary alphabet into a sequence of codes.

    Decoding is the operation inverse to encoding, that is, the restoration of information in the primary alphabet from the received sequence of codes.

    An encoder is a device that performs the encoding operation.

    A decoder is a device that performs decoding.

    The encoding and decoding operations are called reversible if their successive application ensures a return to the original information without any loss.

    An example of reversible coding is the representation of characters in a telegraph code and their restoration (by the recipient) after transmission.

    An example of irreversible coding is translation from one natural language to another: the reverse translation, generally speaking, does not restore the source text. Here it is important to understand that the original information is restored with losses both in terms of semantics and in the sense of distortion of the form of presentation (the sequence of characters of the primary alphabet obtained after reverse translation almost always differs from the source text).

    Of course, for practical problems related to the symbolic representation of information, the ability to restore information from its code is a necessary condition for the applicability of the code, so in the future we will limit ourselves to only considering reversible coding.

    Encoding of information precedes the transmission and storage of information. In this case, the storage of information is associated with fixing a certain state of the information carrier, and the transmission of information is associated with the process of changing the state of the information carrier (environment), that is, with signals. We will call these states or signals elementary signals. The set of elementary signals constitutes the secondary alphabet.

    Now we will not discuss the technical aspects of message transmission and storage, that is, we will not discuss how the transmission and reception of a sequence of signals or recording the states of the information carrier (information recording) are technically implemented.

    Let us try to give a mathematical formulation of the problem of information coding.

    Let the primary alphabet A consist of N signs with average information per sign I(A), and the secondary alphabet B consist of M signs with average information per sign I(B). Let also the original message, represented in the primary alphabet, contain n signs, and the encoded message (represented in the secondary alphabet) contain m signs. Then the original message contains I_st(A) = n·I(A) information, and the encoded message contains I_fin(B) = m·I(B) information.

    The reversible encoding operation can increase the amount of information in a message, but cannot reduce it:

    I_st(A) ≤ I_fin(B).

    This inequality can be rewritten as n·I(A) ≤ m·I(B), or

    I(A)/I(B) ≤ m/n.

    The ratio m/n characterizes the average number of characters of the secondary alphabet that must be used to encode one character of the primary alphabet A.

    We will call this quantity the length of the code, or the length of the code chain, and denote it K(A,B):

    K(A,B) = m/n.

    From the previous inequality it follows that m/n ≥ I(A)/I(B), therefore:

    K(A,B) ≥ I(A)/I(B). (7.1)

    Typically, in practice, the encoding procedure is implemented in such a way that N > M and I(A) > I(B), therefore K(A,B) > 1, that is, one character of the primary alphabet is represented by several characters of the secondary alphabet.

    There are many methods for constructing codes with fixed alphabets A and B. Therefore, the problem arises of choosing (or constructing) the best version of the code - we will call it the optimal code.

    The profitability of using an optimal code when transmitting and storing information is an economic factor, since a more efficient code can allow less energy and time to be spent on message transmission. If the optimal code is used when transmitting information, the communication line can be occupied for less time. The use of an optimal code when storing information can allow the use of less surface area (volume) of the information carrier.

    At the same time, you should be aware that the profitability of the code is not always identical to the time efficiency of the entire "encoding-transmission-decoding" process. It is possible that the use of an efficient code for transmission will have to be paid for by the fact that the encoding and decoding operations take more time and/or other resources (for example, the memory of a technical device, if these operations are performed with its help).

    As follows from condition (7.1), the minimum possible average code length equals:

    K_min(A,B) = I(A)/I(B). (7.2)

    This expression should be taken as a relation of an evaluative character that sets a lower limit on the code length. However, from (7.2) it is unclear to what extent real coding schemes allow the value K(A,B) to approach the value K_min(A,B).

    For this reason, the theorems proved by Shannon are important for coding theory and communication theory. Shannon's first theorem concerns the encoding situation in the absence of interference distorting the message. Shannon's second theorem applies to real communication lines with noise and will be discussed later.

    Shannon's first theorem, or the fundamental theorem about coding in the absence of interference, is formulated as follows:

    In the absence of interference, a coding option is always possible in which the average number of code characters per character of the primary alphabet will be arbitrarily close to the ratio of the average information per character of the primary and secondary alphabets.

    This theorem states the fundamental possibility of optimal coding, that is, of constructing a code whose average length approaches the value K_min(A,B).

    However, it is necessary to realize that from this theorem itself it in no way follows how to implement such coding in practice. To develop a practical implementation of optimal encoding, additional considerations must be involved, which we will discuss next.

    From (7.2) it is clear that there are two ways to reduce the value K_min(A,B):

      reducing the numerator - this is possible if the encoding takes into account the different probabilities of the appearance of the signs of the primary alphabet, as well as their two-sign, three-sign and other correlations;

      increasing the denominator - for this it is necessary to use a coding method in which the signs of the secondary alphabet appear with equal probability, that is, I(B) = log₂M.

    K. Shannon examined the following particular situation in detail: when encoding a message, the different probabilities of the appearance of the signs of the primary alphabet are taken into account, that is, the amount of information per sign of the primary alphabet is calculated in the first approximation, I(A) = I₁(A) = −Σ pᵢ·log₂pᵢ, but correlations between the signs of the primary alphabet are not taken into account. Sources of such messages are called sources without memory (with respect to correlations of the signs of their own, primary, alphabet). If at the same time an equal probability of the appearance of the signs of the secondary alphabet is ensured, then, as follows from (7.2), the following relation holds for the minimum average code length:

    K_min(A,B) = I(A)/log₂M. (7.3)

    As a measure of the excess of K(A,B) over K_min(A,B), one can introduce the relative code redundancy Q(A,B):

    Q(A,B) = (K(A,B) − K_min(A,B)) / K_min(A,B) = K(A,B)/K_min(A,B) − 1.

    Taking (7.2) into account, the last formula can be rewritten in the form

    Q(A,B) = K(A,B)·I(B)/I(A) − 1.

    The quantity Q(A,B) shows how much the encoding operation increased the length of the original message. It is obvious that Q(A,B) → 0 as K(A,B) → K_min(A,B).

    Thus, the solution of the coding optimization problem consists in finding coding schemes that would ensure that the average code length approaches the value K_min(A,B) and the relative code redundancy Q(A,B) approaches zero. The lower the value of Q(A,B), the less additional information is associated with the encoding, the more profitable the code becomes and the more efficient the encoding operation is.

    Using the concept of code redundancy, one can construct a different formulation of Shannon's first theorem:

    In the absence of interference, it is always possible to encode a message in such a way that the code redundancy will be arbitrarily close to zero.

    The most important situation for practice is when the secondary alphabet contains only two signs (M = 2), that is, when only two types of signals are used to represent codes on a communication line. This encoding is called binary coding. Technically, this is the most easily implemented option: for example, the presence of voltage in the wire (a pulse) or its absence (a pause); the presence or absence of a hole on a punched card; the presence or absence of a magnetized area on a floppy disk, and so on.

    The characters of the binary alphabet are usually denoted "0" and "1"; these characters should be perceived as letters. The convenience of binary codes also lies in the fact that with equal durations and probabilities, each elementary signal (0 or 1) carries exactly 1 bit of information (log₂2 = 1).

    In the case of a binary secondary alphabet (we denote the corresponding code length K(A,2)), formula (7.3) gives:

    K_min(A,2) = I(A), (7.4)

    and Shannon's first theorem receives another (third) formulation:

    In the absence of interference, the average length of a binary code can be arbitrarily close to the average information per sign of the primary alphabet.

    Thus, when encoding messages from a source without memory with binary signs of equal probability, formula (7.4) gives:

    K_min(A,2) = I₁(A) = −Σ pᵢ·log₂pᵢ. (7.5)
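    As a rough numerical illustration of (7.5), the following Python sketch compares this lower bound with the length of a simple uniform binary code; it reuses the same assumed four-sign alphabet and probabilities as in the earlier sketch.

        # A sketch comparing the bound (7.5) with a uniform binary code for the
        # same assumed four-sign, memoryless primary alphabet as above.
        from math import log2, ceil

        p = [0.5, 0.25, 0.125, 0.125]             # assumed sign probabilities (source without memory)
        K_min = -sum(pi * log2(pi) for pi in p)   # (7.5): minimum average binary code length, bits
        K_uniform = ceil(log2(len(p)))            # a uniform code spends this many bits on every sign
        Q = K_uniform / K_min - 1                 # relative redundancy of the uniform code

        print(f"K_min = {K_min:.3f}, uniform code length = {K_uniform}, Q = {Q:.3f}")
        # -> K_min = 1.750, uniform code length = 2, Q = 0.143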

    When decoding binary messages, the problem arises of isolating from the stream of signals (sequences of pulses and pauses) the code words (groups of elementary signals) corresponding to individual characters of the primary alphabet. In this case, the receiving device records the intensity and duration of the signals, and can also correlate the sequence of incoming signals with reference sequences (with a code table).
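    A minimal Python sketch of such separation for the case when the code table is a prefix code (the kind of code discussed in Lecture 7); the particular table used here is a hypothetical example, not one defined in the text.

        # A sketch of separating code words from a binary stream using a code table.
        # The table below is a hypothetical prefix code chosen for illustration.
        code_table = {"0": "a", "10": "b", "110": "c", "111": "d"}

        def decode(bits: str) -> str:
            result, word = [], ""
            for bit in bits:
                word += bit
                if word in code_table:            # a complete code word has accumulated
                    result.append(code_table[word])
                    word = ""
            if word:
                raise ValueError("stream ended in the middle of a code word")
            return "".join(result)

        print(decode("110010111"))                # -> "cabd"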

    Note the following possible features of the binary secondary alphabet used in encoding:

      elementary signals (0 and 1) can have the same duration (τ₀ = τ₁) or different durations (τ₀ ≠ τ₁);

      the code length can be the same for all characters of the primary alphabet (in this case the code is called uniform), or the codes of different characters of the primary alphabet can have different lengths (non-uniform code);

      codes can be constructed for individual characters of the primary alphabet (alphabetic coding) or for combinations of them (coding of blocks, words).

    Combinations of the listed features of the secondary alphabet determine the basis of a particular method of encoding messages. However, even with the same basis, different options for constructing codes are possible, which will differ in their effectiveness.

    The main significance of Shannon's results in this area is that they provide a universal criterion that allows technically different devices and systems to be compared in terms of their information transfer capabilities. Message sources and communication channels can be devices that differ significantly in the signals used, the message encoding methods, the data formats and the speed characteristics. Under these conditions, Shannon's measure of information and the ideal coding theorems make it possible to assess how well technically different systems match each other for solving the problem of message transmission. To do this, the information indicators of the source and the channel - the information productivity and the information throughput - must be evaluated on the basis of their technical indicators. The ratio of these information indicators is the ideal measure by which one can judge the degree of mutual compliance of real systems.

    Shannon's special merit is that he was the first to grasp the real picture of the influence of random interference on the process of message transmission. The fundamental effect of interference is expressed in the extent to which it reduces the information performance of the system. Therefore, channels with the same capacity are equivalent with respect to the possibility of error-free transmission of messages, regardless of whether interference is present in them or not.

    To explain the role of Shannon's theorem clearly, let us resort to the following comparison. Let there be a pipeline for the delivery of some liquid product from a source. The technical capability of the pipeline is determined by the amount of liquid that can be transmitted through it per unit of time. We define the productivity of the source by the amount of pure product coming from it per unit of time, and the throughput of the pipeline as the maximum possible transfer rate of the pure product, corresponding to the condition that a pure product without impurities comes from the source. An analogue of a channel with interference is a pipeline with a leak. Its throughput will be less than that of a pipeline without a leak by the amount of product that leaks out per unit of time. One can now imagine what effect would be caused by the statement that there is a way of introducing an impurity ("redundancy") into the product such that, by introducing an amount of impurity equal to the leakage in the pipeline, it is possible to deliver the product through it without loss at a rate corresponding to the throughput of the leaking pipeline. This is precisely the meaning of Shannon's theorem in relation to the problem of information transmission. Continuing the analogy, we can say that this method of introducing the impurity requires a kind of "settling tank" in which the impurity settles for a certain time before being fed into the pipeline (ideally, an infinite time). After such settling, when the liquid moves through the pipeline, only the impurity will leak out.

    Interpretation of Shannon's results for information storage and retrieval problems. The results of Shannon's theorems, traditionally formulated for the problem of message transmission, are easily extended to problems of storing and retrieving information.

    Let us consider the data storage problem in the following generalized form. Let data in the form of a sequence of records be placed in the cells of a storage device (memory), each record in a separate cell. The records intended for storage are characterized by a certain collection of technical features: sizes, data encoding methods, code formats, etc. The storage cells in which the records are placed are also characterized by a certain set of technical features: internal representation of the data, access method, labelling system, etc.; these impose technical limitations on the process of data placement. In addition, the information placed in the memory cells may be subject to interference, which causes errors to appear in the records.

    The question arises under what conditions reliable storage of information is possible, i.e. retrieving data from the memory in the form in which they were placed there.

    To answer this question in accordance with the Shannon approach, it is necessary to move from technical characteristics to informational:

    For stored data, determine the average entropy of the record;

    For a storage device, determine the maximum amount of information that can be placed in a cell, taking into account its technical limitations and the interference present in it, that is, determine the information capacity of the cell.

    Then, in terms of information indicators, the following formulation of Shannon's theorem for the problem of information storage is valid: for a storage device (with or without interference) there is a way to encode and decode the stored data with any desired reliability, provided that the average entropy of a record is less than the information capacity of a cell.

    If we consider the application of ideal coding to the problem of storing information, it becomes clear that it would require a memory with a potentially infinite number of cells in order to accommodate typical sequences of records of any length. This demonstrates the technical impracticability of ideal coding in relation to the problem of information storage.

    You can get closer to the ideal result by enlarging the stored records. In practice, so-called record blocking is widely used in computer data storage devices (for example, magnetic tape and disk drives). In this case, groups of records are combined into blocks, which are placed in the memory as some single enlarged records. This achieves a more economical use of the information capacity of the cells.

    The practical implementation of noise-resistant information storage is based on error-correcting coding methods. Before records are placed in the memory, their redundancy is artificially increased by introducing additional control characters. After the records are read from the memory, errors are detected and corrected.
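    The simplest instance of such a control character is a parity bit; the Python sketch below is only an illustration of the idea and detects, but does not correct, a single distorted bit.

        # A sketch of adding one control (parity) character to a record before
        # storage and checking it after reading; detects any single-bit error.
        def add_parity(bits: str) -> str:
            parity = str(bits.count("1") % 2)      # redundancy: one extra control bit
            return bits + parity

        def check_parity(stored: str) -> bool:
            return stored.count("1") % 2 == 0      # even parity must hold if no error occurred

        record = "1011001"                         # an arbitrary record, for illustration
        stored = add_parity(record)                # "10110010"
        corrupted = stored[:3] + ("0" if stored[3] == "1" else "1") + stored[4:]
        print(check_parity(stored), check_parity(corrupted))   # True False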

    Let us now consider the search problem in the following generalized form. Let there be a file whose records are identified by keys. Multiple record queries form a sequence of search arguments. Knowledge of search keys and arguments may be distorted due to random interference during file generation and query preparation.

    The question arises under what conditions a reliable search for information is possible, i.e. receiving for each request exactly those records that are required.

    In accordance with the Shannon approach, let us move on to information characteristics:

    For sequences of search arguments, we determine the average entropy of the arguments. If the arguments are affected by errors, then it is necessary to take into account the increase in average entropy due to errors;

    For a set of file records, we determine the information capacity of the key - the maximum amount of information that can be contained in the file key. If the file keys may contain random errors, then it is necessary to take into account the reduction in the information capacity of the key due to errors.

    Then, in terms of information indicators, the following formulation of Shannon's theorem for the information search task is valid:

    For a file (with or without interference) there is a method of arbitrarily reliable search for the required records, provided that the average entropy of the argument is less than the information capacity of the key.

    The application of the ideal encoding algorithm to this problem would require a potentially infinite enlargement of the file, so that typical sequences of the source arguments could be used as search arguments. This demonstrates the technical impracticability of ideal coding in relation to the problem of information retrieval.

    As mentioned in the second chapter, the development of technically feasible methods for noise-immune search is currently in its infancy. The results available here are significantly more modest than, for example, in noise-immune coding, where an extensive and deep mathematical coding theory has been created. What is the matter here? Why not use the results of coding theory to solve the related retrieval problem?

    The main idea of error-correcting coding is the artificial introduction of redundancy into messages before they are fed into a channel with noise. In most search tasks we deal with the natural redundancy of messages, which we cannot actively influence. We receive an already distorted search argument, for example, the client's last name mumbled over the phone, and we can only rely on the natural redundancy of the language (the natural redundancy of the Russian language, like that of most European languages, is approximately 60%).

    However, taking into account the fundamental solvability of this problem in the light of Shannon’s results, as well as recent publications on the problems of noise-immune search, one can hope that this problem will be solved technically.

    Data compression.

    Encoded messages are transmitted over communication channels, stored in storage devices, and processed by the processor. The volumes of data circulating in the automated control system are large, and therefore, in many cases, it is important to ensure data coding that is characterized by a minimum length of the resulting messages. This is a data compression problem. Solving it ensures an increase in the speed of information transfer and a decrease in the required memory of storage devices. Ultimately, this leads to increased efficiency of the data processing system.

    There are two approaches (or two stages) of data compression:

    Compression based on the analysis of the specific structure and semantic content of the data;

    Compression based on the analysis of the statistical properties of the encoded messages. Unlike the first, the second approach is universal and can be used in all situations where there is reason to believe that the messages obey probabilistic laws. We will look at both of these approaches next.

    4.1. Compression based on the semantic content of the data

    These methods are heuristic and unique in nature, but the main idea can be explained as follows. Let a set contain N elements. Then, to encode the elements of this set with a uniform code, log₂N binary characters (rounded up to an integer) are required. In this case, all binary code combinations are used. If not all combinations are used, the code will be redundant. Thus, to reduce redundancy, one should try to delineate the set of possible values of the data elements and encode them accordingly. In real conditions this is not always easy; for some types of data the set of possible values is very large. Let us see what is done in specific cases.

    Transition from natural notations to more compact ones. The values of many specific data items are encoded in a form convenient for human reading. However, they usually contain more characters than necessary. For example, a date can be written as "January 26, 1982", or in the shortest form "01/26/82"; yet many code combinations, such as "33.18.53" or "95.00.11", are never used. To compress such data, the day can be encoded with five binary digits, the month with four and the year with seven, i.e. the entire date takes no more than two bytes. Another way of recording dates, proposed back in the Middle Ages, is to record the total number of days that have passed from some reference point to the given date. In this case, one is often limited to the last four digits of this representation. For example, May 24, 1967 is written as 0000, and counting days from that date obviously requires two bytes in packed decimal format.
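    A minimal Python sketch of the first of these ideas (5 + 4 + 7 bits packed into two bytes); the choice of 1900 as the base year is an assumption made only for the illustration.

        # A sketch of packing a date into 16 bits: 5 bits for the day, 4 for the
        # month, 7 for the year offset from an assumed base year of 1900.
        def pack_date(day: int, month: int, year: int, base_year: int = 1900) -> int:
            assert 1 <= day <= 31 and 1 <= month <= 12 and 0 <= year - base_year < 128
            return (day << 11) | (month << 7) | (year - base_year)

        def unpack_date(packed: int, base_year: int = 1900):
            return (packed >> 11) & 0x1F, (packed >> 7) & 0x0F, (packed & 0x7F) + base_year

        packed = pack_date(26, 1, 1982)
        print(packed.bit_length() <= 16, unpack_date(packed))   # True (26, 1, 1982)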

    INFORMATION CODING.

    ABSTRACT ALPHABET

    Information is transmitted in the form of messages. Discrete information is written using a certain finite set of signs, which we will call letters, without putting the usual limited meaning into this word (such as "Russian letters" or "Latin letters"). A letter in this expanded sense is any of the signs that are established by some agreement for communication. For example, in the usual transmission of messages in Russian, such signs are the Russian letters - uppercase and lowercase - punctuation marks and the space; if the text contains numbers, then the digits as well. In general, we will call a letter an element of some finite set of signs that are distinct from each other. We will call a set of signs in which their order is fixed an alphabet (the order of the signs in the Russian alphabet is well known: А, Б, ..., Я).

    Let's look at some examples of alphabets.

    1. Alphabet of capital Russian letters:

    А Б В Г Д Е Ё Ж З И Й К Л М Н О П Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я

    2. Morse alphabet:

    3. Alphabet of IBM PC keyboard symbols (Russified keyboard):

    4. Alphabet of regular six-sided dice signs:

    5. Alphabet of Arabic digits:

    0 1 2 3 4 5 6 7 8 9

    6. Alphabet of hexadecimal digits:

    0123456789ABCDEF

    This example, in particular, shows that signs of one alphabet can be formed from signs of other alphabets.

    7. Alphabet of binary digits:

    0 1

    Alphabet 7 is one example of so-called “binary” alphabets, i.e. alphabets consisting of two characters. Other examples are the binary alphabets 8 and 9:

    8. Binary alphabet "dot", "dash": . _

    9. Binary alphabet “plus”, “minus”: + -

    10. Alphabet of capital Latin letters:

    ABCDEFGHIJKLMNOPQRSTUVWXYZ

    11. Alphabet of the Roman numeral system:

    I V X L C D M

    12. Alphabet of the language of flowcharts depicting algorithms:

    13. Alphabet of the Pascal programming language (see Chapter 3).

    ENCODING AND DECODING

    In a communication channel, a message composed of symbols (letters) of one alphabet can be transformed into a message made up of symbols (letters) of another alphabet. The rule that describes the one-to-one correspondence of letters of the alphabets during such a transformation is called a code. The message conversion procedure itself is called recoding. Such a transformation of a message can be carried out at the moment the message arrives from the source into the communication channel (encoding) and at the moment the message is received by the recipient (decoding). Devices that provide encoding and decoding will be called an encoder and a decoder, respectively. Fig. 1.5 shows a diagram illustrating the process of message transmission in the case of recoding, as well as the influence of interference (see the next paragraph).

    Fig. 1.5. The process of transmitting a message from a source to a receiver

    Let's look at some code examples.

    1. Morse code in the Russian version (an alphabet compiled from the capital letters of the Russian alphabet and the Arabic digits is put into correspondence with the Morse alphabet):

    2. Trisime code (each character of the Latin alphabet is assigned a combination of three characters from the set {1, 2, 3}):

    A 111   D 121   G 131   J 211   M 221   P 231   S 311   V 321   Y 331
    B 112   E 122   H 132   K 212   N 222   Q 232   T 312   W 322   Z 332
    C 113   F 123   I 133   L 213   O 223   R 233   U 313   X 323   . 333

    The Trisime code is an example of a so-called uniform code (one in which all code combinations contain the same number of characters - in this case, three). An example of a non-uniform code is Morse code.

    3. For the encoding of numbers by the digits of various number systems, see §3.

    THE CONCEPT OF SHANNON'S THEOREMS

    It was previously noted that when transmitting messages over communication channels, interference may occur that can lead to distortion of the received characters. So, for example, if you try to transmit a voice message in windy weather to a person located at a considerable distance from you, it may be greatly distorted by interference such as the wind. In general, message transmission in the presence of interference is a serious theoretical and practical task. Its importance increases due to the widespread implementation of computer telecommunications, where interference is inevitable. When working with encoded information distorted by interference, the following main problems can be identified: establishing the very fact that a distortion of the information has occurred; finding out exactly where in the transmitted text this happened; correcting the error, at least with some degree of certainty.

    Interference in the transmission of information is quite common in all areas of professional activity and in everyday life. One example was given above; other examples are talking on a telephone whose receiver is "crackling", driving a car in fog, etc. Most often, a person copes completely with each of the above tasks, although he is not always aware of how he does it (that is, not algorithmically, but on the basis of some associative connections). It is known that natural language has a large redundancy (in European languages, up to 70%), which explains the greater noise immunity of messages composed of characters from the alphabets of such languages. An example illustrating the resistance of the Russian language to interference is the sentence "in slovakh vso glosnoo zomonono bokvoy o". Here 26% of the characters are "affected", but this does not lead to a loss of meaning. Thus, redundancy is a useful property in this case.

    Redundancy could also be used when transmitting encoded messages in technical systems. For example, each fragment of text (“sentence”) is transmitted three times, and the pair of fragments that completely coincide is considered correct. However, high redundancy leads to large amounts of time spent transmitting information and requires a large amount of memory for storing it. The first theoretical study of effective coding was undertaken by K. Shannon.
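    A character-by-character majority vote is one way to realize the triple-transmission scheme just described; the Python sketch below uses made-up fragments purely for illustration.

        # A sketch of triple transmission with a per-character majority vote:
        # the value supported by at least two of the three copies is accepted.
        from collections import Counter

        def majority(copies):
            # copies: three received versions of the same fragment, equal in length
            return "".join(Counter(chars).most_common(1)[0][0] for chars in zip(*copies))

        received = ["hella world", "hello world", "hello xorld"]   # two distorted copies
        print(majority(received))                                  # -> "hello world"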

    Shannon's first theorem declares the possibility of creating a system for efficient coding of discrete messages in which the average number of binary symbols per message symbol asymptotically tends to the entropy of the message source (in the absence of interference).

    The problem of efficient coding is described by the triad:

    X = {x_i} — encoder — B.

    Here X and B are, respectively, the input and output alphabets. The elements x_i can be understood as any signs (letters, words, sentences). B is a set whose number of elements, in the case of encoding characters by numbers, is determined by the base of the number system (for example, m = 2). The encoder matches each message x_i from X with a code combination made up of n_i characters of the set B. The limitation of this task is the absence of interference. It is required to estimate the minimum average length of a code combination.

    To solve this problem, the probability P_i of the appearance of each message x_i, to which a certain number n_i of characters of the alphabet B corresponds, must be known. Then the mathematical expectation of the number of characters from B is determined as follows:

    n_avg = Σ n_i·P_i (average value).

    This average number of characters of the alphabet B corresponds to the maximum entropy H_max = n_avg·log m. To ensure the transmission of the information contained in the messages X by code combinations from B, the condition H_max ≥ H(x), that is, n_avg·log m ≥ −Σ P_i·log P_i, must be satisfied. In this case the encoded message has redundancy, n_avg ≥ H(x)/log m, and the minimum average length is n_min = H(x)/log m.

    Redundancy factor:

    K_u = (H_max − H(x)) / H_max = (n_avg − n_min) / n_avg.

    Let us write these values in the form of Table 1.8. We have:

    n_min = H(x)/log₂2 = 2.85,   K_u = (2.92 − 2.85) / 2.92 = 0.024,

    i.e. the code has virtually no redundancy. It can be seen that the average number of binary symbols tends to the entropy of the message source.

    Table 1.8. Example for Shannon's first theorem

    N    P(x_i)   x_i   Code   n_i   n_i·P(x_i)   −P(x_i)·log₂P(x_i)
    1    0.19     x1    10     2     0.38         0.4552
    2    0.16     x2    001    3     0.48         0.4230
    3    0.16     x3    011    3     0.48         0.4230
    4    0.15     x4    101    3     0.45         0.4105
    5    0.12     x5    110    3     0.36         0.3671
    6    0.11     x6    111    3     0.33         0.3503
    7    0.09     x7    1011   4     0.36         0.3127
    8    0.02     x8    1001   4     0.08         0.1129
         Σ = 1                       Σ = 2.92     Σ = 2.85
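    The numbers in Table 1.8 can be checked directly; the Python sketch below recomputes them from the probabilities and code lengths listed in the table.

        # A sketch that recomputes the quantities behind Table 1.8.
        from math import log2

        P  = [0.19, 0.16, 0.16, 0.15, 0.12, 0.11, 0.09, 0.02]
        ni = [2, 3, 3, 3, 3, 3, 4, 4]              # lengths of the binary code combinations

        n_avg = sum(n * p for n, p in zip(ni, P))  # average number of binary symbols per message
        H     = -sum(p * log2(p) for p in P)       # source entropy H(x)
        n_min = round(H, 2)                        # H(x)/log2(2), rounded to two decimals as in the text
        K_u   = (n_avg - n_min) / n_avg            # redundancy factor

        print(f"n_avg = {n_avg:.2f}, H(x) = {H:.2f}, K_u = {K_u:.3f}")
        # -> n_avg = 2.92, H(x) = 2.85, K_u = 0.024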

    Shannon's second theorem states that in the presence of interference in the channel it is always possible to find a coding system in which messages will be transmitted with a given reliability. Under this condition, the channel capacity must exceed the productivity of the message source. Thus, Shannon's second theorem establishes the principles of error-correcting coding. For a discrete channel with noise, the theorem states that if the message creation rate is less than or equal to the channel capacity, then there exists a code that ensures transmission with an arbitrarily small error rate.

    The proof of the theorem is based on the following reasoning. Initially, the sequence X = {x_i} is encoded with characters from B so that maximum throughput is achieved (the channel has no interference). Then r additional characters are introduced into a sequence from B of length n, and the new sequence of n + r characters is transmitted over the channel. The number of possible sequences of length n + r is greater than the number of possible sequences of length n. The set of all sequences of length n + r can be broken down into subsets, each of which is associated with one of the sequences of length n. If interference acts on a sequence of length n + r, it removes it from the corresponding subset with an arbitrarily small probability.

    This makes it possible to determine, on the receiving side of the channel, which subset the received sequence of length n + r, distorted by interference, belongs to, and thereby to restore the original sequence of length n.

    This theorem does not provide a specific method for constructing a code, but it indicates the limits of what can be achieved in creating error-resistant codes and stimulates the search for new ways to solve this problem.

    The theory of information coding is one of the branches of theoretical computer science. The main tasks addressed in this section include the following:

      development of principles for the most economical coding of information;

      coordination of parameters of transmitted information with the characteristics of the communication channel;

      development of techniques that ensure the reliability of information transmission over communication channels, i.e. the absence of information loss.

    The last two tasks are related to information transfer processes; they will be discussed in Chapter 5. The first task - encoding information - concerns not only transmission but also the processing and storage of information, i.e. it covers a wide range of problems; a particular case of their solution is the representation of information in a computer. Let us begin mastering coding theory by discussing these issues.

    3.1. Statement of the coding problem. Shannon's first theorem

    As noted when considering the initial concepts of computer science, a certain alphabet is used to represent discrete messages. However, there is no clear correspondence between the information contained in the message and its alphabet.

    In a number of practical applications, there is a need to translate a message from one alphabet to another, and such a transformation should not lead to loss of information.

    Let us introduce a series of definitions.

    We will assume that the source presents information in the form of a discrete message, using for this purpose the alphabet, which we will further agree to call primary. Next, this message enters a device that converts and represents it in another alphabet - we will call this alphabet secondary.

    Code - (1) a rule describing the correspondence of signs or their combinations of the primary alphabet with signs or their combinations of the secondary alphabet.

    (2) a set of characters of the secondary alphabet, used to represent characters or their combinations of the primary alphabet.

    Coding - translation of information represented by a message in the primary alphabet into a sequence of codes.

    Decoding - operation inverse to coding, i.e. restoration of information in the primary alphabet according to the received sequence of codes.

    Encoder - a device that performs the encoding operation.

    Decoder - device that performs decoding.

    The encoding and decoding operations are called reversible if their successive application ensures a return to the original information without any loss.

    An example of reversible coding is the representation of characters in a telegraph code and their recovery after transmission. An example of irreversible coding is translation from one natural language to another - the reverse translation, generally speaking, does not restore the original text. Of course, for practical problems related to the symbolic representation of information, the ability to restore information from its code is a necessary condition for the applicability of the code; therefore, in what follows we will limit ourselves to considering only reversible coding.

    Encoding precedes the transmission and storage of information. In this case, as indicated earlier, storage is associated with fixing a certain state of the information carrier, and transmission is associated with a change in state over time (i.e., a process). We will call these states or signals elementary signals - it is their totality that constitutes the secondary alphabet.

    Without discussing the technical aspects of message transmission and storage (i.e., how the transmission and reception of signal sequences or state fixation are actually implemented), we will try to give a mathematical formulation of the encoding problem.

    Let the primary alphabet A consist of N characters with average information per character I(A), and the secondary alphabet B consist of M characters with average information per character I(B). Let also the original message, represented in the primary alphabet, contain n characters, and the encoded message contain m characters. If the original message contains I_st(A) information and the encoded one I_fin(B), then the condition of the reversibility of encoding, i.e. of the non-disappearance of information during encoding, can obviously be written as follows:

    I_st(A) ≤ I_fin(B),

    the meaning of which is that a reversible encoding operation can increase the amount of information in a message, but cannot reduce it. Each of the quantities in this inequality can be replaced by the product of the number of signs and the average information per sign, i.e.:

    n·I(A) ≤ m·I(B).
    The ratio m/n obviously characterizes the average number of characters of the secondary alphabet that must be used to encode one character of the primary alphabet; we will call it the code length, or the length of the code chain, and denote it K(A,B). Hence

    K(A,B) = m/n ≥ I(A)/I(B). (3.1)
    Usually N > M and I(A) > I(B), whence K(A,B) > 1, i.e. one character of the primary alphabet is represented by several characters of the secondary alphabet. Since there are many methods for constructing codes with fixed alphabets A and B, the problem arises of choosing (or constructing) the best option - we will call it the optimal code. The profitability of the code when transmitting and storing information is an economic factor, since a more efficient code allows less energy and time to be spent on message transmission and, accordingly, occupies the communication line for less time; during storage, less surface area (volume) of the medium is used. At the same time, one should be aware that the profitability of the code is not identical to the time profitability of the entire encoding-transmission-decoding chain; a situation is possible in which the use of an efficient code during transmission has to be paid for by the fact that the encoding and decoding operations take more time and other resources (for example, space in the memory of a technical device, if these operations are performed with its help).

    As follows from (3.1), the minimum possible value of the average code length will be:

    K_min(A,B) = I(A)/I(B). (3.2)
    This expression should be perceived as an evaluative relation that sets a lower limit on the code length; however, it is unclear from it to what extent real coding schemes allow K(A,B) to approach K_min(A,B). For this reason, two theorems proved by Shannon are of utmost importance for coding theory and communication theory. The first, which we will now consider, concerns the encoding situation in the absence of interference distorting the message. The second theorem applies to real communication lines with noise and will be discussed in Chapter 5.

    Shannon's first theorem, which is called the main theorem about coding in the absence of interference, is formulated as follows:

    In the absence of interference, it is always possible to encode a message in such a way that the average number of code characters per character of the primary alphabet will be arbitrarily close to the ratio of the average information per character of the primary and secondary alphabets.

    The above statement is a theorem and, therefore, must be proven. We will omit the proof, referring those interested specifically in the evidentiary side to the book by A. M. Yaglom and I. M. Yaglom. It is important for us that the theorem opens up the fundamental possibility of optimal coding, i.e. of building a code with average length K_min(A,B). However, it is necessary to realize that from the theorem itself it in no way follows how to implement such coding in practice - for this, some additional considerations must be involved, which will be the subject of our subsequent discussion.

    From (3.2) it is clear that there are two ways to reduce K_min(A,B):

      reducing the numerator is possible if, when encoding, we take into account the difference in the frequencies of appearance of different characters in the message, two-letter and three-letter correlations, etc. (in paragraph 2.3 it was shown that I₀ > I₁ > I₂ > … > I∞);

      increasing the denominator - for this it is necessary to use a coding method in which the appearance of characters of the secondary alphabet would be equally probable, i.e. I(B) = log₂M.

    In the particular situation considered in detail by K. Shannon, when encoding a message the different probabilities of the appearance of the signs of the primary alphabet are taken into account (what we called the "first approximation" in paragraph 2.3), but their correlations are not tracked - sources of such messages are called sources without memory. If at the same time an equal probability of the appearance of characters of the secondary alphabet is ensured, then, as follows from (3.2), the following relation turns out to be valid for the minimum average code length:

    K_min(A,B) = I(A)/log₂M. (3.3)
    As a measure of the excess of K(A,B) over K_min(A,B), one can introduce the relative code redundancy Q(A,B):

    Q(A,B) = (K(A,B) − K_min(A,B)) / K_min(A,B) = K(A,B)/K_min(A,B) − 1.
    This value shows how much the encoding operation increased the length of the original message. Obviously, Q(A,B) → 0 as K(A,B) → K_min(A,B). Consequently, the solution of the code optimization problem consists in finding coding schemes that ensure that the average code length approaches the value K_min(A,B), equal to the ratio of the average information per sign of the primary and secondary alphabets. It is easy to show that the smaller Q(A,B), the closer I_fin(B) is to I_st(A), i.e. the less extra information is associated with the encoding, the more profitable the code and the more efficient the encoding operation.

    Using the concept of code redundancy, we can construct a different formulation of Shannon’s theorem:

    In the absence of interference, it is always possible to encode a message in such a way that the code redundancy will be arbitrarily close to zero.

    The most important situation for practice is when M = 2, i.e. only two types of signals are used to represent codes in a communication line - technically this is the most easily implemented option (for example, the presence of voltage in the wire (we will call it a pulse) or its absence (a pause); the presence or absence of a hole on a punched card or of a magnetized area on a floppy disk); this type of coding is called binary. The characters of the binary alphabet are usually denoted "0" and "1", but they should be thought of as letters, not numbers. The convenience of binary codes is that with equal durations and probabilities each elementary signal (0 or 1) carries 1 bit of information (log₂M = 1); then from (3.3)

    K_min(A,2) = I(A), (3.4)
    and Shannon's first theorem receives the following interpretation:

    In the absence of interference, the average length of a binary code can be arbitrarily close to the average information per character of the primary alphabet.

    Applying formula (3.4) to messages from a source without memory, encoded with binary signs of equal probability, gives:

    K_min(A,2) = I₁(A) = −Σ pᵢ·log₂pᵢ.
    When decoding binary messages, the problem arises of isolating from the stream of signals (sequences of pulses and pauses) the code words (groups of elementary signals) corresponding to individual characters of the primary alphabet. In this case, the receiving device records the intensity and duration of the signals, and can also correlate a certain sequence of signals with a reference one (a code table).

    The following features of the secondary alphabet used in encoding are possible:

      elementary signals (0 and 1) can have the same duration (τ₀ = τ₁) or different durations (τ₀ ≠ τ₁);

      the code length may be the same for all characters of the primary alphabet (in this case the code is called uniform) or the codes of different characters of the primary alphabet may have different lengths (non-uniform code);

      codes can be constructed for a separate character of the primary alphabet (alphabetic coding) or for combinations thereof (coding of blocks, words).

    Combinations of the listed features determine the basis of a particular coding method; however, even with the same basis, various options for constructing codes that differ in their effectiveness are possible. Our next task will be to look at different encoding schemes for some basics.


    1.2. Shannon's first theorem

    It was previously noted that when transmitting messages over communication channels, interference may occur that can lead to distortion of received characters. So, for example, if you try to transmit a voice message in windy weather to a person located at a considerable distance from you, it may be greatly distorted by such interference as the wind. In general, message transmission in the presence of interference is a serious theoretical and practical problem. Its importance is increasing due to the widespread introduction of computer telecommunications, in which interference is inevitable.

    When working with encoded information distorted by interference, the following main problems can be identified: establishing the very fact that information distortion has occurred; finding out exactly where in the transmitted text this happened; error correction - at least with some degree of certainty.

    Interference in the transmission of information is by no means a property only of technical systems. This is quite a common thing in everyday life. The example was above; other examples are talking on the phone with a crackling sound, driving a car in the fog, etc. Most often, a person copes quite well with each of the above tasks, although he is not always aware of how he does it (that is, non-algorithmically, but based on some associative connections). It is known that natural language has great redundancy (in European languages, up to 70%), which explains the high noise immunity of messages composed of characters from the alphabets of such languages. An example illustrating the resistance of the Russian language to interference is the sentence "in slovakh vso glosnoo zomonono bokvoy o". Here 26% of the characters are "affected", but this does not lead to a loss of meaning. Thus, redundancy is a useful property in this case.

    For example, each fragment of text (“sentence”) is transmitted three times, and the pair of fragments that completely coincide is considered correct. However, high redundancy leads to large amounts of time spent transmitting information and requires a large amount of memory for storing it. This leads to the task of eliminating redundancy, or efficient coding. K. Shannon was the first to undertake a theoretical study of this kind of problem.

    Shannon's first theorem on information transmission, also called the fundamental theorem on coding in the absence of interference, is formulated as follows:

    In the absence of transmission interference, it is always possible to encode a message in such a way that the average number of code characters per character of the encoded alphabet will be arbitrarily close to the ratio of the average information per character of the primary and secondary alphabets.

    Using the concept of code redundancy, we can give a shorter formulation of the theorem:

    In the absence of transmission interference, it is always possible to encode a message in such a way that the code redundancy will be arbitrarily close to zero.

    These statements are theorems and, therefore, must be proven, but we will omit the proofs. It is important for us that the theorem opens up the fundamental possibility of optimal coding. However, it is necessary to realize that from the theorem itself it in no way follows how to implement such coding in practice - for this some additional considerations must be involved, which will be the subject of subsequent discussion.

    Thus, optimal coding is in principle possible.

    The most important situation for practice is when M = 2, that is, information is encoded with only two signals 0 and 1.

    Shannon considered the situation when, when encoding a message in the primary alphabet, different probabilities of the appearance of characters are taken into account, as well as an equal probability of the appearance of characters of the secondary alphabet. Then:

    K_min(A,B) = I(A)/log₂M = I(A), (1)

    here I(A) is the average information per sign of the primary alphabet.

    Let us limit ourselves to the situation when M = 2, i.e. only two types of signals are used to represent codes in a communication line - the most easily implemented option. This type of encoding is called binary. The signs of the binary alphabet are usually denoted "0" and "1". The convenience of binary codes lies in the fact that each elementary signal (0 or 1) carries 1 bit of information (log₂M = 1); then from (1) Shannon's theorem gives:

    I₁(A) ≤ K(A,2), (2)

    and Shannon's first theorem receives the following interpretation:

    In the absence of transmission interference, the average length of a binary code can be arbitrarily close to the average information per character of the primary alphabet.

    Determining the amount of transmitted information during binary coding comes down to simply counting the number of pulses (ones) and pauses (zeros). In this case, the problem arises of isolating individual codes from the flow of signals (sequences of pulses and pauses). The receiving device records the intensity and duration of the signals. The elementary signals (0 and 1) can have the same or different durations. Their number in the code (the length of the code chain), which is assigned to the sign of the primary alphabet, can also be the same (in this case the code is called uniform) or different (uneven code). Finally, codes can be constructed for each character of the source alphabet (alphabetic coding) or for their combinations (coding of blocks, words). As a result, when encoding (alphabetical and verbal), the following combinations are possible:

    Table 1. Combination options

    Elementary signal durations    Coding of primary alphabet signs (words)    Situation
    equal (τ₀ = τ₁)                uniform                                     (1)
    equal (τ₀ = τ₁)                non-uniform                                 (2)
    different (τ₀ ≠ τ₁)            uniform                                     (3)
    different (τ₀ ≠ τ₁)            non-uniform                                 (4)

    In the case of non-uniform coding or signals of different durations (situations (2), (3) and (4)), in order to separate the code of one character from the code of another it is necessary either to transmit a special signal between them - a time separator (end-of-character sign) - or to use codes that cannot be confused with parts of other codes. With uniform coding with signals of equal duration (situation (1)), the transmission of a special separator is not required, since one code is separated from another by the total duration, which is the same for all codes (or by the same number of bits when stored).

    The duration τ of a binary elementary pulse shows how long it takes to transmit 1 bit of information. Obviously, transmitting the information contained in one sign of the primary alphabet then takes, on average, a time of K(A,2)·τ. Thus, the coding optimization problem can be formulated in other terms: build a coding system such that the total duration of the codes during the transmission (or the total number of code signs during the storage) of a given message is minimal.

    If there is a source of information with entropy H(X) and a communication channel with capacity C, then if C > H(X), it is always possible to encode a sufficiently long message in such a way that it is transmitted without delay. If, on the contrary, C < H(X), such transmission is impossible.

    Shannon's first theorem declares the possibility of creating a system for efficient coding of discrete messages, in which the average number of binary symbols per message symbol asymptotically tends to the entropy of the message source (in the absence of interference).

    Shannon's first theorem (reformulation).

    In the absence of interference, the average length of a binary code can be arbitrarily close to the average information per character of the primary alphabet.

    The features of the secondary alphabet used in encoding may be the following:

    Elementary codes 0 and 1 can have the same duration (t0=t1) or different (≠).

    The length of the code can be the same for all characters of the primary alphabet (uniform code) or different (uneven code)

    Codes can be constructed for a single character of the primary alphabet (alphabetic coding) or for their combinations (coding of blocks, words).

    1.3 Shannon's second theorem

    The ratio of the capacity of the communication channel to the rate of undistorted transmission of symbols of the alphabet of the transmitted message must be greater than or equal to the entropy per transmitted symbol.

    Shannon's second theorem states that in the presence of interference in the channel it is always possible to find a coding system in which messages will be transmitted with any given reliability. The only constraint is that the capacity of the channel must exceed the productivity of the message source. Shannon's second theorem establishes the principles of error-correcting coding. For a discrete channel with noise, the theorem states that if the rate at which messages are created does not exceed the channel capacity, then there exists a code that ensures transmission with an arbitrarily low error rate.

    The proof of the theorem is based on the following reasoning. Initially, the sequence X = (x i) is encoded with symbols from B so that maximum throughput is achieved (as if the channel had no interference). Then r redundant symbols are added to a sequence of symbols of B of length n, and the new sequence of n + r symbols is transmitted over the channel. The number of possible sequences of length n + r is greater than the number of possible sequences of length n, so the set of all sequences of length n + r can be divided into as many subsets as there are sequences of length n, each subset being associated with one of them. If interference distorts a transmitted sequence of length n + r, the probability that it is taken out of its subset (and therefore decoded incorrectly) can be made arbitrarily small.
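
    The crudest way to see how added redundancy buys reliability is a repetition code (used here only to illustrate the principle; it is not the construction used in the proof and is far from optimal): each binary symbol is sent three times and the receiver takes a majority vote, which for small µ lowers the per-symbol error probability from µ to roughly 3µ². A Monte-Carlo sketch in Python, with made-up parameters:

        import random

        def send_with_noise(bit, mu):
            """Binary symmetric channel: flip the bit with probability mu."""
            return bit ^ (random.random() < mu)

        def send_repeated(bit, mu, r=3):
            """Send the bit r times and decode by majority vote."""
            received = [send_with_noise(bit, mu) for _ in range(r)]
            return int(sum(received) > r // 2)

        mu, trials = 0.1, 100_000
        plain = sum(send_with_noise(0, mu) != 0 for _ in range(trials))
        coded = sum(send_repeated(0, mu) != 0 for _ in range(trials))
        print("error rate without redundancy:", plain / trials)  # about 0.100
        print("error rate with 3x repetition:", coded / trials)  # about 0.028

    The price of this simple scheme is that the transmission rate drops to one third; Shannon's second theorem asserts that far better codes exist, which make the error probability arbitrarily small while keeping the rate just below the channel capacity.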

    Let there be a source of information X and a receiver Y, connected by a channel K. The productivity of the information source H 1 (X) is known, i.e. the average number of binary units of information coming from the source per unit of time (numerically it is equal to the average entropy of the messages produced by the source per unit of time). The capacity C 1 of the channel is also known, i.e. the maximum amount of information the channel is capable of transmitting in the same unit of time. The question arises: what must the channel capacity be for the channel to cope with its task, i.e. so that information arrives from the source to the receiver without delay?

    Although the answer may already seem obvious, it is formalized by Shannon's first theorem.

    Shannon's 1st theorem:

    If the capacity of the communication channel C 1 is greater than the entropy of the information source per unit time

    C 1 > H 1 (X),

    then it is always possible to encode a sufficiently long message so that it is transmitted by a communication channel without delay.

    If, on the contrary,

    C 1 < H 1 (X),

    then the transfer of information without delay is impossible.

    Transmission of information with distortions.

    Channel capacity with interference.

    Both Shannon's 1st theorem and the encoding methods we have considered refer to the transmission of information without errors. In reality, this process is inevitably accompanied by errors (distortions). A transmission channel in which distortions are possible is called a channel with interference (a noisy channel).

    It is quite obvious that the presence of interference leads to a loss of information. In order for the receiver to obtain the required amount of information in the presence of interference, special measures must be taken. One such measure is the introduction of so-called "redundancy". It is known that all living languages have some redundancy, reaching up to 50% (i.e., we could leave out up to 50% of the letters altogether, or mix them up, and still be understood :-)).

    Here is an example: Aoccdrnig to the rselut of rsaecerh at one Egnlsih uinervtisy, it deos not mtaetr in waht oredr the ltteres in a wrod are plaecd. The mian tihng is taht the frist and lsat lettres are in tiehr palecs. The rset of the ltteres can fololw in cmpolete dsiroedr, and the txet can sitll be raed wtihuot dfificluty.

    I hope you can still understand what is written here :-)

    However, for error-free transmission, the natural redundancy of language may be either excessive or insufficient: it all depends on how great the danger of distortion is. Using information theory methods, it is possible to find the required degree of redundancy of the information source for each level of interference.

    Consider, for example, the following problem. Communication channel K transmits from the information source X to the receiver Y elementary symbols 0 and 1 in the amount of k symbols per unit of time. During the transmission process, each symbol, regardless of the others, with probability μ can be distorted (i.e. replaced by the opposite one). We need to find the channel capacity.

    Let us first determine the maximum information per symbol that the channel can transmit. Let the source produce the symbols 0 and 1 with probabilities p and 1-p.

    Then the entropy of the source will be

    H(X)=-p log p - (1-p) log (1-p).

    If the transmission of messages were not accompanied by errors, then the amount of information per symbol would be equal to the entropy of the X system itself. If there are errors, it will be less:

    I = H(Y) - H(Y/X).

    To find the total conditional entropy H(Y/X), we first find the partial conditional entropies H(Y/x 1) and H(Y/x 2).

    Let us calculate H(Y/x 1); to do this, assume that the symbol 0 is transmitted, and find the conditional probabilities that the system Y is then in the state y 1 = 0 or in the state y 2 = 1. The first of them is equal to the probability that the signal is not distorted:

    P(y 1 /x 1)=1-µ;

    the second is the probability that the signal is mixed up:

    P(y 2 /x 1)=µ.

    The conditional entropy H(Y/x 1) will be:

    H(Y/x 1)=-Σ P(y i /x 1) log P(y i /x 1) = -(1-µ) log (1-µ) - µ log µ.

    Let us now find the conditional entropy of the system Y, provided that signal 1 is transmitted:

    P(y 1 /x 2)=µ; P(y 2 /x 2)=1-µ,

    H(Y/x 2) = - µ log µ - (1-µ) log (1-µ).

    Thus

    H(Y/x 1) = H(Y/x 2).

    The total conditional entropy H(Y/X) is obtained if we average the conditional entropies taking into account the probabilities p and 1-p. Since the partial conditional entropies are equal, then

    H(Y/X) = - µ log µ - (1-µ) log (1-µ).

    Conclusion: the conditional entropy H(Y/X) does not depend at all on the probabilities with which the symbols 0 and 1 occur in the transmitted message, but depends only on the error probability µ.
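
    This conclusion is easy to verify numerically; the Python sketch below (the probabilities are chosen arbitrarily) averages the two equal partial conditional entropies for several values of p and a fixed µ and obtains the same H(Y/X) every time.

        import math

        def eta(p):
            """Partial entropy term eta(p) = -p * log2(p), with eta(0) taken as 0."""
            return 0.0 if p == 0 else -p * math.log2(p)

        mu = 0.1
        for p in (0.1, 0.3, 0.5, 0.9):  # probability that the source sends "0"
            # H(Y/X) = p * H(Y/x1) + (1 - p) * H(Y/x2), and the two terms are equal
            h_cond = p * (eta(mu) + eta(1 - mu)) + (1 - p) * (eta(mu) + eta(1 - mu))
            print(f"p = {p:.1f}:  H(Y/X) = {h_cond:.4f}")  # always 0.4690 for mu = 0.1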

    Now calculate the information conveyed by one character:

    I = H(Y) - H(Y/X) = [- r log r - (1-r) log (1-r)] - [- µ log µ - (1-µ) log (1-µ)] =

    = [η(r) + η(1-r)] - [η(µ) + η(1-µ)],

    where r is the probability that the symbol 0 appears at the channel output, and η(p) denotes -p log p.

    We know that entropy and the amount of information are maximum for equally probable signals, i.e. (in our case) at r = 1/2, which is achieved when p = 1/2. Therefore, the maximum amount of information transmitted by one symbol is

    I max = 1 - [η(µ) + η(1-µ)],

    And the communication channel capacity will be equal to

    C = k(1 - [η(µ) + η(1-µ)]).

    Note that η(µ) + η(1-µ) is the entropy of a system having 2 possible states with probabilities µ and 1-µ. It characterizes the loss of information per symbol associated with the presence of interference.
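
    Wrapped in a small helper, the capacity formula looks as follows (a sketch under the same binary-symmetric-channel assumptions; the values in the example call are arbitrary).

        import math

        def eta(p):
            """Partial entropy term eta(p) = -p * log2(p), with eta(0) taken as 0."""
            return 0.0 if p == 0 else -p * math.log2(p)

        def channel_capacity(k, mu):
            """C = k * (1 - [eta(mu) + eta(1 - mu)]) for a channel sending k binary
            symbols per unit of time, each distorted independently with probability mu."""
            return k * (1 - (eta(mu) + eta(1 - mu)))

        print(channel_capacity(1000, 0.05))  # about 713.6 bits per unit of time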

    Example 1. Determine the capacity of a communication channel capable of transmitting 100 symbols 0 or 1 per unit of time, with each symbol being distorted (replaced by the opposite) with probability µ = 0.01.

    η(µ) = η(0.01) = 0.0664;

    η(1-µ) = η(0.99) = 0.0144;

    η(µ) + η(1-µ) = 0.0808.

    0.0808 bits of information are lost per character. The channel capacity is

    C = 100(1 - 0.0808) = 91.92 ≈ 92 bits per unit of time.
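
    The arithmetic of Example 1 can be reproduced directly (base-2 logarithms are assumed throughout):

        import math

        mu = 0.01
        loss = -mu * math.log2(mu) - (1 - mu) * math.log2(1 - mu)  # eta(mu) + eta(1-mu)
        print(round(-mu * math.log2(mu), 4))            # 0.0664
        print(round(-(1 - mu) * math.log2(1 - mu), 4))  # 0.0144
        print(round(loss, 4))                           # 0.0808
        print(round(100 * (1 - loss), 2))               # 91.92 bits per unit of time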

    Knowing the channel capacity, it is possible to determine the upper limit of the speed of information transmission over a noisy channel. Shannon's second theorem applies to this case.

    Shannon's 2nd theorem:

    Let there be a source of information X, the entropy of which per unit time is equal to H 2 (X), and a channel with capacity C 2 . Then if

    H 2 (X) > C 2,

    then, with any encoding, the transmission of messages without delays and distortions is impossible. If

    H 2 (X) < C 2,

    then it is always possible to encode a sufficiently long message so that it is transmitted without delays and distortions with a probability arbitrarily close to one.

    Example 2. There is a source of information with entropy per unit time H(X) = 100 bits and 2 communication channels; each of them can transmit 70 binary characters (0 or 1) per unit of time; each binary sign is replaced by the opposite one with probability µ=0.1. It is necessary to find out: is the capacity of these channels sufficient to transmit the information supplied by the source?

    Solution: Determine the loss of information per character:

    η(µ) + η(1-µ) = 0.332+0.137=0.469 bits

    The maximum amount of information transmitted over one channel per unit of time:

    C = 70(1-0.469) = 37.2 bits.

    The maximum amount of information that can be transmitted over two channels per unit of time:

    37.2 · 2 = 74.4 bits,

    which is not enough to ensure the transfer of information from the source.
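
    The same kind of check for Example 2 (it merely reproduces the computation above):

        import math

        mu, k, n_channels, H_source = 0.1, 70, 2, 100
        loss = -mu * math.log2(mu) - (1 - mu) * math.log2(1 - mu)  # 0.469 bits per symbol
        capacity_one = round(k * (1 - loss), 1)       # 37.2 bits per unit of time
        capacity_total = n_channels * capacity_one    # 74.4 bits per unit of time
        print(capacity_one, capacity_total)
        print("capacity sufficient:", capacity_total >= H_source)  # False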

