What is a bit?

– Igor (Administrator)

In this article I will consider what a bit is, why it is needed and what place it occupies in the world of information technology. The material is aimed primarily at beginners and ordinary users who are simply curious to learn what a bit is. So if you are a seasoned computer geek, this article is probably not for you, although that is up to you to decide.

Today almost the entire information industry, including the hardware itself, is built on the term “bit”, which is derived from the phrase “binary digit”. So what is a bit? Purely technically, the word “bit” means a unit of information that can take only two values: 0 or 1. To put it more simply and give an analogy from life, it is an ordinary choice between “yes” and “no”. It is easy to see that this approach is very simple to perceive and understand, even for people who do not get along well with the technical sciences (and almost every day billions of people interact in one way or another with electronic devices, including alarm clocks, computers and so on).

Note: On the Internet you can find references to other counting methods for building electronics. However, due to the complexity of mass-producing such hardware and of creating the corresponding software, they never took root.

What do bits make possible? Bits make it easy to build more complex structures, such as descriptions of algorithms, classification systems, data warehouses and much more. In other words, using bits you can both store data and write program code for control devices in a single style. For example, if you have paid attention, documents and programs alike are just files on your hard drive. In addition, the same binary logic is used in manufacturing the hardware itself (in designing circuit boards, for example).

In engineering, 0 is usually taken to mean either no current or a low signal level, and 1 a high signal level. In other words, current on the contacts means 1, and no current means 0. Simple, easy, understandable, and there are no problems with integrating different devices and blocks.

In a broad sense, “what is a bit” can stand for the entire modern information industry, and not just the two values 0 and 1. That is why you can find this word in almost any technical article, whether it is devoted to hardware or to program code.

Note: It is worth knowing that the word “bit” also has other definitions. For example, in information theory, the amount of information in a message about one of N equally probable events is the binary logarithm of N.

If, as a result of receiving a message, complete clarity is achieved on this issue (i.e., uncertainty disappears), they say that comprehensive information has been received. This means that there is no need for additional information on this topic. On the contrary, if after receiving the message the uncertainty remains the same (the reported information was either already known or not relevant), then no information was received (zero information).

Bit – the smallest unit of information representation. In computer science, a frequently used unit is the byte, equal to 8 bits.

Byte – the smallest unit of processing and transmission of information.

A bit allows you to select one option out of two possible; a byte, correspondingly, one out of 256 (2^8).

Along with bytes, larger units are used to measure the amount of information:

1 KB (one kilobyte) = 2^10 bytes = 1024 bytes;

1 MB (one megabyte) = 2^10 KB = 1024 KB;

1 GB (one gigabyte) = 2^10 MB = 1024 MB.

Recently, due to the increase in the volume of processed information, derived units such as the following have come into use:

1 Terabyte (TB) = 1024 GB = 2^40 bytes,

1 Petabyte (PB) = 1024 TB = 2^50 bytes.

Example. Arrange the following sequence in ascending order:

1024 MB, 11 Petabyte, 2224 GB, 1 Terabyte.

Solution. First, let's reduce all the values to a single unit of measurement convenient for this sequence. In this case it is GB.

1024 MB = 1 GB, which is less than 1 Terabyte = 1024 GB, which in turn is less than 2224 GB, which is less than 11 Petabytes.

Therefore, the sequence, ordered in ascending order, is:

1024 MB, 1 Terabyte, 2224 GB, 11 Petabytes.
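For readers who like to check such conversions programmatically, here is a minimal Python sketch (purely illustrative, not part of the solution method itself) that reduces every value to GB and sorts the sequence:

```python
# Reducing every value to gigabytes (1 TB = 1024 GB, 1 PB = 1024 * 1024 GB)
# and sorting, exactly as in the solution above.
sizes_gb = {
    "1024 MB": 1024 / 1024,        # 1 GB
    "1 Terabyte": 1024,
    "2224 GB": 2224,
    "11 Petabytes": 11 * 1024**2,  # 11534336 GB
}

for label, gb in sorted(sizes_gb.items(), key=lambda item: item[1]):
    print(label, "=", gb, "GB")
```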

II. Encoding information.

A computer can only process information presented in numerical form. All other information (texts, sounds, images, instrument readings, etc.) must be converted into numerical form for processing on a computer.

The transition from one form of information representation to another that is more convenient for storage, transmission or processing is called information encoding.

Coding is the operation of transforming signs or groups of signs of one sign system into signs or groups of signs of another sign system.

As a rule, all numbers in a computer are represented using zeros and ones; that is, computers work in the binary number system, since this makes the devices that process the numbers much simpler.

1. Text encoding.

When entered into a computer, each letter is encoded with a certain number, and when output to external devices (screen or print), images of letters are constructed from these numbers for human perception. The correspondence between a set of letters and numbers is called a character encoding.

The alphabetic approach is based on the fact that any message can be encoded using a finite sequence of characters from some alphabet. The set of characters used to write text is called an alphabet. The number of characters in the alphabet is called its power.

There is a binary alphabet that contains only 2 characters, and its power is two.

To represent text information in a computer, an alphabet with a power of 256 characters is most often used. One character from such an alphabet carries 8 bits of information, because 2^8 = 256.

8 bits make up one byte, therefore, the binary code of each character takes up 1 byte of computer memory. Traditionally, to encode one character, an amount of information equal to 1 byte (8 bits) is used. All characters of such an alphabet are numbered from 0 to 255, and each number corresponds to an 8-bit binary code from 00000000 to 11111111.

Different types of computers and operating systems use different encoding tables, which differ in the order in which the alphabet characters are placed in the table. The international standard on personal computers became the ASCII encoding table. Standard ASCII itself defines 128 characters; messages written using its 8-bit extensions use a 256-character alphabet.

In addition, there are currently a number of code tables for Russian letters. These include the KOI8 encoding table, which uses an alphabet of 256 characters.

The new international standard UNICODE has become widespread. It allocates not one byte for each character but two, so it can be used to encode not 256 but 2^16 = 65536 different characters.

In the alphabetic approach, the information content of a sequence of characters does not depend on the meaning of the message.

To determine the amount of information in a message using the alphabetical approach, you need to solve the following problems sequentially:

    Determine the amount of information (i) in one symbol using the formula 2^i = N, where N is the power of the alphabet,

    Determine the number of characters in the message, taking into account punctuation marks and spaces (m),

    Calculate the amount of information using the formula: V = i * m.

Example. The text message “Ten letters” is encoded; determine its information volume in the ASCII and UNICODE systems.

Solution . The message contains 11 characters. One character from the ASCII alphabet carries 8 bits of information, so the information volume according to the ASCII system will be 11 * 8 bits = 88 bits = 11 bytes.

One character from the UNICODE alphabet carries 16 bits of information or 2 bytes, so the information volume according to the UNICODE system will be 11 * 16 bits = 176 bits = 22 bytes.

For a binary message of the same length, the information volume is 11 bits, because N = 2, i = 1 bit, m = 11, V = 11 bit.
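The three calculations above can be reproduced with a short Python sketch of the alphabetic approach (illustrative only; the function name is invented for this example):

```python
import math

# Alphabetic approach: i bits per symbol follow from 2^i = N,
# and the information volume is V = i * m.
def information_volume_bits(message: str, alphabet_power: int) -> int:
    i = math.ceil(math.log2(alphabet_power))  # bits per symbol
    m = len(message)                          # characters, incl. spaces
    return i * m

text = "Ten letters"                          # 11 characters
print(information_volume_bits(text, 256))     # 88 bits = 11 bytes
print(information_volume_bits(text, 65536))   # 176 bits = 22 bytes
print(information_volume_bits(text, 2))       # 11 bits
```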


Brief summary

The lecture discussed the concepts of computer science and informatization. It described how information is transmitted and in what form it exists.

Control questions

1. What does Computer Science study?

2. What is meant by information?

3. What are information processes?

4. Define what technical means are.

5. Define what software is and what it includes.

6. What does the term Brainware mean?

7. Define Information Objects.

8. Give examples of message delivery.

9. Describe the message passing process.


Lecture 2. Properties of information. Amount of information. The concept of an algorithm.

The lecture discusses the general meaning of the concepts of an algorithm and the amount of information, the properties that information has, and the concept of the informatization of society.

Purpose of the lecture: to understand how the amount of information is measured. The lecture discusses the concepts of bits and bytes of information.

What properties does information have?

Information properties:

Information is reliable if it reflects the true state of affairs. Unreliable information can lead to misunderstanding or to poor decisions.

Reliable information may become unreliable over time, since it tends to become outdated, that is, it ceases to reflect the true state of affairs.

Information is complete if it is sufficient for understanding a situation and making decisions. Both incomplete and redundant information hinder decision-making and may lead to errors.

Accuracy of information is determined by the degree of its proximity to the real state of the object, process, phenomenon, etc.

The value of information depends on how important it is for solving the problem, as well as on how much further it will be used in any type of human activity.

Only information received in a timely manner can bring the expected benefit. Both the premature presentation of information (when it cannot yet be assimilated) and its delay are equally undesirable.

If valuable and timely information is expressed in an unclear way, it can become useless.

Information becomes understandable if it is expressed in the language spoken by those for whom this information is intended.

Information should be presented in an accessible (according to the level of perception) form. Therefore, the same questions are presented in different ways in school textbooks and scientific publications.

Information on the same issue can be presented briefly (concisely, without unimportant details) or extensively (detailed, verbose). Conciseness of information is necessary in reference books, encyclopedias, textbooks, and all kinds of instructions.

How is the amount of information measured?

Is it possible to measure the amount of information objectively? Scientists still cannot give an exact answer to this question. How, for example, can one measure the information contained in the literary works of Pushkin, Lermontov and Dostoevsky? The most important result of information theory is the following conclusion: under certain, very broad conditions, it is possible to neglect the qualitative features of information, express its amount as a number, and compare the amounts of information contained in different groups of data.

Currently, widespread approaches to defining the concept of “amount of information” are based on the fact that the information contained in a message can be loosely interpreted in the sense of its novelty or, in other words, as a reduction of the uncertainty of our knowledge about an object. These approaches use the mathematical concepts of probability and logarithm.

Let's say you need to guess one number from the set of numbers from one to one hundred. Using Hartley's formula I = log2 N, you can calculate how much information is required for this: I = log2 100 ≈ 6.644. Thus, a message about a correctly guessed number contains an amount of information approximately equal to 6.644 units of information.
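This value is easy to reproduce; here is a one-line check in Python (shown purely as an illustration):

```python
import math

# Hartley's formula I = log2(N) for guessing one number out of N = 100.
print(math.log2(100))  # 6.643856..., i.e. about 6.644 bits
```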

Here are other examples of equally probable messages:

1. when tossing a coin: “it came up heads”, “it came up tails”;

2. on a page of a book: “the number of letters is even”, “the number of letters is odd”.

Let us now determine whether the messages “A woman will be the first to leave the doors of the building” and “A man will be the first to leave the doors of the building” are equally probable. It is impossible to answer this question unequivocally; it all depends on what kind of building we are talking about. If it is, for example, a metro station, then the probability of leaving the door first is the same for a man and a woman, but if it is a military barracks, then for a man this probability is much higher than for a woman.

For problems of this kind, the American scientist Claude Shannon proposed in 1948 another formula for determining the amount of information, one that takes into account the possibly unequal probabilities of the messages in the set:

I = -(p1 log2 p1 + p2 log2 p2 + ... + pN log2 pN), where pi is the probability of the i-th message.

It is easy to see that if the probabilities p1, ..., pN are equal, then each of them equals 1/N, and Shannon's formula turns into Hartley's formula.
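This reduction is easy to verify numerically; below is a small Python sketch (the function name is invented for illustration):

```python
import math

# Shannon's formula I = -(p1*log2(p1) + ... + pN*log2(pN)).
def shannon_bits(probabilities):
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# For N equally probable outcomes (each p = 1/N) it gives Hartley's log2(N):
N = 100
print(shannon_bits([1 / N] * N))  # ~6.644, the same as math.log2(100)
```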

In addition to the two considered approaches to determining the amount of information, there are others. It is important to remember that any theoretical results are applicable only to a certain range of cases, outlined by the initial assumptions.

As the unit of information, Claude Shannon proposed to take one bit (English bit, short for binary digit).

A bit, in information theory, is the amount of information necessary to distinguish between two equally probable messages (such as “heads” or “tails”, “even” or “odd”, and so on).

In computing, a bit is the smallest “portion” of computer memory required to store one of the two characters, “0” and “1”, used for the internal machine representation of data and commands.

The bit is too small a unit of measurement. In practice, a larger unit is more often used: the byte, equal to eight bits. Exactly eight bits are required to encode any of the 256 characters of the computer keyboard alphabet (256 = 2^8).



Even larger derived units of information are also widely used:

· 1 Kilobyte (KB) = 1024 bytes = 2^10 bytes,

· 1 Megabyte (MB) = 1024 KB = 2^20 bytes,

· 1 Gigabyte (GB) = 1024 MB = 2^30 bytes.

Recently, due to the increase in the volume of processed information, derived units such as the following have come into use:

· 1 Terabyte (TB) = 1024 GB = 2^40 bytes,

· 1 Petabyte (PB) = 1024 TB = 2^50 bytes.

As the unit of information, one could instead choose the amount of information needed to distinguish between, for example, ten equally probable messages. It would then be not a binary (bit) but a decimal (dit) unit of information.

It is important to distinguish binary multiple prefixes from the corresponding decimal ones:

“one K” – 1 K = 2^10 = 1024 – from “one kilo” – 10^3 = 1000,

“one M” – 1 M = 2^20 = 1048576 – from “one mega” – 10^6 = 1000000, etc.

Manufacturers of computer equipment often take advantage of this, in particular manufacturers of hard disks, who, when indicating capacity, use the smaller (decimal) unit of measurement so that the resulting value is expressed as a larger number (as in the famous cartoon: “But in parrots, I'm much longer!”).
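A quick numerical illustration of the discrepancy (the “500 GB” drive here is hypothetical):

```python
# A drive advertised as "500 GB" uses decimal units: 500 * 10^9 bytes.
advertised = 500 * 10**9
# The same capacity expressed in binary units (2^30 bytes per "binary" GB):
print(advertised / 2**30)  # ~465.66, noticeably less than 500
```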


Variety is essential when conveying information. You cannot paint white on white; a single state is not enough. If a memory cell is capable of being in only one (initial) state and cannot change its state under external influence, it is not able to perceive or remember information. The information capacity of such a cell is 0.

Minimal variety is ensured by the presence of two states. If a memory cell is capable, depending on external influence, of taking one of two states, conventionally designated “0” and “1”, it has the minimal information capacity.

The information capacity of one memory cell capable of being in two different states is taken as the unit of measurement of the amount of information: 1 bit.

1 bit (bit, an abbreviation of the English binary digit) is the unit of measurement of information capacity and the amount of information, as well as of one more quantity, information entropy, which we will get to know later.

The bit is one of the most unconditional units of measurement. While the unit of length could be set arbitrarily (cubit, foot, meter), the unit of information could essentially not have been anything else. At the physical level, a bit is a memory cell which at each moment of time is in one of two states: “0” or “1”.

If each point of an image can be only either black or white, such an image is called a bitmap, because each point is then a memory cell with a capacity of 1 bit. A light bulb that can either be on or off also symbolizes a bit. The classic example illustrating 1 bit of information is the amount of information obtained from tossing a coin: “heads” or “tails”.

An amount of information equal to 1 bit can be obtained in the answer to a “yes”/“no” question. If there were initially more than two answer options, the amount of information in a particular answer will be more than 1 bit; if there are fewer than two options, i.e. only one, then it is not a question but a statement, so no information needs to be obtained, since there is no uncertainty.

The information capacity of a memory cell capable of receiving information cannot be less than 1 bit, but the quantity of information received may be less than 1 bit. This occurs when the “yes” and “no” answer options are not equally likely. The inequality, in turn, is a consequence of the fact that some preliminary (a priori) information on the question is already available, obtained, for example, from previous life experience. Thus, all the reasoning of the previous paragraphs carries one very important caveat: it is valid only for the equally probable case.
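This effect can be illustrated numerically. The sketch below (the function name is invented for the example) computes the average information of a yes/no answer for a given probability p of “yes”; it reaches 1 bit only in the equally probable case:

```python
import math

# Average information (entropy) of a yes/no answer, in bits:
# H(p) = -p*log2(p) - (1-p)*log2(1-p), maximal at p = 0.5.
def binary_entropy(p: float) -> float:
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))  # 1.0 bit: answers equally probable
print(binary_entropy(0.9))  # ~0.469 bits: a priori knowledge lowers it
```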

    What is meant by a bit of information?

    Define the unit of measurement of byte information.

    How many bits are in one byte?

    List the derived units of information.

    What is the power of the alphabet?

    What formula can be used to calculate the power of the alphabet?

    What are the main approaches to measuring information?

    Write down a formula connecting the number of events with different probabilities and the amount of information.

APPENDIX A

EXAMPLES OF PROBLEMS (WITH SOLUTIONS)

Example 1. After the computer science exam, grades (“5”, “4”, “3” or “2”) are announced. How much information will be conveyed by the message about the grade of student A, who has learned only half of the tickets, and by the message about the grade of student B, who has learned all the tickets?

Solution. Experience shows that for student A all four grades (events) are equally probable, so the amount of information carried by the grade message can be calculated using Hartley's formula:

I = log2 4 = 2 bits.

Observations show that for student B the most likely grade is “5” (p1 = 1/2), the probability of a “4” is half as great (p2 = 1/4), and the probabilities of a “2” and a “3” are half as great again (p3 = p4 = 1/8). Since these events are not equally probable, we use Shannon's formula to calculate the amount of information:

I = -(1/2 log2 1/2 + 1/4 log2 1/4 + 1/8 log2 1/8 + 1/8 log2 1/8) bits = 1.75 bits

(log2 1/2 = -1, log2 1/4 = -2, log2 1/8 = -3).

Answer: 2 bits; 1.75 bits.
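The same numbers can be checked with a few lines of Python (an illustrative sketch):

```python
import math

# Shannon's formula applied to the two grade distributions.
def shannon_bits(probabilities):
    return -sum(p * math.log2(p) for p in probabilities)

print(shannon_bits([1/4, 1/4, 1/4, 1/4]))   # student A: 2.0 bits
print(shannon_bits([1/2, 1/4, 1/8, 1/8]))   # student B: 1.75 bits
```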

Example 2. There are 32 balls in the lottery drum. How much information does the message that the number is 17 contain?

Solution. Since drawing any of the 32 balls is equally probable, the amount of information about one drawn number is found from the equation 2^I = 32. Since 32 = 2^5, I = 5 bits. (The answer does not depend on which number is drawn.)

Answer: 5 bits

Example 3. To register on the site, the user is required to create a password. The password length is exactly 11 characters. The characters used are decimal digits and 12 different letters of the alphabet, and all letters are used in two styles: both lowercase and uppercase (letter case matters).

The minimum possible and identical integer number of bytes is allocated for storing each such password on the computer, while character-by-character encoding is used and all characters are encoded with the same and minimum possible number of bits.

Determine the amount of memory it takes to store 60 passwords (the password must occupy a WHOLE number of bytes).

Solution.

    according to the condition, the password can use 10 digits (0...9) + 12 uppercase letters + 12 lowercase letters, for a total of 10+12+12 = 34 characters;

    to encode 34 characters, 6 bits of memory must be allocated (5 bits are not enough: they allow encoding only 2^5 = 32 options);

    to store all 11 characters of the password you need 11*6 = 66 bits;

    since the password must occupy an integer number of bytes, we take the nearest larger (more precisely, not smaller) value that is a multiple of 8: this is 72 = 9*8; that is, one password takes 9 bytes;

    therefore, 60 passwords take up 9*60 = 540 bytes.

Answer: 540 bytes.
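The same steps in Python (an illustrative check, not part of the required solution):

```python
import math

alphabet = 10 + 12 + 12                                # 34 characters
bits_per_char = math.ceil(math.log2(alphabet))         # 6 bits (2^5 = 32 < 34)
bits_per_password = 11 * bits_per_char                 # 66 bits
bytes_per_password = math.ceil(bits_per_password / 8)  # 9 bytes
print(60 * bytes_per_password)                         # 540 bytes
```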

Example 4. The database stores records containing information about students:

<Surname> – 16 characters: Russian letters (the first capital, the rest lowercase);

<First name> – 12 characters: Russian letters (the first capital, the rest lowercase);

<Patronymic> – 16 characters: Russian letters (the first capital, the rest lowercase);

<Year of birth> – a number from 1960 to 1997.

Each field is written using the smallest possible number of bits. Determine the minimum (integer) number of bytes required to encode one record, given that the letters “е” and “ё” are considered identical.

Solution.

    so, you need to determine the minimum possible sizes in bits for each of the four fields and add them up;

    it is known that the first letters of the surname, first name and patronymic are always capitalized, so they can be stored as lowercase and capitalized only when displayed on the screen;

    thus, for the character fields it is enough to use an alphabet of 32 characters (Russian lowercase letters, with “е” and “ё” considered the same, and no spaces needed);

    to encode each character of a 32-character alphabet, 5 bits are needed (32 = 2^5), so storing the surname, first name and patronymic requires (16+12+16)*5 = 220 bits;

    there are 38 options for the year of birth, so 6 bits must be allocated for it (2^6 = 64 ≥ 38);

    thus, a total of 220 + 6 = 226 bits, or 29 bytes, are required.

Answer: 29 bytes.
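An illustrative check of this calculation in Python:

```python
import math

letter_bits = (16 + 12 + 16) * math.ceil(math.log2(32))  # 220 bits for names
year_bits = math.ceil(math.log2(1997 - 1960 + 1))         # 6 bits for 38 years
print(math.ceil((letter_bits + year_bits) / 8))           # 29 bytes
```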

Example 5. A text contains 150 pages; each page has 40 lines, and each line has 60 characters (a 256-character alphabet was used to record the text). How much information, in MB, does the document contain?

Solution. The power of the alphabet is 256 characters, so one character carries 1 byte of information. This means that one page contains 40·60 = 2400 bytes of information. The volume of all the information in the document (in different units):

2400·150 = 360,000 bytes.

360000/1024 = 351.5625 KB ≈ 351.6 KB.

351.5625/1024 ≈ 0.3 MB.

Answer: 0.3 MB.
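The arithmetic, checked in Python (illustrative only):

```python
total_bytes = 150 * 40 * 60       # 360000 bytes: 1 byte per character
kb = total_bytes / 1024           # 351.5625 KB
print(kb, round(kb / 1024, 1))    # 351.5625 0.3 (MB)
```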

Example 6. What is the power of the alphabet used to write a message containing 2048 characters (a page of text), if the message size is 1.25 KB?

Solution. Let's convert the message size into bits:

1.25*1024*8=10240 bits.

Let's determine the number of bits per character:

10240/2048 = 5 bits.

Using the formula for the power of the alphabet, we determine the number of characters in the alphabet:

N = 2^i = 2^5 = 32 characters.

Answer: 32 characters.
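The same computation as a short Python check (illustrative only):

```python
bits_total = 1.25 * 1024 * 8   # 10240.0 bits in 1.25 KB
i = bits_total / 2048          # 5.0 bits per character
print(2 ** i)                  # 32.0, the power of the alphabet
```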