ANSI characters. ASCII encoding (American Standard Code for Information Interchange): the basic text encoding for the Latin alphabet

According to the International Telecommunication Union, in 2016, three and a half billion people used the Internet with some regularity. Most of them don’t even think about the fact that any messages they send via PC or mobile gadgets, as well as texts that are displayed on all kinds of monitors, are actually combinations of 0 and 1. This representation of information is called encoding. It ensures and greatly facilitates its storage, processing and transmission. In 1963, the American ASCII encoding was developed, which is the subject of this article.

Presenting information on a computer

From the point of view of any electronic computer, text is a set of individual characters. These include not only letters, both uppercase and lowercase, but also punctuation marks and digits. In addition, special characters such as “=”, “&” and “(”, as well as spaces, are used.

The set of characters that make up the text is called the alphabet, and their number is called cardinality (denoted as N). To determine it, the expression N = 2^b is used, where b is the number of bits or the information weight of a particular symbol.

An alphabet of 256 characters is considered sufficient to represent all the characters needed for ordinary text: letters, digits, punctuation marks and special symbols.

Since 256 represents the 8th power of two, the weight of each character is 8 bits.

A unit of measurement of 8 bits is called 1 byte, so it is customary to say that any character in text stored on a computer takes up one byte of memory.
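
The relation N = 2^b can be checked with a few lines of Python. This is only a minimal sketch; the helper names `alphabet_size` and `bits_needed` are illustrative assumptions and do not come from any standard.

```python
def alphabet_size(bits: int) -> int:
    """Number of distinct characters that fit into the given number of bits (N = 2^b)."""
    return 2 ** bits


def bits_needed(size: int) -> int:
    """Smallest number of bits b such that 2^b >= size."""
    if size < 1:
        raise ValueError("an alphabet needs at least one character")
    # (size - 1).bit_length() is the bit count of the largest code, size - 1.
    return max(1, (size - 1).bit_length())


print(alphabet_size(8))   # size of an 8-bit alphabet
print(bits_needed(256))   # bits per character for 256 characters
print(bits_needed(128))   # bits per character for the 128 standard ASCII codes
```

With 8 bits per character, one character occupies exactly one byte, which is the case the text describes.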

How is coding done?

Any texts are entered into the memory of a personal computer using keyboard keys on which numbers, letters, punctuation marks and other symbols are written. They are transferred to RAM in binary code, i.e. each character is associated with a decimal code familiar to humans, from 0 to 255, which corresponds to a binary code - from 00000000 to 11111111.

Byte-by-byte character encoding allows the processor performing text processing to access each character individually. At the same time, 256 characters are quite enough to represent a wide variety of symbolic information.

ASCII character encoding

This abbreviation stands for American Standard Code for Information Interchange.

Even at the dawn of computerization, it became obvious that it was possible to come up with a wide variety of ways to encode information. However, to transfer information from one computer to another, it was necessary to develop a unified standard. So, in 1963, the ASCII encoding table appeared in the USA. In it, any symbol of the computer alphabet is associated with its serial number in binary representation. ASCII was originally used only in the United States and later became an international standard for PCs.

ASCII codes are divided into 2 parts. Only the first half of this table is considered the international standard. It includes characters with serial numbers from 0 (coded as 00000000) to 127 (coded 01111111).

The structure of the table, by binary code and serial number N:

  • 0000 0000 - 0001 1111 (N from 0 to 31): control characters. Their function is to “manage” the process of displaying text on a monitor or printing device, giving a sound signal, etc.
  • 0010 0000 - 0111 1111 (N from 32 to 127): the standard part of the table. It contains the uppercase and lowercase letters of the Latin alphabet, the decimal digits, punctuation marks, various brackets, and commercial and other symbols. Character 32 is the space.
  • 1000 0000 - 1111 1111 (N from 128 to 255): the alternative part of the table, or code page. It can have different variants, each of which has its own number. The code page is used to specify national alphabets other than Latin. In particular, this is how Russian characters are encoded.

In the table, the uppercase and lowercase letters follow each other in alphabetical order, and the digits are in ascending order. This principle remains the same for the Russian alphabet.
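
The ordering of letters and digits is easy to verify, and it is what makes simple code arithmetic possible. The sketch below uses Python's built-in `ord`; the helper names are illustrative assumptions.

```python
import string


def is_consecutive(chars: str) -> bool:
    """True if each character's code is exactly one more than the previous one."""
    codes = [ord(c) for c in chars]
    return all(b - a == 1 for a, b in zip(codes, codes[1:]))


def digit_value(ch: str) -> int:
    """Numeric value of an ASCII digit, obtained by subtracting the code of '0'."""
    return ord(ch) - ord("0")


def letter_index(ch: str) -> int:
    """Zero-based position of a Latin letter in the alphabet."""
    return ord(ch.upper()) - ord("A")


print(is_consecutive(string.ascii_uppercase))  # letters A to Z
print(is_consecutive(string.digits))           # digits 0 to 9
print(digit_value("7"), letter_index("c"))
```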

Control characters

The ASCII encoding table was originally created for receiving and transmitting information via the teletype, a device that has long since fallen out of use. For this reason, the character set included non-printable characters that served as commands to control the device. Similar commands were used in pre-computer messaging methods such as Morse code.

The best-known of these characters is NUL (00). It is still used today in C and many related programming languages to mark the end of a string.

Where is ASCII encoding used?

The American Standard Code is needed not only for entering text from the keyboard. It is also used in graphics. In particular, programs such as ASCII Art Maker represent images in various file formats as arrangements of ASCII characters.

There are two types of such products: those that act as graphic editors working directly with text characters, and those that convert existing “drawings” into ASCII graphics. The familiar emoticon is a simple example of a picture built from encoding symbols.

ASCII can also be used when creating an HTML document. In this case, you can enter a character's numeric code in the markup, and when the page is viewed, the character that corresponds to this code appears on the screen.

ASCII is also useful for creating multilingual websites, since characters that are not included in a specific national table can be replaced with numeric codes written in ASCII characters.
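
As a hedged illustration of this mechanism, the sketch below uses HTML numeric character references of the form `&#N;`. It relies on Python's standard `html.unescape` function and the `xmlcharrefreplace` error handler; the helper names `numeric_reference` and `ascii_only_html` are assumptions made for this example.

```python
import html


def numeric_reference(ch: str) -> str:
    """Write a single character as an HTML numeric character reference."""
    return f"&#{ord(ch)};"


def ascii_only_html(text: str) -> str:
    """Replace every non-ASCII character with a numeric reference, leaving ASCII as is."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(numeric_reference("A"))            # reference for the letter A
print(html.unescape("&#65;"))            # what a browser would display for it
print(ascii_only_html("Привет, world"))  # Cyrillic letters become references
```

The resulting markup consists of ASCII characters only, so it survives any transport or editor that understands ASCII.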

Some features

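ASCII is a 7-bit code: originally the eighth bit of each byte was left unused (or served other purposes, such as parity checking), but today ASCII text is normally stored with 8 bits per character.

An uppercase letter and the corresponding lowercase letter, which occupy the same position in neighboring columns of the table, differ from each other in only a single bit. This significantly simplifies case conversion and checking.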
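
The single differing bit between an uppercase Latin letter and its lowercase counterpart has the value 0x20 (decimal 32). A minimal sketch of case switching by flipping that bit follows; the names `CASE_BIT` and `toggle_case` are assumptions for this example.

```python
CASE_BIT = 0x20  # the one bit that differs between 'A' (0x41) and 'a' (0x61)


def toggle_case(ch: str) -> str:
    """Flip the case of an ASCII letter by inverting a single bit; leave other characters alone."""
    if ch.isascii() and ch.isalpha():
        return chr(ord(ch) ^ CASE_BIT)
    return ch


print(format(ord("A"), "08b"), format(ord("a"), "08b"))
print(toggle_case("a"), toggle_case("Z"), toggle_case("1"))
```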

Using ASCII in Microsoft Office

If necessary, this type of text information encoding can be used in Microsoft text editors such as Notepad and Office Word. However, you may not be able to use some functions when typing in this case. For example, you won't be able to use bold text because ASCII encoding only preserves the meaning of the information, ignoring its general appearance and form.

Standardization

The ISO organization has adopted ISO 8859 standards. This group defines eight-bit encodings for different language groups. Specifically, ISO 8859-1 is an Extended ASCII table for the United States and Western European countries. And ISO 8859-5 is a table used for the Cyrillic alphabet, including the Russian language.

For a number of historical reasons, the ISO 8859-5 standard was used for a very short time.

For the Russian language, the following encodings are actually used at the moment:

  • CP866 (Code Page 866) or DOS, which is often called alternative GOST encoding. It was actively used until the mid-90s of the last century. At the moment it is practically not used.
  • KOI-8. The encoding was developed in the 1970s and 80s, and is currently the generally accepted standard for email messages on the RuNet. It is widely used in Unix operating systems, including Linux. The “Russian” version of KOI-8 is called KOI-8R. In addition, there are versions for other Cyrillic languages, such as Ukrainian.
  • Code Page 1251 (CP1251, Windows-1251). Developed by Microsoft to provide support for the Russian language in the Windows environment.

The main advantage of the first CP866 standard was the preservation of pseudographic characters in the same positions as in Extended ASCII. This made it possible to run foreign-made text programs, such as the famous Norton Commander, without modifications. Currently, CP866 is used for programs developed for Windows that run in full-screen text mode or in text windows, including FAR Manager.

Computer texts written in CP866 encoding are quite rare these days, but it is the one that is used for Russian file names in Windows.

"Unicode"

At the moment, this encoding is the most widely used. Unicode codes are divided into areas. The first (U+0000 to U+007F) contains the ASCII characters with their original codes. This is followed by the character areas of various national scripts, as well as punctuation marks and technical symbols. In addition, some Unicode codes are reserved in case there is a need to include new characters in the future.

Now you know that in ASCII, each character is represented as a combination of 8 zeros and ones. To non-specialists, this information may seem unnecessary and uninteresting, but don’t you want to know what’s going on “in the brains” of your PC?!

Dec Hex Symbol   Dec Hex Symbol
000 00 NUL   128 80 Ђ
001 01 SOH   129 81 Ѓ
002 02 STX   130 82 ‚
003 03 ETX   131 83 ѓ
004 04 EOT   132 84 „
005 05 ENQ   133 85 …
006 06 ACK   134 86 †
007 07 BEL   135 87 ‡
008 08 BS    136 88 €
009 09 TAB   137 89 ‰
010 0A LF    138 8A Љ
011 0B VT    139 8B ‹
012 0C FF    140 8C Њ
013 0D CR    141 8D Ќ
014 0E SO    142 8E Ћ
015 0F SI    143 8F Џ
016 10 DLE   144 90 ђ
017 11 DC1   145 91 ‘
018 12 DC2   146 92 ’
019 13 DC3   147 93 “
020 14 DC4   148 94 ”
021 15 NAK   149 95 •
022 16 SYN   150 96 (en dash)
023 17 ETB   151 97 (em dash)
024 18 CAN   152 98 (unused)
025 19 EM    153 99 ™
026 1A SUB   154 9A љ
027 1B ESC   155 9B ›
028 1C FS    156 9C њ
029 1D GS    157 9D ќ
030 1E RS    158 9E ћ
031 1F US    159 9F џ
032 20 SP    160 A0 (no-break space)
033 21 !     161 A1 Ў
034 22 "     162 A2 ў
035 23 #     163 A3 Ј
036 24 $     164 A4 ¤
037 25 %     165 A5 Ґ
038 26 &     166 A6 ¦
039 27 '     167 A7 §
040 28 (     168 A8 Ё
041 29 )     169 A9 ©
042 2A *     170 AA Є
043 2B +     171 AB «
044 2C ,     172 AC ¬
045 2D -     173 AD (soft hyphen)
046 2E .     174 AE ®
047 2F /     175 AF Ї
048 30 0     176 B0 °
049 31 1     177 B1 ±
050 32 2     178 B2 І
051 33 3     179 B3 і
052 34 4     180 B4 ґ
053 35 5     181 B5 µ
054 36 6     182 B6 ¶
055 37 7     183 B7 ·
056 38 8     184 B8 ё
057 39 9     185 B9 №
058 3A :     186 BA є
059 3B ;     187 BB »
060 3C <     188 BC ј
061 3D =     189 BD Ѕ
062 3E >     190 BE ѕ
063 3F ?     191 BF ї
064 40 @     192 C0 А
065 41 A     193 C1 Б
066 42 B     194 C2 В
067 43 C     195 C3 Г
068 44 D     196 C4 Д
069 45 E     197 C5 Е
070 46 F     198 C6 Ж
071 47 G     199 C7 З
072 48 H     200 C8 И
073 49 I     201 C9 Й
074 4A J     202 CA К
075 4B K     203 CB Л
076 4C L     204 CC М
077 4D M     205 CD Н
078 4E N     206 CE О
079 4F O     207 CF П
080 50 P     208 D0 Р
081 51 Q     209 D1 С
082 52 R     210 D2 Т
083 53 S     211 D3 У
084 54 T     212 D4 Ф
085 55 U     213 D5 Х
086 56 V     214 D6 Ц
087 57 W     215 D7 Ч
088 58 X     216 D8 Ш
089 59 Y     217 D9 Щ
090 5A Z     218 DA Ъ
091 5B [     219 DB Ы
092 5C \     220 DC Ь
093 5D ]     221 DD Э
094 5E ^     222 DE Ю
095 5F _     223 DF Я
096 60 `     224 E0 а
097 61 a     225 E1 б
098 62 b     226 E2 в
099 63 c     227 E3 г
100 64 d     228 E4 д
101 65 e     229 E5 е
102 66 f     230 E6 ж
103 67 g     231 E7 з
104 68 h     232 E8 и
105 69 i     233 E9 й
106 6A j     234 EA к
107 6B k     235 EB л
108 6C l     236 EC м
109 6D m     237 ED н
110 6E n     238 EE о
111 6F o     239 EF п
112 70 p     240 F0 р
113 71 q     241 F1 с
114 72 r     242 F2 т
115 73 s     243 F3 у
116 74 t     244 F4 ф
117 75 u     245 F5 х
118 76 v     246 F6 ц
119 77 w     247 F7 ч
120 78 x     248 F8 ш
121 79 y     249 F9 щ
122 7A z     250 FA ъ
123 7B {     251 FB ы
124 7C |     252 FC ь
125 7D }     253 FD э
126 7E ~     254 FE ю
127 7F DEL   255 FF я

ASCII Windows character code table.
Description of special (control) characters

It should be noted that the control characters of the ASCII table were originally used to support data exchange via teletypewriter, data entry from punched tape, and simple control of external devices.
Currently, most of the ASCII control characters no longer serve this purpose and can be used for other purposes.
Code     Description
NUL, 00  Null, empty
SOH, 01  Start Of Heading
STX, 02  Start of TeXt, the beginning of the text
ETX, 03  End of TeXt, end of text
EOT, 04  End of Transmission
ENQ, 05  Enquiry, request for confirmation
ACK, 06  Acknowledgment, confirmation
BEL, 07  Bell, sound signal
BS, 08   Backspace, go back one character
TAB, 09  Tab, horizontal tab
LF, 0A   Line Feed. Nowadays in most programming languages it is denoted as \n
VT, 0B   Vertical Tab, vertical tabulation
FF, 0C   Form Feed, page feed, new page
CR, 0D   Carriage Return. Nowadays in most programming languages it is denoted as \r
SO, 0E   Shift Out, change the color of the ink ribbon in the printing device
SI, 0F   Shift In, return the color of the ink ribbon in the printing device back
DLE, 10  Data Link Escape, switching the channel to data transmission
DC1, 11  Device Control, device control symbol
DC2, 12  Device Control, device control symbol
DC3, 13  Device Control, device control symbol
DC4, 14  Device Control, device control symbol
NAK, 15  Negative Acknowledgment, no confirmation
SYN, 16  Synchronization, synchronization symbol
ETB, 17  End of Transmission Block, end of the transmitted text block
CAN, 18  Cancel, cancels what was previously transferred
EM, 19   End of Medium
SUB, 1A  Substitute. Placed in place of a symbol whose meaning was lost or corrupted during transmission
ESC, 1B  Escape, start of a control sequence
FS, 1C   File Separator
GS, 1D   Group Separator
RS, 1E   Record Separator
US, 1F   Unit Separator
DEL, 7F  Delete, erase the last character
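
Several of these control characters still have their own escape sequences in modern languages. The sketch below lists the ones Python supports and checks a character against the control ranges 00 to 1F and 7F; the names `ESCAPES` and `is_control` are assumptions for this example.

```python
# Control characters that have a dedicated escape sequence in Python string literals.
ESCAPES = {
    "NUL": "\0",
    "BEL": "\a",
    "BS": "\b",
    "TAB": "\t",
    "LF": "\n",
    "VT": "\v",
    "FF": "\f",
    "CR": "\r",
}


def is_control(ch: str) -> bool:
    """True for the ASCII control characters: codes 00 to 1F and DEL (7F)."""
    code = ord(ch)
    return code < 0x20 or code == 0x7F


for name, ch in ESCAPES.items():
    # Print the abbreviation, the hexadecimal code and the escaped form.
    print(f"{name}: {ord(ch):02X} {ch!r}")
```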

As you know, a computer stores information in binary form, representing it as a sequence of ones and zeros. To translate information into a form convenient for human perception, each unique sequence of numbers is replaced by its corresponding symbol when displayed.

One of the systems for correlating binary codes with printed and control characters is ASCII.

At the current level of development of computer technology, the user is not required to know the code of each specific character. However, a general understanding of how coding is carried out is extremely useful, and for some categories of specialists, even necessary.

Creating ASCII

The encoding was originally developed in 1963 and then updated twice over the course of 25 years.

In the original version, the ASCII character table included 128 characters; later an extended version appeared, where the first 128 characters were saved, and previously missing characters were assigned to codes with the eighth bit involved.

For many years, this encoding was the most popular in the world. In 2006, Latin 1252 took the leading position, and from the end of 2007 to the present, Unicode has firmly held the leading position.

Each ASCII character has its own code, consisting of 8 binary digits, each of which is a zero or a one. The minimum number in this representation is zero (eight zeros in the binary system), which is the code of the first element in the table.

Two codes in the table were reserved for switching between standard US-ASCII and its national variant.

After ASCII began to include not 128, but 256 characters, an encoding variant became widespread, in which the original version of the table was stored in the first 128 codes with the 8th bit zero. National written characters were stored in the upper half of the table (positions 128-255).

The user does not need to know ASCII character codes directly. A software developer usually only needs to know the element number in the table to calculate its code using the binary system if necessary.

Russian language

After the development of encodings for the Scandinavian languages, Chinese, Korean, Greek, etc. in the early 70s, the Soviet Union began creating its own version. Soon, a version of an 8-bit encoding called KOI8 was developed, preserving the first 128 ASCII character codes and allocating the same number of positions for letters of the national alphabet and additional characters.

Before the introduction of Unicode, KOI8 dominated the Russian segment of the Internet. There were encoding options for both the Russian and Ukrainian alphabet.

ASCII problems

Since the number of elements even in the extended table did not exceed 256, there was no way to accommodate several different scripts in one encoding. In the 90s, the “krakozyabry” problem (garbled text, also known as mojibake) appeared on the Runet, when Russian texts typed in one 8-bit encoding were displayed incorrectly in another.

The problem was that the different extended ASCII code pages did not match each other. Recall that different characters could be located in positions 128-255, so when text was read in a Cyrillic encoding other than the one it was written in, every letter was replaced with the letter that has the same number in the other encoding.
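
This effect can be reproduced with the Cyrillic codecs that ship with Python (`cp1251`, `koi8_r`, `cp866`). The sketch below is illustrative only; the helper name `misread` and the sample word are assumptions.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one code page and decode the same bytes with another."""
    raw = text.encode(written_as)
    # Bytes with no meaning in the target code page become a replacement mark.
    return raw.decode(read_as, errors="replace")


original = "Привет"

# The same word has different byte values in each Cyrillic encoding.
for codec in ("cp1251", "koi8_r", "cp866"):
    print(codec, original.encode(codec).hex())

# Reading Windows-1251 bytes as KOI8-R yields unrelated letters.
print(misread(original, "cp1251", "koi8_r"))

# Text limited to the first 128 codes is unaffected.
print(misread("ASCII", "cp1251", "koi8_r"))
```

Only positions 128-255 are affected, which is why the Latin part of a garbled message usually remained readable.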

Current state

With the advent of Unicode, the popularity of ASCII began to decline sharply.

The reason for this lies in the fact that the new encoding made it possible to accommodate characters from almost all written languages. In this case, the first 128 ASCII characters correspond to the same characters in Unicode.
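
The compatibility of the first 128 codes can be checked directly for UTF-8, the most common Unicode encoding. This is a minimal sketch; the helper name `same_bytes_in_utf8` is an assumption.

```python
def same_bytes_in_utf8(text: str) -> bool:
    """True if the text has identical bytes in ASCII and in UTF-8."""
    return text.encode("ascii") == text.encode("utf-8")


# All 128 ASCII characters, in code order.
ascii_chars = "".join(chr(code) for code in range(128))

print(same_bytes_in_utf8(ascii_chars))
print(len("A".encode("utf-8")), len("я".encode("utf-8")))  # bytes per character
```

Characters outside the ASCII range take more than one byte in UTF-8, so only pure ASCII text is byte-for-byte identical in both encodings.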

In 2000, ASCII was the most popular encoding on the Internet and was used on 60% of web pages indexed by Google. By 2012, the share of such pages had dropped to 17%, and Unicode (UTF-8) took the place of the most popular encoding.

Thus, ASCII is an important part of the history of information technology, but its use in the future seems unpromising.

Character overlay

The BS (backspace) character allows the printer to print one character on top of another. ASCII provided for adding diacritics to letters in this way, for example:

  • a BS ' → á
  • a BS ` → à
  • a BS ^ → â
  • o BS / → ø
  • c BS , → ç
  • n BS ~ → ñ

Note: in old fonts, the apostrophe ' was drawn slanted to the left, and the tilde ~ was shifted up, so they were a good fit for the role of an acute accent and a tilde above the letter.

If a character is superimposed on itself, the result is a bold effect, and if an underscore is superimposed on a character, the result is underlined text.

  • a BS a → bold a
  • a BS _ → underlined a

Note: This is used, for example, in the man help system.
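
The overstrike sequences can be generated and removed programmatically. The sketch below follows the character order used in the examples above (letter, BS, overlay); real tools may emit the underscore first, so treat the order, along with the helper names, as an assumption.

```python
import re

BS = "\b"  # ASCII backspace, code 08


def bold(text: str) -> str:
    """Overstrike every character with itself: 'a' becomes a, BS, a."""
    return "".join(ch + BS + ch for ch in text)


def underline(text: str) -> str:
    """Overstrike every character with an underscore: 'a' becomes a, BS, _."""
    return "".join(ch + BS + "_" for ch in text)


def strip_overstrikes(text: str) -> str:
    """Drop each backspace together with the character printed over the original."""
    return re.sub(r"\x08.", "", text)


print(repr(bold("man")))
print(repr(underline("man")))
print(strip_overstrikes(bold("man")))
```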

National ASCII variants

The ISO 646 (ECMA-6) standard allows national characters to be placed at the positions of @ [ \ ] ^ ` { | } ~. In addition, £ can be placed at the position of # and ¤ at the position of $. This system is well suited for European languages where only a few extra characters are needed. The version of ASCII without national characters is called US-ASCII, or the "International Reference Version".

Subsequently, it turned out to be more convenient to use 8-bit encodings (code pages), where the lower half of the code table (0-127) is occupied by US-ASCII characters, and the upper half (128-255) by additional characters, including a set of national characters. Thus, before the widespread adoption of Unicode, the upper half of the table was actively used to represent the letters of the local language. The lack of a unified standard for placing Cyrillic characters in the table caused many problems with encodings (KOI-8, Windows-1251 and others). Other languages with non-Latin scripts also suffered from having several different encodings.

   .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F
0. NUL SOM EOA EOM EOT WRU RU BELL BKSP HT LF VT FF CR SO SI
1. DC0 DC1 DC2 DC3 DC4 ERR SYNC LEM S0 S1 S2 S3 S4 S5 S6 S7
2. BLANK ! " # $ % & ' ( ) * + , - . /
3. 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4. @ A B C D E F G H I J K L M N O
5. P Q R S T U V W X Y Z [ \ ]
6. a b c d e f g h i j k l m n o
7. p q r s t u v w x y z ESC DEL

On computers where the minimum addressable unit of memory was a 36-bit word, 6-bit characters were initially used (1 word = 6 characters). After the transition to ASCII, one word on such computers held either 5 seven-bit characters (with 1 bit left over) or 4 nine-bit characters.
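
The arithmetic of the first option (5 × 7 = 35 bits, 1 bit spare) can be sketched as follows. The exact layout inside the word is an assumption here: the characters are packed from the high end and the lowest bit is left unused. The function names are likewise illustrative.

```python
WORD_BITS = 36
CHARS_PER_WORD = 5


def pack_word(chars: str) -> int:
    """Pack five 7-bit characters into one 36-bit word, leaving the lowest bit unused."""
    if len(chars) != CHARS_PER_WORD:
        raise ValueError("exactly five characters fit into one word")
    word = 0
    for ch in chars:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError("only 7-bit ASCII characters can be packed")
        word = (word << 7) | code
    return word << 1  # assumed layout: the spare bit is the lowest one


def unpack_word(word: int) -> str:
    """Recover the five characters from a packed 36-bit word."""
    word >>= 1  # skip the spare bit
    return "".join(chr((word >> shift) & 0x7F) for shift in range(28, -1, -7))


packed = pack_word("ASCII")
print(format(packed, "036b"))
print(unpack_word(packed))
```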

ASCII codes are also used to determine which key is pressed during programming. For a standard QWERTY keyboard, the code table looks like this:

The set of characters with which text is written is called the alphabet.

The number of characters in the alphabet is its cardinality (power).

Formula for determining the amount of information: N = 2^b,

where N is the cardinality of the alphabet (number of characters),

b is the number of bits (information weight of a character).

An alphabet of 256 characters can accommodate almost all the necessary characters. Such an alphabet is called sufficient.

Since 256 = 2^8, the weight of 1 character is 8 bits.

The unit of measurement 8 bits was given the name 1 byte:

1 byte = 8 bits.

The binary code of each character in computer text takes up 1 byte of memory.

How is text information represented in computer memory?

The convenience of byte-by-byte character encoding is obvious because a byte is the smallest addressable part of memory and, therefore, the processor can access each character separately when processing text. On the other hand, 256 characters is quite a sufficient number to represent a wide variety of symbolic information.

Now the question arises, which eight-bit binary code to assign to each character.

Clearly, this is a matter of convention; many encoding methods can be devised.

All characters of the computer alphabet are numbered from 0 to 255. Each number corresponds to an eight-bit binary code from 00000000 to 11111111. This code is simply the serial number of the character in the binary number system.

A table in which all characters of the computer alphabet are assigned serial numbers is called an encoding table.

Different types of computers use different encoding tables.

The ASCII table (pronounced “ask-ee”), the American Standard Code for Information Interchange, has become the international standard for PCs.

The ASCII code table is divided into two parts.

Only the first half of the table is the international standard, i.e. symbols with numbers from 0 (00000000), up to 127 (01111111).

ASCII encoding table structure

  • Serial numbers 0 - 31, codes 00000000 - 00011111: control symbols. Their function is to control the process of displaying text on the screen or printing, sounding a sound signal, marking up text, etc.
  • Serial numbers 32 - 127, codes 00100000 - 01111111: the standard part of the table (English). This includes lowercase and uppercase letters of the Latin alphabet, decimal digits, punctuation marks, all kinds of brackets, commercial and other symbols. Character 32 is a space, i.e. an empty position in the text. All the others are represented by specific printed signs.
  • Serial numbers 128 - 255, codes 10000000 - 11111111: the alternative part of the table (Russian). This second half of the ASCII code table, called the code page (128 codes, from 10000000 to 11111111), can have different variants, each with its own number. The code page is primarily used to accommodate national alphabets other than Latin. In Russian national encodings, characters of the Russian alphabet are placed in this part of the table.

First half of the ASCII code table


Please note that in the encoding table, letters (uppercase and lowercase) are arranged in alphabetical order, and numbers are ordered in ascending order. This observance of lexicographical order in the arrangement of characters is called the principle of sequential coding of the alphabet.

For letters of the Russian alphabet, the principle of sequential coding is also observed.
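
Whether a given code page follows this principle can be checked with Python's built-in codecs. In the sketch below the names `UPPER`, `codes` and `is_sequential` are assumptions; the letter Ё is left out because it is stored separately from the main alphabet block.

```python
# The 32 uppercase Russian letters from А to Я (Ё excluded), taken from Unicode.
UPPER = "".join(chr(code) for code in range(0x0410, 0x0430))


def codes(text: str, encoding: str) -> list:
    """Byte values of the text in the given 8-bit encoding."""
    return list(text.encode(encoding))


def is_sequential(values: list) -> bool:
    """True if every value is exactly one more than the previous one."""
    return all(b - a == 1 for a, b in zip(values, values[1:]))


for encoding in ("cp1251", "cp866", "koi8_r"):
    print(encoding, is_sequential(codes(UPPER, encoding)))
```

The principle holds for the uppercase letters in Windows-1251 and CP866, while KOI8-R orders the Russian letters after their Latin counterparts instead, so the check fails there.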

Second half of the ASCII code table


Unfortunately, there are currently five different Cyrillic encodings (KOI8-R, Windows, MS-DOS, Macintosh and ISO). Because of this, problems often arise when transferring Russian text from one computer to another, or from one software system to another.

Chronologically, one of the first standards for encoding Russian letters on computers was KOI8 ("Information Exchange Code, 8-bit"). This encoding was used back in the 70s on computers of the ES computer series, and from the mid-80s it began to be used in the first Russified versions of the UNIX operating system.

The CP866 encoding ("CP" stands for "Code Page") remains from the early 90s, the time when the MS-DOS operating system was dominant.

Apple computers running the Mac OS operating system use their own Mac encoding.

In addition, the International Standards Organization (ISO) has approved another encoding called ISO 8859-5 as a standard for the Russian language.

The most common encoding currently used is the Microsoft Windows one, abbreviated CP1251.

Since the late 90s, the problem of standardizing character encoding has been addressed by the introduction of a new international standard called Unicode. It was originally designed as a 16-bit encoding, i.e. one that allocates 2 bytes of memory for each character. Of course, this doubles the amount of memory occupied compared with an 8-bit encoding, but such a code table allows the inclusion of up to 65536 characters. Unicode has since grown beyond that limit, and its characters are now stored using encoding forms such as UTF-8 and UTF-16, in which one character may occupy from 1 to 4 bytes. The complete specification of the Unicode standard includes existing, extinct and artificially created alphabets of the world, as well as many mathematical, musical, chemical and other symbols.

Let's try using an ASCII table to imagine what words will look like in computer memory.

Internal representation of words in computer memory
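
As a rough illustration, the sketch below prints the bytes that a word occupies in memory, one 8-bit binary code per character. The helper name `memory_view` and the sample word are assumptions for this example.

```python
def memory_view(word: str, encoding: str = "ascii") -> str:
    """Binary codes of the bytes that the word occupies, separated by spaces."""
    return " ".join(format(byte, "08b") for byte in word.encode(encoding))


print(memory_view("file"))  # 01100110 01101001 01101100 01100101
```

Each group of eight binary digits is the serial number of one letter in the table: f is 102, i is 105, l is 108 and e is 101.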

Sometimes it happens that a text consisting of letters of the Russian alphabet received from another computer cannot be read - some kind of “abracadabra” is visible on the monitor screen. This happens because computers use different character encodings for the Russian language.