ASCII encoding (American Standard Code for Information Interchange): the basic text encoding for the Latin alphabet

Encoding is the process of converting information into a form that is more convenient to transmit, store or process automatically. Various code tables are used for this purpose. ASCII was developed in the United States for working with English text and subsequently became widespread throughout the world. This article describes it, its features and properties, and how it was later built upon.

Display and storage of information in a computer

Symbols on a computer monitor or a mobile device are drawn from sets of vector shapes (glyphs) for the various characters, together with a code that identifies which glyph to place at a given position. That code is a sequence of bits. Thus, each character must correspond uniquely to a set of zeros and ones arranged in a specific order.

How it all began

Historically, the first computers were English-language. Seven bits were enough to encode character data on them, although a full byte of 8 bits was allocated to each character. Seven bits give 2^7 = 128 distinct characters. These included the English alphabet, punctuation marks, digits and some special characters. The English-language seven-bit encoding with its corresponding table (code page), developed in 1963, was called the American Standard Code for Information Interchange. It was, and still is, usually referred to as “ASCII encoding”.

Transition to multilingualism

Over time, computers became widely used in non-English-speaking countries, which created a need for encodings that support national languages. It was decided not to reinvent the wheel and to take ASCII as a basis. The new encoding tables were significantly larger: using the 8th bit made it possible to encode 256 characters.

Description

The table used with ASCII is divided into 2 parts. Only its first half (codes 0 to 127) is considered a generally accepted international standard. The table includes:

  • Characters with codes from 0 to 31, encoded as 00000000 to 00011111. They are reserved for control characters, which control how text is output on a screen or printer, sound the bell signal, and so on.
  • Characters with codes from 32 to 127, encoded as 00100000 to 01111111, which form the standard part of the table. These include the space (code 32), the letters of the Latin alphabet (lowercase and uppercase), the ten digits 0 to 9, punctuation marks, brackets of different styles and other symbols. The last code, 127 (DEL), is a control character rather than a printable one.
  • Characters with codes from 128 to 255, encoded as 10000000 to 11111111. These include letters of national alphabets other than Latin. This alternative part of the table lies outside the ASCII standard itself and varies between code pages; it is the part used to represent Russian characters.
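The three ranges can be checked mechanically. The following is only a minimal sketch in Python; the function name `classify` and its return labels are made up for this illustration, and code 127 (DEL) is treated as a control character.

```python
def classify(code: int) -> str:
    """Return which part of the 8-bit table a code (0-255) belongs to."""
    if not 0 <= code <= 255:
        raise ValueError("expected a value from 0 to 255")
    if code <= 31 or code == 127:
        return "control"
    if code <= 126:
        return "printable"
    return "extended"


for code in (7, 32, 65, 127, 200):
    # format(code, "08b") gives the 8-digit binary form used in the list above
    print(code, format(code, "08b"), classify(code))
```

What an "extended" code actually displays depends entirely on the code page in use.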

Some properties

One feature of the ASCII encoding is that the lowercase and uppercase forms of each letter “A” - “Z” differ by only one bit. This greatly simplifies case conversion, as well as checking whether a character falls within a given range of values. In addition, every letter is represented by its ordinal number in the alphabet, written as 5 binary digits and preceded by 011 (binary) for lowercase letters or 010 (binary) for uppercase letters.
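The one-bit difference and the 5-bit letter number can be seen with a couple of bit operations. This is a small sketch in Python; the names `CASE_BIT`, `toggle_case` and `alphabet_position` are invented for the example, and `alphabet_position` assumes its argument is an ASCII letter.

```python
CASE_BIT = 0b0010_0000  # the single bit that differs between "A" and "a"


def toggle_case(ch: str) -> str:
    """Flip the case of an ASCII letter; leave anything else unchanged."""
    if "A" <= ch <= "Z" or "a" <= ch <= "z":
        return chr(ord(ch) ^ CASE_BIT)
    return ch


def alphabet_position(letter: str) -> int:
    """Return 1 for A/a ... 26 for Z/z by keeping only the low 5 bits."""
    return ord(letter) & 0b0001_1111


print(format(ord("A"), "08b"), format(ord("a"), "08b"))
print(toggle_case("a"), toggle_case("Z"), alphabet_position("e"))
```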

Another feature of the ASCII encoding is the representation of the 10 digits “0” - “9”. In binary, their codes start with 0011 and end with the binary value of the digit itself. Thus, 0101 in binary is equivalent to the decimal number five, so the character "5" is written as 0011 0101. It follows that BCD (binary-coded decimal) numbers can easily be converted into an ASCII string by prefixing each nibble with the bit sequence 0011.
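The BCD conversion can be sketched as follows. This Python example assumes packed BCD (two decimal digits per byte, high nibble first); the function name `bcd_to_ascii` is hypothetical.

```python
def bcd_to_ascii(packed: bytes) -> str:
    """Turn packed BCD bytes into a string of ASCII digits."""
    digits = []
    for byte in packed:
        for nibble in (byte >> 4, byte & 0x0F):
            if nibble > 9:
                raise ValueError("nibble is not a decimal digit")
            # Prefix the nibble with 0011: 0011 0101 is the character "5"
            digits.append(chr(0b0011_0000 | nibble))
    return "".join(digits)


print(bcd_to_ascii(bytes([0x19, 0x63])))
```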

"Unicode"

As you know, thousands of characters are required to display texts in East Asian languages such as Chinese, Japanese and Korean. That many characters cannot be described in one byte of information, so even extended versions of ASCII could no longer satisfy the growing needs of users from different countries.

Thus, the need arose for a universal text encoding. Its development was undertaken by the Unicode Consortium in collaboration with many leaders of the global IT industry. Its specialists created the UTF-32 system, in which 32 bits (4 bytes) are allocated to encode 1 character. The main disadvantage was a fourfold increase in the amount of memory required compared with one-byte encodings, which caused many problems.

At the same time, for most countries whose official languages belong to the Indo-European group, a character count of 2^32 is far more than is needed.

As a result of further work by specialists from the Unicode Consortium, the UTF-16 encoding appeared. It offered a compromise between the amount of memory required and the number of characters that can be encoded, and it was widely adopted. UTF-16 uses 2 bytes for most characters (rarer characters take a 4-byte surrogate pair).

Even this fairly advanced and successful version of Unicode had drawbacks: after moving from an extended version of ASCII to UTF-16, the size of a document doubled.
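The size differences are easy to measure. The sketch below uses Python's built-in codecs; the sample word and the helper name `encoded_size` are chosen only for illustration, and the "-le" codec variants are used so that no byte order mark is counted.

```python
text = "Привет"  # six Cyrillic letters, chosen only as an illustration


def encoded_size(s: str, encoding: str) -> int:
    """Number of bytes needed to store s in the given encoding."""
    return len(s.encode(encoding))


for encoding in ("cp1251", "utf-16-le", "utf-32-le"):
    print(encoding, encoded_size(text, encoding), "bytes")
```

For this word the 8-bit code page needs one byte per letter, UTF-16 two and UTF-32 four.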

For this reason, the variable-length encoding UTF-8 came into use. In it, each character of the source text is encoded as a sequence of 1 to 4 bytes (the original design allowed sequences of up to 6 bytes).

Relationship to the American Standard Code for Information Interchange

All basic Latin characters are encoded in UTF-8 as 1 byte, exactly as in the ASCII encoding.

A special feature of UTF-8 is that text written in basic Latin characters only can still be read even by programs that do not understand Unicode. In other words, the base ASCII encoding simply becomes part of the new variable-length UTF. Cyrillic characters in UTF-8 occupy 2 bytes, and Georgian characters, for example, occupy 3 bytes. With the creation of UTF-16 and UTF-8, the main problem of providing a single code space for fonts was solved. Since then, font manufacturers only need to fill the table with vector shapes of the characters they require.
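The byte counts mentioned above can be verified directly. This is a minimal Python sketch; `utf8_length` is a name invented here, and the three sample characters are arbitrary representatives of each script.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))


for name, ch in (("Latin", "A"), ("Cyrillic", "Я"), ("Georgian", "ა")):
    print(name, ch, utf8_length(ch))

# Pure ASCII text produces exactly the same bytes in both encodings.
ascii_text = "plain Latin text"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)
```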

Different operating systems prefer different encodings. To read and edit text typed in a different encoding, text conversion programs (transcoders) are used. Some text editors have built-in transcoders and let you read text regardless of its encoding.

Now you know how many characters are in the ASCII encoding and how and why it was developed. Of course, today the Unicode standard is most widespread in the world. However, we must not forget that it is based on ASCII, so the contribution of its developers to the IT field should be appreciated.

8-bit encodings: ASCII, KOI-8R and CP1251

The first encoding tables created in the USA did not use the eighth bit in a byte. Text was represented as a sequence of bytes, but the eighth bit was not part of the character code (it was used for service purposes).

The ASCII table (American Standard Code for Information Interchange) has become a generally accepted standard. The first 32 characters of the ASCII table (00 to 1F) are non-printing characters. They were designed to control a printing device and similar equipment. The rest, from 20 to 7E, are regular (printable) characters, and 7F (DEL) is one more control character.

Table 1 - ASCII encoding

Dec Hex Oct Char Description
0 0 000 null
1 1 001 start of heading
2 2 002 start of text
3 3 003 end of text
4 4 004 end of transmission
5 5 005 inquiry
6 6 006 acknowledge
7 7 007 bell
8 8 010 backspace
9 9 011 horizontal tab
10 A 012 new line
11 B 013 vertical tab
12 C 014 new page
13 D 015 carriage return
14 E 016 shift out
15 F 017 shift in
16 10 020 data link escape
17 11 021 device control 1
18 12 022 device control 2
19 13 023 device control 3
20 14 024 device control 4
21 15 025 negative acknowledge
22 16 026 synchronous idle
23 17 027 end of trans. block
24 18 030 cancel
25 19 031 end of medium
26 1A 032 substitute
27 1B 033 escape
28 1C 034 file separator
29 1D 035 group separator
30 1E 036 record separator
31 1F 037 unit separator
32 20 040 space
33 21 041 !
34 22 042 "
35 23 043 #
36 24 044 $
37 25 045 %
38 26 046 &
39 27 047 '
40 28 050 (
41 29 051 )
42 2A 052 *
43 2B 053 +
44 2C 054 ,
45 2D 055 -
46 2E 056 .
47 2F 057 /
48 30 060 0
49 31 061 1
50 32 062 2
51 33 063 3
52 34 064 4
53 35 065 5
54 36 066 6
55 37 067 7
56 38 070 8
57 39 071 9
58 3A 072 :
59 3B 073 ;
60 3C 074 <
61 3D 075 =
62 3E 076 >
63 3F 077 ?
64 40 100 @
65 41 101 A
66 42 102 B
67 43 103 C
68 44 104 D
69 45 105 E
70 46 106 F
71 47 107 G
72 48 110 H
73 49 111 I
74 4A 112 J
75 4B 113 K
76 4C 114 L
77 4D 115 M
78 4E 116 N
79 4F 117 O
80 50 120 P
81 51 121 Q
82 52 122 R
83 53 123 S
84 54 124 T
85 55 125 U
86 56 126 V
87 57 127 W
88 58 130 X
89 59 131 Y
90 5A 132 Z
91 5B 133 [
92 5C 134 \
93 5D 135 ]
94 5E 136 ^
95 5F 137 _
96 60 140 `
97 61 141 a
98 62 142 b
99 63 143 c
100 64 144 d
101 65 145 e
102 66 146 f
103 67 147 g
104 68 150 h
105 69 151 i
106 6A 152 j
107 6B 153 k
108 6C 154 l
109 6D 155 m
110 6E 156 n
111 6F 157 o
112 70 160 p
113 71 161 q
114 72 162 r
115 73 163 s
116 74 164 t
117 75 165 u
118 76 166 v
119 77 167 w
120 78 170 x
121 79 171 y
122 7A 172 z
123 7B 173 {
124 7C 174 |
125 7D 175 }
126 7E 176 ~
127 7F 177 DEL
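The Dec, Hex and Oct columns of Table 1 are three notations for the same number, so any row can be regenerated. A small Python sketch follows; the function name `table_row` is hypothetical, and control characters and the space are left without a Char column here.

```python
def table_row(code: int) -> str:
    """Format one row as: decimal, hexadecimal, 3-digit octal, character."""
    ch = chr(code) if 32 < code < 127 else ""
    return f"{code} {code:X} {code:03o} {ch}".rstrip()


for code in (10, 39, 65, 122):
    print(table_row(code))
```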

As is easy to see, this encoding contains only Latin letters, and those that are used in the English language. There are also arithmetic and other service symbols. But there are neither Russian letters, nor even special Latin ones for German or French. This is easy to explain - the encoding was developed specifically as an American standard. As computers began to be used throughout the world, other characters needed to be encoded.

To do this, it was decided to use the eighth bit in each byte. This made 128 more values available (from 80 to FF) that could be used to encode characters. The first of the eight-bit tables, “Extended ASCII”, included various variants of Latin characters used in some languages of Western Europe. It also contained other additional symbols, including pseudographics.

Pseudographic characters allow you to provide some semblance of graphics by displaying only text characters on the screen. For example, the file management program FAR Manager works using pseudographics.

There were no Russian letters in the Extended ASCII table. Russia (formerly the USSR) and other countries created their own encodings that made it possible to represent specific “national” characters in 8-bit text files - Latin letters of the Polish and Czech languages, Cyrillic (including Russian letters) and other alphabets.

In all encodings that have become widespread, the first 128 characters (that is, the byte values with the eighth bit equal to 0) are the same as in ASCII. So an ASCII file is valid in any of these encodings; the letters of the English language are represented in the same way.
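This compatibility can be checked with the codecs that ship with Python. The sketch below is only an illustration: the codec names are Python's own spellings, and `lower_half_matches_ascii` is a helper invented for this example.

```python
ASCII_BYTES = bytes(range(128))  # every value with the eighth bit equal to 0


def lower_half_matches_ascii(codec: str) -> bool:
    """True if the codec decodes bytes 0-127 exactly as ASCII does."""
    return ASCII_BYTES.decode(codec) == ASCII_BYTES.decode("ascii")


for codec in ("cp866", "koi8_r", "cp1251", "iso8859_5"):
    print(codec, lower_half_matches_ascii(codec))

# The upper half is where the tables disagree: same byte, different letters.
print(bytes([0xE1]).decode("koi8_r"), bytes([0xE1]).decode("cp1251"))
```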

The International Organization for Standardization (ISO) has adopted the ISO 8859 group of standards. It defines 8-bit encodings for different groups of languages. For example, ISO 8859-1 is an Extended ASCII table for the USA and Western Europe, and ISO 8859-5 is a table for the Cyrillic alphabet (including Russian).

However, for historical reasons, the ISO 8859-5 encoding did not take root. In reality, the following encodings are used for the Russian language:

- Code Page 866 (CP866), also known as “DOS” or the “alternative GOST encoding”. Widely used until the mid-90s; now used to a limited extent. Practically not used for distributing texts on the Internet.
- KOI-8. Developed in the 70s and 80s. It is a generally accepted standard for transmitting email messages on the Russian Internet. It is also widely used in operating systems of the Unix family, including Linux. The Russian-language version of KOI-8 is called KOI-8R; there are versions for other Cyrillic languages (for example, KOI8-U is a version for the Ukrainian language).
- Code Page 1251, CP1251, Windows-1251. Developed by Microsoft to support the Russian language in Windows.

The main advantage of CP866 was that it kept the pseudographic characters in the same places as in Extended ASCII; therefore, foreign text-mode programs, for example the famous Norton Commander, could work without changes. CP866 is now used for Windows programs running in text windows or full-screen text mode, including FAR Manager.

Texts in CP866 have been quite rare in recent years (but it is used to encode Russian file names in Windows). Therefore, we will dwell in more detail on two other encodings - KOI-8R and CP1251.



As you can see, in the CP1251 encoding table, Russian letters are arranged in alphabetical order (with the exception, however, of the letter Ё). This arrangement makes it very easy for computer programs to sort alphabetically.

In KOI-8R, by contrast, the order of Russian letters seems random. In fact, it is not.
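The effect of this ordering, and of the one exception, can be seen by sorting words by their raw CP1251 bytes. This is a Python sketch that assumes the exceptional letter is Ё (stored outside the contiguous А-я block); the function name `byte_order_sort` and the sample words are made up for the example.

```python
def byte_order_sort(words):
    """Sort words by comparing their CP1251 byte values, not by any locale."""
    return sorted(words, key=lambda w: w.encode("cp1251"))


words = ["яма", "ёж", "арка"]
print(byte_order_sort(words))

# The letters А..Я occupy consecutive codes; Ё sits apart from them.
for letter in "АЯЁ":
    print(letter, hex(letter.encode("cp1251")[0]))
```

Words starting with "ё" end up before "а", which is why plain byte comparison is not quite a correct alphabetical sort.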

In many older programs, the 8th bit was lost when processing or transmitting text. (Such programs are now practically “extinct”, but in the late 80s and early 90s they were widespread.) To get a 7-bit value from an 8-bit value, just subtract 8 from the most significant hexadecimal digit; for example, E1 becomes 61.

Now compare KOI-8R with the ASCII table (Table 1). You will find that Russian letters are placed in clear correspondence with Latin ones. If the eighth bit disappears, lowercase Russian letters turn into uppercase Latin letters, and uppercase Russian letters turn into lowercase Latin letters. So, E1 in KOI-8 is the Russian “A”, while 61 in ASCII is the Latin “a”.

So, KOI-8 keeps Russian text readable when the 8th bit is lost. The Russian phrase “Привет всем” (“Hello everyone”) becomes “pRIWET WSEM”.
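Losing the 8th bit is the same as clearing the top bit of every byte. The Python sketch below reproduces the example, assuming the original phrase was the Russian "Привет всем"; the function name `drop_eighth_bit` is invented for the illustration.

```python
def drop_eighth_bit(text: str) -> str:
    """Encode text as KOI8-R, clear the top bit of each byte, read as ASCII."""
    seven_bit = bytes(b & 0x7F for b in text.encode("koi8_r"))
    return seven_bit.decode("ascii")


print(drop_eighth_bit("Привет всем"))
```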

Recently, both the alphabetical order of characters in the encoding table and readability after loss of the 8th bit have lost their decisive importance. The eighth bit is not lost during transmission or processing on modern computers, and alphabetical sorting takes the encoding into account instead of simply comparing codes. (Incidentally, the CP1251 codes are not arranged entirely alphabetically: the letter Ё is out of place.)

Because there are two common encodings, when working with the Internet (mail, browsing Web sites) you can sometimes see a meaningless set of letters instead of Russian text. For example, “я СБЮФЕМХЕЛ”. These are just the words “С уважением” (“with respect”), but they were encoded in CP1251 and the computer decoded the text using the KOI-8 table. If the same words were instead encoded in KOI-8 and decoded using the CP1251 table, the result would be “у ХЧБЦЕОЙЕН”.
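Both garbled results can be reproduced by encoding with one table and decoding with the other. This Python sketch assumes the phrase in question is the Russian "С уважением" ("with respect"); the helper name `misread` is hypothetical.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one code page and decode the bytes with another."""
    return text.encode(written_as).decode(read_as)


phrase = "С уважением"
garbled = misread(phrase, "cp1251", "koi8_r")
print(garbled)
print(misread(phrase, "koi8_r", "cp1251"))

# Reversing the two steps recovers the original text.
print(misread(garbled, "koi8_r", "cp1251"))
```

The last line is what choosing the correct encoding manually in a program menu amounts to.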

Sometimes a computer decodes Russian-language text using a table not intended for the Russian language at all. Then, instead of Russian letters, a meaningless set of symbols appears (for example, Latin letters of Eastern European languages); these are often called “krakozyabry”.

In most cases, modern programs determine the encodings of Internet documents (emails and Web pages) on their own. But sometimes they “misfire”, and then you can see strange sequences of Russian letters or “krakozyabry”. As a rule, in such a situation it is enough to select the encoding manually in the program menu to display the real text.

Information from the page http://open-office.edusite.ru/TextProcessor/p5aa1.html was used for this article.


Insert an ASCII or Unicode character into a document

If you only need to enter a few special characters or symbols, you can use keyboard shortcuts. For a list of ASCII characters, see the following tables or the article Inserting National Alphabets Using Keyboard Shortcuts.

Inserting ASCII characters

To insert an ASCII character, press and hold the ALT key while entering the character code. For example, to insert a degree symbol (°), press and hold the ALT key, then enter 0176 on the numeric keypad.

To enter numbers, use the numeric keypad rather than the numbers on the main keyboard. If you need to enter numbers on the numeric keypad, make sure the NUM LOCK indicator is on.

Inserting Unicode Characters

To insert a Unicode character, enter the character code, then press ALT and X. For example, to insert a dollar symbol ($), enter 0024 and press ALT and X.

Important: Some Microsoft Office programs, such as PowerPoint and InfoPath, do not support converting Unicode codes to characters. If you need to insert a Unicode character in one of these programs, use the Character Map.

Notes:

    If the wrong Unicode character appears after you press ALT+X, select the correct code, and then press ALT+X again.

    In addition, when the code immediately follows other text, you must enter "U+" before the code. For example, if you enter "1U+B5" and press ALT+X, the text "1µ" will be displayed, and if you enter "1B5" and press ALT+X, the symbol "Ƶ" will be displayed.
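The codes typed before ALT+X are hexadecimal Unicode code points, which explains the difference between the two results in the note. The Python sketch below only mimics the conversion of a single code; `alt_x` is a made-up name and not part of any Office API.

```python
def alt_x(hex_code: str) -> str:
    """Convert a hexadecimal code point, with or without "U+", to a character."""
    code = hex_code.upper()
    if code.startswith("U+"):
        code = code[2:]
    return chr(int(code, 16))


print(alt_x("0024"))  # the dollar sign example
print("1" + alt_x("U+B5"))  # "U+" separates the code from the preceding "1"
print(alt_x("1B5"))  # without "U+", the "1" becomes part of the code
```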

Using the Character Map

Character Map is a program built into Microsoft Windows that lets you view the characters available in a selected font.

Using Character Map, you can copy individual characters or a group of characters to the clipboard and paste them into any program that can display them.

Opening the Character Map

    In Windows 10, type "character" in the search box on the taskbar and select Character Map from the search results.

    In Windows 8, type "character" on the Start screen and select Character Map from the search results.

    In Windows 7, click the Start button, select All Programs, Accessories, System Tools, and then click Character Map.

Characters are grouped by font. Click the font list to choose the appropriate character set. To select a character, click it, then click the Select button. To insert the character, right-click the desired location in the document and select Paste.

Frequently used character codes

For a complete list of characters, see the Character Map on your computer, an ASCII character code table, or the Unicode character tables organized by set.

[Tables not preserved: glyph and code tables for currency, legal symbols, mathematical symbols, fractions, punctuation and dialect symbols, and shape symbols, followed by a table of commonly used diacritic codes.]

Non-printing ASCII control characters

The characters used to control some peripheral devices, such as printers, are numbered 0–31 in the ASCII table. For example, the form feed (new page) character is number 12. This character tells the printer to move to the beginning of the next page.

Table of non-printing ASCII control characters

Decimal number  Sign
0   Null
1   Start of heading
2   Start of text
3   End of text
4   End of transmission
5   Enquiry
6   Acknowledge
7   Bell (sound signal)
8   Backspace
9   Horizontal tab
10  Line feed/new line
11  Vertical tab
12  Form feed/new page
13  Carriage return
14  Shift out
15  Shift in
16  Data link escape
17  Device control 1
18  Device control 2
19  Device control 3
20  Device control 4
21  Negative acknowledge
22  Synchronous idle
23  End of transmission block
24  Cancel
25  End of medium
26  Substitute
27  Escape
28  File separator
29  Group separator
30  Record separator
31  Unit separator

In order to use ASCII correctly, it is necessary to expand your knowledge in this area and about coding capabilities.

What is it?

ASCII is an encoding table of the printable characters typed on a computer keyboard, together with some control codes, used to transmit information (see screenshot No. 1). In other words, letters and decimal digits are mapped to numeric codes that carry the necessary information.

ASCII was developed in America, so the standard character set includes the English alphabet and the digits, for a total of 128 characters. But then a fair question arises: what should be done if a national alphabet has to be encoded?

Other versions of the ASCII table were developed to address such issues. For example, for languages with a different writing system, the letters of the English alphabet were either removed or supplemented with the characters of the national alphabet. Thus, an ASCII-based encoding may contain Russian letters for national use (see screenshot No. 2).

Where is the ASCII coding system used?

This coding system is needed not only for typing text on the keyboard. It is also used in graphics. For example, in the ASCII Art Maker program, images in various file formats are rendered as arrangements of ASCII characters (see screenshot No. 3).

As a rule, such programs can be divided into those that act as graphic editors, converting an image into text, and those that convert an image into ASCII graphics. The well-known emoticon (also called a “smiley face”) is another example of a picture built from encoded characters.

Character codes can also be used when writing an HTML document. You enter a specific sequence of characters that stands for a code, and when the page is viewed, the symbol corresponding to that code is displayed on the screen.
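One common form of this, assumed here to be what the text refers to, is the HTML numeric character reference such as `&#169;`. The Python sketch below uses the standard `html` module to decode such references; `to_char_refs` is a helper invented for this example that writes every non-ASCII character as a reference.

```python
import html


def to_char_refs(text: str) -> str:
    """Replace each non-ASCII character with a decimal character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


encoded = to_char_refs("café")
print(encoded)
print(html.unescape(encoded))
print(html.unescape("&#169; &#x41;"))
```

The encoded page source then contains ASCII characters only, while the browser still shows the intended symbols.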

Among other things, this type of encoding is needed when creating a multilingual website, because characters that are not included in a given national table have to be replaced with their character codes. Readers who work directly with information and communication technologies (ICT) may find it useful to become familiar with the following:

  • Portable character set;
  • Control characters;
  • EBCDIC;
  • VISCII;
  • YUSCII;
  • Unicode;
  • ASCII art;
  • KOI-8.

    ASCII Table Properties

    Like any systematic scheme, ASCII has its own characteristic properties. For example, character codes written in the decimal number system are stored in the binary number system (for instance, 72 = 1001000 in binary).

    Uppercase and lowercase letters differ from each other by only one bit, which makes checking and changing the case much simpler.

    ASCII is a seven-bit code, although in practice each character is stored in an eight-bit byte.

    Using ASCII in Microsoft Office programs

    If necessary, this way of encoding information can be used in Microsoft Notepad and Microsoft Office Word. In these applications a document can be saved in ASCII format, but in that case some functions will not be available when typing text.

    In particular, formatting such as bold type will not be available, because the encoding preserves only the content of the typed text and not its appearance. You can add such codes to a document in the following applications:

    • Microsoft Excel;
    • Microsoft FrontPage;
    • Microsoft InfoPath;
    • Microsoft OneNote;
    • Microsoft Outlook;
    • Microsoft PowerPoint;
    • Microsoft Project.

    It is worth considering that when typing ASCII code in these applications, you must hold down the ALT key.

    Of course, all the necessary codes require a longer and more detailed study, but this is beyond the scope of our article today. I hope that you found it really useful.

    Character Overlay

    The BS (backspace) character allows the printer to print one character on top of another. ASCII provided for adding diacritics to letters in this way, for example:

    • a BS ' → á
    • a BS ` → à
    • a BS ^ → â
    • o BS / → ø
    • c BS , → ç
    • n BS ~ → ñ

    Note: in older fonts, the apostrophe ' was drawn slanted, and the tilde ~ was raised, so they fitted the roles of an acute accent and a tilde above the letter.

    If a character is overprinted with the same character, the result is a bold effect, and if a character is overprinted with an underscore, the result is underlined text.

    • a BS a → a (bold)
    • a BS _ → a (underlined)

    Note: this is used, for example, in the man help system.
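The overstrike convention is simple to generate and to undo. The following Python sketch is an illustration only: the function names are invented, and the stripping logic assumes each backspace is followed by exactly one overprinted character, with an underscore never replacing a letter.

```python
def overstrike_bold(text: str) -> str:
    """Write every character, backspace, then the same character again."""
    return "".join(ch + "\b" + ch for ch in text)


def overstrike_underline(text: str) -> str:
    """Write every character, backspace, then an underscore."""
    return "".join(ch + "\b_" for ch in text)


def strip_overstrikes(text: str) -> str:
    """Recover plain text from bold/underline overstrike sequences."""
    out = []
    overstrike = False
    for ch in text:
        if ch == "\b":
            overstrike = bool(out)  # a leading backspace has nothing to hit
        elif overstrike:
            if ch != "_":
                out[-1] = ch  # keep the letter, not the underscore
            overstrike = False
        else:
            out.append(ch)
    return "".join(out)


print(repr(overstrike_bold("man")))
print(strip_overstrikes(overstrike_underline("NAME")))
```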

    National ASCII variants

    The ISO 646 (ECMA-6) standard allows national characters to be placed in the positions of @ [ \ ] ^ ` { | } ~ . In addition, £ can be placed in the # position, and ¤ in the $ position. This system is well suited to European languages where only a few extra characters are needed. The version of ASCII without national characters is called US-ASCII, or the "International Reference Version".

    Subsequently, it turned out to be more convenient to use 8-bit encodings (code pages), where the lower half of the code table (0-127) is occupied by US-ASCII characters and the upper half (128-255) by additional characters, including a set of national characters. Thus, before the widespread adoption of Unicode, the upper half of the table was actively used to represent localized characters, that is, the letters of the local language. The lack of a unified standard for placing Cyrillic characters in the table caused many problems with encodings (KOI-8, Windows-1251 and others). Other languages with non-Latin scripts also suffered from having several different encodings.

        .0    .1    .2    .3    .4    .5    .6    .7    .8    .9    .A    .B    .C    .D    .E    .F
    0.  NUL   SOM   EOA   EOM   EOT   WRU   RU    BELL  BKSP  HT    LF    VT    FF    CR    SO    SI
    1.  DC0   DC1   DC2   DC3   DC4   ERR   SYNC  LEM   S0    S1    S2    S3    S4    S5    S6    S7
    2.  BLANK !     "     #     $     %     &     '     (     )     *     +     ,     -     .     /
    3.  0     1     2     3     4     5     6     7     8     9     :     ;     <     =     >     ?
    4.  @     A     B     C     D     E     F     G     H     I     J     K     L     M     N     O
    5.  P     Q     R     S     T     U     V     W     X     Y     Z     [     \     ]
    6.        a     b     c     d     e     f     g     h     i     j     k     l     m     n     o
    7.  p     q     r     s     t     u     v     w     x     y     z                       ESC   DEL

    On computers where the minimum addressable unit of memory was a 36-bit word, 6-bit characters were initially used (1 word = 6 characters). After the transition to ASCII, such computers began to store either 5 seven-bit characters (with 1 bit left over) or 4 nine-bit characters in one word.
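    The arithmetic is 5 × 7 = 35 bits plus 1 spare bit, or 4 × 9 = 36 bits. The Python sketch below packs five seven-bit characters into one 36-bit value; placing the first character in the highest bits and leaving the spare bit at the low end is an assumption made for the example, and the function names are hypothetical.

```python
def pack_word(chars: str) -> int:
    """Pack exactly five 7-bit ASCII characters into a 36-bit integer."""
    if len(chars) != 5:
        raise ValueError("a word holds exactly five characters")
    word = 0
    for ch in chars:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError("not a 7-bit ASCII character")
        word = (word << 7) | code
    return word << 1  # assumed layout: the unused bit is the lowest one


def unpack_word(word: int) -> str:
    """Inverse of pack_word: extract the five 7-bit characters."""
    word >>= 1  # drop the spare bit
    return "".join(chr((word >> shift) & 0x7F) for shift in range(28, -1, -7))


word = pack_word("HELLO")
print(format(word, "036b"))
print(unpack_word(word))
```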

    ASCII codes are also used to determine which key is pressed during programming. For a standard QWERTY keyboard, the code table looks like this: