I. History of information coding

II. Encoding information

III. Encoding text information

IV. Types of encoding tables

V. Calculation of the amount of text information

List of references

I. History of information coding

Humanity has been using text encryption (encoding) since the very moment when the first secret information appeared. Here are several text encoding techniques that were invented at various stages of the development of human thought:

Cryptography is secret writing, a system of altering written text in order to make it incomprehensible to the uninitiated;

Morse code, or non-uniform telegraph code, in which each letter or sign is represented by its own combination of short pulses of electric current (dots) and pulses of triple duration (dashes);

Sign language is a system of gestures used by people with hearing impairments.

One of the earliest known encryption methods is named after the Roman ruler Julius Caesar (1st century BC). This method replaces each letter of the text with another letter, found by shifting along the alphabet by a fixed number of positions. The alphabet is read in a circle, that is, after the letter z comes a. So the word “byte”, when shifted two characters to the right, is encoded as “davg”. To decipher the word, replace each encrypted letter with the letter two positions to its left.
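The shift described above can be sketched in a few lines of Python. This is only an illustration, assuming the 26-letter lowercase Latin alphabet; the function name `caesar_shift` is made up for this example. With that alphabet, “byte” shifted by two positions becomes “davg” (note how y wraps around to a).

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: 26-letter Latin alphabet


def caesar_shift(text, shift):
    """Shift each lowercase Latin letter by `shift` positions, wrapping around."""
    result = []
    for ch in text:
        if ch in ALPHABET:
            index = (ALPHABET.index(ch) + shift) % len(ALPHABET)
            result.append(ALPHABET[index])
        else:
            result.append(ch)  # characters outside the alphabet stay unchanged
    return "".join(result)


encoded = caesar_shift("byte", 2)    # shift two positions to the right
decoded = caesar_shift(encoded, -2)  # shift two positions back to the left
print(encoded, decoded)
```

Deciphering is the same operation with the opposite shift, which is why a single function covers both directions.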

II. Encoding information

A code is a set of symbols (or signals) used to record (or convey) some predefined concepts.

Information coding is the process of forming a specific representation of information. In a narrower sense, the term “coding” is often understood as a transition from one form of information representation to another, more convenient for storage, transmission or processing.

Usually, in encoding (sometimes called encryption), each item being represented is denoted by a separate sign.

A sign is an element of a finite set of elements distinct from each other.

You can process text information on a computer. When entered into a computer, each letter is encoded with a certain number, and when output to external devices (screen or print), images of letters are constructed from these numbers for human perception. The correspondence between a set of letters and numbers is called a character encoding.

As a rule, all numbers in a computer are represented using zeros and ones (not ten digits, as is usual for people). In other words, computers usually operate in the binary number system, since this makes the devices for processing them much simpler. Entering numbers into a computer and outputting them for human reading can be done in the usual decimal form, and all the necessary conversions are performed by programs running on the computer.
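As a small illustration of the conversions mentioned above, the sketch below turns a decimal number into binary digits by repeated division by 2 and then reads it back. The helper name `to_binary` is hypothetical; Python's built-in `int(text, 2)` performs the reverse conversion.

```python
def to_binary(number):
    """Convert a non-negative integer to a string of binary digits."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        digits.append(str(number % 2))  # remainder is the next binary digit
        number //= 2
    return "".join(reversed(digits))    # remainders come out lowest digit first


binary = to_binary(13)    # decimal in, binary out
decimal = int(binary, 2)  # binary in, decimal out
print(binary, decimal)
```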

III. Encoding text information

The same information can be presented (encoded) in several forms. With the advent of computers, the need arose to encode all types of information that both an individual and humanity as a whole deal with. But humanity began to solve the problem of encoding information long before the advent of computers. The grandiose achievements of mankind - writing and arithmetic - are nothing more than systems for encoding speech and numerical information. Information never appears in its pure form; it is always presented somehow, encoded somehow.

Binary coding is one of the common ways of representing information. In computers, robots and CNC machine tools, typically all the information the device deals with is encoded as words of the binary alphabet.

Since the late 1960s, computers have increasingly been used to process text information, and today most personal computers in the world spend most of their time processing text. All types of information in a computer are represented in binary code, that is, using an alphabet of size two (only the two characters 0 and 1). This is because information is conveniently represented as a sequence of electrical impulses: no impulse (0), impulse (1).

Such coding is usually called binary, and the logical sequences of zeros and ones themselves are called machine language.

From the computer's point of view, text consists of individual characters. These include not only letters (uppercase or lowercase, Latin or Russian), but also digits, punctuation marks, special characters such as "=", "(", "&", etc., and even (pay special attention!) spaces between words.

Texts are entered into the computer's memory using the keyboard. The keys are labeled with the letters, digits, punctuation marks and other symbols we are familiar with. These symbols enter RAM in binary code: each character is represented by an 8-bit binary code.

Traditionally, one character is encoded using 1 byte of information, i.e. I = 1 byte = 8 bits. Using the formula that connects the number of possible events K and the amount of information I, you can calculate how many different symbols can be encoded (treating symbols as possible events): $K = 2^I = 2^8 = 256$, i.e. text information can be represented with an alphabet of 256 characters.

This number of characters is quite sufficient to represent text information, including uppercase and lowercase letters of the Russian and Latin alphabet, numbers, signs, graphic symbols, etc.

Coding consists of assigning each character a unique decimal code from 0 to 255 or a corresponding binary code from 00000000 to 11111111. Thus, a person distinguishes characters by their outline, and a computer by their code.
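The relationship between the number of bits per character and the range of available codes can be checked with a short calculation. This is a sketch only; the variable names are chosen for illustration.

```python
BITS_PER_CHARACTER = 8  # one byte per character, as assumed in the text

alphabet_size = 2 ** BITS_PER_CHARACTER          # K = 2^I
lowest_code = format(0, "08b")                   # first code, padded to 8 binary digits
highest_code = format(alphabet_size - 1, "08b")  # last code

print(alphabet_size, lowest_code, highest_code)
```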

The convenience of byte-by-byte character encoding is obvious because a byte is the smallest addressable part of memory and, therefore, the processor can access each character separately when processing text. On the other hand, 256 characters is quite a sufficient number to represent a wide variety of symbolic information.

In the process of displaying a symbol on a computer screen, the reverse process is performed - decoding, that is, converting the symbol code into its image. It is important that assigning a specific code to a symbol is a matter of agreement, which is recorded in the code table.

Now the question arises of which eight-bit binary code to assign to each character. Clearly this is a matter of convention; many encoding methods can be devised.

All characters of the computer alphabet are numbered from 0 to 255. Each number corresponds to an eight-bit binary code from 00000000 to 11111111. This code is simply the serial number of the character in the binary number system.

IV. Types of encoding tables

A table in which all characters of the computer alphabet are assigned serial numbers is called an encoding table.

Different types of computers use different encoding tables.

The ASCII code table (American Standard Code for Information Interchange) has been adopted as an international standard. It encodes the first half of the characters with numeric codes from 0 to 127 (codes from 0 to 31 are assigned not to printable characters but to control functions).

The ASCII code table is divided into two parts.

Only the first half of the table is the international standard, i.e. characters with numbers from 0 (00000000), to 127 (01111111).

ASCII encoding table structure

Serial number | Code | Symbol
0 - 31 | 00000000 - 00011111 | Control symbols. Their function is to control the process of displaying text on the screen or printing, sounding a sound signal, marking up text, etc.
32 - 127 | 00100000 - 01111111 | Standard part of the table (English). This includes lowercase and uppercase letters of the Latin alphabet, decimal digits, punctuation marks, all kinds of brackets, commercial and other symbols. Character 32 is a space, i.e. an empty position in the text. All others are represented by specific signs.
128 - 255 | 10000000 - 11111111 | Alternative part of the table (Russian).

The second half of the ASCII code table, called the code page (128 codes, from 10000000 to 11111111), exists in different variants, each with its own number.

The code page is primarily used to accommodate national alphabets other than Latin. In Russian national encodings, characters from the Russian alphabet are placed in this part of the table.

First half of the ASCII code table

Please note that in the encoding table, letters (uppercase and lowercase) are arranged in alphabetical order, and digits are arranged in ascending order. This observance of lexicographic order in the arrangement of symbols is called the principle of sequential coding of the alphabet.

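For letters of the Russian alphabet, the principle of sequential coding is also observed in most encodings; KOI8 is an exception, since its Russian letters follow the order of the similar-sounding Latin letters.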
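The principle of sequential coding can be checked directly by looking at the numeric codes of consecutive letters. The sketch below is an illustration only: the helper `is_sequential` is a made-up name, the Russian alphabet is taken as the 32 lowercase letters from а to я (ё is stored separately in these tables), and the codec names are the ones used by Python's standard library.

```python
import string


def is_sequential(characters, encoding):
    """Return True if consecutive characters have consecutive codes."""
    codes = list(characters.encode(encoding))
    return all(b - a == 1 for a, b in zip(codes, codes[1:]))


# Lowercase Russian letters а..я (without ё), built from their Unicode positions.
russian_lowercase = "".join(chr(code) for code in range(0x0430, 0x0450))

print(is_sequential(string.ascii_uppercase, "ascii"))  # Latin uppercase letters
print(is_sequential(string.digits, "ascii"))           # decimal digits
print(is_sequential(russian_lowercase, "cp1251"))      # Russian letters in CP1251
print(is_sequential(russian_lowercase, "koi8_r"))      # Russian letters in KOI8-R
```

The last check comes out differently from the others because KOI8-R orders its Russian letters by their Latin look-alikes rather than alphabetically.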

Second half of the ASCII code table

Unfortunately, there are currently five different Cyrillic encodings (KOI8-R, Windows, MS-DOS, Macintosh and ISO). Because of this, problems often arise when transferring Russian text from one computer to another or from one software system to another.

Chronologically, one of the first standards for encoding Russian letters on computers was KOI8 ("Information Exchange Code, 8-bit"). This encoding was used back in the 70s on computers of the ES computer series, and from the mid-80s it began to be used in the first Russified versions of the UNIX operating system.

The CP866 encoding ("CP" stands for "Code Page") dates from the early 90s, when the MS-DOS operating system was dominant.

Apple computers running the Mac OS operating system use their own Mac encoding.

In addition, the International Organization for Standardization (ISO) has approved another encoding, ISO 8859-5, as a standard for the Russian language.

The most common encoding currently in use is the Microsoft Windows encoding, abbreviated CP1251. It was introduced by Microsoft and, given how widespread this company's operating systems (OS) and other software products are in the Russian Federation, it has become widely used.

Since the late 90s, the problem of standardizing character encoding has been solved by the introduction of a new international standard called Unicode.

In its original form this is a 16-bit encoding, i.e. it allocates 2 bytes of memory for each character. Of course, this doubles the amount of memory occupied compared with a one-byte encoding. But such a code table allows the inclusion of up to 65536 characters. The complete specification of the Unicode standard includes all the existing, extinct and artificially created alphabets of the world, as well as many mathematical, musical, chemical and other symbols.
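The doubling of memory can be seen by encoding the same word both ways. This is a rough sketch: the sample word is an arbitrary choice, and `utf-16-le` is used here as a stand-in for a two-byte-per-character representation (it writes 16-bit code units without a byte-order mark).

```python
word = "ЭВМ"  # arbitrary sample word of three Cyrillic letters

one_byte_form = word.encode("cp1251")     # 8-bit national encoding
two_byte_form = word.encode("utf-16-le")  # 16-bit code units

print(2 ** 16)                                 # number of distinct 16-bit codes
print(len(one_byte_form), len(two_byte_form))  # bytes used by each form
```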

Internal representation of words in computer memory using an ASCII table
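As an illustration of such an internal representation, the sketch below lists the decimal and 8-bit binary ASCII code of each letter of a word and then decodes the binary codes back into text. The word “file” and the helper name `ascii_bits` are chosen only for this example.

```python
def ascii_bits(word):
    """Return the 8-bit binary ASCII code of each character in the word."""
    return [format(byte, "08b") for byte in word.encode("ascii")]


word = "file"  # arbitrary sample word
codes = ascii_bits(word)
for character, bits in zip(word, codes):
    print(character, ord(character), bits)

# Decoding: turn each 8-bit code back into a number, then into a character.
restored = bytes(int(bits, 2) for bits in codes).decode("ascii")
print(restored)
```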

Sometimes it happens that a text consisting of letters of the Russian alphabet received from another computer cannot be read - some kind of “abracadabra” is visible on the monitor screen. This happens because computers use different character encodings for the Russian language.

Thus, each encoding is specified by its own code table. As can be seen from the table, different characters are assigned to the same binary code in different encodings.

For example, the sequence of numeric codes 221, 194, 204 in the CP1251 encoding forms the word “ЭВМ” (“computer”), whereas in other encodings it will be a meaningless set of characters.
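The effect is easy to reproduce by decoding the same three bytes with different code tables. The sketch below assumes the codec names of Python's standard library for the five Cyrillic encodings mentioned earlier.

```python
codes = bytes([221, 194, 204])  # the numeric codes from the example above

encodings = ["cp1251", "koi8_r", "cp866", "mac_cyrillic", "iso8859_5"]
decoded = {name: codes.decode(name) for name in encodings}

for name, text in decoded.items():
    print(name, text)
```

Only the CP1251 reading gives the Russian abbreviation ЭВМ (“computer”); the other tables map the same bytes to unrelated characters.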

Fortunately, in most cases the user does not have to worry about transcoding text documents, since this is done by special converter programs built into applications.

V. Calculation of the amount of text information

Task 1: Encode the word “Рим” (“Rome”) using the KOI8-R and CP1251 encoding tables.
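One way to check an answer to this task is to let a program look up the codes. The sketch below assumes the word is the Russian “Рим” (Rome) and uses the `koi8_r` and `cp1251` codecs from Python's standard library.

```python
word = "Рим"  # assumption: the Russian word for "Rome"

koi8_codes = list(word.encode("koi8_r"))    # decimal codes in KOI8-R
cp1251_codes = list(word.encode("cp1251"))  # decimal codes in CP1251

print("KOI8-R:", koi8_codes, [format(code, "08b") for code in koi8_codes])
print("CP1251:", cp1251_codes, [format(code, "08b") for code in cp1251_codes])
```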


Task 2: Assuming that each character is encoded in one byte, estimate the information volume of the following sentence:

“My uncle has the most honest rules,

When I seriously fell ill,

He forced himself to respect

And I couldn’t think of anything better.”

Solution: In the original Russian, this phrase has 108 characters, including punctuation, quotation marks and spaces. Multiplying this number by 8 bits gives 108 * 8 = 864 bits.
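The count can be reproduced programmatically. The sketch below assumes the sentence is the Russian original of the quoted lines (the opening of Pushkin's “Eugene Onegin”), joined into one line with single spaces and enclosed in one pair of quotation marks; the English translation has a different length.

```python
# Assumption: the Russian original of the quoted lines.
lines = [
    "Мой дядя самых честных правил,",
    "Когда не в шутку занемог,",
    "Он уважать себя заставил",
    "И лучше выдумать не мог.",
]

phrase = '"' + " ".join(lines) + '"'  # one space between lines, plus quotation marks
character_count = len(phrase)
volume_in_bits = character_count * 8  # one byte (8 bits) per character

print(character_count, volume_in_bits)
```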

Task 3: Two texts contain the same number of characters. The first text is written in Russian, and the second in the language of the Naguri tribe, whose alphabet consists of 16 characters. Which text contains more information?


Solution:

1) $I = K \cdot a$ (the information volume of the text is equal to the product of the number of characters and the information weight of one character).

2) Since both texts have the same number of characters ($K$), the difference depends on the information weight of one character of the alphabet ($a$).

3) $2^{a_1} = 32$, i.e. $a_1 = 5$ bits; $2^{a_2} = 16$, i.e. $a_2 = 4$ bits.

4) $I_1 = K \cdot 5$ bits, $I_2 = K \cdot 4$ bits.

5) This means that the text written in Russian carries 5/4 times as much information.
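The same reasoning can be sketched in code. The helper name `bits_per_character` and the sample text length are hypothetical; following the solution, the Russian alphabet is taken as 32 characters.

```python
def bits_per_character(alphabet_size):
    """Smallest whole number of bits a such that 2**a >= alphabet_size."""
    return (alphabet_size - 1).bit_length()


russian_bits = bits_per_character(32)  # alphabet size assumed in the solution
naguri_bits = bits_per_character(16)

characters = 1000  # hypothetical text length, the same for both texts
russian_volume = characters * russian_bits
naguri_volume = characters * naguri_bits

print(russian_bits, naguri_bits, russian_volume / naguri_volume)
```

The ratio does not depend on the chosen text length, because the number of characters cancels out.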

Task 4: A message containing 2048 characters has a size of 1/512 of a MB. Determine the power (size) of the alphabet.


Solution:

1) I = 1/512 * 1024 * 1024 * 8 = 16384 bits: the information volume of the message converted into bits.

2) a = I / K = 16384 / 2048 = 8 bits: the information weight of one character of the alphabet.

3) $2^8 = 256$ characters: the power of the alphabet used.
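The arithmetic of this task can be verified step by step. This is a sketch with illustrative variable names; it divides the message volume by the 2048 characters given in the task statement.

```python
BITS_PER_MEGABYTE = 1024 * 1024 * 8

message_bits = BITS_PER_MEGABYTE // 512  # 1/512 of a megabyte, in bits
character_count = 2048                   # from the task statement

bits_per_character = message_bits // character_count
alphabet_size = 2 ** bits_per_character  # number of distinct characters

print(message_bits, bits_per_character, alphabet_size)
```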

Task 5: A Canon LBP laser printer prints at an average speed of 6.3 Kbit/s. How long will it take to print an 8-page document if one page has an average of 45 lines with 70 characters per line (1 character = 1 byte)?


Solution:

1) Find the amount of information contained on 1 page: 45 * 70 * 8 bits = 25200 bits.

2) Find the amount of information on 8 pages: 25200 * 8 = 201600 bits.

3) Reduce to common units of measurement by converting Kbits into bits: 6.3 * 1024 = 6451.2 bits/s.

4) Find the printing time: 201600 / 6451.2 = 31.25, i.e. about 31 seconds.
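The steps above can be sketched as a short calculation. The variable names are illustrative; as in the solution, 1 Kbit is taken as 1024 bits.

```python
PAGES = 8
LINES_PER_PAGE = 45
CHARACTERS_PER_LINE = 70
BITS_PER_CHARACTER = 8    # 1 character = 1 byte
SPEED_KBIT_PER_SEC = 6.3

page_bits = LINES_PER_PAGE * CHARACTERS_PER_LINE * BITS_PER_CHARACTER
document_bits = page_bits * PAGES
speed_bits_per_sec = SPEED_KBIT_PER_SEC * 1024  # Kbit/s converted to bit/s

seconds = document_bits / speed_bits_per_sec
print(page_bits, document_bits, round(seconds, 2))
```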


List of references

1. Ageev V.M. Information and coding theory: sampling and coding of measurement information. - M.: MAI, 1977.

2. Kuzmin I.V., Kedrus V.A. Fundamentals of information theory and coding. - Kyiv: Vishcha school, 1986.

3. Zlatopolsky D.M. The simplest methods of text encryption. - M.: Chistye Prudy, 2007. - 32 p.

4. Ugrinovich N.D. Computer science and information technology. Textbook for grades 10-11. - M.: BINOM. Laboratory of Knowledge, 2003. - 512 p.

5. http://school497.spb.edu.ru/uchint002/les10/les.html#n


