Encoding information. Code tables. Coding text information in a computer

Text information consists of symbols: letters, digits, punctuation marks, etc. One byte can store 256 different values, which is enough to hold any alphanumeric character. The first 128 characters (occupying the least significant seven bits) are standardized by the ASCII (American Standard Code for Information Interchange) encoding. The essence of encoding is that each character is assigned a binary code from 00000000 to 11111111, or the corresponding decimal code from 0 to 255. Various code tables are used to encode Russian letters (KOI8-R, CP1251, CP10007, ISO-8859-5):

KOI8-R: an eight-bit standard for encoding Cyrillic letters (for the UNIX operating system). Its developers placed the Russian letters in the upper half of the extended ASCII table so that each Cyrillic letter sits exactly 128 positions above its phonetic Latin counterpart in the lower half (with the case swapped). As a result, if the eighth bit of every byte of KOI8-R text is lost, the text remains readable as a Latin transliteration. For example, the words «высокий дом» ("tall house") become "WYSOKIJ DOM";

CP1251: an eight-bit encoding standard used in Windows;

CP10007: an eight-bit Cyrillic encoding standard used by the Macintosh operating system (Apple computers);

ISO-8859-5: an eight-bit code approved as a standard for encoding the Russian language.
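A minimal Python sketch of the KOI8-R property described above, assuming Python's standard `koi8_r` codec: it encodes Russian text, clears the eighth bit of every byte and reads what remains as 7-bit ASCII. The function name `strip_eighth_bit` is hypothetical, chosen for illustration.

```python
def strip_eighth_bit(text: str) -> str:
    """Encode text in KOI8-R, clear bit 7 of every byte and read the result as ASCII."""
    raw = text.encode("koi8_r")               # one byte per Cyrillic letter, each >= 128
    seven_bit = bytes(b & 0x7F for b in raw)  # drop the eighth bit (value 128)
    return seven_bit.decode("ascii")


if __name__ == "__main__":
    print(strip_eighth_bit("высокий дом"))  # WYSOKIJ DOM
```

Lowercase Cyrillic letters turn into uppercase Latin ones, which is why the result is in capitals; the text stays readable even though the eighth bit is gone.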

Encoding graphic information

Graphic information can be presented in two forms: analog and discrete. A painting created by an artist is an example of analog representation, whereas an image printed by a printer, made up of individual points (elements) of different colors, is a discrete representation.

By splitting a graphic image into elements (sampling), graphic information is converted from analog to discrete form. Encoding is performed at the same time: each element of the image is assigned a specific value in the form of a code. Graphic objects can be created and stored as vector, fractal or raster images. 3D (three-dimensional) graphics, which combines vector and raster image generation methods, is considered separately.

Vector graphics is used to represent graphic images such as pictures, drawings and diagrams.

Vector images are built from objects: sets of geometric primitives (points, lines, circles, rectangles) that are assigned certain characteristics, for example line thickness or fill color.

A vector image is easy to edit, since it can be scaled, rotated and deformed without loss of quality. Each transformation discards the old image (or fragment) and builds a new one in its place. This representation is well suited to diagrams and business graphics. When a vector image is encoded, what is stored is not the image of the object itself but the coordinates of its points, from which the program recreates the image each time.

The main disadvantage of vector graphics is that it cannot produce photographic-quality images. In vector format, an image always looks like a drawing.
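To show why scaling a vector image is lossless, here is a small Python sketch in which a shape is stored only as coordinates plus attributes (line thickness, color). The `Polyline` class and its methods are hypothetical, made up for illustration; real vector formats store far richer descriptions.

```python
from dataclasses import dataclass


@dataclass
class Polyline:
    """A vector object: a list of points plus drawing attributes."""
    points: list[tuple[float, float]]
    thickness: float = 1.0
    color: str = "black"

    def scaled(self, factor: float) -> "Polyline":
        # Only the coordinates are recomputed; the attributes stay the same,
        # so the shape can be redrawn at any size without loss.
        return Polyline([(x * factor, y * factor) for x, y in self.points],
                        self.thickness, self.color)

    def moved(self, dx: float, dy: float) -> "Polyline":
        # Shifting is also just arithmetic on the stored coordinates.
        return Polyline([(x + dx, y + dy) for x, y in self.points],
                        self.thickness, self.color)


triangle = Polyline([(0, 0), (2, 0), (2, 1), (0, 0)], thickness=2, color="red")
print(triangle.scaled(3).points)  # [(0, 0), (6, 0), (6, 3), (0, 0)]
```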

Raster graphics. Any picture can be divided into squares, producing a raster: a two-dimensional array of squares. The squares themselves are raster elements, or pixels (picture elements). The color of each pixel is encoded as a number, so the picture can be described by listing these color numbers in a fixed order (for example, left to right and top to bottom). The color number of each cell is what gets recorded in memory.

Drawing in raster format

Each pixel is assigned brightness, color and transparency values, or a combination of them. A raster image has a fixed number of rows and columns. This storage method has a drawback: working with such images requires a large amount of memory.
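To make the row-and-column idea concrete, here is a minimal Python sketch, assuming a tiny hypothetical 4 by 3 black-and-white picture where 0 means white and 1 means black. It stores the color numbers row by row (left to right, top to bottom) and shows how a pixel's position maps to its place in that sequence; `to_row_major` and `pixel_at` are illustrative names, not from any library.

```python
# Hypothetical 4 x 3 picture: each cell holds a color number (0 = white, 1 = black).
IMAGE = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]


def to_row_major(pixels: list[list[int]]) -> list[int]:
    """List the color numbers left to right, top to bottom."""
    return [value for row in pixels for value in row]


def pixel_at(flat: list[int], width: int, x: int, y: int) -> int:
    """Find the color of pixel (x, y) in the flat sequence: skip y full rows, then x cells."""
    return flat[y * width + x]


flat = to_row_major(IMAGE)
print(flat)                          # [0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0]
print(pixel_at(flat, 4, x=3, y=1))   # 1
```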

The information volume of a raster image is the number of pixels multiplied by the information volume of one pixel, which depends on the number of possible colors. Common screen resolutions include 640 by 480, 800 by 600, 1024 by 768 and 1280 by 1024 pixels. The brightness of each pixel and its coordinates can be expressed as integers, so graphics data can be processed in binary code.

In the simplest case (a black-and-white image without shades of gray), each point on the screen can be in one of two states, “black” or “white”, so 1 bit is enough to store it. Color images are formed from the binary color code of each pixel stored in video memory. Color images can have different color depths, determined by the number of bits used to encode the color of one point. The most common color depths are 8, 16, 24, 32 and 64 bits.
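The volume rule above (number of pixels multiplied by bits per pixel) can be written as a short Python sketch. It assumes an uncompressed image and takes the color depth to be the smallest number of bits b whose 2^b combinations cover the requested number of colors; the function names are illustrative.

```python
def bits_per_pixel(colors: int) -> int:
    """Smallest number of bits b such that 2**b >= colors (assumes colors >= 2)."""
    return (colors - 1).bit_length()


def raster_size_bytes(width: int, height: int, colors: int) -> int:
    """Information volume = number of pixels x bits per pixel, rounded up to whole bytes."""
    total_bits = width * height * bits_per_pixel(colors)
    return (total_bits + 7) // 8


print(raster_size_bytes(640, 480, 2))         # 38400 bytes (black and white)
print(raster_size_bytes(1024, 768, 2 ** 24))  # 2359296 bytes (24-bit color)
```

The 24-bit 1024 by 768 image takes 2,359,296 bytes, i.e. 2.25 MiB (2,359,296 / 1024 / 1024), which illustrates the memory cost of raster graphics mentioned above.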

To encode color graphic images, an arbitrary color is divided into its components. The following coding systems are used:

HSB (H: hue, S: saturation, B: brightness),

RGB (red, green, blue), and

CMYK (cyan, magenta, yellow and black).

The first system is convenient for people, the second for computer processing, and the last for printing houses. These color systems work because a light flux can be formed by radiation that combines “pure” spectral colors (red, green, blue) or their derivatives.
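A Python sketch of the three color systems, assuming 8-bit RGB components (0 to 255). `pack_rgb` builds the 24-bit binary color code, `rgb_to_hsb` uses the standard-library `colorsys.rgb_to_hsv` (HSB and HSV are the same model), and `rgb_to_cmyk` applies the common naive formula, which ignores real printing inks. The helper names are hypothetical.

```python
import colorsys


def pack_rgb(r: int, g: int, b: int) -> int:
    """Combine three 8-bit components (0-255) into one 24-bit color code."""
    return (r << 16) | (g << 8) | b


def rgb_to_hsb(r: int, g: int, b: int) -> tuple[float, float, float]:
    """HSB is the same model as HSV; colorsys expects components in 0..1."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360, s, v  # hue in degrees, saturation and brightness in 0..1


def rgb_to_cmyk(r: int, g: int, b: int) -> tuple[float, float, float, float]:
    """Naive conversion that ignores real ink and paper characteristics."""
    rf, gf, bf = r / 255, g / 255, b / 255
    k = 1 - max(rf, gf, bf)
    if k == 1:  # pure black: avoid dividing by zero
        return 0.0, 0.0, 0.0, 1.0
    return ((1 - rf - k) / (1 - k), (1 - gf - k) / (1 - k), (1 - bf - k) / (1 - k), k)


red = (255, 0, 0)
print(format(pack_rgb(*red), "024b"))  # 111111110000000000000000
print(rgb_to_hsb(*red))                # (0.0, 1.0, 1.0)
print(rgb_to_cmyk(*red))               # (0.0, 1.0, 1.0, 0.0)
```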

A fractal is an object whose individual elements inherit the properties of the parent structures. Because the finer-scale elements are described by repeating a simple algorithm, the whole object can be described with just a few mathematical equations. Fractals make it possible to describe detailed images that require relatively little memory.

Drawing in fractal format
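As an illustration of the "few equations, lots of detail" idea, the Python sketch below draws a Sierpinski triangle, a classic fractal, from a single bitwise rule. The rule is chosen here for brevity and is an assumption of the example, not a description of how any particular fractal graphics format stores images.

```python
def sierpinski(size: int) -> list[str]:
    """Draw a Sierpinski triangle: cell (x, y) is filled when x AND y (bitwise) is 0."""
    return [
        "".join("#" if (x & y) == 0 else "." for x in range(size))
        for y in range(size)
    ]


for line in sierpinski(16):
    print(line)
```

Any corner of the picture repeats the whole pattern at a smaller scale: the top-left 8 by 8 block of the 16 by 16 picture is exactly the 8 by 8 picture.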

3D graphics operates on objects in three-dimensional space. It is widely used in cinema and computer games, where all objects are represented as sets of surfaces or particles. All visual transformations in 3D graphics are performed by operators that have a matrix representation.
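To illustrate the matrix operators mentioned above, here is a plain-Python sketch (no external libraries) that rotates and scales a point in 3D space with 3 by 3 matrices. The rotation is about the z axis, chosen arbitrarily for the example; `rotation_z`, `scaling`, `compose` and `apply` are illustrative names.

```python
import math

Matrix = list[list[float]]
Vector = tuple[float, float, float]


def rotation_z(angle_deg: float) -> Matrix:
    """3x3 matrix rotating points about the z axis by angle_deg degrees."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def scaling(k: float) -> Matrix:
    """Uniform scaling by factor k along all three axes."""
    return [[k, 0.0, 0.0], [0.0, k, 0.0], [0.0, 0.0, k]]


def compose(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product a*b: applying it means 'first b, then a'."""
    return [[sum(a[i][n] * b[n][j] for n in range(3)) for j in range(3)] for i in range(3)]


def apply(m: Matrix, p: Vector) -> Vector:
    """Multiply matrix m by the column vector p."""
    x, y, z = (sum(m[i][j] * p[j] for j in range(3)) for i in range(3))
    return (x, y, z)


transform = compose(scaling(2), rotation_z(90))  # rotate 90 degrees, then double the size
print(apply(transform, (1.0, 0.0, 0.0)))         # approximately (0.0, 2.0, 0.0)
```

Composing the matrices first and applying the product once is what makes the matrix form convenient: a whole chain of transformations collapses into a single matrix.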

Encoding of audio information

Music, like any sound, is nothing more than sound vibrations, which, once recorded, can be reproduced quite accurately. To represent an audio signal in computer memory, the acoustic vibrations must be converted into digital form, that is, into a sequence of zeros and ones. A microphone converts sound into electrical oscillations; a special device, the analog-to-digital converter (ADC), then measures their amplitude at regular intervals (several tens of thousands of times per second). To reproduce sound, the digital signal must be converted back to analog by a digital-to-analog converter (DAC). Both devices are built into the computer's sound card. This sequence of transformations is shown in Fig. 2.6.

Transformation of analog signal to digital and vice versa

Each sound measurement is recorded in binary code. This process, performed by the ADC, is called sampling.

A sample is a single measurement of the amplitude of an analog signal; the time interval between two consecutive measurements is called the sampling period. The word sample is also used for any sequence of digital data obtained through analog-to-digital conversion. An important sampling parameter is the sampling rate: the number of measurements of the analog signal amplitude per second. Audio sampling rates typically range from 8000 to 48000 measurements per second.

Graphical representation of the sampling process

Playback quality depends on the sampling rate and the resolution (the size of the cell allocated to record one amplitude value). For example, music on audio CDs is recorded with 16-bit values at a sampling rate of 44,100 Hz (44.1 kHz).
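A minimal Python sketch of sampling and quantization, assuming a hypothetical 1 kHz test tone whose amplitude stays within -1 to 1. The sampling rate sets how often the amplitude is measured, and the resolution in bits sets how many levels (2^bits) each measurement can take; all names are illustrative.

```python
import math
from typing import Callable


def sample_and_quantize(signal: Callable[[float], float], rate_hz: int,
                        duration_s: float, bits: int) -> list[int]:
    """Measure signal(t) every 1/rate_hz seconds and store each value as a bits-bit code.

    The signal is assumed to stay within [-1, 1]; that range is split into 2**bits levels.
    """
    top = 2 ** bits - 1                   # largest code, e.g. 65535 for 16 bits
    count = round(rate_hz * duration_s)   # number of measurements
    codes = []
    for i in range(count):
        t = i / rate_hz                   # moment of the i-th measurement
        amplitude = max(-1.0, min(1.0, signal(t)))
        codes.append(round((amplitude + 1) / 2 * top))
    return codes


def tone(t: float) -> float:
    """Hypothetical test signal: a 1 kHz sine wave."""
    return math.sin(2 * math.pi * 1000 * t)


codes = sample_and_quantize(tone, rate_hz=8000, duration_s=0.001, bits=4)
print(codes)  # eight 4-bit codes between 0 and 15
```

With `bits=16`, as on audio CDs, the same function produces codes from 0 to 65,535.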

The human ear perceives sound waves with frequencies from 16 Hz to 20 kHz (1 Hz is one vibration per second).

In the DVD-Audio format, the signal can be measured 96,000 times per second, i.e. a sampling rate of 96 kHz is used. To save hard disk space, multimedia applications often use lower rates: 11, 22 or 32 kHz. Since a signal sampled at a given rate can only reproduce frequencies below half that rate, lowering the rate narrows the audible frequency range, and the sound is distorted.
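The cost of a given sampling rate can be estimated with a short Python sketch. It assumes uncompressed 16-bit mono audio by default and uses the sampling theorem: a signal sampled at rate f can only represent frequencies below f/2. The rates in the loop are the ones quoted above, plus 44.1 kHz (CD) for comparison; the function names are illustrative.

```python
def audio_size_bytes(rate_hz: int, bits: int, seconds: int, channels: int = 1) -> int:
    """Uncompressed size = measurements per second x bits per measurement x time x channels."""
    return rate_hz * bits * seconds * channels // 8


def highest_frequency_hz(rate_hz: int) -> float:
    """Only frequencies below half the sampling rate can be represented."""
    return rate_hz / 2


for rate in (11_000, 22_000, 32_000, 44_100, 96_000):
    size_mb = audio_size_bytes(rate, 16, 60) / 1_000_000
    print(f"{rate} Hz: up to {highest_frequency_hz(rate):.0f} Hz, {size_mb:.2f} MB per minute")
```

At 11 kHz the highest representable frequency is 5.5 kHz, far below the 20 kHz limit of hearing, which is why such recordings sound dull.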

The set of characters with which text is written is called an alphabet.

The number of characters in the alphabet is its power.

Formula for determining the amount of information: $N = 2^b$,

where N is the power of the alphabet (number of characters),

b – number of bits (information weight of the symbol).

An alphabet with a power of 256 characters can accommodate almost all the necessary characters. Such an alphabet is called sufficient.

Because $256 = 2^8$, the weight of one character is 8 bits.

A unit of 8 bits was given the name 1 byte:

1 byte = 8 bits.

The binary code of each character in computer text takes up 1 byte of memory.
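The formula $N = 2^b$ can be turned around to find the weight of one symbol, $b = \log_2 N$. The Python sketch below does this with integer arithmetic and, as an assumption for alphabets whose power is not a power of two, rounds up to the next whole bit; the names are illustrative.

```python
def symbol_weight_bits(power: int) -> int:
    """Smallest b with 2**b >= N, where N is the power of the alphabet (N >= 2)."""
    return (power - 1).bit_length()


def text_volume_bits(length: int, power: int) -> int:
    """Information volume of a text: number of characters x weight of one character."""
    return length * symbol_weight_bits(power)


print(symbol_weight_bits(256))          # 8, because 256 = 2**8
print(text_volume_bits(100, 256) // 8)  # 100 bytes for 100 characters
```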

How is text information represented in computer memory?

Byte-by-byte character encoding is obviously convenient: a byte is the smallest addressable part of memory, so the processor can access each character separately when processing text. At the same time, 256 characters are quite enough to represent a wide variety of symbolic information.

Now the question arises, which eight-bit binary code to assign to each character.

Clearly, this is a matter of convention; many encoding methods are possible.

All characters of the computer alphabet are numbered from 0 to 255. Each number corresponds to an eight-bit binary code from 00000000 to 11111111. This code is simply the serial number of the character in the binary number system.
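The idea that a character's code is just its serial number written in binary can be sketched in Python. The helper `char_codes` (a hypothetical name) encodes a word with a single-byte code table, ASCII by default, and lists each byte in decimal and as an 8-bit binary string.

```python
def char_codes(text: str, encoding: str = "ascii") -> list[tuple[str, int, str]]:
    """Return (character, serial number, 8-bit binary code) for each character.

    Assumes a single-byte encoding, so each character maps to exactly one byte.
    """
    raw = text.encode(encoding)
    return [(ch, code, format(code, "08b")) for ch, code in zip(text, raw)]


for row in char_codes("Byte"):
    print(row)
# ('B', 66, '01000010')
# ('y', 121, '01111001')
# ('t', 116, '01110100')
# ('e', 101, '01100101')
```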

A table in which all characters of the computer alphabet are assigned serial numbers is called an encoding table.

Different types of computers use different encoding tables.

The ASCII table (pronounced "ask-ee"; American Standard Code for Information Interchange) has become the international standard for PCs.

The ASCII code table is divided into two parts.

Only the first half of the table is the international standard, i.e. the characters numbered from 0 (00000000) to 127 (01111111).

ASCII encoding table structure

Serial numbers 0-31, codes 00000000-00011111. Characters with these numbers are usually called control characters. Their function is to control the display of text on the screen or in print, to sound a signal, to mark up text, etc.

Serial numbers 32-127, codes 00100000-01111111. The standard part of the table (English). It includes lowercase and uppercase letters of the Latin alphabet, decimal digits, punctuation marks, all kinds of brackets, commercial and other symbols. Character 32 is a space, i.e. an empty position in the text. All the others are displayed as specific signs.

Serial numbers 128-255, codes 10000000-11111111. The alternative part of the table (Russian). The second half of the ASCII code table, called the code page (128 codes, from 10000000 to 11111111), can exist in different variants, each with its own number. The code page is mainly used to accommodate national alphabets other than Latin. In Russian national encodings, the letters of the Russian alphabet are placed in this part of the table.

First half of the ASCII code table


Please note that in the encoding table, letters (uppercase and lowercase) are arranged in alphabetical order, and digits in ascending order. This adherence to lexicographic order in the arrangement of symbols is called the principle of sequential coding of the alphabet.

For letters of the Russian alphabet, the principle of sequential coding is also observed.
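A quick Python check of the sequential coding principle, using the standard `ascii`, `cp1251` and `koi8_r` codecs. `is_sequential` (a hypothetical helper) tests whether consecutive letters have consecutive codes. In CP1251 the 32 letters а to я form one unbroken run (ё is stored separately, at code 184), whereas KOI8-R follows the phonetic layout described in the KOI8-R entry above and fails the check.

```python
import string


def is_sequential(letters: str, encoding: str = "ascii") -> bool:
    """True if every letter's code is exactly one more than the previous letter's code."""
    codes = list(letters.encode(encoding))
    return all(b - a == 1 for a, b in zip(codes, codes[1:]))


RUSSIAN_LOWER = "абвгдежзийклмнопрстуфхцчшщъыьэюя"  # 32 letters, without ё

print(is_sequential(string.ascii_uppercase))    # True
print(is_sequential(string.digits))             # True
print(is_sequential(RUSSIAN_LOWER, "cp1251"))   # True
print(is_sequential(RUSSIAN_LOWER, "koi8_r"))   # False
print(ord("7") - ord("0"))                      # 7: a digit's value follows from its code
```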

Second half of the ASCII code table


Unfortunately, there are currently five different Cyrillic encodings (KOI8-R, Windows, MS-DOS, Macintosh and ISO). Because of this, problems often arise when transferring Russian text from one computer to another or from one software system to another.

Chronologically, one of the first standards for encoding Russian letters on computers was KOI8 ("Information Exchange Code, 8-bit"). This encoding was used as early as the 1970s on ES series computers, and from the mid-1980s in the first Russified versions of the UNIX operating system.

The CP866 encoding ("CP" stands for "Code Page") is a legacy of the early 1990s, when the MS-DOS operating system dominated.

Apple computers running the Mac OS operating system use their own Mac encoding.

In addition, the International Organization for Standardization (ISO) approved another encoding, ISO 8859-5, as a standard for the Russian language.

The most widely used encoding today is the Microsoft Windows encoding, abbreviated CP1251.

Since the late 1990s, the problem of standardizing character encoding has been addressed by a new international standard called Unicode. It was designed as a 16-bit encoding, i.e. one that allocates 2 bytes of memory to each character. This doubles the amount of memory occupied, but such a code table can include up to 65,536 characters. (The modern standard has since grown beyond 65,536 code points; in UTF-16, characters outside the first 65,536 take 4 bytes.) The complete specification of the Unicode standard covers all existing, extinct and artificially created alphabets of the world, as well as many mathematical, musical, chemical and other symbols.
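The memory cost of a two-byte encoding is easy to see in Python, assuming the standard `cp1251` and `utf-16-le` codecs (the `-le` variant is used so that no byte-order mark is added). The same Russian word takes twice as many bytes in UTF-16 as in CP1251.

```python
word = "Привет"  # Russian for "hello"

one_byte = word.encode("cp1251")     # one byte per character
two_byte = word.encode("utf-16-le")  # two bytes per character for these letters

print(len(one_byte), len(two_byte))  # 6 12
print(f"U+{ord(word[0]):04X}")       # U+041F, the Unicode number of "П"
```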

Let's try using an ASCII table to imagine what words will look like in the computer's memory.

Internal representation of words in computer memory

Sometimes it happens that a text consisting of letters of the Russian alphabet received from another computer cannot be read - some kind of “abracadabra” is visible on the monitor screen. This happens because computers use different character encodings for the Russian language.
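The "abracadabra" effect can be reproduced in Python with the standard `cp1251` and `koi8_r` codecs: the bytes are intact, only the table used to read them is wrong. The word and the choice of the two tables are just an example.

```python
original = "Привет"

raw = original.encode("cp1251")  # bytes saved by a program that uses CP1251
garbled = raw.decode("koi8_r")   # the same bytes read on a KOI8-R system
print(garbled)                   # Cyrillic letters, but the wrong ones

# Recovery works only if both tables are known: undo the wrong reading, apply the right one.
restored = garbled.encode("koi8_r").decode("cp1251")
print(restored == original)      # True
```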

Correct encoding of text information is sometimes essential for a device to work properly or for a particular fragment to be displayed correctly. How this process works when a computer handles text, images and sound is what we will examine in this article.

Introduction

An electronic computer (in everyday life we simply call it a computer) perceives text in a very specific way. Encoding text information is very important for it, since it perceives each fragment of text as a group of symbols isolated from each other.

What are the symbols?

For a computer, symbols include not only Russian, English and other letters but also punctuation marks and other signs. Even the space we use to separate words when typing is perceived by the device as a symbol. In a way this resembles higher mathematics, where, as many professors say, zero has a double meaning: it is a number and at the same time means nothing. Even for philosophers, the question of the blank space can be a pressing issue. A joke, of course, but, as they say, there is some truth in every joke.

What kind of information is there?

So, to perceive information, the computer has to run processing routines. What kinds of information are there? The topic of this article is the encoding of text information. We will pay special attention to it, but we will also touch on several related topics.

Information can be textual, numeric, audio or graphic. The computer must run processes that encode text information in order to display on the screen what we, for example, type on the keyboard. We see symbols and letters, that much is clear. But what does the machine see? It perceives absolutely all information (not just text) as a sequence of zeros and ones, which form the basis of the so-called binary code. Accordingly, the process that converts information received by the device into a form it can understand is called “binary coding of text information.”

How binary code works, in brief

Why is binary coding the most widespread in electronic machines? Any sequence of symbols and signs can be encoded with zeros and ones, but that is not the only advantage of binary coding. The principle behind it is very simple yet quite functional: when there is an electrical impulse, it is marked (conventionally, of course) with a one; when there is no impulse, it is marked with a zero. In other words, binary coding of information is based on constructing sequences of electrical impulses. A logical sequence made up of binary code symbols is called machine language. Encoding and processing text information in binary code also allows operations to be carried out very quickly.

Bits and bytes

Each binary digit perceived by the machine carries a certain amount of information equal to one bit. This applies to every one and every zero that make up a sequence of encoded information.

Accordingly, the amount of information in a binary code sequence can be determined simply by counting its digits: the two numbers are equal. Two digits of code carry 2 bits of information, 10 digits carry 10 bits, and so on. As you can see, determining the information volume of a fragment of binary code is quite simple.

Coding text information in a computer

Right now you are reading an article that consists, as we believe, of a sequence of letters of the Russian alphabet. The computer, as mentioned earlier, perceives all information (this text included) as a sequence not of letters but of zeros and ones, indicating the absence or presence of an electrical impulse.

One character that we see on the screen can be encoded using a conventional unit of measurement called a byte. As noted above, binary code carries an information load, which numerically equals the total number of zeros and ones in the selected code fragment. Eight bits make 1 byte. The possible combinations of signals are numerous, as is easy to see by drawing a rectangle of 8 equal cells on paper and filling each cell with 0 or 1.

It follows that text information can be encoded with an alphabet of 256 characters. What's the point? Each character gets its own binary code. The combinations assigned to characters run from 00000000 to 11111111; in the decimal number system, these codes run from 0 to 255.

Keep in mind that there are several tables for encoding letters of the Russian alphabet, for example ISO, KOI8, Mac, and two CP variants, 1251 and 866. Text encoded with one of these tables will not be displayed correctly if it is read with a different one, because different tables assign different characters to the same binary code.

This was a problem at first. Nowadays, however, programs have built-in algorithms that convert text into the correct form. The Unicode encoding, first published in 1991, took a more general approach: each character has 2 bytes at its disposal, which allows text with a much larger number of characters to be encoded. 256 and 65,536: is there a difference?

Graphics coding

Coding text and coding graphic information have some similarities. As you know, graphic information is displayed on a peripheral device of the computer called a monitor. Computer graphics is now widely used in a variety of fields, and, fortunately, the hardware of personal computers is capable of solving quite complex graphics tasks.

Processing video has become possible in recent years. Text, however, is much “lighter” than graphics, which is understandable, so graphics files end up much larger. Knowing how graphical information is represented helps to overcome such problems.

Let's first figure out what groups this type of information is divided into. Firstly, it is raster. Secondly, vector.

Raster images are quite similar to squared paper in which each cell is filled with one color or another. The principle somewhat resembles a mosaic: in raster graphics, the image is divided into separate elementary parts called pixels (from “picture element”; in Russian they are often simply called “dots”). Pixels are arranged in rows, and the grid they form, made up of a certain number of pixels, is called a raster. Combining these two definitions, a raster image is nothing more than a collection of pixels displayed on a rectangular grid.

The monitor's raster and pixel size affect image quality: the larger the raster, the higher the quality. The raster size is the screen resolution that every user has probably heard of. Another key characteristic of a screen is its pixel density (resolving power), which is not the same as resolution: it shows how many pixels there are per unit of length. It is usually measured in pixels per inch. The more pixels per unit of length, the higher the quality, since the “grain” becomes finer.

Audio stream processing

Coding of text and of audio information, like other types of coding, has some features of its own. Let's now talk about the last process: coding of audio information.

The representation of an audio stream (as well as an individual sound) can be produced using two methods.

Analogue form of audio information representation

In this case, the quantity can take a vast number of different values. Moreover, these values do not stay constant: they change very quickly, and the change is continuous.

Discrete form of representation of audio information

With the discrete method, the quantity can take only a limited number of values, and it changes in jumps. Not only audio but also graphic information can be encoded discretely, and the same goes for the analog form.

Analog audio information is stored, for example, on vinyl records, whereas a CD is a discrete way of presenting audio information.

At the very beginning, we talked about the fact that the computer perceives all information in machine language. To do this, information is encoded in the form of a sequence of electrical impulses - zeros and ones. Encoding audio information is no exception to this rule. To process sound on a computer, you first need to turn it into that very sequence. Only after this can operations be performed on a stream or a single sound.

During encoding, the stream undergoes time sampling. The sound wave is continuous, so it is divided into small time intervals, and the amplitude value is recorded for each interval separately.

Conclusion

So, what have we learned in this article? First, absolutely all information displayed on a computer monitor is encoded before it appears there. Second, this encoding means translating information into machine language. Third, machine language is nothing more than a sequence of electrical impulses: zeros and ones. Fourth, there are separate tables for encoding different characters. Fifth, graphic and sound information can be presented in analog and discrete form. These are the main points we have discussed. Computer science is one of the disciplines that study this area; the basics of coding text information are taught at school, since there is nothing complicated about them.

Contents

I. History of information coding

II. Coding of information

III. Coding of text information

IV. Types of encoding tables

V. Calculation of the amount of text information

List of references

I. History of information coding

Humanity has been using text encryption (encoding) since the very moment when the first secret information appeared. Here are several text encoding techniques that were invented at various stages of the development of human thought:

Cryptography is secret writing, a system of changing writing in order to make the text incomprehensible to the uninitiated;

Morse code, or the uneven telegraph code, in which each letter or sign is represented by its own combination of short pulses of electric current (dots) and pulses of triple duration (dashes);

Sign language is a system of gestures used by people with hearing impairments.

One of the earliest known encryption methods is named after the Roman statesman Julius Caesar (1st century BC). It replaces each letter of the text with the letter a fixed number of positions further along the alphabet, which is read in a circle, so that after the letter я comes а. For example, the Russian word «байт» ("byte"), shifted two letters to the right, is encoded as «гвлф». To decipher it, each encrypted letter is replaced with the letter two positions to its left.
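The Caesar shift described above fits in a few lines of Python. This sketch assumes the 33-letter Russian alphabet including ё, read in a circle, and leaves other characters unchanged; `caesar` is an illustrative name.

```python
RUSSIAN = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"  # 33 letters, read in a circle


def caesar(text: str, shift: int, alphabet: str = RUSSIAN) -> str:
    """Replace each letter with the one `shift` positions further along the alphabet.

    Characters that are not in the alphabet are left unchanged.
    """
    result = []
    for ch in text:
        if ch in alphabet:
            index = (alphabet.index(ch) + shift) % len(alphabet)  # wrap after the last letter
            result.append(alphabet[index])
        else:
            result.append(ch)
    return "".join(result)


secret = caesar("байт", 2)
print(secret)              # гвлф
print(caesar(secret, -2))  # байт
```

Decryption is the same function with the opposite shift.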

II. Encoding information

A code is a set of symbols (or signals) for recording (or conveying) certain predefined concepts.

Information coding is the process of forming a specific representation of information. In a narrower sense, the term “coding” is often understood as a transition from one form of information representation to another, more convenient for storage, transmission or processing.

When encoding (sometimes called encryption), each image is usually represented by a separate sign.

A sign is an element of a finite set of elements distinct from each other.

You can process text information on a computer. When entered into a computer, each letter is encoded with a certain number, and when output to external devices (screen or print), images of letters are constructed from these numbers for human perception. The correspondence between a set of letters and numbers is called a character encoding.

As a rule, all numbers in a computer are represented using zeros and ones (not ten digits, as is usual for people). In other words, computers usually operate in the binary number system, since this makes the devices for processing them much simpler. Entering numbers into a computer and outputting them for human reading can be done in the usual decimal form, and all the necessary conversions are performed by programs running on the computer.

III. Encoding text information

The same information can be presented (encoded) in several forms. With the advent of computers, the need arose to encode all types of information that both an individual and humanity as a whole deal with. But humanity began to solve the problem of encoding information long before the advent of computers. The grandiose achievements of mankind, writing and arithmetic, are nothing more than systems for encoding speech and numerical information. Information never appears in its pure form; it is always presented, that is, encoded, in some way.

Binary coding is one of the common ways of representing information. In computers, robots and CNC machine tools, all the information the device deals with is typically encoded as words in the binary alphabet.

Since the late 1960s, computers have increasingly been used to process text information, and today most of the world's personal computers spend most of their time processing text. All types of information in a computer are represented in binary code, that is, with an alphabet of power two (only two characters, 0 and 1). This is because it is convenient to represent information as a sequence of electrical impulses: no impulse (0), impulse (1).

Such coding is usually called binary, and the logical sequences of zeros and ones themselves are called machine language.

From a computer point of view, text consists of individual characters. The symbols include not only letters (uppercase or lowercase, Latin or Russian), but also numbers, punctuation marks, special characters such as "=", "(", "&", etc., and even (pay special attention!) spaces between words.

Texts are entered into the computer's memory from the keyboard. The keys carry the familiar letters, digits, punctuation marks and other symbols, which enter RAM in binary code: each character is represented by an 8-bit binary code.

Traditionally, one character is encoded with an amount of information equal to 1 byte, i.e. $I = 1$ byte $= 8$ bits. Using the formula that relates the number of possible events $K$ to the amount of information $I$, you can calculate how many different symbols can be encoded (treating symbols as possible events): $K = 2^I = 2^8 = 256$, i.e. an alphabet of 256 characters can be used to represent text information.

This number of characters is quite sufficient to represent text information, including uppercase and lowercase letters of the Russian and Latin alphabet, numbers, signs, graphic symbols, etc.

Coding consists of assigning each character a unique decimal code from 0 to 255 or a corresponding binary code from 00000000 to 11111111. Thus, a person distinguishes characters by their outline, and a computer by their code.

Byte-by-byte character encoding is obviously convenient: a byte is the smallest addressable part of memory, so the processor can access each character separately when processing text. At the same time, 256 characters are quite enough to represent a wide variety of symbolic information.

In the process of displaying a symbol on a computer screen, the reverse process is performed - decoding, that is, converting the symbol code into its image. It is important that assigning a specific code to a symbol is a matter of agreement, which is recorded in the code table.

Now the question arises which eight-bit binary code to assign to each character. Clearly, this is a matter of convention; many encoding methods are possible.

All characters of the computer alphabet are numbered from 0 to 255. Each number corresponds to an eight-bit binary code from 00000000 to 11111111. This code is simply the serial number of the character in the binary number system.

IV. Types of encoding tables

A table in which all characters of the computer alphabet are assigned serial numbers is called an encoding table.

Different types of computers use different encoding tables.

The ASCII code table (American Standard Code for Information Interchange) has been adopted as an international standard; it encodes the first half of the characters, with numeric codes from 0 to 127 (codes 0 to 31 are assigned not to printable characters but to control functions).

The ASCII code table is divided into two parts.

Only the first half of the table is the international standard, i.e. the characters numbered from 0 (00000000) to 127 (01111111).

ASCII encoding table structure

Serial number   Binary code            Characters
0 - 31          00000000 - 00011111    control characters
32 - 127        00100000 - 01111111    standard part of the table (English)
128 - 255       10000000 - 11111111    alternative part of the table (Russian)

Characters with numbers from 0 to 31 are usually called control characters. Their function is to control the display of text on the screen or its printing, to sound a signal, to mark up text, and so on.

The standard part of the table contains the lowercase and uppercase letters of the Latin alphabet, the decimal digits, punctuation marks, all kinds of brackets, commercial and other symbols. Character 32 is a space, i.e. an empty position in the text. Codes 33 to 126 are displayed as visible characters; code 127 (DEL) is also a control character.

The alternative part of the table (codes 128 to 255) is used for Russian letters.
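The three ranges of the table can be expressed as a small classification function. This is a sketch with a hypothetical name (`classify`) that follows the ranges above, treating code 127 (DEL) as a control character.

```python
def classify(code: int) -> str:
    """Tell which part of an 8-bit ASCII-based table a code belongs to."""
    if not 0 <= code <= 255:
        raise ValueError("code must fit in one byte")
    if code < 32 or code == 127:
        return "control"               # 0-31 and DEL
    if code < 128:
        return "standard"              # space, Latin letters, digits, punctuation
    return "national (code page)"      # 128-255: depends on the code page


for c in (10, 32, 65, 127, 200):
    print(c, classify(c))
```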

The second half of the extended ASCII table, called the code page (128 codes, from 10000000 to 11111111), exists in several variants, each identified by its own number.

The code page is primarily used to accommodate national alphabets other than Latin. In Russian national encodings, characters from the Russian alphabet are placed in this part of the table.

First half of the ASCII code table

Note that in the encoding table the letters (uppercase and lowercase) are arranged in alphabetical order, and the digits in ascending order. This observance of lexicographic order in the arrangement of characters is called the principle of sequential coding of the alphabet.
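Sequential coding can be checked directly with Python's `ord()`, which for these characters returns the ASCII number. The helper `is_sequential` is an illustrative name; the last two lines show why the principle is convenient in practice.

```python
import string


def is_sequential(chars: str) -> bool:
    """True if each character's code is exactly one more than the previous one."""
    codes = [ord(c) for c in chars]
    return all(b - a == 1 for a, b in zip(codes, codes[1:]))


print(is_sequential(string.ascii_uppercase))  # True
print(is_sequential(string.ascii_lowercase))  # True
print(is_sequential(string.digits))           # True

# Consequence: a digit's value is its code minus the code of "0" ...
print(ord("7") - ord("0"))                    # 7
# ... and a letter's position in the alphabet is found the same way.
print(ord("k") - ord("a") + 1)                # 11
```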

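For letters of the Russian alphabet, the principle of sequential coding is observed in most code pages, for example in CP1251, but not in all of them: KOI8-R arranges Russian letters to match their Latin counterparts, and the letter Ё usually lies outside the main sequence.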
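A short check with Python's standard `cp1251` and `koi8_r` codecs (assuming these match the CP1251 and KOI8-R tables discussed in the text) shows the difference. The helper name `codes` is illustrative.

```python
def codes(text: str, encoding: str) -> list[int]:
    """Byte values of `text` in the given 8-bit code page."""
    return list(text.encode(encoding))


word = "абвгд"
print(codes(word, "cp1251"))  # [224, 225, 226, 227, 228]  consecutive
print(codes(word, "koi8_r"))  # [193, 194, 215, 199, 196]  not consecutive
print(codes("ЁЖ", "cp1251"))  # [168, 198]  Ё sits outside the А..я block
```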

Second half of the ASCII code table

Unfortunately, there are currently five different Cyrillic encodings (KOI8-R, Windows, MS-DOS, Macintosh and ISO). Because of this, problems often arise when Russian text is transferred from one computer to another or from one software system to another.

Chronologically, one of the first standards for encoding Russian letters on computers was KOI8 ("Information Exchange Code, 8-bit"). It was used as early as the 1970s on the ES series of computers, and from the mid-1980s it was adopted in the first Russified versions of the UNIX operating system.

The CP866 encoding ("CP" stands for "Code Page") survives from the early 1990s, when the MS-DOS operating system was dominant.

Apple computers running the Mac OS operating system use their own Mac encoding.

In addition, the International Organization for Standardization (ISO) has approved another encoding, ISO 8859-5, as a standard for the Russian language.

The most widely used of these encodings is the Microsoft Windows encoding, abbreviated CP1251. It was introduced by Microsoft and, given the wide use of the company's operating systems (OS) and other software products in the Russian Federation, it has become widespread.

Since the late 1990s, the problem of standardizing character encoding has been addressed by a new international standard called Unicode.

Unicode was designed as a 16-bit encoding, i.e. it allocates 2 bytes of memory for each character. This doubles the amount of memory occupied by text, but such a code table can hold up to 2^16 = 65536 characters. (The modern standard has outgrown 16 bits: it defines code points up to U+10FFFF, stored with encodings such as UTF-8 and UTF-16.) The complete Unicode specification aims to include the existing, extinct and artificially created scripts of the world, as well as many mathematical, musical, chemical and other symbols.
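To see the doubling of memory, one can encode the same Russian word in an 8-bit code page and in UTF-16, the present-day 16-bit form of Unicode. This sketch uses Python's standard codecs; `utf-16-le` is chosen so that no byte-order mark is added. Characters outside the first 65536 code points would take 4 bytes in UTF-16, which this example does not cover.

```python
word = "компьютер"                      # Russian for "computer", 9 letters
one_byte = word.encode("cp1251")        # 8-bit code page: 1 byte per character
two_byte = word.encode("utf-16-le")     # 16-bit units, no byte-order mark
print(len(one_byte), len(two_byte))     # 9 18
print(2 ** 16)                          # 65536 possible 16-bit codes
```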

Internal representation of words in computer memory using an ASCII table

Sometimes a text in Russian received from another computer cannot be read: the screen shows some kind of “abracadabra”. This happens because computers use different character encodings for the Russian language.

Each encoding is specified by its own code table. As can be seen from the table, the same binary code corresponds to different characters in different encodings.

For example, the sequence of numeric codes 221, 194, 204 in the CP1251 encoding forms the word “ЭВМ” (the Russian abbreviation for “computer”), whereas in other encodings it is a meaningless set of characters.
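The example can be reproduced by decoding the same three bytes with two different code pages, using Python's standard codecs:

```python
data = bytes([221, 194, 204])   # the numeric codes from the example
print(data.decode("cp1251"))    # ЭВМ  (the intended word)
print(data.decode("koi8_r"))    # щбл  (the same bytes read as KOI8-R)
```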

Fortunately, in most cases the user does not have to worry about transcoding text documents, since this is done by special converter programs built into applications.
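The core of such a converter can be sketched in a few lines: decode the bytes using the source code page, then encode the resulting text with the target one. The function name `transcode` is hypothetical, and the sketch assumes the source encoding is known; real converters often have to guess it, and a character missing from the target code page raises `UnicodeEncodeError`.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode text bytes from one code page to another via Unicode."""
    return data.decode(source).encode(target)


koi8_bytes = "ЭВМ".encode("koi8_r")
cp1251_bytes = transcode(koi8_bytes, "koi8_r", "cp1251")
print(list(cp1251_bytes))  # [221, 194, 204]
```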

V. Calculation of the amount of text information

Task 1: Encode the word “Rome” using the KOI8-R and CP1251 encoding tables.

Solution:
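The codes can be looked up with Python's standard codecs. This sketch assumes the word in the original Russian task is «Рим» (Rome); each letter occupies one byte in both code pages.

```python
word = "Рим"  # assumption: the Russian word for "Rome" used in the original task
for encoding in ("koi8_r", "cp1251"):
    codes = list(word.encode(encoding))            # one byte per letter
    binary = " ".join(format(c, "08b") for c in codes)
    print(encoding, codes, binary)
# koi8_r [242, 201, 205] 11110010 11001001 11001101
# cp1251 [208, 232, 236] 11010000 11101000 11101100
```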

Task 2: Assuming that each character is encoded in one byte, estimate the information volume of the following sentence:

“My uncle has the most honest rules,

When I seriously fell ill,

He forced himself to respect

And I couldn’t think of anything better.”

Solution: The original Russian text of this passage contains 108 characters, including punctuation, quotation marks and spaces. Multiplying this number by 8 bits gives 108 * 8 = 864 bits, i.e. 108 bytes.
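The count of 108 refers to the Russian original of these lines (the opening of Pushkin's «Eugene Onegin»); the English rendering above has a different length. A sketch that reproduces the count, assuming the four lines are joined by single spaces and enclosed in one pair of quotation marks:

```python
lines = [
    "Мой дядя самых честных правил,",
    "Когда не в шутку занемог,",
    "Он уважать себя заставил",
    "И лучше выдумать не мог.",
]
# Assumption: lines joined by single spaces, whole text in one pair of quotes.
text = "«" + " ".join(lines) + "»"
chars = len(text)          # 1 character = 1 byte
print(chars, chars * 8)    # 108 864
```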

Task 3: Two texts contain the same number of characters. The first is written in Russian, and the second in the language of the Naguri tribe, whose alphabet consists of 16 characters. Which text contains more information?

Solution:

1) I = K * a (the information volume of the text is equal to the product of the number of characters and the information weight of one character).

2) Since both texts have the same number of characters (K), the difference depends only on the information weight of one character of the alphabet (a).

3) Taking the Russian alphabet as 32 letters: 2^a1 = 32, i.e. a1 = 5 bits; 2^a2 = 16, i.e. a2 = 4 bits.

4) I1 = K * 5 bits, I2 = K * 4 bits.

5) This means that the text written in Russian carries 5/4 = 1.25 times as much information.
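The same reasoning as a short Python sketch. The helper `bits_per_char` is illustrative; it computes the smallest a with 2^a ≥ N using integer arithmetic, which equals log2 N exactly when N is a power of two, as in this task. The Russian alphabet is taken as 32 letters, as in the solution.

```python
def bits_per_char(alphabet_size: int) -> int:
    """Information weight a of one character: smallest a with 2**a >= N."""
    return (alphabet_size - 1).bit_length()


a1 = bits_per_char(32)   # Russian alphabet, taken as 32 letters
a2 = bits_per_char(16)   # Naguri alphabet
K = 100                  # any equal number of characters; it cancels out
print(a1, a2, (K * a1) / (K * a2))  # 5 4 1.25
```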

Task 4: A message containing 2048 characters has a size of 1/512 MB. Determine the power (size) of the alphabet.

Solution:

1) I = 1/512 * 1024 * 1024 * 8 = 16384 bits: the information volume of the message converted into bits.

2) a = I / K = 16384 / 2048 = 8 bits per character of the alphabet.

3) N = 2^a = 2^8 = 256 characters: the power of the alphabet used.
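A quick numeric check of the solution, using the values stated in the task (K = 2048 characters, a size of 1/512 MB, and 1 MB = 1024 * 1024 bytes). With these values the arithmetic gives 8 bits per character and an alphabet of 256 characters.

```python
K = 2048                        # characters in the message
I = (1024 * 1024 * 8) // 512    # 1/512 MB expressed in bits
a = I // K                      # bits per character
N = 2 ** a                      # power (size) of the alphabet
print(I, a, N)                  # 16384 8 256
```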

Task 5: A Canon LBP laser printer prints at an average speed of 6.3 Kbit/s. How long will it take to print an 8-page document if one page contains on average 45 lines of 70 characters each (1 character = 1 byte)?

Solution:

1) Find the amount of information contained on 1 page: 45 * 70 * 8 bits = 25200 bits

2) Find the amount of information on 8 pages: 25200 * 8 = 201600 bits

3) Convert to common units of measurement, from Kbit to bits: 6.3 * 1024 = 6451.2 bits/s.

4) Find the printing time: 201600 / 6451.2 = 31.25 s, i.e. about 31 seconds.
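The whole calculation as a Python sketch, taking 1 Kbit as 1024 bits as the solution does:

```python
pages, lines_per_page, chars_per_line = 8, 45, 70
bits_per_char = 8                    # 1 character = 1 byte
speed = 6.3 * 1024                   # 6.3 Kbit/s in bits per second
total_bits = pages * lines_per_page * chars_per_line * bits_per_char
seconds = total_bits / speed
print(total_bits, round(seconds, 2)) # 201600 31.25
```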

