Microsoft Windows and file systems

Why may a smartphone not launch programs from a memory card? How is ext4 fundamentally different from ext3? Why will a flash drive last longer if you format it in NTFS rather than FAT? What is the main problem with F2FS? The answers lie in the structural features of file systems. We'll talk about them.

Introduction

File systems determine how data is stored. They define the limitations the user will encounter, how fast read and write operations will be, and how long a drive will run without failures. This is especially true for budget SSDs and their smaller siblings, flash drives. Knowing these features, you can get the most out of any file system and optimize its use for specific tasks.

You have to choose the type and parameters of a file system every time you need to do something non-trivial. Suppose you want to speed up the most common file operations. At the file system level this can be achieved in different ways: indexing provides fast searches, pre-reserving free blocks makes it easier to overwrite frequently changing files, and staging data in RAM reduces the number of required I/O operations.

Features of modern file systems such as lazy writes, deduplication, and other advanced algorithms extend the period of trouble-free operation. They are especially relevant for cheap SSDs with TLC memory chips, flash drives, and memory cards.

There are separate optimizations for different levels of disk arrays: for example, the file system can support simplified volume mirroring, instant snapshotting, or dynamic scaling without taking the volume offline.

Black box

Users generally work with the file system that is offered by default by the operating system. They rarely create new disk partitions and even less often think about their settings - they simply use the recommended parameters or even buy pre-formatted media.

For Windows fans, everything is simple: NTFS on all disk partitions and FAT32 (or the same NTFS) on flash drives. If there is a NAS running some other file system, most users never notice: they simply connect to it over the network and download files, as if from a black box.

On mobile gadgets running Android, ext4 is most often found in internal memory, with FAT32 on microSD cards. Apple users generally do not care what file system their devices use: HFS+, HFSX, APFS, WTFS... to them there are only beautiful folder and file icons drawn by the best designers. Linux users have the richest choice, but support for non-native file systems can be added to both Windows and macOS - more on that later.

Common roots

Over a hundred different file systems have been created, but little more than a dozen can be considered current. Although they were all developed for specific applications, many ended up related on a conceptual level. They are similar because they use the same type of (meta)data structure, the B-tree.

Like any hierarchical system, a B-tree begins with a root record and then branches down to leaf elements - individual records of files and their attributes, or “leaves.” The main point of creating such a logical structure was to speed up the search for file system objects on large dynamic arrays - like hard drives with a capacity of several terabytes or even more impressive RAID arrays.

B-trees require far fewer disk accesses than other types of balanced trees to perform the same operations. This is because all leaves in a B-tree sit at the same depth, and the cost of every operation is proportional to the height of the tree.

Like other balanced trees, B-trees have equal path lengths from the root to any leaf. Instead of growing taller, they branch and grow wider: each branch point in a B-tree stores many references to child objects, so any of them can be found in fewer accesses. A large number of pointers reduces the number of the most time-consuming disk operations: head positioning when reading arbitrary blocks.
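The height-proportional cost is easy to see in a minimal in-memory sketch (illustrative only, not a full B-tree implementation with splitting and balancing): every node visit stands in for one disk access, and a wide node narrows the search to one child per level.

```python
import bisect

class Node:
    """One B-tree node: sorted keys plus child pointers (None for leaves)."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children

def search(node, key, accesses=0):
    """Return (found, number_of_nodes_visited)."""
    accesses += 1
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return True, accesses
    if node.children is None:          # reached a leaf: key is absent
        return False, accesses
    return search(node.children[i], key, accesses)

# A tiny two-level tree: any key is found in at most 2 node visits.
leaf1 = Node([1, 2])
leaf2 = Node([4, 5])
leaf3 = Node([7, 8])
root = Node([3, 6], [leaf1, leaf2, leaf3])

print(search(root, 5))   # (True, 2): root, then one leaf
```

Doubling the branching factor roughly halves the height, which is exactly why B-trees suit multi-terabyte volumes.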

The concept of B-trees was formulated back in the seventies and has since undergone various improvements. In one form or another it is implemented in NTFS, BFS, XFS, JFS, ReiserFS and many DBMSs. All of them are relatives in terms of the basic principles of data organization. The differences concern details, often quite important. Related file systems also have a common disadvantage: they were all created to work specifically with disks even before the advent of SSDs.

Flash memory as the engine of progress

Solid-state drives are gradually replacing disk drives, but for now they are forced to use file systems alien to them, inherited from disks. SSDs are built on flash memory arrays, whose operating principles differ from those of disk devices. In particular, flash memory must be erased before it is written, an operation NAND chips cannot perform at the level of an individual cell; it is only possible for large blocks as a whole.

This limitation stems from the fact that in NAND memory all cells are combined into blocks, each of which has only one common connection to the control bus. We will not go into the details of page organization or describe the complete hierarchy. What matters is the very principle of group operations on cells, and the fact that flash memory blocks are usually larger than the blocks addressed by any file system. Therefore, all addresses and commands for NAND flash drives must be translated through an abstraction layer, the FTL (Flash Translation Layer).

Compatibility with the logic of disk devices and support for the commands of their native interfaces are provided by flash memory controllers. Typically the FTL is implemented in their firmware, but it can be (partially) implemented on the host - for example, Plextor ships drivers for its SSDs that accelerate writes.

It is impossible to do without an FTL, since even writing one bit to a specific cell triggers a whole series of operations: the controller finds the block containing that cell; the block is read in full and written to a cache or to free space; the block is then erased entirely, and finally rewritten with the necessary changes.
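The read-modify-erase-rewrite cycle described above can be modeled with a toy NAND block (hypothetical sizes; a real erase block holds thousands of pages, not four cells):

```python
class NandBlock:
    """Toy NAND erase block: erased cells hold None."""
    def __init__(self, size=4):
        self.cells = [None] * size
        self.erase_count = 0

    def erase(self):
        self.cells = [None] * len(self.cells)
        self.erase_count += 1

    def program(self, data):
        # NAND can only program erased cells, so a rewrite always
        # means: read, erase whole block, re-program.
        assert all(c is None for c in self.cells)
        self.cells = list(data)

def write_cell(block, index, value):
    """Change ONE cell: the controller must cycle the whole block."""
    snapshot = list(block.cells)   # read the block into cache
    snapshot[index] = value        # apply the single change
    block.erase()                  # erase the entire block
    block.program(snapshot)        # write everything back

blk = NandBlock()
blk.program(["a", "b", "c", "d"])
write_cell(blk, 2, "X")
print(blk.cells, blk.erase_count)   # ['a', 'b', 'X', 'd'] 1
```

One logical one-cell write cost one full block erase; this write amplification is precisely what FTLs and flash-friendly file systems try to minimize.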

This approach is reminiscent of army life: to give an order to one soldier, the sergeant calls a general formation, orders the poor fellow out of the ranks, and commands the rest to disperse. In the now rare NOR memory, the organization was more like special forces: each cell was controlled independently (each transistor had an individual contact).

The controllers' tasks keep growing, because with each generation of flash memory the process node shrinks in order to increase density and reduce the cost of storage. Along with the process node, the expected service life of the chips decreases too.

Modules with single-level SLC cells had a declared endurance of 100 thousand rewrite cycles or more; many of them still work in old flash drives and CF cards. For enterprise-class MLC (eMLC) the declared endurance was in the range of 10 to 20 thousand cycles, while for regular consumer MLC it is estimated at 3-5 thousand. Memory of this type is being actively squeezed out by even cheaper TLC, whose endurance barely reaches a thousand cycles. Keeping the lifespan of flash memory at an acceptable level requires software tricks, and new file systems have become one of them.

Initially, manufacturers assumed the file system was unimportant: the controller itself should service a short-lived array of memory cells of any type, distributing the load among them optimally. To the file system driver it simulates a regular disk, while performing low-level optimizations on every access. In practice, however, optimization on different devices varies from magical to fictitious.

In enterprise SSDs, the built-in controller is a small computer. It has a huge memory buffer (half a gigabyte or more) and supports many data-efficiency techniques to avoid unnecessary rewrite cycles. The chip keeps track of all blocks in the cache, performs lazy writes and on-the-fly deduplication, reserves some blocks and clears others in the background. All this magic happens completely unnoticed by the OS, programs, and the user. With an SSD like this, it really does not matter which file system is used: internal optimizations affect performance and endurance far more than external ones.

Budget SSDs (and even more so flash drives) are equipped with much less smart controllers. The cache in them is limited or absent, and advanced server technologies are not used at all. The controllers in memory cards are so primitive that it is often claimed that they do not exist at all. Therefore, for cheap devices with flash memory, external methods of load balancing remain relevant - primarily using specialized file systems.

From JFFS to F2FS

One of the first attempts to write a file system that took the organization of flash memory into account was JFFS, the Journalling Flash File System. Initially, this development by the Swedish company Axis Communications was aimed at increasing the memory efficiency of the network devices Axis produced in the nineties. The first version of JFFS supported only NOR memory, but the second version added NAND support.

Currently JFFS2 has limited use. It is still mainly used in Linux distributions for embedded systems. It can be found in routers, IP cameras, NAS and other regulars of the Internet of Things. In general, wherever a small amount of reliable memory is required.

A further attempt to develop JFFS2 was LogFS, which stores inodes in a separate file. The authors of the idea are Jörn Engel, an employee of IBM's German division, and Robert Mertens, a lecturer at the University of Osnabrück. The LogFS source code is available on GitHub; judging by the fact that the last change was made four years ago, LogFS never gained popularity.

But these attempts spurred the emergence of another specialized file system - F2FS. It was developed by Samsung Corporation, which accounts for a considerable part of the flash memory produced in the world. Samsung makes NAND Flash chips for its own devices and for other companies, and also develops SSDs with fundamentally new interfaces instead of legacy disk ones. Creating a specialized file system optimized for flash memory was a long overdue necessity from Samsung's point of view.

Four years ago, in 2012, Samsung created F2FS (Flash Friendly File System). The idea was good, but the implementation turned out crude. The key task in creating F2FS was simple: reduce the number of cell rewrite operations and spread the load across cells as evenly as possible. That requires performing operations on multiple cells within the same block at once rather than forcing them one by one. It means not instantly rewriting existing blocks at the OS's first request, but caching commands and data, appending new blocks to free space, and erasing cells lazily.

Today, F2FS support is officially implemented in Linux (and therefore in Android), but in practice it does not yet provide any special advantages. The main feature of this file system (lazy rewriting) led to premature conclusions about its effectiveness. The old caching trick even fooled early versions of benchmarks, where F2FS demonstrated an imaginary advantage not by a few percent (as expected) or even severalfold, but by orders of magnitude: the F2FS driver simply reported completion of an operation the controller was only planning to perform. Still, even if the real performance gain of F2FS is small, cell wear will definitely be lower than with ext4: optimizations that a cheap controller cannot perform are handled at the level of the file system itself.

Extents and bitmaps

For now, F2FS is perceived as exotica for geeks. Even Samsung's own smartphones still use ext4. Many consider ext4 a further development of ext3, but this is not entirely true: it is more of a revolution than a matter of breaking the 2 TB per-file barrier and simply raising other quantitative limits.

When computers were large and files were small, addressing was not a problem. Each file was allocated a certain number of blocks, whose addresses were entered in a lookup table. This is how the ext3 file system works, and it remains in service to this day. But ext4 introduced a fundamentally different addressing method: extents.

An extent is a contiguous run of blocks that is addressed as a whole, as a single continuous sequence extending the inode. One extent can contain an entire medium-sized file, and for large files a dozen or two extents are enough. This is far more efficient than addressing hundreds of thousands of individual four-kilobyte blocks.
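The bookkeeping difference is easy to quantify; a small sketch (block and file sizes assumed, not taken from any real volume) contrasts a per-block list with an extent list for a contiguous 1 MiB file:

```python
BLOCK = 4096                      # 4 KiB blocks, as in the text
file_size = 1024 * 1024           # a contiguous 1 MiB file
nblocks = file_size // BLOCK      # 256 blocks

# ext3-style addressing: one table entry per block
block_list = list(range(1000, 1000 + nblocks))

# ext4-style addressing: (start_block, length) pairs;
# a single extent suffices when the file is contiguous
extents = [(1000, nblocks)]

print(len(block_list), "entries vs", len(extents), "extent")
```

For a fragmented file more extents are needed, but their count grows with the number of fragments, not with the file size.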

The write path also changed in ext4. Blocks are now allocated in a single request, and not in advance but immediately before the data is written to disk. Delayed multi-block allocation gets rid of unnecessary operations that ext3 was guilty of: there, blocks for a new file were allocated immediately, even if the file fit entirely in the cache and was going to be deleted as a temporary file.


FAT on a restricted diet

In addition to balanced trees and their modifications, there are other popular logical structures. There are file systems with a fundamentally different type of organization - for example, linear. You probably use at least one of them often.

A riddle

Guess the riddle: at twelve she began to put on weight, by sixteen she was a dim-witted fatty, by thirty-two she had grown truly fat, yet she remained a simpleton. Who is she?

That's right: it's the story of the FAT file system. Compatibility requirements saddled it with bad heredity. On floppy disks it was 12-bit; on hard drives it started out 16-bit, and it has survived to the present day as 32-bit. In each subsequent version the number of addressable blocks grew, but nothing changed in essence.

The still-popular FAT32 file system appeared twenty years ago. Today it remains primitive: it supports neither access control lists, nor disk quotas, nor background compression, nor other modern data-handling optimizations.

Why is FAT32 still needed these days? Solely for compatibility. Manufacturers rightly assume that a FAT32 partition can be read by any OS, which is why they create it on external hard drives, USB flash drives, and memory cards.

How to free up your smartphone's flash memory

microSD(HC) cards used in smartphones are formatted in FAT32 by default. This is the main obstacle to installing applications on them and transferring data from internal memory. To overcome it, you need to create a partition on the card with ext3 or ext4. All file attributes (including owner and access rights) can be transferred to it, so any application can work as if it were launched from internal memory.

Windows cannot create more than one partition on flash drives, but you can do this from Linux (at least in a virtual machine) or with an advanced partitioning utility such as MiniTool Partition Wizard Free. Once an additional primary ext3/ext4 partition is discovered on the card, Link2SD and similar applications will offer many more options than with a single FAT32 partition.


The lack of journaling is often cited as another argument for FAT32: supposedly it means faster writes and less wear on NAND flash cells. In practice, using FAT32 leads to the opposite and gives rise to many other problems.

Flash drives and memory cards die quickly because any change in FAT32 rewrites the same sectors, where the two chains of file tables reside. Save a complete web page, and those sectors get overwritten a hundred times: once for every small GIF added to the flash drive. Launched portable software? It creates temporary files and changes them constantly while running. That is why NTFS, with its failure-resistant $MFT table, is a much better fit for flash drives: small files can be stored right in the master file table, while its extensions and copies are written to different areas of the flash memory. On top of that, NTFS indexing makes searches faster.

INFO

For FAT32 and NTFS, theoretical restrictions on the level of nesting are not specified, but in practice they are the same: only 7707 subdirectories can be created in a first-level directory. Those who like to play matryoshka dolls will appreciate it.

Another problem most users run into is that a file larger than 4 GB cannot be written to a FAT32 partition. The reason is that in FAT32 the file size is described by 32 bits in the file allocation table, and 2^32 (minus one, to be precise) is exactly four gigs. So neither a movie in decent quality nor a DVD image will fit on a freshly bought flash drive.
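The arithmetic behind the limit can be checked directly (the 4.7 GB DVD figure is the nominal single-layer capacity, used here just for illustration):

```python
# The 4 GB ceiling follows from the 32-bit size field in the FAT:
max_fat32_file = 2**32 - 1                 # bytes
print(max_fat32_file)                      # 4294967295
print(round(max_fat32_file / 2**30, 3))    # just under 4 GiB

# A single-layer DVD image (about 4.7e9 bytes) will not fit:
dvd_image = int(4.7e9)
print(dvd_image > max_fat32_file)          # True
```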

Failing to copy a large file is the lesser evil: when you try, at least the error shows up immediately. In other situations FAT32 acts as a time bomb. For example, you copy portable software onto a flash drive and at first use it without problems. After a long while, the database of one of the programs (say, an accounting or email client) becomes bloated and... simply stops updating: the file cannot be rewritten because it has hit the 4 GB limit.

A less obvious problem is that FAT32 stores the creation date of a file or directory with only two-second precision. That is not sufficient for many cryptographic applications that use timestamps. The low precision of the date attribute is another reason FAT32 is not considered a valid file system from a security perspective. However, its weaknesses can also be turned to your advantage: copy any files from an NTFS partition to a FAT32 volume, and they will be stripped of all metadata, as well as of inherited and specially set permissions. FAT simply does not support them.

exFAT

Unlike FAT12/16/32, exFAT was developed specifically for USB flash drives and large (≥ 32 GB) memory cards. Extended FAT eliminates the FAT32 shortcoming mentioned above: overwriting the same sectors on every change. As a 64-bit system, it has no practically significant limits on the size of a single file: theoretically a file may be up to 2^64 bytes (16 EB) long, and cards of that size will not appear any time soon.

Another fundamental difference of exFAT is its support for access control lists (ACLs). It is no longer the simpleton of the nineties, but the closed nature of the format hinders exFAT adoption. exFAT support is fully and legally implemented only in Windows (from XP SP2 onward) and OS X (from 10.6.5 onward). On Linux and *BSD it is supported either with restrictions or not entirely legally: Microsoft requires licensing for the use of exFAT, and there is plenty of legal controversy in this area.

Btrfs

Another prominent representative of B-tree-based file systems is Btrfs. It appeared in 2007 and was originally created at Oracle with an eye to SSDs and RAID. For example, it can be scaled dynamically: new inodes can be created right on a running system, and a volume can be divided into subvolumes without allocating fixed space to them.

The copy-on-write mechanism implemented in Btrfs and full integration with the Device mapper kernel module allow you to take almost instantaneous snapshots through virtual block devices. Pre-compression (zlib or lzo) and deduplication speed up basic operations while also extending the lifetime of flash memory. This is especially noticeable when working with databases (2-4 times compression is achieved) and small files (they are written in orderly large blocks and can be stored directly in “leaves”).

Btrfs also supports full logging mode (data and metadata), volume checking without unmounting, and many other modern features. The Btrfs code is published under the GPL license. This file system has been supported as stable in Linux since kernel version 4.3.1.

Logbooks

Almost all modern file systems (ext3/ext4, NTFS, HFSX, Btrfs and others) belong to the general group of journaled ones: they record changes in a separate log (journal) and check against it in the event of a failure during disk operations. However, these file systems differ in logging granularity and fault tolerance.

Ext3 supports three logging modes: writeback, ordered, and full journaling. In writeback mode only general changes (metadata) are logged, asynchronously with respect to changes in the data itself. In ordered mode the same metadata logging is performed, but strictly before any changes are made. The third mode is full journaling: both metadata and the file contents themselves are logged.

Only the last mode ensures data integrity. The other two only speed up the detection of errors during a check and guarantee recovery of the file system's own integrity, but not of file contents.
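The core idea shared by all three modes can be sketched with a toy write-ahead journal (hypothetical record layout, not the real ext3 on-disk format): after a crash, only transactions that fully committed are replayed; in the metadata-only modes the replayed records would describe metadata alone, which is why file contents can still be lost.

```python
def replay(journal, disk):
    """After a crash, re-apply only fully committed transactions."""
    for txn in journal:
        if txn.get("committed"):
            disk.update(txn["writes"])
    return disk

# Two transactions; the crash happened before the second committed.
journal = [
    {"writes": {"inode_7": "size=4096"}, "committed": True},
    {"writes": {"inode_9": "size=8192"}, "committed": False},  # torn
]
print(replay(journal, {}))   # only the committed change survives
```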

Journaling in NTFS resembles the second (ordered) mode in ext3: only metadata changes are recorded in the log, and the data itself may be lost in the event of a failure. This logging method in NTFS was never intended to achieve maximum reliability, only a compromise between performance and fault tolerance. This is why people used to fully journaled systems consider NTFS journaling pseudo-journaling.

The approach implemented in NTFS is in some ways even better than ext3's default. NTFS additionally creates periodic checkpoints to ensure that all previously deferred disk operations are completed. These checkpoints have nothing to do with restore points in \System Volume Information\; they are just service entries in the log.

Practice shows that such partial NTFS journaling is in most cases sufficient for trouble-free operation: even in a sudden outage, disk devices do not lose power instantly. The power supply and the numerous capacitors in the drives themselves provide just enough energy to complete the current write operation. With modern SSDs, given their speed and efficiency, the same amount of energy is usually enough to perform pending operations as well. Switching to full journaling would noticeably slow down most operations.

Mounting third-party file systems in Windows

The use of file systems is limited by their support at the OS level. For example, Windows does not understand ext2/3/4 and HFS+, but sometimes it is necessary to use them. This can be done by adding the appropriate driver.

WARNING

Most drivers and plugins for supporting third-party file systems have their limitations and do not always work stably. They may conflict with other drivers, antiviruses, and virtualization programs.

An open driver for reading and writing ext2/3 partitions, with partial support for ext4. The latest version supports extents and partitions up to 16 TB. LVM, access control lists, and extended attributes are not supported.


There is also a free plugin for Total Commander that supports reading ext2/3/4 partitions.


coLinux is an open and free port of the Linux kernel. Together with a 32-bit driver, it allows Linux to run in a Windows environment from 2000 through 7 without virtualization technologies. Only 32-bit versions are supported; development of a 64-bit modification was cancelled. Among other things, coLinux makes it possible to access ext2/3/4 partitions from Windows. Support for the project was suspended in 2014.

Windows 10 may already contain built-in support for certain Linux file systems; it is just hidden. This is suggested by the kernel-level driver Lxcore.sys and the LxssManager service, which is loaded as a library by the Svchost.exe process. For more details, see Alex Ionescu's talk "The Linux Kernel Hidden Inside Windows 10," given at Black Hat 2016.


ExtFS for Windows is a paid driver produced by Paragon. It runs on Windows 7 to 10 and supports read/write access to ext2/3/4 volumes. Provides almost complete support for ext4 on Windows.

HFS+ for Windows 10 is another proprietary driver produced by Paragon Software. Despite the name, it works in all Windows versions starting with XP. Provides full access to HFS+/HFSX file systems on disks with any layout (MBR/GPT).

WinBtrfs is an early Btrfs driver for Windows. Already at version 0.6 it supports both read and write access to Btrfs volumes. It handles hard and symbolic links, and supports alternate data streams, ACLs, two types of compression, and asynchronous read/write mode. So far, WinBtrfs lacks mkfs.btrfs, btrfs-balance, and the other utilities for maintaining this file system.

Capabilities and limitations of file systems: summary table

| File system | Max volume size | Max file size | File name length | Full path length (from root) | Max number of files and/or directories | File/directory date precision | Access rights | Hard links | Symbolic links | Snapshots | Background compression | Background encryption | Deduplication |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FAT16 | 2 GB with 512-byte sectors or 4 GB with 64 KB clusters | 2 GB | 255 bytes with LFN | - | - | - | - | - | - | - | - | - | - |
| FAT32 | 8 TB with 2 KB sectors | 4 GB (2^32 - 1 bytes) | 255 bytes with LFN | up to 32 subdirectories with CDS | 65,460 | 10 ms (create) / 2 s (modify) | No | No | No | No | No | No | No |
| exFAT | ≈128 PB (2^32 - 1 clusters of 2^25 - 1 bytes) theoretical / 512 TB due to third-party restrictions | 16 EB (2^64 - 1 bytes) | - | - | 2,796,202 per directory | 10 ms | ACL | No | No | No | No | No | No |
| NTFS | 256 TB with 64 KB clusters or 16 TB with 4 KB clusters | 16 TB (Win 7) / 256 TB (Win 8) | 255 Unicode (UTF-16) characters | 32,760 Unicode characters, max 255 per element | 2^32 - 1 | 100 ns | ACL | Yes | Yes | Yes | Yes | Yes | Yes |
| HFS+ | 8 EB (2^63 bytes) | 8 EB | 255 Unicode (UTF-16) characters | not limited separately | 2^32 - 1 | 1 s | Unix, ACL | Yes | Yes | No | Yes | Yes | No |
| APFS | 8 EB (2^63 bytes) | 8 EB | 255 Unicode (UTF-16) characters | not limited separately | 2^63 | 1 ns | Unix, ACL | Yes | Yes | Yes | Yes | Yes | Yes |
| Ext3 | 32 TB (theoretical) / 16 TB with 4 KB clusters (limit of the e2fs tools) | 2 TB (theoretical) / 16 GB for older programs | 255 Unicode (UTF-16) characters | not limited separately | - | 1 s | Unix, ACL | Yes | Yes | No | No | No | No |
| Ext4 | 1 EB (theoretical) / 16 TB with 4 KB clusters (limit of the e2fs tools) | 16 TB | 255 Unicode (UTF-16) characters | not limited separately | 4 billion | 1 ns | POSIX | Yes | Yes | No | No | Yes | No |
| F2FS | 16 TB | 3.94 TB | 255 bytes | not limited separately | - | 1 ns | POSIX, ACL | Yes | Yes | No | No | Yes | No |
| Btrfs | 16 EB (2^64 - 1 bytes) | 16 EB | 255 ASCII characters | 2^17 bytes | - | 1 ns | POSIX, ACL | Yes | Yes | Yes | Yes | Yes | Yes |

A file system is the part of the operating system that controls the placement of, and access to, files and directories on disk.

Access is the procedure of establishing a connection with memory and the files located in it in order to write and read data.

A file is a logically related collection of data or programs, for which a named area of external memory is allocated.

The file serves as the basic unit of information accounting in the OS. All actions with information in the OS are carried out on files: writing to disk, displaying on screen, entering from the keyboard, printing, reading, and so on.

The following parameters are used to characterize the file:

Full name;

Volume in bytes;

Date of creation;

Time of creation;

Special attributes: R (Read only) – read only, H (Hidden) – hidden file, S (System) – system file, A (Archive) – archive file.

Attributes are additional parameters that define file properties. The operating system allows you to view and change them, and the state of the attributes is taken into account when performing automatic operations on files. The purpose of each attribute is shown in Table 2.1.

Table 2.1

| Attribute | Purpose |
|---|---|
| Read-only | Limits operations on the file: prohibits making changes to it |
| Hidden | Signals the operating system that the file should not be displayed on screen during file operations; intended to protect the file from accidental or deliberate damage |
| System | Marks files that perform important functions in the operation of the operating system itself. Its distinctive feature is that it cannot be changed by means of the operating system. As a rule, most files with the "System" attribute also have the "Hidden" attribute set |
| Archive | Formerly used by backup programs. No longer used in practice |

On disk, a file does not require contiguous space; it can occupy free clusters in different parts of the disk. A cluster is the minimum unit of disk space that can be allocated to a file. A file can occupy one cluster or several dozen, depending on how much information it contains. The cluster size (4 KB, 8 KB, 16 KB, 32 KB, etc.) depends on the file system type (FAT, HPFS, NTFS) and disk capacity.
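The waste caused by whole-cluster allocation (often called slack space) is easy to quantify; a small sketch using the cluster sizes mentioned above:

```python
import math

def slack(file_size, cluster_size):
    """Bytes wasted because files occupy whole clusters."""
    clusters = math.ceil(file_size / cluster_size)
    return clusters * cluster_size - file_size

# A 1 KB file on volumes with different cluster sizes:
for cs in (4096, 8192, 16384, 32768):
    print(cs, slack(1024, cs))   # almost a full cluster wasted each time
```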

The FAT (File Allocation Table) file system is the file allocation table of DOS and Windows 9x, originally developed for floppy disks. The advantage of FAT is its widespread use and support by most operating systems. There are FAT16 and FAT32, which use 16 and 32 bits for addressing respectively, and can therefore address 2^16 and 2^32 clusters. FAT16 can address 2^16 = 65,536 clusters. As a result, on a 500 MB logical disk each cluster occupies 8 KB, and on a 1.0 GB disk the cluster size grows to 16 KB. Therefore, when a small file (under 1 KB) is stored, a significant part of the cluster goes unused. The larger the hard drive partition, the larger the minimum indivisible unit of memory allocated to a file, and the greater the losses. These losses are significantly reduced by more efficient file systems. The HPFS (High Performance File System) file system overcomes a number of FAT's other shortcomings.
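The cluster-size figures above follow directly from FAT16's 2^16-cluster ceiling; a quick sketch (assuming clusters are power-of-two multiples of a 512-byte sector):

```python
MAX_CLUSTERS = 2**16   # FAT16 addresses at most 65,536 clusters

def min_cluster_size(volume_bytes):
    """Smallest power-of-two cluster that covers the whole volume."""
    size = 512                                   # one sector
    while volume_bytes / size > MAX_CLUSTERS:
        size *= 2
    return size

print(min_cluster_size(500 * 2**20) // 1024)     # 8  (KB, ~500 MB volume)
print(min_cluster_size(1 * 2**30) // 1024)       # 16 (KB, 1 GB volume)
```

The results match the text: bigger volumes force bigger clusters, and small files waste more space.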



For example, when using HPFS:

The speed of finding a file and working with it increases, because information about the file is located next to the file itself;

File fragmentation, which reduces system performance and wears out disks, is eliminated.

A similar effect is achieved by the NTFS file system (Windows NT). NTFS (NT File System), developed by Microsoft, is a development of the HPFS file system. It supports disks of up to 16,777,216 terabytes and keeps two copies of the MFT (Master File Table), using a transaction system (logged data-change requests) when writing files to disk, which increases reliability. NTFS guarantees the safety of data when files or folders are copied, moved, or deleted, even in the event of a hardware or power failure.



Files can store various types and forms of information: text, pictures, drawings, numbers, programs, tables, and so on. The features of specific files are determined by their format. A format is understood as a convention that symbolically describes how information is represented in a file.

The logical drives to which files are written are named by the operating system: A:, B:, C:, D:, and so on. On the disks, directories (folders) are organized: catalogs of files that indicate their location on the disk. Directories store full file names as well as characteristics such as creation date and time, size in bytes, and special attributes. Files are grouped into directories by whatever common characteristic their creator chooses (type, ownership, purpose, time of creation, etc.). Lower-level directories are nested within higher-level directories and are subordinate to them. This file structure is called hierarchical. The top level of the hierarchy is the root directory of the disk. There is always a single root directory (in Windows, the root of the hierarchy is the Desktop), in which directories (folders) and files are located. Each folder, in turn, may contain subfolders and files, and so on.

A folder can be in one of two states: current (active), in which the user is working at the moment, and passive, to which there is currently no connection.

File structure maintenance functions include the following operations, performed under the control of the operating system:

Creating files and folders and assigning names to them;

Renaming files and folders;

Copying and moving files between computer drives and between folders on the same drive;

Deleting files and folders;

Navigation through the file structure in order to access a given file or folder;

Managing file attributes.

File names may be "short" or "long". A "short" file name consists of two parts: the name itself and the name extension. The name proper is allotted 8 characters, and the extension 3 characters. The name is separated from the extension by a dot. Both the name and the extension may include only alphanumeric characters of the Latin alphabet. A "short" name is formed according to the file-naming rules of the MS-DOS operating system. The extension usually describes the file format, for example .txt for plain text or .exe for an executable program.

The main disadvantage of "short" names is their low expressiveness. It is not always possible to convey the characteristics of a file in a few characters, so with the advent of Windows 95 the concept of a "long" name was introduced. Such a name can contain up to 255 characters. A "long" name can contain any characters except nine special ones:

\ / : * ? " < > |

Spaces and multiple periods are allowed in the name. The name extension includes all characters after the last dot.
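The naming rules above can be checked programmatically. This Python sketch (the function names are ours, for illustration) validates a "long" name against the nine forbidden characters and extracts the extension after the last dot:

```python
# Sketch of "long name" validation: up to 255 characters, none of the
# nine forbidden characters; the extension is everything after the last dot.

FORBIDDEN = set('\\/:*?"<>|')   # the nine forbidden characters

def is_valid_long_name(name):
    return 0 < len(name) <= 255 and not (set(name) & FORBIDDEN)

def extension(name):
    """Everything after the last dot, or '' if there is no dot."""
    head, sep, tail = name.rpartition('.')
    return tail if sep else ''

print(is_valid_long_name('Annual report. Final version.doc'))  # True
print(is_valid_long_name('what?.txt'))                         # False: contains '?'
print(extension('Annual report. Final version.doc'))           # doc
```

Note that multiple dots are allowed and only the part after the last dot counts as the extension, exactly as described above.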

In a hierarchical data structure, the address of a file is given by its access path (route), leading from the top of the structure to the file. When writing an access path that passes through a system of subfolders, all intermediate folders are separated by the "\" character (backslash). The full file name contains the disk name, the access path and the file name (an example is shown in Fig. 2.1).

C:\My Documents\Current\Abstracts\Operating Systems.doc

Fig. 2.1. Full file name

We are accustomed to terms such as "file" and "folder" or "directory". But what is the mechanism that manages files, keeps track of them and controls their movement?

Figuratively, a disk file storage system can be compared to a huge and chaotically arranged warehouse into which new goods are constantly delivered. There is a warehouse manager who knows exactly where each product is located and how to get to it quickly. In file storage, the role of this manager is played by the file system.

Let's figure out how the file system works, what types of it exist, and consider the basic operations with the file system that affect system performance.

How the Windows file system works

The operating system assigns each file a name which, like an address, identifies it in the system. This path is a string that begins with the logical drive on which the file is stored, followed by all folders in order of their nesting.

When a program requires a file, it sends a request to the operating system, which is processed by the Windows file system. Using the received path, the system receives the address of the file storage location (physical location) and passes it to the program that sent the request.

Thus, the file system has its own database, which, on the one hand, establishes a correspondence between the physical address of the file and its path, and on the other hand, stores additional file attributes, such as size, creation date, file access rights, and others.

In NTFS this database is the Master File Table (MFT); in FAT32 the same role is played by the file allocation table itself together with the directory entries.

What actually happens when you move, copy, and delete files?

As strange as it may seem, not all operations with files and folders lead to physical changes on the hard drive. Some operations only make changes to the MFT, and the file itself remains in the same place.

Let's take a closer look at what the file system does when performing basic operations with files. This will help us understand how the OS becomes cluttered, why some files take a long time to load, and what can be done to improve operating system performance.

1. Moving a file: within one volume, this operation only changes one path to another. Therefore only the entry in the Master File Table needs to be updated; the file itself is not physically moved and remains in place unchanged.

2. Copying a file: this operation creates another physical instance of the file in a new location. Not only is a new record created in the MFT, but a real second copy of the file appears on disk.

3. Deleting a file: the file is first placed in the Recycle Bin. After the "Empty Recycle Bin" function is called, the file system deletes the entry from the MFT. The file is not physically erased; it remains in its original place and exists until it is overwritten. This should be kept in mind when deleting confidential files: it is better to use special secure-erase programs for that.

Now it is clear why moving is faster than copying: in the second case, besides updating the Master File Table, a physical copy of the file must also be created.
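The difference can be observed from any language. This hedged Python sketch uses os.rename (a metadata-level move within one volume) and shutil.copy2 (a real second copy of the data); the file names are illustrative:

```python
# Conceptual sketch: moving within one volume is a metadata operation
# (os.rename rewrites the path entry), while copying transfers the data
# itself (shutil.copy2 writes a second physical copy).
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
src = os.path.join(base, 'report.txt')
with open(src, 'w') as f:
    f.write('x' * 1024)

moved = os.path.join(base, 'moved.txt')
os.rename(src, moved)            # same volume: only the path entry changes

copied = os.path.join(base, 'copy.txt')
shutil.copy2(moved, copied)      # a real second copy of the data appears

print(os.path.exists(src), os.path.exists(moved), os.path.exists(copied))
# -> False True True
```

On large files the rename completes almost instantly regardless of size, while the copy time grows with the amount of data, which is exactly the asymmetry described above.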

What types of file systems are there?

1. FAT16 (File Allocation Table 16). A legacy file system that could only handle files no larger than 2 GB, supported hard disks of no more than 4 GB, and could store and address no more than 65,536 files. As technology developed and user needs grew, this file system was replaced by NTFS.

2. FAT32. As the volume of data stored on media grew, a new Windows file system was developed and introduced, supporting files up to 4 GB in size and volumes of up to 2 TB. Currently FAT32 is, as a rule, used only on external storage media.

3. NTFS (New Technology File System). The standard file system installed on all modern computers running the Windows operating system. The maximum file size handled by this file system is 16 TB; the maximum supported volume size is 256 TB.

An additional feature of NTFS is logging its actions. Initially, all changes are entered into a specially designated area, and only then are they recorded in the file table. This helps prevent data loss, for example due to power failures.

4. HFS+ (Hierarchical File System Plus). The standard file system for computers running macOS. Like NTFS, it supports large files and drives with capacities of hundreds of terabytes.

To change the file system, you will have to format the hard drive partition. As a rule, this operation completely erases all data on the partition.
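The journaling behaviour described for NTFS above can be modelled with a toy write-ahead log. This is only a conceptual sketch in Python, not the actual NTFS log format; all names are ours:

```python
# Toy illustration of journaling (write-ahead logging): a change is
# recorded in the log first, and only then applied to the "file table".
# If a crash happens between the two steps, replaying the log restores
# consistency.

journal = []        # stands in for the specially designated log area
file_table = {}     # stands in for the file table

def journaled_update(name, attrs):
    journal.append(('set', name, attrs))   # 1. record the intent in the log
    file_table[name] = attrs               # 2. apply the change to the table
    journal.pop()                          # 3. change is durable, retire the entry

def replay(log, table):
    """After a crash, re-apply any operations left in the log."""
    for op, name, attrs in log:
        if op == 'set':
            table[name] = attrs

# Simulated crash: the log entry was written but the table was not updated.
crashed_log = [('set', 'a.txt', {'size': 5})]
recovered_table = {}
replay(crashed_log, recovered_table)
print(recovered_table)   # -> {'a.txt': {'size': 5}}
```

The point of the ordering is that the log entry always exists before the table is touched, so a power failure can never leave a half-applied change that the log knows nothing about.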

How to find out the file system type?

The easiest way: open “File Explorer” -> select the hard drive partition you are interested in -> right-click on it -> in the menu that appears, select “Properties” -> in the window that opens, select the “General” tab.

Windows File System Maintenance

It should be noted that the file system does not maintain "order" on the hard drive by itself. Windows saves new files in the first unallocated area it finds. If the file does not fit entirely into that area, it is split into several parts (fragmented). Accordingly, the time needed to access and open such a file increases, which affects overall system performance.

To prevent this and “put things in order” in the file system, you need to regularly defragment your hard drive partitions.

To do this, again open the properties of the hard drive partition you are interested in (as described above), go to the "Tools" tab and click the "Defragment" button.

In the window that opens, you can configure the operation of automatic disk defragmentation.

To defragment manually, select the hard drive partition, click the "Analyze disk" button, and then "Defragment disk".

Wait until the operation is completed and close the window.

General information about file systems

The Windows 8 operating system supports several file systems: NTFS, FAT and FAT32. However, it can run only on NTFS; that is, it can only be installed on a hard drive partition formatted with that file system. This is due to features and security mechanisms provided in NTFS but missing from the previous generation of Windows file systems, FAT16 and FAT32. Next, we will look at the whole line of Windows file systems to understand the role they play in the operation of the system and how they evolved over the development of Windows up to Windows 8.

The advantages of NTFS concern almost everything: performance, reliability and efficiency of working with data (files) on disk. One of the main goals in creating NTFS was to ensure fast execution of file operations (copying, reading, deleting, writing), as well as to provide additional capabilities: data compression, recovery of damaged system files on large disks, and so on.

Another main purpose of creating NTFS was to meet increased security requirements, since FAT and FAT32 were wholly inadequate in this respect. It is in NTFS that you can allow or deny access to any file or folder (restrict access rights).


First, let's look at the comparative characteristics of the file systems, and then examine each of them in more detail. For clarity, the comparison is presented in tabular form.

The FAT file system is simply not suitable for modern hard drives (because of its limited capabilities). As for FAT32, it can still be used, but with reservations. If you buy a 1000 GB hard drive, you will have to split it into at least several partitions. And if you are going to do video editing, the 4 GB limit on maximum file size will be a serious obstacle.

The NTFS file system has none of these disadvantages. So, even without going into the details and special features of NTFS, you can make a choice in its favor.

File system | Volume sizes | Maximum file size
FAT | from 1.44 MB to 4 GB | 2 GB
FAT32 | theoretically from 512 MB to 2 TB; compression is not supported at the file system level | 4 GB
NTFS | minimum recommended size 1.44 MB, maximum 2 TB; file-system-level compression of files, directories and volumes is supported | limited only by the volume size (theoretically 2^64 bytes minus 1 KB; in practice 2^44 bytes minus 64 KB)

Using FAT32 can generally be justified only when several operating systems are installed on the computer and one of them does not support NTFS. Today such configurations are practically extinct, unless you want to install something antique like Windows 98.

File system FAT

The FAT file system (usually this means FAT16) was developed quite a long time ago and was intended for small disks and files and a simple directory structure. The abbreviation FAT stands for File Allocation Table. This table is placed at the beginning of the volume, and two copies of it are kept (for greater resilience).
The operating system uses this table to locate a file and determine its physical position on the hard drive. If the table (and its copy) is damaged, the operating system cannot read files: it simply cannot determine which file is which, where it begins and where it ends. In such cases the file system is said to have "crashed."
The FAT file system was originally developed by Microsoft for floppy disks. Only later was it applied to hard drives. At first there was FAT12 (for floppy disks and hard drives up to 16 MB), which then grew into FAT16, introduced with the MS-DOS 3.0 operating system.

File system FAT32

Starting with Windows 95 OSR2, Microsoft began actively using FAT32, the thirty-two-bit version of FAT. Technological progress does not stand still, and the capabilities of FAT16 were clearly no longer enough.
Compared with FAT16, FAT32 provides more optimal access to disks, higher speed of I/O operations, and support for larger volumes (disk capacity up to 2 TB).
FAT32 also uses disk space more efficiently (thanks to smaller clusters). The gain compared to FAT16 is about 10-15%: with FAT32, 10-15% more information can be written to the same disk than with FAT16.
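The savings from smaller clusters can be estimated with a short sketch. The file sizes and cluster sizes here are illustrative, not measured data, and the helper name is ours:

```python
# Rough sketch of why smaller clusters save space: the last cluster of
# every file is only partly used, so big clusters waste more "slack".
import math

def allocated(file_size, cluster):
    """Disk space actually reserved: whole clusters only."""
    return math.ceil(file_size / cluster) * cluster

files = [700, 3_500, 12_000, 40_000, 1_500]   # file sizes in bytes

for label, cluster in (('32 KB clusters', 32 * 1024),
                       (' 4 KB clusters', 4 * 1024)):
    used = sum(allocated(f, cluster) for f in files)
    print(f'{label}: {used} bytes allocated for {sum(files)} bytes of data')
```

With 32 KB clusters these five small files occupy 196,608 bytes of disk space; with 4 KB clusters, only 65,536 bytes. The exact percentage depends on the mix of file sizes, which is why the text quotes an approximate 10-15% figure.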
In addition, FAT32 provides higher reliability and faster program launch.
This is due to two significant innovations:
the ability to relocate the root directory and to use the backup copy of the FAT (if the main copy is damaged);

The ability to store a backup copy of system data.

File system NTFS

General information
Neither version of FAT provides an acceptable level of security. This, together with the need for additional file mechanisms (compression, encryption), led to the creation of a fundamentally new file system: the NT file system (NTFS).
NTFS stands for New Technology File System.
As already mentioned, its main advantage is security: access rights (read, write, and so on) can be assigned to NTFS files and folders. Thanks to this, data security and system stability have increased significantly. Assigning access rights allows you to forbid or permit any users and programs to perform operations on files. For example, without sufficient rights an unauthorized user cannot change a file, and a virus without sufficient rights cannot corrupt it.
In addition, NTFS, as mentioned above, provides better performance and the ability to work with large amounts of data.

Since Windows 2000, version NTFS 5.0 has been used, which in addition to the standard capabilities provides the following features:

Data encryption: implemented by a special NTFS add-on called the Encrypting File System (EFS). With this mechanism, encrypted data can be read only on the computer on which it was encrypted.
Disk quotas: users can be assigned a specific (limited) amount of disk space that they may use.
Efficient storage of sparse files: some files contain long runs of consecutive empty bytes, and the NTFS file system can optimize their storage.

A change journal, which records all operations of access to files and volumes.

One more NTFS innovation is mount points. With mount points, you can present various unrelated folders and even drives in the system as a single drive or folder. This is valuable for gathering heterogeneous information scattered around the system in one place.

Finally, keep in mind that if you have set certain permissions on a file under NTFS and then copy it to a FAT partition, all its access rights and other attributes unique to NTFS will be lost. So be careful.

The structure of NTFS. The Master File Table (MFT).
Like any other file system, NTFS divides all usable space into clusters, the minimum blocks of data into which files are divided. NTFS supports almost any cluster size from 512 bytes to 64 KB; the generally accepted standard, used by default, is the 4 KB cluster. The principle of clusters can be illustrated by the following example.
If your cluster size is 4 KB (which is most likely) and you need to save a file 5 KB in size, then 8 KB will actually be allocated for it, since it does not fit in one cluster, and disk space is allocated to files only in whole clusters.
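The same arithmetic in code, as a one-line model of cluster-granular allocation (the helper name is ours):

```python
# Disk space is handed out in whole clusters, so the allocated size is
# the file size rounded up to the next cluster boundary.
import math

def allocated_size(file_bytes, cluster_bytes=4096):
    """Disk space actually reserved for a file: whole clusters only."""
    return math.ceil(file_bytes / cluster_bytes) * cluster_bytes

print(allocated_size(5 * 1024))   # 5 KB file -> 8192 bytes (two 4 KB clusters)
print(allocated_size(4096))       # an exact fit wastes nothing -> 4096
```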
For each NTFS disk there is a special file: the MFT (Master File Table). This file contains a centralized catalog of all files on the disk. When a file is created, NTFS creates and fills in a corresponding MFT record that contains information about the file's attributes, contents, name, and so on.

Besides the MFT there are 15 more special files (16 together with the MFT) which are inaccessible to the operating system and are called metafiles. The names of all metafiles begin with the $ character, and they cannot be viewed at all by standard operating system tools. The main metafiles are:

$MFT - the MFT itself.
$MFTMirr - a copy of the first 16 MFT records, placed in the middle of the disk (the mirror).
$LogFile - journaling support file.
$Volume - service information: volume label, file system version, etc.
$AttrDef - a list of the standard attributes of files on the volume.
$. - the root directory.
$Bitmap - a map of the volume's free space.
$Boot - the boot sector (if the partition is bootable).
$Quota - a file recording users' rights to use disk space.
$Upcase - a table of correspondence between uppercase and lowercase letters in file names on the current volume. It is needed mainly because file names in NTFS are stored in Unicode, which comprises 65 thousand different characters, and finding the upper- and lowercase equivalents of such characters is far from trivial.
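What the $Upcase table does can be imitated in Python, where str.upper() plays the role of the table. This is only a conceptual sketch of case-insensitive name comparison, not NTFS's actual algorithm:

```python
# NTFS compares file names case-insensitively by mapping both names
# through an uppercase table; str.upper() stands in for that table here.

def names_equal(a, b):
    """Case-insensitive comparison of two file names."""
    return a.upper() == b.upper()

print(names_equal('Report.TXT', 'report.txt'))    # True: same name to NTFS
print(names_equal('Report.TXT', 'reports.txt'))   # False: different names
```

NTFS nonetheless stores the name with the case the user typed; the uppercase mapping is used only for lookups and comparisons.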
As for the organization of data on an NTFS disk, it is conventionally divided into two parts. The first 12% of the disk is allocated to the so-called MFT zone, the space into which the MFT metafile grows.
No user data can be written into this area. The MFT zone is always kept empty so that the most important service file (the MFT) does not become fragmented as it grows. The remaining 88% of the disk is ordinary file storage space.
If disk space runs low, however, the MFT zone itself can shrink (where possible), so you will not notice any inconvenience; new data will then be written into the former MFT zone.
If disk space is later freed, the MFT zone grows again, but now in fragmented form (that is, not as a single block, but in several pieces across the disk). There is nothing wrong with this; it is simply considered that the system is more reliable when the MFT file is not fragmented. Moreover, when the MFT file is not fragmented, the whole file system works faster; accordingly, the more fragmented the MFT file is, the slower the file system works.

As for the size of the MFT file, it can be estimated at roughly 1 MB per 1000 files.

Converting FAT32 partitions to NTFS without data loss. The convert utility

You can easily convert an existing FAT32 partition to NTFS. For this purpose, Windows 8 and Windows 8.1 provide the command-line utility convert.

Its operating parameters are shown in the screenshot

Thus, to convert drive D: to NTFS, you should enter the following command at the command line:

convert d: /fs:ntfs

After this, you will be asked to enter the volume label, if there is one (the volume label is shown next to the drive name in the My Computer window; it serves to identify disks in more detail and may or may not be used; for example, Files Storage (D:)).
To convert a flash drive, the command looks like this:

convert e: /fs:ntfs /nosecurity /x

TEST PAPER

in the discipline

"Informatics and Computer Technology"

on the topics:

"Operating Systems"

"File Systems"

1. Operating systems

2. File systems

3. File systems and file names

References

1. Operating systems

An operating system (OS) is a basic set of computer programs that controls the computer hardware, works with files, performs data input and output, and runs application programs and utilities.

When you turn on your computer, the operating system loads into memory before other programs and then serves as a platform and environment for their work. Besides the functions listed above, the OS can perform others, such as providing a user interface and network communication. Since the 1990s the most common operating systems for personal computers and servers have been the Microsoft Windows and Windows NT family, Mac OS and Mac OS X, and UNIX-class and Unix-like systems (especially GNU/Linux).

Operating systems can be classified by underlying technology (Unix-like or Windows-like), by license type (proprietary or open source), by state of development (legacy systems such as DOS or NeXTSTEP versus modern ones such as GNU/Linux and Windows), by target (workstations or servers), as real-time or embedded systems, or as specialized systems (production management, training, etc.).

2. File systems

All modern operating systems provide a file system, which is designed to store data on disks and provide access to it.

The main functions of the file system can be divided into two groups:

Functions for working with files (creating, deleting, renaming files, etc.)

Functions for working with data stored in files (writing, reading, searching data, etc.)

It is known that files are used to organize and store data on computer media. A file is a sequence of an arbitrary number of bytes that has its own unique name; in other words, a named area on a storage medium.

The structuring of many files on computer media is carried out using directories in which the attributes (parameters and details) of the files are stored. A directory can contain many subdirectories, resulting in branched file structures on disks. Organizing files in a tree structure is called a file system.

The principle of organizing the file system is tabular. Data about where on the disk the file is written is stored in the File Allocation Table (FAT).

This table is located at the beginning of the volume. To protect the volume, two copies of the FAT are stored on it. If the first copy is damaged, disk utilities can use the second copy to restore the volume.

FAT is similar in design to the table of contents of a book, as the operating system uses it to locate a file and determine the clusters that the file occupies on the hard drive.

The smallest physical unit of data storage is the sector; its size is 512 bytes. Since the size of the FAT table is limited, for disks larger than 32 MB it is impossible to address each individual sector.

In this regard, groups of sectors are conditionally combined into clusters. A cluster is the smallest unit of data addressing. The cluster size, unlike the sector size, is not fixed and depends on the disk capacity.
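The table-of-contents role of the FAT can be modelled as a toy cluster chain: each table entry points to the file's next cluster, and the directory entry stores only the first one. The cluster numbers below are invented for illustration:

```python
# Toy model of the file allocation table: each entry holds the number of
# the next cluster of a file, and a sentinel (None here) marks the end
# of the chain. The directory entry stores only the first cluster.

fat = {2: 5, 5: 6, 6: None,      # file A occupies clusters 2 -> 5 -> 6
       3: 4, 4: None}            # file B occupies clusters 3 -> 4

def cluster_chain(first_cluster, fat):
    """Follow the table from the first cluster to the end-of-file marker."""
    chain = []
    cur = first_cluster
    while cur is not None:
        chain.append(cur)
        cur = fat[cur]
    return chain

print(cluster_chain(2, fat))   # -> [2, 5, 6]
print(cluster_chain(3, fat))   # -> [3, 4]
```

Note that file A's clusters are not contiguous (2, then 5): this is exactly what fragmentation looks like at the table level, and why a damaged table makes files unreadable even though their data is still on disk.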

At first, a 12-bit version of FAT (called FAT12) was used for floppies and small hard drives (less than 16 MB). MS-DOS then introduced a 16-bit version of FAT for larger drives.

The MS-DOS, Windows 95 and Windows NT operating systems implement 16-bit fields in their file allocation tables. The FAT32 file system was introduced in Windows 95 OSR2 and is supported in Windows 98 and Windows 2000.

FAT32 is an improved version of FAT designed for use on volumes larger than 2 GB.

FAT32 provides support for disks up to 2 TB in size and more efficient use of disk space. FAT32 uses smaller clusters, which allows for more efficient use of disk space.

Windows XP uses FAT32 and NTFS. A more promising direction in the development of file systems was the transition to NTFS (New Technology File System) with long file names and a reliable security system.

The size of an NTFS partition is not limited. NTFS minimizes the amount of disk space wasted by writing small files to large clusters. In addition, NTFS allows you to save disk space by compressing the disk itself, separate folders and files.

According to the methods of naming files, a distinction is made between “short” and “long” names.

According to the convention adopted in MS-DOS, file naming on IBM PC computers followed the 8.3 convention: the file name consists of two parts, the name proper and the name extension. The name is allocated 8 characters, and the extension 3 characters.

The name is separated from the extension by a dot. Both the name and the extension can only include alphanumeric characters of the Latin alphabet. File names written according to convention 8.3 are considered “short”.

With the advent of the Windows 95 operating system, the concept of a "long" name was introduced. Such a name can contain up to 255 characters, which is quite enough to create meaningful file names. A "long" name can contain any characters except nine special ones: \ / : * ? " < > |.

Spaces and multiple periods are allowed in the name. The file name ends with an extension, the part after the last dot, which is used to classify files by type.

The uniqueness of a file name is ensured by the fact that the full name of a file is its own name together with the access path to it. The file path starts with the device name and includes all the directory (folder) names it passes through. The "\" character (backslash) is used as the separator. For example: D:\Documents and Settings\TVA\My documents\lessons-tva\robots.txt

Despite the fact that data on the location of files is stored in a tabular structure, it is presented to the user as a hierarchical structure; this is more convenient for people, and the operating system takes care of all the necessary transformations.
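The parts of a full name can be taken apart with the standard library. PureWindowsPath lets the sketch run on any OS without touching real files; the path itself is merely illustrative:

```python
# Splitting a full Windows file name into drive, access path and name.
from pathlib import PureWindowsPath

p = PureWindowsPath(
    r'D:\Documents and Settings\TVA\My documents\lessons-tva\robots.txt')

print(p.drive)    # D:          - the device name
print(p.parent)   # the access path through the folder hierarchy
print(p.name)     # robots.txt  - the file's own name
print(p.suffix)   # .txt        - the extension after the last dot
```

Together, drive + parent + name reconstruct exactly the "full file name" defined above.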

A regular file is an array of bytes and can be read and written starting from an arbitrary byte. The kernel does not recognize record boundaries in regular files; although many programs treat line-feed characters as line breaks, other programs may expect other structures. The file itself does not store any system information about the file, but the file system does store some information about the owner, permissions and usage of each file.

The component called a file name is a string up to 255 characters long. These names are stored in a special type of file called a directory. Information about a file in a directory is called a directory entry and includes, in addition to the file name, a pointer to the file itself. Directory entries can reference other directories as well as regular files. This creates a hierarchy of directories and files, which is called a file system.

Figure 2-2. Small file system

One small file system is shown in Figure 2-2. Directories can contain subdirectories, and there are no restrictions on how deep one directory can be nested within another. To maintain file system integrity, the kernel does not allow processes to write directly to directories. A file system can store not only regular files and directories, but also references to other objects, such as devices and sockets.

The file system forms a tree whose beginning is the root directory, sometimes called slash after the single slash character (/) that names it. The root directory contains files; in our example in Figure 2-2 it contains vmunix, a copy of the kernel's executable object file. It also contains directories; in this example, the usr directory. Inside the usr directory is the bin directory, which mainly contains the executable object code of programs such as ls and vi.

A process accesses a file by specifying a path to it, which is a string of zero or more file names separated by slash characters (/). The kernel associates two directories with each process, relative to which paths to files are interpreted. The root directory of a process is the highest point in the file system that the process can reach; it usually corresponds to the root directory of the entire file system. A path starting with a slash character is called an absolute path and is interpreted by the kernel starting from the process's root directory.

A path name that does not begin with a slash is called a relative path and is interpreted relative to the current working directory of the process (also called, for short, the current directory or working directory). The current directory itself can be referred to directly by the name dot, written as a single period (.). The name dot-dot (..) denotes the parent directory of the current directory. The root directory is its own parent.
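The interpretation of dot and dot-dot can be demonstrated with the standard library's POSIX path functions; no real files are touched, and the directories are illustrative:

```python
# Resolving "." and ".." against a current working directory, using the
# POSIX flavour of the path functions so this runs the same everywhere.
import posixpath

cwd = '/usr/bin'

# A relative path is joined to the current directory, then "." and ".."
# components are collapsed.
print(posixpath.normpath(posixpath.join(cwd, '../share/./man')))
# -> /usr/share/man

# The root directory is its own parent: ".." at the root goes nowhere.
print(posixpath.normpath('/..'))
# -> /
```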
