Linux - ext3 and ext4 file systems. Non-journaling mode. Larger file and file system size

A file system is a scheme that determines how data is organized, stored and named on the storage media of computers and other IT equipment (including the portable flash memory cards used for repeated recording and storage of information in portable electronic devices such as digital cameras and mobile phones). It defines the format of the content and the physical storage of information, which is usually grouped into files. A specific file system determines the maximum length of a file (folder) name, the maximum possible size of a file and of a partition, and the set of file attributes. Some file systems provide service capabilities, such as access control or file encryption.

File system tasks

The main functions of any file system are aimed at solving the following tasks:

file naming;
software interface for working with files for applications;
mapping the logical model of the file system onto the physical organization of the data storage;
making the file system resilient to power failures and to hardware and software errors.

In multi-user systems another task appears: protecting one user's files from unauthorized access by another user, as well as supporting shared work with files. For example, when a file is opened by one user, the same file is temporarily available to the others in read-only mode.

A file system is the basic structure a computer uses to organize information on its hard drive. When installing a new hard drive it must be partitioned and formatted for a specific file system, after which data and programs can be stored on it. There are three possible file system options in Windows: NTFS, FAT32, and the rarely used legacy FAT system (also known as FAT16).

NTFS is the preferred file system for this version of Windows. It has many advantages over the earlier FAT32 system; some of them are listed below.

The ability to automatically recover from some disk errors (FAT32 does not have this ability).
Improved support for large hard disks.
Higher degree of security. You can use permissions and encryption to deny user access to certain files.

The FAT32 file system and the rarely used FAT system were used in previous Windows versions, including Windows 95, Windows 98 and Windows Millennium Edition. The FAT32 file system does not provide the level of security provided by NTFS, so if your computer has a partition or volume formatted as FAT32, the files on that partition are visible to anyone who has access to the computer. The FAT32 file system also has size limitations. In this version of Windows, it is not possible to create a FAT32 partition larger than 32 GB. In addition, a FAT32 partition cannot contain a file larger than 4 GB.
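
The 4 GB file limit mentioned above can be checked before copying data to a FAT32 partition. The sketch below is only an illustration: the function names are hypothetical, and it assumes the limit is exactly 2^32 - 1 bytes (4 GiB minus one byte), which follows from FAT32 storing file sizes in a 32-bit field.

```python
import os

# Assumption: FAT32 keeps the file size in a 32-bit field,
# so the largest possible file is 4 GiB minus one byte.
FAT32_MAX_FILE_SIZE = 2**32 - 1


def fits_on_fat32(size_bytes):
    """Return True if a file of this size can be stored on FAT32."""
    return 0 <= size_bytes <= FAT32_MAX_FILE_SIZE


def files_too_large_for_fat32(root):
    """Yield (path, size) for every file under root that exceeds the limit."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file; skipped in this sketch
            if not fits_on_fat32(size):
                yield path, size
```

Calling list(files_too_large_for_fat32(some_directory)) before a copy would list the files that need to be split or stored on another file system.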

The main reason for using FAT32 is a multiple operating system configuration, in which the computer can run either Windows 95, Windows 98 or Windows Millennium Edition, or this version of Windows. To create such a configuration, you must install the previous version of the operating system on a partition formatted as FAT32 or FAT, making it the primary partition (the primary partition may contain the operating system). Other partitions accessed from previous versions of Windows must also be formatted as FAT32. Earlier versions of Windows can access NTFS partitions or volumes only over the network; NTFS partitions on the local computer are inaccessible to them.

FAT – advantages:

It requires little RAM to work effectively.
Fast work with small and medium-sized directories.
The disk makes, on average, fewer head movements (compared to NTFS).
It works efficiently on slow disks.

FAT – cons:

Catastrophic loss of performance with increasing fragmentation, especially for large disks (FAT32 only).
Difficulties with random access to large (say, 10% or more of the disk size) files.
Very slow work with directories containing a large number of files.

NTFS - advantages:

File fragmentation has virtually no consequences for the file system itself—the performance of a fragmented system is only impaired in terms of access to the file data itself.
The complexity of the directory structure and the number of files in one directory also pose no special obstacles to performance.
Quick access to an arbitrary fragment of a file (for example, editing large .wav files).
Very fast access to small files (several hundred bytes) - the entire file is in the same place as the system data (MFT record).

NTFS - cons:

Significant system memory requirements (64 MB is the absolute minimum, more is better).
Slow disks and controllers without Bus Mastering greatly reduce the performance of NTFS.
Working with medium-sized directories is difficult because they are almost always fragmented.
A disk that operates for a long time at 80% - 90% full will show extremely low performance.

The following file systems are considered “native” for Linux (that is, those on which it can be installed and from which it can boot): ext2fs, ext3fs, ext4fs, ReiserFS, XFS, JFS. They are usually offered as a choice when installing the vast majority of distributions. Of course, there are ways to install Linux on FAT/VFAT/FAT32 file systems, but these are only for those ladies and gentlemen who have a taste for perversions, and I won’t talk about them.

The main criteria when choosing a file system are usually reliability and performance. In some cases, you also have to take into account the compatibility factor - in this case, it means the ability of other operating systems to access a particular file system.
I’ll start the review with ReiserFS - because the reason for writing this note was the question: what should be considered small files? After all, it is well known that the efficiency of working with small files is the strength of this file system.

So, small files are files smaller than a logical block of the file system, which in Linux in most cases is four kilobytes, although it can be set during formatting within certain limits (depending on the specific FS). There are countless such small files in any Unix-like OS. Typical examples are the files that make up the FreeBSD ports tree, Gentoo Portage and similar port-like systems.
In most file systems, such mini-files have both their own inode (an information node containing metadata about the file) and a data block, which both wastes disk space and slows down file operations. In particular, this is precisely the reason for the catastrophic sluggishness of the FreeBSD file system (both the old one, UFS, and the new one, UFS2) when working with its own ports system.
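
To make the cost of small files concrete, here is a minimal sketch of the arithmetic, assuming every file occupies a whole number of data blocks and a 4 KB block size as in the text. The function names are hypothetical, and inode overhead is ignored.

```python
def allocated_size(file_size, block_size=4096):
    """Bytes occupied by a file's data when it is stored in whole blocks."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


def slack(file_size, block_size=4096):
    """Bytes allocated to the file but not used by its data."""
    return allocated_size(file_size, block_size) - file_size


def summarize(sizes, block_size=4096):
    """Count files smaller than one block and total the wasted space."""
    return {
        "small_files": sum(1 for s in sizes if s < block_size),
        "wasted_bytes": sum(slack(s, block_size) for s in sizes),
    }
```

Under these assumptions a 100-byte file still occupies a full 4096-byte block. The list of sizes can be collected with os.walk and os.path.getsize; on Unix-like systems os.statvfs(path).f_bsize reports a block size for the mounted file system.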

In ReiserFS, separate blocks are not allocated for the data in such cases: it manages to pack the file data directly into the inode area. This both saves disk space and increases performance, literally several times over compared to all other file systems.
This handling of small files has given rise to the legend of ReiserFS's unreliability. Indeed, when the file system is wrecked (that is, when its service areas are destroyed), data stored together with its inodes disappears along with them, irrevocably. In file systems where inodes and data blocks are always spatially separated, by contrast, the data can theoretically be restored. For ext2/ext3 there are even tools that allow you to do this.

However, like any legend, this one only gives the impression of being true. First, permanent data loss applies only to very small files. There are practically none of these among user files, and all the others can easily be restored from the distribution media.
Secondly, when speaking about the possibility of recovering data from blocks that have lost their connection to their inodes, it was not by chance that I used the word “theoretically”. In practice this activity is extremely labor-intensive and does not give a guaranteed result. Anyone who has had to do it will agree that one would only attempt it out of complete despair. And this applies to all Linux file systems. So this aspect can be neglected when choosing a file system.

In terms of overall performance, ReiserFS is definitely faster than all other journaled FS, and in some respects it is superior to ext2. A speed comparison of some common file operations can be found here.
With ReiserFS, however, the compatibility situation is somewhat worse. Access to it from Windows, as far as I know, is impossible. Some operating systems of the BSD family (DragonFlyBSD, FreeBSD) support this file system, but in read-only mode. There is even a non-zero chance that an arbitrary Linux LiveCD from previous years lacks ReiserFS support.

And here it’s time to remember ext3fs. Its advantage is not at all greater reliability: that is as much a legend as the instability of ReiserFS. I have heard no less about ext3fs crashes than about similar incidents with ReiserFS. I myself have not managed to destroy either one. I did manage it with ext2, but that was a very long time ago, in the days of kernel 2.2 (or even 2.0).

No, the main advantage of ext3fs is its compatibility: it is guaranteed to be readable by any Linux system, for example when recovering with whatever ancient LiveCD is at hand, a situation that is not so improbable in practice (I have been in it myself). Again, most BSD systems can easily understand ext3fs (albeit without journaling). For Windows there are also, as far as I know, all kinds of drivers and plug-ins for common file managers (such as Total Commander) that provide access to partitions with ext2fs/ext3fs.

In terms of performance, ext3fs leaves a mixed impression. Firstly, its performance depends heavily on the journaling mode, of which there are three: full data journaling, partial journaling and metadata-only journaling. In each mode it shows different performance on different types of file operations. In no case, however, is the performance record-breaking.

If performance comes first, then ext2fs has no competition. In that case, however, you will have to put up with having no journaling at all and, consequently, with lengthy file system checks after any unclean shutdown, which with the volume of modern disks can take a very long time...

The following can be said about XFS. In terms of compatibility, everything written above about ReiserFS applies to it; moreover, for some time it was not supported by the standard Linux kernel. In terms of performance XFS does not shine either, performing overall at about the same level as ext3fs. And file deletion is depressingly slow.
According to my observations, XFS pays off when working not just with large but with very large files, which in practice means only DVD images and video files.

Let me return to the question of reliability. As a rule, all journaled file systems survive an ordinary power loss during normal user work painlessly (though none of them ensures the safety of user operations not yet written to disk: rescuing the drowning remains the business of the drowning themselves). True, for any file system it is possible to simulate a situation in which a power loss leads to more or less serious damage. In real life, however, such situations are unlikely to occur. And you can rule them out completely by purchasing an uninterruptible power supply, which gives more confidence in the safety of data than the type of file system does. In any case, the only guarantee that damaged data can be restored is regular backups...

I think the information presented above is enough for an informed choice. My personal choice over the past few years has been ReiserFS. Occasionally, on systems where it is justified to move everything possible outside the root partition, it makes sense to use ext3fs for the root file system and ReiserFS for all the others.

If a separate partition is provided for the /boot directory (which the developers of the GRUB bootloader recommend when using it), no file system other than ext2fs is justified for it; any journaling here makes no sense. Finally, if you create a separate partition for all kinds of multimedia materials, then you can think about XFS.

If we approach the explanation more methodically

ext - in the early days of Linux, ext2 (extended file system version 2) was dominant. From 2002 it was replaced by ext3, which is largely compatible with ext2 but also supports journaling and, with kernel version 2.6 and higher, ACLs. The maximum file size is 2 TB, the maximum file system size is 8 TB. At the end of 2008 the release of ext4 was officially announced; it is backward compatible with ext3, but many functions are implemented more efficiently than before. In addition, the maximum file system size is 1 EB (1,048,576 TB), and you can expect this to be sufficient for some time.

reiser - the system was named after its founder, Hans Reiser, and was the first journaling file system to be included in the Linux kernel. In SUSE it was even considered the standard for some time. The main advantages of reiser over ext3 are higher speed and more efficient space allocation when working with small files (and in a file system, as a rule, most files are small). Over time, however, the development of reiserfs stopped. It was announced long ago that version 4 would be released, but it is still not ready, and support for version 3 has ceased.

xfs - the xfs file system was originally developed for SGI workstations running the IRIX operating system. Xfs is especially good for working with large files and is particularly well suited to streaming video. The system supports quotas and extended attributes (ACLs).

jfs - the abbreviation JFS stands for "Journaled File System". It was originally developed by IBM and then adapted for Linux. Jfs never enjoyed much recognition on Linux and currently ekes out a miserable existence, inferior to other file systems.

btrfs - if the leading kernel developers have their way, the btrfs file system has a bright future in Linux. It was developed from scratch at Oracle and includes support for device-mapper and RAID. Btrfs is most similar to the ZFS system developed by Sun. Its most interesting features include on-the-fly file system checks and SSD support (solid-state drives are drives based on flash memory). Unfortunately, work on btrfs will not be completed in the foreseeable future. Fedora, starting from version 11, makes it possible to install btrfs, but I recommend it only to file system developers!
There is no "fastest" or "best" file system: the assessment depends on what you intend to use the system for. Beginner Linux users working on a local computer are advised to use ext3, and server administrators ext4. Of course, ext4 is faster than ext3, but at the same time data reliability is much worse in ext4: you may well lose information if the system shuts down suddenly.

If you have installed a second UNIX-like operating system on your computer, then the following file systems will be useful to you when exchanging data (from one OS to another).

sysv - used in SCO, Xenix and Coherent OS.

ufs - used in FreeBSD, NetBSD, NextStep and SunOS. Linux can only read information from such file systems and cannot modify the data. To access BSD partitions, you will additionally need the BSD disklabel extension. A similar extension exists for SunOS partition tables.

ZFS is a relatively new system developed by Sun for Solaris. Because ZFS code is not GPL compliant, it cannot be integrated with the Linux kernel. For this reason, Linux only supports this file system indirectly, through FUSE.
Windows, Mac OS X

The following file systems will be useful when exchanging information with MS DOS, Windows, OS/2 and Macintosh.

vfat - used in Windows 9x/ME. Linux can read information from such partitions and make changes to it. vfat system drivers allow you to work with older MS DOS file systems (8 + 3 characters).

ntfs - this system is used in all modern versions of Windows, from NT onward. Linux can read and modify its files.

hfs and hfsplus - these file systems are used in Apple computers. Linux can read and modify its files.

Data CDs and DVDs typically use their own file systems.

iso9660 - the file system for CD-ROMs is described in the ISO 9660 standard, which allows only short file names. Long names are supported differently on different operating systems, using a variety of mutually incompatible extensions. Linux supports both the Rock Ridge extension, which is common in UNIX, and the Joliet extension, developed by Microsoft.

udf - this format (Universal Disk Format) appeared and developed as a successor to ISO 9660.

Network file systems

File systems do not have to be on a local disk: they can also be attached to a computer over a network. The Linux kernel supports various network file systems, of which the following are the most commonly used.

smbfs/cifs - help connect Windows or Samba network directories to a directory tree.

nfs is the most important network file system in UNIX.

coda - this system is very similar to NFS. It has many additional features, but it is not very common.

ncpfs - runs on the NetWare Core Protocol; it is used by Novell NetWare.

Virtual file systems

Linux has several file systems that are not designed to store data on the hard drive (or other storage media), but only to exchange information between the kernel and user programs.
devpts - this file system provides access to pseudo-terminals (abbreviated as PTY) via /dev/pts/* according to the UNIX-98 specification. (Pseudo-terminals emulate a serial interface. On UNIX/Linux systems, such interfaces are used by terminal emulators such as xterm. Typically, devices such as /dev/ttypn are used. The UNIX-98 specification, in contrast, defines new devices. More detailed information is given in the Text-Terminal-HOWTO.)
proc and sysfs - the proc file system is used to display service information related to kernel and process management. In addition, the sysfs file system builds relationships between the kernel and the hardware. The two file systems are mounted at /proc and /sys respectively.
tmpfs - this system is built on shared memory in the System V style. It is usually mounted at /dev/shm and allows efficient exchange of information between two programs. On some distributions (such as Ubuntu), the /var/run and /var/lock directories are also created using tmpfs. The files in these directories are used by some network daemons to store process identification numbers as well as file access information. Thanks to tmpfs, this data is now kept in RAM. This guarantees high speed, and also that no files are left in the /var/run or /var/lock directories after the computer is turned off.

usbfs - starting with kernel version 2.6, the usbfs file system provides information about connected USB devices. It is usually integrated into the proc file system.
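
Because proc presents kernel information as ordinary text files, user programs can read it with plain file I/O. The sketch below parses /proc/filesystems, which lists the file system types the running kernel currently supports; entries marked nodev are not backed by a block device, which is the case for virtual file systems such as those described above. The function names are hypothetical, and the sample text in the usage note is made up for illustration.

```python
def parse_proc_filesystems(text):
    """Return a list of (name, is_virtual) pairs from /proc/filesystems text.

    Each line is either "nodev<TAB>name" (no block device needed)
    or "<TAB>name" (a file system that lives on a block device).
    """
    result = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name = fields[-1]
        is_virtual = len(fields) == 2 and fields[0] == "nodev"
        result.append((name, is_virtual))
    return result


def read_supported_filesystems(path="/proc/filesystems"):
    """Read and parse the list on a running Linux system."""
    with open(path, encoding="ascii") as handle:
        return parse_proc_filesystems(handle.read())
```

For example, parse_proc_filesystems("nodev\tproc\n\text4\n") returns [("proc", True), ("ext4", False)].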

Other file systems

auto - in fact, there is no file system under that name. However, the word auto can be used in /etc/fstab or with the mount command to specify the file system. In this case, Linux will try to recognize the file system on its own. This method works with most major file systems.

autofs, autofs4 are also not file systems, but kernel extensions that automatically execute the mount command for selected file systems. If a file system is not used for some time, the umount command is automatically run on it. This method is convenient primarily in cases where only a few of many NFS directories are actively used at the same time.

To perform such operations, the /etc/init.d/autofs script automatically runs the automount program when the system starts. It is configured using the /etc/auto.master file. The corresponding programs are installed automatically, for example, in Red Hat and Fedora. In any case, autofs is only activated after /etc/auto.master or /etc/auto.misc has been configured.

cramfs and squashfs - the Cram and Squash file systems are read-only. They are used to "pack" as many compressed files as possible into flash memory or ROM (read-only memory).

fuse - FUSE stands for Filesystem in Userspace and allows filesystem drivers to be developed and used outside the kernel. Therefore, FUSE is always used with an external file system driver. FUSE works, in particular, with the NTFS driver ntfs-3g.

gfs and ocfs - Global File System and Cluster File System from Oracle (Oracle Cluster File System) allow you to build giant network file systems that can be accessed in parallel by many computers at the same time.

jffs and yaffs - journaling file systems for flash media (Journaling Flash File System and Yet Another Flash File System) that are specifically optimized to work with solid-state drives and flash media. Using special algorithms, they try to use all memory cells evenly (wear leveling) to avoid premature failure.

loop - used to work with pseudo-devices. A loopback device is an adapter that allows a regular file to be accessed like a block device. Thanks to it, you can place any file system in any file and then attach it to the directory tree using mount. The kernel function responsible for this, pseudo-device support, is implemented in the loop module.

There are a variety of uses for pseudodevices. In particular, they can be used when creating disks in RAM for initial initialization (Initial RAM disk) for GRUB or LILO, when implementing encrypted file systems or testing ISO images for CDs.
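
Returning to the auto entry at the start of this section: the file system type is the third whitespace-separated field of each /etc/fstab line, so entries that rely on automatic detection are easy to find. The following is a minimal sketch; the function names are hypothetical, and the device names and mount points in the usage note are made up.

```python
def parse_fstab(text):
    """Parse fstab-style text into a list of dictionaries."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3:
            continue  # malformed line; ignored in this sketch
        entries.append({
            "device": fields[0],
            "mount_point": fields[1],
            "type": fields[2],
            "options": fields[3].split(",") if len(fields) > 3 else [],
        })
    return entries


def auto_detected_mounts(text):
    """Return mount points whose file system type is left to auto-detection."""
    return [e["mount_point"] for e in parse_fstab(text) if e["type"] == "auto"]
```

Given the two lines "/dev/sda1 / ext4 defaults 0 1" and "/dev/sdb1 /mnt/usb auto noauto,user 0 0", auto_detected_mounts returns ["/mnt/usb"].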

Storage media file systems

File systems
ISO 9660
Joliet – an ISO 9660 file system extension.
Rock Ridge (RRIP, IEEE P1282) – an ISO 9660 file system extension designed to store file attributes used in POSIX operating systems
Amiga Rock Ridge Extensions
El Torito
Apple ISO9660 Extensions
HFS, HFS+
Universal Disk Format is a specification of a file system format independent of the operating system for storing files on optical media. UDF is an implementation of the ISO/IEC 13346 standard
Mount Rainier

How can you access a disk partition or removable media with an Ext2/3/4 file system from Windows? Suppose, for example, that the computer also has a second system, Linux, and you need to work with its data from Windows. Or another example: virtual disks from virtual machines with Linux or Android installed are mounted inside Windows. Windows cannot work with Ext2/3/4 natively; it needs third-party tools for this. What are these tools? Let's look at them below.

***
The first three tools only make it possible to read data from Ext2/3/4 devices. The last solution allows you to both read and write data. All the tools discussed below are free.

1. DiskInternals Linux Reader

This simple program is a primitive file manager modeled on the standard Windows Explorer, with support for the Ext2/3/4, Reiser4, HFS and UFS2 file systems. In the program window we will see the partitions and devices with Linux or Android.

To copy, select a folder or file and press the "Save" button.

Then specify the copy path.

2. Plugin for Total Commander DiskInternals Reader

Fans of the popular Total Commander can extract data from Linux or Android inside Windows using this file manager, but first a special plugin must be installed in it. One such plugin is DiskInternals Reader; it can connect and read devices formatted in Ext2/3/4, Fat/exFAT, HFS/HFS+ and ReiserFS. Download the plugin, unpack its archive inside Total Commander, and confirm the installation.

Launch Total Commander as administrator (this is important). Go to the section. Click.

Here, along with other disk partitions and media, is the one with Ext2/3/4.

Data is copied in the way traditional for Total Commander: by pressing F5 to copy to the second panel.

3. Plugin for Total Commander ext4tc

A simplified alternative to the previous solution is ext4tc, another plugin for Total Commander. It can connect and read only devices formatted in Ext2/3/4. Download the plugin, unpack its archive inside the file manager, and start the installation.

Launch Total Commander as administrator (this is important). Click. Go to .

If you need to copy data, use the usual method with the F5 key.

4. Ext2Fsd support driver

Ext2Fsd is an Ext2/3/4 driver: it implements support for these file systems at the operating system level. You can work with disk partitions and drives formatted in these file systems just as with ordinary Windows-supported media, in an Explorer window or in third-party programs. The driver allows you to both read and write data.

Download the latest version of Ext2Fsd.

During installation, activate (if you plan long-term use) the three suggested checkboxes:

1 - Driver autostart with Windows;
2 - Write support for Ext2;
3 - Formatting support for Ext3.

At the final stage, activate the option to launch the driver manager window, along with the assignment of drive letters to Ext2/3/4 devices.

In the window that opens we will see the media with a letter already assigned. In our case, for example, the Ext4 media was given the first free letter, F.

Now we can work with the disk F in the Explorer window.

A letter can be assigned to newly connected Ext2/3/4 devices using the context menu called up on any of the devices displayed in the window. But a device that is simply assigned a drive letter will not appear after Windows reboots; this solution lasts only for one computer session. To make a new Ext2/3/4 device permanently visible in Windows, double-click it to open the configuration window and set the permanent connection parameters. In the second column you need to:

For removable media, activate the checkbox indicated by number 1 in the screenshot and specify the drive letter;
For internal disks and partitions, activate the checkbox indicated in the screenshot below with the number 2, and also indicate the drive letter.

If you have two operating systems installed, Windows and Linux, then you would probably like to access the contents of the free operating system's partitions directly from Windows, without rebooting the computer.

Unfortunately, Windows has no support for Linux partitions. That is a pity. It seems to me that this could be a nice gesture on Microsoft's part.

The essence of the problem is that Windows uses the NTFS file system, while Linux has its own way of organizing files, the extended file system, whose latest version is number 4.

Linux is more user-friendly than its commercial sister: it supports the Windows NTFS file system by default. Of course, you won’t be able to install Linux on an NTFS partition, but you can read and write data on such a partition.

Ext2 IFS

Ext2 IFS supports Windows NT4.0/2000/XP/2003/Vista/2008 (x86 and x64 versions) and allows you to view the contents of Linux ext2 partitions and also write to them. The utility installs the system driver ext2fs.sys, which extends Windows and adds full ext2 support to it: ext2 partitions are assigned drive letters, and the files and folders on them are displayed in the dialogs of all applications, for example in Explorer.

Ext2 FSD

Ext2 FSD is a free driver for Windows systems (2K/XP/VISTA/7 versions x86 and x64). Like the previous utility, which is also a driver in its essence, it includes full support for the ext2 file system in Windows.

LTOOLS is a set of command line utilities that allows you to read and write data to/from Linux ext2, ext3 and ReiserFS (standard Linux file systems) partitions from a machine running DOS or Windows.

There is a version of the program with a graphical shell (written in Java), LTOOLSgui, as well as a version with a graphical shell written in .

Ext2Read

Dessert is, as always, the most delicious.

Ext2Read is a file manager-type utility that allows you to both view and write to ext2/ext3/ext4 partitions. It supports LVM2 and, what distinguishes it from other programs in this review, the ext4 file system. Built-in support for recursive directory copying.

And here is the second dessert. At the beginning it was said that it would be a good gesture from Microsoft to include support for Linux partitions in Windows by default.

The gesture was nevertheless made on the 20th anniversary of Linux. See for yourself.

That's all. Thank you for your attention. I'll go fight off the cockchafers. There are so many of them this spring. 🙂

Why may a smartphone not launch programs from a memory card? How is ext4 fundamentally different from ext3? Why will a flash drive last longer if you format it in NTFS rather than FAT? What is the main problem with F2FS? The answers lie in the structural features of file systems. We'll talk about them.

Introduction

File systems define how data is stored. They determine what limitations the user will encounter, how fast read and write operations will be, and how long the drive will operate without failures. This is especially true for budget SSDs and their younger brothers - flash drives. Knowing these features, you can get the most out of any system and optimize its use for specific tasks.

You have to choose the type and parameters of the file system every time you need to do something non-trivial. For example, you want to speed up the most common file operations. At the file system level this can be achieved in different ways: indexing provides quick search, and pre-reserving free blocks makes it easier to rewrite frequently changing files. Pre-optimizing the data in RAM reduces the number of I/O operations required.

Such properties of modern file systems as lazy writing, deduplication and other advanced algorithms help to increase the period of trouble-free operation. They are especially relevant for cheap SSDs with TLC memory chips, flash drives and memory cards.

Separate optimizations exist for disk arrays of different levels: for example, the file system can support lightweight volume mirroring, instant snapshotting, or dynamic scaling without taking the volume offline.

Black box

Users mostly work with the operating system's default file system. They rarely create new disk partitions and even less often think about their settings: they simply use the recommended parameters or even buy pre-formatted media.

For Windows fans, everything is simple: NTFS on all disk partitions and FAT32 (or the same NTFS) on flash drives. If there is a NAS and it uses some other file system, then for most it remains beyond perception. They simply connect to it over the network and download files, as if from a black box.

On mobile gadgets with Android, ext4 is most often found in internal memory and FAT32 on microSD cards. Apple fans do not care at all what kind of file system they have: HFS+, HFSX, APFS, WTFS... for them there are only beautiful folder and file icons drawn by the best designers. Linux users have the richest choice, but you can add support for non-native file systems in both Windows and macOS - more on that later.

Common roots

Over a hundred different file systems have been created, but a little more than a dozen can be considered current. Although they were all developed for their own specific applications, many ended up being related on a conceptual level. They are similar because they use the same type of (meta)data representation structure - B-trees (“bi-trees”).

Like any hierarchical system, a B-tree begins with a root record and then branches down to leaf elements: individual records of files and their attributes, or “leaves.” The main point of creating such a logical structure was to speed up the search for file system objects on large dynamic arrays, such as hard drives with a capacity of several terabytes or even more impressive RAID arrays.

B-trees require far fewer disk accesses than other types balanced trees, while performing the same operations. This is achieved due to the fact that the final objects in B-trees are hierarchically located at the same height, and the speed of all operations is precisely proportional to the height of the tree.

Like other balanced trees, B-trees have equal path lengths from the root to any leaf. Instead of growing upward, they branch more and grow wider: all branch points in a B-tree store many references to child objects, making them easy to find in fewer calls. Big number pointers reduces the number of the most time-consuming disk operations - head positioning when reading arbitrary blocks.
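The effect of wide branching can be checked with a little arithmetic. The sketch below is only an illustration: the function name `min_levels` and the fanout of 256 are assumptions, not values taken from any particular file system. It counts how many levels a tree needs when every node has a given number of children.

```python
def min_levels(num_records: int, fanout: int) -> int:
    """Smallest number of levels at which a tree whose nodes each have
    `fanout` children can address `num_records` leaf records."""
    levels = 1
    capacity = fanout
    while capacity < num_records:
        capacity *= fanout
        levels += 1
    return levels


# One million records: a binary tree versus a wide node.
print(min_levels(1_000_000, 2))    # 20 levels
print(min_levels(1_000_000, 256))  # 3 levels
```

If each level costs one disk access, the wide tree reaches any record in a few head movements where the binary tree would need about twenty.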

The concept of B-trees was formulated back in the seventies and has since undergone various improvements. In one form or another it is implemented in NTFS, BFS, XFS, JFS, ReiserFS and many DBMSs. They are all relatives from the point of view basic principles data organization. The differences concern details, often quite important. Related file systems also have a common disadvantage: they were all created to work specifically with disks even before the advent of SSDs.

Flash memory as the engine of progress

Solid-state drives are gradually replacing disk drives, but for now they are forced to use inherited file systems that are alien to them. SSDs are built on flash memory arrays, whose operating principles differ from those of disk devices. In particular, flash memory must be erased before being written, an operation that NAND chips cannot perform at the level of an individual cell. Erasure is only possible for whole large blocks.

This limitation is due to the fact that in NAND memory all cells are combined into blocks, each of which has only one common connection to the control bus. We will not go into the details of the page organization or describe the complete hierarchy. What matters is the principle of group operations on cells and the fact that flash memory blocks are usually larger than the blocks addressed by any file system. Therefore, all addresses and commands for drives with NAND flash must be translated through the FTL (Flash Translation Layer) abstraction layer.

Compatibility with the logic of disk devices and support for commands of their native interfaces is provided by flash memory controllers. Typically, FTL is implemented in their firmware, but can (partially) be implemented on the host - for example, Plextor writes drivers for its SSDs that accelerate writing.

It is impossible to do without FTL, since even writing one bit to a specific cell triggers a whole series of operations: the controller finds the block containing the desired cell; the block is read completely and written to the cache or to free space, then it is erased entirely, after which it is written back with the necessary changes.

This approach is reminiscent of army life: to give an order to one soldier, the sergeant calls a general formation, calls the poor fellow out of the ranks and tells the others to disperse. In the now rare NOR memory, the organization was more like special forces: each cell was controlled independently (each transistor had an individual contact).

The workload on controllers keeps growing, since with each generation of flash memory the manufacturing process node shrinks in order to increase density and reduce the cost of data storage. Along with the process node, the estimated service life of the chips also decreases.
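The sequence of operations can be sketched as a toy model. Everything here is hypothetical (the class `NandBlock`, the function `update_in_place`, the page count); it only mirrors the read, erase and write-back steps and does not describe a real controller.

```python
class NandBlock:
    """Toy NAND block: a page can be programmed only while it is erased."""

    def __init__(self, pages: int = 64):
        self.pages = [None] * pages  # None marks an erased page
        self.erase_count = 0

    def erase(self):
        # Erasure always affects the whole block.
        self.pages = [None] * len(self.pages)
        self.erase_count += 1

    def program(self, index, data):
        if self.pages[index] is not None:
            raise ValueError("page must be erased before it is programmed")
        self.pages[index] = data


def update_in_place(block, index, data):
    """Naive update: read the block, erase it, write everything back.
    For brevity this sketch erases even when the target page is still empty."""
    cached = list(block.pages)           # 1. read the whole block into a cache
    cached[index] = data                 # 2. apply the change in the cache
    block.erase()                        # 3. erase the block entirely
    for i, page in enumerate(cached):    # 4. write the pages back
        if page is not None:
            block.program(i, page)


block = NandBlock(pages=4)
update_in_place(block, 0, b"a")
update_in_place(block, 1, b"c")
update_in_place(block, 0, b"b")
print(block.pages[:2], block.erase_count)  # three updates cost three erases
```

Every small change costs a full erase cycle of the block, which is exactly what translation layers and flash-aware file systems try to avoid.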

Modules with single-level SLC cells had a declared resource of 100 thousand rewrite cycles and even more. Many of them still work in old flash drives and CF cards. For enterprise-class MLC (eMLC), the resource was declared in the range of 10 to 20 thousand, while for regular consumer-grade MLC it is estimated at 3-5 thousand. Memory of this type is actively being squeezed by even cheaper TLC, whose resource barely reaches a thousand cycles. Keeping the lifespan of flash memory at an acceptable level requires software tricks, and new file systems are becoming one of them.
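To get a feel for what these cycle counts mean, here is a rough back-of-the-envelope estimate. It assumes ideal wear leveling, and the parameter `write_amplification` is a hypothetical knob added for illustration; the 128 GB capacity and 20 GB of daily writes are also made-up inputs.

```python
def lifetime_years(capacity_gb, rewrite_cycles, daily_writes_gb,
                   write_amplification=1.0):
    """Rough service life, assuming wear is spread evenly over all cells."""
    total_host_writes_gb = capacity_gb * rewrite_cycles / write_amplification
    return total_host_writes_gb / daily_writes_gb / 365


# Cycle counts quoted in the text: SLC 100,000, consumer MLC 3,000, TLC 1,000.
for name, cycles in [("SLC", 100_000), ("MLC", 3_000), ("TLC", 1_000)]:
    years = lifetime_years(capacity_gb=128, rewrite_cycles=cycles,
                           daily_writes_gb=20)
    print(f"{name}: about {years:.0f} years")
```

A cheap controller that levels wear poorly behaves like a large `write_amplification` value, which shrinks the estimate proportionally.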

Initially, the manufacturers assumed that the file system was unimportant. The controller itself must service a short-lived array of memory cells of any type, distributing the load between them in an optimal way. For the file system driver, it simulates a regular disk, and itself performs low-level optimizations on any access. However, in practice, optimization varies from device to device, from magical to bogus.

In enterprise SSDs, the built-in controller is a small computer. It has a huge memory buffer (half a gigabyte or more) and supports many data efficiency techniques to avoid unnecessary rewrite cycles. The chip organizes all blocks in the cache, performs lazy writes, performs on-the-fly deduplication, reserves some blocks and clears others in the background. All this magic happens completely unnoticed by the OS, programs and the user. With an SSD like this, it really doesn't matter which file system is used. Internal optimizations have a much greater impact on productivity and resources than external ones.

Budget SSDs (and even more so flash drives) make do with far less capable controllers. Their cache is limited or absent, and advanced server technologies are not used at all. The controllers in memory cards are so primitive that it is often claimed that they do not exist at all. Therefore, for cheap devices with flash memory, external methods of load balancing remain relevant - primarily the use of specialized file systems.

From JFFS to F2FS

One of the first attempts to write a file system that would take into account the principles of organizing flash memory was JFFS - Journaling Flash File System. Initially, this development by the Swedish company Axis Communications was aimed at increasing the memory efficiency of network devices that Axis produced in the nineties. The first version of JFFS supported only NOR memory, but already in the second version it became friends with NAND.

Currently JFFS2 has limited use. Basically it is still used in Linux distributions for embedded systems. It can be found in routers, IP cameras, NAS and other regulars of the Internet of Things. In general, wherever a small amount of reliable memory is required.

A further attempt to develop JFFS2 was LogFS, which stored inodes in a separate file. The authors of this idea are Jorn Engel, an employee of the German division of IBM, and Robert Mertens, a teacher at the University of Osnabrück. The LogFS source code is available on GitHub. Judging by the fact that the last change to it was made four years ago, LogFS never gained popularity.

But these attempts spurred the emergence of another specialized file system - F2FS. It was developed by Samsung Corporation, which accounts for a considerable part of the flash memory produced in the world. Samsung makes NAND flash chips for its own devices and to order for other companies, and also develops SSDs with fundamentally new interfaces instead of legacy disk ones. From Samsung's point of view, creating a specialized file system optimized for flash memory was long overdue.

Four years ago, in 2012, Samsung created F2FS (Flash Friendly File System). The idea was good, but the implementation turned out to be crude. The key task when creating F2FS was simple: to reduce the number of cell rewrite operations and distribute the load on the cells as evenly as possible. This requires performing operations on multiple cells within the same block at the same time, rather than wearing them out one at a time. In other words, what is needed is not instant rewriting of existing blocks at the first request of the OS, but caching of commands and data, appending new blocks to free space and delayed erasing of cells.
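The principle of appending new data to free space and postponing erasure can be shown with a very small model. This is not F2FS code: the class `AppendOnlyLog` and its methods are hypothetical, and cleaning of stale pages is deliberately left out.

```python
class AppendOnlyLog:
    """Toy log-structured store: an update never overwrites a page in place."""

    def __init__(self, num_pages: int = 16):
        self.storage = [None] * num_pages  # physical pages, written in order
        self.mapping = {}                  # logical address -> physical page
        self.head = 0                      # next free physical page

    def write(self, logical, data):
        if self.head >= len(self.storage):
            raise RuntimeError("log is full; a real system would clean stale pages here")
        self.storage[self.head] = data
        self.mapping[logical] = self.head  # the previous copy simply becomes stale
        self.head += 1

    def read(self, logical):
        return self.storage[self.mapping[logical]]

    def stale_pages(self):
        # Pages that were written but are no longer referenced by the mapping.
        return self.head - len(self.mapping)


log = AppendOnlyLog()
for version in range(5):
    log.write(7, version)      # rewrite the same logical address five times
print(log.read(7), log.stale_pages())  # latest data is visible, 4 pages await erasure
```

Five rewrites of one address caused no erase at all; the stale pages can later be erased together, one block at a time, when the device is idle.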

Today, F2FS support is already officially implemented in Linux (and therefore in Android), but in practice it does not yet provide any special advantages. The main feature of this file system (lazy rewrite) led to premature conclusions about its effectiveness. The old caching trick even fooled early versions of benchmarks, where F2FS demonstrated an imaginary advantage not by a few percent (as expected) or even by several times, but by orders of magnitude. The F2FS driver simply reported the completion of an operation that the controller was only planning to do. However, even if the real performance gain of F2FS is small, the wear on the cells will definitely be lower than with ext4. The optimizations that a cheap controller cannot do will be performed at the level of the file system itself.

Extents and bitmaps

For now, F2FS is perceived as an exotic option for geeks. Even in its own smartphones Samsung still uses ext4. Many consider ext4 a further development of ext3, but this is not entirely true. It is more of a revolution than just breaking the 2 TB per-file barrier and increasing other quantitative indicators.

When computers were large and files were small, addressing was not a problem. Each file was allocated a certain number of blocks, the addresses of which were entered into the correspondence table. This is how the ext3 file system worked, which remains in service to this day. But in ext4 a fundamentally different addressing method appeared - extents.

Extents can be thought of as extensions of inodes: discrete sets of blocks that are addressed as whole contiguous sequences. One extent can hold an entire medium-sized file, and for large files a dozen or two extents are enough. This is much more efficient than addressing hundreds of thousands of small blocks of four kilobytes.

The write mechanism itself has also changed in ext4. Blocks are now allocated in a single request, and not in advance but immediately before the data is written to disk. Delayed multi-block allocation eliminates the unnecessary operations that ext3 was guilty of: there, blocks for a new file were allocated immediately, even if the file fit entirely in the cache and was going to be deleted as temporary.
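The gain from extents is easy to see when a list of block numbers is collapsed into (start, length) pairs. The sketch below is a simplified illustration; the function name `blocks_to_extents` is hypothetical, and it ignores any per-extent length limit that a real on-disk format imposes.

```python
def blocks_to_extents(blocks):
    """Collapse an ordered list of block numbers into (start, length) runs."""
    extents = []
    for block in blocks:
        if extents and extents[-1][0] + extents[-1][1] == block:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)  # block continues the current run
        else:
            extents.append((block, 1))         # block starts a new run
    return extents


print(blocks_to_extents([100, 101, 102, 103, 200, 201, 500]))
# [(100, 4), (200, 2), (500, 1)]

# A contiguous 100 MB file in 4 KB blocks: 25,600 addresses become one record.
print(blocks_to_extents(list(range(1000, 1000 + 25_600))))
# [(1000, 25600)]
```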

FAT restricted diet

In addition to balanced trees and their modifications, there are other popular logical structures. There are file systems with a fundamentally different type of organization - for example, linear. You probably use at least one of them often.

Mystery

Guess the riddle: at twelve she began to gain weight, by sixteen she was a stupid fatty, and by thirty-two she became fat, and remained a simpleton. Who is she?

That's right, this is a story about the FAT file system. Compatibility requirements gave it bad heredity. On floppy disks it was 12-bit, on hard drives it was at first 16-bit, and it has survived to this day as 32-bit. With each new version the number of addressable blocks increased, but in essence nothing changed.

The still popular FAT32 file system appeared twenty years ago. Today it is still primitive and does not support access control lists, disk quotas, background compression, or other modern technologies for optimizing data processing.

Why is FAT32 needed these days? Still solely to ensure compatibility. Manufacturers rightly believe that a FAT32 partition can be read by any OS. That's why they create it on external hard disks, USB flash drives and memory cards.
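The linear organization can be illustrated with a toy allocation table in which every entry holds the number of the next cluster of the same file. This is a simplified model: the marker `END_OF_CHAIN` and the function `cluster_chain` are hypothetical, and real FAT variants use reserved values and fixed-width entries (12, 16 or 32 bits) instead.

```python
END_OF_CHAIN = -1  # hypothetical marker for the last cluster of a file


def cluster_chain(fat, first_cluster):
    """Follow a file's cluster chain through the allocation table."""
    chain = []
    seen = set()
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        if cluster in seen:
            raise ValueError("loop in allocation table")
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]  # each entry points to the next cluster
    return chain


# Two files: one starts at cluster 2 and is fragmented, one starts at cluster 4.
fat = {2: 3, 3: 7, 7: END_OF_CHAIN, 4: 5, 5: END_OF_CHAIN}
print(cluster_chain(fat, 2))  # [2, 3, 7]
print(cluster_chain(fat, 4))  # [4, 5]
```

Reaching the end of a long file means walking the whole chain entry by entry, and a wider entry only raises the number of clusters that can be numbered.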

How to free up your smartphone's flash memory

microSD(HC) cards used in smartphones are formatted in FAT32 by default. This is the main obstacle to installing applications on them and transferring data from internal memory. To overcome it, you need to create a partition on the card with ext3 or ext4. All file attributes (including owner and access rights) can be carried over to it, so any application can run as if it were launched from internal memory.

Windows does not know how to create more than one partition on flash drives, but for this you can run Linux (at least in a virtual machine) or an advanced utility for working with logical partitioning - for example, MiniTool Partition Wizard Free. Having discovered an additional primary partition with ext3/ext4 on the card, the Link2SD application and similar ones will offer many more options than in the case of a single FAT32 partition.

Another argument often cited in favor of FAT32 is its lack of journaling, which should mean faster write operations and less wear on NAND flash memory cells. In practice, using FAT32 leads to the opposite and gives rise to many other problems.

Flash drives and memory cards die quickly because any change in FAT32 overwrites the same sectors, where the two copies of the file allocation table are located. Saved an entire web page? Those sectors were overwritten a hundred times - once for every small GIF added to the flash drive. Launched portable software? It creates temporary files and constantly changes them while running. Therefore, it is much better to use NTFS on flash drives, with its failure-resistant $MFT table. Small files can be stored directly in the master file table, and its extensions and copies are written to different areas of the flash memory. In addition, NTFS indexing makes searching faster.

INFO

For FAT32 and NTFS, theoretical restrictions on the level of nesting are not specified, but in practice they are the same: only 7707 subdirectories can be created in a first-level directory. Those who like to play matryoshka dolls will appreciate it.

Another problem that most users face is that it is impossible to write a file larger than 4 GB to a FAT32 partition. The reason is that in FAT32 the file size is stored in a 32-bit field of the file's directory entry, and 2^32 (minus one, to be precise) bytes is one byte short of four gigabytes. It turns out that neither a movie in normal quality nor a DVD image can be written to a freshly purchased flash drive.

Copying large files is not so bad: when you try to do this, the error is at least immediately visible. In other situations, FAT32 acts as a time bomb. For example, you copied portable software onto a flash drive and at first you use it without problems. After a long time, the database of one of the programs (for example, accounting or email) becomes bloated, and... it simply stops updating. The file cannot be overwritten because it has reached the 4 GB limit.

A less obvious problem is that in FAT32 the modification time of a file or directory can be specified only to within two seconds. This is not sufficient for many cryptographic applications that use timestamps. The low precision of the date attribute is another reason why FAT32 is not considered a valid file system from a security perspective. However, its weaknesses can be used for your own purposes. For example, if you copy any files from an NTFS partition to a FAT32 volume, they will be stripped of all metadata, as well as inherited and specially set permissions. FAT simply doesn't support them.
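The limit follows directly from the width of the size field. A minimal check, in which the constant and function names are made up for illustration:

```python
MAX_FAT32_FILE_SIZE = 2**32 - 1  # largest value of a 32-bit unsigned size field


def fits_on_fat32(size_bytes: int) -> bool:
    """True if a file of this size can be described by a 32-bit size field."""
    return 0 <= size_bytes <= MAX_FAT32_FILE_SIZE


print(MAX_FAT32_FILE_SIZE)             # 4294967295
print(fits_on_fat32(4 * 1024**3 - 1))  # True: one byte short of 4 GiB
print(fits_on_fat32(4 * 1024**3))      # False: exactly 4 GiB no longer fits
print(fits_on_fat32(4_700_000_000))    # False: a typical single-layer DVD image
```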
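The two-second granularity comes from the 16-bit FAT time field, which leaves only five bits for seconds and therefore stores them divided by two. A small sketch of that packing (the function names are illustrative):

```python
def pack_fat_time(hour: int, minute: int, second: int) -> int:
    """Pack a time of day into the 16-bit FAT layout:
    5 bits for hours, 6 bits for minutes, 5 bits for seconds // 2."""
    return (hour << 11) | (minute << 5) | (second // 2)


def unpack_fat_time(value: int):
    """Inverse of pack_fat_time; odd seconds cannot be recovered."""
    return (value >> 11) & 0x1F, (value >> 5) & 0x3F, (value & 0x1F) * 2


print(unpack_fat_time(pack_fat_time(13, 37, 59)))  # (13, 37, 58)
print(pack_fat_time(13, 37, 58) == pack_fat_time(13, 37, 59))  # True
```

Two events one second apart can thus receive identical timestamps.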

exFAT

Unlike FAT12/16/32, exFAT was developed specifically for USB flash drives and large (≥ 32 GB) memory cards. Extended FAT eliminates the above-mentioned disadvantage of FAT32 - overwriting the same sectors with any change. As a 64-bit system, it has practically no meaningful limit on file size. Theoretically, a file can be 2^64 bytes (16 EB) in length, and cards of this size will not appear soon.

Another important difference of exFAT is support for access control lists (ACLs). This is no longer the same simpleton from the nineties, but the closed nature of the format hinders the adoption of exFAT. ExFAT support is fully and legally implemented only in Windows (starting from XP SP2) and OS X (starting from 10.6.5). On Linux and *BSD it is supported either with restrictions or not quite legally. Microsoft requires licensing for the use of exFAT, and there is a lot of legal controversy in this area.

Btrfs

Another prominent representative of B-tree based file systems is Btrfs. This file system appeared in 2007 and was initially created at Oracle with an eye to working with SSDs and RAIDs. For example, it can be dynamically scaled: creating new inodes directly on the running system or dividing a volume into subvolumes without allocating free space to them.

The copy-on-write mechanism implemented in Btrfs and full integration with the Device mapper kernel module allow you to take almost instantaneous snapshots through virtual block devices. Pre-compression (zlib or lzo) and deduplication speed up basic operations while also extending the lifetime of flash memory. This is especially noticeable when working with databases (2-4 times compression is achieved) and small files (they are written in orderly large blocks and can be stored directly in “leaves”).
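Why copy-on-write makes snapshots nearly free can be shown with a toy volume. This is an assumption-laden sketch, not Btrfs internals: the class `CowVolume` is hypothetical, and it copies a whole block map on every write, whereas a real file system copies only the affected path of its tree.

```python
class CowVolume:
    """Toy copy-on-write volume: writes never modify a shared block map."""

    def __init__(self):
        self.blocks = {}     # current view: block number -> data
        self.snapshots = {}  # snapshot name -> frozen block map

    def write(self, block_no, data):
        # Build a new map instead of changing the one snapshots may share.
        new_blocks = dict(self.blocks)
        new_blocks[block_no] = data
        self.blocks = new_blocks

    def snapshot(self, name):
        # Taking a snapshot only stores a reference; no data is copied.
        self.snapshots[name] = self.blocks


volume = CowVolume()
volume.write(0, "original")
volume.snapshot("before-update")
volume.write(0, "modified")
print(volume.blocks[0])                      # modified
print(volume.snapshots["before-update"][0])  # original
```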

Btrfs also supports full logging mode (data and metadata), volume checking without unmounting, and many other modern features. The Btrfs code is published under the GPL license. This file system has been supported as stable in Linux since kernel version 4.3.1.

Journals

Almost all more or less modern file systems (ext3/ext4, NTFS, HFSX, Btrfs and others) belong to the general group of journaled ones, since they keep records of changes in a separate log (journal) and are checked against it in the event of a failure during disk operations. However, the logging granularity and fault tolerance of these file systems differ.

Ext3 supports three logging modes: writeback, ordered, and full logging. The first mode records only general changes (metadata), asynchronously with respect to changes in the data itself. In the second mode the same metadata is recorded, but strictly before any changes are made. The third mode is full logging (changes both to metadata and to the files themselves).

Only the last option ensures data integrity. The remaining two only speed up the detection of errors during the scan and guarantee restoration of the integrity of the file system itself, but not the contents of the files.
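The difference between metadata-only and full logging can be illustrated with a toy recovery. The model is deliberately crude and all names in it are hypothetical: the "disk" is a dictionary, the crash is assumed to happen right after the journal is committed and before anything is written in place, and the ordering guarantees of the ordered mode are not modeled.

```python
METADATA_KEYS = {"size"}  # which fields of the toy file count as metadata


def journal_records(update, mode):
    """Records that reach the journal: everything in 'full' mode,
    only metadata fields otherwise."""
    if mode == "full":
        return list(update.items())
    return [(key, value) for key, value in update.items() if key in METADATA_KEYS]


def recover_after_crash(disk, update, mode):
    """Replay the journal over the state the disk had before the update."""
    recovered = dict(disk)
    for key, value in journal_records(update, mode):
        recovered[key] = value
    return recovered


disk = {"size": 5, "content": "hello"}
update = {"size": 10, "content": "helloworld"}
print(recover_after_crash(disk, update, "full"))
# {'size': 10, 'content': 'helloworld'}
print(recover_after_crash(disk, update, "metadata"))
# {'size': 10, 'content': 'hello'}
```

In the second case the file system structures are restored, but the contents of the file are not.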

Journaling in NTFS is similar to the second logging mode in ext3. Only changes in metadata are recorded in the log, and the data itself may be lost in the event of a failure. This logging method in NTFS was not intended as a way to achieve maximum reliability, but only as a compromise between performance and fault tolerance. This is why people who are used to working with fully journaled systems consider NTFS journaling to be pseudo-journaling.

The approach implemented in NTFS is in some ways even better than the default in ext3. NTFS additionally creates periodic checkpoints to ensure that all previously deferred disk operations are completed. These checkpoints have nothing to do with the restore points in \System Volume Information\. They are just service log entries.

Practice shows that such partial NTFS journaling is in most cases sufficient for trouble-free operation. After all, even in a sudden power outage, disk devices do not lose power instantly. The power supply and the numerous capacitors in the drives themselves provide just the minimum amount of energy needed to complete the current write operation. For modern SSDs, given their speed and efficiency, the same amount of energy is usually enough to perform the pending operations. An attempt to switch to full logging would significantly reduce the speed of most operations.

Connecting third-party file systems in Windows

The use of file systems is limited by their support at the OS level. For example, Windows does not understand ext2/3/4 and HFS+, but sometimes it is necessary to use them. This can be done by adding the appropriate driver.

WARNING

Most drivers and plugins for supporting third-party file systems have their limitations and do not always work stably. They may conflict with other drivers, antiviruses, and virtualization programs.

An open driver for reading and writing ext2/3 partitions with partial support for ext4. The latest version supports extents and partitions up to 16 TB. LVM, access control lists, and extended attributes are not supported.

There is a free plugin for Total Commander. Supports reading ext2/3/4 partitions.

coLinux is an open and free port of the Linux kernel. Together with a 32-bit driver, it allows you to run Linux on Windows from 2000 to 7 without using virtualization technologies. Only 32-bit versions are supported; development of a 64-bit modification was canceled. coLinux also allows you to organize access to ext2/3/4 partitions from Windows. Support for the project was suspended in 2014.

Windows 10 may already have built-in support for Linux-specific file systems, it's just hidden. These thoughts are suggested by the kernel-level driver Lxcore.sys and the LxssManager service, which is loaded as a library by the Svchost.exe process. For more information about this, see Alex Ionescu’s report “The Linux Kernel Hidden Inside Windows 10,” which he gave at Black Hat 2016.

ExtFS for Windows is a paid driver produced by Paragon. It runs on Windows 7 to 10 and supports read/write access to ext2/3/4 volumes. Provides almost complete support for ext4 on Windows.

HFS+ for Windows 10 is another proprietary driver produced by Paragon Software. Despite the name, it works in all versions of Windows starting from XP. Provides full access to HFS+/HFSX file systems on disks with any layout (MBR/GPT).

WinBtrfs is an early-stage Btrfs driver for Windows. Already in version 0.6 it supports both read and write access to Btrfs volumes. It can handle hard and symbolic links and supports alternative data streams, ACLs, two types of compression and asynchronous read/write mode. So far, WinBtrfs does not provide mkfs.btrfs, btrfs-balance and other utilities for maintaining this file system.

Capabilities and limitations of file systems: summary table

File system	Maximum volume size	Maximum file size	File name length	Full path length (including the path from the root)	Maximum number of files and/or directories	Timestamp precision for files/directories	Access rights	Hard links	Symbolic links	Snapshots	Background data compression	Background data encryption	Data deduplication
FAT16	2 GB in 512 byte sectors or 4 GB in 64 KB clusters	2 GB	255 bytes with LFN	—	—	—	—	—	—	—	—	—	—
FAT32	8 TB sectors of 2 KB each	4 GB (2^32 - 1 byte)	255 bytes with LFN	up to 32 subdirectories with CDS	65460	10 ms (create) / 2 s (modify)	No	No	No	No	No	No	No
exFAT	≈ 128 PB (2^32-1 clusters of 2^25-1 bytes) theoretical / 512 TB due to third-party restrictions	16 EB (2^64 - 1 byte)			2796202 in the catalog	10 ms	ACL	No	No	No	No	No	No
NTFS	256 TB in 64 KB clusters or 16 TB in 4 KB clusters	16 TB (Win 7) / 256 TB (Win 8)	255 Unicode characters (UTF-16)	32,760 Unicode characters, up to a maximum of 255 characters per element	2^32-1	100 ns	ACL	Yes	Yes	Yes	Yes	Yes	Yes
HFS+	8 EB (2^63 bytes)	8 EB	255 Unicode characters (UTF-16)	not limited separately	2^32-1	1 s	Unix, ACL	Yes	Yes	No	Yes	Yes	No
APFS	8 EB (2^63 bytes)	8 EB	255 Unicode characters (UTF-16)	not limited separately	2^63	1 ns	Unix, ACL	Yes	Yes	Yes	Yes	Yes	Yes
Ext3	32 TB (theoretically) / 16 TB in 4 KB clusters (due to limitations of e2fs programs)	2 TB (theoretically) / 16 GB for older programs	255 Unicode characters (UTF-16)	not limited separately	—	1 s	Unix, ACL	Yes	Yes	No	No	No	No
Ext4	1 EB (theoretically) / 16 TB in 4 KB clusters (due to limitations of e2fs programs)	16 TB	255 Unicode characters (UTF-16)	not limited separately	4 billion	1 ns	POSIX	Yes	Yes	No	No	Yes	No
F2FS	16 TB	3.94 TB	255 bytes	not limited separately	—	1 ns	POSIX, ACL	Yes	Yes	No	No	Yes	No
BTRFS	16 EB (2^64 - 1 byte)	16 EB	255 ASCII characters	2^17 bytes	—	1 ns	POSIX, ACL	Yes	Yes	Yes	Yes	Yes	Yes

EXT3 file system

Unlike EXT2, EXT3 is a journaled file system, i.e. it will not end up in an inconsistent state after failures. At the same time, it is fully compatible with EXT2.

Developed by Red Hat

Currently the main one for LINUX.

The Ext3 driver keeps complete exact copies of the blocks being modified (1KB, 2KB, or 4KB) in memory until the operation completes. This may seem wasteful. Complete blocks contain not only changed data, but also unmodified data.

This approach is called "physical logging", which reflects the use of "physical blocks" as the basic unit of logging. The approach of storing only the changed bytes rather than entire blocks is called "logical logging" (used by XFS). Because ext3 uses "physical logging", the journal in ext3 is larger than in XFS. By using full blocks, both the ext3 driver and the logging subsystem avoid the complexities that arise with "logical logging".
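The difference in journal size can be illustrated by comparing the two kinds of records for the same small change. This is a simplified sketch: the function names are hypothetical, both blocks are assumed to have equal length, and logical logging is modeled here as a byte-level diff, which is not the actual record format of any file system.

```python
BLOCK_SIZE = 4096


def physical_record(new_block: bytes) -> bytes:
    """Physical logging: the journal receives a copy of the whole block."""
    return bytes(new_block)


def logical_record(old_block: bytes, new_block: bytes):
    """Logical logging (modeled as a diff): only runs of changed bytes,
    each stored as (offset, new_bytes)."""
    changes = []
    i = 0
    while i < len(new_block):
        if old_block[i] != new_block[i]:
            start = i
            while i < len(new_block) and old_block[i] != new_block[i]:
                i += 1
            changes.append((start, new_block[start:i]))
        else:
            i += 1
    return changes


old = bytes(BLOCK_SIZE)
modified = bytearray(old)
modified[100:104] = b"\x01\x02\x03\x04"  # change four bytes inside the block
new = bytes(modified)

print(len(physical_record(new)))  # 4096 bytes go to the journal
print(logical_record(old, new))   # [(100, b'\x01\x02\x03\x04')]
```

The physical record is larger but trivial to replay: the block is simply copied back, with no need to interpret the change.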

Logging types supported by Ext3, which can be enabled from the /etc/fstab file:

o data=journal (full data journaling mode) - all new data is first written to the journal and only after that is transferred to its permanent location. In the event of a crash, the journal can be re-read, bringing the data and metadata back to a consistent state.
The slowest, but the most reliable.

o data=ordered - only changes to the file system metadata are recorded, but logically the metadata and data blocks are grouped into a single unit called a transaction. Before new metadata is written to disk, the related data blocks are written first. This ext3 logging mode is the default.
When data is appended to the end of a file, data=ordered mode guarantees integrity (as does full data journaling mode). However, if data is written to a file over existing data, "original" blocks may end up mixed with modified ones. This is because data=ordered does not track writes in which a new block overwrites an existing one and causes no metadata change.

o data=writeback (metadata only) - only changes to file system metadata are recorded. The fastest logging method. This type of journaling is what you see with XFS, JFS, and ReiserFS file systems.
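As an illustration of how the mode is chosen, the sketch below extracts the data= option from an /etc/fstab line. The helper name `ext3_data_mode` and the device names in the sample lines are made up; falling back to "ordered" follows the default named in the list above.

```python
def ext3_data_mode(fstab_line: str) -> str:
    """Return the journaling mode requested by a single fstab line."""
    fields = fstab_line.split()
    # The fourth fstab field holds the comma-separated mount options.
    options = fields[3].split(",") if len(fields) > 3 else []
    for option in options:
        if option.startswith("data="):
            return option[len("data="):]
    return "ordered"  # no explicit option: the default mode


print(ext3_data_mode("/dev/sda2 /home ext3 defaults,data=journal 0 2"))  # journal
print(ext3_data_mode("/dev/sda3 /tmp ext3 noatime,data=writeback 0 2"))  # writeback
print(ext3_data_mode("/dev/sda1 / ext3 defaults 0 1"))                   # ordered
```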

3.3.3 File system XFS

XFS is a journaling file system developed by Silicon Graphics, but now released as open source.

Official information at http://oss.sgi.com/projects/xfs/

XFS was created in the early 90s (1992-1993) by Silicon Graphics (now SGI) for multimedia computers running the Irix OS. The file system was aimed at very large files and file systems. A distinctive feature of this file system is the design of its journal: part of the file system's own metadata is written to the journal in such a way that the entire recovery process is reduced to copying this data from the journal to the file system. The log size is set when the file system is created; it must be at least 32 megabytes, and more is not needed, since it is hard to accumulate that many unclosed transactions.

Some features:

o Works more efficiently with large files.

o Has the ability to move the log to another disk to improve performance.

o Saves cache data only when memory is full, and not periodically like the others.

o Only meta data is logged.

o B+ trees are used.

o Logical logging is used

3.3.4 RFS file system

RFS (ReiserFS) - a journaling file system developed by Namesys.

Official information on ReiserFS

Some features:

o Works more efficiently with a large number of small files, in terms of both performance and efficient use of disk space.

o Uses a specially optimized B* balanced tree (an improved version of the B+ tree)

o Dynamically allocates i-nodes instead of a static set of them created when creating a "traditional" file system.

o Dynamic block sizes.

3.3.5 JFS file system

JFS (Journaled File System) - A journaling file system developed by IBM for the AIX operating system, but now released as open source.

Official information on Journaled File System Technology for Linux

Some features:

o JFS logs follow the classic database transaction model

o Only meta data is logged

o The log size is no more than 32 megabytes.

o Asynchronous logging mode - performed when I/O traffic decreases

o Logical logging is used.

3.4 Comparison table of some modern file systems

	NTFS	EXT4	RFS	XFS	JFS
Storing file information	MFT	inode	inode	inode	inode
Maximum partition size	16 EB (2^60)	1 EB	4 gigablocks (since the blocks are dynamic)	16 EB	32 PB
Block sizes	from 512 bytes to 64 KB	1 KB - 4 KB	Up to 64 KB (currently fixed 4 KB)	from 512 bytes to 64 KB	512/1024/ 2048/4096 bytes
Maximum number of blocks	2^48	2^32			2^32
Maximum file size	2^64	16 TB (for 4 KB blocks)	8 TB	8 EB	4 PB (2^50)
Maximum file name length
Logging	Yes	Yes	Yes	Yes	Yes
Free block management		No	Bitmap based	B-trees indexed by offset and size	Tree+ Binary Buddy
Extents for free space		No	No	Yes	No
B-trees for directory items	Yes	No	As a subtree of the main file system tree	Yes	Yes
B-trees for addressing blocks of files		No	Inside the main file system tree	Yes	Yes
Extents for addressing blocks of files		No	Yes (from version 4)	Yes	Yes
Data inside inode (small files)		No	Yes	Yes	No
Symbolic link data within inode		No	Yes	Yes	Yes
Directory entries inside inodes (small directories)		No	Yes	Yes	Yes
Dynamic inode/MFT allocation	Yes	No	Yes	Yes	Yes
Structures for managing dynamically allocated inodes		No	General B*tree	B+tree	B+tree with contiguous inode regions

Sparse file support	Yes	No	Yes	Yes	Yes