Lesson topic: “Files and file structures.” What is a file structure

A file is information that is stored on a computer storage medium under a specific name.

Files can store programs, texts, and data.

Files are uniquely identified by names. Users give files symbolic names. In some operating systems, such as Microsoft's, a file name consists of the name proper, given by the user, and an extension. The name must respect the operating system's restrictions on both the characters allowed and the length of the name. For a long time these limits were very narrow: the file system of MS-DOS, for example, restricted names to the 8.3 scheme (8 characters for the name and 3 for the extension). Modern file systems typically support long symbolic file names; Windows operating systems allow names of up to 255 characters. The extension is separated from the name by a dot (“.”).

The extension indicates the file type:

exe, com – executable files, i.e. programs written in one of the programming languages;

doc – files created in the Word text editor;

xls – files created in the Excel spreadsheet application;

mdb – files of the Access database management system (DBMS).

Typically, for ease of use, files are combined into directories (folders).

For the operating system to access a file, you must specify its full file name, which consists of the name of the external device (usually a disk), a sequence of folders, and the file name. For example,

C:\User\Letter.doc is the full name of the file Letter.doc located on drive C: in the folder User. The name of the external device followed by the sequence of folders is called the full path to the file.

Some operations (searching for, copying, or deleting files) accept file name patterns.

A pattern (template) is a generic name for a group of files that contains the characters * or ?.

The symbol * means that any valid characters can appear in its place, from the position where it occurs to the end of the name.

The symbol ? means that the given position can contain any single valid character.

For example,

the pattern *.doc denotes all files with the extension .doc,

the pattern Letter?.doc denotes the files Letter1.doc, Letter3.doc, LetterZ.doc, LetterA.doc, etc.
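The two wildcard characters can be tried out with a short sketch. It uses `fnmatchcase` from Python's standard `fnmatch` module, whose `*` and `?` follow the same idea as the patterns described above; the file names in the list are invented purely for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing, used only for illustration.
names = ["Letter1.doc", "Letter3.doc", "LetterZ.doc",
         "Letter10.doc", "Report.doc", "Letter1.txt"]

def select(pattern, candidates):
    """Return the names matching a wildcard pattern.

    '*' stands for any run of characters, '?' for exactly one character.
    """
    return [name for name in candidates if fnmatchcase(name, pattern)]

all_docs = select("*.doc", names)           # every file with the extension .doc
single_char = select("Letter?.doc", names)  # exactly one character after "Letter"

print(all_docs)
print(single_char)
```

Note that Letter10.doc matches *.doc but not Letter?.doc, because ? stands for one character only.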

To store files on disks and provide access to them, modern disk operating systems create file systems. The principle of organization of many file systems is tabular.

The term file system has two meanings. It denotes, first, a specific way of organizing files, directories, etc., and second, a specific set of files, directories, etc., organized in this way.

Data about where on the disk a particular file is recorded is stored in the system area of the disk, in special file allocation tables.

Microsoft file systems.

Early versions of Microsoft's Windows operating system used the file allocation table, FAT (File Allocation Table).

As a result of formatting, tracks (concentric circles) are formed on the disk, each of which contains a certain number of sectors. A sector is a section of a track that stores the minimum piece of information that can be read from or written to a disk.

To organize access to files recorded on magnetic disks, the OS creates a list of the sectors allocated to each file. Typically, disk space is allocated to files in blocks of several sectors, called clusters. A cluster is the smallest unit for addressing data (determining its location) on the disk.

FAT consists of cells that store cluster numbers, and the main difference between FAT variants is the size of these cells in bits. Windows 95 uses FAT16, in which 16 bits are allocated for the cluster address, so the number of clusters is at most 2^16 = 65,536. If one cluster equals one sector (512 bytes), the maximum disk capacity is 32 MB. With the advent of high-capacity disks, clusters came to consist of several sectors: 2, 4, 8, etc.

This gives rise to the problem of wasted disk space. A cluster cannot hold data from more than one file, so a 1 KB file occupies a whole 8 KB or 16 KB cluster, depending on the disk size. Windows 95 OSR2 introduced the FAT32 (32-bit) file allocation table format, which increased the number of clusters to 2^32 = 4,294,967,296 and made 4 KB clusters possible.
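The arithmetic behind these figures can be checked with a minimal sketch. The function names are invented for illustration, and the model is deliberately simplified: it counts every cell value as a usable cluster, exactly as the text does.

```python
import math

SECTOR_SIZE = 512          # bytes per sector, as stated in the text
KB, MB = 1024, 1024 ** 2

def max_capacity(address_bits, sectors_per_cluster):
    """Largest addressable volume in bytes for a given cell width and cluster size."""
    return (2 ** address_bits) * sectors_per_cluster * SECTOR_SIZE

def slack(file_size, cluster_size):
    """Bytes allocated to a file but left unused in its last cluster."""
    clusters = math.ceil(file_size / cluster_size)
    return clusters * cluster_size - file_size

print(max_capacity(16, 1) // MB)     # FAT16, one sector per cluster: 32 (MB)
print(max_capacity(16, 16) // MB)    # FAT16, 16 sectors (8 KB) per cluster: 512 (MB)
print(slack(1 * KB, 8 * KB) // KB)   # 1 KB file in an 8 KB cluster wastes 7 KB
print(slack(1 * KB, 16 * KB) // KB)  # 1 KB file in a 16 KB cluster wastes 15 KB
```

Larger clusters raise the maximum volume size but also raise the space lost per small file, which is the trade-off the text describes.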

Each file is associated with the full file name, file creation date, file attributes, and file length.

In FAT, the entry that describes a file includes:

    attribute byte;

    modification time;

    modification date;

    number of the first cluster of the file;

    file size.

When a file is written to disk, the OS records the number of the first cluster allocated to the file in the directory in which the file is created. Then, in the FAT element representing this cluster, the OS writes the number of the next cluster allocated to the file, and so on. Thus, by finding the file in the directory and following the pointers in the FAT, the OS can read the file's clusters in the proper order, cluster by cluster. This is why a file cannot be restored if the FAT is destroyed, and why the FAT is stored on disk in two copies.
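The chain of pointers can be illustrated with a toy model. This is only a sketch: the table contents, the file names, and the end-of-chain marker value are assumptions made for the example, not data from a real disk.

```python
END_OF_CHAIN = 0xFFFF  # assumed end-of-file marker for this toy table

# Toy FAT: key = cluster number, value = number of the next cluster of the same file.
fat = {2: 5, 5: 6, 6: 9, 9: END_OF_CHAIN,
       3: 4, 4: END_OF_CHAIN}

# Toy directory: file name -> number of the first cluster of the file.
directory = {"LETTER.DOC": 2, "NOTES.TXT": 3}

def cluster_chain(name):
    """Follow the FAT pointers from the first cluster to the end-of-chain marker."""
    chain = []
    cluster = directory[name]
    while cluster != END_OF_CHAIN:
        chain.append(cluster)
        cluster = fat[cluster]
    return chain

print(cluster_chain("LETTER.DOC"))  # [2, 5, 6, 9]
print(cluster_chain("NOTES.TXT"))   # [3, 4]
```

If the `fat` dictionary were lost, the directory would still give the first cluster of each file, but nothing would say where the file continues.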

The FAT16 file system is supported by all Microsoft operating systems, some Unix operating systems, and OS/2 operating systems.

Windows NT Workstation, Windows 2000 Professional, and Windows XP support the NTFS file system.

The NTFS file system is represented as the MFT (Master File Table), which has the following form:

The maximum table length is 1500 bytes.

The first 16 records are service records; they store information that describes the MFT table itself (analogous to the FAT system area).

Starting from the 17th entry there are descriptions of files and folders:

    standard information – date and time of file creation, its size;

    file name – stored in two versions: long (up to 255 characters) and short (8 + 3), the latter used when the file is accessed from MS-DOS;

    security descriptor – indicates who has which rights to the file or folder;

    data – the contents of the file itself. If the file is short, all of its data is stored here. If the file is large, part of it is stored in this field of the MFT and the rest in another area of the disk, which is referenced from the MFT.

The NTFS file system supports a high level of security (each file can have a protection descriptor for copying, reading, writing, modifying, etc.), and different rights can be set for different user groups.

Operating system: supported file systems

Microsoft MS-DOS: FAT16
Microsoft Windows 95: FAT16
Microsoft Windows 95 OSR2: FAT16, FAT32
Microsoft Windows 98: FAT16, FAT32
Microsoft Windows NT: NTFS, FAT16
Microsoft Windows 2000: NTFS, FAT16, FAT32
Microsoft Windows XP: NTFS, FAT16, FAT32

Windows NT, Windows 2000 Professional, and Windows XP use the FAT file system on floppy disks. On hard drives they support two file systems: FAT and NTFS.

Files and file structures

Logical names of external memory devices
Each computer can have multiple external memory devices connected. The main external memory device of a PC is the hard drive. It is usually divided into several logical partitions.

Having multiple logical partitions on one hard drive provides the user with the following benefits:


  • You can store the operating system on one logical partition and data on another, allowing you to reinstall the operating system without affecting the data;

  • You can install different operating systems on one hard drive in different logical partitions;

  • Maintenance of one logical partition does not affect other partitions.

External memory devices and logical disk partitions have logical names.

In Windows, logical names consist of a Latin letter followed by a colon:


  • for floppy disk drives – A: and B:

  • for hard drives and their logical partitions – C:, D:, E:, etc.

  • for optical drives – the names following those of the hard drives (for example, F:)

  • for flash memory connected to the computer – the next free name (for example, G:)
Linux uses different rules for naming disks:

  • Logical partitions belonging to the first hard drive - names hda1, hda2, etc.;

  • Logical partitions belonging to the second hard drive are named hdb1, hdb2, etc.
All programs and data are stored in the external memory of the computer in the form of files.
A file is a named area of external memory.
A file system is the part of the OS that determines how files are organized, stored, and named on storage media.
A file is characterized by a set of parameters (name, size, creation date, last modified date) and attributes used by the operating system to process it (archive, system, hidden, read-only). The file size is expressed in bytes.

A file name, as a rule, consists of two parts separated by a dot: the name proper and the extension.

The name is given by the user. The extension is usually set automatically by the program when the file is saved. The extension lets the user identify the file type and lets the operating system open the file with the appropriate application.
In Windows, the following characters are prohibited in file names: \, /, :, *, ?, ", <, >, |. In Linux, all of these characters except / are valid.

The Linux operating system, unlike Windows, distinguishes between lowercase and uppercase letters in the file name: for example, FILE.txt, file.txt and FiLe.txt are three different files in Linux.
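These naming rules can be expressed as a small check. The sketch below is a simplification under stated assumptions: the function names are invented, the Windows set also includes < and >, which Windows likewise rejects, and other restrictions (reserved device names, length limits) are ignored.

```python
# Characters that Windows does not allow in a file name.
WINDOWS_FORBIDDEN = set('\\/:*?"<>|')

def valid_on_windows(name):
    """True if the name is non-empty and contains no forbidden character."""
    return bool(name) and not (set(name) & WINDOWS_FORBIDDEN)

def valid_on_linux(name):
    """Linux forbids only the path separator '/' (and the NUL byte) inside a name."""
    return bool(name) and "/" not in name and "\0" not in name

def distinct_names(names, case_sensitive):
    """Count how many different files the given names refer to."""
    return len(set(names) if case_sensitive else {n.lower() for n in names})

variants = ["FILE.txt", "file.txt", "FiLe.txt"]
print(valid_on_windows("report?.txt"), valid_on_linux("report?.txt"))
print(distinct_names(variants, case_sensitive=True))   # Linux: three files
print(distinct_names(variants, case_sensitive=False))  # Windows: one file
```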

The most common file types and extensions:


File type: examples of extensions

System file: drv, sys
Text file: txt, rtf, doc, docx, odt
Graphic file: bmp, gif, jpg, tif, png, psd
Web page: htm, html
Sound file: wav, mp3, midi, kar, ogg
Video file: avi, mpeg
Archive: zip, rar
Spreadsheet: xls, ods
Program code (text) in programming languages: bas, pas

The following file types are distinguished in Linux OS:


  • regular files – files with programs and data

  • directories – files containing information about the contents of directories

  • links – files containing links to other files

  • special device files – files used to represent the physical devices of a computer (hard and optical drives, printer, sound speakers, etc.)

Directories
Each computer or storage medium may contain a large number of files. To make information easier to find, files are combined by certain characteristics into groups called directories or folders.
A directory also has its own name. A directory can itself be part of another, enclosing directory. Each directory can contain many files and subdirectories.

A directory is a named collection of files and subdirectories.
The top-level directory is called the root directory.

In Windows, every storage medium has a root directory, which the operating system creates without user intervention. A root directory is designated by adding the sign “\” (backslash) to the logical name of the corresponding external memory device: A:\, C:\, B:\, etc.

In Linux, the directories of hard drives and their logical partitions do not belong to the top level of the file system (they are not root directories). They are “mounted” into the mnt directory. Other external storage devices (floppy, optical, and flash drives) are “mounted” into the media directory. The mnt and media directories, in turn, are “mounted” into a single root directory, denoted by the sign “/” (forward slash).

The file structure of a disk is the collection of files on the disk and the relationships between them.

File structures can be simple or multi-level (hierarchical).

Simple file structures can be used for disks with a small number of files (up to several dozen). In this case, the disk's table of contents is a linear sequence of file names.
Hierarchical file structures are used to store large numbers of files (hundreds or thousands). A hierarchy is an arrangement of the parts (elements) of a whole in order from highest to lowest.

A graphical representation of a hierarchical file structure is called a tree.

To access a file stored on a disk, you can specify the path to the file: the names of all directories from the root to the one in which the file is located.

The path to the file followed by the file name makes up the full file name.
Example of a full file name in Windows OS:

E:\images\photos\Trip.jpeg
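The parts of this full file name can be picked apart with a brief sketch. It uses `PureWindowsPath` from Python's standard `pathlib` module, which handles Windows-style names on any platform without touching the disk.

```python
from pathlib import PureWindowsPath

full_name = PureWindowsPath(r"E:\images\photos\Trip.jpeg")

print(full_name.drive)   # E:                 logical name of the device
print(full_name.parent)  # E:\images\photos   path to the file
print(full_name.name)    # Trip.jpeg          file name
print(full_name.stem)    # Trip               name without the extension
print(full_name.suffix)  # .jpeg              extension
print(full_name.parts)   # ('E:\\', 'images', 'photos', 'Trip.jpeg')
```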

One of the OS components is the file system - the main storage of system and user information. All modern operating systems work with one or more file systems, for example, FAT (File Allocation Table), NTFS (NT File System), HPFS (High Performance File System), NFS (Network File System), AFS (Andrew File System), Internet File System.

A file system is a part of the operating system, the purpose of which is to provide the user with a convenient interface when working with data stored in external memory and to allow files to be shared among multiple users and processes.

In a broad sense, the concept of "file system" includes:

The collection of all files on the disk;

Sets of data structures used to manage files, such as file directories, file descriptors, free and used disk space allocation tables;

A set of system software tools that implement file management, in particular: creation, destruction, reading, writing, naming, searching and other operations on files.

The file system is usually used both when loading the OS after turning on the computer, and during operation. The file system performs the following main functions:

Determines possible ways to organize files and file structure on the media;

Implements methods for accessing file contents and provides tools for working with files and file structure. In this case, access to data can be organized by the file system both by name and by address (number of sector, surface and track of the media);

Monitors free space on storage media.

When an application program accesses a file, it has no idea how the information in a particular file is located, nor what type of physical media (CD, hard disk, or flash memory unit) it is recorded on. All the program knows is the file name, its size and attributes. It receives this data from the file system driver. It is the file system that determines where and how the file will be written on physical media (for example, a hard drive).

From the operating system's point of view, the entire disk is a set of clusters (memory areas) of 512 bytes or more each. File system drivers organize clusters into files and directories (a directory is in fact a file containing a list of the files in it). These same drivers keep track of which clusters are currently in use, which are free, and which are marked as faulty. To understand clearly how data is stored on disks and how the OS provides access to it, one needs at least a general understanding of the logical structure of the disk.


3.1.5 Disk logical structure

In order for a computer to store, read and write information, the hard drive must first be partitioned. Partitions are created on it using appropriate programs - this is called “partitioning the hard drive”. Without this partitioning, it will not be possible to install the operating system on the hard drive (although Windows XP and 2000 can be installed on an unpartitioned disk, they do this partitioning themselves during the installation process).

The hard drive can be divided into several partitions, each of which will be used independently. What is this for? One disk can contain several different operating systems located on different partitions. The internal structure of a partition allocated to any OS is completely determined by that operating system.

In addition, there are other reasons for partitioning a disk, for example:

The possibility of using disks with a capacity greater than 32 MB in MS-DOS;

If a logical disk is damaged, only the information that was on that disk is lost;

Reorganizing and unloading a small disk is easier and faster than a large one;

Each user can be assigned their own logical drive.

The operation of preparing a disk for use is called formatting, or initialization. All available disk space is divided into sides, tracks and sectors, with tracks and sides numbered starting from zero, and sectors starting from one. A set of tracks located at the same distance from the axis of a disk or a package of disks is called a cylinder. Thus, the physical address of the sector is determined by the following coordinates: track number (cylinder - C), disk side number (head - H), sector number - R, i.e. CHR.
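The three coordinates can be related to a single running sector number, as in the sketch below. The text itself defines only the coordinates; the linear numbering shown here is the conventional logical block address, and the geometry values (16 heads, 63 sectors per track) are assumptions chosen for the example.

```python
HEADS = 16               # assumed number of sides (heads) per cylinder
SECTORS_PER_TRACK = 63   # assumed number of sectors on each track

def chs_to_linear(cylinder, head, sector):
    """Convert (C, H, R) to a linear sector number counted from zero.

    Cylinders and heads are numbered from 0, sectors from 1.
    """
    if not (0 <= head < HEADS and 1 <= sector <= SECTORS_PER_TRACK):
        raise ValueError("coordinates outside the assumed geometry")
    return (cylinder * HEADS + head) * SECTORS_PER_TRACK + (sector - 1)

print(chs_to_linear(0, 0, 1))  # 0     the very first sector of the disk
print(chs_to_linear(0, 1, 1))  # 63    first sector on the next side
print(chs_to_linear(1, 0, 1))  # 1008  first sector of the next cylinder
```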

The very first sector of the hard disk (C=0, H=0, R=1) contains the Master Boot Record (MBR). The record does not occupy the entire sector, only its initial part. The Master Boot Record is a non-system boot loader program.

At the end of the first sector of the hard drive is the disk partition table (Partition Table). The table contains four rows and therefore describes at most four partitions. Each row describes one partition:

1) whether the partition is active;

2) the number of the sector at which the partition begins;

3) the number of the sector at which the partition ends;

4) the partition size in sectors;

5) the operating system code, i.e. which OS the partition belongs to.

A partition is called active if it contains the operating system boot program. The first byte of a partition table entry is the partition activity flag (0 – inactive, 128 (80H) – active). It is used to determine whether the partition is a system (bootable) partition, so that the operating system is loaded from it when the computer starts. Only one partition can be active. Small programs called boot managers may be located in the first sectors of the disk. They interactively ask the user which partition to boot from and adjust the partition activity flags accordingly. Since the Partition Table has four rows, the disk can contain up to four primary partitions belonging to different operating systems.
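Reading the partition table can be sketched with Python's standard `struct` module. The layout used here is the classic MBR one (four 16-byte entries starting at byte 446 of the 512-byte sector); the sample sector is built in memory, and its partition sizes and OS codes are invented for illustration.

```python
import struct

TABLE_OFFSET = 446          # the table sits at the end of the first sector
ENTRY_FORMAT = "<B3sB3sII"  # flag, start CHS, OS code, end CHS, first sector, size
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 16 bytes
ACTIVE = 0x80               # 128 (80H) marks the active partition

def parse_partition_table(sector):
    """Return the non-empty entries of the partition table in a 512-byte sector."""
    partitions = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        flag, _chs_start, os_code, _chs_end, first, size = struct.unpack(
            ENTRY_FORMAT, sector[start:start + ENTRY_SIZE])
        if os_code == 0:    # code 0 means the row is unused
            continue
        partitions.append({"active": flag == ACTIVE, "os_code": os_code,
                           "first_sector": first, "size_in_sectors": size})
    return partitions

# Build a sample sector: one active partition and one inactive partition.
sector = bytearray(512)
sector[446:462] = struct.pack(ENTRY_FORMAT, ACTIVE, bytes(3), 0x06, bytes(3), 63, 204800)
sector[462:478] = struct.pack(ENTRY_FORMAT, 0, bytes(3), 0x83, bytes(3), 204863, 409600)
sector[510:512] = b"\x55\xaa"  # signature that closes the sector

partitions = parse_partition_table(bytes(sector))
for p in partitions:
    print(p)
```

The entry does not store an explicit "last sector" number in this layout; it follows from the first sector plus the size.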

An example of the logical structure of a hard disk consisting of three partitions, two of which belong to DOS and one belongs to UNIX, is shown in Figure 3.2a.

Each active partition has its own boot record - a program that loads a given OS.

In practice, a disk is most often divided into two partitions. The partition sizes, and whether a partition is declared active, are set by the user when preparing the hard drive for use. This is done with special programs: in DOS the program is called FDISK, and in Windows-XX versions it is called Disk Administrator.

In DOS, the primary partition (Primary Partition) is the partition that contains the operating system boot loader and the OS itself. Thus, the primary partition is the active partition, used as the logical drive named C:.

The Windows operating system (specifically Windows 2000) changed the terminology: the active partition is called the system partition, and the boot partition is the logical disk that contains the Windows system files. The boot logical drive can coincide with the system partition, but it can also be located on a different partition of the same hard drive or on a different hard drive.

The extended partition (Extended Partition) can be divided into several logical drives with names from D: to Z:.

Figure 3.2b shows the logical structure of a hard drive, which has only two partitions and four logical drives.

File systems. Types of file systems. File operations. Directories. Operations with directories.

A file is a named area of external memory that can be written to and read from.

Main purposes of using the file.

    Long-term and reliable storage of information. Long-term storage is achieved by using storage devices that do not depend on power, and high reliability is provided by file access protection and by organizing the OS program code so that hardware failures, in most cases, do not destroy the information stored in files.

    Sharing information. Files provide a natural and easy way to share information between applications and users, because they have human-readable symbolic names and because the stored information and the file location are persistent. The user must have convenient tools for working with files, including directories that combine files into groups, tools for searching for files by their characteristics, and a set of commands for creating, modifying, and deleting files. A file can be created by one user and then used by a completely different user, and the file's creator or the administrator can define the access rights of other users. These goals are implemented in the OS by the file system.

File system (FS) is a part of the operating system that includes:

    the collection of all files on the disk;

    sets of data structures used to manage files, such as file directories, file descriptors, free and used disk space allocation tables;

    a set of system software tools that implement various operations on files, such as creating, destroying, reading, writing, naming and searching files.

Thus, the file system plays the role of an intermediate layer that screens out all the complexities of the physical organization of long-term data storage, and creates a simpler logical model for this storage for programs, as well as providing them with a set of easy-to-use commands for manipulating files.

The following file systems are widely known:

    the file system of the MS-DOS operating system, which is based on the file allocation table, FAT (File Allocation Table).

The table contains information about the location of all files (each file is divided into clusters; the clusters of one file are not necessarily adjacent, depending on the available disk space). The MS-DOS file system has significant limitations and disadvantages: for example, 12 bytes are allocated for the file name, and working with a large hard drive leads to significant file fragmentation;

The main functions in such a FS are aimed at solving the following tasks:

    file naming;

    application programming interface;

    mapping the logical model of the file system onto the physical organization of the data storage;

    resilience of the file system to power failures and to hardware and software errors.

    the file system of OS/2, called HPFS (High-Performance File System).

Allows file names of up to 254 characters. Files written to disk have minimal fragmentation. Can work with files written in MS-DOS;

A new task is added to those listed above: sharing a file among multiple processes. The file in this case is a shared resource, which means that the file system must solve the whole range of problems associated with such resources. In particular, the FS must provide means for locking a file and its parts, preventing races, eliminating deadlocks, reconciling copies, etc.

In multi-user systems, another task appears: protecting one user's files from unauthorized access by another user.

    the file system of the Windows 95 operating system

It has a layered structure, which makes it possible to support several file systems simultaneously. The old MS-DOS file system is supported directly, and file systems not developed by Microsoft are supported by means of special modules. Long file names (up to 254 characters) can be used.

    the file systems of the UNIX operating system

They provide a unified way to access I/O file systems.

File permissions practically determine access rights to the system (the owner of the file is the user who created it).

File types

File systems support several functionally different file types, which typically include regular files, directory files, special files, named pipes, memory-mapped files, and others.

Regular files, or simply files, contain arbitrary information that is entered into them by the user or generated by system and user programs. Most modern operating systems (for example, UNIX, Windows, OS/2) do not restrict or control the contents and structure of a regular file in any way. The contents of a regular file are determined by the application that works with it. For example, a text editor creates text files consisting of strings of characters represented in some code. These can be documents, program source code, etc. Text files can be read on the screen and printed on a printer. Binary files do not use character codes and often have a complex internal structure, as in executable program code or an archive file. Every operating system must be able to recognize at least one file type: its own executable files.

Directories are a special type of file containing system reference information about a set of files grouped by users according to some informal criterion (for example, files containing the documents of one contract, or files that make up one software package). In many operating systems, a directory can contain files of any type, including other directories, which creates a tree structure that is convenient to search. Directories establish a mapping between file names and the file characteristics that the file system uses to manage files. Such characteristics include, in particular, information (or a pointer to another structure containing this data) about the type of the file and its location on the disk, the access rights to the file, and the dates of its creation and modification. In all other respects, the file system treats directories as regular files.

Special files are dummy files associated with I/O devices; they are used to unify the mechanism for accessing files and external devices. Special files allow the user to perform I/O operations using the ordinary commands for writing to a file or reading from a file. These commands are processed first by file system programs, and then at some stage of request execution the operating system converts them into control commands for the corresponding device.

Modern file systems support other file types, such as symbolic links, named pipes, and memory-mapped files.

Hierarchical file system structure

Users access files by symbolic names. However, human memory limits the number of object names that a user can refer to by name. The hierarchical organization of the namespace allows us to significantly expand these boundaries. This is why most file systems have a hierarchical structure, in which levels are created by allowing a lower-level directory to be contained within a higher-level directory (Figure 7.3).

The graph describing the directory hierarchy can be a tree or a network. Directories form a tree if a file is allowed to belong to only one directory (Fig. 7.3, b), and a network if a file can belong to several directories at once (Fig. 7.3, c). For example, in MS-DOS and Windows, directories form a tree structure, while in UNIX they form a network structure. In a tree structure, each file is a leaf. The top-level directory is called the root directory, or root.

With this organization, the user does not have to remember the names of all files; a rough idea of which group a particular file belongs to is enough to find it by browsing directories one after another. The hierarchical structure is convenient for multi-user work: each user's files are localized in that user's own directory or subtree of directories, and at the same time all files in the system are logically connected.

A special case of a hierarchical structure is a single-level organization, when all files are included in one directory (Fig. 7.3, a).

File names

All file types have symbolic names. Hierarchically organized file systems typically use three types of filenames: simple, compound, and relative.

A simple, or short, symbolic name identifies a file within a single directory. Simple names are assigned to files by users and programmers, and they must respect the OS restrictions on both the range of characters and the length of the name. Until relatively recently, these limits were very narrow. In the popular FAT file system, names were restricted to the 8.3 scheme (8 characters for the name proper, 3 characters for the extension), and in the s5 file system, supported by many versions of UNIX, a simple symbolic name could not contain more than 14 characters. However, long names are much more convenient for the user, because they allow files to be given easy-to-remember names that clearly indicate what the file contains. Therefore, modern file systems, as well as improved versions of older ones, tend to support long simple symbolic file names. For example, in the NTFS and FAT32 file systems included with the Windows NT operating system, a file name can contain up to 255 characters.

In hierarchical file systems, different files are allowed to have the same simple symbolic names, provided they belong to different directories. That is, the “many files - one simple name” scheme works here. To uniquely identify a file in such systems, the so-called full name is used.

The full name is a chain of simple symbolic names of all directories through which the path from the root to the given file passes. Thus, the full name is a compound name, in which simple names are separated from each other by the separator accepted in the OS. Often a forward or backslash is used as a delimiter, and it is customary not to specify the name of the root directory. In Fig. 7.3, b two files have the simple name main.exe, but their compound names /depart/main.exe and /user/anna/main.exe are different.

In a tree file system, there is a one-to-one correspondence between a file and its full name: one file - one full name. In file systems that have a network structure, a file can be included in several directories, and therefore have several full names; here the correspondence “one file - many full names” is valid. In both cases, the file is uniquely identified by its full name.

A file can also be identified by a relative name. The relative file name is defined through the concept of the "current directory". For each user, at any given time, one of the file system directories is the current directory; the user selects it with an OS command. The file system remembers the name of the current directory so that it can use it to complement relative names and form the full file name. When using a relative name, the user identifies a file by the chain of directory names through which the route from the current directory to the file passes. For example, if the current directory is /user, then the relative name of the file /user/anna/main.exe is anna/main.exe.
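The relationship between the current directory, the relative name, and the full name can be sketched with `PurePosixPath` from Python's standard `pathlib` module, using the same names as the example above.

```python
from pathlib import PurePosixPath

current_dir = PurePosixPath("/user")
full_name = PurePosixPath("/user/anna/main.exe")

# The relative name is the part of the route that starts at the current directory.
relative_name = full_name.relative_to(current_dir)

# The file system does the opposite: it complements the relative name
# with the current directory to rebuild the full name.
rebuilt = current_dir / relative_name

print(relative_name)  # anna/main.exe
print(rebuilt)        # /user/anna/main.exe
```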

Some operating systems allow you to assign multiple simple names to the same file, which can be interpreted as aliases. In this case, just as in a system with a network structure, the correspondence “one file - many full names” is established, since each simple file name corresponds to at least one full name.

And although the full name uniquely identifies the file, it is easier for the operating system to work with the file if there is a one-to-one correspondence between the files and their names. For this purpose, it assigns a unique name to the file, so that the relationship “one file - one unique name” is valid. The unique name exists along with one or more symbolic names assigned to the file by users or applications. The unique name is a numeric identifier and is intended only for the operating system. An example of such a unique file name is an inode number on a UNIX system.
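The "one file, many full names, one unique name" relationship can be observed directly. The sketch below assumes a UNIX-like system and a file system that supports hard links; the directory and file names are borrowed from the earlier example and created inside a temporary folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "depart"))
    os.mkdir(os.path.join(root, "user"))
    first = os.path.join(root, "depart", "main.exe")
    second = os.path.join(root, "user", "main.exe")

    with open(first, "w") as f:
        f.write("data")

    # Give the same file a second full name in another directory.
    os.link(first, second)

    # Both names lead to the same unique numeric identifier (inode number).
    same_file = os.stat(first).st_ino == os.stat(second).st_ino
    link_count = os.stat(first).st_nlink  # how many names the file has

    with open(second) as f:
        content = f.read()

print(same_file, link_count, content)
```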

File attributes

The concept of “file” includes not only the data it stores and its name, but also its attributes. Attributes are information describing the properties of the file. Examples of possible file attributes:

    file type (regular file, directory, special file, etc.);

    file owner;

    file creator;

    password to access the file;

    information about permitted file access operations;

    times of creation, last access and last modification;

    current file size;

    maximum file size;

    read-only flag;

    “hidden file” flag;

    “system file” flag;

    “archive file” flag;

    “binary/character” attribute;

    “temporary” attribute (remove after process completion);

    lock flag;

    file record length;

    pointer to the key field in the record;

    key length.

The set of file attributes is determined by the specifics of the file system: different types of file systems may use different sets of attributes to characterize files. For example, on file systems that support flat files, there is no need to use the last three attributes in the list that are related to file structuring. In a single-user OS, the set of attributes will lack characteristics relevant to users and security, such as the owner of the file, the creator of the file, the password for accessing the file, information about authorized access to the file.

The user can access attributes using the facilities that the file system provides for this purpose. Typically, the values of all attributes can be read, but only some can be changed. For example, a user can change the permissions of a file (provided they have the necessary rights to do so), but cannot change its creation date or current size.
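As an illustration of reading and changing attributes, here is a small sketch for a POSIX system using Python's os and stat modules. The helper names describe and make_read_only are hypothetical; the set of attributes actually available depends on the file system.

```python
import os
import stat


def describe(path: str) -> dict:
    """Read a few attributes of a file through the stat interface."""
    st = os.stat(path)
    return {
        "is_regular": stat.S_ISREG(st.st_mode),   # file type
        "owner_uid": st.st_uid,                   # file owner
        "permissions": stat.S_IMODE(st.st_mode),  # permitted access operations
        "size": st.st_size,                       # current size in bytes
        "modified": st.st_mtime,                  # last modification time
    }


def make_read_only(path: str) -> None:
    """Change one of the attributes a user is allowed to change."""
    os.chmod(path, stat.S_IRUSR)  # owner may read; nobody may write
```

The size and timestamps are maintained by the system as a side effect of other operations, whereas the permission bits can be set explicitly by the owner.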

File attribute values can be stored directly in directories, as is done in the MS-DOS file system (Fig. 7.6, a). The figure shows the structure of a directory entry containing a simple symbolic name and the file attributes. Here the letters denote file flags: R - read-only, A - archive, H - hidden, S - system.

Fig. 7.6. Directory structure: a - MS-DOS directory entry structure (32 bytes), b - UNIX OS directory entry structure
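The R, A, H and S flags occupy individual bits of the one-byte attribute field of an MS-DOS directory entry. The sketch below decodes that byte using the standard FAT bit values (0x01 read-only, 0x02 hidden, 0x04 system, 0x20 archive); the function name decode_attributes is hypothetical.

```python
# Bit values of the MS-DOS (FAT) attribute byte for the four flags in the figure.
ATTRIBUTE_BITS = {
    0x01: "R",  # read-only
    0x02: "H",  # hidden
    0x04: "S",  # system
    0x20: "A",  # archive
}


def decode_attributes(attr_byte: int) -> str:
    """Return the letters of all flags that are set in the attribute byte."""
    return "".join(
        letter for bit, letter in ATTRIBUTE_BITS.items() if attr_byte & bit
    )


print(decode_attributes(0x21))
```

Bits 0x08 (volume label) and 0x10 (directory) also exist in the same byte; they are left out here because the figure mentions only the four flags above.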

Another option is to place attributes in special tables, with the directories containing only references to these tables. This approach is implemented, for example, in the ufs file system of the UNIX OS. In this file system, the directory structure is very simple. The entry for each file contains a short symbolic file name and a pointer to the file's index node (inode), which is the ufs name for the table that holds the file attribute values (Fig. 7.6, b).

In both versions, directories provide a link between file names and the files themselves. However, the approach of separating the file name from its attributes makes the system more flexible. For example, a file can easily be included in several directories at once. Entries for this file in different directories may have different simple names, but the link field will have the same inode number.

File Operations

Most modern operating systems treat a file as an unstructured sequence of bytes of variable length. The POSIX standard defines the following operations on a file:

    int open(char *fname, int flags, mode_t mode)

This operation "opens" a file, establishing a connection between the program and the file. The program receives a file descriptor, an integer identifying this connection. In fact, it is an index into the system table of open files for the given task. All other operations use this index to refer to the file.

The char * fname parameter specifies the file name. int flags is a bit mask that determines the file's opening mode. The file can be opened read-only, write-only, or read-write; in addition, you can open an existing file, or you can try to create a new file of zero length. The optional third parameter mode is used only when creating a file and specifies the attributes of this file.
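Python's os.open is a thin wrapper over this call, so it can serve as a small illustration. The sketch below assumes a POSIX system; the helper name create_empty and the mode 0o644 are arbitrary choices for the example.

```python
import os


def create_empty(path: str) -> int:
    """Create a new zero-length file for writing and return its descriptor."""
    # O_EXCL makes the call fail if the file already exists.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    # The third argument is used only because O_CREAT is present.
    return os.open(path, flags, 0o644)
```

The returned value is a plain integer; every later operation on the file takes it as its first argument, and os.close(fd) releases it.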

    off_t lseek(int handle, off_t offset, int whence)

This operation moves the read/write pointer in the file. The offset parameter specifies the number of bytes by which to move the pointer, and the whence parameter specifies the point from which the offset is counted: the beginning of the file (SEEK_SET), its end (SEEK_END), or the current pointer position (SEEK_CUR). The operation returns the pointer position measured from the beginning of the file. Thus, calling lseek(handle, 0, SEEK_CUR) returns the current position of the pointer without moving it.
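The three reference points can be tried out with os.lseek, which maps directly onto the call described above. The helper name positions is hypothetical, and the file is assumed to exist already.

```python
import os


def positions(path: str):
    """Move the pointer with each whence value and collect the results."""
    fd = os.open(path, os.O_RDONLY)
    try:
        from_start = os.lseek(fd, 3, os.SEEK_SET)    # 3 bytes from the beginning
        from_current = os.lseek(fd, 2, os.SEEK_CUR)  # 2 bytes further on
        unchanged = os.lseek(fd, 0, os.SEEK_CUR)     # query without moving
        at_end = os.lseek(fd, 0, os.SEEK_END)        # equals the file size
        return from_start, from_current, unchanged, at_end
    finally:
        os.close(fd)
```

For a 10-byte file the function returns (3, 5, 5, 10): each result is measured from the beginning of the file regardless of the whence value used.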

    int read(int handle, char * where, size_t how_much)

Read operation from a file. The where pointer specifies the buffer where the read data should be placed; the third parameter specifies how much data to read. The system reads the required number of bytes from the file, starting at the read/write pointer to that file, and moves the pointer to the end of the read sequence. If the file ends early, as much data is read as was left until its end. The operation returns the number of bytes read. If the file was opened for writing only, calling read will return an error.
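Both properties of read mentioned above, the short read at the end of the file and the error on a write-only descriptor, can be observed with os.read. The helper names below are hypothetical.

```python
import os


def read_at(path: str, offset: int, how_much: int) -> bytes:
    """Read up to how_much bytes starting at the given offset."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return os.read(fd, how_much)  # may return fewer bytes near the end
    finally:
        os.close(fd)


def read_fails_on_write_only(path: str) -> bool:
    """Return True if reading from a write-only descriptor is rejected."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.read(fd, 1)
        return False
    except OSError:
        return True
    finally:
        os.close(fd)
```

In Python the number of bytes actually read is simply the length of the returned bytes object.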

    int write(int handle, char * what, size_t how_much)

A write operation to a file. The what pointer specifies the beginning of the data buffer; the third parameter specifies how much data to write. The system writes the required number of bytes to the file, starting at the read/write pointer to that file, replacing the data stored at that location, and moving the pointer to the end of the written block. If the file ends earlier, its length increases. The operation returns the number of bytes written.

If the file was opened read-only, calling write will return an error.
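A matching sketch for write shows data being replaced in place and the file growing when the write runs past its end. As before, the helper names are hypothetical and a POSIX system is assumed.

```python
import os


def overwrite_at(path: str, offset: int, data: bytes) -> int:
    """Write data at the given offset and return the number of bytes written."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return os.write(fd, data)
    finally:
        os.close(fd)


def write_fails_on_read_only(path: str) -> bool:
    """Return True if writing to a read-only descriptor is rejected."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.write(fd, b"x")
        return False
    except OSError:
        return True
    finally:
        os.close(fd)
```

Writing 4 bytes at offset 8 of a 10-byte file replaces the last two bytes and extends the file to 12 bytes.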

    int ioctl(int handle, int cmd, ...)
    int fcntl(int handle, int cmd, ...)

Additional operations on the file. Originally, ioctl was apparently intended for operations on the file itself and fcntl for operations on an open file handle, but historical developments have somewhat mixed up the functions of these system calls. The POSIX standard defines some operations both on the handle, for example duplication (as a result of this operation we get two handles associated with the same file), and on the file itself, for example the truncate operation, which trims the file to a given length. In most versions of Unix, the truncate operation can also be used to cut data out of the middle of a file. When data is read from such a cut-out area, zeros are returned, and the area itself does not take up physical space on the disk.
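Duplication and truncation can be illustrated with os.dup and os.ftruncate, the Python counterparts of the corresponding POSIX operations. This is a sketch only; the helper name demo_dup_and_truncate is hypothetical.

```python
import os


def demo_dup_and_truncate(path: str):
    """Duplicate a handle, then trim the file through the original handle."""
    fd = os.open(path, os.O_RDWR)
    copy = os.dup(fd)  # second handle associated with the same open file
    try:
        os.lseek(fd, 4, os.SEEK_SET)
        # Both handles share one read/write pointer.
        position_seen_by_copy = os.lseek(copy, 0, os.SEEK_CUR)
        os.ftruncate(fd, 3)  # trim the file to 3 bytes
        size_seen_by_copy = os.fstat(copy).st_size
        return position_seen_by_copy, size_seen_by_copy
    finally:
        os.close(fd)
        os.close(copy)
```

Because the two handles refer to the same open file, moving the pointer or truncating through one of them is immediately visible through the other.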

An important operation is locking sections of a file. The POSIX standard offers a library function for this purpose, but in systems of the Unix family this function is implemented through the fcntl call.
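In Python the same facility is exposed as fcntl.lockf, which is itself a wrapper around fcntl locking calls. The sketch below is Unix-only; the helper name update_locked_section is hypothetical, and the locks are advisory, so they only coordinate processes that also ask for them.

```python
import fcntl
import os


def update_locked_section(path: str, start: int, data: bytes) -> None:
    """Lock a byte range exclusively, rewrite it, then release the lock."""
    length = len(data)
    fd = os.open(path, os.O_RDWR)
    try:
        # Lock only the section [start, start + length) of the file.
        fcntl.lockf(fd, fcntl.LOCK_EX, length, start, os.SEEK_SET)
        try:
            os.pwrite(fd, data, start)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)
    finally:
        os.close(fd)
```

Other sections of the same file stay unlocked, so different processes can update different records at the same time.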

Most implementations of the POSIX standard offer their own additional operations. For example, in Unix SVR4 these operations can be used to set synchronous or delayed writing, etc.

    caddr_t mmap(caddr_t addr, size_t len, int prot, int flags, int handle, off_t offset)

Mapping a section of a file into the virtual address space of the process. The prot parameter specifies access rights to the mapped section: read, write, and execute. The mapping can occur to a specified virtual address, or the system can select the address to map itself.
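A brief sketch of file mapping using Python's mmap module follows. It lets the system choose the address and uses the portable access argument instead of the Unix-specific prot flags; the helper name uppercase_in_place is hypothetical, and the file must not be empty.

```python
import mmap


def uppercase_in_place(path: str) -> None:
    """Map the whole file into memory and modify it through the mapping."""
    with open(path, "r+b") as f:
        # Length 0 maps the entire file; ACCESS_WRITE allows reads and writes.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as view:
            view[:] = view[:].upper()  # ordinary memory access, no read/write calls
            view.flush()               # push the changes back to the file
```

After the mapping is created, the file contents are accessed like an ordinary byte array, without explicit read, write or lseek calls.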

Two more operations are performed not on the file but on its name: renaming and deleting the file. In some systems, for example those of the Unix family, a file can have multiple names, and there is only a system call for deleting a name. The file is deleted when its last name is deleted.
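The difference between deleting a name and deleting a file can be seen in a short experiment. The sketch assumes a Unix file system with hard links; the names a.txt, b.txt and c.txt are made up.

```python
import os


def demo_names(directory: str):
    """Rename a file, give it a second name, then delete the names one by one."""
    original = os.path.join(directory, "a.txt")
    renamed = os.path.join(directory, "b.txt")
    extra = os.path.join(directory, "c.txt")
    with open(original, "wb") as f:
        f.write(b"data")
    os.rename(original, renamed)  # operation on the name only
    os.link(renamed, extra)       # the file now has two names
    os.unlink(renamed)            # removes one name; the file survives
    with open(extra, "rb") as f:
        survived = f.read()
    os.unlink(extra)              # last name removed: the file is deleted
    return survived, os.listdir(directory)
```

The data stays readable through the remaining name after the first unlink, and the directory is empty only after the second one.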

It can be seen that the set of operations on a file in this standard is very similar to the set of operations on an external device. Both are treated as an unstructured byte stream. To complete the picture, it should be said that the main means of interprocess communication in systems of the Unix family (the pipe) is also an unstructured data stream. The idea that most data transfers can be reduced to a byte stream is quite old, but Unix was one of the first systems in which this idea was taken to its logical conclusion.
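To illustrate the point, the sketch below sends bytes through a pipe using the same os.write and os.read calls that work on files. It runs inside a single process for simplicity, so the data must fit into the pipe buffer; the helper name through_pipe is hypothetical.

```python
import os


def through_pipe(data: bytes) -> bytes:
    """Send a small block of bytes through a pipe and read it back."""
    read_end, write_end = os.pipe()
    try:
        os.write(write_end, data)  # same call as for a file
    finally:
        os.close(write_end)        # closing signals end of stream to the reader
    try:
        chunks = []
        while chunk := os.read(read_end, 4096):
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(read_end)
```

The pipe has no record boundaries: the reader receives a stream of bytes and cannot tell how many write calls produced it.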

Approximately the same model of working with files is adopted in CP/M, and the set of file system calls in MS-DOS was in fact copied from the calls of Unix v7. In turn, OS/2 and Windows NT inherited the principles of working with files directly from MS-DOS.

By contrast, systems without Unix in their pedigree may interpret the concept of a file somewhat differently. Most often, a file is treated as a set of records. Typically, the system supports both fixed-length and variable-length records. For example, a text file is interpreted as a file with variable-length records, where each line of text corresponds to one record. This is the model for working with files in VMS and in IBM's OS/360-MVS line of operating systems.
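The two kinds of records can be imitated on top of a byte stream. The sketch below is not the VMS or MVS interface; the record layout (a 4-byte number followed by a 16-byte name) and the function names are assumptions made purely for illustration.

```python
import struct

# Hypothetical fixed-length record: 4-byte id + 16-byte name, 20 bytes in total.
RECORD = struct.Struct("<I16s")


def pack_fixed(records) -> bytes:
    """Store (id, name) pairs as consecutive fixed-length records."""
    return b"".join(RECORD.pack(number, name) for number, name in records)


def read_fixed(blob: bytes, index: int):
    """Fetch record number index directly: its offset is index * record size."""
    number, name = RECORD.unpack_from(blob, index * RECORD.size)
    return number, name.rstrip(b"\0")


def read_variable(text: str):
    """Treat each line of a text file as one variable-length record."""
    return text.splitlines()
```

With fixed-length records any record can be located by arithmetic alone, whereas variable-length records have to be found by scanning for their boundaries.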

Classification, structure, characteristics of file systems

1. Concept, structure and operation of the file system.

A file system is the way (order, structure and content) in which data storage is organized on storage media and which directly provides access to the stored data; in everyday terms, it is the set of all files and folders on a disk. The main “units” of a file system are the cluster, file, directory, partition, volume, and disk.
A cluster is the minimum unit of space allocated for storing information. It consists of one or more sectors, so its size is a multiple of the sector size (usually 512 bytes).
A file is a named collection of bytes stored in sectors. Depending on the file system, a file may have a different set of properties. For convenience, files are referred to by names (symbolic identifiers).
To organize the structure of the file system, files are grouped into directories.
Partition - an area of a disk created when partitioning it and containing one or more formatted volumes.
Volume - an area of a partition with a file system, a file table and a data area. One or more partitions make up a disk.
All information about files is stored in a special area of the partition, the file table. The file table associates numeric file identifiers and additional information about the files (modification date, access rights, name, etc.) with the actual file contents stored in another area of the partition.

The MBR (Master Boot Record) is a special area located at the beginning of the disk that contains the information the BIOS needs to boot the operating system from the hard drive.
The partition table is also located at the beginning of the disk; it stores information about the partitions: start, length, and bootable flag. The boot partition contains the boot sector, which stores the operating system boot program.

For all primary partitions, both regular and extended, and only for primary ones, offsets are counted from the MBR (sector 0).
All regular (non-extended) logical partitions are specified by an offset relative to the beginning of the extended partition in which they are described.
All extended logical partitions are specified by an offset relative to the beginning of the extended primary partition.
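The primary partition table can be read with a few lines of code. The sketch below relies on the classic MBR layout (four 16-byte entries starting at byte 446 and the signature 0x55 0xAA at bytes 510-511); the function names and the sample values are hypothetical, and extended partitions are not followed.

```python
import struct

# One entry: status, CHS of first sector, type, CHS of last sector,
# starting sector (LBA) and length in sectors, all little-endian.
ENTRY = struct.Struct("<B3sB3sII")
TABLE_OFFSET = 446


def parse_partition_table(sector: bytes):
    """Return the non-empty primary partition entries of an MBR sector."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    partitions = []
    for index in range(4):
        status, _, ptype, _, start, length = ENTRY.unpack_from(
            sector, TABLE_OFFSET + index * ENTRY.size
        )
        if ptype == 0:  # type 0 marks an unused entry
            continue
        partitions.append({
            "index": index,
            "bootable": status == 0x80,
            "type": ptype,
            "start": start,    # offset from sector 0, in sectors
            "length": length,  # size in sectors
        })
    return partitions


def make_sample_sector() -> bytes:
    """Build an artificial MBR with one bootable partition (made-up values)."""
    sector = bytearray(512)
    ENTRY.pack_into(sector, TABLE_OFFSET, 0x80, b"\0\0\0", 0x07, b"\0\0\0", 63, 204800)
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)
```

Each entry carries exactly the three pieces of information mentioned above: the start, the length and the bootable flag.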

The operating system boot process is as follows:
When the computer is turned on, the BIOS takes control of the processor, loads the first sector of the boot disk (the MBR) into RAM and transfers control to it.

The MBR can contain either the "standard" boot loader or a boot loader such as LILO or GRUB.

The standard boot loader finds the first partition with the bootable flag set in the primary partition table, reads its first sector (the boot sector) and transfers control to the code in that boot sector. If a different boot loader is installed in the MBR instead of the standard one, it does not look at the bootable flag and can boot from any partition specified in its configuration.
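The difference between the two selection rules can be expressed as a small function. This is only a model of the decision, not boot loader code; the dictionary layout of a partition entry and the parameter name configured_index are assumptions for the illustration.

```python
def choose_boot_partition(partitions, configured_index=None):
    """Pick the partition to boot from.

    partitions is a list of dicts with a "bootable" key, in table order.
    configured_index models a loader such as LILO or GRUB that boots
    whatever its own configuration names and ignores the bootable flag.
    """
    if configured_index is not None:
        return partitions[configured_index]
    # Standard loader: the first entry with the bootable flag wins.
    for partition in partitions:
        if partition["bootable"]:
            return partition
    return None  # no active partition: nothing to boot


table = [
    {"name": "hda1", "bootable": False},
    {"name": "hda2", "bootable": True},
    {"name": "hda3", "bootable": True},
]
print(choose_boot_partition(table)["name"])
```

With the sample table the standard rule selects hda2, while a loader configured with index 0 would boot hda1 even though its flag is not set.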

For example, to load the Windows NT/2k/XP/2003 operating system, the boot sector contains code that loads the main loader (ntldr) from the current partition into memory.
Each of the FAT16/FAT32/NTFS file systems uses its own boot loader. The root of the partition must contain the file ntldr. If the message "NTLDR is missing" appears when you try to boot Windows, this is precisely the case in which the ntldr file is missing. In addition, for ntldr to work normally, the files bootfont.bin, ntbootdd.sys and ntdetect.com and a correctly written boot.ini may be needed.

Example boot.ini

C:\boot.ini

[boot loader]
timeout=8
default=C:\gentoo.bin

[operating systems]
C:\gentoo.bin="Gentoo Linux"
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows XP (32-bit)" /fastdetect /NoExecute=OptIn
multi(0)disk(0)rdisk(0)partition(3)\WINDOWS="Windows XP (64-bit)" /fastdetect /usepmtimer

Example grub.conf configuration file

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
#
# NOTICE: You have a /boot partition. This means that
#         all kernel and initrd paths are relative to /boot/, eg.
#         root (hd0,0)
#         kernel /vmlinuz-version ro root=/dev/sda2
#         initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux server (2.6.18-53.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-53.el5 ro root=LABEL=/ rhgb quiet
        initrd /initrd-2.6.18-53.el5.img

Structure of the lilo.conf file

# LILO configuration file generated by "liloconfig"
# Global parameters section
# Start LILO global section
# Where LILO is installed; in this case, the MBR
boot = /dev/hda
# Message displayed at boot time
message = /boot/boot_message.txt
# Display the boot prompt
prompt
# Timeout for selecting the operating system (in tenths of a second)
timeout = 1200
# Override dangerous defaults that rewrite the partition table:
change-rules
reset
# VESA framebuffer console @ 800x600x256
# Video mode used to display the menu
vga = 771
# End LILO global section
# Windows boot parameters section
# DOS bootable partition config begins
other = /dev/hda1
label = Windows98
table = /dev/hda
# DOS bootable partition config ends
# QNX boot parameters section
# QNX bootable partition config begins
# Path to the operating system
other = /dev/hda2
label = QNX
table = /dev/hda
# QNX bootable partition config ends
# Linux boot parameters section
# Linux bootable partition config begins
# Path to the kernel image
image = /boot/vmlinuz
root = /dev/hda4
label = Slackware
read-only
# Linux bootable partition config ends


2. The best-known file systems.

  • Advanced Disc Filing System
  • AdvFS
  • Be File System
  • CSI-DOS
  • Encrypting File System
  • Extended File System
  • Second Extended File System
  • Third Extended File System
  • Fourth Extended File System
  • File Allocation Table (FAT)
  • Files-11
  • Hierarchical File System
  • HFS Plus
  • High Performance File System (HPFS)
  • ISO 9660
  • Journaled File System
  • Macintosh File System
  • MINIX file system
  • MicroDOS
  • Next3
  • New Implementation of a Log-structured File System (NILFS)
  • Novell Storage Services
  • New Technology File System (NTFS)
  • Protogon
  • ReiserFS
  • Smart File System
  • Squashfs
  • Unix File System
  • Universal Disk Format (UDF)
  • Veritas File System
  • Windows Future Storage (WinFS)
  • Write Anywhere File Layout
  • Zettabyte File System (ZFS)

3. Main characteristics of file systems.

The operating system provides applications with a set of functions and structures for working with files. The capabilities of the operating system impose additional restrictions on top of the limits of the file system itself. The main characteristics and limits include:

- Maximum (minimum) volume size;
- Maximum (minimum) number of files in the root directory;
- Maximum number of files in a non-root directory;
- File-level security;
- Support for long file names;
- Self-healing;
- Compression at the file level;
- Maintaining transaction logs.
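Some of these limits can be queried at run time. The sketch below works on Unix-like systems only and uses os.pathconf and os.statvfs; the helper name filesystem_limits and the choice of reported values are assumptions for the example.

```python
import os


def filesystem_limits(path: str = ".") -> dict:
    """Ask the OS about a few limits of the file system that holds path."""
    vfs = os.statvfs(path)
    return {
        # Longest file name (in bytes) the file system accepts.
        "max_name_length": os.pathconf(path, "PC_NAME_MAX"),
        "block_size": vfs.f_frsize,
        "total_bytes": vfs.f_frsize * vfs.f_blocks,
        "free_bytes": vfs.f_frsize * vfs.f_bavail,
    }
```

The values differ from one file system to another, which is why applications that care about such limits query them instead of hard-coding them.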

4. Brief description of the most common file systems FAT, NTFS, EXT.

File system FAT.

FAT stands for File Allocation Table.
In the FAT file system, the logical disk space of any logical drive is divided into two areas:
- system area;
- data area.
The system area is created during formatting and updated when the file structure is manipulated. The data area contains files and directories subordinate to the root and is accessible through the user interface. The system area consists of the following components:
- boot record;
- reserved sectors;
- file allocation tables (FAT);
- root directory.
The file allocation table is a map (image) of the data area, which describes the state of each section of the data area. The data area is divided into clusters. A cluster is one or more contiguous sectors in a logical disk address space (data area only). In the FAT table, clusters belonging to the same file (non-root directory) are linked into chains. The FAT16 file management system uses a 16-bit word to indicate the cluster number, so you can have up to 65,536 clusters.
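Following a chain of clusters through the table can be sketched as below. The table is modelled as a plain Python list indexed by cluster number, and the FAT16 conventions are assumed (data clusters are numbered from 2, and values 0xFFF8 to 0xFFFF mark the end of a chain); the function name cluster_chain is hypothetical.

```python
FIRST_DATA_CLUSTER = 2
LAST_DATA_CLUSTER = 0xFFEF
END_OF_CHAIN_MIN = 0xFFF8  # 0xFFF8..0xFFFF all mean "last cluster of the file"


def cluster_chain(fat, first_cluster: int):
    """Return the list of clusters that make up one file, in order."""
    chain = []
    seen = set()
    cluster = first_cluster
    while FIRST_DATA_CLUSTER <= cluster <= LAST_DATA_CLUSTER:
        if cluster in seen:
            raise ValueError("loop in cluster chain")
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]  # each FAT element names the next cluster
    if cluster < END_OF_CHAIN_MIN:
        raise ValueError("chain ends in a free, reserved or bad cluster")
    return chain
```

The starting cluster comes from the directory entry of the file; everything else is read from the table, which is why damage to the FAT makes file contents unreachable.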
A cluster is the minimum addressable unit of disk memory allocated to a file or non-root directory. A file or directory occupies an integer number of clusters. In this case, the last cluster may not be fully used, which will lead to a noticeable loss of disk space if the cluster size is large.
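The loss in the last cluster is easy to estimate. The following sketch computes it for a given file size and cluster size; the function names are hypothetical and the cluster sizes in the usage example are arbitrary.

```python
def clusters_needed(file_size: int, cluster_size: int) -> int:
    """Whole number of clusters required to hold file_size bytes."""
    return -(-file_size // cluster_size)  # integer division rounded up


def wasted_bytes(file_size: int, cluster_size: int) -> int:
    """Unused space in the last cluster allocated to the file."""
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size


print(wasted_bytes(1000, 32 * 1024))
print(wasted_bytes(1000, 4 * 1024))
```

A 1000-byte file wastes 31768 bytes with 32 KB clusters but only 3096 bytes with 4 KB clusters, so a volume holding many small files loses far more space when the clusters are large.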
Since FAT is used very intensively when accessing the disk, it is loaded into RAM and remains there for as long as possible.
The root directory differs from a regular directory in that it is located in a fixed location on a logical disk and has a fixed number of elements. For each file and directory, the file system stores information according to the following structure:
- file or directory name – 11 bytes;
- file attributes – 1 byte;
- reserve field – 1 byte;
- creation time – 3 bytes;
- creation date – 2 bytes;
- last access date – 2 bytes;
- reserved – 2 bytes;
- last modification time – 2 bytes;
- last modification date – 2 bytes;
- initial cluster number in FAT – 2 bytes;
- file size – 4 bytes.
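A directory entry with this structure can be unpacked as shown below. The sketch assumes the standard 32-byte FAT layout, in which the 3-byte creation time is a 1-byte fine-resolution part followed by a 2-byte time and the entry also holds a 2-byte last modification date; the function name parse_dir_entry is hypothetical.

```python
import struct

# name(11) attr(1) reserved(1) creation-time fine part(1) creation time(2)
# creation date(2) last access date(2) reserved(2) modification time(2)
# modification date(2) first cluster(2) file size(4) = 32 bytes
DIR_ENTRY = struct.Struct("<11sBBBHHHHHHHI")


def parse_dir_entry(raw: bytes) -> dict:
    """Decode the name, attributes, first cluster and size of one entry."""
    fields = DIR_ENTRY.unpack(raw)
    name, attributes = fields[0], fields[1]
    first_cluster, size = fields[10], fields[11]
    base = name[:8].decode("ascii").rstrip()  # 8 characters of the name
    ext = name[8:].decode("ascii").rstrip()   # 3 characters of the extension
    return {
        "name": base + ("." + ext if ext else ""),
        "attributes": attributes,
        "first_cluster": first_cluster,
        "size": size,
    }
```

The dot between the name and the extension is not stored on disk; it is restored when the 8-character and 3-character parts are joined.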
The structure of the file system is hierarchical.

File system FAT32.
FAT32 is a completely independent 32-bit file system that contains numerous improvements and additions compared with FAT16. The fundamental difference of FAT32 is its more efficient use of disk space: FAT32 uses smaller clusters, which saves disk space.
FAT32 can relocate the root directory and use the backup copy of the FAT instead of the standard one. The FAT32 extended boot record allows copies of critical data structures to be created, making drives more resilient to damage to FAT structures than in previous versions. The root directory is an ordinary chain of clusters, so it can be located anywhere on the disk, which removes the limit on the size of the root directory.


NTFS file system.
The NTFS (New Technology File System) file system contains a number of major improvements and changes that clearly distinguish it from other file systems. From the users' point of view, files are still stored in directories, but working with large disks in NTFS is much more efficient:
- there are means to restrict access to files and directories;
- mechanisms have been introduced that significantly increase the reliability of the file system;
- many restrictions on the maximum number of disk sectors and/or clusters have been removed.

Main characteristics of the NTFS file system:
- reliability. High-performance computers and shared systems must have increased reliability; for this purpose, a transaction mechanism has been introduced in which file transactions are logged;
- expanded functionality. New features have been introduced into NTFS: improved fault tolerance, emulation of other file systems, a powerful security model, parallel processing of data streams, creation of user-defined file attributes;
- POSIX standard support. Basic features include optional case-sensitive file names, storage of the time a file was last accessed, and an alternate name mechanism that allows the same file to be referenced by multiple names;
- flexibility. Disk space allocation is highly flexible: the cluster size can vary from 512 bytes to 64 KB.
NTFS works well with large data sets and large volumes. The maximum volume (and file) size is 16 EB. (16 EB is equal to 2**64 bytes, that is, 16 × 2**30 gigabytes, or more than 16 billion gigabytes.) The number of files in the root and non-root directories is not limited. Because the NTFS directory structure is based on an efficient data structure called a B-tree, the search time for files in NTFS does not grow linearly with the number of files.
NTFS has some self-healing capabilities and supports various mechanisms for checking the integrity of the system, including transaction logging, which allows you to track file write operations in the system log.
The NTFS file system supports the security object model and treats all volumes, directories, and files as independent NTFS objects. Access rights to volumes, directories, and files depend on the user account and the group to which the user belongs.
The NTFS file system has built-in compression capabilities that can be applied to volumes, directories, and files.

Ext3 file system.
The ext3 file system can support files up to 1 TB in size. With the Linux kernel 2.4, the file system size is limited by the maximum block device size, which is 2 terabytes. In Linux 2.6 (for 32-bit processors), the maximum block device size is 16 TB, however ext3 only supports up to 4 TB.
Ext3 has good NFS compatibility and does not suffer performance problems when free disk space runs short. Another advantage of ext3 comes from the fact that it is based on the ext2 code. The on-disk format of ext2 and ext3 is identical; it follows that, if necessary, an ext3 filesystem can be mounted as ext2 without any problems. And that is not all. Because ext2 and ext3 use identical metadata, it is possible to upgrade ext2 to ext3 on the fly.
Ext3 reliability
In addition to ext2 compatibility, ext3 inherits other advantages of the common metadata format. Ext3 users have at their disposal the fsck tool, which has been proven over many years. Of course, the main reason for switching to a journaling filesystem is to eliminate the need for periodic and lengthy consistency checks of the metadata on disk. However, journaling does not protect against kernel crashes or disk damage (or anything of that kind). In an emergency, you will appreciate the fact that ext3 inherits fsck from ext2.
Journaling in ext3.
Now that we have a general understanding of the problem, let's look at how ext3 does journaling. The ext3 journaling code uses a special API called the Journaling Block Device layer, or JBD. JBD was designed for journaling on any block device. Ext3 is tied to the JBD API: the ext3 filesystem code informs JBD of the need for a modification and requests permission from JBD to carry it out. The journal is managed by JBD on behalf of the ext3 filesystem driver. This arrangement is very convenient, since JBD is developed as a separate, general-purpose component and can be used in the future for journaling in other filesystems.
Data protection in Ext3
Now we can talk about how the ext3 filesystem provides journaling for both data and metadata. There are actually two methods for guaranteeing consistency in ext3.
Ext3 was originally designed for full journaling of data and metadata. In this mode (called "data=journal" mode), JBD journals all changes to the filesystem, both data and metadata. JBD can then use the journal to roll back and restore metadata and data. The disadvantages of “full” journaling are its rather low performance and the large amount of disk space consumed by the journal.
Later, a new journaling mode was added to ext3 that combines high performance with a guarantee of file system structure consistency after a crash (as in "regular" journaled file systems). The new mode journals only metadata. However, the ext3 filesystem driver still tracks the processing of whole data blocks (if they involve metadata modification) and groups them into a separate object called a transaction. The transaction is completed only after all the data has been written to disk. A side effect of this crude technique (called "data=ordered" mode) is that ext3 provides a higher probability of data integrity (compared with "advanced" journaling file systems) while guaranteeing metadata consistency. In this mode, only changes to the file system structure are journaled. Ext3 uses this mode by default.
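The ordering rule of the "data=ordered" mode can be shown with a toy model. This is not the ext3 or JBD code: the class ToyOrderedJournal, its fields and the crash_before_commit switch are all invented for illustration, and the "disk" is just a pair of dictionaries.

```python
class ToyOrderedJournal:
    """Toy model: only metadata is journaled, data is written first."""

    def __init__(self):
        self.data_blocks = {}  # block number -> contents, written in place
        self.metadata = {}     # file name -> list of its block numbers
        self.journal = []      # committed metadata transactions

    def write_file(self, name, blocks, crash_before_commit=False):
        # Step 1: the data blocks reach the disk first.
        self.data_blocks.update(blocks)
        if crash_before_commit:
            return  # simulated crash: the transaction was never committed
        # Step 2: only then is the metadata change committed to the journal.
        self.journal.append((name, sorted(blocks)))

    def recover(self):
        """After a crash, replay the committed transactions only."""
        for name, block_numbers in self.journal:
            self.metadata[name] = block_numbers
        self.journal.clear()


fs = ToyOrderedJournal()
fs.write_file("a", {1: b"x", 2: b"y"})
fs.write_file("b", {3: b"z"}, crash_before_commit=True)
fs.recover()
print(fs.metadata)
```

After recovery the metadata refers only to file "a", whose blocks are guaranteed to be on disk; the interrupted write of "b" leaves no metadata behind, so the file system structure never points at data that was not written.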
Ext3 has many advantages. It is designed for maximum ease of deployment. It is based on years of proven ext2 code and inherited the wonderful fsck tool. Ext3 is primarily intended for applications that do not have built-in capabilities to guarantee data integrity. Overall, ext3 is a wonderful file system and a worthy continuation of ext2. There is one more characteristic that positively distinguishes ext3 from other journaled filesystems under Linux - high reliability.

The ext4 file system is a worthy evolutionary continuation of the ext family.