Checking the FS and recovering deleted files in Linux. Checking and repairing file systems in Linux - fsck command

Linux is one of the most reliable operating systems you will ever see, but that does not mean the hardware it runs on is equally reliable. Hard disks can develop faults, and the result is errors in your file systems. Nor does the reliability of the operating system matter if you accidentally delete files or directories you need. Do not despair if this happens to you: Linux has the tools to help you recover files lost through deletion or through disk and file system failures. The tools in question are e2fsck, scalpel and lsof. This post shows how to use them to fix file system errors and recover deleted files.

Checking FS ext2/ext3/ext4 using e2fsck

The e2fsck utility is a descendant of the well-known UNIX utility fsck, which checks file systems. With e2fsck you can check ext2/ext3/ext4 file systems for errors and repair them.

One of the most important points about e2fsck is that it must only be run on an unmounted file system. Otherwise you can cause yourself even more trouble, which is what the utility itself warns about when you try to run it on a mounted file system. If the file system being checked is not the root one, you can log out all users, switch to single-user mode (init 1), unmount the file system and work on it.

However, the author still recommends booting from a live CD and doing all the work from there. That way the file systems are unmounted and entirely at your disposal without any additional steps.

If for some reason you choose the first option, then after you switch to single-user mode:

# init 1

unmount the file system you need to work on:

# umount /dev/sdb1

and after successful unmounting, run e2fsck:

# e2fsck -y /dev/sdb1

The -y option tells e2fsck that we answer yes to all of its questions in advance and are going off for coffee in the hope that it will do everything on its own. Depending on the size of the file system, checking and repair may take some time. Once the check has finished, you can run it again to make sure no new errors have appeared in the file system, which would point to hardware problems with the drive.

When all checks and repairs are done, you can mount the checked file system and return to multi-user mode. Or you can simply reboot the system.
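Since running e2fsck on a mounted file system is the main danger described above, it can help to guard the call. The following is only a sketch under stated assumptions: is_mounted and safe_check are hypothetical helper names, the mount table is assumed to be /proc/mounts, and the check is run with -fn (forced, read-only, answering "no" to everything) rather than -y, so that nothing is modified.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the device is listed in a mount table.
# The table defaults to /proc/mounts; another file can be passed for testing.
is_mounted() {
    dev=$1
    table=${2:-/proc/mounts}
    # The first field of each line in the table is the device name.
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$table"
}

# Hypothetical wrapper: refuse to check a device that is still mounted.
safe_check() {
    if is_mounted "$1"; then
        echo "refusing: $1 is mounted" >&2
        return 1
    fi
    # -f forces the check, -n opens the FS read-only and answers "no".
    e2fsck -fn "$1"
}
```

Called as root, for example as safe_check /dev/sdb1, it reports problems without repairing them. Once you are satisfied that the device really is unmounted, the -y run shown above does the actual repair.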

Recovering deleted files using /proc and lsof

Now let's look at recovering deleted files. The reason a deleted file can be recovered at all is that a "file" is just a link to an inode, and it is the inode that stores information about the physical location of the file's data. When you delete a file, you actually delete only the link to the inode. The inode itself lives on for a while: until every process that had the file open closes its descriptor. So there is a window, however short, during which the contents of a deleted file can be restored. The key to this is the /proc file system, which holds, among other things, information about every process running on the system and the files each has open. Each running process has a directory in /proc named after its PID. Knowing the PID of a process that still holds the deleted file open, we can restore the contents from the /proc/<PID>/ directory of that process. Let's see how it is done with a simple example.

First let's create some file:

$ echo "Very important data" > ~/myfile.txt

Now we have a file myfile.txt with important data in the home directory. Let's delete it and then restore it. First we open the file for viewing with less, then suspend less, which leaves the file open. Step by step.

Open the file for viewing with less:

$ less ~/myfile.txt

Once the file is open and you can see its contents, press Ctrl+Z to suspend less.

Delete the file:

$ rm ~/myfile.txt

Make sure the file no longer exists:

$ ls -l ~/myfile.txt

Since the less process we started earlier has not exited, it still holds the file open and the data has not actually been removed. Let's restore it.

First, find the PID of the process that has the file open and the number of the file descriptor. This can be done with lsof:

$ lsof | grep myfile.txt
less 2675 ashep 4r REG 8,1 37 294478 /home/ashep/myfile.txt (deleted)

The second field of the lsof output is the PID, 2675, and the fourth field gives the descriptor number, 4 (the r suffix means the file is open for reading). Now recovery can begin:

$ cp /proc/2675/fd/4 ~/recovered.txt

Check if the content is in the file we need:

$ cat ~/recovered.txt
Very important data

As you can see, everything went well and we were able to recover the deleted file.
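The same experiment can be reproduced without less and lsof. The sketch below is an illustration only: it assumes a Linux system with /proc mounted, uses a scratch directory from mktemp, and lets the shell itself hold the file open on descriptor 3, so the PID to look under is the shell's own ($$).

```shell
#!/bin/sh
# Work in a scratch directory; all file names here are illustrative.
work=$(mktemp -d)
echo "Very important data" > "$work/myfile.txt"

# Hold the file open on descriptor 3 of this shell, then delete it.
exec 3< "$work/myfile.txt"
rm "$work/myfile.txt"

# The directory entry is gone, but /proc/<PID>/fd/3 still leads to the inode.
ls -l "/proc/$$/fd/3"
cp "/proc/$$/fd/3" "$work/recovered.txt"

# Close the descriptor; only now can the kernel free the original inode.
exec 3<&-

recovered=$(cat "$work/recovered.txt")
echo "$recovered"
rm -r "$work"
```

The ls line shows the link target with a "(deleted)" suffix, just like the lsof output above. After exec 3<&- the window is closed, and only a tool such as Scalpel could still find the data.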

Recovering deleted files using Scalpel

Once the process that had the file open exits, recovery becomes harder, because the inode is freed and every link between the data in the disk blocks and the file system is lost. As long as the data has not been physically overwritten on disk, it can still be recovered with Scalpel. This tool walks through the disk block by block and analyzes the contents, looking for signs that files exist there. To find them, Scalpel uses patterns made of byte sequences characteristic of particular file types. For example, the header of a PNG file contains the byte sequence \x50\x4e\x47.

You will find Scalpel in the repositories of most modern distributions. After installing it, the first thing to do is decide which files the program should look for. The search pattern definitions are in the file /etc/scalpel/scalpel.conf. By default everything in that file is commented out, so before starting you need to uncomment the patterns you need and/or add your own. The pattern format is quite simple:

extension case_sensitive size header [footer]

extension is the file extension that Scalpel adds to recovered files;
case_sensitive tells the utility whether case matters in the search pattern;
size sets the maximum size of recoverable files;
header and the optional footer describe the byte sequences at the start and the end of the file, respectively.

For example, a template definition for JPG files might look like this:

jpg y 200000000 \xff\xd8\xff\xe0\x00\x10 \xff\xd9
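To get a feel for what a header pattern matches, you can test a single file for the first three bytes of the JPG signature by hand. This is a minimal sketch of signature matching, not a description of how Scalpel is implemented; has_jpeg_header is a hypothetical helper name, and only the leading ff d8 ff bytes are compared.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file starts with the bytes ff d8 ff.
has_jpeg_header() {
    # Read the first three bytes and render them as lowercase hex digits.
    sig=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "ffd8ff" ]
}
```

For example, has_jpeg_header photo.jpg && echo "looks like a JPG". Scalpel performs this kind of comparison across the raw device rather than on individual files, which is why it can find data that no longer has a name.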

After you have made the necessary changes to the Scalpel configuration file and prepared an empty (this is required!) directory for the recovered files, you can start the search and recovery:

# scalpel -o ~/recovered /dev/sdb1

Here the -o option sets the path to the directory where found files are saved. Scalpel usually runs for a very long time, because it scans the entire device, so take the opportunity to go for a walk outside; fresh air never hurt anyone ;)

After Scalpel completes its work, examine the contents of the output directory to see if it contains the files you need.

Conclusion

Few people enjoy finding that important data has been accidentally deleted or damaged. And although Linux offers tools for recovering lost data, there is little pleasure in the process. So always take great care of your data and how you work with it. And, of course, do not forget about timely backups, an old and proven method that has saved many thousands of nerve cells from certain death.

A faulty hard drive is one of the most unpleasant things that can happen to a computer. Not only can we easily lose a lot of important information and files, but replacing the HDD also hits the budget. Add to this the wasted time and nerves, which, as we know, do not grow back. To avoid being taken by surprise and to diagnose the problem in advance, it is worth knowing how to check an HDD for errors in Ubuntu. There are plenty of software tools for the job.

How to test your hard drive for errors in Ubuntu.

It is not at all necessary to download programs to perform a disk check in Ubuntu. The operating system already has a utility that is designed for this task. It's called badblocks and is controlled via the terminal.

Open a terminal and enter:

This command displays information about all HDDs that are used by the system.

After this we enter:

sudo badblocks -sv /dev/sda

This command searches for damaged sectors. Instead of /dev/sda, enter the name of your drive. The -s switch shows the progress of the scan, and -v prints a report on everything the command does.

Pressing Ctrl+C stops the check.
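If you want to keep the result of the scan, badblocks can write the list of bad blocks to a file with its -o option, one block number per line, for example sudo badblocks -sv -o bad.txt /dev/sda. The sketch below only counts the entries in such a file; count_bad_blocks and bad.txt are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count the block numbers listed in a badblocks -o file.
count_bad_blocks() {
    # Every non-empty line is one bad block; print 0 for an empty file.
    awk 'NF { n++ } END { print n + 0 }' "$1"
}
```

A result of 0 means the scan found nothing. Any other number is a reason to look at the drive's SMART data and to check your backups.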

You can also use two other commands to monitor the file system.

To unmount the file system, enter:

To check and correct errors:

sudo fsck -f -c /dev/sda
“-f” forces the process, that is, it runs it even if the HDD is marked as healthy;
"-c" finds and marks bad blocks;
“-y” is an additional argument that answers Yes to every question automatically. Alternatively, use “-p”, which automatically repairs whatever can be fixed safely without asking.

Programs
Additional software also does an excellent job of this. Sometimes even a better one. Moreover, some users find a graphical interface easier to work with.
GParted is for those who do not like a text interface. The utility performs a large number of tasks related to HDDs on Ubuntu, including checking a disk for errors.
First, we need to download and install GParted. Enter the following command to install it from the official repositories:
sudo apt-get install gparted

Open the application. All media are shown on the main screen straight away. If any of them is marked with an exclamation point, something is already wrong with it.
Click on the disk that you want to check.
Open the “Partition” menu at the top.
Select “Check for errors”.

The program will scan the disk. Depending on its size, the process may take more or less time. After the scan we will be notified of the results.
Smartmontools is a more complex utility that checks the HDD more thoroughly against a range of parameters. As a result, it is also harder to use: Smartmontools has no graphical interface.
Download the program:
aptitude install smartmontools

Let's see which drives are connected to the system. Look at the lines whose names end in a letter rather than a digit: these are the disks themselves.
ls -l /dev | grep -E 'sd|hd'

Enter the following command to print detailed information about the drive. It is worth looking at the ATA parameter: when replacing the original disk, it is better to install a device with the same or a higher ATA version, so that you get the most out of it. Also look at and note the SMART parameters.
smartctl --info /dev/sde

Let's start the check. If SMART is supported but not yet enabled, add “-s on”. If it is not supported or is already enabled, this argument can be left out.
smartctl -s on -a /dev/sde

After that, look at the information under READ SMART DATA. The result can take one of two values: PASSED or FAILED. If it is the latter, start making backups and looking for a replacement hard drive.
The program can do much more than this, but for a one-off HDD check this is quite enough.
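For a quick verdict you do not need the whole -a report: smartctl -H prints only the health assessment. The sketch below pulls the verdict out of that output. It assumes the ATA wording of the result line ("SMART overall-health self-assessment test result: ..."); smart_verdict is a hypothetical helper name, and SCSI drives phrase the line differently, so they are not covered.

```shell
#!/bin/sh
# Hypothetical helper: read smartctl output on stdin and print the verdict.
# Returns non-zero if no overall-health line was found.
smart_verdict() {
    awk -F': *' '
        /overall-health self-assessment test result/ { print $2; found = 1 }
        END { exit !found }'
}
```

Typical use is sudo smartctl -H /dev/sde | smart_verdict. A failing drive is usually reported with a trailing exclamation mark, so compare against the start of the word rather than the whole string.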
Safecopy
This is already the kind of program that is just right to use on a sinking ship. If we are aware that something is wrong with our disk and aim to save as many surviving files as possible, then Safecopy will come to the rescue. Its task is precisely to copy data from damaged media. Moreover, it extracts files even from broken blocks.
Install Safecopy:
sudo apt install safecopy

We copy data from one location to another. You can choose any other destination. In this case we are transferring data from the sda drive to the home folder.
sudo safecopy /dev/sda /home/

Bad blocks
Some may ask: “What are these bad blocks, and where did they come from on my HDD if I never touched it?” Bad blocks, or bad sectors, are areas of the HDD that can no longer be read. At least, that is how the file system has marked them, for objective reasons. And most likely there really is something wrong with the disk in those places. “Bads” occur on old hard drives and on the most modern ones alike, since they work using almost the same technology.
Bad sectors appear for various reasons.
A write interrupted by a power failure. All information written to the hard drive is broken down into ones and zeros spread across different parts of it. Interrupting this process can badly confuse the drive.
Poor build quality. There is little to say here: on a cheap, poorly made device anything can fail.

Now you know how to scan your HDD for errors. Checking the disk, both on Ubuntu and on other systems, is a fairly important operation that should be carried out at least once a year.

fsck is a command that checks a file system and repairs it interactively.

Syntax

fsck -p [-f]
fsck [-l maxparallel] [-q] [-y] [-n] [-d]

Description

In the first form, fsck checks a standard set of file systems or the file systems given as arguments. It is normally called from the /etc/rc script during an automatic reboot. Using the getfsent call, the utility reads the file system descriptor table to determine which file systems to check. Only partitions that have the "rw", "rq" or "ro" option and a non-zero pass number are checked. File systems with pass number 1 (normally just the root file system) are checked one at a time.

fsck is now a wrapper that calls other utilities of the fsck_XXX family as needed. Currently fsck_hfs, fsck_msdos, fsck_exfat and fsck_udf are available. If the utility encounters serious file system inconsistencies, or the format of the partition being checked is not one of those listed above, fsck fails and the automatic reboot is aborted. For each file system that is corrected, one or more lines are printed identifying the file system and describing the nature of the correction.

If it receives a QUIT signal, fsck finishes the file system checks and then exits with an abnormal status, which causes the automatic reboot to fail. This is useful when you want the check to finish during an automatic reboot but do not want the system to go multi-user.

Without the -p option, fsck checks file systems for inconsistencies and repairs them interactively. Some of these repairs may result in the loss of some data. The extent of the loss can be seen in the diagnostic output. If the user does not have write permission on the file system, the utility automatically behaves as if -n had been given.

Options

-f	Force a check. Ignore the "clean" flag.
-l	Limit the number of parallel checks to the number given in the following argument. By default, fsck runs one process per disk. If a smaller limit is given, the disks are checked in turn, one file system at a time.
-p	Preen mode (automatic repair of minor inconsistencies).
-q	Do a quick check to determine whether the volume was unmounted cleanly.
-y	Answer yes to all questions asked. Use with extreme caution.
-n	Answer no to all questions except "CONTINUE?", which is answered yes. Do not open the file system for writing.

On Mac OS X, starting with version 10.3, there is practically no need to use this utility: the diskutil disk utility solves most problems.

If you still need it, reboot the computer into single-user mode by holding down Cmd+S during startup. At the command line, type

/sbin/fsck -fy

After checking the disks, either

Sometimes, for various reasons (a crash, an improper shutdown), file systems accumulate errors. The errors themselves are inconsistent data structures. Naturally, when this happens, the damage must be put right as soon as possible. The fsck utility handles this task very well. It is very effective, and system administrators very often reach for it first when restoring or repairing file systems.

How does fsck work?

The fsck utility (File System Consistency Check) originally performed a deep check of every data structure in turn, that is, of the entire file system. It used heuristics to speed up and optimize the search for errors. Even so, on large file systems the procedure could take many hours.

Later, a scheme for assessing the state of the file system was introduced, based on a “clean” bit in the file system superblock. The bit records whether the file system (FS) was unmounted cleanly, so after a crash or an improper unmount the FS is not marked clean. By default on Linux, one of the stages of system boot checks the file systems listed in /etc/fstab (other UNIX systems use /etc/vfstab or /etc/filesystems). By examining the “clean” bit of each FS during boot, the utility decides whether a check is needed.

Journaled file systems now let the utility work only on the data structures that really need repairing or restoring. If necessary, fsck can restore the entire FS thanks to the same FS journal.

Some features of using fsck in Linux

On Linux systems, quite often (especially with ext file systems), the FS check is configured to run after a certain number of mounts, even if the FS is perfectly healthy. This matters most on desktop computers, which may be switched off and on every day, rebooted because of the way they are used, and freely accessed to connect external devices. In such cases the FS check, useful as it is, ends up running too often to be worthwhile.

Traditionally, ext file systems on Linux were checked after about 20 mounts by default. To change the number of mounts after which an FS check is required, use the tune2fs command:

$ sudo tune2fs -c 50 /dev/sda1
tune2fs 1.44.1 (24-Mar-2018)
Setting maximal mount count to 50
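Before changing the limit it is worth seeing what it currently is. tune2fs -l prints the superblock fields, among them "Mount count" and "Maximum mount count". The sketch below extracts the latter; max_mount_count is a hypothetical helper name, and the field label is assumed to be spelled as in current e2fsprogs output.

```shell
#!/bin/sh
# Hypothetical helper: read tune2fs -l output on stdin and print the limit.
max_mount_count() {
    # The value is the last word on the "Maximum mount count:" line.
    awk '/^Maximum mount count:/ { print $NF }'
}
```

For example: sudo tune2fs -l /dev/sda1 | max_mount_count. A value of -1 means that checks based on the mount count are switched off.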

fsck syntax and basic options

The fsck command has the following syntax:

fsck [options] -- [fs-options] [<filesystem> ...]

Main options:

Option	Description
-A	Checks all file systems
-C [<fd>]	Shows execution progress. Here fd is a file descriptor to which progress is written, for use by a GUI front end
-l	Locks the device for exclusive access
-M	Does not check mounted file systems
-N	Shows what would be done without running a real check
-P	With -A, checks the root file system in parallel with the others
-R	Skips the root file system. Can only be used together with the -A option
-r [<fd>]	Displays statistics for each checked device
-T	Does not show the title on startup
-t <type>	Specifies the type of FS to check. Several types can be given, separated by commas
-V	Prints a detailed description of the actions performed

Besides the basic fsck options, there are specific ones that depend on the task and/or the FS. You can read about them in more detail in the corresponding manual pages, starting with man fsck. The "SEE ALSO" section of the main manual page for the utility lists related pages such as fstab(5), mkfs(8), fsck.ext2(8), fsck.ext3(8) and so on. These pages can be viewed by running man with the appropriate argument, for example man fsck.ext3.

The following table lists additional (special) options, as well as the most commonly used options, allowing you to use the command with maximum flexibility and efficiency:

Option	Description
-a	Deprecated option. Indicates that all errors found should be corrected without user approval.
-r	Used for ext file systems. Tells fsck to ask the user before fixing each error
-n	Performs only a file system check, without error correction. Also used to obtain information about the FS
-c	Applicable for ext3/4 file systems. Marks all damaged blocks to prevent subsequent writing to them
-f	Forces a check of the FS even if it is marked clean
-y	Automatically confirms requests to the user
-b	Specifies the superblock address
-p	Automatically correct detected errors. Replaces the obsolete -a option

Examples of using fsck

In the most typical situation, when you need to restore (or rather “repair”) a file system, for example on the /dev/sdb2 device, use the command:

$ sudo fsck -y /dev/sdb2

The -y option is needed here because without it you would have to confirm far too often. The next command forces a check of the FS even if it is marked clean:

$ sudo fsck -fy /dev/sdb2

One of the most useful and most frequently used options is the one that marks bad sectors. Damaged sectors typically appear after failures caused by an abnormal power outage:

$ sudo fsck -c /dev/sdb2

File systems should be unmounted while you work on them. If you nevertheless have to check a mounted FS, first remount it read-only before running fsck with the appropriate options:

$ sudo mount -o remount,ro /dev/sdb2
$ sudo fsck -fy /dev/sdb2

To specify the FS type of the partition:

$ sudo fsck -t ext4 -y /dev/sdb2

If fsck fails to repair the FS (which happens very rarely), the cause may be a damaged FS superblock. That too can be restored, because backup copies of the superblock are kept. First, though, you need to find out at which addresses the copies were written, and then try to restore the superblock from one of the backup copies:

$ sudo fdisk -l
$ sudo mkfs -t ext4 -n /dev/xvdb1
$ sudo fsck -b 163840 /dev/xvdb1

The fdisk -l command is included in this example for clarity: you first need to know which device to work with, and it lists the available partitions (its output is omitted here). The mkfs command is meant for creating file systems, but with the -n option it only prints information about the file system it would create, including the locations of the superblocks. Make sure the -t switch for mkfs names the file system type actually in use, in this case ext4.
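The address passed to fsck -b has to be picked out of the mkfs -n report by eye. The sketch below does the picking for you. It assumes the report contains a line starting with "Superblock backups stored on blocks" followed by comma-separated block numbers, as mke2fs prints them; backup_superblocks is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read mkfs -n output on stdin and print one backup
# superblock address per line.
backup_superblocks() {
    awk '
        /^Superblock backups stored on blocks/ { grab = 1; next }
        grab && NF == 0 { grab = 0 }                 # a blank line ends the list
        grab { gsub(/[ \t,]+/, "\n"); print }        # one number per line
    ' | grep -E '^[0-9]+$'
}
```

For example: sudo mkfs -t ext4 -n /dev/xvdb1 | backup_superblocks. Any of the printed addresses can then be tried with fsck -b. The list is only valid if mkfs is run with the same parameters that were used when the file system was created.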

Conclusion

In this article we looked at how the fsck utility works and how to use it. As the article shows, using the utility is not particularly difficult. Its capabilities for checking and repairing file systems in Linux are considerable, so every system administrator needs to know it.


Sooner or later it happens: the system or a partition crashes, the file system cannot be checked, and so on. A system administrator must therefore know what to do in such situations, and know it by heart.

1) fsck at OS boot

After a power failure, fsck (file system consistency check and interactive repair) is the tool that comes into play. By default, disk checking at boot is disabled. To enable it, add the following line

fsck_y_enable="YES"

to the file /etc/rc.conf. Then, after the server has shut down improperly, a check of all file systems is started automatically.

The check itself consists of 5 phases:

**Phase 1 - Check Blocks and Sizes
**Phase 2 - Check Pathnames
**Phase 3 - Check Connectivity
**Phase 4 - Check Reference Counts
**Phase 5 - Check Cyl groups

In fact, Phase 1 is further divided into 1a and 1b. This only becomes noticeable after a serious crash.

All this is fine, but there is one BUT! While a file system is being checked, the partition is not mounted and is not accessible, so the server takes longer to boot. The developers foresaw this and made it possible to run the check in the background. Strictly speaking it is only an attempt, but that is still better than nothing, and it is enabled by default. Admittedly, there is ongoing debate about whether background checks should be enabled or not. You decide.

There is one unpleasant aspect of checking the FS at boot. If the partition is large, the check may take a long time, and fsck appears to freeze at each phase. In other words, you cannot tell by looking whether a check is in progress or the server has hung. Nor is it clear how much has been checked and how much remains. To make life a little easier for system administrators, the developers added an undocumented feature: pressing Ctrl+T shows the current status of the check, namely how much has been checked so far, as a percentage. If you want to see the status again a couple of minutes later, press Ctrl+T again, and so on each time (or just hold the keys down to get continuously updated figures).

Several parameters that are set in /etc/rc.conf affect fsck. Their default values are listed below:

fsck_y_enable="NO" # Enable startup check if the job was completed incorrectly.
fsck_y_flags="" # Additional flags for fsck -y
background_fsck="YES" # Attempt to run scan in background
background_fsck_delay="60" # Delay time before running fsck in the background.

fsck_y_enable="YES"
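Because rc.conf consists of ordinary sh variable assignments, you can check which value a parameter ends up with by sourcing the file in a subshell. This is only a sketch: rc_value is a hypothetical helper name, the file must be given with a path containing a slash, and it looks at one file only, whereas the real boot scripts also read the defaults file.

```shell
#!/bin/sh
# Hypothetical helper: print the value a variable gets in an rc.conf-style file.
rc_value() {
    file=$1
    name=$2
    (
        # Source the assignments in a subshell so they do not leak out.
        . "$file"
        # Print the variable whose name was passed as the second argument.
        eval "printf '%s\n' \"\${$name}\""
    )
}
```

For example: rc_value /etc/rc.conf fsck_y_enable. If the same variable is assigned twice in the file, the last assignment wins, just as when the system reads it.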

So, here are examples of fsck at work:

- if the server was shut down properly, we will see the following messages at boot:

Nov 10 14:36:33 mail kernel: Starting file system checks:
Nov 10 14:36:33 mail kernel: /dev/da0s1a: FILE SYSTEM CLEAN; SKIPPING CHECKS
Nov 10 14:36:33 mail kernel: /dev/da0s1a: clean, 942456 free (2944 frags, 117439 blocks, 0.3% fragmentation)
Nov 10 14:36:33 mail kernel: /dev/da0s1d: FILE SYSTEM CLEAN; SKIPPING CHECKS
Nov 10 14:36:33 mail kernel: /dev/da0s1d: clean, 503428 free (60 frags, 62921 blocks, 0.0% fragmentation)
Nov 10 14:36:33 mail kernel: /dev/da0s1e: FILE SYSTEM CLEAN; SKIPPING CHECKS
Nov 10 14:36:33 mail kernel: /dev/da0s1e: clean, 2301104 free (50872 frags, 281279 blocks, 1.0% fragmentation)
Nov 10 14:36:33 mail kernel: /dev/da0s1f: FILE SYSTEM CLEAN; SKIPPING CHECKS
Nov 10 14:36:33 mail kernel: /dev/da0s1f: clean, 162210122 free (2260506 frags, 19993702 blocks, 0.5% fragmentation)
Nov 10 14:36:33 mail kernel: Mounting local file systems:

The key phrase FILE SYSTEM CLEAN; SKIPPING CHECKS indicates that the previous shutdown was clean.

- if it was not, we will see this:

Starting background file system checks in 60 seconds.
Jan 26 18:39:19 mail kernel: Starting file system checks:
Jan 26 18:39:19 mail kernel: /dev/da0s1a: 56013 files, 201857 used, 3349718 free (1702 frags, 418502 blocks, 0.0% fragmentation)
Jan 26 18:39:19 mail kernel: /dev/da0s1d: DEFER FOR BACKGROUND CHECKING
Jan 26 18:39:19 mail kernel: /dev/da0s1f: DEFER FOR BACKGROUND CHECKING
Jan 26 18:39:19 mail kernel: /dev/da0s1e: DEFER FOR BACKGROUND CHECKING

But this doesn't always happen. If the attempt is unsuccessful, we will see this:

** /dev/ad2s1g (NO WRITE)
**Last Mounted on /var

INCORRECT BLOCK COUNT I=446041 (4 should be 0)
CORRECT? yes
INCORRECT BLOCK COUNT I=446045 (4 should be 0)
CORRECT? yes

**Phase 2 - Check Pathnames
**Phase 3 - Check Connectivity
**Phase 4 - Check Reference Counts

UNREF FILE I=89148 OWNER=root MODE=100600
SIZE=376 MTIME=Aug 13 13:49 2006
RECONNECT? yes
CLEAR? yes
UNREF FILE I=89152 OWNER=root MODE=100600
SIZE=755 MTIME=Aug 13 13:49 2006
RECONNECT? yes
CLEAR? yes

**Phase 5 - Check Cyl groups

FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes
SUMMARY INFORMATION BAD
SALVAGE? yes
BLK(S) MISSING IN BIT MAPS
SALVAGE? yes
2242 files, 1607116 used, 973436 free (2196 frags, 121405 blocks, 0.1% fragmentation)

2) Running fsck manually

Let me note straight away that the check must be run ONLY ON AN UNMOUNTED PARTITION! Otherwise you may lose all your data.
We will cover only the options that are used most often, namely

-y|-n : answers YES or NO, respectively, to every question asked when inconsistencies are found.
-B|-F : background and foreground modes, respectively
-f : check the partition even if it was unmounted cleanly.

fsck -y -f /dev/ad2s1g

If you run fsck without the -y option, then each time it finds an inconsistency it asks a question that can be answered Y or N. The usual answer is Y. Answering Y every time is not very convenient, so it is better to run with the -y option.

** /dev/ad2s1g (NO WRITE)
**Last Mounted on /var
**Phase 1 - Check Blocks and Sizes

INCORRECT BLOCK COUNT I=446041 (4 should be 0)
CORRECT?

There is good news: the Ctrl+T combination also works in manual mode.