System for automatically creating signatures of executable files. Structure of software components

Obfuscators

Debuggers

A debugger is a development environment module or a standalone application designed to find errors in a program. A debugger lets you step through a program, monitor, set, or change variable values during execution, set and remove breakpoints or stop conditions, and so on.

Obfuscation (from the Latin obfuscare, to obscure or darken, and the English obfuscate, to make unobvious or confusing), or code obfuscation, is the transformation of a program's source text or executable code into a form that preserves its functionality but makes it difficult to analyze, to understand its algorithms, and to modify after decompilation.

Code can be obfuscated at the level of the algorithm, the source text, and/or the assembly text. To produce obfuscated assembly text, specialized compilers can be used that exploit non-obvious or undocumented capabilities of the program's runtime environment. There are also dedicated programs that perform obfuscation, called obfuscators.

An executable module, or executable file, is a file containing a program in a form in which it can be executed by a computer (after being loaded into memory and fixed up for its load address).

Most often, it contains a binary representation of machine instructions for a specific processor (for this reason, in programming slang, the word binary is used in relation to it), but it can also contain instructions in an interpreted programming language, the execution of which requires an interpreter. In relation to the latter, the term “script” is often used.

The execution of binary files is carried out by hardware- and software-implemented machines. The first include processors - for example, the x86 or SPARC families. The second are virtual machines, for example, the Java virtual machine or the .NET Framework. The format of a binary file is determined by the architecture of the machine executing it. There are machines implemented in both hardware and software, for example, x86 family processors and the VMware virtual machine.

Whether a file is executable is most often determined by convention. In some operating systems, executable files are recognized by a file naming convention (for example, by the file extension .exe or .bin), while in others executable files carry specific metadata (for example, the execute permission bit on UNIX-like operating systems).

In modern computer architectures, executable files contain large amounts of data that is not program code: a description of the software environment in which the program can run, debugging data, constants, data that the operating system may need to start the process (for example, the recommended heap size), and even descriptions of the window structures of the graphics subsystem used by the program.

Often, executable files contain calls to library functions, such as calls to operating system functions. Thus, along with processor dependence (machine-dependent is any binary executable file containing machine code), executable files may be characterized by dependence on the version of the operating system and its components.

An executable file is loaded into memory by the operating system loader and then executed. In the Windows operating system, executable files usually have the extensions ".exe" and ".dll". The ".exe" extension is used for programs that the user can launch directly. The ".dll" extension is used for dynamic link libraries. These libraries export functions used by other programs.

For the operating system loader to load an executable file into memory correctly, the contents of the file must conform to the executable file format accepted by that operating system. Many different formats have existed, and still exist, on different operating systems at different times. In this chapter, we will look at the Portable Executable (PE) format. The PE format is the primary format for storing executable files in the Windows operating system. .NET assemblies are also stored in this format.

Additionally, the PE format can be used to represent object files. Object files are used to organize separate compilation of a program. The point of separate compilation is that parts of the program (modules) are compiled independently into object files, which are then linked by the linker into one executable file.

And now, a little history. The PE format was created by the developers of Windows NT. Previously, the Windows operating system used the New Executable (NE) and Linear Executable (LE) formats to represent executable files, and the Object Module Format (OMF) to store object files. The NE format was intended for 16-bit Windows applications, while the LE format, originally developed for OS/2, was already 32-bit. The question arises: why did the Windows NT developers decide to abandon the existing formats? The answer becomes obvious when you consider that most of the team that created Windows NT had previously worked at Digital Equipment Corporation. They had been developing tools for the VAX/VMS operating system at DEC, and they already had the skills and ready-made code to work with executable files in the Common Object File Format (COFF). Accordingly, the COFF format was carried over to Windows NT in a slightly modified form and received the name PE.

The ".NET Framework Glossary" says that PE is Microsoft's implementation of the COFF format. At the same time, it states that PE is an executable file format and COFF is an object file format. In general, the Microsoft documentation is inconsistent about the name of the format: in some places it is called COFF and in others PE. It is noticeable, though, that newer texts use the name COFF less and less. Moreover, the PE format is constantly evolving. For example, several years ago Microsoft stopped storing debugging information inside the executable file, so many fields in the COFF structures are now simply unused. In addition, the COFF format is 32-bit, whereas the latest revision of the PE format (called PE32+) can be used on 64-bit hardware platforms. Apparently, then, things are moving toward the name COFF no longer being used at all.

It is interesting to note that executable files in the legacy NE and LE formats are still supported by Windows. Executable files in the NE format can be run under NTVDM (NT Virtual DOS Machine), and the LE format is used for virtual device drivers.

An operating system's executable file format largely reflects the assumptions and behaviors built into that operating system. Dynamic linking, loader behavior, and memory management are just three examples of operating-system-specific properties that you come to understand as you study the executable file format.

The executable file on disk and the module obtained after loading are very similar. The loader simply uses Win32 memory-mapped files to map the appropriate parts of the PE file into the program's address space. Loading a DLL is just as easy. Once an EXE or DLL module is loaded, Windows treats it like any other memory-mapped file.

In Win32, the memory used for code, data, resources, import tables, export tables, and other elements is one contiguous linear range of address space. All you need to know is the address at which the loader mapped the executable file into memory. Then, to find any element of the module, it is enough to follow the pointers stored as part of the image.

MS-DOS header

The MS-DOS header occupies the first 64 bytes of the PE file. The structure representing the contents of the MS-DOS header is as follows:

typedef struct _IMAGE_DOS_HEADER { // DOS .EXE header
    USHORT e_magic;    // Magic number ("MZ")
    USHORT e_cblp;     // Bytes on the last page of the file
    USHORT e_cp;       // Pages in the file
    USHORT e_crlc;     // Relocations
    USHORT e_cparhdr;  // Header size in paragraphs
    USHORT e_minalloc; // Minimum allocated memory
    USHORT e_maxalloc; // Maximum allocated memory
    USHORT e_ss;       // Initial (relative) SS value
    USHORT e_sp;       // Initial SP value
    USHORT e_csum;     // Checksum
    USHORT e_ip;       // Initial IP value
    USHORT e_cs;       // Initial (relative) CS value
    USHORT e_lfarlc;   // File address of the relocation table
    USHORT e_ovno;     // Overlay number
    USHORT e_res[4];   // Reserved words
    USHORT e_oemid;    // OEM identifier (for e_oeminfo)
    USHORT e_oeminfo;  // OEM information; e_oemid specific
    USHORT e_res2[10]; // Reserved words
    LONG e_lfanew;     // File offset of the PE header
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

The main header of the PE file represents a structure of type IMAGE_NT_HEADERS, defined in the file WINNT.H. The in-memory IMAGE_NT_HEADERS structure is what Windows uses as its in-memory module database. Each loaded EXE file or DLL is represented in Windows by the IMAGE_NT_HEADERS structure. This structure consists of a doubleword and two substructures, as shown below:

DWORD Signature;
IMAGE_FILE_HEADER FileHeader;
IMAGE_OPTIONAL_HEADER OptionalHeader;

PE File Signature

The Signature field, read as ASCII, is PE\0\0 (PE followed by two zero bytes). If the e_lfanew field in the DOS header points to an NE signature here instead of PE, you are working with a Win16 NE file. Likewise, if the Signature field contains LE, the file is a VxD (virtual device driver). The LX signature marks a file from Windows 95's old rival, OS/2.
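The signature check described above can be sketched in portable C. This is only a sketch under stated assumptions: the file has already been read into a byte buffer, fields are decoded by hand as little-endian values (so WINNT.H is not needed), and the names rd16, rd32, pe_kind, and pe_classify are hypothetical helpers introduced for illustration, not part of any Windows API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: read little-endian values byte by byte, so the code
   does not depend on host endianness or alignment. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

enum pe_kind { KIND_UNKNOWN, KIND_PE, KIND_NE, KIND_LE, KIND_LX };

/* Classify a file image by the signature found at the offset in e_lfanew. */
enum pe_kind pe_classify(const uint8_t *buf, size_t len) {
    assert(buf != NULL);
    if (len < 64 || rd16(buf) != 0x5A4D)   /* DOS header must start with "MZ" */
        return KIND_UNKNOWN;
    uint32_t off = rd32(buf + 0x3C);       /* e_lfanew: last 4 bytes of the 64-byte header */
    if (off > len || len - off < 4)        /* signature must lie inside the buffer */
        return KIND_UNKNOWN;
    const uint8_t *sig = buf + off;
    if (sig[0] == 'P' && sig[1] == 'E' && sig[2] == 0 && sig[3] == 0) return KIND_PE;
    if (sig[0] == 'N' && sig[1] == 'E') return KIND_NE;
    if (sig[0] == 'L' && sig[1] == 'E') return KIND_LE;
    if (sig[0] == 'L' && sig[1] == 'X') return KIND_LX;
    return KIND_UNKNOWN;
}
```

The bounds checks are there because e_lfanew comes from the file itself and cannot be trusted to point inside the buffer.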

PE File Header

The double word - the PE signature - in the header of the PE file is followed by a structure of type IMAGE_FILE_HEADER. The fields in this structure contain only the most general information about the file.
The following are the IMAGE_FILE_HEADER fields:

typedef struct _IMAGE_FILE_HEADER
{
    USHORT Machine;
    USHORT NumberOfSections;
    ULONG TimeDateStamp;
    ULONG PointerToSymbolTable;
    ULONG NumberOfSymbols;
    USHORT SizeOfOptionalHeader;
    USHORT Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;

Machine – the CPU for which the file is intended. The following CPU identifiers are defined:

Intel i386 0x14C
Intel i860 0x14D
MIPS R3000 0x162
MIPS R4000 0x166
DEC Alpha AXP 0x184
PowerPC 0x1F0 (little endian)
Motorola 68000 0x268
PA-RISC 0x290 (Precision Architecture)

NumberOfSections– the number of sections in the EXE or OBJ file.

TimeDateStamp – the time when the file was created by the linker (or by the compiler, if it is an OBJ file). This field holds the number of seconds elapsed since 00:00 UTC on January 1, 1970 (which is 16:00 on 12/31/1969 in US Pacific time).

PointerToSymbolTable – file offset of the COFF symbol table. This field is used only in OBJ files and in PE files with COFF debug information. PE files support a variety of debug formats, so debuggers should refer to the IMAGE_DIRECTORY_ENTRY_DEBUG entry in the data directory.

NumberOfSymbols– the number of symbols in the COFF symbol table.

SizeOfOptionalHeader – the size of the optional header that may follow this structure. In executable files, this is the size of the IMAGE_OPTIONAL_HEADER structure that follows this structure.

Characteristics – flags containing information about the file. Some important flags are described here.
0x0001 – the file does not contain relocations
0x0002 – the file is an executable image (i.e. it is not an OBJ or LIB file)
0x2000 – the file is a dynamic link library (DLL), not a program
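As an illustration of the fields above, here is a rough sketch that decodes a raw IMAGE_FILE_HEADER and tests the Characteristics flags. It assumes the 20 header bytes have already been located in a buffer and decodes them by hand as little-endian values at the offsets implied by the structure layout; FileHeaderInfo, decode_file_header, is_dll, and the FILE_* macro names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values from the list above; the macro names are made up for this sketch. */
#define FILE_RELOCS_STRIPPED  0x0001u  /* the file contains no relocations */
#define FILE_EXECUTABLE_IMAGE 0x0002u  /* executable image, not OBJ or LIB */
#define FILE_DLL              0x2000u  /* dynamic link library */

typedef struct {
    uint16_t machine;
    uint16_t number_of_sections;
    uint32_t time_date_stamp;
    uint16_t size_of_optional_header;
    uint16_t characteristics;
} FileHeaderInfo;

static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the 20-byte IMAGE_FILE_HEADER that follows the PE signature.
   The two symbol-table fields (offsets 8 and 12) are skipped. */
FileHeaderInfo decode_file_header(const uint8_t *hdr) {
    assert(hdr != NULL);
    FileHeaderInfo info;
    info.machine = le16(hdr + 0);
    info.number_of_sections = le16(hdr + 2);
    info.time_date_stamp = le32(hdr + 4);
    info.size_of_optional_header = le16(hdr + 16);
    info.characteristics = le16(hdr + 18);
    return info;
}

/* Returns 1 if the header describes a DLL image, 0 otherwise. */
int is_dll(const FileHeaderInfo *info) {
    assert(info != NULL);
    return (info->characteristics & FILE_EXECUTABLE_IMAGE) &&
           (info->characteristics & FILE_DLL);
}
```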

PE File Optional Header

The third component of the PE file header is a structure of type IMAGE_OPTIONAL_HEADER. For PE files this part is mandatory. The most important fields are the ImageBase and Subsystem fields.

ImageBase- When the linker creates an executable file, it expects the file to be mapped to a specific location in memory, and it is this address that is stored in this field.

Subsystem – the type of subsystem that this executable uses for its user interface. WINNT.H defines the following values:
NATIVE = 1 – no subsystem required (for example, for a device driver)
WINDOWS_GUI = 2 – runs in the Windows GUI subsystem
WINDOWS_CUI = 3 – runs in the Windows character subsystem (console application)
OS2_CUI = 5 – runs in the OS/2 subsystem (OS/2 1.x applications only)
POSIX_CUI = 7 – runs in the POSIX subsystem

Section table

Immediately after the PE file header in memory comes an array of IMAGE_SECTION_HEADER structures. This table contains information about each section of the image. The number of elements in the array is given in the PE file header (the IMAGE_NT_HEADERS.FileHeader.NumberOfSections field). The sections in the image are ordered by their starting address rather than alphabetically.
Each IMAGE_SECTION_HEADER is a complete description of one section of an EXE or OBJ file and has the following format.

#define IMAGE_SIZEOF_SHORT_NAME 8
typedef struct _IMAGE_SECTION_HEADER
{
    UCHAR Name[IMAGE_SIZEOF_SHORT_NAME];
    union {
        ULONG PhysicalAddress;
        ULONG VirtualSize;
    } Misc;
    ULONG VirtualAddress;
    ULONG SizeOfRawData;
    ULONG PointerToRawData;
    ULONG PointerToRelocations;
    ULONG PointerToLinenumbers;
    USHORT NumberOfRelocations;
    USHORT NumberOfLinenumbers;
    ULONG Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;

Name– An 8-byte ANSI (not Unicode) name that names the section.

Misc– This field has different purposes depending on whether it appears in an EXE or OBJ file. In an EXE file, it contains the virtual size of a section of program code or data. For OBJ files, this field specifies the physical address of the section.

VirtualAddress – in EXE files, this field contains the RVA at which the loader should map the section. Microsoft tools set the default RVA of the first section to 0x1000. For object files this field is set to 0.

SizeOfRawData – in EXE files, this field contains the size of the section rounded up to the next multiple of the file alignment.
PointerToRawData – file offset of the area where the raw data for the section is located. If you map a PE or COFF file into memory yourself (instead of having the operating system load it), this field is more important than VirtualAddress.

PointerToRelocations – in object files, this is the file offset of the relocation information that follows the raw data for the section. In EXE files this field is set to 0.

PointerToLinenumbers – file offset of the line number table. The line number table maps line numbers of the source file to the addresses of the code generated for each line. Typically, only code sections (such as .text or CODE) have line numbers. In EXE files, line numbers are collected at the end of the file, after the raw data for the sections. In object files, the line number table for a section follows the section's raw data and the relocation table for that section.

NumberOfRelocations – the number of relocations in the relocation table for this section (used only in object files).
NumberOfLinenumbers – the number of entries in the line number table for this section.
Characteristics – a set of flags that indicate the attributes of the section (code/data, readable, writable, etc.).
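One practical detail about the Name field described above: a name that fills all 8 bytes is stored without a terminating zero, so it cannot be passed straight to string functions. A small sketch of a safe copy follows; section_name and SHORT_NAME_LEN are hypothetical names used only in this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHORT_NAME_LEN 8  /* same value as IMAGE_SIZEOF_SHORT_NAME */

/* Copy a raw 8-byte section name into a 9-byte buffer and terminate it.
   Shorter names are zero-padded in the file, so the copy is already a valid
   C string; the extra terminator covers names that use all 8 bytes. */
void section_name(const unsigned char raw[SHORT_NAME_LEN],
                  char out[SHORT_NAME_LEN + 1]) {
    assert(raw != NULL && out != NULL);
    memcpy(out, raw, SHORT_NAME_LEN);
    out[SHORT_NAME_LEN] = '\0';
}
```

After the call, out can be compared with strcmp against names such as ".text" or ".rsrc".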

Common sections

Section .text (or CODE, if the PE file is created by Borland C++)

This section contains all general-purpose program code generated by a compiler or assembler. The linker combines all the .text sections from the various object files into one large .text section in the EXE file.

Section .data (or DATA, if the PE file is created by Borland C++)

Initialized data goes into the .data section. Initialized data consists of the global and static variables that were initialized at compile time. It also includes string literals (for example, the string "Hello World" in a C/C++ program). The linker combines all .data sections from the various object and LIB files into one .data section in the EXE file. Local variables live on the thread's stack and take up no space in the .data and .bss sections.

Section .bss

The .bss section stores uninitialized static and global variables. The linker combines all .bss sections from different object and LIB files into one .bss section in the EXE file.

Section .CRT

Another section for initialized data, used by the Microsoft C/C++ runtime libraries. The data in this section is used for purposes such as calling the constructors of static C++ objects before main or WinMain is called.

Section .rsrc

The .rsrc section contains module resources.

Section .idata

The .idata section (or import table) contains information about the functions (and data) that the module imports from other DLLs. The import table starts with an array of IMAGE_IMPORT_DESCRIPTOR structures. Each element (IMAGE_IMPORT_DESCRIPTOR) corresponds to one of the DLLs to which the PE file is implicitly linked. The number of elements in the array is not stored anywhere. Instead, the last IMAGE_IMPORT_DESCRIPTOR in the array has all of its fields set to NULL.
The IMAGE_IMPORT_DESCRIPTOR structure has the following format

typedef struct _IMAGE_IMPORT_DESCRIPTOR {
    union {
        DWORD Characteristics;
        DWORD OriginalFirstThunk;
    };
    DWORD TimeDateStamp;
    DWORD ForwarderChain;
    DWORD Name;
    DWORD FirstThunk;
} IMAGE_IMPORT_DESCRIPTOR;

Characteristics/OriginalFirstThunk – this field contains the offset (RVA) of an array of doublewords. Each of these doublewords is actually an IMAGE_THUNK_DATA union. Each IMAGE_THUNK_DATA doubleword corresponds to one function imported by the EXE file or DLL.

TimeDateStamp – a time and date stamp indicating when the file was created.
ForwarderChain – this field is related to forwarding, where one DLL passes a reference to one of its functions on to another DLL.
Name – the RVA of a null-terminated ASCII string containing the name of the imported DLL.

FirstThunk – the RVA of an array of IMAGE_THUNK_DATA doublewords. In most cases, each doubleword is treated as a pointer to an IMAGE_IMPORT_BY_NAME structure. This structure looks like this:

typedef struct _IMAGE_IMPORT_BY_NAME {
    WORD Hint;
    BYTE Name[1];
} IMAGE_IMPORT_BY_NAME, *PIMAGE_IMPORT_BY_NAME;

Hint – the expected index of the imported function in the export table of the exporting DLL; the loader uses it as a starting guess.
Name – an ASCIIZ string with the name of the imported function.
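Because the number of import descriptors is not stored anywhere, code that walks the import table has to look for the terminating all-zero entry. A minimal sketch of that scan follows, assuming the descriptor array has already been located and copied into memory. ImportDescriptor is a stand-in with the same five 32-bit fields as IMAGE_IMPORT_DESCRIPTOR, and count_import_descriptors and its max_count parameter are hypothetical names, not part of WINNT.H.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field-for-field stand-in for IMAGE_IMPORT_DESCRIPTOR (five 32-bit fields). */
typedef struct {
    uint32_t OriginalFirstThunk;
    uint32_t TimeDateStamp;
    uint32_t ForwarderChain;
    uint32_t Name;
    uint32_t FirstThunk;
} ImportDescriptor;

/* Count descriptors up to (not including) the all-zero terminator.
   max_count bounds the scan in case a damaged file has no terminator. */
size_t count_import_descriptors(const ImportDescriptor *d, size_t max_count) {
    assert(d != NULL);
    size_t n = 0;
    while (n < max_count &&
           (d[n].OriginalFirstThunk | d[n].TimeDateStamp | d[n].ForwarderChain |
            d[n].Name | d[n].FirstThunk) != 0)
        n++;
    return n;
}
```

Each counted entry corresponds to one imported DLL, whose name can then be read from the RVA in the Name field.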

A fragment of a program that reads from a PE file the list of OS functions imported by the program.

The following program fragment writes the list of libraries and functions imported by the program to the file FunctionList.txt.

void ShowImportFunction()
{
    BYTE *pImage = (BYTE*) GetModuleHandle(NULL);
    IMAGE_DOS_HEADER *idh;
    IMAGE_OPTIONAL_HEADER *ioh;
    IMAGE_SECTION_HEADER *ish;
    IMAGE_IMPORT_DESCRIPTOR *iid;
    IMAGE_IMPORT_BY_NAME *ibn;
    IMAGE_THUNK_DATA *thunk;
    int i = 0;
    DWORD j = 0;
    char lib[] = "Imported library: ";
    HANDLE file = CreateFile(TEXT("FunctionList.txt"), GENERIC_READ | GENERIC_WRITE,
        0, 0, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL | FILE_FLAG_RANDOM_ACCESS, 0);
    // The DOS header sits at the very start of the mapped image
    idh = (IMAGE_DOS_HEADER*) pImage;
    // Skip the 4-byte PE signature and the file header to reach the optional header
    ioh = (IMAGE_OPTIONAL_HEADER*)
        (pImage + idh->e_lfanew + 4 +
        sizeof(IMAGE_FILE_HEADER));
    // The section table follows the optional header (byte-wise pointer arithmetic)
    ish = (IMAGE_SECTION_HEADER*) ((BYTE*) ioh + sizeof(IMAGE_OPTIONAL_HEADER));
    for (i = 0; i < 16; i++)

typedef struct _IMAGE_FILE_HEADER {
    WORD Machine;
    WORD NumberOfSections;
    DWORD TimeDateStamp;
    DWORD PointerToSymbolTable;
    DWORD NumberOfSymbols;
    WORD SizeOfOptionalHeader;
    WORD Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;
I will describe these fields only dryly, because the names are intuitive and hold plain values, not VA, RVA, RAW and other scary, intriguing things that so far we have only heard about from old pirates. Although we have already met RAW: these are simply offsets relative to the beginning of the file (they are also called raw pointers or file offsets). That is, if we have a RAW address, we need to step RAW positions from the beginning of the file (ptrFile + RAW). Then we can start reading values. A striking example of this type is e_lfanew, which we discussed above in the DOS header.

*Machine: WORD - this number (2 bytes) specifies the processor architecture on which the application can run.
NumberOfSections: WORD - the number of sections in the file. The sections (hereinafter we will call them the section table) follow immediately after the header (PE-Header). The documentation says that the number of sections is limited to 96.
TimeDateStamp: DWORD - a number storing the date and time the file was created.
PointerToSymbolTable: DWORD - the offset (RAW) to the symbol table, and NumberOfSymbols is the number of entries in that table. The table was meant to store debugging information, but it fell out of use almost from the very start of its service and nobody noticed the loss. Most often these fields are filled with zeros.
SizeOfOptionalHeader: WORD - the size of the optional header (which immediately follows the current one). The documentation states that for an object file it is set to 0...
*Characteristics: WORD - file characteristics.

* - fields whose values come from a defined set. Tables of the possible values are given in the structure description on the official website and will not be listed here, because they carry nothing particularly important for understanding the format.

Let's leave this island! We need to move on. The reference point is a country called Optional-Header.

“Where's the map, Billy? I need a map.”
(Treasure Island)

Optional-Header (IMAGE_OPTIONAL_HEADER)

The name of this continent is not a very good one. This header is mandatory and comes in 2 formats, PE32 and PE32+ (IMAGE_OPTIONAL_HEADER32 and IMAGE_OPTIONAL_HEADER64 respectively). The format is stored in the field Magic: WORD. The header contains the information needed to load the file. As always:

IMAGE_OPTIONAL_HEADER

typedef struct _IMAGE_OPTIONAL_HEADER {
    WORD Magic;
    BYTE MajorLinkerVersion;
    BYTE MinorLinkerVersion;
    DWORD SizeOfCode;
    DWORD SizeOfInitializedData;
    DWORD SizeOfUninitializedData;
    DWORD AddressOfEntryPoint;
    DWORD BaseOfCode;
    DWORD BaseOfData;
    DWORD ImageBase;
    DWORD SectionAlignment;
    DWORD FileAlignment;
    WORD MajorOperatingSystemVersion;
    WORD MinorOperatingSystemVersion;
    WORD MajorImageVersion;
    WORD MinorImageVersion;
    WORD MajorSubsystemVersion;
    WORD MinorSubsystemVersion;
    DWORD Win32VersionValue;
    DWORD SizeOfImage;
    DWORD SizeOfHeaders;
    DWORD CheckSum;
    WORD Subsystem;
    WORD DllCharacteristics;
    DWORD SizeOfStackReserve;
    DWORD SizeOfStackCommit;
    DWORD SizeOfHeapReserve;
    DWORD SizeOfHeapCommit;
    DWORD LoaderFlags;
    DWORD NumberOfRvaAndSizes;
    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER, *PIMAGE_OPTIONAL_HEADER;

*As always, we will examine only the main fields, the ones that matter most for understanding how the file is loaded and how to move around inside it. Let's agree on one thing: the fields of this structure hold VA (Virtual address) and RVA (Relative virtual address) values. These are not RAW addresses, and you need to know how to read (or rather, compute) them. We will certainly learn how to do that, but first we will go through the structures in the order they follow each other, so as not to get confused. For now, just remember that these are addresses which, after some calculation, point to a specific location in the file. You will also meet a new concept, alignment. We will consider it together with RVA addresses, because the two are closely related.

AddressOfEntryPoint: DWORD - RVA of the entry point. It can point anywhere in the address space. For .exe files, the entry point is the address at which the program begins execution, and it cannot be zero!
BaseOfCode: DWORD - RVA of the beginning of the program code (code section).
BaseOfData: DWORD - RVA of the beginning of the program data (data section).
ImageBase: DWORD - the preferred base address for loading the program. Must be a multiple of 64 KB. In most cases it is equal to 0x00400000.
SectionAlignment: DWORD - alignment (in bytes) of a section when it is loaded into virtual memory.
FileAlignment: DWORD - alignment (in bytes) of a section inside the file.
SizeOfImage: DWORD - the size of the file (in bytes) in memory, including all headers. Must be a multiple of SectionAlignment.
SizeOfHeaders: DWORD - the size of all headers (DOS, DOS-Stub, PE, Section) aligned to FileAlignment.
NumberOfRvaAndSizes: DWORD - the number of directories in the directory table (the table itself is below). At the moment, this field is always equal to the symbolic constant IMAGE_NUMBEROF_DIRECTORY_ENTRIES, which is 16.
DataDirectory: IMAGE_DATA_DIRECTORY - the data directory. Simply put, this is an array (of size 16), each element of which is a structure of 2 DWORD values.

Let's look at what the IMAGE_DATA_DIRECTORY structure is:

typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD VirtualAddress;
    DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;
What do we have? We have an array of 16 elements, each holding an address and a size (of what? how? why? all in a minute). The question is what exactly these values describe. For this, Microsoft has special constants that map each index to its table. They can be seen at the very end of the structure description. In the meantime:

// Directory Entries
#define IMAGE_DIRECTORY_ENTRY_EXPORT 0 // Export Directory
#define IMAGE_DIRECTORY_ENTRY_IMPORT 1 // Import Directory
#define IMAGE_DIRECTORY_ENTRY_RESOURCE 2 // Resource Directory
#define IMAGE_DIRECTORY_ENTRY_EXCEPTION 3 // Exception Directory
#define IMAGE_DIRECTORY_ENTRY_SECURITY 4 // Security Directory
#define IMAGE_DIRECTORY_ENTRY_BASERELOC 5 // Base Relocation Table
#define IMAGE_DIRECTORY_ENTRY_DEBUG 6 // Debug Directory
// IMAGE_DIRECTORY_ENTRY_COPYRIGHT 7 // (X86 usage)
#define IMAGE_DIRECTORY_ENTRY_ARCHITECTURE 7 // Architecture Specific Data
#define IMAGE_DIRECTORY_ENTRY_GLOBALPTR 8 // RVA of GP
#define IMAGE_DIRECTORY_ENTRY_TLS 9 // TLS Directory
#define IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG 10 // Load Configuration Directory
#define IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT 11 // Bound Import Directory in headers
#define IMAGE_DIRECTORY_ENTRY_IAT 12 // Import Address Table
#define IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT 13 // Delay Load Import Descriptors
#define IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14 // COM Runtime descriptor
Yeah! We see that each element of the array is responsible for the table attached to it. But alas, these shores are still out of our reach, because we do not yet know how to work with VA and RVA addresses. And to learn that, we need to study what sections are. They are the ones who will tell us about their structure and workings, after which it will become clear why VA, RVA and alignment are needed. In this article, we will only touch on exports and imports. The purpose of the remaining entries can be found in the official documentation or in books. So, the fields themselves:

VirtualAddress: DWORD - RVA for the table to which the array element corresponds.
Size: DWORD - table size in bytes.
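To make the lookup concrete, here is a hedged sketch that pulls one directory entry out of a raw optional header. It assumes the header bytes are already in a buffer and decodes them as little-endian values. The Magic values (0x10B for PE32, 0x20B for PE32+) and the byte offsets of NumberOfRvaAndSizes (92 and 108) are not stated in this article; they are taken from the published structure layouts and should be treated as assumptions of the sketch. DataDirectory, get_data_directory, and rd32 are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t VirtualAddress; uint32_t Size; } DataDirectory;

static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Fetch directory entry `index` from a raw optional header.
   Returns 0 on success, -1 if the header is unrecognized or too short,
   or if the entry does not exist. */
int get_data_directory(const uint8_t *opt, size_t opt_size,
                       uint32_t index, DataDirectory *out) {
    assert(opt != NULL && out != NULL);
    if (opt_size < 2)
        return -1;
    uint16_t magic = (uint16_t)(opt[0] | (opt[1] << 8));
    size_t count_off;                       /* offset of NumberOfRvaAndSizes */
    if (magic == 0x10B) count_off = 92;     /* PE32 (assumed layout) */
    else if (magic == 0x20B) count_off = 108; /* PE32+ (assumed layout) */
    else return -1;
    if (opt_size < count_off + 4)
        return -1;
    uint32_t count = rd32(opt + count_off);
    if (index >= count)
        return -1;
    /* The array starts right after the count; each entry is two DWORDs. */
    size_t entry_off = count_off + 4 + (size_t)index * 8;
    if (opt_size < entry_off + 8)
        return -1;
    out->VirtualAddress = rd32(opt + entry_off);
    out->Size = rd32(opt + entry_off + 4);
    return 0;
}
```

For example, calling it with index 1 (IMAGE_DIRECTORY_ENTRY_IMPORT) yields the RVA and size of the import table.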

So! To get to such exotic shores as tables of imports, exports, resources and others, we need to go through a quest with sections. Well, cabin boy, let’s take a look at the general map, determine where we are now and move on:

And we are standing right in front of the wide open spaces of the sections. We definitely need to find out what they are hiding and finally figure out another type of addressing. We want real adventures! We want to get quickly to such republics as the import and export tables. Old pirates say that not everyone managed to reach them, but those who did returned with gold and women, or rather with sacred knowledge about the ocean. We set sail and head for the Section header.

“You are deposed, Silver! Get off the barrel!”
(Treasure Island)

Section-header (IMAGE_SECTION_HEADER)

Right after the DataDirectory array, the section headers follow one another. The section table is a sovereign state divided into NumberOfSections cities. Each city has its own craft, its own rights, and a size of 0x28 bytes. The number of sections is given in the NumberOfSections field, which is stored in the File-header. So, let's look at the structure:

typedef struct _IMAGE_SECTION_HEADER {
    BYTE Name[IMAGE_SIZEOF_SHORT_NAME];
    union {
        DWORD PhysicalAddress;
        DWORD VirtualSize;
    } Misc;
    DWORD VirtualAddress;
    DWORD SizeOfRawData;
    DWORD PointerToRawData;
    DWORD PointerToRelocations;
    DWORD PointerToLinenumbers;
    WORD NumberOfRelocations;
    WORD NumberOfLinenumbers;
    DWORD Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;
Name: BYTE - section name. Currently it is 8 characters long.
VirtualSize: DWORD - section size in virtual memory.
VirtualAddress: DWORD - RVA of the section.
SizeOfRawData: DWORD - section size in the file. Must be a multiple of FileAlignment.
PointerToRawData: DWORD - RAW offset to the beginning of the section. Must also be a multiple of FileAlignment…
Characteristics: DWORD - access attributes of the section and rules for loading it into virtual memory. For example, an attribute that defines the contents of the section (initialized data, uninitialized data, code), or access attributes: read, write, execute. This is not the full range. The characteristics are set by constants from the same WINNT.h, which begin with IMAGE_SCN_. You can study the section attributes in more detail there; they are also well described in Kris Kaspersky's books (the list of references is at the end of the article).

Regarding the name, remember the following: the section with resources must always be named .rsrc. Otherwise the resources will not be loaded. As for the remaining sections, the name can be anything. Usually the names are meaningful, for example .data, .src, etc... But it also happens:

A section is a region that is loaded into virtual memory, and all work happens directly with this data. An address in virtual memory, without any offsets, is called a Virtual address, abbreviated VA. The preferred address for loading the application is set in the ImageBase field. It is like the point at which the application's region begins in virtual memory. RVA (Relative virtual address) offsets are measured relative to this point. That is, VA = ImageBase + RVA; we always know ImageBase, so having either VA or RVA at our disposal, we can express one through the other.
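The relation VA = ImageBase + RVA is small enough to write down directly. The following is a minimal sketch for the 32-bit case; rva_to_va and va_to_rva are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* VA = ImageBase + RVA */
uint32_t rva_to_va(uint32_t image_base, uint32_t rva) {
    return image_base + rva;
}

/* RVA = VA - ImageBase; a VA below the image base cannot belong to the image. */
uint32_t va_to_rva(uint32_t image_base, uint32_t va) {
    assert(va >= image_base);
    return va - image_base;
}
```

With the typical ImageBase of 0x00400000, an RVA of 0x1000 corresponds to the VA 0x00401000, and the reverse conversion gives 0x1000 back.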

We seem to have gotten used to it here. But this is virtual memory! And we are in the physical. Virtual memory for us now is like a trip to other galaxies that we can only imagine. So we can’t get into virtual memory at the moment, but we can find out what will be there, because it’s taken from our file.

Alignment

To picture correctly how a file is loaded into virtual memory, you need to understand a mechanism called alignment. First, let's take a look at a diagram of how sections are loaded into memory.

As you can see, a section is not loaded into memory at exactly its own size. This is where alignment comes in: the size a section occupies in memory must be a multiple of the alignment value. In the diagram, the section size is 0x28 and the alignment is 0x50. 0x28 "falls short" of 0x50, so the section is loaded and the remaining 0x50 - 0x28 bytes are zeroed. And what if the section size were larger than the alignment? For example, sectionSize = 0x78 with sectionAlignment = 0x50 unchanged. In that case the section would occupy 0xA0 bytes in memory (0xA0 = 0x50 * 0x02), that is, the smallest multiple of sectionAlignment that completely covers sectionSize. Note that sections in the file are aligned in the same way, only by FileAlignment. With this foundation in place, we can work out how to convert RVA to RAW.
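The rounding rule from this example can be sketched as a small function. This version uses division rather than bit masks, so it also works for the illustrative alignment of 0x50, which is not a power of two; align_up is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Round size up to the nearest multiple of alignment.
   Division is used instead of bit masks so that any non-zero alignment works. */
uint32_t align_up(uint32_t size, uint32_t alignment) {
    assert(alignment != 0);
    /* 64-bit intermediate avoids overflow in size + alignment - 1 */
    uint64_t blocks = ((uint64_t)size + alignment - 1) / alignment;
    return (uint32_t)(blocks * alignment);
}
```

With the numbers from the text, align_up(0x28, 0x50) gives 0x50 and align_up(0x78, 0x50) gives 0xA0.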

“This is not a plain, the climate here is different.”
(V.S. Vysotsky)

A little arithmetic lesson

Before execution can begin, some part of the program must be sent to the processor's address space. Address space is the amount of RAM physically addressed by the processor. The “piece” in the address space where the program is unloaded is called a virtual image. The image is characterized by the base download address (Image base) and size (Image size). So VA (Virtual address) is the address relative to the beginning of the virtual memory, and RVA (Relative Virtual Address) is relative to the place where the program was unloaded. How to find out the base download address of an application? For this purpose there is a separate field in the optional header called ImageBase. This was a little prelude to refresh your memory. Now let's look at a schematic representation of different addressing:

So how can you read information from a file without loading it into virtual memory? To do this, you need to convert addresses to RAW format, that is, to offsets from the beginning of the file. Then we can seek to the area we need inside the file and read the necessary data. Since an RVA is the virtual memory address onto which data from the file was mapped, we can run the process in reverse. All it takes is simple arithmetic. Here are the formulas:

VA = ImageBase + RVA;
RAW = RVA - sectionRVA + rawSection;
// rawSection - offset to the section from the beginning of the file
// sectionRVA - RVA of the section (this field is stored inside the section)
As you can see, in order to calculate RAW, we need to determine the section to which RVA belongs. To do this, you need to go through all sections and check the following conditions:

RVA >= sectionVirtualAddress && RVA < sectionVirtualAddress + ALIGN_UP(sectionVirtualSize, sectionAlignment)
// sectionAlignment - alignment for sections; the value can be found in the optional header
// sectionVirtualAddress - RVA of the section, stored directly in the section
// ALIGN_UP() - function that determines how much memory the section occupies, taking alignment into account
Putting all the pieces of the puzzle together, we get this listing:

#include <stdint.h>

typedef uint32_t DWORD;
typedef uint16_t WORD;
typedef uint8_t BYTE;

// The masks below assume that align is a power of two
#define ALIGN_DOWN(x, align) ((x) & ~((align) - 1))
#define ALIGN_UP(x, align) (((x) & ((align) - 1)) ? ALIGN_DOWN(x, align) + (align) : (x))

// IMAGE_SECTION_HEADER sections[]; // array of sections, initialized elsewhere

// Returns the index of the section containing rva, or -1 if there is none
int defSection(DWORD rva)
{
    for (int i = 0; i < numberOfSection; ++i)
    {
        DWORD start = sections[i].VirtualAddress;
        DWORD end = start + ALIGN_UP(sections[i].VirtualSize, sectionAlignment);

        if (rva >= start && rva < end)
            return i;
    }
    return -1;
}

// Converts an RVA to a file offset (RAW); returns 0 if no section matches
DWORD rvaToOff(DWORD rva)
{
    int indexSection = defSection(rva);
    if (indexSection != -1)
        return rva - sections[indexSection].VirtualAddress + sections[indexSection].PointerToRawData;
    else
        return 0;
}
*I did not include the type declarations or the array initialization in the code; I only provided the functions that help calculate addresses. As you can see, the code is not very complicated, just a little confusing. The confusion goes away... if you spend a little more time tinkering with an .exe in a disassembler.
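For readers who want something that compiles on its own, here is a minimal sketch of the same conversion with the section table passed in as a parameter. The SectionInfo structure is a hypothetical, trimmed-down stand-in for IMAGE_SECTION_HEADER that keeps only the three fields the formula needs, and round_up and rva_to_raw are illustrative names, not part of any Windows header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down section record: only the fields the
   conversion needs (names mirror IMAGE_SECTION_HEADER). */
typedef struct {
    uint32_t VirtualAddress;   /* RVA of the section */
    uint32_t VirtualSize;      /* size of the section in memory */
    uint32_t PointerToRawData; /* offset of the section in the file */
} SectionInfo;

/* Round x up to a multiple of align (any non-zero value). */
uint32_t round_up(uint32_t x, uint32_t align)
{
    assert(align != 0);
    return (x + align - 1) / align * align;
}

/* Convert an RVA to a file offset.
   Returns 1 and stores the offset in *raw on success,
   0 if no section contains the RVA. */
int rva_to_raw(const SectionInfo *sections, size_t count,
               uint32_t section_alignment, uint32_t rva, uint32_t *raw)
{
    assert(sections != NULL && raw != NULL);
    for (size_t i = 0; i < count; ++i) {
        uint32_t start = sections[i].VirtualAddress;
        uint32_t end = start + round_up(sections[i].VirtualSize, section_alignment);
        if (rva >= start && rva < end) {
            /* RAW = RVA - sectionRVA + rawSection */
            *raw = rva - start + sections[i].PointerToRawData;
            return 1;
        }
    }
    return 0;
}
```

Returning a success flag separately from the offset avoids treating 0 as an error value, which is a design choice of this sketch rather than something the format requires.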

HOORAY! We figured it out. Now we can go to the lands of resources, import and export libraries, and generally wherever our heart desires. We just learned how to work with a new type of addressing. Let's hit the road!

"-Not bad, not bad! Still, they got their rations for today!”
(Treasure Island)

Export table

The very first element of the DataDirectory array stores the RVA of the export table, which is represented by the IMAGE_EXPORT_DIRECTORY structure. This table is typical of dynamic library (.dll) files. Its main purpose is to associate exported functions with their RVAs. The description given in the official specification:

typedef struct _IMAGE_EXPORT_DIRECTORY {
    DWORD Characteristics;
    DWORD TimeDateStamp;
    WORD MajorVersion;
    WORD MinorVersion;
    DWORD Name;
    DWORD Base;
    DWORD NumberOfFunctions;
    DWORD NumberOfNames;
    DWORD AddressOfFunctions;
    DWORD AddressOfNames;
    DWORD AddressOfNameOrdinals;
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;
This structure contains three pointers to three different tables: the table of (function) names (AddressOfNames), the table of ordinals (AddressOfNameOrdinals), and the table of addresses (AddressOfFunctions). The Name field stores the RVA of the dynamic library's name. The ordinal table acts as an intermediary between the table of names and the table of addresses and is an array of indexes (each index is 2 bytes in size). For greater clarity, consider the diagram:

Let's look at an example. Say the i-th element of the names array points to the name of a function. Then the i-th element of the ordinals array holds the index at which this function's address is stored in the address array. That is, i selects the entry in the ordinals table, and the value of that entry selects the address.

Attention! If you take, for example, the 2nd element of the ordinals table, that does not mean that 2 is the index into the address table. The index is the value stored in the second element of the ordinals array.

The numbers of entries in the name table (NumberOfNames) and in the ordinals table are equal, and they do not always coincide with the number of elements in the address table (NumberOfFunctions).
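The relationship between the three tables can be sketched as a lookup by name. The sketch assumes the tables have already been read from the file and that the name RVAs have been resolved to C strings; find_export_rva and its parameters are illustrative names, not an API from WINNT.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Look up an exported function by name.
   names[i]    - i-th entry of the name table (already resolved to a C string)
   ordinals[i] - i-th entry of the ordinals table, an index into functions[]
   functions[] - address table holding the RVA of each exported function
   Returns the function RVA, or 0 if the name is not exported. */
uint32_t find_export_rva(const char *const *names, const uint16_t *ordinals,
                         size_t number_of_names,
                         const uint32_t *functions, size_t number_of_functions,
                         const char *wanted)
{
    assert(names && ordinals && functions && wanted);
    for (size_t i = 0; i < number_of_names; ++i) {
        if (strcmp(names[i], wanted) == 0) {
            uint16_t index = ordinals[i];   /* the value, not i, indexes functions[] */
            if (index >= number_of_functions)
                return 0;                   /* malformed table */
            return functions[index];
        }
    }
    return 0;
}
```

Note that the loop runs over NumberOfNames entries while the bounds check uses NumberOfFunctions, which reflects the fact that the two counts may differ.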

“They came for me. Thank you for your attention. Now they must be killing!”
(Treasure Island)

Import table

The import table is an integral part of any application that uses dynamic libraries. It helps map calls to dynamic library functions to the corresponding addresses. Import can work in three different modes: standard, bound import, and delay import. Because the topic of import is quite multifaceted and deserves a separate article, I will describe only the standard mechanism in detail and give just a “skeleton” of the rest.

Standard import - in DataDirectory, the import table is stored under the index IMAGE_DIRECTORY_ENTRY_IMPORT (=1). It is an array of elements of type IMAGE_IMPORT_DESCRIPTOR. The import table stores (in arrays) the names or ordinals of functions and the places where the loader should write the effective address of each function. This mechanism is not very efficient because, frankly speaking, it comes down to searching the entire export table for each required function.

Bound import - with this scheme, -1 is written into the TimeDateStamp and ForwarderChain fields (in the first element of the standard import table), and the binding information is stored in the DataDirectory entry with index IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT (=11). That is, it is a kind of flag telling the loader to use bound import. The “bound import chain” also has its own structures. The algorithm is as follows: the required library is loaded into the application's virtual memory, and all the necessary addresses are “bound” at the compilation stage. One disadvantage is that when the dll is recompiled, the application itself has to be recompiled as well, because the function addresses will change.

Delay import - this method assumes that the .dll file is attached to the executable but is not loaded into memory immediately (as in the previous two methods), only when the application first accesses a symbol (this is what elements exported from dynamic libraries are called). That is, the program runs in memory, and as soon as the process reaches a call to a function from the dynamic library, a special handler is called that loads the dll and fills in the effective addresses of its functions. For delay import, the loader consults the DataDirectory entry with index IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT (=13).

Having covered the import methods a little, let’s move directly to the import table.

“This is a sailor! His clothes were nautical. - Yah? Did you think you would find a bishop here?”
(Treasure Island - John Silver)

Import-descriptor (IMAGE_IMPORT_DESCRIPTOR)

To find out the coordinates of the import table, we need to access the DataDirectory array, namely its IMAGE_DIRECTORY_ENTRY_IMPORT element (=1), and read the RVA of the table. Here is a general diagram of the path that needs to be taken:

Then we get RAW from the RVA using the formulas given above and “step” through the file to that offset. Now we are right in front of an array of structures called IMAGE_IMPORT_DESCRIPTOR. The end of the array is marked by a “zero” structure, that is, one with all fields set to zero.

typedef struct _IMAGE_IMPORT_DESCRIPTOR {
    union {
        DWORD Characteristics;
        DWORD OriginalFirstThunk;
    } DUMMYUNIONNAME;
    DWORD TimeDateStamp;
    DWORD ForwarderChain;
    DWORD Name;
    DWORD FirstThunk;
} IMAGE_IMPORT_DESCRIPTOR, *PIMAGE_IMPORT_DESCRIPTOR;
I couldn't find a link to a description of this structure on MSDN, but you can see it in the WINNT.h file. Let's figure it out.

OriginalFirstThunk: DWORD - RVA of the import name table (INT).
TimeDateStamp: DWORD - date and time.
ForwarderChain: DWORD - index of the first forwarded symbol.
Name: DWORD - RVA of the string with the library name.
FirstThunk: DWORD - RVA of the import address table (IAT).
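Walking the descriptor array until the zero structure can be sketched as follows. ImportDescriptor is a flattened, hypothetical copy of the layout (the union is replaced by its OriginalFirstThunk member), and max_entries is an extra safety bound added for this sketch so that a damaged file without a terminator cannot send the loop past the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout mirrors IMAGE_IMPORT_DESCRIPTOR (five DWORD fields). */
typedef struct {
    uint32_t OriginalFirstThunk; /* RVA of the import name table (INT) */
    uint32_t TimeDateStamp;
    uint32_t ForwarderChain;
    uint32_t Name;               /* RVA of the library name string */
    uint32_t FirstThunk;         /* RVA of the import address table (IAT) */
} ImportDescriptor;

/* Returns 1 if every field of the descriptor is zero (array terminator). */
int is_null_descriptor(const ImportDescriptor *d)
{
    return d->OriginalFirstThunk == 0 && d->TimeDateStamp == 0 &&
           d->ForwarderChain == 0 && d->Name == 0 && d->FirstThunk == 0;
}

/* Count descriptors that precede the all-zero terminator,
   looking at no more than max_entries elements. */
size_t count_import_descriptors(const ImportDescriptor *table, size_t max_entries)
{
    assert(table != NULL);
    size_t n = 0;
    while (n < max_entries && !is_null_descriptor(&table[n]))
        ++n;
    return n;
}
```

Each counted descriptor corresponds to one imported library, whose name is found by converting the Name RVA to a file offset.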

Everything here is somewhat similar to export: there is also a table of names (INT), also a table of addresses (IAT), and also the RVA of the library name. The difference is that INT and IAT both refer to arrays of IMAGE_THUNK_DATA structures. This structure comes in two forms, for 32-bit and 64-bit systems, which differ only in the size of the fields. Let's look at x86 as an example:

typedef struct _IMAGE_THUNK_DATA32 {
    union {
        DWORD ForwarderString;
        DWORD Function;
        DWORD Ordinal;
        DWORD AddressOfData;
    } u1;
} IMAGE_THUNK_DATA32, *PIMAGE_THUNK_DATA32;
It is important to note that further actions depend on the most significant bit of the structure. If it is set, the remaining bits hold the number (ordinal) of the symbol being imported (import by ordinal). Otherwise (the most significant bit is cleared), the remaining bits hold the RVA of the data describing the symbol being imported (import by name). For an import by name, this RVA points to the following structure:

typedef struct _IMAGE_IMPORT_BY_NAME {
    WORD Hint;
    BYTE Name[1];
} IMAGE_IMPORT_BY_NAME, *PIMAGE_IMPORT_BY_NAME;
Here Hint is the function's number, and Name is the function's name.
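The check of the most significant bit can be sketched for the 32-bit case like this. The helper names are illustrative. The constant matches the value of IMAGE_ORDINAL_FLAG32 in WINNT.h, and the ordinal is taken from the low 16 bits because ordinals are 16-bit values.

```c
#include <assert.h>
#include <stdint.h>

#define ORDINAL_FLAG32 0x80000000u  /* most significant bit of a 32-bit thunk */

/* Returns 1 if the thunk describes an import by ordinal, 0 if by name. */
int thunk_is_ordinal(uint32_t thunk)
{
    return (thunk & ORDINAL_FLAG32) != 0;
}

/* Ordinal of the imported symbol (valid only for import by ordinal). */
uint16_t thunk_ordinal(uint32_t thunk)
{
    assert(thunk_is_ordinal(thunk));
    return (uint16_t)(thunk & 0xFFFFu);
}

/* RVA of the IMAGE_IMPORT_BY_NAME structure (valid only for import by name). */
uint32_t thunk_name_rva(uint32_t thunk)
{
    assert(!thunk_is_ordinal(thunk));
    return thunk & 0x7FFFFFFFu;
}
```

For example, a thunk value of 0x80000010 means "import ordinal 0x10", while 0x00002000 means "the name structure is at RVA 0x2000".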

What is this all for? All these arrays, structures... For clarity, let's consider a wonderful diagram with

Understanding the Linux file system, its directory structure, and where configuration, executable, and temporary files are placed will help you better understand your system and become a successful system administrator. The Linux file system will seem unusual to a beginner who has just switched from Windows, because everything here is organized quite differently. Unlike in Windows, a program is not located in a single folder but, as a rule, is distributed across the root file system. This distribution follows certain rules. Have you ever wondered why some programs are located in /bin, /sbin, /usr/sbin, or /usr/local/bin, and what the difference between these directories is?

For example, the less program is located in the /usr/bin directory, but why not in /sbin or /usr/sbin? And programs such as ifconfig or fdisk are located in the /sbin directory and nowhere else.

This article covers the structure of the Linux file system; after reading it, you will understand the purpose of most of the folders in the Linux root directory.

/ - root

This is the main directory of a Linux system. Essentially, it is the Linux file system. There are no drives here like those in Windows. Instead, the paths of all files start from the root, and additional partitions, flash drives, or optical drives are mounted in folders of the root directory.

Note that the root user has a home directory of /root, but not / itself.

/bin - (binaries) user binary files

This directory contains executable files. These are programs that can be used in single-user mode or recovery mode; in short, utilities that can be used while the /usr/ directory is not yet mounted. They are common commands such as cat, ls, tail, ps, etc.

/sbin - (system binaries) system executable files

Like /bin, it contains binary executable files that are available during the early stages of boot, when the /usr directory is not mounted. But there are programs here that can only be executed with superuser rights. These are different utilities for system maintenance. For example, iptables, reboot, fdisk, ifconfig, swapon, etc.

/etc - (etcetera) configuration files

This folder contains configuration files of all programs installed on the system.

In addition to configuration files, it contains the scripts of the Init Scripts initialization system, which start and stop system daemons, mount file systems, and launch programs. The directory structure inside this folder may be a little confusing, but everything in it serves setup and configuration.

/dev - (devices) device files

In Linux, everything, including external devices, is a file. Thus, all connected flash drives, keyboards, microphones, and cameras are just files in the /dev/ directory. This directory contains an unusual file system. Its structure and the files in it are initialized by the udev service when the system boots: all connected devices are scanned and special files are created for them. Examples of such devices are /dev/sda, /dev/sr0, /dev/tty1, /dev/usbmon0, etc.

/proc - (process) information about processes

This is also an unusual file system: a subsystem dynamically created by the kernel. It contains information about running processes in real time. Essentially, it is a pseudo-file system containing detailed information about each process: its PID, executable file name, startup parameters, memory usage, and so on. You can also find information about system resource usage here, for example in /proc/cpuinfo, /proc/meminfo, or /proc/uptime. Besides these files, the directory contains a large structure of folders from which you can learn a lot about the system.
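Because the entries in /proc are ordinary text files, a program can read them with the usual file functions. Below is a minimal sketch in C that extracts one value from /proc/meminfo. The function names are illustrative, and the assumed line format ("Key:   number kB") is the one commonly seen on Linux systems.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one /proc/meminfo-style line such as "MemTotal:  16314520 kB".
   Returns 1 and stores the number in *value_kb if the line starts with
   "key:", otherwise returns 0. */
int parse_meminfo_line(const char *line, const char *key, long *value_kb)
{
    assert(line && key && value_kb);
    size_t key_len = strlen(key);
    if (strncmp(line, key, key_len) != 0 || line[key_len] != ':')
        return 0;
    return sscanf(line + key_len + 1, "%ld", value_kb) == 1;
}

/* Read the value for key from /proc/meminfo.
   Returns -1 if the file cannot be opened or the key is not found. */
long read_meminfo_kb(const char *key)
{
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return -1;
    char line[256];
    long value = -1;
    while (fgets(line, sizeof line, f)) {
        if (parse_meminfo_line(line, key, &value))
            break;
    }
    fclose(f);
    return value;
}
```

Calling read_meminfo_kb("MemTotal") on a Linux machine should return the total amount of RAM in kilobytes; on systems without /proc it simply returns -1.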

/var (variable) - Variable files

The name of the /var directory is self-explanatory: it contains files that change frequently. The size of these files is constantly increasing. It holds system log files, various caches, databases, and so on. Next we will look at the purpose of the directories inside /var/.

/var/log - Log files

/var/lib - databases

Another kind of frequently modified files: database files, packages saved by the package manager, etc.

/var/mail - mail

The mail server stores all received or sent emails in this folder; its logs and configuration files may also be located here.

/var/spool - printer

Initially, this folder was responsible for printer queues and for the operation of the CUPS set of programs.

/var/lock - lock files

This is where lock files are located. These files indicate that a particular resource, file, or device is in use and cannot be used by another process. apt-get, for example, locks its database so that other programs cannot use it while apt-get is working with it.

/var/run - PID of processes

Contains files with PIDs of processes that can be used for interaction between programs. Unlike the /run directory, data is saved after reboot.
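A PID file usually holds nothing more than a decimal number followed by a newline, so reading one comes down to careful number parsing. The sketch below shows only that parsing step; parse_pid_text is an illustrative name, and the exact contents of a given program's PID file are an assumption that should be checked for that program.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse the text of a PID file (decimal number, optional trailing whitespace).
   Returns the PID, or -1 if the text is not a positive number. */
long parse_pid_text(const char *text)
{
    assert(text != NULL);
    char *end = NULL;
    errno = 0;
    long pid = strtol(text, &end, 10);
    if (errno != 0 || end == text || pid <= 0)
        return -1;
    /* Allow trailing whitespace such as the final newline */
    while (*end == ' ' || *end == '\n' || *end == '\r' || *end == '\t')
        ++end;
    return *end == '\0' ? pid : -1;
}
```

A program would typically read the file contents into a buffer with fgets and pass the buffer to this function before, for example, sending a signal to the process.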

/tmp (temp) - Temporary files

This directory contains temporary files created by the system, any programs or users. All users have write permission to this directory.

The files are deleted every time you reboot. The Windows analogue is the Windows\Temp folder, where temporary files are also stored.

/usr - (user applications) User programs

This is the largest directory, and it serves many purposes. It has the most extensive directory structure in Linux. Here you can find executable files, program sources, various application resources, pictures, music, and documentation.

/usr/bin/ - Executable files

Contains executable files of various programs that are not needed during the first stages of system boot, for example, music players, graphic editors, browsers, and so on.

/usr/sbin/

Contains binaries of system administration programs that must be run with superuser rights, for example GParted, sshd, useradd, userdel, etc.

/usr/lib/ - Libraries

Contains libraries for programs from /usr/bin or /usr/sbin.

/usr/local - User files

Contains files of programs, libraries, and settings created by the user. For example, programs compiled and installed from source and scripts written manually can be stored here.

/home - Home folder

This folder stores the home directories of all users. Users can keep their personal files, program settings, etc. in them, for example in /home/sergiy. Compared to Windows, this is your user folder on drive C, but unlike in Windows, /home is usually located on a separate partition, so when you reinstall the system, all your data and program settings are preserved.

/boot - Bootloader files

Contains all files associated with the system boot loader. This is the vmlinuz kernel, the initrd image, as well as the bootloader files located in the /boot/grub directory.

/lib (library) - System libraries

Contains system library files that are used by executable files in the /bin and /sbin directories.

Libraries have file names with a *.so extension and begin with the lib* prefix, for example libncurses.so.5.7. The /lib64 folder on 64-bit systems contains 64-bit versions of the libraries from /lib. This folder can be compared with Windows\system32: all the system libraries are stored there as well, only there they are mixed with executable files, while here everything is kept separate.
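The naming convention can be expressed as a small check. This is only a heuristic sketch based on the pattern described above (the lib prefix, then .so, then an optional version suffix). The function name is illustrative, and real tools identify libraries by inspecting the file contents rather than the name.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if name looks like "lib<something>.so", optionally followed
   by a version suffix such as ".5.7"; returns 0 otherwise. */
int looks_like_shared_library(const char *name)
{
    assert(name != NULL);
    if (strncmp(name, "lib", 3) != 0)
        return 0;
    const char *so = strstr(name + 3, ".so");
    if (so == NULL || so == name + 3)
        return 0;   /* no ".so", or nothing between "lib" and ".so" */
    const char *rest = so + 3;
    return *rest == '\0' || *rest == '.';
}
```

With this check, libncurses.so.5.7 and libc.so are accepted, while ncurses.so and libfoo.a are rejected.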

/opt (Optional applications) - Additional programs

Proprietary programs, games, or drivers are installed in this folder. These are programs built as separate executable files by the manufacturers themselves. Such programs are installed in subdirectories of /opt/; they are very similar to Windows programs in that all executable files, libraries, and configuration files are located in one folder.

/mnt (mount) - Mounting

System administrators can mount external or additional file systems into this directory.

/media - Removable media

The system mounts all connected external drives - USB flash drives, optical drives and other storage media - into this directory.

/srv (server) - Server

This directory contains server and service files. For example, it may contain files from the apache web server.

/run - processes

Another directory containing process PID files, similar to /var/run, but unlike it, located on tmpfs, so all files are lost after a reboot.

/sys (system) - System information

The purpose of the directories in this folder is to provide information about the system directly from the kernel. This is another file system organized by the kernel; it lets you view and change many system operating parameters, for example swap behavior, fan control, and much more.