Computer speed and performance. Design and development of a software product. Standard Performance Measurement Tests

In order to measure computer performance using tests, it is not necessary to download any third-party applications and utilities.

It is enough to use the resources already built into the operating system.

Although to get more detailed information the user will have to find a suitable program.

Based on the test results, you can determine which part of your PC or laptop needs replacing sooner than the others - and sometimes the results simply show that it is time to buy a new computer.

Why a performance check is needed

Computer speed testing is available to any user. It does not require specialized knowledge or experience with specific versions of Windows, and the process itself rarely takes more than an hour.

Reasons to use the built-in utility or a third-party application include:

  • unexplained slowdown of the computer - and not necessarily an old one: the check also helps identify problems with new PCs. For example, minimal scores from an otherwise good video card indicate incorrectly installed drivers;
  • checking a device when choosing between several similar configurations in a computer store. This is usually done before buying a laptop - running the test on 2-3 devices with almost identical specifications helps find out which one suits the buyer best;
  • the need to compare the capabilities of the various components of a gradually upgraded computer. For example, if the HDD shows the lowest performance value, it should be replaced first (say, with an SSD).

Testing that reveals how quickly the computer performs various tasks can expose problems with drivers and incompatibilities between installed devices - and sometimes even poorly functioning or failing parts. For the latter, however, you will need more capable utilities than those built into Windows by default: the standard tests reveal only minimal information.

System check

You can check the performance of individual computer components using the built-in capabilities of the Windows operating system. Their operating principle and information content are approximately the same for all versions of the Microsoft platform. And the differences lie only in the method of launching and reading information.

Windows Vista, 7 and 8

In versions 7 and 8 of the platform, as well as in Windows Vista, the performance index of the computer's components can be found in the list of basic information about the operating system. To display it, simply right-click the “My Computer” icon and select Properties.

If testing has already been carried out, information about its results will be available immediately. If you are running the test for the first time, you will have to run it by going to the performance test menu.

The maximum score that Windows 7 and 8 can achieve is 7.9. You should think about replacing parts if at least one of the indicators falls below 4. For a gamer, values above 6 are more suitable. For Windows Vista, the best indicator is 5.9, and the “critical” level is about 3.

Important: To speed up the performance calculation, you should close almost all programs during the test. When testing a laptop, it is advisable to plug it into mains power - the process consumes a significant amount of battery charge.

Windows 8.1 and 10

For more modern operating systems, finding information about computer performance and starting to calculate it is no longer so easy. To run a utility that evaluates system parameters, you should do the following:

1. Open the operating system command line (launch cmd via the "Run" dialog, which is opened by pressing the Win + R keys simultaneously);

2. Start the evaluation process by entering the command winsat formal -restart clean;

3. Wait for the job to complete;

4. Go to the Performance\WinSAT\DataStore folder located in the Windows system directory on the computer's system drive;

5. Find the file "Formal.Assessment (Recent).WinSAT.xml" and open it in a text editor.

Within this mass of text, the user must find the WinSPR block, which contains roughly the same data that Windows 7 and 8 display on screen - only in a different form.

Thus, SystemScore holds the overall index, calculated from the lowest component score, while MemoryScore, CpuScore and GraphicsScore show the scores for memory, processor and graphics card, respectively. GamingScore and DiskScore give the performance for gaming and for hard drive reads/writes.
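For those who prefer not to dig through the XML by hand, here is a minimal sketch (assuming Python 3 on Windows and the default DataStore location; adjust the path if your system drive is not C:) that finds the most recent formal assessment and prints the WinSPR scores described above.

```python
# Minimal sketch: print the WinSPR scores from the newest WinSAT report.
# Assumes "winsat formal" has already been run at least once.
import glob
import os
import xml.etree.ElementTree as ET

datastore = r"C:\Windows\Performance\WinSAT\DataStore"
files = sorted(glob.glob(os.path.join(datastore, "*Formal.Assessment*.WinSAT.xml")),
               key=os.path.getmtime)
if not files:
    raise SystemExit("No WinSAT assessment found - run 'winsat formal' first.")

root = ET.parse(files[-1]).getroot()
winspr = root.find(".//WinSPR")
if winspr is None:
    raise SystemExit("WinSPR block not found in the report.")

for tag in ("SystemScore", "MemoryScore", "CpuScore",
            "GraphicsScore", "GamingScore", "DiskScore"):
    node = winspr.find(tag)
    if node is not None:
        print(f"{tag}: {node.text}")
```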

The maximum value for Windows 10 and version 8.1 is 9.9. This means that the owner of an office computer can still afford to have a system with numbers less than 6, but for full operation of a PC and laptop it must reach at least 7. And for a gaming device - at least 8.

Universal method

There is a method that is the same for any operating system. It consists of launching the task manager after pressing the Ctrl + Alt + Delete keys. A similar effect can be achieved by right-clicking on the taskbar - there you can find an item that launches the same utility.

You will be able to see several graphs on the screen - for the processor (for each thread separately) and RAM. For more detailed information, go to the “Resource Monitor” menu.

Using this information, you can determine how heavily individual PC components are loaded - first of all by the load percentage, and secondly by the color of the line (green means normal operation of the component, yellow means moderate load, red means the component is overloaded and may need to be replaced).

Third party programs

Using third-party applications, it is even easier to check your computer's performance.

Some of them are paid or shareware (that is, they require payment after the trial period ends or to unlock additional functionality).

However, these applications conduct more detailed testing - and often provide a lot of other information useful to the user.

1. AIDA64

AIDA64 includes tests for memory, cache, HDDs, SSDs and flash drives. And when testing a processor, 32 threads can be checked at once. Among all these advantages there is also a small drawback - you can use the program for free only during a 30-day trial period. After that you will have to either switch to another application or pay 2,265 rubles for a license.

2. SiSoftware Sandra Lite

3. 3DMark

4. PCMark 10

The application allows you not only to test the operation of computer elements, but also to save test results for further use. The only drawback of the application is the relatively high cost. You will have to pay $30 for it.

5. CINEBENCH

The test scene consists of roughly 300 thousand polygons making up more than 2,000 objects. The results are given as a PTS score - the higher it is, the more powerful the computer. The program is distributed free of charge, which makes it easy to find and download on the Internet.

6. ExperienceIndexOK

Information is displayed on the screen in points. The maximum score is 9.9, as in the latest versions of Windows - this is exactly what ExperienceIndexOK is designed for. Using such a program is much easier than entering commands and searching for result files in the system directory.

7. CrystalDiskMark

To test a disk, select the drive and set the test parameters, that is, the number of runs and the file sizes that will be used for diagnostics. After a few minutes, information about the average read and write speed of the HDD will appear on the screen.

8. PC Benchmark

Having received the test results, the program offers to optimize the system. And after improving performance, a page opens in the browser where you can compare the performance of your PC with other systems. On the same page you can check whether your computer can run some modern games.

9. Metro Experience Index

10. PassMark PerformanceTest

Conclusions

Using different methods to test your computer's performance allows you to check how well the system is running and, if necessary, compare the speed of individual components with the performance of other models. For a preliminary assessment, the built-in utilities are enough, although it is much more convenient to download dedicated applications - especially since several of them are both quite capable and free.


Processor performance is an integral characteristic that depends on the processor's frequency, its bit depth, and its architectural features (the presence of cache memory, etc.). Processor performance cannot simply be calculated; it is determined through testing, i.e. by measuring the speed at which the processor performs certain operations in a given software environment.

So, let's look at a number of key characteristics and components of the processor:

Clock frequency

It is generally accepted that in order to choose the right processor, you first need to look at its main characteristic - the clock frequency, also called speed. As mentioned above, the speed of your entire system (performance) depends on the capabilities of the processor. The clock frequency sets the rhythm of the computer's life. The higher the clock frequency, the shorter the duration of one operation and the higher the performance of the computer. Therefore, frequency is the main characteristic of the processor.

By a clock cycle we mean the period of time during which an elementary operation can be performed. The clock frequency can be measured and its value determined. It is measured in megahertz (MHz) or gigahertz (GHz). Hertz is a unit of measurement that expresses the frequency of a periodic process. It is directly related to the unit of time, one second: 1 Hz means one occurrence of a process per second (1 Hz = 1/s). For example, 10 Hz means ten occurrences of such a process in one second. The Mega prefix multiplies the base value (Hz) by a million (1 MHz - a million cycles per second), and the Giga prefix by a billion (1 GHz - a billion cycles per second).
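A small sketch in Python (illustrative frequencies only) of the relationship described above: the duration of one clock cycle is simply the reciprocal of the clock frequency.

```python
# Duration of one clock cycle as the reciprocal of the clock frequency.
def clock_period_ns(frequency_hz: float) -> float:
    """Return the length of one clock cycle in nanoseconds."""
    return 1e9 / frequency_hz

for label, f in [("1 MHz", 1e6), ("100 MHz", 100e6), ("3 GHz", 3e9)]:
    print(f"{label}: one cycle lasts {clock_period_ns(f):.3f} ns")
# At 3 GHz a cycle lasts about 0.333 ns: the higher the frequency,
# the shorter each elementary operation.
```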

Core (a set of technological, physical and software tools underlying the processor)

The core is the main part of the central processing unit (CPU). It determines most of the key parameters of your CPU: first of all, the socket type, the supported frequency range and the operating frequency of the front-side bus (FSB).

The processor core is characterized by the following parameters:

volume of internal cache of the first and second levels (see below),

manufacturing process technology (the fabrication process used to produce the chip and the connections between its elements),

heat dissipation (the power that the cooling system must remove to ensure normal operation of the processor. The higher this value, the hotter your processor runs. Note that some processor manufacturers measure heat dissipation differently, so comparisons should only be made within the same manufacturer).

Before buying a CPU with a particular core, you need to make sure that your motherboard can work with such a processor. Within the same line there may be CPUs with different cores. For example, the Pentium IV line contains processors with Northwood, Prescott, and Willamette cores.

The same core can be the basis of different processor models that differ in cost and performance level. If you come across the same core name in different processor models, it means they belong to the same generation and are most often compatible with the same motherboard models.

Number of Cores

For decades, single-core processors were the only reality in the personal computer segment. This was the case until 2005, when two microprocessor giants - Intel and AMD - released their first dual-core processors. These products were not just the latest innovations from industry leaders, but heralded the beginning of an entire era in the development of professional technologies for personal computers. Over time, they had more and more successors.

To be fair, let’s say that despite the deeply rooted idea that one core is completely incompatible with modern realities, the “old single-core ones” still find their users. This is due to the fact that most programs do not yet know how to use the capabilities of multi-core chips. Here are a few names of single-core processors that are widely used to this day: AMD Athlon 64, AMD Sempron, Intel Celeron, Intel Core Solo, Intel Pentium 4.

What are the real benefits of a multi-core processor? The parallel operation of two or more cores at a lower clock speed provides greater performance: the currently running program distributes data-processing tasks across both cores. The maximum effect is achieved when both the operating system and application programs run in parallel, as is often the case with graphics applications. Multi-core also benefits the concurrency of ordinary applications: for example, one processor core can handle a program running in the background while an antivirus program uses the resources of the second core.

It is worth noting, however, that managing parallel tasks itself takes time and other system resources, and sometimes one task has to wait for the result of another. Therefore, in practice, having a 2-core processor does not mean computing is twice as fast. The performance increase can be quite significant, but it depends on the type of application.
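A minimal sketch (with a hypothetical compute-bound workload) of the point above: spreading independent tasks over two worker processes helps, but coordination overhead keeps the speedup below 2x.

```python
# Compare serial execution with a 2-process pool for CPU-bound work.
import time
from multiprocessing import Pool

def cpu_bound_task(n: int) -> int:
    # Dummy compute-heavy work: sum of squares.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    jobs = [2_000_000] * 8

    start = time.perf_counter()
    _ = [cpu_bound_task(n) for n in jobs]
    t_serial = time.perf_counter() - start

    start = time.perf_counter()
    with Pool(processes=2) as pool:   # pretend only two cores are available
        _ = pool.map(cpu_bound_task, jobs)
    t_parallel = time.perf_counter() - start

    print(f"serial:   {t_serial:.2f} s")
    print(f"parallel: {t_parallel:.2f} s "
          f"(speedup x{t_serial / t_parallel:.2f}, typically below 2)")
```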

For games that do not yet use the new technology at all, performance increases by no more than 5% at the same clock frequency. To a large extent, games benefit only from a larger processor cache. It will still be some time before games appear that run noticeably faster on multi-core processors.

But music and video processing programs optimized for multi-core processors will run 50% faster. The increase in speed is especially noticeable if the application used for processing and compressing video files is tailored to a multi-core processor. Having 2 or more cores also improves the playback quality of high-resolution movies (Blu-ray, HD-DVD), because when decompressing such video, which contains large amounts of data, the processor must perform a huge number of calculations.

Another real advantage of a multi-core processor is reduced power consumption. Multi-core chips that implement all modern energy-saving technologies complete their tasks faster than usual and can therefore quickly switch to a mode with a lower clock frequency and, accordingly, lower power consumption. This is especially important for a laptop, whose battery life is thereby significantly extended.

Nowadays, the most common 2-core processor models are the AMD Athlon 64 X2, Intel Core 2 Duo, Intel Pentium Dual Core and Intel Pentium D. There are also 4-core processors for desktop computers, such as the Intel Core 2 Quad.

However, to take advantage of all the benefits of a multi-core processor described above, the operating system and applications must support multi-core processing. Modern Windows operating systems, aimed at increasing computer performance, distribute software tasks across the different processor cores on their own. Thus, Windows XP shifts background tasks to the second core for programs specifically designed for multi-core processors (for example, professional graphics applications from Adobe). Windows Vista goes further, distributing across the cores the tasks of application programs that were not originally designed for multi-core chips. If you are using older versions of operating systems - for example, Windows 98 or Windows Me - you will not benefit from a multi-core processor.

Cache memory

All modern processors have a cache - an array of ultra-fast RAM that acts as a buffer between the processor and the controller of the relatively slow system memory. The processor's cache memory performs roughly the same function as RAM, except that the cache is memory built into the processor. It is used by the processor to store the information your computer is currently working with, as well as other frequently used data. Thanks to the cache, the time needed for the next access to this data is significantly reduced, which noticeably increases the overall performance of the processor.

In general, we can distinguish the following tasks that cache memory performs:

providing quick access to intensively used data;

coordination of processor and memory controller interfaces;

deferred (write-back) data writing.

Let's say you regularly visit the same website on the Internet or launch your favorite game every day. Your processor's cache memory will store the bulk of images and video fragments, thereby significantly reducing the number of times the processor accesses the extremely slow (compared to the speed of the processor) system memory.

While the RAM capacity of new computers starts at 1 GB, the cache is only about 2-8 MB. As you can see, the difference in capacity is significant, yet even this volume is quite enough to ensure normal performance of the entire system.

Moreover, in modern processors the cache is no longer a single memory array, as before, but is divided into several levels. For a long time, processors had two levels of cache memory: L1 (first level) and L2 (second level). The fastest but relatively small (usually about 128 KB) first-level cache, with which the processor core works directly, is most often divided into two halves - an instruction cache and a data cache. The second-level cache is used to store data and is usually much larger. Most processors have a unified second-level cache, i.e. a mixed one, without separation into an instruction cache and a data cache. The L2 cache is not always shared between cores, however: in the AMD Athlon 64 X2, for example, each core has its own L2 cache.
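The practical effect of the cache can be illustrated with a small timing sketch (the sizes and the interpretation are assumptions for illustration; Python's interpreter overhead blurs the numbers, so treat them as qualitative only): repeatedly touching a working set that fits in the cache is faster than touching one that does not.

```python
# Random accesses to a small (cache-friendly) vs. a large working set.
import random
import time
from array import array

def touch(data, indices):
    total = 0
    for i in indices:
        total += data[i]
    return total

small = array("q", range(16 * 1024))         # ~128 KB, fits in L1/L2
large = array("q", range(16 * 1024 * 1024))  # ~128 MB, far exceeds the cache

n_accesses = 2_000_000
idx_small = [random.randrange(len(small)) for _ in range(n_accesses)]
idx_large = [random.randrange(len(large)) for _ in range(n_accesses)]

for name, data, idx in [("small working set", small, idx_small),
                        ("large working set", large, idx_large)]:
    start = time.perf_counter()
    touch(data, idx)
    print(f"{name}: {time.perf_counter() - start:.3f} s")
```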

On November 19, 2007, a historic moment arrived: AMD, after trailing Intel and its highly successful Core 2 line for 18 months, introduced the long-awaited AMD Phenom processor with four cores and three cache levels. After this, most modern server processors began to acquire a third-level cache (L3). The L3 cache is usually even larger, although somewhat slower than L2 (because the bus between L2 and L3 is narrower than the bus between L1 and L2), but its speed is in any case incomparably higher than that of system memory.

There are two types of cache: exclusive and non-exclusive. In the first case, the information in the caches of all levels is strictly separated - each contains only unique information - while in a non-exclusive cache information can be duplicated at all caching levels. It is hard to say which of these two schemes is more correct: both have their pros and cons. The exclusive caching scheme is used in AMD processors, the non-exclusive one in Intel processors.

Bit depth

Another characteristic of a processor that affects its performance is its bit depth. In general, the higher the bit depth, the higher the processor's performance. Currently, almost all programs are designed for 32-bit and 64-bit processors.

32-bit processors process 32 bits of data per clock cycle, while 64-bit processors process twice as much, that is, 64 bits. This advantage is especially noticeable when processing large amounts of data (for example, when converting photographs). But to use it, the operating system and applications must support the 64-bit processing mode. The specially designed 64-bit versions of Windows XP and Windows Vista can run both 32-bit and 64-bit programs as needed. However, 64-bit applications are still quite rare: most programs, even professional ones, support only 32-bit mode.

When specifying the processor bit size, they write, for example, 32/20, which means that the processor has a 32-bit data bus and a 20-bit address bus. The address bus width determines the address space of the processor, i.e. the maximum amount of RAM that can be installed in a computer.

The first domestic (Soviet) personal computer, the "Agat" (1985), had a processor with a bit capacity of 8/16, so its address space was 64 KB. The Pentium II processor had a bit capacity of 64/32, i.e. an address space of 4 GB. Every 32-bit application has a process address space of no more than 4 GB.

The 64-bit processor made it possible to expand the addressable RAM space and get rid of the existing 4 GB limit. This is its significant advantage, which allows the computer to manage more RAM. With a 64-bit chip, up to 32GB of RAM can currently be used. But this difference is of little relevance to most ordinary users.
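A minimal sketch of the arithmetic behind these figures: the maximum addressable memory is 2 raised to the width of the address bus, in bytes.

```python
# Address space as a function of address-bus width: 2 ** address_bits bytes.
def address_space(address_bits: int) -> str:
    size = float(2 ** address_bits)
    for unit in ("bytes", "KB", "MB", "GB", "TB", "PB", "EB"):
        if size < 1024:
            return f"{size:.0f} {unit}"
        size /= 1024
    return f"{size:.0f} ZB"

for bits in (16, 20, 32, 64):
    print(f"{bits}-bit address bus -> {address_space(bits)}")
# 16 -> 64 KB (the Agat), 32 -> 4 GB (the 32-bit limit), 64 -> 16 EB in theory;
# real systems are limited far below that by chipset and OS constraints.
```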

Socket

A socket is the connector on the motherboard into which the processor is inserted. Each processor type has its own socket type, so if you need to replace the processor with a more modern one after a year or two, you will almost always have to change the motherboard as well.

The name of the socket usually contains a specific number indicating the number of contacts on the connector. Recently, the most used sockets are with the following numbers: 478, 604, 754, 775, 939, 940.

Some exceptions to the general rule for installing processors were the Pentium 2 and 3 processor variants, which were installed not in sockets, but in narrow slots, similar to slots for expansion cards on the motherboard. However, this design did not take root.

System bus frequency

The system bus (in English - Front Side Bus, or FSB) is a highway running along the motherboard and connecting the processor with other key system components with which it exchanges data and commands (for example, a memory controller hub).

The system bus frequency determines the speed at which the processor communicates with other system devices on the computer, receiving the necessary data from them and sending it in the opposite direction. The higher the system bus frequency, the greater the overall system performance. The system bus frequency is measured in GHz or MHz.

Modern processors are designed to work with a specific FSB frequency, while motherboards support several values of this frequency.

Processor operating temperature

Another CPU parameter is the maximum permissible processor surface temperature at which normal operation is possible (from 54.8 to 100 C). The processor temperature depends on its load and on the quality of heat removal. In idle mode and with normal cooling it stays in the range of 25-40 C; under high load it can reach 60-65 degrees. At temperatures exceeding the manufacturer's maximum, normal operation of the processor is not guaranteed: programs may malfunction or the computer may freeze. Processors from different manufacturers heat up differently. Accordingly, the hotter the processor runs, the more powerful the fan (cooler) you should buy to cool it. You can also buy a system unit with additional fans - this lowers the temperature inside the case.

Arithmetic Logic Unit and Control Unit

The required components of the processor are the arithmetic logic unit and the control unit; closely related to them is the FPU (Floating Point Unit), the unit that performs floating-point operations. This unit is particularly powerful in AMD's lines. These parameters matter for games and for mathematical calculations (that is, for programmers).

The arithmetic logic unit is responsible for performing arithmetic and logical operations. A computer processor is designed to process information, and each processor has a certain set of basic operations (instructions), for example, one of these operations is the operation of adding binary numbers.

Technically, the processor is implemented as a large-scale integrated circuit whose structure is constantly becoming more complex, and the number of functional elements (such as diodes or transistors) on it keeps growing (from 30 thousand in the 8086 processor to 5 million in the Pentium II and up to 30 million in the Intel Core Duo). The control unit coordinates the operation of all these components and the execution of the processes taking place in the computer.


The speed and performance of a computer is determined by many factors. It is impossible to achieve significant performance improvements by improving the characteristics of any one device, for example, by increasing the processor clock speed. Only by carefully selecting and balancing all computer components can you achieve a significant increase in computer performance.

It is important to remember that the computer cannot run faster than the slowest device used to perform the task.

CPU clock speed

The most important parameter of computer performance is processor speed, or, as it is called, clock frequency, which affects the speed of operations in the processor itself. The clock frequency is the operating frequency of the processor core (that is, the part that performs the main calculations) at maximum load. Note that other computer components may operate at frequencies different from the processor frequency.

The clock frequency is measured in megahertz (MHz) and gigahertz (GHz). The number of cycles per second performed by a processor is not the same as the number of operations a processor performs per second, since many mathematical operations require multiple clock cycles to implement. It is clear that under the same conditions, a processor with a higher clock speed should work more efficiently than a processor with a lower clock frequency.

As the clock frequency of the processor increases, the number of operations performed by the computer in one second increases, and therefore the speed of the computer also increases.

RAM capacity

An important factor affecting computer performance is the amount of RAM and its speed (access time, measured in nanoseconds). The type and amount of RAM has a big impact on the speed of your computer.


The fastest device in a computer is the CPU. The second fastest is RAM; however, RAM is still significantly slower than the processor.

To compare the speed of the processor and RAM, one fact is enough: almost half of the time the processor is idle, waiting for a response from RAM. Therefore, the shorter the RAM access time (i.e., the faster it is), the less the processor idles and the faster the computer runs.

Reading and writing information is much faster with RAM than with any other storage device, such as a hard drive. Therefore, increasing the amount of RAM and installing faster memory increases computer performance when working with applications.

Hard drive capacity and hard drive speed

Computer performance is affected by the speed of the hard drive's interface bus and the amount of free disk space.


The size of your hard drive typically affects the number of programs you can install on your computer and the amount of data you can store. The capacity of hard drives is usually measured in tens and hundreds of gigabytes.

The hard drive is much slower than RAM: the data transfer rate of Ultra DMA 100 hard drives does not exceed 100 megabytes per second (133 MB/s for Ultra DMA 133). Data exchange with DVD and CD drives is even slower.

Important characteristics of the hard drive that affect the speed of the computer are:

  • Spindle speed;
  • Average data retrieval time;
  • Maximum data transfer rate.

Amount of free hard disk space

When there is not enough space in the computer's RAM, Windows and many application programs are forced to place part of the data needed for current work on the hard drive, creating so-called temporary files, or swap (paging) files.

Therefore, it is important that there is enough free space on the disk to write temporary files. If there is not enough free disk space, many applications simply cannot work correctly or their operating speed drops significantly.

After the application terminates, all temporary files are usually automatically deleted from the disk, freeing up space on the hard drive. If the size of the RAM is sufficient for work (at least several GB), then the size of the paging file for a personal computer does not significantly affect the performance of the computer and can be set to a minimum.

Defragmenting files

Deleting and modifying files on the disk leads to file fragmentation: a file no longer occupies contiguous areas of the disk but is broken into several parts stored in different areas. Fragmentation means extra work to locate all the parts of a file being opened, which slows down access to the disk and reduces (usually not significantly) the overall performance of the disk subsystem.

For example, to perform defragmentation in Windows 7, click the Start button and, in the menu that opens, select in sequence All Programs, Accessories, System Tools, Disk Defragmenter.

Number of simultaneously running applications

Windows is a multitasking operating system that allows you to work with several applications simultaneously. But the more applications run at once, the greater the load on the processor, RAM and HDD, and the slower the entire computer and all its applications become.

Therefore, it is better to close those applications that are not currently in use, freeing up computer resources for remaining applications.

The speed of the processor is one of its most important characteristics, which determines the efficiency of the entire microprocessor system as a whole. The performance of a processor depends on many factors, which makes it difficult to compare the performance of even different processors within the same family, not to mention processors from different companies and for different purposes.

Let's highlight the most important factors affecting processor performance.

First of all, performance depends on the processor clock speed. All operations inside the processor are performed synchronously, clocked by a single clock signal. It is clear that the higher the clock frequency, the faster the processor operates, and, for example, doubling the clock frequency of a processor halves the time it takes to execute commands on that processor.

However, we must take into account that different processors execute the same instructions in a different number of clock cycles, and the number of clock cycles spent on an instruction can vary from one to tens or even hundreds. In some processors, thanks to the parallelization of micro-operations, even less than one clock cycle is spent per instruction.

The number of clock cycles needed to execute an instruction depends on the complexity of the instruction and on how its operands are addressed. For example, the fastest instructions (those requiring the fewest clock cycles) are the ones that transfer data between the processor's internal registers, while complex floating-point arithmetic instructions whose operands are stored in memory execute the slowest (taking a large number of clock cycles).

Initially, the MIPS (Million Instructions Per Second) unit was used to quantify processor performance; it corresponds to the number of millions of instructions (commands) executed per second. Naturally, microprocessor manufacturers tried to focus on the fastest instructions, so the indicator is not very meaningful. To measure performance in floating-point calculations, the FLOPS (Floating-point Operations Per Second) unit was proposed a little later, but by definition it is highly specialized, since in some systems floating-point operations are simply not used.

Another similar indicator of processor speed is the time it takes to perform short (fast) operations. As an example, Table 3.1 shows the performance indicators of several 8-bit and 16-bit processors. Currently, this indicator is practically not used, like MIPS.

Command execution time is an important, but far from the only factor that determines performance. The structure of the processor instruction system is also of great importance. For example, some processors will need one instruction to perform an operation, while other processors will need several instructions. Some processors have a command system that allows you to quickly solve problems of one type, and some - problems of another type. The addressing methods allowed in a given processor, the presence of memory segmentation, the way the processor interacts with I/O devices, etc. are also important.


The performance of the system as a whole is also significantly influenced by how the processor “communicates” with the command memory and data memory, and whether the fetching of commands from memory is combined with the execution of previously selected commands.

Performance is the most important characteristic of a computer. Performance is the ability of a computer to perform certain tasks in certain periods of time.

A computer that does the same amount of work in less time is faster. The execution time of any program is measured in seconds. Often performance is measured as the rate at which a certain number of events occur per second, so less time means more performance.

However, depending on what exactly we count, time can be defined in different ways. The simplest definition of time is called astronomical time, response time, execution time or elapsed time. This is the latency of completing a job and includes literally everything: CPU work, disk accesses, memory accesses, I/O and operating system overhead. However, when operating in multiprogram mode, the processor may execute another program while one program waits for I/O, and the system will not necessarily minimize the execution time of that particular program.

To measure the time the processor spends on a given program, a special parameter is used - CPU time, which does not include I/O latency or the execution time of other programs. Obviously, the response time visible to the user is the full program execution time, not the CPU time. CPU time can be further divided into the time the CPU spends directly executing the user program, called user CPU time, and the CPU time the operating system spends executing jobs requested by the program, called system CPU time.

In some cases, CPU system time is ignored due to the possible inaccuracy of measurements made by the operating system itself, as well as problems associated with comparing the performance of machines with different operating systems. On the other hand, system code on some machines is user code on others, and besides, virtually no program can run without some kind of operating system. Therefore, when measuring processor performance, the sum of user and system CPU time is often used.

In most modern processors, the speed of interaction between internal functional units is determined not by the natural delays of those units but by a unified system of clock signals generated by a clock generator, usually running at a constant rate. Discrete time events are called clock ticks, ticks, clock periods, cycles or clock cycles. Computer designers usually talk about the clock period, which is defined either by its duration (for example, 10 nanoseconds) or by its frequency (for example, 100 MHz). The duration of the clock period is the reciprocal of the clock frequency.

Thus, the CPU time for a certain program can be expressed in two ways: the number of clock cycles for a given program multiplied by the clock cycle duration, or the number of clock cycles for a given program divided by the clock frequency.

An important characteristic, often published in processor reports, is the average number of clock cycles per instruction - CPI (clock cycles per instruction). When the number of instructions executed in a program is known, this parameter allows you to quickly estimate the CPU time for that program.

Thus, CPU performance depends on three parameters: the clock cycle time (or frequency), the average number of clock cycles per instruction, and the number of instructions executed. None of these parameters can be changed in isolation from the others, since the underlying technologies are interrelated: the clock frequency is determined by the hardware technology and the functional organization of the processor; the average number of clock cycles per instruction depends on the functional organization and the architecture of the instruction set; and the number of instructions executed in a program is determined by the instruction set architecture and compiler technology. When two machines are compared, all three components must be considered to understand their relative performance.
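A minimal sketch (with purely illustrative numbers) of the relationship just described: CPU time equals the instruction count times the average CPI, divided by the clock frequency.

```python
# CPU time = instruction_count * CPI / clock_rate.
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds for a program."""
    clock_cycles = instruction_count * cpi
    return clock_cycles / clock_rate_hz

# Hypothetical program: 2 billion instructions, average CPI of 1.5, 2 GHz clock.
print(f"CPU time: {cpu_time(2e9, 1.5, 2e9):.2f} s")   # 1.50 s
# Doubling the clock rate, or halving the CPI, halves the CPU time.
```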

In the search for a standard unit of measurement for computer performance, several popular units of measurement have been adopted. They are discussed in detail in the first chapter.

1. Review of methods and tools for assessing the performance of computing systems. Formulation of the problem.

1.1 Indicators for assessing the performance of computing systems

MIPS

One alternative unit for measuring processor performance (relative to execution time) is MIPS (million instructions per second). There are several different interpretations of the MIPS definition.

In general, MIPS is the speed of operations per unit time, i.e. for any given program, MIPS is simply the ratio of the number of instructions in the program to its execution time. Thus, performance can be defined as the inverse of execution time, with faster machines having a higher MIPS rating.

The positive aspects of MIPS are that the characteristic is easy to understand, especially for a buyer, and that a faster machine is characterized by a larger MIPS number, which matches our intuition. However, using MIPS as a comparison metric runs into three problems. First, MIPS depends on the processor's instruction set, which makes it difficult to compare MIPS between computers with different instruction sets. Second, MIPS varies from program to program even on the same computer. Third, MIPS can change in the opposite direction to performance.

The classic example of the latter case is the MIPS rating of a machine that includes a floating-point coprocessor. Since each floating-point instruction generally requires more clock cycles than an integer instruction, programs that use the floating-point coprocessor instead of the corresponding software subroutines complete in less time but have a lower MIPS rating. Without a coprocessor, floating-point operations are implemented with subroutines that use simpler integer arithmetic instructions; as a result, such machines have a higher MIPS rating but execute so many more instructions that the overall execution time increases significantly. Similar anomalies occur with optimizing compilers: when optimization reduces the number of instructions executed in a program, the MIPS rating drops while performance increases.

Another definition of MIPS is associated with the once very popular VAX 11/780 computer from DEC. It was this computer that was adopted as the standard for comparing the performance of different machines. The performance of the VAX 11/780 was considered to be 1 MIPS (one million instructions per second).

At that time, the synthetic Dhrystone test became widespread, which made it possible to evaluate the efficiency of processors and C compilers for non-numerical processing programs. It was a test mix of 53% assignment statements, 32% control statements and 15% function calls. This was a very short test: the total number of commands was 100. The execution speed of this 100-command program was measured in Dhrystones per second. The performance of the VAX 11/780 on this synthetic test was 1757 Dhrystones per second, so 1 MIPS equals 1757 Dhrystones per second.

The third definition of MIPS is related to IBM RS/6000 MIPS. The fact is that a number of manufacturers and users (followers of IBM) prefer to compare the performance of their computers with the performance of modern IBM computers, and not with the old DEC machine. The relationship between VAX MIPS and RS/6000 MIPS has never been widely published, but 1 RS/6000 MIPS is approximately equal to 1.6 VAX 11/780 MIPS.

MFLOPS

Measuring computer performance on scientific and technical problems that make heavy use of floating-point arithmetic has always been of particular interest. It was for such calculations that the question of measuring performance first arose, and the achieved figures were often used to draw conclusions about the general level of computer development. For scientific and technical tasks, processor performance is typically estimated in MFLOPS (millions of floating-point operations per second, i.e. millions of elementary arithmetic operations on floating-point numbers executed per second).

As a unit of measurement, MFLOPS is intended to measure floating-point performance only, and is therefore not applicable outside of this limited area. For example, compiler programs have MFLOPS ratings close to zero no matter how fast the machine is, since compilers rarely use floating point arithmetic.

MFLOPS is based on the number of operations performed, not the number of instructions executed. According to many programmers, the same program running on different computers will perform a different number of instructions, but the same number of floating point operations. That is why the MFLOPS rating was intended to fairly compare different machines with each other.

1.2. Standard Performance Measurement Tests

This section discusses the most common standard benchmark tests - Whetstone, Dhrystone and Linpack. Linpack and Whetstone characterize real-number (floating-point) processing, while Dhrystone characterizes integer processing. The more modern SPEC tests are essentially packages of multiple benchmarks that provide a set of batch-processing performance scores, and they are therefore described separately.

1.2.1. Whetstone (general description)

In 1976, H.J. Curnow and B.A. Wichmann of the British National Physical Laboratory presented a set of performance measurement programs written in ALGOL 60. This was the first time that benchmark tests were published, which is all the more remarkable because the Whetstone package consists of synthetic tests built from the distribution statistics of intermediate-level instructions (Whetstone instructions) of the Whetstone Algol compiler (from which the package takes its name), collected from a large number of computing tasks. A more detailed description of this test package is given in the next chapter.

1.2.2. Dhrystone (general description)

Dhrystone tests are based on the typical distribution of language structures. Dhrystone includes 12 modules representing various typical processing modes. Dhrystone tests are designed to evaluate performance related to the functioning of specific types of system and application software (operating systems, compilers, editors, etc.). A more detailed description of this test package is given in the next chapter.

1.2.3. Linpack (general description)

Linpack is a collection of linear algebra functions. The package's programs process two-dimensional matrices whose size is the main testing parameter (100x100 or 1000x1000 matrices are most often used): the more elements in the matrix, the higher the degree of parallelism available when measuring performance. This parameter is particularly important for computers with a vector architecture (where it characterizes the length of the processed vectors), but it would be a big mistake to ignore it when testing systems of other classes. Almost all modern computers make extensive use of parallel processing (pipelined and/or superscalar arithmetic, VLIW processor architecture, MPP system organization, etc.), so assessing performance at different depths of software parallelism is very revealing for any modern system.

Interpretation of results:

  • The Linpack score mainly characterizes floating-point processing performance; when a large matrix size (1000x1000) is specified, the influence of integer operations and control instructions (IF-type statements) on this score is small.
  • The operation of the form Y[i]=Y[i]+a*X[i], represented by the SAXPY/DAXPY test procedure (SAXPY - single precision; DAXPY - double precision), has the greatest weight in the resulting performance score (a sketch of this operation follows the list below). According to Weicker, this procedure takes up over 75% of the execution time of all Linpack tests, and Dongarra gives an even higher figure - 90%. The Linpack score is therefore not indicative of the entire set of floating-point operations, but mainly of the addition and multiplication instructions.
  • The small size of the Linpack executable code (approximately 4.5 KB) and the small number of jump operations place practically no significant load on the processor's instruction-buffering facilities: most of the package's modules fit entirely in the instruction cache and do not require dynamic loading of instructions during execution (for example, the most "weighty" procedure, SAXPY/DAXPY, is represented by only 234 bytes of code). However, the load on the processor-memory path is quite high: single-precision tests with 100x100 matrices process 40 KB of data, and double-precision tests 80 KB. Of course, on most modern computers the entire Linpack data set will most likely fit in the secondary cache, and yet the test results, especially for 1000x1000 matrices, correspond more closely to the concept of "system batch-processing performance" than the scores obtained with Whetstone and Dhrystone, which reflect primarily processor performance.
  • The absence of calls to library functions in Linpack tests eliminates the possibility of optimization of results by computer suppliers and allows the obtained scores to be treated almost as a “pure” characteristic of system performance.
  • The Linpack methodology requires mandatory publication of the name of the compiler that translated the source code (during this procedure, any manual intervention in the actions of the compiler is prohibited; it is not even allowed to remove comments from the text of the programs), and of the operating system under which testing was carried out. The absence of this data, as well as of information about the installed testing attributes (Single/Double, Rolled/Unrolled, Coded BLAS/Fortran BLAS) and the size of the matrices, should serve as a warning about a possible violation of the standard conditions for measuring performance using the Linpack methodology.
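A minimal sketch of the DAXPY operation mentioned above (Y[i] = Y[i] + a*X[i] in double precision), with the corresponding MFLOPS estimate - each element costs one multiplication and one addition. This is plain interpreted Python, not a real Linpack run, so the absolute numbers are only illustrative.

```python
# DAXPY kernel and a rough MFLOPS estimate for it.
import time

def daxpy(a: float, x: list, y: list) -> None:
    for i in range(len(x)):
        y[i] += a * x[i]

n = 1_000_000
x = [1.0] * n
y = [2.0] * n

start = time.perf_counter()
daxpy(3.0, x, y)
elapsed = time.perf_counter() - start

flops = 2 * n   # one multiply + one add per element
print(f"{flops / (elapsed * 1e6):.1f} MFLOPS (interpreted Python)")
```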

1.3 SPEC Core Test Suites

1.3.1. SPECint92, SPECfp92

The importance of creating test suites based on real application programs used by a wide range of users and providing an effective evaluation of processor performance was recognized by most of the largest computer manufacturers, which in 1988 established the non-profit corporation SPEC (Standard Performance Evaluation Corporation). The main purpose of this organization is to develop and maintain a standardized set of specially selected test programs for evaluating the performance of the latest generations of high-performance computers. Any organization that has paid the entry fee can become a member of SPEC.

SPEC's main activity is the development and publication of test suites designed to measure computer performance. Before publication, the object code of these suites, along with the source code and tools, is intensively checked for portability to different platforms. The suites are available to a wide range of users for a fee that covers development and administrative costs. A special license agreement governs the execution of the tests and the publication of results in accordance with the documentation for each test suite. SPEC publishes a quarterly newsletter of SPEC news and test results, "The SPEC Newsletter", which provides a centralized source of information on SPEC benchmark results.

SPEC's main product is its test suites. These suites are developed by SPEC from code coming from various sources. SPEC ports this code to different platforms and creates tools to turn the programs selected as tests into meaningful workloads. Therefore, SPEC tests differ from free software: although they may exist under similar or identical names, their execution times will generally differ.

Currently, there are two basic SPEC test suites, focused on intensive computation and measuring the performance of the processor and memory system, as well as the efficiency of code generation by the compiler. These tests are typically oriented toward the UNIX operating system, but they have also been ported to other platforms. The percentage of time spent on operating system functions and I/O is generally negligible.

The CINT92 test suite, which measures processor performance on integer processing, consists of six programs written in C and selected from various application areas: circuit theory, a Lisp interpreter, logic circuit design, text file compression, spreadsheets, and program compilation.

The CFP92 test suite, which measures processor floating-point performance, consists of 14 programs also selected from various application areas: analog circuit design, Monte Carlo simulation, quantum chemistry, optics, robotics, quantum physics, astrophysics, weather forecasting and other scientific and engineering tasks. Two programs from this set are written in C, and the remaining 12 are written in Fortran. Five programs use single precision and the rest use double precision.

The result of each individual test from these two suites is expressed as the ratio of the execution time on the reference machine to the execution time of one copy of the test on the machine under test. The reference machine is the VAX 11/780. SPEC publishes the results of each individual test, as well as two composite scores: SPECint92, the geometric mean of the 6 individual results from the CINT92 suite, and SPECfp92, the geometric mean of the 14 individual results from the CFP92 suite.
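A minimal sketch (with made-up SPECratio values) of how such a composite score is formed as a geometric mean of the per-test ratios:

```python
# Composite SPEC score as the geometric mean of per-test ratios
# (reference-machine time divided by measured time for each test).
from math import prod

def spec_composite(ratios: list) -> float:
    return prod(ratios) ** (1 / len(ratios))

# Hypothetical SPECratios for the six CINT92 programs.
cint92_ratios = [52.0, 61.3, 48.9, 55.4, 70.1, 58.6]
print(f"SPECint92 ~ {spec_composite(cint92_ratios):.1f}")
```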

It should be noted that the test results on the CINT92 and CFP92 suites depend heavily on the quality of the optimizing compilers used. To characterize the hardware capabilities more accurately, since mid-1994 SPEC has introduced two additional composite scores, SPECbase_int92 and SPECbase_fp92, which impose certain restrictions on the compilers that computer vendors may use during testing.

1.3.2. SPECrate_int92, SPECrate_fp92

The SPECint92 and SPECfp92 composite scores characterize the performance of the processor and memory system quite well when operating in single-tasking mode, but they are completely unsuitable for assessing the performance of multiprocessor and single-processor systems operating in multitasking mode. This requires an estimate of the system's throughput or capacity, indicating the number of jobs the system can complete within a given time interval. System throughput is determined primarily by the number of resources (number of processors, RAM and cache memory capacity, bus bandwidth) that the system can make available to the user at any given time. It was this rating, called SPECrate and replacing the previously used SPECthruput89 rating, that SPEC proposed as a unit of measurement for the performance of multiprocessor systems.

In this case, the "homogeneous capacity method" was chosen for measurement, which consists in simultaneously executing several copies of the same test program. The results of these tests show how many tasks of a particular type can be completed in a given time, and their geometric means (SPECrate_int92 for the suite measuring integer performance and SPECrate_fp92 for the suite measuring floating-point performance) clearly reflect the throughput of single-processor and multiprocessor configurations running in multitasking mode in shared systems. The same CINT92 and CFP92 suites were chosen as the test programs for the throughput tests.

When running a test package, independent measurements are taken for each individual test. Typically, a parameter such as the number of copies of each individual test to run is selected based on considerations of optimal resource use, which depends on the architectural features of a particular system. One obvious option is to set this parameter to the number of processors on the system. In this case, all copies of a separate test program are launched simultaneously, and the completion time of the last of all launched programs is recorded.

1.3.3. TPC

As the use of computers for business transaction processing grows, it becomes increasingly important to be able to compare systems with each other fairly. To this end, the Transaction Processing Performance Council (TPC), a non-profit organization, was created in 1988. Any company or organization can become a member of TPC after paying the appropriate fee. Today, almost all of the largest manufacturers of hardware platforms and business automation software are members of TPC. To date, TPC has created three test suites to provide a fair comparison of various transaction processing systems, and it plans to create new evaluation tests.

In the computer industry, the term transaction can mean almost any type of interaction or exchange of information. However, in the business world, “transaction” has a very specific meaning: a commercial exchange of goods, services or money. Nowadays, almost all business transactions are carried out using computers. Common examples of transaction processing systems are accounting management systems, airline reservation systems, and banking systems. Thus, the need for standards and test packages to evaluate such systems is increasingly increasing.

Before 1988, there was no general agreement on how to evaluate transaction processing systems. Two test packages were widely used: Debit/Credit and TPI. However, these packages did not allow for adequate evaluation of systems: they did not have complete, thorough specifications; did not provide objective, verifiable results; did not contain a complete description of the system configuration, its cost and testing methodology; did not provide an objective, unbiased comparison of one system with another.

To address these issues, TPC was created with the primary mission of precisely defining test suites for evaluating transaction processing and database systems and disseminating objective, verifiable data to industry.

The TPC publishes test suite specifications that govern issues related to running the tests. These specifications ensure that buyers have objective data for comparing the performance of different computing systems. Although the implementation of the test specifications is left to the discretion of individual test sponsors, the sponsors themselves, when announcing TPC results, must submit to the TPC detailed reports documenting compliance with all specifications. These reports include, among other things, the system configuration, the pricing methodology, performance charts, and documentation showing that the test meets the atomicity, consistency, isolation and durability (ACID) requirements, which ensure that all transactions in the evaluation test are processed as expected.

TPC defines and manages the format of several On-Line Transaction Processing (OLTP) performance tests, including the TPC-A, TPC-B, and TPC-C tests. As noted, the creation of an assessment test is the responsibility of the organization performing the test. TPC only requires that certain conditions be met when creating an assessment test. Although the TPC tests mentioned are not representative tests for evaluating database performance, relational database systems are key components of any transaction processing system.

It should be noted that, like any other test, no TPC test can measure system performance that is applicable to all possible transaction processing environments, but these tests can indeed help the user compare similar systems fairly. However, when a user makes a purchase or plans a purchasing decision, he must understand that no test can replace his specific application task.

1.3.3.1. TPC-A test

Released in November 1989, TPC-A was designed to evaluate the performance of systems running in the database-intensive environments typical of on-line transaction processing (OLTP) applications. Such an environment is characterized by:

  • multiple on-line terminal sessions
  • significant disk I/O volume
  • moderate system and application execution time
  • transaction integrity.

In practice, when performing the test, a typical bank computing environment is emulated, including a database server, terminals and communication lines. This test uses single, simple transactions that update the database intensively. A single transaction (similar to a normal customer account update operation) provides a simple, repeatable unit of work that tests key components of the OLTP system.

The TPC-A test determines the throughput of a system, measured as the number of transactions per second (tpsA) that the system can perform when serving multiple terminals. Although the TPC-A specification does not fix the exact number of terminals, system vendors must scale that number up or down in accordance with the claimed transaction rate. The TPC-A test can be run in local-area or wide-area networks; its results then determine either the "local" throughput (TPC-A-local Throughput) or the "wide-area" throughput (TPC-A-wide Throughput). Obviously, these two figures cannot be compared directly. The TPC-A specification requires all companies to fully disclose the details of how their test was run, their system configuration, and its cost (based on a five-year service life). This allows the normalized cost of the system ($/tpsA) to be determined.
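
For illustration (the figures here are purely hypothetical): a system that sustains 200 tpsA with a fully disclosed five-year cost of $1,000,000 would be reported at $1,000,000 / 200 tpsA = $5,000 per tpsA.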

1.3.3.2. TPC-B test

In August 1990, the TPC approved TPC-B, a database-intensive test characterized by the following elements:

  • significant disk I/O volume
  • moderate system and application execution time
  • transaction integrity.

TPC-B measures system throughput in transactions per second (tpsB). Because there are significant differences between TPC-A and TPC-B (in particular, TPC-B does not emulate terminals or communication lines), their results cannot be compared directly.

1.3.3.3. TPC-C test

The TPC-C test package simulates an order processing application. It models a fairly complex OLTP system that must manage order entry, inventory management, and the distribution of goods and services. The TPC-C test exercises all major system components: terminals, communication lines, CPU, disk I/O, and the database.

TPC-C requires five types of transactions to be performed:

  • a new order entered using a complex screen form
  • a simple database update related to payment
  • a simple database update related to delivery
  • an order status inquiry
  • a stock level inquiry

Among these five types of transactions, at least 43% must be payments, and the order status, delivery, and stock level transactions must each account for at least 4%. The test then measures the rate at which new-order transactions are processed while this mixture of other transaction types runs in the background.

The TPC-C database is based on the model of a wholesale supplier with remote sales districts and warehouses. The database contains nine tables: warehouse, district, customer, order, order line, new order, item, stock, and history.

Typically two results are published. One of them, tpm-C, represents the peak transaction rate (expressed in transactions per minute). The second result, $/tpm-C, is the normalized cost of the system. The cost of the system includes all hardware and software used in the test, plus the cost of maintenance for five years.

Synthetic kernels and natural benchmarks cannot serve as true test suites for evaluating systems: they cannot accurately simulate the end-user environment and evaluate the performance of all relevant system components. Without such a guarantee, performance measurement results remain in question.

1.4. Modern basic computer configurations

Almost anyone who decides to purchase a computer has to select a configuration based on the expected cost of the purchase and/or the list of tasks that will be solved on it. To make this choice correctly, you need to consider the criteria by which it is made:

Performance

In principle, there is no point in limiting PC performance "from above": if you can be 100 times faster than everyone else for the same money, then why not? The lower limit, however, is determined first of all by the requirements of the software most widespread in the user environment at the moment. For example, if the de facto standard for an office computer is an operating system of the Microsoft Windows family no older than Windows 98SE, then an office computer on which this operating system cannot run is unlikely to satisfy the buyer, even if it costs 100 rubles. It is implied that the user should be able not only to watch the hourglass, but also to perform meaningful, useful actions.

Reliability

Modern computer hardware technologies are advancing by leaps and bounds, but when configuring a PC intended for a wide range of users, it is hardly reasonable to include products that are only a couple of weeks old. Yes, a particular device may seem like the height of perfection. Yes, so far the reviews have been very positive, and no one has noticed any problems. In the end, perhaps there will not be any at all! Or maybe there will... This is simply unknown. Pay attention to the products of leading Western brands - HP, Dell, IBM. At first their computer lineup may seem somewhat conservative. However, this is precisely why users who buy these computers can be sure that the installed components will not be "thrown into the dustbin of history" in six months, will not be left without technical support and driver updates, and so on.

Upgradability

Unfortunately, a hardware configuration that allows the computer's power to be increased later by replacing some (and not all) of the main components is still the province of rather expensive models. That is, the possibility of a subsequent upgrade has to be paid for at the time of purchase. Accordingly, the most "upgradable" models are those in the middle and higher price ranges. However, any PC, even the cheapest one, should have at least a minimal ability to upgrade: for example, it should allow the installation of a processor with a one and a half to two times higher clock speed and a doubling (preferably tripling) of the amount of memory.

Table 1.1 describes modern computer configurations and indicates the areas in which they are used.

Table 1.1.

Description of modern computer configurations

Office low-end

The minimum configuration, formed according to the principle "the main thing is the price; we will somehow put up with everything else." Accordingly, a motherboard without even an AGP slot is used, i.e. the possibilities for further system upgrades are effectively limited to installing a more powerful CPU and increasing the amount of memory. At the same time, such a PC is quite capable of running a limited set of office applications - a text editor, a spreadsheet, an Internet browser and an e-mail client. Increasing the amount of RAM makes work more comfortable, but in most cases you can do without it. Windows 2000 (and even more so Windows XP) is "categorically contraindicated" for such a machine, as are the latest generation office suites.

Office middle-end

A full-fledged work machine, equipped with a powerful processor and sufficient RAM, can satisfy the needs of almost any office worker. In addition to being used as an "electronic typewriter," such a PC can become a very convenient workplace both for an accountant and for an "in-house designer" working with simple business graphics. Fortunately, the ATI video card provides excellent image quality even at very high resolutions.

The presence of 128 MB of RAM, in principle, allows even Windows 2000 to be installed on such a computer if desired, although it is hardly reasonable to recommend this OS for a regular office PC. The optimal way to increase performance is to install more memory - an 800 MHz processor running on a 100 MHz bus is unlikely to prove insufficient for office work for at least another year and a half.

Office high-end

These configurations fell into the office category solely because it is impractical to divide PCs into more than two groups. It is hard to imagine a secretary who works with electronic documents and at the same time has an urgent need for a 1.5 GHz Pentium 4. That is, if the first two configurations were purely office machines - document management, accounting and work on the Internet - then the "office high-end" is the workplace of a certain "advanced user" who also sits in the office but is engaged not only in drafting and viewing documents, but also in layout, design, working with sound or video, or, for example, writing simple programs "for internal use". In this case a powerful processor and a large amount of RAM will be in demand. In addition, these PCs have excellent upgradability: support for modern high-frequency CPUs and DDR memory (in the case of the AMD platform) will allow these systems to be upgraded without major modifications for quite a long time.

Entertainment low-end

Often a computer is used not only (and often not so much) for work: home PCs are also very common, and the requirements for these two varieties of the same computer are completely different. For the user of such a system, its main purpose is games and entertainment, which imposes certain restrictions "from below" on the hardware configuration - AC'97 sound and a weak 3D accelerator are completely inappropriate in a home computer. If financial restrictions are very strict, the entertainment low-end configuration will, for a modest amount, provide a machine with which the user can begin his journey into the world of computer entertainment. Moreover, the possibilities for further upgrading this system are quite wide, which, given the finances, desire or need, will allow its power to be increased gradually without resorting to large one-time expenditures.

Entertainment middle-end

A fairly powerful configuration that allows you not to agonize over every new game purchase with the questions: "Will my PC have enough power for it? Will the gameplay turn into watching a slide show?" In addition, the presence of an audio card that supports five-channel sound makes it possible to organize a home theater or an amateur sound studio based on this PC. The configuration based on the 1 GHz AMD Duron processor uses a motherboard on the VIA Apollo KT266A chipset with support for DDR memory, even though, according to the results of testing the Duron in combination with PC2100 DDR, there is no significant increase in performance compared to PC133.

Entertainment high-end

The owner of this configuration can forget about all the possible inconveniences that arise in games. "Everything at maximum" is the optimal set of options for almost any game that exists today, provided it runs on this PC. The sound card provides excellent sound quality and supports all modern 3D audio standards. With high-quality acoustics, this system can easily become a home entertainment center. There is no point in discussing its use for work, since it has long been known that the demands gaming applications place on PC power are an order of magnitude greater than those of application software.

1.5. Evaluating Recursive Functions

Recursion is the process of defining or expressing a function, procedure, language construct, or the solution of a problem in terms of itself.

Recursion in computation is a situation in which the value of a function computed for some arguments is used to compute the value of the same function for other arguments. The recursive property is not limited to computational problems; it is a general property that algorithms and programs of any nature can have.

Recursion in a program (algorithm) is the ability of a program (procedure, algorithm) to call itself in order to perform the same sequence of operations under different external conditions (parameters). A sign of recursion is a call to the program (procedure) itself present in the text of that program.

Recursion in algorithms and programs is a powerful and effective programming tool. On the one hand, it reflects the internal nature of a problem and of its solution method, in which various parts of the algorithm are similar to the algorithm as a whole. One can say that recursive algorithms are an algorithmic manifestation of the properties of equality and similarity that actually exist in the world of objects. On the other hand, recursion allows the advantages of procedural programming to be exploited to the maximum degree: a procedure or function simply calls itself. As a result, the program text is as compact and readable as possible.

Each recursive call to a procedure or function uses part of the computer's RAM organized in a special way - the stack. The stack stores the information needed to return control to the calling program. Since the size of the stack, like that of any memory, is limited, arbitrarily deep nesting of calls is impossible in practice. This is important to keep in mind when the expected number of recursive calls (the recursion depth) can run into the hundreds.

The best known of the essentially recursive functions is the Ackermann function. It is worth investigating whether this function can be used for testing different computer configurations.
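
As a reference, a minimal sketch of such a test kernel in C is given below (the function layout and naming are illustrative and do not reproduce the code of the developed package); note how quickly the depth of recursion, and therefore the stack usage discussed above, grows even for small arguments:

    #include <stdio.h>

    /* The Ackermann function, defined recursively:
       A(0, n) = n + 1
       A(m, 0) = A(m - 1, 1)            for m > 0
       A(m, n) = A(m - 1, A(m, n - 1))  for m > 0, n > 0 */
    static unsigned long ackermann(unsigned long m, unsigned long n)
    {
        if (m == 0)
            return n + 1;
        if (n == 0)
            return ackermann(m - 1, 1);
        return ackermann(m - 1, ackermann(m, n - 1));
    }

    int main(void)
    {
        /* Even these small arguments already generate a very large
           number of recursive calls. */
        printf("A(3, 6) = %lu\n", ackermann(3, 6));
        return 0;
    }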

1.6. Formulation of the problem

Based on the general requirements presented in the assignment for the diploma project, we will formulate a more detailed description of the package of test programs being developed.

It is required to develop an algorithm for a program that calculates the Ackermann function and to implement, on the basis of this algorithm, a performance test for computer systems. The program should have a simple and convenient user interface: the user only needs to select the required test and specify its parameters. Using the test being developed, a series of computer configurations must be compared. To ensure a fair comparison, the same configurations must also be tested with standard performance measurement tests. The results obtained must be analyzed and appropriate conclusions drawn from them.

The final stage of design is the preparation of the software documentation, which, in addition to the terms of reference, includes the program texts, a description of the program, and the test program and procedures.

2. Design and development of a software product

The software product being developed contains three test programs: one main and two additional. Standard computer performance measurement tests, Whetstone and Dhrystone, were chosen as the additional ones. These programs characterize floating-point and integer processing performance, respectively. The main test is a program based on the calculation of the Ackermann function.

2.1. Description of additional tests

2.1.1. Whetstone (procedure description)

The Whetstone test suite consists of several modules that simulate the software load in the most typical modes of executing computational tasks (floating-point arithmetic, IF statements, function calls, etc.). Each module is executed multiple times in accordance with the original statistics of Whetstone instructions (in practice, this is implemented by enclosing the modules in loops with different numbers of iterations), and performance is calculated as the ratio of the number of Whetstone instructions to the total execution time of all modules of the package. The result is reported in KWIPS (Kilo Whetstone Instructions Per Second) or MWIPS (Mega Whetstone Instructions Per Second). A significant advantage of these scores is that Whetstone instructions are not tied to the instruction set of any particular computer, i.e. a performance score in MWIPS is model-independent.
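
In other words, assuming (as in the classical package) that one full pass of the modules is rated at a fixed nominal number of Whetstone instructions:

    MWIPS = (number of passes * Whetstone instructions per pass) / (execution time in seconds * 10^6)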

The program was adapted as follows: an additional "test execution time" parameter was introduced, which is specified by the user. This was done so that the calculation of any Ackermann function (with any parameters) could be related to the number of passes (the number of executions of the outer loop) of the Whetstone test, which makes it possible to estimate the number of operations performed by the processor while calculating the given Ackermann function.

Based on the above description of the test operation, a general diagram of the Whetstone program was drawn up, shown in Figure 2.1.

Fig. 2.1. General structure of the Whetstone program

The main logic of the program is contained in the Whets function. Within this function, a software load consisting of eight sequentially executed modules is implemented (a simplified sketch is given after the list):

  • computation of array elements
  • computation of array elements (the array is passed as a function parameter)
  • branch operations (if - else)
  • integer arithmetic (subtraction, addition and multiplication)
  • trigonometric functions (sin, cos, atan)
  • calls to procedures that operate on pointers
  • processing of array references
  • calls to standard functions (sqrt, log, exp)
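
The sketch below gives a rough, hypothetical picture of this structure in C: only three of the eight modules are shown, and all loop counts, constants and names are illustrative rather than taken from the actual Whetstone sources. It also reflects the adaptation described above - the modules are repeated for a user-specified execution time and the number of completed passes is counted.

    #include <math.h>
    #include <stdio.h>
    #include <time.h>

    /* Hypothetical, simplified sketch of the Whets software load:
       each module is a loop over simple operations of one kind. */
    static double whets_pass(long scale)
    {
        double x1 = 1.0, x2 = -1.0, x3 = -1.0, x4 = -1.0;
        const double t = 0.499975;
        long i;

        /* Module: simple floating-point identifiers */
        for (i = 0; i < 12 * scale; i++) {
            x1 = (x1 + x2 + x3 - x4) * t;
            x2 = (x1 + x2 - x3 + x4) * t;
            x3 = (x1 - x2 + x3 + x4) * t;
            x4 = (-x1 + x2 + x3 + x4) * t;
        }

        /* Module: trigonometric functions */
        for (i = 0; i < 14 * scale; i++)
            x1 = t * atan(sin(x1) + cos(x2));

        /* Module: standard functions (sqrt, log, exp) */
        for (i = 0; i < 9 * scale; i++)
            x2 = sqrt(exp(log(fabs(x2) + 1.0)));

        return x1 + x2 + x3 + x4;   /* returned so the work is not optimized away */
    }

    int main(void)
    {
        const double requested_seconds = 5.0;   /* user-specified "test execution time" */
        long passes = 0;
        double sink = 0.0;
        clock_t start = clock();

        /* Execute full passes of the modules until the requested time has elapsed,
           counting how many passes were completed. */
        do {
            sink += whets_pass(1000);
            passes++;
        } while ((double)(clock() - start) / CLOCKS_PER_SEC < requested_seconds);

        printf("Passes executed in %.1f s: %ld (checksum %.3f)\n",
               requested_seconds, passes, sink);
        return 0;
    }

The number of passes completed within the requested time is the quantity that can then be set against the Ackermann calculation, as described above.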


The diagram of the testing procedure algorithm is presented in Figure 2.2.


Fig. 2.2. Diagram of the testing procedure algorithm


Interpretation of results

  • The Whetstone package is focused on evaluating floating-point performance: almost 70% of the execution time is spent on floating-point arithmetic and on executing library mathematical functions.
  • The large number of calls to the mathematical function library in the Whetstone tests requires special care when comparing results obtained on different computers: manufacturers have the opportunity to optimize the Whetstone score by making changes to the library.
  • Since the Whetstone test modules are represented by very compact executable code, on modern processors they do not allow the effectiveness of the mechanism for dynamically loading instructions into the instruction cache to be evaluated: any Whetstone module fits entirely into even the smallest cache memory.
  • A feature of these tests is the almost complete absence of local variables. As a result, Whetstone scores largely depend on the efficiency of the computer resources that provide access to RAM and data buffering in the processor (including the number of registers, the data cache capacity and its replacement mechanism). At the same time, this same circumstance makes the Whetstone tests practically insensitive to mechanisms that improve the efficiency of working with local variables.

2.1.2. Dhrystone (procedure description)

Description of the test algorithm

Dhrystone tests are designed to evaluate the performance of the processing typical of system and application software (operating systems, compilers, editors, etc.). This has left a noticeable imprint on the structure of the data and of the executable code: the Dhrystone tests contain no floating-point processing; instead, operations on other data types (characters, strings, logical variables, pointers, etc.) predominate. In addition, compared to the Whetstone tests, the number of loop constructs has been reduced and simpler computational expressions are used, while the number of IF statements and procedure calls has increased.

The Dhrystone test procedures are combined into one measurement loop, which contains 103 statements in the C version. This global loop is taken as the unit of work (one Dhrystone), and performance is measured as the number of measurement loops completed per second (Dhrystones/s). Recently, however, another unit of measurement has come into use when publishing Dhrystone scores - VAX MIPS. This deviation from the standard rules is dictated by two circumstances: first, the Dhrystones/s unit looks too exotic; second, the VAX MIPS score coincides in meaning with the conventional units of the very common SPEC tests, which characterize computer performance relative to the VAX 11/780 system (for example, 1.5 VAX MIPS means that the system under test runs one and a half times faster than the VAX 11/780).
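
A small sketch of the conversion (the measured value below is hypothetical; 1757 Dhrystones/s is the commonly cited figure for the VAX 11/780):

    #include <stdio.h>

    int main(void)
    {
        /* Commonly cited Dhrystones/s figure for the VAX 11/780 reference machine. */
        const double vax_11_780_dhrystones = 1757.0;

        /* Hypothetical measured result of the system under test. */
        const double measured_dhrystones = 100000.0;

        /* VAX MIPS = measured rate divided by the reference machine's rate. */
        printf("%.0f Dhrystones/s -> %.1f VAX MIPS\n",
               measured_dhrystones, measured_dhrystones / vax_11_780_dhrystones);
        return 0;
    }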

2.2. Development of a test based on the Ackermann function

2.2.1. Using recursion to evaluate the performance of computer systems

Recursion expresses a characteristic property of an object: the object refers to itself. In programming, the term "recursion" describes a situation in which a program keeps calling itself as long as some specified condition is met. To solve such problems efficiently, it is necessary to evaluate how well the computer system on which the problem is solved performs when executing recursion. Below are some typical situations in which recursive algorithms and programs are used to solve real problems.

Factorial

Consider some function defined as follows:

F(n) = 1 * 2 * 3 * ... * n, where n is a given natural number. Such a product of the natural numbers from 1 to n is called in mathematics the factorial of the number n and is denoted "n!". The value of "n!" is also defined for zero. Thus, the full definition of the factorial of a non-negative number is:

n! = 1, for n = 0
n! = 1 * 2 * 3 * ... * n, for n > 0 (2.1)

The function "n!" is of great importance in those problems of mathematics that deal with enumerating and examining different arrangements. For example, the number of different ways to order a group of n distinct objects is "n!". The recursive property here is that the factorial of any number n can be obtained from the factorial of the preceding number, (n-1)!.

Indeed, it is easy to see that:

4! = (1 * 2 * 3) * 4 = 3! * 4

In other words, to determine the algorithm for calculating the factorial, you can set the following relations:

n! = (n-1)! * n, for n > 0 (2.2)

Let us look closely at expression (2.2): both the right and the left sides of the expression contain the same function being calculated! To calculate n! you need to calculate (n-1)!, to calculate (n-1)! you need to know (n-2)!, and so on.
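
Relation (2.2), together with the base case 0! = 1, translates directly into a recursive function. A minimal C sketch (the names are illustrative):

    #include <stdio.h>

    /* Recursive factorial following relation (2.2): n! = (n-1)! * n, with 0! = 1. */
    static unsigned long long factorial(unsigned int n)
    {
        if (n == 0)
            return 1ULL;                                    /* base case stops the recursion */
        return (unsigned long long)n * factorial(n - 1);    /* the function calls itself */
    }

    int main(void)
    {
        printf("4! = %llu\n", factorial(4));   /* prints 24, i.e. 3! * 4 */
        return 0;
    }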