SSE 4.2 instruction set. AMD Bulldozer architecture. Instructions

A few months ago, AMD introduced a new architecture that will be used in new processors starting in 2011. The new architecture is called Bulldozer and is completely different from the current AMD64 architecture that AMD has been using since 2003.

The Bulldozer architecture will inherit some of the technologies introduced with the AMD64 architecture, such as the integrated memory controller and the HyperTransport bus for communication between the processor and the chipset.

Bulldozer is the code name of the architecture, not the name of a specific processor. As is usually the case, the first processors will target the server market, followed by releases for expensive high-performance computers, then the mid-price segment, and finally the budget market.

Although AMD did not reveal the specifications of the new processors, it noted that the first desktop processors will use the new Socket AM3+, which will accept existing Socket AM3 processors. However, Socket AM3+ processors will not be compatible with Socket AM3 motherboards.

The Bulldozer architecture will have a technology similar to Intel Turbo Boost, which automatically overclocks the processor.
Before we talk about the internal architecture of Bulldozer, let's look at the instruction sets supported by the new architecture.

The Bulldozer architecture, in addition to being compatible with the standard x86 instruction set, will support the following additional instruction sets:

  • SSE4.1 and SSE4.2
  • AVX (Advanced Vector Extensions) with two additional instruction sets, XOP and FMA4
  • AES (Advanced Encryption Standard)
  • LWP (Light Weight Profiling)
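
As a rough illustration of how software can tell which of these sets a processor offers, the sketch below decodes CPUID feature bits. It assumes the ECX values of CPUID leaf 1 and leaf 0x80000001 have already been obtained by some other means (Python has no built-in CPUID access). The function name decode_features and the sample register values are hypothetical.

```python
# Bit positions in CPUID leaf 1, register ECX (standard feature flags).
LEAF1_ECX_BITS = {
    "SSE4.1": 19,
    "SSE4.2": 20,
    "AES": 25,
    "AVX": 28,
}

# Bit positions in CPUID leaf 0x80000001, register ECX (AMD extended flags).
EXT_ECX_BITS = {
    "SSE4a": 6,
    "XOP": 11,
    "LWP": 15,
    "FMA4": 16,
}


def decode_features(leaf1_ecx, ext_ecx):
    """Return the sorted names of the instruction sets whose bit is set."""
    found = [n for n, bit in LEAF1_ECX_BITS.items() if (leaf1_ecx >> bit) & 1]
    found += [n for n, bit in EXT_ECX_BITS.items() if (ext_ecx >> bit) & 1]
    return sorted(found)


# Hypothetical register values built for illustration only,
# not read from real hardware.
sample_leaf1 = (1 << 19) | (1 << 20) | (1 << 25) | (1 << 28)
sample_ext = (1 << 6) | (1 << 11) | (1 << 15) | (1 << 16)
print(decode_features(sample_leaf1, sample_ext))
```

On real hardware the two register values would come from a native helper or an operating-system interface; only the decoding step is shown here.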

SSE4.1 and SSE4.2

AMD processors will finally support the SSE4 instruction set. Current AMD processors do not support this instruction set, which increases performance in multimedia applications (for example, image and video processing). At the moment, AMD processors support their own instruction set called SSE4a, which is not the same as SSE4.

AVX (Advanced Vector Extensions)

At one time, AMD proposed a new instruction set called SSE5. In response, Intel decided to create its own implementation of what SSE5 was meant to be and called it AVX (Advanced Vector Extensions). AMD then decided to add this instruction set to the Bulldozer architecture.

AVX instructions will also be supported by new processors from Intel based on the Sandy Bridge architecture.

The AVX instruction set adds 12 new instructions and increases the width of the vector registers from 128 bits (XMM) to 256 bits (YMM).
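
To make the effect of the wider registers concrete, here is a minimal sketch of the arithmetic: how many 32-bit floats or 64-bit doubles fit into one 128-bit register and into one 256-bit register (the latter are named YMM in AVX). The helper name lanes is hypothetical.

```python
def lanes(register_bits, element_bits):
    """Number of elements of a given width that fit in one vector register."""
    return register_bits // element_bits


for name, bits in (("128-bit register", 128), ("256-bit register", 256)):
    print(name, "floats:", lanes(bits, 32), "doubles:", lanes(bits, 64))
```

Doubling the register width doubles the number of elements a single vector instruction can process, which is where the potential speedup comes from.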

In the Bulldozer architecture, AMD decided to keep some of the instructions that were proposed for SSE5. As a result, the AVX implementation in Bulldozer is more complete than Intel's. These additional instruction sets are called XOP and FMA4. AMD also noted that AVX has a subset of FMAC (Fused Multiply Accumulate) instructions, but in fact it is part of the XOP instruction set.

AES (Advanced Encryption Standard)

This instruction set is already used in new Intel processors based on the Westmere architecture (except Core i3) and consists of six new instructions related to encryption. Intel calls this instruction set AES-NI.

LWP (Light Weight Profiling)

LWP instructions will improve the performance of multi-threaded software running on multi-core processors. LWP includes six new instructions.

In the new Nehalem microarchitecture, Intel continued its earlier course of expanding the set of supported SIMD instructions. The updated instruction set gained seven new instructions and was named SSE4.2 (the name SSE4.1 was used for the SIMD instruction set of Penryn processors). Intel specifically points out that the instructions added in SSE4.2 are aimed less at accelerating streaming media processing than at other purposes. That is why the new instructions introduced in Nehalem also received the designation ATA (Application Targeted Accelerators). The idea behind ATA is that modern manufacturing processes make it possible to spend part of the processor's transistors not only on general-purpose functional units but also on specific needs, increasing performance in particular tasks.

In line with this concept, five instructions designed to speed up the parsing of XML files have been added to SSE4.2. The same instructions can also speed up string and text processing. The two remaining SSE4.2 instructions are aimed at entirely different applications. The first, CRC32, accumulates a CRC32c checksum, and the second, POPCNT, counts the number of bits set to one in the source operand. These instructions can also find wide use in various application and networking software.
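
To show what the last two instructions compute, here is a minimal software sketch of both operations. It is a plain bitwise emulation written for illustration, not how the hardware or any particular library implements them. The initial and final inversion in crc32c follow the usual CRC32c software convention (the instruction itself only performs the accumulation step), and the function names are hypothetical.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected form


def crc32c(data, crc=0):
    """Bitwise software CRC32c over a bytes object."""
    crc ^= 0xFFFFFFFF  # conventional initial inversion
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; fold in the polynomial if that bit was set.
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # conventional final inversion


def popcount(value):
    """Count the bits set to one in a non-negative integer."""
    return bin(value).count("1")


print(hex(crc32c(b"123456789")))
print(popcount(0b1011))
```

The string "123456789" is the customary check input for CRC algorithms. The hardware instructions replace the inner loops above with a single operation per byte, word, or machine word.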

Integrated memory controller

Nehalem was the first Intel microarchitecture to integrate a memory controller into the processor. It might seem that Intel's engineers borrowed the idea from their colleagues at AMD, who have been building memory controllers into processors since 2003. However, this is not entirely true: the first processor with an integrated memory controller was supposed to be the never-released Intel Timna, which was under active development in 1999. Accusations of plagiarism should also be dismissed because the memory controller Intel developed for Nehalem is very different from the one used in existing AMD processors.

Intel's approach to the problem turned out to be much more ambitious. The main property of the memory controller in the Nehalem family is flexibility. Given the modular design of the whole upcoming processor family, which may include products that differ greatly in characteristics and market positioning, Intel has made it possible not only to enable or disable support for buffered modules, but also to vary the number of channels and the memory speed. The first Nehalem processors, which will be released in quad-core versions, will receive a three-channel memory controller with support for DDR3 SDRAM. Desktop systems built on the new processors will therefore boast unsurpassed memory throughput, reaching 25.6 GB/s with three DDR3-1067 modules.

However, the main advantage of moving the DRAM controller into the processor lies not so much in higher bandwidth as in lower memory latency. Even though Intel pairs the new processors with relatively high-latency DDR3 memory, memory access latencies on Nehalem will in any case be lower than in systems based on Core 2 processors with DDR3 SDRAM (and, certainly, DDR2 SDRAM).
To confirm this, here are the results of measuring the practical parameters of the memory subsystem of a Nehalem-based system in the Everest 4.60 test utility.

Table 2. Testing memory performance

In fact, even in single-channel mode, the Nehalem memory controller is capable of better performance than the memory controller of today's LGA775 platforms. This is an entirely logical result, since in new-generation systems there are no intermediate devices on the path between the processor and memory. Previously, the chipset's northbridge was responsible for working with memory, and it introduced very significant delays of its own, caused by the need to synchronize the memory bus and the FSB. Another indirect advantage of building the memory controller into the processor is that its operation no longer depends on the chipset or the motherboard. As a result, Nehalem will show the same memory performance on platforms from different developers and manufacturers.
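
The 25.6 GB/s peak figure quoted in this section for three DDR3-1067 modules can be reproduced with simple arithmetic. The sketch below assumes a 64-bit data bus per memory channel, roughly 1066.67 million transfers per second for DDR3-1067, and 1 GB = 10^9 bytes. The function name is hypothetical.

```python
def peak_bandwidth_gb_s(channels, transfers_per_second, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return channels * transfers_per_second * bytes_per_transfer / 1e9


DDR3_1067_TRANSFERS = 1066.67e6  # assumed transfers per second per channel

print(round(peak_bandwidth_gb_s(3, DDR3_1067_TRANSFERS), 1))  # three channels
print(round(peak_bandwidth_gb_s(1, DDR3_1067_TRANSFERS), 1))  # single channel
```

This is a theoretical upper bound. Measured throughput, such as the Everest results mentioned in the text, will be lower.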

Hello everyone, today we'll talk about how to find out which SSE instructions your processor supports. But do you know what SSE is? I don't, and it's not just that I don't know, I can't even quite understand what it is. Well, I mean, I understand that it's a set of processor instructions needed to optimize its operation, so that at the same frequency a processor with these instructions can get more work done. But that's putting it roughly, so to speak...

As for SSE, I don't even know where in life it's needed, maybe for games? I know what Hyper-threading is (though it's not a processor instruction, it's a technology), what VT-x and VT-d are, I know what EM64T is, but I don't know what SSE is! Well, that's how it goes, guys

In short, guys, I'll tell you right away that there's a small catch here: with the regular tools in Windows you can't find out whether SSE is there or not. You need to download a special program. But don't worry, this super duper program is free, weighs very little, doesn't load the computer at all, and at the same time it's MEGA USEFUL. Its name is CPU-Z (by the way, you can download it here: cpuid.com/softwares/cpu-z.html, this is the official website).

So guys, download CPU-Z, install it, and launch it. Right away you'll find out everything. Here's how many of these SSEs I have:

Not one, not two, but six, wow guys!
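
If you happen to be on Linux rather than Windows, a similar list can be read without any extra program. Here is a minimal sketch, assuming the system exposes /proc/cpuinfo with a "flags" line (x86 Linux does). The function name sse_levels and the flag-to-name table are made up for illustration.

```python
# Linux flag names mapped to the usual marketing names.
SSE_FLAGS = {
    "sse": "SSE",
    "sse2": "SSE2",
    "pni": "SSE3",  # Linux reports SSE3 under the name "pni"
    "ssse3": "SSSE3",
    "sse4_1": "SSE4.1",
    "sse4_2": "SSE4.2",
    "sse4a": "SSE4a",
}


def sse_levels(cpuinfo_text):
    """Return the SSE generations named in the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return [name for flag, name in SSE_FLAGS.items() if flag in present]
    return []


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(sse_levels(f.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

On Windows this file does not exist, so there CPU-Z remains the easy way.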

By the way, as you can see, there's a lot of other useful information here too, see? If you urgently need to find out something about your processor, you just launch CPU-Z and there you go, everything you need is at your fingertips! I'm telling you, CPU-Z is one of a kind! Don't believe me? No problem, I'll prove it right now. Look, do you know when a given memory stick was made? I mean the date it left the factory, so to speak. Not interested? Well, some people are, and I, for one, am very interested! And CPU-Z can show this information! So guys, look: launch CPU-Z, go to the SPD tab, select the slot with the memory stick on the left (that is, the connector where it is installed), and look at the information for the selected stick. I have one 8 GB stick in the fourth slot, and this is what CPU-Z showed:

Here you can see that my stick was made in the 30th week of 2014. It also says that the manufacturer is Hyundai Electronics, well, that's what a Hynix stick is called

Well, in short, CPU-Z is super: if you need to quickly see the most important information about the hardware of a computer or laptop, it will show it all without any fuss! In short, I recommend it, guys!

And one more thing I forgot to write about SSE: it cannot be enabled or disabled, because the instruction set is either there or it isn't. For example, Hyper-threading can be enabled or disabled, but SSE cannot!

That's all guys, I hope that everything was clear to you here, and if something is wrong, then I apologize. Was this information useful to you, honestly? I hope with all my heart that yes! Good luck to you in life, may you be healthy and not get sick, good luck

09.12.2016

Modern software and games often require the processor to support SSE 4.1 and 4.2 instructions. If they are missing, the application will not start: an error appears or nothing happens at all.

FarCry 5 complains about the lack of SSE 4.2

At the same time, the processor may be quite powerful enough for a more or less comfortable game (for example, some Xeon processors for socket 775 can still deliver passable FPS in new releases), and the instructions are sometimes required not by the game itself but by its copy protection. For example, Denuvo protection prevented owners of older processors from playing Assassin's Creed Origins, although the game itself did not require the latest instructions.

Other popular games or their components also require SSE 4.1 or 4.2: No Man's Sky, Far Cry 5, Dishonored 2, Mafia 3 and others.

Nevertheless, there is a solution, although it does not guarantee 100% success. To launch the desired application, you can use the Intel SDE emulator (the "sde external" package), which can be downloaded from the link (choose the version for Windows) or at the bottom of this article.

How to use the SSE 4.1-4.2 emulator

  • Download the sde external archive and unpack it so that sde.exe is in the folder with the game or program you need
  • Create a shortcut for sde.exe. Then open the shortcut properties and, in the "Target" field, add the required .exe file after the path to sde.exe. For example: "D:\Games\No Man's Sky\Binaries\sde.exe" -- NMS.exe. There must be a space after the last quote, otherwise the system will not allow you to save the shortcut.
  • Also, on the "Compatibility" tab of the shortcut properties, check the "Run as administrator" option.
  • Save the shortcut and launch it. A black window will appear; you can close it. After some time, the desired application should launch.
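
The command line that the shortcut ends up running can be sketched as follows. This assumes the usual Intel SDE form, in which a double dash separates the emulator's own options from the program to run. The helper name build_sde_command is hypothetical, and the folder and file names are the ones from the example above. The sketch only builds and prints the command; it does not launch anything.

```python
import subprocess
from pathlib import PureWindowsPath


def build_sde_command(game_dir, target_exe):
    """Return the argument list: sde.exe, the '--' separator, then the program."""
    sde_path = PureWindowsPath(game_dir) / "sde.exe"
    return [str(sde_path), "--", target_exe]


cmd = build_sde_command(r"D:\Games\No Man's Sky\Binaries", "NMS.exe")

# list2cmdline quotes arguments that contain spaces, which is why the
# path to sde.exe ends up in double quotes in the shortcut target.
print(subprocess.list2cmdline(cmd))
```

To actually start the program from a script, the same list could be passed to subprocess.run with the game folder as the working directory, assuming the script itself runs with administrator rights.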