Intel Optane Memory: fast memory for budget PCs

Most users have only a vague idea of what RAM speed actually affects. RAM is responsible for data transfer, and the faster this component is, the faster applications, and especially games, will run. If its capacity is insufficient, processes and programs take a long time to load or crash outright, up to an unplanned reboot of the OS, which causes the user genuine irritation.

Operating principle and main characteristics of RAM

  1. Memory

RAM is a microcircuit with no autonomous power supply. In other words, when the computer is turned off, all information stored in RAM is erased. The processor interacts with RAM through the cache (sometimes called zero-level memory).

The performance of RAM depends on several parameters, including type and frequency, but the most important one is capacity. For modern computers the minimum threshold is 2 gigabytes. The reason is that starting with Windows Vista, the operating system takes about 1 GB for its own needs, and applications need at least as much again to work properly. Smaller sticks can still be found (though no longer in stores), but such computers are hopelessly outdated, and it is almost impossible to run resource-intensive programs on them.

For a budget computer, the best option at the moment is 4 gigabytes of RAM. This is enough for normal, fast work on the Internet, watching videos of various quality, and running modern games at medium settings (although video and games also depend on the graphics card).

For more advanced users who work with graphics or edit audio and video streams, 8 to 16 GB of RAM is required, but keep in mind that in this case a good graphics card with at least 4 GB of GDDR5 memory is also needed. Installing an even larger amount, say 32 GB, mostly means you can put off adding extra sticks to the free slots (if there are any) for several years.

Note: when installing more RAM, don't expect the computer to fly afterwards, because performance also depends on the processor and other components. In addition, remember that 32-bit versions of operating systems can use only about 3.2 GB of RAM; the rest will sit idle.

  2. RAM type

The data transfer speed also depends on this parameter. Modern computers no longer use plain DDR, only DDR2, DDR3, or DDR4. This must be taken into account if you decide to buy an extra stick for a free slot: even though modules of different generations are the same length and width, the key notch is located at a different position (see screenshot), so the wrong type simply cannot be installed.

It is worth noting that DDR2 is rarely seen anymore; at the moment DDR3 is installed almost everywhere. The most modern type, DDR4, is still quite rare, found mainly in computers bought or upgraded recently. And considering that motherboards supporting DDR4 at the time accepted only Intel processors, which are noticeably more expensive than AMD, this also slows the adoption of the modern memory type. Still, it is safe to say that with DDR4 efficiency increases by a factor of 1.5-2.

  3. Frequency

This parameter also directly affects the data exchange rate: the higher the frequency, the faster data is exchanged. Among the memory types mentioned above there are no longer any sticks with a frequency below 1600 MHz, and in the latest models this value can reach 3200 MHz.

Again, if the computer owner decides to buy RAM and install it in an additional slot, he should consider the following:

  • the frequency of the new stick must match the one already installed, otherwise they will not be able to work in parallel;
  • it is advisable to buy RAM from a single manufacturer, because sticks of the same frequency but different brands sometimes conflict with each other, and the computer simply will not start;
  • the motherboard may also limit this parameter: before buying new RAM, check the motherboard's specifications so that everything matches and the computer works.
  4. Improving efficiency

Sometimes a user has enough RAM installed, yet the computer slows down, and the decision is made to buy more. In some cases this is not necessary at all; a little optimization may be enough:

  • look in the task manager to see how heavily the RAM is loaded; if there is a sufficient reserve, the problem is most likely not in the RAM, and an additional stick will not solve it;

  • unload applications that are not currently in use, and check the list of programs in startup: if it contains applications that are rarely used and definitely not needed at boot, remove them from this list;
  • restart the computer from time to time, because some processes can hang in RAM and clutter it, which leads to slowdowns and freezes.

You can also try overclocking the RAM from the BIOS. But remember that some stores may deny warranty service (exchange) in such cases, and the service life will be shorter than without overclocking.

At the dawn of computer technology, dynamic memory coped quite well with the processor frequency. My first experience with a computer was with a clone of the ZX Spectrum. Its Z80 processor executed instructions in an average of 4 clock cycles per operation, two of which were used to refresh the dynamic memory, which at a frequency of 3.5 MHz gives no more than 875,000 operations per second.
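A quick back-of-the-envelope check of that figure (a Python sketch, nothing more):

```python
# ZX Spectrum clone: Z80 at 3.5 MHz, ~4 clock cycles per instruction on average.
clock_hz = 3_500_000
cycles_per_op = 4

print(clock_hz / cycles_per_op)  # 875000.0 operations per second
```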

However, after some time processor frequencies reached a level where dynamic memory could no longer keep up. To compensate, an intermediate link was introduced in the form of cache memory, which smooths out the speed difference between the processor and main memory for operations on a small working set of data.

Let's look at what computer RAM is today and what can be done with it to increase the performance of a computer system.

Briefly about static and dynamic memory

Memory is built as a table of rows and columns, each cell of which holds one bit of information (we are discussing semiconductor memory, although many other implementations follow the same principle). Each such table is called a "bank". A chip or module can house several banks. A set of memory modules is mapped into the processor's linear address space according to the bit width of the individual elements.

A static memory cell is built around a flip-flop that rests in one of two stable states, "A" or "B" (A = !B). The minimum is six transistors per cell, and the routing complexity of such cells is apparently what prevents making a 1-gigabyte static memory module at the price of a regular 8-gigabyte one.

Otherwise the operating principle is identical and is as follows:

The initial fetch of a memory row brings its entire contents into a buffer row, through which all further work goes, or column access is multiplexed (the old, slow approach);
- the requested data is transferred to the master device (usually the CPU), or the selected cells are modified during a write operation (there is a slight difference here: in static memory a cell of the selected row can be modified directly, while in dynamic memory the buffer row is modified and the contents of the whole row are written back later in a special cycle);
- closing and changing the row also differs between memory types: static memory can switch rows instantly if the data has not changed, while dynamic memory must first write the buffer row back in place, and only then can another row be selected.

In the early days of computing, each read or write operation involved a full memory cycle:

Row selection;
- read/write operation on a cell;
- row change/reselection.

Operations on modern "synchronous DDRx-style" chips look like this:

Row selection;
- read/write operations on row cells in bursts of 4-8 bits/words (multiple accesses within one row are allowed);
- closing the row, writing the information back in place;
- row change/reselection.

This solution saves on data access time when, after reading a value from cell "1", you need to access cells "2, 3, 4, or 7" of the same row, or when immediately after a read the changed value must be written back.

More about how dynamic memory works in conjunction with the cache

The memory controller (in the chipset or built into the processor) sends the bank address and row number (the high part of the address) to the memory chip/module. The corresponding bank is selected (we will consider further work within one bank), the resulting binary number is decoded into the positional address of the row, and the row's contents are transferred into the buffer through which the data is subsequently accessed. The time in cycles required for this operation is called tRCD and appears in second place in patterns like "9-9-9" or "9-9-9-27".

After the row is activated, the "columns" can be accessed: the memory controller transmits the address of the cell within the row, and after a time CL (the first number in the "x-x-x" patterns above) data begins to flow from the memory chips to the processor (why plural? because the cache intervenes here) as a burst of 4-8 bits per chip, filling a cache line (its size depends on the processor; the typical value is 64 bytes, i.e. 8 words of 64 bits, but other sizes exist). After the number of clock cycles needed to transmit the burst, the next request to read other cells of the selected row can be issued, or a command to close the row, which costs tRP, the third parameter in "x-x-x-...". When a row is closed, the data from the buffer is written back into the bank's row; once the write completes, another row in that bank can be selected. Besides these three parameters there are the minimum time a row must remain active, tRAS, and the minimum duration of a full row cycle separating two row-activation commands (it affects random access).
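To make the dependence on bank state concrete, here is a minimal sketch (a toy model, not vendor code; the 9-9-9-27 values are the illustrative pattern from the text):

```python
from dataclasses import dataclass

@dataclass
class Timings:
    """DDR-style timings, in memory-clock cycles."""
    cl: int    # CAS latency: column command -> first data
    trcd: int  # row activate -> column command allowed
    trp: int   # precharge: close row -> next activate allowed
    tras: int  # minimum time a row must remain active

def read_latency(t: Timings, row_hit: bool, row_open: bool) -> int:
    """Cycles from read request to first data, depending on bank state."""
    if row_hit:               # data sits in the already-open row
        return t.cl
    if not row_open:          # bank idle: activate the row, then read
        return t.trcd + t.cl
    return t.trp + t.trcd + t.cl  # row conflict: close, activate, read

t = Timings(cl=9, trcd=9, trp=9, tras=27)
print(read_latency(t, row_hit=True,  row_open=True))   # 9
print(read_latency(t, row_hit=False, row_open=False))  # 18
print(read_latency(t, row_hit=False, row_open=True))   # 27
```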


For reference: CL - CAS latency; tRCD - RAS-to-CAS delay; tRP - row precharge; CAS - column address strobe; RAS - row address strobe.

The performance of semiconductor devices is determined by the delays of circuit elements. To obtain reliable information at the output, you need to wait a certain time for all elements to settle into a stable state. Depending on the current state of the memory bank, the data access time changes, but in general the following transitions can be characterized:

If the bank is at rest (no active row), the controller issues a row-select command; the binary row number is converted into a positional one, and the contents of the row are read into the buffer in tRCD time.

Once the contents of a row have been read into the buffer, a column-select command can be issued; the binary column number is converted into a positional one in CL time, and depending on the alignment of the low address bits, the order of bit transmission may change.

Before changing/closing a row, the data must be written back in place, since the information was actually destroyed during reading. The time required to restore the information in the row is tRP.

The full dynamic memory specification defines many more timing parameters that govern the order of and delays between control-signal changes. One of them is tRCmin, the minimum time of a full row cycle, covering row selection, data access, and write-back.

The RAS signal indicates that a row address has been issued;
the CAS signal indicates that a column address has been issued.

Whereas previously all control resided in the memory controller and was exercised directly through these signals, there is now a command mode: a command is issued to the module/chip, and the data transfer follows after some time. For details it is better to read the standard specification, for example DDR4.

Speaking of work with DRAM in general, a bulk read usually looks like this:

Set the row address,
set RAS (removing it a clock later),
wait tRCD,
set the address of the column being read (and set the next column number on every following clock),
issue CAS,
wait CL, start reading the data,
remove CAS, read the rest of the data (CL more cycles).

When moving to a row other than the next one, a precharge (RAS + WE) is performed, tRP is waited out, RAS is issued with the row address lines set, and then reading proceeds as described above.

The latency of reading a random cell follows naturally from the above: tRP + tRCD + CL.
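That sequence can be written out as a simplified command trace; a sketch under the same assumptions (real controllers pipeline commands and interleave banks, which this ignores):

```python
# Reading a burst from a new row while another row is open:
# PRE -> ACT -> READ -> data, using 15-15-15 timings as an example.
def random_read_trace(trp: int, trcd: int, cl: int, burst: int):
    return [
        ("PRE",  trp,   "write the old row back and close it"),
        ("ACT",  trcd,  "latch the row address, fill the row buffer"),
        ("READ", cl,    "latch the column address"),
        ("DATA", burst, "burst transfer on the data bus"),
    ]

elapsed = 0
for cmd, cycles, note in random_read_trace(15, 15, 15, 4):
    elapsed += cycles
    print(f"{cmd:4s} +{cycles:2d} cycles ({note})")
print("first data after tRP + tRCD + CL =", 15 + 15 + 15, "cycles")
```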

In practice it depends on the previous state of the memory bank being accessed.

It is important to remember that DDR RAM has two frequencies:

The main clock frequency, which sets the pace of command transmission and the timings;
- the effective data transfer frequency (double the clock frequency; this is what memory modules are marked with).

Integrating the memory controller into the processor increased the speed of the memory subsystem by removing an intermediate transmission link. Adding memory channels, however, requires the application to take them into account: for example, four-channel mode with an unfortunate data layout provides no performance gain (configurations 12 and 14).


Processing one element of a linked list with different steps (1 step = 16 bytes)

Now a little math

Processor: operating frequencies now reach 5 GHz. According to manufacturers, circuit techniques (pipelines, prediction, and other tricks) allow one instruction to be executed per clock cycle. To round off the calculations, let's take a clock frequency of 4 GHz, which gives one operation per 0.25 ns.

RAM: as an example, let's take the newer DDR4-2133 with 15-15-15 timings.

CPU
Fclock = 4 GHz
tclock = 0.25 ns (conditionally, also the execution time of one operation)

RAM DDR4-2133
Fclock = 1066 MHz
Fdata = 2133 MHz
tclock = 0.94 ns
tdata = 0.47 ns
SPDmax = 2133 MHz * 64 bits = 17064 MB/s (peak data transfer rate)
tRCmin = 50 ns (minimum time between two row activations)
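These figures are easy to reproduce (a sketch; MB here means 10^6 bytes, as in the text):

```python
# DDR4-2133: derive the numbers quoted above.
f_data = 2133            # marked (effective) rate, MT/s
f_clock = f_data / 2     # real clock, ~1066 MHz
print(1000 / f_clock)    # t_clock ~ 0.94 ns
print(1000 / f_data)     # t_data  ~ 0.47 ns
print(f_data * 8)        # 64-bit bus -> 17064 MB/s peak
```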

Time to receive data
Registers and the level 1 cache can provide data within a clock cycle; level 2 and 3 caches add a delay of several processor cycles.

For RAM the situation is worse:

Row selection time: 15 clk * 0.94 ns = 14 ns;
- time from the column-select command until data arrives: 15 clk * 0.94 ns = 14 ns;
- row closing time: 15 clk * 0.94 ns = 14 ns (who would have thought).

This means that the time between a request for data from a memory cell (if it is not in the cache) and the arrival of the data can vary:

14 ns - the data is in the already selected row;
28 ns - the data is in an unselected row, provided the previous row is already closed (the bank is idle);
42-50 ns - the data is in another row, and the current row must first be closed.

The number of operations the processor described above could perform during this time ranges from 56 (14 ns) to 200 (50 ns, row change). Note that loading a cache line adds its own delay on top of the time between the column-select command and receipt of the whole burst: 8 transfers * 0.47 ns = 3.76 ns. In a situation where the data becomes available to the "program" only after the whole cache line is loaded (who knows how the processor designers wired it up; the memory specification does allow the needed data to be delivered first), that is up to 15 more missed cycles.
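The same arithmetic in one place (a sketch using the numbers above):

```python
cpu_op_ns = 0.25  # one operation per cycle at 4 GHz
for case, ns in (("row hit", 14), ("idle bank", 28), ("row conflict", 50)):
    print(f"{case}: ~{ns / cpu_op_ns:.0f} CPU operations missed")
print(8 * 0.47 / cpu_op_ns)  # burst transfer: ~15 more cycles
```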

In one project I studied memory speed; the results showed that memory bandwidth can be fully utilized only with sequential access. With random access, the processing time of one element (measured on a linked list of a 32-bit pointer and three double words, one of which is updated) grows from 4-10 ns (sequential access) to 60-120 ns (row changes), a 12-15x difference in processing speed.

Processing speed
For the selected module the peak throughput is 17064 MB/s, which at a frequency of 4 GHz allows one 32-bit word to be processed per clock (17064 MB/s / 4000 MHz = 4.266 bytes per clock). The following restrictions then come into play:

Without explicit scheduling of cache loads, the processor will be forced to idle (the higher the frequency, the more the core simply waits for data);
- in read-modify-write cycles, the processing speed is halved;
- multi-core processors divide the memory bus bandwidth between the cores, and when requests compete (the degenerate case), memory performance can degrade by a factor of "200 (row changes) * X cores".

Let's do the math:

17064 MB/s / 8 cores = 2133 MB/s per core in the optimal case.
17064 MB/s / (8 cores * 200 skipped operations) = 10 MB/s per core for the degenerate case.
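Or, as a sketch (the degenerate case assumes every access pays a full row change):

```python
peak_mb_s = 17064
cores = 8
print(peak_mb_s / cores)          # 2133.0 MB/s per core, optimal case
print(peak_mb_s / (cores * 200))  # ~10.7 MB/s per core, degenerate case
```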

Translated into operations, for an 8-core processor we get from 15 to 400 operations to process one byte of data, or from 60 to 1600 operations/cycles to process a 32-bit word.

That seems rather slow to me. Compare with DDR3-1333 9-9-9 memory, where the full row cycle time is also about 50 ns, but the timings differ:

Data access time drops to 13.5 ns (1.5 ns * 9 clocks);
- the transmission time of an eight-word burst is 6 ns (0.75 ns * 8, versus 3.76 ns), and with random memory access the difference in transfer speed practically disappears;
- peak speed is 10,664 MB/s.

So we haven't come all that far. The situation is helped somewhat by the presence of "banks" in memory modules. Each bank is a separate memory table that can be accessed independently, making it possible to change the row in one bank while data is being read from or written to a row of another; by cutting idle time, this lets optimized workloads load the data bus to capacity.

This, in fact, is where the crazy ideas begin.

A memory table contains a fixed number of columns: 512, 1024, or 2048. With a row activation cycle time of 50 ns, the potential transfer speed is "1/0.00000005 s * 512 columns * 64-bit word = 81,920 MB/s" instead of the current 17,064 MB/s (163,840 and 327,680 MB/s for 1024- and 2048-column rows). You will say, "only 5 (4.8) times faster," to which I answer: this is the exchange speed when all competing requests hit a single memory bank; the available bandwidth grows in proportion to the number of banks and to the row length of each table (which would require a longer row buffer), and the limit then depends mainly on the speed of the data bus.
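The arithmetic behind those numbers, taking the 50 ns row cycle at face value (a sketch):

```python
# Stream one whole open row per 50 ns row cycle (tRC).
t_rc = 50e-9
for columns in (512, 1024, 2048):
    row_bytes = columns * 64 // 8  # one 64-bit word per column
    print(columns, (1 / t_rc) * row_bytes / 1e6, "MB/s")  # 81920 / 163840 / 327680
```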

Changing the exchange mode would require transferring the entire contents of a row into a lower-level cache, which means dividing cache levels not only by speed but also by cache-line size. For example, making the line "length" of an N-level cache equal to (512 columns * 64-bit word =) 32,768 bits reduces the number of comparison operations, allowing the total number of cache lines, and hence the maximum cache size, to grow. But a parallel bus in a cache of this size could force the operating frequency down, so a different organization suggests itself: split this "jumbo" cache line into blocks the length of the upper-level cache line and exchange data in small portions. This keeps the operating frequency up by splitting the access delay into stages: finding the cache line, then selecting the required "word" within the found line.

As for direct exchange between the cache and main memory: data must be transferred at the rate at which the rows of a single bank can be accessed, or with some margin so that requests can be spread across different banks. There is also the problem of access time to data located in different parts of a row: with serial transmission, besides the initial row-fetch delay, there is a transmission delay that depends on the amount of data in the "packet" and on the transmission speed. Even the Rambus approach might not cope with such a load. The situation can be saved by switching to a serial (possibly differential) bus: by further narrowing the data width we can raise the channel's throughput, and to reduce the time between transmission of the first and last bits of data, the row transfer can be split across several channels. This allows a lower clock frequency per channel.

Let's estimate the speed of such a channel:

1 / 50 ns = 20 MHz (row-change frequency within one bank)
20 MHz * 32,768 bits = 655,360 Mbit/s
For differential transmission with the same data bus width we get:
655,360 Mbit/s / 32 channels = 20,480 Mbit/s per channel.
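The same estimate spelled out (a sketch, same assumptions):

```python
row_bits = 512 * 64       # 32,768 bits in one row
row_rate = 1 / 50e-9      # 20 MHz row-change rate within one bank
total_mbit = row_rate * row_bits / 1e6
print(total_mbit)         # 655360.0 Mbit/s for the whole row
print(total_mbit / 32)    # 20480.0 Mbit/s per channel across 32 channels
```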

This speed looks acceptable for an electrical signal (10 Gbit/s with embedded clocking over 15 meters is available today; why couldn't 20 Gbit/s with external clocking be mastered over 1 meter?). However, further increases in transmission speed, needed to reduce the delay between the first and last bits, may require more bandwidth, possibly with an integrated optical channel; but that is a question for circuit designers, and I have little experience at such frequencies.

And then Ostap got carried away
Changing the concept from projecting the cache onto main memory to using "main memory as an intermediate ultra-high-speed block storage device" would shift the prediction of data loading from the controller circuitry to the processing algorithm (and who knows better where execution will go in a moment? certainly not the memory controller), which in turn would allow the outer-level cache to grow without losing performance.

Going further, we could shift the focus of processor architecture from "switching the execution context" to a "program operating environment". Such a change could significantly improve code security by defining a program as a set of functions with fixed entry points for individual procedures, an accessible region for the data being processed, and hardware control over which functions may be called from other processes. It would also make multi-core processors more efficient by eliminating context switching for some threads, and would let a separate thread within the available "process" environment handle events, enabling more efficient use of 100+ core systems.

P.S.: Any coincidental use of registered trademarks or patents is accidental. All original ideas are available for use under the "anthill" license agreement.

Everyone loves interesting innovations, but they do not appear as often as we would like. The new type of Intel® Optane™ drives is that rare case. Here is what it is, why it appeared, and what it is best suited for.

A little background

Intel Corporation is known as a leading manufacturer of processors for computers and servers. However, not everyone remembers that the company's history did not begin with the legendary Intel 4004 microprocessor: its first product was a memory chip with a capacity of only 64 bits - it could store exactly 64 values (zeros or ones). This was back in 1968.

Intel 4004 microprocessor released in 1971

Currently, two types of drives are used to store data: HDDs and SSDs. The former are slow and fairly noisy due to their mechanical design, but capacious, durable, and very affordable. The latter are expensive and of much smaller capacity, but absolutely silent and fast, though with a fairly limited number of rewrite cycles.

Truly fast and capacious SSDs appeared relatively recently, in the second half of the 2000s, and became popular with a wide audience only in the early 2010s, as prices gradually fell. Before that, at least three attempts were made to make slow HDDs work faster.

Three attempts to speed up the HDD

ReadyBoost technology appeared with Windows Vista: Microsoft offered users the option to insert an ordinary flash drive into one of the computer's USB ports so that the system could use it as a software cache - put simply, to speed up disk operations.

The solution seemed interesting but never caught on, for several reasons: flash drives at the time were much slower, the limited bandwidth of USB 2.0 got in the way, and using ReadyBoost with portable devices turned out to be simply inconvenient.

A little later came a solution from Intel created for the same purpose: Turbo Memory. A small NAND memory module (in effect a very small SSD with a capacity of 512 MB to 2 GB) was connected to the motherboard via a mini-PCIe interface and likewise used as a cache. The most frequently used files were moved there.

Turbo Memory could be found more often than ReadyBoost flash drives, but this technology didn't take off either: the increase in operating speed turned out to be modest.


SSD and hybrid HDD

The third attempt to speed up HDDs came with the appearance of the first ultrabooks, since the requirements for this class of devices were clearly formalized (again by Intel). One of them was the need to score a certain number of points in a built-in drive speed test. Manufacturers had to choose between SSDs, very expensive at the time, or popularizing hybrid drives: HDDs with 8, 16, or 32 GB of built-in flash memory.

This last attempt could be called successful, but over time hybrid drives left the market: SSDs with capacities of 128 and 256 GB became cheap enough to replace them.

While the speed problem was solved more or less successfully, the capacity problem remained. Choosing an affordable laptop today is real torture: you have to settle for a capacious but slow HDD, or buy a more expensive model with a fast 128 GB SSD.

The situation is similar when building a budget home computer: the bulk of the data is usually stored on an HDD, while the system goes onto a budget SSD whose capacity is usually limited to the same 128 GB. That is enough for the system and a couple of serious games, but no more. This is where Intel Optane enters the picture.


The evolution of information storage methods over 25 years

What is Optane: short and clear

Optane is a family of drives built on 3D X-Point™ technology. It is not the NAND used in SSDs, nor the DRAM used in RAM, but something completely different.

3D X-Point was created in close partnership between Intel and Micron. Details of the new memory's operating principles are kept secret - a trade secret, after all. Enthusiasts and experts suggest that it is based on a phase-change mechanism, but there is no official confirmation yet.

However, all the advantages of 3D X-Point over other types of memory have been carefully measured and recorded. Here are the most important:

On average, 3D X-Point is 1000 times faster than the NAND memory used in SSDs;

It is approximately 1000 times more durable than SSDs, which removes the problem of limited rewrite cycles;

It is slightly slower than the DRAM that RAM is based on, but significantly cheaper;

It works with data at the level of an individual cell, unlike an SSD, which writes whole pages and erases whole blocks. This increases IOPS, eliminates the need to erase old data before writing new, and reduces access latency.

It sounds like an ideal replacement for the SSD, but there is one caveat: 3D X-Point memory is still quite expensive to make. That is why Intel Optane devices have smaller capacities, though this does not prevent them from being used effectively right now.

Optane Memory and Optane SSD

The server solution, Optane SSD, is a drive in the form of a PCIe 3.0 card with 375 GB of memory. The most durable server SSDs can be completely rewritten no more than 10-17 times a day if they are to last five years; an Optane SSD can be rewritten 30 times a day over the same period of use.
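Those rewrite figures are easier to appreciate in absolute terms; a rough sketch using the drive-writes-per-day values from the paragraph above:

```python
capacity_gb = 375
days = 365 * 5
for drive, dwpd in (("durable server SSD", 17), ("Optane SSD", 30)):
    print(drive, round(capacity_gb * dwpd * days / 1e6, 1), "PB in 5 years")
```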

Its second advantage is stable read and write speed: in 99.99999% of cases the user gets the same result, whereas for an SSD this figure is around 95%. For many mission-critical workloads that is not enough.

Given the absence of rewrite-cycle problems and the very high speed, Optane SSDs can be combined with server RAM into a single array. This is an ideal option for today's demanding tasks in machine learning, pattern recognition, neural networks, and so on.

Optane Memory, for desktops and laptops, is an M.2 card with 16 or 32 GB of internal memory designed specifically to speed up computers with HDDs - that is, for cases when the user cannot or does not want to buy an expensive, capacious SSD of 512 gigabytes or more.

Optane Memory works on exactly the principle we described at the beginning of this article: the memory acts as a cache storing the files the system uses most often. Only now this cache runs at fantastic speed and is managed by intelligent algorithms that, for a specific task, often operate not even on whole files but on individual clusters of files.

16 or 32 GB may not seem like much, but caching office programs takes Optane Memory just a few hundred megabytes, and a few gigabytes are enough even for demanding games.

The setup process is fully automated: the user needs to update the Intel Rapid Storage driver to version 15.5, connect Optane Memory to the motherboard, press the "Enable" button, and wait 5-7 minutes while the machine reboots. The system will move everything it needs most from the HDD to Optane, combining both components into a single storage space.

To work fully with Optane Memory, the motherboard must be built on a 200- or 300-series chipset, and the processor must be a seventh-generation Core™ i3, i5, or i7 or newer. If the user has an older processor and a motherboard with an M.2 connector, Optane Memory will be detected as just a separate drive.

What is Optane Memory for and why is it better than a 128-256 GB SSD

Given these advantages, two key use cases emerge for Intel Optane Memory. First, it is a great way to improve laptop performance: instead of a small SSD, you can buy a laptop with a terabyte HDD and speed it up with an Optane module bought with the money saved.

It is likely that laptop manufacturers themselves will adopt this scheme, because 16 GB Optane Memory costs around 3-3.5 thousand rubles, and prices for the 32 GB version run between 5 and 5.5 thousand. That is comparable to the cost of the most budget SSDs of 120-128 and 240-256 GB.

The second option is using Optane Memory in budget gaming computers. Eighth-generation Intel Core chips can now fully unlock powerful video cards even with a Core i3-8100: it has four cores instead of two, plus a high 3.6 GHz clock speed.

By saving on a capacious SSD, you can buy a video card a class higher, and that, together with a good processor, is what matters most in a gaming PC. At the same time you still want fast loading, and here the thought of a budget 128 GB SSD arises - but Optane will successfully replace it even with 16 GB of memory.

The user no longer has to decide each time which two or three games to keep on the fast drive and which on the slow HDD: with Optane there is a single array and everything works quickly.

Even compared with expensive SSDs, the new Intel memory shows more impressive results in home use - not in soulless benchmarks that load the drive to full capacity, but in real-world scenarios.

For example, a benchmark builds a very long queue of commands for the SSD under test, loading it to full capacity. In real life the user is unlikely to come up with more than three simultaneous requests to the drive. For an SSD the difference matters: it performs at its maximum with a queue of 32 commands, but on a single command its performance drops significantly.

Optane Memory has no such quirk: the drive reaches peak performance with as few as 1-3 commands in the queue. There is another side to the coin: the less memory an SSD has, the slower it is. Solid-state drives of 0.5-1 TB can truly be called high-speed, but the 128 and 256 GB versions, alas, cannot: the less memory, the fewer channels the SSD controller has available. It's like sending two strong diggers to dig a pit instead of eight.

It only gets more interesting from here. Versions of Optane Memory with capacities of 64 and 128 GB will appear very soon, and the Intel Rapid Storage driver will be updated to version 16, letting experienced users choose for themselves which information to cache on the fast drive.

Intel also plans to release a consumer version of the Optane SSD, reducing its cost by stripping out numerous server technologies that ordinary users do not need. And looking further into the future, the corporation plans to release the third member of the Optane family: server RAM based on 3D X-Point.