Methods for organizing high-performance processors. Database processors. Stream processors. Neural processors. Processors with multi-valued (fuzzy) logic. TrueNorth, a new-generation processor

In 1965, Gordon Moore, an engineer who went on to co-found Intel, noticed that the number of transistors in new processor models was doubling every year. Ten years later, in 1975, he revised his estimate: the number of transistors was now doubling every two years.

But the number of transistors cannot be increased indefinitely. Overcoming this limitation requires a fundamentally new approach.

Neural networks

All modern computers use the von Neumann architecture, in which data moves in a linear sequence between the central processing unit and memory chips, and program instructions are executed strictly as written. Building neural networks on this architecture is possible but involves significant difficulties: Google engineers, for example, needed 16,000 processors to do it. In that case, the behavior of the human brain was emulated on a traditional architecture, rather than the structure of the processor itself being modeled as closely as possible on the structure of the brain.

Neuromorphic chips

Like the human brain, which handles different problems in parallel, neuromorphic processors respond to external stimuli in much the same way. As new information arrives, their neurons can change in response to sounds, images, and other stimuli.

Thanks to distributed data processing, problems are solved differently, and far fewer processors are needed to work with large amounts of information. In addition, such neural networks can be trained and so respond more effectively to user actions. Although the new neuromorphic chips are still very far from the capabilities of the brain, they are already much more productive than conventional computers. Until recently, neural networks were built on ordinary silicon processors, but that has changed: the first neuromorphic processor has been created in China.

Application

So neural networks are not programmed in the usual sense of the word; they are trained, and this means very big changes are in store for the programming profession. Today something vaguely similar can be seen in heuristic algorithms, where the computer picks a suitable solution even though the chosen approach to the problem has not been proven to be unambiguously correct.

Beyond their potential for creating artificial intelligence, neuromorphic networks can be used for a variety of tasks. Unmanned vehicles, for example, will be able to act more effectively in autonomous mode and make decisions in response to a constantly changing environment. Other uses include life-support systems that analyze information better and issue correct recommendations, weather analysis, traffic control, and much more. Neural chips will even find application in medicine, helping doctors make the correct diagnosis based on an analysis of all the patient's symptoms and the results of diagnostic studies.

The Chinese company Huawei announced the Kirin 970, the first chipset with a dedicated neural processing unit (NPU). Following it, Apple showed the A11 Bionic for the iPhone 8, 8 Plus, and X; among other things, this chip includes the Neural Engine, which, according to company representatives, is "specially designed for machine learning". More recently, Qualcomm introduced its Snapdragon 845, which can offload tasks related to artificial intelligence to specific cores. There is no fundamental difference in the companies' approaches; it all comes down to the level of core control available to developers and the energy efficiency of each chip.

But are the new chips really significantly different from existing products on the market, and if so, how? The answer lies in a term that often appears in reports on artificial intelligence: "heterogeneous computing". It refers to processors that rely on specialized hardware to improve performance or reduce power consumption. This approach has already been implemented repeatedly in previous generations of chips; the new mobile processors simply apply the concept with some variations.

Natural development?

Recent generations of processors make active use of ARM big.LITTLE technology, which combines slow, energy-efficient cores with faster cores that consume more power. The idea was to reduce energy consumption and so improve device battery life. Over the past year, neural chips have taken this a step further, either by adding a separate unit for processing artificial intelligence tasks or, in some cases, by using separate low-power cores for that purpose.

Apple's A11 Bionic mobile processor uses the Neural Engine together with the graphics chip to speed up Face ID, Animoji, and some third-party apps. When the user launches these features on a new iPhone, the chip switches on the Neural Engine to recognize the owner's face or project their facial expressions onto an animated character.

Huawei's NPU, for its part, takes over scanning and translating words in images captured with Microsoft Translator. For now, however, that program is the only third-party application adapted to the Chinese manufacturer's technology. According to Huawei, its new HiAI technology accelerates most elements of the chipset and can handle a much wider range of tasks than other NPUs.

New Horizons

Taken on its own, the technology makes it possible to run directly on the device, with no loss of efficiency, tasks that were previously handled by third-party cloud services. With the new components, a phone equipped with such a chip will be able to do more things simultaneously. This will affect many aspects of the device's operation, from faster translations to searching photos by internal tags. Moving these processes onto the smartphone itself instead of the cloud also benefits security and privacy, reducing the chances of hackers getting hold of user data.

Another important aspect of the new chips is power consumption: energy is a valuable resource that has to be allocated sensibly, especially for repetitive tasks. Graphics chips tend to drain the battery very quickly, so offloading such work to a DSP can be a good solution.

In fact, mobile processors cannot decide on their own which cores to use for particular tasks. That is up to developers and device manufacturers, who rely on supported third-party libraries and actively integrate solutions such as TensorFlow Lite and Facebook's Caffe2. Qualcomm also supports the new Open Neural Network Exchange (ONNX), and Apple recently added compatibility with many new machine learning models to its Core ML framework.
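To make this concrete, here is a minimal, hedged sketch of what calling such a library can look like for a developer, using the TensorFlow Lite interpreter in Python. The model file name, input shape, and float32 dtype are placeholders of mine, and on an actual phone the equivalent calls go through the platform's mobile bindings rather than Python.

import numpy as np
import tensorflow as tf

# Load an already converted model; "model.tflite" is a placeholder name.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input with whatever shape the model declares (assuming a float32 model).
dummy_input = np.zeros(input_details[0]["shape"], dtype=np.float32)

interpreter.set_tensor(input_details[0]["index"], dummy_input)
interpreter.invoke()  # inference runs on the device, with no cloud round trip
result = interpreter.get_tensor(output_details[0]["index"])
print(result.shape)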

For now, the new mobile processors do not offer any decisive advantages. Manufacturers are already comparing themselves by their own test results and benchmarks, but without close integration into the everyday environment of the modern user, those figures mean little. The technology itself is at a very early stage of development, and the developers using it are still few and scattered.

In any case, every new technology is a win for the user, whether it brings higher performance or better energy efficiency. Manufacturers are seriously investing time and money in the development of neural chips, which means future mobile processors will be able to handle a much wider range of tasks involving artificial intelligence.

Four Russian companies have teamed up to create the first domestic processor designed to radically improve the performance of computer neural networks. The chip significantly speeds up the recognition of faces, letters, and pictures, allows faster and more accurate analysis of computed tomography images and other medical data, and helps solve complex strategic problems. Experts believe Russian developers have a real chance to make a name for themselves in the emerging global neuroprocessor market.

From pixels to neurons

Fans of computer games are familiar with the graphics processing unit (GPU), a chip for processing images and video. Unlike the central processing unit (CPU), the graphics processor performs only a small set of highly specialized computing operations, but it does so extremely quickly and efficiently. It is thanks to the GPU that modern computer games display the realistic graphics that so captivate lovers of electronic entertainment.

The specialized mathematical operations for which the GPU is "tailored" turned out to be well suited to efficient cryptocurrency mining. So last year, with the surge of interest in Bitcoin, the world witnessed an unprecedented phenomenon: a global shortage of video cards.

Demand for them has continued to grow this year, driven by the rapid development of neural networks: computing systems that use big data to solve problems such as face and speech recognition, literary translation of texts, and analysis of medical data, including computed tomography, magnetic resonance imaging, and X-rays.

The GPU can significantly speed up some neural network algorithms, but here it is nowhere near as effective as in graphics processing. The global computer industry's agenda therefore now includes the creation of a neural processor (NP) designed specifically to accelerate such networks. Some experimental devices of this type already exist, but according to experts the global neuroprocessor market will take another four to six years to fully form. During this time, small development companies and even startups have a chance to gain a foothold in it.

From competition to trust

The NeuroNet industry union decided to take part in this race by combining the efforts of four small but advanced companies that are part of the National Technology Initiative (NTI) system. The resulting consortium will develop a national neuroprocessor capable not only of competing with Western models but also of being 100% domestic and "trusted", that is, guaranteed free of undocumented features and hardware backdoors. The latter is especially important for customers from the Russian military-industrial complex, where neural networks are also becoming widespread: in control systems for combat drones, in the planning of military operations, and in high-precision guidance equipment for small arms.

According to the director of the NTI Neuronet Union, Alexander Semenov, the composition of the consortium and the start of its activities will be officially announced in February of the coming year.

"Russian mathematicians and engineers developing hardware and algorithms in the field of artificial intelligence and neural networks are the best in the world," Alexander Semenov is convinced. "Now they have about four years to get ahead of their foreign colleagues and set the standards for the future market."

According to Stanislav Ashmanov, head of the laboratory of neural network technologies and computer linguistics at the Moscow Institute of Physics and Technology, there are currently about two thousand companies in the world participating in the race to create a reference neural processor.

"Whoever manages to make a chip that becomes the industry standard will earn money comparable to the income of the current leaders of the central processor market, such as Intel or AMD," says Stanislav Ashmanov. "So far, of these couple of thousand startups around the world, no more than five companies are close to victory."

From hardware to software

According to the expert, the race in this area is now running in two directions: first, the development of a server chip for powerful servers in data centers, and second, the creation of an economical embedded neural processor for installation in all kinds of "smart" devices: smartphones, robots, drones, self-driving cars. The work being carried out in Russia, according to Ashmanov, has a chance to win on both fronts.

"The development of domestic hardware that speeds up neural network computation is, of course, a vitally important and necessary project given the current conditions of the world market," Konstantin Trushkin, deputy general director of MCST, the company that produces the domestic Elbrus CPU and motherboards based on it, told Izvestia. "Combining general-purpose processor cores with specialized blocks that run neural network algorithms with high efficiency is the current trend. But for such a system to be considered trusted, both the core and the neural network accelerator must be developed in Russia."

However, Konstantin Trushkin noted, it is not enough to make the NP chip itself; it also needs a supporting software environment: an operating system, development tools, libraries of neural network algorithms, and a neural network training environment. Only then will it be possible to speak of a full-fledged domestic hardware and software neural network platform.

IBM has taken the next step toward a chip for future supercomputers: a neural chip that works on the principle of the human brain. What makes such a chip special is that it is capable of self-learning and consumes hundreds of thousands of times less energy than conventional microprocessors. The new chip can already analyze visual information, as test results confirm.

Most modern computers are organized according to the von Neumann architecture. It is based on the joint storage of data and instructions, which are outwardly indistinguishable: the same information can become data, an instruction, or an address depending on how it is accessed. This very principle created the architecture's significant drawback, the so-called bottleneck (limited bandwidth between the processor and memory). The processor is constantly forced to wait for the data it needs, because program memory and data memory cannot be accessed at the same time: they share the same bus.

This problem was addressed by the American engineer Howard Aiken, the creator of the Harvard architecture. It differs from the von Neumann architecture in that the data and instruction paths are physically separated, allowing the processor to read instructions and access data simultaneously and so improving the computer's speed. Even so, at the end of the 1930s, in a US government competition to develop a computer for naval artillery, the von Neumann architecture won because it was easier to implement.

Later, it became possible to build hybrid systems combining the advantages of both architectures. However, as programming developed, scientists became increasingly occupied with the idea of creating artificial neural systems: interconnected, interacting processors operating on the same principle as the nerve cells of a living organism. What distinguishes such systems is that they are not programmed but trained.

The concept of an artificial neural network arose from the study of biological neural networks: sets of interconnected neurons in the nervous system that perform specific physiological functions. Each neuron is connected to a huge number of others; the point where neurons make contact is called a synapse, and it serves to transmit nerve impulses between cells.

The pioneers of artificial neural networks were the Americans Warren McCulloch and Walter Pitts. In the early 1940s they proposed a model of the brain that, in a simplified way, treated neurons as devices operating on binary numbers. A network of such electronic "neurons" could, in theory, perform numerical or logical operations of any complexity. The fundamentally new theoretical foundations of this brain model laid the basis for the subsequent development of neurotechnologies, and the next step was not long in coming.
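As an illustration of what "neurons operating on binary numbers" means in practice (a toy sketch of mine, not code from the article): a McCulloch-Pitts neuron simply compares the sum of its binary inputs with a threshold, which is already enough to express elementary logic gates.

# Toy McCulloch-Pitts neuron: binary inputs, a fixed threshold, binary output.
def mp_neuron(inputs, threshold):
    return 1 if sum(inputs) >= threshold else 0

# With two inputs, a threshold of 2 acts as AND and a threshold of 1 acts as OR.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", mp_neuron([x1, x2], 2), "OR:", mp_neuron([x1, x2], 1))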

As early as 1949, Donald Hebb proposed the first working algorithm for training artificial neural systems, and in 1958 Frank Rosenblatt created the first neurocomputer, the Mark-1. This computer was built around the perceptron, a neural network model Rosenblatt had developed three years earlier.
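To show what "trained, not programmed" looks like at its simplest, here is a hedged sketch of the perceptron learning rule. The toy data (logical OR), learning rate, and epoch count are my illustrative choices, not Rosenblatt's original setup.

# Minimal perceptron learning rule on a linearly separable toy problem (logical OR).
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w = [0.0, 0.0]   # weights
b = 0.0          # bias
lr = 0.1         # learning rate (illustrative value)

for epoch in range(20):
    for x, target in data:
        predicted = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
        error = target - predicted
        # Weights change only when the prediction is wrong.
        w[0] += lr * error * x[0]
        w[1] += lr * error * x[1]
        b += lr * error

print("weights:", w, "bias:", b)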

It is rather strange that no one has written about it on Habr, but, in my opinion, a significant event happened today. IBM introduced a new, fully finished chip that implements a neural network. The program for its development had existed for a long time and was quite successful; there was already an article on Habr about a full-scale simulation.

The chip has 1 million neurons and 256 million synapses. Apparently, as in the simulation, its architecture is similar to that of the neocortex.

Why is this so freaking cool? Because all of today's neural networks have to perform an astronomical number of operations, especially during training, and this often comes down to raw performance. In real time, only simple tasks can be solved on a single device. Parallelization on clusters and video cards speeds up processing considerably (at the cost of huge computing power and high energy consumption). But it all runs into the main problem of the von Neumann architecture: memory is separated from the processing units. In real neurons everything is different: the memory itself does the processing (there is a great series of articles about this on Habr).
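For a rough sense of why the operation count is so large (my own back-of-the-envelope figures, not numbers from the article): even a single fully connected layer costs about one multiply-accumulate per weight for every input it processes.

# Back-of-the-envelope cost of one fully connected layer; all values are illustrative.
inputs = 1024                              # neurons in the previous layer
outputs = 1024                             # neurons in this layer
macs_per_pass = inputs * outputs           # multiply-accumulates per forward pass
frames_per_second = 30                     # e.g. a video stream
print(macs_per_pass * frames_per_second)   # ~31 million MACs per second, one layer only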

If IBM starts producing such processors, many video analytics tasks could be solved directly on them. The simplest thing that comes to mind is classifying objects in a video stream (people, cars, animals), and this is exactly the task IBM demonstrated as a working example: in a 400×240, 30 fps video stream, the chip identified people, cyclists, cars, trucks, and buses.

If everything really is that good, robotic cars may soon not need lidars at all: a handful of video cameras paired with such a chip, and off you go.

By the way, if you convert the performance of such a chip into teraflops, you get an astronomical number: in effect, the chip is a million processors, each of which processes information from 256 input channels in a single clock cycle (well, approximately).
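A quick, hedged sanity check of that arithmetic (my own estimate; the article gives no clock rate, so it is treated as a free parameter here):

# Rough synaptic-operation throughput for a chip with 1 million neurons,
# each reading 256 inputs per cycle. The update rates below are assumptions.
neurons = 1_000_000
inputs_per_neuron = 256

for clock_hz in (1_000, 1_000_000):
    ops = neurons * inputs_per_neuron * clock_hz
    print(f"at {clock_hz} Hz: {ops:.2e} synaptic operations per second")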

A little more information is available on the IBM Research website.

P.S. Sorry for an article short on detail, in the style of Alizar, but I was genuinely surprised that such a significant event had passed Habr by.