# Why do we need neural processors?

For example, the prestigious scientific journal Nature recently published a study by a group of American scientists. They created a neural network that reads the activity of the cerebral cortex and converts the signals into speech, with an accuracy of 97 percent. In the future, this could let people who cannot speak “speak.”

And this is just the beginning. We are on the verge of a new technological revolution comparable to the discovery of electricity, and today we will explain why.

#### How do neural networks work?

The central processor is a very complex microchip. It can carry out a whole range of different instructions, so it copes with almost any task. But it is poorly suited to working with neural networks. Why is that?

The operations inside a neural network are very simple. They come down to just two arithmetic operations: multiplication and addition.

For example, to recognize an image, a neural network needs two sets of data: the image itself and a set of coefficients that describe the features we are looking for. These coefficients are called weights.

For example, this is what the weights for handwritten digits look like: a lot of pictures of the digits stacked on top of each other.

And this is what a cat or a dog looks like to a neural network. Artificial intelligence clearly has its own ideas about the world.

But back to arithmetic. Multiplying these weights by the original image and adding up the results gives a single value. If the value is large, the neural network concludes:

- Aha! It matches. I recognize this: it's a cat.

And if the value turns out to be small, then the regions with high weights did not contain the data we were looking for.
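The multiply-and-add step can be sketched in a few lines of Python. This is a toy illustration, not a real network: the 3x3 "images" and the weight template are invented here, and the helper name `score` is hypothetical.

```python
# Toy illustration: a 3x3 "image" and a 3x3 weight template, both flattened
# into lists of 9 numbers. All values are invented for this example.

def score(pixels, weights):
    """Multiply each pixel by its weight and add everything up."""
    return sum(p * w for p, w in zip(pixels, weights))

# Weights that "look for" a bright vertical bar in the middle column.
vertical_bar_weights = [
    -1, 2, -1,
    -1, 2, -1,
    -1, 2, -1,
]

image_with_bar = [
    0, 1, 0,
    0, 1, 0,
    0, 1, 0,
]

image_without_bar = [
    1, 0, 1,
    1, 0, 1,
    1, 0, 1,
]

match_score = score(image_with_bar, vertical_bar_weights)
mismatch_score = score(image_without_bar, vertical_bar_weights)

print(match_score)     # 6: bright pixels line up with the high weights
print(mismatch_score)  # -6: the high-weight regions contain nothing
```

The image that contains the pattern gets the larger value, which is the "Aha!" moment described above.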

Here's how it works. You can see how the number of neurons decreases from layer to layer. At the beginning there are as many neurons as there are pixels in the image, and at the end there are only ten, the number of possible answers. With each layer, the image is reduced further until only the answer remains. By the way, if you run the algorithm in reverse, you can generate something.
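To make the shrinking layers concrete, here is a rough sketch in plain Python. Only the 784 inputs and 10 outputs come from the text; the hidden sizes 128 and 32, the random weights and the `max(0, x)` step after each weighted sum are assumptions chosen for illustration.

```python
import random

random.seed(1)

# Hypothetical layer sizes: only 784 (pixels) and 10 (answers) come from the text.
LAYER_SIZES = [784, 128, 32, 10]

def make_layer(n_in, n_out):
    """One row of n_in random weights per output neuron (stand-ins for trained weights)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

layers = [make_layer(n_in, n_out) for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:])]

def run(values, layers):
    """Pass the values through every layer and record how many values remain."""
    sizes = [len(values)]
    for layer in layers:
        # Weighted sum per neuron, then max(0, x), a common choice assumed here.
        values = [max(0.0, sum(v * w for v, w in zip(values, row))) for row in layer]
        sizes.append(len(values))
    return values, sizes

pixels = [random.random() for _ in range(LAYER_SIZES[0])]
outputs, sizes_seen = run(pixels, layers)

print(sizes_seen)  # [784, 128, 32, 10]
```

Each layer is still nothing more than multiplications and additions, just repeated for every neuron.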

Everything seems simple, but not quite. Neural networks contain a lot of neurons and weights. Even a simple single-layer network that recognizes digits in 28 x 28 pixel pictures uses 784 coefficients (weights) for each of its 10 neurons, 7,840 values in total. Deep neural networks have millions of such coefficients.
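The count is easy to check with a small sketch of such a single-layer network. The weights and the image below are random stand-ins, not trained values, and the function name `forward` is hypothetical.

```python
import random

random.seed(0)

IMAGE_SIDE = 28
NUM_PIXELS = IMAGE_SIDE * IMAGE_SIDE  # 784 inputs
NUM_CLASSES = 10                      # one output neuron per digit

# One row of 784 weights per output neuron; random stand-ins for trained weights.
weights = [[random.uniform(-1, 1) for _ in range(NUM_PIXELS)]
           for _ in range(NUM_CLASSES)]
image = [random.random() for _ in range(NUM_PIXELS)]

def forward(image, weights):
    """Compute one weighted sum per output neuron."""
    return [sum(p * w for p, w in zip(image, row)) for row in weights]

scores = forward(image, weights)
# The neuron with the largest value is the network's answer.
predicted_digit = max(range(NUM_CLASSES), key=lambda d: scores[d])

total_weights = sum(len(row) for row in weights)
print(total_weights)  # 7840
```

Every one of those 7,840 weights costs one multiplication and one addition per image, and that is the smallest possible case.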

#### CPU

And here is the problem: classic processors are not built for such massive operations. They would take a very long time to multiply the input data by the coefficients and add up the results, because they are not designed for massively parallel operations.

Well, how many cores do modern processors have? If you have an eight-core processor at home, consider yourself lucky. Powerful server chips have 64 cores, well, maybe a little more. But this does not change things at all. We need at least thousands of cores.

Where can you get such a processor? At the IBM office? In the secret laboratories of the Pentagon?
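A back-of-the-envelope sketch shows why the core count matters. It assumes an idealized model in which every core performs exactly one multiplication per round and there is no overhead; the 4,096-core figure is a hypothetical stand-in for "thousands of cores".

```python
import math

def rounds_needed(total_operations, cores):
    """Idealized model: each core does one operation per round, no overhead."""
    return math.ceil(total_operations / cores)

# The single-layer digit recognizer from above: 784 weights x 10 neurons.
TOTAL_MULTIPLICATIONS = 784 * 10

rounds_by_cores = {
    cores: rounds_needed(TOTAL_MULTIPLICATIONS, cores)
    for cores in (1, 8, 64, 4096)  # 4096 is a hypothetical "thousands of cores" chip
}

for cores, rounds in rounds_by_cores.items():
    print(f"{cores:>5} cores -> {rounds} rounds")
```

Real hardware is far messier than this model, but the trend is the point: going from 8 to 64 cores helps, while thousands of cores turn hundreds of rounds into a couple.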

#### GPU

In fact, many of you have such a processor at home. This is your graphics card.

Video cards are built precisely for simple parallel computing: rendering pixels! To display an image on a 4K monitor, you need to draw 8,294,400 pixels (3840x2160), and do so 60 times per second (or 120 or 144 times, depending on the capabilities of the monitor and the wishes of the player. Ed.). That is nearly 500 million pixels per second!
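The pixel arithmetic is easy to reproduce. The resolution and refresh rates below are the ones mentioned in the text; the variable names are made up for this sketch.

```python
WIDTH, HEIGHT = 3840, 2160  # 4K resolution
pixels_per_frame = WIDTH * HEIGHT

pixels_per_second = {
    refresh_hz: pixels_per_frame * refresh_hz
    for refresh_hz in (60, 120, 144)
}

print(f"{pixels_per_frame:,} pixels per frame")
for refresh_hz, total in pixels_per_second.items():
    print(f"{refresh_hz} Hz -> {total:,} pixels per second")
```

At 60 Hz this gives 497,664,000 pixels per second, the "nearly 500 million" quoted above, and each of those pixels can be computed independently of the others.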

Video cards differ from CPUs in structure. Almost all the space on a graphics chip is taken up by computing units, that is, small simple cores. Modern graphics cards have thousands of them. For example, the GeForce RTX 2080 Ti has more than four thousand cores (4,352 CUDA cores).

All this lets neural networks run much faster on a GPU.

The RTX 2080 Ti delivers somewhere around 13 TFLOPS (**FLOPS**: FLoating-point Operations Per Second), which means 13 trillion floating-point operations per second. For comparison, the powerful 64-core Ryzen Threadripper 3990X manages only about 3 TFLOPS, and that is a processor built for multitasking.

Trillions of operations per second sounds impressive, but for truly advanced neural computing, it's like running Far Cry on a calculator.
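To get a feel for these numbers, here is a rough estimate of how long a fixed amount of work would take at the two peak figures quoted above. The workload of 10^15 operations is a hypothetical value chosen for illustration, and real programs rarely reach peak throughput.

```python
TERA = 10**12

GPU_FLOPS = 13 * TERA  # RTX 2080 Ti, peak figure quoted in the text
CPU_FLOPS = 3 * TERA   # Threadripper 3990X, peak figure quoted in the text

# Hypothetical workload, chosen only to make the comparison tangible.
WORKLOAD_OPERATIONS = 10**15

def seconds_needed(operations, flops):
    """Time to finish the workload if the chip ran at peak speed the whole time."""
    return operations / flops

gpu_seconds = seconds_needed(WORKLOAD_OPERATIONS, GPU_FLOPS)
cpu_seconds = seconds_needed(WORKLOAD_OPERATIONS, CPU_FLOPS)
speedup = cpu_seconds / gpu_seconds

print(f"GPU: {gpu_seconds:.1f} s, CPU: {cpu_seconds:.1f} s, speedup: {speedup:.2f}x")
```

On paper the GPU is a little over four times faster, and that is still not enough for the heaviest models.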

We recently played with DAIN, a machine-learning-based frame interpolation algorithm. The algorithm is very cool, but on a GeForce GTX 1080 graphics card it took 2-3 minutes to process one frame. And we need such algorithms to work in real time, preferably on phones.

#### TPU

That is why specialized neural processors exist. One example is Google's Tensor Processing Unit (TPU). Google made the first such chip back in 2015, and the third version was released in 2018.

The second version delivers 180 TFLOPS, and the third as many as 420 TFLOPS! That is 420 trillion operations per second. How did they achieve this?

Each such processor contains 10 thousand tiny computing cores built for a single task: multiplying and adding weights. For now it looks huge, but in 15 years it will shrink significantly. And that's not even the impressive part. These processors can be combined into clusters of 1,024 units without any loss in performance. GPUs can't do that.

Such a cluster of third-version tensor processors can deliver 430 PFLOPS (petaflops). In other words, that's 430 million billion operations per second.
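The cluster figure follows directly from the per-processor number, assuming performance scales perfectly with the number of processors, as the text claims. A minimal check:

```python
TERA = 10**12
PETA = 10**15

TPU_V3_TFLOPS = 420  # per processor, figure quoted in the text
CLUSTER_SIZE = 1024  # processors per cluster, figure quoted in the text

# Assumes perfect linear scaling with no losses.
cluster_tflops = TPU_V3_TFLOPS * CLUSTER_SIZE
cluster_flops = cluster_tflops * TERA
cluster_pflops = cluster_flops / PETA

print(f"{cluster_tflops:,} TFLOPS = {cluster_pflops:.2f} PFLOPS")
```

That works out to 430,080 TFLOPS, or roughly 430 PFLOPS.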

#### Where are we and what awaits us?

But as we said, this is only the beginning. Current neural supercomputers are like the first classic mainframes, which occupied entire floors of buildings.

In 2000, the first supercomputer with 1 teraflops of performance occupied 150 square meters and cost $46 million.

Fifteen years later, an NVIDIA device with 2-3 teraflops of performance fits in your hand and costs $59.

So in the next 15-20 years, the Google supercomputer will also fit in your hand. Or wherever we will be carrying processors by then.

Image: shot from the director's cut of the movie Terminator 2

But while we wait for that moment, we make do with the neural modules in our smartphones: those in Qualcomm Snapdragon, Huawei's Kirin and Apple's Bionic chips are quietly doing their job.

And a few presentations from now, they will be measured not in gigahertz, cores and teraflops, but in something everyone can understand, for example, recognized cats per second. Anything is better than abstract benchmark points!
