Why is GPU faster than CPU?

2023-11-29

Is it accurate to say that a GPU is faster than a CPU?

Simply put, it is not fair to ask whether the GPU or the CPU is faster. The two are built on different design concepts.

The CPU is often called the "brain" of the computer. It handles general-purpose computation: the operating system and application programs all depend on it, and it largely determines the overall speed of the computer.

The GPU is more specialized. It was originally designed to accelerate 3D rendering and can execute many more operations in parallel. This makes it well suited to popular workloads such as animation rendering, image processing, and artificial intelligence.

CPUs are optimized for latency, while GPUs are optimized for bandwidth. A CPU is better at finishing one task quickly, while a GPU can work on many tasks at the same time, much as some people are good at working through tasks one by one and others at handling several at once.

Let us explain the difference with an analogy. The CPU is like a Ferrari, and the GPU is like a freight truck. Both must transport 100 packages from location A to location B. The CPU (Ferrari) can fetch some memory data (cargo) from RAM quickly, while the GPU (freight truck) is slower on each trip (higher latency). But the CPU (Ferrari) can only deliver 2 packages at a time, so it needs 50 trips to complete the delivery.

However, GPUs (freight trucks) can acquire more memory data at once and transport it.
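The analogy can be turned into a rough calculation. The sketch below is only an illustration: the 100 packages and the Ferrari's 2 packages per trip come from the text above, while the truck's capacity and both trip times are hypothetical values chosen to show the effect.

```python
import math

def delivery_time(packages, capacity, trip_time):
    """Return (number of trips, total time) needed to move all packages."""
    trips = math.ceil(packages / capacity)
    return trips, trips * trip_time

# From the analogy: 100 packages, and the Ferrari carries 2 per trip.
# Hypothetical: trip times (in arbitrary units) and the truck's capacity.
ferrari_trips, ferrari_total = delivery_time(100, capacity=2, trip_time=1.0)
truck_trips, truck_total = delivery_time(100, capacity=100, trip_time=5.0)

print(ferrari_trips, ferrari_total)  # 50 50.0
print(truck_trips, truck_total)      # 1 5.0
```

Under these assumed numbers the truck is five times slower per trip but finishes the whole job ten times sooner, which is the latency versus bandwidth trade-off in miniature.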

In other words, the CPU is better suited to processing small amounts of data quickly (for example, an arithmetic operation such as 5*6*7), and the GPU is better at applying the same operation to large amounts of data (for example, a matrix operation such as (A*B)*C). Therefore, although a single CPU delivery is faster, the GPU's advantage is more significant for workloads such as image processing, animation rendering, and deep learning, which consist of a large number of repetitive operations.
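To make the two examples concrete, here is a minimal sketch in plain Python. The matrices A, B, and C are made-up 2x2 values, and `matmul` is a naive helper written for this illustration, not a library routine.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (naive algorithm)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # Every output element is an independent dot product: the same
    # multiply-and-add pattern repeated for each (i, j) position.
    return [[sum(a[i][t] * b[t][j] for t in range(inner))
             for j in range(cols)]
            for i in range(rows)]

# A short chain of dependent scalar operations.
small = 5 * 6 * 7

# Hypothetical 2x2 matrices for the (A*B)*C example.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [0, 2]]
result = matmul(matmul(A, B), C)

print(small)   # 210
print(result)  # [[4, 2], [8, 6]]
```

With this naive algorithm, multiplying two n x n matrices takes n^3 multiply-add steps, and the output elements do not depend on each other. That is the kind of repetitive, independent work that maps well onto many simple cores.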

The biggest weakness of the GPU is the impact of latency on performance. However, in typical deep learning workloads the data occupies large contiguous blocks of memory, so the GPU can deliver its full memory bandwidth, and thread parallelism hides the latency so that it has almost no impact.

So what causes CPUs and GPUs to work differently? That also depends on the design structure of the two.

Why do GPUs and CPUs work differently?

1. Different architecture cores

The following two pictures can help us understand the difference in how the CPU and GPU work. As mentioned above, the CPU is designed for sequential (serial) processing, and the GPU is designed for data parallelism. The GPU has hundreds or thousands of smaller, simpler cores, while the CPU has a few large, complex cores.

GPU cores are optimized to perform similar simple operations on many data elements simultaneously, whereas CPU cores are optimized for sequential instruction processing. This is what leads to the difference in the per-core processing capabilities of the two.

There is a metaphor on the Internet for the difference between GPU and CPU cores that I think is very apt. A CPU core is like a knowledgeable professor, and the GPU's cores are like a crowd of elementary school students who can only do simple arithmetic. No matter how capable the professor is, he cannot work through 500 additions and subtractions in one second, so for simple repeated calculations a single professor is no match for a large number of students. When it comes to simple arithmetic, 500 students working concurrently can easily defeat the professor.

2. Different memory architectures

In addition to the differences in compute, GPUs use specialized high-bandwidth memory architectures to feed data to all of their cores. Currently, GPUs typically use GDDR or HBM memory, which provides higher bandwidth than the standard DDR memory used by CPUs.

Data processed by the GPU is transferred to this dedicated memory to minimize access latency during parallel computing. The GPU's memory is segmented so that concurrent accesses from different cores can be performed for maximum throughput.

In contrast, CPU memory systems are highly optimized for low-latency access to cached data. There is less emphasis on total bandwidth, which can reduce the efficiency of data-parallel workloads.
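The idea of segmented memory serving many cores at once can be sketched with a toy model. Everything here is an assumption made for illustration: the number of banks, the rule that a bank serves one access per step, and the `address % num_banks` mapping. Real GPU memory systems are considerably more involved.

```python
from collections import Counter

def serialized_steps(addresses, num_banks):
    """Steps needed to serve simultaneous accesses in a toy banked memory.

    Assumes each bank serves one access per step and that an address
    maps to bank (address % num_banks).
    """
    per_bank = Counter(addr % num_banks for addr in addresses)
    return max(per_bank.values())

NUM_BANKS = 8  # hypothetical bank count

# Eight cores reading neighbouring addresses: one access per bank.
contiguous = [core for core in range(8)]
# Eight cores reading with a stride of 8: every access hits bank 0.
strided = [core * 8 for core in range(8)]

print(serialized_steps(contiguous, NUM_BANKS))  # 1
print(serialized_steps(strided, NUM_BANKS))     # 8
```

In this model, accesses spread across the segments complete together, while accesses that pile onto one segment are served one after another. This is one reason contiguous data layouts suit GPUs well.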

3. Parallelism

The combination of dedicated cores and memory enables GPUs to exploit data parallelism to a greater extent than CPUs. For tasks like graphics rendering, the same shader program can run in parallel on many vertices or pixels.
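The shader idea can be imitated in plain Python. This is not real shader code: `shade` is a hypothetical per-pixel function (a horizontal grey gradient) and `render` simply applies it to every pixel of a tiny image.

```python
def shade(x, y, width, height):
    """Hypothetical 'shader': grey level 0..255 from a horizontal gradient."""
    return (255 * x) // (width - 1)

def render(width, height):
    # The same function runs for every pixel and reads no other pixel,
    # so the pixels could be computed in any order, or all at once.
    return [[shade(x, y, width, height) for x in range(width)]
            for y in range(height)]

image = render(4, 2)
print(image)  # [[0, 85, 170, 255], [0, 85, 170, 255]]

# Computing the pixels in reverse order gives the same picture.
reverse_order = {(x, y): shade(x, y, 4, 2)
                 for y in reversed(range(2)) for x in reversed(range(4))}
print(all(reverse_order[(x, y)] == image[y][x]
          for y in range(2) for x in range(4)))  # True
```

Because no pixel depends on another, the work can be split across as many cores as are available. A GPU exploits exactly this property.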

Modern GPUs contain thousands of cores, while even high-end CPUs typically have on the order of 100 cores or fewer. With so many cores, a GPU can spread a data-parallel computation much more widely. For highly parallel workloads, a GPU can achieve throughput many times higher than a CPU's, in favorable cases 100x or more.

Amdahl's law puts a limit on the parallel speedup that can be achieved for an algorithm, because the serial part and the communication do not get faster as cores are added. For example, if 10% of the work is serial, even 100 cores give a speedup of less than 10x. GPUs come close to ideal parallel speedup on the workloads they target because those workloads are almost entirely parallel.
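Amdahl's law is usually written as speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of cores. The sketch below evaluates it for a few cases. The parallel fractions are hypothetical examples, not measurements.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# 90% parallel work on 100 cores: the 10% serial part dominates.
print(round(amdahl_speedup(0.90, 100), 2))     # 9.17
# 99% parallel work on the same 100 cores.
print(round(amdahl_speedup(0.99, 100), 2))     # 50.25
# 90% parallel work on 10,000 cores: still capped just below 10x.
print(round(amdahl_speedup(0.90, 10_000), 2))  # 9.99
```

The last line shows that adding cores does not help once the serial part dominates. Massive parallelism pays off only when the workload itself is almost entirely parallel.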

4. Just-in-time (JIT) compilation

Another advantage of the GPU software stack is just-in-time (JIT) compilation. GPU drivers and runtimes include JIT compilers that convert high-level shader code, or an intermediate form of it, into optimized instructions for the specific device before execution.

This gives programmers flexibility: the same program can run on different GPU models without a separate offline compilation step for each one. JIT also allows optimization based on information that is only available at runtime. The compilation takes time on first use, but drivers typically cache the result, so the overhead on later runs is small.

In contrast, natively compiled CPU programs usually ship as precompiled machine code that is not recompiled based on runtime behavior. CPU-side JIT runtimes, such as the JVM and JavaScript engines, are the exception.

5. Programming model

Compared with the CPU, the GPU also offers a parallel programming model, CUDA, that lets developers write parallel code more quickly, with fewer low-level threading, synchronization, and communication concerns.

CUDA and OpenCL provide C/C++-based programming languages in which the code expresses parallel computation across abstract threads, while much of the scheduling and coordination is handled behind the scenes by the runtime and the hardware.

By contrast, CPU parallelism usually means managing threads more directly, through threading libraries or APIs such as OpenMP. Thread management, locking, and avoiding race conditions add significant complexity, which makes it harder to think about parallelism at a high level.
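The difference in style can be imitated in plain Python. The sketch below is not CUDA and runs entirely on CPU threads. It only contrasts the shape of the two approaches. All names (`worker`, `square_kernel`, `totals`) are made up for this example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

data = list(range(1000))

# Style 1: explicit threads that share one accumulator.
# The programmer must create, start, and join the threads and
# protect the shared value with a lock to avoid a race condition.
totals = {"sum": 0}
lock = threading.Lock()

def worker(chunk):
    partial = sum(v * v for v in chunk)
    with lock:
        totals["sum"] += partial

threads = [threading.Thread(target=worker, args=(data[i::4],))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Style 2: kernel-like code. One small function describes the work
# for a single index and writes only its own output slot, so no lock
# is needed. A runtime decides how indices are spread over workers.
out = [0] * len(data)

def square_kernel(i):
    out[i] = data[i] * data[i]

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(square_kernel, range(len(data))))

print(totals["sum"] == sum(out))  # True
```

In the second style the programmer describes what one abstract thread does and leaves the distribution to the runtime, which is close in spirit to how a CUDA or OpenCL kernel is written. Python threads will not make this example faster. It illustrates structure, not performance.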

6. Different response methods

The CPU mostly responds in real time and places high demands on the speed of a single task, so it uses several levels of cache to keep each task fast.

GPUs often use a batch processing mechanism: tasks are queued up and then processed in batches.
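A batch mechanism of this kind can be sketched with a simple queue. The batch size and the doubling operation applied to each task are hypothetical placeholders for real work.

```python
from collections import deque

def process_in_batches(tasks, batch_size):
    """Drain a task queue in fixed-size batches and return the results."""
    queue = deque(tasks)
    results = []
    while queue:
        # Take up to batch_size tasks off the front of the queue.
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        # Apply the same operation to the whole batch (placeholder: doubling).
        results.append([task * 2 for task in batch])
    return results

print(process_in_batches(range(10), batch_size=4))
# [[0, 2, 4, 6], [8, 10, 12, 14], [16, 18]]
```

An individual task may wait in the queue until its batch is processed, which raises its latency, but each batch applies one operation to many items at once, which is where the throughput comes from.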

7. Different application directions

The applications that CPUs are good at, such as operating systems, need to respond quickly to real-time information and must be optimized for latency. A large share of the CPU's transistors and power budget therefore goes into control logic such as branch prediction, out-of-order execution, and low-latency caches.

GPUs are suited to highly predictable workloads made up of a large number of similar operations, where high latency is acceptable in exchange for high throughput. They are currently widely used in three major application markets: games, virtual reality, and deep learning.

Game market

Gaming is one of the earliest applications of the GPU. Because GPUs have natural advantages in image processing and physics effects, they are widely used in game engines and game rendering. In games, the GPU can quickly compute large amounts of geometry, texture, lighting, and shadow data to produce more realistic visuals.

Virtual reality market

Virtual reality combines computer-generated three-dimensional images with the real world. In virtual reality applications, the GPU renders the virtual world realistically and handles object motion. As the technology develops, GPUs are increasingly used in the virtual reality market, especially in head-mounted devices and immersive experiences.

Deep Learning

Deep learning is a branch of machine learning based on artificial neural networks. GPUs can train neural networks efficiently, accelerating training through large-scale parallel computing. As their use in deep learning continues to expand, GPUs have become the main accelerator for training deep learning models.

In addition, GPUs can also be used in fields such as autonomous driving, medical image analysis, and financial risk control. However, since different application scenarios have different requirements for GPU performance, factors such as its computing power, power consumption, and application fields need to be considered when selecting a GPU. The most appropriate GPU needs to be selected based on the type of task and optimized to take advantage of its performance.

Development of domestic GPUs

The development of China's domestic GPUs lagged behind that of its domestic CPUs. It was not until April 2014 that Jingjia Micro successfully developed the first domestically produced high-performance, low-power GPU chip, the JM5400.

Two factors have held back the industry: the GPU's dependence on the CPU, and the difficulty of GPU research and development. First, the GPU depends on the CPU. A GPU has no general-purpose controller of its own and cannot work alone. It must be driven by a CPU. It is therefore consistent with the development logic of the chip industry that domestic CPUs are one step ahead of domestic GPUs.

Second, GPU technology is very difficult. Patrick Moorhead, chief analyst at Moor Insights & Strategy, once said: "Compared with CPUs, it is more difficult to develop GPUs, and there are fewer GPU designers, engineers and driver developers." The domestic talent gap is another important reason for the slow development of domestic GPUs.

At present, although China's GPU chips still hold a small share of the market, more and more domestic GPU chips are entering it, and more and more domestic companies are moving into graphics processing, such as Innosilicon, Jingdong Technology, Jiawei, and others. Domestic GPU chips therefore have better development opportunities than before.

Now, with a series of US policies taking effect, many people expect domestic GPU chips to replace imported ones in the future and are beginning to support domestic GPU chip companies from multiple angles. According to the latest statistics, the three domestic GPU companies Biren Technology, Moore Threads, and Muxi have received more than 10 billion yuan in investment, which shows that considerable resources are going into technology research and development.

As the United States implements more export control measures, a window of opportunity may open for the rise of "Chinese chips", which could put Nvidia under greater competitive pressure in the Chinese market.
