On August 23, GPU giant Nvidia released its financial report for the second quarter of 2023, and the results far exceeded expectations. Nvidia's second-quarter revenue reached $13.5 billion, up 101% year over year, while net profit reached $6.1 billion, up 843% year over year. The stunning report sent Nvidia's stock up about 6% in after-hours trading and lifted many other AI-related technology stocks along with it.
Nvidia's revenue surge in the second quarter was driven mainly by the current boom in artificial intelligence. Since the third quarter of last year, large-model technologies represented by ChatGPT have been pursued by almost every major Internet company in the world, from Google and Amazon in Silicon Valley to giants such as Baidu, Tencent, and Alibaba in China. Training and inference for these large models depend on AI acceleration chips, and Nvidia's GPUs are currently the preferred solution for accelerating both. With major technology companies and start-ups buying Nvidia's A-series and H-series high-end GPUs at scale to supply large-model training compute, Nvidia's data center GPUs have been in short supply, and that shortage shows up in the financial report as an astonishing jump in revenue and net profit.
Beyond the headline revenue and profit figures, there is another key number in Nvidia's report worth attention: data center revenue. According to the report, Nvidia's data center business brought in more than US$10 billion in the second quarter, up 171% year over year. That figure is impressive on its own, but comparing it with other companies' results for the same period reveals its deeper significance. Also in the second quarter of 2023, Intel's data center business revenue was US$4 billion, down 15% year over year, and AMD's data center business revenue was US$1.3 billion, down 11% year over year. In other words, Nvidia's data center revenue in the second quarter of 2023 exceeded the combined data center revenue of Intel and AMD.
This comparison reflects a reversal in the relative standing of AI acceleration chips (GPUs) and general-purpose processors (CPUs) in the age of artificial intelligence. In data centers today, Nvidia is the dominant supplier of AI acceleration chips/GPUs, while Intel and AMD are the two major suppliers of general-purpose CPUs, so comparing Nvidia's data center revenue against Intel's plus AMD's is roughly equivalent to comparing GPU and CPU shipments. Although AI has been booming since 2016, GPUs did not gain data center share over CPUs overnight: before 2023, data center CPU revenue was consistently far higher than GPU revenue, and even in the first quarter of 2023 Nvidia's data center revenue (US$4.2 billion) was still below the combined data center revenue of Intel and AMD. In the second quarter, the balance of power flipped, and data center GPU revenue surpassed CPU revenue in one stroke.
This is a historic moment. Since the PC era of the 1990s, the CPU has been the standard-bearer of Moore's Law; its reign extended from personal computers into cloud data centers and drove the continuous advance of the semiconductor industry. In 2023, as artificial intelligence reshapes the entire high-tech industry and human society, the general-purpose CPU is ceding that leading role in the semiconductor field to GPUs used for AI acceleration (and other related AI acceleration chips).
As is well known, the rise of the CPU was inseparable from Moore's Law. Under Moore's Law, semiconductor process nodes advanced by a generation roughly every 18 to 24 months, each generation delivering far more, and better-performing, transistors per chip. This let CPUs advance by leaps and bounds during the golden age of Moore's Law (from the 1980s through the first decade of this century): CPU performance iterated every year and a half or so, enabling new applications, and those new applications in turn drove demand for still more CPU performance, forming a positive cycle. That cycle lasted until the 2010s, when it faded as Moore's Law approached physical limits. Over the past decade, CPU performance growth has fallen from several tens of percent per year in the 1980s and 1990s (performance doubling roughly every 18 to 24 months) to about 3% per year after 2015, at which rate performance takes more than two decades to double.
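For readers who want to check the arithmetic, the doubling time implied by a compound annual growth rate r is simply ln 2 / ln(1 + r). Here is a minimal sketch in Python; the input rates are illustrative figures in line with the discussion above, not measurements of any specific processor line:

```python
import math

def doubling_time_years(annual_growth_rate: float) -> float:
    """Years needed to double at a given compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)

# An illustrative 50% per year (typical of the CPU boom era) doubles
# performance in about 1.7 years, i.e. roughly every 20 months.
print(round(doubling_time_years(0.50), 1))   # -> 1.7

# About 3% per year (post-2015 CPUs) needs more than two decades to double.
print(round(doubling_time_years(0.03), 1))   # -> 23.4
```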
However, while Moore's Law has stopped delivering that kind of transistor-level performance growth, the exponential performance growth it predicted has not disappeared; it has shifted from the CPU to the GPU. Looking at GPU performance (compute throughput) since 2005, we find that it has in fact kept following an exponential curve, with performance doubling roughly every 2.2 years.
Why can GPU performance keep growing exponentially? We can analyze this from two angles: demand and technical feasibility. Demand asks whether applications in the market genuinely need exponential growth in GPU performance; technical feasibility asks whether such growth can actually be delivered.
On the demand side, artificial intelligence provides exactly that pull. From 2012 (when the neural-network revival of AI began) to the present, the compute demand of AI models has indeed grown exponentially. The years from 2012 to 2018 were the heyday of convolutional neural networks, and during that period the compute demand of AI models grew by roughly 15x every two years. At that time GPUs were mainly used for training, and GPU performance was generally more than sufficient for inference. Since the arrival of the large-model era built on the Transformer architecture in 2018, the compute demand of AI models has accelerated sharply, to roughly 750x every two years. In the large-model era even inference depends on GPUs, and a single GPU may not be enough for it, while training requires hundreds of GPUs to finish in a reasonable time. This growth in demand dwarfs a GPU performance curve that doubles only every 2.2 years; in fact, GPU performance improvement still falls short of what is being asked of it. From the demand side, then, the exponential growth curve of GPU performance can be expected to continue for a long time. Over the next decade, GPUs are likely to take over the banner of Moore's Law from CPUs and continue the legend of exponential performance growth.
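To make the gap concrete, here is a back-of-envelope comparison of the two exponential curves quoted above (demand growing ~750x every two years versus single-GPU performance doubling every 2.2 years). The projections are purely illustrative, not forecasts for any particular product or model:

```python
# Growth figures quoted above: large-model compute demand grows ~750x every
# 2 years, while single-GPU performance doubles roughly every 2.2 years.
demand_growth_per_year = 750 ** (1 / 2)   # ~27x per year
gpu_growth_per_year = 2 ** (1 / 2.2)      # ~1.37x per year

for years in (2, 4, 6):
    demand = demand_growth_per_year ** years
    single_gpu = gpu_growth_per_year ** years
    # The remaining gap has to be closed with more GPUs and better software,
    # which is exactly why demand for GPUs keeps outrunning supply.
    print(f"after {years} years: demand x{demand:,.0f}, "
          f"single GPU x{single_gpu:.1f}, gap x{demand / single_gpu:,.0f}")
```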
Beyond demand, for GPU performance to truly keep growing exponentially there must also be chip technology to support it. We believe three technologies will be key to sustaining that growth over the next few years.
Although both are chips, GPU performance can keep growing exponentially while CPU performance cannot, and one important reason is that GPU performance gains come not only from better transistors and circuit design but also from domain-specific design. For example, before 2016 GPUs mainly supported 32-bit floating point (fp32), the default number format in high-performance computing; after the rise of AI, however, research showed that AI does not need fp32 precision: 16-bit floating point is sufficient for training, and 8-bit or even 4-bit integers are sufficient for inference. Because low-precision arithmetic is much cheaper, building domain-specific compute units optimized for these formats delivers large performance gains for AI at relatively low cost. This idea is visible throughout Nvidia's GPU designs of the past decade, which have added efficient support for fp32, fp16, int8, and int4 in turn; it has proved a fast, low-cost way to raise performance. Other examples of domain-specific design on GPUs include Tensor Cores for neural-network matrix math, support for sparse computation, and hardware support for Transformer models. In the future, a large share of GPU performance gains is likely to come from such domain-specific design: adding one or two dedicated acceleration modules can often remove the bottleneck of the latest AI models and multiply overall performance.
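As a concrete illustration of why low precision is such a cheap win, the sketch below quantizes an fp32 weight matrix to int8 in plain NumPy. This is only a software illustration of the numerical idea; the GPUs discussed here implement fp16/int8/int4 support directly in hardware, and the matrix and scaling scheme below are arbitrary choices made for the example:

```python
import numpy as np

# Symmetric int8 quantization of an fp32 weight matrix: 4x less memory and
# bandwidth per weight, at the cost of a small, bounded rounding error.
rng = np.random.default_rng(0)
w_fp32 = rng.standard_normal((1024, 1024)).astype(np.float32)

scale = np.abs(w_fp32).max() / 127.0            # map the fp32 range onto int8
w_int8 = np.round(w_fp32 / scale).astype(np.int8)
w_approx = w_int8.astype(np.float32) * scale    # dequantized approximation

print("memory: fp32", w_fp32.nbytes // 1024, "KiB -> int8", w_int8.nbytes // 1024, "KiB")
print("mean abs quantization error:", float(np.abs(w_fp32 - w_approx).mean()))
```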
Advanced packaging affects GPUs in two ways: high-speed memory and higher integration. In the large-model era, as parameter counts grow, memory access matters more and more to overall GPU performance. Even an extremely fast GPU die will be held back if memory cannot keep up; overall performance then becomes limited by memory bandwidth, the so-called "memory wall" problem. Avoiding that limit makes advanced packaging essential: today's high-bandwidth memory interfaces (such as HBM, already widely used on data center GPUs) depend on advanced packaging, and we expect packaging to play an ever larger role in memory interfaces and thus in GPU performance. The other contribution of advanced packaging is higher integration. At the most advanced process nodes (3nm and below), yield becomes challenging as die size grows, and GPUs are likely to push die size more aggressively than any other chip category. Splitting one large die into multiple smaller chiplets and integrating them with advanced packaging will therefore be one of the main ways for GPUs to break through die-size limits. AMD's data center GPUs already use such chiplet-based packaging, and Nvidia is expected to adopt it in the near future to push GPU integration further.
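A rough roofline-style calculation shows how the memory wall appears in practice: a kernel is limited by memory bandwidth whenever its arithmetic intensity (floating-point operations per byte moved) falls below the ratio of peak compute to memory bandwidth. The figures below are assumed round numbers in the general range of a modern data center GPU, not the specification of any particular part:

```python
# Assumed, illustrative figures (not any specific GPU's datasheet values).
peak_flops = 1.0e15          # ~1 PFLOP/s of low-precision compute
mem_bandwidth = 3.0e12       # ~3 TB/s of HBM bandwidth

# A kernel needs this many FLOPs per byte of memory traffic to stay compute-bound.
ridge_point = peak_flops / mem_bandwidth
print("compute-bound above", ridge_point, "FLOPs per byte")

# Large-model inference is dominated by matrix-vector work: each fp16 weight
# (2 bytes) is read once and used for one multiply-add (2 FLOPs), i.e. about
# 1 FLOP per byte -- far below the ridge point, so throughput is set by HBM.
flops_per_byte = 2 / 2
attainable = min(peak_flops, flops_per_byte * mem_bandwidth)
print("attainable:", attainable / 1e12, "TFLOP/s of a", peak_flops / 1e12, "TFLOP/s peak")
```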
As noted above, the compute demand of large models grows by roughly 750x every two years, far outpacing the GPU's own Moore's Law. When a single GPU cannot keep up with a model's compute requirements, the shortfall must be made up with quantity: the model is split across many GPUs for distributed computing. Over the next few years we can expect large models to use increasingly aggressive distributed strategies, completing training on hundreds, thousands, or even tens of thousands of GPUs. At that scale, high-speed data interconnects become critical; otherwise the data exchange between compute units becomes the bottleneck of the whole computation. These interconnects include short-reach electrical links based on SerDes technology: in Nvidia's Grace Hopper Superchip, for example, NVLink-C2C provides up to 900 GB/s of interconnect bandwidth (about 7 times that of a x16 PCIe Gen5 link). Longer-reach interconnects based on optics will be another core technology: once distributed computing spans thousands of nodes, long-distance data exchange becomes routine and can be one of the decisive factors in system performance.
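A quick estimate shows why interconnect bandwidth can decide overall system performance at this scale. The sketch below computes a lower bound on the time to synchronize gradients with a ring all-reduce; the model size, precision, and GPU count are assumptions chosen for illustration, with only the 900 GB/s link figure taken from the text above:

```python
# All numbers are assumptions for illustration, except the 900 GB/s link
# bandwidth mentioned above for NVLink-C2C.
num_gpus = 1024
model_params = 70e9          # a 70B-parameter model (assumed)
bytes_per_grad = 2           # fp16 gradients (assumed)
link_bandwidth = 900e9       # bytes per second

grad_bytes = model_params * bytes_per_grad
# Ring all-reduce: each GPU sends and receives about 2*(N-1)/N times the buffer.
traffic_per_gpu = 2 * (num_gpus - 1) / num_gpus * grad_bytes
print("all-reduce traffic per GPU:", round(traffic_per_gpu / 1e9), "GB")
print("lower-bound sync time:", round(traffic_per_gpu / link_bandwidth, 2), "s per step")
```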
We believe that in the era of artificial intelligence, GPUs will carry the story of Moore's Law forward and continue the exponential growth of performance. To meet the intense performance demands of AI models, GPUs will rely on core technologies such as domain-specific design, advanced packaging, and high-speed data interconnects to keep improving rapidly, and GPUs together with other AI acceleration chips will become the main engine of technological and market progress in the semiconductor field.