"We need a huge amount of HBM (high bandwidth memory) and are currently negotiating with Samsung, SK Hynix and Micron. We have already received products from these three companies." This was said by Nvidia CEO Huang Renxun. Relying on GPU, Nvidia is booming, which has attracted the world's top three memory chip manufacturers to offer olive branches.
Jensen Huang shows a Blackwell GPU prototype
The surging wave of generative AI has given GPUs a chance to shine and is driving innovation across the entire semiconductor supply chain. GPUs have become the lifeline of the memory market, turning its losses into profits within a single quarter. Deep CPU-GPU integration has become the key to pushing products past their performance limits, and the three leading companies NVIDIA, AMD, and Intel are racing to release collaborative solutions to seize the market. The continued development of GPUs has also energized innovation in semiconductor equipment, cooling technology, and back-end packaging, where new technologies keep emerging.
Since 2021, the memory chip industry had been in a nearly two-year downward cycle, which steadily eroded memory makers' profits and even pushed them into losses.
For example, in 2023 Samsung's consolidated operating profit was 6.6 trillion won, down 84.86% year over year, while SK Hynix posted a cumulative operating loss of 7.7303 trillion won and a net loss of 9.1375 trillion won. Not until the fourth quarter of 2023 did the two memory giants gradually return to profitability. The key to this turnaround was the explosive growth of GPUs.
When GPUs process large amounts of data, especially in high-performance computing, artificial intelligence, and graphics processing, they place extremely high demands on memory bandwidth and capacity.
The high bandwidth, low power consumption, and low latency of GDDR (a type of video memory used in graphics processors and high-performance computing modules) and HBM are exactly what GPUs need most, so major memory chip companies have focused their development on these two categories.
SK Hynix's new DRAM product HBM3E
Wang Xiaolong, director of the enterprise services department at Strategy Analytics, told China Electronics News that to meet GPUs' demand for high bandwidth, HBM dramatically raises memory bandwidth by stacking memory dies and connecting them directly to the GPU through a silicon interposer. As GPU demand grows, successive versions such as HBM2, HBM2E, and the latest HBM3 continue to be launched, further improving bandwidth and capacity while reducing power consumption.
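The bandwidth gap that makes HBM attractive can be seen in a back-of-the-envelope calculation. The figures below (a 1024-bit interface per HBM3 stack at roughly 6.4 Gb/s per pin, versus a 32-bit GDDR6 chip at roughly 16 Gb/s per pin) are ballpark public specifications, not numbers from this article:

```python
def bandwidth_gbps(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s = bus width (bits) * per-pin data rate (Gb/s) / 8."""
    return bus_width_bits * pin_rate_gbps / 8

# HBM3: ~1024-bit interface per stack, ~6.4 Gb/s per pin (ballpark)
hbm3_stack = bandwidth_gbps(1024, 6.4)   # ~819 GB/s per stack
# GDDR6: 32-bit per chip, ~16 Gb/s per pin (ballpark)
gddr6_chip = bandwidth_gbps(32, 16.0)    # ~64 GB/s per chip

print(f"HBM3 stack: {hbm3_stack:.1f} GB/s, GDDR6 chip: {gddr6_chip:.1f} GB/s")
```

The wide, stacked interface, not a faster pin rate, is what gives a single HBM stack roughly an order of magnitude more bandwidth than a single GDDR6 chip.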
Guided by demand from the GPU market, major memory chip companies have received a flood of GDDR and HBM orders. SK Hynix recently said that, based on its capacity through the end of this year, its 2025 HBM production capacity is already fully allocated. Samsung, not to be outdone, said its own HBM orders are also sold out and that it expects no HBM oversupply next year. Micron likewise said it has essentially completed HBM supply negotiations for 2025; it expects HBM to bring in hundreds of millions of dollars of revenue in the fiscal year ending September 2024, rising to billions of dollars in fiscal 2025.
On the capacity side, HBM manufacturer SK Hynix plans to significantly expand its 1b nm DRAM capacity to meet demand for HBM3E. Its goal is to raise 1b nm wafer output to 90,000 wafers by the end of this year and further to 140,000-150,000 wafers in the first half of next year. To that end, SK Hynix plans to upgrade its M16 wafer fab in Icheon, Gyeonggi Province, to the 1b nm process. Samsung expects all of its existing facilities to be fully utilized by the end of 2024; its new P4L plant is scheduled for completion in 2025, and its No. 15 production line will transition from the 1y nm process to 1b nm and beyond.
On next-generation technology, SK Hynix plans to shorten the cycle for new HBM products from two years to one, and aims to complete development and mass production of HBM4 (6th generation) and HBM4E (7th generation) in 2025 and 2026 respectively. Samsung likewise said it plans to develop HBM4 next year and mass-produce it in 2026. Micron has already begun sampling 12-high stacked HBM3E, which it expects to become an important growth driver in 2025.
According to Mordor Intelligence, the HBM market is expected to surge from approximately US$2.52 billion in 2024 to US$7.95 billion in 2029, a compound annual growth rate of 25.86% over the forecast period.
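The cited growth rate is easy to sanity-check from the two endpoint figures; the quick calculation below reproduces it to within rounding:

```python
# Sanity-check the cited CAGR: growth from $2.52B (2024) to $7.95B (2029).
start, end, years = 2.52, 7.95, 5
cagr = (end / start) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.2%}")  # ~25.8%, in line with the cited 25.86%
```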
Chi Xiannian, a semiconductor industry expert, said: "Beyond GDDR and HBM, to cope with GPUs' surging memory demands, major companies are also exploring new storage media. The industry has begun investigating non-volatile memory technologies such as 3D XPoint, ReRAM (resistive random access memory), and PCM (phase-change memory), which promise performance close to DRAM while keeping data persistent, making them well suited to the fast storage and exchange of data in GPU-intensive applications."
At the recent Computex show in Taipei, hardly a sentence in the keynotes from NVIDIA, AMD, Intel, and their peers strayed far from GPUs. Amid open and covert competition, each released its latest CPU-GPU collaboration solutions, touting performance gains one more impressive than the next, which underscores the role GPUs now play in boosting CPUs.
The CPU is the central processing unit: it handles program control and sequential execution and is the final execution unit for information processing and program operation. The GPU is a graphics processor: once added to a system, it works under the CPU's control and takes over some of the work the CPU used to do, especially workloads involving large amounts of data such as graphics rendering, 3D acceleration, and large-scale parallel computing. That frees the CPU to spend its resources on other tasks and raises overall system performance. How to deepen CPU-GPU collaboration and improve overall system performance and efficiency has therefore become a central topic for the major CPU companies.
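The division of labor described above can be sketched conceptually. In this toy example the "CPU" runs sequential control logic and a thread pool stands in for the GPU's many parallel lanes; a real system would dispatch the kernel through CUDA, OpenCL, or a similar API, and the function names here are illustrative only:

```python
from concurrent.futures import ThreadPoolExecutor

def gpu_like_map(kernel, data, lanes=8):
    """Dispatch a data-parallel 'kernel' across many lanes, GPU-style.
    (A thread pool stands in for real GPU hardware in this sketch.)"""
    with ThreadPoolExecutor(max_workers=lanes) as pool:
        return list(pool.map(kernel, data))

def cpu_control_flow(frames):
    # CPU side: sequential control logic decides what to offload...
    shaded = gpu_like_map(lambda px: px * 2, frames)  # ...bulk data-parallel work
    return sum(shaded)                                # ...then aggregates results

print(cpu_control_flow(range(10)))  # 90
```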
To this end, the first approach leading companies such as NVIDIA, AMD, and Intel have pursued is heterogeneous CPU-GPU computing platforms. High-speed interconnect technologies such as NVLink, CCIX, CXL, and Gen-Z raise the speed and efficiency of data transfer between CPU and GPU, letting the two work together more closely and efficiently.
Jensen Huang demonstrates the acceleration from combining CPU and GPU
For example, Jensen Huang announced that NVIDIA will launch the new Vera CPU and Rubin GPU in 2026 and combine them into the Vera Rubin superchip, expected to replace the existing Grace Hopper superchip. The Rubin platform will also carry a new-generation NVLink 6 Switch offering up to 3,600 GB/s of connectivity, plus a CX9 SuperNIC at up to 1,600 GB/s to keep data transfer efficient.
Beyond building heterogeneous computing platforms, semiconductor industry expert Chi Xiannian said, software and programming models must also be optimized. To ease the communication bottleneck between CPU and GPU, companies have invested in programming models and libraries such as CUDA, OpenCL, DirectX, Vulkan, and oneAPI, so that developers can more easily write parallel programs that span CPU and GPU and exploit the computing strengths of both. In some scenarios, companies also pair dedicated hardware accelerators (such as AI or network accelerators) with the CPU and GPU to push specific tasks to the limit in cloud computing, edge computing, data centers, and other fields.
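One concrete pattern these programming models enable is writing array code once and running it on either processor. The sketch below uses the widely known NumPy/CuPy API compatibility (an example of the portable-programming idea, not something this article prescribes): if CuPy and a CUDA GPU are present the arrays live on the GPU, otherwise NumPy provides a CPU fallback:

```python
import numpy as np

try:
    import cupy as xp       # GPU path, if a CUDA device and CuPy are available
except ImportError:
    xp = np                 # CPU fallback: same array API via NumPy

def saxpy(a, x, y):
    """y := a*x + y, written once against the shared NumPy/CuPy array API."""
    return a * x + y

x = xp.arange(5, dtype=xp.float32)
y = xp.ones(5, dtype=xp.float32)
r = saxpy(2.0, x, y)
print(r)  # [1. 3. 5. 7. 9.] on CPU or GPU alike
```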
In semiconductor equipment, NVIDIA earlier released cuLitho, a computational lithography library that makes the process "smarter." Computational lithography used to rely on CPU server clusters; now 500 DGX H100 systems (containing 4,000 Hopper GPUs) running cuLitho can handle the same workload as 40,000 CPU servers, 40 times faster and at 9 times lower power consumption. With GPU acceleration, the computational lithography time for producing a photomask can drop from two weeks to eight hours. TSMC, by running cuLitho on 500 DGX H100 systems, can cut power from 35 MW to 5 MW while replacing the 40,000 CPU servers it used for computational lithography. Jensen Huang said NVIDIA will continue working with TSMC, ASML, and Synopsys to push advanced processes to 2 nm and beyond.
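The headline figures quoted above are mutually consistent, as a quick arithmetic check shows (the two-weeks figure is taken as 14 days):

```python
# Cross-check the cuLitho figures quoted above.
mask_time_speedup = (14 * 24) / 8    # two weeks -> eight hours per photomask
power_reduction = 35 / 5             # 35 MW -> 5 MW at TSMC
server_ratio = 40_000 / 500          # CPU servers replaced per DGX H100 system

print(f"{mask_time_speedup:.0f}x faster")        # 42x, consistent with the ~40x claim
print(f"{power_reduction:.0f}x less power")      # 7x (the article also cites 9x per workload)
print(f"{server_ratio:.0f} CPU servers per DGX") # 80
```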
On cooling innovation, NVIDIA chose a liquid cooling solution for its newly released B100 GPU. Jensen Huang has publicly stated that liquid cooling is the direction of future thermal management and is expected to drive a comprehensive transformation of the cooling market.
NVIDIA's liquid cooling technology solution
Compared with traditional air cooling, liquid cooling offers higher cooling efficiency, lower energy consumption, and less noise. As AI computing power and power draw keep climbing, once a single high-performance chip reaches 1,000 W, existing cooling technology faces a revolutionary change and liquid cooling becomes all but mandatory.
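To see what a 1,000 W chip demands of a liquid loop, a rough thermal estimate helps. The assumed values below (water's specific heat, a 10 K allowed coolant temperature rise) are textbook figures, not from the article:

```python
# Rough estimate of coolant flow needed to carry away 1,000 W.
# Assumptions: water c_p ~= 4186 J/(kg*K), allowed coolant rise dT = 10 K.
P_watts = 1000.0
c_p = 4186.0      # specific heat of water, J/(kg*K)
delta_T = 10.0    # coolant temperature rise through the cold plate, K

m_dot = P_watts / (c_p * delta_T)   # required mass flow, kg/s
liters_per_min = m_dot * 60         # water is ~1 kg per liter

print(f"~{liters_per_min:.2f} L/min of water per 1 kW chip")
```

A modest flow of well under 2 L/min can move 1 kW of heat, whereas air, with a volumetric heat capacity thousands of times lower, would need massive airflow and fan power for the same job.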
Minsheng Securities noted that the rapid growth of the AI industry is steadily lifting the penetration rate of liquid-cooled servers, which it expects to reach roughly 20%-30% by 2025.
Chi Xiannian added that packaging technology can also lift GPU performance. Flip-chip packaging (FCBGA) improves the heat dissipation of key components such as CPUs and GPUs while raising signal transmission speed and electrical performance. Fan-out wafer-level packaging (FOWLP) fits more memory dies into the same package size, increasing bandwidth while shrinking the GPU's footprint or freeing space for other components, which is crucial for integration and performance. CoWoS-L integrates multiple chips (such as a GPU and HBM) in a single package with high-speed interconnection through a silicon interposer, improving performance while optimizing the heat dissipation path. GPU makers are also exploring 3D packaging, stacking multiple chips or chiplets into a system to add functionality while cutting power consumption and improving cooling by shortening signal paths.
Driven by AI, the rapid rise of the GPU has undeniably become one of the key engines of the semiconductor industry.