In March 2022, Apple once again changed the rules of the game in the chip world. It released the M1 Ultra, the company’s most powerful chip to date, which is in fact a “packaged” chip: two M1 Max dies joined in a single package. Many computing chips already use Chiplet technology to improve performance, but the performance of the “packaged” M1 Ultra still shocked the PC world.
The M1 Ultra supports up to 128GB of high-bandwidth, low-latency unified memory and integrates a 20-core CPU, a 64-core GPU, and a 32-core Neural Engine that can run up to 22 trillion operations per second. According to Apple, its GPU is up to 8x faster than that of the M1, and its multi-threaded CPU performance is 90% higher than that of the fastest 16-core PC desktop chip in the same power envelope.
The “packaged” performance of the M1 Ultra is made possible by its UltraFusion architecture. In fact, the UltraFusion interconnect was already built into the previously released M1 Max, but Apple did not mention it explicitly until its Peek Performance event in March.
▲UltraFusion architecture of Apple M1 Ultra
The UltraFusion architecture of the M1 Ultra uses a silicon interposer and micro-bumps to connect the two dies across more than 10,000 signals.
This technology provides an ultra-high inter-processor bandwidth of 2.5TB/s with low latency, which Apple says is more than 4 times the bandwidth of other multi-chip interconnect technologies. That bandwidth is also significantly ahead of what is currently offered by Universal Chiplet Interconnect Express (UCIe), the interconnect standard backed by industry giants such as Intel, AMD, Arm, TSMC, and Samsung.
▲ UCIe promoted by giants such as Intel
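As a rough sanity check on the figures quoted above, the 2.5TB/s aggregate bandwidth can be divided by the signal count to estimate an average data rate per signal. The sketch below assumes decimal units and an even split over exactly 10,000 signals. Because the text only says "more than 10,000", the result is best read as an upper bound on the average, and the function name is purely illustrative.

```python
# Back-of-the-envelope estimate of the average data rate per UltraFusion signal.
# Assumptions (not from Apple): decimal units (1 TB = 1e12 bytes) and an even
# split of the bandwidth over exactly 10,000 signals.

TOTAL_BANDWIDTH_TB_PER_S = 2.5  # quoted inter-die bandwidth
SIGNAL_COUNT = 10_000           # lower bound on the signal count quoted in the text


def per_signal_gbit_per_s(total_tb_per_s: float, signals: int) -> float:
    """Return the average data rate per signal in Gbit/s."""
    total_bits_per_s = total_tb_per_s * 1e12 * 8  # TB/s -> bit/s
    return total_bits_per_s / signals / 1e9       # bit/s per signal -> Gbit/s


rate = per_signal_gbit_per_s(TOTAL_BANDWIDTH_TB_PER_S, SIGNAL_COUNT)
print(f"about {rate:.1f} Gbit/s per signal on average")
```

Under these assumptions each signal carries only a few Gbit/s, which suggests that the headline bandwidth comes from a very wide, dense interface rather than from extremely fast individual lanes.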
Based on the published patents and papers of Apple and TSMC, we analyze the UltraFusion packaging architecture from the perspective of 2.5D/3D interconnects and the technologies involved.
01. Chip packaging moves towards 2.5D/3D interconnection
As described by Moore’s Law, the number of transistors on a chip doubles roughly every 24 months. This still holds for CPUs, GPUs, FPGAs and domain-specific accelerators (DSAs).
▲The number of chip transistors is gradually increasing (YH Chen et al., 2020)
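The doubling rule stated above can be written as N(t) = N0 · 2^(t / 2), with t in years. The following minimal sketch just evaluates that formula. The starting count of one billion transistors is a hypothetical round number chosen for illustration, not a figure from the cited paper.

```python
# Moore's Law as stated in the text: transistor count doubles every 24 months.

def projected_transistors(initial_count: float, years: float,
                          doubling_period_years: float = 2.0) -> float:
    """Project a transistor count forward assuming a fixed doubling period."""
    return initial_count * 2 ** (years / doubling_period_years)


# Hypothetical starting point: a 1-billion-transistor chip.
start = 1e9
for years in (2, 4, 10):
    count = projected_transistors(start, years)
    print(f"after {years:2d} years: {count / 1e9:.0f} billion transistors")
```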
With the exponential growth of chip computing power, die sizes are gradually running up against the size of the lithography reticle. System on Package (SoP), and Chiplet technology in particular, has become an effective way to sustain Moore’s Law and go beyond the reticle limit. (YH Chen et al., 2020)
Moore’s Law has evolved from individual transistor scaling (Moore’s Law 1.0) to system-level scaling (dubbed Moore’s Law 2.0 by the industry) through rapidly evolving inter-chip interconnect and packaging technologies.
▲Inter-chip interconnect technology is developing rapidly year by year (YH Chen et al., 2020)
Packaging has gradually evolved from 2D (two-dimensional) to 2.5D and 3D. Integrated circuits can thus improve overall performance in two ways: by expanding area and by building up in the third dimension.
▲The packaging has gradually developed from 2D (two-dimensional) to 2.5D and 3D (Kuo-Chung Yee et al., 2020)
02. Analyzing the UltraFusion architecture from Apple and TSMC patents and papers
Judging from the UltraFusion diagram released with the M1 Ultra, as well as the published patents and papers of Apple and its foundry (TSMC), UltraFusion should be an interconnect architecture based on TSMC’s fifth-generation CoWoS Chiplet technology.
▲ Apple’s Chiplet patent and M1 Ultra (refer to patent US 20220013504A1)
Chip-on-Wafer-on-Substrate with Si interposer (CoWoS-S) is a TSV-based multi-chip integration technology that is widely used in the fields of high-performance computing (HPC) and artificial intelligence (AI) accelerators.
With the advancement of CoWoS, the manufacturable interposer area has steadily increased, from one full reticle size (about 830 mm²) to two reticle sizes (about 1700 mm²). The area of the interposer determines the maximum chip area that can be packaged.
The 5th generation CoWoS-S (CoWoS-S5) reaches interposer sizes as large as three full reticles (~2500 mm²). Using a two-way lithography stitching method, the silicon interposer of this technology can accommodate multiple logic dies totaling 1200 mm² and eight HBM (high bandwidth memory) stacks. The dies are bonded to the silicon interposer face to face, that is, interconnect layer against interconnect layer.
▲The total chip area that CoWoS technology can carry is gradually increasing (PK Huang 2021)
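The area figures above are easier to compare when expressed as multiples of a single reticle. The short sketch below does only that arithmetic, using the approximate areas quoted in the text. The dictionary labels and the function name are illustrative, and the logic share is computed from the 1200 mm² and 2500 mm² figures alone.

```python
# Express the quoted interposer areas as multiples of one full reticle.
# All areas are the approximate values given in the text.

RETICLE_AREA_MM2 = 830

interposer_area_mm2 = {
    "one-reticle CoWoS": 830,
    "two-reticle CoWoS": 1700,
    "CoWoS-S5": 2500,
}


def reticle_multiples(area_mm2: float, reticle_mm2: float = RETICLE_AREA_MM2) -> float:
    """Return how many full reticles an interposer area corresponds to."""
    return area_mm2 / reticle_mm2


for name, area in interposer_area_mm2.items():
    print(f"{name}: {area} mm2 = {reticle_multiples(area):.2f} reticles")

# Share of the CoWoS-S5 interposer taken by the 1200 mm2 of logic dies.
logic_share = 1200 / interposer_area_mm2["CoWoS-S5"]
print(f"logic dies cover {logic_share:.0%} of the interposer")
```

By this arithmetic the logic dies occupy a little under half of the interposer, with the remainder available for the HBM stacks and spacing.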
In UltraFusion, die stitching can be used to splice four masks together and expand the area of the interposer. In this method, the four masks are exposed simultaneously, producing four stitched “edges” within a single chip.
▲UltraFusion architecture interconnection technology (single-layer and multi-layer, refer to patent US 20220013504A1/US 20210217702A1)
According to Apple’s patent, in this technology, the interconnection between chips can be a single layer of metal or multiple layers of metal. (US 20220013504A1/US 20210217702A1)
03. Six specially optimized technologies
UltraFusion is more than a simple physical connection structure. In this package architecture, there are several specially optimized technologies. (PK Huang 2021)
1) Low RC interconnect
UltraFusion introduces new low-RC metal layers (RC is the product of resistance and capacitance, which sets the propagation delay) to provide better inter-chip signal integrity at the millimeter interconnect scale.
Compared to other packaging solutions such as multi-chip modules (MCMs), UltraFusion’s interposer provides dense, short metal interconnects between logic dies and between logic dies and memory stacks. The result is better inter-chip signal integrity, lower power consumption, and the ability to run at higher clock rates. This new interposer interconnect scheme reduces trace resistance and via resistance by more than 50%.
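To see why lower resistance matters, a first-order lumped model treats an interconnect as a single resistor and capacitor with time constant τ = R · C. The sketch below applies that model. The resistance and capacitance values are hypothetical placeholders rather than figures from TSMC or Apple, and real interposer traces are distributed lines whose delay differs from this simple estimate by a constant factor.

```python
# First-order lumped RC model of an interconnect: tau = R * C.
# The component values below are hypothetical and chosen only for illustration.

def rc_time_constant_ps(resistance_ohm: float, capacitance_pf: float) -> float:
    """Return the RC time constant in picoseconds (ohm * pF = ps)."""
    return resistance_ohm * capacitance_pf


BASELINE_R_OHM = 100.0   # hypothetical trace-plus-via resistance
TRACE_C_PF = 0.2         # hypothetical trace capacitance
RESISTANCE_SCALE = 0.5   # the "more than 50%" reduction, taken at exactly 50%

baseline = rc_time_constant_ps(BASELINE_R_OHM, TRACE_C_PF)
improved = rc_time_constant_ps(BASELINE_R_OHM * RESISTANCE_SCALE, TRACE_C_PF)

print(f"baseline tau: {baseline:.1f} ps")
print(f"low-R tau:    {improved:.1f} ps")
print(f"ratio:        {improved / baseline:.2f}")
```

In this model the time constant scales linearly with resistance, so halving the trace and via resistance at unchanged capacitance halves the delay estimate.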
▲Interconnect power consumption control for transmission across interposers (US 20210217702A1)
2) Interconnect power consumption control
Apple’s patent shows that UltraFusion uses buffers that can be switched off to control the power consumption of the interconnect, effectively reducing the power drawn by idle interconnect lines.
3) Optimized TSVs
High-aspect-ratio through-silicon vias (TSVs) are another critical part of silicon interposer technology. UltraFusion/CoWoS-S5 redesigns the TSVs and optimizes their transmission characteristics to suit high-speed SerDes transmission.
4) Capacitors integrated in the interposer (iCap)
UltraFusion integrates deep trench capacitors (iCap) in the interposer to help improve the power integrity of the chip. The capacitance density integrated in the interposer exceeds 300 nF/mm², giving each die and signal interconnect a more stable power supply.
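The density figure can be turned into a total decoupling capacitance once an area is assumed. The sketch below multiplies the quoted 300 nF/mm² lower bound by an area. The 100 mm² capacitor area is a hypothetical value for illustration, since the source does not say how much of the interposer is devoted to trench capacitors.

```python
# Total on-interposer decoupling capacitance = density * capacitor area.
# The density is the lower bound quoted in the text; the area is hypothetical.

ICAP_DENSITY_NF_PER_MM2 = 300.0


def total_capacitance_uf(capacitor_area_mm2: float,
                         density_nf_per_mm2: float = ICAP_DENSITY_NF_PER_MM2) -> float:
    """Return total capacitance in microfarads for a given capacitor area."""
    return capacitor_area_mm2 * density_nf_per_mm2 / 1000.0  # nF -> uF


hypothetical_area_mm2 = 100.0
print(f"{hypothetical_area_mm2:.0f} mm2 of iCap gives at least "
      f"{total_capacitance_uf(hypothetical_area_mm2):.0f} uF")
```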
5) New thermal interface materials
UltraFusion uses a new type of non-gel thermal interface material (TIM) integrated in CoWoS-S5, with a thermal conductivity above 20 W/(m·K) and 100% coverage. It provides better heat dissipation for each high-performance die and thus improves overall cooling.
▲Improve yield and reduce cost through Die-Stitching (US 20220013504A1)
6) Improving packaging yield and reducing cost through Die-Stitching
In UltraFusion, only known good dies (KGD) are used for bonding. This avoids the problem of failed chips being packaged in traditional WoW (Wafer on Wafer) or CoW (Chip on Wafer) flows, which improves the yield after packaging and reduces the overall average cost. (With fixed tape-out and R&D costs, the fewer bad chips there are, the lower the average cost of a single chip.)
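The cost argument above can be illustrated with a simple yield model. The sketch below compares bonding untested dies, where a package works only if every die in it is good, against bonding known good dies only. It assumes perfect die testing, zero test cost, and 100% assembly yield. The costs and the 80% die yield are hypothetical numbers, and the function names are illustrative.

```python
# Simplified cost model: blind bonding versus known-good-die (KGD) bonding.
# Assumptions: perfect die test, no test cost, 100% assembly yield.
# All numeric inputs below are hypothetical.

def cost_per_good_package_blind(die_cost: float, package_cost: float,
                                die_yield: float, dies_per_package: int = 2) -> float:
    """Untested dies are bonded; the package is good only if all dies are good."""
    package_yield = die_yield ** dies_per_package
    return (dies_per_package * die_cost + package_cost) / package_yield


def cost_per_good_package_kgd(die_cost: float, package_cost: float,
                              die_yield: float, dies_per_package: int = 2) -> float:
    """Bad dies are scrapped before bonding, so no packaging cost is wasted."""
    cost_per_good_die = die_cost / die_yield
    return dies_per_package * cost_per_good_die + package_cost


DIE_COST = 100.0      # hypothetical cost of one die
PACKAGE_COST = 50.0   # hypothetical interposer and assembly cost
DIE_YIELD = 0.8       # hypothetical fraction of good dies

blind = cost_per_good_package_blind(DIE_COST, PACKAGE_COST, DIE_YIELD)
kgd = cost_per_good_package_kgd(DIE_COST, PACKAGE_COST, DIE_YIELD)
print(f"blind bonding: {blind:.2f} per good package")
print(f"KGD bonding:   {kgd:.2f} per good package")
```

In this model the gap widens as die yield falls or as the number of dies per package grows, because blind bonding discards good dies and packaging material together with every bad die.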
04. Conclusion: room for imagination for more powerful computing chips
In this article, we have drawn on the patents and papers of Apple and TSMC to make a preliminary analysis of UltraFusion technology.
UltraFusion brings together packaging interconnect technology, semiconductor manufacturing, and circuit design. It opens up considerable room for integrating larger, higher-performance computing chips, and it offers a useful reference for the development of computing architectures.