Thursday, November 21, 2024

Leveraging CXL, Astera Labs drives AI acceleration and memory expansion

Astera Labs is offering a new cable that aims to enhance GPU clustering for AI workloads. By connecting multiple racks together and spreading heat output and power draw across them, the cable expands the possibilities for building larger GPU clusters.
The Aries PCIe and Compute Express Link (CXL) Smart Cable Modules (SCMs) use copper cabling to extend the reach of a PCIe 5.0 signal from 3 meters to 7 meters. Astera Labs achieves this by building a digital signal processor (DSP) retimer into the cables, a protocol-aware device that compensates for transmission impairments such as the signal loss that accumulates over a longer run.
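To see why a retimer buys extra reach, consider a rough loss-budget calculation like the sketch below. The numbers (a 36 dB channel budget, per-meter cable loss, and fixed connector loss) are illustrative assumptions rather than Astera Labs or PCI-SIG specifications; the point is that a retimer restarts the loss budget partway along the channel, roughly doubling the cable length a link can tolerate.

```python
# Rough PCIe 5.0 reach estimate, with and without a retimer in the channel.
# All figures are illustrative assumptions, not vendor specifications:
LOSS_BUDGET_DB = 36.0       # assumed insertion-loss budget per channel segment
CABLE_LOSS_DB_PER_M = 5.0   # assumed copper cable loss at the Nyquist frequency
FIXED_LOSS_DB = 8.0         # assumed connector + board-trace loss per segment

def max_cable_length_m(segments: int) -> float:
    """Usable cable length when a retimer splits the link into `segments`,
    each of which gets a fresh loss budget."""
    per_segment_cable_budget = LOSS_BUDGET_DB - FIXED_LOSS_DB
    return segments * per_segment_cable_budget / CABLE_LOSS_DB_PER_M

print(f"no retimer:  ~{max_cable_length_m(1):.1f} m of cable")
print(f"one retimer: ~{max_cable_length_m(2):.1f} m of cable")
```

The exact 3-meter and 7-meter figures depend on cable gauge, connector quality, and where the retimer sits in the channel.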
The increased cable length allows for better interconnectivity between GPUs and also extends connectivity to CPUs and disaggregated memory. That makes larger GPU clusters practical, including clusters that span racks, so more GPUs can be brought into AI infrastructure even as each generation's power demands rise.
Baron Fung, an analyst at Dell’Oro Group, says the Aries SCMs take CXL connectivity beyond a single rack, letting customers connect multiple servers into clusters that move data more freely at a time when AI models are becoming more complex.
Fung further explains that the Aries SCM enables cache-coherent communication between AI servers and GPUs, even beyond the rack. Cache coherence keeps the copies of data held in different processors’ caches consistent, so devices across a cluster share a single view of memory. Introducing such a product could reinvigorate the CXL market and level the playing field for GPU manufacturers competing with Nvidia.
In addition to enhancing interconnectivity, the Aries SCMs address power usage concerns. Each new generation of GPUs draws more power, and existing racks may not be equipped to handle the increased demand; spreading a cluster across racks with longer cables helps operators stay within each rack’s power and cooling limits. According to Fung, Astera Labs positions the Aries SCMs as a way to scale AI workloads: they enable a scale-up architecture based on the open CXL standard, an alternative to Nvidia’s proprietary NVLink.
Nathan Brookwood, a research fellow at Insight 64, suggests that customers interested in alternatives to Nvidia GPUs, such as AMD’s MI series, may find the Aries SCM appealing. The cable doesn’t link competing vendors’ GPUs to one another, but it does allow AMD GPUs to be clustered in much the way NVLink clusters Nvidia’s.
Furthermore, the Aries SCMs can be used to attach additional DRAM to a system, supplying extra memory for applications that need more capacity than an off-the-shelf server can accommodate. The SCMs also support arrays of CXL memory modules, an interesting application of CXL technology for memory expansion.
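On Linux hosts, CXL-attached memory expanders are typically presented as additional, CPU-less NUMA nodes. The short sketch below assumes such a host and simply walks the kernel’s sysfs view of NUMA nodes to confirm the extra capacity is visible; it is a generic illustration, not an Astera Labs utility.

```python
# Minimal sketch: list NUMA nodes and their memory capacity on a Linux host.
# Assumes CXL-attached DRAM shows up as an extra, CPU-less NUMA node,
# which is how current Linux kernels typically expose CXL memory expanders.
import glob
import re

for node_dir in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    node = node_dir.rsplit("/", 1)[-1]
    with open(f"{node_dir}/meminfo") as f:
        meminfo = f.read()
    total_kb = int(re.search(r"MemTotal:\s+(\d+) kB", meminfo).group(1))
    with open(f"{node_dir}/cpulist") as f:
        cpus = f.read().strip()
    tag = "(no local CPUs - likely CXL-attached memory)" if not cpus else f"(CPUs {cpus})"
    print(f"{node}: {total_kb / 2**20:.1f} GiB {tag}")
```

Applications can then place data on that node explicitly (for example with numactl or libnuma) or leave it to the kernel to tier pages onto the extra capacity.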
Astera Labs assures customers that despite the longer cable length, the Aries SCMs maintain the same signal integrity as shorter modules. Because GPU-based workloads are far more sensitive to bandwidth than to latency, the small delay a longer cable introduces has little practical impact. The standard for external PCIe connectivity is still evolving, which carries some risk, but Astera Labs is confident in the Aries SCM’s current capabilities.
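A back-of-the-envelope comparison makes the bandwidth-versus-latency point concrete. The figures below (signal propagation speed in copper, per-retimer delay, and usable PCIe 5.0 x16 bandwidth) are illustrative assumptions, not measured values for the Aries SCM.

```python
# Back-of-the-envelope: latency added by a longer cable vs. bulk transfer time.
# Assumed figures (illustrative, not measured):
PROPAGATION_NS_PER_M = 5.0   # signal propagation in copper
RETIMER_LATENCY_NS = 10.0    # added delay per retimer hop
PCIE5_X16_GBPS = 64.0        # usable bandwidth of a PCIe 5.0 x16 link, GB/s

extra_latency_ns = 7 * PROPAGATION_NS_PER_M + RETIMER_LATENCY_NS  # 7 m cable
transfer_ms = (1.0 / PCIE5_X16_GBPS) * 1000  # time to move 1 GB of tensor data

print(f"added one-way latency: ~{extra_latency_ns:.0f} ns")
print(f"time to transfer 1 GB: ~{transfer_ms:.1f} ms")
# The added latency is roughly five orders of magnitude smaller than the
# transfer time, so bandwidth, not cable length, dominates for bulk GPU traffic.
```

Even with generous assumptions, the delay added by a 7-meter cable is negligible next to the time spent moving gigabytes of model data.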
Overall, Astera Labs’ Aries SCMs expand the possibilities for larger GPU clusters in AI workloads: they improve interconnectivity, help manage power constraints, and offer an open alternative to Nvidia’s proprietary technology.