Saturday, November 23, 2024

HPC and Supercomputing Team Up with Enterprise AI at SC24

The Supercomputing Conference has long been a staple for high-performance computing and academia, but this year at SC24, AI began reshaping that landscape. Major IT firms like Dell, HPE, Weka, Pure Storage, and DDN not only showcased new supercomputing products but also emphasized enterprise solutions. This shift signals a movement of high-performance computing (HPC) from the purely academic realm into the business world. HPE and Dell particularly stood out with their focus on liquid cooling technologies, which address the intense heat and energy demands of GPUs handling AI tasks.

Since its inception in 1988, the Supercomputing Conference has evolved, especially over the last decade. Matt Kimball from Moor Insights & Strategy noted that the innovations we see today are leaps ahead of what was once just a glimpse into the future. “The pace of innovation has drastically accelerated,” he said.

AI is beginning to bridge the gap between enterprise needs and supercomputing, according to Camberley Bates from The Futurum Group. Companies are starting to blend the goals and processes of the two despite their traditional distinctions. HPE acquired Cray and built the AMD-powered Frontier supercomputer for Oak Ridge National Laboratory, while Dell supports the Texas Advanced Computing Center’s supercomputer. Although enterprise infrastructure isn’t yet the main attraction at SC, the lines between enterprise and HPC workloads are starting to blur.

Steve McDowell of NAND Research thinks we are on the brink of AI becoming fully integrated into enterprise operations. Companies will soon move past simple proof-of-concept projects. “Large language models will demand even more computing resources,” he pointed out. Although high levels of compute characterize both HPC and supercomputing, it’s uncertain how extensively these methods will influence enterprise applications.

At SC24, Dell launched the high-performance PowerEdge XE9685L and XE7740 servers, geared specifically for enterprise AI and HPC workloads. The company presented its Integrated Rack Scalable System (IRSS), which features Dell Smart Cooling technology to optimize AI deployments. Dell is also supporting the Nvidia GB200 Grace Blackwell NVL4 superchip in its IR7000 rack. The Dell Data Lakehouse now incorporates Apache Spark for scalable data processing.

HPE continued to push boundaries by showcasing El Capitan, a liquid-cooled supercomputer, and new Cray supercomputing blades along with its Cray storage system. HPE also rolled out new ProLiant Servers designed for enterprise AI.

DataDirect Networks introduced its fourth-generation A3I AI storage system that boasts enhanced performance and scalability. DDN teamed up with Nvidia for xAI’s Project Colossus, a supercomputer that will train Elon Musk’s AI chatbot, Grok.

Pure Storage revealed its GenAI Pod, a comprehensive solution for generative AI storage, and its FlashBlade//S500 achieved Nvidia DGX SuperPOD certification. Recently, Pure also invested in CoreWeave, a GPU cloud service provider.

Weka previewed a new high-performance storage offering tailored for enterprise AI, which combines its parallel file system with Nvidia hardware. The company also launched the Weka AI RAG Reference Platform, a guide for executing retrieval-augmented generation tasks.

McDowell expects storage vendors to focus increasingly on AI optimization in future offerings. Just a year ago, only a handful of vendors like Weka and Vast discussed storage’s potential role in data management. As the demand for AI computing increases, the storage landscape is bound to evolve.

Bates pointed out that while HPC typically involves handling large files with extensive read operations, AI requires interaction with many smaller files. This represents a shift in data requirements, although how much AI will change storage remains uncertain. Still, she noted, innovation will arise to tackle data challenges, regardless of data volume.

Liquid cooling emerged as a hot topic at SC24, with participation from 22 independent vendors alongside major players like Dell and HPE. Kimball found the diverse range of solutions impressive, but he believes we are still at the beginning of defining effective cooling methods. “It’s reminiscent of the early days of discovering fire,” he said.

The liquid cooling challenges that AI faces today are ones the supercomputing community has tackled for years. Companies like Dell and HPE are now bringing those solutions into the mainstream enterprise space. “The tech community is becoming the new cool,” Dekate said.