DDN, known for high-performance computing (HPC) storage, has just secured a $300 million investment from Blackstone. The company plans to use the funding to merge its supercomputing expertise with storage solutions tailored for artificial intelligence.
HPC and AI workloads sound similar, but they operate differently. HPC typically processes a small set of mathematical formulas to generate massive simulation data. In contrast, AI involves handling huge volumes of data to create smaller models during training or to respond to prompts during inference.
DDN’s EXAscaler arrays serve the HPC market and use the Lustre parallel file system, which has been around for about 20 years. An EXAscaler array consists of multiple storage nodes, one of which serves as an index (the metadata) for the others. Compute nodes consult this index to find out where to read and write data, and connect to the array over a direct network, usually InfiniBand, to keep data flowing with minimal latency.
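To make the index-plus-data layout described above concrete, here is a minimal toy sketch in Python. It is not Lustre or EXAscaler code: the class names, the 4-byte stripe size, and the round-robin placement are all assumptions chosen for illustration. It only shows the general idea that a client asks the index where a file's pieces live, then fetches them from the data-holding nodes.

```python
from dataclasses import dataclass, field

STRIPE_SIZE = 4  # bytes per stripe; unrealistically small, for illustration only


@dataclass
class StorageTarget:
    """Hypothetical data-holding node: stores chunks keyed by (path, stripe number)."""
    chunks: dict = field(default_factory=dict)


@dataclass
class MetadataIndex:
    """Hypothetical index node: records which target holds each stripe of a file."""
    layouts: dict = field(default_factory=dict)


def write_file(index, targets, path, data):
    """Split data into stripes, spread them round-robin, and record the layout."""
    layout = []
    for offset in range(0, len(data), STRIPE_SIZE):
        stripe_no = offset // STRIPE_SIZE
        target_id = stripe_no % len(targets)  # assumed round-robin placement
        targets[target_id].chunks[(path, stripe_no)] = data[offset:offset + STRIPE_SIZE]
        layout.append(target_id)
    index.layouts[path] = layout


def read_file(index, targets, path):
    """Look up the layout once on the index, then gather stripes from the targets."""
    layout = index.layouts[path]
    return b"".join(targets[t].chunks[(path, n)] for n, t in enumerate(layout))


index = MetadataIndex()
targets = [StorageTarget() for _ in range(3)]
write_file(index, targets, "/sim/output.dat", b"abcdefghijkl")
restored = read_file(index, targets, "/sim/output.dat")
print(restored, index.layouts["/sim/output.dat"])
```

In a real parallel file system the stripes would be read from several nodes at once over the network, which is where the bandwidth gain comes from; the toy version reads them one after another.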
Now, DDN has applied this technology to its AI400X2 arrays, which are designed specifically for AI workloads. These arrays share the same 2U node structure as the EXAscaler but incorporate Nvidia Spectrum-X Ethernet controller cards. These cards, powered by Nvidia’s BlueField DPU, bring the advantages of InfiniBand to Ethernet networks. They also support RDMA over Converged Ethernet (RoCE), which lets data flow directly between GPUs and storage without packet loss.
The AI400X2 is optimized for fast communication with GPUs during training. However, enterprises might find it too costly for storing the vast amounts of data that pre-trained models generate. For this purpose, DDN introduced its Infinia arrays in 2023, which deliver S3 object storage and allow drives to be added without disruption.
DDN has moved Infinia’s S3 storage functions, including the metadata and storage servers, into containers. This lets Infinia mimic the functionality of Lustre when specific containers are activated on compute nodes. Infinia arrays can also include Spectrum-X cards for improved transfer speeds.
DDN claims to understand the complexities of intensive storage workloads. Problems can arise when many GPUs write data simultaneously and then need to read it back quickly, which can create inconsistencies. Checkpointing is the common solution, but it stalls processing and produces no useful output of its own. DDN says it can manage data flows effectively enough to minimize these delays.
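The cost of checkpointing can be estimated with simple arithmetic. The sketch below assumes a fully synchronous checkpoint, meaning the GPUs sit idle while the checkpoint is written. All figures (a 1 TB checkpoint, 10 or 100 GB/s of write bandwidth, 600 seconds of compute between checkpoints) are hypothetical and are not DDN numbers, and the function name is invented for the example.

```python
def checkpoint_stall_fraction(checkpoint_bytes, write_bandwidth_bytes_per_s, compute_interval_s):
    """Fraction of wall-clock time spent waiting on checkpoint writes.

    Assumes a synchronous checkpoint: compute pauses until the write completes.
    """
    stall_s = checkpoint_bytes / write_bandwidth_bytes_per_s
    return stall_s / (compute_interval_s + stall_s)


TB = 1e12
GB = 1e9

# Hypothetical scenario: 1 TB checkpoint after every 600 s of training.
slow = checkpoint_stall_fraction(1 * TB, 10 * GB, 600)    # 100 s stall per checkpoint
fast = checkpoint_stall_fraction(1 * TB, 100 * GB, 600)   # 10 s stall per checkpoint

print(f"slow storage: {slow:.1%} of time stalled")
print(f"fast storage: {fast:.1%} of time stalled")
```

Under these assumptions, the slower storage leaves the GPUs idle for roughly 14% of the run, against under 2% for the faster one, which is why storage write throughput matters for training clusters.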
DDN is already entrenched in AI, serving clients like Elon Musk’s xAI, which has rolled out a supercomputer with 100,000 H100 GPUs. How exactly Blackstone’s new investment will be utilized remains to be seen. The fund also has a seat on DDN’s board and last year supported CoreWeave, an AI-focused infrastructure provider.
Mark your calendars for February 20, when DDN plans to make a significant announcement, which it is teasing with the tagline: “We’re making AI real.”