Saturday, January 18, 2025

GPU Scarcity Drives Growing Interest in GPU-as-a-Service

GPUs are now essential for running demanding AI workloads. However, their high costs and limited availability are pushing customers to explore alternative ways to access them.

A range of companies is diving into the GPU-as-a-Service (GPUaaS) arena, from specialized GPU providers to public cloud companies and OEMs, each trying a different approach to meeting customer needs. CoreWeave and Lambda Labs, two prominent GPUaaS providers, are attracting strong market interest: CoreWeave plans to go public soon, while Lambda Labs is introducing AI grants to broaden its reach.

Others, like Lenovo and Rackspace, are joining the GPUaaS trend, but analysts view their services more as a way to maximize existing resources than as true GPUaaS offerings.

GPUs themselves aren't new. The term originated with Sony in 1994, when the company sought to enhance 3D graphics for gaming. Nvidia revolutionized the field in 2006 with CUDA, a parallel programming framework that quickly became vital for AI and machine learning. Today, Nvidia dominates the data center GPU market.

The launch of ChatGPT by OpenAI in 2022 ignited the generative AI boom. AI applications require vast datasets that GPUs can process efficiently thanks to their parallel processing capabilities. However, the rush for data center GPUs for AI training has caused a supply and demand imbalance, as noted by Gartner analyst Chirag Dekate. In this environment, Nvidia prioritizes supplying its major clients, primarily hyperscalers.

Regarding GPUaaS, Dekate defines it as a cloud service offering on-demand access to GPUs. Companies like Lambda Labs and CoreWeave have strategic ties with Nvidia and large GPU inventories, mainly catering to hyperscalers and innovators like OpenAI. But these services can come with limitations. Dekate points out that these platforms provide compute power but not a comprehensive tech stack for AI workloads. It's akin to peering through a telescope: the view of the Nvidia-centered ecosystem is stunning, but the broader landscape is out of frame.

Hyperscalers go a step further, providing a complete technology stack along with GPU access, although users must work within that ecosystem. Companies like Lenovo and Rackspace have added GPU options to their offerings as well. Lenovo now includes GPU access in its TruScale infrastructure service, allowing metered access for AI model training. Rackspace has adopted an auction system for its limited GPU resources.

Despite this, Dekate views these offerings as more aligned with private cloud services than as distinct GPUaaS solutions. They create private cloud environments with GPU access rather than offering a dedicated GPU cloud of the kind CoreWeave provides. Nonetheless, Lenovo's approach helps customers avoid the high upfront cost of buying GPUs outright, appealing particularly to those who lack the expertise to navigate larger GPUaaS platforms.

While GPUaaS is an alternative, on-premises options are still available for those who need them, albeit at a cost, according to Russ Fellows from The Futurum Group. Retail prices for individual GPUs are steep. He advises users to assess their actual needs rather than overinvesting.

He emphasizes, “You really only need larger GPU clusters for specific occasions, like training or fine-tuning.” Companies should reflect on the intended use of GPUs and gauge the ROI and impact on their operations before deciding on purchasing or a service model.
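That buy-versus-rent calculation can be roughed out with simple arithmetic. The sketch below is illustrative only: the purchase price, hosting cost, and rental rate are assumptions, not vendor quotes, so plug in real figures before drawing conclusions.

```python
# Rough buy-vs-rent break-even sketch. All figures are illustrative
# assumptions, not actual vendor pricing.

PURCHASE_PRICE = 30_000.0   # assumed retail price of one data center GPU, USD
LIFESPAN_YEARS = 3          # assumed useful life before replacement
HOSTING_PER_YEAR = 3_000.0  # assumed power/cooling/hosting cost per GPU per year
RENTAL_RATE = 2.50          # assumed GPUaaS on-demand price per GPU-hour, USD

def owned_cost_per_hour(hours_used_per_year: float) -> float:
    """Effective cost per utilized GPU-hour when buying outright."""
    annual_cost = PURCHASE_PRICE / LIFESPAN_YEARS + HOSTING_PER_YEAR
    return annual_cost / hours_used_per_year

def break_even_hours_per_year() -> float:
    """Annual utilization above which owning beats renting."""
    annual_cost = PURCHASE_PRICE / LIFESPAN_YEARS + HOSTING_PER_YEAR
    return annual_cost / RENTAL_RATE

print(f"Break-even: {break_even_hours_per_year():.0f} GPU-hours/year")
for hours in (500, 2_000, 8_000):
    print(f"{hours:>5} h/yr -> owned ${owned_cost_per_hour(hours):.2f}/h "
          f"vs rented ${RENTAL_RATE:.2f}/h")
```

Under these assumptions, ownership only pays off above roughly 5,200 GPU-hours of real use per year, which is the kind of utilization check Fellows suggests doing before committing either way.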

The current GPU scarcity may create a fear of missing out, driving demand even higher. However, most companies can leverage GenAI without extensive high-end GPU setups. Smaller language models can function on regular CPUs.

There’s also a skill gap in managing GPU clusters, as maintaining them requires specialized knowledge often lacking in many businesses. GPUs can fail for various reasons, and users must be prepared to troubleshoot issues, whether simple server restarts or more complex component failures.

Looking ahead, Fellows predicts availability issues may linger for the next few years, but the intense demand for GPUs will eventually stabilize. Companies will ultimately need fewer GPUs for tasks like inference or smaller language models. Not every organization needs a vast number of GPUs—some could manage with just one or two for basic operations.

GPUaaS will still hold value, adapting to data size and task requirements. For instance, batching large datasets for training doesn’t need continuous resource availability, making GPUaaS an attractive option.
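The economics of that intermittent pattern can be sketched with the same kind of back-of-the-envelope math. The rates, cluster size, and run cadence below are all hypothetical assumptions chosen to illustrate the point, not real pricing.

```python
# Illustrative cost of occasional batch training on rented GPUs versus
# keeping a dedicated cluster that sits idle between runs. All numbers
# are assumptions for illustration.

RENTAL_RATE = 2.50               # assumed on-demand price per GPU-hour, USD
NUM_GPUS = 8                     # GPUs requested for each batch job
JOB_HOURS = 48                   # duration of one training run
RUNS_PER_YEAR = 6                # occasional retraining, not continuous use
OWNED_ANNUAL_PER_GPU = 13_000.0  # assumed amortized purchase + hosting per GPU/year

rented_per_year = RENTAL_RATE * NUM_GPUS * JOB_HOURS * RUNS_PER_YEAR
owned_per_year = OWNED_ANNUAL_PER_GPU * NUM_GPUS

print(f"Rented for bursts:  ${rented_per_year:,.0f}/year")
print(f"Owned, mostly idle: ${owned_per_year:,.0f}/year")
```

With usage this bursty, renting on demand comes out far cheaper than owning hardware that is idle most of the year, which is exactly the scenario where GPUaaS remains attractive.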