Red Hat has announced plans to acquire Neural Magic, a leading commercial contributor to the open-source vLLM project. The move aims to make it easier for organizations to run machine learning workloads without needing expensive GPU servers. Reliance on such costly hardware is a barrier to AI adoption for many businesses and slows the technology’s uptake across industries.
According to its GitHub page, vLLM serves as a “high-throughput and memory-efficient inference and serving engine for large language models.” In a blog post, Red Hat’s CEO, Matt Hicks, emphasized Neural Magic’s accomplishment in making it possible to execute machine learning algorithms on standard CPUs, removing the need for high-end GPU servers.
Hicks highlighted the founders’ mission to democratize AI, enabling anyone to harness its capabilities, regardless of their budget. Through techniques like pruning and quantization, Neural Magic optimizes machine learning models, allowing them to run efficiently without losing performance.
He also pointed out the trend toward smaller, specialized AI models that provide remarkable performance while being more resource-efficient to train and deploy. This shift offers significant benefits in customization and adaptability.
Red Hat advocates for a method called sparsification, which removes unnecessary connections within a model. This cuts both the size and the computational needs of the model while maintaining accuracy. Quantization, which stores the remaining weights at lower numeric precision, then reduces the model size further, enabling efficient operation on devices with less memory.
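To make the two steps concrete, here is a minimal sketch in plain Python. It is not Neural Magic's or vLLM's implementation. It assumes the simplest common variants of each idea: magnitude pruning (zeroing the weights with the smallest absolute value) and symmetric 8-bit quantization with a single scale factor. The function names and the sample weights are hypothetical and chosen only for illustration.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to drop
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]


# Hypothetical weights for a tiny layer (illustration only).
weights = [0.02, -1.27, 0.5, -0.01, 0.9, 0.03, -0.4, 0.0]

pruned = prune_by_magnitude(weights, sparsity=0.5)  # half the weights become 0
q, scale = quantize_int8(pruned)                    # 8-bit integer codes
restored = dequantize(q, scale)                     # approximate originals

print(pruned)
print(q)
```

In this toy case the pruned weights can be stored as small integers plus one scale factor, and each restored value differs from its pruned original by at most half the scale. Production tools typically prune and quantize per layer or per channel and fine-tune the model afterward to recover accuracy, which this sketch does not attempt.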
Hicks remarked that the result is lower costs, quicker inference, and the ability to run AI workloads on a broader array of hardware. The acquisition aligns with the strategy of IBM, Red Hat’s parent company, to support enterprise customers in using AI models effectively.
In a recent interview with Computer Weekly, Kareem Yusuf, who leads product management for IBM’s software portfolio, noted a growing demand for tools that allow customers to efficiently incorporate their data into large language models. This capability helps businesses use AI while keeping their data protected and under their control.
IBM has rolled out a project called InstructLab, designed to help users make changes to existing LLMs without needing to retrain them entirely. It’s part of a broader offering that includes IBM Granite, an AI foundation model tailored for enterprise datasets.
Dario Gil, IBM’s senior vice president and director of research, remarked that as clients strive to scale AI across diverse environments, virtualized and cloud-native LLMs built on open foundations will set industry standards. Red Hat’s expertise in open-source technology, combined with efficient models like IBM Granite and Neural Magic’s innovations, provides businesses with the flexibility and control they need to implement AI enterprise-wide.