Saturday, April 19, 2025

The Significance of Small Language Models in Enterprise AI Applications

Gartner’s analysis reveals that small language models (SLMs) could offer a smart, budget-friendly option for developing and deploying generative artificial intelligence (GenAI). They’re easier to fine-tune, more efficient, and simpler to control compared to their larger counterparts.

In a report titled “Explore Small Language Models for Specific AI Scenarios,” released in August 2024, Gartner examines how our understanding of what counts as “small” and “large” in AI language models has shifted. It notes that models such as OpenAI’s GPT-4, Google’s Gemini 1.5, Meta’s Llama 3.1 and Anthropic’s Claude 3 Opus have parameter counts ranging from half a trillion to two trillion. In contrast, smaller models, such as Mistral 7B, Microsoft’s Phi-3 Mini and Small, Llama 3.1 8B and Google’s Gemma 2 9B, have 10 billion parameters or fewer.

To illustrate the differences in computational demands, consider Llama 3 8B, which requires 27.8GB of GPU memory, versus Llama 3 70B, which needs a whopping 160GB. The more memory you need, the higher the costs: running the complete DeepSeek-R1 model, with 671 billion parameters, can set you back over $100,000.
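The rough relationship behind those figures is simple: weight memory scales with parameter count times bytes per parameter. The sketch below is a back-of-the-envelope estimate only; the article’s 27.8GB and 160GB figures also include runtime overhead (activations, KV cache) beyond raw weights.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate GPU memory needed just to hold the model weights.

    bytes_per_param: 2 for fp16/bf16, 4 for fp32, 1 for int8.
    Real deployments need extra headroom for activations and KV cache,
    which is why quoted figures run higher than this estimate.
    """
    return num_params * bytes_per_param / 1024**3

# Llama 3 8B in fp16: weights alone come to roughly 15GB,
# versus roughly 130GB for the 70B variant.
print(round(weight_memory_gb(8e9), 1))
print(round(weight_memory_gb(70e9), 1))
```

The gap between the weight-only estimate and the quoted totals is the serving overhead, which itself grows with model size, so the cost advantage of an SLM compounds in practice.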

Smaller size comes with a trade-off, however. Larger models are trained on vast amounts of data, while SLMs typically rely on a subset. This often leaves gaps in their knowledge, meaning they might not always provide the best answers. Knowledge distillation offers one way to narrow that gap.

Jarrod Vawdrey, the field chief data scientist at Domino Data Lab, highlights that SLMs can learn from larger models using knowledge distillation. This technique lets SLMs effectively absorb insights from larger counterparts. “This transfer opens up advanced language capabilities without the heavy demands of billion-parameter models,” he points out. Distilled SLMs can enhance response quality while consuming far less computational power.

Vawdrey explains that the process starts with a pre-trained large language model (LLM) acting as the “teacher” and a smaller model taking on the “student” role. The student is typically initialised either randomly or from limited pre-training.
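The core of the teacher–student setup is that the student is trained to match the teacher’s full output distribution, not just its top answer. A minimal, dependency-free sketch of that training signal (all logits here are hypothetical):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: a higher temperature softens the
    distribution, exposing the teacher's 'dark knowledge' about
    second-choice answers."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's softened distribution and the
    student's -- the signal minimised during knowledge distillation."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token logits over a three-word vocabulary:
teacher = [3.0, 1.0, 0.2]
student = [2.0, 1.5, 0.5]
loss = distillation_loss(teacher, student)  # positive until they agree
```

In practice this loss is combined with the usual next-token cross-entropy on real data, and gradients from it update only the student’s weights.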

Often, a single model, whether an LLM or an SLM, can’t meet all an organization’s needs. Many businesses want to integrate the data from their existing IT systems with AI tools. Dominik Tomicevic, CEO of Memgraph, emphasizes that context is vital. He notes that while LLMs handle general questions excellently, SLMs shine at specific tasks. For example, they can help optimize processes unique to an organization, such as determining how to mix paints or scheduling deliveries, rather than answering random trivia.

However, feeding live business data, such as supply chain updates, into these focused models can pose challenges. Tomicevic says: “Until the transformer architecture evolves, updating a language model stays complicated.” These models typically process all of their training data in a single, massive batch, which makes it tough to keep them current. “The context window needs constant updates,” he adds.

To address this, many organizations find success using knowledge graphs. These graphs serve as a tutor for SLMs, improving their ability to pull relevant insights. Using retrieval-augmented generation (RAG) powered by graph technology can connect structured and unstructured data, leading to more accurate answers and better reasoning by fetching up-to-date information dynamically.
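The graph-powered RAG pattern described above can be sketched in a few lines: facts live in a knowledge graph, the triples relevant to a query are retrieved at answer time, and they are prepended to the prompt sent to the model. Everything below is illustrative; the entity names are invented, and a real system would use a graph database and query language rather than a Python list.

```python
# Tiny stand-in for a knowledge graph: (subject, predicate, object) triples.
GRAPH = [
    ("paint_A", "mixes_with", "paint_B"),
    ("paint_A", "dry_time_hours", "4"),
    ("route_7", "delivery_window", "09:00-11:00"),
]

def retrieve(entity: str):
    """Fetch every triple mentioning the entity -- a stand-in for a real
    graph query. Updating GRAPH updates answers immediately, with no
    retraining of the language model."""
    return [t for t in GRAPH if entity in (t[0], t[2])]

def build_prompt(question: str, entity: str) -> str:
    """Assemble a grounded prompt: retrieved facts first, question second."""
    facts = "\n".join(f"{s} {p} {o}" for s, p, o in retrieve(entity))
    return f"Context:\n{facts}\n\nQuestion: {question}"

prompt = build_prompt("How long does paint_A take to dry?", "paint_A")
```

Because the model only ever sees facts fetched at query time, the graph, not the model’s weights, is the thing that has to stay current, which is exactly the update problem Tomicevic describes.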

Chris Mahl, CEO of Pryon, emphasizes that SLMs can run efficiently on standard hardware while delivering targeted intelligence where it’s needed. This capability reshapes how organizations leverage AI and makes advanced technology accessible across different regions and infrastructures.

On the downside, while LLMs have immense potential, they also tend to generate errors, often referred to as hallucinations. Rami Luisto from Digital Workforce notes that SLMs can offer greater transparency regarding their operations compared to LLMs. This transparency becomes crucial when you need to ensure accuracy and trust in AI outputs.

As the industry progresses, momentum is building toward lighter, domain-specific language models that can refine their responses. Anushree Verma from Gartner observes that these smaller models may evolve to work alongside broader AI systems, improving overall accuracy.

Imagine it like consulting a specialist for specific advice, akin to the “phone a friend” option in game shows. Demis Hassabis, CEO of Google DeepMind, envisions a future where multiple AI agents collaborate to achieve goals. An SLM, fuelled by knowledge distilled from larger models and tailored for specific tasks, could effectively assist a general LLM in tackling niche, domain-focused questions.