In this episode, we’re diving into the intersection of storage and artificial intelligence with Jason Hardy, CTO for AI at Hitachi Vantara. He highlights how AI processing significantly impacts storage performance demands. Enterprises often find themselves toggling between training models and running inference workloads, which creates a lot of context switching.
Jason explains that AI is all about speed—it’s not just a luxury, it’s a requirement. When building large language models or running foundation model training, performance requirements are exceptionally high. That need for speed is constant, especially as enterprises expand their AI operations into areas like inference and retrieval-augmented generation (RAG). However, he points out another critical factor that sometimes gets less attention: data management.
You can’t harness AI effectively without understanding the data you have. Sure, data lakes are marketed for this purpose, but sometimes they end up being just dumping grounds. It’s important to know which data is relevant to your AI goals and to manage it properly, especially to meet compliance and regulatory standards. It’s like tackling a two-headed problem—you need performance while also managing your data effectively.
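To make that data-management point concrete, here is a minimal sketch—not something from the episode, and the field names and policy rules are illustrative assumptions—of how a team might tag records with compliance metadata and filter a data lake down to what a RAG pipeline is actually allowed to use:

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """A corpus record with the metadata needed to decide whether AI may use it."""
    doc_id: str
    text: str
    source: str                      # e.g. "crm", "support_tickets" (assumed labels)
    contains_pii: bool = False       # flagged upstream by a scanner (assumed)
    retention_expired: bool = False  # past its regulatory retention window
    tags: set[str] = field(default_factory=set)

def eligible_for_rag(doc: Document, allowed_sources: set[str]) -> bool:
    """Illustrative policy: only governed sources, no PII, not past retention."""
    return (
        doc.source in allowed_sources
        and not doc.contains_pii
        and not doc.retention_expired
    )

def build_rag_corpus(docs: list[Document], allowed_sources: set[str]) -> list[Document]:
    """Filter the data lake down to what is actually relevant and compliant."""
    return [d for d in docs if eligible_for_rag(d, allowed_sources)]

# Example: a tiny "data lake" where only part of the data should feed the AI pipeline.
lake = [
    Document("1", "Quarterly support summary...", source="support_tickets"),
    Document("2", "Customer record with SSN...", source="crm", contains_pii=True),
    Document("3", "Old contract draft...", source="legal", retention_expired=True),
]
corpus = build_rag_corpus(lake, allowed_sources={"support_tickets", "legal"})
print([d.doc_id for d in corpus])  # -> ['1']
```

The specific rules here are made up; the point is that relevance and compliance checks happen before the data ever reaches the AI pipeline, rather than after the lake has become a dumping ground.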
Turning to the features enterprise data storage needs, Jason notes that storage must be not only fast but also scalable. Model training has historically demanded massive capacity and throughput, but as organizations operationalize AI, compliance and data-visibility requirements land on top of those same performance demands.
Current workflows are less predictable because end-user queries drive the inferencing process. This unpredictability adds complexity because businesses won’t always know what’s coming. So, storage solutions need to handle rapid back-and-forth transitions between workloads. As new data is generated from various sources, keeping this data organized and accessible becomes essential.
When we look at the differences between training and inference, things get even more complicated. The focus has been heavily on model building, but by 2025 and 2026 the emphasis will shift toward integrating and running inference at scale. Unlike scheduled training runs, inference arrives unpredictably and fluctuates with business demand.
This dynamic creates a situation where multiple use cases will demand simultaneous support. We’re not just talking about loading one model; numerous models might need to be queried at the same time across various workloads, driving up storage demands significantly. Jason highlights how trends such as agentic AI will further amplify these challenges: agentic systems automate decision-making, leading to more complex interactions that place additional strain on storage platforms.
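As a rough illustration of what that fan-out looks like from the application side, here is a small sketch with made-up model names and a stubbed inference call; each concurrent query ultimately translates into parallel reads of model weights and retrieval data against the storage layer:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical set of deployed models that different workloads might hit at once.
MODELS = ["fraud-detector", "support-copilot", "doc-summarizer", "sales-forecaster"]

def query_model(model_name: str, prompt: str) -> str:
    """Stand-in for an inference call; a real one pulls weights, KV-cache, and
    retrieval data from the storage layer, which is where the pressure lands."""
    time.sleep(0.1)  # simulate inference latency
    return f"{model_name}: response to {prompt!r}"

def fan_out(prompt: str) -> list[str]:
    """Query several models concurrently, as an agentic workflow might."""
    with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
        return list(pool.map(lambda m: query_model(m, prompt), MODELS))

print(fan_out("Summarize today's risk exposure"))
```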
The result? A storage system needs to support not just high performance but also rapid context switching between various models and workloads. This involves high-speed checkpointing to quickly halt ongoing tasks and respond to user demands in real time. Then, after fulfilling those demands, resources have to shift back to training without missing a beat. You really start to see how unpredictable and demanding these workloads can become.
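Here is a minimal sketch of that pause-and-resume pattern (assuming PyTorch, which the episode doesn’t actually name): checkpoint the training state, hand the hardware over to inference, then reload and continue where training left off. The fast, repeated checkpoint writes are exactly where the storage pressure shows up.

```python
import torch
from torch import nn, optim

# Toy model and optimizer standing in for a much larger training job.
model = nn.Linear(16, 1)
opt = optim.SGD(model.parameters(), lr=0.01)

def save_checkpoint(step: int, path: str = "ckpt.pt") -> None:
    """Persist everything needed to resume: weights, optimizer state, progress."""
    torch.save({"step": step,
                "model": model.state_dict(),
                "optimizer": opt.state_dict()}, path)

def load_checkpoint(path: str = "ckpt.pt") -> int:
    """Restore training state and return the step to resume from."""
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model"])
    opt.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]

def training_step(x: torch.Tensor, y: torch.Tensor) -> None:
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

# Train, pause for an inference burst, then resume from the saved step.
for step in range(100):
    training_step(torch.randn(8, 16), torch.randn(8, 1))
    if step == 49:                      # inference demand arrives mid-run
        save_checkpoint(step)           # fast checkpoint write hits storage hard
        break                           # hardware switches to serving user queries

resume_step = load_checkpoint()         # later: reload state and keep training
for step in range(resume_step + 1, 100):
    training_step(torch.randn(8, 16), torch.randn(8, 1))
```

At production scale the checkpoints are orders of magnitude larger and written far more often, which is why checkpoint write speed becomes a storage problem rather than a training detail.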