This podcast examines the impact of AI on data storage and the challenges it brings. Shawn Rosemarin, VP for R&D in customer engineering at Pure Storage, talks about how AI turns enterprise data into valuable insight for businesses. He discusses the complexity of AI operations, the need for data portability, rapid storage access, and the ability to scale capacity into the cloud. Rosemarin also addresses the specific types of data found in AI, such as vectors and checkpoints, and the importance of efficient, easily managed storage infrastructure.
In terms of AI workloads, Rosemarin explains that AI differs from previous iterations of analytics because it focuses on the specific datasets held within each enterprise. He emphasizes the volume of data involved and the performance required for effective learning, and he highlights the need to integrate data sources across organizational silos as well as the shortage of staff skilled in such complex technology.
Regarding storage requirements for AI workloads, Rosemarin points to the shift from hard drives to all-flash storage because of flash's reliability, performance, and environmental benefits. He emphasizes the need for a central storage architecture that supports information gathering, training, and interpretation of models. Performance that keeps GPUs fed, low latency for quick answers, scalability, and non-disruptive upgrades are essential. He also notes the importance of being able to extend storage easily to the cloud for training and consumption.
When it comes to the ways data is held for AI, Rosemarin explains that the use of vectors, checkpointing, and AI frameworks such as TensorFlow and PyTorch shapes storage needs. He compares GPUs to expensive PhD students and emphasizes the need to keep them continuously supplied with work and to collect their completed tasks. The result is a higher write ratio, smaller writes, and a performance profile different from that of traditional storage workloads.
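To make the checkpointing pattern concrete, here is a minimal sketch of a PyTorch training loop that periodically writes model and optimizer state to disk. The model, batch size, checkpoint interval, and checkpoint directory are illustrative assumptions rather than details from the podcast; the point is that each checkpoint is a burst of writes that the storage layer must absorb without stalling the GPUs.

```python
# Minimal sketch of periodic checkpointing in PyTorch (illustrative assumptions,
# not details from the podcast). Each checkpoint produces a burst of writes that
# contributes to the write-heavy storage profile described above.
import os
import torch
import torch.nn as nn

CHECKPOINT_DIR = "checkpoints"      # in practice, a shared storage mount (assumed)
CHECKPOINT_EVERY = 100              # steps between checkpoints (assumed)
os.makedirs(CHECKPOINT_DIR, exist_ok=True)

model = nn.Linear(1024, 1024)       # stand-in model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for step in range(1, 301):
    # Synthetic batch keeps the example self-contained and runnable.
    x = torch.randn(32, 1024)
    y = torch.randn(32, 1024)

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

    # Periodically persist model and optimizer state so training can resume
    # after a failure; these saves are the checkpoint writes that hit storage.
    if step % CHECKPOINT_EVERY == 0:
        torch.save(
            {
                "step": step,
                "model_state": model.state_dict(),
                "optimizer_state": optimizer.state_dict(),
            },
            os.path.join(CHECKPOINT_DIR, f"step_{step}.pt"),
        )
```

At real model sizes these state dictionaries run to gigabytes or terabytes, and in a cluster every node writes to the same shared checkpoint location, which is why checkpointing shifts the workload toward frequent writes rather than the read-dominated patterns of traditional storage.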
Looking ahead, Rosemarin envisions storage as a critical component in driving the value of AI projects. He expects denser storage arrays, with 300TB drives and lower energy consumption. He also predicts the continued decline in the cost per gigabyte or terabyte of storage, allowing for more data utilization. Finally, he mentions the shift towards autonomous storage to reduce the manual effort involved in storage operations, allowing enterprises to focus on building future systems.
Overall, the podcast explores the impact of AI on data storage and the evolving storage requirements to support AI workloads effectively.