Monday, April 7, 2025

Keysight Unveils AI-Driven Tool for Network Architecture Validation and Optimization

Keysight Technologies has just rolled out the KAI Data Centre Builder, a software suite that emulates real-world workloads to evaluate how new algorithms, components, and protocols affect AI training performance.

With this tool, AI operators, GPU cloud providers, and infrastructure vendors can bring realistic AI workloads into their labs. There they can test and refine AI cluster designs, model partitioning strategies, parameters, and algorithms, with the goal of improving AI workload performance.

The KAI Data Centre Builder helps AI operators accelerate model training by using various parallel processing techniques, particularly model partitioning. By aligning model partitioning with the AI cluster’s layout, operators can boost training efficiency. Keysight emphasizes that experimentation during the AI cluster design stage can answer crucial questions, especially about how efficiently data moves between GPUs.

Key aspects to consider include the scale-up design of GPU connections within an AI host or rack; the scale-out network setup, including bandwidth per GPU and topology; the tuning of network load balancing and congestion control; and the adjustment of training framework parameters.
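To illustrate why bandwidth per GPU matters for data movement, here is a minimal back-of-the-envelope sketch in Python. It is not part of Keysight's tool. It assumes a ring all-reduce, in which each GPU sends roughly 2(N-1)/N times the message size, and it ignores latency, congestion, and protocol overhead. The function name and parameters are hypothetical.

```python
def ring_allreduce_seconds(message_bytes, num_gpus, gbps_per_gpu):
    """Rough lower bound on ring all-reduce time, bandwidth term only.

    Assumption: every GPU sends 2 * (N - 1) / N * message_bytes over a link
    of gbps_per_gpu gigabits per second; latency and congestion are ignored.
    """
    if num_gpus < 2:
        return 0.0  # nothing to exchange with a single GPU
    bytes_per_second = gbps_per_gpu * 1e9 / 8
    traffic_per_gpu = 2 * (num_gpus - 1) / num_gpus * message_bytes
    return traffic_per_gpu / bytes_per_second


if __name__ == "__main__":
    # Hypothetical example: 1 GB of gradients, 8 GPUs, 400 Gb/s per GPU.
    t = ring_allreduce_seconds(1e9, 8, 400)
    print(f"estimated all-reduce time: {t * 1000:.1f} ms")
```

Doubling the bandwidth per GPU halves this estimate, which is the kind of trade-off the design questions above are meant to explore; real clusters will deviate from it once topology and congestion come into play.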

This tool specializes in integrating large language model (LLM) and other AI training workloads into AI infrastructure design. The aim is closer alignment between hardware design, protocols, architectures, and AI training algorithms, leading to improved system performance. KAI Data Centre Builder reproduces the network communication patterns found in real AI training tasks, speeding up experimentation and helping users gain insight into performance issues that are typically hard to uncover through live training jobs.

The tool includes a library of workloads such as GPT and Llama, along with popular model partitioning schemas such as data parallel, fully sharded data parallel, and three-dimensional parallelism. Users can adjust partition sizes, change how partitions are distributed across the AI infrastructure, and see how communication within and between these partitions affects overall job completion times.
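As a rough illustration of what distributing partitions across infrastructure means in three-dimensional parallelism, the Python sketch below maps flat GPU ranks to data-, pipeline-, and tensor-parallel indices and checks whether each tensor-parallel group fits inside one host. This is not Keysight's API. The rank ordering (tensor-parallel index varying fastest) is one common convention assumed here, and all names are hypothetical.

```python
def rank_to_coords(rank, tp, pp, dp):
    """Map a flat rank to (dp_idx, pp_idx, tp_idx).

    Assumption: the tensor-parallel index varies fastest, then pipeline,
    then data parallel.
    """
    if not 0 <= rank < tp * pp * dp:
        raise ValueError("rank is outside the tp * pp * dp world size")
    tp_idx = rank % tp
    pp_idx = (rank // tp) % pp
    dp_idx = rank // (tp * pp)
    return dp_idx, pp_idx, tp_idx


def tensor_parallel_groups(tp, pp, dp):
    """List the ranks of each tensor-parallel group under the same ordering."""
    world_size = tp * pp * dp
    return [list(range(start, start + tp)) for start in range(0, world_size, tp)]


def groups_within_host(groups, gpus_per_host):
    """Return True if no group spans more than one host.

    Assumption: ranks are assigned to hosts contiguously.
    """
    return all(len({rank // gpus_per_host for rank in group}) == 1
               for group in groups)


if __name__ == "__main__":
    groups = tensor_parallel_groups(tp=4, pp=2, dp=2)
    print(rank_to_coords(13, tp=4, pp=2, dp=2))
    print(groups_within_host(groups, gpus_per_host=8))
```

Tensor-parallel groups tend to communicate most heavily, so a layout that keeps each group on the scale-up fabric of a single host is usually the first thing to check when changing partition sizes.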

The emulation tool enables users to pinpoint slow collective operations, identify bottlenecks, and assess network utilization, tail latency, and congestion to better understand their influence on job completion.
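To make the idea of tail latency and slow collective operations concrete, the following Python sketch computes nearest-rank percentiles over a set of collective durations and flags operations that take much longer than the median. It is an illustration under assumed inputs, not output from or an interface to the Keysight tool. The operation names, durations, and the 2x threshold are hypothetical.

```python
import math
from statistics import median


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list of samples (0 < p <= 100)."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    k = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[k - 1]


def slow_operations(durations_ms, factor=2.0):
    """Names of operations whose duration exceeds factor times the median."""
    mid = median(durations_ms.values())
    return sorted(name for name, d in durations_ms.items() if d > factor * mid)


if __name__ == "__main__":
    # Hypothetical per-operation durations in milliseconds.
    durations = {
        "all_reduce_0": 12.0,
        "all_gather_0": 11.0,
        "all_reduce_1": 40.0,
        "reduce_scatter_0": 13.0,
    }
    values = list(durations.values())
    print("p50:", percentile(values, 50), "ms")
    print("p99:", percentile(values, 99), "ms")
    print("slow:", slow_operations(durations))
```

A large gap between the median and the 99th percentile is the signature of tail latency: most collectives finish quickly, but the slowest ones hold up every GPU that is waiting on them and stretch job completion time.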

Ram Periakaruppan, Keysight’s vice president and general manager of network test and security solutions, highlighted the growing complexity of AI infrastructure. He stressed the importance of validating and optimizing these systems early in the design and manufacturing process to prevent costly delays. With the KAI Data Centre Builder, Keysight aims to introduce a new level of realism for AI component and system design while optimizing workloads for peak performance.