Friday, October 18, 2024

CTO Interview: Mastering Budgeting in Nanoseconds

Andrew Phillips has been with the brokerage firm LMAX for 17 years and currently serves as the company’s Chief Technology Officer (CTO). His role demands precision measured in nanoseconds: LMAX guarantees an end-to-end latency of 50 microseconds for high-frequency trades, and the application code itself is budgeted just eight nanoseconds of that. In such a time-sensitive environment, even minor delays can significantly impact performance. “Our clients are major trading firms who demand quick and reliable execution,” Phillips explains. “We go as far as testing different Twinax cables and using direct electrical connections instead of optical fiber, because converting electrical signals to light and back again introduces a small but measurable delay.”

### A Fresh Take on Software Performance

When LMAX started developing its platform in 2010, the prevalent practice was to adopt a waterfall methodology, primarily using C++ for software development. Today, however, the LMAX exchange operates using an agile approach and is programmed in Java. “At that time, using agile techniques and Java, with its extensive testing ecosystem, was quite unconventional,” Phillips notes.

In contrast to C++, which compiles application code ahead of time into machine code that runs directly on the processor, Java compiles at runtime, adapting the code dynamically as the program executes. Java also manages memory automatically through “garbage collection,” which can affect performance. “We often face tough questions from potential clients about latency spikes caused by garbage collection,” Phillips shares. “We started with the standard JDK back in 2013, but we quickly realized that Java was originally designed for devices like set-top boxes and really only works well with up to around 4GB of memory.”
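
To make that concern concrete, here is a minimal, illustrative Java sketch (not LMAX code) that keeps a hot loop under steady allocation pressure and reports its worst iteration times so far. Run with standard JDK GC logging, for example `-Xlog:gc` on JDK 9 and later, and the reported outliers tend to line up with collection events.

```java
// Illustrative sketch: surface GC-induced latency spikes in a hot loop.
// Run with e.g. `java -Xmx256m -Xlog:gc LatencySpikeDemo` (JDK 9+) and
// compare the printed outliers with the garbage-collection log.
import java.util.ArrayList;
import java.util.List;

public class LatencySpikeDemo {
    public static void main(String[] args) {
        List<byte[]> retained = new ArrayList<>();
        long worst = 0;
        for (int i = 0; i < 1_000_000; i++) {
            long start = System.nanoTime();
            retained.add(new byte[1024]);           // steady allocation pressure
            if (retained.size() > 10_000) {
                retained.subList(0, 5_000).clear(); // drop half, creating garbage
            }
            long elapsed = System.nanoTime() - start;
            if (elapsed > worst) {                  // report a new worst-case iteration
                worst = elapsed;
                System.out.printf("iteration %d took %d ns%n", i, elapsed);
            }
        }
    }
}
```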

According to Phillips, exceeding 4GB leads to unpredictable garbage-collection times from a latency standpoint. “We wanted to retain the expressiveness and speed of Java, along with its strong testing ecosystem, which has been crucial to our success.” To work around the memory constraints of standard Java, LMAX turned to the Java platform from Azul Systems. “Back then, it was critical to avoid garbage-collection pauses entirely on a server with 64GB of memory,” he adds.
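
For context, the following sketch uses only the standard JDK’s management API to report the configured heap size and the cumulative pause statistics of whatever collector is running. The flags in the comment are ordinary HotSpot examples, not the Azul configuration LMAX uses.

```java
// Standard-JDK sketch (not Azul-specific): report the maximum heap size and the
// cumulative collection counts and times of the active garbage collectors.
// Heap size and collector are chosen via ordinary flags, for example:
//   java -Xmx64g -XX:+UseZGC GcReport
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcReport {
    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("max heap: %d MB%n", maxHeap / (1024 * 1024));
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```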

Azul’s collector is designed to keep the application running, pausing it only as a last resort, and that has helped LMAX cut latency from milliseconds to its current target of 50 microseconds. A lot happens within that brief 50-microsecond span. “That 50 microseconds covers the complete process, from order submission at the network’s edge to order matching, processing, and sending the acknowledgment,” Phillips explains.
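
A rough sketch of what enforcing such a budget could look like in code is shown below. The stage names (`parseOrder`, `matchOrder`, `sendAck`) and the way the 50 microseconds are checked are hypothetical illustrations, not LMAX’s implementation.

```java
// Hypothetical sketch of a per-request latency budget check for the path described
// in the interview: order in -> matching -> acknowledgement.
public class LatencyBudget {
    private static final long BUDGET_NANOS = 50_000; // 50 microseconds end to end

    public static void handle(byte[] rawOrder) {
        long start = System.nanoTime();
        Object order = parseOrder(rawOrder);   // decode the order at the network edge
        Object result = matchOrder(order);     // run the matching engine
        sendAck(result);                       // acknowledge back to the client
        long elapsed = System.nanoTime() - start;
        if (elapsed > BUDGET_NANOS) {
            System.out.printf("budget exceeded: %d ns%n", elapsed);
        }
    }

    public static void main(String[] args) {
        handle(new byte[64]); // single illustrative call
    }

    // Placeholders standing in for the real pipeline stages.
    private static Object parseOrder(byte[] raw) { return raw; }
    private static Object matchOrder(Object order) { return order; }
    private static void sendAck(Object result) { /* write the response to the network */ }
}
```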

“Short of being a professional compiler developer, I challenge even the best programmers to optimize as effectively as a compiler does,” he says.

### The Power of Java at Low Latencies

Within the 50-microsecond timeframe, the Java code itself is afforded only eight nanoseconds to execute; most of the latency comes from the transaction’s journey across the network infrastructure to the server. Phillips is convinced that Java optimizes code for high performance better than hand-tuning can. “My background is in C, C++, and Fortran, where you often resort to assembly language for a speed advantage. With Java it’s somewhat counterintuitive,” he admits.

Modern microprocessors are extremely complex, and a developer writing in C or C++ optimizes only for the specific processor architecture specified in the compiler settings. In contrast, “Running Java allows for optimization based on the actual processor it’s running on,” Phillips notes. “That is highly beneficial. Unless one is a compiler expert, I doubt even the most skilled programmers can outperform a compiler’s optimization capabilities.”
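
As a simple illustration of that point, the loop below has the classic shape that HotSpot’s C2 compiler can auto-vectorize at runtime using whatever SIMD width the host CPU actually offers (SSE, AVX2, AVX-512, and so on). The diagnostic flags in the comment are standard HotSpot options for inspecting the generated machine code.

```java
// Sketch: an element-wise array addition, a classic candidate for runtime
// auto-vectorization by the JIT. The generated machine code can be inspected with
//   -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly   (requires the hsdis plugin)
public class VectorAdd {
    static void add(int[] a, int[] b, int[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = a[i] + b[i];   // compiled with the SIMD instructions the CPU supports
        }
    }

    public static void main(String[] args) {
        int[] a = new int[1 << 20], b = new int[1 << 20], c = new int[1 << 20];
        for (int i = 0; i < a.length; i++) { a[i] = i; b[i] = 2 * i; }
        for (int round = 0; round < 10_000; round++) {  // warm up so the JIT compiles add()
            add(a, b, c);
        }
        System.out.println(c[c.length - 1]);  // keep the result alive
    }
}
```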

Typically, a C++ programmer compiles the application for a specific target machine, effectively fixing the optimization for that architecture. But, as Phillips explains, development and testing environments often lag a processor generation or two behind production servers, and this mismatch can lead to suboptimal performance. Java, in contrast, optimizes code dynamically at runtime and takes advantage of whatever acceleration features the hardware it runs on actually offers. “Initially skeptical, I became a believer when I took part in a coding competition against a proficient Java developer while I used C and assembly language; I couldn’t match the speed of the Java application,” he recalls.
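
A comparison like the one Phillips describes would typically be measured on the Java side with a harness such as JMH, the OpenJDK microbenchmarking tool. The benchmark below is a generic stand-in workload, not the code from that competition.

```java
// JMH sketch: measure the average time of a small hot method that the JIT
// compiles and tunes for the processor it is actually running on.
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class SumBench {
    private long[] data;

    @Setup
    public void setup() {
        data = new long[4096];
        for (int i = 0; i < data.length; i++) data[i] = i;
    }

    @Benchmark
    public long sum() {
        long total = 0;
        for (long v : data) total += v;  // hot loop optimized at runtime
        return total;
    }
}
```

A native counterpart, by comparison, would be built for a fixed target, for instance via GCC’s `-march` option, which is exactly the architecture lock-in Phillips describes.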

When asked about his biggest challenge, Phillips says: “The major hurdle in Java for us is accessing large amounts of memory efficiently and with low, deterministic latency. That remains a central engineering challenge.”
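
One widely used technique for that class of problem, offered purely as an illustration rather than as a description of LMAX’s solution, is to keep large, long-lived data off the managed heap, for example in a direct `ByteBuffer`, so the garbage collector never has to scan or move it.

```java
// Illustrative sketch: a fixed-layout off-heap store backed by a direct buffer,
// which keeps the data outside the garbage-collected heap.
import java.nio.ByteBuffer;

public class OffHeapStore {
    private final ByteBuffer buffer;

    public OffHeapStore(int slots) {
        buffer = ByteBuffer.allocateDirect(slots * 16); // 16 bytes per slot: key + value
    }

    public void put(int slot, long key, long value) {
        buffer.putLong(slot * 16, key);
        buffer.putLong(slot * 16 + 8, value);
    }

    public long getValue(int slot) {
        return buffer.getLong(slot * 16 + 8);
    }

    public static void main(String[] args) {
        OffHeapStore store = new OffHeapStore(1_000_000);
        store.put(42, 7L, 99L);
        System.out.println(store.getValue(42)); // prints 99
    }
}
```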

### Exploring Low Latency Innovations

Looking ahead, Phillips is excited by the potential of Compute Express Link (CXL) to reduce hardware latency. CXL is an open interconnect standard that lets processors, accelerators, and memory devices share memory directly. “CXL holds immense promise for revolutionizing our operations, as it merges the memory, peripheral, and network layers,” he explains.

Nevertheless, CXL has yet to achieve widespread adoption, and Phillips likens its delayed rollout to the long-promised arrival of fusion power: “It’s always a decade away. The idea of executing remote procedure calls over a CXL architecture is compelling.”

For Phillips, CXL offers a way to reduce the overhead of traditional networking protocols, which were designed in the 1960s and 70s, when dial-up connections were cutting-edge. “Significant engineering work has evolved those protocols into what we have today, like 25-gigabit Ethernet, but eliminating the overhead of IP encapsulation could speed things up considerably,” he states.

This ongoing exploration of new possibilities lets LMAX process trading transactions with minimal latency while keeping headroom for unexpectedly high throughput. Reflecting on the recent volatility in the cryptocurrency market, which drove substantial surges in trading volume, Phillips notes with pride: “We didn’t experience downtime. In fact, we saw a notable spike in activity as traders transferred risk among themselves on our exchange.” Even with the increased volume, LMAX stayed well within its performance capacity, leaving ample headroom for growth.