How we consistently deliver low latency software solutions
Our roots in trading have made us aware of the need for consistent speed. Earlier this year, we decided to look at what our experience taught us building low latency software solutions. Our commitment to providing reliable and dependable products has taught us to treat performance as a primary design driver, measure realistically and to cultivate mechanical sympathy. Read below to learn more about what we’ve learned along the way, and what mechanical sympathy is exactly.
Treat performance as a primary design driver
When we first started working on the BTS quoter, we followed common Software Engineering wisdom. We started with top-down design techniques that reused a number of pre-existing libraries and neatly separated concerns. We also leveraged classical Object Oriented principles to maximize extensibility and minimize long-term maintenance needs. This worked until we got to the point where measured end-to end performance. It was here that the fancy abstractions and helpful libraries were shown to impose unacceptable performance penalties. After numerous trials, we found that the numbers were just not there and it was next to impossible to improve them beyond a certain limit. Our conclusion was that performance could not be bolted on top of a working design. It had to drive the entire design process from the beginning.
Measure Realistically
Another lesson learned was that not all measurements are created equal. For example, in the beginning of a project, only small individual components can be measured, typically using fake data and in an isolated setting. While these micro-benchmarks are useful, particularly in fine-tuning the inner loop algorithms used in the system, they only provide limited insight into the production performance.
In the real world, caches are not always warm, interrupts happen at the most inconvenient of times and data rates are wildly variable. While we were aware of this fact and implemented end-to-end measurements as soon as we had a bare bones system, we made the initial mistake of using a low fidelity market data simulator to drive our quoter. Unfortunately, this simulator produced messaging patterns that were not realistic and which hid some design limitations. Switching to a realist market data source immediately uncovered some of these issues. The lesson was clear: test in an environment that mimics production as accurately as possible.
Cultivate Mechanical Sympathy
The term ‘mechanical sympathy’ is used to express the idea that in order to obtain the best performance out of a machine, be it a racecar or a modern CPU, one has to understand the basics of how the machine works. The complexity of modern CPUs might make this goal look untenable at first but what we found is that even a basic knowledge of some of the key mechanisms goes a long way when writing fast code. We kept in mind the cache hierarchy, the branch predictor and the hardware prefetcher, which had a profound influence in the way we wrote the latency critical parts of our system. For those more used to Object Oriented designs, the resulting code was in some ways unusual but the performance gains were substantial.
Conclusion
Developing reliable low-latency software is a demanding activity that is best pursued using specific design, coding and testing guidelines – some which go against conventional Software Engineering wisdom. We understand through our experience that performance and speed need to be considered when initially designing solutions, that realistic measurements are important and that comprehension of mechanical sympathy is critical. If you would like to know more about what we learned and how we are providing reliable products to our leading edge clients, please read our white paper titled "Lessons learned in the pursuit of low latency options quoting.”