I’m Open-Sourcing my Custom Benchmark GUI
I think one of the reasons I was able to do good performance work over the years is that at some point I started taking benchmarking seriously enough to write my own library. I used to use Google Benchmark, which is a fine library, but eventually you realize that you need a GUI to really scale up benchmarks.[1] Here is a GitHub link, and this video gives a quick intro:
The main problems it tries to address are:
- Getting good numbers by running benchmarks repeatedly, visualizing them in context, picking a single opinionated good visualization, handling noise and even adding a bit of well-justified noise, and being careful about which statistics to compute on the numbers.
- Dealing with the inevitable combinatorial explosion of benchmarks when you want to try different data structures (min-max heap vs interval heap vs binary heap) with different operations (make_heap, push, pop) on different types (int vs string), different compilers, debug build vs release build, different variants of the code (e.g. trying loop unrolling), different input lengths etc. The full combinatorial explosion might be millions or billions of possible benchmarks. I want to be able to get a first impression for a subset in a few minutes. And then if I want less noisy results I can let it run overnight. And then I can try a new variation and visualize it together with the overnight results in under a minute.
- Various ergonomic issues. Making it easy to select which numbers are together on the screen. Having the numbers as a graph first, CSV second. Being robust to the code crashing halfway through a long run: recording the partial results and being able to resume the same run. Making it easy to attach a profiler to one specific benchmark that I’m interested in.
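To make the second and third points a bit more concrete, here is a minimal sketch in Python of the general pattern: enumerate the full grid of configurations, measure a random subset within a budget, and append each result to a file so that a later run skips whatever is already recorded. This is not how the GUI itself is implemented. The dimension values, the `results.jsonl` file name, and all function names (`all_configs`, `run_subset`, `fake_measure`) are made up for illustration, and `fake_measure` times a trivial loop rather than a real heap benchmark.

```python
import itertools
import json
import random
import time
from pathlib import Path

# Hypothetical dimensions, loosely following the examples in the text.
DIMENSIONS = {
    "structure": ["minmax_heap", "interval_heap", "binary_heap"],
    "operation": ["make_heap", "push", "pop"],
    "type": ["int", "string"],
    "length": [16, 256, 4096, 65536],
}


def all_configs(dimensions):
    """Yield every combination of the dimensions as a dict."""
    names = list(dimensions)
    for values in itertools.product(*(dimensions[n] for n in names)):
        yield dict(zip(names, values))


def config_key(config):
    """Stable string key so a config can be recognised across runs."""
    return json.dumps(config, sort_keys=True)


def load_done(results_path):
    """Return the keys of configs that already have a recorded result."""
    done = set()
    if results_path.exists():
        with results_path.open() as f:
            for line in f:
                try:
                    done.add(config_key(json.loads(line)["config"]))
                except (json.JSONDecodeError, KeyError, TypeError):
                    # Blank or truncated line, e.g. from a crash mid-write.
                    continue
    return done


def run_subset(results_path, measure, budget, seed=0):
    """Measure up to `budget` not-yet-recorded configs in random order."""
    done = load_done(results_path)
    pending = [c for c in all_configs(DIMENSIONS) if config_key(c) not in done]
    random.Random(seed).shuffle(pending)
    ran = 0
    with results_path.open("a") as f:
        for config in pending[:budget]:
            seconds = measure(config)
            # One JSON object per line, flushed immediately, so a crash
            # loses at most the measurement that was in progress.
            f.write(json.dumps({"config": config, "seconds": seconds}) + "\n")
            f.flush()
            ran += 1
    return ran


def fake_measure(config):
    """Placeholder: times a trivial loop instead of a real benchmark."""
    start = time.perf_counter()
    sum(range(config["length"]))
    return time.perf_counter() - start
```

Calling `run_subset(Path("results.jsonl"), fake_measure, budget=10)` gives a quick first impression of ten configurations; calling it again later with a larger budget continues from where the file left off instead of starting over.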
This sounds complicated, and I have to admit that this is very much an app written by a programmer for a programmer, but the whole point of a GUI is that I can make this both more powerful and easier to use at the same time. In fact, I think the patterns might be more widely useful for people who do slow-running experiments of other kinds (like training an ML model).