Benchmarking BoF

Kristof Beyls

This is a summary of what was discussed at the Performance Tracking and Benchmarking Infrastructure BoF session last week at the LLVM dev meeting.

It also contains a proposal for a few next steps to improve the setup and use of buildbots to track performance changes in code generated by LLVM.

The buildbots are currently very valuable in detecting correctness regressions and getting the community to rectify them quickly. However, performance regressions are hardly noticed, and as a community we don't really keep track of them well.

The goal for the BoF was to find a number of actions that could take us closer to the point where, as a community, we would at least notice some of the performance regressions and take action to fix them. Given that this has already been discussed quite a few times at BoF sessions at previous developer meetings, we thought we should aim for a small, incremental, but sure improvement over the current status. Ideally, we should initially aim to get to the point where at least some of the performance regressions are detected and acted upon.

We already have a central database that stores benchmarking numbers, produced for two boards; see the public LNT instance at http://llvm.org/perf/. However, it seems no one monitors the produced results, nor is it easy to derive from those numbers whether a particular patch really introduced a significant regression.
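As an illustration of the kind of check that currently has to be done by hand, below is a minimal sketch of how raw run times before and after a patch could be compared while taking run-to-run noise into account. The benchmark numbers, the 2% threshold, and the helper name is_regression are made-up assumptions for illustration; this is not part of LNT or the buildbot setup.

    # Hypothetical sketch: flag a performance regression from raw run times.
    # The sample data and the 2% threshold below are illustrative assumptions,
    # not values taken from the LLVM performance tracking infrastructure.
    from statistics import median, pstdev

    def is_regression(before, after, rel_threshold=0.02):
        """Report a regression when the median slowdown exceeds both the
        relative threshold and the observed run-to-run noise."""
        m_before, m_after = median(before), median(after)
        slowdown = (m_after - m_before) / m_before
        noise = max(pstdev(before), pstdev(after)) / m_before
        return slowdown > max(rel_threshold, 2 * noise)

    # Execution times (seconds) for one benchmark, several runs per revision.
    before = [1.02, 1.01, 1.03, 1.02, 1.01]   # parent revision
    after  = [1.09, 1.10, 1.08, 1.09, 1.11]   # revision under test

    if is_regression(before, after):
        print("significant regression: median went from "
              f"{median(before):.3f}s to {median(after):.3f}s")

Comparing medians against the observed noise avoids flagging runs that differ only because of measurement jitter; a real setup would of course need more samples per revision and a proper statistical test.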

At the BoF, we identified the following issues that block us from detecting significant regressions more easily:

We'd appreciate any feedback on the above proposals. We're also looking for more volunteers to implement the above improvements, so if you're interested in working on any of them, please let us know.