Benchmarking tips

Introduction

When benchmarking a patch, we want to eliminate as many sources of noise as possible. How to do that depends heavily on the OS.

Note that low noise is necessary but not sufficient: it does not rule out measurement bias. See, for example, “Producing Wrong Data Without Doing Anything Obviously Wrong!” by Mytkowicz, Diwan, Hauswirth and Sweeney (ASPLOS 2009).

General

  • Use a high resolution timer, e.g. perf on Linux.

  • Run the benchmark multiple times so that you can see how much the results vary and recognize noise.

  • Disable as many processes or services as possible on the target system.

  • Disable frequency scaling, turbo boost and address space randomization (see the OS-specific sections below).

  • Link statically if the OS supports it. This avoids any variation that loading dynamic libraries might introduce. For an LLVM build, this can be done by passing -DLLVM_BUILD_STATIC=ON to cmake.

  • Try to avoid storage. On some systems you can use tmpfs. Putting the program, inputs and outputs on tmpfs avoids touching a real storage system, whose access times can vary considerably.

    To mount it (on Linux and FreeBSD at least):

    mount -t tmpfs -o size=<XX>g none dir_to_mount
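To make "recognize noise" concrete, the following is a minimal sketch of summarizing repeated measurements. It assumes you have already collected one timing per line (for example, elapsed seconds from several runs); the function name summarize is hypothetical and not part of any tool mentioned here.

```shell
# summarize: read one numeric measurement per line on stdin and print the
# mean, the sample standard deviation, and the relative deviation in percent.
# Fails (exit status 1) if fewer than two measurements are given.
summarize() {
  awk '{ n++; s += $1; ss += $1 * $1 }
       END {
         if (n < 2) exit 1
         m = s / n
         var = (ss - n * m * m) / (n - 1)
         if (var < 0) var = 0          # guard against rounding error
         sd = sqrt(var)
         printf "mean=%.3f stddev=%.3f rel=%.2f%%\n", m, sd, 100 * sd / m
       }'
}
```

A relative deviation that is large compared to the effect you expect from the patch suggests the setup is still too noisy to draw conclusions.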
    

Linux

  • Disable address space randomization:

    echo 0 > /proc/sys/kernel/randomize_va_space
    
  • Set scaling_governor to performance:

    for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    do
      echo performance > "$i"
    done
    
  • Use https://github.com/lpechacek/cpuset to reserve CPUs for just the program you are benchmarking. If using perf, reserve at least 2 cores so that perf runs on one and your program on another:

    cset shield -c N1,N2 -k on
    

    This moves all threads off CPUs N1 and N2. The -k on option means that even kernel threads are moved off them.

  • Disable the SMT sibling of each CPU you will use for the benchmark. The siblings of CPU N are listed in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. Disable a sibling X (any CPU in that list other than N itself) with:

    echo 0 > /sys/devices/system/cpu/cpuX/online
    
  • Run the program with:

    cset shield --exec -- perf stat -r 10 <cmd>
    

    This runs the command after -- on the shielded CPUs. This particular perf invocation runs <cmd> 10 times and reports statistics.

With these in place, you can expect run-to-run variation in perf measurements of less than 0.1%.
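The SMT step above requires reading thread_siblings_list, which may contain a comma-separated list ("3,11") or a range ("2-3"). The following is a minimal sketch of a helper that extracts the siblings of a given CPU from such a string. The name siblings_of is hypothetical, and the commented usage at the end assumes root privileges.

```shell
# siblings_of CPU LIST: print every CPU number in LIST other than CPU,
# one per line. LIST uses the sysfs format, e.g. "3,11" or "0-1,8-9".
siblings_of() {
  cpu=$1
  for part in $(printf '%s' "$2" | tr ',' ' '); do
    case $part in
      *-*) lo=${part%-*}; hi=${part#*-} ;;   # a range such as 2-3
      *)   lo=$part; hi=$part ;;             # a single CPU number
    esac
    i=$lo
    while [ "$i" -le "$hi" ]; do
      [ "$i" -ne "$cpu" ] && echo "$i"
      i=$((i + 1))
    done
  done
}

# Possible usage (as root), taking CPU 3 as the benchmark CPU:
#   list=$(cat /sys/devices/system/cpu/cpu3/topology/thread_siblings_list)
#   for s in $(siblings_of 3 "$list"); do
#     echo 0 > "/sys/devices/system/cpu/cpu$s/online"
#   done
```

Writing 1 to the same online file re-enables the sibling once benchmarking is done.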

Linux Intel

  • Disable turbo mode:

    echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
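Because these settings are lost on reboot, it can help to check them before each benchmarking session. The following is a minimal sketch of such a check, covering only the ASLR, governor and turbo settings described above. The function name check_bench_env and its optional ROOT prefix argument are hypothetical; the prefix exists only so the check can be tried against a copy of the files instead of the real /proc and /sys.

```shell
# check_bench_env [ROOT]: print one line per setting that differs from the
# values recommended above and return non-zero if any were found.
# ROOT is an optional path prefix (default: empty, i.e. the real system).
check_bench_env() {
  root=${1:-}
  status=0

  # Address space randomization should be 0.
  aslr=$(cat "$root/proc/sys/kernel/randomize_va_space" 2>/dev/null)
  if [ "$aslr" != "0" ]; then
    echo "ASLR not disabled (randomize_va_space=$aslr)"
    status=1
  fi

  # Every CPU that exposes a governor should use "performance".
  for g in "$root"/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    [ -r "$g" ] || continue
    if [ "$(cat "$g")" != "performance" ]; then
      echo "governor is not performance: $g"
      status=1
    fi
  done

  # Checked only where the intel_pstate no_turbo file exists.
  turbo="$root/sys/devices/system/cpu/intel_pstate/no_turbo"
  if [ -r "$turbo" ] && [ "$(cat "$turbo")" != "1" ]; then
    echo "turbo not disabled (no_turbo=$(cat "$turbo"))"
    status=1
  fi

  return $status
}
```

Running check_bench_env with no argument inspects the live system; no output and a zero exit status mean the three settings match the recommendations.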