I'm collecting here links to some open-source benchmarks, or benchmark suites, that might add value to our test suite:

https://github.com/flwende/simd_benchmarks
https://github.com/tbepler/PWM-benchmarking
https://github.com/pamela-project/slambench
https://github.com/stream-benchmarking/firehose (http://firehose.sandia.gov/)
https://github.com/hiraditya/std-benchmark
https://openbenchmarking.org/suite/pts/cpu
http://impact.crhc.illinois.edu/parboil/parboil.aspx
https://github.com/breagen/MachSuite/ (https://breagen.github.io/MachSuite/)
http://lava.cs.virginia.edu/Rodinia/download_links.htm
https://bitbucket.org/eschnett/vecmathlib/wiki/Home (the library has a good regression test / benchmark)
http://parsec.cs.princeton.edu/
https://github.com/graph500/graph500/tree/v2-spec (the current reference implementation requires MPI, but the v2 branch still has serial and OpenMP versions, etc.)
https://github.com/benchmark-subsetting/NPB3.0-omp-C
http://www.highproductivity.org/SSCABmks.htm (this web site does not exist any more, but there seems to be a copy of some of the benchmarks at https://github.com/gtcasl/hpc-benchmarks/tree/master/SSCA2v2.2)
https://github.com/kokkos/kokkos-kernels/tree/master/perf_test
https://github.com/kokkos/kokkos/tree/master/benchmarks
A proposal list has been added (docs/Proposals/TestSuite.rst) in r345074.
Roman has been using his image processing suite as part of the AMD Piledriver (BDVER2) scheduler model development; he might be able to advise whether this is something that would work well as a test-suite addition: https://github.com/darktable-org/rawspeed
(In reply to Simon Pilgrim from comment #4)
> Roman has been using his image processing
Not really processing, *decoding*.

> suite as part of the AMD
> Piledriver (BDVER2) scheduler model development, he might be able to advise
> if this is something that would work well as a test-suite addition:
>
> https://github.com/darktable-org/rawspeed
It totally could, both as a benchmark and as a test; I would love that. I even brought it up in https://reviews.llvm.org/D46714#1095030

One caveat: it requires a test set, which is rather heavy — 756M right now — not something you'd want to put into the test-suite repo. It lives at https://raw.pixls.us/data-unique/ (rsync access is also available); we are looking into putting it into a git repo to simplify syncing.
I added it to the proposal list (http://llvm.org/docs/Proposals/TestSuite.html) in r345166.

A data set of that size might also be unsuitable for compiler performance testing, because a large portion of the runtime would be I/O (unless the host has a sufficient amount of RAM and we can somehow guarantee the data being in the cache). Instead, could we craft smaller RAW images, or use well-compressible images (e.g. all-gray) that are decompressed on the fly?
(In reply to Michael Kruse from comment #6)
> I added it to the proposal list
> (http://llvm.org/docs/Proposals/TestSuite.html) in r345166.
Thanks!

> A data set of that size might also be unsuitable for compiler performance
> testing because a large portion of the runtime will be I/O (unless the host
> has a sufficient amount of RAM and we can somehow guarantee it being in the
> cache).
I'm not sure I follow. What specifically is the concern here: the disk I/O itself, or that it will be counted as time consumed by the benchmark? The latter is absolutely avoidable by counting only the actual decoding time, not the reading time. (The whole file is *already* read into memory first, and decoded from memory.)

> Instead, could we craft smaller RAW images, or use well-compressible images
> (e.g. all-gray) that are decompressed on-the-fly?
Sadly, it's not your typical libpng / libjpeg. There are no actual specs available for these compression schemes. And even then, one would need to implement the compressors, and then ensure that they correctly round-trip, etc. So while it's not infeasible, it's just not *too* feasible.
(In reply to Roman Lebedev from comment #7)
> I'm not sure i follow. What specifically is the concern here?
> The disk IO itself, or that it will be counted as the time consumed by
> benchmark?
The latter.

> Because the latter is absolutely avoidable by only counting the actual
> decoding time, not reading time.
> (The whole file is *already* read to memory first, and decoded from memory.)
Most programs in test-suite — which is the easiest approach — have their total execution time measured (timeit.c or Linux perf). It's the user time (not wall-clock or kernel time); I don't know how much influence I/O still has on it.

I now see that rawspeed's benchmarks (https://github.com/darktable-org/rawspeed/tree/develop/bench) are already using Google Benchmark, which is understood by test-suite's microbenchmark.py.
(In reply to Michael Kruse from comment #8)
> (In reply to Roman Lebedev from comment #7)
> > I'm not sure i follow. What specifically is the concern here?
> > The disk IO itself, or that it will be counted as the time consumed by
> > benchmark?
>
> The latter
OK, great :)

> > Because the latter is absolutely avoidable by only counting the actual
> > decoding time, not reading time.
> > (The whole file is *already* read to memory first, and decoded from memory.)
>
> Most programs in test-suite, which the easiest to do, have their total
> execution time measured (timeit.c or linux perf). It's the user time (not
> wall-clock or kernel), I don't how much I/O has still an influence on it.
>
> I now see that rawspeed's benchmarks
> (https://github.com/darktable-org/rawspeed/tree/develop/bench)
Eh, those "benchmarks" are bad; so far, I never really looked into writing small, proper benchmarks.

> are already using Google Benchmark
Absolutely :) The actually interesting thing is https://github.com/darktable-org/rawspeed/blob/develop/src/utilities/rsbench/main.cpp

> which is understood by test-suite's microbenchmark.py.
https://gitlab.com/chriscox/CppPerformanceBenchmarks
(In reply to David Bolvansky from comment #10) > https://gitlab.com/chriscox/CppPerformanceBenchmarks Thanks for the suggestion. I added it to https://llvm.org/docs/Proposals/TestSuite.html in r347369.