Fuzzing LLVM libraries and tools

Introduction

The LLVM tree includes a number of fuzzers for various components. These are built on top of libFuzzer. To build and run them, see Configuring LLVM to Build Fuzzers.

Available Fuzzers

clang-fuzzer

A generic fuzzer that tries to compile textual input as C++ code. Some of the bugs this fuzzer has reported are on bugzilla and on OSS Fuzz’s tracker.

clang-proto-fuzzer

A libprotobuf-mutator based fuzzer that compiles valid C++ programs generated from a protobuf class that describes a subset of the C++ language.

This fuzzer accepts clang command line options after -ignore_remaining_args=1. For example, the following command will fuzz clang with a higher optimization level:

% bin/clang-proto-fuzzer <corpus-dir> -ignore_remaining_args=1 -O3

clang-format-fuzzer

A generic fuzzer that runs clang-format on C++ text fragments. Some of the bugs this fuzzer has reported are on bugzilla and on OSS Fuzz’s tracker.

llvm-as-fuzzer

A generic fuzzer that tries to parse text as LLVM assembly. Some of the bugs this fuzzer has reported are on bugzilla.

llvm-dwarfdump-fuzzer

A generic fuzzer that interprets inputs as object files and runs llvm-dwarfdump on them. Some of the bugs this fuzzer has reported are on OSS Fuzz’s tracker.

llvm-demangle-fuzzer

A generic fuzzer for the Itanium demangler used in various LLVM tools. We’ve fuzzed __cxa_demangle to death, so why not fuzz LLVM’s implementation of the same function?

llvm-isel-fuzzer

A structured LLVM IR fuzzer aimed at finding bugs in instruction selection.

This fuzzer accepts flags after -ignore_remaining_args=1. The flags match those of llc, and the triple is required. For example, the following command would fuzz AArch64 with Global Instruction Selection:

% bin/llvm-isel-fuzzer <corpus-dir> -ignore_remaining_args=1 -mtriple aarch64 -global-isel -O0

Some flags can also be specified in the binary name itself in order to support OSS Fuzz, which has trouble with required arguments. To do this, copy or move llvm-isel-fuzzer to llvm-isel-fuzzer--x-y-z, separating the options from the binary name with “--”. The valid options are architecture names (aarch64, x86_64), optimization levels (O0, O2), and specific keywords such as gisel, which enables global instruction selection. In this mode, the same example could be run like so:

% bin/llvm-isel-fuzzer--aarch64-O0-gisel <corpus-dir>
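
To make the naming scheme concrete, here is a minimal sketch of how a name such as llvm-isel-fuzzer--aarch64-O0-gisel could be decoded into flags. The function decodeExecName is hypothetical and is not the in-tree implementation. It recognizes only the option names listed above, and it assumes it is given the executable name rather than a full path.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical decoder, for illustration only. It maps the options named in
// the text above to llc-style flags; the in-tree logic may differ.
std::vector<std::string> decodeExecName(const std::string &ExecName) {
  std::vector<std::string> Args;

  // Everything after the first "--" is the encoded option list.
  std::size_t Sep = ExecName.find("--");
  if (Sep == std::string::npos)
    return Args; // Plain binary name: nothing to decode.
  std::string Rest = ExecName.substr(Sep + 2);

  // Walk the "-"-separated options one by one.
  std::size_t Start = 0;
  while (Start <= Rest.size()) {
    std::size_t End = Rest.find('-', Start);
    if (End == std::string::npos)
      End = Rest.size();
    std::string Opt = Rest.substr(Start, End - Start);
    Start = End + 1;

    if (Opt.empty())
      continue;
    if (Opt == "gisel") {
      Args.push_back("-global-isel");
    } else if (Opt == "O0" || Opt == "O2") {
      Args.push_back("-" + Opt);
    } else if (Opt == "aarch64" || Opt == "x86_64") {
      Args.push_back("-mtriple");
      Args.push_back(Opt);
    } else {
      throw std::invalid_argument("unknown option in binary name: " + Opt);
    }
  }
  return Args;
}
```

With this sketch, the name llvm-isel-fuzzer--aarch64-O0-gisel decodes to -mtriple aarch64 -O0 -global-isel, which are the same flags as in the explicit command line shown earlier, in a different order.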

llvm-opt-fuzzer

A structured LLVM IR fuzzer aimed at finding bugs in optimization passes.

It receives an optimization pipeline and runs it on each fuzzer input.

The interface of this fuzzer almost directly mirrors that of llvm-isel-fuzzer. Both the -mtriple and -passes arguments are required. Passes are specified in a format suitable for the new pass manager. You can find some documentation about this format in the doxygen for PassBuilder::parsePassPipeline.

% bin/llvm-opt-fuzzer <corpus-dir> -ignore_remaining_args=1 -mtriple x86_64 -passes instcombine

As with llvm-isel-fuzzer, the arguments for some predefined configurations can be embedded directly in the binary file name:

% bin/llvm-opt-fuzzer--x86_64-instcombine <corpus-dir>

llvm-mc-assemble-fuzzer

A generic fuzzer that fuzzes the MC layer’s assemblers by treating inputs as target specific assembly.

Note that this fuzzer has an unusual command line interface which is not fully compatible with all of libFuzzer’s features. Fuzzer arguments must be passed after --fuzzer-args, and any llc flags must use two dashes. For example, to fuzz the AArch64 assembler you might use the following command:

% bin/llvm-mc-assemble-fuzzer --triple=aarch64-linux-gnu --fuzzer-args -max_len=4

This scheme will likely change in the future.

llvm-mc-disassemble-fuzzer

A generic fuzzer that fuzzes the MC layer’s disassemblers by treating inputs as assembled binary data.

Note that this fuzzer has an unusual command line interface which is not fully compatible with all of libFuzzer’s features. See the notes above about llvm-mc-assemble-fuzzer for details.

lldb-target-fuzzer

A generic fuzzer that interprets inputs as object files and uses them to create a target in lldb.

Mutators and Input Generators

The inputs for a fuzz target are generated via random mutations of a corpus. There are a few options for the kinds of mutations that a fuzzer in LLVM might want.

Generic Random Fuzzing

The most basic form of input mutation is to use the built-in mutators of libFuzzer. These simply treat the input corpus as a bag of bits and make random mutations. This type of fuzzer is good for stressing the surface layers of a program, such as lexers, parsers, or binary protocols.
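
As a rough illustration of this style, a generic fuzz target only has to implement libFuzzer's LLVMFuzzerTestOneInput entry point and hand the raw bytes to the code under test. In the sketch below, isBalanced is a hypothetical stand-in for a real lexer or parser; it is not taken from any in-tree fuzzer.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical code under test: reports whether parentheses are balanced.
// It stands in for a real lexer or parser.
static bool isBalanced(const std::string &Text) {
  int Depth = 0;
  for (char C : Text) {
    if (C == '(')
      ++Depth;
    else if (C == ')' && --Depth < 0)
      return false; // A ')' closed more than was opened.
  }
  return Depth == 0;
}

// libFuzzer entry point: called once for every generated input.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  // Treat the input as an opaque bag of bytes and feed it to the parser.
  std::string Text(reinterpret_cast<const char *>(Data), Size);
  isBalanced(Text); // Crashes and sanitizer reports here are the findings.
  return 0;
}
```

When such a file is built with clang's -fsanitize=fuzzer option, typically combined with a sanitizer such as address, libFuzzer supplies main and calls the entry point repeatedly with mutated inputs.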

Some of the in-tree fuzzers that use this type of mutator are clang-fuzzer, clang-format-fuzzer, llvm-as-fuzzer, llvm-dwarfdump-fuzzer, llvm-mc-assemble-fuzzer, and llvm-mc-disassemble-fuzzer.

Structured Fuzzing using libprotobuf-mutator

We can use libprotobuf-mutator in order to perform structured fuzzing and stress deeper layers of programs. This works by defining a protobuf class that translates arbitrary data into structurally interesting input. Specifically, we use this to work with a subset of the C++ language and perform mutations that produce valid C++ programs in order to exercise parts of clang that are more interesting than parser error handling.

To build this kind of fuzzer you need protobuf and its dependencies installed, and you need to specify some extra flags when configuring the build with CMake. For example, clang-proto-fuzzer can be enabled by adding -DCLANG_ENABLE_PROTO_FUZZER=ON to the flags described in Configuring LLVM to Build Fuzzers.

The only in-tree fuzzer that uses libprotobuf-mutator today is clang-proto-fuzzer.

Structured Fuzzing of LLVM IR

We also use a more direct form of structured fuzzing for fuzzers that take LLVM IR as input. This is achieved through the FuzzMutate library, which was discussed at EuroLLVM 2017.

The FuzzMutate library is used to structurally fuzz backends in llvm-isel-fuzzer.
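
The general idea of structure-aware mutation can be sketched without LLVM IR. libFuzzer lets a fuzz target replace the byte-level mutators by defining LLVMFuzzerCustomMutator. The toy example below assumes a made-up input format (strings of balanced parentheses) as a stand-in for IR, and isWellFormed is a hypothetical helper. It does not use FuzzMutate; it only shows the pattern of parsing the input, mutating the structure, and writing back an input that is still valid.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>
#include <string>

// Hypothetical validity check for the toy format: only '(' and ')' are
// allowed, and they must be balanced.
static bool isWellFormed(const std::string &S) {
  int Depth = 0;
  for (char C : S) {
    if (C == '(') {
      ++Depth;
    } else if (C == ')') {
      if (--Depth < 0)
        return false;
    } else {
      return false;
    }
  }
  return Depth == 0;
}

// libFuzzer hook: mutate Data in place and return the new size, which must
// not exceed MaxSize.
extern "C" size_t LLVMFuzzerCustomMutator(uint8_t *Data, size_t Size,
                                          size_t MaxSize, unsigned int Seed) {
  std::string S(reinterpret_cast<const char *>(Data), Size);
  if (!isWellFormed(S))
    S.clear(); // Fall back to the smallest valid input.

  std::mt19937 Rng(Seed);
  bool CanGrow = S.size() + 2 <= MaxSize;
  bool CanShrink = !S.empty();
  if (!CanGrow && !CanShrink)
    return 0; // Only the empty input fits.

  bool Grow = CanGrow && (!CanShrink || Rng() % 2 == 0);
  if (Grow) {
    // Inserting "()" at any position keeps the string balanced.
    S.insert(Rng() % (S.size() + 1), "()");
  } else {
    // A non-empty balanced string always contains an adjacent "()".
    S.erase(S.find("()"), 2);
  }

  std::memcpy(Data, S.data(), S.size());
  return S.size();
}
```

Because every mutation preserves validity, the fuzzer spends its time behind the parser instead of in its error handling, which is the same motivation given above for structured fuzzing of IR.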

Building and Running

Configuring LLVM to Build Fuzzers

Fuzzers will be built and linked to libFuzzer by default as long as you build LLVM with sanitizer coverage enabled. You would typically also enable at least one sanitizer to find bugs faster. The most common way to build the fuzzers is by adding the following two flags to your CMake invocation: -DLLVM_USE_SANITIZER=Address -DLLVM_USE_SANITIZE_COVERAGE=On.

Note

If you have compiler-rt checked out in an LLVM tree when building with sanitizers, you’ll want to specify -DLLVM_BUILD_RUNTIME=Off to avoid building the sanitizers themselves with sanitizers enabled.

Note

You may run into issues if you build with BFD ld, which is the default linker on many Unix systems. These issues are being tracked in https://llvm.org/PR34636.

Continuously Running and Finding Bugs

There used to be a public buildbot running LLVM fuzzers continuously. While it did find issues, it had no good way to report them in an actionable form. Because of this, we’re moving towards using OSS Fuzz instead.

You can browse the LLVM project issue list for the bugs found by LLVM’s fuzzers on OSS Fuzz. These are also mailed to the llvm-bugs mailing list.

Utilities for Writing Fuzzers

There are some utilities available for writing fuzzers in LLVM.

Some helpers for handling the command line interface are available in include/llvm/FuzzMutate/FuzzerCLI.h, including functions to parse command line options in a consistent way and to implement standalone main functions so your fuzzer can be built and tested when not built against libFuzzer.
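
To show what a standalone driver has to do, here is a minimal sketch that runs a fuzz target once on each input file, with no libFuzzer involved. The names FuzzerTestFn and runOnFiles are hypothetical and are not the declarations from FuzzerCLI.h; consult that header for the real helpers.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

// Same shape as libFuzzer's LLVMFuzzerTestOneInput.
using FuzzerTestFn = int (*)(const uint8_t *, std::size_t);

// Hypothetical helper: reads each file and passes its bytes to the fuzz
// target once. Returns 0 on success and 1 if a file cannot be opened.
int runOnFiles(const std::vector<std::string> &Paths, FuzzerTestFn TestOne) {
  for (const std::string &Path : Paths) {
    std::ifstream In(Path, std::ios::binary);
    if (!In) {
      std::cerr << "error: cannot open " << Path << "\n";
      return 1;
    }
    // Slurp the whole file into memory as raw bytes.
    std::vector<uint8_t> Buf((std::istreambuf_iterator<char>(In)),
                             std::istreambuf_iterator<char>());
    std::cerr << "Running: " << Path << " (" << Buf.size() << " bytes)\n";
    TestOne(Buf.data(), Buf.size());
  }
  return 0;
}
```

A standalone main would collect the file names from argv and forward them, together with LLVMFuzzerTestOneInput, to a function like this one. That makes it possible to replay a crashing input under a debugger in a build that does not link libFuzzer.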

There is also some handling of the CMake config for fuzzers: use the add_llvm_fuzzer function to set up fuzzer targets. It works similarly to functions such as add_llvm_tool, but it takes care of linking to libFuzzer when appropriate and can be passed the DUMMY_MAIN argument to enable standalone testing.