We have a fuzzer for clang-format in the source tree; details are in llvm/lib/Fuzzer/README.txt. It has found a few bugs so far: r226685, r226678, r226451, r226446, r226448, r227427, r226447, r226680, r226698, r229485, r227677, r227433, r230395, r231066 (and probably a couple more that I missed). A few bugs remain, and we will post them here, one per comment. There is also a build bot that runs the fuzzer 24/7. It will report new bugs (regressions) as they appear, as well as old bugs if the fuzzer discovers them: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer
clang-format (and clang-format-fuzzer) is very slow on a tiny input. That may or may not be a big problem by itself, but it hurts fuzzing a lot: with all the fuzzer instrumentation it takes ~1.5 seconds to format 60 bytes, and without instrumentation it takes ~0.5 seconds.

cat << EOF | base64 --decode | clang-format
PDw8SAQEMigqLCioKjFoLGgKPDw8PDw8Cjw8PCxkKiQcPDw8KCosKKgiaCxoCigKCjw8PAo8PGQq
KKA6
EOF

Perf:
51.83% clang::format::(anonymous namespace)::AnnotatingParser::next()
13.12% clang::format::(anonymous namespace)::AnnotatingParser::parseParens(bool)
11.87% clang::format::(anonymous namespace)::AnnotatingParser::consumeToken()
 8.32% clang::format::(anonymous namespace)::AnnotatingParser::parseAngle()
 5.01% clang::getBinOpPrecedence(clang::tok::TokenKind, bool, bool)
 4.90% clang::format::(anonymous namespace)::AnnotatingParser::updateParameterCount(clang::format::FormatToken*, clang::format::Format
 2.27% clang::format::FormatToken::isSimpleTypeSpecifier() const
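To reproduce and time this outside the fuzzer, here is a minimal Python 3 sketch. It assumes a clang-format binary on PATH; the helper names are hypothetical and not part of the LLVM tree. It decodes the reproducer (the base64 text is wrapped after 76 characters, and the decoder simply skips the line break) and times a single clang-format run.

```python
import base64
import shutil
import subprocess
import time

# 60-byte reproducer from this comment; base64 wrapped it after 76 characters.
REPRO_B64 = (
    "PDw8SAQEMigqLCioKjFoLGgKPDw8PDw8Cjw8PCxkKiQcPDw8KCosKKgiaCxoCigKCjw8PAo8PGQq"
    "KKA6"
)


def decode_reproducer(b64_text):
    """Hypothetical helper: decode a base64 reproducer, ignoring spaces/newlines."""
    # With validate=False (the default), characters outside the base64
    # alphabet, such as the wrap point, are discarded before decoding.
    return base64.b64decode(b64_text)


def time_clang_format(data, timeout=300.0):
    """Hypothetical helper: pipe `data` into clang-format, return (seconds, exit code).

    Raises FileNotFoundError if clang-format is not on PATH and
    subprocess.TimeoutExpired if the run exceeds `timeout` seconds.
    """
    exe = shutil.which("clang-format")
    if exe is None:
        raise FileNotFoundError("clang-format not found on PATH")
    start = time.perf_counter()
    proc = subprocess.run([exe], input=data, stdout=subprocess.DEVNULL,
                          timeout=timeout, check=False)
    return time.perf_counter() - start, proc.returncode
```

Usage: secs, rc = time_clang_format(decode_reproducer(REPRO_B64)). Absolute timings depend on the build (instrumented or not, assertions on or off) and on the machine.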
This one is worse: 31 seconds without instrumentation for 64 bytes, with the same profile.

cat << EOF | base64 --decode | clang-format
PDw8SAQEMigqLCioKDFoLGgKPDw8PDw8CjwKPDw8PEhoCjw8PBw8PDwoKiwoqCJoLGgKKAoKPDw8
Cjw8PDw8PA==
EOF
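A chain of < characters seems to trigger not just superlinear but roughly exponential runtime in the parser: each additional < about doubles the time. Measured with perl -e 'print "<" x 20' | clang-format, varying the count n from 20 to 29:

n  | seconds
20 | 0.101
21 | 0.191
22 | 0.367
23 | 0.722
24 | 1.431
25 | 2.730
26 | 5.173
27 | 10.026
28 | 19.779
29 | 39.350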
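A small Python 3 sketch for checking the growth rate in these timings, and for re-measuring them on another machine. The helper names are hypothetical, and time_chain assumes clang-format is on PATH. step_ratios gives t(n+1)/t(n) for consecutive rows; growth_per_char fits log2(t) against n by least squares and returns the implied factor per extra <.

```python
import math
import subprocess
import time

# (n, seconds) pairs from the table in this comment; absolute values are machine-specific.
MEASURED = [
    (20, 0.101), (21, 0.191), (22, 0.367), (23, 0.722), (24, 1.431),
    (25, 2.730), (26, 5.173), (27, 10.026), (28, 19.779), (29, 39.350),
]


def step_ratios(rows):
    """Hypothetical helper: t(n+1) / t(n) for each pair of consecutive rows."""
    return [t1 / t0 for (_, t0), (_, t1) in zip(rows, rows[1:])]


def growth_per_char(rows):
    """Hypothetical helper: fit log2(t) = a + b*n by least squares, return 2**b."""
    xs = [n for n, _ in rows]
    ys = [math.log2(t) for _, t in rows]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return 2 ** b


def time_chain(n, exe="clang-format"):
    """Hypothetical helper: like perl -e 'print "<" x n' | clang-format, timed."""
    start = time.perf_counter()
    subprocess.run([exe], input=b"<" * n, stdout=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start
```

On the recorded numbers every step ratio lies between about 1.89 and 1.99, and the fitted factor is just under 2 per extra <, which is what exponential growth looks like. [(n, time_chain(n)) for n in range(20, 30)] regenerates the table locally.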
echo LypcAAov | base64 --decode | clang-format -
Assertion `TokenText.startswith("/*") && TokenText.endswith("*/")' failed.
echo PCo+Iis/J2FjIDpTDT46zvxcXAp1NzI49zxGPg== | base64 --decode | clang-format -
Assertion `EndColumn >= StartColumn' failed.
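Since these reproducers contain non-printable bytes, it can help to look at them byte-exactly before reducing them. This is a minimal Python 3 sketch; dump_reproducer is a hypothetical helper name. It prints each input's length and Python repr, which shows NULs, carriage returns and backslashes unambiguously.

```python
import base64

# Reproducers from the two assertion reports in this thread.
COMMENT_REPRO_B64 = "LypcAAov"  # TokenText.startswith("/*") && ... assertion
COLUMN_REPRO_B64 = "PCo+Iis/J2FjIDpTDT46zvxcXAp1NzI49zxGPg=="  # EndColumn >= StartColumn


def dump_reproducer(b64_text):
    """Hypothetical helper: decode a reproducer, print its length and repr, return bytes."""
    data = base64.b64decode(b64_text)
    print(len(data), repr(data))
    return data
```

For the first reproducer this prints 6 b'/*\\\x00\n/': /* followed by a backslash, a NUL, a newline and /, so the block comment is never closed with */. That is at least consistent with the endswith("*/") half of the assertion. The second input (28 bytes) contains a carriage return and two backslashes directly before a newline, i.e. an escaped newline. These are observations about the inputs, not a root-cause analysis.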
*** Bug 22920 has been marked as a duplicate of this bug. ***
The clang/clang-format fuzzer bot (lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer) has been extended to run both with and without assertions. Whenever a bug is found, the fuzzer prints the base64-encoded reproducer so that one can copy-paste it from the buildbot logs. E.g., from the bot logs:

===============
SUMMARY: AddressSanitizer: ...
CRASHED; file written to crash-80193815206841682354717562770799349303
Base64: OiDgO3gKUyYhU0Z4KhFoEztFKGV1bZNTe5Hsk1MmKUMheCoTIWgTO0VTKMFldW2TUzs=
===============

Just do this:
echo OiDgO3gKUyYhU0Z4KhFoEztFKGV1bZNTe5Hsk1MmKUMheCoTIWgTO0VTKMFldW2TUzs= | base64 -d | clang -x c++ -
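For scripting the same round trip, here is a minimal Python 3 sketch. All helper names are hypothetical, and reproduce assumes the tool it runs is on PATH. encode_reproducer produces the single unwrapped base64 line the bot prints after "Base64:", so a local crash-* file can be shared the same way. reproduce is the equivalent of the echo | base64 -d | clang -x c++ - pipeline.

```python
import base64
import subprocess
from pathlib import Path


def encode_reproducer(data):
    """Hypothetical helper: one unwrapped base64 line, like the bot's 'Base64:' output."""
    # b64encode never inserts line breaks (unlike base64.encodebytes).
    return base64.b64encode(data).decode("ascii")


def encode_crash_file(path):
    """Hypothetical helper: encode a local crash-* file for pasting into a comment."""
    return encode_reproducer(Path(path).read_bytes())


def reproduce(b64_line, cmd=("clang", "-x", "c++", "-")):
    """Hypothetical helper: decode `b64_line` and feed it to `cmd` on stdin.

    Returns the CompletedProcess. A crash typically shows up as a non-zero
    returncode (negative if the process was killed by a signal) and the
    sanitizer or assertion report in .stderr.
    """
    data = base64.b64decode(b64_line)
    return subprocess.run(list(cmd), input=data, capture_output=True, check=False)
```

For clang-format reproducers, pass cmd=("clang-format", "-") instead of the default clang command line.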
Daniel, many thanks for the fixes. The next biggest offender is clang-format-fuzzer:

/mnt/b/sanitizer-buildbot5/sanitizer-x86_64-linux-fuzzer/build/llvm/tools/clang/lib/Format/ContinuationIndenter.cpp:1066: unsigned int clang::format::ContinuationIndenter::breakProtrudingToken(const clang::format::FormatToken &, clang::format::LineState &, bool): Assertion `NewRemainingTokenColumns < RemainingTokenColumns' failed.

Reproducer (base64-encoded):
SCQhJCwxLGNvbnN0ZQx4ciBjaHIzaDJ0IDMqMiAjJCgpABkMLTo9IGdldCxRKiJzdFwwXPSKpKQ6JFxcIg==

You can get more reproducers from the bot: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer
Fixed crasher in r242738.
The clang-format-fuzzer bot has been mostly green lately, with only one periodic assertion failure (bug 26032). I've changed the bot to treat clang-format-fuzzer failures as real failures, not just warnings.