23052 – fuzz clang-format

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Status:	NEW

Alias:	None

Product:	new-bugs
Classification:	Unclassified
Component:	new bugs
Version:	unspecified
Hardware:	PC Linux

Importance:	P normal
Assignee:	Unassigned LLVM Bugs

Duplicates (1):	22920

Reported:	2015-03-28 00:03 PDT by Kostya Serebryany
Modified:	2016-01-05 10:48 PST
CC List:	4 users

Description Kostya Serebryany 2015-03-28 00:03:18 PDT

We have a fuzzer for clang-format in the source tree.
Details: llvm/lib/Fuzzer/README.txt

It has found a few bugs so far:
r226685, r226678, r226451, r226446, r226448, r227427, r226447,
r226680, r226698, r229485, r227677, r227433,
r230395, r231066, (probably missed a couple more)

A few remain; we will be posting them here, one per comment.

There is also a build bot that runs the fuzzer 24/7 and will report new
bugs (regressions) if they appear, or old bugs if the fuzzer discovers them.
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer

Comment 1 Kostya Serebryany 2015-03-28 00:19:28 PDT

Clang-format(-fuzzer) is very slow on a tiny input.
This may not be a big problem by itself (or maybe it is),
but it hurts fuzzing very much. With all the fuzzer instrumentation
it takes ~1.5 seconds to format 60 bytes.
Without instrumentation it takes ~0.5 seconds.

cat << EOF | base64 --decode | clang-format
PDw8SAQEMigqLCioKjFoLGgKPDw8PDw8Cjw8PCxkKiQcPDw8KCosKKgiaCxoCigKCjw8PAo8PGQq
KKA6
EOF

Perf:
 51.83%  clang::format::(anonymous namespace)::AnnotatingParser::next()
 13.12%  clang::format::(anonymous namespace)::AnnotatingParser::parseParens(bool)
 11.87%  clang::format::(anonymous namespace)::AnnotatingParser::consumeToken()
  8.32%  clang::format::(anonymous namespace)::AnnotatingParser::parseAngle()
  5.01%  clang::getBinOpPrecedence(clang::tok::TokenKind, bool, bool)
  4.90%  clang::format::(anonymous namespace)::AnnotatingParser::updateParameterCount(clang::format::FormatToken*, clang::format::Format
  2.27%  clang::format::FormatToken::isSimpleTypeSpecifier() const

Comment 2 Kostya Serebryany 2015-03-28 00:25:26 PDT

This one is worse: 31 seconds w/o instrumentation for 64 bytes, same profile. 

cat << EOF | base64 --decode | clang-format
PDw8SAQEMigqLCioKDFoLGgKPDw8PDw8CjwKPDw8PEhoCjw8PBw8PDwoKiwoqCJoLGgKKAoKPDw8
Cjw8PDw8PA==
EOF
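
The input sizes quoted in the two comments above can be checked without running clang-format at all, by decoding the reproducers and counting bytes. A minimal Python sketch; the names `REPRO_COMMENT_1`, `REPRO_COMMENT_2` and `decode_reproducer` are made up for this illustration, and the base64 text is copied from the comments:

```python
import base64

# Base64 text copied verbatim from comment 1 and comment 2.
REPRO_COMMENT_1 = """
PDw8SAQEMigqLCioKjFoLGgKPDw8PDw8Cjw8PCxkKiQcPDw8KCosKKgiaCxoCigKCjw8PAo8PGQq
KKA6
"""

REPRO_COMMENT_2 = """
PDw8SAQEMigqLCioKDFoLGgKPDw8PDw8CjwKPDw8PEhoCjw8PBw8PDwoKiwoqCJoLGgKKAoKPDw8
Cjw8PDw8PA==
"""


def decode_reproducer(text):
    """Decode a (possibly line-wrapped) base64 reproducer into raw bytes.

    b64decode ignores the embedded newlines in its default, non-validating mode.
    """
    return base64.b64decode(text)


if __name__ == "__main__":
    for name, text in (("comment 1", REPRO_COMMENT_1), ("comment 2", REPRO_COMMENT_2)):
        data = decode_reproducer(text)
        # repr() keeps control bytes visible instead of sending them to the terminal.
        print(f"{name}: {len(data)} bytes: {data!r}")
```

Printing the `repr` rather than the raw bytes makes it easier to see which characters dominate each input.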

Comment 3 Benjamin Kramer 2015-03-28 10:27:31 PDT

A chain of < seems to trigger superlinear runtime in the parser.

perl -e 'print "<" x 20'|clang-format

n  | seconds
20 | 0.101
21 | 0.191
22 | 0.367
23 | 0.722
24 | 1.431
25 | 2.730
26 | 5.173
27 | 10.026
28 | 19.779
29 | 39.350
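
The table can be read a bit more precisely: each additional `<` multiplies the runtime by a near-constant factor just under 2, which suggests exponential growth rather than merely superlinear. A minimal Python sketch that computes the step-to-step ratios from the numbers above; `time_chain` is a hypothetical helper for re-measuring on another machine (it assumes the formatter reads from stdin, as in the perl one-liner) and is not part of any LLVM tooling:

```python
import math
import subprocess
import time

# n -> seconds, copied from the table above.
TIMINGS = {
    20: 0.101, 21: 0.191, 22: 0.367, 23: 0.722, 24: 1.431,
    25: 2.730, 26: 5.173, 27: 10.026, 28: 19.779, 29: 39.350,
}


def growth_ratios(timings):
    """Return t(next n) / t(n) for each pair of adjacent n values."""
    ns = sorted(timings)
    return [timings[b] / timings[a] for a, b in zip(ns, ns[1:])]


def geometric_mean(values):
    """Average multiplicative factor across all steps."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def time_chain(n, cmd=("clang-format",)):
    """Pipe n '<' characters into cmd on stdin; return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(list(cmd), input=b"<" * n,
                   stdout=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


if __name__ == "__main__":
    ratios = growth_ratios(TIMINGS)
    print("per-step ratios:", [round(r, 2) for r in ratios])
    print("geometric mean:", round(geometric_mean(ratios), 2))
```

If the ratios stay close to 2 after a fix, the exponential behaviour is still there; ratios that approach 1 as n grows would indicate the blow-up is gone.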

Comment 4 Kostya Serebryany 2015-04-01 17:38:26 PDT

echo LypcAAov | base64 --decode | clang-format -

Assertion `TokenText.startswith("/*") && TokenText.endswith("*/")' failed.
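
Decoding this reproducer shows why the assertion is plausible: the six input bytes open a block comment with `/*` but never close it with `*/`. A minimal Python sketch that applies the same two checks to the raw input; it only inspects the decoded bytes, and which token text actually reaches the assertion inside clang-format is an assumption not verified here. The helper name `looks_like_block_comment` is invented for the illustration:

```python
import base64

# Reproducer copied from the command above.
REPRO = "LypcAAov"


def looks_like_block_comment(text):
    """Mirror the asserted condition: starts with '/*' and ends with '*/'."""
    return text.startswith(b"/*") and text.endswith(b"*/")


data = base64.b64decode(REPRO)

if __name__ == "__main__":
    print("decoded bytes:", data)
    print("starts with /*:", data.startswith(b"/*"))
    print("ends with */:", data.endswith(b"*/"))
    print("passes the asserted condition:", looks_like_block_comment(data))
```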

Comment 5 Kostya Serebryany 2015-04-01 17:39:22 PDT

echo PCo+Iis/J2FjIDpTDT46zvxcXAp1NzI49zxGPg== | base64 --decode | clang-format -

Assertion `EndColumn >= StartColumn' failed.

Comment 6 Kostya Serebryany 2015-04-01 17:40:15 PDT

*** Bug 22920 has been marked as a duplicate of this bug. ***

Comment 7 Kostya Serebryany 2015-05-05 23:37:05 PDT

The clang/clang-format fuzzer bot
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer
has been extended to run both with and without assertions.
Whenever a bug is found, the fuzzer prints the base64-encoded reproducer
so that one can copy-paste it from the buildbot logs.
E.g., from the bot logs:
===============
SUMMARY: AddressSanitizer: ...
CRASHED; file written to crash-80193815206841682354717562770799349303
Base64: OiDgO3gKUyYhU0Z4KhFoEztFKGV1bZNTe5Hsk1MmKUMheCoTIWgTO0VTKMFldW2TUzs=
===============

Just do this: 
echo OiDgO3gKUyYhU0Z4KhFoEztFKGV1bZNTe5Hsk1MmKUMheCoTIWgTO0VTKMFldW2TUzs= | base64 -d | clang -x c++ -
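
When a log contains several crashes, copy-pasting each line by hand gets tedious. A minimal Python sketch that pulls every `Base64:` line out of a saved log and decodes it; it assumes the log format is exactly as in the excerpt above, and `extract_reproducers` and `SAMPLE_LOG` are names invented for this illustration:

```python
import base64
import re

# Matches lines such as "Base64: OiDg...=" as printed in the bot log excerpt.
BASE64_LINE = re.compile(r"^Base64:\s*(\S+)\s*$", re.MULTILINE)

# Log excerpt copied from the comment above.
SAMPLE_LOG = """\
===============
SUMMARY: AddressSanitizer: ...
CRASHED; file written to crash-80193815206841682354717562770799349303
Base64: OiDgO3gKUyYhU0Z4KhFoEztFKGV1bZNTe5Hsk1MmKUMheCoTIWgTO0VTKMFldW2TUzs=
===============
"""


def extract_reproducers(log_text):
    """Return the decoded bytes of every 'Base64:' line in log_text."""
    return [base64.b64decode(encoded) for encoded in BASE64_LINE.findall(log_text)]


if __name__ == "__main__":
    for index, data in enumerate(extract_reproducers(SAMPLE_LOG)):
        print(f"reproducer {index}: {len(data)} bytes: {data!r}")
```

Each decoded byte string can then be written to a file or piped into the tool under test, as in the shell command above.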

Comment 8 Kostya Serebryany 2015-05-06 10:00:03 PDT

Daniel, many thanks for the fixes. 
The next biggest offender is 

clang-format-fuzzer: /mnt/b/sanitizer-buildbot5/sanitizer-x86_64-linux-fuzzer/build/llvm/tools/clang/lib/Format/ContinuationIndenter.cpp:1066: unsigned int clang::format::ContinuationIndenter::breakProtrudingToken(const clang::format::FormatToken &, clang::format::LineState &, bool): Assertion `NewRemainingTokenColumns < RemainingTokenColumns' failed.

reproducer (base64-encoded):
SCQhJCwxLGNvbnN0ZQx4ciBjaHIzaDJ0IDMqMiAjJCgpABkMLTo9IGdldCxRKiJzdFwwXPSKpKQ6JFxcIg==

You may get more reproducers from the bot:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer

Comment 9 Daniel Jasper 2015-07-20 18:28:26 PDT

Fixed crasher in r242738.

Comment 10 Kostya Serebryany 2016-01-05 10:48:53 PST

The clang-format-fuzzer bot has been mostly green lately,
with only one periodic assertion failure, bug 26032.
I've changed the bot to treat clang-format-fuzzer failures as real ones,
not just warnings.