Fuzz llvm-as #25013

llvmbot · 2015-08-31T19:23:16Z


Bugzilla Link	24639
Version	trunk
OS	Linux
Depends On	#25018 #25031 #25035 #25014 #25019 #25020 #25030 #25036
Reporter	LLVM Bugzilla Contributor
CC	@kcc,@jfbastien

llvmbot · 2015-08-31T19:25:26Z

This bug is to track bugs found when fuzzing llvm-as. Dependent bugs are bugs found by afl-fuzz, or a lib/Fuzzer version of llvm-as.

llvmbot · 2015-08-31T20:00:51Z

Committed lib/Fuzzer version of llvm-as (i.e. llvm-as-fuzzer). Revision: http://reviews.llvm.org/rL246458

kcc · 2015-08-31T23:11:24Z

The bot (http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer)
is running llvm-as-fuzzer with -only_ascci=1 and the default max_len (=64).
It uses the build with assertions.

Currently, it fails instantly due to bug 24640

The corpus for the bot is https://github.com/kcc/fuzzing-with-sanitizers/tree/master/llvm/llvm-as/C1
which was generated by the fuzzer from scratch, with no seed corpus.

Later on we may want to extend the corpus with more valid inputs
and increase the max_len.

kcc · 2015-09-01T19:09:30Z

I've extended the corpus by adding all .ll files from the llmv test suite
and stripping ;.*
It instantly increased the coverage by 2x, still growing

Hitting several asserts so far, e.g.

lib/IR/Globals.cpp:209: void llvm::GlobalVariable::setInitializer(llvm::Constant *): Assertion `InitVal->getType() == getType()->getElementType() && "Initializer type must match GlobalVariable type"' failed.

llvm/include/llvm/Support/Casting.h:237: typename cast_retty<X, Y *>::ret_type llvm::cast(Y *) [X = llvm::Function, Y = llvm::GlobalValue]: Assertion `isa(Val) && "cast() argument of incompatible type!"' failed.

kcc · 2015-09-01T20:41:43Z

The current llvm-as-fuzzer.cpp leaks a bit of memory every time
llvm::report_fatal_error is called:

Direct leak of 76 byte(s) in 1 object(s) allocated from:
#0 0x4d978b in operator new(unsigned long) /usr/local/google/home/kcc/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:62:35
#1 0x7f079bda8668 in _gnu_cxx::new_allocator::allocate(unsigned long, void const*) /usr/local/google/home/kcc/tmp/gcc-4.8.2/build/x86_64-unknown-linux-gnu/libstdc++-v3/include/ext/new»
#2 0x7f079bda8668 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) /usr/local/google/home/kcc/tmp/gcc-4.8.2/build/x86_64-unknown-linux-gnu/libstdc++-»
#3 0xc70021 in llvm::report_fatal_error(llvm::Twine const&, bool) /usr/local/google/home/kcc/llvm/lib/Support/ErrorHandling.cpp:85:26
#4 0xc6fdd5 in llvm::report_fatal_error(char const*, bool) /usr/local/google/home/kcc/llvm/lib/Support/ErrorHandling.cpp:62:3
#5 0xb48101 in llvm::DataLayout::parseSpecifier(llvm::StringRef) /usr/local/google/home/kcc/llvm/lib/IR/DataLayout.cpp:330:11
...

This is because the code in llvm::report_fatal_error calls this:
handler(handlerData, Reason.str(), GenCrashDiag);

void MyFatalErrorHandler(void *user_data, const std::string& reason,
bool gen_crash_diag) {

We create a string object and the long-jump over it's DTOR.

This is not a blocker at this point -- we can always restart the process after processing e.g. 100M units w/o losing much speed, but ideally we need to fix this.

kcc · 2015-09-01T22:24:31Z

With max_len=512 I start seeing more interesting things, like bug 24661

kcc · 2015-09-02T17:33:49Z

After a night of fuzzing I've got ~15 assertion unique failures:

llvm/include/llvm/IR/DebugInfoMetadata.h:60: llvm::TypedDINodeRefllvm::DIType::TypedDINodeRef(const llvm::Metadata *) [T = llvm::DIType]: Assertion (!MD || isa<MDString>(MD) || isa<T>(MD)) && "Expected valid ref"' failed. llvm/include/llvm/IR/Instructions.h:1099: void llvm::ICmpInst::AssertOK(): Assertion getOperand(0)->getType() == getOperand(1)->getType() && "Both operands to ICmp instruction are not of the same type!"' failed.
llvm/include/llvm/IR/Metadata.h:938: const llvm::MDOperand &llvm::MDNode::getOperand(unsigned int) const: Assertion I < NumOperands && "Out of range"' failed. llvm/include/llvm/IR/Type.h:97: void llvm::Type::setSubclassData(unsigned int): Assertion getSubclassData() == val && "Subclass data too large for field"' failed.
llvm/include/llvm/Support/Casting.h:237: typename cast_retty<X, Y *>::ret_type llvm::cast(Y *) [X = llvm::DICompileUnit, Y = llvm::MDNode]: Assertion isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed. llvm/include/llvm/Support/Casting.h:237: typename cast_retty<X, Y *>::ret_type llvm::cast(Y *) [X = llvm::Function, Y = llvm::Constant]: Assertion isa(Val) && "cast() argument of incompatible type!"' failed.
llvm/include/llvm/Support/Casting.h:237: typename cast_retty<X, Y *>::ret_type llvm::cast(Y *) [X = llvm::Function, Y = llvm::GlobalValue]: Assertion isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed. llvm/include/llvm/Support/Casting.h:251: typename std::enable_if<!is_simple_type<Y>::value, typename cast_retty<X, const Y>::ret_type>::type llvm::cast_or_null(const Y &) [X = llvm::DIType, Y = llvm::MDOperand]: Assertion isa(Val) && "cast_or_null() argument of incompatible type!"' failed.
llvm/include/llvm/Support/Casting.h:269: typename cast_retty<X, Y *>::ret_type llvm::cast_or_null(Y *) [X = llvm::MDTuple, Y = llvm::Metadata]: Assertion isa<X>(Val) && "cast_or_null<Ty>() argument of incompatible type!"' failed. llvm/lib/IR/Attributes.cpp:1210: llvm::AttrBuilder &llvm::AttrBuilder::addAlignmentAttr(unsigned int): Assertion isPowerOf2_32(Align) && "Alignment must be a power of two."' failed.
llvm/lib/IR/Attributes.cpp:833: llvm::AttributeSet llvm::AttributeSet::removeAttributes(llvm::LLVMContext &, unsigned int, llvm::AttributeSet) const: Assertion !Attrs.hasAttribute(Index, Attribute::Alignment) && "Attempt to change alignment!"' failed. llvm/lib/IR/ConstantRange.cpp:49: llvm::ConstantRange::ConstantRange(APIntMoveTy, APIntMoveTy): Assertion (Lower != Upper || (Lower.isMaxValue() || Lower.isMinValue())) && "Lower == Upper, but they aren't min or max value!"' failed.
llvm/lib/IR/Globals.cpp:209: void llvm::GlobalVariable::setInitializer(llvm::Constant *): Assertion InitVal->getType() == getType()->getElementType() && "Initializer type must match GlobalVariable type"' failed. llvm/lib/IR/Value.cpp:60: llvm::Value::Value(llvm::Type *, unsigned int): Assertion (VTy->isFirstClassType() || VTy->isVoidTy()) && "Cannot create non-first-class values except for constants!"' failed.
llvm/lib/IR/ValueTypes.cpp:184: llvm::Type *llvm::EVT::getTypeForEVT(llvm::LLVMContext &) const: Assertion `isExtended() && "Type is not extended!"' failed.

kcc · 2015-09-02T19:21:28Z

base64-encoded reproducers

After a night of fuzzing I've got ~15 assertion unique failures:

Attached are 12 base64-encoded reproducers for 12 different assertion failures.
We'll need to fix these (as well as bug 24661 and bug 24662)
to make further fuzzing more efficient.

kcc · 2015-09-04T02:20:24Z

I've added a dictionary mode to libFuzzer, similar to that in AFL.
http://llvm.org/docs/LibFuzzer.html#dictionaries
Once we get all currently known bugs in llvm-as fixed we may try that
big hummer too.

kcc · 2015-09-09T20:54:06Z

There is a problem with the current llvm-as-fuzzer -- it accumulates some
global state and as the result it finds multiple irrelevant (parasitic)
inputs every time it runs.

One part of the problem is that we use the global context.
It's easy to fix (by creating a new context for every unit),
but it does not seem to fix the problem completely.
Investigating.

kcc · 2015-09-10T02:42:12Z

parasitic-coverage-repro
attached patch adds a reproducer for parasitic coverage.
ninja llvm-as-parasitic-coverage-repro
for i in 2 3 10 50 100 1000; do ASAN_OPTIONS=verbosity=1:coverage=1 bin/llvm-as-parasitic-coverage-repro $i 2>&1 | grep PCs; done
==10485== CovDump: ./llvm-as-parasitic-coverage-repro.10485.sancov: 2637 PCs written
==10488== CovDump: ./llvm-as-parasitic-coverage-repro.10488.sancov: 2637 PCs written
==10491== CovDump: ./llvm-as-parasitic-coverage-repro.10491.sancov: 2637 PCs written
==10494== CovDump: ./llvm-as-parasitic-coverage-repro.10494.sancov: 2641 PCs written
==10498== CovDump: ./llvm-as-parasitic-coverage-repro.10498.sancov: 2644 PCs written
==10501== CovDump: ./llvm-as-parasitic-coverage-repro.10501.sancov: 2644 PCs written

The more times we parse the same input the more coverage we get.
After some time it saturates.

The new coverage comes from e.g. here:
#1 0x0000000000d486e0 in clear () at include/llvm/ADT/SmallVector.h:373
#2 clear () at include/llvm/ADT/FoldingSet.h:325
#3 FindNodeOrInsertPos () at lib/Support/FoldingSet.cpp:317
#4 0x000000000073c8c8 in FindNodeOrInsertPos () at include/llvm/ADT/FoldingSet.h:460
#5 getImpl () at lib/IR/Attributes.cpp:613
#6 0x000000000073dea3 in get () at lib/IR/Attributes.cpp:660
#7 0x000000000074038f in get () at lib/IR/Attributes.cpp:717
#8 0x00000000008671d0 in getAttributes () at include/llvm/IR/Intrinsics.gen:52574
#9 0x0000000000866b69 in Function () at lib/IR/Function.cpp:270
#10 0x000000000052b2fa in Create () at include/llvm/IR/Function.h:124
#11 ParseFunctionHeader () at lib/AsmParser/LLParser.cpp:4459
#12 0x00000000004fd744 in ParseDeclare () at lib/AsmParser/LLParser.cpp:399
#13 ParseTopLevelEntities () at lib/AsmParser/LLParser.cpp:216

So far I failed to understand why....

kcc · 2015-09-25T05:10:55Z

r248556 effectively disables llvm-as-fuzzer on the fuzzer bot because
a) there are too many known crashes that need to be fixed first,
and
b) too much garbage inputs are created due to parasitic coverage (see #10 and #11)

Karl, I wonder if you have time to work on b)?

llvmbot · 2015-09-25T18:43:54Z

r248556 effectively disables llvm-as-fuzzer on the fuzzer bot because
a) there are too many known crashes that need to be fixed first,
and
b) too much garbage inputs are created due to parasitic coverage (see #10
and #11)

Karl, I wonder if you have time to work on b)?

I'll look into both, and will focus on the parasitic coverage.

llvmbot · 2015-09-28T21:45:08Z

I believe the problem is due to a FoldingSet used to hold intrinsic functions. A folding set is a hashtable implemented on a SmallVector, using intrusive links to define the list of elements in a bucket.

Every time a function (declaration/definition) is parsed, a lookup is done (in the create method), it either returns a pointer (in the hashtable) to the corresponding definition, or it creates a spot for a new one, and returns the corresponding pointer so that it can be initialized.

When parsing is done, the table is cleared (but not resized). Hence, on the second parse, the table need not be increased since the first parse caused it to grow.

This explains the first growth, found in comment #11, but doesn't for the second. Still looking into this.

llvmbot · 2015-10-01T19:24:24Z

I looked at the parasitic growth in comment #11 some more. I'm convinced that the problem is that the "type signature" is encoded as part of the key in the folding set, and the "type signature" is a pointer to a type (which have been uniqued). This uniquifying of types happens on each iteration, using heap allocated elements. As a result, different iterations will have different hash values, which will cause collisions to randomly occur. As a result, the bucket lists change between iterations.

I don't see a way to get rid of this effect. It is ingrained in the guts of LLVM to use dynamically allocated type addresses to uniquely identify types.

kcc · 2015-10-01T19:28:14Z

I don't see a way to get rid of this effect. It is ingrained in the guts of
LLVM to use dynamically allocated type addresses to uniquely identify types.

I am ignorant about this part of LLVM: can you point to the exact code & objects?
Are these objects a real global state, or they are part of LLVM context?

llvmbot · 2015-10-01T20:01:00Z

Its in the LLVMContext (I think).

When constructor Function::Function() is called, somehow field IntID is set, and this value then causes a lookup. It then calls lookupIntrinsicID (in function.cpp). This lookup then looks up (in some table) that is a folding set for the intrinsic. This table has entries that contain other things than just intrinsics. The elements are VERY generic in FoldingSet, and all I know is that other elements vary on contents between iterations.

I'm guessing this is some sort of symbol table, and I'm guessing that these other entries have types. However, I haven't convinced myself of that because:

I really don't know what folding set I am in.
The generic nodeequals doesn't match these elements, and just skips over them.
The elements that are skipped I haven't figured out where they are added.

BTW, it is very hard to figure out what is going on because much of the driving code is automatically generated code, and the abstractions are totally gone when you look at the low-level code that is actually being run in the debugger.

llvmbot · 2015-10-01T23:08:24Z

I looked a bit deeper, and I think the cause of the problem is due to the uniquifying of "attribute sets". The method AttributeSetNode::get(LLVMContext &C, ArrayRef Attrs) looks for the attribute set in C.pImpl to see if the attribute set is already defined. If it is found, it returns the existing, Otherwise it creates a new (heap allocated) AttributeSetNode.

The problem is how nodes are "profiled" (i.e. the Profile method used to compute the hash on the unique bits of the data). The class AttributeSetImpl defines Profile in terms of method getNode, which returns a pair<unsigned, AttributeSetNode *> for each element. The ID is computed by adding these two values using ID.addInteger() and ID.addPointer().

If you look up the definition of method FoldingSetNodeID::AddPointer(const void* Ptr) it simply adds the pointer to the sequence of bits in the Node ID.

Hence, depending on dynamic allocation, you will get different hash values. This explains why we get different hash tables on different iterations.

llvmbot · 2021-11-26T20:47:45Z

mentioned in issue #25014

llvmbot · 2021-11-26T20:47:53Z

mentioned in issue #25018

llvmbot · 2021-11-26T20:47:54Z

mentioned in issue #25019

llvmbot · 2021-11-26T20:47:55Z

mentioned in issue #25020

kcc · 2021-11-26T20:48:12Z

mentioned in issue #25031

llvmbot · 2021-11-26T20:48:12Z

mentioned in issue #25030

kcc · 2021-11-26T20:48:18Z

mentioned in issue #25035

kcc · 2021-11-26T20:48:20Z

mentioned in issue #25036

llvmbot mentioned this issue Sep 1, 2015

Assertion in GlobalVariable::setInitializer fires on .ll file #25030

Closed

This was referenced Sep 1, 2015

llvm-as: Assertion `isExtended() && "Type is not extended!"' failed. #25031

Closed

heap-buffer-overflow in llvm::InlineAsm::ConstraintInfo::Parse #25035

Open

stack-buffer-overflow in llvm::APInt::tcExtract #25036

Closed

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021

llvmbot added the confirmed Verified by a second party label Jan 26, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fuzz llvm-as #25013

Fuzz llvm-as #25013

llvmbot commented Aug 31, 2015

llvmbot commented Aug 31, 2015

llvmbot commented Aug 31, 2015

kcc commented Aug 31, 2015

kcc commented Sep 1, 2015

kcc commented Sep 1, 2015

kcc commented Sep 1, 2015

kcc commented Sep 2, 2015

kcc commented Sep 2, 2015

kcc commented Sep 4, 2015

kcc commented Sep 9, 2015

kcc commented Sep 10, 2015

kcc commented Sep 25, 2015

llvmbot commented Sep 25, 2015

llvmbot commented Sep 28, 2015

llvmbot commented Oct 1, 2015

kcc commented Oct 1, 2015

llvmbot commented Oct 1, 2015

llvmbot commented Oct 1, 2015

llvmbot commented Nov 26, 2021

llvmbot commented Nov 26, 2021

llvmbot commented Nov 26, 2021

llvmbot commented Nov 26, 2021

kcc commented Nov 26, 2021

llvmbot commented Nov 26, 2021

kcc commented Nov 26, 2021

kcc commented Nov 26, 2021

Fuzz llvm-as #25013

Fuzz llvm-as #25013

Comments

llvmbot commented Aug 31, 2015

llvmbot commented Aug 31, 2015

llvmbot commented Aug 31, 2015

kcc commented Aug 31, 2015

kcc commented Sep 1, 2015

kcc commented Sep 1, 2015

kcc commented Sep 1, 2015

kcc commented Sep 2, 2015

kcc commented Sep 2, 2015

kcc commented Sep 4, 2015

kcc commented Sep 9, 2015

kcc commented Sep 10, 2015

kcc commented Sep 25, 2015

llvmbot commented Sep 25, 2015

llvmbot commented Sep 28, 2015

llvmbot commented Oct 1, 2015

kcc commented Oct 1, 2015

llvmbot commented Oct 1, 2015

llvmbot commented Oct 1, 2015

llvmbot commented Nov 26, 2021

llvmbot commented Nov 26, 2021

llvmbot commented Nov 26, 2021

llvmbot commented Nov 26, 2021

kcc commented Nov 26, 2021

llvmbot commented Nov 26, 2021

kcc commented Nov 26, 2021

kcc commented Nov 26, 2021