As part of our testing internally, we do a self-host build of the compiler on linux and run the lit tests to make sure that everything works. Recently this self-host build started failing 7 GWP-ASan tests in the stage 1 build of our self-host build. I bisected the test failure to recent commit b22910daab95be1ebc6ab8a74190e38130b0e6ef. The 7 tests that are failing are: 1. GWP-ASan-x86_64 :: double_delete.cpp 2. GWP-ASan-x86_64 :: double_deletea.cpp 3. GWP-ASan-x86_64 :: double_free.cpp 4. GWP-ASan-x86_64 :: invalid_free_left.cpp 5. GWP-ASan-x86_64 :: invalid_free_right.cpp 6. GWP-ASan-x86_64 :: realloc.cpp 7. GwpAsan-Unittest :: ./GwpAsan-x86_64- Test/BacktraceGuardedPoolAllocator.DoubleFree I was able to reproduce this locally on my linux machine running Ubuntu 20.04 using both Gcc 9.3.0 and clang 10.0.0 to build the initial compiler. Steps to reproduce the failure: 1. Build clang using the system compiler (gcc 9.3 or clang 10 in my case) 2. Use the clang built in step 1 to rebuild clang 3. Using the version of clang built in step 2, run the lit tests The cmake command I used to build was the following: cmake -G Ninja -DLLVM_INCLUDE_EXAMPLES=OFF -DLLVM_VERSION_SUFFIX= -DLLVM_BUILD_RUNTIME=ON -DLLVM_TOOL_LLD_BUILD=OFF "-DLLVM_LIT_ARGS=--verbose -j150" -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_OPTIMIZED_TABLEGEN=ON -DLLVM_TARGETS_TO_BUILD=all "-DLLVM_ENABLE_PROJECTS=clang;compiler-rt" -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ <path to llvm sources> The Unittest that failed produced the following output: FAIL: GwpAsan-Unittest :: ./GwpAsan-x86_64-Test/BacktraceGuardedPoolAllocator.DoubleFree (32710 of 69710) ******************** TEST 'GwpAsan-Unittest :: ./GwpAsan-x86_64-Test/BacktraceGuardedPoolAllocator.DoubleFree' FAILED ******************** Note: Google Test filter = BacktraceGuardedPoolAllocator.DoubleFree [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from BacktraceGuardedPoolAllocator [ RUN ] BacktraceGuardedPoolAllocator.DoubleFree /home/dyung/src/upstream/llvm_clean_git/compiler-rt/lib/gwp_asan/tests/backtrace.cpp:45: Failure Death test: DeallocateMemory2(GPA, Ptr) Result: failed to die. Error msg: [ DEATH ] [ FAILED ] BacktraceGuardedPoolAllocator.DoubleFree (62 ms) [----------] 1 test from BacktraceGuardedPoolAllocator (62 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (69 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] BacktraceGuardedPoolAllocator.DoubleFree 1 FAILED TEST While the regular lit tests that failed all seemed to not produce any output: FileCheck error: '<stdin>' is empty. FileCheck command line: FileCheck /home/dyung/src/upstream/llvm_clean_git/compiler-rt/test/gwp_asan/double_delete.cpp
See ongoing disscussion in https://reviews.llvm.org/D87149 I believe TLDR is that unless these miscompiles are for volatile stores, it is GWP-ASan that needs to be fixed.
Warning: I don't know anything about GWP-ASan, and I'm not experienced in the UB subtleties exposed by my patch... but a first grep of that dir shows the likely problematic function: void GuardedPoolAllocator::trapOnAddress(uintptr_t Address, Error E) { State.FailureType = E; State.FailureAddress = Address; // Raise a SEGV by touching first guard page. volatile char *p = reinterpret_cast<char *>(State.GuardedPagePool); *p = 0; __builtin_unreachable(); } IIUC, we are probably going to change LLVM to allow this usage. But would it be safer/better to use __builtin_trap() or __builtin_debugtrap() in this function?
cc'ing people that I hopefully correctly cross-referenced from: https://reviews.llvm.org/D60593
(In reply to Sanjay Patel from comment #2) > Warning: I don't know anything about GWP-ASan, and I'm not experienced in > the UB subtleties exposed by my patch... > but a first grep of that dir shows the likely problematic function: > > void GuardedPoolAllocator::trapOnAddress(uintptr_t Address, Error E) { > State.FailureType = E; > State.FailureAddress = Address; > > // Raise a SEGV by touching first guard page. > volatile char *p = reinterpret_cast<char *>(State.GuardedPagePool); > *p = 0; > __builtin_unreachable(); > } > > > IIUC, we are probably going to change LLVM to allow this usage. But would it > be safer/better to use __builtin_trap() or __builtin_debugtrap() in this > function? Nice catch, thanks! Using __builtin_trap() would be better indeed. unreachable is quite nasty.
(In reply to Nuno Lopes from comment #4) > (In reply to Sanjay Patel from comment #2) > > Warning: I don't know anything about GWP-ASan, and I'm not experienced in > > the UB subtleties exposed by my patch... > > but a first grep of that dir shows the likely problematic function: > > > > void GuardedPoolAllocator::trapOnAddress(uintptr_t Address, Error E) { > > State.FailureType = E; > > State.FailureAddress = Address; > > > > // Raise a SEGV by touching first guard page. > > volatile char *p = reinterpret_cast<char *>(State.GuardedPagePool); > > *p = 0; > > __builtin_unreachable(); > > } > > > > > > IIUC, we are probably going to change LLVM to allow this usage. But would it > > be safer/better to use __builtin_trap() or __builtin_debugtrap() in this > > function? > > Nice catch, thanks! > Using __builtin_trap() would be better indeed. unreachable is quite nasty. I can confirm that changing the __builtin_unreachable() to __builtin_trap() does seem to fix the problem!
While the "problem" will be address either way, I would generally think `__builtin_trap()` is the better way to cause a trap.
Note that we have at least temporarily restored the failing tests to passing: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/29534 ...with this patch: https://github.com/llvm/llvm-project/commit/4e413e16216d0c94ada2171f3c59e0a85f4fa4b6 But it sounds like changing the GWP-ASan source would be safer.
The volatile store in GWP-ASan is (by designed) supposed to crash the program with a segv at a specific address. The __builtin_unreachable() is there to guide the compiler that it can tail-call optimise this function. It seems to me that the root cause is that the __builtin_unreachable() shouldn't be allowed to dead-code eliminate things with side effects (like the volatile store)...
(In reply to Mitch Phillips from comment #8) > The volatile store in GWP-ASan is (by designed) supposed to crash the > program with a segv at a specific address. > > The __builtin_unreachable() is there to guide the compiler that it can > tail-call optimise this function. > > It seems to me that the root cause is that the __builtin_unreachable() > shouldn't be allowed to dead-code eliminate things with side effects (like > the volatile store)... As we discuss in the thread https://reviews.llvm.org/D87149, the LLVM LangRef needs to be updated to provide the above guarantees. That said, generic "side effects", in the LLVM-IR sense, will *not* be preserved if they are inevitable followed by an unreachable. That is, a regular store to a global for example is a side effect but it won't be preserved. volatile accesses are different and you will probably get what you expect here, we'll update this once the discussion is settled.
(In reply to Johannes Doerfert from comment #9) > (In reply to Mitch Phillips from comment #8) > > The volatile store in GWP-ASan is (by designed) supposed to crash the > > program with a segv at a specific address. > > > > The __builtin_unreachable() is there to guide the compiler that it can > > tail-call optimise this function. > > > > It seems to me that the root cause is that the __builtin_unreachable() > > shouldn't be allowed to dead-code eliminate things with side effects (like > > the volatile store)... > > As we discuss in the thread https://reviews.llvm.org/D87149, the LLVM > LangRef needs to be updated to provide the above guarantees. That said, > generic "side effects", in the LLVM-IR sense, will *not* be preserved if > they are inevitable followed by an unreachable. That is, a regular store to > a global for example is a side effect but it won't be preserved. volatile > accesses are different and you will probably get what you expect here, we'll > update this once the discussion is settled. FYI looks like GWP-ASan still hasn't been fixed, nor did anybody submit an RFC to change LangRef. Chances are, these tests will break again tomorrow, once the temporary patch expires as part of https://reviews.llvm.org/D105338
Sent https://reviews.llvm.org/D105654 to add a workaround in GWP-ASan, but __builtin_unreachable() optimising away code with side effects still needs to be fixed.