-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
llvm 3.5 optimizer miscompile regression in little testcase (zdoom is affected by it) #21676
Comments
From a cursory look at the code, I think that it invokes undefined behavior by accessing different members of the union "scripts", e.g:
Here the code writes member "b", making it the active member, and then reads the non-active member "dw". This is undefined behavior in C++ (although not always in C), c.f. http://stackoverflow.com/questions/2310483/. Undefined behavior essentially means the compiler is free to do whatever it wants, c.f. http://people.csail.mit.edu/akcheung/papers/apsys12.html and http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html. Most assume no undefined behavior occurs in order to better optimize the code. Including the asm statement might disable these optimizations. Have you tried cleaning up the code to see if zdoom still crashes? |
Additional info: -fno-strict-aliasing makes the program fine; |
Does compiling the reduced code without -fsanitize=undefined result in a crash? If yes, could you please post it here?
I'm sorry, I don't quite understand. Do you mean the crash does not happen if FindChunk is not called, e.g. the following line (from comment 1) is removed:
(As well as all other lines that call FindChunk.) |
No, since the problem happens in a class member method, I can move this code: BYTE *FBehavior::FindChunk (DWORD id) const
} to another source file. If I move it, the crash does not happen. |
To make it easy the life to devs, I made a patch to apply to the zdoom source code, which:
I get the same problems with these changes. If you move 'FBehavior::FindChunk' definition back to p_acs.cpp, the crash does not happen. Link is here: https://dl.dropboxusercontent.com/u/25860178/zdoom_acs_reduced.diff |
I looked at your patch, and unfortunately it still invokes undefined behavior, for example here: scripts.dw += 2; "scripts.dw" aliases "scripts.w" and "scripts.sw", as they are all fields of the same union. This is undefined behavior in C++. To fix this, the union "scripts" has to be removed and replaced by a pointer to char/unsigned char[1]. It must not be replaced by a pointer to WORD/DWORD, as casting e.g. from WORD* to DWORD* violates C++'s aliasing rules and again is undefined behavior. For example this is invalid code: WORD* pointer_to_WORD = ...; Only casts from char*/unsigned char*[1] to other pointer types are valid[2]. For example this is valid code: char* pointer_to_char = ...; These are valid as well: const char* pointer_to_const_char = ...; and: char* pointer_to_char = ...; See http://stackoverflow.com/questions/98650/. [1] Or BYTE* if BYTE is a typedef for char/unsigned char. |
Ok, removed the 'union' and relied on the intermediary casts to 'BYTE *' (yes, BYTE is a typedef to 'unsigned char'). It still crashes. I updated the diff, available here: https://dl.dropboxusercontent.com/u/25860178/zdoom_acs_reduced.diff |
I managed to reduce a lot both codepaths and dependencies from the big class FBehavior, so you'll get a huge code reduction. The new patch to apply to zdoom source is here: https://dl.dropboxusercontent.com/u/25860178/zdoom_acs_reduced_2.diff . |
generated llvm ir from p_acs_minimal.cpp with -O0 The new patch is here: https://dl.dropboxusercontent.com/u/25860178/zdoom_acs_reduced_3.diff Also, I'm providing the generated .ll file from p_acs_reduced.cpp with this command: clang++ -std=c++11 -fno-rtti -fomit-frame-pointer -O0 -DNDEBUG -S -emit-llvm -o p_acs_minimal.ll -c p_acs_minimal.cpp I tried to do: opt-3.6 -O2 p_acs_minimal.ll -S -o p_acs_minimal_opt2.ll but the result is different from generating the optimized llvm ir from clang++ itself: clang++ -std=c++11 -fno-rtti -fomit-frame-pointer -O2 -DNDEBUG -S -emit-llvm -o p_acs_minimal_opt.ll -c p_acs_minimal.cpp I have no idea what's going on. |
Independent and reduced testcase - main file I noticed that in pr21302.ii, if you exchange the two assignments inside this loop: for (int i = 0; i < n; ++i) the crash goes away. |
Thank you very much for your reduced testcase and your work! I compiled it using clang 3.5.0 on Mac OS X. However, the executable ran without crashing. The command line I used: /opt/local/llvm-3.5.0/bin/clang++ main.ii pr21302.ii The output of clang -v: clang version 3.5.0 (tags/RELEASE_350/final) I also compiled the testcase with -O3, but the executable still ran without crashing. Could you please post the complete command line that clang uses internally when you compile the testcase? You get it by passing the additional option -v. Then clang will output something like: /opt/local/llvm-3.5.0/bin/clang" -cc1 -triple x86_64-apple-macosx10.9.0 [...] (look for the "-cc1") |
Independent and more reduced testcase - main file |
Oops! Now I saw your message. Here: -cc1 -triple x86_64-pc-linux-gnu -emit-obj -disable-free -disable-llvm-verifier -main-file-name main.cc -mrelocation-model static -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -target-linker-version 2.24 -momit-leaf-frame-pointer -v -g -dwarf-column-info -resource-dir /usr/lib/llvm-3.5/bin/../lib/clang/3.5.0 -internal-isystem /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9 -internal-isystem /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/x86_64-linux-gnu/c++/4.9 -internal-isystem /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/x86_64-linux-gnu/c++/4.9 -internal-isystem /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/backward -internal-isystem /usr/local/include -internal-isystem /usr/lib/llvm-3.5/bin/../lib/clang/3.5.0/include -internal-externc-isystem /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -O2 -Weverything -std=c++11 -fdeprecated-macro -fdebug-compilation-dir /home/edward-san/llvm -ferror-limit 19 -fmessage-length 80 -fsanitize=address -mstackrealign -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp |
I can reproduce the miscompile, although my executable doesn't crash, but results in an error message from the address sanitizer. My clang command line: /opt/local/llvm-3.5.0/bin/clang++ -O2 -fsanitize=address main.ii pr21302.ii Executing a.out results in: ==2933==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200000e09c at pc 0x00010a9bde6e bp 0x7fff552439d0 sp 0x7fff552439c8 0x60200000e09c is located 4 bytes to the right of 8-byte region [0x60200000e090,0x60200000e098) This only happens when compiling the code in two separate files. Merging main.ii and pr21302.ii and compiling the result does not result in an error message from the address sanitizer (nor a crash). This also only happens when compiling the separate files with -O2 or -O3. Compiling with -O1, -Os or -Oz or no -O at all does also not result in an error message from the address sanitizer (nor a crash). I can also reproduce this with clang 3.6.0 r219601 (trunk from last Monday). I think the 8-byte region mentioned by the address sanitizer is the single X allocated in f (n is 1 at that point): x = new X[n]; Changing n's type from "int&" to "int" makes the code compile correctly. Removing the call to qsort (which should not be called anyway, because "n" should be 1) also makes the code compile correctly. |
That's it. The 'crash' would happen randomly at zdoom in other places later, but if you compile with -fsanitize=address, the problem is detected before it's spreading :). |
testcase - main file |
Reduced testcase |
Preprocessed testcase Anyways, I can't reproduce the problem if the last element of 'struct X' has type 'int', independently from the type of the first element (I tested char, short, int, long so far). The other combinations will crash. |
I tried to reduce the test case as far as I could (I also tried removing one/both of the loops in "f" or the parameter "m", but that made the program run fine). That's why I did not care for the memory leak. As far as the "return 0" is concerned: It is implicit in main. See the C++ standard §3.6.1p5: | If control reaches the end of main without encountering a return statement, the |
Joerg and me found that r200198 is the cause of this problem. Adding the author to the CC list. |
Looking at the IR processing in detail, the loop vectorizer seems to mishandle the initial IV value of the second loop. As seen in the attached IR: %i2.0 = phi i32 [ %inc9, %for.body5 ], [ %bc.trunc.resume.val50, %scalar.ph39 ] and tracing back %bc.trunc.resume.val50 shows that it is not the expected constant 0, but a modified IV value of the first loop. |
Is it still reproducible? I tried Are any flags missing in my command line? |
This works for me as well with clang r220769. However the code in attachment 13229 still crashes. Diff: % diff all.cpp pr21302.ii
|
Thanks, I can reproduce it now (with all.cpp). |
Can you reproduce it if you reintroduce the 'delete[]' call? |
Yes. The relevant difference seems to be the definition of X. With this definition (from all.cpp minus "unsigned") pr21302.ii miscompiles: struct X pr21302.ii's original definition however does not result in a miscompile: struct X pr21302.ii also miscompiles with these definitions: struct X struct X struct X struct X struct X |
Avoid changing the struct as padding etc will change whether the access is invalid or not. |
Further attempt at bisection points to: commit 6c76835c7546686bdef45553c1b5144543040f05
...with the cleanup from Chandler applied on top. |
Ping. I'd like to see this fixed for the 3.5.1 release. |
I made a reproducer that works without asan (see below). It also shows where we mistakenly touch memory. $ bin/clang -O0 fail.cpp -o a.out && ./a.out; echo "Exit code: $?" $ bin/clang -O2 fail.cpp -o a.out && ./a.out; echo "Exit code: $?" $ cat fail.cpp void g(const X *) {} attribute((noinline)) void f(int m, int &n, X &x) if (m == 0) for (int i = 0; i < n; ++i) if (n > 1) int main() I'll take a closer look at it. |
The problem is that vectorized version performs an additional iteration. That happens because the original loop isn't bottom-tested - it checks the termination condition at the beginning of each iteration. That breaks vectorizer's implicit assumption that all instructions in the loop body are executed the same number of times. The fix is pretty simple, I sent it to maillist [1], and hopefully it will be committed soon. [1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-December/079274.html |
Fixed in trunk in r223170 and r223171. |
Extended Description
Hello, I have a problem with a big open source program called ZDoom.
The problem is that with the release of clang 3.5, the llvm optimizer miscompiles a portion of the code so that the program crashes with suitable input. To reproduce it, follow the instructions:
Download the source with git:
git clone https://github.com/rheit/zdoom.git path/to/zdoom
Before compiling the program, open src/p_acs.cpp and remove this workaround:
// [EP] Clang 3.5.0 optimizer miscompiles this function and causes random
// crashes in the program. I hope that Clang 3.5.x will fix this.
#if defined(clang) && clang_major == 3 && clang_minor >= 5
asm("" : "+g" (NumScripts));
#endif
which prevents the problems.
Then follow the instructions in http://zdoom.org/wiki/Compile_ZDoom_on_Linux
inside the new folder before the section:
http://zdoom.org/wiki/Compile_ZDoom_on_Linux#Compile
After that, do this:
create a folder where to run cmake (release_clang_3.5, whatever you want), then
inside the folder, run:
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo
-DCMAKE_C_COMPILER=path/to/clang-3.5
-DCMAKE_CXX_COMPILER=path/to/clang++-3.5
-DNO_ASM=1
-DVALGRIND=1
-DCMAKE_CXX_FLAGS=-std=c++11 ..
if anything goes fine, run:
make -j4
In order to reproduce the problem, you need to download two files:
freedoom.wad , extracted from the archive file, downloadable at:
https://github.com/freedoom/freedoom/releases/download/v0.9/freedoom-0.9.zip
ZDoomEditDemo_v1_2.pk3, extracted from the archive file, downloadable at:
http://www.mediafire.com/download/w95mcmc83b5z18r/ZDoom_Editing_Demo_v1.2.zip
Save the two files in $HOME/.zdoom .
You have everything, just run (with valgrind too):
./zdoom -iwad freedoom.wad -file ZDoomEditDemo_v1_2.pk3
You should be able get a crash which is not catched by the program and closes
suddenly, if you compiled the program with clang 3.5.
Sometimes the crash happens in other places, and sometimes the crash happens
when closing the program (esc to open the menu -> quit game -> confirm).
you run it with valgrind, you'll get an error which is not present with
clang 3.4 (should be the first one in the error list), then the program works
fine.
If you want to check with address sanitizer, you have to disable asan-globals because for some reason asan does not like data sorting into custom sections.
I can't help you with the debugging other than seeing that the problem happens in that portion of the code where there's the workaround.
Sincerely,
Edward-san
The text was updated successfully, but these errors were encountered: