Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

llvm 3.5 optimizer miscompile regression in little testcase (zdoom is affected by it) #21676

Closed
llvmbot opened this issue Oct 16, 2014 · 37 comments
Labels
bugzilla Issues migrated from bugzilla

Comments

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 16, 2014

Bugzilla Link 21302
Resolution FIXED
Resolved on Dec 02, 2014 19:04
Version unspecified
OS Linux
Reporter LLVM Bugzilla Contributor
CC @chandlerc,@hfinkel,@jsonn,@nadavrot,@TNorthover

Extended Description

Hello, I have a problem with a big open source program called ZDoom.

The problem is that with the release of clang 3.5, the llvm optimizer miscompiles a portion of the code so that the program crashes with suitable input. To reproduce it, follow the instructions:

Download the source with git:
git clone https://github.com/rheit/zdoom.git path/to/zdoom

Before compiling the program, open src/p_acs.cpp and remove this workaround:

// [EP] Clang 3.5.0 optimizer miscompiles this function and causes random
// crashes in the program. I hope that Clang 3.5.x will fix this.
#if defined(clang) && clang_major == 3 && clang_minor >= 5
asm("" : "+g" (NumScripts));
#endif

which prevents the problems.

Then follow the instructions in http://zdoom.org/wiki/Compile_ZDoom_on_Linux
inside the new folder before the section:
http://zdoom.org/wiki/Compile_ZDoom_on_Linux#Compile

After that, do this:

create a folder where to run cmake (release_clang_3.5, whatever you want), then
inside the folder, run:

cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo
-DCMAKE_C_COMPILER=path/to/clang-3.5
-DCMAKE_CXX_COMPILER=path/to/clang++-3.5
-DNO_ASM=1
-DVALGRIND=1
-DCMAKE_CXX_FLAGS=-std=c++11 ..

if anything goes fine, run:

make -j4

In order to reproduce the problem, you need to download two files:

freedoom.wad , extracted from the archive file, downloadable at:
https://github.com/freedoom/freedoom/releases/download/v0.9/freedoom-0.9.zip

ZDoomEditDemo_v1_2.pk3, extracted from the archive file, downloadable at:
http://www.mediafire.com/download/w95mcmc83b5z18r/ZDoom_Editing_Demo_v1.2.zip

Save the two files in $HOME/.zdoom .

You have everything, just run (with valgrind too):

./zdoom -iwad freedoom.wad -file ZDoomEditDemo_v1_2.pk3

You should be able get a crash which is not catched by the program and closes
suddenly, if you compiled the program with clang 3.5.

Sometimes the crash happens in other places, and sometimes the crash happens
when closing the program (esc to open the menu -> quit game -> confirm).

you run it with valgrind, you'll get an error which is not present with
clang 3.4 (should be the first one in the error list), then the program works
fine.

If you want to check with address sanitizer, you have to disable asan-globals because for some reason asan does not like data sorting into custom sections.

I can't help you with the debugging other than seeing that the problem happens in that portion of the code where there's the workaround.

Sincerely,
Edward-san

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 17, 2014

From a cursory look at the code, I think that it invokes undefined behavior by accessing different members of the union "scripts", e.g:

// Load script flags
scripts.b = FindChunk (MAKE_ID('S','F','L','G'));
if (scripts.dw != NULL)

Here the code writes member "b", making it the active member, and then reads the non-active member "dw". This is undefined behavior in C++ (although not always in C), c.f. http://stackoverflow.com/questions/2310483/. Undefined behavior essentially means the compiler is free to do whatever it wants, c.f. http://people.csail.mit.edu/akcheung/papers/apsys12.html and http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html. Most assume no undefined behavior occurs in order to better optimize the code. Including the asm statement might disable these optimizations.

Have you tried cleaning up the code to see if zdoom still crashes?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 18, 2014

Additional info:

-fno-strict-aliasing makes the program fine;
-fsanitize=undefined to a reduced code makes the program fine;
the crash does not happen if the FBehavior::FindChunk function is not defined.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 18, 2014

Does compiling the reduced code without -fsanitize=undefined result in a crash? If yes, could you please post it here?

the crash does not happen if the FBehavior::FindChunk function is not defined.

I'm sorry, I don't quite understand. Do you mean the crash does not happen if FindChunk is not called, e.g. the following line (from comment 1) is removed:

scripts.b = FindChunk (MAKE_ID('S','F','L','G'));

(As well as all other lines that call FindChunk.)

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 18, 2014

Does compiling the reduced code without -fsanitize=undefined result in a crash? If yes, could you please post it here?
Post what? It's random...

I'm sorry, I don't quite understand. Do you mean the crash does not happen if FindChunk is not called, e.g. the following line (from comment 1) is removed:

scripts.b = FindChunk (MAKE_ID('S','F','L','G'));

(As well as all other lines that call FindChunk.)

No, since the problem happens in a class member method, I can move this code:

BYTE *FBehavior::FindChunk (DWORD id) const
{
BYTE *chunk = Chunks;

while (chunk != NULL && chunk < Data + DataSize)
{
	if (((DWORD *)chunk)[0] == id)
	{
		return chunk;
	}
	chunk += ((DWORD *)chunk)[1] + 8;
}
return NULL;

}

to another source file. If I move it, the crash does not happen.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 18, 2014

To make it easy the life to devs, I made a patch to apply to the zdoom source code, which:

  • moves the 'bad' code to a file called p_acs_minimal.cpp,
  • adds the file to CMakeLists.txt,
  • removes the undefined behavior mentioned in this post by:
    -- changing the functions FBehavior::FindChunk and FBehavior::NextChunk to have as input DWORD *,
    -- removing 'b' from the 'scripts' union, hence it uses 'dw'.

I get the same problems with these changes. If you move 'FBehavior::FindChunk' definition back to p_acs.cpp, the crash does not happen.

Link is here: https://dl.dropboxusercontent.com/u/25860178/zdoom_acs_reduced.diff

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 19, 2014

I looked at your patch, and unfortunately it still invokes undefined behavior, for example here:

scripts.dw += 2;
for (i = max; i > 0; --i, scripts.w += 2)
{
ScriptPtr *ptr = const_cast<ScriptPtr *>(FindScript (LittleShort(scripts.sw[0])));

"scripts.dw" aliases "scripts.w" and "scripts.sw", as they are all fields of the same union. This is undefined behavior in C++.

To fix this, the union "scripts" has to be removed and replaced by a pointer to char/unsigned char[1]. It must not be replaced by a pointer to WORD/DWORD, as casting e.g. from WORD* to DWORD* violates C++'s aliasing rules and again is undefined behavior. For example this is invalid code:

WORD* pointer_to_WORD = ...;
DWORD* pointer_to_DWORD = reinterpret_cast<DWORD*>(pointer_to_WORD);

Only casts from char*/unsigned char*[1] to other pointer types are valid[2]. For example this is valid code:

char* pointer_to_char = ...;
DWORD* pointer_to_DWORD = reinterpret_cast<DWORD*>(pointer_to_char);

These are valid as well:

const char* pointer_to_const_char = ...;
const DWORD* pointer_to_const_DWORD = reinterpret_cast<const DWORD*>(pointer_to_const_char)

and:

char* pointer_to_char = ...;
const DWORD* pointer_to_const_DWORD = reinterpret_cast<const DWORD*>(pointer_to_char)

See http://stackoverflow.com/questions/98650/.

[1] Or BYTE* if BYTE is a typedef for char/unsigned char.
[2] This is slightly simplified. See the C++ Standard §3.10p10 for the gory details.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 19, 2014

Ok, removed the 'union' and relied on the intermediary casts to 'BYTE *' (yes, BYTE is a typedef to 'unsigned char'). It still crashes.

I updated the diff, available here: https://dl.dropboxusercontent.com/u/25860178/zdoom_acs_reduced.diff

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 20, 2014

I managed to reduce a lot both codepaths and dependencies from the big class FBehavior, so you'll get a huge code reduction.

The new patch to apply to zdoom source is here: https://dl.dropboxusercontent.com/u/25860178/zdoom_acs_reduced_2.diff .

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 20, 2014

generated llvm ir from p_acs_minimal.cpp with -O0
I managed to remove the 'X' struct from the reduced code, I changed the problematic function to have parameters.

The new patch is here: https://dl.dropboxusercontent.com/u/25860178/zdoom_acs_reduced_3.diff

Also, I'm providing the generated .ll file from p_acs_reduced.cpp with this command:

clang++ -std=c++11 -fno-rtti -fomit-frame-pointer -O0 -DNDEBUG -S -emit-llvm -o p_acs_minimal.ll -c p_acs_minimal.cpp

I tried to do:

opt-3.6 -O2 p_acs_minimal.ll -S -o p_acs_minimal_opt2.ll

but the result is different from generating the optimized llvm ir from clang++ itself:

clang++ -std=c++11 -fno-rtti -fomit-frame-pointer -O2 -DNDEBUG -S -emit-llvm -o p_acs_minimal_opt.ll -c p_acs_minimal.cpp

I have no idea what's going on.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 20, 2014

Independent and reduced testcase - main file
I managed to create an executable without the big zdoom program. I'm attaching the two preprocessed files needed to reproduce the crash. main.ii calls the function where it crashes, which is in pr21302.ii .

I noticed that in pr21302.ii, if you exchange the two assignments inside this loop:

for (int i = 0; i < n; ++i)
{
x_s[i].f = 0;
x_s[i].e = 20;
}

the crash goes away.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 20, 2014

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 20, 2014

Thank you very much for your reduced testcase and your work! I compiled it using clang 3.5.0 on Mac OS X. However, the executable ran without crashing.

The command line I used:

/opt/local/llvm-3.5.0/bin/clang++ main.ii pr21302.ii

The output of clang -v:

clang version 3.5.0 (tags/RELEASE_350/final)
Target: x86_64-apple-darwin13.4.0

I also compiled the testcase with -O3, but the executable still ran without crashing.

Could you please post the complete command line that clang uses internally when you compile the testcase? You get it by passing the additional option -v. Then clang will output something like:

/opt/local/llvm-3.5.0/bin/clang" -cc1 -triple x86_64-apple-macosx10.9.0 [...]

(look for the "-cc1")

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 20, 2014

Independent and more reduced testcase - main file
I managed to remove 'struct Y' and the size of 'struct X_s', renamed 'struct X'. If both files are merged, the crash does not happen.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 20, 2014

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 20, 2014

Could you please post the complete command line that clang uses internally when you compile the testcase?

Oops! Now I saw your message. Here:

-cc1 -triple x86_64-pc-linux-gnu -emit-obj -disable-free -disable-llvm-verifier -main-file-name main.cc -mrelocation-model static -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -target-linker-version 2.24 -momit-leaf-frame-pointer -v -g -dwarf-column-info -resource-dir /usr/lib/llvm-3.5/bin/../lib/clang/3.5.0 -internal-isystem /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9 -internal-isystem /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/x86_64-linux-gnu/c++/4.9 -internal-isystem /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/x86_64-linux-gnu/c++/4.9 -internal-isystem /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/../../../../include/c++/4.9/backward -internal-isystem /usr/local/include -internal-isystem /usr/lib/llvm-3.5/bin/../lib/clang/3.5.0/include -internal-externc-isystem /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -O2 -Weverything -std=c++11 -fdeprecated-macro -fdebug-compilation-dir /home/edward-san/llvm -ferror-limit 19 -fmessage-length 80 -fsanitize=address -mstackrealign -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 20, 2014

I can reproduce the miscompile, although my executable doesn't crash, but results in an error message from the address sanitizer. My clang command line:

/opt/local/llvm-3.5.0/bin/clang++ -O2 -fsanitize=address main.ii pr21302.ii

Executing a.out results in:

==2933==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200000e09c at pc 0x00010a9bde6e bp 0x7fff552439d0 sp 0x7fff552439c8
WRITE of size 2 at 0x60200000e09c thread T0
#​0 0x10a9bde6d in f(int, int&, X*&) (/Users/rynnsauer/#21302 /./a.out+0x100001e6d)
#​1 0x10a9bd6fe in main (/Users/rynnsauer/#21302 /./a.out+0x1000016fe)
#​2 0x7fff82e8b5fc in start (/usr/lib/system/libdyld.dylib+0x35fc)
#​3 0x0 ()

0x60200000e09c is located 4 bytes to the right of 8-byte region [0x60200000e090,0x60200000e098)
allocated by thread T0 here:
#​0 0x10a9f05db in wrap__Znam (/opt/local/llvm-3.5.0/lib/clang/3.5.0/lib/darwin/libclang_rt.asan_osx_dynamic.dylib+0x2d5db)
#​1 0x10a9bd882 in f(int, int&, X*&) (/Users/rynnsauer/#21302 /./a.out+0x100001882)
#​2 0x10a9bd6fe in main (/Users/rynnsauer/#21302 /./a.out+0x1000016fe)
#​3 0x7fff82e8b5fc in start (/usr/lib/system/libdyld.dylib+0x35fc)
#​4 0x0 ()

This only happens when compiling the code in two separate files. Merging main.ii and pr21302.ii and compiling the result does not result in an error message from the address sanitizer (nor a crash).

This also only happens when compiling the separate files with -O2 or -O3. Compiling with -O1, -Os or -Oz or no -O at all does also not result in an error message from the address sanitizer (nor a crash).

I can also reproduce this with clang 3.6.0 r219601 (trunk from last Monday).

I think the 8-byte region mentioned by the address sanitizer is the single X allocated in f (n is 1 at that point):

x = new X[n];

Changing n's type from "int&" to "int" makes the code compile correctly.

Removing the call to qsort (which should not be called anyway, because "n" should be 1) also makes the code compile correctly.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 20, 2014

That's it. The 'crash' would happen randomly at zdoom in other places later, but if you compile with -fsanitize=address, the problem is detected before it's spreading :).

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 21, 2014

testcase - main file
Joerg reduced the testcase even more. Reporting for who didn't follow IRC discussion.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 21, 2014

testcase - crashing file

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 21, 2014

Reduced testcase
I reduced the testcase to a single file by preventing the inlining of "f" and also removed some lines that were unneeded. Causes a miscompile with -O2 and -faddress-sanitizer with both clang 3.5.0 and 3.6.0 r220277 (trunk from this morning), but not for example with -Os.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 22, 2014

Preprocessed testcase
I just noticed the new file, but why did you remove 'return 0;' from main and the deallocation of 'x'? The leak sanitizer reports an error when the program runs fine.

Anyways, I can't reproduce the problem if the last element of 'struct X' has type 'int', independently from the type of the first element (I tested char, short, int, long so far). The other combinations will crash.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 22, 2014

I just noticed the new file, but why did you remove 'return 0;' from main
and the deallocation of 'x'? The leak sanitizer reports an error when the
program runs fine.

I tried to reduce the test case as far as I could (I also tried removing one/both of the loops in "f" or the parameter "m", but that made the program run fine). That's why I did not care for the memory leak.

As far as the "return 0" is concerned: It is implicit in main. See the C++ standard §3.6.1p5:

| If control reaches the end of main without encountering a return statement, the
| effect is that of executing return 0;

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 23, 2014

Joerg and me found that r200198 is the cause of this problem. Adding the author to the CC list.

@jsonn
Copy link
Contributor

jsonn commented Oct 24, 2014

@jsonn
Copy link
Contributor

jsonn commented Oct 24, 2014

@jsonn
Copy link
Contributor

jsonn commented Oct 24, 2014

Looking at the IR processing in detail, the loop vectorizer seems to mishandle the initial IV value of the second loop. As seen in the attached IR:

%i2.0 = phi i32 [ %inc9, %for.body5 ], [ %bc.trunc.resume.val50, %scalar.ph39 ]

and tracing back %bc.trunc.resume.val50 shows that it is not the expected constant 0, but a modified IV value of the first loop.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 30, 2014

Is it still reproducible?

I tried
clang++ -O2 -fsanitize=address ~/Downloads/pr21302.ii -o a.out && ./a.out
and it worked for me.

Are any flags missing in my command line?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 31, 2014

clang++ -O2 -fsanitize=address ~/Downloads/pr21302.ii -o a.out && ./a.out
and it worked for me.

This works for me as well with clang r220769. However the code in attachment 13229 still crashes. Diff:

% diff all.cpp pr21302.ii
3,4c3,4
< unsigned int a;
< unsigned short b;

char a;
int b;
7c7
< void g(const X *) {}


attribute((noinline)) void g(const X *) {}
30a31
delete[] x;
37a39
return 0;

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 31, 2014

Thanks, I can reproduce it now (with all.cpp).

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 1, 2014

Thanks, I can reproduce it now (with all.cpp).

Can you reproduce it if you reintroduce the 'delete[]' call?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 1, 2014

Thanks, I can reproduce it now (with all.cpp).

Can you reproduce it if you reintroduce the 'delete[]' call?

Yes. The relevant difference seems to be the definition of X. With this definition (from all.cpp minus "unsigned") pr21302.ii miscompiles:

struct X
{
int a;
short b;
};

pr21302.ii's original definition however does not result in a miscompile:

struct X
{
char a;
int b;
};

pr21302.ii also miscompiles with these definitions:

struct X
{
int a;
char b;
};

struct X
{
short a;
char b;
};

struct X
{
short a;
short b;
};

struct X
{
char a;
char b;
};

struct X
{
char a;
short b;
};

@jsonn
Copy link
Contributor

jsonn commented Nov 2, 2014

Avoid changing the struct as padding etc will change whether the access is invalid or not.

@jsonn
Copy link
Contributor

jsonn commented Nov 4, 2014

Further attempt at bisection points to:

commit 6c76835c7546686bdef45553c1b5144543040f05
Author: Nadav Rotem nrotem@apple.com
Date: Tue Sep 3 21:33:17 2013 +0000

Enable late-vectorization by default.
This patch changes the default setting for the LateVectorization flag that controls where the loop-vectorizer is ran.

...with the cleanup from Chandler applied on top.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 12, 2014

Ping. I'd like to see this fixed for the 3.5.1 release.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 21, 2014

I made a reproducer that works without asan (see below). It also shows where we mistakenly touch memory.

$ bin/clang -O0 fail.cpp -o a.out && ./a.out; echo "Exit code: $?"
0000001111111111
Exit code: 0

$ bin/clang -O2 fail.cpp -o a.out && ./a.out; echo "Exit code: $?"
0000001111110011
Exit code: 254

$ cat fail.cpp
#include <stdio.h>
struct X
{
unsigned int a;
unsigned short b;
};

void g(const X *) {}
char array[16];

attribute((noinline)) void f(int m, int &n, X &x)
{
n /= 8;
x = (X
)&array[0];

if (m == 0)
{
for (int i = 0; i < n; ++i)
{
x[i].a = 0;
}
}

for (int i = 0; i < n; ++i)
{
x[i].b = 0;
}

if (n > 1)
{
g(x);
}
}

int main()
{
int n = 8;
X *x = 0;
for (int i = 0; i < 16; i++)
array[i] = 1;
f( 0, n, x );
for (int i = 0; i < 16; i++)
printf("%d", array[i]);
printf("\n");
return array[12] + array[13]-2;
}

I'll take a closer look at it.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Dec 3, 2014

The problem is that vectorized version performs an additional iteration. That happens because the original loop isn't bottom-tested - it checks the termination condition at the beginning of each iteration. That breaks vectorizer's implicit assumption that all instructions in the loop body are executed the same number of times.

The fix is pretty simple, I sent it to maillist [1], and hopefully it will be committed soon.

[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-December/079274.html

@llvmbot
Copy link
Collaborator Author

llvmbot commented Dec 3, 2014

Fixed in trunk in r223170 and r223171.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 9, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

2 participants