New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MultiSource/Applications/siod fails on X86-64 with clang 7.0.0-rc1 #37996
Comments
assigned to @jyknight |
This failure was introduced by: r322957. |
Can you attach pre-processed source or IR to repro the problem? |
Pre-processed source clang -DNDEBUG -O2 -DNDEBUG -w -Werror=date-time -D__USE_MISC -D__USE_GNU -D__USE_SVID -D__USE_XOPEN_EXTENDED -D__USE_XOPEN -Dunix -o slib.c.o -c slib.i |
Thanks! To make it easier to A/B test this, I added a flag that disables the optimization that was added with r322957 here: At first glance, that optimization is doing what we expected in _c_mark_and_sweep(): becomes: But there's another diff later: becomes: That seems...odd. cc'ing Craig for ideas about how to reduce/debug this. |
I'm not able to reproduce this error. I downloaded the 7.0-rc1 source and built it on linux, but this is the output i got from the test Content-type: text/plain 10 |
Let's make sure we're compiling the same way - the "-O2 NDEBUG" appears to be necessary (and I'm not sure if that's how test-suite would build this by default). I set up test-suite on Linux x86-64, and I can reproduce the problem, but I'm not seeing that same suspicious asm that I saw before (that was on macOS using the attached source file). Here's how I built/ran the siod benchmark: ~/mysandbox $ ./bin/lnt runtest test-suite --sandbox ~/mysandbox/ --cc ~/myllvm/release/bin/clang --test-suite ~/llvm-test-suite --only-test MultiSource/Applications/siod --use-lit ~/myllvm/release/bin/llvm-lit --use-cmake cmake --build-threads 4 --threads 4 --cflags="-O2 -DNDEBUG -mllvm -x86-and-imm-shrink=0 -mllvm -stats -save-temps" This passes as shown in comment 5. Now, allow mask shrinking: ~/mysandbox $ ./bin/lnt runtest test-suite --sandbox ~/mysandbox/ --cc ~/myllvm/release/bin/clang --test-suite ~/llvm-test-suite --only-test MultiSource/Applications/siod --use-lit ~/myllvm/release/bin/llvm-lit --use-cmake cmake --build-threads 4 --threads 4 --cflags="-O2 -DNDEBUG -mllvm -x86-and-imm-shrink=1 -mllvm -stats -save-temps" This fails as shown in the description. |
I managed to get it to fail with and shrinking enabled, but I got "ERROR: too few arguements" If you modify test.scm to calculate a different fibonacci number, do you find that it stops failing for smaller values? For me the failure doesn't happen calculating the 20th fibonacci number, but fails for 21. |
For me the lowest number that fails for me is 23. |
Reproducer script |
That's what I'm seeing too. There were some unnecessary params in my earlier test-suite invocation. It can be reduced to: ~/mysandbox$ ./bin/lnt runtest test-suite --sandbox ~/mysandbox/ --cc ~/myllvm/release/bin/clang --test-suite ~/llvm-test-suite --only-test MultiSource/Applications/siod --use-lit ~/myllvm/release/bin/llvm-lit --use-cmake cmake --cflags="-O2 -DNDEBUG -mllvm -x86-and-imm-shrink=1" |
Do you think we can fix this for the release? It seems bad if it's a regression from 6.0.1. |
I've looked a little closer and the ands that got optimized changed their mask from 0x7fff'ffff'ffff'fff8 in a register created by a movabsq that was hoisted outside of loop to using an and with a -8 immediate. The instruction proceeding all of the ands was a right shift by 1. So this seems correct to me. But this freed up a register and changed the stack spills and reloads created by the register allocator. The program has some questionable stack memory access behavior. Valgrind reports many conditional operations using uninitialized memory. I haven't checked it with address sanitizer yet. I wonder if the change in spills/reloads changed the behavior. |
The gc_mark_and_sweep function creates a variable named "end_of_stack", but doesn't initialize it. It then passes the address of this variable as an end pointer to a function called mark_locations. The "start" address passed comes from a global variable that points to another uninitialized object in an earlier callers stack frame. The mark_locations code will note that start and end are out of order due to stack growing down and will swap them. Making start then point to the beginning of the end_of_stack object. Then it calls mark_locations_array which will read from start to end looking for pointers. So this will read the uninitialized value from end_of_stack and it will read the return addresses and other spills, etc. along the way. |
Nice detective work! Not sure what the policy is for this sort of thing. Do we resolve this bug as invalid? Are we supposed to flag the test as potentially misbehaving or fix the test itself? |
+1, thanks for digging into this!
At least it means I won't worry about it for 7.0.0, so unblocking :-) I think for the test-suite, it's common to fix the test to not misbehave if possible. If that's not possible / too inconvenient, I suppose we should flag or delete it since it's not adding value. |
One our engineers finally figured out the real problem here https://reviews.llvm.org/D60039 If I remember right from my analysis, Sanjay's patch did change some spill behavior here. I just didn't understand at the time how that would effect things. |
Hi James, Do you think this is OK to merge into 8.0.1? |
Yes, should be fine. |
Merged: r362747 and r362751. |
mentioned in issue #40566 |
Extended Description
This is a regression, this test executes correctly with clang 6.0.1. Running the command:
${TEST_SUITE_BUILDDIR}/MultiSource/Applications/siod/siod -v1 ${TEST_SUITE_SRCDIR}/test-suite/MultiSource/Applications/siod/test.scm
Produces an error:
Content-type: text/plain
10
9
8
7
6
5
4
3
2
1
-42 is negative
0 is zero
42 is positive
the value of (get 'answer 'value) is 42
the value of (get 'answer 'value) is xyzzy
the 33rd Fibonacci number is ERROR: damaged frame (see errobj)
I am going to try to bisect this.
The text was updated successfully, but these errors were encountered: