38648 – MultiSource/Applications/siod fails on X86-64 with clang 7.0.0-rc1

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 38648 - MultiSource/Applications/siod fails on X86-64 with clang 7.0.0-rc1

Summary: MultiSource/Applications/siod fails on X86-64 with clang 7.0.0-rc1

Status:	RESOLVED FIXED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Backend: X86 (show other bugs)
Version:	7.0
Hardware:	PC Linux

Importance:	P enhancement
Assignee:	James Y Knight

URL:
Keywords:

Depends on:
Blocks:	release-8.0.1
	Show dependency tree

Reported:	2018-08-20 13:13 PDT by Tom Stellard
Modified:	2019-06-06 14:22 PDT (History)
CC List:	5 users (show)

See Also:
Fixed By Commit(s):	r357986 r362747 r362751

Attachments
Pre-processed source (202.15 KB, text/plain) 2018-08-22 13:56 PDT, Tom Stellard	Details
Reproducer script (1.69 KB, text/plain) 2018-08-27 18:06 PDT, Tom Stellard	Details
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Tom Stellard 2018-08-20 13:13:57 PDT

This is a regression, this test executes correctly with clang 6.0.1.  Running the command:

${TEST_SUITE_BUILDDIR}/MultiSource/Applications/siod/siod -v1 ${TEST_SUITE_SRCDIR}/test-suite/MultiSource/Applications/siod/test.scm

Produces an error:

Content-type: text/plain

10
9
8
7
6
5
4
3
2
1
-42 is negative
0 is zero
42 is positive
the value of (get 'answer 'value) is 42
the value of (get 'answer 'value) is xyzzy
the 33rd Fibonacci number is ERROR: damaged frame (see errobj)

I am going to try to bisect this.

Comment 1 Tom Stellard 2018-08-21 17:36:18 PDT

This failure was introduced by: r322957.

Comment 2 Sanjay Patel 2018-08-22 11:05:33 PDT

Can you attach pre-processed source or IR to repro the problem?

Comment 3 Tom Stellard 2018-08-22 13:56:14 PDT

Created attachment 20757 [details]
Pre-processed source

Here is the pre-processed source.  Here are the compile arguments that are used:

clang -DNDEBUG  -O2 -DNDEBUG   -w -Werror=date-time -D__USE_MISC -D__USE_GNU -D__USE_SVID -D__USE_XOPEN_EXTENDED -D__USE_XOPEN -Dunix -o slib.c.o   -c slib.i

Comment 4 Sanjay Patel 2018-08-23 16:11:28 PDT

(In reply to Tom Stellard from comment #3)
> Created attachment 20757 [details]
> Pre-processed source
> 
> Here is the pre-processed source.  Here are the compile arguments that are
> used:
> 
> clang -DNDEBUG  -O2 -DNDEBUG   -w -Werror=date-time -D__USE_MISC -D__USE_GNU
> -D__USE_SVID -D__USE_XOPEN_EXTENDED -D__USE_XOPEN -Dunix -o slib.c.o   -c
> slib.i

Thanks! 

To make it easier to A/B test this, I added a flag that disables the optimization that was added with r322957 here:
https://reviews.llvm.org/rL340540

At first glance, that optimization is doing what we expected in _c_mark_and_sweep():
	movabsq	$9223372036854775800, %r14 ## imm = 0x7FFFFFFFFFFFFFF8
	shrq	%rdx
	andq	%r14, %rdx

becomes:
	andq	$-8, %rdx ; avoided a giant constant and saved a register

But there's another diff later:
	movabsq	$-6148914691236517205, %r10 ## imm = 0xAAAAAAAAAAAAAAAB

becomes:
	movq	_heaps@GOTPCREL(%rip), %r10

That seems...odd. cc'ing Craig for ideas about how to reduce/debug this.

Comment 5 Craig Topper 2018-08-25 18:35:20 PDT

I'm not able to reproduce this error. I downloaded the 7.0-rc1 source and built it on linux, but this is the output i got from the test

Content-type: text/plain

10
9
8
7
6
5
4
3
2
1
-42 is negative
0 is zero
42 is positive
the value of (get 'answer 'value) is 42
the value of (get 'answer 'value) is xyzzy
the 33rd Fibonacci number is 3.52458e+06

Comment 6 Sanjay Patel 2018-08-27 12:48:40 PDT

Let's make sure we're compiling the same way - the "-O2 NDEBUG" appears to be necessary (and I'm not sure if that's how test-suite would build this by default). 

I set up test-suite on Linux x86-64, and I can reproduce the problem, but I'm not seeing that same suspicious asm that I saw before (that was on macOS using the attached source file).

Here's how I built/ran the siod benchmark:

~/mysandbox $ ./bin/lnt runtest test-suite --sandbox ~/mysandbox/ --cc ~/myllvm/release/bin/clang --test-suite ~/llvm-test-suite --only-test MultiSource/Applications/siod  --use-lit ~/myllvm/release/bin/llvm-lit --use-cmake cmake --build-threads 4 --threads 4 --cflags="-O2 -DNDEBUG -mllvm -x86-and-imm-shrink=0 -mllvm -stats -save-temps"

This passes as shown in comment 5. 

Now, allow mask shrinking:

~/mysandbox $ ./bin/lnt runtest test-suite --sandbox ~/mysandbox/ --cc ~/myllvm/release/bin/clang --test-suite ~/llvm-test-suite --only-test MultiSource/Applications/siod  --use-lit ~/myllvm/release/bin/llvm-lit --use-cmake cmake --build-threads 4 --threads 4 --cflags="-O2 -DNDEBUG -mllvm -x86-and-imm-shrink=1 -mllvm -stats -save-temps"

This fails as shown in the description.

Comment 7 Craig Topper 2018-08-27 17:32:13 PDT

I managed to get it to fail with and shrinking enabled, but I got "ERROR: too few arguements"

If you modify test.scm to calculate a different fibonacci number, do you find that it stops failing for smaller values? For me the failure doesn't happen calculating the 20th fibonacci number, but fails for 21.

Comment 8 Tom Stellard 2018-08-27 18:05:17 PDT

(In reply to Craig Topper from comment #7)
> I managed to get it to fail with and shrinking enabled, but I got "ERROR:
> too few arguements"
> 
> If you modify test.scm to calculate a different fibonacci number, do you
> find that it stops failing for smaller values? For me the failure doesn't
> happen calculating the 20th fibonacci number, but fails for 21.

For me the lowest number that fails for me is 23.

Comment 9 Tom Stellard 2018-08-27 18:06:19 PDT

Created attachment 20784 [details]
Reproducer script

Here is the script I've been using to reproduce.  You may have to adjust some of the paths if you haven't run cmake.

Comment 10 Sanjay Patel 2018-08-29 07:53:50 PDT

(In reply to Tom Stellard from comment #8)
> (In reply to Craig Topper from comment #7)
> > I managed to get it to fail with and shrinking enabled, but I got "ERROR:
> > too few arguements"
> > 
> > If you modify test.scm to calculate a different fibonacci number, do you
> > find that it stops failing for smaller values? For me the failure doesn't
> > happen calculating the 20th fibonacci number, but fails for 21.
> 
> For me the lowest number that fails for me is 23.

That's what I'm seeing too.

There were some unnecessary params in my earlier test-suite invocation. It can be reduced to:

~/mysandbox$ ./bin/lnt runtest test-suite --sandbox ~/mysandbox/ --cc ~/myllvm/release/bin/clang --test-suite ~/llvm-test-suite --only-test MultiSource/Applications/siod  --use-lit ~/myllvm/release/bin/llvm-lit --use-cmake cmake --cflags="-O2 -DNDEBUG -mllvm -x86-and-imm-shrink=1"

Comment 11 Hans Wennborg 2018-09-04 06:23:56 PDT

Do you think we can fix this for the release? It seems bad if it's a regression from 6.0.1.

Comment 12 Craig Topper 2018-09-06 09:29:11 PDT

I've looked a little closer and the ands that got optimized changed their mask from 0x7fff'ffff'ffff'fff8 in a register created by a movabsq that was hoisted outside of loop to using an and with a -8 immediate. The instruction proceeding all of the ands was a right shift by 1. So this seems correct to me.

But this freed up a register and changed the stack spills and reloads created by the register allocator.

The program has some questionable stack memory access behavior. Valgrind reports many conditional operations using uninitialized memory. I haven't checked it with address sanitizer yet. I wonder if the change in spills/reloads changed the behavior.

Comment 13 Craig Topper 2018-09-06 10:32:19 PDT

The gc_mark_and_sweep function creates a variable named "end_of_stack", but doesn't initialize it. It then passes the address of this variable as an end pointer to a function called mark_locations. The "start" address passed comes from a global variable that points to another uninitialized object in an earlier callers stack frame. The mark_locations code will note that start and end are out of order due to stack growing down and will swap them. Making start then point to the beginning of the end_of_stack object. Then it calls mark_locations_array which will read from start to end looking for pointers. So this will read the uninitialized value from end_of_stack and it will read the return addresses and other spills, etc. along the way.

Comment 14 Sanjay Patel 2018-09-07 07:11:57 PDT

(In reply to Craig Topper from comment #13)
> The gc_mark_and_sweep function creates a variable named "end_of_stack", but
> doesn't initialize it. It then passes the address of this variable as an end
> pointer to a function called mark_locations. The "start" address passed
> comes from a global variable that points to another uninitialized object in
> an earlier callers stack frame. The mark_locations code will note that start
> and end are out of order due to stack growing down and will swap them.
> Making start then point to the beginning of the end_of_stack object. Then it
> calls mark_locations_array which will read from start to end looking for
> pointers. So this will read the uninitialized value from end_of_stack and it
> will read the return addresses and other spills, etc. along the way.

Nice detective work!

Not sure what the policy is for this sort of thing. Do we resolve this bug as invalid? Are we supposed to flag the test as potentially misbehaving or fix the test itself?

Comment 15 Hans Wennborg 2018-09-07 07:50:15 PDT

(In reply to Sanjay Patel from comment #14)
> (In reply to Craig Topper from comment #13)
> > The gc_mark_and_sweep function creates a variable named "end_of_stack", but
> > doesn't initialize it. It then passes the address of this variable as an end
> > pointer to a function called mark_locations. The "start" address passed
> > comes from a global variable that points to another uninitialized object in
> > an earlier callers stack frame. The mark_locations code will note that start
> > and end are out of order due to stack growing down and will swap them.
> > Making start then point to the beginning of the end_of_stack object. Then it
> > calls mark_locations_array which will read from start to end looking for
> > pointers. So this will read the uninitialized value from end_of_stack and it
> > will read the return addresses and other spills, etc. along the way.
> 
> Nice detective work!

+1, thanks for digging into this!

> 
> Not sure what the policy is for this sort of thing. Do we resolve this bug
> as invalid? Are we supposed to flag the test as potentially misbehaving or
> fix the test itself?

At least it means I won't worry about it for 7.0.0, so unblocking :-)

I think for the test-suite, it's common to fix the test to not misbehave if possible. If that's not possible / too inconvenient, I suppose we should flag or delete it since it's not adding value.

Comment 16 Craig Topper 2019-04-01 17:44:24 PDT

One our engineers finally figured out the real problem here https://reviews.llvm.org/D60039

If I remember right from my analysis, Sanjay's patch did change some spill behavior here. I just didn't understand at the time how that would effect things.

Comment 17 Tom Stellard 2019-04-25 10:52:54 PDT

Hi James,

Do you think this is OK to merge into 8.0.1?

Comment 18 James Y Knight 2019-04-25 11:46:20 PDT

Yes, should be fine.

Comment 19 Tom Stellard 2019-06-06 14:22:34 PDT

Merged: r362747 and r362751.