ldecod has undefined behavior, fails in x86 jit on linux #1335

llvmbot · 2006-10-22T20:50:39Z


Bugzilla Link	963
Resolution	FIXED
Resolved on	Feb 22, 2010 12:41
Version	trunk
OS	Linux
Attachments	Safe portion of ldecod that bugpoint found., Buggy portion of ldecod that bugpoint found
Reporter	LLVM Bugzilla Contributor
CC	@asl,@lattner

Extended Description

The ldecod test shows a significant difference in the output of its floating
point numbers (~0.2 to ~1.3) compared to GCC. This is most notable on the JIT.
Furthermore the JIT output does not agree with the CBE, and LLC output.

Running "make bugpoint-jit" produces:

*** The following function is being miscompiled: dpb_split_field_cond_true967
You can reproduce the problem with the command line:
lli -load ./bugpoint.safe.bc.cbe.c.so bugpoint.test.bc -i
/proj/llvm/llvm-test-nc/MultiSource/Applications/JM/ldecod/data/test.264 -o
/proj/llvm/llvm-test-nc/MultiSource/Applications/JM/ldecod/data/test_dec.yuv -r
/proj/llvm/llvm-test-nc/MultiSource/Applications/JM/ldecod/data/test_rec.yuv
The shared object was created with:
llc -march=c bugpoint.safe.bc -o temporary.c
gcc -xc temporary.c -O2 -o ./bugpoint.safe.bc.cbe.c.so -shared \
-fno-strict-aliasing

Looking at the bugpoint.safe.bc file, the interesting bit is this block:
cond_true967: ; preds = %newFuncRoot
%tmp974 = getelementptr %struct.StorablePicture* %tmp957.reload, int 0,
uint 30 ; <short***> [#uses=1]
%tmp975 = load short*** %tmp974 ; <short**> [#uses=1]
%tmp977 = getelementptr short** %tmp975, int %tmp675.reload
; <short**> [#uses=1]
%tmp978 = load short** %tmp977 ; <short*> [#uses=1]
%tmp980 = getelementptr short* %tmp978, int %tmp678.reload
; <short*> [#uses=1]
%tmp981 = load short* %tmp980 ; [#uses=1]
%tmp981 = cast short %tmp981 to int ; [#uses=1]
%tmp986 = getelementptr %struct.StorablePicture* %tmp957.reload, int 0,
uint 5, int %tmp981, int 1, int %tmp916.reload ; <long*> [#uses=1]
%tmp987 = load long* %tmp986 ; [#uses=1]
br label %cond_next989.exitStub

which apparently has nothing to do with floating point calcuations.

llvmbot · 2006-10-22T20:57:44Z

Oops. There's a typo in the initial comment. It should say:

Looking at the bugpoint.test.bc file ..

instead of:

Looking at the bugpoint.safe.bc file ..

lattner · 2006-10-30T21:38:16Z

Hi Reid, thanks for filing this. In order to debug this, please do the following steps:

run with lli in GDB with -debug-only=jit:
gdb --args lli -debug-only=jit -load ./bugpoint.safe.bc.cbe.c.so bugpoint.test.bc -i
/proj/llvm/llvm-test-nc/MultiSource/Applications/JM/ldecod/data/test.264 -o
/proj/llvm/llvm-test-nc/MultiSource/Applications/JM/ldecod/data/test_dec.yuv -r
/proj/llvm/llvm-test-nc/MultiSource/Applications/JM/ldecod/data/test_rec.yuv
Set a breakpoint at JITEmitter.cpp:856, run until it.
You should hit it twice, and it should print out information about the start/size of each function jit'd.
For each one, please tell gdb "disassemble start start+size" Since size (IIRC) is printed in hex, give it a
0x prefix to gdb.

Please attach the output of #3 along with the output of llc on the test .bc file to this bug.

Thanks,

-Chris

llvmbot · 2006-11-02T09:42:43Z

Output from gdb session on ldecod in JIT as Chris Suggested

llvmbot · 2006-11-02T09:44:48Z

JIT Code:
Dump of assembler code from 0xb64038 to 0xb6409b:
0x00b64038: sub $0xc,%esp
0x00b6403b: mov %esi,0x8(%esp)
0x00b6403f: mov %edi,0x4(%esp)
0x00b64043: mov %ebx,(%esp)
0x00b64046: mov 0x1c(%esp),%eax
0x00b6404a: mov 0x14(%esp),%ecx
0x00b6404e: mov 0x20(%esp),%edx
0x00b64052: mov 0x18(%esp),%esi
0x00b64056: mov 0x10(%esp),%edi
0x00b6405a: jmp 0xb64073
0x00b6405f: mov %ecx,0x4(%edx)
0x00b64062: mov %eax,(%edx)
0x00b64064: mov (%esp),%ebx
0x00b64067: mov 0x4(%esp),%edi
0x00b6406b: mov 0x8(%esp),%esi
0x00b6406f: add $0xc,%esp
0x00b64072: ret
0x00b64073: mov 0x4d5e8(%eax),%ebx
0x00b64079: mov (%ebx,%ecx,4),%ecx
0x00b6407c: movswl (%ecx,%edi,2),%ecx
0x00b64080: imul $0x630,%ecx,%ecx
0x00b64086: add %ecx,%eax
0x00b64088: mov 0x120(%eax,%esi,8),%ecx
0x00b6408f: mov 0x11c(%eax,%esi,8),%eax
0x00b64096: jmp 0xb6405f
End of assembler dump.

llvmbot · 2006-11-02T09:46:07Z

LLC Generated Code:
.text
.align 16
.globl dpb_split_field_cond_true967
dpb_split_field_cond_true967:
subl $12, %esp
movl %esi, 8(%esp)
movl %edi, 4(%esp)
movl %ebx, (%esp)
movl 28(%esp), %eax
movl 20(%esp), %ecx
movl 32(%esp), %edx
movl 24(%esp), %esi
movl 16(%esp), %edi
.BB1_2: #cond_true967
movl 316904(%eax), %ebx
movl (%ebx,%ecx,4), %ecx
movswl (%ecx,%edi,2), %ecx
imull $1584, %ecx, %ecx
addl %ecx, %eax
movl 288(%eax,%esi,8), %ecx
movl 284(%eax,%esi,8), %eax
.BB1_1: #cond_next989.exitStub
movl %ecx, 4(%edx)
movl %eax, (%edx)
movl (%esp), %ebx
movl 4(%esp), %edi
movl 8(%esp), %esi
addl $12, %esp
ret
.size dpb_split_field_cond_true967, .-dpb_split_field_cond_true967

    .align  16
    .globl  main

main:
subl $12, %esp
fnstcw 10(%esp)
movb $2, 11(%esp)
fldcw 10(%esp)
movl 20(%esp), %eax
movl %eax, 4(%esp)
movl 16(%esp), %eax
movl %eax, (%esp)
call llvm_bugpoint_old_main
addl $12, %esp
ret
.size main, .-main

llvmbot · 2006-11-02T09:48:33Z

LLVM Assembly For Test Function

llvmbot · 2006-11-02T09:50:36Z

Bytecode for test function
Note: I reran bugpoint and slightly different bytecode was reduced by it than
the original run. The failure is the same (fp precision). That's why I've
attached this new version of the bytecode.

llvmbot · 2006-11-02T10:26:34Z

The difference between the gdb disassembly and llc output makes no sense. There is whole chunk of
code missing from the llc version:

0x00b6405a: jmp 0xb64073
0x00b6405f: mov %ecx,0x4(%edx)
0x00b64062: mov %eax,(%edx)
0x00b64064: mov (%esp),%ebx
0x00b64067: mov 0x4(%esp),%edi
0x00b6406b: mov 0x8(%esp),%esi
0x00b6406f: add $0xc,%esp
0x00b64072: ret

That shouldn't happen.

Not that it should make that kind of difference. But Reid, are you generating the llc output with -
relocation-mode=static?

llvmbot · 2006-11-02T18:51:27Z

Here's the llc -relocation-model=static code:
.text
.align 16
.globl dpb_split_field_cond_true967
dpb_split_field_cond_true967:
subl $12, %esp
movl %esi, 8(%esp)
movl %edi, 4(%esp)
movl %ebx, (%esp)
movl 28(%esp), %eax
movl 20(%esp), %ecx
movl 32(%esp), %edx
movl 24(%esp), %esi
movl 16(%esp), %edi
.BB1_2: #cond_true967
movl 316904(%eax), %ebx
movl (%ebx,%ecx,4), %ecx
movswl (%ecx,%edi,2), %ecx
imull $1584, %ecx, %ecx
addl %ecx, %eax
movl 288(%eax,%esi,8), %ecx
movl 284(%eax,%esi,8), %eax
.BB1_1: #cond_next989.exitStub
movl %ecx, 4(%edx)
movl %eax, (%edx)
movl (%esp), %ebx
movl 4(%esp), %edi
movl 8(%esp), %esi
addl $12, %esp
ret
.size dpb_split_field_cond_true967, .-dpb_split_field_cond_true967

    .align  16
    .globl  main

main:
subl $12, %esp
fnstcw 10(%esp)
movb $2, 11(%esp)
fldcw 10(%esp)
movl 20(%esp), %eax
movl %eax, 4(%esp)
movl 16(%esp), %eax
movl %eax, (%esp)
call llvm_bugpoint_old_main
addl $12, %esp
ret
.size main, .-main

The only difference I see is the additional block and jump to it in the lli
version of the code.

llvmbot · 2006-11-02T19:14:38Z

I went through this carefully. The only differences are:

The LLC version puts an "l" at the end of the mnemonics. I'm assuming this is
just a difference between gdb and llc. However, if they truly are different
instruction codes, perhaps that's an issue.
Ignoring #1, the JIT code is identical to the LLC code except for the extra
unconditional branch. While the LLC code executes straight through with this:

.BB1_2: #cond_true967
movl 316904(%eax), %ebx
movl (%ebx,%ecx,4), %ecx
movswl (%ecx,%edi,2), %ecx
imull $1584, %ecx, %ecx
addl %ecx, %eax
movl 288(%eax,%esi,8), %ecx
movl 284(%eax,%esi,8), %eax
.BB1_1: #cond_next989.exitStub

the JIT code branches to do the same thing:
0x00b6405a: jmp 0xb64073 # Branch down to 4073
0x00b6405f: mov %ecx,0x4(%edx)
... snip ...
0x00b64073: mov 0x4d5e8(%eax),%ebx
0x00b64079: mov (%ebx,%ecx,4),%ecx
0x00b6407c: movswl (%ecx,%edi,2),%ecx
0x00b64080: imul $0x630,%ecx,%ecx
0x00b64086: add %ecx,%eax
0x00b64088: mov 0x120(%eax,%esi,8),%ecx
0x00b6408f: mov 0x11c(%eax,%esi,8),%eax
0x00b64096: jmp 0xb6405f #Branch back to 405f

Those are the only differences. Furthermore, I don't see how its affecting
floating point output. Did bugpoint find the wrong bug?

lattner · 2006-11-03T04:03:39Z

Evan, please look into this and try to determine if it's bogus or not. If not, I'd like to have this fixed for
the release (we branch on monday) but if it can't be fixed/figured out, we should just close it.

Thanks,

-Chris

llvmbot · 2006-11-07T01:22:43Z

I can't really figure out why there is an extra block of code under JIT case. So this is a "can't fix / figured
out" for 1.9. Should I close it?

llvmbot · 2006-11-07T01:28:48Z

No, you shouldn't. This is the reason why all the ldecod tests have been failing
on x86/Linux. It needs to be figured out. We can hold it over as a known
(unknown?) issue for the release, however.

llvmbot · 2006-11-07T01:32:00Z

An afterthought:

You may be missing the point. The "extra block" doesn't matter. The LLC and JIT
code are identical except JIT branches to that block and branches back. Weird,
but no big deal. The real problem is that the values produced by this code do
not conform to the values produced by llc, cbe or gcc. Here's last nights
nightly test result:

******************** TEST (jit) 'ldecod' FAILED! ********************
Execution Context Diff:
/proj/llvm/nightly/build/llvm/Debug/bin/fpcmp: Compared: 3.535600e+00 and
4.472200e+00: diff = 2.094271e-01
Out of tolerance: rel/abs: 2.000000e-02/0.000000e+00
******************** TEST (jit) 'ldecod' ****************************

I will attach the jit and nat output.

llvmbot · 2006-11-07T01:33:27Z

Native Output
This is what hte JIT should produce.

llvmbot · 2006-11-07T01:33:59Z

JIT Output
This is what the JIT actually produces.

llvmbot · 2006-11-07T01:35:21Z

Another thought:

The difference in output may not be the result of any JIT'd code at all, it
might be in the ExecutionEngine or some other factor.

lattner · 2007-01-07T10:57:12Z

is this still an issue?

llvmbot · 2007-01-07T18:51:50Z

Yes. I tried to fix it but I can't detect where it is producing differeing
binary output.

lattner · 2007-01-08T03:43:39Z

Anton, if you get a chance, can you see if this reproduces for you?

asl · 2007-01-08T16:47:56Z

Ok. Will try within next 3-4 days.

asl · 2007-01-18T20:23:05Z

JIT fails for me too.

lattner · 2007-01-24T20:33:02Z

Reid/Anton, can you see if this still fails with llvm/llvm-bugzilla-archive#1130 fixed?

Thanks,

-Chris

asl · 2007-02-03T22:30:31Z

It is still failing for me

asl · 2007-02-04T01:34:24Z

If we're enable trace feature of ldecod (define.h => TRACE define) the output
for all binaries (native, llc, cbe, lli) is the same.

lattner · 2007-02-04T01:35:46Z

have you tried running the native version of ldecod under valgrind?

asl · 2007-02-04T16:40:46Z

Fixed with

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043880.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043881.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043883.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043882.html

Reopen bug in case of any problems.

lattner · 2007-02-04T20:56:05Z

To be clear, the x86-jit failed because the program was reading uninitialized memory, so it was a bug in
the program, not llvm.

asl · 2007-02-04T21:00:34Z

Well, to be clearer :) Program reads uninitialized memory, because provided
input data was wrong: sequence to decode contained 15 frames, but reference
sequence (to compare with) - just 2. So, there were 2 bugs: absence of size
checking in the program and bad input in the testcase :)

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ldecod has undefined behavior, fails in x86 jit on linux #1335

ldecod has undefined behavior, fails in x86 jit on linux #1335

llvmbot commented Oct 22, 2006

llvmbot commented Oct 22, 2006

lattner commented Oct 30, 2006

llvmbot commented Nov 2, 2006

llvmbot commented Nov 2, 2006

llvmbot commented Nov 2, 2006

llvmbot commented Nov 2, 2006

llvmbot commented Nov 2, 2006

llvmbot commented Nov 2, 2006

llvmbot commented Nov 2, 2006

llvmbot commented Nov 2, 2006

lattner commented Nov 3, 2006

llvmbot commented Nov 7, 2006

llvmbot commented Nov 7, 2006

llvmbot commented Nov 7, 2006

llvmbot commented Nov 7, 2006

llvmbot commented Nov 7, 2006

llvmbot commented Nov 7, 2006

lattner commented Jan 7, 2007

llvmbot commented Jan 7, 2007

lattner commented Jan 8, 2007

asl commented Jan 8, 2007

asl commented Jan 18, 2007

lattner commented Jan 24, 2007

asl commented Feb 3, 2007

asl commented Feb 4, 2007

lattner commented Feb 4, 2007

asl commented Feb 4, 2007

lattner commented Feb 4, 2007

asl commented Feb 4, 2007

ldecod has undefined behavior, fails in x86 jit on linux #1335

ldecod has undefined behavior, fails in x86 jit on linux #1335

Comments

llvmbot commented Oct 22, 2006

Extended Description

llvmbot commented Oct 22, 2006

lattner commented Oct 30, 2006

llvmbot commented Nov 2, 2006

llvmbot commented Nov 2, 2006

llvmbot commented Nov 2, 2006

llvmbot commented Nov 2, 2006

llvmbot commented Nov 2, 2006

llvmbot commented Nov 2, 2006

llvmbot commented Nov 2, 2006

llvmbot commented Nov 2, 2006

lattner commented Nov 3, 2006

llvmbot commented Nov 7, 2006

llvmbot commented Nov 7, 2006

llvmbot commented Nov 7, 2006

llvmbot commented Nov 7, 2006

llvmbot commented Nov 7, 2006

llvmbot commented Nov 7, 2006

lattner commented Jan 7, 2007

llvmbot commented Jan 7, 2007

lattner commented Jan 8, 2007

asl commented Jan 8, 2007

asl commented Jan 18, 2007

lattner commented Jan 24, 2007

asl commented Feb 3, 2007

asl commented Feb 4, 2007

lattner commented Feb 4, 2007

asl commented Feb 4, 2007

lattner commented Feb 4, 2007

asl commented Feb 4, 2007