Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ldecod has undefined behavior, fails in x86 jit on linux #1335

Closed
llvmbot opened this issue Oct 22, 2006 · 29 comments
Closed

ldecod has undefined behavior, fails in x86 jit on linux #1335

llvmbot opened this issue Oct 22, 2006 · 29 comments
Labels
backend:X86 bugzilla Issues migrated from bugzilla miscompilation

Comments

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 22, 2006

Bugzilla Link 963
Resolution FIXED
Resolved on Feb 22, 2010 12:41
Version trunk
OS Linux
Attachments Safe portion of ldecod that bugpoint found., Buggy portion of ldecod that bugpoint found
Reporter LLVM Bugzilla Contributor
CC @asl,@lattner

Extended Description

The ldecod test shows a significant difference in the output of its floating
point numbers (~0.2 to ~1.3) compared to GCC. This is most notable on the JIT.
Furthermore the JIT output does not agree with the CBE, and LLC output.

Running "make bugpoint-jit" produces:

*** The following function is being miscompiled: dpb_split_field_cond_true967
You can reproduce the problem with the command line:
lli -load ./bugpoint.safe.bc.cbe.c.so bugpoint.test.bc -i
/proj/llvm/llvm-test-nc/MultiSource/Applications/JM/ldecod/data/test.264 -o
/proj/llvm/llvm-test-nc/MultiSource/Applications/JM/ldecod/data/test_dec.yuv -r
/proj/llvm/llvm-test-nc/MultiSource/Applications/JM/ldecod/data/test_rec.yuv
The shared object was created with:
llc -march=c bugpoint.safe.bc -o temporary.c
gcc -xc temporary.c -O2 -o ./bugpoint.safe.bc.cbe.c.so -shared \
-fno-strict-aliasing

Looking at the bugpoint.safe.bc file, the interesting bit is this block:
cond_true967: ; preds = %newFuncRoot
%tmp974 = getelementptr %struct.StorablePicture* %tmp957.reload, int 0,
uint 30 ; <short***> [#uses=1]
%tmp975 = load short*** %tmp974 ; <short**> [#uses=1]
%tmp977 = getelementptr short** %tmp975, int %tmp675.reload
; <short**> [#uses=1]
%tmp978 = load short** %tmp977 ; <short*> [#uses=1]
%tmp980 = getelementptr short* %tmp978, int %tmp678.reload
; <short*> [#uses=1]
%tmp981 = load short* %tmp980 ; [#uses=1]
%tmp981 = cast short %tmp981 to int ; [#uses=1]
%tmp986 = getelementptr %struct.StorablePicture* %tmp957.reload, int 0,
uint 5, int %tmp981, int 1, int %tmp916.reload ; <long*> [#uses=1]
%tmp987 = load long* %tmp986 ; [#uses=1]
br label %cond_next989.exitStub

which apparently has nothing to do with floating point calcuations.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 22, 2006

Oops. There's a typo in the initial comment. It should say:

Looking at the bugpoint.test.bc file ..

instead of:

Looking at the bugpoint.safe.bc file ..

@lattner
Copy link
Collaborator

lattner commented Oct 30, 2006

Hi Reid, thanks for filing this. In order to debug this, please do the following steps:

  1. run with lli in GDB with -debug-only=jit:
    gdb --args lli -debug-only=jit -load ./bugpoint.safe.bc.cbe.c.so bugpoint.test.bc -i
    /proj/llvm/llvm-test-nc/MultiSource/Applications/JM/ldecod/data/test.264 -o
    /proj/llvm/llvm-test-nc/MultiSource/Applications/JM/ldecod/data/test_dec.yuv -r
    /proj/llvm/llvm-test-nc/MultiSource/Applications/JM/ldecod/data/test_rec.yuv

  2. Set a breakpoint at JITEmitter.cpp:856, run until it.

  3. You should hit it twice, and it should print out information about the start/size of each function jit'd.
    For each one, please tell gdb "disassemble start start+size" Since size (IIRC) is printed in hex, give it a
    0x prefix to gdb.

Please attach the output of #​3 along with the output of llc on the test .bc file to this bug.

Thanks,

-Chris

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 2, 2006

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 2, 2006

JIT Code:
Dump of assembler code from 0xb64038 to 0xb6409b:
0x00b64038: sub $0xc,%esp
0x00b6403b: mov %esi,0x8(%esp)
0x00b6403f: mov %edi,0x4(%esp)
0x00b64043: mov %ebx,(%esp)
0x00b64046: mov 0x1c(%esp),%eax
0x00b6404a: mov 0x14(%esp),%ecx
0x00b6404e: mov 0x20(%esp),%edx
0x00b64052: mov 0x18(%esp),%esi
0x00b64056: mov 0x10(%esp),%edi
0x00b6405a: jmp 0xb64073
0x00b6405f: mov %ecx,0x4(%edx)
0x00b64062: mov %eax,(%edx)
0x00b64064: mov (%esp),%ebx
0x00b64067: mov 0x4(%esp),%edi
0x00b6406b: mov 0x8(%esp),%esi
0x00b6406f: add $0xc,%esp
0x00b64072: ret
0x00b64073: mov 0x4d5e8(%eax),%ebx
0x00b64079: mov (%ebx,%ecx,4),%ecx
0x00b6407c: movswl (%ecx,%edi,2),%ecx
0x00b64080: imul $0x630,%ecx,%ecx
0x00b64086: add %ecx,%eax
0x00b64088: mov 0x120(%eax,%esi,8),%ecx
0x00b6408f: mov 0x11c(%eax,%esi,8),%eax
0x00b64096: jmp 0xb6405f
End of assembler dump.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 2, 2006

LLC Generated Code:
.text
.align 16
.globl dpb_split_field_cond_true967
dpb_split_field_cond_true967:
subl $12, %esp
movl %esi, 8(%esp)
movl %edi, 4(%esp)
movl %ebx, (%esp)
movl 28(%esp), %eax
movl 20(%esp), %ecx
movl 32(%esp), %edx
movl 24(%esp), %esi
movl 16(%esp), %edi
.BB1_2: #cond_true967
movl 316904(%eax), %ebx
movl (%ebx,%ecx,4), %ecx
movswl (%ecx,%edi,2), %ecx
imull $1584, %ecx, %ecx
addl %ecx, %eax
movl 288(%eax,%esi,8), %ecx
movl 284(%eax,%esi,8), %eax
.BB1_1: #cond_next989.exitStub
movl %ecx, 4(%edx)
movl %eax, (%edx)
movl (%esp), %ebx
movl 4(%esp), %edi
movl 8(%esp), %esi
addl $12, %esp
ret
.size dpb_split_field_cond_true967, .-dpb_split_field_cond_true967

    .align  16
    .globl  main

main:
subl $12, %esp
fnstcw 10(%esp)
movb $2, 11(%esp)
fldcw 10(%esp)
movl 20(%esp), %eax
movl %eax, 4(%esp)
movl 16(%esp), %eax
movl %eax, (%esp)
call llvm_bugpoint_old_main
addl $12, %esp
ret
.size main, .-main

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 2, 2006

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 2, 2006

Bytecode for test function
Note: I reran bugpoint and slightly different bytecode was reduced by it than
the original run. The failure is the same (fp precision). That's why I've
attached this new version of the bytecode.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 2, 2006

The difference between the gdb disassembly and llc output makes no sense. There is whole chunk of
code missing from the llc version:

0x00b6405a: jmp 0xb64073
0x00b6405f: mov %ecx,0x4(%edx)
0x00b64062: mov %eax,(%edx)
0x00b64064: mov (%esp),%ebx
0x00b64067: mov 0x4(%esp),%edi
0x00b6406b: mov 0x8(%esp),%esi
0x00b6406f: add $0xc,%esp
0x00b64072: ret

That shouldn't happen.

Not that it should make that kind of difference. But Reid, are you generating the llc output with -
relocation-mode=static?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 2, 2006

Here's the llc -relocation-model=static code:
.text
.align 16
.globl dpb_split_field_cond_true967
dpb_split_field_cond_true967:
subl $12, %esp
movl %esi, 8(%esp)
movl %edi, 4(%esp)
movl %ebx, (%esp)
movl 28(%esp), %eax
movl 20(%esp), %ecx
movl 32(%esp), %edx
movl 24(%esp), %esi
movl 16(%esp), %edi
.BB1_2: #cond_true967
movl 316904(%eax), %ebx
movl (%ebx,%ecx,4), %ecx
movswl (%ecx,%edi,2), %ecx
imull $1584, %ecx, %ecx
addl %ecx, %eax
movl 288(%eax,%esi,8), %ecx
movl 284(%eax,%esi,8), %eax
.BB1_1: #cond_next989.exitStub
movl %ecx, 4(%edx)
movl %eax, (%edx)
movl (%esp), %ebx
movl 4(%esp), %edi
movl 8(%esp), %esi
addl $12, %esp
ret
.size dpb_split_field_cond_true967, .-dpb_split_field_cond_true967

    .align  16
    .globl  main

main:
subl $12, %esp
fnstcw 10(%esp)
movb $2, 11(%esp)
fldcw 10(%esp)
movl 20(%esp), %eax
movl %eax, 4(%esp)
movl 16(%esp), %eax
movl %eax, (%esp)
call llvm_bugpoint_old_main
addl $12, %esp
ret
.size main, .-main

The only difference I see is the additional block and jump to it in the lli
version of the code.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 2, 2006

I went through this carefully. The only differences are:

  1. The LLC version puts an "l" at the end of the mnemonics. I'm assuming this is
    just a difference between gdb and llc. However, if they truly are different
    instruction codes, perhaps that's an issue.
  2. Ignoring #​1, the JIT code is identical to the LLC code except for the extra
    unconditional branch. While the LLC code executes straight through with this:

.BB1_2: #cond_true967
movl 316904(%eax), %ebx
movl (%ebx,%ecx,4), %ecx
movswl (%ecx,%edi,2), %ecx
imull $1584, %ecx, %ecx
addl %ecx, %eax
movl 288(%eax,%esi,8), %ecx
movl 284(%eax,%esi,8), %eax
.BB1_1: #cond_next989.exitStub

the JIT code branches to do the same thing:
0x00b6405a: jmp 0xb64073 # Branch down to 4073
0x00b6405f: mov %ecx,0x4(%edx)
... snip ...
0x00b64073: mov 0x4d5e8(%eax),%ebx
0x00b64079: mov (%ebx,%ecx,4),%ecx
0x00b6407c: movswl (%ecx,%edi,2),%ecx
0x00b64080: imul $0x630,%ecx,%ecx
0x00b64086: add %ecx,%eax
0x00b64088: mov 0x120(%eax,%esi,8),%ecx
0x00b6408f: mov 0x11c(%eax,%esi,8),%eax
0x00b64096: jmp 0xb6405f #Branch back to 405f

Those are the only differences. Furthermore, I don't see how its affecting
floating point output. Did bugpoint find the wrong bug?

@lattner
Copy link
Collaborator

lattner commented Nov 3, 2006

Evan, please look into this and try to determine if it's bogus or not. If not, I'd like to have this fixed for
the release (we branch on monday) but if it can't be fixed/figured out, we should just close it.

Thanks,

-Chris

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 7, 2006

I can't really figure out why there is an extra block of code under JIT case. So this is a "can't fix / figured
out" for 1.9. Should I close it?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 7, 2006

No, you shouldn't. This is the reason why all the ldecod tests have been failing
on x86/Linux. It needs to be figured out. We can hold it over as a known
(unknown?) issue for the release, however.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 7, 2006

An afterthought:

You may be missing the point. The "extra block" doesn't matter. The LLC and JIT
code are identical except JIT branches to that block and branches back. Weird,
but no big deal. The real problem is that the values produced by this code do
not conform to the values produced by llc, cbe or gcc. Here's last nights
nightly test result:

******************** TEST (jit) 'ldecod' FAILED! ********************
Execution Context Diff:
/proj/llvm/nightly/build/llvm/Debug/bin/fpcmp: Compared: 3.535600e+00 and
4.472200e+00: diff = 2.094271e-01
Out of tolerance: rel/abs: 2.000000e-02/0.000000e+00
******************** TEST (jit) 'ldecod' ****************************

I will attach the jit and nat output.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 7, 2006

Native Output
This is what hte JIT should produce.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 7, 2006

JIT Output
This is what the JIT actually produces.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 7, 2006

Another thought:

The difference in output may not be the result of any JIT'd code at all, it
might be in the ExecutionEngine or some other factor.

@lattner
Copy link
Collaborator

lattner commented Jan 7, 2007

is this still an issue?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 7, 2007

Yes. I tried to fix it but I can't detect where it is producing differeing
binary output.

@lattner
Copy link
Collaborator

lattner commented Jan 8, 2007

Anton, if you get a chance, can you see if this reproduces for you?

@asl
Copy link
Collaborator

asl commented Jan 8, 2007

Ok. Will try within next 3-4 days.

@asl
Copy link
Collaborator

asl commented Jan 18, 2007

JIT fails for me too.

@lattner
Copy link
Collaborator

lattner commented Jan 24, 2007

Reid/Anton, can you see if this still fails with llvm/llvm-bugzilla-archive#1130 fixed?

Thanks,

-Chris

@asl
Copy link
Collaborator

asl commented Feb 3, 2007

It is still failing for me

@asl
Copy link
Collaborator

asl commented Feb 4, 2007

If we're enable trace feature of ldecod (define.h => TRACE define) the output
for all binaries (native, llc, cbe, lli) is the same.

@lattner
Copy link
Collaborator

lattner commented Feb 4, 2007

have you tried running the native version of ldecod under valgrind?

@lattner
Copy link
Collaborator

lattner commented Feb 4, 2007

To be clear, the x86-jit failed because the program was reading uninitialized memory, so it was a bug in
the program, not llvm.

@asl
Copy link
Collaborator

asl commented Feb 4, 2007

Well, to be clearer :) Program reads uninitialized memory, because provided
input data was wrong: sequence to decode contained 15 frames, but reference
sequence (to compare with) - just 2. So, there were 2 bugs: absence of size
checking in the program and bad input in the testcase :)

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 bugzilla Issues migrated from bugzilla miscompilation
Projects
None yet
Development

No branches or pull requests

3 participants