5201 – JIT stub offsets silently truncated to 32 bits in call instruction

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 5201 - JIT stub offsets silently truncated to 32 bits in call instruction

Summary: JIT stub offsets silently truncated to 32 bits in call instruction

Status:	RESOLVED FIXED

Alias:	None

Product:	new-bugs
Classification:	Unclassified
Component:	new bugs (show other bugs)
Version:	trunk
Hardware:	PC Linux

Importance:	P normal
Assignee:	Jeffrey Yasskin

URL:
Keywords:

Duplicates (1):	6758 (view as bug list)
Depends on:
Blocks:

Reported:	2009-10-15 13:45 PDT by Collin Winter
Modified:	2015-11-17 11:45 PST (History)
CC List:	13 users (show)

See Also:
Fixed By Commit(s):

Attachments
Add asserts, against trunk r84189 (953 bytes, patch) 2009-10-15 13:45 PDT, Collin Winter	Details
Add a test for the bug, against trunk r84531 (5.52 KB, patch) 2009-10-19 16:55 PDT, Collin Winter	Details
New test, hopefully easier to port and less fragile (5.96 KB, patch) 2009-11-10 18:52 PST, Jeffrey Yasskin	Details
PowerPC JIT workaround (4.18 KB, patch) 2010-04-04 06:19 PDT, Török Edwin	Details
new patch for PowerPC JIT workaround (3.79 KB, application/pgp-keys) 2010-08-04 15:46 PDT, Török Edwin	Details
pr5201-ppcfix-llvm2.8-rc1-openjdk-fail.log (3.61 KB, text/plain) 2010-09-10 08:47 PDT, Xerxes Rånby	Details
Show Obsolete (2) Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Collin Winter 2009-10-15 13:45:36 PDT

Created attachment 3661 [details]
Add asserts, against trunk r84189

On x86-64 systems, we're observing segfaults that we believe to have tracked down to a codegen problem in the JIT:

Disassembling the buggy function in gdb gives us this:
... asm ...
0x00002aaab33482b1 <_23_u_23___init__45+673>:	incq   (%r14)
0x00002aaab33482b4 <_23_u_23___init__45+676>:	mov    0x18(%rsp),%rbx
0x00002aaab33482b9 <_23_u_23___init__45+681>:	mov    %r14,(%rbx)
0x00002aaab33482bc <_23_u_23___init__45+684>:	mov    $0x2dd8b48,%rbx
0x00002aaab33482c6 <_23_u_23___init__45+694>:	mov    0x18(%rbx),%rsi
0x00002aaab33482ca <_23_u_23___init__45+698>:	mov    %r14,%rdi
0x00002aaab33482cd <_23_u_23___init__45+701>:	callq  0x2aaaa5c810b0
0x00002aaab33482d2 <_23_u_23___init__45+706>:	mov    (%r14),%rbx
0x00002aaab33482d5 <_23_u_23___init__45+709>:	dec    %rbx
... asm ...

(gdb) x 0x2aaaa5c810b0
0x2aaaa5c810b0:	Cannot access memory at address 0x2aaaa5c810b0

That callq  0x2aaaa5c810b0 instruction is the problem: it's calling to invalid memory. Looking at the other callq instructions in this function, gdb can't 
access any of them. callq addresses in other functions can be accessed by gdb.

What's happening is that now that the JIT memory manager can allocate multiple code slabs, there's no protection against having stub slabs and code slabs more than 32 bits distant from each other. If the memory manager allocates a new code slab far away from the stub, the offset will be truncated to 32 bits, which means that Bad Things Happen.

I'm working on a reduced test case, but in the meantime, I think the attached patch should be applied to trunk so that the offset is no longer silently truncated.

Applying this patch and passing -debug-only=jit will show this output when running part of Unladen Swallow's test suite:

[...tests...]
test_importhooks
test_enumerate
test_getopt
test_codecencodings_cn
JIT: Allocating another slab of memory for function.
python: /usr/local/google/collinwinter/us/trunk/Util/llvm/lib/ExecutionEngine/JIT/JITEmitter.cpp:652: 
void*<unnamed>::JITEmitter::getPointerToGlobal(llvm::GlobalValue*, void*, bool): Assertion `(Offset & 0xFFFFFFFF) == Offset && "Offset 
too big for 32 bits"' failed.
Stack dump:
0.	Running pass 'X86 Machine Code Emitter' on function '@"#u#readline241"

Without this, you'd see a segfault instead.

Comment 1 Collin Winter 2009-10-19 16:55:24 PDT

Created attachment 3677 [details]
Add a test for the bug, against trunk r84531

This test fails (correctly) when run on an x86-64 system with 32GB virtual memory (8GB physical memory). This is with a Debug build of LLVM on Linux.

When run directly (ie, invoking the binary by hand), the test trips the assertion added in the other patch in this issue. When run via "make unittests", however, that assertion is not hit, and the test segfaults in a way that indicates the bug is still present in a different code path.

I'm going to try telling mmap to use MAP_32BIT before I attempt any of the more yak-intensive solutions.

Comment 2 Collin Winter 2009-10-20 13:19:54 PDT

Adding MAP_32BIT to llvm::sys::Memory::AllocateRWX()'s mmap flags makes the test case pass, but does not fix the problems we're seeing with running Unladen Swallow's full test suite on x86-64. So much for a quick hack.

Comment 3 Gary Benson 2009-11-02 06:46:45 PST

This failure also appears on PowerPC:
http://icedtea.classpath.org/bugzilla/show_bug.cgi?id=399

Comment 4 Gary Benson 2009-11-02 07:55:50 PST

I suspect this is quite an old issue that has only come to light because of recent changes to the JIT memory manager.  Prior to r76902, all code was generated into a single buffer which, while large, wasn't large enough for jumps to stubs not to fit in the relevant branch instruction.  From r76902, however, code is allocated in small (64k?) slabs, which can be anywhere in memory.  Once you fill a slab it's only a matter of time before you hit this error.

Comment 5 Reid Kleckner 2009-11-02 13:38:07 PST

Yeah, I wrote the change, and if you read the code, you can see that it tries to allocate all the code and stubs contiguously.  If it can't get the address it wants from mmap, it tries again without specifying an address, and then it's possible to end up anywhere.  That approach is horribly buggy.  What we really need is some kind of heap for code, but there's no good way to implement that in a cross-platform manner.  One terrible thing we could do is to linearly probe the address space.  How do other JITs handle this?

Comment 6 Gary Benson 2009-11-03 04:23:01 PST

On PowerPC a short jump can only go 5Mb in each direction, so trying to keep the slabs "close enough" might be a losing battle.

I don't know a lot about the internals of LLVM, so this may be rubbish, but my thoughts at the moment would be to move from one stub per destination to N stubs per destination.  When performing the relocation on the jump, you walk the list of stubs for that destination, picking the first one that's close enough, and generating a new stub in the current slab if none are suitable.

Comment 7 Jeffrey Yasskin 2009-11-03 09:55:36 PST

I'd actually go even farther. I think allocating far-call-stubs inside the calling function's block (i.e. don't bother re-using stubs across functions) will save some complexity, allow us to free stubs relatively easily, and not waste much space.

I think we can change where the stubs are allocated without rearchitecting anything, since JITEmitter::finishFunction() already checks if it needs to retry after relocating, but I could have missed something.

We might run into problems sorting out the current assumption that far-call-stubs, lazy-compilation-stubs, and dlsym-stubs are all the same thing, but fixing those will help the overall design anyway.

Comment 8 Jeffrey Yasskin 2009-11-05 21:42:22 PST

I'm going to look into un-sharing far-call-stubs.

Comment 9 Gary Benson 2009-11-06 04:02:02 PST

Awesome :)

Comment 10 Jeffrey Yasskin 2009-11-09 23:03:01 PST

Using far-call stubs allocated with the calling function has the problem that the stub gets returned from address-of operations, and may outlive the call to the function entirely. The function is destroyed then, the call stub gets invalidated, and calling it may cause a crash. :(  The real fix here is to distinguish between address-of operations and call operations, and only use far-call stubs for calls, but that appears to be a hairy yak.

As a fix for this problem, nlewycky proposed making _all_ calls into far calls, even if they turn out to be nearby, eliminating the need for far calls entirely. This will waste some code space, but make the code correct. Are there any problems we haven't seen with solving the problem this way?

Comment 11 Jeffrey Yasskin 2009-11-10 18:52:01 PST

Created attachment 3807 [details]
New test, hopefully easier to port and less fragile

Comment 12 Jeffrey Yasskin 2009-11-19 18:48:29 PST

r88984 fixes this problem for x86-64 by generating far calls for all calls in the large code model. Unfortunately, PPC and ARM don't have a large code model, so making the same change there affects their static compilation. There's a couple things we could do on PPC and ARM:
 1) Go back to 16MB code allocation by default, which would mask this problem in most cases, but wouldn't fix it.
 2) Invent separate code models for PPC and ARM. I'm skeptical of this because there are no external ABI docs to justify or drive any choices we make.
 3) Figure out more generally how to have multiple far-call-stubs for a single target. This probably requires distinguishing address-of and call uses of Functions in JITEmitter::getPointerToGlobal(), which will take more widespread changes.
 4) Thread the actual target address into the code generator. This would be a pretty general win, but would take even more changes.
 5) ????
 6) Profit!

Since I'm not working on PPC or ARM systems, I favor option 1 to undo the regression from Reid's change. When someone else gets enthusiastic about those platforms, they can fix them more generally.

Comment 13 Török Edwin 2010-04-04 06:18:17 PDT

*** Bug 6758 has been marked as a duplicate of this bug. ***

Comment 14 Török Edwin 2010-04-04 06:18:47 PDT

(In reply to comment #12)
> r88984 fixes this problem for x86-64 by generating far calls for all calls in
> the large code model. Unfortunately, PPC and ARM don't have a large code model,
> so making the same change there affects their static compilation. There's a
> couple things we could do on PPC and ARM:
>  1) Go back to 16MB code allocation by default, which would mask this problem
> in most cases, but wouldn't fix it.

It should also allocated stubs together with code, otherwise they will be too far apart for relocations (see PR6758).

Problem is that stub emission can't be guaranteed to be "close enough" to the code, at least not for PPC where the possible relocation range is very small.

>  4) Thread the actual target address into the code generator. This would be a
> pretty general win, but would take even more changes.

Sounds good, we could emit direct calls when constants are small enough, and indirect when not.
It would need some interface between the code generator and the JIT to query the address.

>  5) ????

5) Always emit indirect calls for ALL calls in the JIT.

The disadvantage is that even for local functions we need to use this, since we don't know if they will be close enough to call or not (and the "far call stubs" are broken, they are often too far to call too on PowerPC. Even when the function would be close-enough to call).

Attached is a patch that implements this.

>  6) Profit!
> 
> Since I'm not working on PPC or ARM systems, I favor option 1 to undo the
> regression from Reid's change. When someone else gets enthusiastic about those
> platforms, they can fix them more generally.

Comment 15 Török Edwin 2010-04-04 06:19:22 PDT

Created attachment 4631 [details]
PowerPC JIT workaround

Workaround: always emit indirect calls in JIT mode for PowerPC.

Comment 16 Eric Christopher 2010-04-06 12:04:04 PDT

Seems somewhat reasonable, I'm probably missing something but I'm not seeing SetJITMode ever called?

Comment 17 Török Edwin 2010-04-06 12:41:25 PDT

(In reply to comment #16)
> Seems somewhat reasonable, I'm probably missing something but I'm not seeing
> SetJITMode ever called?

It is called in PPCTargetMachine.cpp:
bool PPCTargetMachine::addCodeEmitter(PassManagerBase &PM,
                                      CodeGenOpt::Level OptLevel,
                                      JITCodeEmitter &JCE) {
  // The JIT should use the static relocation model in ppc32 mode, PIC in ppc64.
  // FIXME: This should be moved to TargetJITInfo!!
  if (Subtarget.isPPC64()) {
    // We use PIC codegen in ppc64 mode, because otherwise we'd have to use many
    // instructions to materialize arbitrary global variable + function +
    // constant pool addresses.
    setRelocationModel(Reloc::PIC_);
    // Temporary workaround for the inability of PPC64 JIT to handle jump
    // tables.
    DisableJumpTables = true;      
  } else {
    setRelocationModel(Reloc::Static);
  }
  
  // Inform the subtarget that we are in JIT mode.  FIXME: does this break macho
  // writing?
  Subtarget.SetJITMode();

Comment 18 Eric Christopher 2010-04-06 12:54:36 PDT

Aha. For some reason I thought I'd see it set in the diff.

Comment 19 Xerxes Rånby 2010-04-23 07:52:35 PDT

(In reply to comment #15)
> Created an attachment (id=4631) [details]
> PowerPC JIT workaround
> 
> Workaround: always emit indirect calls in JIT mode for PowerPC.

With the patch applied on LLVM 2.7 makes the PowerPC JIT fail to resolve 
llvm.memset.* intrinsics: bug 6902 (contains testcase and additional info)

Comment 20 Xerxes Rånby 2010-05-19 15:43:04 PDT

(In reply to comment #6)
> On PowerPC a short jump can only go 5Mb in each direction, so trying to keep
> the slabs "close enough" might be a losing battle.

I have hit this bug using the ARM JIT.
On ARM a short jump can only go 32Mb in each direction, while it happens less frequently on ARM than on PowerPC it still can happen.
This bug looks plausible to fix for ARM by generating a long branch to any address within the 4Gb addresspace by emitting the following when outside the +-32Mb scope:
mov lr, pc
mov pc, #stubaddress

Comment 21 Török Edwin 2010-08-04 15:46:34 PDT

Created attachment 5324 [details]
new patch for PowerPC JIT workaround

New patch, that passes PR6902 too.

Comment 22 Xerxes Rånby 2010-08-09 04:09:26 PDT

Török's PPC workaround got commited to:
http://llvm.org/viewvc/llvm-project?view=rev&revision=110246

ARM got a new -arm-long-calls option to force calls to be indirect
that got commited to:
http://llvm.org/viewvc/llvm-project?view=rev&revision=101303
A possible fix would be to enable this new option by default on ARM.

Comment 23 Török Edwin 2010-08-09 04:11:15 PDT

(In reply to comment #22)
> Török's PPC workaround got commited to:
> http://llvm.org/viewvc/llvm-project?view=rev&revision=110246
> 

Can you test if it works with OpenJDK too?

Comment 24 Xerxes Rånby 2010-09-10 08:47:30 PDT

Created attachment 5476 [details]
pr5201-ppcfix-llvm2.8-rc1-openjdk-fail.log

(In reply to comment #23)
> (In reply to comment #22)
> > Török's PPC workaround got commited to:
> > http://llvm.org/viewvc/llvm-project?view=rev&revision=110246
> > 
> 
> Can you test if it works with OpenJDK too?

Unfortunately:
PPC JIT still asserts with reallocation out of range errors when I test using LLVM 2.8 rc1.

Comment 25 Eric Christopher 2015-10-15 22:54:50 PDT

Is this still affecting anyone?

Comment 26 Reid Kleckner 2015-11-17 11:45:06 PST

The old JIT has been deleted, and MCJIT doesn't suffer from this problem. It sometimes has a different class of issues where it fatal errors during relocation resolution, but we already have issues open for that.