JIT stub offsets silently truncated to 32 bits in call instruction #5573

llvmbot · 2009-10-15T20:45:36Z


Bugzilla Link	5201
Resolution	FIXED
Resolved on	Nov 17, 2015 11:45
Version	trunk
OS	Linux
Attachments	Add asserts, against trunk r84189, Add a test for the bug, against trunk r84531
Reporter	LLVM Bugzilla Contributor
CC	@comex,@echristo,@edwintorok,@rnk

Extended Description

On x86-64 systems, we're observing segfaults that we believe to have tracked down to a codegen problem in the JIT:

Disassembling the buggy function in gdb gives us this:
... asm ...
0x00002aaab33482b1 <_23_u_23___init__45+673>: incq (%r14)
0x00002aaab33482b4 <_23_u_23___init__45+676>: mov 0x18(%rsp),%rbx
0x00002aaab33482b9 <_23_u_23___init__45+681>: mov %r14,(%rbx)
0x00002aaab33482bc <_23_u_23___init__45+684>: mov $0x2dd8b48,%rbx
0x00002aaab33482c6 <_23_u_23___init__45+694>: mov 0x18(%rbx),%rsi
0x00002aaab33482ca <_23_u_23___init__45+698>: mov %r14,%rdi
0x00002aaab33482cd <_23_u_23___init__45+701>: callq 0x2aaaa5c810b0
0x00002aaab33482d2 <_23_u_23___init__45+706>: mov (%r14),%rbx
0x00002aaab33482d5 <_23_u_23___init__45+709>: dec %rbx
... asm ...

(gdb) x 0x2aaaa5c810b0
0x2aaaa5c810b0: Cannot access memory at address 0x2aaaa5c810b0

That callq 0x2aaaa5c810b0 instruction is the problem: it's calling to invalid memory. Looking at the other callq instructions in this function, gdb can't
access any of them. callq addresses in other functions can be accessed by gdb.

What's happening is that now that the JIT memory manager can allocate multiple code slabs, there's no protection against having stub slabs and code slabs more than 32 bits distant from each other. If the memory manager allocates a new code slab far away from the stub, the offset will be truncated to 32 bits, which means that Bad Things Happen.

I'm working on a reduced test case, but in the meantime, I think the attached patch should be applied to trunk so that the offset is no longer silently truncated.

Applying this patch and passing -debug-only=jit will show this output when running part of Unladen Swallow's test suite:

[...tests...]
test_importhooks
test_enumerate
test_getopt
test_codecencodings_cn
JIT: Allocating another slab of memory for function.
python: /usr/local/google/collinwinter/us/trunk/Util/llvm/lib/ExecutionEngine/JIT/JITEmitter.cpp:652:
void*::JITEmitter::getPointerToGlobal(llvm::GlobalValue*, void*, bool): Assertion `(Offset & 0xFFFFFFFF) == Offset && "Offset
too big for 32 bits"' failed.
Stack dump:
0. Running pass 'X86 Machine Code Emitter' on function '@"#u#readline241"

Without this, you'd see a segfault instead.

llvmbot · 2009-10-20T20:19:54Z

Adding MAP_32BIT to llvm::sys::Memory::AllocateRWX()'s mmap flags makes the test case pass, but does not fix the problems we're seeing with running Unladen Swallow's full test suite on x86-64. So much for a quick hack.

llvmbot · 2009-11-02T14:46:45Z

This failure also appears on PowerPC:
http://icedtea.classpath.org/bugzilla/show_bug.cgi?id=399

llvmbot · 2009-11-02T15:55:50Z

I suspect this is quite an old issue that has only come to light because of recent changes to the JIT memory manager. Prior to r76902, all code was generated into a single buffer which, while large, wasn't large enough for jumps to stubs not to fit in the relevant branch instruction. From r76902, however, code is allocated in small (64k?) slabs, which can be anywhere in memory. Once you fill a slab it's only a matter of time before you hit this error.

rnk · 2009-11-02T21:38:07Z

Yeah, I wrote the change, and if you read the code, you can see that it tries to allocate all the code and stubs contiguously. If it can't get the address it wants from mmap, it tries again without specifying an address, and then it's possible to end up anywhere. That approach is horribly buggy. What we really need is some kind of heap for code, but there's no good way to implement that in a cross-platform manner. One terrible thing we could do is to linearly probe the address space. How do other JITs handle this?

llvmbot · 2009-11-03T12:23:01Z

On PowerPC a short jump can only go 5Mb in each direction, so trying to keep the slabs "close enough" might be a losing battle.

I don't know a lot about the internals of LLVM, so this may be rubbish, but my thoughts at the moment would be to move from one stub per destination to N stubs per destination. When performing the relocation on the jump, you walk the list of stubs for that destination, picking the first one that's close enough, and generating a new stub in the current slab if none are suitable.

llvmbot · 2009-11-03T17:55:36Z

I'd actually go even farther. I think allocating far-call-stubs inside the calling function's block (i.e. don't bother re-using stubs across functions) will save some complexity, allow us to free stubs relatively easily, and not waste much space.

I think we can change where the stubs are allocated without rearchitecting anything, since JITEmitter::finishFunction() already checks if it needs to retry after relocating, but I could have missed something.

We might run into problems sorting out the current assumption that far-call-stubs, lazy-compilation-stubs, and dlsym-stubs are all the same thing, but fixing those will help the overall design anyway.

llvmbot · 2009-11-06T05:42:22Z

I'm going to look into un-sharing far-call-stubs.

llvmbot · 2009-11-06T12:02:02Z

Awesome :)

llvmbot · 2009-11-10T07:03:01Z

Using far-call stubs allocated with the calling function has the problem that the stub gets returned from address-of operations, and may outlive the call to the function entirely. The function is destroyed then, the call stub gets invalidated, and calling it may cause a crash. :( The real fix here is to distinguish between address-of operations and call operations, and only use far-call stubs for calls, but that appears to be a hairy yak.

As a fix for this problem, nlewycky proposed making all calls into far calls, even if they turn out to be nearby, eliminating the need for far calls entirely. This will waste some code space, but make the code correct. Are there any problems we haven't seen with solving the problem this way?

llvmbot · 2009-11-11T02:52:01Z

New test, hopefully easier to port and less fragile

llvmbot · 2009-11-20T02:48:29Z

r88984 fixes this problem for x86-64 by generating far calls for all calls in the large code model. Unfortunately, PPC and ARM don't have a large code model, so making the same change there affects their static compilation. There's a couple things we could do on PPC and ARM:

Go back to 16MB code allocation by default, which would mask this problem in most cases, but wouldn't fix it.
Invent separate code models for PPC and ARM. I'm skeptical of this because there are no external ABI docs to justify or drive any choices we make.
Figure out more generally how to have multiple far-call-stubs for a single target. This probably requires distinguishing address-of and call uses of Functions in JITEmitter::getPointerToGlobal(), which will take more widespread changes.
Thread the actual target address into the code generator. This would be a pretty general win, but would take even more changes.
????
Profit!

Since I'm not working on PPC or ARM systems, I favor option 1 to undo the regression from Reid's change. When someone else gets enthusiastic about those platforms, they can fix them more generally.

edwintorok · 2010-04-04T13:18:17Z

*** Bug llvm/llvm-bugzilla-archive#6758 has been marked as a duplicate of this bug. ***

edwintorok · 2010-04-04T13:18:47Z

r88984 fixes this problem for x86-64 by generating far calls for all calls in
the large code model. Unfortunately, PPC and ARM don't have a large code model,
so making the same change there affects their static compilation. There's a
couple things we could do on PPC and ARM:

Go back to 16MB code allocation by default, which would mask this problem
in most cases, but wouldn't fix it.

It should also allocated stubs together with code, otherwise they will be too far apart for relocations (see llvm/llvm-bugzilla-archive#6758 ).

Problem is that stub emission can't be guaranteed to be "close enough" to the code, at least not for PPC where the possible relocation range is very small.

Thread the actual target address into the code generator. This would be a
pretty general win, but would take even more changes.

Sounds good, we could emit direct calls when constants are small enough, and indirect when not.
It would need some interface between the code generator and the JIT to query the address.

????

Always emit indirect calls for ALL calls in the JIT.

The disadvantage is that even for local functions we need to use this, since we don't know if they will be close enough to call or not (and the "far call stubs" are broken, they are often too far to call too on PowerPC. Even when the function would be close-enough to call).

Attached is a patch that implements this.

Profit!

Since I'm not working on PPC or ARM systems, I favor option 1 to undo the
regression from Reid's change. When someone else gets enthusiastic about those
platforms, they can fix them more generally.

edwintorok · 2010-04-04T13:19:22Z

PowerPC JIT workaround
Workaround: always emit indirect calls in JIT mode for PowerPC.

echristo · 2010-04-06T19:04:04Z

Seems somewhat reasonable, I'm probably missing something but I'm not seeing SetJITMode ever called?

edwintorok · 2010-04-06T19:41:25Z

Seems somewhat reasonable, I'm probably missing something but I'm not seeing
SetJITMode ever called?

It is called in PPCTargetMachine.cpp:
bool PPCTargetMachine::addCodeEmitter(PassManagerBase &PM,
CodeGenOpt::Level OptLevel,
JITCodeEmitter &JCE) {
// The JIT should use the static relocation model in ppc32 mode, PIC in ppc64.
// FIXME: This should be moved to TargetJITInfo!!
if (Subtarget.isPPC64()) {
// We use PIC codegen in ppc64 mode, because otherwise we'd have to use many
// instructions to materialize arbitrary global variable + function +
// constant pool addresses.
setRelocationModel(Reloc::PIC_);
// Temporary workaround for the inability of PPC64 JIT to handle jump
// tables.
DisableJumpTables = true;
} else {
setRelocationModel(Reloc::Static);
}

// Inform the subtarget that we are in JIT mode. FIXME: does this break macho
// writing?
Subtarget.SetJITMode();

echristo · 2010-04-06T19:54:36Z

Aha. For some reason I thought I'd see it set in the diff.

llvmbot · 2010-04-23T14:52:35Z

Created an attachment (id=4631) [details]
PowerPC JIT workaround

Workaround: always emit indirect calls in JIT mode for PowerPC.

With the patch applied on LLVM 2.7 makes the PowerPC JIT fail to resolve
llvm.memset.* intrinsics: bug 6902 (contains testcase and additional info)

llvmbot · 2010-05-19T22:43:04Z

On PowerPC a short jump can only go 5Mb in each direction, so trying to keep
the slabs "close enough" might be a losing battle.

I have hit this bug using the ARM JIT.
On ARM a short jump can only go 32Mb in each direction, while it happens less frequently on ARM than on PowerPC it still can happen.
This bug looks plausible to fix for ARM by generating a long branch to any address within the 4Gb addresspace by emitting the following when outside the +-32Mb scope:
mov lr, pc
mov pc, #stubaddress

edwintorok · 2010-08-04T22:46:34Z

new patch for PowerPC JIT workaround
New patch, that passes llvm/llvm-bugzilla-archive#6902 too.

llvmbot · 2010-08-09T11:09:26Z

Török's PPC workaround got commited to:
http://llvm.org/viewvc/llvm-project?view=rev&revision=110246

ARM got a new -arm-long-calls option to force calls to be indirect
that got commited to:
http://llvm.org/viewvc/llvm-project?view=rev&revision=101303
A possible fix would be to enable this new option by default on ARM.

edwintorok · 2010-08-09T11:11:15Z

Török's PPC workaround got commited to:
http://llvm.org/viewvc/llvm-project?view=rev&revision=110246

Can you test if it works with OpenJDK too?

llvmbot · 2010-09-10T15:47:30Z

pr5201-ppcfix-llvm2.8-rc1-openjdk-fail.log

Török's PPC workaround got commited to:
http://llvm.org/viewvc/llvm-project?view=rev&revision=110246

Can you test if it works with OpenJDK too?

Unfortunately:
PPC JIT still asserts with reallocation out of range errors when I test using LLVM 2.8 rc1.

echristo · 2015-10-16T05:54:50Z

Is this still affecting anyone?

rnk · 2015-11-17T19:45:06Z

The old JIT has been deleted, and MCJIT doesn't suffer from this problem. It sometimes has a different class of issues where it fatal errors during relocation resolution, but we already have issues open for that.

edwintorok · 2021-11-27T05:29:23Z

mentioned in issue llvm/llvm-bugzilla-archive#6758

Move the initialization of the DebuggerSupport flag to a single place.

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT stub offsets silently truncated to 32 bits in call instruction #5573

JIT stub offsets silently truncated to 32 bits in call instruction #5573

llvmbot commented Oct 15, 2009

llvmbot commented Oct 20, 2009

llvmbot commented Nov 2, 2009

llvmbot commented Nov 2, 2009

rnk commented Nov 2, 2009

llvmbot commented Nov 3, 2009

llvmbot commented Nov 3, 2009

llvmbot commented Nov 6, 2009

llvmbot commented Nov 6, 2009

llvmbot commented Nov 10, 2009

llvmbot commented Nov 11, 2009

llvmbot commented Nov 20, 2009

edwintorok commented Apr 4, 2010

edwintorok commented Apr 4, 2010

edwintorok commented Apr 4, 2010

echristo commented Apr 6, 2010

edwintorok commented Apr 6, 2010

echristo commented Apr 6, 2010

llvmbot commented Apr 23, 2010

llvmbot commented May 19, 2010

edwintorok commented Aug 4, 2010

llvmbot commented Aug 9, 2010

edwintorok commented Aug 9, 2010

llvmbot commented Sep 10, 2010

echristo commented Oct 16, 2015

rnk commented Nov 17, 2015

edwintorok commented Nov 27, 2021

JIT stub offsets silently truncated to 32 bits in call instruction #5573

JIT stub offsets silently truncated to 32 bits in call instruction #5573

Comments

llvmbot commented Oct 15, 2009

Extended Description

llvmbot commented Oct 20, 2009

llvmbot commented Nov 2, 2009

llvmbot commented Nov 2, 2009

rnk commented Nov 2, 2009

llvmbot commented Nov 3, 2009

llvmbot commented Nov 3, 2009

llvmbot commented Nov 6, 2009

llvmbot commented Nov 6, 2009

llvmbot commented Nov 10, 2009

llvmbot commented Nov 11, 2009

llvmbot commented Nov 20, 2009

edwintorok commented Apr 4, 2010

edwintorok commented Apr 4, 2010

edwintorok commented Apr 4, 2010

echristo commented Apr 6, 2010

edwintorok commented Apr 6, 2010

echristo commented Apr 6, 2010

llvmbot commented Apr 23, 2010

llvmbot commented May 19, 2010

edwintorok commented Aug 4, 2010

llvmbot commented Aug 9, 2010

edwintorok commented Aug 9, 2010

llvmbot commented Sep 10, 2010

echristo commented Oct 16, 2015

rnk commented Nov 17, 2015

edwintorok commented Nov 27, 2021