-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT stub offsets silently truncated to 32 bits in call instruction #5573
Comments
Adding MAP_32BIT to llvm::sys::Memory::AllocateRWX()'s mmap flags makes the test case pass, but does not fix the problems we're seeing with running Unladen Swallow's full test suite on x86-64. So much for a quick hack. |
This failure also appears on PowerPC: |
I suspect this is quite an old issue that has only come to light because of recent changes to the JIT memory manager. Prior to r76902, all code was generated into a single buffer which, while large, wasn't large enough for jumps to stubs not to fit in the relevant branch instruction. From r76902, however, code is allocated in small (64k?) slabs, which can be anywhere in memory. Once you fill a slab it's only a matter of time before you hit this error. |
Yeah, I wrote the change, and if you read the code, you can see that it tries to allocate all the code and stubs contiguously. If it can't get the address it wants from mmap, it tries again without specifying an address, and then it's possible to end up anywhere. That approach is horribly buggy. What we really need is some kind of heap for code, but there's no good way to implement that in a cross-platform manner. One terrible thing we could do is to linearly probe the address space. How do other JITs handle this? |
On PowerPC a short jump can only go 5Mb in each direction, so trying to keep the slabs "close enough" might be a losing battle. I don't know a lot about the internals of LLVM, so this may be rubbish, but my thoughts at the moment would be to move from one stub per destination to N stubs per destination. When performing the relocation on the jump, you walk the list of stubs for that destination, picking the first one that's close enough, and generating a new stub in the current slab if none are suitable. |
I'd actually go even farther. I think allocating far-call-stubs inside the calling function's block (i.e. don't bother re-using stubs across functions) will save some complexity, allow us to free stubs relatively easily, and not waste much space. I think we can change where the stubs are allocated without rearchitecting anything, since JITEmitter::finishFunction() already checks if it needs to retry after relocating, but I could have missed something. We might run into problems sorting out the current assumption that far-call-stubs, lazy-compilation-stubs, and dlsym-stubs are all the same thing, but fixing those will help the overall design anyway. |
I'm going to look into un-sharing far-call-stubs. |
Awesome :) |
Using far-call stubs allocated with the calling function has the problem that the stub gets returned from address-of operations, and may outlive the call to the function entirely. The function is destroyed then, the call stub gets invalidated, and calling it may cause a crash. :( The real fix here is to distinguish between address-of operations and call operations, and only use far-call stubs for calls, but that appears to be a hairy yak. As a fix for this problem, nlewycky proposed making all calls into far calls, even if they turn out to be nearby, eliminating the need for far calls entirely. This will waste some code space, but make the code correct. Are there any problems we haven't seen with solving the problem this way? |
r88984 fixes this problem for x86-64 by generating far calls for all calls in the large code model. Unfortunately, PPC and ARM don't have a large code model, so making the same change there affects their static compilation. There's a couple things we could do on PPC and ARM:
Since I'm not working on PPC or ARM systems, I favor option 1 to undo the regression from Reid's change. When someone else gets enthusiastic about those platforms, they can fix them more generally. |
*** Bug llvm/llvm-bugzilla-archive#6758 has been marked as a duplicate of this bug. *** |
It should also allocated stubs together with code, otherwise they will be too far apart for relocations (see llvm/llvm-bugzilla-archive#6758 ). Problem is that stub emission can't be guaranteed to be "close enough" to the code, at least not for PPC where the possible relocation range is very small.
Sounds good, we could emit direct calls when constants are small enough, and indirect when not.
The disadvantage is that even for local functions we need to use this, since we don't know if they will be close enough to call or not (and the "far call stubs" are broken, they are often too far to call too on PowerPC. Even when the function would be close-enough to call). Attached is a patch that implements this.
|
PowerPC JIT workaround |
Seems somewhat reasonable, I'm probably missing something but I'm not seeing SetJITMode ever called? |
It is called in PPCTargetMachine.cpp: // Inform the subtarget that we are in JIT mode. FIXME: does this break macho |
Aha. For some reason I thought I'd see it set in the diff. |
With the patch applied on LLVM 2.7 makes the PowerPC JIT fail to resolve |
I have hit this bug using the ARM JIT. |
new patch for PowerPC JIT workaround |
Török's PPC workaround got commited to: ARM got a new -arm-long-calls option to force calls to be indirect |
Can you test if it works with OpenJDK too? |
pr5201-ppcfix-llvm2.8-rc1-openjdk-fail.log
Unfortunately: |
Is this still affecting anyone? |
The old JIT has been deleted, and MCJIT doesn't suffer from this problem. It sometimes has a different class of issues where it fatal errors during relocation resolution, but we already have issues open for that. |
mentioned in issue llvm/llvm-bugzilla-archive#6758 |
Move the initialization of the DebuggerSupport flag to a single place.
Extended Description
On x86-64 systems, we're observing segfaults that we believe to have tracked down to a codegen problem in the JIT:
Disassembling the buggy function in gdb gives us this:
... asm ...
0x00002aaab33482b1 <_23_u_23___init__45+673>: incq (%r14)
0x00002aaab33482b4 <_23_u_23___init__45+676>: mov 0x18(%rsp),%rbx
0x00002aaab33482b9 <_23_u_23___init__45+681>: mov %r14,(%rbx)
0x00002aaab33482bc <_23_u_23___init__45+684>: mov $0x2dd8b48,%rbx
0x00002aaab33482c6 <_23_u_23___init__45+694>: mov 0x18(%rbx),%rsi
0x00002aaab33482ca <_23_u_23___init__45+698>: mov %r14,%rdi
0x00002aaab33482cd <_23_u_23___init__45+701>: callq 0x2aaaa5c810b0
0x00002aaab33482d2 <_23_u_23___init__45+706>: mov (%r14),%rbx
0x00002aaab33482d5 <_23_u_23___init__45+709>: dec %rbx
... asm ...
(gdb) x 0x2aaaa5c810b0
0x2aaaa5c810b0: Cannot access memory at address 0x2aaaa5c810b0
That callq 0x2aaaa5c810b0 instruction is the problem: it's calling to invalid memory. Looking at the other callq instructions in this function, gdb can't
access any of them. callq addresses in other functions can be accessed by gdb.
What's happening is that now that the JIT memory manager can allocate multiple code slabs, there's no protection against having stub slabs and code slabs more than 32 bits distant from each other. If the memory manager allocates a new code slab far away from the stub, the offset will be truncated to 32 bits, which means that Bad Things Happen.
I'm working on a reduced test case, but in the meantime, I think the attached patch should be applied to trunk so that the offset is no longer silently truncated.
Applying this patch and passing -debug-only=jit will show this output when running part of Unladen Swallow's test suite:
[...tests...]
test_importhooks
test_enumerate
test_getopt
test_codecencodings_cn
JIT: Allocating another slab of memory for function.
python: /usr/local/google/collinwinter/us/trunk/Util/llvm/lib/ExecutionEngine/JIT/JITEmitter.cpp:652:
void*::JITEmitter::getPointerToGlobal(llvm::GlobalValue*, void*, bool): Assertion `(Offset & 0xFFFFFFFF) == Offset && "Offset
too big for 32 bits"' failed.
Stack dump:
0. Running pass 'X86 Machine Code Emitter' on function '@"#u#readline241"
Without this, you'd see a segfault instead.
The text was updated successfully, but these errors were encountered: