New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WinEH] LLVM emits unreachable (incorrect?) RSP adjustments after noreturn calls, like CxxThrowException #44409
Comments
I reverted the fix in 4e0fe03 , see comments on https://reviews.llvm.org/D75712 |
I think we need to look into running some WinEH integration tests continuously. I bet we could've found this bug with https://github.com/microsoft/compiler-tests instead of the big angle DEQP tests. |
I think my change really revealed a separate underlying issue, but I will debug it here because I don't understand it well enough to write a good new issue summary. I think we were just getting lucky before my change. I came up with a simple reproducer program: extern "C" int printf(const char *, ...); This is the difference in the assembly before and after my change: $ diff -u after.s before.s
.Ltmp5:
.Ltmp3:
.Ltmp1: The stack adjustments are, as described in this issue, totally wrong. If we executed them, bad things would happen, so we must not be executing them. We know from previous experience with the Win64 unwinder that it does some disassembly when it unwinds, and it can be frustratingly fussy about instructions immediately following calls. Debugging this program with windbg and taking a stack trace at the point of the exception results in a bad stack trace, like this: 0:000> k Child-SP RetAddr Call Site00 000000ee This suggests that there is no issue with the _CxxFrameHandler3 tables. The problem seems to be at a higher level. I strongly suspect that if we jam in some nops after the call site, it will make these problems go away. I've known that MSVC emits nops after x64 call sites for years, but I've always considered it to be kind of silly and I've been avoiding it in LLVM. Perhaps as a separate issue, LLVM shouldn't be emitting these jumps to the end of the function after noreturn functions. I suspect it's the jump instruction itself that makes the unwinder go off the rails, so we could consider that to be the bug. Aha! I have found the codepath in the unwinder: The unwinder goes down this codepath: So, I think the issue goes back to disabling TrapUnreachable: If we still emitted int3 for unreachable, then these jumps would not point outside the current function, and the unwinder wouldn't think this "CALL ; JMP" code sequence was a tail call. Possible steps to completely fix the issue:
|
After landing https://reviews.llvm.org/D76531, we can remove the unnecessary stack adjustments again. |
Relanded as 597718a. I confirmed that the deqp issue was fixed. I never was able to reproduce any sanitizer issues, before or after my change. We'll see if anything arises. |
mentioned in issue llvm/llvm-bugzilla-archive#45283 |
Extended Description
I noticed this while investigating #44366
We've had this bug many times:
#42500
#27491
But we still have it. :)
Consider, with EH enabled:
struct MakeCleanup {
~MakeCleanup();
};
bool cond();
void foo() {
MakeCleanup o;
if (cond())
throw;
if (cond())
throw;
}
We get this:
...
callq _CxxThrowException
subq $32, %rsp
.Ltmp7:
jmp .LBB0_8
.LBB0_5: # %if.then3
.Ltmp4:
xorl %ecx, %ecx
xorl %edx, %edx
callq _CxxThrowException
subq $32, %rsp
.Ltmp5:
.LBB0_8: # %unreachable
This is worrying to me, because it could interfere with the Win64 stack unwinder.
On Win64, most stack frames are reserved, and I think the code we added in the first place is causing more trouble than it's worth:
if (CLI.DoesNotReturn && !getTargetMachine().Options.TrapUnreachable) {
// No need to reset the stack after the call if the call doesn't return. To
// make the MI verify, we'll pretend the callee does it for us.
NumBytesForCalleeToPop = NumBytes;
}
X86ISelLowering.cpp:4333 (file is too big to link on github)
When EH is involved, it becomes difficult for the code I added to PEI to determine that this adjustment is after a noreturn function call:
https://github.com/llvm/llvm-project/blob/master/llvm/lib/Target/X86/X86FrameLowering.cpp#L3074
I think we might want to invent a cleaner way of indicating that the adjustment is "dead" and can be skipped while still pacifying the verifier. I notice that a third argument to ADJCALLSTACKDOWN* was added to pacify the verifier, so we could do the same again for ADJCALLSTACKUP*.
The text was updated successfully, but these errors were encountered: