Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WinEH] LLVM emits unreachable (incorrect?) RSP adjustments after noreturn calls, like CxxThrowException #44409

Closed
rnk opened this issue Feb 29, 2020 · 7 comments
Labels
backend:X86 bugzilla Issues migrated from bugzilla

Comments

@rnk
Copy link
Collaborator

rnk commented Feb 29, 2020

Bugzilla Link 45064
Resolution FIXED
Resolved on Mar 24, 2020 12:11
Version trunk
OS Windows NT
CC @topperc,@zmodem,@RKSimon,@nico,@rotateright

Extended Description

I noticed this while investigating #44366

We've had this bug many times:
#42500
#27491

But we still have it. :)

Consider, with EH enabled:

struct MakeCleanup {
~MakeCleanup();
};
bool cond();
void foo() {
MakeCleanup o;
if (cond())
throw;
if (cond())
throw;
}

We get this:
...
callq _CxxThrowException
subq $32, %rsp
.Ltmp7:
jmp .LBB0_8
.LBB0_5: # %if.then3
.Ltmp4:
xorl %ecx, %ecx
xorl %edx, %edx
callq _CxxThrowException
subq $32, %rsp
.Ltmp5:
.LBB0_8: # %unreachable

This is worrying to me, because it could interfere with the Win64 stack unwinder.

On Win64, most stack frames are reserved, and I think the code we added in the first place is causing more trouble than it's worth:
if (CLI.DoesNotReturn && !getTargetMachine().Options.TrapUnreachable) {
// No need to reset the stack after the call if the call doesn't return. To
// make the MI verify, we'll pretend the callee does it for us.
NumBytesForCalleeToPop = NumBytes;
}
X86ISelLowering.cpp:4333 (file is too big to link on github)

When EH is involved, it becomes difficult for the code I added to PEI to determine that this adjustment is after a noreturn function call:
https://github.com/llvm/llvm-project/blob/master/llvm/lib/Target/X86/X86FrameLowering.cpp#L3074

I think we might want to invent a cleaner way of indicating that the adjustment is "dead" and can be skipped while still pacifying the verifier. I notice that a third argument to ADJCALLSTACKDOWN* was added to pacify the verifier, so we could do the same again for ADJCALLSTACKUP*.

@rnk
Copy link
Collaborator Author

rnk commented Mar 6, 2020

65b2128

@nico
Copy link
Contributor

nico commented Mar 18, 2020

I reverted the fix in 4e0fe03 , see comments on https://reviews.llvm.org/D75712

@rnk
Copy link
Collaborator Author

rnk commented Mar 18, 2020

I think we need to look into running some WinEH integration tests continuously. I bet we could've found this bug with https://github.com/microsoft/compiler-tests instead of the big angle DEQP tests.

@rnk
Copy link
Collaborator Author

rnk commented Mar 20, 2020

I think my change really revealed a separate underlying issue, but I will debug it here because I don't understand it well enough to write a good new issue summary. I think we were just getting lucky before my change.

I came up with a simple reproducer program:

extern "C" int printf(const char *, ...);
extern "C" int puts(const char *);
int main(int argc, char **argv) {
try {
printf("line: %d\n", LINE);
if (argc == 1) {
printf("line: %d\n", LINE);
throw 42;
}
printf("line: %d\n", LINE);
if (argc == 2) {
printf("line: %d\n", LINE);
throw 42;
}
printf("line: %d\n", LINE);
if (argc == 3) {
printf("line: %d\n", LINE);
throw 42;
}
printf("line: %d\n", LINE);
} catch (...) {
puts("CAUGHT");
}
}

This is the difference in the assembly before and after my change:

$ diff -u after.s before.s
--- after.s 2020-03-20 09:44:56.629227800 -0700
+++ before.s 2020-03-20 09:44:22.719712100 -0700
@@ -66,6 +66,7 @@
lea rdx, [rip + _TI1H]
lea rcx, [rbp - 12]
call _CxxThrowException

  •   sub     rsp, 32
    

.Ltmp5:
jmp .LBB0_9
.LBB0_3: # %if.then4
@@ -76,6 +77,7 @@
lea rdx, [rip + _TI1H]
lea rcx, [rbp - 8]
call _CxxThrowException

  •   sub     rsp, 32
    

.Ltmp3:
jmp .LBB0_9
.LBB0_5: # %if.then10
@@ -86,7 +88,7 @@
lea rdx, [rip + _TI1H]
lea rcx, [rbp - 4]
call _CxxThrowException

  •   int3
    
  •   sub     rsp, 32
    

.Ltmp1:
.LBB0_9: # %unreachable
.seh_handlerdata

The stack adjustments are, as described in this issue, totally wrong. If we executed them, bad things would happen, so we must not be executing them.

We know from previous experience with the Win64 unwinder that it does some disassembly when it unwinds, and it can be frustratingly fussy about instructions immediately following calls.

Debugging this program with windbg and taking a stack trace at the point of the exception results in a bad stack trace, like this:

0:000> k

Child-SP RetAddr Call Site

00 000000ee1db9f910 00007ff62443a724 KERNELBASE!RaiseException+0x69
01 000000ee1db9f9f0 00007ff624436d45 t!_CxxThrowException+0x90 [d:\agent_work\5\s\src\vctools\crt\vcruntime\src\eh\throw.cpp @ 75]
02 000000ee1db9fa50 00007ff62449df10 t!main+0xb5 [C:\src\llvm-project\build\t.cpp @ 13]
03 000000ee1db9fa58 000000000000000c t!string' # This frame is garbage, and beyond 04 000000ee1db9fa60 000001f5ed607220 0xc 05 000000ee1db9fa68 0000000000000002 0x000001f5ed607220
06 000000ee1db9fa70 0000000000000007 0x2
07 000000ee1db9fa78 000000000000002a 0x7
08 000000ee1db9fa80 fffffffffffffffe 0x2a
09 000000ee1db9fa88 0000000000000000 0xffffffff`fffffffe

This suggests that there is no issue with the _CxxFrameHandler3 tables. The problem seems to be at a higher level.

I strongly suspect that if we jam in some nops after the call site, it will make these problems go away. I've known that MSVC emits nops after x64 call sites for years, but I've always considered it to be kind of silly and I've been avoiding it in LLVM.

Perhaps as a separate issue, LLVM shouldn't be emitting these jumps to the end of the function after noreturn functions. I suspect it's the jump instruction itself that makes the unwinder go off the rails, so we could consider that to be the bug.


Aha! I have found the codepath in the unwinder:
https://github.com/dotnet/coreclr/blob/a9f3fc16483eecfc47fb79c362811d870be02249/src/unwinder/amd64/unwinder_amd64.cpp#L1318

The unwinder goes down this codepath:
} else if ((NextByte[0] == JMP_IMM8_OP) ||
(NextByte[0] == JMP_IMM32_OP)) {
...
It calculates BranchTarget by decoding, then does this check:
if (BranchTarget < FunctionEntry->BeginAddress ||
BranchTarget >= FunctionEntry->EndAddress) {
...
It does some double-checking for cases that seem irrelevant here, and ultimately concludes we are in the epilogue (perhaps the jmp is a tail call):
InEpilogue = TRUE;

So, I think the issue goes back to disabling TrapUnreachable:
https://reviews.llvm.org/D67201
http://github.com/llvm/llvm-project/commit/bf02399a852e1ff06e074c353908147d9a22b1dc

If we still emitted int3 for unreachable, then these jumps would not point outside the current function, and the unwinder wouldn't think this "CALL ; JMP" code sequence was a tail call.

Possible steps to completely fix the issue:

  1. Modify X86AvoidTrailingCall to prevent trailing empty blocks, since jumps to them can be mistaken for tail calls.
  2. Reland tail call adjustment.
  3. Change the MachineBasicBlock construction to remove these empty, unreachable BB successors from blocks that invoke noreturn functions. These extra jmp instructions are wasted space.
  4. Insert int3 after noreturn calls for good measure. This is not necessary, and would not be a complete fix for the issue, since a call may be followed by unreachable without being noreturn, as in:
    try { throw_but_not_noreturn(); __builtin_unreachable(); } catch (...) {}

@rnk
Copy link
Collaborator Author

rnk commented Mar 20, 2020

After landing https://reviews.llvm.org/D76531, we can remove the unnecessary stack adjustments again.

@rnk
Copy link
Collaborator Author

rnk commented Mar 24, 2020

Relanded as 597718a. I confirmed that the deqp issue was fixed. I never was able to reproduce any sanitizer issues, before or after my change. We'll see if anything arises.

@rnk
Copy link
Collaborator Author

rnk commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#45283

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

2 participants