[WinEH] LLVM emits unreachable (incorrect?) RSP adjustments after noreturn calls, like CxxThrowException #44409

rnk · 2020-02-29T01:05:16Z


Bugzilla Link	45064
Resolution	FIXED
Resolved on	Mar 24, 2020 12:11
Version	trunk
OS	Windows NT
CC	@topperc,@zmodem,@RKSimon,@nico,@rotateright

Extended Description

I noticed this while investigating #44366

We've had this bug many times:
#42500
#27491

But we still have it. :)

Consider, with EH enabled:

struct MakeCleanup {
~MakeCleanup();
};
bool cond();
void foo() {
MakeCleanup o;
if (cond())
throw;
if (cond())
throw;
}

We get this:
...
callq _CxxThrowException
subq $32, %rsp
.Ltmp7:
jmp .LBB0_8
.LBB0_5: # %if.then3
.Ltmp4:
xorl %ecx, %ecx
xorl %edx, %edx
callq _CxxThrowException
subq $32, %rsp
.Ltmp5:
.LBB0_8: # %unreachable

This is worrying to me, because it could interfere with the Win64 stack unwinder.

On Win64, most stack frames are reserved, and I think the code we added in the first place is causing more trouble than it's worth:
if (CLI.DoesNotReturn && !getTargetMachine().Options.TrapUnreachable) {
// No need to reset the stack after the call if the call doesn't return. To
// make the MI verify, we'll pretend the callee does it for us.
NumBytesForCalleeToPop = NumBytes;
}
X86ISelLowering.cpp:4333 (file is too big to link on github)

When EH is involved, it becomes difficult for the code I added to PEI to determine that this adjustment is after a noreturn function call:
https://github.com/llvm/llvm-project/blob/master/llvm/lib/Target/X86/X86FrameLowering.cpp#L3074

I think we might want to invent a cleaner way of indicating that the adjustment is "dead" and can be skipped while still pacifying the verifier. I notice that a third argument to ADJCALLSTACKDOWN* was added to pacify the verifier, so we could do the same again for ADJCALLSTACKUP*.

rnk · 2020-03-06T21:40:17Z

65b2128

nico · 2020-03-18T00:52:25Z

I reverted the fix in 4e0fe03 , see comments on https://reviews.llvm.org/D75712

rnk · 2020-03-18T19:48:45Z

I think we need to look into running some WinEH integration tests continuously. I bet we could've found this bug with https://github.com/microsoft/compiler-tests instead of the big angle DEQP tests.

rnk · 2020-03-20T18:17:10Z

I think my change really revealed a separate underlying issue, but I will debug it here because I don't understand it well enough to write a good new issue summary. I think we were just getting lucky before my change.

I came up with a simple reproducer program:

extern "C" int printf(const char *, ...);
extern "C" int puts(const char *);
int main(int argc, char **argv) {
try {
printf("line: %d\n", LINE);
if (argc == 1) {
printf("line: %d\n", LINE);
throw 42;
}
printf("line: %d\n", LINE);
if (argc == 2) {
printf("line: %d\n", LINE);
throw 42;
}
printf("line: %d\n", LINE);
if (argc == 3) {
printf("line: %d\n", LINE);
throw 42;
}
printf("line: %d\n", LINE);
} catch (...) {
puts("CAUGHT");
}
}

This is the difference in the assembly before and after my change:

$ diff -u after.s before.s
--- after.s 2020-03-20 09:44:56.629227800 -0700
+++ before.s 2020-03-20 09:44:22.719712100 -0700
@@ -66,6 +66,7 @@
lea rdx, [rip + _TI1H]
lea rcx, [rbp - 12]
call _CxxThrowException

```
  sub     rsp, 32
```

.Ltmp5:
jmp .LBB0_9
.LBB0_3: # %if.then4
@@ -76,6 +77,7 @@
lea rdx, [rip + _TI1H]
lea rcx, [rbp - 8]
call _CxxThrowException

```
  sub     rsp, 32
```

.Ltmp3:
jmp .LBB0_9
.LBB0_5: # %if.then10
@@ -86,7 +88,7 @@
lea rdx, [rip + _TI1H]
lea rcx, [rbp - 4]
call _CxxThrowException

```
  int3
```

```
  sub     rsp, 32
```

.Ltmp1:
.LBB0_9: # %unreachable
.seh_handlerdata

The stack adjustments are, as described in this issue, totally wrong. If we executed them, bad things would happen, so we must not be executing them.

We know from previous experience with the Win64 unwinder that it does some disassembly when it unwinds, and it can be frustratingly fussy about instructions immediately following calls.

Debugging this program with windbg and taking a stack trace at the point of the exception results in a bad stack trace, like this:

0:000> k

Child-SP RetAddr Call Site

00 000000ee1db9f910 00007ff62443a724 KERNELBASE!RaiseException+0x69
01 000000ee1db9f9f0 00007ff624436d45 t!_CxxThrowException+0x90 [d:\agent_work\5\s\src\vctools\crt\vcruntime\src\eh\throw.cpp @ 75]
02 000000ee1db9fa50 00007ff62449df10 t!main+0xb5 [C:\src\llvm-project\build\t.cpp @ 13]
03 000000ee1db9fa58 000000000000000c t!string' # This frame is garbage, and beyond 04 000000ee1db9fa60 000001f5ed607220 0xc 05 000000ee1db9fa68 0000000000000002 0x000001f5ed607220
06 000000ee1db9fa70 0000000000000007 0x2
07 000000ee1db9fa78 000000000000002a 0x7
08 000000ee1db9fa80 fffffffffffffffe 0x2a
09 000000ee1db9fa88 0000000000000000 0xffffffff`fffffffe

This suggests that there is no issue with the _CxxFrameHandler3 tables. The problem seems to be at a higher level.

I strongly suspect that if we jam in some nops after the call site, it will make these problems go away. I've known that MSVC emits nops after x64 call sites for years, but I've always considered it to be kind of silly and I've been avoiding it in LLVM.

Perhaps as a separate issue, LLVM shouldn't be emitting these jumps to the end of the function after noreturn functions. I suspect it's the jump instruction itself that makes the unwinder go off the rails, so we could consider that to be the bug.

Aha! I have found the codepath in the unwinder:
https://github.com/dotnet/coreclr/blob/a9f3fc16483eecfc47fb79c362811d870be02249/src/unwinder/amd64/unwinder_amd64.cpp#L1318

The unwinder goes down this codepath:
} else if ((NextByte[0] == JMP_IMM8_OP) ||
(NextByte[0] == JMP_IMM32_OP)) {
...
It calculates BranchTarget by decoding, then does this check:
if (BranchTarget < FunctionEntry->BeginAddress ||
BranchTarget >= FunctionEntry->EndAddress) {
...
It does some double-checking for cases that seem irrelevant here, and ultimately concludes we are in the epilogue (perhaps the jmp is a tail call):
InEpilogue = TRUE;

So, I think the issue goes back to disabling TrapUnreachable:
https://reviews.llvm.org/D67201
http://github.com/llvm/llvm-project/commit/bf02399a852e1ff06e074c353908147d9a22b1dc

If we still emitted int3 for unreachable, then these jumps would not point outside the current function, and the unwinder wouldn't think this "CALL ; JMP" code sequence was a tail call.

Possible steps to completely fix the issue:

Modify X86AvoidTrailingCall to prevent trailing empty blocks, since jumps to them can be mistaken for tail calls.
Reland tail call adjustment.
Change the MachineBasicBlock construction to remove these empty, unreachable BB successors from blocks that invoke noreturn functions. These extra jmp instructions are wasted space.
Insert int3 after noreturn calls for good measure. This is not necessary, and would not be a complete fix for the issue, since a call may be followed by unreachable without being noreturn, as in:
try { throw_but_not_noreturn(); __builtin_unreachable(); } catch (...) {}

rnk · 2020-03-20T22:10:02Z

After landing https://reviews.llvm.org/D76531, we can remove the unnecessary stack adjustments again.

rnk · 2020-03-24T19:11:35Z

Relanded as 597718a. I confirmed that the deqp issue was fixed. I never was able to reproduce any sanitizer issues, before or after my change. We'll see if anything arises.

rnk · 2021-11-27T02:59:45Z

mentioned in issue llvm/llvm-bugzilla-archive#45283

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WinEH] LLVM emits unreachable (incorrect?) RSP adjustments after noreturn calls, like CxxThrowException #44409

[WinEH] LLVM emits unreachable (incorrect?) RSP adjustments after noreturn calls, like CxxThrowException #44409

rnk commented Feb 29, 2020

rnk commented Mar 6, 2020

nico commented Mar 18, 2020

rnk commented Mar 18, 2020

rnk commented Mar 20, 2020

rnk commented Mar 20, 2020

rnk commented Mar 24, 2020

rnk commented Nov 27, 2021

[WinEH] LLVM emits unreachable (incorrect?) RSP adjustments after noreturn calls, like CxxThrowException #44409

[WinEH] LLVM emits unreachable (incorrect?) RSP adjustments after noreturn calls, like CxxThrowException #44409

Comments

rnk commented Feb 29, 2020

Extended Description

rnk commented Mar 6, 2020

nico commented Mar 18, 2020

rnk commented Mar 18, 2020

rnk commented Mar 20, 2020

Child-SP RetAddr Call Site

rnk commented Mar 20, 2020

rnk commented Mar 24, 2020

rnk commented Nov 27, 2021