45064 – [WinEH] LLVM emits unreachable (incorrect?) RSP adjustments after noreturn calls, like CxxThrowException

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 45064 - [WinEH] LLVM emits unreachable (incorrect?) RSP adjustments after noreturn calls, like CxxThrowException

Summary: [WinEH] LLVM emits unreachable (incorrect?) RSP adjustments after noreturn ca...

Status:	RESOLVED FIXED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Backend: X86 (show other bugs)
Version:	trunk
Hardware:	PC Windows NT

Importance:	P enhancement
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:

Depends on:
Blocks:

Reported:	2020-02-28 17:05 PST by Reid Kleckner
Modified:	2020-03-24 12:11 PDT (History)
CC List:	6 users (show)

See Also:
Fixed By Commit(s):

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Reid Kleckner 2020-02-28 17:05:16 PST

I noticed this while investigating https://bugs.llvm.org/show_bug.cgi?id=45021

We've had this bug many times:
https://bugs.llvm.org/show_bug.cgi?id=43155
https://bugs.llvm.org/show_bug.cgi?id=27117

But we still have it. :)

Consider, with EH enabled:

struct MakeCleanup {
  ~MakeCleanup();
};
bool cond();
void foo() {
  MakeCleanup o;
  if (cond())
    throw;
  if (cond())
    throw;
}

We get this:
...
        callq   _CxxThrowException
        subq    $32, %rsp
.Ltmp7:
        jmp     .LBB0_8
.LBB0_5:                                # %if.then3
.Ltmp4:
        xorl    %ecx, %ecx
        xorl    %edx, %edx
        callq   _CxxThrowException
        subq    $32, %rsp
.Ltmp5:
.LBB0_8:                                # %unreachable

This is worrying to me, because it could interfere with the Win64 stack unwinder.

On Win64, most stack frames are reserved, and I think the code we added in the first place is causing more trouble than it's worth:
  if (CLI.DoesNotReturn && !getTargetMachine().Options.TrapUnreachable) {
    // No need to reset the stack after the call if the call doesn't return. To
    // make the MI verify, we'll pretend the callee does it for us.
    NumBytesForCalleeToPop = NumBytes;
  }
X86ISelLowering.cpp:4333 (file is too big to link on github)

When EH is involved, it becomes difficult for the code I added to PEI to determine that this adjustment is after a noreturn function call:
https://github.com/llvm/llvm-project/blob/master/llvm/lib/Target/X86/X86FrameLowering.cpp#L3074

I think we might want to invent a cleaner way of indicating that the adjustment is "dead" and can be skipped while still pacifying the verifier. I notice that a third argument to ADJCALLSTACKDOWN* was added to pacify the verifier, so we could do the same again for ADJCALLSTACKUP*.

Comment 1 Reid Kleckner 2020-03-06 13:40:17 PST

65b21282c710afe9c275778820c6e3c1cf46734b

Comment 2 Nico Weber 2020-03-17 17:52:25 PDT

I reverted the fix in 4e0fe038f438ae1679eae9e156e1f248595b2373 , see comments on https://reviews.llvm.org/D75712

Comment 3 Reid Kleckner 2020-03-18 12:48:45 PDT

I think we need to look into running some WinEH integration tests continuously. I bet we could've found this bug with https://github.com/microsoft/compiler-tests instead of the big angle DEQP tests.

Comment 4 Reid Kleckner 2020-03-20 11:17:10 PDT

I think my change really revealed a separate underlying issue, but I will debug it here because I don't understand it well enough to write a good new issue summary. I think we were just getting lucky before my change.

I came up with a simple reproducer program:

extern "C" int printf(const char *, ...);
extern "C" int puts(const char *);
int main(int argc, char **argv) {
  try {
    printf("line: %d\n", __LINE__);
    if (argc == 1) {
      printf("line: %d\n", __LINE__);
      throw 42;
    }
    printf("line: %d\n", __LINE__);
    if (argc == 2) {
      printf("line: %d\n", __LINE__);
      throw 42;
    }
    printf("line: %d\n", __LINE__);
    if (argc == 3) {
      printf("line: %d\n", __LINE__);
      throw 42;
    }
    printf("line: %d\n", __LINE__);
  } catch (...) {
    puts("CAUGHT");
  }
}


This is the difference in the assembly before and after my change:

$ diff -u after.s before.s
--- after.s     2020-03-20 09:44:56.629227800 -0700
+++ before.s    2020-03-20 09:44:22.719712100 -0700
@@ -66,6 +66,7 @@
        lea     rdx, [rip + _TI1H]
        lea     rcx, [rbp - 12]
        call    _CxxThrowException
+       sub     rsp, 32
 .Ltmp5:
        jmp     .LBB0_9
 .LBB0_3:                                # %if.then4
@@ -76,6 +77,7 @@
        lea     rdx, [rip + _TI1H]
        lea     rcx, [rbp - 8]
        call    _CxxThrowException
+       sub     rsp, 32
 .Ltmp3:
        jmp     .LBB0_9
 .LBB0_5:                                # %if.then10
@@ -86,7 +88,7 @@
        lea     rdx, [rip + _TI1H]
        lea     rcx, [rbp - 4]
        call    _CxxThrowException
-       int3
+       sub     rsp, 32
 .Ltmp1:
 .LBB0_9:                                # %unreachable
        .seh_handlerdata


The stack adjustments are, as described in this issue, totally wrong. If we executed them, bad things would happen, so we must not be executing them.

We know from previous experience with the Win64 unwinder that it does some disassembly when it unwinds, and it can be frustratingly fussy about instructions immediately following calls.

Debugging this program with windbg and taking a stack trace at the point of the exception results in a bad stack trace, like this:

0:000> k
 # Child-SP          RetAddr           Call Site
00 000000ee`1db9f910 00007ff6`2443a724 KERNELBASE!RaiseException+0x69
01 000000ee`1db9f9f0 00007ff6`24436d45 t!_CxxThrowException+0x90 [d:\agent\_work\5\s\src\vctools\crt\vcruntime\src\eh\throw.cpp @ 75] 
02 000000ee`1db9fa50 00007ff6`2449df10 t!main+0xb5 [C:\src\llvm-project\build\t.cpp @ 13] 
03 000000ee`1db9fa58 00000000`0000000c t!`string'   # This frame is garbage, and beyond
04 000000ee`1db9fa60 000001f5`ed607220 0xc
05 000000ee`1db9fa68 00000000`00000002 0x000001f5`ed607220
06 000000ee`1db9fa70 00000000`00000007 0x2
07 000000ee`1db9fa78 00000000`0000002a 0x7
08 000000ee`1db9fa80 ffffffff`fffffffe 0x2a
09 000000ee`1db9fa88 00000000`00000000 0xffffffff`fffffffe

This suggests that there is no issue with the _CxxFrameHandler3 tables. The problem seems to be at a higher level.

I strongly suspect that if we jam in some nops after the call site, it will make these problems go away. I've known that MSVC emits nops after x64 call sites for years, but I've always considered it to be kind of silly and I've been avoiding it in LLVM.

Perhaps as a separate issue, LLVM shouldn't be emitting these jumps to the end of the function after noreturn functions. I suspect it's the jump instruction itself that makes the unwinder go off the rails, so we could consider that to be the bug.

---

Aha! I have found the codepath in the unwinder:
https://github.com/dotnet/coreclr/blob/a9f3fc16483eecfc47fb79c362811d870be02249/src/unwinder/amd64/unwinder_amd64.cpp#L1318

The unwinder goes down this codepath:
        } else if ((NextByte[0] == JMP_IMM8_OP) ||
                   (NextByte[0] == JMP_IMM32_OP)) {
...
It calculates BranchTarget by decoding, then does this check:
            if (BranchTarget < FunctionEntry->BeginAddress ||
                BranchTarget >= FunctionEntry->EndAddress) {
...
It does some double-checking for cases that seem irrelevant here, and ultimately concludes we are in the epilogue (perhaps the jmp is a tail call):
                    InEpilogue = TRUE;

So, I think the issue goes back to disabling TrapUnreachable:
https://reviews.llvm.org/D67201
http://github.com/llvm/llvm-project/commit/bf02399a852e1ff06e074c353908147d9a22b1dc

If we still emitted int3 for unreachable, then these jumps would not point outside the current function, and the unwinder wouldn't think this "CALL ; JMP" code sequence was a tail call.

Possible steps to completely fix the issue:
1. Modify X86AvoidTrailingCall to prevent trailing empty blocks, since jumps to them can be mistaken for tail calls.
2. Reland tail call adjustment.
3. Change the MachineBasicBlock construction to remove these empty, unreachable BB successors from blocks that invoke noreturn functions. These extra jmp instructions are wasted space.
4. Insert int3 after noreturn calls for good measure. This is not necessary, and would not be a complete fix for the issue, since a call may be followed by unreachable without being noreturn, as in:
  try { throw_but_not_noreturn(); __builtin_unreachable(); } catch (...) {}

Comment 5 Reid Kleckner 2020-03-20 15:10:02 PDT

After landing https://reviews.llvm.org/D76531, we can remove the unnecessary stack adjustments again.

Comment 6 Reid Kleckner 2020-03-24 12:11:35 PDT

Relanded as 597718aae017a870e99cdb37b3bc10d8dfa58a25. I confirmed that the deqp issue was fixed. I never was able to reproduce any sanitizer issues, before or after my change. We'll see if anything arises.