Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling C++ exceptions on Windows x64 corrupts local variables #42930

Closed
llvmbot opened this issue Oct 7, 2019 · 15 comments
Closed

Handling C++ exceptions on Windows x64 corrupts local variables #42930

llvmbot opened this issue Oct 7, 2019 · 15 comments
Labels
bugzilla Issues migrated from bugzilla clang Clang issues not falling into any other category

Comments

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 7, 2019

Bugzilla Link 43585
Resolution FIXED
Resolved on Mar 12, 2020 14:32
Version 9.0
OS Windows NT
Blocks #42705
Attachments x64 Disassembly
Reporter LLVM Bugzilla Contributor
CC @efriedma-quic,@AMDG2,@zmodem,@slacka,@phoebewang,@zygoloid,@rnk,@tstellar
Fixed by commit(s) 564fb58 9a9b649 8723b95

Extended Description

I see segmentation faults in my product when switching from Clang 8 to Clang 9. Sometimes local variables are incorrectly initialised after an exception has been successfully caught. With certain input this can lead to a crash at a later point. For one of our unit tests this crash is 100% repeatable.

After some investigation I'm confident this is a Windows only issue with exception funclets incorrectly spilling XMM registers. I can confirm this issue does not occur when I use the master branch of Clang because of the change https://reviews.llvm.org/D66596 ([WinEH] Allocate space in funclets stack to save XMM CSRs). This issue is a bit of showstopper for me because it leaves me worried that there are similar problems in our product code that we have not found through testing. I'm raising this issue in the hope of getting the D66596 change incorporated into a future clang 9 release.

It's hard to come up with a simple example. The code below demonstrates the corruption problem (but not the crash). It shows how a constant data member is initialised to the wrong value.

When the following code is compiled with:
clang-cl /O2 /Ob1 /EHsc code.cpp
and run it prints "This shouldn't happen to Two, value of a is 1". I would expect the code to run without showing this message.

#include

struct One {
const size_t a = 1;
const size_t b = 0;

~One() {
    if (a != 1)
        std::cout << "This shouldn't happen to One, value of a is " << a << "." << std::endl;
}

};

struct Two {
const size_t a = 2;
const size_t b = 0;

~Two() {
    if (a != 2)
        std::cout << "This shouldn't happen to Two, value of a is " << a << "." << std::endl;
}

};

void throwIt() {
throw std::string();
}

bool mytest() {
for (int i = 0; i < 2; ++i) {
try {
One one;
throwIt();
}
catch (const std::string& s) {
return true;
}
}
return false;
}

int main(int argc, char* argv[]) {
for (int i = 0; i < 2; ++i) {
Two two;
mytest();
}
return 0;
}

For more details see annotated attached disassembly it shows how mytest() overwrites the CSR slot for XMM6 in both the unwinding and catch handling funclets.

@rnk
Copy link
Collaborator

rnk commented Oct 10, 2019

I think your analysis is probably correct, although I haven't checked it. I think it's reason enough to revert r367088 from the 9.0 branch and merge r370005.

@rnk
Copy link
Collaborator

rnk commented Nov 5, 2019

I built the 9.x release candidate and ran this test case, but I could not reproduce the problem. However, I chery picked the WinEH XMM CSR fix to the release/9.x branch:
9a9b649

I confirmed that fixed the other similar bug (issue 43710).

You can build the 9.0.1 candidate locally if you like and confirm if the issue has been fixed, but I don't plan to do any more investigation beyond this. I suspect the bug will be fixed in 9.0.1.

@tstellar
Copy link
Collaborator

tstellar commented Nov 8, 2019

Has this been fixed now in the release/9.x branch? Can we close this bug?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 8, 2019

While this change does appear fix the original problem I reported, I now have an access violation in different location in my product code.

At this stage I don't have a lots of details but I see a "read access violation, pFunc was nullptr" that occurs in
FindHandler<__FrameHandler3>(EHExceptionRecord * pExcept, unsigned __int64 * pRN, _CONTEXT * pContext, _xDISPATCHER_CONTEXT * pDC, const _s_FuncInfo * pFuncInfo, unsigned char recursive, int CatchDepth, unsigned __int64 * pMarkerRN) Line 627
This looks like a Microsoft internal function related to structured exception handling.

I've tried using clang built from the parent of the of this commit (2d75b24) and the "pFunc was nullptr" access violation does not occur. This leads me to think the XMM CSR fix addresses one problem but causes another.

Whether this is a new bug or a related one (or indeed a bug in our code that has just been exposed), I'll let you advise whether I should raise another issue. I'd prefer to keep this bug open for a few days longer to allow me to try and come up with some more details.

Also some further info.

  1. Issue 43710 is indeed a duplicate of this bug. I looked at the disassembly and saw the problematic combination of three functions using the same register and an exception being thrown.

  2. My reproduction example no longer reproduces the problem. It is very difficult to give a simple example that causes this issue to be observable and very small changes can make the problem disappear. It's possible that change to a header files in a MSVC update would be enough to prevent my example from reproducing the problem (sigh).

  3. This is a serious issue for my company. We have stopped shipping production code on Windows until there is a fix. While the XMM register corruption can potentially affect every function, it only turns into a problem if 3 functions are aligned in a very particular way. This 'alignment' is less common and, so far, I've seen this problem twice in our 100K line codebase. My concern is that there are other cases we just haven't found and unit testing has a different code path and so isn't going to help us find them.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 12, 2019

I think I've located the cause of the new problem. I will try and give some easier reproduction steps but at this stage the best I can do is give a snippet of disassembly from our product code. What follows is the unwind funclet of a problematic function:

"?dtor$15@?0??open@?$QueryIteratorImpl@$0A@$0A@$00$0A@@@UEAA_KXZ@4HA":
.seh_proc "?dtor$15@?0??open@?$QueryIteratorImpl@$0A@$0A@$00$0A@@@UEAA_KXZ@4HA"
.LBB288_15:
#DEBUG_VALUE: now <- [DW_OP_plus_uconst 120, DW_OP_deref] $rbp
#DEBUG_VALUE: open:this <- $rsi
mov qword ptr [rsp + 16], rdx
push rbp
.seh_pushreg 5
push rsi
.seh_pushreg 6
sub rsp, 56
.seh_stackalloc 56
lea rbp, [rdx + 128]
.Ltmp9548:
movaps xmmword ptr [rsp + 48], xmm6 # 16-byte Spill
.seh_savexmm 6, 48
.seh_endprologue
.Ltmp9549:
#DEBUG_VALUE: now <- [DW_OP_plus_uconst 120, DW_OP_deref] $rbp
lea rcx, [rbp + 120]
.Ltmp9550:
.cv_loc 3880 1 147 0 # Core\src\querying\QueryIterator.cpp:147:0
call "??1ResourceValue@@qeaa@XZ"
movaps xmm6, xmmword ptr [rsp + 48] # 16-byte Reload
add rsp, 56
pop rsi
.Ltmp9551:
pop rbp
ret # CLEANUPRET

We can see that the funclet now spills the 16 bytes of xmm6 register to location rsp + 48. However the funclet has only reserved 56 bytes of stack space so it overflows by 8 bytes.

My attempts to reproduce this issue with a smaller example has been unsuccessful and the code generated appears to reserve the correct amount of space to save the xmm registers.

Also as a side note, I don't think the above funclet needs to save/reload the xmm6 because it doesn't modify it. I guess clang decides which registers to spill based on the parent function rather than the funclet. This results in non-optimal code rather than anything wrong.

@phoebewang
Copy link
Contributor

We can see that the funclet now spills the 16 bytes of xmm6 register to
location rsp + 48. However the funclet has only reserved 56 bytes of stack
space so it overflows by 8 bytes.

My attempts to reproduce this issue with a smaller example has been
unsuccessful and the code generated appears to reserve the correct amount of
space to save the xmm registers.

Also as a side note, I don't think the above funclet needs to save/reload
the xmm6 because it doesn't modify it. I guess clang decides which registers
to spill based on the parent function rather than the funclet. This results
in non-optimal code rather than anything wrong.

Hi Brain, the behavior in your example is as expected. The X86 stack is growing downward, so xmm6 is stored in [rsp + 32, rsp + 48). There's no overflow here.

The math that reserves memory in stack is:

  1. Reserves 32 bytes (aligned to 16 bytes) for callee functions;
  2. Allocates 16 * N bytes (aligned to 16 bytes) for N XMM CSRs;
  3. Aligns to 16 bytes for 1 and 2

The odd pushes (pushes except 'push rbp') need an extra 8 bytes to acquire alignment. That's why we subtract 56 rather than 48 here.

@phoebewang
Copy link
Contributor

Oop, seems I was confused by the pushes behavior. The memory accessing is growing upward. For any cases the offset for xmm spilling should be 32.

@phoebewang
Copy link
Contributor

Hi Brain, I have committed a patch on https://reviews.llvm.org/D70224, I guess this should be useful for your problems. Thanks for the great tracking.

@phoebewang
Copy link
Contributor

Merged to 9.0 branch by https://reviews.llvm.org/rG3437c7fc4470.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 18, 2019

I have tested my product code using this latest version of clang (commit 3437c7f) and confirm it resolves my issue.

Thank you very much for fixing this.

@rnk
Copy link
Collaborator

rnk commented Dec 6, 2019

*** Bug llvm/llvm-bugzilla-archive#43977 has been marked as a duplicate of this bug. ***

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 12, 2020

*** Bug llvm/llvm-bugzilla-archive#45163 has been marked as a duplicate of this bug. ***

@rnk
Copy link
Collaborator

rnk commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#43977

@slacka
Copy link
Mannequin

slacka mannequin commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#44063

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#45163

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla clang Clang issues not falling into any other category
Projects
None yet
Development

No branches or pull requests

4 participants