-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Win64] XMM registers saved in funclet prologues clobber register saves in parent function #31854
Comments
assigned to @andykaylor |
Is this really just an enhancement? At least for the LDC side of things, this is effectively a regression from 3.9. (Disregard this if you are working on this anyway, of course.) |
enhancement is just the default severity. LLVM hackers generally do not use the importance field for anything. |
Sorry for the noise, then – I guess I've just been burnt by "enhancements" rotting away in trackers before, so I thought I would take this opportunity to ask. Thanks! |
*** Bug llvm/llvm-bugzilla-archive#36103 has been marked as a duplicate of this bug. *** |
Without looking at the code, I think the solution here is probably going to be to make X86FrameLowering emit XMM saves with direct offsets from RSP, rather than using the fixed FrameIndex operand that we do now. That will allow us to use the same logic for the main function as we do for the funclet. A secondary issue that would help EH code size a lot would be to make our CSR usage analysis funclet-local. Catch handlers generally don't use many integer of XMM registers, but right now we spill CSRs used anywhere in the parent function in every funclet regardless of what registers that funclet uses. |
I've stumbled on this issue recently with Clang 7. I wonder, is there any chance for it to be fixed anytime soon? Or is there any workaround (besides compiling with -O0)? |
In my particular case, I was able to work around it by adding __declspec(noinline) to some functions. Those functions had catch handlers but weren't doing any floating point computations themselves, so preventing them from being inlined into functions that were performing floating point computations worked around the issue. That's not a general solution, but it's a possible workaround. |
We also stumbled over this issue in our code base. In our case noinline unfortunately didn't work. Did anyone else hit this and maybe found a different way to circumvent the issue? |
I've just posted a proposed patch to fix this problem. https://reviews.llvm.org/D57970 This is a quick and dirty patch that was basically just the quickest thing I could think of to get things working. I'm definitely open to better approaches, such as the hardcoded offset approach that Reid suggested, but I'd prefer to get some kind of fix committed soon before tackling the larger code quality issues which look like they'll require a good bit more work. |
I've posted a patch that fixes this problem in Phabricator. https://reviews.llvm.org/D57970 A reviewer there pointed out a problem with my solution, but I think it's better than the current behavior and will work in most cases. If it doesn't look like the alternate approaches suggested by the reviewer can be implemented quickly, I'll commit my patch to xmain to unblock this test. We definitely need a better long term solution. |
Sorry for the noise. That last comment was meant to go somewhere else. |
Pengfei Wang fixed this in r370005. |
mentioned in issue llvm/llvm-bugzilla-archive#36103 |
Extended Description
This C++ program illustrates the bug:
$ cat a.cpp
double get_xmm(double);
void throw_internally();
void use_xmm(double d);
int main() {
double xmm = get_xmm(1.0);
use_xmm(xmm);
throw_internally();
use_xmm(xmm);
}
$ cat b.cpp
extern "C" int printf(const char *, ...);
void use_xmm(double d) { printf("%f\n", d); }
double __declspec(noinline) get_xmm(double d) {
asm volatile("" : "=m"(d) : "0"(d));
return d;
}
void throw_internally() {
double r1 = get_xmm(2.0);
use_xmm(r1);
use_xmm(r1);
try {
throw 42;
} catch (...) {
double r2 = get_xmm(3.0);
use_xmm(r2);
}
use_xmm(r1);
}
$ clang-cl -Z7 -EHsc -O0 a.cpp b.cpp -Fet.exe && ./t.exe
1.000000
2.000000
2.000000
3.000000
2.000000
1.000000
$ clang-cl -Z7 -EHsc -O2 a.cpp b.cpp -Fet.exe && ./t.exe
1.000000
2.000000
2.000000
3.000000
2.000000
2.000000 # BUG: should be 1.0
The buggy code is in the catch funclet prologue. It does this:
"?catch$2@?0??throw_internally@@yaxxz@4HA":
.seh_proc "?catch$2@?0??throw_internally@@yaxxz@4HA"
.seh_handler __CxxFrameHandler3, @unwind, @except
movq %rdx, 16(%rsp)
pushq %rbp
.seh_pushreg 5
pushq %rsi
.seh_pushreg 6
subq $40, %rsp
.seh_stackalloc 40
leaq 80(%rdx), %rbp
movdqa %xmm6, -16(%rbp) # 16-byte Spill
.seh_savexmm 6, 64
.seh_endprologue
We set up RBP to point back to the parent stack frame, and then do XMM spills through RBP. This is no good, because when we eventually return from throw_internally, we will now restore XMM6 to the value it contained before running the catch handler (2.0 in the example), and not the value it contained before calling throw_internally (1.0 in the example).
Funclets should allocate their own stack memory to spill XMM registers.
This bug could probably also be observed by re-throwing the exception out of the catch handler, since the .seh_savexmm directives we emit for the catch handler are definitely wrong.
The text was updated successfully, but these errors were encountered: