LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 32507 - [Win64] XMM registers saved in funclet prologues clobber register saves in parent function
Summary: [Win64] XMM registers saved in funclet prologues clobber register saves in pa...
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Backend: X86 (show other bugs)
Version: trunk
Hardware: PC Windows NT
: P normal
Assignee: Andy Kaylor
URL:
Keywords:
: 36103 (view as bug list)
Depends on:
Blocks:
 
Reported: 2017-04-03 09:55 PDT by Reid Kleckner
Modified: 2019-08-30 14:08 PDT (History)
7 users (show)

See Also:
Fixed By Commit(s): r370005


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Reid Kleckner 2017-04-03 09:55:20 PDT
This C++ program illustrates the bug:

$ cat a.cpp
double get_xmm(double);
void throw_internally();
void use_xmm(double d);
int main() {
  double xmm = get_xmm(1.0);
  use_xmm(xmm);
  throw_internally();
  use_xmm(xmm);
}

$ cat b.cpp
extern "C" int printf(const char *, ...);
void use_xmm(double d) { printf("%f\n", d); }
double __declspec(noinline) get_xmm(double d) {
  asm volatile("" : "=m"(d) : "0"(d));
  return d;
}
void throw_internally() {
  double r1 = get_xmm(2.0);
  use_xmm(r1);
  use_xmm(r1);
  try {
    throw 42;
  } catch (...) {
    double r2 = get_xmm(3.0);
    use_xmm(r2);
  }
  use_xmm(r1);
}

$ clang-cl -Z7 -EHsc -O0 a.cpp b.cpp -Fet.exe && ./t.exe
1.000000
2.000000
2.000000
3.000000
2.000000
1.000000

$ clang-cl -Z7 -EHsc -O2 a.cpp b.cpp -Fet.exe && ./t.exe
1.000000
2.000000
2.000000
3.000000
2.000000
2.000000  # BUG: should be 1.0

The buggy code is in the catch funclet prologue. It does this:

"?catch$2@?0??throw_internally@@YAXXZ@4HA":
.seh_proc "?catch$2@?0??throw_internally@@YAXXZ@4HA"
	.seh_handler __CxxFrameHandler3, @unwind, @except
	movq	%rdx, 16(%rsp)
	pushq	%rbp
	.seh_pushreg 5
	pushq	%rsi
	.seh_pushreg 6
	subq	$40, %rsp
	.seh_stackalloc 40
	leaq	80(%rdx), %rbp
	movdqa	%xmm6, -16(%rbp)        # 16-byte Spill
	.seh_savexmm 6, 64
	.seh_endprologue

We set up RBP to point back to the parent stack frame, and then do XMM spills through RBP. This is no good, because when we eventually return from throw_internally, we will now restore XMM6 to the value it contained before running the catch handler (2.0 in the example), and not the value it contained before calling throw_internally (1.0 in the example).

Funclets should allocate their own stack memory to spill XMM registers.

This bug could probably also be observed by re-throwing the exception out of the catch handler, since the .seh_savexmm directives we emit for the catch handler are definitely wrong.
Comment 1 David Nadlinger 2017-04-03 11:07:29 PDT
Is this really just an enhancement? At least for the LDC side of things, this is effectively a regression from 3.9. (Disregard this if you are working on this anyway, of course.)
Comment 2 David Majnemer 2017-04-03 11:20:01 PDT
(In reply to David Nadlinger from comment #1)
> Is this really just an enhancement? At least for the LDC side of things,
> this is effectively a regression from 3.9. (Disregard this if you are
> working on this anyway, of course.)

enhancement is just the default severity. LLVM hackers generally do not use the importance field for anything.
Comment 3 David Nadlinger 2017-04-03 11:40:02 PDT
(In reply to David Majnemer from comment #2)
> enhancement is just the default severity. LLVM hackers generally do not use
> the importance field for anything.

Sorry for the noise, then – I guess I've just been burnt by "enhancements" rotting away in trackers before, so I thought I would take this opportunity to ask. Thanks!
Comment 4 Reid Kleckner 2018-02-01 12:50:00 PST
*** Bug 36103 has been marked as a duplicate of this bug. ***
Comment 5 Reid Kleckner 2018-02-01 12:55:59 PST
Without looking at the code, I think the solution here is probably going to be to make X86FrameLowering emit XMM saves with direct offsets from RSP, rather than using the fixed FrameIndex operand that we do now. That will allow us to use the same logic for the main function as we do for the funclet.

A secondary issue that would help EH code size a lot would be to make our CSR usage analysis funclet-local. Catch handlers generally don't use many integer of XMM registers, but right now we spill CSRs used anywhere in the parent function in every funclet regardless of what registers that funclet uses.
Comment 6 Mykhailo Kremniov 2019-01-11 08:29:17 PST
I've stumbled on this issue recently with Clang 7. I wonder, is there any chance for it to be fixed anytime soon? Or is there any workaround (besides compiling with -O0)?
Comment 7 Shoaib Meenai 2019-01-11 12:36:10 PST
In my particular case, I was able to work around it by adding __declspec(noinline) to some functions. Those functions had catch handlers but weren't doing any floating point computations themselves, so preventing them from being inlined into functions that were performing floating point computations worked around the issue. That's not a general solution, but it's a possible workaround.
Comment 8 Adrian Vogelsgesang 2019-01-23 10:18:34 PST
We also stumbled over this issue in our code base.
In our case, there are not even doubles involved, but reading from XMM6 was used to reset two subsequent pointers (begin & end pointer within a std::vector) to nullptrs as part of a std::move out of the vector.
But since XMM6 was already clobbered by the time the vector was moved, the vector now pointing to invalid data. The destructor of the vector later on crashed when trying to destruct to contained polymorphic objects.

In our case noinline unfortunately didn't work. Did anyone else hit this and maybe found a different way to circumvent the issue?
Comment 9 Andy Kaylor 2019-02-08 12:33:25 PST
I've just posted a proposed patch to fix this problem.

https://reviews.llvm.org/D57970

This is a quick and dirty patch that was basically just the quickest thing I could think of to get things working. I'm definitely open to better approaches, such as the hardcoded offset approach that Reid suggested, but I'd prefer to get some kind of fix committed soon before tackling the larger code quality issues which look like they'll require a good bit more work.
Comment 10 Andy Kaylor 2019-02-08 13:47:42 PST
I've posted a patch that fixes this problem in Phabricator.

https://reviews.llvm.org/D57970

A reviewer there pointed out a problem with my solution, but I think it's better than the current behavior and will work in most cases. If it doesn't look like the alternate approaches suggested by the reviewer can be implemented quickly, I'll commit my patch to xmain to unblock this test.

We definitely need a better long term solution.
Comment 11 Andy Kaylor 2019-02-08 13:49:40 PST
Sorry for the noise. That last comment was meant to go somewhere else.
Comment 12 Reid Kleckner 2019-08-30 14:08:18 PDT
Pengfei Wang fixed this in r370005.