New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LLVM no longer honours -stack-alignment=4 #21809
Comments
Based on the input IR and what I've seen of the backend, this looks like some kind of phase ordering problem. The input test case has no allocas, so we don't think we need to realign the stack for any highly aligned allocations. Then register allocation (or vector legalization) runs, and we are forced to spill some large vector values. This introduces a highly aligned stack allocation, but it's already too late, we can't go back and add the realignment prologue. See this awesome comment in X86RegisterInfo about it maybe being too late to realign the stack: bool X86RegisterInfo::canRealignStack(const MachineFunction &MF) const { const MachineFrameInfo *MFI = MF.getFrameInfo(); // Stack realignment requires a frame pointer. If we already started // If a base pointer is necessary. Check that it isn't too late to reserve Even if that codepath isn't executing, it's possible that there is a similar issue. |
CC'ing Jim and Bob on this one... (as they can see the radar) and should be able to chime in on this one. IIRC, this was some refactoring I did with Jakob, but I don't recall the full details. It is/was a kind of phase ordering issue, but we never came up with a good solution. |
I think we could conservatively assume that any SSA value will need to be spilled to the stack and align the current frame to handle at least that. |
Reed, your proposal is to revert r168627, correct? It's conservatively correct, but you waste a register is many cases. I would prefer a more targeted approach. Any reason we shouldn't go with Jose's suggestion: reserve the frame pointer to enable stack-realignment, if the user specifies an alignment larger than the default stack alignment? |
To be clear -- my suggestion was enable frame pointer when needed. I'm not sure that enabling when the user specifies a stack alignment different from default stack alignment covers all cases. Imagine that the function needs to spill AVX 32-byte registers. Wouldn't that require a 32-byte aligned stack pointer? Or what if the function needed to call another function passing a 32-byte vector by reference. In other words, it seems to me that the alternatives are to revert r168627, or disable frame pointer omission blindly... |
I feel like if the default stack alignment is 4 that should do the same thing as the user specifying a stack alignment of 4, right? |
If the FP isn't available/reserved, then register allocator uses unaligned loads/stores to fill/spill. Thus, the FP isn't strictly needed for correctness in the cases you've described. |
IIRC, the default stack alignment on x86 is 16 bytes (i.e., -stack-alignment=2). If the user specifies a larger alignment than the default then we might consider reserving the FP and realigning the stack to the larger non-default alignment. |
It depends on the platforms. The default stack alignment on x86 on Windows is 4-bytes. In fact it seems this bus is mis-titled: it seems the problem is not that LLVM doesn't honor -stack-alignment=, but rather that x86 backend is broken for stack alignments (default or overriden) less than 16 bytes. |
If I try the attached input, targetting x86_64 linux with the default stack alignment (which is 16bytes), and grep for rsp I get $ llc-3.4 -o - -mtriple=x86_64-unknown-linux -mattr=+avx bug1347703.ll | grep rsp The spills are indeed with VMOVDQU, but doesn't VANDPS require the m256 argument to be 32-byte aligned? |
It has been a while and I don't remember the numbers, but I'm pretty sure that reverting r168627 would cause a significant performance regression. We should figure out how to fix the problems with -stack-alignment=4 some other way. |
This is why I care. :) I believe there are two workarounds for Jose:
If you can do 1, we can probably let this issue stand. |
The workaround we took was to set TargetOptions::NoFramePointerElim and TargetOptions::NoFramePointerElimNonLeaf on all x86 and x86_64 builds of Mesa. We were already doing this for non-release builds (to facilitate stack back traces when debugging/profiling), but after this issue it is clear that frame-pointer-omission is not just a minor optimization in LLVM, but has deep-reaching implications on X86 code generation, so I'm no longer comfortable in having (non-)release builds differing on this matter, as most of our automated tests actually run with non-release builds.
We generate LLVM IR directly (Mesa llvmpipe driver), and I'm used to tweak things in order to get LLVM, and can do so. But, for whatever my opinion is worth, I have to say this seems a serious flaw in LLVM x86 backend; and taking an approach that works 99% of cases for the sake of performance, instead of one that works 100% seems the wrong trade-off for a compiler; and if there are indeed possible workarounds then I fail to understand why pushing the workarounds to the user instead of automatically detecting the problematic situations and deploy the workarounds without intervention; at very least you should add an assert/error message when this happens -- anything less is just sweeping the problem under the rug... |
But, in this case, it would be forbidden for those fill/spill loads/stores (or maybe can only happen for loads) to get eliminated and folded into the arithmetic ops, though only for sse, not when using avx (as vex encoding can handle unaligned loads in ordinary arithmetic ops whereas sse can't). |
This is over a year old and the attached LLVM IR no longer parses with TOT llc. llc's command line interface isn't exactly stable or user facing either, so I don't think there's anything to do here. |
This bug is not specific llc... I provided the llc command line / IR merely to facilitate reproing and fixing the bug. We saw the bug on Mesa + llvmpipe with JIT. The problem here is that That said, it's clear that nothing happened. So probably nothing will ever happen. On Mac and Linux the ABI (both 32 and 64bits) states that stack alignment is always aligned 16bytes, so this is never a issue. (Maybe it matters for Microsoft, since on Windows 32 bits, the stack alignment is 4bytes.) But, if you don't want to fix, then at very least mark it as WONTFIX. Closing as INVALID on the grounds you can't no longer repro is IMO insult to the time I spent on obtaining a small repro, as per the bug guidelines. LLVM has very little stable interfaces, so if LLVM devs sleep it long enough you can close any bug on the basis you can repro... |
bug1347703-r253245.ll |
Here. Updated LLVM IR. Basically took the original IR and massaged with vim subst commands. :%s/getelementptr ({.})*/getelementptr \1, \1 |
Sorry, our bugzilla resolution statuses are really old and sometimes rude. I end up using "invalid" to mean "working as intended". I'm only trying to convey that we aren't taking action because:
It sounds like it's still important to you and you don't like your workaround, so we can reopen it. To resummarize what's wrong, LLVM is making a bad performance decision: it is choosing to use unaligned vector spills in order to omit the frame pointer. In Chad's case, this helped, but in your case, it hurts. Is that accurate? |
|
I agree, in general, doing unaligned vector loads seems pretty lame. I think in Chad's test case, the vector spills were not in the hot path, and it was on 32-bit x86, where going from 6 to 7 GPRs makes a big difference. |
|
Thanks.
When I filled this bug this was a crash, not performance issue. The 1347703 is the number of a VMware internal bug report, and I've just doubled checked: this was causing crashes on Windows, in 32bits processes, both on AVX and non-AVX systems. But I've just retried, and the crash/correctness issue might have indeed be fixed in LLVM since this was filed. For example, if I do llc -o - -mtriple=i686-pc-mingw32 -mattr=-avx,+sse2 -stack-alignment=4 bug1347703.ll then all accesses to the stack are with unaligned MOVs. This was not the case before. I have all llc versions from 3.3 to 3.6, and indeed it seems the issue was fixed going from LLVM 3.3 to 3.4. Performance was never my worry here. Even with the crash fixed, it makes sense for us to continue using frame pointer. My main concern was the crash. So it looks you can resolve this bug after all! |
OK, thanks for looking into it. I think we'll mark this fixed then. If someone runs into the performance issue, there are tons of workarounds: pass -fno-omit-frame-pointer, -mstackrealign, -mstack-alignment=32, or whatever. Those all use a different codepath from TargetOptions::StackAlignmentOverride, and should avoid the unaligned loads. |
Extended Description
LLVM stoped honouring
This can easily be reproduced with the attached bug1347703.ll doing
$ llc -o - -mtriple=i686-pc-mingw32 -mattr=+avx -stack-alignment=4 bug1347703.ll | grep '<and.*[er]sp'
$
which produces nothing showing that the stack pointer is never aligned.
However if one disables frame-pointer elimination then it works again:
$ /home/jfonseca/work/vmware/llvm/llvm/build/linux-x86_64/Debug+Asserts/bin/llc -disable-fp-elim -o - -mtriple=i686-pc-mingw32 -mattr=+avx -stack-alignment=4 bug1347703.ll | grep '<and.*[er]sp'
andl $-32, %esp
andl $-32, %esp
$
Which doesn't seem the right behavior -- using the frame-pointer to align the stack seems an implementation detail -- there are probably other ways, and if frame-pointer is indeed necessary it should be automatically employed as necessary.
This regression happened between LLVM release 3.2 and release 3.3. I strongly suspect r168627 -- llvm-mirror/llvm@1243922
The text was updated successfully, but these errors were encountered: