suboptimal codegen for conditional float return? #38882
Comments
Matthias, do you have any suggestions for how to improve this?
Given these two equivalent IR files, where I've flipped the condition and the select operands, we produce better results with one than the other:
define dso_local float @_Z4testf(float %x) local_unnamed_addr #0 {
define dso_local float @_Z4testf2(float %x) local_unnamed_addr #0 {
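The full .ll bodies are not reproduced above; as a minimal sketch, and assuming the two files correspond to the C++ test() from the extended description, the variants might look roughly like this (instruction names are made up for illustration, and the #0 attribute group is dropped so the sketch parses on its own):

define dso_local float @_Z4testf(float %x) local_unnamed_addr {
entry:
  %bits = bitcast float %x to i32               ; reinterpret the float as i32
  %masked = and i32 %bits, 2139095040           ; 0x7F800000: exponent mask
  %cmp = icmp eq i32 %masked, 2139095040        ; exponent all ones -> NaN or Inf
  %sel = select i1 %cmp, float 0.000000e+00, float %x
  ret float %sel
}

define dso_local float @_Z4testf2(float %x) local_unnamed_addr {
entry:
  %bits = bitcast float %x to i32
  %masked = and i32 %bits, 2139095040
  %cmp = icmp ne i32 %masked, 2139095040        ; flipped condition...
  %sel = select i1 %cmp, float %x, float 0.000000e+00  ; ...and swapped select operands
  ret float %sel
}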
For reference, instcombine is missing several potential select canonicalizations, but it does canonicalize this code already. Unfortunately, it's to the 'eq' version, and that's the one that has the poor codegen.
I assume that by "bad codegen" you specifically mean the "vmovdqa xmm1, xmm0" / "vmovdqa xmm0, xmm1" combination. The problem at the MI level is that the code looks something like this after the CMOV got replaced with a jump:
Unfortunately bb.0->bb.2 is a critical edge, and we are forced to keep the FsFLD0SS instruction in bb.0. If I force CodeGen to split all critical edges, we end up with better code:
I'm not up to date on the discussion of when we do or don't split critical edges, and it definitely isn't universally good to split them. This example, however, would clearly benefit from splitting, so maybe we can come up with a heuristic to catch this situation...
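To make the critical-edge point concrete, here is a rough IR-level analogy (the real problem is in MachineIR after CMOV expansion; the function and block names below are made up). In the first function, the edge from %entry straight to %join is critical because %entry has two successors and %join has two predecessors, so a value needed only on that edge — like the zero that FsFLD0SS materializes in bb.0 — is stuck in %entry. Splitting the edge creates a block it can be sunk into, off the fall-through path:

; Analogy only: %entry plays the role of bb.0 and %join the role of bb.2.
define float @with_critical_edge(float %x, i1 %cond) {
entry:
  br i1 %cond, label %keep.x, label %join      ; entry -> join is a critical edge
keep.x:
  br label %join
join:
  %res = phi float [ %x, %keep.x ], [ 0.000000e+00, %entry ]
  ret float %res
}

; After splitting the edge, the value for the "zero" case has its own block.
define float @with_split_edge(float %x, i1 %cond) {
entry:
  br i1 %cond, label %keep.x, label %take.zero
keep.x:
  br label %join
take.zero:                                     ; new block created by edge splitting
  br label %join
join:
  %res = phi float [ %x, %keep.x ], [ 0.000000e+00, %take.zero ]
  ret float %res
}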
Or maybe we should change that CMOV expansion code to not produce critical edges. Given that the choice of edge is arbitrary, I'd rather not introduce any critical edge here to have more consistent codegen in that case...
I was just about to ask if we should do that.
Patch to avoid creating critical edges: https://reviews.llvm.org/D54119. Unfortunately, it doesn't seem to help this case by itself.
Looks like the tail duplication code is making a bad decision because the instructions haven't been sunk yet... If I move the tail duplicator to run after machine sinking in TargetPassConfig, the example works as expected. I'm not sure, though, what the interactions of taildup with the other passes are, or whether moving it is a good idea in general. Short of that, maybe we can do an extra round of machine sinking after we have lowered any cmovs? (Or implement some simple version of sinking inside the cmov lowering.)
And BTW: for me the example works with your patch regardless. Tail duplication makes the other edge critical, so the...
I was checking the output of the two test cases from comment 2. With my patch they switch behavior. So I guess it fixes the original case but regresses the opposite of that case.
Yeah, as I explained in my other comment: short of the instructions actually being sunk into the blocks, it's probably random which edge the tail duplicator picks. That probably also explains some of the regressions you saw in the tests. I CCed Kyle; maybe he can make a statement as to the effects of moving early tail duplication later in the pipeline (at least after pre-RA machine sinking)...
Kyle, have you had a chance to look at this? |
I haven't been working on LLVM in a while, but I can't think of any reason why moving early tail duplication after sinking would cause problems. If it improves benchmarks and doesn't cause regressions, go for it.
Clang still generates the same code. |
Extended Description
bool isNanOrInf(float x)
{
return (reinterpret_cast<unsigned&>(x) & 0x7F800000) == 0x7F800000;
}
float test(float x)
{
return isNanOrInf(x) ? 0 : x;
}
test(float):
vmovd eax, xmm0
not eax
vpxor xmm1, xmm1, xmm1
and eax, 2139095040
je .LBB1_2
vmovdqa xmm1, xmm0
.LBB1_2:
vmovdqa xmm0, xmm1
ret
In contrast, gcc does not use xmm1 at all:
test(float):
vmovd eax, xmm0
mov edx, 2139095040
and edx, eax
cmp edx, 2139095040
je .L5
ret
.L5:
vxorps xmm0, xmm0, xmm0
ret
https://gcc.godbolt.org/z/biNBFq