[FastMath] Failure to fold 1 / powf(x,y) -> powf(x, -y) #48491

RKSimon · 2021-02-11T14:30:21Z


Bugzilla Link	49147
Resolution	FIXED
Resolved on	Feb 27, 2021 02:54
Version	trunk
OS	Windows NT
CC	@rotateright

Extended Description

#include

float invpow_i(float x, int y) {
return 1.0f / __builtin_powf(x, y);
}
float invpow_f(float x, float y) {
return 1.0f / __builtin_powf(x, y);
}

https://simd.godbolt.org/z/YxaW96

Clang fails to merge the fdiv into the pow by negating the exponent arg

rotateright · 2021-02-12T19:57:31Z

It should be easy enough to get these in instcombine.

There's an open question about exactly which FMF should be matched. Candidates are any/all of:
reassoc
arcp
afn

https://llvm.org/docs/LangRef.html#fast-math-flags

In practical terms, once the programmer allows "reassoc", they should be prepared for anything, but we try to be safer on most fdiv transforms and also check "arcp".

rotateright · 2021-02-13T16:44:53Z

https://reviews.llvm.org/D96648

I left "powi" off of that because I'm not sure how we should deal with the case where the exponent is 0x8000000 (signed min i32). gcc doesn't seem to care about that possibility?

RKSimon · 2021-02-13T19:08:53Z

https://reviews.llvm.org/D96648

I left "powi" off of that because I'm not sure how we should deal with the
case where the exponent is 0x8000000 (signed min i32). gcc doesn't seem to
care about that possibility?

Looking at the code where these snippets came from, technically we could use valuetracking to check the min/max bounds of the int - although I think more likely we'll end up having to convert to a float and negate that, which lose us any perf benefit.

rotateright · 2021-02-14T13:37:36Z

https://reviews.llvm.org/D96648

I left "powi" off of that because I'm not sure how we should deal with the
case where the exponent is 0x8000000 (signed min i32). gcc doesn't seem to
care about that possibility?

Looking at the code where these snippets came from, technically we could use
valuetracking to check the min/max bounds of the int - although I think more
likely we'll end up having to convert to a float and negate that, which lose
us any perf benefit.

Make sure I'm seeing it correctly - in the general case, we need to decide which of these has better perf:
callq __powisf2@PLT
divss %xmm0, %xmm1

Or:
pxor %xmm1, %xmm1
cvtsi2ssl %edi, %xmm1
xorps .LC0(%rip), %xmm1
jmp powf

So we have to know if the divss costs more than the presumed savings of the "powi" call vs. the "powf" call.

In IR (instcombine), we could say that getting rid of the fdiv is worth adding 2 extra instructions (fneg + sitofp).

I don't know where in the gcc pipeline this is implemented, but that's what they seem to have done:
https://simd.godbolt.org/z/WjWrv1

RKSimon · 2021-02-20T12:09:17Z

Interestingly, I'm not sure compiler-rt accounts for MIN_INT in powi:

https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/builtins/powisf2.c

rotateright · 2021-02-20T13:52:55Z

powi() isn't part of the standard math lib, so anything goes?

https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Other-Builtins.html#Other-Builtins

— Built-in Function: double __builtin_powi (double, int)

Returns the first argument raised to the power of the second. Unlike the pow function no guarantees about precision and rounding are made.

rotateright · 2021-02-20T13:59:11Z

We probably want to add at least these 2 related folds for exp() and exp2()?

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
index 9bc566ed3523..702f76da2774 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
@@ -1326,8 +1326,9 @@ Instruction *InstCombinerImpl::visitFDiv(BinaryOperator &I) {
if (match(Op1, m_FDiv(m_SpecificFP(1.0), m_Value(Y))))
return BinaryOperator::CreateFMulFMF(Y, Op0, &I);

// Negate the exponent of pow to fold division-by-pow() into multiply:

// Negate the exponent of pow/exp to fold division-by-pow() into multiply:
// Z / pow(X, Y) --> Z * pow(X, -Y)
// Z / exp{2}(Y) --> Z * exp{2}(-Y)
// In the general case, this creates an extra instruction, but fmul allows
// for better canonicalization and optimization than fdiv.
if (match(Op1,
@@ -1336,6 +1337,16 @@ Instruction *InstCombinerImpl::visitFDiv(BinaryOperator &I) {
Value *Pow = Builder.CreateBinaryIntrinsic(Intrinsic::pow, X, NegY, &I);
return BinaryOperator::CreateFMulFMF(Op0, Pow, &I);
}
if (match(Op1, m_OneUse(m_IntrinsicIntrinsic::exp(m_Value(Y))))) {

 Value *NegY = Builder.CreateFNegFMF(Y, &I);

 Value *Pow = Builder.CreateUnaryIntrinsic(Intrinsic::exp, NegY, &I);

 return BinaryOperator::CreateFMulFMF(Op0, Pow, &I);

}
if (match(Op1, m_OneUse(m_IntrinsicIntrinsic::exp2(m_Value(Y))))) {

 Value *NegY = Builder.CreateFNegFMF(Y, &I);

 Value *Pow = Builder.CreateUnaryIntrinsic(Intrinsic::exp2, NegY, &I);

 return BinaryOperator::CreateFMulFMF(Op0, Pow, &I);

}
}

if (I.hasAllowReassoc() && Op0->hasOneUse() && Op1->hasOneUse()) {

rotateright · 2021-02-20T14:03:37Z

We probably want to add at least these 2 related folds for exp() and exp2()?

gcc has those, so I'll add some tests and push that:
https://simd.godbolt.org/z/8KMGTf

RKSimon · 2021-02-20T17:47:58Z

Adding the exp/exp2 cases would be awesome - thanks.

It would be useful to support powi if at all possible, given the vagueness of the builtin - maybe if we just add something to the langref 'powi(x, INT_MIN) is undefined' and we always do the negation?

rotateright · 2021-02-20T21:14:36Z

Adding the exp/exp2 cases would be awesome - thanks.

Added with:
https://reviews.llvm.org/rGe772618f1ee2

It would be useful to support powi if at all possible, given the vagueness
of the builtin - maybe if we just add something to the langref 'powi(x,
INT_MIN) is undefined' and we always do the negation?

I'm still not clear on when it is optimal to convert powi to powf. I haven't tried to benchmark it yet, but I thought powi is (much) faster given that it's just a loop of fmul. Is there a larger example where we see powf as the winner?

RKSimon · 2021-02-21T10:43:41Z

Adding the exp/exp2 cases would be awesome - thanks.

Added with:
https://reviews.llvm.org/rGe772618f1ee2

It would be useful to support powi if at all possible, given the vagueness
of the builtin - maybe if we just add something to the langref 'powi(x,
INT_MIN) is undefined' and we always do the negation?

I'm still not clear on when it is optimal to convert powi to powf. I haven't
tried to benchmark it yet, but I thought powi is (much) faster given that
it's just a loop of fmul. Is there a larger example where we see powf as the
winner?

Whats still stopping us just mapping 1/powi(x,i) -> powi(x,-i) ?

rotateright · 2021-02-22T21:55:23Z

Adding the exp/exp2 cases would be awesome - thanks.

Added with:
https://reviews.llvm.org/rGe772618f1ee2

It would be useful to support powi if at all possible, given the vagueness
of the builtin - maybe if we just add something to the langref 'powi(x,
INT_MIN) is undefined' and we always do the negation?

I'm still not clear on when it is optimal to convert powi to powf. I haven't
tried to benchmark it yet, but I thought powi is (much) faster given that
it's just a loop of fmul. Is there a larger example where we see powf as the
winner?

Whats still stopping us just mapping 1/powi(x,i) -> powi(x,-i) ?

I think that case is fine except for the potential INT_MIN corner case. Note that gcc is not getting that transform in the example here. So I was confused in my earlier comments - we're never going to convert powi to powf; gcc is missing an opportunity to convert powf into powi.

I don't know how to prove this, but I wonder if we could say that for any supported LLVM FP types powi(X, 0x80000000) is either 0.0, 1.0, or Inf (and so the reciprocal is Inf, 1.0, or 0.0).

If that holds, it's safe to do the transform with -ffast-math because that implies 'ninf'?

rotateright · 2021-02-25T13:11:37Z

I think the combined disclaimers on powi and FMF allow this:
https://reviews.llvm.org/rGa7cee55762c6

So that's the last item for this bug on my list.

RKSimon · 2021-02-27T10:54:47Z

Awesome - thanks @spatel!

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FastMath] Failure to fold 1 / powf(x,y) -> powf(x, -y) #48491

[FastMath] Failure to fold 1 / powf(x,y) -> powf(x, -y) #48491

RKSimon commented Feb 11, 2021

rotateright commented Feb 12, 2021

rotateright commented Feb 13, 2021

RKSimon commented Feb 13, 2021

rotateright commented Feb 14, 2021

RKSimon commented Feb 20, 2021

rotateright commented Feb 20, 2021

rotateright commented Feb 20, 2021

rotateright commented Feb 20, 2021

RKSimon commented Feb 20, 2021

rotateright commented Feb 20, 2021

RKSimon commented Feb 21, 2021

rotateright commented Feb 22, 2021

rotateright commented Feb 25, 2021

RKSimon commented Feb 27, 2021

[FastMath] Failure to fold 1 / powf(x,y) -> powf(x, -y) #48491

[FastMath] Failure to fold 1 / powf(x,y) -> powf(x, -y) #48491

Comments

RKSimon commented Feb 11, 2021

Extended Description

rotateright commented Feb 12, 2021

rotateright commented Feb 13, 2021

RKSimon commented Feb 13, 2021

rotateright commented Feb 14, 2021

RKSimon commented Feb 20, 2021

rotateright commented Feb 20, 2021

rotateright commented Feb 20, 2021

rotateright commented Feb 20, 2021

RKSimon commented Feb 20, 2021

rotateright commented Feb 20, 2021

RKSimon commented Feb 21, 2021

rotateright commented Feb 22, 2021

rotateright commented Feb 25, 2021

RKSimon commented Feb 27, 2021