Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FastMath] Failure to fold 1 / powf(x,y) -> powf(x, -y) #48491

Closed
RKSimon opened this issue Feb 11, 2021 · 14 comments
Closed

[FastMath] Failure to fold 1 / powf(x,y) -> powf(x, -y) #48491

RKSimon opened this issue Feb 11, 2021 · 14 comments
Labels
bugzilla Issues migrated from bugzilla

Comments

@RKSimon
Copy link
Collaborator

RKSimon commented Feb 11, 2021

Bugzilla Link 49147
Resolution FIXED
Resolved on Feb 27, 2021 02:54
Version trunk
OS Windows NT
CC @rotateright

Extended Description

#include

float invpow_i(float x, int y) {
return 1.0f / __builtin_powf(x, y);
}
float invpow_f(float x, float y) {
return 1.0f / __builtin_powf(x, y);
}

https://simd.godbolt.org/z/YxaW96

Clang fails to merge the fdiv into the pow by negating the exponent arg

@rotateright
Copy link
Contributor

It should be easy enough to get these in instcombine.

There's an open question about exactly which FMF should be matched. Candidates are any/all of:
reassoc
arcp
afn

https://llvm.org/docs/LangRef.html#fast-math-flags

In practical terms, once the programmer allows "reassoc", they should be prepared for anything, but we try to be safer on most fdiv transforms and also check "arcp".

@rotateright
Copy link
Contributor

https://reviews.llvm.org/D96648

I left "powi" off of that because I'm not sure how we should deal with the case where the exponent is 0x8000000 (signed min i32). gcc doesn't seem to care about that possibility?

@RKSimon
Copy link
Collaborator Author

RKSimon commented Feb 13, 2021

https://reviews.llvm.org/D96648

I left "powi" off of that because I'm not sure how we should deal with the
case where the exponent is 0x8000000 (signed min i32). gcc doesn't seem to
care about that possibility?

Looking at the code where these snippets came from, technically we could use valuetracking to check the min/max bounds of the int - although I think more likely we'll end up having to convert to a float and negate that, which lose us any perf benefit.

@rotateright
Copy link
Contributor

https://reviews.llvm.org/D96648

I left "powi" off of that because I'm not sure how we should deal with the
case where the exponent is 0x8000000 (signed min i32). gcc doesn't seem to
care about that possibility?

Looking at the code where these snippets came from, technically we could use
valuetracking to check the min/max bounds of the int - although I think more
likely we'll end up having to convert to a float and negate that, which lose
us any perf benefit.

Make sure I'm seeing it correctly - in the general case, we need to decide which of these has better perf:
callq __powisf2@PLT
divss %xmm0, %xmm1

Or:
pxor %xmm1, %xmm1
cvtsi2ssl %edi, %xmm1
xorps .LC0(%rip), %xmm1
jmp powf

So we have to know if the divss costs more than the presumed savings of the "powi" call vs. the "powf" call.

In IR (instcombine), we could say that getting rid of the fdiv is worth adding 2 extra instructions (fneg + sitofp).

I don't know where in the gcc pipeline this is implemented, but that's what they seem to have done:
https://simd.godbolt.org/z/WjWrv1

@RKSimon
Copy link
Collaborator Author

RKSimon commented Feb 20, 2021

Interestingly, I'm not sure compiler-rt accounts for MIN_INT in powi:

https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/builtins/powisf2.c

@rotateright
Copy link
Contributor

powi() isn't part of the standard math lib, so anything goes?

https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Other-Builtins.html#Other-Builtins

— Built-in Function: double __builtin_powi (double, int)

Returns the first argument raised to the power of the second. Unlike the pow function no guarantees about precision and rounding are made.

@rotateright
Copy link
Contributor

We probably want to add at least these 2 related folds for exp() and exp2()?

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
index 9bc566ed3523..702f76da2774 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
@@ -1326,8 +1326,9 @@ Instruction *InstCombinerImpl::visitFDiv(BinaryOperator &I) {
if (match(Op1, m_FDiv(m_SpecificFP(1.0), m_Value(Y))))
return BinaryOperator::CreateFMulFMF(Y, Op0, &I);

  • // Negate the exponent of pow to fold division-by-pow() into multiply:
  • // Negate the exponent of pow/exp to fold division-by-pow() into multiply:
    // Z / pow(X, Y) --> Z * pow(X, -Y)
  • // Z / exp{2}(Y) --> Z * exp{2}(-Y)
    // In the general case, this creates an extra instruction, but fmul allows
    // for better canonicalization and optimization than fdiv.
    if (match(Op1,
    @@ -1336,6 +1337,16 @@ Instruction *InstCombinerImpl::visitFDiv(BinaryOperator &I) {
    Value *Pow = Builder.CreateBinaryIntrinsic(Intrinsic::pow, X, NegY, &I);
    return BinaryOperator::CreateFMulFMF(Op0, Pow, &I);
    }
  • if (match(Op1, m_OneUse(m_IntrinsicIntrinsic::exp(m_Value(Y))))) {
  •  Value *NegY = Builder.CreateFNegFMF(Y, &I);
    
  •  Value *Pow = Builder.CreateUnaryIntrinsic(Intrinsic::exp, NegY, &I);
    
  •  return BinaryOperator::CreateFMulFMF(Op0, Pow, &I);
    
  • }
  • if (match(Op1, m_OneUse(m_IntrinsicIntrinsic::exp2(m_Value(Y))))) {
  •  Value *NegY = Builder.CreateFNegFMF(Y, &I);
    
  •  Value *Pow = Builder.CreateUnaryIntrinsic(Intrinsic::exp2, NegY, &I);
    
  •  return BinaryOperator::CreateFMulFMF(Op0, Pow, &I);
    
  • }
    }

if (I.hasAllowReassoc() && Op0->hasOneUse() && Op1->hasOneUse()) {

@rotateright
Copy link
Contributor

We probably want to add at least these 2 related folds for exp() and exp2()?

gcc has those, so I'll add some tests and push that:
https://simd.godbolt.org/z/8KMGTf

@RKSimon
Copy link
Collaborator Author

RKSimon commented Feb 20, 2021

Adding the exp/exp2 cases would be awesome - thanks.

It would be useful to support powi if at all possible, given the vagueness of the builtin - maybe if we just add something to the langref 'powi(x, INT_MIN) is undefined' and we always do the negation?

@rotateright
Copy link
Contributor

Adding the exp/exp2 cases would be awesome - thanks.

Added with:
https://reviews.llvm.org/rGe772618f1ee2

It would be useful to support powi if at all possible, given the vagueness
of the builtin - maybe if we just add something to the langref 'powi(x,
INT_MIN) is undefined' and we always do the negation?

I'm still not clear on when it is optimal to convert powi to powf. I haven't tried to benchmark it yet, but I thought powi is (much) faster given that it's just a loop of fmul. Is there a larger example where we see powf as the winner?

@RKSimon
Copy link
Collaborator Author

RKSimon commented Feb 21, 2021

Adding the exp/exp2 cases would be awesome - thanks.

Added with:
https://reviews.llvm.org/rGe772618f1ee2

It would be useful to support powi if at all possible, given the vagueness
of the builtin - maybe if we just add something to the langref 'powi(x,
INT_MIN) is undefined' and we always do the negation?

I'm still not clear on when it is optimal to convert powi to powf. I haven't
tried to benchmark it yet, but I thought powi is (much) faster given that
it's just a loop of fmul. Is there a larger example where we see powf as the
winner?

Whats still stopping us just mapping 1/powi(x,i) -> powi(x,-i) ?

@rotateright
Copy link
Contributor

Adding the exp/exp2 cases would be awesome - thanks.

Added with:
https://reviews.llvm.org/rGe772618f1ee2

It would be useful to support powi if at all possible, given the vagueness
of the builtin - maybe if we just add something to the langref 'powi(x,
INT_MIN) is undefined' and we always do the negation?

I'm still not clear on when it is optimal to convert powi to powf. I haven't
tried to benchmark it yet, but I thought powi is (much) faster given that
it's just a loop of fmul. Is there a larger example where we see powf as the
winner?

Whats still stopping us just mapping 1/powi(x,i) -> powi(x,-i) ?

I think that case is fine except for the potential INT_MIN corner case. Note that gcc is not getting that transform in the example here. So I was confused in my earlier comments - we're never going to convert powi to powf; gcc is missing an opportunity to convert powf into powi.

I don't know how to prove this, but I wonder if we could say that for any supported LLVM FP types powi(X, 0x80000000) is either 0.0, 1.0, or Inf (and so the reciprocal is Inf, 1.0, or 0.0).

If that holds, it's safe to do the transform with -ffast-math because that implies 'ninf'?

@rotateright
Copy link
Contributor

I think the combined disclaimers on powi and FMF allow this:
https://reviews.llvm.org/rGa7cee55762c6

So that's the last item for this bug on my list.

@RKSimon
Copy link
Collaborator Author

RKSimon commented Feb 27, 2021

Awesome - thanks @​spatel!

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

2 participants