New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FastMath] Failure to fold 1 / powf(x,y) -> powf(x, -y) #48491
Comments
It should be easy enough to get these in instcombine. There's an open question about exactly which FMF should be matched. Candidates are any/all of: https://llvm.org/docs/LangRef.html#fast-math-flags In practical terms, once the programmer allows "reassoc", they should be prepared for anything, but we try to be safer on most fdiv transforms and also check "arcp". |
https://reviews.llvm.org/D96648 I left "powi" off of that because I'm not sure how we should deal with the case where the exponent is 0x8000000 (signed min i32). gcc doesn't seem to care about that possibility? |
Looking at the code where these snippets came from, technically we could use valuetracking to check the min/max bounds of the int - although I think more likely we'll end up having to convert to a float and negate that, which lose us any perf benefit. |
Make sure I'm seeing it correctly - in the general case, we need to decide which of these has better perf: Or: So we have to know if the divss costs more than the presumed savings of the "powi" call vs. the "powf" call. In IR (instcombine), we could say that getting rid of the fdiv is worth adding 2 extra instructions (fneg + sitofp). I don't know where in the gcc pipeline this is implemented, but that's what they seem to have done: |
Interestingly, I'm not sure compiler-rt accounts for MIN_INT in powi: https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/builtins/powisf2.c |
powi() isn't part of the standard math lib, so anything goes? https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Other-Builtins.html#Other-Builtins — Built-in Function: double __builtin_powi (double, int)
|
We probably want to add at least these 2 related folds for exp() and exp2()? diff --git a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
if (I.hasAllowReassoc() && Op0->hasOneUse() && Op1->hasOneUse()) { |
gcc has those, so I'll add some tests and push that: |
Adding the exp/exp2 cases would be awesome - thanks. It would be useful to support powi if at all possible, given the vagueness of the builtin - maybe if we just add something to the langref 'powi(x, INT_MIN) is undefined' and we always do the negation? |
Added with:
I'm still not clear on when it is optimal to convert powi to powf. I haven't tried to benchmark it yet, but I thought powi is (much) faster given that it's just a loop of fmul. Is there a larger example where we see powf as the winner? |
Whats still stopping us just mapping 1/powi(x,i) -> powi(x,-i) ? |
I think that case is fine except for the potential INT_MIN corner case. Note that gcc is not getting that transform in the example here. So I was confused in my earlier comments - we're never going to convert powi to powf; gcc is missing an opportunity to convert powf into powi. I don't know how to prove this, but I wonder if we could say that for any supported LLVM FP types powi(X, 0x80000000) is either 0.0, 1.0, or Inf (and so the reciprocal is Inf, 1.0, or 0.0). If that holds, it's safe to do the transform with -ffast-math because that implies 'ninf'? |
I think the combined disclaimers on powi and FMF allow this: So that's the last item for this bug on my list. |
Awesome - thanks @spatel! |
Extended Description
#include
float invpow_i(float x, int y) {
return 1.0f / __builtin_powf(x, y);
}
float invpow_f(float x, float y) {
return 1.0f / __builtin_powf(x, y);
}
https://simd.godbolt.org/z/YxaW96
Clang fails to merge the fdiv into the pow by negating the exponent arg
The text was updated successfully, but these errors were encountered: