[SLP] Failure to create v2f64 comparison reductions #43090
Comments
HorizontalReduction.tryToReduce bails out of <2 x X> reduction cases, which seems to prevent the fcmp vectorization as well. |
Would this handle it? |
Yes, I think so. If we don't want to introduce <2 x X> reductions for whatever reason, we would catch this in the backend if only the fcmp had vectorized. |
This contains an ideal version that merges the reductions into a single result:
|
D59710 became a restricted first step to avoid regressions on other cases: |
VectorCombine proposal that would create the missing reduction pattern (not intrinsic though): I'm not sure how we get from there to the ideal code from comment 4. |
Handling in the backend should be relatively trivial with the recent work on CMP(MOVMSK()) folds. |
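As a rough illustration of what a CMP(MOVMSK()) pattern computes, here is a minimal sketch of an "all lanes less-than" test on two doubles. The function name `all_less_movmsk` and the array-based signature are hypothetical, chosen for this example only. The SSE2 path uses the standard `_mm_cmplt_pd` and `_mm_movemask_pd` intrinsics; the fallback path is a plain scalar emulation so the sketch still builds on non-x86 targets. It is not meant to reflect what the backend actually emits.

```c
#include <stdbool.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: true when a[i] < b[i] holds for both lanes. */
bool all_less_movmsk(const double a[2], const double b[2]) {
#if defined(__SSE2__)
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    /* Each lane becomes all-ones if a < b (ordered), else all-zeros. */
    __m128d lt = _mm_cmplt_pd(va, vb);
    /* MOVMSK packs the two lane sign bits into bits 0 and 1;
       comparing against 0x3 is the AND reduction over both lanes. */
    return _mm_movemask_pd(lt) == 0x3;
#else
    /* Scalar emulation of the same 2-bit mask and compare. */
    int mask = (a[0] < b[0]) | ((a[1] < b[1]) << 1);
    return mask == 0x3;
#endif
}
```

The point of the sketch is only that a vectorized fcmp followed by a mask compare replaces two scalar compares and a branch chain, which is why the fold depends on the fcmp being vectorized first.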
After https://reviews.llvm.org/rGb6315aee5b42, we have this IR:
SDAG can't do anything with the ops in different basic blocks. This would require SimplifyCFG to speculatively execute the 2nd reduction plus some instcombine or other transforms to convert to a single reduction? Current x86 AVX asm looks like this:
|
In c-ray 1.1, gcc 11 is 94% faster than clang 12. |
Current IR:
|
Current IR:
|
We're failing to vectorize several comparison reduction patterns. Issue #43090 was based on this, but while that simplified test case now folds, the original still fails due to poor cost model values for vXi1 extractions.
Candidate Patch: https://reviews.llvm.org/D134605 |
Extended Description
Current Codegen: https://godbolt.org/z/n0UB_k
Pulled out of the c-ray benchmark:
SLP fails to create AND reductions for either of these, let alone merge them into a single (tweaked) reduction (and branch). Oddly it also manages to vectorize only one of the comparisons but then fails to form a reduction for the result.
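For readers without the original snippet, the shape of the problem can be sketched as follows. This is not the c-ray source: the `vec2` type and both function names are hypothetical and only illustrate two 2-lane comparison groups, each of which is an AND reduction, and the merged form that reduces all four comparison results at once.

```c
#include <stdbool.h>

/* Hypothetical 2-lane type, for illustration only. */
typedef struct { double x, y; } vec2;

/* Two separate 2-lane AND reductions joined by a short-circuit &&,
   so the second group is only evaluated when the first one passes. */
bool in_box_two_reductions(vec2 p, vec2 lo, vec2 hi) {
    bool above = (p.x > lo.x) & (p.y > lo.y);
    bool below = (p.x < hi.x) & (p.y < hi.y);
    return above && below;
}

/* Merged form: one AND reduction over all four comparison results,
   leaving a single value to branch on. */
bool in_box_single_reduction(vec2 p, vec2 lo, vec2 hi) {
    bool lane[4] = { p.x > lo.x, p.y > lo.y, p.x < hi.x, p.y < hi.y };
    bool acc = true;
    for (int i = 0; i < 4; ++i)
        acc &= lane[i];
    return acc;
}
```

Both functions return the same result for every input. The second form evaluates all comparisons unconditionally, which is safe here because the comparisons have no side effects.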
clang -g0 -O3 -march=btver2 -emit-llvm