-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
missing simplification for llvm.vector.reduce.and on vector of i1 #50603
Comments
assigned to @LebedevRI |
It looks like we might be missing a simplification of @llvm.vector.reduce.and on a vector of i1: replacing %5 = call i1 @llvm.vector.reduce.and.v8i1(<8 x i1> %4) with an icmp looks like it should result in the better IR. (Thanks to Nick Lewycky for pointing this out.) |
Should we always be canonicalizing and/or/xor reductions to bitcasts+icmp/parity do you think? IIRC we already do this in a couple of cases already. |
Already done with: The godbolt link in the description is testing with 12.0.1. Here's trunk: There's still a difference in the IR and asm though. We might want to change both. |
Surely. https://godbolt.org/z/fhrPzvxzv |
Ok, i've added all the missing instcombine pieces, We now probably want to adjust costmodel to reflect that, |
Awesome, thanks! Minor point: the original two examples still don't canonicalize to the same IR: f1 uses %5 = bitcast <8 x i1> %4 to i8 resulting in
f2 uses %5 = sext <8 x i1> %4 to <8 x i8>
I have no idea which of these is better (if either), but presumably there's an instcombine missing or similar? |
Yep, looks like we fail to drop sext before bitcast-icmp: https://godbolt.org/z/Mf8vMfs5r |
Alright, i think the only thing we are missing before we can close this is cost model changes. This block: llvm-project/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp Lines 1978 to 2147 in 01d59c0
needs to be effectively copied here: with an appropriate set of tests.
@Simon: might you be interested in handling with this? |
mentioned in issue llvm/llvm-bugzilla-archive#51305 |
Extended Description
Live demo: https://godbolt.org/z/a6YaahdPP
Example:
This results in the following:
I think these should produce the same assembly, and the result from f2 looks better to me (though both are the same size). Presumably we'd need to recognize that after vpcmpeqb, each lane in xmm0 is either all-zeros or all-ones, so the vpmovmskb is redundant.
The text was updated successfully, but these errors were encountered: