Missing trunc(ctpop(zext(x))) -> ctpop(x) fold #49485
Comments
assigned to @rotateright
Don't need the trunc unless we are seeing multi-use-of-the-zext examples.
Maybe we want something similar for cttz too?
define i32 @src(i16 %x) {
%z = zext i16 %x to i32
define i32 @tgt(i16 %x) {
%p = call i16 @llvm.cttz.i16(i16 %x, i1 true)
declare i32 @llvm.cttz.i32(i32, i1)
define i32 @src(i16 %x) {
src: # @src
also
define i16 @src(i16 %x) {
%z = zext i16 %x to i32
define i16 @tgt(i16 %x) {
%p = call i16 @llvm.ctlz.i16(i16 %x, i1 false)
declare i32 @llvm.ctlz.i32(i32, i1)
define i16 @src(i16 %x) {
Sure - cttz/ctlz can be improved too. Can you open another bug, so we can track that?
Ok. llvm/llvm-bugzilla-archive#50172, llvm/llvm-bugzilla-archive#50173
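For the cttz/ctlz variants raised in the comments, the exact IR did not survive in this thread, so the following is only a rough sanity check under stated assumptions: that the intended relations are cttz(zext(x)) == cttz(x) and ctlz(zext(x)) == ctlz(x) + 16 for an i16 widened to i32. The helper names (`cttz16`, `ctlz16`, `count_cttz_ctlz_mismatches`) are hypothetical, and zero is skipped because `__builtin_ctz` and `__builtin_clz` are undefined for it.

```c
#include <stdint.h>

/* Hypothetical reference: trailing zero count of a nonzero 16-bit value. */
static unsigned cttz16(uint16_t x) {
    unsigned n = 0;
    while ((x & 1u) == 0) { x >>= 1; n++; }
    return n;
}

/* Hypothetical reference: leading zero count of a nonzero 16-bit value. */
static unsigned ctlz16(uint16_t x) {
    unsigned n = 0;
    while ((x & 0x8000u) == 0) { x = (uint16_t)(x << 1); n++; }
    return n;
}

/* Counts nonzero 16-bit inputs where the widened 32-bit result disagrees
 * with the assumed narrow equivalent. Assumes unsigned int is 32 bits wide. */
unsigned count_cttz_ctlz_mismatches(void) {
    unsigned mismatches = 0;
    for (uint32_t v = 1; v <= 0xFFFFu; ++v) {
        uint16_t x = (uint16_t)v;
        uint32_t wide = (uint32_t)x;  /* models the zext */
        /* Trailing zeros are unaffected by adding high zero bits. */
        if ((unsigned)__builtin_ctz(wide) != cttz16(x)) mismatches++;
        /* Leading zeros grow by the 16 bits the zext adds. */
        if ((unsigned)__builtin_clz(wide) != ctlz16(x) + 16u) mismatches++;
    }
    return mismatches;
}
```

Calling `count_cttz_ctlz_mismatches()` from a small `main` and checking for 0 is enough to exercise it. The ctlz case needs an offset, which is why it is not a drop-in copy of the ctpop fold.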
Extended Description
https://simd.godbolt.org/z/EcP4en5KG
#include <x86intrin.h>
__v8hu ctpop_int(__v8hu x) {
return (__v8hu) {
(unsigned short)__builtin_popcount( x[0] ),
(unsigned short)__builtin_popcount( x[1] ),
(unsigned short)__builtin_popcount( x[2] ),
(unsigned short)__builtin_popcount( x[3] ),
(unsigned short)__builtin_popcount( x[4] ),
(unsigned short)__builtin_popcount( x[5] ),
(unsigned short)__builtin_popcount( x[6] ),
(unsigned short)__builtin_popcount( x[7] )
};
}
define <8 x i16> @ctpop_int(<8 x i16> %0){
%2 = zext <8 x i16> %0 to <8 x i32>
%3 = call <8 x i32> @llvm.ctpop.v8i32(<8 x i32> %2)
%4 = trunc <8 x i32> %3 to <8 x i16>
ret <8 x i16> %4
}
declare <8 x i32> @llvm.ctpop.v8i32(<8 x i32>)
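We should be able to just use a single @llvm.ctpop.v8i16 call here: the zext only adds zero bits, so the population count is unchanged and always fits in i16.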
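As a quick sanity check of why the narrow call is equivalent, here is a minimal sketch that compares the widened popcount (zext, 32-bit popcount, truncate) against a 16-bit bit-loop for every possible i16 value. The helper names (`popcount16`, `count_ctpop_mismatches`) are hypothetical and not part of LLVM; this is an exhaustive test of the scalar identity, not the InstCombine fold itself.

```c
#include <stdint.h>

/* Hypothetical reference: population count done purely in 16 bits. */
static unsigned popcount16(uint16_t x) {
    unsigned n = 0;
    while (x) {
        x &= (uint16_t)(x - 1);  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Counts 16-bit inputs where trunc(ctpop(zext(x))) differs from ctpop(x). */
unsigned count_ctpop_mismatches(void) {
    unsigned mismatches = 0;
    for (uint32_t v = 0; v <= 0xFFFFu; ++v) {
        uint16_t x = (uint16_t)v;
        /* zext to 32 bits, popcount, then truncate back to 16 bits */
        uint16_t wide = (uint16_t)__builtin_popcount((uint32_t)x);
        uint16_t narrow = (uint16_t)popcount16(x);
        if (wide != narrow) mismatches++;
    }
    return mismatches;
}
```

Calling `count_ctpop_mismatches()` from a small `main` should report no mismatches, which matches the expectation that the zext/trunc pair around the ctpop is redundant.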
Not sure if the trunc is vital, or whether we should also allow the fold: