Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing trunc(ctpop(zext(x))) -> ctpop(x) fold #49485

Closed
RKSimon opened this issue Apr 27, 2021 · 7 comments
Closed

Missing trunc(ctpop(zext(x))) -> ctpop(x) fold #49485

RKSimon opened this issue Apr 27, 2021 · 7 comments
Assignees
Labels
bugzilla Issues migrated from bugzilla

Comments

@RKSimon
Copy link
Collaborator

RKSimon commented Apr 27, 2021

Bugzilla Link 50141
Resolution FIXED
Resolved on Apr 30, 2021 03:33
Version trunk
OS Windows NT
CC @davidbolvansky,@LebedevRI,@nikic,@rotateright
Fixed by commit(s) 0f8b668

Extended Description

https://simd.godbolt.org/z/EcP4en5KG

#include <x86intrin.h>

__v8hu ctpop_int(__v8hu x) {
return (__v8hu) {
(unsigned short)__builtin_popcount( x[0] ),
(unsigned short)__builtin_popcount( x[1] ),
(unsigned short)__builtin_popcount( x[2] ),
(unsigned short)__builtin_popcount( x[3] ),
(unsigned short)__builtin_popcount( x[4] ),
(unsigned short)__builtin_popcount( x[5] ),
(unsigned short)__builtin_popcount( x[6] ),
(unsigned short)__builtin_popcount( x[7] )
};
}

define <8 x i16> @​ctpop_int(<8 x i16> %0){
%2 = zext <8 x i16> %0 to <8 x i32>
%3 = call <8 x i32> @​llvm.ctpop.v8i32(<8 x i32> %2)
%4 = trunc <8 x i32> %3 to <8 x i16>
ret <8 x i16> %4
}
declare <8 x i32> @​llvm.ctpop.v8i32(<8 x i32>)

We should be able to just use a @​llvm.ctpop.v8i16 call

Not sure if the trunc is vital, or whether we should also allow the fold:

ctpop(zext(x)) -> zext(ctpop(x))
@RKSimon
Copy link
Collaborator Author

RKSimon commented Apr 27, 2021

assigned to @rotateright

@rotateright
Copy link
Contributor

Don't need the trunc unless we are seeing multi-use-of-the-zext examples.

@rotateright
Copy link
Contributor

@davidbolvansky
Copy link
Collaborator

Maybe we want something similar for cttz too?

define i32 @​src(i16 %x) {

%z = zext i16 %x to i32
%p = call i32 @​llvm.cttz.i32(i32 %z, i1 true)
ret i32 %p
}

define i32 @​tgt(i16 %x) {

%p = call i16 @​llvm.cttz.i16(i16 %x, i1 true)
%z = zext i16 %p to i32
ret i32 %z
}

declare i32 @​llvm.cttz.i32(i32, i1)
declare i16 @​llvm.cttz.i16(i16, i1)


define i32 @​src(i16 %x) {
%0:
%z = zext i16 %x to i32
%p = cttz i32 %z, 1
ret i32 %p
}
=>
define i32 @​tgt(i16 %x) {
%0:
%p = cttz i16 %x, 0
%z = zext i16 %p to i32
ret i32 %z
}
Transformation seems to be correct!

src: # @​src
movzx eax, di
bsf eax, eax
ret
tgt: # @​tgt
bsf ax, di
movzx eax, ax
ret

@davidbolvansky
Copy link
Collaborator

also

define i16 @​src(i16 %x) {

%z = zext i16 %x to i32
%p = call i32 @​llvm.ctlz.i32(i32 %z, i1 false)
%zz = trunc i32 %p to i16
ret i16 %zz
}

define i16 @​tgt(i16 %x) {

%p = call i16 @​llvm.ctlz.i16(i16 %x, i1 false)
%z = add i16 %p, 16
ret i16 %z
}

declare i32 @​llvm.ctlz.i32(i32, i1)
declare i16 @​llvm.ctlz.i16(i16, i1)


define i16 @​src(i16 %x) {
%0:
%z = zext i16 %x to i32
%p = ctlz i32 %z, 0
%zz = trunc i32 %p to i16
ret i16 %zz
}
=>
define i16 @​tgt(i16 %x) {
%0:
%p = ctlz i16 %x, 0
%z = add i16 %p, 16
ret i16 %z
}
Transformation seems to be correct!

@rotateright
Copy link
Contributor

Sure - cttz/ctlz can be improved too. Can you open another bug, so we can track that?

@davidbolvansky
Copy link
Collaborator

Ok.

llvm/llvm-bugzilla-archive#50172 , llvm/llvm-bugzilla-archive#50173

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

3 participants