int f(unsigned int a, unsigned int b) { return __builtin_popcount(a & 8) + __builtin_popcount(b & 2); } This can be optimized to `return __builtin_popcount((a & 8) | (b & 2));`. This transformation is done by GCC, but not by LLVM. Godbolt comparison: https://godbolt.org/z/zjP3Ys (If someone can tell me how to use ctpop in alive2 that'd be nice btw, I couldn't get it to work personally)
We should be able to fold: add(ctpop(A),ctpop(B)) --> ctpop(add(A,B)) --> ctpop(or(A,B)) if A and B have no bits set in common.
Does the motivating example really look like the code in the description? We don't need popcount if we're masking a single bit of a value - just shift that bit to the right: https://alive2.llvm.org/ce/z/ASdf47 define i32 @src(i32 %x, i32 %y) { %ax = and i32 %x, 8 %p = call i32 @llvm.ctpop.i32(i32 %ax) ret i32 %p } define i32 @tgt(i32 %x, i32 %y) { %ax = and i32 %x, 8 %p = lshr i32 %ax, 3 ret i32 %p } We are missing that fold currently.
Yes the single bit popcnt cases would make better sense as a shift+mask. That just leaves the more general popcnt case: int f2(unsigned int a, unsigned int b) { return __builtin_popcount(a << 16) + __builtin_popcount(b >> 16); } https://godbolt.org/z/sqvY36cbv which gcc also optimizes
(In reply to Simon Pilgrim from comment #3) > Yes the single bit popcnt cases would make better sense as a shift+mask. Get that one out of the way at least: https://reviews.llvm.org/rGc0645f13243c ...although that means we are potentially missing a codegen optimization to form popcount out of bitwise-logic (if popcount is cheap).
GCC: popcount(X) + popcount(Y) is popcount(X|Y) when X&Y must be zero.
Candidate patch: https://reviews.llvm.org/D101210
Thanks to Joerg Sonnenberger, we could generalize this even more. ctpop(A) + ctpop(B) => ctpop(X|Y) + numberofcommonbits(X, Y) My simple testcases: int ok_test(unsigned int a) // thanks Joerg { unsigned b = (a & 4) | 3; unsigned c = (a & 8) | 3; return __builtin_popcount(b) + __builtin_popcount(c); } int ok_test2(unsigned int a, unsigned int b) { return __builtin_popcount(a << 16) + __builtin_popcount(b >> 16); } int neg_test(unsigned int a, unsigned int b) { return __builtin_popcount(a << 16) + __builtin_popcount(b >> 4); } So I started with: if (match(LHS, m_OneUse(m_Intrinsic<Intrinsic::ctpop>(m_Value(A)))) && match(RHS, m_OneUse(m_Intrinsic<Intrinsic::ctpop>(m_Value(B))))) { KnownBits AKnown = computeKnownBits(A, 0, &I); KnownBits BKnown = computeKnownBits(B, 0, &I); KnownBits commonBits = KnownBits::commonBits(AKnown, BKnown); llvm::errs() << (commonBits.isUnknown() ? " unknown " : "known some") << " Min/Max: " << commonBits.countMinPopulation() << " / " << commonBits.countMaxPopulation() << "\n"; } ok_test - known some Min/Max: 2 / 4 ok_test2 - unknown Min/Max: 0 / 32 neg_test - unknown Min/Max: 0 / 32 On the other side, haveNoCommonBitsSet(A, B) for ok_test2 says true. So I would expect "known some" for "ok_test2" too. I have never worked with knownbits so any advice is useful. Thanks
I think what you're saying is: ctpop(A) + ctpop(B) => ctpop(X|Y) + ctpop(X&Y) https://alive2.llvm.org/ce/z/Hzw8-v ---------------------------------------- define i4 @src(i4 %0, i4 %1) { %2: %3 = ctpop i4 %0 %4 = ctpop i4 %1 %5 = add i4 %4, %3 ret i4 %5 } => define i4 @tgt(i4 %0, i4 %1) { %2: %3 = or i4 %0, %1 %4 = ctpop i4 %3 %5 = and i4 %0, %1 %6 = ctpop i4 %5 %7 = add i4 %4, %6 ret i4 %7 } Transformation seems to be correct! Not sure how we can make good use of this general use case though tbh - anyone got any suggestions?
(In reply to Simon Pilgrim from comment #8) > I think what you're saying is: > > ctpop(A) + ctpop(B) => ctpop(X|Y) + ctpop(X&Y) > > https://alive2.llvm.org/ce/z/Hzw8-v > > ---------------------------------------- > define i4 @src(i4 %0, i4 %1) { > %2: > %3 = ctpop i4 %0 > %4 = ctpop i4 %1 > %5 = add i4 %4, %3 > ret i4 %5 > } > => > define i4 @tgt(i4 %0, i4 %1) { > %2: > %3 = or i4 %0, %1 > %4 = ctpop i4 %3 > %5 = and i4 %0, %1 > %6 = ctpop i4 %5 > %7 = add i4 %4, %6 > ret i4 %7 > } > Transformation seems to be correct! > > Not sure how we can make good use of this general use case though tbh - > anyone got any suggestions? In general - we can't, we should have a reverse fold of that. The case where this could be good, is when ctpop(X&Y) happens to be a constant. I.e. we'd call KnownBits on X and Y, and their bit pairs sort into 4 categories: * known to be set in both * known to be unset in both * known to be set in one and unset in another * unknown in both/either. Iff we have the last case, we can't do it. Not really sure how worth this pattern is in reality.