New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Failure to optimize combined popcount pattern #48343
Comments
We should be able to fold: add(ctpop(A),ctpop(B)) --> ctpop(add(A,B)) --> ctpop(or(A,B)) if A and B have no bits set in common. |
Does the motivating example really look like the code in the description? We don't need popcount if we're masking a single bit of a value - just shift that bit to the right: define i32 @src(i32 %x, i32 %y) { define i32 @tgt(i32 %x, i32 %y) { We are missing that fold currently. |
Yes the single bit popcnt cases would make better sense as a shift+mask. That just leaves the more general popcnt case:
https://godbolt.org/z/sqvY36cbv which gcc also optimizes |
Get that one out of the way at least: ...although that means we are potentially missing a codegen optimization to form popcount out of bitwise-logic (if popcount is cheap). |
GCC: popcount(X) + popcount(Y) is popcount(X|Y) when X&Y must be zero. |
Candidate patch: |
Thanks to Joerg Sonnenberger, we could generalize this even more.
My simple testcases:
So I started with:
On the other side, haveNoCommonBitsSet(A, B) for ok_test2 says true. So I would expect "known some" for "ok_test2" too. I have never worked with knownbits so any advice is useful. Thanks |
I think what you're saying is: ctpop(A) + ctpop(B) => ctpop(X|Y) + ctpop(X&Y) https://alive2.llvm.org/ce/z/Hzw8-v define i4 @src(i4 %0, i4 %1) { Not sure how we can make good use of this general use case though tbh - anyone got any suggestions? |
In general - we can't, we should have a reverse fold of that.
|
Tentatively closing this @GabrielRavier @davidbolvansky |
I have not personally encountered this but I've finally found the patch that made GCC optimize it, and it appears to cite a real world use:
I've also found slides on a similar subject over here which suggest that this can also be used to speed up chemical similarity calculations when they are calculated as Tanimoto coefficients. (PS: I do not seem to be able to reopen this issue myself) |
Actually nvm I just realised this was closed w.r.t. the more generic optimisation, whereas the examples I found are for the original testcase which was fixed - maybe I shouldn't look at old bug reports at 12 AM |
Extended Description
This can be optimized to
return __builtin_popcount((a & 8) | (b & 2));
. This transformation is done by GCC, but not by LLVM.Godbolt comparison: https://godbolt.org/z/zjP3Ys
(If someone can tell me how to use ctpop in alive2 that'd be nice btw, I couldn't get it to work personally)
The text was updated successfully, but these errors were encountered: