X86 SSE4.1 instruction problem #50713
Comments
assigned to @topperc |
There's something very strange going on in the IR out of the middle end: `%37 = shl <2 x i64> %24, <i64 32, i64 32>, !dbg !327`. I assume InstCombine has gotten clever, but I'm not sure why yet. |
Disregard that last comment. That's our IR for pmuldq that we should pattern match. Will keep digging. |
I think because warping is loop invariant, it got pulled out of the loop along with the _mm_set1 and the 2xi64 shl+ashr we insert to sign extend the mul input for pmuldq pattern match. Because we don't have a vsraq instruction we emit this
And we aren't able to create an AssertSExt from that to tell the block inside the loop that the input is already sign extended, which would allow pmuldq to be matched. And the sign extend outside the loop would have been unnecessary if we had formed pmuldq. |
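To illustrate the shl+ashr idiom discussed in this comment, here is a minimal scalar sketch in C of what one 64-bit lane of pmuldq computes. The helper names `sext_low32` and `pmuldq_lane` are hypothetical and not part of LLVM or any intrinsics header. The sketch assumes that `>>` on a negative `int64_t` is an arithmetic shift, which holds for GCC and Clang but is implementation-defined in C.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend the low 32 bits of a 64-bit lane.
 * Mirrors the IR pair "shl <2 x i64> %x, 32" followed by "ashr ..., 32".
 * Assumes >> on a negative int64_t is an arithmetic shift (GCC/Clang). */
static int64_t sext_low32(int64_t lane) {
    int64_t shifted = (int64_t)((uint64_t)lane << 32); /* shl by 32 */
    return shifted >> 32;                              /* ashr by 32 */
}

/* Hypothetical scalar model of one 64-bit lane of pmuldq:
 * multiply the sign-extended low 32 bits of each input.
 * The product of two 32-bit values always fits in 64 bits. */
static int64_t pmuldq_lane(int64_t a, int64_t b) {
    return sext_low32(a) * sext_low32(b);
}
```

When the shift pair and the multiply sit in the same basic block, the backend can recognize the whole pattern as a single pmuldq. When the shift pair is hoisted out of the loop, the multiply inside the loop no longer sees that its input is sign extended, which is the situation described here.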
Maybe we need an X86CodeGenPrepare to bring them back into the same basic block? I assume the same thing can happen to pmuludq. |
Thanks for your analysis. I would also like to know whether there is any source-level workaround to avoid this problem. |
To fix this in the compiler, I think we just need to add this to X86TargetLowering::shouldSinkOperands. I'm not sure how to work around it in code. |
Candidate patch https://reviews.llvm.org/D107689 |
Anything left to do here? Not sure if this qualifies for backport to the release branch. |
I don't think so - it's a rather old perf regression. Tentatively resolving this. |
Thank you for resolving this issue. Which LLVM release will include the fix? |
It will be in 14.0 (about 6 months from now). |
Extended Description
Hi,
I was doing SSE performance work on an Intel Mac, and I found that my SSE4.1 code performs worse when built with Xcode 12.4 than with Xcode 10.1, so I checked the generated assembly. A single _mm_mul_epi32() call was translated into three pmuludq instructions (pmuludq is an SSE2 instruction), whereas with Xcode 10.1 the same _mm_mul_epi32() call was translated into one pmuldq as expected. I then checked the clang versions and found that the problem occurs with clang versions newer than 7.0.0.
The simple case can be found: https://godbolt.org/z/Tf7qeocvz
I think this is probably a clang compiler bug, and I would appreciate any advice on how to solve this issue.
Thanks.
Best,
Wade