Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[X86] Failure to recognise generic shift from SSE PSLLQ intrinsic #49467

Closed
RKSimon opened this issue Apr 26, 2021 · 7 comments
Closed

[X86] Failure to recognise generic shift from SSE PSLLQ intrinsic #49467

RKSimon opened this issue Apr 26, 2021 · 7 comments
Labels
bugzilla Issues migrated from bugzilla

Comments

@RKSimon
Copy link
Collaborator

RKSimon commented Apr 26, 2021

Bugzilla Link 50123
Resolution FIXED
Resolved on Jun 24, 2021 07:46
Version trunk
OS Windows NT
CC @davidbolvansky,@LebedevRI,@nikic,@rotateright
Fixed by commit(s) 656001e

Extended Description

We often fail to recognise that the _mm_sll_epi64 shift amount is in bounds, allowing us to fold it to a generic shift. This happens for all SSE 'shift by uniform variable' intrinsics.

https://simd.godbolt.org/z/vKT9YE5zM

#include <x86intrin.h>

__m128i shl_v2i64_mod31(__m128i val, __m128i amt) {
amt = _mm_and_si128( amt, _mm_set1_epi32( 31 ) );
return _mm_sll_epi64( val, _mm_unpacklo_epi32( amt, _mm_setzero_si128() ) );
}

__m128i shl_v2i64_mod31_alt(__m128i val, __m128i amt) {
amt = _mm_and_si128( amt, _mm_setr_epi32( 31, 0, 0, 0 ) );
return _mm_sll_epi64( val, amt );
}

define <2 x i64> @​shl_v2i64_mod31(<2 x i64> %0, <2 x i64> %1){
%3 = bitcast <2 x i64> %1 to <4 x i32>
%4 = and <4 x i32> %3, <i32 31, i32 poison, i32 poison, i32 poison>
%5 = insertelement <4 x i32> %4, i32 0, i32 1
%6 = bitcast <4 x i32> %5 to <2 x i64>
%7 = tail call <2 x i64> @​llvm.x86.sse2.psll.q(<2 x i64> %0, <2 x i64> %6)
ret <2 x i64> %7
}
declare <2 x i64> @​llvm.x86.sse2.psll.q(<2 x i64>, <2 x i64>)

define <2 x i64> @​shl_v2i64_mod31_alt(<2 x i64> %0, <2 x i64> %1) {
%3 = and <2 x i64> %1, <i64 31, i64 poison>
%4 = shufflevector <2 x i64> %3, <2 x i64> poison, <2 x i32> zeroinitializer
%5 = shl <2 x i64> %0, %4
ret <2 x i64> %5
}

I think we're just missing some bitcast/insertelement vector handling in ValueTracking.

@rotateright
Copy link
Contributor

I don't see a way to do this in IR. We can't fold the 'insertelement' into the 'and' and maintain the original poison propagation:
https://alive2.llvm.org/ce/z/RpiWpH

It seems ok to do that in codegen though. We do have poison down there, but there doesn't seem to be motivation to try to make SDAG conformant.

@RKSimon
Copy link
Collaborator Author

RKSimon commented Jun 15, 2021

Indeed, but we should be able to use ValueTracking to recognise that the shift amount is in range (upper 32 bits is zero from the insertelement, lower 32 bits is masked to 31). The existing instcombine x86 calls code would then do this:

define <2 x i64> @​shl_v2i64_mod31(<2 x i64> %0, <2 x i64> %1){
%3 = bitcast <2 x i64> %1 to <4 x i32>
%4 = and <4 x i32> %3, <i32 31, i32 poison, i32 poison, i32 poison>
%5 = insertelement <4 x i32> %4, i32 0, i32 1
%6 = bitcast <4 x i32> %5 to <2 x i64>
%7 = tail call <2 x i64> @​llvm.x86.sse2.psll.q(<2 x i64> %0, <2 x i64> %6)
ret <2 x i64> %7
}
declare <2 x i64> @​llvm.x86.sse2.psll.q(<2 x i64>, <2 x i64>)

-->

define <2 x i64> @​shl_v2i64_mod31_opt(<2 x i64> %0, <2 x i64> %1){
%3 = bitcast <2 x i64> %1 to <4 x i32>
%4 = and <4 x i32> %3, <i32 31, i32 poison, i32 poison, i32 poison>
%5 = insertelement <4 x i32> %4, i32 0, i32 1
%6 = bitcast <4 x i32> %5 to <2 x i64>
%7 = shufflevector <2 x i64> %6, <2 x i64> poison, <2 x i32> zeroinitializer
%8 = shl <2 x i64> %0, %7
ret <2 x i64> %8
}

@rotateright
Copy link
Contributor

Indeed, but we should be able to use ValueTracking to recognise that the
shift amount is in range (upper 32 bits is zero from the insertelement,
lower 32 bits is masked to 31). The existing instcombine x86 calls code
would then do this:

I was working through how to do this and wrote a few drafts of hacks...but I forgot to check the SDAG version.

And now I see you already did it! :)

https://reviews.llvm.org/D27129

So we just want to adapt that to IR, right?

@RKSimon
Copy link
Collaborator Author

RKSimon commented Jun 16, 2021

Yes I think bitcast handling (possibly with some additional demandedelts mask support) should get us there.

@rotateright
Copy link
Contributor

@rotateright
Copy link
Contributor

@RKSimon
Copy link
Collaborator Author

RKSimon commented Jun 24, 2021

Thanks!

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

2 participants