LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 50123 - [X86] Failure to recognise generic shift from SSE PSLLQ intrinsic
Summary: [X86] Failure to recognise generic shift from SSE PSLLQ intrinsic
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Scalar Optimizations (show other bugs)
Version: trunk
Hardware: PC Windows NT
: P enhancement
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-04-26 05:09 PDT by Simon Pilgrim
Modified: 2021-06-24 07:46 PDT (History)
5 users (show)

See Also:
Fixed By Commit(s): 656001e7b2b939d9bc


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Simon Pilgrim 2021-04-26 05:09:57 PDT
We often fail to recognise that the _mm_sll_epi64 shift amount is in bounds, allowing us to fold it to a generic shift. This happens for all SSE 'shift by uniform variable' intrinsics.

https://simd.godbolt.org/z/vKT9YE5zM

#include <x86intrin.h>

__m128i shl_v2i64_mod31(__m128i val, __m128i amt) {
  amt = _mm_and_si128( amt, _mm_set1_epi32( 31 ) );
  return _mm_sll_epi64( val, _mm_unpacklo_epi32( amt, _mm_setzero_si128() ) );
}

__m128i shl_v2i64_mod31_alt(__m128i val, __m128i amt) {
  amt = _mm_and_si128( amt, _mm_setr_epi32( 31, 0, 0, 0 ) );
  return _mm_sll_epi64( val, amt );
}

define <2 x i64> @shl_v2i64_mod31(<2 x i64> %0, <2 x i64> %1){
  %3 = bitcast <2 x i64> %1 to <4 x i32>
  %4 = and <4 x i32> %3, <i32 31, i32 poison, i32 poison, i32 poison>
  %5 = insertelement <4 x i32> %4, i32 0, i32 1
  %6 = bitcast <4 x i32> %5 to <2 x i64>
  %7 = tail call <2 x i64> @llvm.x86.sse2.psll.q(<2 x i64> %0, <2 x i64> %6)
  ret <2 x i64> %7
}
declare <2 x i64> @llvm.x86.sse2.psll.q(<2 x i64>, <2 x i64>)

define <2 x i64> @shl_v2i64_mod31_alt(<2 x i64> %0, <2 x i64> %1) {
  %3 = and <2 x i64> %1, <i64 31, i64 poison>
  %4 = shufflevector <2 x i64> %3, <2 x i64> poison, <2 x i32> zeroinitializer
  %5 = shl <2 x i64> %0, %4
  ret <2 x i64> %5
}

I think we're just missing some bitcast/insertelement vector handling in ValueTracking.
Comment 1 Sanjay Patel 2021-06-14 17:02:56 PDT
I don't see a way to do this in IR. We can't fold the 'insertelement' into the 'and' and maintain the original poison propagation:
https://alive2.llvm.org/ce/z/RpiWpH

It seems ok to do that in codegen though. We do have poison down there, but there doesn't seem to be motivation to try to make SDAG conformant.
Comment 2 Simon Pilgrim 2021-06-15 00:22:21 PDT
Indeed, but we should be able to use ValueTracking to recognise that the shift amount is in range (upper 32 bits is zero from the insertelement, lower 32 bits is masked to 31). The existing instcombine x86 calls code would then do this:

define <2 x i64> @shl_v2i64_mod31(<2 x i64> %0, <2 x i64> %1){
  %3 = bitcast <2 x i64> %1 to <4 x i32>
  %4 = and <4 x i32> %3, <i32 31, i32 poison, i32 poison, i32 poison>
  %5 = insertelement <4 x i32> %4, i32 0, i32 1
  %6 = bitcast <4 x i32> %5 to <2 x i64>
  %7 = tail call <2 x i64> @llvm.x86.sse2.psll.q(<2 x i64> %0, <2 x i64> %6)
  ret <2 x i64> %7
}
declare <2 x i64> @llvm.x86.sse2.psll.q(<2 x i64>, <2 x i64>)

-->

define <2 x i64> @shl_v2i64_mod31_opt(<2 x i64> %0, <2 x i64> %1){
  %3 = bitcast <2 x i64> %1 to <4 x i32>
  %4 = and <4 x i32> %3, <i32 31, i32 poison, i32 poison, i32 poison>
  %5 = insertelement <4 x i32> %4, i32 0, i32 1
  %6 = bitcast <4 x i32> %5 to <2 x i64>
  %7 = shufflevector <2 x i64> %6, <2 x i64> poison, <2 x i32> zeroinitializer
  %8 = shl <2 x i64> %0, %7
  ret <2 x i64> %8
}
Comment 3 Sanjay Patel 2021-06-15 14:46:43 PDT
(In reply to Simon Pilgrim from comment #2)
> Indeed, but we should be able to use ValueTracking to recognise that the
> shift amount is in range (upper 32 bits is zero from the insertelement,
> lower 32 bits is masked to 31). The existing instcombine x86 calls code
> would then do this:

I was working through how to do this and wrote a few drafts of hacks...but I forgot to check the SDAG version.

And now I see you already did it! :)

https://reviews.llvm.org/D27129

So we just want to adapt that to IR, right?
Comment 4 Simon Pilgrim 2021-06-15 23:54:20 PDT
Yes I think bitcast handling (possibly with some additional demandedelts mask support) should get us there.
Comment 5 Sanjay Patel 2021-06-17 10:04:04 PDT
https://reviews.llvm.org/D104472
Comment 6 Sanjay Patel 2021-06-23 11:07:26 PDT
https://reviews.llvm.org/rG656001e7b2b939d9bc
Comment 7 Simon Pilgrim 2021-06-24 07:46:23 PDT
Thanks!