I think there's a missing or inverted canonicalization from ashr to select-of-constants for the 'abs' example, but it seems that we can see through that already with an improvement in peeking through bitcasts: https://reviews.llvm.org/D113035
There's potentially still some remaining IR canonicalization and backend work needed to make sure these kinds of patterns are always handled, but the examples here are fixed after: https://reviews.llvm.org/rGc36b7e21bd8f
That should make the fix less fragile, but the multi-use 'TODO' item is still a concern. That is, we could fail to match because the source is complicated by extra uses and the existing use checks are too strict.
Extended Description
Here are two versions of the same 'select based on comparison' pattern: one uses generic vector types and the other uses SSE intrinsics.
In the intrinsics version, the bitcasts between SSE types prevent the select from being recognized.
#include <x86intrin.h>

__v16qi cmp_i8_sel_v16i8(__v16qi a, __v16qi b, __v16qi c, __v16qi d) {
  __v16qi cc = _mm_cmpeq_epi8(a, b);
  return (c & ~cc) | (d & cc);
}

__m128i cmp_i8_sel_m128i(__m128i a, __m128i b, __m128i c, __m128i d) {
  __m128i cc = _mm_cmpeq_epi8(a, b);
  return _mm_or_si128(_mm_andnot_si128(cc, c), _mm_and_si128(cc, d));
}
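
For reference, here is a rough scalar sketch of why the two forms are interchangeable: when every lane of the compare mask is either all-ones or all-zeros, `(c & ~cc) | (d & cc)` picks the same byte as `a == b ? d : c`. The helper names (`cmpeq_mask_u8`, `blend_bitwise_u8`, `blend_select_u8`, `forms_agree_16`) are made up for illustration and are not part of LLVM or the intrinsics headers; this only models the per-lane semantics and is not how the optimizer performs the match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scalar model of one lane of a byte-wise equality
   compare. Yields 0xFF when the bytes are equal and 0x00 otherwise. */
static uint8_t cmpeq_mask_u8(uint8_t a, uint8_t b) {
    return (a == b) ? 0xFF : 0x00;
}

/* Bitwise form, as in the intrinsics version: (c & ~cc) | (d & cc). */
static uint8_t blend_bitwise_u8(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    uint8_t cc = cmpeq_mask_u8(a, b);
    return (uint8_t)((c & (uint8_t)~cc) | (d & cc));
}

/* Select form, as in the generic vector IR: a == b ? d : c. */
static uint8_t blend_select_u8(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return (a == b) ? d : c;
}

/* Returns 1 if both forms produce the same byte in all 16 lanes, else 0. */
static int forms_agree_16(const uint8_t a[16], const uint8_t b[16],
                          const uint8_t c[16], const uint8_t d[16]) {
    for (size_t i = 0; i < 16; ++i) {
        if (blend_bitwise_u8(a[i], b[i], c[i], d[i]) !=
            blend_select_u8(a[i], b[i], c[i], d[i])) {
            return 0;
        }
    }
    return 1;
}
```

Calling `forms_agree_16` on arbitrary 16-byte inputs and wrapping the result in `assert` should always hold, because the mask is never a mix of set and clear bits within a lane. That all-ones-or-zero property is what the `sext <16 x i1> ... to <16 x i8>` in the IR guarantees.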
define <16 x i8> @cmp_i8_sel_v16i8(<16 x i8>, <16 x i8>, <16 x i8>, <16 x i8>) {
  %5 = icmp eq <16 x i8> %0, %1
  %6 = select <16 x i1> %5, <16 x i8> %3, <16 x i8> %2
  ret <16 x i8> %6
}

define <2 x i64> @cmp_i8_sel_m128i(<2 x i64>, <2 x i64>, <2 x i64>, <2 x i64>) {
  %5 = bitcast <2 x i64> %0 to <16 x i8>
  %6 = bitcast <2 x i64> %1 to <16 x i8>
  %7 = icmp eq <16 x i8> %5, %6
  %8 = sext <16 x i1> %7 to <16 x i8>
  %9 = bitcast <16 x i8> %8 to <2 x i64>
  %10 = xor <2 x i64> %9, <i64 -1, i64 -1>
  %11 = and <2 x i64> %10, %2
  %12 = and <2 x i64> %9, %3
  %13 = or <2 x i64> %11, %12
  ret <2 x i64> %13
}
Mind you, on X86 at least, this doesn't affect the final codegen; both functions compile to the same instructions:
cmp_i8_sel_v16i8:
  vpcmpeqb %xmm1, %xmm0, %xmm0
  vpblendvb %xmm0, %xmm3, %xmm2, %xmm0
  retq

cmp_i8_sel_m128i:
  vpcmpeqb %xmm1, %xmm0, %xmm0
  vpblendvb %xmm0, %xmm3, %xmm2, %xmm0
  retq