Two versions of the same 'select based on comparison' pattern, one using generic vector types and the other using SSE intrinsics. The presence of the bitcasts between SSE types prevents the recognition of the select. #include <x86intrin.h> __v16qi cmp_i8_sel_v16i8(__v16qi a, __v16qi b, __v16qi c, __v16qi d) { __v16qi cc = _mm_cmpeq_epi8(a, b); return (c & ~cc) | (d & cc); } __m128i cmp_i8_sel_m128i(__m128i a, __m128i b, __m128i c, __m128i d) { __m128i cc = _mm_cmpeq_epi8(a, b); return _mm_or_si128(_mm_andnot_si128(cc, c), _mm_and_si128(cc, d)); } define <16 x i8> @cmp_i8_sel_v16i8(<16 x i8>, <16 x i8>, <16 x i8>, <16 x i8>) { %5 = icmp eq <16 x i8> %0, %1 %6 = select <16 x i1> %5, <16 x i8> %3, <16 x i8> %2 ret <16 x i8> %6 } define <2 x i64> @cmp_i8_sel_m128i(<2 x i64>, <2 x i64>, <2 x i64>, <2 x i64>) { %5 = bitcast <2 x i64> %0 to <16 x i8> %6 = bitcast <2 x i64> %1 to <16 x i8> %7 = icmp eq <16 x i8> %5, %6 %8 = sext <16 x i1> %7 to <16 x i8> %9 = bitcast <16 x i8> %8 to <2 x i64> %10 = xor <2 x i64> %9, <i64 -1, i64 -1> %11 = and <2 x i64> %10, %2 %12 = and <2 x i64> %9, %3 %13 = or <2 x i64> %11, %12 ret <2 x i64> %13 } Mind you, on X86 at least it doesn't affect final codegen: cmp_i8_sel_v16i8: vpcmpeqb %xmm1, %xmm0, %xmm0 vpblendvb %xmm0, %xmm3, %xmm2, %xmm0 retq cmp_i8_sel_m128i: vpcmpeqb %xmm1, %xmm0, %xmm0 vpblendvb %xmm0, %xmm3, %xmm2, %xmm0 retq
This is preventing abs/min/max style patterns from being detected: https://godbolt.org/z/nbxeP7zhb
I think there's a missing or inverted canonicalization from ashr to select-of-constants for the 'abs' example, but it seems that we can see through that already with an improvement in peeking through bitcasts: https://reviews.llvm.org/D113035
There's potentially still some remaining IR canonicalization and backend work needed to make sure these kinds of patterns are always handled, but the examples here are fixed after: https://reviews.llvm.org/rGc36b7e21bd8f
Thanks Sanjay!
Follow-up patch to match even more bitwise select patterns: https://reviews.llvm.org/rG430ad9697d14 That should make the fix less fragile, but the multi-use 'TODO' item is still a concern. Ie, we could fail to match because the source is complicated by extra uses and the existing use checks are too strict. There's also a missing canonicalization: https://alive2.llvm.org/ce/z/iNkTZ5 Need to check which direction on that leads to less breakage...