Failure to avoid using SIMD excessively on small vectors #51399

GabrielRavier · 2021-10-04T03:09:21Z


Bugzilla Link	52057
Resolution	FIXED
Resolved on	Oct 20, 2021 12:44
Version	trunk
OS	Linux
CC	@RKSimon,@rotateright

Extended Description

typedef char V attribute((vector_size(2)));

V foo(V v)
{
v[(V){}[0]] <<= 1;
return v;
}

Code such as this seems to be very badly optimized for all targets that have SIMD.

With -O3, Clang outputs this:

.LCPI0_0:
.byte 0 # 0x0
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
foo(char __vector(2)): # @foo(char __vector(2))
movd xmm0, edi
movdqa xmmword ptr [rsp - 24], xmm0
mov al, byte ptr [rsp - 24]
add al, al
movzx eax, al
movd xmm1, eax
pxor xmm2, xmm2
punpcklbw xmm1, xmm2 # xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1],xmm1[2],xmm2[2],xmm1[3],xmm2[3],xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]
pand xmm0, xmmword ptr [rip + .LCPI0_0]
por xmm0, xmm1
movd eax, xmm0
ret

GCC outputs this:

foo(char __vector(2)):
movsx edx, dil
mov eax, edi
add edx, edx
mov al, dl
ret

GabrielRavier · 2021-10-04T03:09:21Z

assigned to @rotateright

RKSimon · 2021-10-04T08:34:53Z

Bitcasting between illegal vectors and scalars...

define i16 @foo(i16 %v.coerce) {
entry:
%0 = bitcast i16 %v.coerce to <2 x i8>
%vecext2 = extractelement <2 x i8> %0, i8 0
%shl = shl i8 %vecext2, 1
%vecins = insertelement <2 x i8> %0, i8 %shl, i8 0
%1 = bitcast <2 x i8> %vecins to i16
ret i16 %1
}

@spatel Could/should vectorcombine catch this?

rotateright · 2021-10-04T12:04:54Z

At first look, this seems like we missed some basic patterns in instcombine:
https://alive2.llvm.org/ce/z/abytmj

We don't do that, but we get more complicated cases like:

define i32 @bitcasted_inselt_wide_source_zero_elt(i64 %x) {
; LE-LABEL: @bitcasted_inselt_wide_source_zero_elt(
; LE-NEXT: [[R:%.]] = trunc i64 [[X:%.]] to i32
; LE-NEXT: ret i32 [[R]]
;
; BE-LABEL: @bitcasted_inselt_wide_source_zero_elt(
; BE-NEXT: [[TMP1:%.]] = lshr i64 [[X:%.]], 32
; BE-NEXT: [[R:%.*]] = trunc i64 [[TMP1]] to i32
; BE-NEXT: ret i32 [[R]]
;
%i = insertelement <2 x i64> zeroinitializer, i64 %x, i32 0
%b = bitcast <2 x i64> %i to <4 x i32>
%r = extractelement <4 x i32> %b, i32 0
ret i32 %r
}

rotateright · 2021-10-06T13:46:28Z

This doesn't help the example in the description (because of multi-use), but we should be able to build up from it:
https://reviews.llvm.org/rGdb231ebdb07f

rotateright · 2021-10-07T19:22:30Z

https://reviews.llvm.org/rGd95ebef4b8ec

...makes it less bad, but we still need something like this:
https://alive2.llvm.org/ce/z/9RmeS6

rotateright · 2021-10-20T18:22:26Z

https://reviews.llvm.org/rGd95ebef4b8ec

...makes it less bad, but we still need something like this:
https://alive2.llvm.org/ce/z/9RmeS6

Here's the endian-aware version of that proof:
https://alive2.llvm.org/ce/z/Ux-662

rotateright · 2021-10-20T19:44:03Z

This example should be fixed after:
https://reviews.llvm.org/rG80ab06c599a0

But as noted in the commit message, the transform isn't completely general. So if you have other code like this that still has problems, please do file another bug.

There's also a difference in the x86 codegen vs. gcc, so if the motivating program for this example is still slower, that might be another bug.

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Failure to avoid using SIMD excessively on small vectors #51399

Failure to avoid using SIMD excessively on small vectors #51399

GabrielRavier commented Oct 4, 2021

GabrielRavier commented Oct 4, 2021

RKSimon commented Oct 4, 2021

rotateright commented Oct 4, 2021

rotateright commented Oct 6, 2021

rotateright commented Oct 7, 2021

rotateright commented Oct 20, 2021

rotateright commented Oct 20, 2021

Failure to avoid using SIMD excessively on small vectors #51399

Failure to avoid using SIMD excessively on small vectors #51399

Comments

GabrielRavier commented Oct 4, 2021

Extended Description

GabrielRavier commented Oct 4, 2021

RKSimon commented Oct 4, 2021

rotateright commented Oct 4, 2021

rotateright commented Oct 6, 2021

rotateright commented Oct 7, 2021

rotateright commented Oct 20, 2021

rotateright commented Oct 20, 2021