Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure to avoid using SIMD excessively on small vectors #51399

Closed
GabrielRavier opened this issue Oct 4, 2021 · 7 comments
Closed

Failure to avoid using SIMD excessively on small vectors #51399

GabrielRavier opened this issue Oct 4, 2021 · 7 comments
Assignees
Labels
bugzilla Issues migrated from bugzilla

Comments

@GabrielRavier
Copy link
Contributor

Bugzilla Link 52057
Resolution FIXED
Resolved on Oct 20, 2021 12:44
Version trunk
OS Linux
CC @RKSimon,@rotateright

Extended Description

typedef char V attribute((vector_size(2)));

V foo(V v)
{
v[(V){}[0]] <<= 1;
return v;
}

Code such as this seems to be very badly optimized for all targets that have SIMD.

With -O3, Clang outputs this:

.LCPI0_0:
.byte 0 # 0x0
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
foo(char __vector(2)): # @​foo(char __vector(2))
movd xmm0, edi
movdqa xmmword ptr [rsp - 24], xmm0
mov al, byte ptr [rsp - 24]
add al, al
movzx eax, al
movd xmm1, eax
pxor xmm2, xmm2
punpcklbw xmm1, xmm2 # xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1],xmm1[2],xmm2[2],xmm1[3],xmm2[3],xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]
pand xmm0, xmmword ptr [rip + .LCPI0_0]
por xmm0, xmm1
movd eax, xmm0
ret

GCC outputs this:

foo(char __vector(2)):
movsx edx, dil
mov eax, edi
add edx, edx
mov al, dl
ret

@GabrielRavier
Copy link
Contributor Author

assigned to @rotateright

@RKSimon
Copy link
Collaborator

RKSimon commented Oct 4, 2021

Bitcasting between illegal vectors and scalars...

define i16 @​foo(i16 %v.coerce) {
entry:
%0 = bitcast i16 %v.coerce to <2 x i8>
%vecext2 = extractelement <2 x i8> %0, i8 0
%shl = shl i8 %vecext2, 1
%vecins = insertelement <2 x i8> %0, i8 %shl, i8 0
%1 = bitcast <2 x i8> %vecins to i16
ret i16 %1
}

@​spatel Could/should vectorcombine catch this?

@rotateright
Copy link
Contributor

At first look, this seems like we missed some basic patterns in instcombine:
https://alive2.llvm.org/ce/z/abytmj

We don't do that, but we get more complicated cases like:

define i32 @​bitcasted_inselt_wide_source_zero_elt(i64 %x) {
; LE-LABEL: @​bitcasted_inselt_wide_source_zero_elt(
; LE-NEXT: [[R:%.]] = trunc i64 [[X:%.]] to i32
; LE-NEXT: ret i32 [[R]]
;
; BE-LABEL: @​bitcasted_inselt_wide_source_zero_elt(
; BE-NEXT: [[TMP1:%.]] = lshr i64 [[X:%.]], 32
; BE-NEXT: [[R:%.*]] = trunc i64 [[TMP1]] to i32
; BE-NEXT: ret i32 [[R]]
;
%i = insertelement <2 x i64> zeroinitializer, i64 %x, i32 0
%b = bitcast <2 x i64> %i to <4 x i32>
%r = extractelement <4 x i32> %b, i32 0
ret i32 %r
}

@rotateright
Copy link
Contributor

This doesn't help the example in the description (because of multi-use), but we should be able to build up from it:
https://reviews.llvm.org/rGdb231ebdb07f

@rotateright
Copy link
Contributor

https://reviews.llvm.org/rGd95ebef4b8ec

...makes it less bad, but we still need something like this:
https://alive2.llvm.org/ce/z/9RmeS6

@rotateright
Copy link
Contributor

https://reviews.llvm.org/rGd95ebef4b8ec

...makes it less bad, but we still need something like this:
https://alive2.llvm.org/ce/z/9RmeS6

Here's the endian-aware version of that proof:
https://alive2.llvm.org/ce/z/Ux-662

@rotateright
Copy link
Contributor

This example should be fixed after:
https://reviews.llvm.org/rG80ab06c599a0

But as noted in the commit message, the transform isn't completely general. So if you have other code like this that still has problems, please do file another bug.

There's also a difference in the x86 codegen vs. gcc, so if the motivating program for this example is still slower, that might be another bug.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

3 participants