Failure to avoid using SIMD excessively on small vectors #51399
Comments
assigned to @rotateright
Bitcasting between illegal vectors and scalars... define i16 @foo(i16 %v.coerce) { @spatel Could/should vectorcombine catch this?
At first look, this seems like we missed some basic patterns in instcombine: We don't do that, but we get more complicated cases like: define i32 @bitcasted_inselt_wide_source_zero_elt(i64 %x) {
This doesn't help the example in the description (because of multi-use), but we should be able to build up from it:
https://reviews.llvm.org/rGd95ebef4b8ec ...makes it less bad, but we still need something like this:
Here's the endian-aware version of that proof:
This example should be fixed after: But as noted in the commit message, the transform isn't completely general. So if you have other code like this that still has problems, please do file another bug. There's also a difference in the x86 codegen vs. gcc, so if the motivating program for this example is still slower, that might be another bug.
Extended Description
typedef char V __attribute__((vector_size(2)));
V foo(V v)
{
v[(V){}[0]] <<= 1;
return v;
}
Code such as this seems to be very badly optimized for all targets that have SIMD.
With -O3, Clang outputs this:
.LCPI0_0:
.byte 0 # 0x0
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
.byte 255 # 0xff
foo(char __vector(2)): # @foo(char __vector(2))
movd xmm0, edi
movdqa xmmword ptr [rsp - 24], xmm0
mov al, byte ptr [rsp - 24]
add al, al
movzx eax, al
movd xmm1, eax
pxor xmm2, xmm2
punpcklbw xmm1, xmm2 # xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1],xmm1[2],xmm2[2],xmm1[3],xmm2[3],xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]
pand xmm0, xmmword ptr [rip + .LCPI0_0]
por xmm0, xmm1
movd eax, xmm0
ret
GCC outputs this:
foo(char __vector(2)):
movsx edx, dil
mov eax, edi
add edx, edx
mov al, dl
ret
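For reference, the GCC output above amounts to a plain scalar update of a single byte. The following is a minimal sketch, not taken from the report, of what the function computes when the two-byte vector is viewed as a 16-bit integer. The helper names `foo_vector`, `foo_scalar`, and `foo_vector_bits` are hypothetical, and the code assumes a compiler that supports the GCC vector extension (GCC or Clang).

```c
#include <stdint.h>
#include <string.h>

typedef char V __attribute__((vector_size(2)));

/* The function from the report; the index expression (V){}[0] is just 0. */
static V foo_vector(V v)
{
    v[0] <<= 1;
    return v;
}

/* Hypothetical scalar equivalent: double lane 0 and leave lane 1 untouched.
   Going through memcpy keeps the lane order independent of endianness;
   unsigned char is used so the shift never sees a negative value. */
static uint16_t foo_scalar(uint16_t bits)
{
    unsigned char lanes[2];
    memcpy(lanes, &bits, sizeof lanes);
    lanes[0] = (unsigned char)(lanes[0] << 1);
    memcpy(&bits, lanes, sizeof bits);
    return bits;
}

/* Hypothetical helper: run the vector version on the same 16 bits. */
static uint16_t foo_vector_bits(uint16_t bits)
{
    V v;
    memcpy(&v, &bits, sizeof v);
    v = foo_vector(v);
    memcpy(&bits, &v, sizeof bits);
    return bits;
}
```

If the two versions agree for all inputs, the whole operation fits in a general-purpose register, which is roughly what the GCC code does and what the Clang code above fails to do.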