define <4 x i32> @extract_add_splat(<4 x i32> %x) {
  %e1 = extractelement <4 x i32> %x, i32 1
  %e3 = extractelement <4 x i32> %x, i32 3
  %a = add nsw i32 %e1, %e3
  %i = insertelement <4 x i32> undef, i32 %a, i32 0
  %r = shufflevector <4 x i32> %i, <4 x i32> undef, <4 x i32> zeroinitializer
  ret <4 x i32> %r
}
Are we committed to not vectorizing in SDAG? Changing that 'add' to <4 x i32> is an obvious win for any vector target that I can imagine.
We're committed to not vectorizing in InstCombine, so that's out.
I tried to figure out how to cram this into SLP, but I don't see how to do it without adding a big chunk of logic outside of everything that already exists. SLP just doesn't seem amenable to this kind of peephole opt.
Extended Description
https://godbolt.org/z/LvQiCP
#include <x86intrin.h>

auto add(__v4si x) {
  return _mm_set1_epi32(x[1] + x[3]);
}
_Z3addDv4_i:
  vextractps $1, %xmm0, %eax
  vextractps $3, %xmm0, %ecx
  addl %eax, %ecx
  vmovd %ecx, %xmm0
  vpshufd $0, %xmm0, %xmm0 # xmm0 = xmm0[0,0,0,0]
  retq
A better sequence would be something like:
add(int __vector(4)):
  vpshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1]
  vpaddd xmm0, xmm1, xmm0
  vpshufd xmm0, xmm0, 85 # xmm0 = xmm0[1,1,1,1]
  ret
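The shuffle immediates in the proposed sequence can be sanity-checked without any SIMD hardware. The sketch below is a plain scalar model, not real intrinsics: `V4`, `pshufd`, `paddd`, `add_scalar`, `add_vector`, and `sequences_agree` are hypothetical names introduced here for illustration. It assumes the usual `pshufd` immediate encoding (result lane `i` selects source lane `(imm >> 2*i) & 3`) and uses wrapping 32-bit addition for simplicity, whereas the IR above uses `add nsw`.

```cpp
#include <array>
#include <cstdint>

// Hypothetical 4 x i32 vector type for this model (not __v4si / __m128i).
using V4 = std::array<std::int32_t, 4>;

// Scalar model of (v)pshufd: result lane i takes source lane
// (imm >> (2 * i)) & 3.
V4 pshufd(const V4& src, unsigned imm) {
    V4 out{};
    for (unsigned i = 0; i < 4; ++i)
        out[i] = src[(imm >> (2 * i)) & 3u];
    return out;
}

// Lane-wise 32-bit add standing in for vpaddd. The add goes through
// uint32_t so that overflow wraps instead of being undefined behaviour.
V4 paddd(const V4& a, const V4& b) {
    V4 out{};
    for (unsigned i = 0; i < 4; ++i)
        out[i] = static_cast<std::int32_t>(static_cast<std::uint32_t>(a[i]) +
                                           static_cast<std::uint32_t>(b[i]));
    return out;
}

// Reference behaviour: extract lanes 1 and 3, add them, splat the sum.
V4 add_scalar(const V4& x) {
    const std::int32_t s = static_cast<std::int32_t>(
        static_cast<std::uint32_t>(x[1]) + static_cast<std::uint32_t>(x[3]));
    return V4{s, s, s, s};
}

// Proposed sequence: shuffle, vector add, splat lane 1.
V4 add_vector(const V4& x) {
    const V4 swapped = pshufd(x, 78);    // lanes [2,3,0,1]
    const V4 sum = paddd(swapped, x);    // lane 1 now holds x[3] + x[1]
    return pshufd(sum, 85);              // lanes [1,1,1,1]
}

// True when both formulations produce the same vector for this input.
bool sequences_agree(const V4& x) {
    return add_scalar(x) == add_vector(x);
}
```

Decoding the immediates by hand gives the same lane orders as the comments in the assembly: 78 is `0b01001110`, i.e. `[2,3,0,1]`, and 85 is `0b01010101`, i.e. `[1,1,1,1]`.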