[X86] Avoid scalar/vector transfers for scalar arithmetic #41978

RKSimon · 2019-07-16T12:22:32Z


Bugzilla Link	42633
Resolution	FIXED
Resolved on	Mar 09, 2020 08:58
Version	trunk
OS	Windows NT
CC	@topperc,@RKSimon,@rotateright
Fixed by commit(s)	`a69158c`

Extended Description

https://godbolt.org/z/LvQiCP

#include <x86intrin.h>

auto add(__v4si x) {
  return _mm_set1_epi32(x[1] + x[3]);
}

_Z3addDv4_i:
  vextractps $1, %xmm0, %eax
  vextractps $3, %xmm0, %ecx
  addl %eax, %ecx
  vmovd %ecx, %xmm0
  vpshufd $0, %xmm0, %xmm0 # xmm0 = xmm0[0,0,0,0]
  retq

A better sequence, which stays entirely in the vector domain, would be something like:

add(int __vector(4)):
  vpshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1]
  vpaddd xmm0, xmm1, xmm0
  vpshufd xmm0, xmm0, 85 # xmm0 = xmm0[1,1,1,1]
  ret
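
For reference, the shuffle/add/shuffle sequence above can be written by hand with SSE2 intrinsics. This is only a sketch to illustrate the equivalence: the function names `add_scalar` and `add_vector` are hypothetical and do not come from the report, and the scalar version uses an unsigned add so that the comparison is defined even when the sum wraps.

```cpp
#include <emmintrin.h>  // SSE2 integer intrinsics
#include <cstdint>

// Scalar form (hypothetical name): read lanes 1 and 3, add them as scalars,
// then broadcast the result. This mirrors the code in the report.
static inline __m128i add_scalar(__m128i x) {
    alignas(16) std::int32_t lanes[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), x);
    // Unsigned add: wraps on overflow instead of being undefined.
    std::uint32_t sum = static_cast<std::uint32_t>(lanes[1]) +
                        static_cast<std::uint32_t>(lanes[3]);
    return _mm_set1_epi32(static_cast<int>(sum));
}

// Vector form (hypothetical name): never leaves the vector registers.
static inline __m128i add_vector(__m128i x) {
    // imm 78 selects lanes [2,3,0,1], so lane 1 of 'swapped' holds x[3].
    __m128i swapped = _mm_shuffle_epi32(x, 78);
    // Lane 1 of 'sums' is now x[3] + x[1]; the other lanes are ignored.
    __m128i sums = _mm_add_epi32(swapped, x);
    // imm 85 selects lanes [1,1,1,1], broadcasting the wanted sum.
    return _mm_shuffle_epi32(sums, 85);
}
```

Both functions should return the same vector for any input; the second one simply performs three extra lane additions whose results are discarded by the final shuffle.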


rotateright · 2019-09-18T11:32:02Z

define <4 x i32> @extract_add_splat(<4 x i32> %x) {
  %e1 = extractelement <4 x i32> %x, i32 1
  %e3 = extractelement <4 x i32> %x, i32 3
  %a = add nsw i32 %e1, %e3
  %i = insertelement <4 x i32> undef, i32 %a, i32 0
  %r = shufflevector <4 x i32> %i, <4 x i32> undef, <4 x i32> zeroinitializer
  ret <4 x i32> %r
}

Are we committed to not vectorizing in SDAG? Changing that 'add' to <4 x i32> is an obvious win for any vector target that I can imagine.

We're committed to not vectorizing in InstCombine, so that's out.

I tried to figure out how to cram this into SLP, but I don't see how to do it without adding a big chunk of logic outside of everything that already exists. SLP just doesn't seem amenable to this kind of peephole opt.

Various attempts so far:
https://reviews.llvm.org/D59710
https://reviews.llvm.org/D64142
https://reviews.llvm.org/D66416

Should we add a "VectorCombine" IR pass? I'm imagining InstCombine-like iteration, but only on vector ops and with access to the cost model.

rotateright · 2020-03-09T15:58:59Z

We have a vector combine pass now, so we get a vector add:
https://reviews.llvm.org/D75689
https://reviews.llvm.org/rGa69158c12acd

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021

This issue was closed.
