SLPVectorizer should drop nsw flags from add #43881

aqjune · 2020-01-13T16:53:20Z


Bugzilla Link	44536
Resolution	FIXED
Resolved on	Jan 31, 2020 07:17
Version	trunk
OS	All
Attachments	horizontal.ll
CC	@alexey-bataev,@hfinkel,@RKSimon,@nunoplopes,@rotateright
Fixed by commit(s)	`bc1148e`

Extended Description

$ cat horizontal.ll # excerpted from test/Transforms/SLPVectorizer/X86/horizontal.ll
@arr_i32 = global [32 x i32] zeroinitializer, align 16
declare i32 @foobar(i32)

define void @i32_red_call(i32 %val) {
entry:
  %0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16
  %1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4
  %add = add nsw i32 %1, %0
  %2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8
  %add.1 = add nsw i32 %2, %add
  %3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 3), align 4
  %add.2 = add nsw i32 %3, %add.1
  %4 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 4), align 16
  %add.3 = add nsw i32 %4, %add.2
  %5 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 5), align 4
  %add.4 = add nsw i32 %5, %add.3
  %6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 6), align 8
  %add.5 = add nsw i32 %6, %add.4
  %7 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 7), align 4
  %add.6 = add nsw i32 %7, %add.5
  %res = call i32 @foobar(i32 %add.6)
  ret void
}
$ opt -slp-vectorizer -S -o - -mtriple=x86_64-apple-macosx -mcpu=corei7-avx ./horizontal.ll
@arr_i32 = global [32 x i32] zeroinitializer, align 16
declare i32 @foobar(i32) #0
define void @i32_red_call(i32 %val) #0 {
entry:
  %0 = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16
  %rdx.shuf = shufflevector <8 x i32> %0, <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
  %bin.rdx = add nsw <8 x i32> %0, %rdx.shuf
  %rdx.shuf1 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %bin.rdx2 = add nsw <8 x i32> %bin.rdx, %rdx.shuf1
  %rdx.shuf3 = shufflevector <8 x i32> %bin.rdx2, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %bin.rdx4 = add nsw <8 x i32> %bin.rdx2, %rdx.shuf3
  %1 = extractelement <8 x i32> %bin.rdx4, i32 0
  %res = call i32 @foobar(i32 %1)
  ret void
}

attributes #0 = { "target-cpu"="corei7-avx" }

SLPVectorizer reorders the additions, so keeping the nsw flag can introduce undefined behavior: an intermediate sum in the new order may overflow even though no sum in the original order does.
The nsw flags should be dropped, as Reassociate does.
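
To make the reordering problem concrete, here is a minimal Python sketch that models `add nsw i32` as returning `None` (standing in for poison) on signed overflow. The helper names (`add_nsw`, `scalar_chain`, `tree_reduction`) and the input values are hypothetical, chosen only for illustration; the tree follows the lane pairing of the shuffles in the vectorized output above (lane i with lane i+4, then i+2, then i+1).

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def add_nsw(a, b):
    """Model `add nsw i32`: None (poison) on signed overflow or a poison operand."""
    if a is None or b is None:
        return None
    s = a + b
    return s if INT_MIN <= s <= INT_MAX else None

def scalar_chain(vals):
    """Original order: ((a1 + a0) + a2) + ... + a7."""
    acc = add_nsw(vals[1], vals[0])
    for v in vals[2:]:
        acc = add_nsw(v, acc)
    return acc

def tree_reduction(vals):
    """Vectorized order: halve the width each step, adding lane i to lane i+width."""
    lanes = list(vals)
    width = len(lanes) // 2
    while width >= 1:
        lanes = [add_nsw(lanes[i], lanes[i + width]) for i in range(width)]
        width //= 2
    return lanes[0]

# Hypothetical contents of the first 8 elements of arr_i32.
vals = [INT_MAX, -1, 0, 0, 1, 0, 0, 0]

print(scalar_chain(vals))    # no intermediate sum overflows
print(tree_reduction(vals))  # a0 + a4 = INT_MAX + 1 overflows in the first step
```

With these values the scalar chain never overflows, while the first step of the tree adds `a0 + a4` and overflows, so the nsw flag is only valid for the original order.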

rotateright · 2020-01-31T15:17:30Z

https://reviews.llvm.org/rGbc1148e7bcb0

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021

This issue was closed.
