SLPVectorizer should drop nsw flags from add #43881

Closed
aqjune opened this issue Jan 13, 2020 · 1 comment
Labels
bugzilla Issues migrated from bugzilla

Comments

@aqjune
Contributor

aqjune commented Jan 13, 2020

Bugzilla Link 44536
Resolution FIXED
Resolved on Jan 31, 2020 07:17
Version trunk
OS All
Attachments horizontal.ll
CC @alexey-bataev,@hfinkel,@RKSimon,@nunoplopes,@rotateright
Fixed by commit(s) bc1148e

Extended Description

$ cat horizontal.ll # excerpted from test/Transforms/SLPVectorizer/X86/horizontal.ll
@arr_i32 = global [32 x i32] zeroinitializer, align 16
declare i32 @foobar(i32)

define void @i32_red_call(i32 %val) {
entry:
  %0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16
  %1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4
  %add = add nsw i32 %1, %0
  %2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8
  %add.1 = add nsw i32 %2, %add
  %3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 3), align 4
  %add.2 = add nsw i32 %3, %add.1
  %4 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 4), align 16
  %add.3 = add nsw i32 %4, %add.2
  %5 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 5), align 4
  %add.4 = add nsw i32 %5, %add.3
  %6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 6), align 8
  %add.5 = add nsw i32 %6, %add.4
  %7 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 7), align 4
  %add.6 = add nsw i32 %7, %add.5
  %res = call i32 @foobar(i32 %add.6)
  ret void
}
$ opt -slp-vectorizer -S -o - -mtriple=x86_64-apple-macosx -mcpu=corei7-avx ./horizontal.ll
@arr_i32 = global [32 x i32] zeroinitializer, align 16
declare i32 @foobar(i32) #0
define void @i32_red_call(i32 %val) #0 {
entry:
  %0 = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16
  %rdx.shuf = shufflevector <8 x i32> %0, <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
  %bin.rdx = add nsw <8 x i32> %0, %rdx.shuf
  %rdx.shuf1 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %bin.rdx2 = add nsw <8 x i32> %bin.rdx, %rdx.shuf1
  %rdx.shuf3 = shufflevector <8 x i32> %bin.rdx2, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %bin.rdx4 = add nsw <8 x i32> %bin.rdx2, %rdx.shuf3
  %1 = extractelement <8 x i32> %bin.rdx4, i32 0
  %res = call i32 @foobar(i32 %1)
  ret void
}

attributes #0 = { "target-cpu"="corei7-avx" }

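SLPVectorizer reassociates the additions: the scalar code sums the elements sequentially, while the vectorized reduction adds them pairwise as a tree (x0+x4, x1+x5, ..., then the partial sums). Keeping the nsw flags can therefore introduce undefined behavior, because an intermediate sum in the new order may overflow (making the result poison) even when no sum in the original order does.
As the Reassociate pass does, the vectorizer should drop the nsw flags.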
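To make the overflow argument concrete, here is a small Python model (an illustration only, not LLVM code) of the lane-0 computation in both evaluation orders. The helper names `add_nsw`, `add_wrap`, `Poison`, `scalar_chain` and `vector_tree_lane0` are hypothetical: `add_nsw` raises `Poison` when the i32 sum would overflow (where LLVM would produce poison), and `add_wrap` models a plain `add` without flags. With the input `[INT32_MAX, -1, 0, 0, 1, 0, 0, 0]` the scalar chain never overflows, but the tree order computes `INT32_MAX + 1` in its first step.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


class Poison(Exception):
    """Models an 'add nsw' whose signed i32 result would overflow (poison in LLVM IR)."""


def add_nsw(a, b):
    # 'add nsw': any signed overflow makes the result poison.
    r = a + b
    if not INT32_MIN <= r <= INT32_MAX:
        raise Poison(f"{a} + {b} overflows i32")
    return r


def add_wrap(a, b):
    # Plain 'add' (no flags): two's-complement wraparound modulo 2**32.
    r = (a + b) & 0xFFFFFFFF
    return r - 2**32 if r > INT32_MAX else r


def scalar_chain(x, add):
    # Mirrors %add, %add.1, ..., %add.6 in the original function.
    acc = add(x[1], x[0])
    for v in x[2:]:
        acc = add(v, acc)
    return acc


def vector_tree_lane0(x, add):
    # Mirrors lane 0 of %bin.rdx, %bin.rdx2 and %bin.rdx4 in the vectorized code.
    rdx = [add(x[i], x[i + 4]) for i in range(4)]
    rdx2 = [add(rdx[j], rdx[j + 2]) for j in range(2)]
    return add(rdx2[0], rdx2[1])


x = [INT32_MAX, -1, 0, 0, 1, 0, 0, 0]
print(scalar_chain(x, add_nsw))       # well-defined in the original order
print(vector_tree_lane0(x, add_wrap)) # same value once nsw is dropped
try:
    vector_tree_lane0(x, add_nsw)
except Poison as e:
    print("poison:", e)               # the tree order overflows with nsw kept
```

Because wrapping addition is associative and commutative, both orders agree once the flags are dropped, which is why removing nsw is enough to make the reordering sound.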

@rotateright
Contributor

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.