-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wrong code (FP exception) at -O3 on x86_64-linux-gnu (bad instcombine) #20433
Comments
IR test case produced via: @d = global i32 1, align 4 ; Function Attrs: nounwind ssp uwtable for.cond: ; preds = %for.inc, %entry for.body: ; preds = %for.cond for.cond1: ; preds = %for.body3, %for.body for.body3: ; preds = %for.cond1 for.end: ; preds = %for.cond1 cond.true: ; preds = %for.end cond.false: ; preds = %for.end cond.end: ; preds = %cond.false, %cond.true for.inc: ; preds = %cond.end for.end5: ; preds = %for.cond ; Function Attrs: nounwind ssp uwtable attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" } !llvm.ident = !{#0} !0 = metadata !{metadata !"clang version 3.5.0 (210980)"} |
Looks like a bug in the loop vectorizer: $ ./opt -tbaa -basicaa -inline -early-cse -instcombine -simplifycfg -loop-rotate -licm -loop-unswitch -gvn -sccp -instcombine -correlated-propagation -dse 20059.ll -S -o pre_vector.ll $ ./opt -loop-vectorize pre_vector.ll -S |
The original code looks like it is a duplicate #20197 (a bug in the runtime unroller similar to what we had in the vectorizer). What happens if this is compiled with -fno-unroll-loops? |
Hi Arnold - thanks for your help. Was comment 3 intended for bug 20063? I don't see any difference in this test case with -fno-unroll-loops: $ ./clang -O3 20059.c && ./a.out |
Yes, this is a separate error that I introduced :( when I fixed the back edge overflow error. I have fixed this in ToT (r211460). But, there is a separate error causing the bug reported here. It is not an error in the loop vectorizer. If it dump the IR right after the vectorizer has vectorized the loop and llc that test case it works fine. However, instcombine creates an srem on a partially undef vector. % rm -f a.out && bin/opt -basicaa -loop-vectorize -S < t3.ll | bin/llc - -o - > tmp.s && bin/clang tmp.s && ./a.out The vectorized loop will be: vector.ph: ; preds = %overflow.checked vector.body: ; preds = %vector.body, %vector.ph What is important is to note that the srem operation on the vector on vector body does not operate on undef values. Now, if i ran instcombine over this IR it will wrongly combine this to: vector.ph: ; preds = %overflow.checked vector.body: ; preds = %vector.body, %vector.ph And we get a division on undefined! |
In InstructionCombining.cpp -> InstCombiner::SimplifyVectorOp(): // If both arguments of binary operation are shuffles, which use the same So this: define i32 @main() #0 { Becomes this: define i32 @main() { The x86-64 backend (any backend?) apparently ignores the vector srem op because it sees we're only using the 0 element, so this code doesn't fail even though the IR is wrong. But a more complex test case will show the div-by-0 error: @a = global i32 1, align 4 define i32 @main() #0 { body: exit: $ ./opt 20059.ll -instcombine | ./llc | ./clang -x assembler - -o a.out && ./a.out |
Maybe this is another case of needing to guard against ops that can trap (bug 17073 and bug 16729)? What about vector FP ops or vector compares that generate selection masks? |
We can defeat post-optimization by splitting into 2 functions: @a = global i32 1, align 4 define <4 x i32> @foo(<4 x i32> %p1, <4 x i32> %p2) { define i32 @main() { $ ./opt srem.ll | ./llc | ./clang -x assembler - -o a.out && ./a.out |
Even better - remove the globals and loads: define <4 x i32> @foo(<4 x i32> %p1, <4 x i32> %p2) { define i32 @main() { |
One line patch posted for review: I've convinced myself that integer comparisons will be ok with this transform, but I don't think we should be doing this transform for any vector FP ops. Performing any kind of op on unknown FP operands could cause an exception. If that is correct, then I can submit another one line patch to disallow FP vectors and this test case in test/Transforms/InstCombine/vec_shuffle.ll should be removed: define <4 x float> @shuffle_17fsub(<4 x float> %v1, <4 x float> %v2) nounwind uwtable { |
Fixed with: The FP question is still open. I'll try to make a test case for that when I get a chance. |
Extended Description
The current clang trunk miscompiles the following code at -O3 on x86_64-linux-gnu (in both 32-bit and 64-bit modes).
This is a regression from 3.4.
$ clang-trunk -v
clang version 3.5.0 (trunk 211006)
Target: x86_64-unknown-linux-gnu
Thread model: posix
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6.4
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.7.3
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8.1
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64
$
$ clang-trunk -O2 small.c; a.out
$ clang-3.4 -O3 small.c; a.out
$
$ clang-trunk -O3 small.c
$ a.out
Floating point exception (core dumped)
$
int a, b, c, d = 1;
char e = 1;
void
fn1 ()
{
int h;
for (; e; e++)
{
for (; b;)
d--;
h = (d ? a % d : 0);
c &= h;
}
}
int
main ()
{
fn1 ();
return 0;
}
The text was updated successfully, but these errors were encountered: