New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[instcombine comparison] wrong codegen at -O1 (affecting clang trunk, 3.3, and 3.2) #18201
Comments
Simplified away extra struct fields, printf, global: $ cat bitfield.c int foo (int p) int main () This still fails with -O1. |
Unoptimized IR created by: ; ModuleID = 'bitfield.c' %struct.S = type { [2 x i8], [2 x i8] } @a = common global %struct.S zeroinitializer, align 4 ; Function Attrs: nounwind ssp uwtable ; Function Attrs: nounwind ; Function Attrs: nounwind ssp uwtable attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" } !llvm.ident = !{#0} !0 = metadata !{metadata !"clang version 3.4 (trunk 194436)"} |
Using opt with -O1 on the unoptimized IR: Results in: define i32 @foo(i32 %p) #0 { Something has changed the comparison of '< 1' into a comparison against '0'. |
$ ./opt -O1 -debug-pass=Arguments -S < /dev/null So it must be one of those passes... |
Even more simplified test case source (doesn't need more than one bitfield, does appear to need the global and a mask constant that doesn't get optimized to something else): $ cat bitfield.c int foo (int p) int main () |
More simplified source as unoptimized IR: $ clang bitfield.c -S -emit-llvm $ cat bitfield.ll %struct.S = type { i8, [3 x i8] } @a = common global %struct.S zeroinitializer, align 4 ; Function Attrs: nounwind ssp uwtable ; Function Attrs: nounwind ; Function Attrs: nounwind ssp uwtable attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" } !llvm.ident = !{#0} !0 = metadata !{metadata !"clang version 3.4 (trunk 194436)"} |
And bug repros with just 2 optimization passes: $ opt -sroa -instcombine bitfield.ll -S -o bitfieldbug.ll Optimized IR in this case is: $ cat bitfieldbug.ll |
-debug output shows: So the comparison of "<1" is being mistakenly transformed into "==0" here: a signed value is being mistakenly treated as an unsigned value. |
Hmmm...given the IR, I think the instcombine optimizer is actually doing the right thing here. It recognizes that for an 8 bit signed value (field f0), "p & 6" can never be negative, so for the comparison to less than 1, the only remaining value is 0. So now the question is: Defined like this: Unless I'm missing something, the IR has lost a fundamental piece of information from the original source: there's no way to know that the 3rd bit of f0 is the sign bit now. |
I started a thread on llvm-dev to discuss this: The original test case (and my simplified test cases) could be resolved as invalid because they do not specify signed/unsigned for the bitfield(s). In that case (and I think this is crazy), the C standard specifies that signedness is implementation defined and C++ has just recently fixed this: http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#739 Thank you to Mark Lacey and Richard Smith for pointing this out. However, there is still a bug in llvm because its implementation behaves like we expected - plain 'int' is treated like 'signed int'. Henrique Santos noted that the unoptimized IR does preserve the sign bit with all of those shift left/right ops. So there is a bug in InstCombine. I'm still not sure what advantage the front-end / clang has in treating bit fields as 8-bit operands ('i8') rather than just using the actual number of bits, eg 'i3'. All that shifting can lead to other codegen problems as seen in bug 17956. |
This is the IR after sroa and heading into instcombine: target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" %struct.S = type { i8, [3 x i8] } @a = common global %struct.S zeroinitializer, align 4 define i32 @foo(i32 %p) #0 { ; Function Attrs: nounwind |
Further simplified IR that still fails in instcombine (don't need any globals/structs): define i1 @foo(i8 %p, i8 %q) #0 { Note: the shifts appear to be impossible to generate with straight C code that just uses 'char' because of C's integer promotion rules. |
Finally, based on the debug output, here's the most simplified code before the incorrect optimization takes place: define i1 @foo(i8 %p) #0 { |
Henrique suggested that this particular optimization: or perhaps more easily readable by humans: should only be done if the shift instruction has an 'nsw' flag or if the comparison was unsigned (ult). Adding those constraints in InstCombiner::visitICmpInstWithInstAndIntCst() via:
Causes 6 existing test failures. This one file dealing with comparison optimizations is over 3K lines of code that only a compiler person could love...there will be bugs. |
Forgot to shift the mask value: |
For reference, here's what broke with that one line change: $ llvm-lit ../llvm/test/Transforms/InstCombine/ |
This is based on bug 913.
This is based on bug 1271. |
Phew...all of the failures were due to the fact that (strangely) you can't ask a logical right shift if it hasNoSignedWrap()...you'd think a BinaryOperator that isn't an OverflowingBinaryOperator would just return false for that, but no, it asserts on the cast. By checking for left shift before asking about the signed wrap, all tests are passing again. It is frightening how little logic some of these tests actually confirm and how ad-hoc the testing is... |
Patch submitted for review. |
Evan Cheng looked at the patch and this bug report (thanks!) and suggested: This is the first I've heard of this work - I found: And I see there was a presentation at an LLVM Dev Mtg: I haven't located any slides from that talk though. Is there any testing machinery already in LLVM that I can do this with? cc'ing Evan (I missed your reply on the commits list until now - too much traffic!) and Nuno. |
Please ignore that paper on CORK for now, since it's theorical work. Currently there's no machinery, so the translation to SMT has to be done by hand. I did a simplified version of the optimization for Shl and LshR, which are the cases you are tyring to change. I couldn't find any bug in the mentioned instcombine transformation. You can find the proof here: Unless I've missed some detail of the code, Z3 says it is correct and no patch is needed! |
Thanks, Nuno! That is very good to find the Z3 site. I hope we can trust it. :) I think you must specify 'less than' as the bit vector comparator rather than 'equals'. This is my first try at using this language, but here's my translation of the shift-left case: Just in case that link disappears: ; ShiftOpcode == Instruction::Shl (ite (= (bvshl (bvlshr Y Shift) Shift) Y) |
Thanks for your review! I did some tests for signed < ("(X << C1) & C2 s< C3") and the results are:
X = #x0 So it seems that the correct fix is 1) and not 3). Feel free to play with the tests: http://rise4fun.com/Z3/plN5x |
Here's my implementation of the 3 left-shift cases mentioned so far:
Note that I've reduced the data types to 2-bit vectors because I'm not sure how to code 'shl nsw' in LISP otherwise. :) |
If I've coded it correctly, Z3 proves that case #2 (nsw) is not a safe optimization. This is the same conclusion that Nuno just reached. |
Cool! In summary, the conclusion is that both the current code and your patch are not correct. Feel free to propose a new patch. I'll be glad to review it. |
Thanks, Nuno. I'm having fun with Z3. :) I decided to code up the full set (30!) of comparisons and shifts (because that's what the code claims to handle) with the only constraint that no bits are lost when shifting the comparison value: From that result, I think we see that if either the shift is signed (arithmetic shift right) or the comparison is signed, then the transformation is SAT (ie, illegal). I didn't add the sign bit constraint for C2 and C3 that you found yet. How did you determine that constraint? I guess the right thing to do is check various permutations of that kind of constraint until all tests pass? |
Cool! :)
It was a guess. From past experience, it seemed like a good candidate. Then I used Z3 to prove that it is actually correct. My suggestion is that we first fix the current code, so that the fix goes into LLVM 3.4. Then we can generalize the patch if needed/possible. BTW, we do have an algorithm to generate these conditions automatically. But we have only implemented on simple IR, and not yet for LLVM's IR. |
Hi Nuno, To that end, here is the simplest patch that I can think of to correct the codegen for 3.4 - it just checks the comparison to make sure it is not signed in the case of the logical shifts. I think the existing constraint on the arithmetic shift right case is safe, but I have not proven it yet: Index: lib/Transforms/InstCombine/InstCombineCompares.cpp--- lib/Transforms/InstCombine/InstCombineCompares.cpp (revision 195371)
@@ -1212,6 +1217,9 @@
|
simplest patch to correct bug for 3.4 release Some test cases included, but they are not thorough. If anyone would like to review and commit this patch, I would be most appreciative. I will not be able to commit it myself for at least the next 3 days. |
The patch LGTM. Please commit it when you are available again. Thanks! |
Thanks, Nuno. Committed as r196129. |
Just checking the sign bits of C2 and C3 does not appear to be sufficient for a right-shift (either arithmetic or logical):
|
You're right. 2nd proposal, perform transformation only if: i.e., we don't shift a 1 into the sign bit, unless C3 was already negative. |
The existing check to allow optimization on an arithmetic shift right is insufficient. It just checks if any bits that are shifted in overlap with the mask. Here's a test case that proves it: ; Function Attrs: nounwind readnone ssp uwtable ; Function Attrs: nounwind readnone ssp uwtable ... Or in Z3 form - http://rise4fun.com/Z3/5FiO : ... However, it appears that there is no way to expose this bug in actual codegen because earlier optimizations in InstCombine such as SimplifyICmpInst() always optimize the IR before we get to the shift+and+cmp chunk of code. |
Ok, so I give up trying to weak it by hand. The only safe thing I can come up with is: |
Heh...yeah, I was experimenting by hand too, and I've been wondering if this is worth so much effort...we're just getting rid of 1 shift in some obscure cases now! I'll try to code this up in C and propose a patch. |
relax previous patch to allow more optimization and fix AShr I have not found any reasonable constraint to make the arithmetic right shift case work, so I've just changed that to not fold. Based on my experiments, it is very likely that a previous optimization has simplified AShr into LShr or removed it entirely, so this should not be a significant loss. Index: lib/Transforms/InstCombine/InstCombineCompares.cpp--- lib/Transforms/InstCombine/InstCombineCompares.cpp (revision 196164)
|
Sorry for the delay. I assume you encoded the current Ashr implementation and you found a bug, right? |
Thanks, Nuno. Yes, I think the Ashr case is incorrect. Please see comment 35 for example in IR and Z3. |
Ok, please commit the patch and close this bug report. Thanks! |
Patch committed in r197705. Author: kkhoo URL: http://llvm.org/viewvc/llvm-project?rev=197705&view=rev This change fixes the case of arithmetic shift right - do not attempt to fold that case. No additional IR-level test cases included at this time. See http://llvm.org/bugs/show_bug.cgi?id=17827 for proofs that these are correct transformations. |
Hal correctly points out that some LLVM customers may be on questionable legal ground by including links to Z3 in the LLVM codebase due to its terms of use and also that those links may not last, so I have removed the links in r197713. Here are my code examples for the left shift and logical right shift cases that were referenced in those comments: ; We want to optimize an expression of the form: ; We need to check 10 forms of compares: (declare-fun X () (_ BitVec 8)) (push) ; LEFT-SHIFT ; (X & (C2 >> C1)) s< (C3 >> C1) (push) ; (X & (C2 >> C1)) s< (C3 >> C1) (push) ; (X & (C2 >> C1)) s< (C3 >> C1) (push) ; (X & (C2 >> C1)) s< (C3 >> C1) (push) ; (X & (C2 >> C1)) s< (C3 >> C1) (push) ; (X & (C2 >> C1)) s< (C3 >> C1) (push) ; (X & (C2 >> C1)) s< (C3 >> C1) (push) ; (X & (C2 >> C1)) s< (C3 >> C1) (push) ; (X & (C2 >> C1)) s< (C3 >> C1) (push) ; (X & (C2 >> C1)) s< (C3 >> C1) (pop) ; left-shift ; We want to optimize an expression of the form: ; We also need to check 10 forms of compares: (declare-fun X () (_ BitVec 8)) (push) ; LOGICAL RIGHT SHIFT (push) ; (X & (C2 >> C1)) s< (C3 >> C1) (push) ; (X & (C2 >> C1)) s< (C3 >> C1) (check-sat) (push) ; (X & (C2 >> C1)) s< (C3 >> C1) (push) ; (X & (C2 >> C1)) s< (C3 >> C1) (push) ; (X & (C2 >> C1)) s< (C3 >> C1) (push) ; (X & (C2 >> C1)) s< (C3 >> C1) (check-sat) (push) ; (X & (C2 >> C1)) s< (C3 >> C1) (push) ; (X & (C2 >> C1)) s< (C3 >> C1) (push) ; (X & (C2 >> C1)) s< (C3 >> C1) (push) ; (X & (C2 >> C1)) s< (C3 >> C1) (pop) ; right-shift |
Extended Description
The following code is miscompiled by the current clang trunk (as well as clang 3.2 and 3.3) on x86_64-linux-gnu at -O1 and above in both 32-bit and 64-bit modes.
It is a regression from clang 3.1.
$ clang-trunk -v
clang version 3.4 (trunk 194075)
Target: x86_64-pc-linux-gnu
Thread model: posix
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4.6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4.7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6.3
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6
$
$ clang-trunk -O0 small.c; a.out
1
$ clang-3.1 -O1 small.c; a.out
1
$
$ clang-trunk -O1 small.c; a.out
0
$ clang-3.3 -O1 small.c; a.out
0
$ clang-3.2 -O1 small.c; a.out
0
$
$ gcc-trunk -O1 small.c; a.out
1
$ icc -O1 small.c; a.out
1
$
int printf (const char *, ...);
struct S
{
int f0;
int:1;
int f1:8;
int f2;
} a;
int b;
void
foo (int p)
{
struct S c = a;
c.f1 = p & 190;
b = c.f1 < 1;
}
int
main ()
{
foo (174);
printf ("%d\n", b);
return 0;
}
The text was updated successfully, but these errors were encountered: