The following code is miscompiled by the current clang trunk and clang 3.3 at -Os and above on x86_64-linux-gnu in both 32-bit and 64-bit modes. It also affects clang 3.2 at -Os (but neither -O2 nor -O3). This is a regression from clang 3.1. $ clang-trunk -v clang version 3.4 (trunk 195148) Target: x86_64-unknown-linux-gnu Thread model: posix Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4.6 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4.7 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6.3 Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6 $ $ $ clang-trunk -O1 small.c; a.out 0 $ clang-3.1 -Os small.c; a.out 0 $ $ clang-trunk -Os small.c; a.out 512 $ clang-3.3 -Os small.c; a.out 512 $ clang-3.2 -Os small.c; a.out 512 $ ------------------------------------ int printf (const char *, ...); int a, b, c, d; char e; void foo () { a = 0; } int main () { unsigned char f; for (; b < 1; b++) for (e = 1; e >= 0; e--) { d = 0; if (a) break; f = 179 * e; c = f << 1; foo (); } printf ("%d\n", c); return 0; }
-O2 is the same thing as -Os according to the output of: $ opt -Os -debug-pass=Arguments -S < /dev/null 2> os $ opt -O2 -debug-pass=Arguments -S < /dev/null 2> o2 $ diff os o2 Here are the optimization passes added by -O2 vs -O1: $ diff o1 o2 29c29 < always-inline --- > inline 58a59,60 > gvn > memdep 73a76,86 > barrier > domtree > loops > loop-simplify > lcssa > scalar-evolution > loop-simplify > lcssa > loop-vectorize > instcombine > simplifycfg 74a88,89 > globaldce > constmerge The output is correct with -O2 -fno-vectorize, so we need to see how the IR changes with loop-vectorize.
Oops...no-vectorize doesn't make any difference. The bug is somewhere else.
Created attachment 11722 [details] bugpoint reduced test case To reproduce the problem: opt bugpoint-passinput.bc -instcombine > bugpoint-passinput.out.bc llc bugpoint-passinput.out.bc clang bugpoint-passinput.out.s ./a.out 512 clang bugpoint-passinput.bc ./a.out 0 clang bugpoint-passinput.out.bc ./a.out 0 (thus why I think it is a backend bug: compiling the optimized IR at -O0 is ok)
Looks like maybe an LSR bug: $ clang t.c -O3 -mllvm -disable-lsr && ./a.out 0
Created attachment 11786 [details] Fix without test-cases Issue is in ScalarEvolution. For next expressions (all the numbers are i32): A = 258 + (-258) B = A and 510 It replaces A & 510 with "zext (trunc A to i9) to i32". Since A is a sum, it applies zext(trunc) to its operands. As you can see this is wrong for -258, since it is 10 bit value (9 bits + sign bit). Possible fixes is to add "-" operation for SCEV and keep all constants positive. But IMHO it a bit ridiculous. Also we can We need to take into account negative numbers somehow. I propose just use i10 for zext(trunc()) operation instead of i9. See my fix as example. Test case and full featured patch is coming. P.S.: Currently it brakes some tests the checks expressions zext(trunc()) expressions. Though there is nothing special, since inner integer type becomes one bit wider.
[quote] Also we can We need to take into account negative numbers somehow. [/quote] Sorry for typo. I mean, that solution I could see is to take into account sign bit. While calculating size of effective mask we risk nothing adding 1 more bit.
(In reply to comment #5) I've also been looking at this issue but I have not come to the same conclusion as you. Unfortunately I haven't been able to isolate this so I have no suggestion for a solution yet but I thought it was best to share what I have so far. There is a fair chance that I've misunderstood something on the way, any pointers are appreciated. See inlined comments below. > Created attachment 11786 [details] > Fix without test-cases > > Issue is in ScalarEvolution. > For next expressions (all the numbers are i32): > > A = 258 + (-258) Is this really 258? I'm seeing 358, i.e. 179 << 1. I started from the source and not the bugpoint reduced test case. Perhaps that's a difference. > B = A and 510 > > It replaces A & 510 with "zext (trunc A to i9) to i32". Since A is a sum, it > applies zext(trunc) to its operands. > As you can see this is wrong for -258, since it is 10 bit value (9 bits + > sign bit). I don't your reasoning regarding -258 is entirely correct. I'm using 358 as example but the same should apply for 258. The reason is that the result is also truncated to i9. In my example I get a truncate expression for {358, +, -358}. If you truncate -358 to i9 you get 154. 358 + 154 = 512 which is the erroneous result of the test case. If this addition were to have been correctly truncated to i9 and zero-extended to i32 it would have given the correct result of 0. Inside getZeroExtendExpr() there is a code path with comment: "If the input value is a chrec scev, and we can prove that the value did not overflow the old, smaller, value, we can zero extend all of the operands (often constants)." This is the path which is taken and moving on the next conclusion is that since the backedge is "guarded by a comparison with the pre-inc value the addrec is safe". Since the addrec is considered safe an addrec with operands zero-extended to i32 is returned. From my understanding this is what causes the problem. I've tried forcing generation of a zero extend expression instead of the addrec and this gave the correct result. > > Possible fixes is to add "-" operation for SCEV and keep all constants > positive. But IMHO it a bit ridiculous. > Also we can > We need to take into account negative numbers somehow. > > I propose just use i10 for zext(trunc()) operation instead of i9. See my fix > as example. > > Test case and full featured patch is coming. > > P.S.: Currently it brakes some tests the checks expressions zext(trunc()) > expressions. Though there is nothing special, since inner integer type > becomes one bit wider.
Created attachment 11788 [details] Even more reduced test-case. Hi David, This is my test-case. I just cut off all possible parts and reduced to minimum all the values. So minimum value is not a 179, but 129. Try this with -loop-reduce. Both -258 and -358 values are 10bit width, not 9. So it is incorrect to trunc it to 9 bits. That means, the trouble happened before zext, in trunc operation. Though, perhaps it is also incorrect to perform zext here. Since we have only '+' operation in SCEV, it is obvious, operands could be negative (or how we could implement decrements otherwise?). So it is obvious we should treat operands as signed integers. I think that we have to use sext here.
(In reply to comment #8) > Created attachment 11788 [details] > Even more reduced test-case. > > Hi David, > > This is my test-case. I just cut off all possible parts and reduced to > minimum all the values. So minimum value is not a 179, but 129. > > Try this with -loop-reduce. > > Both -258 and -358 values are 10bit width, not 9. So it is incorrect to > trunc it to 9 bits. That means, the trouble happened before zext, in trunc > operation. I will try to describe why I think the values can be reasoned about as 9-bit values. I will be following the notes I took while looking at my test case. This aspect of the test case seems identical with what you are seeing. Hopefully the paste below wont be too messed up with regards to formatting. *** IR Dump Before Loop Strength Reduction *** for.body4: ; preds = %if.end, %for.cond1.preheader %0 = phi i32 [ %.pre, %for.cond1.preheader ], [ 0, %if.end ] %indvars.iv = phi i32 [ 1, %for.cond1.preheader ], [ %indvars.iv.next, %if.end ] %1 = phi i8 [ 1, %for.cond1.preheader ], [ %dec, %if.end ] %tobool = icmp eq i32 %0, 0 br i1 %tobool, label %if.end, label %for.inc8 if.end: ; preds = %for.body4 %conv7 = mul i32 %indvars.iv, 358 %shl = and i32 %conv7, 510 store i32 %shl, i32* @c, align 4 When the SCEV for %conv7 is created the %shl register is identified as a user which leads to SCEV creation for the and-expression. Since we have a constant value of 510 as operand to the and-expression we know that at most 9 bits are required for the result. This leads us to calling getZeroExtendExpr(getTruncateExpr(getSCEV(U->getOperand(0), ..., i9) where operand(0) is "%conv7 = mul i32 %indvars.iv". The SCEV for this multiplication has been calculated previously and is {358,+,-358}<%for.body4>. When truncated to i9 this is equivalent to {358,+,154} (Rationale (-358 & 0x1FF) = 154). This is why I think this particular bug is due to zext not being performed and not related to using 9-bit data type. > > Though, perhaps it is also incorrect to perform zext here. Since we have > only '+' operation in SCEV, it is obvious, operands could be negative (or > how we could implement decrements otherwise?). So it is obvious we should > treat operands as signed integers. I think that we have to use sext here.
Am I right, that you want to transform zext(358+154=512) into 0 somehow?
(In reply to comment #10) > Am I right, that you want to transform zext(358+154=512) into 0 somehow? I don't want to transform the expression. I want the addition to be performed in i9 and then be zero extended to i32 as this would give 0 as result. This is my first time looking at this code so I would like someone who is more experienced to give their opinion whether this is the correct approach.
But you right. there were nothing wrong with trunc. In getZeroExtendExpr look at this: This branch: // If the input value is a chrec scev, and we can prove that the value // did not overflow the old, smaller, value, we can zero extend all of the // operands (often constants). This allows analysis of something like // this: for (unsigned char X = 0; X < 100; ++X) { int Y = X; } if (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Op)) if (AR->isAffine()) { I'll explain what happens in my case here: start = i9 -254 step = i9 254 maxBeCount = 1 So I have overflowed counting here. Yet method finished in branch: if (isKnownPositive(Step)) {... and returns return getAddRecExpr(getZeroExtendExpr(Start, Ty), // i9 -254 => i32 258 getZeroExtendExpr(Step, Ty), // i9 254 => i32 254 L, AR->getNoWrapFlags()); While we need to analyze Start value as well. I think we have to return: return getAddRecExpr(getSignExtendExpr(Start, Ty), // i9 -254 => i32 -254 getZeroExtendExpr(Step, Ty), // i9 254 => i32 254 L, AR->getNoWrapFlags()); That should be correct IMHO.
After this check... } else if (isKnownPositiveStep)) { What constant value do you get for "N"? const SCEV *N = getConstant(APInt::getMinValue(BitWidth) - getUnsignedRange(Step).getUnsignedMax());
(In reply to comment #13) > After this check... > } else if (isKnownPositiveStep)) { > > What constant value do you get for "N"? N = -254 > > const SCEV *N = getConstant(APInt::getMinValue(BitWidth) - > > getUnsignedRange(Step).getUnsignedMax());
Created attachment 11797 [details] Another possible fix That could be a fix. I just can't see any reason why there was "if (CmpInst::isSigned(Pred)) {" while you want to extend FoundRHS and FoundLHS.
So, below is summary of my explanation, and how I came, that the reason was typo figured in fix I attached yesterday. Just run "Even more reduced test-case" with -loop-reduce option: "opt test-case.ll -loop-reduce" And stop debugger in getZeroExtendExpr, on "if (isKnownPositive(Step)) {" line. If you will dump "L", you will see, that loop counter starts from 1; iteration counter is decremented straight before to be checked, and that the condition is "counter > -1.". So it has only 2 iterations to be performed. Though, this story about another SCEV expr, that starts from i9 258 (aka i9 -254), and decreased by 258 every iteration, so on first iteration it is 258, on the second it is 0, and that's it. Now look inside getZeroExtendExpr; for test case: Step: i9 254 Start: i9 -254 Op (is equal to AR): i9 {-254,+,254}<%if.end> N: i9 -254 (aka i9 258) [code] if (isKnownPositive(Step)) { const SCEV *N = getConstant(APInt::getMinValue(BitWidth) - getUnsignedRange(Step).getUnsignedMax()); if (isLoopBackedgeGuardedByCond(L, ICmpInst::ICMP_ULT, AR, N) || (isLoopEntryGuardedByCond(L, ICmpInst::ICMP_ULT, Start, N) && isLoopBackedgeGuardedByCond(L, ICmpInst::ICMP_ULT, AR->getPostIncExpr(*this), N))) { // Cache knowledge of AR NUW, which is propagated to this AddRec. const_cast<SCEVAddRecExpr *>(AR)->setNoWrapFlags(SCEV::FlagNUW); // Return the expression with the addrec on the outside. return getAddRecExpr(getZeroExtendExpr(Start, Ty), getZeroExtendExpr(Step, Ty), L, AR->getNoWrapFlags()); } } [code] Method "isLoopBackedgeGuardedByCond(L, ICmpInst::ICMP_ULT, AR, N)" returns "true", that was cause of Start and Step zero exstention. The first: "Start" must be extended as *signed* integer, just to keep logic same. So, this branch shouldn't affect anything. The second: in fact, loop backage is not guarded by AR ULT N. If I understood properly that means whether loop backage guarded by "AR < 258". The answer is - not. This couldn't be applied to this loop. Otherwise we will not even enter this loop: Initially AR is 258. And 258 < 258 is false. Why isLoopBackedgeGuardedByCond returns true anyway? isLoopBackedgeGuardedByCond invokes isImpliedCond inside. The last one tries to simplify RHS, LHS, FoundLHS and FoundRHS: What I think is wrong is that how it extends FoundLHS and FoundRHS. Proper logic to me, is to check whether FoundPred is signed, if so we have to extend it as signed integer. Otherwise we brake the logic. In this method though it checks whether *Pred* is signed and then it does FoundLHS and FoundRHS extension. I just suppose its wrong. Pred is unsigned, and we do zext on FoundLHS/RHS. Originally FoundLHS and FoundRHS are i8 0, and i8 -1, and FoundPred is SGT. After zero extension FoundRHS would be i9 255. It is max value, and with SGT condition it would never be satisfied. So that's why it replaced with trivially_false, and after treated as equal to "AR < 258" So that we got wrong behaviour finally.
Zhendong, I have committed fix as r198863. Could you check whether it was fixed on your machine, since I'm going to mark it as FIXED.
(In reply to comment #17) > Zhendong, > I have committed fix as r198863. > Could you check whether it was fixed on your machine, since I'm going to > mark it as FIXED. Stepan, just checked. Yes, it's been fixed. Thanks.