18000 – wrong code at -Os and above on x86_64-linux-gnu

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 18000 - wrong code at -Os and above on x86_64-linux-gnu

Summary: wrong code at -Os and above on x86_64-linux-gnu

Status:	RESOLVED FIXED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Backend: X86 (show other bugs)
Version:	trunk
Hardware:	PC All

Importance:	P normal
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:

Depends on:
Blocks:

Reported:	2013-11-19 23:12 PST by Zhendong Su
Modified:	2014-01-12 04:22 PST (History)
CC List:	8 users (show)

See Also:	17847
Fixed By Commit(s):

Attachments
bugpoint reduced test case (3.37 KB, application/zip) 2013-12-12 18:34 PST, Hal Finkel	Details
Fix without test-cases (1.05 KB, patch) 2013-12-28 09:14 PST, Stepan Dyatkovskiy	Details
Even more reduced test-case. (3.90 KB, application/octet-stream) 2013-12-28 13:32 PST, Stepan Dyatkovskiy	Details
Another possible fix (621 bytes, patch) 2013-12-30 04:05 PST, Stepan Dyatkovskiy	Details
Show Obsolete (1) Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Zhendong Su 2013-11-19 23:12:27 PST

The following code is miscompiled by the current clang trunk and clang 3.3 at -Os and above on x86_64-linux-gnu in both 32-bit and 64-bit modes. 

It also affects clang 3.2 at -Os (but neither -O2 nor -O3). 

This is a regression from clang 3.1. 

$ clang-trunk -v
clang version 3.4 (trunk 195148)
Target: x86_64-unknown-linux-gnu
Thread model: posix
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4.6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4.7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6.3
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6
$
$ 
$ clang-trunk -O1 small.c; a.out
0
$ clang-3.1 -Os small.c; a.out
0
$
$ clang-trunk -Os small.c; a.out
512
$ clang-3.3 -Os small.c; a.out
512
$ clang-3.2 -Os small.c; a.out
512
$ 


------------------------------------


int printf (const char *, ...);

int a, b, c, d;
char e;

void
foo ()
{
  a = 0;
}

int
main ()
{
  unsigned char f;
  for (; b < 1; b++)
    for (e = 1; e >= 0; e--)
      {
	d = 0;
	if (a)
	  break;
	f = 179 * e;
	c = f << 1;
	foo ();
      }
  printf ("%d\n", c);
  return 0;
}

Comment 1 Kay Tiong Khoo 2013-11-21 16:18:09 PST

-O2 is the same thing as -Os according to the output of:
$ opt -Os -debug-pass=Arguments -S < /dev/null 2> os
$ opt -O2 -debug-pass=Arguments -S < /dev/null 2> o2
$ diff os o2

Here are the optimization passes added by -O2 vs -O1:
$ diff o1 o2
29c29
< always-inline
---
> inline
58a59,60
> gvn
> memdep
73a76,86
> barrier
> domtree
> loops
> loop-simplify
> lcssa
> scalar-evolution
> loop-simplify
> lcssa
> loop-vectorize
> instcombine
> simplifycfg
74a88,89
> globaldce
> constmerge

The output is correct with -O2 -fno-vectorize, so we need to see how the IR changes with loop-vectorize.

Comment 2 Kay Tiong Khoo 2013-11-21 17:42:24 PST

Oops...no-vectorize doesn't make any difference. The bug is somewhere else.

Comment 3 Hal Finkel 2013-12-12 18:34:45 PST

Created attachment 11722 [details]
bugpoint reduced test case

To reproduce the problem:

opt bugpoint-passinput.bc -instcombine > bugpoint-passinput.out.bc
llc bugpoint-passinput.out.bc
clang bugpoint-passinput.out.s
./a.out 
512

clang bugpoint-passinput.bc
./a.out 
0

clang bugpoint-passinput.out.bc
./a.out 
0
(thus why I think it is a backend bug: compiling the optimized IR at -O0 is ok)

Comment 4 Bill Wendling 2013-12-19 03:39:51 PST

Looks like maybe an LSR bug:

$ clang t.c -O3 -mllvm -disable-lsr && ./a.out
0

Comment 5 Stepan Dyatkovskiy 2013-12-28 09:14:52 PST

Created attachment 11786 [details]
Fix without test-cases

Issue is in ScalarEvolution.
For next expressions (all the numbers are i32):

A = 258 + (-258)
B = A and 510

It replaces A & 510 with "zext (trunc A to i9) to i32". Since A is a sum, it applies zext(trunc) to its operands.
As you can see this is wrong for -258, since it is 10 bit value (9 bits + sign bit).

Possible fixes is to add "-" operation for SCEV and keep all constants positive. But IMHO it a bit ridiculous.
Also we can 
We need to take into account negative numbers somehow.

I propose just use i10 for zext(trunc()) operation instead of i9. See my fix as example.

Test case and full featured patch is coming.

P.S.: Currently it brakes some tests the checks expressions zext(trunc()) expressions. Though there is nothing special, since inner integer type becomes one bit wider.

Comment 6 Stepan Dyatkovskiy 2013-12-28 09:18:30 PST

[quote]
Also we can 
We need to take into account negative numbers somehow.
[/quote]

Sorry for typo. I mean, that solution I could see is to take into account sign bit. While calculating size of effective mask we risk nothing adding 1 more bit.

Comment 7 David Wiberg 2013-12-28 13:14:09 PST

(In reply to comment #5)

I've also been looking at this issue but I have not come to the same conclusion as you. Unfortunately I haven't been able to isolate this so I have no suggestion for a solution yet but I thought it was best to share what I have so far. There is a fair chance that I've misunderstood something on the way, any pointers are appreciated.

See inlined comments below.

> Created attachment 11786 [details]
> Fix without test-cases
> 
> Issue is in ScalarEvolution.
> For next expressions (all the numbers are i32):
> 
> A = 258 + (-258)
Is this really 258? I'm seeing 358, i.e. 179 << 1. I started from the source and not the bugpoint reduced test case. Perhaps that's a difference.

> B = A and 510
> 
> It replaces A & 510 with "zext (trunc A to i9) to i32". Since A is a sum, it
> applies zext(trunc) to its operands.
> As you can see this is wrong for -258, since it is 10 bit value (9 bits +
> sign bit).
I don't your reasoning regarding -258 is entirely correct. I'm using 358 as example but the same should apply for 258. The reason is that the result is also truncated to i9. In my example I get a truncate expression for {358, +, -358}. If you truncate -358 to i9 you get 154. 358 + 154 = 512 which is the erroneous result of the test case. If this addition were to have been correctly truncated to i9 and zero-extended to i32 it would have given the correct result of 0.

Inside getZeroExtendExpr() there is a code path with comment:
"If the input value is a chrec scev, and we can prove that the value did not overflow the old, smaller, value, we can zero extend all of the operands (often constants)."
This is the path which is taken and moving on the next conclusion is that since the backedge is "guarded by a comparison with the pre-inc value the addrec is safe". Since the addrec is considered safe an addrec with operands zero-extended to i32 is returned.

From my understanding this is what causes the problem. I've tried forcing generation of a zero extend expression instead of the addrec and this gave the correct result.

> 
> Possible fixes is to add "-" operation for SCEV and keep all constants
> positive. But IMHO it a bit ridiculous.
> Also we can 
> We need to take into account negative numbers somehow.
> 
> I propose just use i10 for zext(trunc()) operation instead of i9. See my fix
> as example.
> 
> Test case and full featured patch is coming.
> 
> P.S.: Currently it brakes some tests the checks expressions zext(trunc())
> expressions. Though there is nothing special, since inner integer type
> becomes one bit wider.

Comment 8 Stepan Dyatkovskiy 2013-12-28 13:32:39 PST

Created attachment 11788 [details]
Even more reduced test-case.

Hi David,

This is my test-case. I just cut off all possible parts and reduced to minimum all the values. So minimum value is not a 179, but 129.

Try this with -loop-reduce.

Both -258 and -358 values are 10bit width, not 9. So it is incorrect to trunc it to 9 bits. That means, the trouble happened before zext, in trunc operation.

Though, perhaps it is also incorrect to perform zext here. Since we have only '+' operation in SCEV, it is obvious, operands could be negative (or how we could implement decrements otherwise?). So it is obvious we should treat operands as signed integers. I think that we have to use sext here.

Comment 9 David Wiberg 2013-12-28 14:28:35 PST

(In reply to comment #8)
> Created attachment 11788 [details]
> Even more reduced test-case.
> 
> Hi David,
> 
> This is my test-case. I just cut off all possible parts and reduced to
> minimum all the values. So minimum value is not a 179, but 129.
> 
> Try this with -loop-reduce.
> 
> Both -258 and -358 values are 10bit width, not 9. So it is incorrect to
> trunc it to 9 bits. That means, the trouble happened before zext, in trunc
> operation.
I will try to describe why I think the values can be reasoned about as 9-bit values. I will be following the notes I took while looking at my test case. This aspect of the test case seems identical with what you are seeing.

Hopefully the paste below wont be too messed up with regards to formatting.
*** IR Dump Before Loop Strength Reduction ***
for.body4:                ; preds = %if.end, %for.cond1.preheader
  %0 = phi i32 [ %.pre, %for.cond1.preheader ], [ 0, %if.end ]
  %indvars.iv = phi i32 [ 1, %for.cond1.preheader ], [ %indvars.iv.next, %if.end ]
  %1 = phi i8 [ 1, %for.cond1.preheader ], [ %dec, %if.end ]
  %tobool = icmp eq i32 %0, 0
  br i1 %tobool, label %if.end, label %for.inc8

if.end:                   ; preds = %for.body4
  %conv7 = mul i32 %indvars.iv, 358
  %shl = and i32 %conv7, 510
  store i32 %shl, i32* @c, align 4

When the SCEV for %conv7 is created the %shl register is identified as a user which leads to SCEV creation for the and-expression. Since we have a constant value of 510 as operand to the and-expression we know that at most 9 bits are required for the result.

This leads us to calling getZeroExtendExpr(getTruncateExpr(getSCEV(U->getOperand(0), ..., i9) where operand(0) is "%conv7 = mul i32 %indvars.iv".
The SCEV for this multiplication has been calculated previously and is {358,+,-358}<%for.body4>. When truncated to i9 this is equivalent to {358,+,154} (Rationale (-358 & 0x1FF) = 154). This is why I think this particular bug is due to zext not being performed and not related to using 9-bit data type.

> 
> Though, perhaps it is also incorrect to perform zext here. Since we have
> only '+' operation in SCEV, it is obvious, operands could be negative (or
> how we could implement decrements otherwise?). So it is obvious we should
> treat operands as signed integers. I think that we have to use sext here.

Comment 10 Stepan Dyatkovskiy 2013-12-28 14:57:50 PST

Am I right, that you want to transform zext(358+154=512) into 0 somehow?

Comment 11 David Wiberg 2013-12-28 15:13:57 PST

(In reply to comment #10)
> Am I right, that you want to transform zext(358+154=512) into 0 somehow?
I don't want to transform the expression. I want the addition to be performed in i9 and then be zero extended to i32 as this would give 0 as result.

This is my first time looking at this code so I would like someone who is more experienced to give their opinion whether this is the correct approach.

Comment 12 Stepan Dyatkovskiy 2013-12-28 15:59:26 PST

But you right. there were nothing wrong with trunc.

In getZeroExtendExpr look at this:
This branch:


  // If the input value is a chrec scev, and we can prove that the value
  // did not overflow the old, smaller, value, we can zero extend all of the
  // operands (often constants).  This allows analysis of something like
  // this:  for (unsigned char X = 0; X < 100; ++X) { int Y = X; }
  if (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Op))
    if (AR->isAffine()) {


I'll explain what happens in my case here:
start = i9 -254
step = i9 254
maxBeCount = 1

So I have overflowed counting here.

Yet method finished in branch:
if (isKnownPositive(Step)) {...
and returns
  return getAddRecExpr(getZeroExtendExpr(Start, Ty), // i9 -254 => i32 258
                       getZeroExtendExpr(Step, Ty),  // i9 254 => i32 254
                       L, AR->getNoWrapFlags());

While we need to analyze Start value as well. I think we have to return:

  return getAddRecExpr(getSignExtendExpr(Start, Ty), // i9 -254 => i32 -254
                       getZeroExtendExpr(Step, Ty),  // i9 254 => i32 254
                       L, AR->getNoWrapFlags());

That should be correct IMHO.

Comment 13 Andrew Trick 2013-12-28 16:49:11 PST

After this check...
        } else if (isKnownPositiveStep)) {

What constant value do you get for "N"?

          const SCEV *N = getConstant(APInt::getMinValue(BitWidth) -
                                      getUnsignedRange(Step).getUnsignedMax());

Comment 14 David Wiberg 2013-12-29 09:09:54 PST

(In reply to comment #13)
> After this check...
>         } else if (isKnownPositiveStep)) {
> 
> What constant value do you get for "N"?
N = -254

> 
>           const SCEV *N = getConstant(APInt::getMinValue(BitWidth) -
>                                      
> getUnsignedRange(Step).getUnsignedMax());

Comment 15 Stepan Dyatkovskiy 2013-12-30 04:05:11 PST

Created attachment 11797 [details]
Another possible fix

That could be a fix. I just can't see any reason why there was "if (CmpInst::isSigned(Pred)) {" while you want to extend FoundRHS and FoundLHS.

Comment 16 Stepan Dyatkovskiy 2013-12-31 12:02:37 PST

So, below is summary of my explanation, and how I came, that the reason was typo figured in fix I attached yesterday.

Just run "Even more reduced test-case" with -loop-reduce option:
"opt test-case.ll -loop-reduce"

And stop debugger in getZeroExtendExpr, on "if (isKnownPositive(Step)) {" line. If you will dump "L", you will see, that loop counter starts from 1; iteration counter is decremented straight before to be checked, and that the condition is "counter > -1.".
So it has only 2 iterations to be performed.

Though, this story about another SCEV expr, that starts from i9 258 (aka i9 -254), and decreased by 258 every iteration, so on first iteration it is 258, on the second it is 0, and that's it.

Now look inside getZeroExtendExpr; for test case:
Step: i9 254
Start: i9 -254
Op (is equal to AR): i9 {-254,+,254}<%if.end>
N: i9 -254 (aka i9 258)

[code]
if (isKnownPositive(Step)) {
  const SCEV *N = getConstant(APInt::getMinValue(BitWidth) -
  getUnsignedRange(Step).getUnsignedMax());
  if (isLoopBackedgeGuardedByCond(L, ICmpInst::ICMP_ULT, AR, N) ||
      (isLoopEntryGuardedByCond(L, ICmpInst::ICMP_ULT, Start, N) &&
       isLoopBackedgeGuardedByCond(L, ICmpInst::ICMP_ULT,
                                   AR->getPostIncExpr(*this), N))) {
    // Cache knowledge of AR NUW, which is propagated to this AddRec.
    const_cast<SCEVAddRecExpr *>(AR)->setNoWrapFlags(SCEV::FlagNUW);
    // Return the expression with the addrec on the outside.
    return getAddRecExpr(getZeroExtendExpr(Start, Ty),
                         getZeroExtendExpr(Step, Ty),
                         L, AR->getNoWrapFlags());
  }
}
[code]

Method "isLoopBackedgeGuardedByCond(L, ICmpInst::ICMP_ULT, AR, N)" returns "true", that was cause of Start and Step zero exstention.
The first: "Start" must be extended as *signed* integer, just to keep logic same. So, this branch shouldn't affect anything.
The second: in fact, loop backage is not guarded by AR ULT N. If I understood properly that means whether loop backage guarded by "AR < 258". The answer is - not. This couldn't be applied to this loop. Otherwise we will not even enter this loop:
Initially AR is 258. And 258 < 258 is false.

Why isLoopBackedgeGuardedByCond returns true anyway?

isLoopBackedgeGuardedByCond invokes isImpliedCond inside.
The last one tries to simplify RHS, LHS, FoundLHS and FoundRHS:
What I think is wrong is that how it extends FoundLHS and FoundRHS.
Proper logic to me, is to check whether FoundPred is signed, if so we have to extend it as signed integer. Otherwise we brake the logic.

In this method though it checks whether *Pred* is signed and then it does FoundLHS and FoundRHS extension. I just suppose its wrong. Pred is unsigned, and we do zext on FoundLHS/RHS. Originally FoundLHS and FoundRHS are i8 0, and i8 -1, and FoundPred is SGT.
After zero extension FoundRHS would be i9 255. It is max value, and with SGT condition it would never be satisfied. So that's why it replaced with trivially_false, and after treated as equal to "AR < 258"
So that we got wrong behaviour finally.

Comment 17 Stepan Dyatkovskiy 2014-01-11 05:59:48 PST

Zhendong,
I have committed fix as r198863.
Could you check whether it was fixed on your machine, since I'm going to mark it as FIXED.

Comment 18 Zhendong Su 2014-01-11 15:51:29 PST

(In reply to comment #17)
> Zhendong,
> I have committed fix as r198863.
> Could you check whether it was fixed on your machine, since I'm going to
> mark it as FIXED.

Stepan, just checked. Yes, it's been fixed. Thanks.