wrong code at -Os and above on x86_64-linux-gnu #18374

zhendongsu · 2013-11-20T07:12:27Z


Bugzilla Link	18000
Resolution	FIXED
Resolved on	Jan 12, 2014 04:22
Version	trunk
OS	All
CC	@atrick,@hfinkel,@isanbard,@kaomoneus

Extended Description

The following code is miscompiled by the current clang trunk and clang 3.3 at -Os and above on x86_64-linux-gnu in both 32-bit and 64-bit modes.

It also affects clang 3.2 at -Os (but neither -O2 nor -O3).

This is a regression from clang 3.1.

$ clang-trunk -v
clang version 3.4 (trunk 195148)
Target: x86_64-unknown-linux-gnu
Thread model: posix
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4.6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4.7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6.3
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6
$
$
$ clang-trunk -O1 small.c; a.out
0
$ clang-3.1 -Os small.c; a.out
0
$
$ clang-trunk -Os small.c; a.out
512
$ clang-3.3 -Os small.c; a.out
512
$ clang-3.2 -Os small.c; a.out
512
$

int printf (const char *, ...);

int a, b, c, d;
char e;

void
foo ()
{
a = 0;
}

int
main ()
{
unsigned char f;
for (; b < 1; b++)
for (e = 1; e >= 0; e--)
{
d = 0;
if (a)
break;
f = 179 * e;
c = f << 1;
foo ();
}
printf ("%d\n", c);
return 0;
}

llvmbot · 2013-11-22T00:18:09Z

-O2 is the same thing as -Os according to the output of:
$ opt -Os -debug-pass=Arguments -S < /dev/null 2> os
$ opt -O2 -debug-pass=Arguments -S < /dev/null 2> o2
$ diff os o2

Here are the optimization passes added by -O2 vs -O1:
$ diff o1 o2
29c29
< always-inline

inline
58a59,60
gvn
memdep
73a76,86
barrier
domtree
loops
loop-simplify
lcssa
scalar-evolution
loop-simplify
lcssa
loop-vectorize
instcombine
simplifycfg
74a88,89
globaldce
constmerge

The output is correct with -O2 -fno-vectorize, so we need to see how the IR changes with loop-vectorize.

llvmbot · 2013-11-22T01:42:24Z

Oops...no-vectorize doesn't make any difference. The bug is somewhere else.

hfinkel · 2013-12-13T02:34:45Z

bugpoint reduced test case
To reproduce the problem:

opt bugpoint-passinput.bc -instcombine > bugpoint-passinput.out.bc
llc bugpoint-passinput.out.bc
clang bugpoint-passinput.out.s
./a.out
512

clang bugpoint-passinput.bc
./a.out
0

clang bugpoint-passinput.out.bc
./a.out
0
(thus why I think it is a backend bug: compiling the optimized IR at -O0 is ok)

isanbard · 2013-12-19T11:39:51Z

Looks like maybe an LSR bug:

$ clang t.c -O3 -mllvm -disable-lsr && ./a.out
0

kaomoneus · 2013-12-28T17:14:52Z

Fix without test-cases
Issue is in ScalarEvolution.
For next expressions (all the numbers are i32):

A = 258 + (-258)
B = A and 510

It replaces A & 510 with "zext (trunc A to i9) to i32". Since A is a sum, it applies zext(trunc) to its operands.
As you can see this is wrong for -258, since it is 10 bit value (9 bits + sign bit).

Possible fixes is to add "-" operation for SCEV and keep all constants positive. But IMHO it a bit ridiculous.
Also we can
We need to take into account negative numbers somehow.

I propose just use i10 for zext(trunc()) operation instead of i9. See my fix as example.

Test case and full featured patch is coming.

P.S.: Currently it brakes some tests the checks expressions zext(trunc()) expressions. Though there is nothing special, since inner integer type becomes one bit wider.

kaomoneus · 2013-12-28T17:18:30Z

[quote]
Also we can
We need to take into account negative numbers somehow.
[/quote]

Sorry for typo. I mean, that solution I could see is to take into account sign bit. While calculating size of effective mask we risk nothing adding 1 more bit.

llvmbot · 2013-12-28T21:14:09Z

I've also been looking at this issue but I have not come to the same conclusion as you. Unfortunately I haven't been able to isolate this so I have no suggestion for a solution yet but I thought it was best to share what I have so far. There is a fair chance that I've misunderstood something on the way, any pointers are appreciated.

See inlined comments below.

Created attachment 11786 [details]
Fix without test-cases

Issue is in ScalarEvolution.
For next expressions (all the numbers are i32):

A = 258 + (-258)
Is this really 258? I'm seeing 358, i.e. 179 << 1. I started from the source and not the bugpoint reduced test case. Perhaps that's a difference.

B = A and 510

It replaces A & 510 with "zext (trunc A to i9) to i32". Since A is a sum, it
applies zext(trunc) to its operands.
As you can see this is wrong for -258, since it is 10 bit value (9 bits +
sign bit).
I don't your reasoning regarding -258 is entirely correct. I'm using 358 as example but the same should apply for 258. The reason is that the result is also truncated to i9. In my example I get a truncate expression for {358, +, -358}. If you truncate -358 to i9 you get 154. 358 + 154 = 512 which is the erroneous result of the test case. If this addition were to have been correctly truncated to i9 and zero-extended to i32 it would have given the correct result of 0.

Inside getZeroExtendExpr() there is a code path with comment:
"If the input value is a chrec scev, and we can prove that the value did not overflow the old, smaller, value, we can zero extend all of the operands (often constants)."
This is the path which is taken and moving on the next conclusion is that since the backedge is "guarded by a comparison with the pre-inc value the addrec is safe". Since the addrec is considered safe an addrec with operands zero-extended to i32 is returned.

From my understanding this is what causes the problem. I've tried forcing generation of a zero extend expression instead of the addrec and this gave the correct result.

Possible fixes is to add "-" operation for SCEV and keep all constants
positive. But IMHO it a bit ridiculous.
Also we can
We need to take into account negative numbers somehow.

I propose just use i10 for zext(trunc()) operation instead of i9. See my fix
as example.

Test case and full featured patch is coming.

P.S.: Currently it brakes some tests the checks expressions zext(trunc())
expressions. Though there is nothing special, since inner integer type
becomes one bit wider.

kaomoneus · 2013-12-28T21:32:39Z

Even more reduced test-case.
Hi David,

This is my test-case. I just cut off all possible parts and reduced to minimum all the values. So minimum value is not a 179, but 129.

Try this with -loop-reduce.

Both -258 and -358 values are 10bit width, not 9. So it is incorrect to trunc it to 9 bits. That means, the trouble happened before zext, in trunc operation.

Though, perhaps it is also incorrect to perform zext here. Since we have only '+' operation in SCEV, it is obvious, operands could be negative (or how we could implement decrements otherwise?). So it is obvious we should treat operands as signed integers. I think that we have to use sext here.

llvmbot · 2013-12-28T22:28:35Z

Created attachment 11788 [details]
Even more reduced test-case.

Hi David,

This is my test-case. I just cut off all possible parts and reduced to
minimum all the values. So minimum value is not a 179, but 129.

Try this with -loop-reduce.

Both -258 and -358 values are 10bit width, not 9. So it is incorrect to
trunc it to 9 bits. That means, the trouble happened before zext, in trunc
operation.
I will try to describe why I think the values can be reasoned about as 9-bit values. I will be following the notes I took while looking at my test case. This aspect of the test case seems identical with what you are seeing.

Hopefully the paste below wont be too messed up with regards to formatting.
*** IR Dump Before Loop Strength Reduction ***
for.body4: ; preds = %if.end, %for.cond1.preheader
%0 = phi i32 [ %.pre, %for.cond1.preheader ], [ 0, %if.end ]
%indvars.iv = phi i32 [ 1, %for.cond1.preheader ], [ %indvars.iv.next, %if.end ]
%1 = phi i8 [ 1, %for.cond1.preheader ], [ %dec, %if.end ]
%tobool = icmp eq i32 %0, 0
br i1 %tobool, label %if.end, label %for.inc8

if.end: ; preds = %for.body4
%conv7 = mul i32 %indvars.iv, 358
%shl = and i32 %conv7, 510
store i32 %shl, i32* @c, align 4

When the SCEV for %conv7 is created the %shl register is identified as a user which leads to SCEV creation for the and-expression. Since we have a constant value of 510 as operand to the and-expression we know that at most 9 bits are required for the result.

This leads us to calling getZeroExtendExpr(getTruncateExpr(getSCEV(U->getOperand(0), ..., i9) where operand(0) is "%conv7 = mul i32 %indvars.iv".
The SCEV for this multiplication has been calculated previously and is {358,+,-358}<%for.body4>. When truncated to i9 this is equivalent to {358,+,154} (Rationale (-358 & 0x1FF) = 154). This is why I think this particular bug is due to zext not being performed and not related to using 9-bit data type.

Though, perhaps it is also incorrect to perform zext here. Since we have
only '+' operation in SCEV, it is obvious, operands could be negative (or
how we could implement decrements otherwise?). So it is obvious we should
treat operands as signed integers. I think that we have to use sext here.

kaomoneus · 2013-12-28T22:57:50Z

Am I right, that you want to transform zext(358+154=512) into 0 somehow?

llvmbot · 2013-12-28T23:13:57Z

Am I right, that you want to transform zext(358+154=512) into 0 somehow?
I don't want to transform the expression. I want the addition to be performed in i9 and then be zero extended to i32 as this would give 0 as result.

This is my first time looking at this code so I would like someone who is more experienced to give their opinion whether this is the correct approach.

kaomoneus · 2013-12-28T23:59:26Z

But you right. there were nothing wrong with trunc.

In getZeroExtendExpr look at this:
This branch:

// If the input value is a chrec scev, and we can prove that the value
// did not overflow the old, smaller, value, we can zero extend all of the
// operands (often constants). This allows analysis of something like
// this: for (unsigned char X = 0; X < 100; ++X) { int Y = X; }
if (const SCEVAddRecExpr *AR = dyn_cast(Op))
if (AR->isAffine()) {

I'll explain what happens in my case here:
start = i9 -254
step = i9 254
maxBeCount = 1

So I have overflowed counting here.

Yet method finished in branch:
if (isKnownPositive(Step)) {...
and returns
return getAddRecExpr(getZeroExtendExpr(Start, Ty), // i9 -254 => i32 258
getZeroExtendExpr(Step, Ty), // i9 254 => i32 254
L, AR->getNoWrapFlags());

While we need to analyze Start value as well. I think we have to return:

return getAddRecExpr(getSignExtendExpr(Start, Ty), // i9 -254 => i32 -254
getZeroExtendExpr(Step, Ty), // i9 254 => i32 254
L, AR->getNoWrapFlags());

That should be correct IMHO.

atrick · 2013-12-29T00:49:11Z

After this check...
} else if (isKnownPositiveStep)) {

What constant value do you get for "N"?

      const SCEV *N = getConstant(APInt::getMinValue(BitWidth) -
                                  getUnsignedRange(Step).getUnsignedMax());

llvmbot · 2013-12-29T17:09:54Z

After this check...
} else if (isKnownPositiveStep)) {

What constant value do you get for "N"?
N = -254

      const SCEV *N = getConstant(APInt::getMinValue(BitWidth) -
getUnsignedRange(Step).getUnsignedMax());

kaomoneus · 2013-12-30T12:05:11Z

Another possible fix
That could be a fix. I just can't see any reason why there was "if (CmpInst::isSigned(Pred)) {" while you want to extend FoundRHS and FoundLHS.

kaomoneus · 2013-12-31T20:02:37Z

So, below is summary of my explanation, and how I came, that the reason was typo figured in fix I attached yesterday.

Just run "Even more reduced test-case" with -loop-reduce option:
"opt test-case.ll -loop-reduce"

And stop debugger in getZeroExtendExpr, on "if (isKnownPositive(Step)) {" line. If you will dump "L", you will see, that loop counter starts from 1; iteration counter is decremented straight before to be checked, and that the condition is "counter > -1.".
So it has only 2 iterations to be performed.

Though, this story about another SCEV expr, that starts from i9 258 (aka i9 -254), and decreased by 258 every iteration, so on first iteration it is 258, on the second it is 0, and that's it.

Now look inside getZeroExtendExpr; for test case:
Step: i9 254
Start: i9 -254
Op (is equal to AR): i9 {-254,+,254}<%if.end>
N: i9 -254 (aka i9 258)

[code]
if (isKnownPositive(Step)) {
const SCEV *N = getConstant(APInt::getMinValue(BitWidth) -
getUnsignedRange(Step).getUnsignedMax());
if (isLoopBackedgeGuardedByCond(L, ICmpInst::ICMP_ULT, AR, N) ||
(isLoopEntryGuardedByCond(L, ICmpInst::ICMP_ULT, Start, N) &&
isLoopBackedgeGuardedByCond(L, ICmpInst::ICMP_ULT,
AR->getPostIncExpr(*this), N))) {
// Cache knowledge of AR NUW, which is propagated to this AddRec.
const_cast<SCEVAddRecExpr *>(AR)->setNoWrapFlags(SCEV::FlagNUW);
// Return the expression with the addrec on the outside.
return getAddRecExpr(getZeroExtendExpr(Start, Ty),
getZeroExtendExpr(Step, Ty),
L, AR->getNoWrapFlags());
}
}
[code]

Method "isLoopBackedgeGuardedByCond(L, ICmpInst::ICMP_ULT, AR, N)" returns "true", that was cause of Start and Step zero exstention.
The first: "Start" must be extended as signed integer, just to keep logic same. So, this branch shouldn't affect anything.
The second: in fact, loop backage is not guarded by AR ULT N. If I understood properly that means whether loop backage guarded by "AR < 258". The answer is - not. This couldn't be applied to this loop. Otherwise we will not even enter this loop:
Initially AR is 258. And 258 < 258 is false.

Why isLoopBackedgeGuardedByCond returns true anyway?

isLoopBackedgeGuardedByCond invokes isImpliedCond inside.
The last one tries to simplify RHS, LHS, FoundLHS and FoundRHS:
What I think is wrong is that how it extends FoundLHS and FoundRHS.
Proper logic to me, is to check whether FoundPred is signed, if so we have to extend it as signed integer. Otherwise we brake the logic.

In this method though it checks whether Pred is signed and then it does FoundLHS and FoundRHS extension. I just suppose its wrong. Pred is unsigned, and we do zext on FoundLHS/RHS. Originally FoundLHS and FoundRHS are i8 0, and i8 -1, and FoundPred is SGT.
After zero extension FoundRHS would be i9 255. It is max value, and with SGT condition it would never be satisfied. So that's why it replaced with trivially_false, and after treated as equal to "AR < 258"
So that we got wrong behaviour finally.

kaomoneus · 2014-01-11T13:59:48Z

Zhendong,
I have committed fix as r198863.
Could you check whether it was fixed on your machine, since I'm going to mark it as FIXED.

zhendongsu · 2014-01-11T23:51:29Z

Zhendong,
I have committed fix as r198863.
Could you check whether it was fixed on your machine, since I'm going to
mark it as FIXED.

Stepan, just checked. Yes, it's been fixed. Thanks.

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 9, 2021

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

wrong code at -Os and above on x86_64-linux-gnu #18374

wrong code at -Os and above on x86_64-linux-gnu #18374

zhendongsu commented Nov 20, 2013

llvmbot commented Nov 22, 2013

llvmbot commented Nov 22, 2013

hfinkel commented Dec 13, 2013

isanbard commented Dec 19, 2013

kaomoneus commented Dec 28, 2013

kaomoneus commented Dec 28, 2013

llvmbot commented Dec 28, 2013

kaomoneus commented Dec 28, 2013

llvmbot commented Dec 28, 2013

kaomoneus commented Dec 28, 2013

llvmbot commented Dec 28, 2013

kaomoneus commented Dec 28, 2013

atrick commented Dec 29, 2013

llvmbot commented Dec 29, 2013

kaomoneus commented Dec 30, 2013

kaomoneus commented Dec 31, 2013

kaomoneus commented Jan 11, 2014

zhendongsu commented Jan 11, 2014

wrong code at -Os and above on x86_64-linux-gnu #18374

wrong code at -Os and above on x86_64-linux-gnu #18374

Comments

zhendongsu commented Nov 20, 2013

Extended Description

llvmbot commented Nov 22, 2013

Here are the optimization passes added by -O2 vs -O1: $ diff o1 o2 29c29 < always-inline

llvmbot commented Nov 22, 2013

hfinkel commented Dec 13, 2013

isanbard commented Dec 19, 2013

kaomoneus commented Dec 28, 2013

kaomoneus commented Dec 28, 2013

llvmbot commented Dec 28, 2013

kaomoneus commented Dec 28, 2013

llvmbot commented Dec 28, 2013

kaomoneus commented Dec 28, 2013

llvmbot commented Dec 28, 2013

kaomoneus commented Dec 28, 2013

atrick commented Dec 29, 2013

llvmbot commented Dec 29, 2013

kaomoneus commented Dec 30, 2013

kaomoneus commented Dec 31, 2013

kaomoneus commented Jan 11, 2014

zhendongsu commented Jan 11, 2014

Here are the optimization passes added by -O2 vs -O1:
$ diff o1 o2
29c29
< always-inline