17473 – possible wrong code bug

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 17473 - possible wrong code bug

Summary: possible wrong code bug

Status:	RESOLVED FIXED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Scalar Optimizations (show other bugs)
Version:	trunk
Hardware:	PC Linux

Importance:	P release blocker
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:

Depends on:
Blocks:

Reported:	2013-10-03 17:58 PDT by John Regehr
Modified:	2014-03-12 16:35 PDT (History)
CC List:	10 users (show)

See Also:
Fixed By Commit(s):

Attachments
input to gvn (4.06 KB, application/octet-stream) 2013-11-12 20:25 PST, Rafael Ávila de Espíndola	Details
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description John Regehr 2013-10-03 17:58:20 PDT

Well this one is weird since fn2 isn't even called, but it can't be removed or the bad behavior goes away.

[regehr@imp r104]$ clang -w -O0 small.c ; ./a.out 
ffffffff
[regehr@imp r104]$ clang -w -O small.c ; ./a.out 
ff
[regehr@imp r104]$ gcc -w -O small.c ; ./a.out 
ffffffff
[regehr@imp r104]$ cat small.c
int printf(const char *, ...);
int a, c, d, e, f, g, i, j;
short b;
char h;

int fn1(void) 
{ 
  return b && a > b || b < 0 && a ?: b; 
}

void fn2(void) {
  for (;; f = fn1())
    ;
}

void fn3(int p1) {
  j &&(c = 0 ?: 0);
  g = p1;
}

int main(void) {
  for (h = 0; h >= 0; ++h) {
    fn3(h);
    i = d >= 0 ?: 0;
  }
  e = g;
  e += h;
  printf("%x\n", e);
  return 0;
}
[regehr@imp r104]$ clang -v
clang version 3.4 (trunk 191880)
Target: x86_64-unknown-linux-gnu
Thread model: posix
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4.6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4.7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.5
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.5.3
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6.3
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6

Comment 1 Zhendong Su 2013-10-03 19:53:28 PDT

(In reply to comment #0)
> Well this one is weird since fn2 isn't even called, but it can't be removed
> or the bad behavior goes away.

John, thanks for cc'ing me on this report. I suspect that the code is invalid because as far as I know the standard doesn't like code like "a ?: b", where the middle term is missing.

Comment 2 John Regehr 2013-10-03 20:02:35 PDT

(In reply to comment #1)
> (In reply to comment #0)
> > Well this one is weird since fn2 isn't even called, but it can't be removed
> > or the bad behavior goes away.
> 
> John, thanks for cc'ing me on this report. I suspect that the code is
> invalid because as far as I know the standard doesn't like code like "a ?:
> b", where the middle term is missing.

The missing argument to the ternary operator is a GCC idiom that I believe Clang attempts to implement faithfully.

x?:z simply means x?x:z

Comment 3 Zhendong Su 2013-10-03 20:27:57 PDT

(In reply to comment #2)
> (In reply to comment #1)
> > (In reply to comment #0)
> > > Well this one is weird since fn2 isn't even called, but it can't be removed
> > > or the bad behavior goes away.
> > 
> > John, thanks for cc'ing me on this report. I suspect that the code is
> > invalid because as far as I know the standard doesn't like code like "a ?:
> > b", where the middle term is missing.
> 
> The missing argument to the ternary operator is a GCC idiom that I believe
> Clang attempts to implement faithfully.
> 
> x?:z simply means x?x:z

I see; thanks John. 

I was able to reduce it a bit more and got rid of fn2 (see below). 

--------------------------

int printf (const char *, ...);
int a, c, d, e, g, i, j;
short b;
char h;

int
fn1 (void)
{
  return a || (b && (a ? a : b));
}

void
fn3 (int p1)
{
  j && (c = 0 ? 0 : 0);
  g = p1;
}

int
main (void)
{
  for (h = 0; h >= 0; ++h)
    {
      fn3 (h);
      i = d >= 0 ? d >= 0 : 0;
    }
  e = g;
  e += h;
  printf ("%x\n", e);
  return 0;
}

Comment 4 Zhendong Su 2013-10-03 20:29:50 PDT

(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > (In reply to comment #0)
> > > > Well this one is weird since fn2 isn't even called, but it can't be removed
> > > > or the bad behavior goes away.
> > > 
> > > John, thanks for cc'ing me on this report. I suspect that the code is
> > > invalid because as far as I know the standard doesn't like code like "a ?:
> > > b", where the middle term is missing.
> > 
> > The missing argument to the ternary operator is a GCC idiom that I believe
> > Clang attempts to implement faithfully.
> > 
> > x?:z simply means x?x:z
> 
> I see; thanks John. 
> 
> I was able to reduce it a bit more and got rid of fn2 (see below). 
> 
> --------------------------
> 
> int printf (const char *, ...);
> int a, c, d, e, g, i, j;
> short b;
> char h;
> 
> int
> fn1 (void)
> {
>   return a || (b && (a ? a : b));
> }
> 
> void
> fn3 (int p1)
> {
>   j && (c = 0 ? 0 : 0);
>   g = p1;
> }
> 
> int
> main (void)
> {
>   for (h = 0; h >= 0; ++h)
>     {
>       fn3 (h);
>       i = d >= 0 ? d >= 0 : 0;
>     }
>   e = g;
>   e += h;
>   printf ("%x\n", e);
>   return 0;
> }

Oh, fn1 should be removed too: 

int printf (const char *, ...);
int a, c, d, e, g, i, j;
short b;
char h;

void
fn3 (int p1)
{
  j && (c = 0 ? 0 : 0);
  g = p1;
}

int
main (void)
{
  for (h = 0; h >= 0; ++h)
    {
      fn3 (h);
      i = d >= 0 ? d >= 0 : 0;
    }
  e = g;
  e += h;
  printf ("%x\n", e);
  return 0;
}

Comment 5 John Regehr 2013-10-03 20:33:26 PDT

> Oh, fn1 should be removed too: 

Thanks!  I was lazy and figured that if C-Reduce couldn't get rid of those functions, I couldn't either.

Comment 6 John Regehr 2013-10-04 16:44:10 PDT

Here's a different, smaller test case that looks similar enough that I'll put it here instead of making a new PR.

[regehr@imp r105]$ cat small.c
int printf(const char *, ...);
int a, b, c;
int fn1(int p1) {
  --c;
  return 0;
}

int main() {
  b = 0;
  for (; b > -24; b--) {
    fn1(a && fn1(0));
  }
  --c;
  printf("%d\n", c);
  return 0;
}
[regehr@imp r105]$ gcc -O small.c ; ./a.out 
-25
[regehr@imp r105]$ clang -O0 small.c ; ./a.out 
-25
[regehr@imp r105]$ clang -O1 small.c ; ./a.out 
-25
[regehr@imp r105]$ clang -O2 small.c ; ./a.out 
-22

Comment 7 Zhendong Su 2013-10-04 17:49:06 PDT

(In reply to comment #6)
> Here's a different, smaller test case that looks similar enough that I'll
> put it here instead of making a new PR.
> 
> [regehr@imp r105]$ cat small.c
> int printf(const char *, ...);
> int a, b, c;
> int fn1(int p1) {
>   --c;
>   return 0;
> }
> 
> int main() {
>   b = 0;
>   for (; b > -24; b--) {
>     fn1(a && fn1(0));
>   }
>   --c;
>   printf("%d\n", c);
>   return 0;
> }
> [regehr@imp r105]$ gcc -O small.c ; ./a.out 
> -25
> [regehr@imp r105]$ clang -O0 small.c ; ./a.out 
> -25
> [regehr@imp r105]$ clang -O1 small.c ; ./a.out 
> -25
> [regehr@imp r105]$ clang -O2 small.c ; ./a.out 
> -22

It should be different as only the trunk miscompiles it at -O2, while all versions of clang miscompile the initial testcase.

Comment 8 Zhendong Su 2013-10-06 02:08:29 PDT

(In reply to comment #6)
> Here's a different, smaller test case that looks similar enough that I'll
> put it here instead of making a new PR.
> 
> [regehr@imp r105]$ cat small.c
> int printf(const char *, ...);
> int a, b, c;
> int fn1(int p1) {
>   --c;
>   return 0;
> }
> 
> int main() {
>   b = 0;
>   for (; b > -24; b--) {
>     fn1(a && fn1(0));
>   }
>   --c;
>   printf("%d\n", c);
>   return 0;
> }
> [regehr@imp r105]$ gcc -O small.c ; ./a.out 
> -25
> [regehr@imp r105]$ clang -O0 small.c ; ./a.out 
> -25
> [regehr@imp r105]$ clang -O1 small.c ; ./a.out 
> -25
> [regehr@imp r105]$ clang -O2 small.c ; ./a.out 
> -22

Here is another testcase that should point to the same issue as the above testcase. Since both miscompilations go away with -fno-vectorize, this should be a vectorizer bug. 

------------------------

int printf (const char *, ...);

int a, b, c, d;

int
main ()
{
  for (a = 0; a < 3; a++)
    {
      c++;
      for (d = 0; d < 22; d++)
        {
	  if (b)
	    c++;
	  c++;
        }
    }
  printf ("%d\n", c);
  return 0;
}

Comment 9 Arnold Schwaighofer 2013-10-07 11:57:28 PDT

The first three snippets are not related to the vectorizers:

Compiling them with:
> Debug+Asserts/bin/clang -O2 bug17473_4.c -emit-llvm -S

shows no vectorized code. And compiling them with the vectorizers off shows the same failure behavior:

> Debug+Asserts/bin/clang -O2 bug17473_3.c -fno-vectorize -fno-slp-vectorize
> ./a.out
ff

The last two examples are indeed a bug in the loop vectorizer. I will commit a fix for them soon.

Basically, we were taking a reduction like this:

for.body3.1:                                      ; preds = %for.body3.1, %for.inc7                                                                     
  %inc613.1 = phi i32 [ 0, %for.inc7 ], [ %inc6.1, %for.body3.1 ]                                                                                       
  %inc511.1 = phi i32 [ %inc.1, %for.inc7 ], [ %inc5.1, %for.body3.1 ]                                                                                  
  %2 = zext i1 %tobool to i32
  %inc4.1 = xor i32 %2, 1                                                                                                                               
  %inc511.1.inc4.1 = add nsw i32 %inc511.1, %inc4.1
  %inc5.1 = add nsw i32 %inc511.1.inc4.1, 1                                                                                                             
  %inc6.1 = add nsw i32 %inc613.1, 1
  %exitcond.1 = icmp eq i32 %inc6.1, 22
  br i1 %exitcond.1, label %for.inc7.1, label %for.body3.1
  
for.inc7.1:                                       ; preds = %for.body3.1                                                                                
  %inc.2 = add nsw i32 %inc511.1.inc4.1, 2        
  br label %for.body3.2

And where generating this. The problem is that the external user was not using the “last” value in the reduction. So we would loose 3 "(add 1)”s.

vector.body23:                                    ; preds = %vector.body23, %for.inc7
  %index26 = phi i32 [ 0, %for.inc7 ], [ %index.next30, %vector.body23 ]                                                                                
  %vec.phi34 = phi <4 x i32> [ %7, %for.inc7 ], [ %11, %vector.body23 ]
  %8 = zext <4 x i1> %broadcast.splat36 to <4 x i32>
  %9 = xor <4 x i32> %8, <i32 1, i32 1, i32 1, i32 1>
  %10 = add nsw <4 x i32> %vec.phi34, %9          
  %11 = add nsw <4 x i32> %10, <i32 1, i32 1, i32 1, i32 1>
  %index.next30 = add i32 %index26, 4
  %12 = icmp eq i32 %index.next30, 20
  br i1 %12, label %middle.block24, label %vector.body23
  
middle.block24:                                   ; preds = %vector.body23
  %rdx.shuf38 = shufflevector <4 x i32> %10, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>

I have filed PR 17498 to track the vectorizer bug.

Comment 10 Bill Wendling 2013-11-12 15:46:10 PST

Hi. What's the status of this bug?

Comment 11 Rafael Ávila de Espíndola 2013-11-12 20:02:26 PST

(In reply to comment #10)
> Hi. What's the status of this bug?

The test in comment 4 still reproduces:

int printf(const char *, ...);
int a, c, d, e, g, i, j;
short b;
char h;
void fn3(int p1) {
  j &&(c = 0 ? 0 : 0);
  g = p1;
}
int main(void) {
  for (h = 0; h >= 0; ++h) {
    fn3(h);
    i = d >= 0 ? d >= 0 : 0;
  }
  e = g;
  e += h;
  printf("%x\n", e);
  return 0;
}

It prints different values at -O2 and -O0.

Comment 12 Rafael Ávila de Espíndola 2013-11-12 20:24:23 PST

Looks like a bug in gvn

/build/bin/clang test.c -o test.ll -m64 -emit-llvm -S -O0
./build/bin/opt -S test.ll -o test2.ll -instcombine   -inline   -loop-rotate
./build/bin/opt -S test2.ll -o test3.ll -basicaa -gvn
./build/bin/llc test2.ll -o test2.s
./build/bin/llc test3.ll -o test3.s
./build/bin/clang test2.s -o test2 -m64
./build/bin/clang test3.s -o test3 -m64
./test2
./test3

prints

ffffffff
ff

Comment 13 Rafael Ávila de Espíndola 2013-11-12 20:25:43 PST

Created attachment 11530 [details]
input to gvn

Looks like
./build/bin/opt -S test2.ll -o test3.ll -basicaa -gvn

misoptimizes this.

Comment 14 Bill Wendling 2013-11-12 22:55:00 PST

(In reply to comment #13)
> Created attachment 11530 [details]
> input to gvn
> 
> Looks like
> ./build/bin/opt -S test2.ll -o test3.ll -basicaa -gvn
> 
> misoptimizes this.

Actually, I think it's an LSR bug. Here's what I get:

[morbo:llvm] opt -S -basicaa -gvn test2.ll -o - | llc -o test2.s -relocation-model pic -disable-fp-elim && clang test2.s -o test2 && ./test2
ff
[morbo:llvm] opt -S -basicaa -gvn test2.ll -o - | llc -disable-lsr -o test2.s -relocation-model pic -disable-fp-elim && clang test2.s -o test2 && ./test2
ffffffff
[morbo:llvm] opt -S -O2 test2.ll -o - | llc -disable-lsr -o test2.s -relocation-model pic -disable-fp-elim && clang test2.s -o test2 && ./test2
ffffffff

Comment 15 Andrew Trick 2013-12-12 22:03:55 PST

<rdar://problem/15464466> PR17473: LSR miscompile

Comment 16 Michael Zolotukhin 2014-02-14 07:40:19 PST

As far as I understand, the test exploits undefined behaviour (signed overflow), which can't be considered as a compiler bug.

Comment 17 John Regehr 2014-02-14 08:53:22 PST

Michael, UBSAN doesn't think there's a signed overflow, can you explain why you think there is one?

Comment 18 Michael Zolotukhin 2014-02-14 14:02:42 PST

Hi John,

You could add a printf into the loop and it would become apparent:

int printf(const char *, ...);
int a, c, d, e, g, i, j;
short b;
char h;
void fn3(int p1) {
  j &&(c = 0 ? 0 : 0);
  g = p1;
}
int main(void) {
  for (h = 0; h >= 0; ++h) {
    fn3(h);
    i = d >= 0 ? d >= 0 : 0;
    printf("h=%x\n", h);        // <<< This one
  }
  e = g;
  e += h;
  printf("%x\n", e);
  return 0;
}

The output would be:
h=0
h=1
..
h=7e
h=7f
ffffffff

Also one could figure that out from the loop condition: (h = 0; h >= 0; h++).
Var h starts from 0 and increases until it becomes negative - that inevitably leads to overflow. As far as I understand, there should be an undefined behaviour.

Comment 19 John Regehr 2014-02-14 14:08:52 PST

Hi Michael, most people believe that ++h cannot overflow when h is a char because it is promoted to int, incremented, and then converted back to char. 

I believe this issue is not 100% clear in the standard, but my the view above is shared by the Clang developers, which explains why UBSAN does not complain here.

Comment 20 Zhendong Su 2014-02-14 14:10:35 PST

(In reply to comment #19)
> Hi Michael, most people believe that ++h cannot overflow when h is a char
> because it is promoted to int, incremented, and then converted back to char. 
> 
> I believe this issue is not 100% clear in the standard, but my the view
> above is shared by the Clang developers, which explains why UBSAN does not
> complain here.

Yes, I agree with John.  This also applies to signed short.

Comment 21 Michael Zolotukhin 2014-02-14 14:13:38 PST

Good to know, thanks!
But doesn't that mean that the loop should be infinite in such case? I.e. at O0 we also generate incorrect code.

Comment 22 Michael Zolotukhin 2014-02-17 07:29:17 PST

> But doesn't that mean that the loop should be infinite in such case? I.e. at
> O0 we also generate incorrect code.
Sorry, that's incorrect. I mixed up signed and unsigned behaviour.

I looked into LSR to figure out, why is this happening and here is what I found.

When LSR replaces IV with a new one, it uses i32 instead of original i8. As far as I understand, it is the widest type used in computations involving the IV, and the reason why we can do this is that we later can simply ignore higher bits if we need a narrower type. That usually works fine, but in our case the ultimate use is also i32, so we don't add truncate before it, and that causes the fail. I.e. the correct conversion sequence here should be i32->i8->i32, but LSR currently loses the middle part and thus doesn't emit needed trunc/sext, which seems to be the rootcause.

Comment 23 Michael Zolotukhin 2014-03-12 16:34:21 PDT

Fixed in r203719.