51043 – wrong code at -O3 on x86_64-linux-gnu

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 51043 - wrong code at -O3 on x86_64-linux-gnu

Summary: wrong code at -O3 on x86_64-linux-gnu

Status:	CONFIRMED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Scalar Optimizations (show other bugs)
Version:	trunk
Hardware:	PC All

Importance:	P enhancement
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:

Duplicates (1):	51671 (view as bug list)
Depends on:
Blocks:

Reported:	2021-07-09 14:50 PDT by Zhendong Su
Modified:	2021-08-30 11:21 PDT (History)
CC List:	9 users (show)

See Also:	31652 27506 51086
Fixed By Commit(s):

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Zhendong Su 2021-07-09 14:50:57 PDT

[706] % clangtk -v
clang version 13.0.0 (https://github.com/llvm/llvm-project.git 4a3b0556536d5a2555f7a19f953f0eec0f79f1a9)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /local/suz-local/opfuzz/bin
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6.5.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.5.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.5.0
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Candidate multilib: x32;@mx32
Selected multilib: .;@m64
[707] % 
[707] % clangtk -O2 small.c; ./a.out
[708] % 
[708] % clangtk -O3 small.c
[709] % ./a.out
Aborted
[710] % 
[710] % cat small.c
int a = 1, b = 1;
int main() {
  short d, g, i;
  int e = 34000, h;
  d = b;
  g = 100 | b;
L1:
  i = g;
L2:
  g = ~(d / e);
  e = ~((2 / g) & d);
  h = a;
  while (!e) {
    a = b;
    e = ~(1L << i);
  }
  if (g > 0)
    goto L2;
  if (!g)
    goto L1;
  if (h < e)
    __builtin_abort ();
  return 0;
}

Comment 1 Zhendong Su 2021-07-09 14:54:26 PDT

Compiler Explorer: https://godbolt.org/z/oao731Ksc

Comment 2 Sanjay Patel 2021-07-12 16:21:08 PDT

I think the bug is in or uncovered by the SimpleLoopUnswitchPass. At -O3, the pass differs because it takes a parameter to allow "non-trivial" unswitching.

This can be controlled with a debug flag, so we can avoid the bug by changing the clang invocation with:
-mllvm -enable-npm-O3-nontrivial-unswitch=0

Comment 3 Sanjay Patel 2021-07-12 16:25:52 PDT

I don't know anything about the SimpleLoopUnswitch pass, so cc'ing some people based on where this probably became visible:
https://reviews.llvm.org/D94559

Comment 4 Sanjay Patel 2021-07-12 16:27:09 PDT

I managed to show the bug in a reduction for Alive2 by pasting the before/after IR with SimpleLoopUnswitch:

https://alive2.llvm.org/ce/z/kUAyji

Comment 5 Arthur Eubanks 2021-07-13 11:07:50 PDT

Managed to compile alive locally and use llvm-reduce:

$ cat /tmp/reduce.sh 
#!/bin/bash
$HOME/repos/llvm-project2/build/rel/bin/opt -passes='simple-loop-unswitch<nontrivial>' $@ -S -o /tmp/b.ll
$HOME/repos/alive2/build/alive-tv $@ /tmp/b.ll | rg -F "Transformation doesn't verify!"

$ ./build/rel/bin/llvm-reduce --test=/tmp/reduce.sh /tmp/a.ll
$ cat reduce.ll
define i32 @src() {
entry:
  %conv1 = shl i32 undef, 16
  %sext31 = ashr exact i32 %conv1, 16
  br label %L1

L1:                                               ; preds = %if.end, %entry
  %e.0 = phi i32 [ 34000, %entry ], [ undef, %if.end ]
  %tobool.not = icmp eq i32 undef, -1
  br label %L2

L2:                                               ; preds = %while.end, %L1
  %e.1 = phi i32 [ %e.0, %L1 ], [ undef, %while.end ]
  %div = sdiv i32 %sext31, %e.1
  %0 = trunc i32 %div to i16
  %1 = trunc i32 %div to i16
  %div5.rhs.trunc = xor i16 %1, -1
  %div534 = sdiv i16 2, %div5.rhs.trunc
  %div5.sext = sext i16 %div534 to i32
  %and = and i32 %div5.sext, %sext31
  %tobool.not36 = icmp eq i32 %and, -1
  br i1 %tobool.not36, label %while.body.lr.ph, label %while.end

while.body.lr.ph:                                 ; preds = %L2
  br i1 %tobool.not, label %while.body.lr.ph.split, label %while.cond.while.end_crit_edge

while.body.lr.ph.split:                           ; preds = %while.body.lr.ph
  ret i32 undef

while.cond.while.end_crit_edge:                   ; preds = %while.body.lr.ph
  br label %while.end

while.end:                                        ; preds = %while.cond.while.end_crit_edge, %L2
  %cmp = icmp slt i16 %0, -1
  br i1 %cmp, label %L2, label %if.end

if.end:                                           ; preds = %while.end
  %div.lcssa = phi i32 [ %div, %while.end ]
  %2 = trunc i32 %div.lcssa to i16
  %tobool13.not = icmp eq i16 %2, -1
  br i1 %tobool13.not, label %L1, label %if.end15

if.end15:                                         ; preds = %if.end
  ret i32 undef
}

Comment 6 Arthur Eubanks 2021-07-13 15:10:08 PDT

Manually reduced further to

define i32 @foo(i1 %arg) {
bb:
  br label %bb1

bb1:                                              ; preds = %bb
  br label %bb2

bb2:                                              ; preds = %bb5, %bb4, %bb1
  br i1 false, label %bb3, label %bb4

bb3:                                              ; preds = %bb2
  br i1 %arg, label %bb6, label %bb4

bb4:                                              ; preds = %bb3, %bb2
  br i1 false, label %bb2, label %bb5

bb5:                                              ; preds = %bb4
  br i1 false, label %bb2, label %bb6

bb6:                                              ; preds = %bb5, %bb3
  ret i32 1
}

========================>

define i32 @foo(i1 %arg) {
bb:
  br label %bb1

bb1:                                              ; preds = %bb
  br i1 %arg, label %bb1.split.us, label %bb1.split

bb1.split.us:                                     ; preds = %bb1
  br label %bb2.us

bb2.us:                                           ; preds = %bb2.backedge.us, %bb1.split.us
  br i1 false, label %bb3.us, label %bb4.us

bb3.us:                                           ; preds = %bb2.us
  br label %bb6.split.us

bb4.us:                                           ; preds = %bb2.us
  br i1 false, label %bb2.backedge.us, label %bb5.us

bb5.us:                                           ; preds = %bb4.us
  br i1 false, label %bb2.backedge.us, label %bb6.split.us.loopexit

bb2.backedge.us:                                  ; preds = %bb5.us, %bb4.us
  br label %bb2.us

bb6.split.us.loopexit:                            ; preds = %bb5.us
  br label %bb6.split.us

bb6.split.us:                                     ; preds = %bb6.split.us.loopexit, %bb3.us
  br label %bb6

bb1.split:                                        ; preds = %bb1
  br label %bb2

bb2:                                              ; preds = %bb2.backedge, %bb1.split
  br i1 false, label %bb3, label %bb4

bb3:                                              ; preds = %bb2
  br label %bb4

bb4:                                              ; preds = %bb3, %bb2
  br i1 false, label %bb2.backedge, label %bb5

bb2.backedge:                                     ; preds = %bb4, %bb5
  br label %bb2

bb5:                                              ; preds = %bb4
  br i1 false, label %bb2.backedge, label %bb6.split

bb6.split:                                        ; preds = %bb5
  br label %bb6

bb6:                                              ; preds = %bb6.split.us, %bb6.split
  ret i32 1
}

Comment 7 Arthur Eubanks 2021-07-13 15:14:28 PDT

Alive complains when %arg is undef, adding noundef makes alive happy.

Comment 8 Arthur Eubanks 2021-07-13 15:21:14 PDT

I'm having trouble understanding the reduced test case and Alive's issue with it. 

simplifycfg simplifies both the pre and post unswitching cases to just `ret i32 1`.

Comment 9 Nuno Lopes 2021-07-14 04:30:23 PDT

The issue is a known bug in loop unswitching.

The original code (the last reduced case) doesn't reach bb3, hence it never branches on %arg.
However, the optimized code hoists the branch on %arg such that bb1 is reachable.

Since branching on undef/poison is UB, the optimization introduces UB if %arg happens to be undef/poison.

The fix is easy: add freeze on any hoisted branches. We've been working on reducing potential perf regressions around freeze as last time Juneyoung tried to fix loop unswitch someone complained about a perf regression. We have a GSoC student focused just on freeze perf, so I guess we can move forward with the fix now.

Comment 10 Hyeongyu kim 2021-07-14 04:37:55 PDT

I think I can solve it without difficulty.
Is it okay if I upload a patch that solves this problem?

Comment 11 Sanjay Patel 2021-07-14 05:48:57 PDT

(In reply to Hyeongyu kim from comment #10)
> I think I can solve it without difficulty.
> Is it okay if I upload a patch that solves this problem?

That sounds great to me. I don't know enough details, so I'm not the best reviewer.

Some links I dug up for reference:
https://reviews.llvm.org/D29015 [SimpleLoopUnswitch] Fix introduction of UB when hoisted condition may be undef or poison
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200224/748416.html - commit and revert discussion
bug 31652
bug 27506

Comment 12 John Regehr 2021-07-14 09:57:54 PDT

hi I just want to throw in that I have a fork of llvm-reduce that is specifically intended to do a better job on inputs to Alive2. just build this branch:

https://github.com/regehr/llvm-project/tree/new-llvm-reduce-pass

I didn't yet have time to try it out on this example.

Comment 13 Hyeongyu kim 2021-07-16 04:24:56 PDT

While investigating regressions after a loop-unswitch patch, I found that scalar evolution can be improved in this case.

define void @f(i32 %n) {
entry:
  %n.fr = freeze i32 %n
  %nonzero = icmp slt i32 0, %n.fr
  br i1 %nonzero, label %loop, label %exit
exit:
  ret void
loop:
  %i = phi i32 [%i.next, %loop], [0, %entry]
  call void @g() willreturn
  %i.next = add i32 %i, 1
  %cond_next = icmp slt i32 %i, %n
  br i1 %cond_next, label %loop, label %exit2
exit2:
  ret void
}

declare void @g() willreturn


Currently, scalar evolution says backedge-taken count is (0 smax %n), but this can be %n.

https://alive2.llvm.org/ce/z/EamykF

If %n is poison, it will anyway raise UB because there is `br i1 %cond_next`.
If %n is undef, `icmp slt i32 %i, %n` is also undef, so it is UB.
I'm not sure whether this will still work if %i starts from a non-constant value though.

Comment 14 Hyeongyu kim 2021-07-16 04:28:30 PDT

To check that the backedge count is correct, I mimiced the alive2 test that is linked at https://reviews.llvm.org/D93882.

Comment 15 Juneyoung Lee 2021-07-16 22:24:59 PDT

(I found that the Alive2 test needed small correction: -src-unroll and -tgt-unroll had to be given, and minor fixes should have been done in the test. https://alive2.llvm.org/ce/z/7bHWAZ is the right one.
The previous test was actually written by myself, so this is my mistake.)


Not sure whether it can be generalized further (modeling freeze in SCEV is a hard problem), but at least for the simple case, the loop iteration count seems correct to me.
This can be explained without relying on complex reasoning about undef/poison values.

I'm going to show that the backedge-taken count result of the loop must be equivalent to the result of a loop without freeze.


To reach the exit block, it should at least go through two conditional branches:
(1) The branch one at entry: 'br (icmp slt i32 0, freeze(%n))'
(2) The branch in the first iteration of loop: 'br (icmp slt i32 0, %n)' (since %i = 0 in the first iter.)

For (1), it is known that 'icmp slt i32 0, freeze(%n)' = 'freeze (icmp slt i32 0, %n)' (https://alive2.llvm.org/ce/z/ecPDzU).
I'll slightly simplify (1)'s branch as 'br (freeze (icmp slt i32 0, %n))', which is the exactly frozen form of (2).

IIUC, the backedge-taken count makes sense only if there was no UB before reaching the exit block.
Therefore, both (1) and (2) must not have raised UB at the exit block point.
And, (2) being well-defined implies that (1) did not need freeze..!


Again, it isn't clear whether this can be generalized; for this specific case, it seems fine.

Comment 16 Arthur Eubanks 2021-08-02 10:32:19 PDT

Any update on this?

Comment 17 Juneyoung Lee 2021-08-02 21:24:06 PDT

Hi Aurthur,

Hyeongyu wrote https://reviews.llvm.org/D106041 and is running experiments to see performance regressions.

But would it be better if the patch is landed first? Perhaps leaving LLVM in an unsound state is not a good idea.

Comment 18 Arthur Eubanks 2021-08-02 23:18:29 PDT

Non-trivial unswitching was only enabled for -O3 for the new PM in https://reviews.llvm.org/D94559, many years after it had been in use in production. I think that only microbenchmarks should be largely affected, but those may be important. For example, making sure that something like https://bugs.llvm.org/show_bug.cgi?id=48715 doesn't regress is important.

Comment 19 Nuno Lopes 2021-08-30 11:21:25 PDT

*** Bug 51671 has been marked as a duplicate of this bug. ***