-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SimpleLoopUnswitch hoists branches on poison/undef (Wrong code at -O3 on x86_64-linux-gnu) #50387
Comments
Compiler Explorer: https://godbolt.org/z/oao731Ksc |
I think the bug is in or uncovered by the SimpleLoopUnswitchPass. At -O3, the pass differs because it takes a parameter to allow "non-trivial" unswitching. This can be controlled with a debug flag, so we can avoid the bug by changing the clang invocation with: |
I don't know anything about the SimpleLoopUnswitch pass, so cc'ing some people based on where this probably became visible: |
I managed to show the bug in a reduction for Alive2 by pasting the before/after IR with SimpleLoopUnswitch: |
Managed to compile alive locally and use llvm-reduce: $ cat /tmp/reduce.sh $ ./build/rel/bin/llvm-reduce --test=/tmp/reduce.sh /tmp/a.ll L1: ; preds = %if.end, %entry L2: ; preds = %while.end, %L1 while.body.lr.ph: ; preds = %L2 while.body.lr.ph.split: ; preds = %while.body.lr.ph while.cond.while.end_crit_edge: ; preds = %while.body.lr.ph while.end: ; preds = %while.cond.while.end_crit_edge, %L2 if.end: ; preds = %while.end if.end15: ; preds = %if.end |
Manually reduced further to define i32 @foo(i1 %arg) { bb1: ; preds = %bb bb2: ; preds = %bb5, %bb4, %bb1 bb3: ; preds = %bb2 bb4: ; preds = %bb3, %bb2 bb5: ; preds = %bb4 bb6: ; preds = %bb5, %bb3 ========================> define i32 @foo(i1 %arg) { bb1: ; preds = %bb bb1.split.us: ; preds = %bb1 bb2.us: ; preds = %bb2.backedge.us, %bb1.split.us bb3.us: ; preds = %bb2.us bb4.us: ; preds = %bb2.us bb5.us: ; preds = %bb4.us bb2.backedge.us: ; preds = %bb5.us, %bb4.us bb6.split.us.loopexit: ; preds = %bb5.us bb6.split.us: ; preds = %bb6.split.us.loopexit, %bb3.us bb1.split: ; preds = %bb1 bb2: ; preds = %bb2.backedge, %bb1.split bb3: ; preds = %bb2 bb4: ; preds = %bb3, %bb2 bb2.backedge: ; preds = %bb4, %bb5 bb5: ; preds = %bb4 bb6.split: ; preds = %bb5 bb6: ; preds = %bb6.split.us, %bb6.split |
Alive complains when %arg is undef, adding noundef makes alive happy. |
I'm having trouble understanding the reduced test case and Alive's issue with it. simplifycfg simplifies both the pre and post unswitching cases to just |
The issue is a known bug in loop unswitching. The original code (the last reduced case) doesn't reach bb3, hence it never branches on %arg. Since branching on undef/poison is UB, the optimization introduces UB if %arg happens to be undef/poison. The fix is easy: add freeze on any hoisted branches. We've been working on reducing potential perf regressions around freeze as last time Juneyoung tried to fix loop unswitch someone complained about a perf regression. We have a GSoC student focused just on freeze perf, so I guess we can move forward with the fix now. |
I think I can solve it without difficulty. |
That sounds great to me. I don't know enough details, so I'm not the best reviewer. Some links I dug up for reference: |
hi I just want to throw in that I have a fork of llvm-reduce that is specifically intended to do a better job on inputs to Alive2. just build this branch: https://github.com/regehr/llvm-project/tree/new-llvm-reduce-pass I didn't yet have time to try it out on this example. |
While investigating regressions after a loop-unswitch patch, I found that scalar evolution can be improved in this case. define void @f(i32 %n) { declare void @g() willreturn Currently, scalar evolution says backedge-taken count is (0 smax %n), but this can be %n. https://alive2.llvm.org/ce/z/EamykF If %n is poison, it will anyway raise UB because there is |
To check that the backedge count is correct, I mimiced the alive2 test that is linked at https://reviews.llvm.org/D93882. |
(I found that the Alive2 test needed small correction: -src-unroll and -tgt-unroll had to be given, and minor fixes should have been done in the test. https://alive2.llvm.org/ce/z/7bHWAZ is the right one. Not sure whether it can be generalized further (modeling freeze in SCEV is a hard problem), but at least for the simple case, the loop iteration count seems correct to me. I'm going to show that the backedge-taken count result of the loop must be equivalent to the result of a loop without freeze. To reach the exit block, it should at least go through two conditional branches: For (1), it is known that 'icmp slt i32 0, freeze(%n)' = 'freeze (icmp slt i32 0, %n)' (https://alive2.llvm.org/ce/z/ecPDzU). IIUC, the backedge-taken count makes sense only if there was no UB before reaching the exit block. Again, it isn't clear whether this can be generalized; for this specific case, it seems fine. |
Any update on this? |
Hi Aurthur, Hyeongyu wrote https://reviews.llvm.org/D106041 and is running experiments to see performance regressions. But would it be better if the patch is landed first? Perhaps leaving LLVM in an unsound state is not a good idea. |
Non-trivial unswitching was only enabled for -O3 for the new PM in https://reviews.llvm.org/D94559, many years after it had been in use in production. I think that only microbenchmarks should be largely affected, but those may be important. For example, making sure that something like #48059 doesn't regress is important. |
*** Bug llvm/llvm-bugzilla-archive#51671 has been marked as a duplicate of this bug. *** |
mentioned in issue llvm/llvm-bugzilla-archive#51671 |
1 similar comment
mentioned in issue llvm/llvm-bugzilla-archive#51671 |
the original clang reproducer doesn't cause any issues now, but the underlying issue is still there. @aqjune do you know if there are any remaining blockers for enabling |
No, I am not aware of any blockers..! |
let me collect some numbers to sanity check |
Finally got around to that. There were no performance regressions on the AArch64 system I tried with O3 in SPEC2006,SPEC2017 and an internal benchmark suite. Patch to flip the default: https://reviews.llvm.org/D124252 |
Thanks, this fixed 3 tests:
3 more to go, though! |
@nunoplopes what are the remaining 3? |
See here: https://web.ist.utl.pt/nuno.lopes/alive2/index.php?hash=1fb52562ffabc05d |
Sounds good, I'll look in to trivial unswitching case tomorrow. Is there an issue for those? |
Thank you! |
I put up https://reviews.llvm.org/D124554, https://reviews.llvm.org/D124549, https://reviews.llvm.org/D124526 to fix some of the remaining issues. After that, there's one failure left I think. Lots of tests seem to time out though. |
All patches have now landed, I'm not seeing any verification failures for SimpleLoopUnswitch locally any longer. |
Confirmed, all SLU tests pass now. Thank you @fhahn! |
Extended Description
[706] % clangtk -v
clang version 13.0.0 (https://github.com/llvm/llvm-project.git 4a3b055)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /local/suz-local/opfuzz/bin
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6.5.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.5.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.5.0
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Candidate multilib: x32;@MX32
Selected multilib: .;@m64
[707] %
[707] % clangtk -O2 small.c; ./a.out
[708] %
[708] % clangtk -O3 small.c
[709] % ./a.out
Aborted
[710] %
[710] % cat small.c
int a = 1, b = 1;
int main() {
short d, g, i;
int e = 34000, h;
d = b;
g = 100 | b;
L1:
i = g;
L2:
g = ~(d / e);
e = ~((2 / g) & d);
h = a;
while (!e) {
a = b;
e = ~(1L << i);
}
if (g > 0)
goto L2;
if (!g)
goto L1;
if (h < e)
__builtin_abort ();
return 0;
}
The text was updated successfully, but these errors were encountered: