Created attachment 21080 [details] Reproduction code I'm running into a case where code seems to miscompile after SVN r320749, "[SimplifyCFG] don't sink common insts too soon (PR34603)" (part of LLVM 6.0, but the same issues can be reproduced with the latest trunk version as well). I can't (yet) point out exactly where the new generated code is wrong, but this commit changed the outcome of the attached code. To reproduce (somewhat), compile the attached sample with "clang -std=c99 -O3 -fvisibility=hidden -fomit-frame-pointer -ffast-math --target=aarch64-linux-gnu -c ref_mvs-preproc.c". If compiled with clang built from before SVN r320749, the compiled code does what it is supposed to, while if compiled with a later version, it produces incorrect results. I have tried looking at the output from compiling with `-mllvm -print-after-all` to look at differences between before and after this commit, and there obviously are differences, but nothing that I could spot that stands out as obviously incorrect. Surprisingly, the same code built for other architectures (both 32 and 64 bit x86, and armv7) with newer clang/llvm versions run just fine without any of the misbehaviour as I run into on AArch64. Can someone spot what this SimplifyCFG change does wrt to this code sample, if there's some overlooked case? Or are the transformations correct and it just happens to trigger buggy codepaths in the AArch64 target after the transformation?
To narrow down the problem, we can compile the source to optimized IR: $ clang -O3 ... -emit-llvm -S Then pass that IR to the backend: $ llc ref_mvs-preproc.ll -o - | clang -x assembler - And run the code: $ a.out Does that produce the wrong result? If yes, turn off optimization in the backend: $ llc -O0 ref_mvs-preproc.ll -o - | clang -x assembler - ; ./a.out Does that produce the wrong result? If yes, the bug is probably in the IR. If not, the bug is probably in the backend...and so r320749 is probably not the problem. (It's possible that there are multiple bugs interacting, but let's hope not...)
(In reply to Sanjay Patel from comment #1) > To narrow down the problem, we can compile the source to optimized IR: > $ clang -O3 ... -emit-llvm -S > > Then pass that IR to the backend: > $ llc ref_mvs-preproc.ll -o - | clang -x assembler - > > And run the code: > $ a.out > > Does that produce the wrong result? If yes, turn off optimization in the > backend: > $ llc -O0 ref_mvs-preproc.ll -o - | clang -x assembler - ; ./a.out > > Does that produce the wrong result? If yes, the bug is probably in the IR. > If not, the bug is probably in the backend...and so r320749 is probably not > the problem. (It's possible that there are multiple bugs interacting, but > let's hope not...) Thanks for these points! It does indeed look like this commit isn't to blame, as the issue disappears if building the IR with llc -O0. That would also explain why it only appears on AArch64. Unfortunately, that also gets me more or less back to square one about figuring out the reason for it, but I guess I'll have to strip down the optimization behaviour change further to zoom in on what might be causing it.
Created attachment 21086 [details] Reduced test case With this test case, the issue is much clearer.
After reducing the issue further, it is clear that the issue lies somewhere around AArch64 CCMP instructions. CC:ing a few people who probably know a bit or two around that. The issue seems to be that there is a condition that looks like this: (mode == 15 || mode == 20) && type > 0 && block_size_allowed After lowering this to CCMPs, the effective form of the condition turns out into this: (((block_size_allowed && mode == 20) || mode == 15) && type > 0) To reproduce, compile the attached reproduction case e.g. like this: $ clang -O2 -target aarch64-linux-gnu ref_mvs-preproc.c -o repro $ ./repro ret aaaa $ clang -O1 -target aarch64-linux-gnu ref_mvs-preproc.c -o repro $ ./repro ret bbbb The generated assembly shows the issue rather clearly: $ clang -O2 -target aarch64-linux-gnu ref_mvs-preproc.c -S -o - ... ldrb w9, [x9, x8] ldrb w8, [x10, x8] cmp w9, w8 csel w8, w9, w8, lo cmp w8, #7 // block_size_allowed ccmp w2, #20, #0, hi // block_size_allowed && mode == 20 ccmp w2, #15, #4, ne // (block_size_allowed && mode == 20) || mode == 15 ccmp w4, #0, #4, eq // ... && type > 0 csel x8, x0, x1, gt While the issue is clearly visible here, I do see in the comment for emitConditionalComparison in AArch64ISelLowering.cpp that it is prepared to handle mixed AND/OR conditions, but apparently something goes wrong here.
You cannot directly express `||` with CCMP instructions. But in principle AArch64ISelLowering.cpp should transform `(mode == 15 || mode == 20) && type > 0 && block_size_allowed` into `!(mode != 15 && mode != 20) && type > 0 && block_size_allowed` (see also the long explanation I wrote in AArch64ISelLowering (search for AArch64CCMP). It seems though something goes wrong with that computation...
I think for this to work properly we would need to emit the expression as: type > 0 && block_size_allowed && !(mode != 15 && mode != 20) (or with the type/block_size_allowed checks swapped, the important part if that the !() expression comes last in the conjunction. It seems the code fails to reorder the expressions as necessary.
I've got a fix for the issue now. Just need to cleanup the code some more and create a testcase before I can upload a review.
(In reply to Matthias Braun from comment #5) > You cannot directly express `||` with CCMP instructions. But in principle > AArch64ISelLowering.cpp should transform `(mode == 15 || mode == 20) && type > > 0 && block_size_allowed` into `!(mode != 15 && mode != 20) && type > 0 && > block_size_allowed` (see also the long explanation I wrote in > AArch64ISelLowering (search for AArch64CCMP). > > It seems though something goes wrong with that computation... Thanks for looking at this! Yes, I found these functions; but the code seemed like it already meant to take care of this and it wasn't really obvious where the assumptions went wrong.
https://reviews.llvm.org/D54137 should fix this.
(In reply to Matthias Braun from comment #9) > https://reviews.llvm.org/D54137 should fix this. I can confirm that the current version of the patch fixes this issue - thanks!
Tim, is this OK to merge to the release_70 branch? https://reviews.llvm.org/rL348444
FWIW, merging that fix also requires merging https://reviews.llvm.org/rL346203 as a preceding cleanup.
Yep, I think they're both good.
Merged: r348636 r348642