-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Miscompile with the PPC backend after r329047 #36577
Comments
Comparing output of llc with -debug -print-after-all, I see differences starting from some pass called "PowerPC CTR Loops". This pass is in file called ./lib/Target/PowerPC/PPCCTRLoops.cpp. I see it making manipulations with CFG (such as blocks insertion) and working with SCEV, but it never manages to forget any loop after that. So this pass is apparently misusing SCEV. |
Not sure why? The pass doesn't claim to preserve SCEV, so everything should get blown away afterward. Either the pass is corrupting the SCEV it will itself query internally, or some pass before this is corrupting SCEV, or your change simply exposes a fundamental bug in how this pass queries SCEV for exit limits.... I'd look at -debug-pass=Executions and other such tools to understand whether some other pass tried to preserve the SCEV that this pass is using or if it was fresh built. |
FWIW, I don't believe that the pass makes any CFG modifications other than, potentially, calling: if (!Preheader || mightUseCTR(Preheader)) and SCEV should be robust to having loop preheaders added to loops (i.e., that should not necessitate clearing SCEV's cache of data for the loop). The pass does do other things that can affect SCEV's ability to understand the loop (i.e., changes the instruction providing the conditional branch condition and removing (now-dead) PHIs. SCEV's value-handles, however, should take care of updating SCEV automatically. |
FWIW, we're seeing a growing number of users reporting miscompiles of zlib on PPC breaking things. We shouldn't leave the tree broken this way for any target, so Max, if this is going to take days to track down I think we need an interim solution. IMO, the best solution would be to go back to the old SCEV function, maybe with a flag to enable the more aggressive behavior while you and others debug this. I mentioning this because I feel bad not reverting to green here. If this were x86, I would have long ago push hard to revert to green given our reliance on x86. I suspect a similar stance would have been taken by Android, iOS, and other ARM users. I think the fact that we have fewer PPC users actively pushing to keep trunk green is creating a false sense of "this is OK". I don't think this is OK, and I think we need to get trunk back to green right away and then continue debugging this without people having miscompiled code in the interim. |
I am in favour of pulling the patch out until we can figure out the problem. The fact that PPC isn't as widely used as X86 should not really compute in the decision. There are clearly clients that are affected by this bug and we should pull the patch to unblock them. Of course, we could implement the solution suggested by Chandler if there are subsequent patches that depend on this one. If not, I don't see a compelling reason do do more work disabling this. |
I agree. We should revert to green. |
Allright. I will revert it temporally until we understand what's going on within this pass. |
Thanks. |
There is a modification of exit conditions, though: Value *OldCond = CountedExitBranch->getCondition(); When we do so, we still have a cached SCEV for the old condition. This can be a source of miscompiles. I don't have a comprehensive understanding of what's going on, though, so I will revert the patch and then investigate. |
Reverted the patch as https://reviews.llvm.org/rL330893 |
Well this is bizarre. I outlined what the problem was yesterday and clicked on "Save Changes" yet the comment is not there now. |
Anyway, I'll type out the issue again... First off, a quick explanation of what this pass does. What the issue is here is that we have roughly the following loop nest: Namely, the exit block for the outer loop is contained in the inner loop (outlined as a loop header here, but it can really be any exiting block in the inner loop). SCEV used to be unable to compute the trip count for this situation for a somewhat unrelated reason. With the mentioned patch, SCEV is now capable of correctly computing the trip count. As far as I can see, there are two possible solutions here:
At first glance, option 1. is obviously superior since we will now get more opportunities. However, the fact that we've only hit this issue in one super specific situation suggests that adding that transformation isn't worth the effort. Besides, it would delay re-landing of r329047 which is perfectly valid as far as I can tell. So fix according to option 2. to follow in the next comment. |
Patch to fix the underlying problem (tested with the test case Chandler provided): Index: lib/Target/PowerPC/PPCCTRLoops.cpp--- lib/Target/PowerPC/PPCCTRLoops.cpp (revision 329047)
|
Excellent analysis. I definitely agree about just disabling the transform. In particular, messing with the CFG seems likely to cause more problems than the CTR loop is worth.
There is a much better code pattern for doing this: get the loop for the exiting block from LoopInfo. This will be the inner-most loop. If it is not == to the loop being transformed, the exiting block is in a nested loop. No need for checking each one, etc. Send this, combined with a minimal test case for review in phab? Happy to look at it. |
+1 -- Thanks, Nemanja! |
Ah, cool. I guess I should have looked into the available API's :) |
Fix in https://reviews.llvm.org/D46162. |
Great analysis Nemanja! Do I understand correctly that once this fix is merged, we can safely re-enable the reverted patch about taken count calculation? |
I think it is safe to assume that once this patch lands, you can safely re-commit your SCEV patch since the case Chandler provided is fixed with this patch and your original commit. |
Thanks Nemanja! I have returned the reverted patch as https://reviews.llvm.org/rL331427. Closing this one as fixed. Please let me know if anyone sees this problem again. |
Extended Description
We've found that after r329047 the PPC backend appears to miscompile zlib v1.2.11 code. Specifically, the gen_bitlen function here:
https://github.com/madler/zlib/blob/master/trees.c#L486
The only functional reproducer we have sadly involves building zlib with a clang including this revision, linking that zlib against clang itself built targeting PPC, and then running one of the OpenMP Clang tests that uses PCH files that are zlib compressed. Literally no other test we have duplicates this failure. =/
However, turning -O1 off for this file causes the bug to go away, and reverting this commit produces a single interesting diff in the assembly for this file and it is within the identified function, so I'm fairly certain the issue is there.
I don't really have a normal development environment for PPC so its hard for me to really dig furhter into this, but I will attach the single-function IR file (which I've cleaned up manually, but left as much intact as possible, notably debug info so it should be possible to trace what code comes from what part of the C code in that file), and the good and bad version of the assembly.
The bad version of the assembly was generated by 'llc' running over this IR file with no extra flags. The good version had this commit reverted in the 'llc' and then run over this IR file with no extra flags.
The diff in the assembly exactly matches the diff we observe in the real build.
The text was updated successfully, but these errors were encountered: