Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscompile with the PPC backend after r329047 #36577

Closed
chandlerc opened this issue Apr 25, 2018 · 20 comments
Closed

Miscompile with the PPC backend after r329047 #36577

chandlerc opened this issue Apr 25, 2018 · 20 comments
Labels
bugzilla Issues migrated from bugzilla

Comments

@chandlerc
Copy link
Member

Bugzilla Link 37229
Resolution FIXED
Resolved on May 02, 2018 19:39
Version unspecified
OS Linux
Attachments IR input, Bad assembly, Good assembly
CC @dwblaikie,@hfinkel,@mikaelholmen,@nemanjai

Extended Description

We've found that after r329047 the PPC backend appears to miscompile zlib v1.2.11 code. Specifically, the gen_bitlen function here:
https://github.com/madler/zlib/blob/master/trees.c#L486

The only functional reproducer we have sadly involves building zlib with a clang including this revision, linking that zlib against clang itself built targeting PPC, and then running one of the OpenMP Clang tests that uses PCH files that are zlib compressed. Literally no other test we have duplicates this failure. =/

However, turning -O1 off for this file causes the bug to go away, and reverting this commit produces a single interesting diff in the assembly for this file and it is within the identified function, so I'm fairly certain the issue is there.

I don't really have a normal development environment for PPC so its hard for me to really dig furhter into this, but I will attach the single-function IR file (which I've cleaned up manually, but left as much intact as possible, notably debug info so it should be possible to trace what code comes from what part of the C code in that file), and the good and bad version of the assembly.

The bad version of the assembly was generated by 'llc' running over this IR file with no extra flags. The good version had this commit reverted in the 'llc' and then run over this IR file with no extra flags.

The diff in the assembly exactly matches the diff we observe in the real build.

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 25, 2018

Comparing output of llc with -debug -print-after-all, I see differences starting from some pass called "PowerPC CTR Loops". This pass is in file called ./lib/Target/PowerPC/PPCCTRLoops.cpp. I see it making manipulations with CFG (such as blocks insertion) and working with SCEV, but it never manages to forget any loop after that. So this pass is apparently misusing SCEV.

@chandlerc
Copy link
Member Author

Comparing output of llc with -debug -print-after-all, I see differences
starting from some pass called "PowerPC CTR Loops". This pass is in file
called ./lib/Target/PowerPC/PPCCTRLoops.cpp. I see it making manipulations
with CFG (such as blocks insertion) and working with SCEV, but it never
manages to forget any loop after that. So this pass is apparently misusing
SCEV.

Not sure why?

The pass doesn't claim to preserve SCEV, so everything should get blown away afterward.

Either the pass is corrupting the SCEV it will itself query internally, or some pass before this is corrupting SCEV, or your change simply exposes a fundamental bug in how this pass queries SCEV for exit limits....

I'd look at -debug-pass=Executions and other such tools to understand whether some other pass tried to preserve the SCEV that this pass is using or if it was fresh built.

@hfinkel
Copy link
Collaborator

hfinkel commented Apr 25, 2018

Comparing output of llc with -debug -print-after-all, I see differences
starting from some pass called "PowerPC CTR Loops". This pass is in file
called ./lib/Target/PowerPC/PPCCTRLoops.cpp. I see it making manipulations
with CFG (such as blocks insertion) and working with SCEV, but it never
manages to forget any loop after that. So this pass is apparently misusing
SCEV.

FWIW, I don't believe that the pass makes any CFG modifications other than, potentially, calling:

if (!Preheader || mightUseCTR(Preheader))
Preheader = InsertPreheaderForLoop(L, DT, LI, PreserveLCSSA);
if (!Preheader)
return MadeChange;

and SCEV should be robust to having loop preheaders added to loops (i.e., that should not necessitate clearing SCEV's cache of data for the loop).

The pass does do other things that can affect SCEV's ability to understand the loop (i.e., changes the instruction providing the conditional branch condition and removing (now-dead) PHIs. SCEV's value-handles, however, should take care of updating SCEV automatically.

@chandlerc
Copy link
Member Author

FWIW, we're seeing a growing number of users reporting miscompiles of zlib on PPC breaking things.

We shouldn't leave the tree broken this way for any target, so Max, if this is going to take days to track down I think we need an interim solution. IMO, the best solution would be to go back to the old SCEV function, maybe with a flag to enable the more aggressive behavior while you and others debug this.

I mentioning this because I feel bad not reverting to green here. If this were x86, I would have long ago push hard to revert to green given our reliance on x86. I suspect a similar stance would have been taken by Android, iOS, and other ARM users. I think the fact that we have fewer PPC users actively pushing to keep trunk green is creating a false sense of "this is OK". I don't think this is OK, and I think we need to get trunk back to green right away and then continue debugging this without people having miscompiled code in the interim.

@nemanjai
Copy link
Member

I am in favour of pulling the patch out until we can figure out the problem. The fact that PPC isn't as widely used as X86 should not really compute in the decision. There are clearly clients that are affected by this bug and we should pull the patch to unblock them.

Of course, we could implement the solution suggested by Chandler if there are subsequent patches that depend on this one. If not, I don't see a compelling reason do do more work disabling this.

@hfinkel
Copy link
Collaborator

hfinkel commented Apr 25, 2018

I am in favour of pulling the patch out until we can figure out the problem.
The fact that PPC isn't as widely used as X86 should not really compute in
the decision. There are clearly clients that are affected by this bug and we
should pull the patch to unblock them.

Of course, we could implement the solution suggested by Chandler if there
are subsequent patches that depend on this one. If not, I don't see a
compelling reason do do more work disabling this.

I agree. We should revert to green.

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 26, 2018

Allright. I will revert it temporally until we understand what's going on within this pass.

@hfinkel
Copy link
Collaborator

hfinkel commented Apr 26, 2018

Allright. I will revert it temporally until we understand what's going on
within this pass.

Thanks.

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 26, 2018

The pass does do other things that can affect SCEV's ability to understand
the loop (i.e., changes the instruction providing the conditional branch
condition and removing (now-dead) PHIs. SCEV's value-handles, however,
should take care of updating SCEV automatically.

There is a modification of exit conditions, though:

Value *OldCond = CountedExitBranch->getCondition();
CountedExitBranch->setCondition(NewCond);

When we do so, we still have a cached SCEV for the old condition. This can be a source of miscompiles.

I don't have a comprehensive understanding of what's going on, though, so I will revert the patch and then investigate.

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 26, 2018

Reverted the patch as https://reviews.llvm.org/rL330893
I will investigate the reasons of the bug.

@nemanjai
Copy link
Member

Well this is bizarre. I outlined what the problem was yesterday and clicked on "Save Changes" yet the comment is not there now.

@nemanjai
Copy link
Member

Anyway, I'll type out the issue again...

First off, a quick explanation of what this pass does.
PPC has count register (CTR) based loops. They're simply loops where you populate the CTR in the preheader and the CTR-decrementing branch is used in the exit block. This pass finds loops where SCEV can compute an expression for the trip count, the result of that expression is incremented by 1 and moved into the CTR through the corresponding intrinsic. Then a suitable exit block is found for the decrementing branch and that branch is in turn added through its intrinsic.

What the issue is here is that we have roughly the following loop nest:
----Out_H
| |
| --In_H
| | |
| --In_L Outside
| |
----Out_L

Namely, the exit block for the outer loop is contained in the inner loop (outlined as a loop header here, but it can really be any exiting block in the inner loop). SCEV used to be unable to compute the trip count for this situation for a somewhat unrelated reason. With the mentioned patch, SCEV is now capable of correctly computing the trip count.
Furthermore, we will always favour nested loops in this transformation and in this case, the inner loop doesn't have a computable trip count (the exit condition depends on a loop-variant load). So we try to transform the outer loop and since SCEV gives us a trip count expression now, we happily transform it.
Of course, the only exiting block for the outer loop is where the decrementing branch must be added and it happens to be contained in the inner loop.
This is clearly not valid since now the inner loop is decrementing the CTR which is supposed to contain the trip count of the outer loop.

As far as I can see, there are two possible solutions here:

  1. Give the inner loop a preheader that will be the outer loop's exiting block
  2. Just give up on the transformation in this situation

At first glance, option 1. is obviously superior since we will now get more opportunities. However, the fact that we've only hit this issue in one super specific situation suggests that adding that transformation isn't worth the effort. Besides, it would delay re-landing of r329047 which is perfectly valid as far as I can tell. So fix according to option 2. to follow in the next comment.

@nemanjai
Copy link
Member

Patch to fix the underlying problem (tested with the test case Chandler provided):

Index: lib/Target/PowerPC/PPCCTRLoops.cpp

--- lib/Target/PowerPC/PPCCTRLoops.cpp (revision 329047)
+++ lib/Target/PowerPC/PPCCTRLoops.cpp (working copy)
@@ -572,6 +572,18 @@ bool PPCCTRLoops::convertToCTRLoop(Loop *L) {
if (SE->getTypeSizeInBits(EC->getType()) > (TM->isPPC64() ? 64 : 32))
continue;

  • // If this exiting block is contained in a nested loop, it is not eligible
  • // for insertion of the branch-and-decrement since the inner loop would
  • // end up messing up the value in the CTR.
  • bool ExitingBlockInInnerLoop = false;
  • for (auto Inner : L->getSubLoops())
  •  if (Inner->contains(*I)) {
    
  •    ExitingBlockInInnerLoop = true;
    
  •    break;
    
  •  }
    
  • if (ExitingBlockInInnerLoop)
  •  continue;
    
  • // We now have a loop-invariant count of loop iterations (which is not the
    // constant zero) for which we know that this loop will not exit via this
    // exisiting block.

@chandlerc
Copy link
Member Author

Excellent analysis. I definitely agree about just disabling the transform. In particular, messing with the CFG seems likely to cause more problems than the CTR loop is worth.

Patch to fix the underlying problem (tested with the test case Chandler
provided):

Index: lib/Target/PowerPC/PPCCTRLoops.cpp

--- lib/Target/PowerPC/PPCCTRLoops.cpp (revision 329047)
+++ lib/Target/PowerPC/PPCCTRLoops.cpp (working copy)
@@ -572,6 +572,18 @@ bool PPCCTRLoops::convertToCTRLoop(Loop *L) {
if (SE->getTypeSizeInBits(EC->getType()) > (TM->isPPC64() ? 64 : 32))
continue;

  • // If this exiting block is contained in a nested loop, it is not
    eligible
  • // for insertion of the branch-and-decrement since the inner loop would
  • // end up messing up the value in the CTR.
  • bool ExitingBlockInInnerLoop = false;
  • for (auto Inner : L->getSubLoops())
  •  if (Inner->contains(*I)) {
    
  •    ExitingBlockInInnerLoop = true;
    
  •    break;
    
  •  }
    
  • if (ExitingBlockInInnerLoop)
  •  continue;
    
  • // We now have a loop-invariant count of loop iterations (which is not
    the
    // constant zero) for which we know that this loop will not exit via
    this
    // exisiting block.

There is a much better code pattern for doing this: get the loop for the exiting block from LoopInfo. This will be the inner-most loop. If it is not == to the loop being transformed, the exiting block is in a nested loop. No need for checking each one, etc.

Send this, combined with a minimal test case for review in phab? Happy to look at it.

@hfinkel
Copy link
Collaborator

hfinkel commented Apr 26, 2018

Excellent analysis. I definitely agree about just disabling the transform.
In particular, messing with the CFG seems likely to cause more problems than
the CTR loop is worth.

Patch to fix the underlying problem (tested with the test case Chandler
provided):

Index: lib/Target/PowerPC/PPCCTRLoops.cpp

--- lib/Target/PowerPC/PPCCTRLoops.cpp (revision 329047)
+++ lib/Target/PowerPC/PPCCTRLoops.cpp (working copy)
@@ -572,6 +572,18 @@ bool PPCCTRLoops::convertToCTRLoop(Loop *L) {
if (SE->getTypeSizeInBits(EC->getType()) > (TM->isPPC64() ? 64 : 32))
continue;

  • // If this exiting block is contained in a nested loop, it is not
    eligible
  • // for insertion of the branch-and-decrement since the inner loop would
  • // end up messing up the value in the CTR.
  • bool ExitingBlockInInnerLoop = false;
  • for (auto Inner : L->getSubLoops())
  •  if (Inner->contains(*I)) {
    
  •    ExitingBlockInInnerLoop = true;
    
  •    break;
    
  •  }
    
  • if (ExitingBlockInInnerLoop)
  •  continue;
    
  • // We now have a loop-invariant count of loop iterations (which is not
    the
    // constant zero) for which we know that this loop will not exit via
    this
    // exisiting block.

There is a much better code pattern for doing this: get the loop for the
exiting block from LoopInfo. This will be the inner-most loop. If it is not
== to the loop being transformed, the exiting block is in a nested loop. No
need for checking each one, etc.

Send this, combined with a minimal test case for review in phab? Happy to
look at it.

+1 -- Thanks, Nemanja!

@nemanjai
Copy link
Member

There is a much better code pattern for doing this: get the loop for the
exiting block from LoopInfo. This will be the inner-most loop. If it is not
== to the loop being transformed, the exiting block is in a nested loop. No
need for checking each one, etc.

Send this, combined with a minimal test case for review in phab? Happy to
look at it.

Ah, cool. I guess I should have looked into the available API's :)
Thanks Chandler.
I'm working on reducing the test case and will post it when it's ready.

@nemanjai
Copy link
Member

Fix in https://reviews.llvm.org/D46162.

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 27, 2018

Great analysis Nemanja! Do I understand correctly that once this fix is merged, we can safely re-enable the reverted patch about taken count calculation?

@nemanjai
Copy link
Member

nemanjai commented May 2, 2018

Great analysis Nemanja! Do I understand correctly that once this fix is
merged, we can safely re-enable the reverted patch about taken count
calculation?

I think it is safe to assume that once this patch lands, you can safely re-commit your SCEV patch since the case Chandler provided is fixed with this patch and your original commit.

@llvmbot
Copy link
Collaborator

llvmbot commented May 3, 2018

Thanks Nemanja! I have returned the reverted patch as https://reviews.llvm.org/rL331427. Closing this one as fixed. Please let me know if anyone sees this problem again.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

4 participants