Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JumpThreading miscompilation in presence of @​llvm.assume due to DomTree not being updated *between* iterations #35481

Closed
aelovikov-intel opened this issue Jan 29, 2018 · 19 comments
Assignees
Labels
bugzilla Issues migrated from bugzilla

Comments

@aelovikov-intel
Copy link
Contributor

Bugzilla Link 36133
Resolution FIXED
Resolved on Feb 19, 2018 11:49
Version trunk
OS Windows NT
CC @bmrzycki,@efriedma-quic,@mikaelholmen
@aelovikov-intel
Copy link
Contributor Author

assigned to @bmrzycki

@aelovikov-intel
Copy link
Contributor Author

With the following LLVM IR:

@​global = external local_unnamed_addr global i8*, align 8

define i32 @​foo(i32 %arg) local_unnamed_addr #​0 {
bb:
%tmp = load i8*, i8** @​global, align 8
%tmp1 = icmp eq i8* %tmp, null
br i1 %tmp1, label %bb3, label %bb2

bb2:
br label %bb3

bb3:
%tmp4 = phi i8 [ 1, %bb2 ], [ 0, %bb ]
%tmp5 = icmp eq i8 %tmp4, 0
br i1 %tmp5, label %bb7, label %bb6

bb6:
br label %bb7

bb7:
%tmp8 = icmp eq i32 %arg, -1
br i1 %tmp8, label %bb9, label %bb10 ; Does not depend on the content of @​global, only %arg

bb9: ; Single pred - %bb7
ret i32 0

bb10: ; Single pred - %bb7
%tmp11 = icmp sgt i32 %arg, -1
call void @​llvm.assume(i1 %tmp11)
ret i32 1
}

; Function Attrs: nounwind
declare void @​llvm.assume(i1) #​1

attributes #​0 = { uwtable }
attributes #​1 = { nounwind }

Running opt -jump-threading -S < a.ll I got
@​global = external local_unnamed_addr global i8*, align 8

; Function Attrs: uwtable
define i32 @​foo(i32 %arg) local_unnamed_addr #​0 {
bb:
%tmp = load i8*, i8** @​global, align 8
%tmp1 = icmp eq i8* %tmp, null
br i1 %tmp1, label %bb10, label %bb7 ; This now bypasses %arg check directly going to one of the possible rets

bb7:
%tmp8 = icmp eq i32 %arg, -1
br i1 %tmp8, label %bb9, label %bb10

bb9:
ret i32 0

bb10:
%tmp11 = icmp sgt i32 %arg, -1
call void @​llvm.assume(i1 %tmp11)
ret i32 1
}

; Function Attrs: nounwind
declare void @​llvm.assume(i1) #​1

attributes #​0 = { uwtable }
attributes #​1 = { nounwind }

I've tracked it down to JumpThreading and LVI interaction when LVI checks if the call to @​llvm.assume dominates the condition that JT asks for. However, due to stale DomTree info the answer turns out to be wrong.

Changing JumpThreading like this:
--- a/lib/Transforms/Scalar/JumpThreading.cpp
+++ b/lib/Transforms/Scalar/JumpThreading.cpp
@@ -375,8 +375,10 @@ bool JumpThreadingPass::runImpl(Function &F, TargetLibraryInfo *TLI_,
for (Function::iterator I = F.begin(), E = F.end(); I != E;) {
BasicBlock *BB = &*I;
// Thread all of the branches we can over this block.

  •  while (ProcessBlock(BB))
    
  •  while (ProcessBlock(BB)) {
    
  •    DDT->flush();
       Changed = true;
    
  •  }
    
     ++I;
    

Fixes the miscompilation but is going to affect compile time a lot (as noted in https://reviews.llvm.org/D34135#979097)

@bmrzycki
Copy link
Contributor

Hello Andrei, thank you for reducing this problem down.

I agree with your assessment that the proposed patch here will negatively impact performance. It would be best to trace down ProcessBlock() to determine exactly which LVI check is being incorrectly calculated. I will start GDB tracing this to determine if I can find it

@aelovikov-intel
Copy link
Contributor Author

It would be best to trace down ProcessBlock() to determine exactly which LVI check is being incorrectly calculated. I will start GDB tracing this to determine if I can find it

I'm not sure I fully understand what you mean, but maybe this will help:

// In ValueTracking.cpp:
bool llvm::isValidAssumeForContext(const Instruction *Inv,
const Instruction *CxtI,
const DominatorTree *DT) {
// There are two restrictions on the use of an assume:
// 1. The assume must dominate the context (or the control flow must
// reach the assume whenever it reaches the context).
// 2. The context must not be in the assume's set of ephemeral values
// (otherwise we will use the assume to prove that the condition
// feeding the assume is trivially true, thus causing the removal of
// the assume).

if (DT) {
if (DT->dominates(Inv, CxtI)) // IIRC it failed here.
return true;
} else if (Inv->getParent() == CxtI->getParent()->getSinglePredecessor()) {
// We don't have a DT, but this trivially dominates.
return true;
}

which was called from
// If we can determine a constraint on the value given conditions assumed by
// the program, intersect those constraints with BBLV
void LazyValueInfoImpl::intersectAssumeOrGuardBlockValueConstantRange(
Value *Val, ValueLatticeElement &BBLV, Instruction *BBI) {
BBI = BBI ? BBI : dyn_cast(Val);
if (!BBI)
return;

for (auto &AssumeVH : AC->assumptionsFor(Val)) {
if (!AssumeVH)
continue;
auto *I = cast(AssumeVH);
if (!isValidAssumeForContext(I, BBI, DT))
continue;

BBLV = intersect(BBLV, getValueFromCondition(Val, I->getArgOperand(0)));

}

@bmrzycki
Copy link
Contributor

Hi Andrei, I appreciate the function listings to examine, they will help.

I don't think this problem is caused by the DDT patch. On my Ubuntu 16.04 host In my testing this problem still occurs in a release version of LLVM 5.0 installed as a debian package. Here are the release compiler versions used for testing:
$ dpkg -l | grep llvm-
ii llvm-3.8 1:3.8-2ubuntu4 amd64 Modular compiler and toolchain technologies
ii llvm-3.8-dev 1:3.8-2ubuntu4 amd64 Modular compiler and toolchain technologies, libraries and headers
ii llvm-3.8-runtime 1:3.8-2ubuntu4 amd64 Modular compiler and toolchain technologies, IR interpreter
ii llvm-5.0 1:5.0-316.04.1 amd64 Modular compiler and toolchain technologies
ii llvm-5.0-dev 1:5.0-3
16.04.1 amd64 Modular compiler and toolchain technologies, libraries and headers
ii llvm-5.0-runtime 1:5.0-3~16.04.1 amd64 Modular compiler and toolchain technologies, IR interpreter

Here is the llvm-project tip SHA used for tip:
commit 3c7922e (HEAD -> master, origin/master, origin/HEAD)
Author: Alexey Bataev a.bataev@hotmail.com
Date: Mon Jan 29 15:56:52 2018 +0000

[SLP] Add a test with extract for llvm/llvm-project#31434 , NFC.

I am attaching all the files I used to test for you to verify their contents. I had to create two different input.ll files due to changes to the IR format between 3.8 and 5.0. When I tried to compile the 5.0 IR with 3.8 I received errors about source_filename and the local_unnamed_addr attribute.

Here's how I generated the JumpThreaded output IR:
$ opt-3.8 -S -jump-threading <input38.ll >jt38.ll
$ opt-5.0 -S -jump-threading <input.ll >jt50.ll
$ /work/brzycki/upstream/build/bin/opt -S -jump-threading <input.ll >jttip.ll

The diff of tip and 5.0 is identical:
$ diff -u jttip.ll jt50.ll
$

The diff of 5.0 and 3.8 is not:
$ diff -u jt50.ll jt38.ll
--- jt50.ll 2018-01-29 11:09:44.296202144 -0600
+++ jt38.ll 2018-01-29 11:09:34.512215182 -0600
@@ -1,23 +1,22 @@
; ModuleID = ''
-source_filename = ""

-@global = external local_unnamed_addr global i8*, align 8
+@global = external global i8*, align 8

; Function Attrs: uwtable
-define i32 @​foo(i32 %arg) local_unnamed_addr #​0 {
+define i32 @​foo(i32 %arg) #​0 {
bb:
%tmp = load i8*, i8** @​global, align 8
%tmp1 = icmp eq i8* %tmp, null

  • br i1 %tmp1, label %bb10, label %bb7
  • br i1 %tmp1, label %bb7, label %bb7

-bb7: ; preds = %bb
+bb7: ; preds = %bb, %bb
%tmp8 = icmp eq i32 %arg, -1
br i1 %tmp8, label %bb9, label %bb10

bb9: ; preds = %bb7
ret i32 0

-bb10: ; preds = %bb, %bb7
+bb10: ; preds = %bb7
%tmp11 = icmp sgt i32 %arg, -1
call void @​llvm.assume(i1 %tmp11)
ret i32 1

@bmrzycki
Copy link
Contributor

@bmrzycki
Copy link
Contributor

@bmrzycki
Copy link
Contributor

LLVM tip (SHA 3c7922ee800de037147f3a2f801c410815043f07) JumpThreading
Compiled with $ opt -S -jump-threading <input.ll >jttip.ll

@bmrzycki
Copy link
Contributor

LLVM 5.0 JumpThreading
Compiled with $ opt -S -jump-threading <input.ll >jt50.ll

@bmrzycki
Copy link
Contributor

LLVM 3.8 JumpThreading
Compiled with $ opt -S -jump-threading <input38.ll >jt38.ll

@aelovikov-intel
Copy link
Contributor Author

I don't think this problem is caused by the DDT patch.

I never wanted to say that, sorry if that was not clear.

The reason I included you (and mentioned D34135) is because DDT supposes that DomTree information is only updated after the pass which is not the case neither for unreachable BBs created by JT nor for this defect. My impression was the direction of the discussion related to dead BBs is to workaround it somehow without proper update of DomTree between iterations. I though that another test case that has the same root cause but does not involve unreachable BBs might be useful to consider.

@bmrzycki
Copy link
Contributor

I never wanted to say that, sorry if that was not clear.
No apologies necessary. :) I just wanted to document my finding that this is looks to be a testing escape that is fairly old.

I though that another test case that has the same root cause but does not involve unreachable BBs might be useful to consider.
I absolutely agree. Any miscompile needs to be reported to fix correctness issues.

I am still looking into this bug to see if I can add any more insights.

@bmrzycki
Copy link
Contributor

Through bisection I believe I have found the commit causing the problem:

commit 5f9df53
Author: Jun Bum Lim junbuml@codeaurora.org
Date: Wed Mar 8 15:22:30 2017 +0000

[JumpThread] Use AA in SimplifyPartiallyRedundantLoad()

Summary: Use AA when scanning to find an available load value.

Reviewers: rengolin, mcrosier, hfinkel, trentxintong, dberlin

Reviewed By: rengolin, dberlin

Subscribers: aemerson, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D30352

And here is the immediately preceding commit:

commit a2f4a537f655fd9f543dd462049f47e6c1dfba99
Author: Daniel Marjamaki daniel.marjamaki@evidente.se
Date: Wed Mar 8 15:22:24 2017 +0000

[analyzer] Clarify 'uninitialized function argument' messages

Differential Revision: https://reviews.llvm.org/D30341

( The SHAs come from the unified git repo https://github.com/llvm-project/llvm-project-20170507.git )

Here is the output of the commit just before commit a2f4a537f655fd9f543dd462049f47e6c1dfba99
Author: Daniel Marjamaki daniel.marjamaki@evidente.se
Date: Wed Mar 8 15:22:24 2017 +0000

[analyzer] Clarify 'uninitialized function argument' messages

Differential Revision: https://reviews.llvm.org/D30341

The analysis:

$ /work/brzycki/bisect_817f15e61e475cd04122/build/bin/opt -S -jump-threading input38.ll >good.ll
$ diff -u jt38.ll good.ll
--- jt38.ll 2018-01-29 11:09:34.512215182 -0600
+++ good.ll 2018-01-29 18:16:29.621450557 -0600
@@ -1,4 +1,5 @@
-; ModuleID = ''
+; ModuleID = 'input38.ll'
+source_filename = "input38.ll"

@​global = external global i8*, align 8

$ /work/brzycki/bisect_5f9df53e57e8e81d42cf/build/bin/opt -S -jump-threading input38.ll > bad.ll
$ diff -u jt38.ll bad.ll
--- jt38.ll 2018-01-29 11:09:34.512215182 -0600
+++ bad.ll 2018-01-29 18:16:05.249481707 -0600
@@ -1,4 +1,5 @@
-; ModuleID = ''
+; ModuleID = 'input38.ll'
+source_filename = "input38.ll"

@​global = external global i8*, align 8

@@ -7,16 +8,16 @@
bb:
%tmp = load i8*, i8** @​global, align 8
%tmp1 = icmp eq i8* %tmp, null

  • br i1 %tmp1, label %bb7, label %bb7
  • br i1 %tmp1, label %bb10, label %bb7

-bb7: ; preds = %bb, %bb
+bb7: ; preds = %bb
%tmp8 = icmp eq i32 %arg, -1
br i1 %tmp8, label %bb9, label %bb10

bb9: ; preds = %bb7
ret i32 0

-bb10: ; preds = %bb7
+bb10: ; preds = %bb, %bb7
%tmp11 = icmp sgt i32 %arg, -1
call void @​llvm.assume(i1 %tmp11)
ret i32 1

@bmrzycki
Copy link
Contributor

Correction, here are the correct SHAs from (https://github.com/llvm-project/llvm-project-20170507.git):

commit 5f9df53
Author: Jun Bum Lim junbuml@codeaurora.org
Date: Wed Mar 8 15:22:30 2017 +0000

[JumpThread] Use AA in SimplifyPartiallyRedundantLoad()

Summary: Use AA when scanning to find an available load value.

Reviewers: rengolin, mcrosier, hfinkel, trentxintong, dberlin

Reviewed By: rengolin, dberlin

Subscribers: aemerson, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D30352

Notes:
git-svn-rev: 297284

commit 817f15e
Author: Daniel Marjamaki daniel.marjamaki@evidente.se
Date: Wed Mar 8 15:22:24 2017 +0000

[analyzer] Clarify 'uninitialized function argument' messages

Differential Revision: https://reviews.llvm.org/D30341

Notes:
git-svn-rev: 297283

@bmrzycki
Copy link
Contributor

Here is the problem written in c code. It's easier to see the problem:

#define true 1
#define false 0
typedef char bool;

extern char *global;
extern void llvm_assume(bool);

/* ORIGINAL */
int input38(int arg) {
char *tmp = global;
bool tmp11 = false;
bb:
if (!tmp)
goto bb7;
else
goto bb7;
bb7:
if (arg == -1)
return 0;
goto bb10;
bb10:
if (arg > -1)
tmp11 = true;
llvm_assume(tmp11);
return 1;
}

/* CORRECT: The state of global does not alter the control flow. */
int jt38(int arg) {
char tmp = global;
bool tmp11 = false;
/
tmp check unnecessary but isn't removed */
if ((tmp || !tmp) && (arg == -1))
return 0;
if (arg > -1)
tmp11 = true;
llvm_assume(tmp11);
return 1;
}

/* INCORRECT: The state of global must not alter the control flow. */
int jt50(int arg) {
char *tmp = global;
bool tmp11 = false;
if (tmp && (arg == -1))
return 0;
if (arg > -1)
tmp11 = true;
llvm_assume(tmp11);
return 1;
}

The JumpThreaded 5.0+ version incorrectly threads a jump from bb to bb10 in the case global tmp pointer is null.

@bmrzycki
Copy link
Contributor

There are two things going on here. The first is that the commit I mentioned causes the problem. The second is that the problem manifests differently with LLVM tip.

I was incorrect. In the case of tip there does exist a problem between DDT, JumpThreading, and LVI.

In JumpThreading we have the following code:

// Get DT analysis before LVI. When LVI is initialized it conditionally adds
// DT if it's available.
auto DT = &getAnalysis().getDomTree();
auto LVI = &getAnalysis().getLVI();

What happens here is LVI is initialized with its own local handle to the global DominatorTree structure. This means any time we use LVI for analysis, and that analysis relies on its DT, and JT has pending updates to the DT inside of DDT we have the potential for incorrect analysis.

In this test case JT begins analysis of bb7 and enters ComputeValueKnownInPredecessors(). The dependant value V is "icmp sgt i32 %arg, -1". In that function we eventually get to the following code:

      // If the value is known by LazyValueInfo to be a constant in a
      // predecessor, use that information to try to thread this block.
      LazyValueInfo::Tristate Res =
        LVI->getPredicateOnEdge(Pred, CmpLHS,
                                CmpConst, P, BB, CxtI ? CxtI : Cmp);
      if (Res == LazyValueInfo::Unknown)
        continue;

The underlying implementation of getPredicateOnEdge() uses DT to determine if "%arg" is an integer and incorrectly says it is. This causes JumpThreading to thread across bb7 into bb10 from bb3, bypassing the mandatory check of (arg == -1).

The fix requires flushing the DDT whenever LVI analysis is required. I will post a patch to Phabricator shortly.

@bmrzycki
Copy link
Contributor

@bmrzycki
Copy link
Contributor

Fix pushed to

https://reviews.llvm.org/rL325356

@dotdash
Copy link
Contributor

dotdash commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#36386

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

3 participants