49171 – [LoopUnroll] compile-time explosion with assumes at -O3

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 49171 - [LoopUnroll] compile-time explosion with assumes at -O3

Summary: [LoopUnroll] compile-time explosion with assumes at -O3

Status:	RESOLVED FIXED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Scalar Optimizations (show other bugs)
Version:	trunk
Hardware:	PC Linux

Importance:	P normal
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:

Depends on:
Blocks:

Reported:	2021-02-13 06:52 PST by Haoxin Tu
Modified:	2021-02-16 03:07 PST (History)
CC List:	5 users (show)

See Also:
Fixed By Commit(s):

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Haoxin Tu 2021-02-13 06:52:57 PST

Hi, all.

This code, s.c, is a valid program but makes Clang hung on with -O3 at compile time.

$cat s.c
int a, b, c, d;
void e() {
  a = 7;
  for (; a <= 78; a++) {
    d = 3;
    for (; d <= 73; d++) {
      int f = 0;
      b += c;
      if (b) {
        int g = 0;
        for (f = 5; f; g)
          ;
      }
    }
  }
}

$clang -c -w -O3 s.c
//endless compiling

This problem only occurs in clang-trunk with -O3, and other released versions or optimization flags seem well while dealing with this case.

$clang -v
clang version 13.0.0 (https://github.com/llvm/llvm-project 48bdd676a1d1338c10541460bf5beb69ac17e451)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/haoxin/haoxin-data/dut-research/compilers/llvm-project/build-20210127/bin
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.5.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Candidate multilib: .;@m64
Selected multilib: .;@m64

Thanks.
Haoxin

Comment 1 Sanjay Patel 2021-02-15 09:06:35 PST

The bug seems to be in either LoopUnroll or ValueTracking's isKnownNonEqual() and has something to do with @llvm.assume() replication/analysis.

Also, this is not actually an infinite loop. If you adjust the for-loop constants down from 78/73, you can see the compile-time explosion in progress before it becomes intolerable:

time with "21, 16": user	0m3.467s
time with "22, 17": user	0m6.241s
time with "23, 18": user	0m16.959s

That's why the bug only becomes noticeable at -O3 rather than -O2. We use more aggressive unrolling parameters at O3.

cc'ing @nikic and @jdoerfert in case this might be related to https://reviews.llvm.org/D96208 ?

Comment 2 Johannes Doerfert 2021-02-15 09:33:53 PST

We generate a total of #144 assumptions which should not be enough to cause a problem by itself. 

From a quick look at this I see:
  simplifyLoopAfterUnroll
will call 
  SimplifyICmpInst
which will call
  computeKnownBits and computeKnownBitsFromAssume
a lot of times. 

I imagine we look at a decent chunk of the 144 assumes for every if(b) branch in the unrolled loop, probably in a non-linear way?

Comment 3 Nikita Popov 2021-02-15 09:40:03 PST

IR test case:

@a = dso_local global i32 0, align 4
@c = dso_local global i32 0, align 4
@b = dso_local global i32 0, align 4

define dso_local void @e() local_unnamed_addr #0 {
entry:
  store i32 7, i32* @a, align 4
  %0 = load i32, i32* @c, align 4
  %1 = mul i32 %0, 71
  %b.promoted.pre = load i32, i32* @b, align 4
  %add = sub i32 0, %0
  %tobool.not.1 = icmp eq i32 %0, 0
  br label %for.cond1.preheader

for.cond1.preheader:                              ; preds = %entry, %for.cond1.preheader
  %b.promoted = phi i32 [ %b.promoted.pre, %entry ], [ %1, %for.cond1.preheader ]
  %storemerge5 = phi i32 [ 7, %entry ], [ %inc10, %for.cond1.preheader ]
  %tobool.not = icmp eq i32 %b.promoted, %add
  call void @llvm.assume(i1 %tobool.not)
  call void @llvm.assume(i1 %tobool.not.1)
  store i32 %1, i32* @b, align 4
  %inc10 = add nuw nsw i32 %storemerge5, 1
  store i32 %inc10, i32* @a, align 4
  %exitcond.not = icmp eq i32 %inc10, 79
  br i1 %exitcond.not, label %for.end11, label %for.cond1.preheader

for.end11:                                        ; preds = %for.cond1.preheader
  ret void
}

declare void @llvm.assume(i1)

Run with -loop-unroll -unroll-threshold=300.

Comment 4 Nikita Popov 2021-02-15 10:02:26 PST

A bit smaller:

define void @test(i32 %arg) {
entry: 
  br label %loop

loop:
  %iv = phi i32 [ 0, %entry ], [ %iv.inc, %loop ]
  %iv2 = phi i32 [ 0, %entry ], [ %arg, %loop ]
  %assume = icmp eq i32 %iv2, %arg 
  call void @llvm.assume(i1 %assume)
  %iv.inc = add nuw nsw i32 %iv, 1
  %exitcond = icmp eq i32 %iv.inc, 20 ; Increase me!
  br i1 %exitcond, label %exit, label %loop

exit:
  ret void
} 
  
declare void @llvm.assume(i1)

I think that at least a large part of the problem is that we allow an unbounded scan to find non-dominating but guaranteed-to-execute assumes. We should limit that.

Comment 5 Sanjay Patel 2021-02-15 10:21:10 PST

(In reply to Nikita Popov from comment #4) 
> I think that at least a large part of the problem is that we allow an
> unbounded scan to find non-dominating but guaranteed-to-execute assumes. We
> should limit that.

Yes, that seems to work. This makes the example compile fast:

diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 9e81f84b1c5f..912b57c5f2e0 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -551,8 +551,9 @@ bool llvm::isValidAssumeForContext(const Instruction *Inv,
     // The context comes first, but they're both in the same block.
     // Make sure there is nothing in between that might interrupt
     // the control flow, not even CxtI itself.
+    unsigned ScanLimit = 10;
     for (BasicBlock::const_iterator I(CxtI), IE(Inv); I != IE; ++I)
-      if (!isGuaranteedToTransferExecutionToSuccessor(&*I))
+      if (!isGuaranteedToTransferExecutionToSuccessor(&*I) || --ScanLimit == 0)
         return false;
 
     return !isEphemeralValueOf(Inv, CxtI);

Comment 6 Sanjay Patel 2021-02-15 10:43:04 PST

Note that there's an instcombine test for that chunk of code added with:
https://reviews.llvm.org/D37215

I don't know if that test had any intent of checking the actual distance between assume and its context inst, but if we want to preserve it, we'd make the ScanLimit = 15 in my draft patch.

Let me know if I should post that on Phab and/or we want to do any compile-time-tracking experiments to find a better limit.

Comment 7 Johannes Doerfert 2021-02-15 11:14:11 PST

+1 for distance checks

If we are looking at this, we should add early exit various places or to look at `nounwind` and `willreturn` attributes of the *surrounding function*. If present, no need to look at the instructions.

Comment 8 Nikita Popov 2021-02-15 11:20:38 PST

@spatel Your patch with a limit of 15 looks fine to me. I don't think we need to do further experiments, as this is a question of worst-case complexity rather than average-case performance.

Comment 9 Sanjay Patel 2021-02-15 12:50:51 PST

(In reply to Nikita Popov from comment #8)
> @spatel Your patch with a limit of 15 looks fine to me. I don't think we
> need to do further experiments, as this is a question of worst-case
> complexity rather than average-case performance.

Thanks! Committed the quick fix here - https://reviews.llvm.org/rG378941f611ab

I think that's enough to say this bug is fixed, but reopen if I got that wrong.

I haven't been following the details of guaranteed progress discussions, so not sure how to test this...does this correctly implement the suggested check for function attributes?

diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 493eb08bb9b0..ef78847cb43e 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -548,6 +548,12 @@ bool llvm::isValidAssumeForContext(const Instruction *Inv,
     if (Inv == CxtI)
       return false;
 
+    // Use function attributes to check if execution must reach the assume.
+    const Function *F = CxtI->getFunction();
+    if (F->hasFnAttribute(Attribute::WillReturn) ||
+        F->hasFnAttribute(Attribute::NoUnwind))
+      return !isEphemeralValueOf(Inv, CxtI);
+
     // The context comes first, but they're both in the same block.
     // Make sure there is nothing in between that might interrupt
     // the control flow, not even CxtI itself.

Comment 10 Haoxin Tu 2021-02-16 03:07:06 PST

Thanks for the quick fixing!

Cheers,
Haoxin