Hi, all. This code, s.c, is a valid program but makes Clang hung on with -O3 at compile time. $cat s.c int a, b, c, d; void e() { a = 7; for (; a <= 78; a++) { d = 3; for (; d <= 73; d++) { int f = 0; b += c; if (b) { int g = 0; for (f = 5; f; g) ; } } } } $clang -c -w -O3 s.c //endless compiling This problem only occurs in clang-trunk with -O3, and other released versions or optimization flags seem well while dealing with this case. $clang -v clang version 13.0.0 (https://github.com/llvm/llvm-project 48bdd676a1d1338c10541460bf5beb69ac17e451) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /home/haoxin/haoxin-data/dut-research/compilers/llvm-project/build-20210127/bin Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/8 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.5.0 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8 Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8 Candidate multilib: .;@m64 Selected multilib: .;@m64 Thanks. Haoxin
The bug seems to be in either LoopUnroll or ValueTracking's isKnownNonEqual() and has something to do with @llvm.assume() replication/analysis. Also, this is not actually an infinite loop. If you adjust the for-loop constants down from 78/73, you can see the compile-time explosion in progress before it becomes intolerable: time with "21, 16": user 0m3.467s time with "22, 17": user 0m6.241s time with "23, 18": user 0m16.959s That's why the bug only becomes noticeable at -O3 rather than -O2. We use more aggressive unrolling parameters at O3. cc'ing @nikic and @jdoerfert in case this might be related to https://reviews.llvm.org/D96208 ?
We generate a total of #144 assumptions which should not be enough to cause a problem by itself. From a quick look at this I see: simplifyLoopAfterUnroll will call SimplifyICmpInst which will call computeKnownBits and computeKnownBitsFromAssume a lot of times. I imagine we look at a decent chunk of the 144 assumes for every if(b) branch in the unrolled loop, probably in a non-linear way?
IR test case: @a = dso_local global i32 0, align 4 @c = dso_local global i32 0, align 4 @b = dso_local global i32 0, align 4 define dso_local void @e() local_unnamed_addr #0 { entry: store i32 7, i32* @a, align 4 %0 = load i32, i32* @c, align 4 %1 = mul i32 %0, 71 %b.promoted.pre = load i32, i32* @b, align 4 %add = sub i32 0, %0 %tobool.not.1 = icmp eq i32 %0, 0 br label %for.cond1.preheader for.cond1.preheader: ; preds = %entry, %for.cond1.preheader %b.promoted = phi i32 [ %b.promoted.pre, %entry ], [ %1, %for.cond1.preheader ] %storemerge5 = phi i32 [ 7, %entry ], [ %inc10, %for.cond1.preheader ] %tobool.not = icmp eq i32 %b.promoted, %add call void @llvm.assume(i1 %tobool.not) call void @llvm.assume(i1 %tobool.not.1) store i32 %1, i32* @b, align 4 %inc10 = add nuw nsw i32 %storemerge5, 1 store i32 %inc10, i32* @a, align 4 %exitcond.not = icmp eq i32 %inc10, 79 br i1 %exitcond.not, label %for.end11, label %for.cond1.preheader for.end11: ; preds = %for.cond1.preheader ret void } declare void @llvm.assume(i1) Run with -loop-unroll -unroll-threshold=300.
A bit smaller: define void @test(i32 %arg) { entry: br label %loop loop: %iv = phi i32 [ 0, %entry ], [ %iv.inc, %loop ] %iv2 = phi i32 [ 0, %entry ], [ %arg, %loop ] %assume = icmp eq i32 %iv2, %arg call void @llvm.assume(i1 %assume) %iv.inc = add nuw nsw i32 %iv, 1 %exitcond = icmp eq i32 %iv.inc, 20 ; Increase me! br i1 %exitcond, label %exit, label %loop exit: ret void } declare void @llvm.assume(i1) I think that at least a large part of the problem is that we allow an unbounded scan to find non-dominating but guaranteed-to-execute assumes. We should limit that.
(In reply to Nikita Popov from comment #4) > I think that at least a large part of the problem is that we allow an > unbounded scan to find non-dominating but guaranteed-to-execute assumes. We > should limit that. Yes, that seems to work. This makes the example compile fast: diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp index 9e81f84b1c5f..912b57c5f2e0 100644 --- a/llvm/lib/Analysis/ValueTracking.cpp +++ b/llvm/lib/Analysis/ValueTracking.cpp @@ -551,8 +551,9 @@ bool llvm::isValidAssumeForContext(const Instruction *Inv, // The context comes first, but they're both in the same block. // Make sure there is nothing in between that might interrupt // the control flow, not even CxtI itself. + unsigned ScanLimit = 10; for (BasicBlock::const_iterator I(CxtI), IE(Inv); I != IE; ++I) - if (!isGuaranteedToTransferExecutionToSuccessor(&*I)) + if (!isGuaranteedToTransferExecutionToSuccessor(&*I) || --ScanLimit == 0) return false; return !isEphemeralValueOf(Inv, CxtI);
Note that there's an instcombine test for that chunk of code added with: https://reviews.llvm.org/D37215 I don't know if that test had any intent of checking the actual distance between assume and its context inst, but if we want to preserve it, we'd make the ScanLimit = 15 in my draft patch. Let me know if I should post that on Phab and/or we want to do any compile-time-tracking experiments to find a better limit.
+1 for distance checks If we are looking at this, we should add early exit various places or to look at `nounwind` and `willreturn` attributes of the *surrounding function*. If present, no need to look at the instructions.
@spatel Your patch with a limit of 15 looks fine to me. I don't think we need to do further experiments, as this is a question of worst-case complexity rather than average-case performance.
(In reply to Nikita Popov from comment #8) > @spatel Your patch with a limit of 15 looks fine to me. I don't think we > need to do further experiments, as this is a question of worst-case > complexity rather than average-case performance. Thanks! Committed the quick fix here - https://reviews.llvm.org/rG378941f611ab I think that's enough to say this bug is fixed, but reopen if I got that wrong. I haven't been following the details of guaranteed progress discussions, so not sure how to test this...does this correctly implement the suggested check for function attributes? diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp index 493eb08bb9b0..ef78847cb43e 100644 --- a/llvm/lib/Analysis/ValueTracking.cpp +++ b/llvm/lib/Analysis/ValueTracking.cpp @@ -548,6 +548,12 @@ bool llvm::isValidAssumeForContext(const Instruction *Inv, if (Inv == CxtI) return false; + // Use function attributes to check if execution must reach the assume. + const Function *F = CxtI->getFunction(); + if (F->hasFnAttribute(Attribute::WillReturn) || + F->hasFnAttribute(Attribute::NoUnwind)) + return !isEphemeralValueOf(Inv, CxtI); + // The context comes first, but they're both in the same block. // Make sure there is nothing in between that might interrupt // the control flow, not even CxtI itself.
Thanks for the quick fixing! Cheers, Haoxin