LLVM  9.0.0svn
LoopFuse.cpp
Go to the documentation of this file.
1 //===- LoopFuse.cpp - Loop Fusion Pass ------------------------------------===//
2 //
3 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4 // See https://llvm.org/LICENSE.txt for license information.
5 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6 //
7 //===----------------------------------------------------------------------===//
8 ///
9 /// \file
10 /// This file implements the loop fusion pass.
11 /// The implementation is largely based on the following document:
12 ///
13 /// Code Transformations to Augment the Scope of Loop Fusion in a
14 /// Production Compiler
15 /// Christopher Mark Barton
16 /// MSc Thesis
17 /// https://webdocs.cs.ualberta.ca/~amaral/thesis/ChristopherBartonMSc.pdf
18 ///
19 /// The general approach taken is to collect sets of control flow equivalent
20 /// loops and test whether they can be fused. The necessary conditions for
21 /// fusion are:
22 /// 1. The loops must be adjacent (there cannot be any statements between
23 /// the two loops).
24 /// 2. The loops must be conforming (they must execute the same number of
25 /// iterations).
26 /// 3. The loops must be control flow equivalent (if one loop executes, the
27 /// other is guaranteed to execute).
28 /// 4. There cannot be any negative distance dependencies between the loops.
29 /// If all of these conditions are satisfied, it is safe to fuse the loops.
30 ///
31 /// This implementation creates FusionCandidates that represent the loop and the
32 /// necessary information needed by fusion. It then operates on the fusion
33 /// candidates, first confirming that the candidate is eligible for fusion. The
34 /// candidates are then collected into control flow equivalent sets, sorted in
35 /// dominance order. Each set of control flow equivalent candidates is then
36 /// traversed, attempting to fuse pairs of candidates in the set. If all
37 /// requirements for fusion are met, the two candidates are fused, creating a
38 /// new (fused) candidate which is then added back into the set to consider for
39 /// additional fusion.
40 ///
41 /// This implementation currently does not make any modifications to remove
42 /// conditions for fusion. Code transformations to make loops conform to each of
43 /// the conditions for fusion are discussed in more detail in the document
44 /// above. These can be added to the current implementation in the future.
45 //===----------------------------------------------------------------------===//
46 
48 #include "llvm/ADT/Statistic.h"
51 #include "llvm/Analysis/LoopInfo.h"
56 #include "llvm/IR/Function.h"
57 #include "llvm/IR/Verifier.h"
58 #include "llvm/Pass.h"
59 #include "llvm/Support/Debug.h"
61 #include "llvm/Transforms/Scalar.h"
62 #include "llvm/Transforms/Utils.h"
64 
65 using namespace llvm;
66 
67 #define DEBUG_TYPE "loop-fusion"
68 
69 STATISTIC(FuseCounter, "Count number of loop fusions performed");
70 STATISTIC(NumFusionCandidates, "Number of candidates for loop fusion");
71 STATISTIC(InvalidPreheader, "Loop has invalid preheader");
72 STATISTIC(InvalidHeader, "Loop has invalid header");
73 STATISTIC(InvalidExitingBlock, "Loop has invalid exiting blocks");
74 STATISTIC(InvalidExitBlock, "Loop has invalid exit block");
75 STATISTIC(InvalidLatch, "Loop has invalid latch");
76 STATISTIC(InvalidLoop, "Loop is invalid");
77 STATISTIC(AddressTakenBB, "Basic block has address taken");
78 STATISTIC(MayThrowException, "Loop may throw an exception");
79 STATISTIC(ContainsVolatileAccess, "Loop contains a volatile access");
80 STATISTIC(NotSimplifiedForm, "Loop is not in simplified form");
81 STATISTIC(InvalidDependencies, "Dependencies prevent fusion");
82 STATISTIC(InvalidTripCount,
83  "Loop does not have invariant backedge taken count");
84 STATISTIC(UncomputableTripCount, "SCEV cannot compute trip count of loop");
85 STATISTIC(NonEqualTripCount, "Candidate trip counts are not the same");
86 STATISTIC(NonAdjacent, "Candidates are not adjacent");
87 STATISTIC(NonEmptyPreheader, "Candidate has a non-empty preheader");
88 
93 };
94 
96  "loop-fusion-dependence-analysis",
97  cl::desc("Which dependence analysis should loop fusion use?"),
99  "Use the scalar evolution interface"),
101  "Use the dependence analysis interface"),
103  "Use all available analyses")),
105 
106 #ifndef NDEBUG
107 static cl::opt<bool>
108  VerboseFusionDebugging("loop-fusion-verbose-debug",
109  cl::desc("Enable verbose debugging for Loop Fusion"),
111 #endif
112 
113 /// This class is used to represent a candidate for loop fusion. When it is
114 /// constructed, it checks the conditions for loop fusion to ensure that it
115 /// represents a valid candidate. It caches several parts of a loop that are
116 /// used throughout loop fusion (e.g., loop preheader, loop header, etc) instead
117 /// of continually querying the underlying Loop to retrieve these values. It is
118 /// assumed these will not change throughout loop fusion.
119 ///
120 /// The invalidate method should be used to indicate that the FusionCandidate is
121 /// no longer a valid candidate for fusion. Similarly, the isValid() method can
122 /// be used to ensure that the FusionCandidate is still valid for fusion.
124  /// Cache of parts of the loop used throughout loop fusion. These should not
125  /// need to change throughout the analysis and transformation.
126  /// These parts are cached to avoid repeatedly looking up in the Loop class.
127 
128  /// Preheader of the loop this candidate represents
130  /// Header of the loop this candidate represents
132  /// Blocks in the loop that exit the loop
134  /// The successor block of this loop (where the exiting blocks go to)
136  /// Latch of the loop
138  /// The loop that this fusion candidate represents
139  Loop *L;
140  /// Vector of instructions in this loop that read from memory
142  /// Vector of instructions in this loop that write to memory
144  /// Are all of the members of this fusion candidate still valid
145  bool Valid;
146 
147  /// Dominator and PostDominator trees are needed for the
148  /// FusionCandidateCompare function, required by FusionCandidateSet to
149  /// determine where the FusionCandidate should be inserted into the set. These
150  /// are used to establish ordering of the FusionCandidates based on dominance.
153 
155  const PostDominatorTree *PDT)
156  : Preheader(L->getLoopPreheader()), Header(L->getHeader()),
157  ExitingBlock(L->getExitingBlock()), ExitBlock(L->getExitBlock()),
158  Latch(L->getLoopLatch()), L(L), Valid(true), DT(DT), PDT(PDT) {
159 
160  // Walk over all blocks in the loop and check for conditions that may
161  // prevent fusion. For each block, walk over all instructions and collect
162  // the memory reads and writes If any instructions that prevent fusion are
163  // found, invalidate this object and return.
164  for (BasicBlock *BB : L->blocks()) {
165  if (BB->hasAddressTaken()) {
166  AddressTakenBB++;
167  invalidate();
168  return;
169  }
170 
171  for (Instruction &I : *BB) {
172  if (I.mayThrow()) {
173  MayThrowException++;
174  invalidate();
175  return;
176  }
177  if (StoreInst *SI = dyn_cast<StoreInst>(&I)) {
178  if (SI->isVolatile()) {
179  ContainsVolatileAccess++;
180  invalidate();
181  return;
182  }
183  }
184  if (LoadInst *LI = dyn_cast<LoadInst>(&I)) {
185  if (LI->isVolatile()) {
186  ContainsVolatileAccess++;
187  invalidate();
188  return;
189  }
190  }
191  if (I.mayWriteToMemory())
192  MemWrites.push_back(&I);
193  if (I.mayReadFromMemory())
194  MemReads.push_back(&I);
195  }
196  }
197  }
198 
199  /// Check if all members of the class are valid.
200  bool isValid() const {
201  return Preheader && Header && ExitingBlock && ExitBlock && Latch && L &&
202  !L->isInvalid() && Valid;
203  }
204 
205  /// Verify that all members are in sync with the Loop object.
206  void verify() const {
207  assert(isValid() && "Candidate is not valid!!");
208  assert(!L->isInvalid() && "Loop is invalid!");
209  assert(Preheader == L->getLoopPreheader() && "Preheader is out of sync");
210  assert(Header == L->getHeader() && "Header is out of sync");
211  assert(ExitingBlock == L->getExitingBlock() &&
212  "Exiting Blocks is out of sync");
213  assert(ExitBlock == L->getExitBlock() && "Exit block is out of sync");
214  assert(Latch == L->getLoopLatch() && "Latch is out of sync");
215  }
216 
217 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
218  LLVM_DUMP_METHOD void dump() const {
219  dbgs() << "\tPreheader: " << (Preheader ? Preheader->getName() : "nullptr")
220  << "\n"
221  << "\tHeader: " << (Header ? Header->getName() : "nullptr") << "\n"
222  << "\tExitingBB: "
223  << (ExitingBlock ? ExitingBlock->getName() : "nullptr") << "\n"
224  << "\tExitBB: " << (ExitBlock ? ExitBlock->getName() : "nullptr")
225  << "\n"
226  << "\tLatch: " << (Latch ? Latch->getName() : "nullptr") << "\n";
227  }
228 #endif
229 
230 private:
231  // This is only used internally for now, to clear the MemWrites and MemReads
232  // list and setting Valid to false. I can't envision other uses of this right
233  // now, since once FusionCandidates are put into the FusionCandidateSet they
234  // are immutable. Thus, any time we need to change/update a FusionCandidate,
235  // we must create a new one and insert it into the FusionCandidateSet to
236  // ensure the FusionCandidateSet remains ordered correctly.
237  void invalidate() {
238  MemWrites.clear();
239  MemReads.clear();
240  Valid = false;
241  }
242 };
243 
245  const FusionCandidate &FC) {
246  if (FC.isValid())
247  OS << FC.Preheader->getName();
248  else
249  OS << "<Invalid>";
250 
251  return OS;
252 }
253 
255  /// Comparison functor to sort two Control Flow Equivalent fusion candidates
256  /// into dominance order.
257  /// If LHS dominates RHS and RHS post-dominates LHS, return true;
258  /// IF RHS dominates LHS and LHS post-dominates RHS, return false;
259  bool operator()(const FusionCandidate &LHS,
260  const FusionCandidate &RHS) const {
261  const DominatorTree *DT = LHS.DT;
262 
263  // Do not save PDT to local variable as it is only used in asserts and thus
264  // will trigger an unused variable warning if building without asserts.
265  assert(DT && LHS.PDT && "Expecting valid dominator tree");
266 
267  // Do this compare first so if LHS == RHS, function returns false.
268  if (DT->dominates(RHS.Preheader, LHS.Preheader)) {
269  // RHS dominates LHS
270  // Verify LHS post-dominates RHS
271  assert(LHS.PDT->dominates(LHS.Preheader, RHS.Preheader));
272  return false;
273  }
274 
275  if (DT->dominates(LHS.Preheader, RHS.Preheader)) {
276  // Verify RHS Postdominates LHS
277  assert(LHS.PDT->dominates(RHS.Preheader, LHS.Preheader));
278  return true;
279  }
280 
281  // If LHS does not dominate RHS and RHS does not dominate LHS then there is
282  // no dominance relationship between the two FusionCandidates. Thus, they
283  // should not be in the same set together.
285  "No dominance relationship between these fusion candidates!");
286  }
287 };
288 
289 namespace {
290 using LoopVector = SmallVector<Loop *, 4>;
291 
292 // Set of Control Flow Equivalent (CFE) Fusion Candidates, sorted in dominance
293 // order. Thus, if FC0 comes *before* FC1 in a FusionCandidateSet, then FC0
294 // dominates FC1 and FC1 post-dominates FC0.
295 // std::set was chosen because we want a sorted data structure with stable
296 // iterators. A subsequent patch to loop fusion will enable fusing non-ajdacent
297 // loops by moving intervening code around. When this intervening code contains
298 // loops, those loops will be moved also. The corresponding FusionCandidates
299 // will also need to be moved accordingly. As this is done, having stable
300 // iterators will simplify the logic. Similarly, having an efficient insert that
301 // keeps the FusionCandidateSet sorted will also simplify the implementation.
302 using FusionCandidateSet = std::set<FusionCandidate, FusionCandidateCompare>;
303 using FusionCandidateCollection = SmallVector<FusionCandidateSet, 4>;
304 } // namespace
305 
307  const FusionCandidateSet &CandSet) {
308  for (auto IT : CandSet)
309  OS << IT << "\n";
310 
311  return OS;
312 }
313 
314 #if !defined(NDEBUG)
315 static void
316 printFusionCandidates(const FusionCandidateCollection &FusionCandidates) {
317  dbgs() << "Fusion Candidates: \n";
318  for (const auto &CandidateSet : FusionCandidates) {
319  dbgs() << "*** Fusion Candidate Set ***\n";
320  dbgs() << CandidateSet;
321  dbgs() << "****************************\n";
322  }
323 }
324 #endif
325 
326 /// Collect all loops in function at the same nest level, starting at the
327 /// outermost level.
328 ///
329 /// This data structure collects all loops at the same nest level for a
330 /// given function (specified by the LoopInfo object). It starts at the
331 /// outermost level.
336 
338  if (!LI.empty())
339  LoopsOnLevel.emplace_back(LoopVector(LI.rbegin(), LI.rend()));
340  }
341 
342  /// Test whether a given loop has been removed from the function, and thus is
343  /// no longer valid.
344  bool isRemovedLoop(const Loop *L) const { return RemovedLoops.count(L); }
345 
346  /// Record that a given loop has been removed from the function and is no
347  /// longer valid.
348  void removeLoop(const Loop *L) { RemovedLoops.insert(L); }
349 
350  /// Descend the tree to the next (inner) nesting level
351  void descend() {
352  LoopsOnLevelTy LoopsOnNextLevel;
353 
354  for (const LoopVector &LV : *this)
355  for (Loop *L : LV)
356  if (!isRemovedLoop(L) && L->begin() != L->end())
357  LoopsOnNextLevel.emplace_back(LoopVector(L->begin(), L->end()));
358 
359  LoopsOnLevel = LoopsOnNextLevel;
360  RemovedLoops.clear();
361  Depth++;
362  }
363 
364  bool empty() const { return size() == 0; }
365  size_t size() const { return LoopsOnLevel.size() - RemovedLoops.size(); }
366  unsigned getDepth() const { return Depth; }
367 
368  iterator begin() { return LoopsOnLevel.begin(); }
369  iterator end() { return LoopsOnLevel.end(); }
370  const_iterator begin() const { return LoopsOnLevel.begin(); }
371  const_iterator end() const { return LoopsOnLevel.end(); }
372 
373 private:
374  /// Set of loops that have been removed from the function and are no longer
375  /// valid.
376  SmallPtrSet<const Loop *, 8> RemovedLoops;
377 
378  /// Depth of the current level, starting at 1 (outermost loops).
379  unsigned Depth;
380 
381  /// Vector of loops at the current depth level that have the same parent loop
382  LoopsOnLevelTy LoopsOnLevel;
383 };
384 
385 #ifndef NDEBUG
386 static void printLoopVector(const LoopVector &LV) {
387  dbgs() << "****************************\n";
388  for (auto L : LV)
389  printLoop(*L, dbgs());
390  dbgs() << "****************************\n";
391 }
392 #endif
393 
394 static void reportLoopFusion(const FusionCandidate &FC0,
395  const FusionCandidate &FC1,
397  using namespace ore;
398  ORE.emit(
399  OptimizationRemark(DEBUG_TYPE, "LoopFusion", FC0.Preheader->getParent())
400  << "Fused " << NV("Cand1", StringRef(FC0.Preheader->getName()))
401  << " with " << NV("Cand2", StringRef(FC1.Preheader->getName())));
402 }
403 
404 struct LoopFuser {
405 private:
406  // Sets of control flow equivalent fusion candidates for a given nest level.
407  FusionCandidateCollection FusionCandidates;
408 
409  LoopDepthTree LDT;
410  DomTreeUpdater DTU;
411 
412  LoopInfo &LI;
413  DominatorTree &DT;
414  DependenceInfo &DI;
415  ScalarEvolution &SE;
416  PostDominatorTree &PDT;
418 
419 public:
422  OptimizationRemarkEmitter &ORE, const DataLayout &DL)
423  : LDT(LI), DTU(DT, PDT, DomTreeUpdater::UpdateStrategy::Lazy), LI(LI),
424  DT(DT), DI(DI), SE(SE), PDT(PDT), ORE(ORE) {}
425 
426  /// This is the main entry point for loop fusion. It will traverse the
427  /// specified function and collect candidate loops to fuse, starting at the
428  /// outermost nesting level and working inwards.
430 #ifndef NDEBUG
432  LI.print(dbgs());
433  }
434 #endif
435 
436  LLVM_DEBUG(dbgs() << "Performing Loop Fusion on function " << F.getName()
437  << "\n");
438  bool Changed = false;
439 
440  while (!LDT.empty()) {
441  LLVM_DEBUG(dbgs() << "Got " << LDT.size() << " loop sets for depth "
442  << LDT.getDepth() << "\n";);
443 
444  for (const LoopVector &LV : LDT) {
445  assert(LV.size() > 0 && "Empty loop set was build!");
446 
447  // Skip singleton loop sets as they do not offer fusion opportunities on
448  // this level.
449  if (LV.size() == 1)
450  continue;
451 #ifndef NDEBUG
453  LLVM_DEBUG({
454  dbgs() << " Visit loop set (#" << LV.size() << "):\n";
455  printLoopVector(LV);
456  });
457  }
458 #endif
459 
460  collectFusionCandidates(LV);
461  Changed |= fuseCandidates();
462  }
463 
464  // Finished analyzing candidates at this level.
465  // Descend to the next level and clear all of the candidates currently
466  // collected. Note that it will not be possible to fuse any of the
467  // existing candidates with new candidates because the new candidates will
468  // be at a different nest level and thus not be control flow equivalent
469  // with all of the candidates collected so far.
470  LLVM_DEBUG(dbgs() << "Descend one level!\n");
471  LDT.descend();
472  FusionCandidates.clear();
473  }
474 
475  if (Changed)
476  LLVM_DEBUG(dbgs() << "Function after Loop Fusion: \n"; F.dump(););
477 
478 #ifndef NDEBUG
479  assert(DT.verify());
480  assert(PDT.verify());
481  LI.verify(DT);
482  SE.verify();
483 #endif
484 
485  LLVM_DEBUG(dbgs() << "Loop Fusion complete\n");
486  return Changed;
487  }
488 
489 private:
490  /// Determine if two fusion candidates are control flow equivalent.
491  ///
492  /// Two fusion candidates are control flow equivalent if when one executes,
493  /// the other is guaranteed to execute. This is determined using dominators
494  /// and post-dominators: if A dominates B and B post-dominates A then A and B
495  /// are control-flow equivalent.
496  bool isControlFlowEquivalent(const FusionCandidate &FC0,
497  const FusionCandidate &FC1) const {
498  assert(FC0.Preheader && FC1.Preheader && "Expecting valid preheaders");
499 
500  if (DT.dominates(FC0.Preheader, FC1.Preheader))
501  return PDT.dominates(FC1.Preheader, FC0.Preheader);
502 
503  if (DT.dominates(FC1.Preheader, FC0.Preheader))
504  return PDT.dominates(FC0.Preheader, FC1.Preheader);
505 
506  return false;
507  }
508 
509  /// Determine if a fusion candidate (representing a loop) is eligible for
510  /// fusion. Note that this only checks whether a single loop can be fused - it
511  /// does not check whether it is *legal* to fuse two loops together.
512  bool eligibleForFusion(const FusionCandidate &FC) const {
513  if (!FC.isValid()) {
514  LLVM_DEBUG(dbgs() << "FC " << FC << " has invalid CFG requirements!\n");
515  if (!FC.Preheader)
516  InvalidPreheader++;
517  if (!FC.Header)
518  InvalidHeader++;
519  if (!FC.ExitingBlock)
520  InvalidExitingBlock++;
521  if (!FC.ExitBlock)
522  InvalidExitBlock++;
523  if (!FC.Latch)
524  InvalidLatch++;
525  if (FC.L->isInvalid())
526  InvalidLoop++;
527 
528  return false;
529  }
530 
531  // Require ScalarEvolution to be able to determine a trip count.
533  LLVM_DEBUG(dbgs() << "Loop " << FC.L->getName()
534  << " trip count not computable!\n");
535  InvalidTripCount++;
536  return false;
537  }
538 
539  if (!FC.L->isLoopSimplifyForm()) {
540  LLVM_DEBUG(dbgs() << "Loop " << FC.L->getName()
541  << " is not in simplified form!\n");
542  NotSimplifiedForm++;
543  return false;
544  }
545 
546  return true;
547  }
548 
549  /// Iterate over all loops in the given loop set and identify the loops that
550  /// are eligible for fusion. Place all eligible fusion candidates into Control
551  /// Flow Equivalent sets, sorted by dominance.
552  void collectFusionCandidates(const LoopVector &LV) {
553  for (Loop *L : LV) {
554  FusionCandidate CurrCand(L, &DT, &PDT);
555  if (!eligibleForFusion(CurrCand))
556  continue;
557 
558  // Go through each list in FusionCandidates and determine if L is control
559  // flow equivalent with the first loop in that list. If it is, append LV.
560  // If not, go to the next list.
561  // If no suitable list is found, start another list and add it to
562  // FusionCandidates.
563  bool FoundSet = false;
564 
565  for (auto &CurrCandSet : FusionCandidates) {
566  if (isControlFlowEquivalent(*CurrCandSet.begin(), CurrCand)) {
567  CurrCandSet.insert(CurrCand);
568  FoundSet = true;
569 #ifndef NDEBUG
571  LLVM_DEBUG(dbgs() << "Adding " << CurrCand
572  << " to existing candidate set\n");
573 #endif
574  break;
575  }
576  }
577  if (!FoundSet) {
578  // No set was found. Create a new set and add to FusionCandidates
579 #ifndef NDEBUG
581  LLVM_DEBUG(dbgs() << "Adding " << CurrCand << " to new set\n");
582 #endif
583  FusionCandidateSet NewCandSet;
584  NewCandSet.insert(CurrCand);
585  FusionCandidates.push_back(NewCandSet);
586  }
587  NumFusionCandidates++;
588  }
589  }
590 
591  /// Determine if it is beneficial to fuse two loops.
592  ///
593  /// For now, this method simply returns true because we want to fuse as much
594  /// as possible (primarily to test the pass). This method will evolve, over
595  /// time, to add heuristics for profitability of fusion.
596  bool isBeneficialFusion(const FusionCandidate &FC0,
597  const FusionCandidate &FC1) {
598  return true;
599  }
600 
601  /// Determine if two fusion candidates have the same trip count (i.e., they
602  /// execute the same number of iterations).
603  ///
604  /// Note that for now this method simply returns a boolean value because there
605  /// are no mechanisms in loop fusion to handle different trip counts. In the
606  /// future, this behaviour can be extended to adjust one of the loops to make
607  /// the trip counts equal (e.g., loop peeling). When this is added, this
608  /// interface may need to change to return more information than just a
609  /// boolean value.
610  bool identicalTripCounts(const FusionCandidate &FC0,
611  const FusionCandidate &FC1) const {
612  const SCEV *TripCount0 = SE.getBackedgeTakenCount(FC0.L);
613  if (isa<SCEVCouldNotCompute>(TripCount0)) {
614  UncomputableTripCount++;
615  LLVM_DEBUG(dbgs() << "Trip count of first loop could not be computed!");
616  return false;
617  }
618 
619  const SCEV *TripCount1 = SE.getBackedgeTakenCount(FC1.L);
620  if (isa<SCEVCouldNotCompute>(TripCount1)) {
621  UncomputableTripCount++;
622  LLVM_DEBUG(dbgs() << "Trip count of second loop could not be computed!");
623  return false;
624  }
625  LLVM_DEBUG(dbgs() << "\tTrip counts: " << *TripCount0 << " & "
626  << *TripCount1 << " are "
627  << (TripCount0 == TripCount1 ? "identical" : "different")
628  << "\n");
629 
630  return (TripCount0 == TripCount1);
631  }
632 
633  /// Walk each set of control flow equivalent fusion candidates and attempt to
634  /// fuse them. This does a single linear traversal of all candidates in the
635  /// set. The conditions for legal fusion are checked at this point. If a pair
636  /// of fusion candidates passes all legality checks, they are fused together
637  /// and a new fusion candidate is created and added to the FusionCandidateSet.
638  /// The original fusion candidates are then removed, as they are no longer
639  /// valid.
640  bool fuseCandidates() {
641  bool Fused = false;
642  LLVM_DEBUG(printFusionCandidates(FusionCandidates));
643  for (auto &CandidateSet : FusionCandidates) {
644  if (CandidateSet.size() < 2)
645  continue;
646 
647  LLVM_DEBUG(dbgs() << "Attempting fusion on Candidate Set:\n"
648  << CandidateSet << "\n");
649 
650  for (auto FC0 = CandidateSet.begin(); FC0 != CandidateSet.end(); ++FC0) {
651  assert(!LDT.isRemovedLoop(FC0->L) &&
652  "Should not have removed loops in CandidateSet!");
653  auto FC1 = FC0;
654  for (++FC1; FC1 != CandidateSet.end(); ++FC1) {
655  assert(!LDT.isRemovedLoop(FC1->L) &&
656  "Should not have removed loops in CandidateSet!");
657 
658  LLVM_DEBUG(dbgs() << "Attempting to fuse candidate \n"; FC0->dump();
659  dbgs() << " with\n"; FC1->dump(); dbgs() << "\n");
660 
661  FC0->verify();
662  FC1->verify();
663 
664  if (!identicalTripCounts(*FC0, *FC1)) {
665  LLVM_DEBUG(dbgs() << "Fusion candidates do not have identical trip "
666  "counts. Not fusing.\n");
667  NonEqualTripCount++;
668  continue;
669  }
670 
671  if (!isAdjacent(*FC0, *FC1)) {
672  LLVM_DEBUG(dbgs()
673  << "Fusion candidates are not adjacent. Not fusing.\n");
674  NonAdjacent++;
675  continue;
676  }
677 
678  // For now we skip fusing if the second candidate has any instructions
679  // in the preheader. This is done because we currently do not have the
680  // safety checks to determine if it is save to move the preheader of
681  // the second candidate past the body of the first candidate. Once
682  // these checks are added, this condition can be removed.
683  if (!isEmptyPreheader(*FC1)) {
684  LLVM_DEBUG(dbgs() << "Fusion candidate does not have empty "
685  "preheader. Not fusing.\n");
686  NonEmptyPreheader++;
687  continue;
688  }
689 
690  if (!dependencesAllowFusion(*FC0, *FC1)) {
691  LLVM_DEBUG(dbgs() << "Memory dependencies do not allow fusion!\n");
692  continue;
693  }
694 
695  bool BeneficialToFuse = isBeneficialFusion(*FC0, *FC1);
696  LLVM_DEBUG(dbgs()
697  << "\tFusion appears to be "
698  << (BeneficialToFuse ? "" : "un") << "profitable!\n");
699  if (!BeneficialToFuse)
700  continue;
701 
702  // All analysis has completed and has determined that fusion is legal
703  // and profitable. At this point, start transforming the code and
704  // perform fusion.
705 
706  LLVM_DEBUG(dbgs() << "\tFusion is performed: " << *FC0 << " and "
707  << *FC1 << "\n");
708 
709  // Report fusion to the Optimization Remarks.
710  // Note this needs to be done *before* performFusion because
711  // performFusion will change the original loops, making it not
712  // possible to identify them after fusion is complete.
713  reportLoopFusion(*FC0, *FC1, ORE);
714 
715  FusionCandidate FusedCand(performFusion(*FC0, *FC1), &DT, &PDT);
716  FusedCand.verify();
717  assert(eligibleForFusion(FusedCand) &&
718  "Fused candidate should be eligible for fusion!");
719 
720  // Notify the loop-depth-tree that these loops are not valid objects
721  // anymore.
722  LDT.removeLoop(FC1->L);
723 
724  CandidateSet.erase(FC0);
725  CandidateSet.erase(FC1);
726 
727  auto InsertPos = CandidateSet.insert(FusedCand);
728 
729  assert(InsertPos.second &&
730  "Unable to insert TargetCandidate in CandidateSet!");
731 
732  // Reset FC0 and FC1 the new (fused) candidate. Subsequent iterations
733  // of the FC1 loop will attempt to fuse the new (fused) loop with the
734  // remaining candidates in the current candidate set.
735  FC0 = FC1 = InsertPos.first;
736 
737  LLVM_DEBUG(dbgs() << "Candidate Set (after fusion): " << CandidateSet
738  << "\n");
739 
740  Fused = true;
741  }
742  }
743  }
744  return Fused;
745  }
746 
747  /// Rewrite all additive recurrences in a SCEV to use a new loop.
748  class AddRecLoopReplacer : public SCEVRewriteVisitor<AddRecLoopReplacer> {
749  public:
750  AddRecLoopReplacer(ScalarEvolution &SE, const Loop &OldL, const Loop &NewL,
751  bool UseMax = true)
752  : SCEVRewriteVisitor(SE), Valid(true), UseMax(UseMax), OldL(OldL),
753  NewL(NewL) {}
754 
755  const SCEV *visitAddRecExpr(const SCEVAddRecExpr *Expr) {
756  const Loop *ExprL = Expr->getLoop();
758  if (ExprL == &OldL) {
759  Operands.append(Expr->op_begin(), Expr->op_end());
760  return SE.getAddRecExpr(Operands, &NewL, Expr->getNoWrapFlags());
761  }
762 
763  if (OldL.contains(ExprL)) {
764  bool Pos = SE.isKnownPositive(Expr->getStepRecurrence(SE));
765  if (!UseMax || !Pos || !Expr->isAffine()) {
766  Valid = false;
767  return Expr;
768  }
769  return visit(Expr->getStart());
770  }
771 
772  for (const SCEV *Op : Expr->operands())
773  Operands.push_back(visit(Op));
774  return SE.getAddRecExpr(Operands, ExprL, Expr->getNoWrapFlags());
775  }
776 
777  bool wasValidSCEV() const { return Valid; }
778 
779  private:
780  bool Valid, UseMax;
781  const Loop &OldL, &NewL;
782  };
783 
784  /// Return false if the access functions of \p I0 and \p I1 could cause
785  /// a negative dependence.
786  bool accessDiffIsPositive(const Loop &L0, const Loop &L1, Instruction &I0,
787  Instruction &I1, bool EqualIsInvalid) {
788  Value *Ptr0 = getLoadStorePointerOperand(&I0);
789  Value *Ptr1 = getLoadStorePointerOperand(&I1);
790  if (!Ptr0 || !Ptr1)
791  return false;
792 
793  const SCEV *SCEVPtr0 = SE.getSCEVAtScope(Ptr0, &L0);
794  const SCEV *SCEVPtr1 = SE.getSCEVAtScope(Ptr1, &L1);
795 #ifndef NDEBUG
797  LLVM_DEBUG(dbgs() << " Access function check: " << *SCEVPtr0 << " vs "
798  << *SCEVPtr1 << "\n");
799 #endif
800  AddRecLoopReplacer Rewriter(SE, L0, L1);
801  SCEVPtr0 = Rewriter.visit(SCEVPtr0);
802 #ifndef NDEBUG
804  LLVM_DEBUG(dbgs() << " Access function after rewrite: " << *SCEVPtr0
805  << " [Valid: " << Rewriter.wasValidSCEV() << "]\n");
806 #endif
807  if (!Rewriter.wasValidSCEV())
808  return false;
809 
810  // TODO: isKnownPredicate doesnt work well when one SCEV is loop carried (by
811  // L0) and the other is not. We could check if it is monotone and test
812  // the beginning and end value instead.
813 
814  BasicBlock *L0Header = L0.getHeader();
815  auto HasNonLinearDominanceRelation = [&](const SCEV *S) {
816  const SCEVAddRecExpr *AddRec = dyn_cast<SCEVAddRecExpr>(S);
817  if (!AddRec)
818  return false;
819  return !DT.dominates(L0Header, AddRec->getLoop()->getHeader()) &&
820  !DT.dominates(AddRec->getLoop()->getHeader(), L0Header);
821  };
822  if (SCEVExprContains(SCEVPtr1, HasNonLinearDominanceRelation))
823  return false;
824 
825  ICmpInst::Predicate Pred =
826  EqualIsInvalid ? ICmpInst::ICMP_SGT : ICmpInst::ICMP_SGE;
827  bool IsAlwaysGE = SE.isKnownPredicate(Pred, SCEVPtr0, SCEVPtr1);
828 #ifndef NDEBUG
830  LLVM_DEBUG(dbgs() << " Relation: " << *SCEVPtr0
831  << (IsAlwaysGE ? " >= " : " may < ") << *SCEVPtr1
832  << "\n");
833 #endif
834  return IsAlwaysGE;
835  }
836 
837  /// Return true if the dependences between @p I0 (in @p L0) and @p I1 (in
838  /// @p L1) allow loop fusion of @p L0 and @p L1. The dependence analyses
839  /// specified by @p DepChoice are used to determine this.
840  bool dependencesAllowFusion(const FusionCandidate &FC0,
841  const FusionCandidate &FC1, Instruction &I0,
842  Instruction &I1, bool AnyDep,
843  FusionDependenceAnalysisChoice DepChoice) {
844 #ifndef NDEBUG
846  LLVM_DEBUG(dbgs() << "Check dep: " << I0 << " vs " << I1 << " : "
847  << DepChoice << "\n");
848  }
849 #endif
850  switch (DepChoice) {
852  return accessDiffIsPositive(*FC0.L, *FC1.L, I0, I1, AnyDep);
854  auto DepResult = DI.depends(&I0, &I1, true);
855  if (!DepResult)
856  return true;
857 #ifndef NDEBUG
859  LLVM_DEBUG(dbgs() << "DA res: "; DepResult->dump(dbgs());
860  dbgs() << " [#l: " << DepResult->getLevels() << "][Ordered: "
861  << (DepResult->isOrdered() ? "true" : "false")
862  << "]\n");
863  LLVM_DEBUG(dbgs() << "DepResult Levels: " << DepResult->getLevels()
864  << "\n");
865  }
866 #endif
867 
868  if (DepResult->getNextPredecessor() || DepResult->getNextSuccessor())
869  LLVM_DEBUG(
870  dbgs() << "TODO: Implement pred/succ dependence handling!\n");
871 
872  // TODO: Can we actually use the dependence info analysis here?
873  return false;
874  }
875 
877  return dependencesAllowFusion(FC0, FC1, I0, I1, AnyDep,
879  dependencesAllowFusion(FC0, FC1, I0, I1, AnyDep,
881  }
882 
883  llvm_unreachable("Unknown fusion dependence analysis choice!");
884  }
885 
886  /// Perform a dependence check and return if @p FC0 and @p FC1 can be fused.
887  bool dependencesAllowFusion(const FusionCandidate &FC0,
888  const FusionCandidate &FC1) {
889  LLVM_DEBUG(dbgs() << "Check if " << FC0 << " can be fused with " << FC1
890  << "\n");
891  assert(FC0.L->getLoopDepth() == FC1.L->getLoopDepth());
892  assert(DT.dominates(FC0.Preheader, FC1.Preheader));
893 
894  for (Instruction *WriteL0 : FC0.MemWrites) {
895  for (Instruction *WriteL1 : FC1.MemWrites)
896  if (!dependencesAllowFusion(FC0, FC1, *WriteL0, *WriteL1,
897  /* AnyDep */ false,
899  InvalidDependencies++;
900  return false;
901  }
902  for (Instruction *ReadL1 : FC1.MemReads)
903  if (!dependencesAllowFusion(FC0, FC1, *WriteL0, *ReadL1,
904  /* AnyDep */ false,
906  InvalidDependencies++;
907  return false;
908  }
909  }
910 
911  for (Instruction *WriteL1 : FC1.MemWrites) {
912  for (Instruction *WriteL0 : FC0.MemWrites)
913  if (!dependencesAllowFusion(FC0, FC1, *WriteL0, *WriteL1,
914  /* AnyDep */ false,
916  InvalidDependencies++;
917  return false;
918  }
919  for (Instruction *ReadL0 : FC0.MemReads)
920  if (!dependencesAllowFusion(FC0, FC1, *ReadL0, *WriteL1,
921  /* AnyDep */ false,
923  InvalidDependencies++;
924  return false;
925  }
926  }
927 
928  // Walk through all uses in FC1. For each use, find the reaching def. If the
929  // def is located in FC0 then it is is not safe to fuse.
930  for (BasicBlock *BB : FC1.L->blocks())
931  for (Instruction &I : *BB)
932  for (auto &Op : I.operands())
933  if (Instruction *Def = dyn_cast<Instruction>(Op))
934  if (FC0.L->contains(Def->getParent())) {
935  InvalidDependencies++;
936  return false;
937  }
938 
939  return true;
940  }
941 
942  /// Determine if the exit block of \p FC0 is the preheader of \p FC1. In this
943  /// case, there is no code in between the two fusion candidates, thus making
944  /// them adjacent.
945  bool isAdjacent(const FusionCandidate &FC0,
946  const FusionCandidate &FC1) const {
947  return FC0.ExitBlock == FC1.Preheader;
948  }
949 
950  bool isEmptyPreheader(const FusionCandidate &FC) const {
951  return FC.Preheader->size() == 1;
952  }
953 
954  /// Fuse two fusion candidates, creating a new fused loop.
955  ///
956  /// This method contains the mechanics of fusing two loops, represented by \p
957  /// FC0 and \p FC1. It is assumed that \p FC0 dominates \p FC1 and \p FC1
958  /// postdominates \p FC0 (making them control flow equivalent). It also
959  /// assumes that the other conditions for fusion have been met: adjacent,
960  /// identical trip counts, and no negative distance dependencies exist that
961  /// would prevent fusion. Thus, there is no checking for these conditions in
962  /// this method.
963  ///
964  /// Fusion is performed by rewiring the CFG to update successor blocks of the
965  /// components of tho loop. Specifically, the following changes are done:
966  ///
967  /// 1. The preheader of \p FC1 is removed as it is no longer necessary
968  /// (because it is currently only a single statement block).
969  /// 2. The latch of \p FC0 is modified to jump to the header of \p FC1.
970  /// 3. The latch of \p FC1 i modified to jump to the header of \p FC0.
971  /// 4. All blocks from \p FC1 are removed from FC1 and added to FC0.
972  ///
973  /// All of these modifications are done with dominator tree updates, thus
974  /// keeping the dominator (and post dominator) information up-to-date.
975  ///
976  /// This can be improved in the future by actually merging blocks during
977  /// fusion. For example, the preheader of \p FC1 can be merged with the
978  /// preheader of \p FC0. This would allow loops with more than a single
979  /// statement in the preheader to be fused. Similarly, the latch blocks of the
980  /// two loops could also be fused into a single block. This will require
981  /// analysis to prove it is safe to move the contents of the block past
982  /// existing code, which currently has not been implemented.
983  Loop *performFusion(const FusionCandidate &FC0, const FusionCandidate &FC1) {
984  assert(FC0.isValid() && FC1.isValid() &&
985  "Expecting valid fusion candidates");
986 
987  LLVM_DEBUG(dbgs() << "Fusion Candidate 0: \n"; FC0.dump();
988  dbgs() << "Fusion Candidate 1: \n"; FC1.dump(););
989 
990  assert(FC1.Preheader == FC0.ExitBlock);
991  assert(FC1.Preheader->size() == 1 &&
992  FC1.Preheader->getSingleSuccessor() == FC1.Header);
993 
994  // Remember the phi nodes originally in the header of FC0 in order to rewire
995  // them later. However, this is only necessary if the new loop carried
996  // values might not dominate the exiting branch. While we do not generally
997  // test if this is the case but simply insert intermediate phi nodes, we
998  // need to make sure these intermediate phi nodes have different
999  // predecessors. To this end, we filter the special case where the exiting
1000  // block is the latch block of the first loop. Nothing needs to be done
1001  // anyway as all loop carried values dominate the latch and thereby also the
1002  // exiting branch.
1003  SmallVector<PHINode *, 8> OriginalFC0PHIs;
1004  if (FC0.ExitingBlock != FC0.Latch)
1005  for (PHINode &PHI : FC0.Header->phis())
1006  OriginalFC0PHIs.push_back(&PHI);
1007 
1008  // Replace incoming blocks for header PHIs first.
1011 
1012  // Then modify the control flow and update DT and PDT.
1014 
1015  // The old exiting block of the first loop (FC0) has to jump to the header
1016  // of the second as we need to execute the code in the second header block
1017  // regardless of the trip count. That is, if the trip count is 0, so the
1018  // back edge is never taken, we still have to execute both loop headers,
1019  // especially (but not only!) if the second is a do-while style loop.
1020  // However, doing so might invalidate the phi nodes of the first loop as
1021  // the new values do only need to dominate their latch and not the exiting
1022  // predicate. To remedy this potential problem we always introduce phi
1023  // nodes in the header of the second loop later that select the loop carried
1024  // value, if the second header was reached through an old latch of the
1025  // first, or undef otherwise. This is sound as exiting the first implies the
1026  // second will exit too, __without__ taking the back-edge. [Their
1027  // trip-counts are equal after all.
1028  // KB: Would this sequence be simpler to just just make FC0.ExitingBlock go
1029  // to FC1.Header? I think this is basically what the three sequences are
1030  // trying to accomplish; however, doing this directly in the CFG may mean
1031  // the DT/PDT becomes invalid
1033  FC1.Header);
1038 
1039  // The pre-header of L1 is not necessary anymore.
1042  new UnreachableInst(FC1.Preheader->getContext(), FC1.Preheader);
1045 
1046  // Moves the phi nodes from the second to the first loops header block.
1047  while (PHINode *PHI = dyn_cast<PHINode>(&FC1.Header->front())) {
1048  if (SE.isSCEVable(PHI->getType()))
1049  SE.forgetValue(PHI);
1050  if (PHI->hasNUsesOrMore(1))
1051  PHI->moveBefore(&*FC0.Header->getFirstInsertionPt());
1052  else
1053  PHI->eraseFromParent();
1054  }
1055 
1056  // Introduce new phi nodes in the second loop header to ensure
1057  // exiting the first and jumping to the header of the second does not break
1058  // the SSA property of the phis originally in the first loop. See also the
1059  // comment above.
1060  Instruction *L1HeaderIP = &FC1.Header->front();
1061  for (PHINode *LCPHI : OriginalFC0PHIs) {
1062  int L1LatchBBIdx = LCPHI->getBasicBlockIndex(FC1.Latch);
1063  assert(L1LatchBBIdx >= 0 &&
1064  "Expected loop carried value to be rewired at this point!");
1065 
1066  Value *LCV = LCPHI->getIncomingValue(L1LatchBBIdx);
1067 
1068  PHINode *L1HeaderPHI = PHINode::Create(
1069  LCV->getType(), 2, LCPHI->getName() + ".afterFC0", L1HeaderIP);
1070  L1HeaderPHI->addIncoming(LCV, FC0.Latch);
1071  L1HeaderPHI->addIncoming(UndefValue::get(LCV->getType()),
1072  FC0.ExitingBlock);
1073 
1074  LCPHI->setIncomingValue(L1LatchBBIdx, L1HeaderPHI);
1075  }
1076 
1077  // Replace latch terminator destinations.
1080 
1081  // If FC0.Latch and FC0.ExitingBlock are the same then we have already
1082  // performed the updates above.
1083  if (FC0.Latch != FC0.ExitingBlock)
1085  DominatorTree::Insert, FC0.Latch, FC1.Header));
1086 
1088  FC0.Latch, FC0.Header));
1090  FC1.Latch, FC0.Header));
1092  FC1.Latch, FC1.Header));
1093 
1094  // Update DT/PDT
1095  DTU.applyUpdates(TreeUpdates);
1096 
1097  LI.removeBlock(FC1.Preheader);
1098  DTU.deleteBB(FC1.Preheader);
1099  DTU.flush();
1100 
1101  // Is there a way to keep SE up-to-date so we don't need to forget the loops
1102  // and rebuild the information in subsequent passes of fusion?
1103  SE.forgetLoop(FC1.L);
1104  SE.forgetLoop(FC0.L);
1105 
1106  // Merge the loops.
1108  FC1.L->block_end());
1109  for (BasicBlock *BB : Blocks) {
1110  FC0.L->addBlockEntry(BB);
1111  FC1.L->removeBlockFromLoop(BB);
1112  if (LI.getLoopFor(BB) != FC1.L)
1113  continue;
1114  LI.changeLoopFor(BB, FC0.L);
1115  }
1116  while (!FC1.L->empty()) {
1117  const auto &ChildLoopIt = FC1.L->begin();
1118  Loop *ChildLoop = *ChildLoopIt;
1119  FC1.L->removeChildLoop(ChildLoopIt);
1120  FC0.L->addChildLoop(ChildLoop);
1121  }
1122 
1123  // Delete the now empty loop L1.
1124  LI.erase(FC1.L);
1125 
1126 #ifndef NDEBUG
1127  assert(!verifyFunction(*FC0.Header->getParent(), &errs()));
1129  assert(PDT.verify());
1130  LI.verify(DT);
1131  SE.verify();
1132 #endif
1133 
1134  FuseCounter++;
1135 
1136  LLVM_DEBUG(dbgs() << "Fusion done:\n");
1137 
1138  return FC0.L;
1139  }
1140 };
1141 
1142 struct LoopFuseLegacy : public FunctionPass {
1143 
1144  static char ID;
1145 
1148  }
1149 
1150  void getAnalysisUsage(AnalysisUsage &AU) const override {
1158 
1163  }
1164 
1165  bool runOnFunction(Function &F) override {
1166  if (skipFunction(F))
1167  return false;
1168  auto &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
1169  auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
1170  auto &DI = getAnalysis<DependenceAnalysisWrapperPass>().getDI();
1171  auto &SE = getAnalysis<ScalarEvolutionWrapperPass>().getSE();
1172  auto &PDT = getAnalysis<PostDominatorTreeWrapperPass>().getPostDomTree();
1173  auto &ORE = getAnalysis<OptimizationRemarkEmitterWrapperPass>().getORE();
1174 
1175  const DataLayout &DL = F.getParent()->getDataLayout();
1176  LoopFuser LF(LI, DT, DI, SE, PDT, ORE, DL);
1177  return LF.fuseLoops(F);
1178  }
1179 };
1180 
1182  auto &LI = AM.getResult<LoopAnalysis>(F);
1183  auto &DT = AM.getResult<DominatorTreeAnalysis>(F);
1184  auto &DI = AM.getResult<DependenceAnalysis>(F);
1185  auto &SE = AM.getResult<ScalarEvolutionAnalysis>(F);
1186  auto &PDT = AM.getResult<PostDominatorTreeAnalysis>(F);
1188 
1189  const DataLayout &DL = F.getParent()->getDataLayout();
1190  LoopFuser LF(LI, DT, DI, SE, PDT, ORE, DL);
1191  bool Changed = LF.fuseLoops(F);
1192  if (!Changed)
1193  return PreservedAnalyses::all();
1194 
1195  PreservedAnalyses PA;
1199  PA.preserve<LoopAnalysis>();
1200  return PA;
1201 }
1202 
1203 char LoopFuseLegacy::ID = 0;
1204 
1205 INITIALIZE_PASS_BEGIN(LoopFuseLegacy, "loop-fusion", "Loop Fusion", false,
1206  false)
1213 INITIALIZE_PASS_END(LoopFuseLegacy, "loop-fusion", "Loop Fusion", false, false)
1214 
1215 FunctionPass *llvm::createLoopFusePass() { return new LoopFuseLegacy(); }
void initializeLoopFuseLegacyPass(PassRegistry &)
SymbolTableList< Instruction >::iterator eraseFromParent()
This method unlinks &#39;this&#39; from the containing basic block and deletes it.
Definition: Instruction.cpp:67
static void printLoopVector(const LoopVector &LV)
Definition: LoopFuse.cpp:386
static void printFusionCandidates(const FusionCandidateCollection &FusionCandidates)
Definition: LoopFuse.cpp:316
A parsed version of the target data layout string in and methods for querying it. ...
Definition: DataLayout.h:110
reference emplace_back(ArgTypes &&... Args)
Definition: SmallVector.h:641
LoopFuser(LoopInfo &LI, DominatorTree &DT, DependenceInfo &DI, ScalarEvolution &SE, PostDominatorTree &PDT, OptimizationRemarkEmitter &ORE, const DataLayout &DL)
Definition: LoopFuse.cpp:420
AnalysisUsage & addPreserved()
Add the specified Pass class to the set of analyses preserved by this pass.
raw_ostream & errs()
This returns a reference to a raw_ostream for standard error.
bool isRemovedLoop(const Loop *L) const
Test whether a given loop has been removed from the function, and thus is no longer valid...
Definition: LoopFuse.cpp:344
void addIncoming(Value *V, BasicBlock *BB)
Add an incoming value to the end of the PHI list.
BlockT * getLoopLatch() const
If there is a single latch block for this loop, return it.
Definition: LoopInfoImpl.h:224
static PassRegistry * getPassRegistry()
getPassRegistry - Access the global registry object, which is automatically initialized at applicatio...
LoopsOnLevelTy::iterator iterator
Definition: LoopFuse.cpp:334
bool empty() const
Definition: LoopInfo.h:837
DiagnosticInfoOptimizationBase::Argument NV
typename SuperClass::const_iterator const_iterator
Definition: SmallVector.h:320
PassT::Result & getResult(IRUnitT &IR, ExtraArgTs... ExtraArgs)
Get the result of an analysis pass for a given IR unit.
Definition: PassManager.h:776
AnalysisPass to compute dependence information in a function.
This class represents lattice values for constants.
Definition: AllocatorList.h:23
#define LLVM_DUMP_METHOD
Mark debug helper function definitions like dump() that should not be stripped from debug builds...
Definition: Compiler.h:473
const PostDominatorTree * PDT
Definition: LoopFuse.cpp:152
Legacy pass manager pass to access dependence information.
This class is used to represent a candidate for loop fusion.
Definition: LoopFuse.cpp:123
unsigned getLoopDepth() const
Return the nesting level of this loop.
Definition: LoopInfo.h:94
void push_back(const T &Elt)
Definition: SmallVector.h:211
LoopT * removeChildLoop(iterator I)
This removes the specified child from being a subloop of this loop.
Definition: LoopInfo.h:342
The main scalar evolution driver.
This file implements the Loop Fusion pass.
BlockT * getLoopPreheader() const
If there is a preheader for this loop, return it.
Definition: LoopInfoImpl.h:173
STATISTIC(NumFunctions, "Total number of functions")
void flush()
Apply all pending updates to available trees and flush all BasicBlocks awaiting deletion.
Analysis pass which computes a DominatorTree.
Definition: Dominators.h:230
F(f)
block Block Frequency true
An instruction for reading from memory.
Definition: Instructions.h:167
const Instruction * getTerminator() const LLVM_READONLY
Returns the terminator instruction if the block is well formed or null if the block is not well forme...
Definition: BasicBlock.cpp:137
INITIALIZE_PASS_BEGIN(LoopFuseLegacy, "loop-fusion", "Loop Fusion", false, false) FunctionPass *llvm
Definition: LoopFuse.cpp:1205
DependenceInfo - This class is the main dependence-analysis driver.
bool isValid() const
Check if all members of the class are valid.
Definition: LoopFuse.cpp:200
FusionCandidate(Loop *L, const DominatorTree *DT, const PostDominatorTree *PDT)
Definition: LoopFuse.cpp:154
LLVMContext & getContext() const
Get the context in which this basic block lives.
Definition: BasicBlock.cpp:32
bool operator()(const FusionCandidate &LHS, const FusionCandidate &RHS) const
Comparison functor to sort two Control Flow Equivalent fusion candidates into dominance order...
Definition: LoopFuse.cpp:259
bool isInvalid() const
Return true if this loop is no longer valid.
Definition: LoopInfo.h:195
static void reportLoopFusion(const FusionCandidate &FC0, const FusionCandidate &FC1, OptimizationRemarkEmitter &ORE)
Definition: LoopFuse.cpp:394
SmallVector< Instruction *, 16 > MemReads
Vector of instructions in this loop that read from memory.
Definition: LoopFuse.cpp:141
void verify(const DominatorTreeBase< BlockT, false > &DomTree) const
Definition: LoopInfoImpl.h:703
static cl::opt< FusionDependenceAnalysisChoice > FusionDependenceAnalysis("loop-fusion-dependence-analysis", cl::desc("Which dependence analysis should loop fusion use?"), cl::values(clEnumValN(FUSION_DEPENDENCE_ANALYSIS_SCEV, "scev", "Use the scalar evolution interface"), clEnumValN(FUSION_DEPENDENCE_ANALYSIS_DA, "da", "Use the dependence analysis interface"), clEnumValN(FUSION_DEPENDENCE_ANALYSIS_ALL, "all", "Use all available analyses")), cl::Hidden, cl::init(FUSION_DEPENDENCE_ANALYSIS_ALL), cl::ZeroOrMore)
BasicBlock * Latch
Latch of the loop.
Definition: LoopFuse.cpp:137
void dump() const
Support for debugging, callable in GDB: V->dump()
Definition: AsmWriter.cpp:4382
LoopDepthTree(LoopInfo &LI)
Definition: LoopFuse.cpp:337
AnalysisUsage & addRequired()
#define INITIALIZE_PASS_DEPENDENCY(depName)
Definition: PassSupport.h:50
bool fuseLoops(Function &F)
This is the main entry point for loop fusion.
Definition: LoopFuse.cpp:429
iterator end()
Definition: LoopFuse.cpp:369
const DataLayout & getDataLayout() const
Get the data layout for the module&#39;s target platform.
Definition: Module.cpp:369
bool verify(VerificationLevel VL=VerificationLevel::Full) const
verify - checks if the tree is correct.
LoopT * getLoopFor(const BlockT *BB) const
Return the inner most loop that BB lives in.
Definition: LoopInfo.h:858
void addBlockEntry(BlockT *BB)
This adds a basic block directly to the basic block list.
Definition: LoopInfo.h:361
bool empty() const
Definition: LoopFuse.cpp:364
static cl::opt< bool > VerboseFusionDebugging("loop-fusion-verbose-debug", cl::desc("Enable verbose debugging for Loop Fusion"), cl::Hidden, cl::init(false), cl::ZeroOrMore)
void erase(Loop *L)
Update LoopInfo after removing the last backedge from a loop.
Definition: LoopInfo.cpp:815
Analysis pass that exposes the LoopInfo for a function.
Definition: LoopInfo.h:1113
void replaceSuccessorsPhiUsesWith(BasicBlock *Old, BasicBlock *New)
Update all phi nodes in this basic block&#39;s successors to refer to basic block New instead of basic bl...
Definition: BasicBlock.cpp:445
BlockT * getHeader() const
Definition: LoopInfo.h:102
bool isKnownPredicate(ICmpInst::Predicate Pred, const SCEV *LHS, const SCEV *RHS)
Test if the given expression is known to satisfy the condition described by Pred, LHS...
Type * getType() const
All values are typed, get the type of this value.
Definition: Value.h:244
This node represents a polynomial recurrence on the trip count of the specified loop.
const BasicBlock * getSingleSuccessor() const
Return the successor of this block if it has a single successor.
Definition: BasicBlock.cpp:268
Value * getLoadStorePointerOperand(Value *V)
A helper function that returns the pointer operand of a load or store instruction.
An instruction for storing to memory.
Definition: Instructions.h:320
op_iterator op_begin() const
void verify() const
Verify that all members are in sync with the Loop object.
Definition: LoopFuse.cpp:206
const SCEV * getStepRecurrence(ScalarEvolution &SE) const
Constructs and returns the recurrence indicating how much this expression steps by.
Concrete subclass of DominatorTreeBase that is used to compute a normal dominator tree...
Definition: Dominators.h:144
const SCEV * getAddRecExpr(const SCEV *Start, const SCEV *Step, const Loop *L, SCEV::NoWrapFlags Flags)
Get an add recurrence expression for the specified loop.
void replaceUsesOfWith(Value *From, Value *To)
Replace uses of one Value with another.
Definition: User.cpp:20
initializer< Ty > init(const Ty &Val)
Definition: CommandLine.h:432
std::unique_ptr< Dependence > depends(Instruction *Src, Instruction *Dst, bool PossiblyLoopIndependent)
depends - Tests for a dependence between the Src and Dst instructions.
A set of analyses that are preserved following a run of a transformation pass.
Definition: PassManager.h:153
const_iterator getFirstInsertionPt() const
Returns an iterator to the first instruction in this block that is suitable for inserting a non-PHI i...
Definition: BasicBlock.cpp:216
LLVM Basic Block Representation.
Definition: BasicBlock.h:57
void deleteBB(BasicBlock *DelBB)
Delete DelBB.
Collect all loops in function at the same nest level, starting at the outermost level.
Definition: LoopFuse.cpp:332
This function has undefined behavior.
ValuesClass values(OptsTy... Options)
Helper to build a ValuesClass by forwarding a variable number of arguments as an initializer list to ...
Definition: CommandLine.h:652
bool Valid
Are all of the members of this fusion candidate still valid.
Definition: LoopFuse.cpp:145
const Instruction & front() const
Definition: BasicBlock.h:280
size_t size() const
Definition: BasicBlock.h:278
Diagnostic information for applied optimization remarks.
Interval::pred_iterator pred_begin(Interval *I)
pred_begin/pred_end - define methods so that Intervals may be used just like BasicBlocks can with the...
Definition: Interval.h:112
PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM)
Definition: LoopFuse.cpp:1181
unsigned getDepth() const
Definition: LoopFuse.cpp:366
const SCEV * getSCEVAtScope(const SCEV *S, const Loop *L)
Return a SCEV expression for the specified value at the specified scope in the program.
Represent the analysis usage information of a pass.
const_iterator begin() const
Definition: LoopFuse.cpp:370
Predicate
This enumeration lists the possible predicates for CmpInst subclasses.
Definition: InstrTypes.h:732
bool SCEVExprContains(const SCEV *Root, PredTy Pred)
Return true if any node in Root satisfies the predicate Pred.
FunctionPass class - This class is used to implement most global optimizations.
Definition: Pass.h:284
Interval::pred_iterator pred_end(Interval *I)
Definition: Interval.h:115
bool isAffine() const
Return true if this represents an expression A + B*x where A and B are loop invariant values...
void forgetValue(Value *V)
This method should be called by the client when it has changed a value in a way that may effect its v...
BlockT * getExitBlock() const
If getExitBlocks would return exactly one block, return that block.
Definition: LoopInfoImpl.h:75
static UndefValue * get(Type *T)
Static factory methods - Return an &#39;undef&#39; object of the specified type.
Definition: Constants.cpp:1424
static PreservedAnalyses all()
Construct a special preserved set that preserves all passes.
Definition: PassManager.h:159
INITIALIZE_PASS_END(RegBankSelect, DEBUG_TYPE, "Assign register bank of generic virtual registers", false, false) RegBankSelect
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
signed greater than
Definition: InstrTypes.h:759
char & LoopSimplifyID
static char ID
Definition: LoopFuse.cpp:1144
void descend()
Descend the tree to the next (inner) nesting level.
Definition: LoopFuse.cpp:351
void getAnalysisUsage(AnalysisUsage &AU) const override
getAnalysisUsage - This function should be overriden by passes that need analysis information to do t...
Definition: LoopFuse.cpp:1150
bool contains(const LoopT *L) const
Return true if the specified loop is contained within in this loop.
Definition: LoopInfo.h:112
void print(raw_ostream &OS) const
Definition: LoopInfoImpl.h:632
void emit(DiagnosticInfoOptimizationBase &OptDiag)
Output the remark via the diagnostic handler and to the optimization record file. ...
auto size(R &&Range, typename std::enable_if< std::is_same< typename std::iterator_traits< decltype(Range.begin())>::iterator_category, std::random_access_iterator_tag >::value, void >::type *=nullptr) -> decltype(std::distance(Range.begin(), Range.end()))
Get the size of a range.
Definition: STLExtras.h:1173
bool isKnownPositive(const SCEV *S)
Test if the given expression is known to be positive.
BasicBlock * ExitingBlock
Blocks in the loop that exit the loop.
Definition: LoopFuse.cpp:133
void applyUpdates(ArrayRef< DominatorTree::UpdateType > Updates)
Submit updates to all available trees.
Fast - This calling convention attempts to make calls as fast as possible (e.g.
Definition: CallingConv.h:42
AnalysisUsage & addRequiredID(const void *ID)
Definition: Pass.cpp:314
reverse_iterator rend() const
Definition: LoopInfo.h:836
bool dominates(const Instruction *Def, const Use &U) const
Return true if Def dominates a use in User.
Definition: Dominators.cpp:248
Analysis pass which computes a PostDominatorTree.
iterator begin() const
Definition: LoopInfo.h:144
BasicBlock * ExitBlock
The successor block of this loop (where the exiting blocks go to)
Definition: LoopFuse.cpp:135
static PHINode * Create(Type *Ty, unsigned NumReservedValues, const Twine &NameStr="", Instruction *InsertBefore=nullptr)
Constructors - NumReservedValues is a hint for the number of incoming edges that this phi node will h...
#define DEBUG_TYPE
Definition: LoopFuse.cpp:67
size_t size() const
Definition: LoopFuse.cpp:365
raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition: Debug.cpp:132
SmallVector< Instruction *, 16 > MemWrites
Vector of instructions in this loop that write to memory.
Definition: LoopFuse.cpp:143
typename SuperClass::iterator iterator
Definition: SmallVector.h:319
PostDominatorTree Class - Concrete subclass of DominatorTree that is used to compute the post-dominat...
void append(in_iter in_start, in_iter in_end)
Add the specified range to the end of the SmallVector.
Definition: SmallVector.h:387
reverse_iterator rbegin() const
Definition: LoopInfo.h:835
bool dominates(const DomTreeNodeBase< NodeT > *A, const DomTreeNodeBase< NodeT > *B) const
dominates - Returns true iff A dominates B.
#define clEnumValN(ENUMVAL, FLAGNAME, DESC)
Definition: CommandLine.h:627
FunctionPass * createLoopFusePass()
Analysis pass that exposes the ScalarEvolution for a function.
static cl::opt< ITMode > IT(cl::desc("IT block support"), cl::Hidden, cl::init(DefaultIT), cl::ZeroOrMore, cl::values(clEnumValN(DefaultIT, "arm-default-it", "Generate IT block based on arch"), clEnumValN(RestrictedIT, "arm-restrict-it", "Disallow deprecated IT based on ARMv8"), clEnumValN(NoRestrictedIT, "arm-no-restrict-it", "Allow IT blocks based on ARMv7")))
Virtual Register Rewriter
Definition: VirtRegMap.cpp:221
bool isLoopSimplifyForm() const
Return true if the Loop is in the form that the LoopSimplify form transforms loops to...
Definition: LoopInfo.cpp:425
Loop * L
The loop that this fusion candidate represents.
Definition: LoopFuse.cpp:139
void forgetLoop(const Loop *L)
This method should be called by the client when it has changed a loop in a way that may effect Scalar...
This class represents an analyzed expression in the program.
void addChildLoop(LoopT *NewChild)
Add the specified loop to be a child of this loop.
Definition: LoopInfo.h:333
StringRef getName() const
Definition: LoopInfo.h:757
Represents a single loop in the control flow graph.
Definition: LoopInfo.h:467
StringRef getName() const
Return a constant reference to the value&#39;s name.
Definition: Value.cpp:214
iterator begin()
Definition: LoopFuse.cpp:368
const Function * getParent() const
Return the enclosing method, or null if none.
Definition: BasicBlock.h:106
#define I(x, y, z)
Definition: MD5.cpp:58
LLVM_DUMP_METHOD void dump() const
Definition: LoopFuse.cpp:218
bool verifyFunction(const Function &F, raw_ostream *OS=nullptr)
Check a function for errors, useful for use when debugging a pass.
Definition: Verifier.cpp:4999
LLVM_NODISCARD std::enable_if<!is_simple_type< Y >::value, typename cast_retty< X, const Y >::ret_type >::type dyn_cast(const Y &Val)
Definition: Casting.h:332
block_iterator block_end() const
Definition: LoopInfo.h:157
void changeLoopFor(BlockT *BB, LoopT *L)
Change the top-level loop that contains BB to the specified loop.
Definition: LoopInfo.h:890
void preserve()
Mark an analysis as preserved.
Definition: PassManager.h:174
iterator_range< const_phi_iterator > phis() const
Returns a range that iterates over the phis in the basic block.
Definition: BasicBlock.h:324
bool runOnFunction(Function &F) override
runOnFunction - Virtual method overriden by subclasses to do the per-function processing of the pass...
Definition: LoopFuse.cpp:1165
const_iterator end() const
Definition: LoopFuse.cpp:371
FusionDependenceAnalysisChoice
Definition: LoopFuse.cpp:89
raw_ostream & operator<<(raw_ostream &OS, const APInt &I)
Definition: APInt.h:2038
void removeBlockFromLoop(BlockT *BB)
This removes the specified basic block from the current loop, updating the Blocks as appropriate...
Definition: LoopInfo.h:398
BasicBlock * Preheader
Cache of parts of the loop used throughout loop fusion.
Definition: LoopFuse.cpp:129
bool empty() const
Definition: LoopInfo.h:148
const SCEV * getBackedgeTakenCount(const Loop *L)
If the specified loop has a predictable backedge-taken count, return it, otherwise return a SCEVCould...
assert(ImpDefSCC.getReg()==AMDGPU::SCC &&ImpDefSCC.isDef())
Module * getParent()
Get the module that this global value is contained inside of...
Definition: GlobalValue.h:575
LLVM Value Representation.
Definition: Value.h:72
BasicBlock * Header
Header of the loop this candidate represents.
Definition: LoopFuse.cpp:131
bool isSCEVable(Type *Ty) const
Test if values of the given type are analyzable within the SCEV framework.
OptimizationRemarkEmitter legacy analysis pass.
This class implements an extremely fast bulk output stream that can only output to a stream...
Definition: raw_ostream.h:45
The legacy pass manager&#39;s analysis pass to compute loop information.
Definition: LoopInfo.h:1138
StringRef - Represent a constant reference to a string, i.e.
Definition: StringRef.h:48
A container for analyses that lazily runs them and caches their results.
Legacy analysis pass which computes a DominatorTree.
Definition: Dominators.h:259
const DominatorTree * DT
Dominator and PostDominator trees are needed for the FusionCandidateCompare function, required by FusionCandidateSet to determine where the FusionCandidate should be inserted into the set.
Definition: LoopFuse.cpp:151
#define LLVM_DEBUG(X)
Definition: Debug.h:122
LoopsOnLevelTy::const_iterator const_iterator
Definition: LoopFuse.cpp:335
iterator_range< block_iterator > blocks() const
Definition: LoopInfo.h:158
void printLoop(Loop &L, raw_ostream &OS, const std::string &Banner="")
Function to print a loop&#39;s contents as LLVM&#39;s text IR assembly.
Definition: LoopInfo.cpp:894
block_iterator block_begin() const
Definition: LoopInfo.h:156
BlockT * getExitingBlock() const
If getExitingBlocks would return exactly one block, return that block.
Definition: LoopInfoImpl.h:49
NoWrapFlags getNoWrapFlags(NoWrapFlags Mask=NoWrapMask) const
bool hasLoopInvariantBackedgeTakenCount(const Loop *L)
Return true if the specified loop has an analyzable loop-invariant backedge-taken count...
The optimization diagnostic interface.
void removeLoop(const Loop *L)
Record that a given loop has been removed from the function and is no longer valid.
Definition: LoopFuse.cpp:348
void removeBlock(BlockT *BB)
This method completely removes BB from all data structures, including all of the Loop objects it is n...
Definition: LoopInfo.h:917
This visitor recursively visits a SCEV expression and re-writes it.
signed greater or equal
Definition: InstrTypes.h:760