LLVM  10.0.0svn
LoopFuse.cpp
Go to the documentation of this file.
1 //===- LoopFuse.cpp - Loop Fusion Pass ------------------------------------===//
2 //
3 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4 // See https://llvm.org/LICENSE.txt for license information.
5 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6 //
7 //===----------------------------------------------------------------------===//
8 ///
9 /// \file
10 /// This file implements the loop fusion pass.
11 /// The implementation is largely based on the following document:
12 ///
13 /// Code Transformations to Augment the Scope of Loop Fusion in a
14 /// Production Compiler
15 /// Christopher Mark Barton
16 /// MSc Thesis
17 /// https://webdocs.cs.ualberta.ca/~amaral/thesis/ChristopherBartonMSc.pdf
18 ///
19 /// The general approach taken is to collect sets of control flow equivalent
20 /// loops and test whether they can be fused. The necessary conditions for
21 /// fusion are:
22 /// 1. The loops must be adjacent (there cannot be any statements between
23 /// the two loops).
24 /// 2. The loops must be conforming (they must execute the same number of
25 /// iterations).
26 /// 3. The loops must be control flow equivalent (if one loop executes, the
27 /// other is guaranteed to execute).
28 /// 4. There cannot be any negative distance dependencies between the loops.
29 /// If all of these conditions are satisfied, it is safe to fuse the loops.
30 ///
31 /// This implementation creates FusionCandidates that represent the loop and the
32 /// necessary information needed by fusion. It then operates on the fusion
33 /// candidates, first confirming that the candidate is eligible for fusion. The
34 /// candidates are then collected into control flow equivalent sets, sorted in
35 /// dominance order. Each set of control flow equivalent candidates is then
36 /// traversed, attempting to fuse pairs of candidates in the set. If all
37 /// requirements for fusion are met, the two candidates are fused, creating a
38 /// new (fused) candidate which is then added back into the set to consider for
39 /// additional fusion.
40 ///
41 /// This implementation currently does not make any modifications to remove
42 /// conditions for fusion. Code transformations to make loops conform to each of
43 /// the conditions for fusion are discussed in more detail in the document
44 /// above. These can be added to the current implementation in the future.
45 //===----------------------------------------------------------------------===//
46 
48 #include "llvm/ADT/Statistic.h"
51 #include "llvm/Analysis/LoopInfo.h"
56 #include "llvm/IR/Function.h"
57 #include "llvm/IR/Verifier.h"
58 #include "llvm/Pass.h"
59 #include "llvm/Support/Debug.h"
61 #include "llvm/Transforms/Scalar.h"
62 #include "llvm/Transforms/Utils.h"
64 
65 using namespace llvm;
66 
67 #define DEBUG_TYPE "loop-fusion"
68 
69 STATISTIC(FuseCounter, "Loops fused");
70 STATISTIC(NumFusionCandidates, "Number of candidates for loop fusion");
71 STATISTIC(InvalidPreheader, "Loop has invalid preheader");
72 STATISTIC(InvalidHeader, "Loop has invalid header");
73 STATISTIC(InvalidExitingBlock, "Loop has invalid exiting blocks");
74 STATISTIC(InvalidExitBlock, "Loop has invalid exit block");
75 STATISTIC(InvalidLatch, "Loop has invalid latch");
76 STATISTIC(InvalidLoop, "Loop is invalid");
77 STATISTIC(AddressTakenBB, "Basic block has address taken");
78 STATISTIC(MayThrowException, "Loop may throw an exception");
79 STATISTIC(ContainsVolatileAccess, "Loop contains a volatile access");
80 STATISTIC(NotSimplifiedForm, "Loop is not in simplified form");
81 STATISTIC(InvalidDependencies, "Dependencies prevent fusion");
82 STATISTIC(UnknownTripCount, "Loop has unknown trip count");
83 STATISTIC(UncomputableTripCount, "SCEV cannot compute trip count of loop");
84 STATISTIC(NonEqualTripCount, "Loop trip counts are not the same");
85 STATISTIC(NonAdjacent, "Loops are not adjacent");
86 STATISTIC(NonEmptyPreheader, "Loop has a non-empty preheader");
87 STATISTIC(FusionNotBeneficial, "Fusion is not beneficial");
88 
93 };
94 
96  "loop-fusion-dependence-analysis",
97  cl::desc("Which dependence analysis should loop fusion use?"),
99  "Use the scalar evolution interface"),
101  "Use the dependence analysis interface"),
103  "Use all available analyses")),
105 
106 #ifndef NDEBUG
107 static cl::opt<bool>
108  VerboseFusionDebugging("loop-fusion-verbose-debug",
109  cl::desc("Enable verbose debugging for Loop Fusion"),
111 #endif
112 
113 /// This class is used to represent a candidate for loop fusion. When it is
114 /// constructed, it checks the conditions for loop fusion to ensure that it
115 /// represents a valid candidate. It caches several parts of a loop that are
116 /// used throughout loop fusion (e.g., loop preheader, loop header, etc) instead
117 /// of continually querying the underlying Loop to retrieve these values. It is
118 /// assumed these will not change throughout loop fusion.
119 ///
120 /// The invalidate method should be used to indicate that the FusionCandidate is
121 /// no longer a valid candidate for fusion. Similarly, the isValid() method can
122 /// be used to ensure that the FusionCandidate is still valid for fusion.
124  /// Cache of parts of the loop used throughout loop fusion. These should not
125  /// need to change throughout the analysis and transformation.
126  /// These parts are cached to avoid repeatedly looking up in the Loop class.
127 
128  /// Preheader of the loop this candidate represents
130  /// Header of the loop this candidate represents
132  /// Blocks in the loop that exit the loop
134  /// The successor block of this loop (where the exiting blocks go to)
136  /// Latch of the loop
138  /// The loop that this fusion candidate represents
139  Loop *L;
140  /// Vector of instructions in this loop that read from memory
142  /// Vector of instructions in this loop that write to memory
144  /// Are all of the members of this fusion candidate still valid
145  bool Valid;
146 
147  /// Dominator and PostDominator trees are needed for the
148  /// FusionCandidateCompare function, required by FusionCandidateSet to
149  /// determine where the FusionCandidate should be inserted into the set. These
150  /// are used to establish ordering of the FusionCandidates based on dominance.
153 
155 
158  : Preheader(L->getLoopPreheader()), Header(L->getHeader()),
159  ExitingBlock(L->getExitingBlock()), ExitBlock(L->getExitBlock()),
160  Latch(L->getLoopLatch()), L(L), Valid(true), DT(DT), PDT(PDT),
161  ORE(ORE) {
162 
163  // Walk over all blocks in the loop and check for conditions that may
164  // prevent fusion. For each block, walk over all instructions and collect
165  // the memory reads and writes If any instructions that prevent fusion are
166  // found, invalidate this object and return.
167  for (BasicBlock *BB : L->blocks()) {
168  if (BB->hasAddressTaken()) {
169  invalidate();
170  reportInvalidCandidate(AddressTakenBB);
171  return;
172  }
173 
174  for (Instruction &I : *BB) {
175  if (I.mayThrow()) {
176  invalidate();
177  reportInvalidCandidate(MayThrowException);
178  return;
179  }
180  if (StoreInst *SI = dyn_cast<StoreInst>(&I)) {
181  if (SI->isVolatile()) {
182  invalidate();
183  reportInvalidCandidate(ContainsVolatileAccess);
184  return;
185  }
186  }
187  if (LoadInst *LI = dyn_cast<LoadInst>(&I)) {
188  if (LI->isVolatile()) {
189  invalidate();
190  reportInvalidCandidate(ContainsVolatileAccess);
191  return;
192  }
193  }
194  if (I.mayWriteToMemory())
195  MemWrites.push_back(&I);
196  if (I.mayReadFromMemory())
197  MemReads.push_back(&I);
198  }
199  }
200  }
201 
202  /// Check if all members of the class are valid.
203  bool isValid() const {
204  return Preheader && Header && ExitingBlock && ExitBlock && Latch && L &&
205  !L->isInvalid() && Valid;
206  }
207 
208  /// Verify that all members are in sync with the Loop object.
209  void verify() const {
210  assert(isValid() && "Candidate is not valid!!");
211  assert(!L->isInvalid() && "Loop is invalid!");
212  assert(Preheader == L->getLoopPreheader() && "Preheader is out of sync");
213  assert(Header == L->getHeader() && "Header is out of sync");
214  assert(ExitingBlock == L->getExitingBlock() &&
215  "Exiting Blocks is out of sync");
216  assert(ExitBlock == L->getExitBlock() && "Exit block is out of sync");
217  assert(Latch == L->getLoopLatch() && "Latch is out of sync");
218  }
219 
220 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
221  LLVM_DUMP_METHOD void dump() const {
222  dbgs() << "\tPreheader: " << (Preheader ? Preheader->getName() : "nullptr")
223  << "\n"
224  << "\tHeader: " << (Header ? Header->getName() : "nullptr") << "\n"
225  << "\tExitingBB: "
226  << (ExitingBlock ? ExitingBlock->getName() : "nullptr") << "\n"
227  << "\tExitBB: " << (ExitBlock ? ExitBlock->getName() : "nullptr")
228  << "\n"
229  << "\tLatch: " << (Latch ? Latch->getName() : "nullptr") << "\n";
230  }
231 #endif
232 
233  /// Determine if a fusion candidate (representing a loop) is eligible for
234  /// fusion. Note that this only checks whether a single loop can be fused - it
235  /// does not check whether it is *legal* to fuse two loops together.
237  if (!isValid()) {
238  LLVM_DEBUG(dbgs() << "FC has invalid CFG requirements!\n");
239  if (!Preheader)
240  ++InvalidPreheader;
241  if (!Header)
242  ++InvalidHeader;
243  if (!ExitingBlock)
244  ++InvalidExitingBlock;
245  if (!ExitBlock)
246  ++InvalidExitBlock;
247  if (!Latch)
248  ++InvalidLatch;
249  if (L->isInvalid())
250  ++InvalidLoop;
251 
252  return false;
253  }
254 
255  // Require ScalarEvolution to be able to determine a trip count.
257  LLVM_DEBUG(dbgs() << "Loop " << L->getName()
258  << " trip count not computable!\n");
259  return reportInvalidCandidate(UnknownTripCount);
260  }
261 
262  if (!L->isLoopSimplifyForm()) {
263  LLVM_DEBUG(dbgs() << "Loop " << L->getName()
264  << " is not in simplified form!\n");
265  return reportInvalidCandidate(NotSimplifiedForm);
266  }
267 
268  return true;
269  }
270 
271 private:
272  // This is only used internally for now, to clear the MemWrites and MemReads
273  // list and setting Valid to false. I can't envision other uses of this right
274  // now, since once FusionCandidates are put into the FusionCandidateSet they
275  // are immutable. Thus, any time we need to change/update a FusionCandidate,
276  // we must create a new one and insert it into the FusionCandidateSet to
277  // ensure the FusionCandidateSet remains ordered correctly.
278  void invalidate() {
279  MemWrites.clear();
280  MemReads.clear();
281  Valid = false;
282  }
283 
284  bool reportInvalidCandidate(llvm::Statistic &Stat) const {
285  using namespace ore;
286  assert(L && Preheader && "Fusion candidate not initialized properly!");
287  ++Stat;
289  L->getStartLoc(), Preheader)
290  << "[" << Preheader->getParent()->getName() << "]: "
291  << "Loop is not a candidate for fusion: " << Stat.getDesc());
292  return false;
293  }
294 };
295 
297  const FusionCandidate &FC) {
298  if (FC.isValid())
299  OS << FC.Preheader->getName();
300  else
301  OS << "<Invalid>";
302 
303  return OS;
304 }
305 
307  /// Comparison functor to sort two Control Flow Equivalent fusion candidates
308  /// into dominance order.
309  /// If LHS dominates RHS and RHS post-dominates LHS, return true;
310  /// IF RHS dominates LHS and LHS post-dominates RHS, return false;
311  bool operator()(const FusionCandidate &LHS,
312  const FusionCandidate &RHS) const {
313  const DominatorTree *DT = LHS.DT;
314 
315  // Do not save PDT to local variable as it is only used in asserts and thus
316  // will trigger an unused variable warning if building without asserts.
317  assert(DT && LHS.PDT && "Expecting valid dominator tree");
318 
319  // Do this compare first so if LHS == RHS, function returns false.
320  if (DT->dominates(RHS.Preheader, LHS.Preheader)) {
321  // RHS dominates LHS
322  // Verify LHS post-dominates RHS
323  assert(LHS.PDT->dominates(LHS.Preheader, RHS.Preheader));
324  return false;
325  }
326 
327  if (DT->dominates(LHS.Preheader, RHS.Preheader)) {
328  // Verify RHS Postdominates LHS
329  assert(LHS.PDT->dominates(RHS.Preheader, LHS.Preheader));
330  return true;
331  }
332 
333  // If LHS does not dominate RHS and RHS does not dominate LHS then there is
334  // no dominance relationship between the two FusionCandidates. Thus, they
335  // should not be in the same set together.
337  "No dominance relationship between these fusion candidates!");
338  }
339 };
340 
341 namespace {
342 using LoopVector = SmallVector<Loop *, 4>;
343 
344 // Set of Control Flow Equivalent (CFE) Fusion Candidates, sorted in dominance
345 // order. Thus, if FC0 comes *before* FC1 in a FusionCandidateSet, then FC0
346 // dominates FC1 and FC1 post-dominates FC0.
347 // std::set was chosen because we want a sorted data structure with stable
348 // iterators. A subsequent patch to loop fusion will enable fusing non-ajdacent
349 // loops by moving intervening code around. When this intervening code contains
350 // loops, those loops will be moved also. The corresponding FusionCandidates
351 // will also need to be moved accordingly. As this is done, having stable
352 // iterators will simplify the logic. Similarly, having an efficient insert that
353 // keeps the FusionCandidateSet sorted will also simplify the implementation.
354 using FusionCandidateSet = std::set<FusionCandidate, FusionCandidateCompare>;
355 using FusionCandidateCollection = SmallVector<FusionCandidateSet, 4>;
356 } // namespace
357 
359  const FusionCandidateSet &CandSet) {
360  for (auto IT : CandSet)
361  OS << IT << "\n";
362 
363  return OS;
364 }
365 
366 #if !defined(NDEBUG)
367 static void
368 printFusionCandidates(const FusionCandidateCollection &FusionCandidates) {
369  dbgs() << "Fusion Candidates: \n";
370  for (const auto &CandidateSet : FusionCandidates) {
371  dbgs() << "*** Fusion Candidate Set ***\n";
372  dbgs() << CandidateSet;
373  dbgs() << "****************************\n";
374  }
375 }
376 #endif
377 
378 /// Collect all loops in function at the same nest level, starting at the
379 /// outermost level.
380 ///
381 /// This data structure collects all loops at the same nest level for a
382 /// given function (specified by the LoopInfo object). It starts at the
383 /// outermost level.
388 
390  if (!LI.empty())
391  LoopsOnLevel.emplace_back(LoopVector(LI.rbegin(), LI.rend()));
392  }
393 
394  /// Test whether a given loop has been removed from the function, and thus is
395  /// no longer valid.
396  bool isRemovedLoop(const Loop *L) const { return RemovedLoops.count(L); }
397 
398  /// Record that a given loop has been removed from the function and is no
399  /// longer valid.
400  void removeLoop(const Loop *L) { RemovedLoops.insert(L); }
401 
402  /// Descend the tree to the next (inner) nesting level
403  void descend() {
404  LoopsOnLevelTy LoopsOnNextLevel;
405 
406  for (const LoopVector &LV : *this)
407  for (Loop *L : LV)
408  if (!isRemovedLoop(L) && L->begin() != L->end())
409  LoopsOnNextLevel.emplace_back(LoopVector(L->begin(), L->end()));
410 
411  LoopsOnLevel = LoopsOnNextLevel;
412  RemovedLoops.clear();
413  Depth++;
414  }
415 
416  bool empty() const { return size() == 0; }
417  size_t size() const { return LoopsOnLevel.size() - RemovedLoops.size(); }
418  unsigned getDepth() const { return Depth; }
419 
420  iterator begin() { return LoopsOnLevel.begin(); }
421  iterator end() { return LoopsOnLevel.end(); }
422  const_iterator begin() const { return LoopsOnLevel.begin(); }
423  const_iterator end() const { return LoopsOnLevel.end(); }
424 
425 private:
426  /// Set of loops that have been removed from the function and are no longer
427  /// valid.
428  SmallPtrSet<const Loop *, 8> RemovedLoops;
429 
430  /// Depth of the current level, starting at 1 (outermost loops).
431  unsigned Depth;
432 
433  /// Vector of loops at the current depth level that have the same parent loop
434  LoopsOnLevelTy LoopsOnLevel;
435 };
436 
437 #ifndef NDEBUG
438 static void printLoopVector(const LoopVector &LV) {
439  dbgs() << "****************************\n";
440  for (auto L : LV)
441  printLoop(*L, dbgs());
442  dbgs() << "****************************\n";
443 }
444 #endif
445 
446 struct LoopFuser {
447 private:
448  // Sets of control flow equivalent fusion candidates for a given nest level.
449  FusionCandidateCollection FusionCandidates;
450 
451  LoopDepthTree LDT;
452  DomTreeUpdater DTU;
453 
454  LoopInfo &LI;
455  DominatorTree &DT;
456  DependenceInfo &DI;
457  ScalarEvolution &SE;
458  PostDominatorTree &PDT;
460 
461 public:
464  OptimizationRemarkEmitter &ORE, const DataLayout &DL)
465  : LDT(LI), DTU(DT, PDT, DomTreeUpdater::UpdateStrategy::Lazy), LI(LI),
466  DT(DT), DI(DI), SE(SE), PDT(PDT), ORE(ORE) {}
467 
468  /// This is the main entry point for loop fusion. It will traverse the
469  /// specified function and collect candidate loops to fuse, starting at the
470  /// outermost nesting level and working inwards.
472 #ifndef NDEBUG
474  LI.print(dbgs());
475  }
476 #endif
477 
478  LLVM_DEBUG(dbgs() << "Performing Loop Fusion on function " << F.getName()
479  << "\n");
480  bool Changed = false;
481 
482  while (!LDT.empty()) {
483  LLVM_DEBUG(dbgs() << "Got " << LDT.size() << " loop sets for depth "
484  << LDT.getDepth() << "\n";);
485 
486  for (const LoopVector &LV : LDT) {
487  assert(LV.size() > 0 && "Empty loop set was build!");
488 
489  // Skip singleton loop sets as they do not offer fusion opportunities on
490  // this level.
491  if (LV.size() == 1)
492  continue;
493 #ifndef NDEBUG
495  LLVM_DEBUG({
496  dbgs() << " Visit loop set (#" << LV.size() << "):\n";
497  printLoopVector(LV);
498  });
499  }
500 #endif
501 
502  collectFusionCandidates(LV);
503  Changed |= fuseCandidates();
504  }
505 
506  // Finished analyzing candidates at this level.
507  // Descend to the next level and clear all of the candidates currently
508  // collected. Note that it will not be possible to fuse any of the
509  // existing candidates with new candidates because the new candidates will
510  // be at a different nest level and thus not be control flow equivalent
511  // with all of the candidates collected so far.
512  LLVM_DEBUG(dbgs() << "Descend one level!\n");
513  LDT.descend();
514  FusionCandidates.clear();
515  }
516 
517  if (Changed)
518  LLVM_DEBUG(dbgs() << "Function after Loop Fusion: \n"; F.dump(););
519 
520 #ifndef NDEBUG
521  assert(DT.verify());
522  assert(PDT.verify());
523  LI.verify(DT);
524  SE.verify();
525 #endif
526 
527  LLVM_DEBUG(dbgs() << "Loop Fusion complete\n");
528  return Changed;
529  }
530 
531 private:
532  /// Determine if two fusion candidates are control flow equivalent.
533  ///
534  /// Two fusion candidates are control flow equivalent if when one executes,
535  /// the other is guaranteed to execute. This is determined using dominators
536  /// and post-dominators: if A dominates B and B post-dominates A then A and B
537  /// are control-flow equivalent.
538  bool isControlFlowEquivalent(const FusionCandidate &FC0,
539  const FusionCandidate &FC1) const {
540  assert(FC0.Preheader && FC1.Preheader && "Expecting valid preheaders");
541 
542  if (DT.dominates(FC0.Preheader, FC1.Preheader))
543  return PDT.dominates(FC1.Preheader, FC0.Preheader);
544 
545  if (DT.dominates(FC1.Preheader, FC0.Preheader))
546  return PDT.dominates(FC0.Preheader, FC1.Preheader);
547 
548  return false;
549  }
550 
551  /// Iterate over all loops in the given loop set and identify the loops that
552  /// are eligible for fusion. Place all eligible fusion candidates into Control
553  /// Flow Equivalent sets, sorted by dominance.
554  void collectFusionCandidates(const LoopVector &LV) {
555  for (Loop *L : LV) {
556  FusionCandidate CurrCand(L, &DT, &PDT, ORE);
557  if (!CurrCand.isEligibleForFusion(SE))
558  continue;
559 
560  // Go through each list in FusionCandidates and determine if L is control
561  // flow equivalent with the first loop in that list. If it is, append LV.
562  // If not, go to the next list.
563  // If no suitable list is found, start another list and add it to
564  // FusionCandidates.
565  bool FoundSet = false;
566 
567  for (auto &CurrCandSet : FusionCandidates) {
568  if (isControlFlowEquivalent(*CurrCandSet.begin(), CurrCand)) {
569  CurrCandSet.insert(CurrCand);
570  FoundSet = true;
571 #ifndef NDEBUG
573  LLVM_DEBUG(dbgs() << "Adding " << CurrCand
574  << " to existing candidate set\n");
575 #endif
576  break;
577  }
578  }
579  if (!FoundSet) {
580  // No set was found. Create a new set and add to FusionCandidates
581 #ifndef NDEBUG
583  LLVM_DEBUG(dbgs() << "Adding " << CurrCand << " to new set\n");
584 #endif
585  FusionCandidateSet NewCandSet;
586  NewCandSet.insert(CurrCand);
587  FusionCandidates.push_back(NewCandSet);
588  }
589  NumFusionCandidates++;
590  }
591  }
592 
593  /// Determine if it is beneficial to fuse two loops.
594  ///
595  /// For now, this method simply returns true because we want to fuse as much
596  /// as possible (primarily to test the pass). This method will evolve, over
597  /// time, to add heuristics for profitability of fusion.
598  bool isBeneficialFusion(const FusionCandidate &FC0,
599  const FusionCandidate &FC1) {
600  return true;
601  }
602 
603  /// Determine if two fusion candidates have the same trip count (i.e., they
604  /// execute the same number of iterations).
605  ///
606  /// Note that for now this method simply returns a boolean value because there
607  /// are no mechanisms in loop fusion to handle different trip counts. In the
608  /// future, this behaviour can be extended to adjust one of the loops to make
609  /// the trip counts equal (e.g., loop peeling). When this is added, this
610  /// interface may need to change to return more information than just a
611  /// boolean value.
612  bool identicalTripCounts(const FusionCandidate &FC0,
613  const FusionCandidate &FC1) const {
614  const SCEV *TripCount0 = SE.getBackedgeTakenCount(FC0.L);
615  if (isa<SCEVCouldNotCompute>(TripCount0)) {
616  UncomputableTripCount++;
617  LLVM_DEBUG(dbgs() << "Trip count of first loop could not be computed!");
618  return false;
619  }
620 
621  const SCEV *TripCount1 = SE.getBackedgeTakenCount(FC1.L);
622  if (isa<SCEVCouldNotCompute>(TripCount1)) {
623  UncomputableTripCount++;
624  LLVM_DEBUG(dbgs() << "Trip count of second loop could not be computed!");
625  return false;
626  }
627  LLVM_DEBUG(dbgs() << "\tTrip counts: " << *TripCount0 << " & "
628  << *TripCount1 << " are "
629  << (TripCount0 == TripCount1 ? "identical" : "different")
630  << "\n");
631 
632  return (TripCount0 == TripCount1);
633  }
634 
635  /// Walk each set of control flow equivalent fusion candidates and attempt to
636  /// fuse them. This does a single linear traversal of all candidates in the
637  /// set. The conditions for legal fusion are checked at this point. If a pair
638  /// of fusion candidates passes all legality checks, they are fused together
639  /// and a new fusion candidate is created and added to the FusionCandidateSet.
640  /// The original fusion candidates are then removed, as they are no longer
641  /// valid.
642  bool fuseCandidates() {
643  bool Fused = false;
644  LLVM_DEBUG(printFusionCandidates(FusionCandidates));
645  for (auto &CandidateSet : FusionCandidates) {
646  if (CandidateSet.size() < 2)
647  continue;
648 
649  LLVM_DEBUG(dbgs() << "Attempting fusion on Candidate Set:\n"
650  << CandidateSet << "\n");
651 
652  for (auto FC0 = CandidateSet.begin(); FC0 != CandidateSet.end(); ++FC0) {
653  assert(!LDT.isRemovedLoop(FC0->L) &&
654  "Should not have removed loops in CandidateSet!");
655  auto FC1 = FC0;
656  for (++FC1; FC1 != CandidateSet.end(); ++FC1) {
657  assert(!LDT.isRemovedLoop(FC1->L) &&
658  "Should not have removed loops in CandidateSet!");
659 
660  LLVM_DEBUG(dbgs() << "Attempting to fuse candidate \n"; FC0->dump();
661  dbgs() << " with\n"; FC1->dump(); dbgs() << "\n");
662 
663  FC0->verify();
664  FC1->verify();
665 
666  if (!identicalTripCounts(*FC0, *FC1)) {
667  LLVM_DEBUG(dbgs() << "Fusion candidates do not have identical trip "
668  "counts. Not fusing.\n");
669  reportLoopFusion<OptimizationRemarkMissed>(*FC0, *FC1,
670  NonEqualTripCount);
671  continue;
672  }
673 
674  if (!isAdjacent(*FC0, *FC1)) {
675  LLVM_DEBUG(dbgs()
676  << "Fusion candidates are not adjacent. Not fusing.\n");
677  reportLoopFusion<OptimizationRemarkMissed>(*FC0, *FC1, NonAdjacent);
678  continue;
679  }
680 
681  // For now we skip fusing if the second candidate has any instructions
682  // in the preheader. This is done because we currently do not have the
683  // safety checks to determine if it is save to move the preheader of
684  // the second candidate past the body of the first candidate. Once
685  // these checks are added, this condition can be removed.
686  if (!isEmptyPreheader(*FC1)) {
687  LLVM_DEBUG(dbgs() << "Fusion candidate does not have empty "
688  "preheader. Not fusing.\n");
689  reportLoopFusion<OptimizationRemarkMissed>(*FC0, *FC1,
690  NonEmptyPreheader);
691  continue;
692  }
693 
694  if (!dependencesAllowFusion(*FC0, *FC1)) {
695  LLVM_DEBUG(dbgs() << "Memory dependencies do not allow fusion!\n");
696  reportLoopFusion<OptimizationRemarkMissed>(*FC0, *FC1,
697  InvalidDependencies);
698  continue;
699  }
700 
701  bool BeneficialToFuse = isBeneficialFusion(*FC0, *FC1);
702  LLVM_DEBUG(dbgs()
703  << "\tFusion appears to be "
704  << (BeneficialToFuse ? "" : "un") << "profitable!\n");
705  if (!BeneficialToFuse) {
706  reportLoopFusion<OptimizationRemarkMissed>(*FC0, *FC1,
707  FusionNotBeneficial);
708  continue;
709  }
710  // All analysis has completed and has determined that fusion is legal
711  // and profitable. At this point, start transforming the code and
712  // perform fusion.
713 
714  LLVM_DEBUG(dbgs() << "\tFusion is performed: " << *FC0 << " and "
715  << *FC1 << "\n");
716 
717  // Report fusion to the Optimization Remarks.
718  // Note this needs to be done *before* performFusion because
719  // performFusion will change the original loops, making it not
720  // possible to identify them after fusion is complete.
721  reportLoopFusion<OptimizationRemark>(*FC0, *FC1, FuseCounter);
722 
723  FusionCandidate FusedCand(performFusion(*FC0, *FC1), &DT, &PDT, ORE);
724  FusedCand.verify();
725  assert(FusedCand.isEligibleForFusion(SE) &&
726  "Fused candidate should be eligible for fusion!");
727 
728  // Notify the loop-depth-tree that these loops are not valid objects
729  LDT.removeLoop(FC1->L);
730 
731  CandidateSet.erase(FC0);
732  CandidateSet.erase(FC1);
733 
734  auto InsertPos = CandidateSet.insert(FusedCand);
735 
736  assert(InsertPos.second &&
737  "Unable to insert TargetCandidate in CandidateSet!");
738 
739  // Reset FC0 and FC1 the new (fused) candidate. Subsequent iterations
740  // of the FC1 loop will attempt to fuse the new (fused) loop with the
741  // remaining candidates in the current candidate set.
742  FC0 = FC1 = InsertPos.first;
743 
744  LLVM_DEBUG(dbgs() << "Candidate Set (after fusion): " << CandidateSet
745  << "\n");
746 
747  Fused = true;
748  }
749  }
750  }
751  return Fused;
752  }
753 
754  /// Rewrite all additive recurrences in a SCEV to use a new loop.
755  class AddRecLoopReplacer : public SCEVRewriteVisitor<AddRecLoopReplacer> {
756  public:
757  AddRecLoopReplacer(ScalarEvolution &SE, const Loop &OldL, const Loop &NewL,
758  bool UseMax = true)
759  : SCEVRewriteVisitor(SE), Valid(true), UseMax(UseMax), OldL(OldL),
760  NewL(NewL) {}
761 
762  const SCEV *visitAddRecExpr(const SCEVAddRecExpr *Expr) {
763  const Loop *ExprL = Expr->getLoop();
765  if (ExprL == &OldL) {
766  Operands.append(Expr->op_begin(), Expr->op_end());
767  return SE.getAddRecExpr(Operands, &NewL, Expr->getNoWrapFlags());
768  }
769 
770  if (OldL.contains(ExprL)) {
771  bool Pos = SE.isKnownPositive(Expr->getStepRecurrence(SE));
772  if (!UseMax || !Pos || !Expr->isAffine()) {
773  Valid = false;
774  return Expr;
775  }
776  return visit(Expr->getStart());
777  }
778 
779  for (const SCEV *Op : Expr->operands())
780  Operands.push_back(visit(Op));
781  return SE.getAddRecExpr(Operands, ExprL, Expr->getNoWrapFlags());
782  }
783 
784  bool wasValidSCEV() const { return Valid; }
785 
786  private:
787  bool Valid, UseMax;
788  const Loop &OldL, &NewL;
789  };
790 
791  /// Return false if the access functions of \p I0 and \p I1 could cause
792  /// a negative dependence.
793  bool accessDiffIsPositive(const Loop &L0, const Loop &L1, Instruction &I0,
794  Instruction &I1, bool EqualIsInvalid) {
795  Value *Ptr0 = getLoadStorePointerOperand(&I0);
796  Value *Ptr1 = getLoadStorePointerOperand(&I1);
797  if (!Ptr0 || !Ptr1)
798  return false;
799 
800  const SCEV *SCEVPtr0 = SE.getSCEVAtScope(Ptr0, &L0);
801  const SCEV *SCEVPtr1 = SE.getSCEVAtScope(Ptr1, &L1);
802 #ifndef NDEBUG
804  LLVM_DEBUG(dbgs() << " Access function check: " << *SCEVPtr0 << " vs "
805  << *SCEVPtr1 << "\n");
806 #endif
807  AddRecLoopReplacer Rewriter(SE, L0, L1);
808  SCEVPtr0 = Rewriter.visit(SCEVPtr0);
809 #ifndef NDEBUG
811  LLVM_DEBUG(dbgs() << " Access function after rewrite: " << *SCEVPtr0
812  << " [Valid: " << Rewriter.wasValidSCEV() << "]\n");
813 #endif
814  if (!Rewriter.wasValidSCEV())
815  return false;
816 
817  // TODO: isKnownPredicate doesnt work well when one SCEV is loop carried (by
818  // L0) and the other is not. We could check if it is monotone and test
819  // the beginning and end value instead.
820 
821  BasicBlock *L0Header = L0.getHeader();
822  auto HasNonLinearDominanceRelation = [&](const SCEV *S) {
823  const SCEVAddRecExpr *AddRec = dyn_cast<SCEVAddRecExpr>(S);
824  if (!AddRec)
825  return false;
826  return !DT.dominates(L0Header, AddRec->getLoop()->getHeader()) &&
827  !DT.dominates(AddRec->getLoop()->getHeader(), L0Header);
828  };
829  if (SCEVExprContains(SCEVPtr1, HasNonLinearDominanceRelation))
830  return false;
831 
832  ICmpInst::Predicate Pred =
833  EqualIsInvalid ? ICmpInst::ICMP_SGT : ICmpInst::ICMP_SGE;
834  bool IsAlwaysGE = SE.isKnownPredicate(Pred, SCEVPtr0, SCEVPtr1);
835 #ifndef NDEBUG
837  LLVM_DEBUG(dbgs() << " Relation: " << *SCEVPtr0
838  << (IsAlwaysGE ? " >= " : " may < ") << *SCEVPtr1
839  << "\n");
840 #endif
841  return IsAlwaysGE;
842  }
843 
844  /// Return true if the dependences between @p I0 (in @p L0) and @p I1 (in
845  /// @p L1) allow loop fusion of @p L0 and @p L1. The dependence analyses
846  /// specified by @p DepChoice are used to determine this.
847  bool dependencesAllowFusion(const FusionCandidate &FC0,
848  const FusionCandidate &FC1, Instruction &I0,
849  Instruction &I1, bool AnyDep,
850  FusionDependenceAnalysisChoice DepChoice) {
851 #ifndef NDEBUG
853  LLVM_DEBUG(dbgs() << "Check dep: " << I0 << " vs " << I1 << " : "
854  << DepChoice << "\n");
855  }
856 #endif
857  switch (DepChoice) {
859  return accessDiffIsPositive(*FC0.L, *FC1.L, I0, I1, AnyDep);
861  auto DepResult = DI.depends(&I0, &I1, true);
862  if (!DepResult)
863  return true;
864 #ifndef NDEBUG
866  LLVM_DEBUG(dbgs() << "DA res: "; DepResult->dump(dbgs());
867  dbgs() << " [#l: " << DepResult->getLevels() << "][Ordered: "
868  << (DepResult->isOrdered() ? "true" : "false")
869  << "]\n");
870  LLVM_DEBUG(dbgs() << "DepResult Levels: " << DepResult->getLevels()
871  << "\n");
872  }
873 #endif
874 
875  if (DepResult->getNextPredecessor() || DepResult->getNextSuccessor())
876  LLVM_DEBUG(
877  dbgs() << "TODO: Implement pred/succ dependence handling!\n");
878 
879  // TODO: Can we actually use the dependence info analysis here?
880  return false;
881  }
882 
884  return dependencesAllowFusion(FC0, FC1, I0, I1, AnyDep,
886  dependencesAllowFusion(FC0, FC1, I0, I1, AnyDep,
888  }
889 
890  llvm_unreachable("Unknown fusion dependence analysis choice!");
891  }
892 
893  /// Perform a dependence check and return if @p FC0 and @p FC1 can be fused.
894  bool dependencesAllowFusion(const FusionCandidate &FC0,
895  const FusionCandidate &FC1) {
896  LLVM_DEBUG(dbgs() << "Check if " << FC0 << " can be fused with " << FC1
897  << "\n");
898  assert(FC0.L->getLoopDepth() == FC1.L->getLoopDepth());
899  assert(DT.dominates(FC0.Preheader, FC1.Preheader));
900 
901  for (Instruction *WriteL0 : FC0.MemWrites) {
902  for (Instruction *WriteL1 : FC1.MemWrites)
903  if (!dependencesAllowFusion(FC0, FC1, *WriteL0, *WriteL1,
904  /* AnyDep */ false,
906  InvalidDependencies++;
907  return false;
908  }
909  for (Instruction *ReadL1 : FC1.MemReads)
910  if (!dependencesAllowFusion(FC0, FC1, *WriteL0, *ReadL1,
911  /* AnyDep */ false,
913  InvalidDependencies++;
914  return false;
915  }
916  }
917 
918  for (Instruction *WriteL1 : FC1.MemWrites) {
919  for (Instruction *WriteL0 : FC0.MemWrites)
920  if (!dependencesAllowFusion(FC0, FC1, *WriteL0, *WriteL1,
921  /* AnyDep */ false,
923  InvalidDependencies++;
924  return false;
925  }
926  for (Instruction *ReadL0 : FC0.MemReads)
927  if (!dependencesAllowFusion(FC0, FC1, *ReadL0, *WriteL1,
928  /* AnyDep */ false,
930  InvalidDependencies++;
931  return false;
932  }
933  }
934 
935  // Walk through all uses in FC1. For each use, find the reaching def. If the
936  // def is located in FC0 then it is is not safe to fuse.
937  for (BasicBlock *BB : FC1.L->blocks())
938  for (Instruction &I : *BB)
939  for (auto &Op : I.operands())
940  if (Instruction *Def = dyn_cast<Instruction>(Op))
941  if (FC0.L->contains(Def->getParent())) {
942  InvalidDependencies++;
943  return false;
944  }
945 
946  return true;
947  }
948 
949  /// Determine if the exit block of \p FC0 is the preheader of \p FC1. In this
950  /// case, there is no code in between the two fusion candidates, thus making
951  /// them adjacent.
952  bool isAdjacent(const FusionCandidate &FC0,
953  const FusionCandidate &FC1) const {
954  return FC0.ExitBlock == FC1.Preheader;
955  }
956 
957  bool isEmptyPreheader(const FusionCandidate &FC) const {
958  return FC.Preheader->size() == 1;
959  }
960 
961  /// Fuse two fusion candidates, creating a new fused loop.
962  ///
963  /// This method contains the mechanics of fusing two loops, represented by \p
964  /// FC0 and \p FC1. It is assumed that \p FC0 dominates \p FC1 and \p FC1
965  /// postdominates \p FC0 (making them control flow equivalent). It also
966  /// assumes that the other conditions for fusion have been met: adjacent,
967  /// identical trip counts, and no negative distance dependencies exist that
968  /// would prevent fusion. Thus, there is no checking for these conditions in
969  /// this method.
970  ///
971  /// Fusion is performed by rewiring the CFG to update successor blocks of the
972  /// components of tho loop. Specifically, the following changes are done:
973  ///
974  /// 1. The preheader of \p FC1 is removed as it is no longer necessary
975  /// (because it is currently only a single statement block).
976  /// 2. The latch of \p FC0 is modified to jump to the header of \p FC1.
977  /// 3. The latch of \p FC1 i modified to jump to the header of \p FC0.
978  /// 4. All blocks from \p FC1 are removed from FC1 and added to FC0.
979  ///
980  /// All of these modifications are done with dominator tree updates, thus
981  /// keeping the dominator (and post dominator) information up-to-date.
982  ///
983  /// This can be improved in the future by actually merging blocks during
984  /// fusion. For example, the preheader of \p FC1 can be merged with the
985  /// preheader of \p FC0. This would allow loops with more than a single
986  /// statement in the preheader to be fused. Similarly, the latch blocks of the
987  /// two loops could also be fused into a single block. This will require
988  /// analysis to prove it is safe to move the contents of the block past
989  /// existing code, which currently has not been implemented.
990  Loop *performFusion(const FusionCandidate &FC0, const FusionCandidate &FC1) {
991  assert(FC0.isValid() && FC1.isValid() &&
992  "Expecting valid fusion candidates");
993 
994  LLVM_DEBUG(dbgs() << "Fusion Candidate 0: \n"; FC0.dump();
995  dbgs() << "Fusion Candidate 1: \n"; FC1.dump(););
996 
997  assert(FC1.Preheader == FC0.ExitBlock);
998  assert(FC1.Preheader->size() == 1 &&
999  FC1.Preheader->getSingleSuccessor() == FC1.Header);
1000 
1001  // Remember the phi nodes originally in the header of FC0 in order to rewire
1002  // them later. However, this is only necessary if the new loop carried
1003  // values might not dominate the exiting branch. While we do not generally
1004  // test if this is the case but simply insert intermediate phi nodes, we
1005  // need to make sure these intermediate phi nodes have different
1006  // predecessors. To this end, we filter the special case where the exiting
1007  // block is the latch block of the first loop. Nothing needs to be done
1008  // anyway as all loop carried values dominate the latch and thereby also the
1009  // exiting branch.
1010  SmallVector<PHINode *, 8> OriginalFC0PHIs;
1011  if (FC0.ExitingBlock != FC0.Latch)
1012  for (PHINode &PHI : FC0.Header->phis())
1013  OriginalFC0PHIs.push_back(&PHI);
1014 
1015  // Replace incoming blocks for header PHIs first.
1018 
1019  // Then modify the control flow and update DT and PDT.
1021 
1022  // The old exiting block of the first loop (FC0) has to jump to the header
1023  // of the second as we need to execute the code in the second header block
1024  // regardless of the trip count. That is, if the trip count is 0, so the
1025  // back edge is never taken, we still have to execute both loop headers,
1026  // especially (but not only!) if the second is a do-while style loop.
1027  // However, doing so might invalidate the phi nodes of the first loop as
1028  // the new values do only need to dominate their latch and not the exiting
1029  // predicate. To remedy this potential problem we always introduce phi
1030  // nodes in the header of the second loop later that select the loop carried
1031  // value, if the second header was reached through an old latch of the
1032  // first, or undef otherwise. This is sound as exiting the first implies the
1033  // second will exit too, __without__ taking the back-edge. [Their
1034  // trip-counts are equal after all.
1035  // KB: Would this sequence be simpler to just just make FC0.ExitingBlock go
1036  // to FC1.Header? I think this is basically what the three sequences are
1037  // trying to accomplish; however, doing this directly in the CFG may mean
1038  // the DT/PDT becomes invalid
1040  FC1.Header);
1045 
1046  // The pre-header of L1 is not necessary anymore.
1049  new UnreachableInst(FC1.Preheader->getContext(), FC1.Preheader);
1052 
1053  // Moves the phi nodes from the second to the first loops header block.
1054  while (PHINode *PHI = dyn_cast<PHINode>(&FC1.Header->front())) {
1055  if (SE.isSCEVable(PHI->getType()))
1056  SE.forgetValue(PHI);
1057  if (PHI->hasNUsesOrMore(1))
1058  PHI->moveBefore(&*FC0.Header->getFirstInsertionPt());
1059  else
1060  PHI->eraseFromParent();
1061  }
1062 
1063  // Introduce new phi nodes in the second loop header to ensure
1064  // exiting the first and jumping to the header of the second does not break
1065  // the SSA property of the phis originally in the first loop. See also the
1066  // comment above.
1067  Instruction *L1HeaderIP = &FC1.Header->front();
1068  for (PHINode *LCPHI : OriginalFC0PHIs) {
1069  int L1LatchBBIdx = LCPHI->getBasicBlockIndex(FC1.Latch);
1070  assert(L1LatchBBIdx >= 0 &&
1071  "Expected loop carried value to be rewired at this point!");
1072 
1073  Value *LCV = LCPHI->getIncomingValue(L1LatchBBIdx);
1074 
1075  PHINode *L1HeaderPHI = PHINode::Create(
1076  LCV->getType(), 2, LCPHI->getName() + ".afterFC0", L1HeaderIP);
1077  L1HeaderPHI->addIncoming(LCV, FC0.Latch);
1078  L1HeaderPHI->addIncoming(UndefValue::get(LCV->getType()),
1079  FC0.ExitingBlock);
1080 
1081  LCPHI->setIncomingValue(L1LatchBBIdx, L1HeaderPHI);
1082  }
1083 
1084  // Replace latch terminator destinations.
1087 
1088  // If FC0.Latch and FC0.ExitingBlock are the same then we have already
1089  // performed the updates above.
1090  if (FC0.Latch != FC0.ExitingBlock)
1092  DominatorTree::Insert, FC0.Latch, FC1.Header));
1093 
1095  FC0.Latch, FC0.Header));
1097  FC1.Latch, FC0.Header));
1099  FC1.Latch, FC1.Header));
1100 
1101  // Update DT/PDT
1102  DTU.applyUpdates(TreeUpdates);
1103 
1104  LI.removeBlock(FC1.Preheader);
1105  DTU.deleteBB(FC1.Preheader);
1106  DTU.flush();
1107 
1108  // Is there a way to keep SE up-to-date so we don't need to forget the loops
1109  // and rebuild the information in subsequent passes of fusion?
1110  SE.forgetLoop(FC1.L);
1111  SE.forgetLoop(FC0.L);
1112 
1113  // Merge the loops.
1115  FC1.L->block_end());
1116  for (BasicBlock *BB : Blocks) {
1117  FC0.L->addBlockEntry(BB);
1118  FC1.L->removeBlockFromLoop(BB);
1119  if (LI.getLoopFor(BB) != FC1.L)
1120  continue;
1121  LI.changeLoopFor(BB, FC0.L);
1122  }
1123  while (!FC1.L->empty()) {
1124  const auto &ChildLoopIt = FC1.L->begin();
1125  Loop *ChildLoop = *ChildLoopIt;
1126  FC1.L->removeChildLoop(ChildLoopIt);
1127  FC0.L->addChildLoop(ChildLoop);
1128  }
1129 
1130  // Delete the now empty loop L1.
1131  LI.erase(FC1.L);
1132 
1133 #ifndef NDEBUG
1134  assert(!verifyFunction(*FC0.Header->getParent(), &errs()));
1136  assert(PDT.verify());
1137  LI.verify(DT);
1138  SE.verify();
1139 #endif
1140 
1141  FuseCounter++;
1142 
1143  LLVM_DEBUG(dbgs() << "Fusion done:\n");
1144 
1145  return FC0.L;
1146  }
1147 
1148  /// Report details on loop fusion opportunities.
1149  ///
1150  /// This template function can be used to report both successful and missed
1151  /// loop fusion opportunities, based on the RemarkKind. The RemarkKind should
1152  /// be one of:
1153  /// - OptimizationRemarkMissed to report when loop fusion is unsuccessful
1154  /// given two valid fusion candidates.
1155  /// - OptimizationRemark to report successful fusion of two fusion
1156  /// candidates.
1157  /// The remarks will be printed using the form:
1158  /// <path/filename>:<line number>:<column number>: [<function name>]:
1159  /// <Cand1 Preheader> and <Cand2 Preheader>: <Stat Description>
1160  template <typename RemarkKind>
1161  void reportLoopFusion(const FusionCandidate &FC0, const FusionCandidate &FC1,
1162  llvm::Statistic &Stat) {
1163  assert(FC0.Preheader && FC1.Preheader &&
1164  "Expecting valid fusion candidates");
1165  using namespace ore;
1166  ++Stat;
1167  ORE.emit(RemarkKind(DEBUG_TYPE, Stat.getName(), FC0.L->getStartLoc(),
1168  FC0.Preheader)
1169  << "[" << FC0.Preheader->getParent()->getName()
1170  << "]: " << NV("Cand1", StringRef(FC0.Preheader->getName()))
1171  << " and " << NV("Cand2", StringRef(FC1.Preheader->getName()))
1172  << ": " << Stat.getDesc());
1173  }
1174 };
1175 
1176 struct LoopFuseLegacy : public FunctionPass {
1177 
1178  static char ID;
1179 
1182  }
1183 
1184  void getAnalysisUsage(AnalysisUsage &AU) const override {
1192 
1197  }
1198 
1199  bool runOnFunction(Function &F) override {
1200  if (skipFunction(F))
1201  return false;
1202  auto &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
1203  auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
1204  auto &DI = getAnalysis<DependenceAnalysisWrapperPass>().getDI();
1205  auto &SE = getAnalysis<ScalarEvolutionWrapperPass>().getSE();
1206  auto &PDT = getAnalysis<PostDominatorTreeWrapperPass>().getPostDomTree();
1207  auto &ORE = getAnalysis<OptimizationRemarkEmitterWrapperPass>().getORE();
1208 
1209  const DataLayout &DL = F.getParent()->getDataLayout();
1210  LoopFuser LF(LI, DT, DI, SE, PDT, ORE, DL);
1211  return LF.fuseLoops(F);
1212  }
1213 };
1214 
1216  auto &LI = AM.getResult<LoopAnalysis>(F);
1217  auto &DT = AM.getResult<DominatorTreeAnalysis>(F);
1218  auto &DI = AM.getResult<DependenceAnalysis>(F);
1219  auto &SE = AM.getResult<ScalarEvolutionAnalysis>(F);
1220  auto &PDT = AM.getResult<PostDominatorTreeAnalysis>(F);
1222 
1223  const DataLayout &DL = F.getParent()->getDataLayout();
1224  LoopFuser LF(LI, DT, DI, SE, PDT, ORE, DL);
1225  bool Changed = LF.fuseLoops(F);
1226  if (!Changed)
1227  return PreservedAnalyses::all();
1228 
1229  PreservedAnalyses PA;
1233  PA.preserve<LoopAnalysis>();
1234  return PA;
1235 }
1236 
1237 char LoopFuseLegacy::ID = 0;
1238 
1239 INITIALIZE_PASS_BEGIN(LoopFuseLegacy, "loop-fusion", "Loop Fusion", false,
1240  false)
1247 INITIALIZE_PASS_END(LoopFuseLegacy, "loop-fusion", "Loop Fusion", false, false)
1248 
1249 FunctionPass *llvm::createLoopFusePass() { return new LoopFuseLegacy(); }
void initializeLoopFuseLegacyPass(PassRegistry &)
SymbolTableList< Instruction >::iterator eraseFromParent()
This method unlinks &#39;this&#39; from the containing basic block and deletes it.
Definition: Instruction.cpp:67
static void printLoopVector(const LoopVector &LV)
Definition: LoopFuse.cpp:438
static void printFusionCandidates(const FusionCandidateCollection &FusionCandidates)
Definition: LoopFuse.cpp:368
A parsed version of the target data layout string in and methods for querying it. ...
Definition: DataLayout.h:111
reference emplace_back(ArgTypes &&... Args)
Definition: SmallVector.h:641
LoopFuser(LoopInfo &LI, DominatorTree &DT, DependenceInfo &DI, ScalarEvolution &SE, PostDominatorTree &PDT, OptimizationRemarkEmitter &ORE, const DataLayout &DL)
Definition: LoopFuse.cpp:462
AnalysisUsage & addPreserved()
Add the specified Pass class to the set of analyses preserved by this pass.
raw_ostream & errs()
This returns a reference to a raw_ostream for standard error.
bool isRemovedLoop(const Loop *L) const
Test whether a given loop has been removed from the function, and thus is no longer valid...
Definition: LoopFuse.cpp:396
void addIncoming(Value *V, BasicBlock *BB)
Add an incoming value to the end of the PHI list.
BlockT * getLoopLatch() const
If there is a single latch block for this loop, return it.
Definition: LoopInfoImpl.h:211
static PassRegistry * getPassRegistry()
getPassRegistry - Access the global registry object, which is automatically initialized at applicatio...
LoopsOnLevelTy::iterator iterator
Definition: LoopFuse.cpp:386
bool empty() const
Definition: LoopInfo.h:907
DiagnosticInfoOptimizationBase::Argument NV
typename SuperClass::const_iterator const_iterator
Definition: SmallVector.h:320
PassT::Result & getResult(IRUnitT &IR, ExtraArgTs... ExtraArgs)
Get the result of an analysis pass for a given IR unit.
Definition: PassManager.h:776
AnalysisPass to compute dependence information in a function.
This class represents lattice values for constants.
Definition: AllocatorList.h:23
#define LLVM_DUMP_METHOD
Mark debug helper function definitions like dump() that should not be stripped from debug builds...
Definition: Compiler.h:473
const PostDominatorTree * PDT
Definition: LoopFuse.cpp:152
Legacy pass manager pass to access dependence information.
This class is used to represent a candidate for loop fusion.
Definition: LoopFuse.cpp:123
unsigned getLoopDepth() const
Return the nesting level of this loop.
Definition: LoopInfo.h:97
void push_back(const T &Elt)
Definition: SmallVector.h:211
LoopT * removeChildLoop(iterator I)
This removes the specified child from being a subloop of this loop.
Definition: LoopInfo.h:384
The main scalar evolution driver.
This file implements the Loop Fusion pass.
BlockT * getLoopPreheader() const
If there is a preheader for this loop, return it.
Definition: LoopInfoImpl.h:160
STATISTIC(NumFunctions, "Total number of functions")
void flush()
Apply all pending updates to available trees and flush all BasicBlocks awaiting deletion.
Analysis pass which computes a DominatorTree.
Definition: Dominators.h:230
F(f)
block Block Frequency true
An instruction for reading from memory.
Definition: Instructions.h:167
const Instruction * getTerminator() const LLVM_READONLY
Returns the terminator instruction if the block is well formed or null if the block is not well forme...
Definition: BasicBlock.cpp:137
INITIALIZE_PASS_BEGIN(LoopFuseLegacy, "loop-fusion", "Loop Fusion", false, false) FunctionPass *llvm
Definition: LoopFuse.cpp:1239
DependenceInfo - This class is the main dependence-analysis driver.
bool isValid() const
Check if all members of the class are valid.
Definition: LoopFuse.cpp:203
LLVMContext & getContext() const
Get the context in which this basic block lives.
Definition: BasicBlock.cpp:32
bool operator()(const FusionCandidate &LHS, const FusionCandidate &RHS) const
Comparison functor to sort two Control Flow Equivalent fusion candidates into dominance order...
Definition: LoopFuse.cpp:311
OptimizationRemarkEmitter & ORE
Definition: LoopFuse.cpp:154
bool isInvalid() const
Return true if this loop is no longer valid.
Definition: LoopInfo.h:198
SmallVector< Instruction *, 16 > MemReads
Vector of instructions in this loop that read from memory.
Definition: LoopFuse.cpp:141
void verify(const DominatorTreeBase< BlockT, false > &DomTree) const
Definition: LoopInfoImpl.h:683
static cl::opt< FusionDependenceAnalysisChoice > FusionDependenceAnalysis("loop-fusion-dependence-analysis", cl::desc("Which dependence analysis should loop fusion use?"), cl::values(clEnumValN(FUSION_DEPENDENCE_ANALYSIS_SCEV, "scev", "Use the scalar evolution interface"), clEnumValN(FUSION_DEPENDENCE_ANALYSIS_DA, "da", "Use the dependence analysis interface"), clEnumValN(FUSION_DEPENDENCE_ANALYSIS_ALL, "all", "Use all available analyses")), cl::Hidden, cl::init(FUSION_DEPENDENCE_ANALYSIS_ALL), cl::ZeroOrMore)
BasicBlock * Latch
Latch of the loop.
Definition: LoopFuse.cpp:137
void dump() const
Support for debugging, callable in GDB: V->dump()
Definition: AsmWriter.cpp:4428
LoopDepthTree(LoopInfo &LI)
Definition: LoopFuse.cpp:389
AnalysisUsage & addRequired()
#define INITIALIZE_PASS_DEPENDENCY(depName)
Definition: PassSupport.h:50
bool fuseLoops(Function &F)
This is the main entry point for loop fusion.
Definition: LoopFuse.cpp:471
iterator end()
Definition: LoopFuse.cpp:421
const DataLayout & getDataLayout() const
Get the data layout for the module&#39;s target platform.
Definition: Module.cpp:369
bool verify(VerificationLevel VL=VerificationLevel::Full) const
verify - checks if the tree is correct.
LoopT * getLoopFor(const BlockT *BB) const
Return the inner most loop that BB lives in.
Definition: LoopInfo.h:928
void addBlockEntry(BlockT *BB)
This adds a basic block directly to the basic block list.
Definition: LoopInfo.h:403
Diagnostic information for optimization analysis remarks.
bool empty() const
Definition: LoopFuse.cpp:416
static cl::opt< bool > VerboseFusionDebugging("loop-fusion-verbose-debug", cl::desc("Enable verbose debugging for Loop Fusion"), cl::Hidden, cl::init(false), cl::ZeroOrMore)
void erase(Loop *L)
Update LoopInfo after removing the last backedge from a loop.
Definition: LoopInfo.cpp:853
Analysis pass that exposes the LoopInfo for a function.
Definition: LoopInfo.h:1183
void replaceSuccessorsPhiUsesWith(BasicBlock *Old, BasicBlock *New)
Update all phi nodes in this basic block&#39;s successors to refer to basic block New instead of basic bl...
Definition: BasicBlock.cpp:445
BlockT * getHeader() const
Definition: LoopInfo.h:105
bool isKnownPredicate(ICmpInst::Predicate Pred, const SCEV *LHS, const SCEV *RHS)
Test if the given expression is known to satisfy the condition described by Pred, LHS...
Type * getType() const
All values are typed, get the type of this value.
Definition: Value.h:245
This node represents a polynomial recurrence on the trip count of the specified loop.
const BasicBlock * getSingleSuccessor() const
Return the successor of this block if it has a single successor.
Definition: BasicBlock.cpp:268
Value * getLoadStorePointerOperand(Value *V)
A helper function that returns the pointer operand of a load or store instruction.
An instruction for storing to memory.
Definition: Instructions.h:320
op_iterator op_begin() const
void verify() const
Verify that all members are in sync with the Loop object.
Definition: LoopFuse.cpp:209
const SCEV * getStepRecurrence(ScalarEvolution &SE) const
Constructs and returns the recurrence indicating how much this expression steps by.
Concrete subclass of DominatorTreeBase that is used to compute a normal dominator tree...
Definition: Dominators.h:144
const SCEV * getAddRecExpr(const SCEV *Start, const SCEV *Step, const Loop *L, SCEV::NoWrapFlags Flags)
Get an add recurrence expression for the specified loop.
void replaceUsesOfWith(Value *From, Value *To)
Replace uses of one Value with another.
Definition: User.cpp:20
initializer< Ty > init(const Ty &Val)
Definition: CommandLine.h:432
std::unique_ptr< Dependence > depends(Instruction *Src, Instruction *Dst, bool PossiblyLoopIndependent)
depends - Tests for a dependence between the Src and Dst instructions.
A set of analyses that are preserved following a run of a transformation pass.
Definition: PassManager.h:153
const_iterator getFirstInsertionPt() const
Returns an iterator to the first instruction in this block that is suitable for inserting a non-PHI i...
Definition: BasicBlock.cpp:216
Fast - This calling convention attempts to make calls as fast as possible (e.g.
Definition: CallingConv.h:42
LLVM Basic Block Representation.
Definition: BasicBlock.h:57
void deleteBB(BasicBlock *DelBB)
Delete DelBB.
Collect all loops in function at the same nest level, starting at the outermost level.
Definition: LoopFuse.cpp:384
This function has undefined behavior.
ValuesClass values(OptsTy... Options)
Helper to build a ValuesClass by forwarding a variable number of arguments as an initializer list to ...
Definition: CommandLine.h:652
bool Valid
Are all of the members of this fusion candidate still valid.
Definition: LoopFuse.cpp:145
const Instruction & front() const
Definition: BasicBlock.h:280
size_t size() const
Definition: BasicBlock.h:278
Interval::pred_iterator pred_begin(Interval *I)
pred_begin/pred_end - define methods so that Intervals may be used just like BasicBlocks can with the...
Definition: Interval.h:112
PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM)
Definition: LoopFuse.cpp:1215
unsigned getDepth() const
Definition: LoopFuse.cpp:418
bool isEligibleForFusion(ScalarEvolution &SE) const
Determine if a fusion candidate (representing a loop) is eligible for fusion.
Definition: LoopFuse.cpp:236
const SCEV * getSCEVAtScope(const SCEV *S, const Loop *L)
Return a SCEV expression for the specified value at the specified scope in the program.
Represent the analysis usage information of a pass.
const_iterator begin() const
Definition: LoopFuse.cpp:422
Predicate
This enumeration lists the possible predicates for CmpInst subclasses.
Definition: InstrTypes.h:732
bool SCEVExprContains(const SCEV *Root, PredTy Pred)
Return true if any node in Root satisfies the predicate Pred.
FunctionPass class - This class is used to implement most global optimizations.
Definition: Pass.h:284
Interval::pred_iterator pred_end(Interval *I)
Definition: Interval.h:115
bool isAffine() const
Return true if this represents an expression A + B*x where A and B are loop invariant values...
void forgetValue(Value *V)
This method should be called by the client when it has changed a value in a way that may effect its v...
BlockT * getExitBlock() const
If getExitBlocks would return exactly one block, return that block.
Definition: LoopInfoImpl.h:75
static UndefValue * get(Type *T)
Static factory methods - Return an &#39;undef&#39; object of the specified type.
Definition: Constants.cpp:1431
static PreservedAnalyses all()
Construct a special preserved set that preserves all passes.
Definition: PassManager.h:159
DebugLoc getStartLoc() const
Return the debug location of the start of this loop.
Definition: LoopInfo.cpp:607
INITIALIZE_PASS_END(RegBankSelect, DEBUG_TYPE, "Assign register bank of generic virtual registers", false, false) RegBankSelect
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
signed greater than
Definition: InstrTypes.h:759
char & LoopSimplifyID
static char ID
Definition: LoopFuse.cpp:1178
void descend()
Descend the tree to the next (inner) nesting level.
Definition: LoopFuse.cpp:403
void getAnalysisUsage(AnalysisUsage &AU) const override
getAnalysisUsage - This function should be overriden by passes that need analysis information to do t...
Definition: LoopFuse.cpp:1184
bool contains(const LoopT *L) const
Return true if the specified loop is contained within in this loop.
Definition: LoopInfo.h:115
void print(raw_ostream &OS) const
Definition: LoopInfoImpl.h:612
auto size(R &&Range, typename std::enable_if< std::is_same< typename std::iterator_traits< decltype(Range.begin())>::iterator_category, std::random_access_iterator_tag >::value, void >::type *=nullptr) -> decltype(std::distance(Range.begin(), Range.end()))
Get the size of a range.
Definition: STLExtras.h:1146
void emit(DiagnosticInfoOptimizationBase &OptDiag)
Output the remark via the diagnostic handler and to the optimization record file. ...
bool isKnownPositive(const SCEV *S)
Test if the given expression is known to be positive.
BasicBlock * ExitingBlock
Blocks in the loop that exit the loop.
Definition: LoopFuse.cpp:133
void applyUpdates(ArrayRef< DominatorTree::UpdateType > Updates)
Submit updates to all available trees.
AnalysisUsage & addRequiredID(const void *ID)
Definition: Pass.cpp:314
reverse_iterator rend() const
Definition: LoopInfo.h:906
bool dominates(const Instruction *Def, const Use &U) const
Return true if Def dominates a use in User.
Definition: Dominators.cpp:248
Analysis pass which computes a PostDominatorTree.
iterator begin() const
Definition: LoopInfo.h:147
BasicBlock * ExitBlock
The successor block of this loop (where the exiting blocks go to)
Definition: LoopFuse.cpp:135
static PHINode * Create(Type *Ty, unsigned NumReservedValues, const Twine &NameStr="", Instruction *InsertBefore=nullptr)
Constructors - NumReservedValues is a hint for the number of incoming edges that this phi node will h...
#define DEBUG_TYPE
Definition: LoopFuse.cpp:67
const char * getDesc() const
Definition: Statistic.h:58
size_t size() const
Definition: LoopFuse.cpp:417
raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition: Debug.cpp:132
SmallVector< Instruction *, 16 > MemWrites
Vector of instructions in this loop that write to memory.
Definition: LoopFuse.cpp:143
typename SuperClass::iterator iterator
Definition: SmallVector.h:319
PostDominatorTree Class - Concrete subclass of DominatorTree that is used to compute the post-dominat...
void append(in_iter in_start, in_iter in_end)
Add the specified range to the end of the SmallVector.
Definition: SmallVector.h:387
reverse_iterator rbegin() const
Definition: LoopInfo.h:905
bool dominates(const DomTreeNodeBase< NodeT > *A, const DomTreeNodeBase< NodeT > *B) const
dominates - Returns true iff A dominates B.
#define clEnumValN(ENUMVAL, FLAGNAME, DESC)
Definition: CommandLine.h:627
FunctionPass * createLoopFusePass()
Analysis pass that exposes the ScalarEvolution for a function.
static cl::opt< ITMode > IT(cl::desc("IT block support"), cl::Hidden, cl::init(DefaultIT), cl::ZeroOrMore, cl::values(clEnumValN(DefaultIT, "arm-default-it", "Generate IT block based on arch"), clEnumValN(RestrictedIT, "arm-restrict-it", "Disallow deprecated IT based on ARMv8"), clEnumValN(NoRestrictedIT, "arm-no-restrict-it", "Allow IT blocks based on ARMv7")))
Virtual Register Rewriter
Definition: VirtRegMap.cpp:220
bool isLoopSimplifyForm() const
Return true if the Loop is in the form that the LoopSimplify form transforms loops to...
Definition: LoopInfo.cpp:460
Loop * L
The loop that this fusion candidate represents.
Definition: LoopFuse.cpp:139
void forgetLoop(const Loop *L)
This method should be called by the client when it has changed a loop in a way that may effect Scalar...
This class represents an analyzed expression in the program.
void addChildLoop(LoopT *NewChild)
Add the specified loop to be a child of this loop.
Definition: LoopInfo.h:375
StringRef getName() const
Definition: LoopInfo.h:827
const char * getName() const
Definition: Statistic.h:57
Represents a single loop in the control flow graph.
Definition: LoopInfo.h:509
StringRef getName() const
Return a constant reference to the value&#39;s name.
Definition: Value.cpp:214
iterator begin()
Definition: LoopFuse.cpp:420
const Function * getParent() const
Return the enclosing method, or null if none.
Definition: BasicBlock.h:106
#define I(x, y, z)
Definition: MD5.cpp:58
LLVM_DUMP_METHOD void dump() const
Definition: LoopFuse.cpp:221
bool verifyFunction(const Function &F, raw_ostream *OS=nullptr)
Check a function for errors, useful for use when debugging a pass.
Definition: Verifier.cpp:5024
LLVM_NODISCARD std::enable_if<!is_simple_type< Y >::value, typename cast_retty< X, const Y >::ret_type >::type dyn_cast(const Y &Val)
Definition: Casting.h:332
block_iterator block_end() const
Definition: LoopInfo.h:160
void changeLoopFor(BlockT *BB, LoopT *L)
Change the top-level loop that contains BB to the specified loop.
Definition: LoopInfo.h:960
void preserve()
Mark an analysis as preserved.
Definition: PassManager.h:174
iterator_range< const_phi_iterator > phis() const
Returns a range that iterates over the phis in the basic block.
Definition: BasicBlock.h:324
bool runOnFunction(Function &F) override
runOnFunction - Virtual method overriden by subclasses to do the per-function processing of the pass...
Definition: LoopFuse.cpp:1199
const_iterator end() const
Definition: LoopFuse.cpp:423
FusionDependenceAnalysisChoice
Definition: LoopFuse.cpp:89
raw_ostream & operator<<(raw_ostream &OS, const APInt &I)
Definition: APInt.h:2045
void removeBlockFromLoop(BlockT *BB)
This removes the specified basic block from the current loop, updating the Blocks as appropriate...
Definition: LoopInfo.h:440
BasicBlock * Preheader
Cache of parts of the loop used throughout loop fusion.
Definition: LoopFuse.cpp:129
bool empty() const
Definition: LoopInfo.h:151
FusionCandidate(Loop *L, const DominatorTree *DT, const PostDominatorTree *PDT, OptimizationRemarkEmitter &ORE)
Definition: LoopFuse.cpp:156
const SCEV * getBackedgeTakenCount(const Loop *L)
If the specified loop has a predictable backedge-taken count, return it, otherwise return a SCEVCould...
assert(ImpDefSCC.getReg()==AMDGPU::SCC &&ImpDefSCC.isDef())
Module * getParent()
Get the module that this global value is contained inside of...
Definition: GlobalValue.h:575
LLVM Value Representation.
Definition: Value.h:73
BasicBlock * Header
Header of the loop this candidate represents.
Definition: LoopFuse.cpp:131
bool isSCEVable(Type *Ty) const
Test if values of the given type are analyzable within the SCEV framework.
OptimizationRemarkEmitter legacy analysis pass.
This class implements an extremely fast bulk output stream that can only output to a stream...
Definition: raw_ostream.h:45
The legacy pass manager&#39;s analysis pass to compute loop information.
Definition: LoopInfo.h:1208
StringRef - Represent a constant reference to a string, i.e.
Definition: StringRef.h:48
Legacy analysis pass which computes a DominatorTree.
Definition: Dominators.h:259
const DominatorTree * DT
Dominator and PostDominator trees are needed for the FusionCandidateCompare function, required by FusionCandidateSet to determine where the FusionCandidate should be inserted into the set.
Definition: LoopFuse.cpp:151
#define LLVM_DEBUG(X)
Definition: Debug.h:122
LoopsOnLevelTy::const_iterator const_iterator
Definition: LoopFuse.cpp:387
iterator_range< block_iterator > blocks() const
Definition: LoopInfo.h:161
void printLoop(Loop &L, raw_ostream &OS, const std::string &Banner="")
Function to print a loop&#39;s contents as LLVM&#39;s text IR assembly.
Definition: LoopInfo.cpp:932
block_iterator block_begin() const
Definition: LoopInfo.h:159
BlockT * getExitingBlock() const
If getExitingBlocks would return exactly one block, return that block.
Definition: LoopInfoImpl.h:49
NoWrapFlags getNoWrapFlags(NoWrapFlags Mask=NoWrapMask) const
bool hasLoopInvariantBackedgeTakenCount(const Loop *L)
Return true if the specified loop has an analyzable loop-invariant backedge-taken count...
The optimization diagnostic interface.
void removeLoop(const Loop *L)
Record that a given loop has been removed from the function and is no longer valid.
Definition: LoopFuse.cpp:400
void removeBlock(BlockT *BB)
This method completely removes BB from all data structures, including all of the Loop objects it is n...
Definition: LoopInfo.h:987
This visitor recursively visits a SCEV expression and re-writes it.
signed greater or equal
Definition: InstrTypes.h:760