//===-- SIMachineScheduler.cpp - SI Scheduler Interface -------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
/// \file
/// \brief SI Machine Scheduler interface
//
//===----------------------------------------------------------------------===//

#include "AMDGPU.h"
#include "SIInstrInfo.h"
#include "SIMachineScheduler.h"
#include "SIRegisterInfo.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/LiveInterval.h"
#include "llvm/CodeGen/LiveIntervalAnalysis.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/MachineScheduler.h"
#include "llvm/CodeGen/RegisterPressure.h"
#include "llvm/CodeGen/SlotIndexes.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

using namespace llvm;

#define DEBUG_TYPE "misched"

// This scheduler implements a different scheduling algorithm than
// GenericScheduler.
//
// There are several specific architecture behaviours that can't be modelled
// for GenericScheduler:
// . When accessing the result of an SGPR load instruction, you have to wait
//   for all the SGPR load instructions issued before your current instruction
//   to have finished.
// . When accessing the result of a VGPR load instruction, you have to wait
//   for all the VGPR load instructions previous to the VGPR load instruction
//   you are interested in to finish.
// . The lower the register pressure, the better load latencies are hidden.
//
// Moreover some specificities (like the fact that a lot of instructions in
// the shader have few dependencies) make the generic scheduler behave
// unpredictably. For example when register pressure becomes high, it can
// either manage to prevent register pressure from going too high, or it can
// increase register pressure even more than if it hadn't taken register
// pressure into account.
//
// Also some other bad behaviours are generated, like loading a constant into
// a VGPR at the beginning of the shader when it won't be needed until the
// end of the shader.
//
// The scheduling problem for SI can be split into three main parts:
// . Hiding high latencies (texture sampling, etc)
// . Hiding low latencies (SGPR constant loading, etc)
// . Keeping register usage low for better latency hiding and general
//   performance
//
// Some other things can also affect performance, but are hard to predict
// (cache usage, the fact the HW can issue several instructions from
// different wavefronts if of different types, etc)
//
// This scheduler tries to solve the scheduling problem by dividing it into
// simpler sub-problems. It divides the instructions into blocks, schedules
// locally inside the blocks where it takes care of low latencies, and then
// chooses the order of the blocks by taking care of high latencies.
// Dividing the instructions into blocks helps control keeping register
// usage low.
//
// First the instructions are put into blocks.
// We want the blocks to help control register usage and hide high latencies
// later. To help control register usage, we typically want all local
// computations, when for example you create a result that can be consumed
// right away, to be contained in a block. Block inputs and outputs would
// typically be important results that are needed in several locations of
// the shader. Since we do want blocks to help hide high latencies, we want
// the instructions inside the block to have a minimal set of dependencies
// on high latencies. That will make it easy to pick blocks to hide specific
// high latencies.
// The block creation algorithm is divided into several steps, and several
// variants can be tried during the scheduling process.
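// As a concrete (illustrative) example of the intent: a texture sample and
// the chain of ALU instructions that only consume its result would typically
// end up grouped in one block, with the sampled coordinates as block inputs
// and the final combined value as the block output.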
//
// Second the order of the instructions inside the blocks is chosen.
// At that step we take into account only register usage and hiding
// low latency instructions.
//
// Third the block order is chosen. There we try to hide high latencies
// and keep register usage low.
//
// After the third step, a pass is done to improve the hiding of low
// latencies.
//
// Actually when talking about 'low latency' or 'high latency' it includes
// both the latency for the cache (or global memory) data to reach the
// register, and the bandwidth limitations.
// Increasing the number of active wavefronts helps hide the former, but it
// doesn't solve the latter, which is why, even if the wavefront count is
// high, we have to try to have as many instructions hiding high latencies
// as possible.
// The OpenCL doc gives for example a latency of 400 cycles for a global mem
// access, which is hidden by 10 instructions if the wavefront count is 10.
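// As a worked sanity check on that figure (our own arithmetic based on the
// issue model described below, not an official formula): a wavefront issues
// an instruction every 4 cycles, so with 10 wavefronts a given wavefront
// issues once every 4 * 10 = 40 cycles, and 400 / 40 = 10 independent
// instructions are enough to cover the 400 cycle access.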

// Some figures taken from AMD docs:
// Both texture and constant L1 caches are 4-way associative with 64-byte
// lines.
// Constant cache is shared with 4 CUs.
// For texture sampling, the address generation unit receives 4 texture
// addresses per cycle, thus we could expect texture sampling latency to be
// equivalent to 4 instructions in the very best case (a VGPR is 64 work
// items, instructions in a wavefront group are executed every 4 cycles),
// or 16 instructions if the other wavefronts associated with the 3 other
// VALUs of the CU do texture sampling too. (Don't take these figures too
// seriously, as I'm not 100% sure of the computation)
// Data exports should get similar latency.
// For constant loading, the cache is shared with 4 CUs.
// The doc says "a throughput of 16B/cycle for each of the 4 Compute Unit"
// I guess if the other CUs don't read the cache, it can go up to 64B/cycle.
// It means a simple s_buffer_load should take one instruction to hide, as
// well as an s_buffer_loadx2 and potentially an s_buffer_loadx8 if on the
// same cache line.
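// (Back-of-the-envelope justification, our assumption rather than a
// documented figure: at 16B/cycle, the 4 cycles of one instruction slot
// cover 4 * 16B = 64B of constant cache throughput, which is enough for the
// 32B of an s_buffer_loadx8, so a single independent instruction should
// hide it.)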
//
// As of today the driver doesn't preload the constants in cache, thus the
// first loads get extra latency. The doc says global memory access can be
// 300-600 cycles. We do not specially take that into account when
// scheduling, as we expect the driver to be able to preload the constants
// soon.

// common code //

#ifndef NDEBUG

static const char *getReasonStr(SIScheduleCandReason Reason) {
  switch (Reason) {
  case NoCand:         return "NOCAND";
  case RegUsage:       return "REGUSAGE";
  case Latency:        return "LATENCY";
  case Successor:      return "SUCCESSOR";
  case Depth:          return "DEPTH";
  case NodeOrder:      return "ORDER";
  }
  llvm_unreachable("Unknown reason!");
}

#endif

static bool tryLess(int TryVal, int CandVal,
                    SISchedulerCandidate &TryCand,
                    SISchedulerCandidate &Cand,
                    SIScheduleCandReason Reason) {
  if (TryVal < CandVal) {
    TryCand.Reason = Reason;
    return true;
  }
  if (TryVal > CandVal) {
    if (Cand.Reason > Reason)
      Cand.Reason = Reason;
    return true;
  }
  Cand.setRepeat(Reason);
  return false;
}

static bool tryGreater(int TryVal, int CandVal,
                       SISchedulerCandidate &TryCand,
                       SISchedulerCandidate &Cand,
                       SIScheduleCandReason Reason) {
  if (TryVal > CandVal) {
    TryCand.Reason = Reason;
    return true;
  }
  if (TryVal < CandVal) {
    if (Cand.Reason > Reason)
      Cand.Reason = Reason;
    return true;
  }
  Cand.setRepeat(Reason);
  return false;
}

// SIScheduleBlock //

void SIScheduleBlock::addUnit(SUnit *SU) {
  NodeNum2Index[SU->NodeNum] = SUnits.size();
  SUnits.push_back(SU);
}

#ifndef NDEBUG
void SIScheduleBlock::traceCandidate(const SISchedCandidate &Cand) {
  dbgs() << "  SU(" << Cand.SU->NodeNum << ") " << getReasonStr(Cand.Reason);
  dbgs() << '\n';
}
#endif

void SIScheduleBlock::tryCandidateTopDown(SISchedCandidate &Cand,
                                          SISchedCandidate &TryCand) {
  // Initialize the candidate if needed.
  if (!Cand.isValid()) {
    TryCand.Reason = NodeOrder;
    return;
  }

  if (Cand.SGPRUsage > 60 &&
      tryLess(TryCand.SGPRUsage, Cand.SGPRUsage, TryCand, Cand, RegUsage))
    return;

  // Schedule low latency instructions as early (top) as possible.
  // Order of priority is:
  // . Low latency instructions which do not depend on other low latency
  //   instructions we haven't waited for
  // . Other instructions which do not depend on low latency instructions
  //   we haven't waited for
  // . Low latencies
  // . All other instructions
  // Goal is to get: low latency instructions - independent instructions
  // - (eventually some more low latency instructions)
  // - instructions that depend on the first low latency instructions.
  // If the block contains a lot of constant loads, the SGPR usage could go
  // quite high; hence the arbitrary limit of 60 above, which encourages
  // using the already loaded constants (in order to release some SGPRs)
  // before loading more.
  if (tryLess(TryCand.HasLowLatencyNonWaitedParent,
              Cand.HasLowLatencyNonWaitedParent,
              TryCand, Cand, SIScheduleCandReason::Depth))
    return;

  if (tryGreater(TryCand.IsLowLatency, Cand.IsLowLatency,
                 TryCand, Cand, SIScheduleCandReason::Depth))
    return;

  if (TryCand.IsLowLatency &&
      tryLess(TryCand.LowLatencyOffset, Cand.LowLatencyOffset,
              TryCand, Cand, SIScheduleCandReason::Depth))
    return;

  if (tryLess(TryCand.VGPRUsage, Cand.VGPRUsage, TryCand, Cand, RegUsage))
    return;

  // Fall through to original instruction order.
  if (TryCand.SU->NodeNum < Cand.SU->NodeNum) {
    TryCand.Reason = NodeOrder;
  }
}

SUnit* SIScheduleBlock::pickNode() {
  SISchedCandidate TopCand;

  for (SUnit* SU : TopReadySUs) {
    SISchedCandidate TryCand;
    std::vector<unsigned> pressure;
    std::vector<unsigned> MaxPressure;
    // Predict register usage after this instruction.
    TryCand.SU = SU;
    TopRPTracker.getDownwardPressure(SU->getInstr(), pressure, MaxPressure);
    TryCand.SGPRUsage = pressure[DAG->getSGPRSetID()];
    TryCand.VGPRUsage = pressure[DAG->getVGPRSetID()];
    TryCand.IsLowLatency = DAG->IsLowLatencySU[SU->NodeNum];
    TryCand.LowLatencyOffset = DAG->LowLatencyOffset[SU->NodeNum];
    TryCand.HasLowLatencyNonWaitedParent =
      HasLowLatencyNonWaitedParent[NodeNum2Index[SU->NodeNum]];
    tryCandidateTopDown(TopCand, TryCand);
    if (TryCand.Reason != NoCand)
      TopCand.setBest(TryCand);
  }

  return TopCand.SU;
}


// Schedule something valid.
void SIScheduleBlock::fastSchedule() {
  TopReadySUs.clear();
  if (Scheduled)
    undoSchedule();

  for (SUnit* SU : SUnits) {
    if (!SU->NumPredsLeft)
      TopReadySUs.push_back(SU);
  }

  while (!TopReadySUs.empty()) {
    SUnit *SU = TopReadySUs[0];
    ScheduledSUnits.push_back(SU);
    nodeScheduled(SU);
  }

  Scheduled = true;
}

// Returns true if Reg was defined between First and Last.
static bool isDefBetween(unsigned Reg,
                         SlotIndex First, SlotIndex Last,
                         const MachineRegisterInfo *MRI,
                         const LiveIntervals *LIS) {
  for (MachineRegisterInfo::def_instr_iterator
       UI = MRI->def_instr_begin(Reg),
       UE = MRI->def_instr_end(); UI != UE; ++UI) {
    const MachineInstr* MI = &*UI;
    if (MI->isDebugValue())
      continue;
    SlotIndex InstSlot = LIS->getInstructionIndex(*MI).getRegSlot();
    if (InstSlot >= First && InstSlot <= Last)
      return true;
  }
  return false;
}

void SIScheduleBlock::initRegPressure(MachineBasicBlock::iterator BeginBlock,
                                      MachineBasicBlock::iterator EndBlock) {
  IntervalPressure Pressure, BotPressure;
  RegPressureTracker RPTracker(Pressure), BotRPTracker(BotPressure);
  LiveIntervals *LIS = DAG->getLIS();
  MachineRegisterInfo *MRI = DAG->getMRI();
  DAG->initRPTracker(TopRPTracker);
  DAG->initRPTracker(BotRPTracker);
  DAG->initRPTracker(RPTracker);

  // Goes through all SUs. RPTracker captures what had to be alive for the
  // SUs to execute, and what is still alive at the end.
  for (SUnit* SU : ScheduledSUnits) {
    RPTracker.setPos(SU->getInstr());
    RPTracker.advance();
  }

  // Close the RPTracker to finalize live ins/outs.
  RPTracker.closeRegion();

  // Initialize the live ins and live outs.
  TopRPTracker.addLiveRegs(RPTracker.getPressure().LiveInRegs);
  BotRPTracker.addLiveRegs(RPTracker.getPressure().LiveOutRegs);

  // Do not track physical registers, because it messes up.
  for (const auto &RegMaskPair : RPTracker.getPressure().LiveInRegs) {
    if (TargetRegisterInfo::isVirtualRegister(RegMaskPair.RegUnit))
      LiveInRegs.insert(RegMaskPair.RegUnit);
  }
  LiveOutRegs.clear();
  // There are several possibilities to distinguish:
  // 1) Reg is not input to any instruction in the block, but is output of one
  // 2) 1) + read in the block and not needed after it
  // 3) 1) + read in the block but needed in another block
  // 4) Reg is input of an instruction but another block will read it too
  // 5) Reg is input of an instruction and then rewritten in the block.
  //    result is not read in the block (implies used in another block)
  // 6) Reg is input of an instruction and then rewritten in the block.
  //    result is read in the block and not needed in another block
  // 7) Reg is input of an instruction and then rewritten in the block.
  //    result is read in the block but also needed in another block
  // LiveInRegs will contain all the regs in situations 4, 5, 6 and 7.
  // We want LiveOutRegs to contain only Regs whose content will be read
  // afterwards in another block, and whose content was written in the
  // current block, that is we want it to get 1, 3, 5 and 7.
  // Since we made the MIs of a block be packed all together before
  // scheduling, the LiveIntervals were correct, and the RPTracker was
  // able to correctly handle 5 vs 6, and 2 vs 3.
  // (Note: This is not sufficient for the RPTracker to not make mistakes
  // for case 4.)
  // The RPTracker's LiveOutRegs has 1, 3, (some correct or incorrect) 4, 5
  // and 7.
  // Comparing to LiveInRegs is not sufficient to differentiate 4 vs 5 and 7.
  // The use of isDefBetween removes case 4.
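  // Illustrative example for 4 vs 5 (with a hypothetical register %a): if %a
  // is live-in, read here, and also read by a later block (case 4), this
  // block writes nothing to it, so it must not be in LiveOutRegs; if %a is
  // live-in and rewritten here with the result read later (case 5), the
  // live-out value is produced in this block and %a belongs in LiveOutRegs.
  // Both cases look the same when only comparing against LiveInRegs, hence
  // the check below for a def between BeginBlock and EndBlock.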
  for (const auto &RegMaskPair : RPTracker.getPressure().LiveOutRegs) {
    unsigned Reg = RegMaskPair.RegUnit;
    if (TargetRegisterInfo::isVirtualRegister(Reg) &&
        isDefBetween(Reg, LIS->getInstructionIndex(*BeginBlock).getRegSlot(),
                     LIS->getInstructionIndex(*EndBlock).getRegSlot(), MRI,
                     LIS)) {
      LiveOutRegs.insert(Reg);
    }
  }

  // Pressure = sum over alive registers of the register sizes.
  // Internally llvm will represent some registers as big 128 bit registers
  // for example, but they actually correspond to 4 actual 32 bit registers.
  // Thus Pressure is not equal to num_alive_registers * constant.
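  // For instance (illustrative numbers): a single live 128 bit register
  // contributes 4 units to the pressure of its 32 bit register set, the
  // same as four independent 32 bit registers.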
  LiveInPressure = TopPressure.MaxSetPressure;
  LiveOutPressure = BotPressure.MaxSetPressure;

  // Prepares TopRPTracker for top down scheduling.
  TopRPTracker.closeTop();
}

void SIScheduleBlock::schedule(MachineBasicBlock::iterator BeginBlock,
                               MachineBasicBlock::iterator EndBlock) {
  if (!Scheduled)
    fastSchedule();

  // PreScheduling phase to set LiveIn and LiveOut.
  initRegPressure(BeginBlock, EndBlock);
  undoSchedule();

  // Schedule for real now.

  TopReadySUs.clear();

  for (SUnit* SU : SUnits) {
    if (!SU->NumPredsLeft)
      TopReadySUs.push_back(SU);
  }

  while (!TopReadySUs.empty()) {
    SUnit *SU = pickNode();
    ScheduledSUnits.push_back(SU);
    TopRPTracker.setPos(SU->getInstr());
    TopRPTracker.advance();
    nodeScheduled(SU);
  }

  // TODO: compute InternalAdditionnalPressure.
  InternalAdditionnalPressure.resize(TopPressure.MaxSetPressure.size());

  // Check everything is right.
#ifndef NDEBUG
  assert(SUnits.size() == ScheduledSUnits.size() &&
         TopReadySUs.empty());
  for (SUnit* SU : SUnits) {
    assert(SU->isScheduled &&
           SU->NumPredsLeft == 0);
  }
#endif

  Scheduled = true;
}

void SIScheduleBlock::undoSchedule() {
  for (SUnit* SU : SUnits) {
    SU->isScheduled = false;
    for (SDep& Succ : SU->Succs) {
      if (BC->isSUInBlock(Succ.getSUnit(), ID))
        undoReleaseSucc(SU, &Succ);
    }
  }
  HasLowLatencyNonWaitedParent.assign(SUnits.size(), 0);
  ScheduledSUnits.clear();
  Scheduled = false;
}

void SIScheduleBlock::undoReleaseSucc(SUnit *SU, SDep *SuccEdge) {
  SUnit *SuccSU = SuccEdge->getSUnit();

  if (SuccEdge->isWeak()) {
    ++SuccSU->WeakPredsLeft;
    return;
  }
  ++SuccSU->NumPredsLeft;
}

void SIScheduleBlock::releaseSucc(SUnit *SU, SDep *SuccEdge) {
  SUnit *SuccSU = SuccEdge->getSUnit();

  if (SuccEdge->isWeak()) {
    --SuccSU->WeakPredsLeft;
    return;
  }
#ifndef NDEBUG
  if (SuccSU->NumPredsLeft == 0) {
    dbgs() << "*** Scheduling failed! ***\n";
    SuccSU->dump(DAG);
    dbgs() << " has been released too many times!\n";
    llvm_unreachable(nullptr);
  }
#endif

  --SuccSU->NumPredsLeft;
}

/// Release the successors of SU that are inside the block
/// (InOrOutBlock = true) or outside of it (InOrOutBlock = false).
void SIScheduleBlock::releaseSuccessors(SUnit *SU, bool InOrOutBlock) {
  for (SDep& Succ : SU->Succs) {
    SUnit *SuccSU = Succ.getSUnit();

    if (SuccSU->NodeNum >= DAG->SUnits.size())
      continue;

    if (BC->isSUInBlock(SuccSU, ID) != InOrOutBlock)
      continue;

    releaseSucc(SU, &Succ);
    if (SuccSU->NumPredsLeft == 0 && InOrOutBlock)
      TopReadySUs.push_back(SuccSU);
  }
}

void SIScheduleBlock::nodeScheduled(SUnit *SU) {
  // Is in TopReadySUs
  assert(!SU->NumPredsLeft);
  std::vector<SUnit *>::iterator I = llvm::find(TopReadySUs, SU);
  if (I == TopReadySUs.end()) {
    dbgs() << "Data Structure Bug in SI Scheduler\n";
    llvm_unreachable(nullptr);
  }
  TopReadySUs.erase(I);

  releaseSuccessors(SU, true);
  // Scheduling this node will trigger a wait on all pending low latencies,
  // thus propagate to the other instructions that they do not need to wait
  // for them either.
  if (HasLowLatencyNonWaitedParent[NodeNum2Index[SU->NodeNum]])
    HasLowLatencyNonWaitedParent.assign(SUnits.size(), 0);

  if (DAG->IsLowLatencySU[SU->NodeNum]) {
    for (SDep& Succ : SU->Succs) {
      std::map<unsigned, unsigned>::iterator I =
        NodeNum2Index.find(Succ.getSUnit()->NodeNum);
      if (I != NodeNum2Index.end())
        HasLowLatencyNonWaitedParent[I->second] = 1;
    }
  }
  SU->isScheduled = true;
}

void SIScheduleBlock::finalizeUnits() {
  // We remove links from outside blocks to enable scheduling inside the
  // block.
  for (SUnit* SU : SUnits) {
    releaseSuccessors(SU, false);
    if (DAG->IsHighLatencySU[SU->NodeNum])
      HighLatencyBlock = true;
  }
  HasLowLatencyNonWaitedParent.resize(SUnits.size(), 0);
}

// We maintain ascending order of IDs.
void SIScheduleBlock::addPred(SIScheduleBlock *Pred) {
  unsigned PredID = Pred->getID();

  // Check if Pred is not already a predecessor.
  for (SIScheduleBlock* P : Preds) {
    if (PredID == P->getID())
      return;
  }
  Preds.push_back(Pred);

  assert(none_of(Succs,
                 [=](SIScheduleBlock *S) { return PredID == S->getID(); }) &&
         "Loop in the Block Graph!");
}

void SIScheduleBlock::addSucc(SIScheduleBlock *Succ) {
  unsigned SuccID = Succ->getID();

  // Check if Succ is not already a successor.
  for (SIScheduleBlock* S : Succs) {
    if (SuccID == S->getID())
      return;
  }
  if (Succ->isHighLatencyBlock())
    ++NumHighLatencySuccessors;
  Succs.push_back(Succ);
  assert(none_of(Preds,
                 [=](SIScheduleBlock *P) { return SuccID == P->getID(); }) &&
         "Loop in the Block Graph!");
}

#ifndef NDEBUG
void SIScheduleBlock::printDebug(bool full) {
  dbgs() << "Block (" << ID << ")\n";
  if (!full)
    return;

  dbgs() << "\nContains High Latency Instruction: "
         << HighLatencyBlock << '\n';
  dbgs() << "\nDepends On:\n";
  for (SIScheduleBlock* P : Preds) {
    P->printDebug(false);
  }

  dbgs() << "\nSuccessors:\n";
  for (SIScheduleBlock* S : Succs) {
    S->printDebug(false);
  }

  if (Scheduled) {
    dbgs() << "LiveInPressure " << LiveInPressure[DAG->getSGPRSetID()] << ' '
           << LiveInPressure[DAG->getVGPRSetID()] << '\n';
    dbgs() << "LiveOutPressure " << LiveOutPressure[DAG->getSGPRSetID()] << ' '
           << LiveOutPressure[DAG->getVGPRSetID()] << "\n\n";
    dbgs() << "LiveIns:\n";
    for (unsigned Reg : LiveInRegs)
      dbgs() << PrintVRegOrUnit(Reg, DAG->getTRI()) << ' ';

    dbgs() << "\nLiveOuts:\n";
    for (unsigned Reg : LiveOutRegs)
      dbgs() << PrintVRegOrUnit(Reg, DAG->getTRI()) << ' ';
  }

  dbgs() << "\nInstructions:\n";
  // The dump is the same whether or not the block has been scheduled.
  for (SUnit* SU : SUnits) {
    SU->dump(DAG);
  }

  dbgs() << "///////////////////////\n";
}
#endif

// SIScheduleBlockCreator //

SIScheduleBlockCreator::SIScheduleBlockCreator(SIScheduleDAGMI *DAG) :
DAG(DAG) {
}

SIScheduleBlockCreator::~SIScheduleBlockCreator() {
}

SIScheduleBlocks
SIScheduleBlockCreator::getBlocks(SISchedulerBlockCreatorVariant BlockVariant) {
  std::map<SISchedulerBlockCreatorVariant, SIScheduleBlocks>::iterator B =
    Blocks.find(BlockVariant);
  if (B == Blocks.end()) {
    SIScheduleBlocks Res;
    createBlocksForVariant(BlockVariant);
    topologicalSort();
    scheduleInsideBlocks();
    fillStats();
    Res.Blocks = CurrentBlocks;
    Res.TopDownIndex2Block = TopDownIndex2Block;
    Res.TopDownBlock2Index = TopDownBlock2Index;
    Blocks[BlockVariant] = Res;
    return Res;
  } else {
    return B->second;
  }
}

bool SIScheduleBlockCreator::isSUInBlock(SUnit *SU, unsigned ID) {
  if (SU->NodeNum >= DAG->SUnits.size())
    return false;
  return CurrentBlocks[Node2CurrentBlock[SU->NodeNum]]->getID() == ID;
}

void SIScheduleBlockCreator::colorHighLatenciesAlone() {
  unsigned DAGSize = DAG->SUnits.size();

  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SUnit *SU = &DAG->SUnits[i];
    if (DAG->IsHighLatencySU[SU->NodeNum]) {
      CurrentColoring[SU->NodeNum] = NextReservedID++;
    }
  }
}

void SIScheduleBlockCreator::colorHighLatenciesGroups() {
  unsigned DAGSize = DAG->SUnits.size();
  unsigned NumHighLatencies = 0;
  unsigned GroupSize;
  unsigned Color = NextReservedID;
  unsigned Count = 0;
  std::set<unsigned> FormingGroup;

  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SUnit *SU = &DAG->SUnits[i];
    if (DAG->IsHighLatencySU[SU->NodeNum])
      ++NumHighLatencies;
  }

  if (NumHighLatencies == 0)
    return;

  if (NumHighLatencies <= 6)
    GroupSize = 2;
  else if (NumHighLatencies <= 12)
    GroupSize = 3;
  else
    GroupSize = 4;

  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SUnit *SU = &DAG->SUnits[i];
    if (DAG->IsHighLatencySU[SU->NodeNum]) {
      bool CompatibleGroup = true;
      unsigned ProposedColor = Color;
      for (unsigned j : FormingGroup) {
        // TODO: Currently CompatibleGroup will always be false,
        // because the graph enforces the load order. This
        // can be fixed, but since keeping the load order is often
        // good for performance, breaking it causes a performance hit
        // (for both the default scheduler and this scheduler).
        // When this scheduler determines a good load order,
        // this can be fixed.
        if (!DAG->canAddEdge(SU, &DAG->SUnits[j]) ||
            !DAG->canAddEdge(&DAG->SUnits[j], SU))
          CompatibleGroup = false;
      }
      if (!CompatibleGroup || ++Count == GroupSize) {
        FormingGroup.clear();
        Color = ++NextReservedID;
        if (!CompatibleGroup) {
          ProposedColor = Color;
          FormingGroup.insert(SU->NodeNum);
        }
        Count = 0;
      } else {
        FormingGroup.insert(SU->NodeNum);
      }
      CurrentColoring[SU->NodeNum] = ProposedColor;
    }
  }
}

void SIScheduleBlockCreator::colorComputeReservedDependencies() {
  unsigned DAGSize = DAG->SUnits.size();
  std::map<std::set<unsigned>, unsigned> ColorCombinations;

  CurrentTopDownReservedDependencyColoring.clear();
  CurrentBottomUpReservedDependencyColoring.clear();

  CurrentTopDownReservedDependencyColoring.resize(DAGSize, 0);
  CurrentBottomUpReservedDependencyColoring.resize(DAGSize, 0);

  // Traverse TopDown, and give different colors to SUs depending
  // on which combination of High Latencies they depend on.

  for (unsigned SUNum : DAG->TopDownIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    std::set<unsigned> SUColors;

    // Already given.
    if (CurrentColoring[SU->NodeNum]) {
      CurrentTopDownReservedDependencyColoring[SU->NodeNum] =
        CurrentColoring[SU->NodeNum];
      continue;
    }

    for (SDep& PredDep : SU->Preds) {
      SUnit *Pred = PredDep.getSUnit();
      if (PredDep.isWeak() || Pred->NodeNum >= DAGSize)
        continue;
      if (CurrentTopDownReservedDependencyColoring[Pred->NodeNum] > 0)
        SUColors.insert(CurrentTopDownReservedDependencyColoring[Pred->NodeNum]);
    }
    // Color 0 by default.
    if (SUColors.empty())
      continue;
    // Same color as the parents.
    if (SUColors.size() == 1 && *SUColors.begin() > DAGSize)
      CurrentTopDownReservedDependencyColoring[SU->NodeNum] =
        *SUColors.begin();
    else {
      std::map<std::set<unsigned>, unsigned>::iterator Pos =
        ColorCombinations.find(SUColors);
      if (Pos != ColorCombinations.end()) {
        CurrentTopDownReservedDependencyColoring[SU->NodeNum] = Pos->second;
      } else {
        CurrentTopDownReservedDependencyColoring[SU->NodeNum] =
          NextNonReservedID;
        ColorCombinations[SUColors] = NextNonReservedID++;
      }
    }
  }

  ColorCombinations.clear();

  // Same as before, but BottomUp.

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    std::set<unsigned> SUColors;

    // Already given.
    if (CurrentColoring[SU->NodeNum]) {
      CurrentBottomUpReservedDependencyColoring[SU->NodeNum] =
        CurrentColoring[SU->NodeNum];
      continue;
    }

    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      if (CurrentBottomUpReservedDependencyColoring[Succ->NodeNum] > 0)
        SUColors.insert(CurrentBottomUpReservedDependencyColoring[Succ->NodeNum]);
    }
    // Keep color 0.
    if (SUColors.empty())
      continue;
    // Same color as the children.
    if (SUColors.size() == 1 && *SUColors.begin() > DAGSize)
      CurrentBottomUpReservedDependencyColoring[SU->NodeNum] =
        *SUColors.begin();
    else {
      std::map<std::set<unsigned>, unsigned>::iterator Pos =
        ColorCombinations.find(SUColors);
      if (Pos != ColorCombinations.end()) {
        CurrentBottomUpReservedDependencyColoring[SU->NodeNum] = Pos->second;
      } else {
        CurrentBottomUpReservedDependencyColoring[SU->NodeNum] =
          NextNonReservedID;
        ColorCombinations[SUColors] = NextNonReservedID++;
      }
    }
  }
}

void SIScheduleBlockCreator::colorAccordingToReservedDependencies() {
  unsigned DAGSize = DAG->SUnits.size();
  std::map<std::pair<unsigned, unsigned>, unsigned> ColorCombinations;

  // Every combination of colors given by the top down
  // and bottom up Reserved node dependency gets its own color.

  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SUnit *SU = &DAG->SUnits[i];
    std::pair<unsigned, unsigned> SUColors;

    // High latency instructions: already given.
    if (CurrentColoring[SU->NodeNum])
      continue;

    SUColors.first = CurrentTopDownReservedDependencyColoring[SU->NodeNum];
    SUColors.second = CurrentBottomUpReservedDependencyColoring[SU->NodeNum];

    std::map<std::pair<unsigned, unsigned>, unsigned>::iterator Pos =
      ColorCombinations.find(SUColors);
    if (Pos != ColorCombinations.end()) {
      CurrentColoring[SU->NodeNum] = Pos->second;
    } else {
      CurrentColoring[SU->NodeNum] = NextNonReservedID;
      ColorCombinations[SUColors] = NextNonReservedID++;
    }
  }
}

void SIScheduleBlockCreator::colorEndsAccordingToDependencies() {
  unsigned DAGSize = DAG->SUnits.size();
  std::vector<int> PendingColoring = CurrentColoring;

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    std::set<unsigned> SUColors;
    std::set<unsigned> SUColorsPending;

    if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
      continue;

    if (CurrentBottomUpReservedDependencyColoring[SU->NodeNum] > 0 ||
        CurrentTopDownReservedDependencyColoring[SU->NodeNum] > 0)
      continue;

    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      if (CurrentBottomUpReservedDependencyColoring[Succ->NodeNum] > 0 ||
          CurrentTopDownReservedDependencyColoring[Succ->NodeNum] > 0)
        SUColors.insert(CurrentColoring[Succ->NodeNum]);
      SUColorsPending.insert(PendingColoring[Succ->NodeNum]);
    }
    if (SUColors.size() == 1 && SUColorsPending.size() == 1)
      PendingColoring[SU->NodeNum] = *SUColors.begin();
    else // TODO: Attribute new colors depending on color
         // combination of children.
      PendingColoring[SU->NodeNum] = NextNonReservedID++;
  }
  CurrentColoring = PendingColoring;
}


void SIScheduleBlockCreator::colorForceConsecutiveOrderInGroup() {
  unsigned DAGSize = DAG->SUnits.size();
  unsigned PreviousColor;
  std::set<unsigned> SeenColors;

  if (DAGSize <= 1)
    return;

  PreviousColor = CurrentColoring[0];

  for (unsigned i = 1, e = DAGSize; i != e; ++i) {
    SUnit *SU = &DAG->SUnits[i];
    unsigned CurrentColor = CurrentColoring[i];
    unsigned PreviousColorSave = PreviousColor;
    assert(i == SU->NodeNum);

    if (CurrentColor != PreviousColor)
      SeenColors.insert(PreviousColor);
    PreviousColor = CurrentColor;

    if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
      continue;

    if (SeenColors.find(CurrentColor) == SeenColors.end())
      continue;

    if (PreviousColorSave != CurrentColor)
      CurrentColoring[i] = NextNonReservedID++;
    else
      CurrentColoring[i] = CurrentColoring[i-1];
  }
}

void SIScheduleBlockCreator::colorMergeConstantLoadsNextGroup() {
  unsigned DAGSize = DAG->SUnits.size();

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    std::set<unsigned> SUColors;

    if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
      continue;

    // No predecessor: VGPR constant loading.
    // Low latency instructions usually have a predecessor (the address).
    if (SU->Preds.size() > 0 && !DAG->IsLowLatencySU[SU->NodeNum])
      continue;

    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      SUColors.insert(CurrentColoring[Succ->NodeNum]);
    }
    if (SUColors.size() == 1)
      CurrentColoring[SU->NodeNum] = *SUColors.begin();
  }
}

void SIScheduleBlockCreator::colorMergeIfPossibleNextGroup() {
  unsigned DAGSize = DAG->SUnits.size();

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    std::set<unsigned> SUColors;

    if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
      continue;

    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      SUColors.insert(CurrentColoring[Succ->NodeNum]);
    }
    if (SUColors.size() == 1)
      CurrentColoring[SU->NodeNum] = *SUColors.begin();
  }
}

void SIScheduleBlockCreator::colorMergeIfPossibleNextGroupOnlyForReserved() {
  unsigned DAGSize = DAG->SUnits.size();

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    std::set<unsigned> SUColors;

    if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
      continue;

    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      SUColors.insert(CurrentColoring[Succ->NodeNum]);
    }
    if (SUColors.size() == 1 && *SUColors.begin() <= DAGSize)
      CurrentColoring[SU->NodeNum] = *SUColors.begin();
  }
}

void SIScheduleBlockCreator::colorMergeIfPossibleSmallGroupsToNextGroup() {
  unsigned DAGSize = DAG->SUnits.size();
  std::map<unsigned, unsigned> ColorCount;

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    unsigned color = CurrentColoring[SU->NodeNum];
    std::map<unsigned, unsigned>::iterator Pos = ColorCount.find(color);
    if (Pos != ColorCount.end()) {
      ++ColorCount[color];
    } else {
      ColorCount[color] = 1;
    }
  }

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    unsigned color = CurrentColoring[SU->NodeNum];
    std::set<unsigned> SUColors;

    if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
      continue;

    if (ColorCount[color] > 1)
      continue;

    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      SUColors.insert(CurrentColoring[Succ->NodeNum]);
    }
    if (SUColors.size() == 1 && *SUColors.begin() != color) {
      --ColorCount[color];
      CurrentColoring[SU->NodeNum] = *SUColors.begin();
      ++ColorCount[*SUColors.begin()];
    }
  }
}

void SIScheduleBlockCreator::cutHugeBlocks() {
  // TODO
}

void SIScheduleBlockCreator::regroupNoUserInstructions() {
  unsigned DAGSize = DAG->SUnits.size();
  int GroupID = NextNonReservedID++;

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    bool hasSuccessor = false;

    if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
      continue;

    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      hasSuccessor = true;
    }
    if (!hasSuccessor)
      CurrentColoring[SU->NodeNum] = GroupID;
  }
}

void SIScheduleBlockCreator::createBlocksForVariant(SISchedulerBlockCreatorVariant BlockVariant) {
  unsigned DAGSize = DAG->SUnits.size();
  std::map<unsigned,unsigned> RealID;

  CurrentBlocks.clear();
  CurrentColoring.clear();
  CurrentColoring.resize(DAGSize, 0);
  Node2CurrentBlock.clear();

  // Restore links that the previous scheduling variant has overridden.
  DAG->restoreSULinksLeft();

  NextReservedID = 1;
  NextNonReservedID = DAGSize + 1;

  DEBUG(dbgs() << "Coloring the graph\n");

  if (BlockVariant == SISchedulerBlockCreatorVariant::LatenciesGrouped)
    colorHighLatenciesGroups();
  else
    colorHighLatenciesAlone();
  colorComputeReservedDependencies();
  colorAccordingToReservedDependencies();
  colorEndsAccordingToDependencies();
  if (BlockVariant == SISchedulerBlockCreatorVariant::LatenciesAlonePlusConsecutive)
    colorForceConsecutiveOrderInGroup();
  regroupNoUserInstructions();
  colorMergeConstantLoadsNextGroup();
  colorMergeIfPossibleNextGroupOnlyForReserved();

  // Put SUs of same color into same block.
  Node2CurrentBlock.resize(DAGSize, -1);
  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SUnit *SU = &DAG->SUnits[i];
    unsigned Color = CurrentColoring[SU->NodeNum];
    if (RealID.find(Color) == RealID.end()) {
      int ID = CurrentBlocks.size();
      BlockPtrs.push_back(llvm::make_unique<SIScheduleBlock>(DAG, this, ID));
      CurrentBlocks.push_back(BlockPtrs.rbegin()->get());
      RealID[Color] = ID;
    }
    CurrentBlocks[RealID[Color]]->addUnit(SU);
    Node2CurrentBlock[SU->NodeNum] = RealID[Color];
  }

  // Build dependencies between blocks.
  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SUnit *SU = &DAG->SUnits[i];
    int SUID = Node2CurrentBlock[i];
    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      if (Node2CurrentBlock[Succ->NodeNum] != SUID)
        CurrentBlocks[SUID]->addSucc(CurrentBlocks[Node2CurrentBlock[Succ->NodeNum]]);
    }
    for (SDep& PredDep : SU->Preds) {
      SUnit *Pred = PredDep.getSUnit();
      if (PredDep.isWeak() || Pred->NodeNum >= DAGSize)
        continue;
      if (Node2CurrentBlock[Pred->NodeNum] != SUID)
        CurrentBlocks[SUID]->addPred(CurrentBlocks[Node2CurrentBlock[Pred->NodeNum]]);
    }
  }

  // Free roots and leaves of all blocks to enable scheduling inside them.
  for (unsigned i = 0, e = CurrentBlocks.size(); i != e; ++i) {
    SIScheduleBlock *Block = CurrentBlocks[i];
    Block->finalizeUnits();
  }
  DEBUG(
    dbgs() << "Blocks created:\n\n";
    for (unsigned i = 0, e = CurrentBlocks.size(); i != e; ++i) {
      SIScheduleBlock *Block = CurrentBlocks[i];
      Block->printDebug(true);
    }
  );
}

// Two functions taken from Codegen/MachineScheduler.cpp

/// Non-const version.
static MachineBasicBlock::iterator
nextIfDebug(MachineBasicBlock::iterator I,
            MachineBasicBlock::const_iterator End) {
  for (; I != End; ++I) {
    if (!I->isDebugValue())
      break;
  }
  return I;
}

void SIScheduleBlockCreator::topologicalSort() {
  unsigned DAGSize = CurrentBlocks.size();
  std::vector<int> WorkList;

  DEBUG(dbgs() << "Topological Sort\n");

  WorkList.reserve(DAGSize);
  TopDownIndex2Block.resize(DAGSize);
  TopDownBlock2Index.resize(DAGSize);
  BottomUpIndex2Block.resize(DAGSize);

  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SIScheduleBlock *Block = CurrentBlocks[i];
    unsigned Degree = Block->getSuccs().size();
    TopDownBlock2Index[i] = Degree;
    if (Degree == 0) {
      WorkList.push_back(i);
    }
  }

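  // This is Kahn's topological sort run on out-degrees: blocks whose
  // successors have all been numbered are popped from WorkList and numbered
  // from the end, so TopDownBlock2Index (reused above as a scratch
  // out-degree counter) ends up holding a valid top-down topological index
  // for every block.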
  int Id = DAGSize;
  while (!WorkList.empty()) {
    int i = WorkList.back();
    SIScheduleBlock *Block = CurrentBlocks[i];
    WorkList.pop_back();
    TopDownBlock2Index[i] = --Id;
    TopDownIndex2Block[Id] = i;
    for (SIScheduleBlock* Pred : Block->getPreds()) {
      if (!--TopDownBlock2Index[Pred->getID()])
        WorkList.push_back(Pred->getID());
    }
  }

#ifndef NDEBUG
  // Check correctness of the ordering.
  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SIScheduleBlock *Block = CurrentBlocks[i];
    for (SIScheduleBlock* Pred : Block->getPreds()) {
      assert(TopDownBlock2Index[i] > TopDownBlock2Index[Pred->getID()] &&
             "Wrong Top Down topological sorting");
    }
  }
#endif

  BottomUpIndex2Block = std::vector<int>(TopDownIndex2Block.rbegin(),
                                         TopDownIndex2Block.rend());
}

void SIScheduleBlockCreator::scheduleInsideBlocks() {
  unsigned DAGSize = CurrentBlocks.size();

  DEBUG(dbgs() << "\nScheduling Blocks\n\n");

  // We first produce a valid scheduling such that a Block corresponds
  // to a contiguous range of instructions.
  DEBUG(dbgs() << "First phase: Fast scheduling for Reg Liveness\n");
  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SIScheduleBlock *Block = CurrentBlocks[i];
    Block->fastSchedule();
  }

  // Note: the following code, and the part restoring previous position,
  // are by far the most expensive operations of the Scheduler.

  // Do not update CurrentTop.
  MachineBasicBlock::iterator CurrentTopFastSched = DAG->getCurrentTop();
  std::vector<MachineBasicBlock::iterator> PosOld;
  std::vector<MachineBasicBlock::iterator> PosNew;
  PosOld.reserve(DAG->SUnits.size());
  PosNew.reserve(DAG->SUnits.size());

  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    int BlockIndice = TopDownIndex2Block[i];
    SIScheduleBlock *Block = CurrentBlocks[BlockIndice];
    std::vector<SUnit*> SUs = Block->getScheduledUnits();

    for (SUnit* SU : SUs) {
      MachineInstr *MI = SU->getInstr();
      MachineBasicBlock::iterator Pos = MI;
      PosOld.push_back(Pos);
      if (&*CurrentTopFastSched == MI) {
        PosNew.push_back(Pos);
        CurrentTopFastSched = nextIfDebug(++CurrentTopFastSched,
                                          DAG->getCurrentBottom());
      } else {
        // Update the instruction stream.
        DAG->getBB()->splice(CurrentTopFastSched, DAG->getBB(), MI);

        // Update LiveIntervals.
        // Note: Moving all instructions and calling handleMove every time
        // is the most cpu intensive operation of the scheduler.
        // It would gain a lot if there was a way to recompute the
        // LiveIntervals for the entire scheduling region.
        DAG->getLIS()->handleMove(*MI, /*UpdateFlags=*/true);
        PosNew.push_back(CurrentTopFastSched);
      }
    }
  }

  // Now we have Block of SUs == Block of MI.
  // We do the final schedule for the instructions inside the block.
  // The property that all the SUs of the Block are grouped together as MI
  // is used for correct reg usage tracking.
  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SIScheduleBlock *Block = CurrentBlocks[i];
    std::vector<SUnit*> SUs = Block->getScheduledUnits();
    Block->schedule((*SUs.begin())->getInstr(), (*SUs.rbegin())->getInstr());
  }

  DEBUG(dbgs() << "Restoring MI Pos\n");
  // Restore old ordering (which prevents a LIS->handleMove bug).
  for (unsigned i = PosOld.size(), e = 0; i != e; --i) {
    MachineBasicBlock::iterator POld = PosOld[i-1];
    MachineBasicBlock::iterator PNew = PosNew[i-1];
    if (PNew != POld) {
      // Update the instruction stream.
      DAG->getBB()->splice(POld, DAG->getBB(), PNew);

      // Update LiveIntervals.
      DAG->getLIS()->handleMove(*POld, /*UpdateFlags=*/true);
    }
  }

  DEBUG(
    for (unsigned i = 0, e = CurrentBlocks.size(); i != e; ++i) {
      SIScheduleBlock *Block = CurrentBlocks[i];
      Block->printDebug(true);
    }
  );
}

void SIScheduleBlockCreator::fillStats() {
  unsigned DAGSize = CurrentBlocks.size();

  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    int BlockIndice = TopDownIndex2Block[i];
    SIScheduleBlock *Block = CurrentBlocks[BlockIndice];
    if (Block->getPreds().empty())
      Block->Depth = 0;
    else {
      unsigned Depth = 0;
      for (SIScheduleBlock *Pred : Block->getPreds()) {
        if (Depth < Pred->Depth + 1)
          Depth = Pred->Depth + 1;
      }
      Block->Depth = Depth;
    }
  }

  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    int BlockIndice = BottomUpIndex2Block[i];
    SIScheduleBlock *Block = CurrentBlocks[BlockIndice];
    if (Block->getSuccs().empty())
      Block->Height = 0;
    else {
      unsigned Height = 0;
      for (SIScheduleBlock *Succ : Block->getSuccs()) {
        if (Height < Succ->Height + 1)
          Height = Succ->Height + 1;
      }
      Block->Height = Height;
    }
  }
}

// SIScheduleBlockScheduler //

SIScheduleBlockScheduler::SIScheduleBlockScheduler(SIScheduleDAGMI *DAG,
                                                   SISchedulerBlockSchedulerVariant Variant,
                                                   SIScheduleBlocks BlocksStruct) :
  DAG(DAG), Variant(Variant), Blocks(BlocksStruct.Blocks),
  LastPosWaitedHighLatency(0), NumBlockScheduled(0), VregCurrentUsage(0),
  SregCurrentUsage(0), maxVregUsage(0), maxSregUsage(0) {

  // Fill the usage of every output.
  // Warning: while by construction we always have a link between two blocks
  // when one needs a result from the other, the number of users of an output
  // is not the sum of child blocks having as input the same virtual register.
  // Here is an example. A produces x and y. B eats x and produces x'.
  // C eats x' and y. The register coalescer may have attributed the same
  // virtual register to x and x'.
  // To count accurately, we do a topological sort. In case the register is
  // found for several parents, we increment the usage of the one with the
  // highest topological index.
  LiveOutRegsNumUsages.resize(Blocks.size());
  for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {
    SIScheduleBlock *Block = Blocks[i];
    for (unsigned Reg : Block->getInRegs()) {
      bool Found = false;
      int topoInd = -1;
      for (SIScheduleBlock* Pred: Block->getPreds()) {
        std::set<unsigned> PredOutRegs = Pred->getOutRegs();
        std::set<unsigned>::iterator RegPos = PredOutRegs.find(Reg);

        if (RegPos != PredOutRegs.end()) {
          Found = true;
          if (topoInd < BlocksStruct.TopDownBlock2Index[Pred->getID()]) {
            topoInd = BlocksStruct.TopDownBlock2Index[Pred->getID()];
          }
        }
      }

      if (!Found)
        continue;

      int PredID = BlocksStruct.TopDownIndex2Block[topoInd];
      std::map<unsigned, unsigned>::iterator RegPos =
        LiveOutRegsNumUsages[PredID].find(Reg);
      if (RegPos != LiveOutRegsNumUsages[PredID].end()) {
        ++LiveOutRegsNumUsages[PredID][Reg];
      } else {
        LiveOutRegsNumUsages[PredID][Reg] = 1;
      }
    }
  }

  LastPosHighLatencyParentScheduled.resize(Blocks.size(), 0);
  BlockNumPredsLeft.resize(Blocks.size());
  BlockNumSuccsLeft.resize(Blocks.size());

  for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {
    SIScheduleBlock *Block = Blocks[i];
    BlockNumPredsLeft[i] = Block->getPreds().size();
    BlockNumSuccsLeft[i] = Block->getSuccs().size();
  }

#ifndef NDEBUG
  for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {
    SIScheduleBlock *Block = Blocks[i];
    assert(Block->getID() == i);
  }
#endif

  std::set<unsigned> InRegs = DAG->getInRegs();
  addLiveRegs(InRegs);

  // Fill LiveRegsConsumers for regs that were already
  // defined before scheduling.
  for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {
    SIScheduleBlock *Block = Blocks[i];
    for (unsigned Reg : Block->getInRegs()) {
      bool Found = false;
      for (SIScheduleBlock* Pred: Block->getPreds()) {
        std::set<unsigned> PredOutRegs = Pred->getOutRegs();
        std::set<unsigned>::iterator RegPos = PredOutRegs.find(Reg);

        if (RegPos != PredOutRegs.end()) {
          Found = true;
          break;
        }
      }

      if (!Found) {
        if (LiveRegsConsumers.find(Reg) == LiveRegsConsumers.end())
          LiveRegsConsumers[Reg] = 1;
        else
          ++LiveRegsConsumers[Reg];
      }
    }
  }

  for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {
    SIScheduleBlock *Block = Blocks[i];
    if (BlockNumPredsLeft[i] == 0) {
      ReadyBlocks.push_back(Block);
    }
  }

  while (SIScheduleBlock *Block = pickBlock()) {
    BlocksScheduled.push_back(Block);
    blockScheduled(Block);
  }

  DEBUG(
    dbgs() << "Block Order:";
    for (SIScheduleBlock* Block : BlocksScheduled) {
      dbgs() << ' ' << Block->getID();
    }
  );
}

bool SIScheduleBlockScheduler::tryCandidateLatency(SIBlockSchedCandidate &Cand,
                                                   SIBlockSchedCandidate &TryCand) {
  if (!Cand.isValid()) {
    TryCand.Reason = NodeOrder;
    return true;
  }

  // Try to hide high latencies.
  if (tryLess(TryCand.LastPosHighLatParentScheduled,
              Cand.LastPosHighLatParentScheduled, TryCand, Cand, Latency))
    return true;
  // Schedule high latencies early so you can hide them better.
  if (tryGreater(TryCand.IsHighLatency, Cand.IsHighLatency,
                 TryCand, Cand, Latency))
    return true;
  if (TryCand.IsHighLatency && tryGreater(TryCand.Height, Cand.Height,
                                          TryCand, Cand, Depth))
    return true;
  if (tryGreater(TryCand.NumHighLatencySuccessors,
                 Cand.NumHighLatencySuccessors,
                 TryCand, Cand, Successor))
    return true;
  return false;
}

bool SIScheduleBlockScheduler::tryCandidateRegUsage(SIBlockSchedCandidate &Cand,
                                                    SIBlockSchedCandidate &TryCand) {
  if (!Cand.isValid()) {
    TryCand.Reason = NodeOrder;
    return true;
  }

  if (tryLess(TryCand.VGPRUsageDiff > 0, Cand.VGPRUsageDiff > 0,
              TryCand, Cand, RegUsage))
    return true;
  if (tryGreater(TryCand.NumSuccessors > 0,
                 Cand.NumSuccessors > 0,
                 TryCand, Cand, Successor))
    return true;
  if (tryGreater(TryCand.Height, Cand.Height, TryCand, Cand, Depth))
    return true;
  if (tryLess(TryCand.VGPRUsageDiff, Cand.VGPRUsageDiff,
              TryCand, Cand, RegUsage))
    return true;
  return false;
}

SIScheduleBlock *SIScheduleBlockScheduler::pickBlock() {
  SIBlockSchedCandidate Cand;
  std::vector<SIScheduleBlock*>::iterator Best;
  SIScheduleBlock *Block;
  if (ReadyBlocks.empty())
    return nullptr;

  DAG->fillVgprSgprCost(LiveRegs.begin(), LiveRegs.end(),
                        VregCurrentUsage, SregCurrentUsage);
  if (VregCurrentUsage > maxVregUsage)
    maxVregUsage = VregCurrentUsage;
  if (SregCurrentUsage > maxSregUsage)
    maxSregUsage = SregCurrentUsage;
  DEBUG(
    dbgs() << "Picking New Blocks\n";
    dbgs() << "Available: ";
    for (SIScheduleBlock* Block : ReadyBlocks)
      dbgs() << Block->getID() << ' ';
    dbgs() << "\nCurrent Live:\n";
    for (unsigned Reg : LiveRegs)
      dbgs() << PrintVRegOrUnit(Reg, DAG->getTRI()) << ' ';
    dbgs() << '\n';
    dbgs() << "Current VGPRs: " << VregCurrentUsage << '\n';
    dbgs() << "Current SGPRs: " << SregCurrentUsage << '\n';
  );

  Cand.Block = nullptr;
  for (std::vector<SIScheduleBlock*>::iterator I = ReadyBlocks.begin(),
       E = ReadyBlocks.end(); I != E; ++I) {
    SIBlockSchedCandidate TryCand;
    TryCand.Block = *I;
    TryCand.IsHighLatency = TryCand.Block->isHighLatencyBlock();
    TryCand.VGPRUsageDiff =
      checkRegUsageImpact(TryCand.Block->getInRegs(),
                          TryCand.Block->getOutRegs())[DAG->getVGPRSetID()];
    TryCand.NumSuccessors = TryCand.Block->getSuccs().size();
    TryCand.NumHighLatencySuccessors =
      TryCand.Block->getNumHighLatencySuccessors();
    TryCand.LastPosHighLatParentScheduled =
      (unsigned int) std::max<int> (0,
         LastPosHighLatencyParentScheduled[TryCand.Block->getID()] -
           LastPosWaitedHighLatency);
    TryCand.Height = TryCand.Block->Height;
    // Try not to increase VGPR usage too much, else we may spill.
    if (VregCurrentUsage > 120 ||
        Variant != SISchedulerBlockSchedulerVariant::BlockLatencyRegUsage) {
      if (!tryCandidateRegUsage(Cand, TryCand) &&
          Variant != SISchedulerBlockSchedulerVariant::BlockRegUsage)
        tryCandidateLatency(Cand, TryCand);
    } else {
      if (!tryCandidateLatency(Cand, TryCand))
        tryCandidateRegUsage(Cand, TryCand);
    }
    if (TryCand.Reason != NoCand) {
      Cand.setBest(TryCand);
      Best = I;
      DEBUG(dbgs() << "Best Current Choice: " << Cand.Block->getID() << ' '
                   << getReasonStr(Cand.Reason) << '\n');
    }
  }

  DEBUG(
    dbgs() << "Picking: " << Cand.Block->getID() << '\n';
    dbgs() << "Is a block with high latency instruction: "
           << (Cand.IsHighLatency ? "yes\n" : "no\n");
    dbgs() << "Position of last high latency dependency: "
           << Cand.LastPosHighLatParentScheduled << '\n';
    dbgs() << "VGPRUsageDiff: " << Cand.VGPRUsageDiff << '\n';
    dbgs() << '\n';
  );

  Block = Cand.Block;
  ReadyBlocks.erase(Best);
  return Block;
}

// Tracking of currently alive registers to determine VGPR Usage.

void SIScheduleBlockScheduler::addLiveRegs(std::set<unsigned> &Regs) {
  for (unsigned Reg : Regs) {
    // For now only track virtual registers.
    if (!TargetRegisterInfo::isVirtualRegister(Reg))
      continue;
    // If not already in the live set, then add it.
    (void) LiveRegs.insert(Reg);
  }
}

void SIScheduleBlockScheduler::decreaseLiveRegs(SIScheduleBlock *Block,
                                                std::set<unsigned> &Regs) {
  for (unsigned Reg : Regs) {
    // For now only track virtual registers.
    std::set<unsigned>::iterator Pos = LiveRegs.find(Reg);
    assert(Pos != LiveRegs.end() && // Reg must be live.
           LiveRegsConsumers.find(Reg) != LiveRegsConsumers.end() &&
           LiveRegsConsumers[Reg] >= 1);
    --LiveRegsConsumers[Reg];
    if (LiveRegsConsumers[Reg] == 0)
      LiveRegs.erase(Pos);
  }
}

void SIScheduleBlockScheduler::releaseBlockSuccs(SIScheduleBlock *Parent) {
  for (SIScheduleBlock* Block : Parent->getSuccs()) {
    --BlockNumPredsLeft[Block->getID()];
    if (BlockNumPredsLeft[Block->getID()] == 0) {
      ReadyBlocks.push_back(Block);
    }
    // TODO: Improve check. When the dependency between the high latency
    // instructions and the instructions of the other blocks are WAR or WAW
    // there will be no wait triggered. We would like these cases to not
    // update LastPosHighLatencyParentScheduled.
    if (Parent->isHighLatencyBlock())
      LastPosHighLatencyParentScheduled[Block->getID()] = NumBlockScheduled;
  }
}

void SIScheduleBlockScheduler::blockScheduled(SIScheduleBlock *Block) {
  decreaseLiveRegs(Block, Block->getInRegs());
  addLiveRegs(Block->getOutRegs());
  releaseBlockSuccs(Block);
  for (std::map<unsigned, unsigned>::iterator RegI =
       LiveOutRegsNumUsages[Block->getID()].begin(),
       E = LiveOutRegsNumUsages[Block->getID()].end(); RegI != E; ++RegI) {
    std::pair<unsigned, unsigned> RegP = *RegI;
    if (LiveRegsConsumers.find(RegP.first) == LiveRegsConsumers.end())
      LiveRegsConsumers[RegP.first] = RegP.second;
    else {
      assert(LiveRegsConsumers[RegP.first] == 0);
      LiveRegsConsumers[RegP.first] += RegP.second;
    }
  }
  if (LastPosHighLatencyParentScheduled[Block->getID()] >
      (unsigned)LastPosWaitedHighLatency)
    LastPosWaitedHighLatency =
      LastPosHighLatencyParentScheduled[Block->getID()];
  ++NumBlockScheduled;
}

std::vector<int>
SIScheduleBlockScheduler::checkRegUsageImpact(std::set<unsigned> &InRegs,
                                              std::set<unsigned> &OutRegs) {
  std::vector<int> DiffSetPressure;
  DiffSetPressure.assign(DAG->getTRI()->getNumRegPressureSets(), 0);

  for (unsigned Reg : InRegs) {
    // For now only track virtual registers.
    if (!TargetRegisterInfo::isVirtualRegister(Reg))
      continue;
    if (LiveRegsConsumers[Reg] > 1)
      continue;
    PSetIterator PSetI = DAG->getMRI()->getPressureSets(Reg);
    for (; PSetI.isValid(); ++PSetI) {
      DiffSetPressure[*PSetI] -= PSetI.getWeight();
    }
  }

  for (unsigned Reg : OutRegs) {
    // For now only track virtual registers.
    if (!TargetRegisterInfo::isVirtualRegister(Reg))
      continue;
    PSetIterator PSetI = DAG->getMRI()->getPressureSets(Reg);
    for (; PSetI.isValid(); ++PSetI) {
      DiffSetPressure[*PSetI] += PSetI.getWeight();
    }
  }

  return DiffSetPressure;
}

// SIScheduler //

struct SIScheduleBlockResult
SIScheduler::scheduleVariant(SISchedulerBlockCreatorVariant BlockVariant,
                             SISchedulerBlockSchedulerVariant ScheduleVariant) {
  SIScheduleBlocks Blocks = BlockCreator.getBlocks(BlockVariant);
  SIScheduleBlockScheduler Scheduler(DAG, ScheduleVariant, Blocks);
  std::vector<SIScheduleBlock*> ScheduledBlocks;
  struct SIScheduleBlockResult Res;

  ScheduledBlocks = Scheduler.getBlocks();

  for (unsigned b = 0; b < ScheduledBlocks.size(); ++b) {
    SIScheduleBlock *Block = ScheduledBlocks[b];
    std::vector<SUnit*> SUs = Block->getScheduledUnits();

    for (SUnit* SU : SUs)
      Res.SUs.push_back(SU->NodeNum);
  }

  Res.MaxSGPRUsage = Scheduler.getSGPRUsage();
  Res.MaxVGPRUsage = Scheduler.getVGPRUsage();
  return Res;
}

// SIScheduleDAGMI //

SIScheduleDAGMI::SIScheduleDAGMI(MachineSchedContext *C) :
  ScheduleDAGMILive(C, llvm::make_unique<GenericScheduler>(C)) {
  SITII = static_cast<const SIInstrInfo*>(TII);
  SITRI = static_cast<const SIRegisterInfo*>(TRI);

  VGPRSetID = SITRI->getVGPRPressureSet();
  SGPRSetID = SITRI->getSGPRPressureSet();
}

SIScheduleDAGMI::~SIScheduleDAGMI() = default;

// Code adapted from scheduleDAG.cpp
// Does a topological sort over the SUs.
// Both TopDown and BottomUp
void SIScheduleDAGMI::topologicalSort() {
  Topo.InitDAGTopologicalSorting();

  TopDownIndex2SU = std::vector<int>(Topo.begin(), Topo.end());
  BottomUpIndex2SU = std::vector<int>(Topo.rbegin(), Topo.rend());
}

// Move low latencies further from their user without
// increasing SGPR usage (in general).
// This is to be replaced by a better pass that would
// take into account SGPR usage (based on VGPR Usage
// and the corresponding wavefront count), that would
// try to merge groups of loads if it makes sense, etc.
1680 void SIScheduleDAGMI::moveLowLatencies() {
1681  unsigned DAGSize = SUnits.size();
1682  int LastLowLatencyUser = -1;
1683  int LastLowLatencyPos = -1;
1684 
1685  for (unsigned i = 0, e = ScheduledSUnits.size(); i != e; ++i) {
1686  SUnit *SU = &SUnits[ScheduledSUnits[i]];
1687  bool IsLowLatencyUser = false;
1688  unsigned MinPos = 0;
1689 
1690  for (SDep& PredDep : SU->Preds) {
1691  SUnit *Pred = PredDep.getSUnit();
1692  if (SITII->isLowLatencyInstruction(*Pred->getInstr())) {
1693  IsLowLatencyUser = true;
1694  }
1695  if (Pred->NodeNum >= DAGSize)
1696  continue;
1697  unsigned PredPos = ScheduledSUnitsInv[Pred->NodeNum];
1698  if (PredPos >= MinPos)
1699  MinPos = PredPos + 1;
1700  }
1701 
1702  if (SITII->isLowLatencyInstruction(*SU->getInstr())) {
1703  unsigned BestPos = LastLowLatencyUser + 1;
1704  if ((int)BestPos <= LastLowLatencyPos)
1705  BestPos = LastLowLatencyPos + 1;
1706  if (BestPos < MinPos)
1707  BestPos = MinPos;
1708  if (BestPos < i) {
1709  for (unsigned u = i; u > BestPos; --u) {
1710  ++ScheduledSUnitsInv[ScheduledSUnits[u-1]];
1711  ScheduledSUnits[u] = ScheduledSUnits[u-1];
1712  }
1713  ScheduledSUnits[BestPos] = SU->NodeNum;
1714  ScheduledSUnitsInv[SU->NodeNum] = BestPos;
1715  }
1716  LastLowLatencyPos = BestPos;
1717  if (IsLowLatencyUser)
1718  LastLowLatencyUser = BestPos;
1719  } else if (IsLowLatencyUser) {
1720  LastLowLatencyUser = i;
1721  // Also move the COPY instructions on which
1722  // the low-latency instructions depend.
1723  } else if (SU->getInstr()->getOpcode() == AMDGPU::COPY) {
1724  bool CopyForLowLat = false;
1725  for (SDep& SuccDep : SU->Succs) {
1726  SUnit *Succ = SuccDep.getSUnit();
1727  if (SITII->isLowLatencyInstruction(*Succ->getInstr())) {
1728  CopyForLowLat = true;
1729  }
1730  }
1731  if (!CopyForLowLat)
1732  continue;
1733  if (MinPos < i) {
1734  for (unsigned u = i; u > MinPos; --u) {
1735  ++ScheduledSUnitsInv[ScheduledSUnits[u-1]];
1736  ScheduledSUnits[u] = ScheduledSUnits[u-1];
1737  }
1738  ScheduledSUnits[MinPos] = SU->NodeNum;
1739  ScheduledSUnitsInv[SU->NodeNum] = MinPos;
1740  }
1741  }
1742  }
1743 }
1744 
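The shifting loop in moveLowLatencies above is equivalent to a rotation. A standalone sketch (hypothetical names) of hoisting one scheduled unit to an earlier position while keeping the inverse permutation consistent:

  #include <algorithm>
  #include <iostream>
  #include <vector>

  void hoist(std::vector<unsigned> &Order, std::vector<unsigned> &Inv,
             unsigned I, unsigned BestPos) {
    if (BestPos >= I)
      return; // Already early enough: nothing to move.
    // Move Order[I] to BestPos, shifting [BestPos, I) right by one.
    std::rotate(Order.begin() + BestPos, Order.begin() + I,
                Order.begin() + I + 1);
    for (unsigned Pos = BestPos; Pos <= I; ++Pos)
      Inv[Order[Pos]] = Pos; // Repair the inverse map on the rotated range.
  }

  int main() {
    std::vector<unsigned> Order = {0, 1, 2, 3, 4};
    std::vector<unsigned> Inv = {0, 1, 2, 3, 4};
    hoist(Order, Inv, /*I=*/3, /*BestPos=*/1);
    for (unsigned U : Order)
      std::cout << U << ' '; // Prints: 0 3 1 2 4
    std::cout << '\n';
  }

Repairing only the rotated range keeps the inverse map in sync without an O(n) rebuild after every move.
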
1745 void SIScheduleDAGMI::restoreSULinksLeft() {
1746  for (unsigned i = 0, e = SUnits.size(); i != e; ++i) {
1747  SUnits[i].isScheduled = false;
1748  SUnits[i].WeakPredsLeft = SUnitsLinksBackup[i].WeakPredsLeft;
1749  SUnits[i].NumPredsLeft = SUnitsLinksBackup[i].NumPredsLeft;
1750  SUnits[i].WeakSuccsLeft = SUnitsLinksBackup[i].WeakSuccsLeft;
1751  SUnits[i].NumSuccsLeft = SUnitsLinksBackup[i].NumSuccsLeft;
1752  }
1753 }
1754 
1755 // Return the Vgpr and Sgpr usage corresponding to some virtual registers.
1756 template<typename _Iterator> void
1757 SIScheduleDAGMI::fillVgprSgprCost(_Iterator First, _Iterator End,
1758  unsigned &VgprUsage, unsigned &SgprUsage) {
1759  VgprUsage = 0;
1760  SgprUsage = 0;
1761  for (_Iterator RegI = First; RegI != End; ++RegI) {
1762  unsigned Reg = *RegI;
1763  // For now only track virtual registers
1764  if (!TargetRegisterInfo::isVirtualRegister(Reg))
1765  continue;
1766  PSetIterator PSetI = MRI.getPressureSets(Reg);
1767  for (; PSetI.isValid(); ++PSetI) {
1768  if (*PSetI == VGPRSetID)
1769  VgprUsage += PSetI.getWeight();
1770  else if (*PSetI == SGPRSetID)
1771  SgprUsage += PSetI.getWeight();
1772  }
1773  }
1774 }
1775 
1776 void SIScheduleDAGMI::schedule()
1777 {
1778  SmallVector<SUnit*, 8> TopRoots, BotRoots;
1779  SIScheduleBlockResult Best, Temp;
1780  DEBUG(dbgs() << "Preparing Scheduling\n");
1781 
1782  buildDAGWithRegPressure();
1783  DEBUG(
1784  for(SUnit& SU : SUnits)
1785  SU.dumpAll(this)
1786  );
1787 
1788  topologicalSort();
1789  findRootsAndBiasEdges(TopRoots, BotRoots);
1790  // We reuse several ScheduleDAGMI and ScheduleDAGMILive
1791  // functions, but to make them happy we must initialize
1792  // the default Scheduler implementation (even if we do not
1793  // run it)
1794  SchedImpl->initialize(this);
1795  initQueues(TopRoots, BotRoots);
1796 
1797  // Fill some stats to help scheduling.
1798 
1799  SUnitsLinksBackup = SUnits;
1800  IsLowLatencySU.clear();
1801  LowLatencyOffset.clear();
1802  IsHighLatencySU.clear();
1803 
1804  IsLowLatencySU.resize(SUnits.size(), 0);
1805  LowLatencyOffset.resize(SUnits.size(), 0);
1806  IsHighLatencySU.resize(SUnits.size(), 0);
1807 
1808  for (unsigned i = 0, e = (unsigned)SUnits.size(); i != e; ++i) {
1809  SUnit *SU = &SUnits[i];
1810  unsigned BaseLatReg;
1811  int64_t OffLatReg;
1812  if (SITII->isLowLatencyInstruction(*SU->getInstr())) {
1813  IsLowLatencySU[i] = 1;
1814  if (SITII->getMemOpBaseRegImmOfs(*SU->getInstr(), BaseLatReg, OffLatReg,
1815  TRI))
1816  LowLatencyOffset[i] = OffLatReg;
1817  } else if (SITII->isHighLatencyInstruction(*SU->getInstr()))
1818  IsHighLatencySU[i] = 1;
1819  }
1820 
1821  SIScheduler Scheduler(this);
1822  Best = Scheduler.scheduleVariant(SISchedulerBlockCreatorVariant::LatenciesAlone,
1823  SISchedulerBlockSchedulerVariant::BlockLatencyRegUsage);
1824 
1825  // If VGPR usage is extremely high, try other well-performing variants
1826  // which could lead to lower VGPR usage.
1827  if (Best.MaxVGPRUsage > 180) {
1828  std::vector<std::pair<SISchedulerBlockCreatorVariant, SISchedulerBlockSchedulerVariant>> Variants = {
1829  { LatenciesAlone, BlockRegUsageLatency },
1830 // { LatenciesAlone, BlockRegUsage },
1831  { LatenciesGrouped, BlockLatencyRegUsage },
1832 // { LatenciesGrouped, BlockRegUsageLatency },
1833 // { LatenciesGrouped, BlockRegUsage },
1834  { LatenciesAlonePlusConsecutive, BlockLatencyRegUsage },
1835 // { LatenciesAlonePlusConsecutive, BlockRegUsageLatency },
1836 // { LatenciesAlonePlusConsecutive, BlockRegUsage }
1837  };
1838  for (std::pair<SISchedulerBlockCreatorVariant, SISchedulerBlockSchedulerVariant> v : Variants) {
1839  Temp = Scheduler.scheduleVariant(v.first, v.second);
1840  if (Temp.MaxVGPRUsage < Best.MaxVGPRUsage)
1841  Best = Temp;
1842  }
1843  }
1844  // If VGPR usage is still extremely high, we may spill. Try other variants
1845  // which perform worse, but could lead to lower VGPR usage.
1846  if (Best.MaxVGPRUsage > 200) {
1847  std::vector<std::pair<SISchedulerBlockCreatorVariant, SISchedulerBlockSchedulerVariant>> Variants = {
1848 // { LatenciesAlone, BlockRegUsageLatency },
1849  { LatenciesAlone, BlockRegUsage },
1850 // { LatenciesGrouped, BlockLatencyRegUsage },
1851  { LatenciesGrouped, BlockRegUsageLatency },
1852  { LatenciesGrouped, BlockRegUsage },
1853 // { LatenciesAlonePlusConsecutive, BlockLatencyRegUsage },
1854  { LatenciesAlonePlusConsecutive, BlockRegUsageLatency },
1855  { LatenciesAlonePlusConsecutive, BlockRegUsage }
1856  };
1857  for (std::pair<SISchedulerBlockCreatorVariant, SISchedulerBlockSchedulerVariant> v : Variants) {
1858  Temp = Scheduler.scheduleVariant(v.first, v.second);
1859  if (Temp.MaxVGPRUsage < Best.MaxVGPRUsage)
1860  Best = Temp;
1861  }
1862  }
1863 
1864  ScheduledSUnits = Best.SUs;
1865  ScheduledSUnitsInv.resize(SUnits.size());
1866 
1867  for (unsigned i = 0, e = (unsigned)SUnits.size(); i != e; ++i) {
1868  ScheduledSUnitsInv[ScheduledSUnits[i]] = i;
1869  }
1870 
1871  moveLowLatencies();
1872 
1873  // Tell the outside world about the result of the scheduling.
1874 
1875  assert(TopRPTracker.getPos() == RegionBegin && "bad initial Top tracker");
1876  TopRPTracker.setPos(CurrentTop);
1877 
1878  for (std::vector<unsigned>::iterator I = ScheduledSUnits.begin(),
1879  E = ScheduledSUnits.end(); I != E; ++I) {
1880  SUnit *SU = &SUnits[*I];
1881 
1882  scheduleMI(SU, true);
1883 
1884  DEBUG(dbgs() << "Scheduling SU(" << SU->NodeNum << ") "
1885  << *SU->getInstr());
1886  }
1887 
1888  assert(CurrentTop == CurrentBottom && "Nonempty unscheduled zone.");
1889 
1890  placeDebugValues();
1891 
1892  DEBUG({
1893  unsigned BBNum = begin()->getParent()->getNumber();
1894  dbgs() << "*** Final schedule for BB#" << BBNum << " ***\n";
1895  dumpSchedule();
1896  dbgs() << '\n';
1897  });
1898 }
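
The retry logic in schedule() above forms a simple threshold cascade. A minimal sketch under hypothetical types (the 180 and 200 VGPR thresholds come from the code above): extra variants are only computed when the current best schedule's estimated VGPR usage crosses a threshold, which keeps compile time low in the common case.

  #include <functional>
  #include <vector>

  struct Result { unsigned MaxVGPRUsage; };
  using Variant = std::function<Result()>;

  Result pickBest(Result Best, const std::vector<Variant> &Fast,
                  const std::vector<Variant> &Slow) {
    auto keepLower = [&Best](Result R) {
      if (R.MaxVGPRUsage < Best.MaxVGPRUsage)
        Best = R; // Keep whichever schedule needs fewer VGPRs.
    };
    if (Best.MaxVGPRUsage > 180) // High pressure: well-performing variants.
      for (const Variant &V : Fast)
        keepLower(V());
    if (Best.MaxVGPRUsage > 200) // Near spilling: slower variants too.
      for (const Variant &V : Slow)
        keepLower(V());
    return Best;
  }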