//===-- SIMachineScheduler.cpp - SI Scheduler Interface -------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
/// \file
/// SI Machine Scheduler interface
//
//===----------------------------------------------------------------------===//

#include "SIMachineScheduler.h"
#include "SIInstrInfo.h"

using namespace llvm;

#define DEBUG_TYPE "machine-scheduler"

// This scheduler implements a different scheduling algorithm than
// GenericScheduler.
//
// There are several specific architecture behaviours that can't be modelled
// for GenericScheduler:
// . When accessing the result of an SGPR load instruction, you have to wait
//   for all the SGPR load instructions issued before your current instruction
//   to have finished.
// . When accessing the result of a VGPR load instruction, you have to wait
//   for all the VGPR load instructions previous to the VGPR load instruction
//   you are interested in to finish.
// . The lower the register pressure, the better load latencies are hidden.
//
// Moreover some specificities (like the fact a lot of instructions in the
// shader have few dependencies) make the generic scheduler behave
// unpredictably. For example when register pressure becomes high, it can
// either manage to prevent register pressure from going too high, or it can
// increase register pressure even more than if it hadn't taken register
// pressure into account.
//
// Also some other bad behaviours are generated, like loading a constant into
// a VGPR at the beginning of the shader even though it is not needed until
// the end of the shader.
//
// The scheduling problem for SI can be divided into three main parts:
// . Hiding high latencies (texture sampling, etc)
// . Hiding low latencies (SGPR constant loading, etc)
// . Keeping register usage low for better latency hiding and general
//   performance
//
// Some other things can also affect performance, but are hard to predict
// (cache usage, the fact the HW can issue several instructions from
// different wavefronts if they are of different types, etc)
//
// This scheduler tries to solve the scheduling problem by dividing it into
// simpler sub-problems. It divides the instructions into blocks, schedules
// locally inside the blocks where it takes care of low latencies, and then
// chooses the order of the blocks by taking care of high latencies.
// Dividing the instructions into blocks also helps keep register usage low.
//
// First the instructions are put into blocks.
// We want the blocks to help control register usage and hide high latencies
// later. To help control register usage, we typically want all local
// computations, when for example you create a result that can be consumed
// right away, to be contained in a block. Block inputs and outputs would
// typically be important results that are needed in several locations of
// the shader. Since we do want blocks to help hide high latencies, we want
// the instructions inside the block to have a minimal set of dependencies
// on high latencies. That makes it easy to pick blocks to hide specific
// high latencies.
// The block creation algorithm is divided into several steps, and several
// variants can be tried during the scheduling process.
//
// Second the order of the instructions inside the blocks is chosen.
// At this step we take into account only register usage and hiding low
// latency instructions.
//
// Third the block order is chosen. There we try to hide high latencies
// and keep register usage low.
//
// After the third step, a pass is done to improve the hiding of low
// latencies.
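//
// Schematically, mapping those steps onto the classes below:
// . step 1: SIScheduleBlockCreator colors the SUs and groups them into
//   SIScheduleBlocks
// . step 2: SIScheduleBlock::schedule orders the SUs inside each block
// . step 3: SIScheduleBlockScheduler picks the order of the blocks
// . final pass: SIScheduleDAGMI::moveLowLatencies moves low latency
//   instructions away from their users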
//
// Actually when talking about 'low latency' or 'high latency' it includes
// both the latency for the cache (or global mem) data to reach the register,
// and the bandwidth limitations.
// Increasing the number of active wavefronts helps hide the former, but it
// doesn't solve the latter, which is why, even if the wavefront count is
// high, we have to try to have as many instructions hiding high latencies
// as possible.
// The OpenCL doc says for example a latency of 400 cycles for a global mem
// access, which is hidden by 10 instructions if the wavefront count is 10.
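// Spelling out that arithmetic with the figures just quoted (a rough
// estimate, not a measured number): one instruction occupies the SIMD for
// 4 cycles, so 10 resident wavefronts retire one instruction per cycle in
// aggregate, and covering 400 cycles of latency therefore needs
// 400 / (4 * 10) = 10 independent instructions per wavefront between the
// load and its first use.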

// Some figures taken from AMD docs:
// Both texture and constant L1 caches are 4-way associative with 64 bytes
// lines.
// Constant cache is shared with 4 CUs.
// For texture sampling, the address generation unit receives 4 texture
// addresses per cycle, thus we could expect texture sampling latency to be
// equivalent to 4 instructions in the very best case (a VGPR is 64 work
// items, instructions in a wavefront group are executed every 4 cycles),
// or 16 instructions if the other wavefronts associated with the 3 other
// VALUs of the CU do texture sampling too. (Don't take these figures too
// seriously, as I'm not 100% sure of the computation)
// Data exports should get similar latency.
// For constant loading, the cache is shared with 4 CUs.
// The doc says "a throughput of 16B/cycle for each of the 4 Compute Unit"
// I guess if the other CUs don't read the cache, it can go up to 64B/cycle.
// It means a simple s_buffer_load should take one instruction to hide, as
// well as an s_buffer_loadx2 and potentially an s_buffer_loadx8 if on the
// same cache line.
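// (Back-of-the-envelope justification, with the same caveat as the figures
// above: at 16B/cycle even the widest of these loads, 32 bytes for an x8 on
// one cache line, needs only 2 cycles of cache throughput, well within the
// 4 cycles during which a single covering instruction occupies the SIMD.)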
//
// As of today the driver doesn't preload the constants in cache, thus the
// first loads get extra latency. The doc says global memory access can be
// 300-600 cycles. We do not specially take that into account when
// scheduling, as we expect the driver to be able to preload the constants
// soon.

// common code //

#ifndef NDEBUG

static const char *getReasonStr(SIScheduleCandReason Reason) {
  switch (Reason) {
  case NoCand:    return "NOCAND";
  case RegUsage:  return "REGUSAGE";
  case Latency:   return "LATENCY";
  case Successor: return "SUCCESSOR";
  case Depth:     return "DEPTH";
  case NodeOrder: return "ORDER";
  }
  llvm_unreachable("Unknown reason!");
}

#endif

namespace llvm::SISched {
static bool tryLess(int TryVal, int CandVal,
                    SISchedulerCandidate &TryCand,
                    SISchedulerCandidate &Cand,
                    SIScheduleCandReason Reason) {
  if (TryVal < CandVal) {
    TryCand.Reason = Reason;
    return true;
  }
  if (TryVal > CandVal) {
    if (Cand.Reason > Reason)
      Cand.Reason = Reason;
    return true;
  }
  Cand.setRepeat(Reason);
  return false;
}

static bool tryGreater(int TryVal, int CandVal,
                       SISchedulerCandidate &TryCand,
                       SISchedulerCandidate &Cand,
                       SIScheduleCandReason Reason) {
  if (TryVal > CandVal) {
    TryCand.Reason = Reason;
    return true;
  }
  if (TryVal < CandVal) {
    if (Cand.Reason > Reason)
      Cand.Reason = Reason;
    return true;
  }
  Cand.setRepeat(Reason);
  return false;
}
} // end namespace llvm::SISched
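
// tryLess/tryGreater implement a prioritized tie-break chain: callers test
// criteria from most to least important and stop at the first one on which
// the two candidates differ; only on a tie do they fall through to the next
// criterion. A minimal sketch of such a chain (mirroring the real uses in
// tryCandidateTopDown and tryCandidateLatency below):
//
//   if (SISched::tryLess(TryCand.VGPRUsage, Cand.VGPRUsage,
//                        TryCand, Cand, RegUsage))
//     return; // The candidates differ on register usage: decision made.
//   // Tie: fall through to lower-priority criteria, ending with the
//   // original instruction order (NodeOrder).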

// SIScheduleBlock //

void SIScheduleBlock::addUnit(SUnit *SU) {
  NodeNum2Index[SU->NodeNum] = SUnits.size();
  SUnits.push_back(SU);
}

#ifndef NDEBUG
void SIScheduleBlock::traceCandidate(const SISchedCandidate &Cand) {

  dbgs() << "  SU(" << Cand.SU->NodeNum << ") " << getReasonStr(Cand.Reason);
  dbgs() << '\n';
}
#endif

void SIScheduleBlock::tryCandidateTopDown(SISchedCandidate &Cand,
                                          SISchedCandidate &TryCand) {
  // Initialize the candidate if needed.
  if (!Cand.isValid()) {
    TryCand.Reason = NodeOrder;
    return;
  }

  if (Cand.SGPRUsage > 60 &&
      SISched::tryLess(TryCand.SGPRUsage, Cand.SGPRUsage,
                       TryCand, Cand, RegUsage))
    return;

  // Schedule low latency instructions as close to the top as possible.
  // Order of priority is:
  // . Low latency instructions which do not depend on other low latency
  //   instructions we haven't waited for
  // . Other instructions which do not depend on low latency instructions
  //   we haven't waited for
  // . Low latencies
  // . All other instructions
  // Goal is to get: low latency instructions - independent instructions
  // - (eventually some more low latency instructions)
  // - instructions that depend on the first low latency instructions.
  // If the block contains a lot of constant loads, the SGPR usage can go
  // quite high; thus the arbitrary limit of 60 above encourages using the
  // already loaded constants (in order to release some SGPRs) before
  // loading more.
  if (SISched::tryLess(TryCand.HasLowLatencyNonWaitedParent,
                       Cand.HasLowLatencyNonWaitedParent,
                       TryCand, Cand, SIScheduleCandReason::Depth))
    return;

  if (SISched::tryGreater(TryCand.IsLowLatency, Cand.IsLowLatency,
                          TryCand, Cand, SIScheduleCandReason::Depth))
    return;

  if (TryCand.IsLowLatency &&
      SISched::tryLess(TryCand.LowLatencyOffset, Cand.LowLatencyOffset,
                       TryCand, Cand, SIScheduleCandReason::Depth))
    return;

  if (SISched::tryLess(TryCand.VGPRUsage, Cand.VGPRUsage,
                       TryCand, Cand, RegUsage))
    return;

  // Fall through to original instruction order.
  if (TryCand.SU->NodeNum < Cand.SU->NodeNum) {
    TryCand.Reason = NodeOrder;
  }
}

SUnit* SIScheduleBlock::pickNode() {
  SISchedCandidate TopCand;

  for (SUnit* SU : TopReadySUs) {
    SISchedCandidate TryCand;
    std::vector<unsigned> pressure;
    std::vector<unsigned> MaxPressure;
    // Predict register usage after this instruction.
    TryCand.SU = SU;
    TopRPTracker.getDownwardPressure(SU->getInstr(), pressure, MaxPressure);
    TryCand.SGPRUsage = pressure[AMDGPU::RegisterPressureSets::SReg_32];
    TryCand.VGPRUsage = pressure[AMDGPU::RegisterPressureSets::VGPR_32];
    TryCand.IsLowLatency = DAG->IsLowLatencySU[SU->NodeNum];
    TryCand.LowLatencyOffset = DAG->LowLatencyOffset[SU->NodeNum];
    TryCand.HasLowLatencyNonWaitedParent =
      HasLowLatencyNonWaitedParent[NodeNum2Index[SU->NodeNum]];
    tryCandidateTopDown(TopCand, TryCand);
    if (TryCand.Reason != NoCand)
      TopCand.setBest(TryCand);
  }

  return TopCand.SU;
}

// Schedule something valid.
void SIScheduleBlock::fastSchedule() {
  TopReadySUs.clear();
  if (Scheduled)
    undoSchedule();

  for (SUnit* SU : SUnits) {
    if (!SU->NumPredsLeft)
      TopReadySUs.push_back(SU);
  }

  while (!TopReadySUs.empty()) {
    SUnit *SU = TopReadySUs[0];
    ScheduledSUnits.push_back(SU);
    nodeScheduled(SU);
  }

  Scheduled = true;
}

// Returns whether the register was set between First and Last.
static bool isDefBetween(unsigned Reg,
                         SlotIndex First, SlotIndex Last,
                         const MachineRegisterInfo *MRI,
                         const LiveIntervals *LIS) {
  for (MachineRegisterInfo::def_instr_iterator
       UI = MRI->def_instr_begin(Reg),
       UE = MRI->def_instr_end(); UI != UE; ++UI) {
    const MachineInstr* MI = &*UI;
    if (MI->isDebugValue())
      continue;
    SlotIndex InstSlot = LIS->getInstructionIndex(*MI).getRegSlot();
    if (InstSlot >= First && InstSlot <= Last)
      return true;
  }
  return false;
}

void SIScheduleBlock::initRegPressure(MachineBasicBlock::iterator BeginBlock,
                                      MachineBasicBlock::iterator EndBlock) {
  IntervalPressure Pressure, BotPressure;
  RegPressureTracker RPTracker(Pressure), BotRPTracker(BotPressure);
  LiveIntervals *LIS = DAG->getLIS();
  MachineRegisterInfo *MRI = DAG->getMRI();
  DAG->initRPTracker(TopRPTracker);
  DAG->initRPTracker(BotRPTracker);
  DAG->initRPTracker(RPTracker);

  // Goes through all SUs. RPTracker captures what had to be alive for the
  // SUs to execute, and what is still alive at the end.
  for (SUnit* SU : ScheduledSUnits) {
    RPTracker.setPos(SU->getInstr());
    RPTracker.advance();
  }

  // Close the RPTracker to finalize live ins/outs.
  RPTracker.closeRegion();

  // Initialize the live ins and live outs.
  TopRPTracker.addLiveRegs(RPTracker.getPressure().LiveInRegs);
  BotRPTracker.addLiveRegs(RPTracker.getPressure().LiveOutRegs);

  // Do not track physical registers, because it messes up.
  for (const auto &RegMaskPair : RPTracker.getPressure().LiveInRegs) {
    if (RegMaskPair.RegUnit.isVirtual())
      LiveInRegs.insert(RegMaskPair.RegUnit);
  }
  LiveOutRegs.clear();
  // There are several possibilities to distinguish:
  // 1) Reg is not input to any instruction in the block, but is output of one
  // 2) 1) + read in the block and not needed after it
  // 3) 1) + read in the block but needed in another block
  // 4) Reg is input of an instruction but another block will read it too
  // 5) Reg is input of an instruction and then rewritten in the block.
  //    result is not read in the block (implies used in another block)
  // 6) Reg is input of an instruction and then rewritten in the block.
  //    result is read in the block and not needed in another block
  // 7) Reg is input of an instruction and then rewritten in the block.
  //    result is read in the block but also needed in another block
  // LiveInRegs will contain all the regs in situations 4, 5, 6, 7
  // We want LiveOutRegs to contain only Regs whose content will be read after
  // in another block, and whose content was written in the current block,
  // that is we want it to get 1, 3, 5, 7
  // Since we made the MIs of a block to be packed all together before
  // scheduling, the LiveIntervals were correct, and the RPTracker was
  // able to correctly handle 5 vs 6, and 2 vs 3.
  // (Note: This is not sufficient for the RPTracker to avoid mistakes
  // for case 4)
  // The RPTracker's LiveOutRegs has 1, 3, (some correct or incorrect) 4, 5, 7
  // Comparing to LiveInRegs is not sufficient to differentiate 4 vs 5, 7
  // The use of isDefBetween removes case 4.
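  // A tiny hypothetical illustration of the two buckets (vregs %x and %y are
  // made up for the example; the case numbers refer to the list above):
  //   inside the block:   %x = S_LOAD ...   then   %y = V_ADD %x, ...
  // If %y is never read inside the block but is read by a later block, it is
  // case 1, so %y must end up in LiveOutRegs. If %x is read only by the
  // V_ADD inside the block, it is case 2, so %x is neither a live-in nor a
  // live-out of the block.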
  for (const auto &RegMaskPair : RPTracker.getPressure().LiveOutRegs) {
    Register Reg = RegMaskPair.RegUnit;
    if (Reg.isVirtual() &&
        isDefBetween(Reg, LIS->getInstructionIndex(*BeginBlock).getRegSlot(),
                     LIS->getInstructionIndex(*EndBlock).getRegSlot(), MRI,
                     LIS)) {
      LiveOutRegs.insert(Reg);
    }
  }

  // Pressure = sum_alive_registers register size
  // Internally llvm will represent some registers as big 128-bit registers
  // for example, but they actually correspond to 4 actual 32-bit registers.
  // Thus Pressure is not equal to num_alive_registers * constant.
  LiveInPressure = TopPressure.MaxSetPressure;
  LiveOutPressure = BotPressure.MaxSetPressure;

  // Prepares TopRPTracker for top down scheduling.
  TopRPTracker.closeTop();
}

void SIScheduleBlock::schedule(MachineBasicBlock::iterator BeginBlock,
                               MachineBasicBlock::iterator EndBlock) {
  if (!Scheduled)
    fastSchedule();

  // PreScheduling phase to set LiveIn and LiveOut.
  initRegPressure(BeginBlock, EndBlock);
  undoSchedule();

  // Schedule for real now.

  TopReadySUs.clear();

  for (SUnit* SU : SUnits) {
    if (!SU->NumPredsLeft)
      TopReadySUs.push_back(SU);
  }

  while (!TopReadySUs.empty()) {
    SUnit *SU = pickNode();
    ScheduledSUnits.push_back(SU);
    TopRPTracker.setPos(SU->getInstr());
    TopRPTracker.advance();
    nodeScheduled(SU);
  }

  // TODO: compute InternalAdditionalPressure.
  InternalAdditionalPressure.resize(TopPressure.MaxSetPressure.size());

  // Check everything is right.
#ifndef NDEBUG
  assert(SUnits.size() == ScheduledSUnits.size() &&
         TopReadySUs.empty());
  for (SUnit* SU : SUnits) {
    assert(SU->isScheduled &&
           SU->NumPredsLeft == 0);
  }
#endif

  Scheduled = true;
}

void SIScheduleBlock::undoSchedule() {
  for (SUnit* SU : SUnits) {
    SU->isScheduled = false;
    for (SDep& Succ : SU->Succs) {
      if (BC->isSUInBlock(Succ.getSUnit(), ID))
        undoReleaseSucc(SU, &Succ);
    }
  }
  HasLowLatencyNonWaitedParent.assign(SUnits.size(), 0);
  ScheduledSUnits.clear();
  Scheduled = false;
}

void SIScheduleBlock::undoReleaseSucc(SUnit *SU, SDep *SuccEdge) {
  SUnit *SuccSU = SuccEdge->getSUnit();

  if (SuccEdge->isWeak()) {
    ++SuccSU->WeakPredsLeft;
    return;
  }
  ++SuccSU->NumPredsLeft;
}

void SIScheduleBlock::releaseSucc(SUnit *SU, SDep *SuccEdge) {
  SUnit *SuccSU = SuccEdge->getSUnit();

  if (SuccEdge->isWeak()) {
    --SuccSU->WeakPredsLeft;
    return;
  }
#ifndef NDEBUG
  if (SuccSU->NumPredsLeft == 0) {
    dbgs() << "*** Scheduling failed! ***\n";
    DAG->dumpNode(*SuccSU);
    dbgs() << " has been released too many times!\n";
    llvm_unreachable(nullptr);
  }
#endif

  --SuccSU->NumPredsLeft;
}

/// Release the successors of SU that are inside the block (InOrOutBlock ==
/// true) or outside it (InOrOutBlock == false).
void SIScheduleBlock::releaseSuccessors(SUnit *SU, bool InOrOutBlock) {
  for (SDep& Succ : SU->Succs) {
    SUnit *SuccSU = Succ.getSUnit();

    if (SuccSU->NodeNum >= DAG->SUnits.size())
      continue;

    if (BC->isSUInBlock(SuccSU, ID) != InOrOutBlock)
      continue;

    releaseSucc(SU, &Succ);
    if (SuccSU->NumPredsLeft == 0 && InOrOutBlock)
      TopReadySUs.push_back(SuccSU);
  }
}

void SIScheduleBlock::nodeScheduled(SUnit *SU) {
  // Is in TopReadySUs
  assert(!SU->NumPredsLeft);
  std::vector<SUnit *>::iterator I = llvm::find(TopReadySUs, SU);
  if (I == TopReadySUs.end()) {
    dbgs() << "Data Structure Bug in SI Scheduler\n";
    llvm_unreachable(nullptr);
  }
  TopReadySUs.erase(I);

  releaseSuccessors(SU, true);
  // Scheduling this node will trigger a wait,
  // thus propagate to other instructions that they do not need to wait either.
  if (HasLowLatencyNonWaitedParent[NodeNum2Index[SU->NodeNum]])
    HasLowLatencyNonWaitedParent.assign(SUnits.size(), 0);

  if (DAG->IsLowLatencySU[SU->NodeNum]) {
    for (SDep& Succ : SU->Succs) {
      std::map<unsigned, unsigned>::iterator I =
        NodeNum2Index.find(Succ.getSUnit()->NodeNum);
      if (I != NodeNum2Index.end())
        HasLowLatencyNonWaitedParent[I->second] = 1;
    }
  }
  SU->isScheduled = true;
}

void SIScheduleBlock::finalizeUnits() {
  // We remove links from outside blocks to enable scheduling inside the block.
  for (SUnit* SU : SUnits) {
    releaseSuccessors(SU, false);
    if (DAG->IsHighLatencySU[SU->NodeNum])
      HighLatencyBlock = true;
  }
  HasLowLatencyNonWaitedParent.resize(SUnits.size(), 0);
}

// We maintain ascending order of IDs.
void SIScheduleBlock::addPred(SIScheduleBlock *Pred) {
  unsigned PredID = Pred->getID();

  // Check if not already predecessor.
  for (SIScheduleBlock* P : Preds) {
    if (PredID == P->getID())
      return;
  }
  Preds.push_back(Pred);

  assert(none_of(Succs,
                 [=](std::pair<SIScheduleBlock*,
                     SIScheduleBlockLinkKind> S) {
                   return PredID == S.first->getID();
                 }) &&
         "Loop in the Block Graph!");
}

void SIScheduleBlock::addSucc(SIScheduleBlock *Succ,
                              SIScheduleBlockLinkKind Kind) {
  unsigned SuccID = Succ->getID();

  // Check if not already successor.
  for (std::pair<SIScheduleBlock*, SIScheduleBlockLinkKind> &S : Succs) {
    if (SuccID == S.first->getID()) {
      if (S.second == SIScheduleBlockLinkKind::NoData &&
          Kind == SIScheduleBlockLinkKind::Data)
        S.second = Kind;
      return;
    }
  }
  if (Succ->isHighLatencyBlock())
    ++NumHighLatencySuccessors;
  Succs.push_back(std::pair(Succ, Kind));

  assert(none_of(Preds,
                 [=](SIScheduleBlock *P) { return SuccID == P->getID(); }) &&
         "Loop in the Block Graph!");
}

#ifndef NDEBUG
void SIScheduleBlock::printDebug(bool full) {
  dbgs() << "Block (" << ID << ")\n";
  if (!full)
    return;

  dbgs() << "\nContains High Latency Instruction: "
         << HighLatencyBlock << '\n';
  dbgs() << "\nDepends On:\n";
  for (SIScheduleBlock* P : Preds) {
    P->printDebug(false);
  }

  dbgs() << "\nSuccessors:\n";
  for (std::pair<SIScheduleBlock*, SIScheduleBlockLinkKind> S : Succs) {
    if (S.second == SIScheduleBlockLinkKind::Data)
      dbgs() << "(Data Dep) ";
    S.first->printDebug(false);
  }

  if (Scheduled) {
    dbgs() << "LiveInPressure "
           << LiveInPressure[AMDGPU::RegisterPressureSets::SReg_32] << ' '
           << LiveInPressure[AMDGPU::RegisterPressureSets::VGPR_32] << '\n';
    dbgs() << "LiveOutPressure "
           << LiveOutPressure[AMDGPU::RegisterPressureSets::SReg_32] << ' '
           << LiveOutPressure[AMDGPU::RegisterPressureSets::VGPR_32] << "\n\n";
    dbgs() << "LiveIns:\n";
    for (unsigned Reg : LiveInRegs)
      dbgs() << printVRegOrUnit(Reg, DAG->getTRI()) << ' ';

    dbgs() << "\nLiveOuts:\n";
    for (unsigned Reg : LiveOutRegs)
      dbgs() << printVRegOrUnit(Reg, DAG->getTRI()) << ' ';
  }

  dbgs() << "\nInstructions:\n";
  for (const SUnit* SU : SUnits)
    DAG->dumpNode(*SU);

  dbgs() << "///////////////////////\n";
}
#endif

// SIScheduleBlockCreator //

SIScheduleBlockCreator::SIScheduleBlockCreator(SIScheduleDAGMI *DAG)
    : DAG(DAG) {}

SIScheduleBlocks
SIScheduleBlockCreator::getBlocks(SISchedulerBlockCreatorVariant BlockVariant) {
  std::map<SISchedulerBlockCreatorVariant, SIScheduleBlocks>::iterator B =
      Blocks.find(BlockVariant);
  if (B == Blocks.end()) {
    SIScheduleBlocks Res;
    createBlocksForVariant(BlockVariant);
    topologicalSort();
    scheduleInsideBlocks();
    fillStats();
    Res.Blocks = CurrentBlocks;
    Res.TopDownIndex2Block = TopDownIndex2Block;
    Res.TopDownBlock2Index = TopDownBlock2Index;
    Blocks[BlockVariant] = Res;
    return Res;
  }
  return B->second;
}

bool SIScheduleBlockCreator::isSUInBlock(SUnit *SU, unsigned ID) {
  if (SU->NodeNum >= DAG->SUnits.size())
    return false;
  return CurrentBlocks[Node2CurrentBlock[SU->NodeNum]]->getID() == ID;
}

void SIScheduleBlockCreator::colorHighLatenciesAlone() {
  unsigned DAGSize = DAG->SUnits.size();

  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SUnit *SU = &DAG->SUnits[i];
    if (DAG->IsHighLatencySU[SU->NodeNum]) {
      CurrentColoring[SU->NodeNum] = NextReservedID++;
    }
  }
}

static bool
hasDataDependencyPred(const SUnit &SU, const SUnit &FromSU) {
  for (const auto &PredDep : SU.Preds) {
    if (PredDep.getSUnit() == &FromSU &&
        PredDep.getKind() == llvm::SDep::Data)
      return true;
  }
  return false;
}

void SIScheduleBlockCreator::colorHighLatenciesGroups() {
  unsigned DAGSize = DAG->SUnits.size();
  unsigned NumHighLatencies = 0;
  unsigned GroupSize;
  int Color = NextReservedID;
  unsigned Count = 0;
  std::set<unsigned> FormingGroup;

  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SUnit *SU = &DAG->SUnits[i];
    if (DAG->IsHighLatencySU[SU->NodeNum])
      ++NumHighLatencies;
  }

  if (NumHighLatencies == 0)
    return;

  if (NumHighLatencies <= 6)
    GroupSize = 2;
  else if (NumHighLatencies <= 12)
    GroupSize = 3;
  else
    GroupSize = 4;

  for (unsigned SUNum : DAG->TopDownIndex2SU) {
    const SUnit &SU = DAG->SUnits[SUNum];
    if (DAG->IsHighLatencySU[SU.NodeNum]) {
      bool CompatibleGroup = true;
      int ProposedColor = Color;
      std::vector<int> AdditionalElements;

      // We don't want to put in the same block
      // two high latency instructions that depend
      // on each other.
      // One way would be to check canAddEdge
      // in both directions, but that currently is not
      // enough because there the high latency order is
      // enforced (via links).
      // Instead, look at the dependencies between the
      // high latency instructions and deduce if it is
      // a data dependency or not.
      for (unsigned j : FormingGroup) {
        bool HasSubGraph;
        std::vector<int> SubGraph;
        // By construction (topological order), if SU and
        // DAG->SUnits[j] are linked, DAG->SUnits[j] is necessarily
        // in the parent graph of SU.
#ifndef NDEBUG
        SubGraph = DAG->GetTopo()->GetSubGraph(SU, DAG->SUnits[j],
                                               HasSubGraph);
        assert(!HasSubGraph);
#endif
        SubGraph = DAG->GetTopo()->GetSubGraph(DAG->SUnits[j], SU,
                                               HasSubGraph);
        if (!HasSubGraph)
          continue; // No dependencies between each other
        if (SubGraph.size() > 5) {
          // Too many elements would be required to be added to the block.
          CompatibleGroup = false;
          break;
        }
        // Check the type of dependency
        for (unsigned k : SubGraph) {
          // If in the path to join the two instructions,
          // there is another high latency instruction,
          // or instructions colored for another block,
          // abort the merge.
          if (DAG->IsHighLatencySU[k] ||
              (CurrentColoring[k] != ProposedColor &&
               CurrentColoring[k] != 0)) {
            CompatibleGroup = false;
            break;
          }
          // If one of the SUs in the subgraph depends on the result of SU j,
          // there'll be a data dependency.
          if (hasDataDependencyPred(DAG->SUnits[k], DAG->SUnits[j])) {
            CompatibleGroup = false;
            break;
          }
        }
        if (!CompatibleGroup)
          break;
        // Same check for the SU
        if (hasDataDependencyPred(SU, DAG->SUnits[j])) {
          CompatibleGroup = false;
          break;
        }
        // Add all the required instructions to the block.
        // These cannot live in another block (because they
        // depend (order dependency) on one of the
        // instructions in the block, and are required for the
        // high latency instruction we add).
        llvm::append_range(AdditionalElements, SubGraph);
      }
      if (CompatibleGroup) {
        FormingGroup.insert(SU.NodeNum);
        for (unsigned j : AdditionalElements)
          CurrentColoring[j] = ProposedColor;
        CurrentColoring[SU.NodeNum] = ProposedColor;
        ++Count;
      }
      // Found one incompatible instruction,
      // or has filled a big enough group.
      // -> start a new one.
      if (!CompatibleGroup) {
        FormingGroup.clear();
        Color = ++NextReservedID;
        ProposedColor = Color;
        FormingGroup.insert(SU.NodeNum);
        CurrentColoring[SU.NodeNum] = ProposedColor;
        Count = 0;
      } else if (Count == GroupSize) {
        FormingGroup.clear();
        Color = ++NextReservedID;
        ProposedColor = Color;
        Count = 0;
      }
    }
  }
}

void SIScheduleBlockCreator::colorComputeReservedDependencies() {
  unsigned DAGSize = DAG->SUnits.size();
  std::map<std::set<unsigned>, unsigned> ColorCombinations;

  CurrentTopDownReservedDependencyColoring.clear();
  CurrentBottomUpReservedDependencyColoring.clear();

  CurrentTopDownReservedDependencyColoring.resize(DAGSize, 0);
  CurrentBottomUpReservedDependencyColoring.resize(DAGSize, 0);

  // Traverse TopDown, and give different colors to SUs depending
  // on which combination of High Latencies they depend on.

  for (unsigned SUNum : DAG->TopDownIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    std::set<unsigned> SUColors;

    // Already given.
    if (CurrentColoring[SU->NodeNum]) {
      CurrentTopDownReservedDependencyColoring[SU->NodeNum] =
        CurrentColoring[SU->NodeNum];
      continue;
    }

    for (SDep& PredDep : SU->Preds) {
      SUnit *Pred = PredDep.getSUnit();
      if (PredDep.isWeak() || Pred->NodeNum >= DAGSize)
        continue;
      if (CurrentTopDownReservedDependencyColoring[Pred->NodeNum] > 0)
        SUColors.insert(CurrentTopDownReservedDependencyColoring[Pred->NodeNum]);
    }
    // Color 0 by default.
    if (SUColors.empty())
      continue;
    // Same color as its parents.
    if (SUColors.size() == 1 && *SUColors.begin() > DAGSize)
      CurrentTopDownReservedDependencyColoring[SU->NodeNum] =
        *SUColors.begin();
    else {
      std::map<std::set<unsigned>, unsigned>::iterator Pos =
        ColorCombinations.find(SUColors);
      if (Pos != ColorCombinations.end()) {
        CurrentTopDownReservedDependencyColoring[SU->NodeNum] = Pos->second;
      } else {
        CurrentTopDownReservedDependencyColoring[SU->NodeNum] =
          NextNonReservedID;
        ColorCombinations[SUColors] = NextNonReservedID++;
      }
    }
  }

  ColorCombinations.clear();

  // Same as before, but BottomUp.

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    std::set<unsigned> SUColors;

    // Already given.
    if (CurrentColoring[SU->NodeNum]) {
      CurrentBottomUpReservedDependencyColoring[SU->NodeNum] =
        CurrentColoring[SU->NodeNum];
      continue;
    }

    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      if (CurrentBottomUpReservedDependencyColoring[Succ->NodeNum] > 0)
        SUColors.insert(CurrentBottomUpReservedDependencyColoring[Succ->NodeNum]);
    }
    // Keep color 0.
    if (SUColors.empty())
      continue;
    // Same color as its successors.
    if (SUColors.size() == 1 && *SUColors.begin() > DAGSize)
      CurrentBottomUpReservedDependencyColoring[SU->NodeNum] =
        *SUColors.begin();
    else {
      std::map<std::set<unsigned>, unsigned>::iterator Pos =
        ColorCombinations.find(SUColors);
      if (Pos != ColorCombinations.end()) {
        CurrentBottomUpReservedDependencyColoring[SU->NodeNum] = Pos->second;
      } else {
        CurrentBottomUpReservedDependencyColoring[SU->NodeNum] =
          NextNonReservedID;
        ColorCombinations[SUColors] = NextNonReservedID++;
      }
    }
  }
}

void SIScheduleBlockCreator::colorAccordingToReservedDependencies() {
  std::map<std::pair<unsigned, unsigned>, unsigned> ColorCombinations;

  // Every combination of colors given by the top down
  // and bottom up Reserved node dependency colorings
  // gets its own color.

  for (const SUnit &SU : DAG->SUnits) {
    std::pair<unsigned, unsigned> SUColors;

    // High latency instructions: already given.
    if (CurrentColoring[SU.NodeNum])
      continue;

    SUColors.first = CurrentTopDownReservedDependencyColoring[SU.NodeNum];
    SUColors.second = CurrentBottomUpReservedDependencyColoring[SU.NodeNum];

    std::map<std::pair<unsigned, unsigned>, unsigned>::iterator Pos =
      ColorCombinations.find(SUColors);
    if (Pos != ColorCombinations.end()) {
      CurrentColoring[SU.NodeNum] = Pos->second;
    } else {
      CurrentColoring[SU.NodeNum] = NextNonReservedID;
      ColorCombinations[SUColors] = NextNonReservedID++;
    }
  }
}

void SIScheduleBlockCreator::colorEndsAccordingToDependencies() {
  unsigned DAGSize = DAG->SUnits.size();
  std::vector<int> PendingColoring = CurrentColoring;

  assert(DAGSize >= 1 &&
         CurrentBottomUpReservedDependencyColoring.size() == DAGSize &&
         CurrentTopDownReservedDependencyColoring.size() == DAGSize);
  // If there is no reserved block at all, do nothing. We don't want
  // everything in one block.
  if (*llvm::max_element(CurrentBottomUpReservedDependencyColoring) == 0 &&
      *llvm::max_element(CurrentTopDownReservedDependencyColoring) == 0)
    return;

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    std::set<unsigned> SUColors;
    std::set<unsigned> SUColorsPending;

    if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
      continue;

    if (CurrentBottomUpReservedDependencyColoring[SU->NodeNum] > 0 ||
        CurrentTopDownReservedDependencyColoring[SU->NodeNum] > 0)
      continue;

    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      if (CurrentBottomUpReservedDependencyColoring[Succ->NodeNum] > 0 ||
          CurrentTopDownReservedDependencyColoring[Succ->NodeNum] > 0)
        SUColors.insert(CurrentColoring[Succ->NodeNum]);
      SUColorsPending.insert(PendingColoring[Succ->NodeNum]);
    }
    // If there is only one child/parent block, and that block
    // is not among the ones we are removing in this path, then
    // merge the instruction into that block
    if (SUColors.size() == 1 && SUColorsPending.size() == 1)
      PendingColoring[SU->NodeNum] = *SUColors.begin();
    else // TODO: Attribute new colors depending on color
         // combination of children.
      PendingColoring[SU->NodeNum] = NextNonReservedID++;
  }
  CurrentColoring = PendingColoring;
}


void SIScheduleBlockCreator::colorForceConsecutiveOrderInGroup() {
  unsigned DAGSize = DAG->SUnits.size();
  unsigned PreviousColor;
  std::set<unsigned> SeenColors;

  if (DAGSize <= 1)
    return;

  PreviousColor = CurrentColoring[0];

  for (unsigned i = 1, e = DAGSize; i != e; ++i) {
    SUnit *SU = &DAG->SUnits[i];
    unsigned CurrentColor = CurrentColoring[i];
    unsigned PreviousColorSave = PreviousColor;
    assert(i == SU->NodeNum);

    if (CurrentColor != PreviousColor)
      SeenColors.insert(PreviousColor);
    PreviousColor = CurrentColor;

    if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
      continue;

    if (SeenColors.find(CurrentColor) == SeenColors.end())
      continue;

    if (PreviousColorSave != CurrentColor)
      CurrentColoring[i] = NextNonReservedID++;
    else
      CurrentColoring[i] = CurrentColoring[i-1];
  }
}

void SIScheduleBlockCreator::colorMergeConstantLoadsNextGroup() {
  unsigned DAGSize = DAG->SUnits.size();

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    std::set<unsigned> SUColors;

    if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
      continue;

    // No predecessor: VGPR constant loading.
    // Low latency instructions usually have a predecessor (the address).
    if (SU->Preds.size() > 0 && !DAG->IsLowLatencySU[SU->NodeNum])
      continue;

    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      SUColors.insert(CurrentColoring[Succ->NodeNum]);
    }
    if (SUColors.size() == 1)
      CurrentColoring[SU->NodeNum] = *SUColors.begin();
  }
}

void SIScheduleBlockCreator::colorMergeIfPossibleNextGroup() {
  unsigned DAGSize = DAG->SUnits.size();

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    std::set<unsigned> SUColors;

    if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
      continue;

    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      SUColors.insert(CurrentColoring[Succ->NodeNum]);
    }
    if (SUColors.size() == 1)
      CurrentColoring[SU->NodeNum] = *SUColors.begin();
  }
}

void SIScheduleBlockCreator::colorMergeIfPossibleNextGroupOnlyForReserved() {
  unsigned DAGSize = DAG->SUnits.size();

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    std::set<unsigned> SUColors;

    if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
      continue;

    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      SUColors.insert(CurrentColoring[Succ->NodeNum]);
    }
    if (SUColors.size() == 1 && *SUColors.begin() <= DAGSize)
      CurrentColoring[SU->NodeNum] = *SUColors.begin();
  }
}

void SIScheduleBlockCreator::colorMergeIfPossibleSmallGroupsToNextGroup() {
  unsigned DAGSize = DAG->SUnits.size();
  std::map<unsigned, unsigned> ColorCount;

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    unsigned color = CurrentColoring[SU->NodeNum];
    ++ColorCount[color];
  }

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    unsigned color = CurrentColoring[SU->NodeNum];
    std::set<unsigned> SUColors;

    if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
      continue;

    if (ColorCount[color] > 1)
      continue;

    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      SUColors.insert(CurrentColoring[Succ->NodeNum]);
    }
    if (SUColors.size() == 1 && *SUColors.begin() != color) {
      --ColorCount[color];
      CurrentColoring[SU->NodeNum] = *SUColors.begin();
      ++ColorCount[*SUColors.begin()];
    }
  }
}

void SIScheduleBlockCreator::cutHugeBlocks() {
  // TODO
}

void SIScheduleBlockCreator::regroupNoUserInstructions() {
  unsigned DAGSize = DAG->SUnits.size();
  int GroupID = NextNonReservedID++;

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    bool hasSuccessor = false;

    if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
      continue;

    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      hasSuccessor = true;
    }
    if (!hasSuccessor)
      CurrentColoring[SU->NodeNum] = GroupID;
  }
}

void SIScheduleBlockCreator::colorExports() {
  unsigned ExportColor = NextNonReservedID++;
  SmallVector<unsigned, 8> ExpGroup;

  // Put all exports together in a block.
  // The block will naturally end up being scheduled last,
  // thus putting exports at the end of the schedule, which
  // is better for performance.
  // However we must ensure, for safety, that the exports can be put
  // together in the same block without any other instruction.
  // This could happen, for example, when scheduling after regalloc
  // if reloading a spilled register from memory using the same
  // register as used in a previous export.
  // If that happens, do not regroup the exports.
  for (unsigned SUNum : DAG->TopDownIndex2SU) {
    const SUnit &SU = DAG->SUnits[SUNum];
    if (SIInstrInfo::isEXP(*SU.getInstr())) {
      // SU is an export instruction. Check whether one of its successor
      // dependencies is a non-export, in which case we skip export grouping.
      for (const SDep &SuccDep : SU.Succs) {
        const SUnit *SuccSU = SuccDep.getSUnit();
        if (SuccDep.isWeak() || SuccSU->NodeNum >= DAG->SUnits.size()) {
          // Ignore these dependencies.
          continue;
        }
        assert(SuccSU->isInstr() &&
               "SUnit unexpectedly not representing an instruction!");

        if (!SIInstrInfo::isEXP(*SuccSU->getInstr())) {
          // A non-export depends on us. Skip export grouping.
          // Note that this is a bit pessimistic: We could still group all
          // other exports that are not depended on by non-exports, directly
          // or indirectly. Simply skipping this particular export but
          // grouping all others would not account for indirect dependencies.
          return;
        }
      }
      ExpGroup.push_back(SUNum);
    }
  }

  // The group can be formed. Give the color.
  for (unsigned j : ExpGroup)
    CurrentColoring[j] = ExportColor;
}

void SIScheduleBlockCreator::createBlocksForVariant(
    SISchedulerBlockCreatorVariant BlockVariant) {
  unsigned DAGSize = DAG->SUnits.size();
  std::map<unsigned,unsigned> RealID;

  CurrentBlocks.clear();
  CurrentColoring.clear();
  CurrentColoring.resize(DAGSize, 0);
  Node2CurrentBlock.clear();

  // Restore links that a previous scheduling variant may have overridden.
  DAG->restoreSULinksLeft();

  NextReservedID = 1;
  NextNonReservedID = DAGSize + 1;

  LLVM_DEBUG(dbgs() << "Coloring the graph\n");

  if (BlockVariant == SISchedulerBlockCreatorVariant::LatenciesGrouped ||
      BlockVariant == SISchedulerBlockCreatorVariant::LatenciesAlonePlusConsecutive)
    colorHighLatenciesGroups();
  else
    colorHighLatenciesAlone();
  colorComputeReservedDependencies();
  colorAccordingToReservedDependencies();
  colorEndsAccordingToDependencies();
  if (BlockVariant == SISchedulerBlockCreatorVariant::LatenciesAlonePlusConsecutive)
    colorForceConsecutiveOrderInGroup();
  regroupNoUserInstructions();
  colorMergeConstantLoadsNextGroup();
  colorMergeIfPossibleNextGroupOnlyForReserved();
  colorExports();

  // Put SUs of same color into same block
  Node2CurrentBlock.resize(DAGSize, -1);
  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SUnit *SU = &DAG->SUnits[i];
    unsigned Color = CurrentColoring[SU->NodeNum];
    if (RealID.find(Color) == RealID.end()) {
      int ID = CurrentBlocks.size();
      BlockPtrs.push_back(std::make_unique<SIScheduleBlock>(DAG, this, ID));
      CurrentBlocks.push_back(BlockPtrs.rbegin()->get());
      RealID[Color] = ID;
    }
    CurrentBlocks[RealID[Color]]->addUnit(SU);
    Node2CurrentBlock[SU->NodeNum] = RealID[Color];
  }

  // Build dependencies between blocks.
  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SUnit *SU = &DAG->SUnits[i];
    int SUID = Node2CurrentBlock[i];
    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      if (Node2CurrentBlock[Succ->NodeNum] != SUID)
        CurrentBlocks[SUID]->addSucc(CurrentBlocks[Node2CurrentBlock[Succ->NodeNum]],
                                     SuccDep.isCtrl() ? NoData : Data);
    }
    for (SDep& PredDep : SU->Preds) {
      SUnit *Pred = PredDep.getSUnit();
      if (PredDep.isWeak() || Pred->NodeNum >= DAGSize)
        continue;
      if (Node2CurrentBlock[Pred->NodeNum] != SUID)
        CurrentBlocks[SUID]->addPred(CurrentBlocks[Node2CurrentBlock[Pred->NodeNum]]);
    }
  }

  // Free roots and leaves of all blocks to enable scheduling inside them.
  for (SIScheduleBlock *Block : CurrentBlocks)
    Block->finalizeUnits();
  LLVM_DEBUG({
    dbgs() << "Blocks created:\n\n";
    for (SIScheduleBlock *Block : CurrentBlocks)
      Block->printDebug(true);
  });
}

// Two functions taken from Codegen/MachineScheduler.cpp

/// Non-const version.
static MachineBasicBlock::iterator
nextIfDebug(MachineBasicBlock::iterator I,
            MachineBasicBlock::const_iterator End) {
  for (; I != End; ++I) {
    if (!I->isDebugInstr())
      break;
  }
  return I;
}

void SIScheduleBlockCreator::topologicalSort() {
  unsigned DAGSize = CurrentBlocks.size();
  std::vector<int> WorkList;

  LLVM_DEBUG(dbgs() << "Topological Sort\n");

  WorkList.reserve(DAGSize);
  TopDownIndex2Block.resize(DAGSize);
  TopDownBlock2Index.resize(DAGSize);
  BottomUpIndex2Block.resize(DAGSize);

  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SIScheduleBlock *Block = CurrentBlocks[i];
    unsigned Degree = Block->getSuccs().size();
    TopDownBlock2Index[i] = Degree;
    if (Degree == 0) {
      WorkList.push_back(i);
    }
  }

  int Id = DAGSize;
  while (!WorkList.empty()) {
    int i = WorkList.back();
    SIScheduleBlock *Block = CurrentBlocks[i];
    WorkList.pop_back();
    TopDownBlock2Index[i] = --Id;
    TopDownIndex2Block[Id] = i;
    for (SIScheduleBlock* Pred : Block->getPreds()) {
      if (!--TopDownBlock2Index[Pred->getID()])
        WorkList.push_back(Pred->getID());
    }
  }

#ifndef NDEBUG
  // Check correctness of the ordering.
  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SIScheduleBlock *Block = CurrentBlocks[i];
    for (SIScheduleBlock* Pred : Block->getPreds()) {
      assert(TopDownBlock2Index[i] > TopDownBlock2Index[Pred->getID()] &&
             "Wrong Top Down topological sorting");
    }
  }
#endif

  BottomUpIndex2Block = std::vector<int>(TopDownIndex2Block.rbegin(),
                                         TopDownIndex2Block.rend());
}

void SIScheduleBlockCreator::scheduleInsideBlocks() {
  unsigned DAGSize = CurrentBlocks.size();

  LLVM_DEBUG(dbgs() << "\nScheduling Blocks\n\n");

  // We first produce a valid schedule, such that every Block corresponds
  // to a contiguous range of instructions.
  LLVM_DEBUG(dbgs() << "First phase: Fast scheduling for Reg Liveness\n");
  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SIScheduleBlock *Block = CurrentBlocks[i];
    Block->fastSchedule();
  }

  // Note: the following code, and the part restoring previous position
  // is by far the most expensive operation of the Scheduler.

  // Do not update CurrentTop.
  MachineBasicBlock::iterator CurrentTopFastSched = DAG->getCurrentTop();
  std::vector<MachineBasicBlock::iterator> PosOld;
  std::vector<MachineBasicBlock::iterator> PosNew;
  PosOld.reserve(DAG->SUnits.size());
  PosNew.reserve(DAG->SUnits.size());

  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    int BlockIndice = TopDownIndex2Block[i];
    SIScheduleBlock *Block = CurrentBlocks[BlockIndice];
    std::vector<SUnit*> SUs = Block->getScheduledUnits();

    for (SUnit* SU : SUs) {
      MachineInstr *MI = SU->getInstr();
      MachineBasicBlock::iterator Pos = MI;
      PosOld.push_back(Pos);
      if (&*CurrentTopFastSched == MI) {
        PosNew.push_back(Pos);
        CurrentTopFastSched = nextIfDebug(++CurrentTopFastSched,
                                          DAG->getCurrentBottom());
      } else {
        // Update the instruction stream.
        DAG->getBB()->splice(CurrentTopFastSched, DAG->getBB(), MI);

        // Update LiveIntervals.
        // Note: Moving all instructions and calling handleMove every time
        // is the most cpu intensive operation of the scheduler.
        // It would gain a lot if there was a way to recompute the
        // LiveIntervals for the entire scheduling region.
        DAG->getLIS()->handleMove(*MI, /*UpdateFlags=*/true);
        PosNew.push_back(CurrentTopFastSched);
      }
    }
  }

  // Now we have Block of SUs == Block of MI.
  // We do the final schedule for the instructions inside the block.
  // The property that all the SUs of the Block are grouped together as MI
  // is used for correct reg usage tracking.
  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SIScheduleBlock *Block = CurrentBlocks[i];
    std::vector<SUnit*> SUs = Block->getScheduledUnits();
    Block->schedule((*SUs.begin())->getInstr(), (*SUs.rbegin())->getInstr());
  }

  LLVM_DEBUG(dbgs() << "Restoring MI Pos\n");
  // Restore old ordering (which prevents a LIS->handleMove bug).
  for (unsigned i = PosOld.size(), e = 0; i != e; --i) {
    MachineBasicBlock::iterator POld = PosOld[i-1];
    MachineBasicBlock::iterator PNew = PosNew[i-1];
    if (PNew != POld) {
      // Update the instruction stream.
      DAG->getBB()->splice(POld, DAG->getBB(), PNew);

      // Update LiveIntervals.
      DAG->getLIS()->handleMove(*POld, /*UpdateFlags=*/true);
    }
  }

  LLVM_DEBUG({
    for (SIScheduleBlock *Block : CurrentBlocks)
      Block->printDebug(true);
  });
}

void SIScheduleBlockCreator::fillStats() {
  unsigned DAGSize = CurrentBlocks.size();

  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    int BlockIndice = TopDownIndex2Block[i];
    SIScheduleBlock *Block = CurrentBlocks[BlockIndice];
    if (Block->getPreds().empty())
      Block->Depth = 0;
    else {
      unsigned Depth = 0;
      for (SIScheduleBlock *Pred : Block->getPreds()) {
        if (Depth < Pred->Depth + Pred->getCost())
          Depth = Pred->Depth + Pred->getCost();
      }
      Block->Depth = Depth;
    }
  }

  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    int BlockIndice = BottomUpIndex2Block[i];
    SIScheduleBlock *Block = CurrentBlocks[BlockIndice];
    if (Block->getSuccs().empty())
      Block->Height = 0;
    else {
      unsigned Height = 0;
      for (const auto &Succ : Block->getSuccs())
        Height = std::max(Height, Succ.first->Height + Succ.first->getCost());
      Block->Height = Height;
    }
  }
}

// SIScheduleBlockScheduler //

SIScheduleBlockScheduler::SIScheduleBlockScheduler(SIScheduleDAGMI *DAG,
                                       SISchedulerBlockSchedulerVariant Variant,
                                       SIScheduleBlocks BlocksStruct) :
  DAG(DAG), Variant(Variant), Blocks(BlocksStruct.Blocks),
  LastPosWaitedHighLatency(0), NumBlockScheduled(0), VregCurrentUsage(0),
  SregCurrentUsage(0), maxVregUsage(0), maxSregUsage(0) {

  // Fill the usage of every output.
  // Warning: while by construction we always have a link between two blocks
  // when one needs a result from the other, the number of users of an output
  // is not the sum of child blocks having that virtual register as an input.
  // Here is an example. A produces x and y. B eats x and produces x'.
  // C eats x' and y. The register coalescer may have attributed the same
  // virtual register to x and x'.
  // To count accurately, we do a topological sort. In case the register is
  // found for several parents, we increment the usage of the one with the
  // highest topological index.
  LiveOutRegsNumUsages.resize(Blocks.size());
  for (SIScheduleBlock *Block : Blocks) {
    for (unsigned Reg : Block->getInRegs()) {
      bool Found = false;
      int topoInd = -1;
      for (SIScheduleBlock* Pred: Block->getPreds()) {
        std::set<unsigned> PredOutRegs = Pred->getOutRegs();
        std::set<unsigned>::iterator RegPos = PredOutRegs.find(Reg);

        if (RegPos != PredOutRegs.end()) {
          Found = true;
          if (topoInd < BlocksStruct.TopDownBlock2Index[Pred->getID()]) {
            topoInd = BlocksStruct.TopDownBlock2Index[Pred->getID()];
          }
        }
      }

      if (!Found)
        continue;

      int PredID = BlocksStruct.TopDownIndex2Block[topoInd];
      ++LiveOutRegsNumUsages[PredID][Reg];
    }
  }

  LastPosHighLatencyParentScheduled.resize(Blocks.size(), 0);
  BlockNumPredsLeft.resize(Blocks.size());
  BlockNumSuccsLeft.resize(Blocks.size());

  for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {
    SIScheduleBlock *Block = Blocks[i];
    BlockNumPredsLeft[i] = Block->getPreds().size();
    BlockNumSuccsLeft[i] = Block->getSuccs().size();
  }

#ifndef NDEBUG
  for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {
    SIScheduleBlock *Block = Blocks[i];
    assert(Block->getID() == i);
  }
#endif

  std::set<unsigned> InRegs = DAG->getInRegs();
  addLiveRegs(InRegs);

  // Increase LiveOutRegsNumUsages for blocks
  // producing registers consumed in another
  // scheduling region.
  for (unsigned Reg : DAG->getOutRegs()) {
    for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {
      // Do reverse traversal.
      int ID = BlocksStruct.TopDownIndex2Block[Blocks.size()-1-i];
      SIScheduleBlock *Block = Blocks[ID];
      const std::set<unsigned> &OutRegs = Block->getOutRegs();

      if (OutRegs.find(Reg) == OutRegs.end())
        continue;

      ++LiveOutRegsNumUsages[ID][Reg];
      break;
    }
  }

  // Fill LiveRegsConsumers for regs that were already
  // defined before scheduling.
  for (SIScheduleBlock *Block : Blocks) {
    for (unsigned Reg : Block->getInRegs()) {
      bool Found = false;
      for (SIScheduleBlock* Pred: Block->getPreds()) {
        std::set<unsigned> PredOutRegs = Pred->getOutRegs();
        std::set<unsigned>::iterator RegPos = PredOutRegs.find(Reg);

        if (RegPos != PredOutRegs.end()) {
          Found = true;
          break;
        }
      }

      if (!Found)
        ++LiveRegsConsumers[Reg];
    }
  }

  for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {
    SIScheduleBlock *Block = Blocks[i];
    if (BlockNumPredsLeft[i] == 0) {
      ReadyBlocks.push_back(Block);
    }
  }

  while (SIScheduleBlock *Block = pickBlock()) {
    BlocksScheduled.push_back(Block);
    blockScheduled(Block);
  }

  LLVM_DEBUG(dbgs() << "Block Order:"; for (SIScheduleBlock *Block
                                            : BlocksScheduled) {
    dbgs() << ' ' << Block->getID();
  } dbgs() << '\n';);
}

bool SIScheduleBlockScheduler::tryCandidateLatency(SIBlockSchedCandidate &Cand,
                                                   SIBlockSchedCandidate &TryCand) {
  if (!Cand.isValid()) {
    TryCand.Reason = NodeOrder;
    return true;
  }

  // Try to hide high latencies.
  if (SISched::tryLess(TryCand.LastPosHighLatParentScheduled,
                       Cand.LastPosHighLatParentScheduled, TryCand, Cand, Latency))
    return true;
  // Schedule high latencies early so you can hide them better.
  if (SISched::tryGreater(TryCand.IsHighLatency, Cand.IsHighLatency,
                          TryCand, Cand, Latency))
    return true;
  if (TryCand.IsHighLatency && SISched::tryGreater(TryCand.Height, Cand.Height,
                                                   TryCand, Cand, Depth))
    return true;
  if (SISched::tryGreater(TryCand.NumHighLatencySuccessors,
                          Cand.NumHighLatencySuccessors,
                          TryCand, Cand, Successor))
    return true;
  return false;
}

bool SIScheduleBlockScheduler::tryCandidateRegUsage(SIBlockSchedCandidate &Cand,
                                                    SIBlockSchedCandidate &TryCand) {
  if (!Cand.isValid()) {
    TryCand.Reason = NodeOrder;
    return true;
  }

  if (SISched::tryLess(TryCand.VGPRUsageDiff > 0, Cand.VGPRUsageDiff > 0,
                       TryCand, Cand, RegUsage))
    return true;
  if (SISched::tryGreater(TryCand.NumSuccessors > 0,
                          Cand.NumSuccessors > 0,
                          TryCand, Cand, Successor))
    return true;
  if (SISched::tryGreater(TryCand.Height, Cand.Height, TryCand, Cand, Depth))
    return true;
  if (SISched::tryLess(TryCand.VGPRUsageDiff, Cand.VGPRUsageDiff,
                       TryCand, Cand, RegUsage))
    return true;
  return false;
}

SIScheduleBlock *SIScheduleBlockScheduler::pickBlock() {
  SIBlockSchedCandidate Cand;
  std::vector<SIScheduleBlock*>::iterator Best;
  SIScheduleBlock *Block;
  if (ReadyBlocks.empty())
    return nullptr;

  DAG->fillVgprSgprCost(LiveRegs.begin(), LiveRegs.end(),
                        VregCurrentUsage, SregCurrentUsage);
  if (VregCurrentUsage > maxVregUsage)
    maxVregUsage = VregCurrentUsage;
  if (SregCurrentUsage > maxSregUsage)
    maxSregUsage = SregCurrentUsage;
  LLVM_DEBUG(dbgs() << "Picking New Blocks\n"; dbgs() << "Available: ";
             for (SIScheduleBlock *Block
                  : ReadyBlocks) dbgs()
             << Block->getID() << ' ';
             dbgs() << "\nCurrent Live:\n";
             for (unsigned Reg
                  : LiveRegs) dbgs()
             << printVRegOrUnit(Reg, DAG->getTRI()) << ' ';
             dbgs() << '\n';
             dbgs() << "Current VGPRs: " << VregCurrentUsage << '\n';
             dbgs() << "Current SGPRs: " << SregCurrentUsage << '\n';);

  Cand.Block = nullptr;
  for (std::vector<SIScheduleBlock*>::iterator I = ReadyBlocks.begin(),
       E = ReadyBlocks.end(); I != E; ++I) {
    SIBlockSchedCandidate TryCand;
    TryCand.Block = *I;
    TryCand.IsHighLatency = TryCand.Block->isHighLatencyBlock();
    TryCand.VGPRUsageDiff =
      checkRegUsageImpact(TryCand.Block->getInRegs(),
                          TryCand.Block->getOutRegs())[AMDGPU::RegisterPressureSets::VGPR_32];
    TryCand.NumSuccessors = TryCand.Block->getSuccs().size();
    TryCand.NumHighLatencySuccessors =
      TryCand.Block->getNumHighLatencySuccessors();
    TryCand.LastPosHighLatParentScheduled =
      (unsigned int) std::max<int> (0,
         LastPosHighLatencyParentScheduled[TryCand.Block->getID()] -
         LastPosWaitedHighLatency);
    TryCand.Height = TryCand.Block->Height;
    // Try not to increase VGPR usage too much, else we may spill.
    if (VregCurrentUsage > 120 ||
        Variant != SISchedulerBlockSchedulerVariant::BlockLatencyRegUsage) {
      if (!tryCandidateRegUsage(Cand, TryCand) &&
          Variant != SISchedulerBlockSchedulerVariant::BlockRegUsage)
        tryCandidateLatency(Cand, TryCand);
    } else {
      if (!tryCandidateLatency(Cand, TryCand))
        tryCandidateRegUsage(Cand, TryCand);
    }
    if (TryCand.Reason != NoCand) {
      Cand.setBest(TryCand);
      Best = I;
      LLVM_DEBUG(dbgs() << "Best Current Choice: " << Cand.Block->getID() << ' '
                        << getReasonStr(Cand.Reason) << '\n');
    }
  }

  LLVM_DEBUG(dbgs() << "Picking: " << Cand.Block->getID() << '\n';
             dbgs() << "Is a block with high latency instruction: "
                    << (Cand.IsHighLatency ? "yes\n" : "no\n");
             dbgs() << "Position of last high latency dependency: "
                    << Cand.LastPosHighLatParentScheduled << '\n';
             dbgs() << "VGPRUsageDiff: " << Cand.VGPRUsageDiff << '\n';
             dbgs() << '\n';);

  Block = Cand.Block;
  ReadyBlocks.erase(Best);
  return Block;
}

// Tracking of currently alive registers to determine VGPR Usage.

void SIScheduleBlockScheduler::addLiveRegs(std::set<unsigned> &Regs) {
  for (Register Reg : Regs) {
    // For now only track virtual registers.
    if (!Reg.isVirtual())
      continue;
    // If not already in the live set, then add it.
    (void) LiveRegs.insert(Reg);
  }
}

void SIScheduleBlockScheduler::decreaseLiveRegs(SIScheduleBlock *Block,
                                                std::set<unsigned> &Regs) {
  for (unsigned Reg : Regs) {
    // For now only track virtual registers.
    std::set<unsigned>::iterator Pos = LiveRegs.find(Reg);
    assert(Pos != LiveRegs.end() && // Reg must be live.
           LiveRegsConsumers.find(Reg) != LiveRegsConsumers.end() &&
           LiveRegsConsumers[Reg] >= 1);
    --LiveRegsConsumers[Reg];
    if (LiveRegsConsumers[Reg] == 0)
      LiveRegs.erase(Pos);
  }
}

void SIScheduleBlockScheduler::releaseBlockSuccs(SIScheduleBlock *Parent) {
  for (const auto &Block : Parent->getSuccs()) {
    if (--BlockNumPredsLeft[Block.first->getID()] == 0)
      ReadyBlocks.push_back(Block.first);

    if (Parent->isHighLatencyBlock() &&
        Block.second == SIScheduleBlockLinkKind::Data)
      LastPosHighLatencyParentScheduled[Block.first->getID()] = NumBlockScheduled;
  }
}

void SIScheduleBlockScheduler::blockScheduled(SIScheduleBlock *Block) {
  decreaseLiveRegs(Block, Block->getInRegs());
  addLiveRegs(Block->getOutRegs());
  releaseBlockSuccs(Block);
  for (const auto &RegP : LiveOutRegsNumUsages[Block->getID()]) {
    // We produce this register, thus it must not be previously alive.
    assert(LiveRegsConsumers.find(RegP.first) == LiveRegsConsumers.end() ||
           LiveRegsConsumers[RegP.first] == 0);
    LiveRegsConsumers[RegP.first] += RegP.second;
  }
  if (LastPosHighLatencyParentScheduled[Block->getID()] >
        (unsigned)LastPosWaitedHighLatency)
    LastPosWaitedHighLatency =
      LastPosHighLatencyParentScheduled[Block->getID()];
  ++NumBlockScheduled;
}

std::vector<int>
SIScheduleBlockScheduler::checkRegUsageImpact(std::set<unsigned> &InRegs,
                                              std::set<unsigned> &OutRegs) {
  std::vector<int> DiffSetPressure;
  DiffSetPressure.assign(DAG->getTRI()->getNumRegPressureSets(), 0);

  for (Register Reg : InRegs) {
    // For now only track virtual registers.
    if (!Reg.isVirtual())
      continue;
    if (LiveRegsConsumers[Reg] > 1)
      continue;
    PSetIterator PSetI = DAG->getMRI()->getPressureSets(Reg);
    for (; PSetI.isValid(); ++PSetI) {
      DiffSetPressure[*PSetI] -= PSetI.getWeight();
    }
  }

  for (Register Reg : OutRegs) {
    // For now only track virtual registers.
    if (!Reg.isVirtual())
      continue;
    PSetIterator PSetI = DAG->getMRI()->getPressureSets(Reg);
    for (; PSetI.isValid(); ++PSetI) {
      DiffSetPressure[*PSetI] += PSetI.getWeight();
    }
  }

  return DiffSetPressure;
}
1726
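// Worked example for checkRegUsageImpact (explanatory comment, not in
// the original source): if InRegs contains the last use of one VGPR_32
// register (LiveRegsConsumers[Reg] == 1) and OutRegs defines two new
// VGPR_32 registers, the entry for AMDGPU::RegisterPressureSets::VGPR_32
// in the returned vector is -1 + 2 = +1, assuming each 32-bit register
// contributes a pressure-set weight of 1.
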
1727// SIScheduler //
1728
1730SIScheduler::scheduleVariant(SISchedulerBlockCreatorVariant BlockVariant,
1731 SISchedulerBlockSchedulerVariant ScheduleVariant) {
1732 SIScheduleBlocks Blocks = BlockCreator.getBlocks(BlockVariant);
1733 SIScheduleBlockScheduler Scheduler(DAG, ScheduleVariant, Blocks);
1734 std::vector<SIScheduleBlock*> ScheduledBlocks;
1735 struct SIScheduleBlockResult Res;
1736
1737 ScheduledBlocks = Scheduler.getBlocks();
1738
1739 for (SIScheduleBlock *Block : ScheduledBlocks) {
1740 std::vector<SUnit*> SUs = Block->getScheduledUnits();
1741
1742 for (SUnit* SU : SUs)
1743 Res.SUs.push_back(SU->NodeNum);
1744 }
1745
1746 Res.MaxSGPRUsage = Scheduler.getSGPRUsage();
1747 Res.MaxVGPRUsage = Scheduler.getVGPRUsage();
1748 return Res;
1749}

// SIScheduleDAGMI //

SIScheduleDAGMI::SIScheduleDAGMI(MachineSchedContext *C) :
  ScheduleDAGMILive(C, std::make_unique<GenericScheduler>(C)) {
  SITII = static_cast<const SIInstrInfo*>(TII);
  SITRI = static_cast<const SIRegisterInfo*>(TRI);
}

SIScheduleDAGMI::~SIScheduleDAGMI() = default;

1761// Code adapted from scheduleDAG.cpp
1762// Does a topological sort over the SUs.
1763// Both TopDown and BottomUp
1764void SIScheduleDAGMI::topologicalSort() {
1766
1767 TopDownIndex2SU = std::vector<int>(Topo.begin(), Topo.end());
1768 BottomUpIndex2SU = std::vector<int>(Topo.rbegin(), Topo.rend());
1769}

// Move low-latency instructions further from their users without
// increasing SGPR usage (in general).
// This is to be replaced by a better pass that would take into
// account SGPR usage (based on VGPR usage and the corresponding
// wavefront count), that would try to merge groups of loads if it
// makes sense, etc.
void SIScheduleDAGMI::moveLowLatencies() {
  unsigned DAGSize = SUnits.size();
  int LastLowLatencyUser = -1;
  int LastLowLatencyPos = -1;

  for (unsigned i = 0, e = ScheduledSUnits.size(); i != e; ++i) {
    SUnit *SU = &SUnits[ScheduledSUnits[i]];
    bool IsLowLatencyUser = false;
    unsigned MinPos = 0;

    for (SDep& PredDep : SU->Preds) {
      SUnit *Pred = PredDep.getSUnit();
      if (SITII->isLowLatencyInstruction(*Pred->getInstr())) {
        IsLowLatencyUser = true;
      }
      if (Pred->NodeNum >= DAGSize)
        continue;
      unsigned PredPos = ScheduledSUnitsInv[Pred->NodeNum];
      if (PredPos >= MinPos)
        MinPos = PredPos + 1;
    }

    if (SITII->isLowLatencyInstruction(*SU->getInstr())) {
      unsigned BestPos = LastLowLatencyUser + 1;
      if ((int)BestPos <= LastLowLatencyPos)
        BestPos = LastLowLatencyPos + 1;
      if (BestPos < MinPos)
        BestPos = MinPos;
      if (BestPos < i) {
        for (unsigned u = i; u > BestPos; --u) {
          ++ScheduledSUnitsInv[ScheduledSUnits[u-1]];
          ScheduledSUnits[u] = ScheduledSUnits[u-1];
        }
        ScheduledSUnits[BestPos] = SU->NodeNum;
        ScheduledSUnitsInv[SU->NodeNum] = BestPos;
      }
      LastLowLatencyPos = BestPos;
      if (IsLowLatencyUser)
        LastLowLatencyUser = BestPos;
    } else if (IsLowLatencyUser) {
      LastLowLatencyUser = i;
    // Also move the COPY instructions that the low-latency
    // instructions depend on.
    } else if (SU->getInstr()->getOpcode() == AMDGPU::COPY) {
      bool CopyForLowLat = false;
      for (SDep& SuccDep : SU->Succs) {
        SUnit *Succ = SuccDep.getSUnit();
        if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
          continue;
        if (SITII->isLowLatencyInstruction(*Succ->getInstr())) {
          CopyForLowLat = true;
        }
      }
      if (!CopyForLowLat)
        continue;
      if (MinPos < i) {
        for (unsigned u = i; u > MinPos; --u) {
          ++ScheduledSUnitsInv[ScheduledSUnits[u-1]];
          ScheduledSUnits[u] = ScheduledSUnits[u-1];
        }
        ScheduledSUnits[MinPos] = SU->NodeNum;
        ScheduledSUnitsInv[SU->NodeNum] = MinPos;
      }
    }
  }
}

void SIScheduleDAGMI::restoreSULinksLeft() {
  for (unsigned i = 0, e = SUnits.size(); i != e; ++i) {
    SUnits[i].isScheduled = false;
    SUnits[i].WeakPredsLeft = SUnitsLinksBackup[i].WeakPredsLeft;
    SUnits[i].NumPredsLeft = SUnitsLinksBackup[i].NumPredsLeft;
    SUnits[i].WeakSuccsLeft = SUnitsLinksBackup[i].WeakSuccsLeft;
    SUnits[i].NumSuccsLeft = SUnitsLinksBackup[i].NumSuccsLeft;
  }
}

1854// Return the Vgpr and Sgpr usage corresponding to some virtual registers.
1855template<typename _Iterator> void
1857 unsigned &VgprUsage, unsigned &SgprUsage) {
1858 VgprUsage = 0;
1859 SgprUsage = 0;
1860 for (_Iterator RegI = First; RegI != End; ++RegI) {
1861 Register Reg = *RegI;
1862 // For now only track virtual registers
1863 if (!Reg.isVirtual())
1864 continue;
1866 for (; PSetI.isValid(); ++PSetI) {
1867 if (*PSetI == AMDGPU::RegisterPressureSets::VGPR_32)
1868 VgprUsage += PSetI.getWeight();
1869 else if (*PSetI == AMDGPU::RegisterPressureSets::SReg_32)
1870 SgprUsage += PSetI.getWeight();
1871 }
1872 }
1873}
1874
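// Usage sketch (hypothetical call site, for illustration only): any
// forward iterator over Register values works, e.g.
//   unsigned VGPRs, SGPRs;
//   DAG->fillVgprSgprCost(InRegs.begin(), InRegs.end(), VGPRs, SGPRs);
// where InRegs is a std::set<unsigned> such as a block's live-in set.
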
1876{
1877 SmallVector<SUnit*, 8> TopRoots, BotRoots;
1878 SIScheduleBlockResult Best, Temp;
1879 LLVM_DEBUG(dbgs() << "Preparing Scheduling\n");
1880
1883
  LLVM_DEBUG(dump());
  if (PrintDAGs)
    dump();
  if (ViewMISchedDAGs)
    viewGraph();

  topologicalSort();
  findRootsAndBiasEdges(TopRoots, BotRoots);
  // We reuse several ScheduleDAGMI and ScheduleDAGMILive
  // functions, but to make them happy we must initialize
  // the default Scheduler implementation (even if we do not
  // run it).
  SchedImpl->initialize(this);
  initQueues(TopRoots, BotRoots);

  // Fill some stats to help scheduling.

  SUnitsLinksBackup = SUnits;
  IsLowLatencySU.clear();
  LowLatencyOffset.clear();
  IsHighLatencySU.clear();

  IsLowLatencySU.resize(SUnits.size(), 0);
  LowLatencyOffset.resize(SUnits.size(), 0);
  IsHighLatencySU.resize(SUnits.size(), 0);

  for (unsigned i = 0, e = (unsigned)SUnits.size(); i != e; ++i) {
    SUnit *SU = &SUnits[i];
    const MachineOperand *BaseLatOp;
    int64_t OffLatReg;
    if (SITII->isLowLatencyInstruction(*SU->getInstr())) {
      IsLowLatencySU[i] = 1;
      bool OffsetIsScalable;
      if (SITII->getMemOperandWithOffset(*SU->getInstr(), BaseLatOp, OffLatReg,
                                         OffsetIsScalable, TRI))
        LowLatencyOffset[i] = OffLatReg;
    } else if (SITII->isHighLatencyDef(SU->getInstr()->getOpcode()))
      IsHighLatencySU[i] = 1;
  }

  SIScheduler Scheduler(this);
  Best = Scheduler.scheduleVariant(SISchedulerBlockCreatorVariant::LatenciesAlone,
                                   SISchedulerBlockSchedulerVariant::BlockLatencyRegUsage);

  // if VGPR usage is extremely high, try other good performing variants
  // which could lead to lower VGPR usage
  if (Best.MaxVGPRUsage > 180) {
    static const std::pair<SISchedulerBlockCreatorVariant,
                           SISchedulerBlockSchedulerVariant>
        Variants[] = {
      { LatenciesAlone, BlockRegUsageLatency },
//      { LatenciesAlone, BlockRegUsage },
      { LatenciesGrouped, BlockLatencyRegUsage },
//      { LatenciesGrouped, BlockRegUsageLatency },
//      { LatenciesGrouped, BlockRegUsage },
      { LatenciesAlonePlusConsecutive, BlockLatencyRegUsage },
//      { LatenciesAlonePlusConsecutive, BlockRegUsageLatency },
//      { LatenciesAlonePlusConsecutive, BlockRegUsage }
    };
    for (std::pair<SISchedulerBlockCreatorVariant, SISchedulerBlockSchedulerVariant> v : Variants) {
      Temp = Scheduler.scheduleVariant(v.first, v.second);
      if (Temp.MaxVGPRUsage < Best.MaxVGPRUsage)
        Best = Temp;
    }
  }
  // if VGPR usage is still extremely high, we may spill. Try other variants
  // which perform less well, but could lead to lower VGPR usage.
  if (Best.MaxVGPRUsage > 200) {
    static const std::pair<SISchedulerBlockCreatorVariant,
                           SISchedulerBlockSchedulerVariant>
        Variants[] = {
//      { LatenciesAlone, BlockRegUsageLatency },
      { LatenciesAlone, BlockRegUsage },
//      { LatenciesGrouped, BlockLatencyRegUsage },
      { LatenciesGrouped, BlockRegUsageLatency },
      { LatenciesGrouped, BlockRegUsage },
//      { LatenciesAlonePlusConsecutive, BlockLatencyRegUsage },
      { LatenciesAlonePlusConsecutive, BlockRegUsageLatency },
      { LatenciesAlonePlusConsecutive, BlockRegUsage }
    };
    for (std::pair<SISchedulerBlockCreatorVariant, SISchedulerBlockSchedulerVariant> v : Variants) {
      Temp = Scheduler.scheduleVariant(v.first, v.second);
      if (Temp.MaxVGPRUsage < Best.MaxVGPRUsage)
        Best = Temp;
    }
  }
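  // Context for the thresholds above (explanatory comment, not in the
  // original source): GCN hardware exposes at most 256 VGPRs per lane,
  // and the number of wavefronts resident per SIMD drops as a shader's
  // VGPR usage grows, so usages in the 180-200 range both hurt occupancy
  // and leave little headroom before spilling.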

  ScheduledSUnits = Best.SUs;
  ScheduledSUnitsInv.resize(SUnits.size());

  for (unsigned i = 0, e = (unsigned)SUnits.size(); i != e; ++i) {
    ScheduledSUnitsInv[ScheduledSUnits[i]] = i;
  }

  moveLowLatencies();

  // Tell the outside world about the result of the scheduling.

  assert(TopRPTracker.getPos() == RegionBegin && "bad initial Top tracker");
  TopRPTracker.setPos(CurrentTop);

  for (unsigned I : ScheduledSUnits) {
    SUnit *SU = &SUnits[I];

    scheduleMI(SU, true);

    LLVM_DEBUG(dbgs() << "Scheduling SU(" << SU->NodeNum << ") "
                      << *SU->getInstr());
  }

  assert(CurrentTop == CurrentBottom && "Nonempty unscheduled zone.");

  placeDebugValues();

  LLVM_DEBUG({
    dbgs() << "*** Final schedule for "
           << printMBBReference(*begin()->getParent()) << " ***\n";
    dumpSchedule();
    dbgs() << '\n';
  });
}