1//===- AMDGPURegisterBankInfo.cpp -------------------------------*- C++ -*-==//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8/// \file
9/// This file implements the targeting of the RegisterBankInfo class for
10/// AMDGPU.
11///
12/// \par
13///
14/// AMDGPU has unique register bank constraints that require special high level
15/// strategies to deal with. There are two main true physical register banks
16/// VGPR (vector) and SGPR (scalar). Additionally, the VCC register bank is a
17/// sort of pseudo-register bank needed to represent SGPRs used in a vector
18/// boolean context. There is also the AGPR bank, which is a special purpose
19/// physical register bank present on some subtargets.
20///
21/// Copying from VGPR to SGPR is generally illegal, unless the value is known to
22/// be uniform. It is generally not valid to legalize operands by inserting
23/// copies as on other targets. Operations which require uniform, SGPR operands
24/// generally require scalarization by repeatedly executing the instruction,
25/// activating each set of lanes using a unique set of input values. This is
26/// referred to as a waterfall loop.
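/// For example (illustrative), an s_buffer_load whose resource descriptor was
/// assigned to VGPRs cannot be fixed up with a plain copy to SGPRs; instead
/// the load is wrapped in a waterfall loop that readfirstlanes the descriptor
/// for each unique per-lane value (see executeInWaterfallLoop below).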
27///
28/// \par Booleans
29///
30/// Booleans (s1 values) require special consideration. A vector compare result
31/// is naturally a bitmask with one bit per lane, in a 32 or 64-bit
32/// register. These are represented with the VCC bank. During selection, we need
33/// to be able to unambiguously go back from a register class to a register
34/// bank. To distinguish whether an SGPR should use the SGPR or VCC register
35/// bank, we need to know the use context type. An SGPR s1 value always means a
36/// VCC bank value, otherwise it will be the SGPR bank. A scalar compare sets
37/// SCC, which is a 1-bit unaddressable register. This will need to be copied to
38/// a 32-bit virtual register. Taken together, this means we need to adjust the
39/// type of boolean operations to be regbank legal. All SALU booleans need to be
40/// widened to 32-bits, and all VALU booleans need to be s1 values.
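/// For example (an illustrative sketch, not MIR from an actual test), a
/// uniform and a divergent boolean AND end up as:
///   %ua:sgpr(s32) = G_AND %ux:sgpr(s32), %uy:sgpr(s32) ; SALU bool as s32
///   %da:vcc(s1) = G_AND %dx:vcc(s1), %dy:vcc(s1)       ; VALU bool as s1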
41///
42/// A noteworthy exception to the s1-means-vcc rule is for legalization artifact
43/// casts. G_TRUNC s1 results, and G_SEXT/G_ZEXT/G_ANYEXT sources are never vcc
44/// bank. A non-boolean source (such as a truncate from a 1-bit load from
45/// memory) will require a copy to the VCC bank which will require clearing the
46/// high bits and inserting a compare.
47///
48/// \par Constant bus restriction
49///
50/// VALU instructions have a limitation known as the constant bus
51/// restriction. Most VALU instructions can use SGPR operands, but may read at
52/// most 1 SGPR or constant literal value (this increases to 2 in gfx10 for most
53/// instructions). This is one unique SGPR, so the same SGPR may be used for
54/// multiple operands. From a register bank perspective, any combination of
55/// operands should be legal as an SGPR, but this is contextually dependent on
56/// the SGPR operands all being the same register. It is therefore optimal to
57/// choose the SGPR with the most uses to minimize the number of copies.
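/// For example (illustrative, assuming a pre-gfx10 target):
///   v_add_f32 v0, s0, s0  ; legal, only one unique SGPR is read
///   v_add_f32 v0, s0, s1  ; violates the constant bus restriction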
58///
59/// We avoid trying to solve this problem in RegBankSelect. Any VALU G_*
60/// operation should have its source operands all mapped to VGPRs (except for
61/// VCC), inserting copies from any SGPR operands. This is the most trivial legal
62/// mapping. Anything beyond the simplest 1:1 instruction selection would be too
63/// complicated to solve here. Every optimization pattern or instruction
64/// selected to multiple outputs would have to enforce this rule, and there
65/// would be additional complexity in tracking this rule for every G_*
66/// operation. By forcing all inputs to VGPRs, it also simplifies the task of
67/// picking the optimal operand combination from a post-isel optimization pass.
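/// For example (illustrative), a G_ADD with one uniform input is made regbank
/// legal by first copying that input out of the SGPR bank:
///   %t:vgpr(s32) = COPY %s:sgpr(s32)
///   %d:vgpr(s32) = G_ADD %v:vgpr(s32), %t:vgpr(s32)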
68///
69//===----------------------------------------------------------------------===//
70
72
73#include "AMDGPU.h"
75#include "AMDGPUInstrInfo.h"
76#include "GCNSubtarget.h"
78#include "SIRegisterInfo.h"
84#include "llvm/IR/IntrinsicsAMDGPU.h"
85
86#define GET_TARGET_REGBANK_IMPL
87#include "AMDGPUGenRegisterBank.inc"
88
89// This file will be TableGen'ed at some point.
90#include "AMDGPUGenRegisterBankInfo.def"
91
92using namespace llvm;
93using namespace MIPatternMatch;
94
95namespace {
96
97// Observer to apply a register bank to new registers created by LegalizerHelper.
98class ApplyRegBankMapping final : public GISelChangeObserver {
99private:
101 const AMDGPURegisterBankInfo &RBI;
103 const RegisterBank *NewBank;
105
106public:
107 ApplyRegBankMapping(MachineIRBuilder &B, const AMDGPURegisterBankInfo &RBI_,
108 MachineRegisterInfo &MRI_, const RegisterBank *RB)
109 : B(B), RBI(RBI_), MRI(MRI_), NewBank(RB) {
110 assert(!B.isObservingChanges());
111 B.setChangeObserver(*this);
112 }
113
114 ~ApplyRegBankMapping() override {
115 for (MachineInstr *MI : NewInsts)
116 applyBank(*MI);
117
118 B.stopObservingChanges();
119 }
120
121 /// Set any registers that don't have a set register class or bank to SALU.
122 void applyBank(MachineInstr &MI) {
123 const unsigned Opc = MI.getOpcode();
124 if (Opc == AMDGPU::G_ANYEXT || Opc == AMDGPU::G_ZEXT ||
125 Opc == AMDGPU::G_SEXT) {
126 // LegalizerHelper wants to use the basic legalization artifacts when
127 // widening etc. We don't handle selection with vcc in artifact sources,
128 // so we need to use a select instead to handle these properly.
129 Register DstReg = MI.getOperand(0).getReg();
130 Register SrcReg = MI.getOperand(1).getReg();
131 const RegisterBank *SrcBank = RBI.getRegBank(SrcReg, MRI, *RBI.TRI);
132 if (SrcBank == &AMDGPU::VCCRegBank) {
133 const LLT S32 = LLT::scalar(32);
134 assert(MRI.getType(SrcReg) == LLT::scalar(1));
135 assert(MRI.getType(DstReg) == S32);
136 assert(NewBank == &AMDGPU::VGPRRegBank);
137
138 // Replace the extension with a select, which really uses the boolean
139 // source.
140 B.setInsertPt(*MI.getParent(), MI);
141
142 auto True = B.buildConstant(S32, Opc == AMDGPU::G_SEXT ? -1 : 1);
143 auto False = B.buildConstant(S32, 0);
144 B.buildSelect(DstReg, SrcReg, True, False);
145 MRI.setRegBank(True.getReg(0), *NewBank);
146 MRI.setRegBank(False.getReg(0), *NewBank);
147 MI.eraseFromParent();
148 }
149
150 assert(!MRI.getRegClassOrRegBank(DstReg));
151 MRI.setRegBank(DstReg, *NewBank);
152 return;
153 }
154
155#ifndef NDEBUG
156 if (Opc == AMDGPU::G_TRUNC) {
157 Register DstReg = MI.getOperand(0).getReg();
158 const RegisterBank *DstBank = RBI.getRegBank(DstReg, MRI, *RBI.TRI);
159 assert(DstBank != &AMDGPU::VCCRegBank);
160 }
161#endif
162
163 for (MachineOperand &Op : MI.operands()) {
164 if (!Op.isReg())
165 continue;
166
167 // We may see physical registers if building a real MI
168 Register Reg = Op.getReg();
169 if (Reg.isPhysical() || MRI.getRegClassOrRegBank(Reg))
170 continue;
171
172 const RegisterBank *RB = NewBank;
173 if (MRI.getType(Reg) == LLT::scalar(1)) {
174 assert(NewBank == &AMDGPU::VGPRRegBank &&
175 "s1 operands should only be used for vector bools");
176 assert((MI.getOpcode() != AMDGPU::G_TRUNC &&
177 MI.getOpcode() != AMDGPU::G_ANYEXT) &&
178 "not expecting legalization artifacts here");
179 RB = &AMDGPU::VCCRegBank;
180 }
181
182 MRI.setRegBank(Reg, *RB);
183 }
184 }
185
186 void erasingInstr(MachineInstr &MI) override {}
187
188 void createdInstr(MachineInstr &MI) override {
189 // At this point, the instruction was just inserted and has no operands.
190 NewInsts.push_back(&MI);
191 }
192
193 void changingInstr(MachineInstr &MI) override {}
194 void changedInstr(MachineInstr &MI) override {
195 // FIXME: In principle we should probably add the instruction to NewInsts,
196 // but the way the LegalizerHelper uses the observer, we will always see the
197 // registers we need to set the regbank on also referenced in a new
198 // instruction.
199 }
200};
201
202} // anonymous namespace
203
205 : Subtarget(ST), TRI(Subtarget.getRegisterInfo()),
206 TII(Subtarget.getInstrInfo()) {
207
208 // HACK: Until this is fully tablegen'd.
209 static llvm::once_flag InitializeRegisterBankFlag;
210
211 static auto InitializeRegisterBankOnce = [this]() {
212 assert(&getRegBank(AMDGPU::SGPRRegBankID) == &AMDGPU::SGPRRegBank &&
213 &getRegBank(AMDGPU::VGPRRegBankID) == &AMDGPU::VGPRRegBank &&
214 &getRegBank(AMDGPU::AGPRRegBankID) == &AMDGPU::AGPRRegBank);
215 (void)this;
216 };
217
218 llvm::call_once(InitializeRegisterBankFlag, InitializeRegisterBankOnce);
219}
220
221static bool isVectorRegisterBank(const RegisterBank &Bank) {
222 unsigned BankID = Bank.getID();
223 return BankID == AMDGPU::VGPRRegBankID || BankID == AMDGPU::AGPRRegBankID;
224}
225
227 return RB != &AMDGPU::SGPRRegBank;
228}
229
231 const RegisterBank &Src,
232 TypeSize Size) const {
233 // TODO: Should there be a UniformVGPRRegBank which can use readfirstlane?
234 if (Dst.getID() == AMDGPU::SGPRRegBankID &&
235 (isVectorRegisterBank(Src) || Src.getID() == AMDGPU::VCCRegBankID)) {
236 return std::numeric_limits<unsigned>::max();
237 }
238
239 // Bool values are tricky, because the meaning is based on context. The SCC
240 // and VCC banks are for the natural scalar and vector conditions produced by
241 // a compare.
242 //
243 // Legalization doesn't know about the necessary context, so an s1 use may
244 // have been a truncate from an arbitrary value, in which case a copy (lowered
245 // as a compare with 0) needs to be inserted.
246 if (Size == 1 &&
247 (Dst.getID() == AMDGPU::SGPRRegBankID) &&
248 (isVectorRegisterBank(Src) ||
249 Src.getID() == AMDGPU::SGPRRegBankID ||
250 Src.getID() == AMDGPU::VCCRegBankID))
251 return std::numeric_limits<unsigned>::max();
252
253 // There is no direct copy between AGPRs.
254 if (Dst.getID() == AMDGPU::AGPRRegBankID &&
255 Src.getID() == AMDGPU::AGPRRegBankID)
256 return 4;
257
258 return RegisterBankInfo::copyCost(Dst, Src, Size);
259}
260
262 const ValueMapping &ValMapping,
263 const RegisterBank *CurBank) const {
264 // Check if this is a breakdown for G_LOAD to move the pointer from SGPR to
265 // VGPR.
266 // FIXME: Is there a better way to do this?
267 if (ValMapping.NumBreakDowns >= 2 || ValMapping.BreakDown[0].Length >= 64)
268 return 10; // This is expensive.
269
270 assert(ValMapping.NumBreakDowns == 2 &&
271 ValMapping.BreakDown[0].Length == 32 &&
272 ValMapping.BreakDown[0].StartIdx == 0 &&
273 ValMapping.BreakDown[1].Length == 32 &&
274 ValMapping.BreakDown[1].StartIdx == 32 &&
275 ValMapping.BreakDown[0].RegBank == ValMapping.BreakDown[1].RegBank);
276
277 // 32-bit extract of a 64-bit value is just access of a subregister, so free.
278 // TODO: Cost of 0 hits assert, though it's not clear it's what we really
279 // want.
280
281 // TODO: 32-bit insert to a 64-bit SGPR may incur a non-free copy due to SGPR
282 // alignment restrictions, but this probably isn't important.
283 return 1;
284}
285
286const RegisterBank &
288 LLT Ty) const {
289 if (&RC == &AMDGPU::SReg_1RegClass)
290 return AMDGPU::VCCRegBank;
291
292 // We promote real scalar booleans to SReg_32. Any SGPR using s1 is really a
293 // VCC-like use.
294 if (TRI->isSGPRClass(&RC)) {
295 // FIXME: This probably came from a copy from a physical register, which
296 // should be inferable from the copied to-type. We don't have many boolean
297 // physical register constraints so just assume a normal SGPR for now.
298 if (!Ty.isValid())
299 return AMDGPU::SGPRRegBank;
300
301 return Ty == LLT::scalar(1) ? AMDGPU::VCCRegBank : AMDGPU::SGPRRegBank;
302 }
303
304 return TRI->isAGPRClass(&RC) ? AMDGPU::AGPRRegBank : AMDGPU::VGPRRegBank;
305}
306
307template <unsigned NumOps>
310 const MachineInstr &MI, const MachineRegisterInfo &MRI,
311 const std::array<unsigned, NumOps> RegSrcOpIdx,
312 ArrayRef<OpRegBankEntry<NumOps>> Table) const {
313
314 InstructionMappings AltMappings;
315
317
318 unsigned Sizes[NumOps];
319 for (unsigned I = 0; I < NumOps; ++I) {
320 Register Reg = MI.getOperand(RegSrcOpIdx[I]).getReg();
321 Sizes[I] = getSizeInBits(Reg, MRI, *TRI);
322 }
323
324 for (unsigned I = 0, E = MI.getNumExplicitDefs(); I != E; ++I) {
325 unsigned SizeI = getSizeInBits(MI.getOperand(I).getReg(), MRI, *TRI);
326 Operands[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SizeI);
327 }
328
329 // getInstrMapping's default mapping uses ID 1, so start at 2.
330 unsigned MappingID = 2;
331 for (const auto &Entry : Table) {
332 for (unsigned I = 0; I < NumOps; ++I) {
333 int OpIdx = RegSrcOpIdx[I];
334 Operands[OpIdx] = AMDGPU::getValueMapping(Entry.RegBanks[I], Sizes[I]);
335 }
336
337 AltMappings.push_back(&getInstructionMapping(MappingID++, Entry.Cost,
339 Operands.size()));
340 }
341
342 return AltMappings;
343}
344
347 const MachineInstr &MI, const MachineRegisterInfo &MRI) const {
348 switch (cast<GIntrinsic>(MI).getIntrinsicID()) {
349 case Intrinsic::amdgcn_readlane: {
350 static const OpRegBankEntry<3> Table[2] = {
351 // Perfectly legal.
352 { { AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID }, 1 },
353
354 // Need a readfirstlane for the index.
355 { { AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 }
356 };
357
358 const std::array<unsigned, 3> RegSrcOpIdx = { { 0, 2, 3 } };
359 return addMappingFromTable<3>(MI, MRI, RegSrcOpIdx, Table);
360 }
361 case Intrinsic::amdgcn_writelane: {
362 static const OpRegBankEntry<4> Table[4] = {
363 // Perfectly legal.
364 { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 1 },
365
366 // Need readfirstlane of first op
367 { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 },
368
369 // Need readfirstlane of second op
370 { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 },
371
372 // Need readfirstlane of both ops
373 { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 3 }
374 };
375
376 // dst, value, lane select, old value
377 const std::array<unsigned, 4> RegSrcOpIdx = { { 0, 2, 3, 4 } };
378 return addMappingFromTable<4>(MI, MRI, RegSrcOpIdx, Table);
379 }
380 default:
382 }
383}
384
387 const MachineInstr &MI, const MachineRegisterInfo &MRI) const {
388
389 switch (cast<GIntrinsic>(MI).getIntrinsicID()) {
390 case Intrinsic::amdgcn_s_buffer_load: {
391 static const OpRegBankEntry<2> Table[4] = {
392 // Perfectly legal.
393 { { AMDGPU::SGPRRegBankID, AMDGPU::SGPRRegBankID }, 1 },
394
395 // Only need 1 register in loop
396 { { AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 300 },
397
398 // Have to waterfall the resource.
399 { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID }, 1000 },
400
401 // Have to waterfall the resource, and the offset.
402 { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 1500 }
403 };
404
405 // rsrc, offset
406 const std::array<unsigned, 2> RegSrcOpIdx = { { 2, 3 } };
407 return addMappingFromTable<2>(MI, MRI, RegSrcOpIdx, Table);
408 }
409 case Intrinsic::amdgcn_ds_ordered_add:
410 case Intrinsic::amdgcn_ds_ordered_swap: {
411 // VGPR = M0, VGPR
412 static const OpRegBankEntry<3> Table[2] = {
413 // Perfectly legal.
414 { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 1 },
415
416 // Need a readfirstlane for m0
417 { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 }
418 };
419
420 const std::array<unsigned, 3> RegSrcOpIdx = { { 0, 2, 3 } };
421 return addMappingFromTable<3>(MI, MRI, RegSrcOpIdx, Table);
422 }
423 case Intrinsic::amdgcn_s_sendmsg:
424 case Intrinsic::amdgcn_s_sendmsghalt: {
425 // FIXME: Should have no register for immediate
426 static const OpRegBankEntry<1> Table[2] = {
427 // Perfectly legal.
428 { { AMDGPU::SGPRRegBankID }, 1 },
429
430 // Need readlane
431 { { AMDGPU::VGPRRegBankID }, 3 }
432 };
433
434 const std::array<unsigned, 1> RegSrcOpIdx = { { 2 } };
435 return addMappingFromTable<1>(MI, MRI, RegSrcOpIdx, Table);
436 }
437 default:
439 }
440}
441
442// FIXME: Returns uniform if there's no source value information. This is
443// probably wrong.
445 if (!MI.hasOneMemOperand())
446 return false;
447
448 const MachineMemOperand *MMO = *MI.memoperands_begin();
449 const unsigned AS = MMO->getAddrSpace();
450 const bool IsConst = AS == AMDGPUAS::CONSTANT_ADDRESS ||
452 const unsigned MemSize = 8 * MMO->getSize().getValue();
453
454 // Require 4-byte alignment.
455 return (MMO->getAlign() >= Align(4) ||
457 ((MemSize == 16 && MMO->getAlign() >= Align(2)) ||
458 (MemSize == 8 && MMO->getAlign() >= Align(1))))) &&
459 // Can't do a scalar atomic load.
460 !MMO->isAtomic() &&
461 // Don't use scalar loads for volatile accesses to non-constant address
462 // spaces.
463 (IsConst || !MMO->isVolatile()) &&
464 // Memory must be known constant, or not written before this load.
465 (IsConst || MMO->isInvariant() || (MMO->getFlags() & MONoClobber)) &&
467}
468
471 const MachineInstr &MI) const {
472
473 const MachineFunction &MF = *MI.getParent()->getParent();
474 const MachineRegisterInfo &MRI = MF.getRegInfo();
475
476
477 InstructionMappings AltMappings;
478 switch (MI.getOpcode()) {
479 case TargetOpcode::G_CONSTANT:
480 case TargetOpcode::G_IMPLICIT_DEF: {
481 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
482 if (Size == 1) {
483 static const OpRegBankEntry<1> Table[3] = {
484 { { AMDGPU::VGPRRegBankID }, 1 },
485 { { AMDGPU::SGPRRegBankID }, 1 },
486 { { AMDGPU::VCCRegBankID }, 1 }
487 };
488
489 return addMappingFromTable<1>(MI, MRI, {{ 0 }}, Table);
490 }
491
492 [[fallthrough]];
493 }
494 case TargetOpcode::G_FCONSTANT:
495 case TargetOpcode::G_FRAME_INDEX:
496 case TargetOpcode::G_GLOBAL_VALUE: {
497 static const OpRegBankEntry<1> Table[2] = {
498 { { AMDGPU::VGPRRegBankID }, 1 },
499 { { AMDGPU::SGPRRegBankID }, 1 }
500 };
501
502 return addMappingFromTable<1>(MI, MRI, {{ 0 }}, Table);
503 }
504 case TargetOpcode::G_AND:
505 case TargetOpcode::G_OR:
506 case TargetOpcode::G_XOR: {
507 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
508
509 if (Size == 1) {
510 // s_{and|or|xor}_b32 set scc when the result of the 32-bit op is not 0.
511 const InstructionMapping &SCCMapping = getInstructionMapping(
512 1, 1, getOperandsMapping(
513 {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32),
514 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32),
515 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32)}),
516 3); // Num Operands
517 AltMappings.push_back(&SCCMapping);
518
519 const InstructionMapping &VCCMapping0 = getInstructionMapping(
520 2, 1, getOperandsMapping(
521 {AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size),
522 AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size),
523 AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size)}),
524 3); // Num Operands
525 AltMappings.push_back(&VCCMapping0);
526 return AltMappings;
527 }
528
529 if (Size != 64)
530 break;
531
532 const InstructionMapping &SSMapping = getInstructionMapping(
533 1, 1, getOperandsMapping(
534 {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
535 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
536 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size)}),
537 3); // Num Operands
538 AltMappings.push_back(&SSMapping);
539
540 const InstructionMapping &VVMapping = getInstructionMapping(
541 2, 2, getOperandsMapping(
542 {AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
543 AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
544 AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size)}),
545 3); // Num Operands
546 AltMappings.push_back(&VVMapping);
547 break;
548 }
549 case TargetOpcode::G_LOAD:
550 case TargetOpcode::G_ZEXTLOAD:
551 case TargetOpcode::G_SEXTLOAD: {
552 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
553 LLT PtrTy = MRI.getType(MI.getOperand(1).getReg());
554 unsigned PtrSize = PtrTy.getSizeInBits();
555 unsigned AS = PtrTy.getAddressSpace();
556
560 const InstructionMapping &SSMapping = getInstructionMapping(
561 1, 1, getOperandsMapping(
562 {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
563 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, PtrSize)}),
564 2); // Num Operands
565 AltMappings.push_back(&SSMapping);
566 }
567
568 const InstructionMapping &VVMapping = getInstructionMapping(
569 2, 1,
571 {AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
572 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, PtrSize)}),
573 2); // Num Operands
574 AltMappings.push_back(&VVMapping);
575
576 // It may be possible to have a vgpr = load sgpr mapping here, because
577 // the mubuf instructions support this kind of load, but probably for only
578 // gfx7 and older. However, the addressing mode matching in the instruction
579 // selector should be able to do a better job of detecting and selecting
580 // these kinds of loads from the vgpr = load vgpr mapping.
581
582 return AltMappings;
583
584 }
585 case TargetOpcode::G_SELECT: {
586 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
587 const InstructionMapping &SSMapping = getInstructionMapping(1, 1,
588 getOperandsMapping({AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
589 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1),
590 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
591 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size)}),
592 4); // Num Operands
593 AltMappings.push_back(&SSMapping);
594
595 const InstructionMapping &VVMapping = getInstructionMapping(2, 1,
596 getOperandsMapping({AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
597 AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1),
598 AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
599 AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size)}),
600 4); // Num Operands
601 AltMappings.push_back(&VVMapping);
602
603 return AltMappings;
604 }
605 case TargetOpcode::G_UADDE:
606 case TargetOpcode::G_USUBE:
607 case TargetOpcode::G_SADDE:
608 case TargetOpcode::G_SSUBE: {
609 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
610 const InstructionMapping &SSMapping = getInstructionMapping(1, 1,
612 {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
613 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1),
614 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
615 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
616 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1)}),
617 5); // Num Operands
618 AltMappings.push_back(&SSMapping);
619
620 const InstructionMapping &VVMapping = getInstructionMapping(2, 1,
621 getOperandsMapping({AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
622 AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1),
623 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
624 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
625 AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1)}),
626 5); // Num Operands
627 AltMappings.push_back(&VVMapping);
628 return AltMappings;
629 }
630 case AMDGPU::G_BRCOND: {
631 assert(MRI.getType(MI.getOperand(0).getReg()).getSizeInBits() == 1);
632
633 // TODO: Change type to 32 for scalar
635 1, 1, getOperandsMapping(
636 {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1), nullptr}),
637 2); // Num Operands
638 AltMappings.push_back(&SMapping);
639
641 1, 1, getOperandsMapping(
642 {AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1), nullptr }),
643 2); // Num Operands
644 AltMappings.push_back(&VMapping);
645 return AltMappings;
646 }
647 case AMDGPU::G_INTRINSIC:
648 case AMDGPU::G_INTRINSIC_CONVERGENT:
650 case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS:
651 case AMDGPU::G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS:
653 default:
654 break;
655 }
657}
658
662 LLT HalfTy,
663 Register Reg) const {
664 assert(HalfTy.getSizeInBits() == 32);
665 MachineRegisterInfo *MRI = B.getMRI();
666 Register LoLHS = MRI->createGenericVirtualRegister(HalfTy);
667 Register HiLHS = MRI->createGenericVirtualRegister(HalfTy);
668 const RegisterBank *Bank = getRegBank(Reg, *MRI, *TRI);
669 MRI->setRegBank(LoLHS, *Bank);
670 MRI->setRegBank(HiLHS, *Bank);
671
672 Regs.push_back(LoLHS);
673 Regs.push_back(HiLHS);
674
675 B.buildInstr(AMDGPU::G_UNMERGE_VALUES)
676 .addDef(LoLHS)
677 .addDef(HiLHS)
678 .addUse(Reg);
679}
680
681/// Replace the current type each register in \p Regs has with \p NewTy
683 LLT NewTy) {
684 for (Register Reg : Regs) {
685 assert(MRI.getType(Reg).getSizeInBits() == NewTy.getSizeInBits());
686 MRI.setType(Reg, NewTy);
687 }
688}
689
691 if (Ty.isVector()) {
694 Ty.getElementType());
695 }
696
697 assert(Ty.getScalarSizeInBits() % 2 == 0);
698 return LLT::scalar(Ty.getScalarSizeInBits() / 2);
699}
700
701// Build one or more V_READFIRSTLANE_B32 instructions to move the given vector
702// source value into a scalar register.
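// For example (illustrative MIR-like sketch), a 64-bit VGPR source becomes:
//   %lo:vgpr(s32), %hi:vgpr(s32) = G_UNMERGE_VALUES %src:vgpr(s64)
//   %slo:sreg_32 = V_READFIRSTLANE_B32 %lo
//   %shi:sreg_32 = V_READFIRSTLANE_B32 %hi
//   %dst:sgpr(s64) = G_MERGE_VALUES %slo, %shi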
705 Register Src) const {
706 LLT Ty = MRI.getType(Src);
707 const RegisterBank *Bank = getRegBank(Src, MRI, *TRI);
708
709 if (Bank == &AMDGPU::SGPRRegBank)
710 return Src;
711
712 unsigned Bits = Ty.getSizeInBits();
713 assert(Bits % 32 == 0);
714
715 if (Bank != &AMDGPU::VGPRRegBank) {
716 // We need to copy from AGPR to VGPR
717 Src = B.buildCopy(Ty, Src).getReg(0);
718 MRI.setRegBank(Src, AMDGPU::VGPRRegBank);
719 }
720
721 LLT S32 = LLT::scalar(32);
722 unsigned NumParts = Bits / 32;
725
726 if (Bits == 32) {
727 SrcParts.push_back(Src);
728 } else {
729 auto Unmerge = B.buildUnmerge(S32, Src);
730 for (unsigned i = 0; i < NumParts; ++i)
731 SrcParts.push_back(Unmerge.getReg(i));
732 }
733
734 for (unsigned i = 0; i < NumParts; ++i) {
735 Register SrcPart = SrcParts[i];
736 Register DstPart = MRI.createVirtualRegister(&AMDGPU::SReg_32RegClass);
737 MRI.setType(DstPart, NumParts == 1 ? Ty : S32);
738
739 const TargetRegisterClass *Constrained =
740 constrainGenericRegister(SrcPart, AMDGPU::VGPR_32RegClass, MRI);
741 (void)Constrained;
742 assert(Constrained && "Failed to constrain readfirstlane src reg");
743
744 B.buildInstr(AMDGPU::V_READFIRSTLANE_B32, {DstPart}, {SrcPart});
745
746 DstParts.push_back(DstPart);
747 }
748
749 if (Bits == 32)
750 return DstParts[0];
751
752 Register Dst = B.buildMergeLikeInstr(Ty, DstParts).getReg(0);
753 MRI.setRegBank(Dst, AMDGPU::SGPRRegBank);
754 return Dst;
755}
756
757/// Legalize instruction \p MI where operands in \p OpIndices must be SGPRs. If
758/// any of the required SGPR operands are VGPRs, perform a waterfall loop to
759/// execute the instruction for each unique combination of values in all lanes
760/// in the wave. The block will be split such that the rest of the instructions are
761/// moved to a new block.
762///
763/// Essentially performs this loop:
764///
765/// Save Execution Mask
766/// For (Lane : Wavefront) {
767/// Enable Lane, Disable all other lanes
768/// SGPR = read SGPR value for current lane from VGPR
769/// VGPRResult[Lane] = use_op SGPR
770/// }
771/// Restore Execution Mask
772///
773/// There is additional complexity from comparing the operand values across
774/// lanes to identify the unique values actually used.
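///
/// The resulting control flow is roughly (illustrative):
///   MBB:           save EXEC, fall through to LoopBB
///   LoopBB:        PHI merging the initial and updated masks,
///                  V_READFIRSTLANE_B32 of each required operand, compares,
///                  ballot, S_AND_SAVEEXEC
///   BodyBB:        the rewritten instruction(s), S_XOR_*_term of EXEC,
///                  SI_WATERFALL_LOOP back to LoopBB
///   RestoreExecBB: restore EXEC from the saved mask
///   RemainderBB:   the rest of the original block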
777 SmallSet<Register, 4> &SGPROperandRegs) const {
778 // Track use registers which have already been expanded with a readfirstlane
779 // sequence. This may have multiple uses if moving a sequence.
780 DenseMap<Register, Register> WaterfalledRegMap;
781
782 MachineBasicBlock &MBB = B.getMBB();
783 MachineFunction *MF = &B.getMF();
784
786 const unsigned MovExecOpc =
787 Subtarget.isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;
788 const unsigned MovExecTermOpc =
789 Subtarget.isWave32() ? AMDGPU::S_MOV_B32_term : AMDGPU::S_MOV_B64_term;
790
791 const unsigned XorTermOpc = Subtarget.isWave32() ?
792 AMDGPU::S_XOR_B32_term : AMDGPU::S_XOR_B64_term;
793 const unsigned AndSaveExecOpc = Subtarget.isWave32() ?
794 AMDGPU::S_AND_SAVEEXEC_B32 : AMDGPU::S_AND_SAVEEXEC_B64;
795 const unsigned ExecReg = Subtarget.isWave32() ?
796 AMDGPU::EXEC_LO : AMDGPU::EXEC;
797
798#ifndef NDEBUG
799 const int OrigRangeSize = std::distance(Range.begin(), Range.end());
800#endif
801
802 MachineRegisterInfo &MRI = *B.getMRI();
803 Register SaveExecReg = MRI.createVirtualRegister(WaveRC);
804 Register InitSaveExecReg = MRI.createVirtualRegister(WaveRC);
805
806 // Don't bother using generic instructions/registers for the exec mask.
807 B.buildInstr(TargetOpcode::IMPLICIT_DEF)
808 .addDef(InitSaveExecReg);
809
810 Register PhiExec = MRI.createVirtualRegister(WaveRC);
811 Register NewExec = MRI.createVirtualRegister(WaveRC);
812
813 // To insert the loop we need to split the block. Move everything before this
814 // point to a new block, and insert a new empty block before this instruction.
817 MachineBasicBlock *RemainderBB = MF->CreateMachineBasicBlock();
818 MachineBasicBlock *RestoreExecBB = MF->CreateMachineBasicBlock();
820 ++MBBI;
821 MF->insert(MBBI, LoopBB);
822 MF->insert(MBBI, BodyBB);
823 MF->insert(MBBI, RestoreExecBB);
824 MF->insert(MBBI, RemainderBB);
825
826 LoopBB->addSuccessor(BodyBB);
827 BodyBB->addSuccessor(RestoreExecBB);
828 BodyBB->addSuccessor(LoopBB);
829
830 // Move the rest of the block into a new block.
832 RemainderBB->splice(RemainderBB->begin(), &MBB, Range.end(), MBB.end());
833
834 MBB.addSuccessor(LoopBB);
835 RestoreExecBB->addSuccessor(RemainderBB);
836
837 B.setInsertPt(*LoopBB, LoopBB->end());
838
839 B.buildInstr(TargetOpcode::PHI)
840 .addDef(PhiExec)
841 .addReg(InitSaveExecReg)
842 .addMBB(&MBB)
843 .addReg(NewExec)
844 .addMBB(BodyBB);
845
846 const DebugLoc &DL = B.getDL();
847
848 MachineInstr &FirstInst = *Range.begin();
849
850 // Move the instruction into the loop body. Note we moved everything after
851 // Range.end() already into a new block, so Range.end() is no longer valid.
852 BodyBB->splice(BodyBB->end(), &MBB, Range.begin(), MBB.end());
853
854 // Figure out the iterator range after splicing the instructions.
855 MachineBasicBlock::iterator NewBegin = FirstInst.getIterator();
856 auto NewEnd = BodyBB->end();
857
858 B.setMBB(*LoopBB);
859
860 LLT S1 = LLT::scalar(1);
861 Register CondReg;
862
863 assert(std::distance(NewBegin, NewEnd) == OrigRangeSize);
864
865 for (MachineInstr &MI : make_range(NewBegin, NewEnd)) {
866 for (MachineOperand &Op : MI.all_uses()) {
867 Register OldReg = Op.getReg();
868 if (!SGPROperandRegs.count(OldReg))
869 continue;
870
871 // See if we already processed this register in another instruction in the
872 // sequence.
873 auto OldVal = WaterfalledRegMap.find(OldReg);
874 if (OldVal != WaterfalledRegMap.end()) {
875 Op.setReg(OldVal->second);
876 continue;
877 }
878
879 Register OpReg = Op.getReg();
880 LLT OpTy = MRI.getType(OpReg);
881
882 const RegisterBank *OpBank = getRegBank(OpReg, MRI, *TRI);
883 if (OpBank != &AMDGPU::VGPRRegBank) {
884 // Insert copy from AGPR to VGPR before the loop.
885 B.setMBB(MBB);
886 OpReg = B.buildCopy(OpTy, OpReg).getReg(0);
887 MRI.setRegBank(OpReg, AMDGPU::VGPRRegBank);
888 B.setMBB(*LoopBB);
889 }
890
891 Register CurrentLaneReg = buildReadFirstLane(B, MRI, OpReg);
892
893 // Build the comparison(s).
894 unsigned OpSize = OpTy.getSizeInBits();
895 bool Is64 = OpSize % 64 == 0;
896 unsigned PartSize = Is64 ? 64 : 32;
897 LLT PartTy = LLT::scalar(PartSize);
898 unsigned NumParts = OpSize / PartSize;
900 SmallVector<Register, 8> CurrentLaneParts;
901
902 if (NumParts == 1) {
903 OpParts.push_back(OpReg);
904 CurrentLaneParts.push_back(CurrentLaneReg);
905 } else {
906 auto UnmergeOp = B.buildUnmerge(PartTy, OpReg);
907 auto UnmergeCurrentLane = B.buildUnmerge(PartTy, CurrentLaneReg);
908 for (unsigned i = 0; i < NumParts; ++i) {
909 OpParts.push_back(UnmergeOp.getReg(i));
910 CurrentLaneParts.push_back(UnmergeCurrentLane.getReg(i));
911 MRI.setRegBank(OpParts[i], AMDGPU::VGPRRegBank);
912 MRI.setRegBank(CurrentLaneParts[i], AMDGPU::SGPRRegBank);
913 }
914 }
915
916 for (unsigned i = 0; i < NumParts; ++i) {
917 auto CmpReg = B.buildICmp(CmpInst::ICMP_EQ, S1, CurrentLaneParts[i],
918 OpParts[i]).getReg(0);
919 MRI.setRegBank(CmpReg, AMDGPU::VCCRegBank);
920
921 if (!CondReg) {
922 CondReg = CmpReg;
923 } else {
924 CondReg = B.buildAnd(S1, CondReg, CmpReg).getReg(0);
925 MRI.setRegBank(CondReg, AMDGPU::VCCRegBank);
926 }
927 }
928
929 Op.setReg(CurrentLaneReg);
930
931 // Make sure we don't re-process this register again.
932 WaterfalledRegMap.insert(std::pair(OldReg, Op.getReg()));
933 }
934 }
935
936 // The ballot becomes a no-op during instruction selection.
937 CondReg = B.buildIntrinsic(Intrinsic::amdgcn_ballot,
938 {LLT::scalar(Subtarget.isWave32() ? 32 : 64)})
939 .addReg(CondReg)
940 .getReg(0);
941 MRI.setRegClass(CondReg, WaveRC);
942
943 // Update EXEC, save the original EXEC value to VCC.
944 B.buildInstr(AndSaveExecOpc)
945 .addDef(NewExec)
946 .addReg(CondReg, RegState::Kill);
947
948 MRI.setSimpleHint(NewExec, CondReg);
949
950 B.setInsertPt(*BodyBB, BodyBB->end());
951
952 // Update EXEC, switch all done bits to 0 and all todo bits to 1.
953 B.buildInstr(XorTermOpc)
954 .addDef(ExecReg)
955 .addReg(ExecReg)
956 .addReg(NewExec);
957
958 // XXX - s_xor_b64 sets scc to 1 if the result is nonzero, so can we use
959 // s_cbranch_scc0?
960
961 // Loop back to V_READFIRSTLANE_B32 if there are still variants to cover.
962 B.buildInstr(AMDGPU::SI_WATERFALL_LOOP).addMBB(LoopBB);
963
964 // Save the EXEC mask before the loop.
965 BuildMI(MBB, MBB.end(), DL, TII->get(MovExecOpc), SaveExecReg)
966 .addReg(ExecReg);
967
968 // Restore the EXEC mask after the loop.
969 B.setMBB(*RestoreExecBB);
970 B.buildInstr(MovExecTermOpc)
971 .addDef(ExecReg)
972 .addReg(SaveExecReg);
973
974 // Set the insert point after the original instruction, so any new
975 // instructions will be in the remainder.
976 B.setInsertPt(*RemainderBB, RemainderBB->begin());
977
978 return true;
979}
980
981// Return any unique registers used by \p MI at \p OpIndices that need to be
982// handled in a waterfall loop. Returns these registers in \p
983// SGPROperandRegs. Returns true if there are any operands to handle and a
984// waterfall loop is necessary.
986 SmallSet<Register, 4> &SGPROperandRegs, MachineInstr &MI,
987 MachineRegisterInfo &MRI, ArrayRef<unsigned> OpIndices) const {
988 for (unsigned Op : OpIndices) {
989 assert(MI.getOperand(Op).isUse());
990 Register Reg = MI.getOperand(Op).getReg();
991 const RegisterBank *OpBank = getRegBank(Reg, MRI, *TRI);
992 if (OpBank->getID() != AMDGPU::SGPRRegBankID)
993 SGPROperandRegs.insert(Reg);
994 }
995
996 // No operands need to be replaced, so no need to loop.
997 return !SGPROperandRegs.empty();
998}
999
1001 MachineIRBuilder &B, MachineInstr &MI, ArrayRef<unsigned> OpIndices) const {
1002 // Use a set to avoid extra readfirstlanes in the case where multiple operands
1003 // are the same register.
1004 SmallSet<Register, 4> SGPROperandRegs;
1005
1006 if (!collectWaterfallOperands(SGPROperandRegs, MI, *B.getMRI(), OpIndices))
1007 return false;
1008
1009 MachineBasicBlock::iterator I = MI.getIterator();
1010 return executeInWaterfallLoop(B, make_range(I, std::next(I)),
1011 SGPROperandRegs);
1012}
1013
1014// Legalize an operand that must be an SGPR by inserting a readfirstlane.
1016 MachineIRBuilder &B, MachineInstr &MI, unsigned OpIdx) const {
1017 Register Reg = MI.getOperand(OpIdx).getReg();
1018 MachineRegisterInfo &MRI = *B.getMRI();
1019 const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI);
1020 if (Bank == &AMDGPU::SGPRRegBank)
1021 return;
1022
1023 Reg = buildReadFirstLane(B, MRI, Reg);
1024 MI.getOperand(OpIdx).setReg(Reg);
1025}
1026
1027/// Split \p Ty into 2 pieces. The first will have \p FirstSize bits, and the
1028/// rest will be in the remainder.
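/// For example (illustrative): splitUnequalType(<3 x s32>, 64) yields
/// {<2 x s32>, s32}, and splitUnequalType(s96, 64) yields {s64, s32}.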
1029static std::pair<LLT, LLT> splitUnequalType(LLT Ty, unsigned FirstSize) {
1030 unsigned TotalSize = Ty.getSizeInBits();
1031 if (!Ty.isVector())
1032 return {LLT::scalar(FirstSize), LLT::scalar(TotalSize - FirstSize)};
1033
1034 LLT EltTy = Ty.getElementType();
1035 unsigned EltSize = EltTy.getSizeInBits();
1036 assert(FirstSize % EltSize == 0);
1037
1038 unsigned FirstPartNumElts = FirstSize / EltSize;
1039 unsigned RemainderElts = (TotalSize - FirstSize) / EltSize;
1040
1041 return {LLT::scalarOrVector(ElementCount::getFixed(FirstPartNumElts), EltTy),
1042 LLT::scalarOrVector(ElementCount::getFixed(RemainderElts), EltTy)};
1043}
1044
1046 if (!Ty.isVector())
1047 return LLT::scalar(128);
1048
1049 LLT EltTy = Ty.getElementType();
1050 assert(128 % EltTy.getSizeInBits() == 0);
1051 return LLT::fixed_vector(128 / EltTy.getSizeInBits(), EltTy);
1052}
1053
1057 MachineInstr &MI) const {
1058 MachineRegisterInfo &MRI = *B.getMRI();
1059 Register DstReg = MI.getOperand(0).getReg();
1060 const LLT LoadTy = MRI.getType(DstReg);
1061 unsigned LoadSize = LoadTy.getSizeInBits();
1062 MachineMemOperand *MMO = *MI.memoperands_begin();
1063 const unsigned MaxNonSmrdLoadSize = 128;
1064
1065 const RegisterBank *DstBank =
1066 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
1067 if (DstBank == &AMDGPU::SGPRRegBank) {
1068 // There are some special cases that we need to look at for 32 bit and 96
1069 // bit SGPR loads; otherwise we have nothing to do.
1070 if (LoadSize != 32 && (LoadSize != 96 || Subtarget.hasScalarDwordx3Loads()))
1071 return false;
1072
1073 const unsigned MemSize = 8 * MMO->getSize().getValue();
1074 // Scalar loads of size 8 or 16 bit with proper alignment may be widened to
1075 // 32 bit. Check to see if we need to widen the memory access; 8 or 16 bit
1076 // scalar loads should have a load size of 32 but memory access size of less
1077 // than 32.
1078 if (LoadSize == 32 &&
1079 (MemSize == 32 || LoadTy.isVector() || !isScalarLoadLegal(MI)))
1080 return false;
1081
1082 if (LoadSize == 32 &&
1083 ((MemSize == 8 && MMO->getAlign() >= Align(1)) ||
1084 (MemSize == 16 && MMO->getAlign() >= Align(2))) &&
1087 return false;
1088
1089 Register PtrReg = MI.getOperand(1).getReg();
1090
1091 ApplyRegBankMapping ApplyBank(B, *this, MRI, DstBank);
1092
1093 if (LoadSize == 32) {
1094 // This is an extending load from a sub-dword size. Widen the memory
1095 // access size to 4 bytes and clear the extra high bits appropriately
1096 const LLT S32 = LLT::scalar(32);
1097 if (MI.getOpcode() == AMDGPU::G_SEXTLOAD) {
1098 // Must extend the sign bit into higher bits for a G_SEXTLOAD
1099 auto WideLoad = B.buildLoadFromOffset(S32, PtrReg, *MMO, 0);
1100 B.buildSExtInReg(MI.getOperand(0), WideLoad, MemSize);
1101 } else if (MI.getOpcode() == AMDGPU::G_ZEXTLOAD) {
1102 // Must extend zero into higher bits with an AND for a G_ZEXTLOAD
1103 auto WideLoad = B.buildLoadFromOffset(S32, PtrReg, *MMO, 0);
1104 B.buildZExtInReg(MI.getOperand(0), WideLoad, MemSize);
1105 } else
1106 // We do not need to touch the higher bits for regular loads.
1107 B.buildLoadFromOffset(MI.getOperand(0), PtrReg, *MMO, 0);
1108 } else {
1109 // 96-bit loads are only available for vector loads. We need to split this
1110 // into a 64-bit part and a 32-bit part (unless we can widen to a 128-bit load).
1111 if (MMO->getAlign() < Align(16)) {
1112 LegalizerHelper Helper(B.getMF(), ApplyBank, B);
1113 LLT Part64, Part32;
1114 std::tie(Part64, Part32) = splitUnequalType(LoadTy, 64);
1115 if (Helper.reduceLoadStoreWidth(cast<GAnyLoad>(MI), 0, Part64) !=
1117 return false;
1118 return true;
1119 }
1120 LLT WiderTy = widen96To128(LoadTy);
1121 auto WideLoad = B.buildLoadFromOffset(WiderTy, PtrReg, *MMO, 0);
1122 if (WiderTy.isScalar()) {
1123 B.buildTrunc(MI.getOperand(0), WideLoad);
1124 } else {
1125 B.buildDeleteTrailingVectorElements(MI.getOperand(0).getReg(),
1126 WideLoad);
1127 }
1128 }
1129
1130 MI.eraseFromParent();
1131 return true;
1132 }
1133
1134 // 128-bit loads are supported for all instruction types.
1135 if (LoadSize <= MaxNonSmrdLoadSize)
1136 return false;
1137
1138 SmallVector<Register, 16> DefRegs(OpdMapper.getVRegs(0));
1139 SmallVector<Register, 1> SrcRegs(OpdMapper.getVRegs(1));
1140
1141 if (SrcRegs.empty())
1142 SrcRegs.push_back(MI.getOperand(1).getReg());
1143
1144 // RegBankSelect only emits scalar types, so we need to reset the pointer
1145 // operand to a pointer type.
1146 Register BasePtrReg = SrcRegs[0];
1147 LLT PtrTy = MRI.getType(MI.getOperand(1).getReg());
1148 MRI.setType(BasePtrReg, PtrTy);
1149
1150 // The following are loads that were not split enough during legalization
1151 // because it was not clear whether they would be SMEM or VMEM loads.
1154 assert(LoadSize % MaxNonSmrdLoadSize == 0);
1155 unsigned NumSplitParts = LoadTy.getSizeInBits() / MaxNonSmrdLoadSize;
1156 const LLT LoadSplitTy = LoadTy.divide(NumSplitParts);
1157 ApplyRegBankMapping O(B, *this, MRI, &AMDGPU::VGPRRegBank);
1158 LegalizerHelper Helper(B.getMF(), O, B);
1159 if (LoadTy.isVector()) {
1160 if (Helper.fewerElementsVector(MI, 0, LoadSplitTy) !=
1162 return false;
1163 } else {
1164 if (Helper.narrowScalar(MI, 0, LoadSplitTy) != LegalizerHelper::Legalized)
1165 return false;
1166 }
1167 }
1168
1169 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
1170 return true;
1171}
1172
1176 MachineInstr &MI) const {
1177 MachineRegisterInfo &MRI = *B.getMRI();
1178 const MachineFunction &MF = B.getMF();
1179 const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
1180 const auto &TFI = *ST.getFrameLowering();
1181
1182 // Guard in case the stack growth direction ever changes with scratch
1183 // instructions.
1184 assert(TFI.getStackGrowthDirection() == TargetFrameLowering::StackGrowsUp &&
1185 "Stack grows upwards for AMDGPU");
1186
1187 Register Dst = MI.getOperand(0).getReg();
1188 Register AllocSize = MI.getOperand(1).getReg();
1189 Align Alignment = assumeAligned(MI.getOperand(2).getImm());
1190
1191 const RegisterBank *SizeBank = getRegBank(AllocSize, MRI, *TRI);
1192
1193 // TODO: Need to emit a wave reduction to get the maximum size.
1194 if (SizeBank != &AMDGPU::SGPRRegBank)
1195 return false;
1196
1197 LLT PtrTy = MRI.getType(Dst);
1198 LLT IntPtrTy = LLT::scalar(PtrTy.getSizeInBits());
1199
1201 Register SPReg = Info->getStackPtrOffsetReg();
1202 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank);
1203
1204 auto WaveSize = B.buildConstant(LLT::scalar(32), ST.getWavefrontSizeLog2());
1205 auto ScaledSize = B.buildShl(IntPtrTy, AllocSize, WaveSize);
1206
1207 auto OldSP = B.buildCopy(PtrTy, SPReg);
1208 if (Alignment > TFI.getStackAlign()) {
1209 auto StackAlignMask = (Alignment.value() << ST.getWavefrontSizeLog2()) - 1;
1210 auto Tmp1 = B.buildPtrAdd(PtrTy, OldSP,
1211 B.buildConstant(LLT::scalar(32), StackAlignMask));
1212 B.buildMaskLowPtrBits(Dst, Tmp1,
1213 Log2(Alignment) + ST.getWavefrontSizeLog2());
1214 } else {
1215 B.buildCopy(Dst, OldSP);
1216 }
1217 auto PtrAdd = B.buildPtrAdd(PtrTy, Dst, ScaledSize);
1218 B.buildCopy(SPReg, PtrAdd);
1219 MI.eraseFromParent();
1220 return true;
1221}
1222
1226 int RsrcIdx) const {
1227 const int NumDefs = MI.getNumExplicitDefs();
1228
1229 // The reported argument index is relative to the IR intrinsic call arguments,
1230 // so we need to shift by the number of defs and the intrinsic ID.
1231 RsrcIdx += NumDefs + 1;
1232
1233 // Insert copies to VGPR arguments.
1234 applyDefaultMapping(OpdMapper);
1235
1236 // Fixup any SGPR arguments.
1237 SmallVector<unsigned, 4> SGPRIndexes;
1238 for (int I = NumDefs, NumOps = MI.getNumOperands(); I != NumOps; ++I) {
1239 if (!MI.getOperand(I).isReg())
1240 continue;
1241
1242 // If this intrinsic has a sampler, it immediately follows rsrc.
1243 if (I == RsrcIdx || I == RsrcIdx + 1)
1244 SGPRIndexes.push_back(I);
1245 }
1246
1247 executeInWaterfallLoop(B, MI, SGPRIndexes);
1248 return true;
1249}
1250
1251// Analyze a combined offset from an llvm.amdgcn.s.buffer intrinsic and store
1252// the three offsets (voffset, soffset and instoffset)
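// For example (illustrative, assuming a 12-bit immediate offset field), a
// constant combined offset of 4104 may be split into soffset = 4096,
// instoffset = 8, and voffset = 0.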
1254 MachineIRBuilder &B, Register CombinedOffset, Register &VOffsetReg,
1255 Register &SOffsetReg, int64_t &InstOffsetVal, Align Alignment) const {
1256 const LLT S32 = LLT::scalar(32);
1257 MachineRegisterInfo *MRI = B.getMRI();
1258
1259 if (std::optional<int64_t> Imm =
1260 getIConstantVRegSExtVal(CombinedOffset, *MRI)) {
1261 uint32_t SOffset, ImmOffset;
1262 if (TII->splitMUBUFOffset(*Imm, SOffset, ImmOffset, Alignment)) {
1263 VOffsetReg = B.buildConstant(S32, 0).getReg(0);
1264 SOffsetReg = B.buildConstant(S32, SOffset).getReg(0);
1265 InstOffsetVal = ImmOffset;
1266
1267 B.getMRI()->setRegBank(VOffsetReg, AMDGPU::VGPRRegBank);
1268 B.getMRI()->setRegBank(SOffsetReg, AMDGPU::SGPRRegBank);
1269 return SOffset + ImmOffset;
1270 }
1271 }
1272
1273 Register Base;
1274 unsigned Offset;
1275
1276 std::tie(Base, Offset) =
1277 AMDGPU::getBaseWithConstantOffset(*MRI, CombinedOffset);
1278
1279 uint32_t SOffset, ImmOffset;
1280 if ((int)Offset > 0 &&
1281 TII->splitMUBUFOffset(Offset, SOffset, ImmOffset, Alignment)) {
1282 if (getRegBank(Base, *MRI, *TRI) == &AMDGPU::VGPRRegBank) {
1283 VOffsetReg = Base;
1284 SOffsetReg = B.buildConstant(S32, SOffset).getReg(0);
1285 B.getMRI()->setRegBank(SOffsetReg, AMDGPU::SGPRRegBank);
1286 InstOffsetVal = ImmOffset;
1287 return 0; // XXX - Why is this 0?
1288 }
1289
1290 // If we have an SGPR base, we can use it for soffset.
1291 if (SOffset == 0) {
1292 VOffsetReg = B.buildConstant(S32, 0).getReg(0);
1293 B.getMRI()->setRegBank(VOffsetReg, AMDGPU::VGPRRegBank);
1294 SOffsetReg = Base;
1295 InstOffsetVal = ImmOffset;
1296 return 0; // XXX - Why is this 0?
1297 }
1298 }
1299
1300 // Handle the variable sgpr + vgpr case.
1301 MachineInstr *Add = getOpcodeDef(AMDGPU::G_ADD, CombinedOffset, *MRI);
1302 if (Add && (int)Offset >= 0) {
1303 Register Src0 = getSrcRegIgnoringCopies(Add->getOperand(1).getReg(), *MRI);
1304 Register Src1 = getSrcRegIgnoringCopies(Add->getOperand(2).getReg(), *MRI);
1305
1306 const RegisterBank *Src0Bank = getRegBank(Src0, *MRI, *TRI);
1307 const RegisterBank *Src1Bank = getRegBank(Src1, *MRI, *TRI);
1308
1309 if (Src0Bank == &AMDGPU::VGPRRegBank && Src1Bank == &AMDGPU::SGPRRegBank) {
1310 VOffsetReg = Src0;
1311 SOffsetReg = Src1;
1312 return 0;
1313 }
1314
1315 if (Src0Bank == &AMDGPU::SGPRRegBank && Src1Bank == &AMDGPU::VGPRRegBank) {
1316 VOffsetReg = Src1;
1317 SOffsetReg = Src0;
1318 return 0;
1319 }
1320 }
1321
1322 // Ensure we have a VGPR for the combined offset. This could be an issue if we
1323 // have an SGPR offset and a VGPR resource.
1324 if (getRegBank(CombinedOffset, *MRI, *TRI) == &AMDGPU::VGPRRegBank) {
1325 VOffsetReg = CombinedOffset;
1326 } else {
1327 VOffsetReg = B.buildCopy(S32, CombinedOffset).getReg(0);
1328 B.getMRI()->setRegBank(VOffsetReg, AMDGPU::VGPRRegBank);
1329 }
1330
1331 SOffsetReg = B.buildConstant(S32, 0).getReg(0);
1332 B.getMRI()->setRegBank(SOffsetReg, AMDGPU::SGPRRegBank);
1333 return 0;
1334}
1335
1336static unsigned getSBufferLoadCorrespondingBufferLoadOpcode(unsigned Opc) {
1337 switch (Opc) {
1338 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD:
1339 return AMDGPU::G_AMDGPU_BUFFER_LOAD;
1340 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_UBYTE:
1341 return AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE;
1342 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SBYTE:
1343 return AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE;
1344 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_USHORT:
1345 return AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT;
1346 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SSHORT:
1347 return AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT;
1348 default:
1349 break;
1350 }
1351 llvm_unreachable("Unexpected s_buffer_load opcode");
1352}
1353
1355 MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
1356 MachineInstr &MI = OpdMapper.getMI();
1357 MachineRegisterInfo &MRI = OpdMapper.getMRI();
1358
1359 const LLT S32 = LLT::scalar(32);
1360 Register Dst = MI.getOperand(0).getReg();
1361 LLT Ty = MRI.getType(Dst);
1362
1363 const RegisterBank *RSrcBank =
1364 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
1365 const RegisterBank *OffsetBank =
1366 OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
1367 if (RSrcBank == &AMDGPU::SGPRRegBank &&
1368 OffsetBank == &AMDGPU::SGPRRegBank)
1369 return true; // Legal mapping
1370
1371 // FIXME: 96-bit case was widened during legalize. We need to narrow it back
1372 // here but don't have an MMO.
1373
1374 unsigned LoadSize = Ty.getSizeInBits();
1375 int NumLoads = 1;
1376 if (LoadSize == 256 || LoadSize == 512) {
1377 NumLoads = LoadSize / 128;
1378 Ty = Ty.divide(NumLoads);
1379 }
1380
1381 // Use the alignment to ensure that the required offsets will fit into the
1382 // immediate offsets.
1383 const Align Alignment = NumLoads > 1 ? Align(16 * NumLoads) : Align(1);
1384
1385 MachineFunction &MF = B.getMF();
1386
1387 Register SOffset;
1388 Register VOffset;
1389 int64_t ImmOffset = 0;
1390
1391 unsigned MMOOffset = setBufferOffsets(B, MI.getOperand(2).getReg(), VOffset,
1392 SOffset, ImmOffset, Alignment);
1393
1394 // TODO: 96-bit loads were widened to 128-bit results. Shrink the result if we
1395 // can, but we need to track an MMO for that.
1396 const unsigned MemSize = (Ty.getSizeInBits() + 7) / 8;
1397 const Align MemAlign(4); // FIXME: ABI type alignment?
1402 MemSize, MemAlign);
1403 if (MMOOffset != 0)
1404 BaseMMO = MF.getMachineMemOperand(BaseMMO, MMOOffset, MemSize);
1405
1406 // If only the offset is divergent, emit a MUBUF buffer load instead. We can
1407 // assume that the buffer is unswizzled.
1408
1409 Register RSrc = MI.getOperand(1).getReg();
1410 Register VIndex = B.buildConstant(S32, 0).getReg(0);
1411 B.getMRI()->setRegBank(VIndex, AMDGPU::VGPRRegBank);
1412
1413 SmallVector<Register, 4> LoadParts(NumLoads);
1414
1415 MachineBasicBlock::iterator MII = MI.getIterator();
1416 MachineInstrSpan Span(MII, &B.getMBB());
1417
1418 for (int i = 0; i < NumLoads; ++i) {
1419 if (NumLoads == 1) {
1420 LoadParts[i] = Dst;
1421 } else {
1422 LoadParts[i] = MRI.createGenericVirtualRegister(Ty);
1423 MRI.setRegBank(LoadParts[i], AMDGPU::VGPRRegBank);
1424 }
1425
1426 MachineMemOperand *MMO = BaseMMO;
1427 if (i != 0)
1428 BaseMMO = MF.getMachineMemOperand(BaseMMO, MMOOffset + 16 * i, MemSize);
1429
1430 B.buildInstr(getSBufferLoadCorrespondingBufferLoadOpcode(MI.getOpcode()))
1431 .addDef(LoadParts[i]) // vdata
1432 .addUse(RSrc) // rsrc
1433 .addUse(VIndex) // vindex
1434 .addUse(VOffset) // voffset
1435 .addUse(SOffset) // soffset
1436 .addImm(ImmOffset + 16 * i) // offset(imm)
1437 .addImm(0) // cachepolicy, swizzled buffer(imm)
1438 .addImm(0) // idxen(imm)
1439 .addMemOperand(MMO);
1440 }
1441
1442 // TODO: If only the resource is a VGPR, it may be better to execute the
1443 // scalar load in the waterfall loop if the resource is expected to frequently
1444 // be dynamically uniform.
1445 if (RSrcBank != &AMDGPU::SGPRRegBank) {
1446 // Remove the original instruction to avoid potentially confusing the
1447 // waterfall loop logic.
1448 B.setInstr(*Span.begin());
1449 MI.eraseFromParent();
1450
1451 SmallSet<Register, 4> OpsToWaterfall;
1452
1453 OpsToWaterfall.insert(RSrc);
1454 executeInWaterfallLoop(B, make_range(Span.begin(), Span.end()),
1455 OpsToWaterfall);
1456 }
1457
1458 if (NumLoads != 1) {
1459 if (Ty.isVector())
1460 B.buildConcatVectors(Dst, LoadParts);
1461 else
1462 B.buildMergeLikeInstr(Dst, LoadParts);
1463 }
1464
1465 // We removed the instruction earlier with a waterfall loop.
1466 if (RSrcBank == &AMDGPU::SGPRRegBank)
1467 MI.eraseFromParent();
1468
1469 return true;
1470}
1471
1473 const OperandsMapper &OpdMapper,
1474 bool Signed) const {
1475 MachineInstr &MI = OpdMapper.getMI();
1476 MachineRegisterInfo &MRI = OpdMapper.getMRI();
1477
1478 // Insert basic copies
1479 applyDefaultMapping(OpdMapper);
1480
1481 Register DstReg = MI.getOperand(0).getReg();
1482 LLT Ty = MRI.getType(DstReg);
1483
1484 const LLT S32 = LLT::scalar(32);
1485
1486 unsigned FirstOpnd = isa<GIntrinsic>(MI) ? 2 : 1;
1487 Register SrcReg = MI.getOperand(FirstOpnd).getReg();
1488 Register OffsetReg = MI.getOperand(FirstOpnd + 1).getReg();
1489 Register WidthReg = MI.getOperand(FirstOpnd + 2).getReg();
1490
1491 const RegisterBank *DstBank =
1492 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
1493 if (DstBank == &AMDGPU::VGPRRegBank) {
1494 if (Ty == S32)
1495 return true;
1496
1497 // There are no 64-bit vgpr bitfield extract instructions, so the operation
1498 // is expanded to a sequence of instructions that implement it.
1499 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank);
1500
1501 const LLT S64 = LLT::scalar(64);
1502 // Shift the source operand so that extracted bits start at bit 0.
1503 auto ShiftOffset = Signed ? B.buildAShr(S64, SrcReg, OffsetReg)
1504 : B.buildLShr(S64, SrcReg, OffsetReg);
1505 auto UnmergeSOffset = B.buildUnmerge({S32, S32}, ShiftOffset);
1506
1507 // A 64-bit bitfield extract uses the 32-bit bitfield extract instructions
1508 // if the width is a constant.
1509 if (auto ConstWidth = getIConstantVRegValWithLookThrough(WidthReg, MRI)) {
1510 // Use the 32-bit bitfield extract instruction if the width is a constant.
1511 // Depending on the width size, use either the low or high 32-bits.
1512 auto Zero = B.buildConstant(S32, 0);
1513 auto WidthImm = ConstWidth->Value.getZExtValue();
1514 if (WidthImm <= 32) {
1515 // Use bitfield extract on the lower 32-bit source, and then sign-extend
1516 // or clear the upper 32-bits.
1517 auto Extract =
1518 Signed ? B.buildSbfx(S32, UnmergeSOffset.getReg(0), Zero, WidthReg)
1519 : B.buildUbfx(S32, UnmergeSOffset.getReg(0), Zero, WidthReg);
1520 auto Extend =
1521 Signed ? B.buildAShr(S32, Extract, B.buildConstant(S32, 31)) : Zero;
1522 B.buildMergeLikeInstr(DstReg, {Extract, Extend});
1523 } else {
1524 // Use bitfield extract on upper 32-bit source, and combine with lower
1525 // 32-bit source.
1526 auto UpperWidth = B.buildConstant(S32, WidthImm - 32);
1527 auto Extract =
1528 Signed
1529 ? B.buildSbfx(S32, UnmergeSOffset.getReg(1), Zero, UpperWidth)
1530 : B.buildUbfx(S32, UnmergeSOffset.getReg(1), Zero, UpperWidth);
1531 B.buildMergeLikeInstr(DstReg, {UnmergeSOffset.getReg(0), Extract});
1532 }
1533 MI.eraseFromParent();
1534 return true;
1535 }
1536
1537 // Expand to Src >> Offset << (64 - Width) >> (64 - Width) using 64-bit
1538 // operations.
1539 auto ExtShift = B.buildSub(S32, B.buildConstant(S32, 64), WidthReg);
1540 auto SignBit = B.buildShl(S64, ShiftOffset, ExtShift);
1541 if (Signed)
1542 B.buildAShr(S64, SignBit, ExtShift);
1543 else
1544 B.buildLShr(S64, SignBit, ExtShift);
1545 MI.eraseFromParent();
1546 return true;
1547 }
1548
1549 // The scalar form packs the offset and width in a single operand.
1550
1551 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank);
1552
1553 // Ensure the high bits are clear to insert the offset.
1554 auto OffsetMask = B.buildConstant(S32, maskTrailingOnes<unsigned>(6));
1555 auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask);
1556
1557 // Zeros out the low bits, so don't bother clamping the input value.
1558 auto ShiftWidth = B.buildShl(S32, WidthReg, B.buildConstant(S32, 16));
1559
1560 // Pack the offset and width of a BFE into the format expected by
1561 // S_BFE_I32 / S_BFE_U32: in the second source operand, bits [5:0]
1562 // contain the offset and bits [22:16] the width.
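  // For example (illustrative), offset = 8 and width = 16 pack to
  // (16 << 16) | 8 = 0x00100008.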
1563 auto MergedInputs = B.buildOr(S32, ClampOffset, ShiftWidth);
1564
1565 // TODO: It might be worth using a pseudo here to avoid scc clobber and
1566 // register class constraints.
1567 unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) :
1568 (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64);
1569
1570 auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs});
1571 if (!constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this))
1572 llvm_unreachable("failed to constrain BFE");
1573
1574 MI.eraseFromParent();
1575 return true;
1576}
1577
1579 MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
1580 MachineInstr &MI = OpdMapper.getMI();
1581 MachineRegisterInfo &MRI = OpdMapper.getMRI();
1582
1583 // Insert basic copies.
1584 applyDefaultMapping(OpdMapper);
1585
1586 Register Dst0 = MI.getOperand(0).getReg();
1587 Register Dst1 = MI.getOperand(1).getReg();
1588 Register Src0 = MI.getOperand(2).getReg();
1589 Register Src1 = MI.getOperand(3).getReg();
1590 Register Src2 = MI.getOperand(4).getReg();
1591
1592 if (MRI.getRegBankOrNull(Src0) == &AMDGPU::VGPRRegBank)
1593 return true;
1594
1595 bool IsUnsigned = MI.getOpcode() == AMDGPU::G_AMDGPU_MAD_U64_U32;
1596 LLT S1 = LLT::scalar(1);
1597 LLT S32 = LLT::scalar(32);
1598
1599 bool DstOnValu = MRI.getRegBankOrNull(Src2) == &AMDGPU::VGPRRegBank;
1600 bool Accumulate = true;
1601
1602 if (!DstOnValu) {
1603 if (mi_match(Src2, MRI, m_ZeroInt()))
1604 Accumulate = false;
1605 }
1606
1607 // Keep the multiplication on the SALU.
1608 Register DstHi;
1609 Register DstLo = B.buildMul(S32, Src0, Src1).getReg(0);
1610 bool MulHiInVgpr = false;
1611
1612 MRI.setRegBank(DstLo, AMDGPU::SGPRRegBank);
1613
1614 if (Subtarget.hasSMulHi()) {
1615 DstHi = IsUnsigned ? B.buildUMulH(S32, Src0, Src1).getReg(0)
1616 : B.buildSMulH(S32, Src0, Src1).getReg(0);
1617 MRI.setRegBank(DstHi, AMDGPU::SGPRRegBank);
1618 } else {
1619 Register VSrc0 = B.buildCopy(S32, Src0).getReg(0);
1620 Register VSrc1 = B.buildCopy(S32, Src1).getReg(0);
1621
1622 MRI.setRegBank(VSrc0, AMDGPU::VGPRRegBank);
1623 MRI.setRegBank(VSrc1, AMDGPU::VGPRRegBank);
1624
1625 DstHi = IsUnsigned ? B.buildUMulH(S32, VSrc0, VSrc1).getReg(0)
1626 : B.buildSMulH(S32, VSrc0, VSrc1).getReg(0);
1627 MRI.setRegBank(DstHi, AMDGPU::VGPRRegBank);
1628
1629 if (!DstOnValu) {
1630 DstHi = buildReadFirstLane(B, MRI, DstHi);
1631 } else {
1632 MulHiInVgpr = true;
1633 }
1634 }
1635
1636 // Accumulate and produce the "carry-out" bit.
1637 //
1638 // The "carry-out" is defined as bit 64 of the result when computed as a
1639 // big integer. For unsigned multiply-add, this matches the usual definition
1640 // of carry-out. For signed multiply-add, bit 64 is the sign bit of the
1641 // result, which is determined as:
1642 // sign(Src0 * Src1) + sign(Src2) + carry-out from unsigned 64-bit add
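 // Worked example with assumed values: Src0 * Src1 = -1 (0xFFFFFFFFFFFFFFFF)
 // and Src2 = 1 give a big-integer sum of 0, so bit 64 is 0; the formula
 // agrees: sign 1 + sign 0 + unsigned carry 1 = 0 (mod 2).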
1643 LLT CarryType = DstOnValu ? S1 : S32;
1644 const RegisterBank &CarryBank =
1645 DstOnValu ? AMDGPU::VCCRegBank : AMDGPU::SGPRRegBank;
1646 const RegisterBank &DstBank =
1647 DstOnValu ? AMDGPU::VGPRRegBank : AMDGPU::SGPRRegBank;
1648 Register Carry;
1649 Register Zero;
1650
1651 if (!IsUnsigned) {
1652 Zero = B.buildConstant(S32, 0).getReg(0);
1653 MRI.setRegBank(Zero,
1654 MulHiInVgpr ? AMDGPU::VGPRRegBank : AMDGPU::SGPRRegBank);
1655
1656 Carry = B.buildICmp(CmpInst::ICMP_SLT, MulHiInVgpr ? S1 : S32, DstHi, Zero)
1657 .getReg(0);
1658 MRI.setRegBank(Carry, MulHiInVgpr ? AMDGPU::VCCRegBank
1659 : AMDGPU::SGPRRegBank);
1660
1661 if (DstOnValu && !MulHiInVgpr) {
1662 Carry = B.buildTrunc(S1, Carry).getReg(0);
1663 MRI.setRegBank(Carry, AMDGPU::VCCRegBank);
1664 }
1665 }
1666
1667 if (Accumulate) {
1668 if (DstOnValu) {
1669 DstLo = B.buildCopy(S32, DstLo).getReg(0);
1670 DstHi = B.buildCopy(S32, DstHi).getReg(0);
1671 MRI.setRegBank(DstLo, AMDGPU::VGPRRegBank);
1672 MRI.setRegBank(DstHi, AMDGPU::VGPRRegBank);
1673 }
1674
1675 auto Unmerge = B.buildUnmerge(S32, Src2);
1676 Register Src2Lo = Unmerge.getReg(0);
1677 Register Src2Hi = Unmerge.getReg(1);
1678 MRI.setRegBank(Src2Lo, DstBank);
1679 MRI.setRegBank(Src2Hi, DstBank);
1680
1681 if (!IsUnsigned) {
1682 auto Src2Sign = B.buildICmp(CmpInst::ICMP_SLT, CarryType, Src2Hi, Zero);
1683 MRI.setRegBank(Src2Sign.getReg(0), CarryBank);
1684
1685 Carry = B.buildXor(CarryType, Carry, Src2Sign).getReg(0);
1686 MRI.setRegBank(Carry, CarryBank);
1687 }
1688
1689 auto AddLo = B.buildUAddo(S32, CarryType, DstLo, Src2Lo);
1690 DstLo = AddLo.getReg(0);
1691 Register CarryLo = AddLo.getReg(1);
1692 MRI.setRegBank(DstLo, DstBank);
1693 MRI.setRegBank(CarryLo, CarryBank);
1694
1695 auto AddHi = B.buildUAdde(S32, CarryType, DstHi, Src2Hi, CarryLo);
1696 DstHi = AddHi.getReg(0);
1697 MRI.setRegBank(DstHi, DstBank);
1698
1699 Register CarryHi = AddHi.getReg(1);
1700 MRI.setRegBank(CarryHi, CarryBank);
1701
1702 if (IsUnsigned) {
1703 Carry = CarryHi;
1704 } else {
1705 Carry = B.buildXor(CarryType, Carry, CarryHi).getReg(0);
1706 MRI.setRegBank(Carry, CarryBank);
1707 }
1708 } else {
1709 if (IsUnsigned) {
1710 Carry = B.buildConstant(CarryType, 0).getReg(0);
1711 MRI.setRegBank(Carry, CarryBank);
1712 }
1713 }
1714
1715 B.buildMergeLikeInstr(Dst0, {DstLo, DstHi});
1716
1717 if (DstOnValu) {
1718 B.buildCopy(Dst1, Carry);
1719 } else {
1720 B.buildTrunc(Dst1, Carry);
1721 }
1722
1723 MI.eraseFromParent();
1724 return true;
1725}
1726
1727// Return a suitable opcode for extending the operands of Opc when widening.
1728static unsigned getExtendOp(unsigned Opc) {
1729 switch (Opc) {
1730 case TargetOpcode::G_ASHR:
1731 case TargetOpcode::G_SMIN:
1732 case TargetOpcode::G_SMAX:
1733 return TargetOpcode::G_SEXT;
1734 case TargetOpcode::G_LSHR:
1735 case TargetOpcode::G_UMIN:
1736 case TargetOpcode::G_UMAX:
1737 return TargetOpcode::G_ZEXT;
1738 default:
1739 return TargetOpcode::G_ANYEXT;
1740 }
1741}
1742
1743// Emit a legalized extension from <2 x s16> to 2 32-bit components, avoiding
1744// any illegal vector extend or unmerge operations.
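// Illustrative example (assumed bit pattern): a <2 x s16> that bitcasts to
// 0xAAAABBBB unpacks under G_SEXT to lo = 0xFFFFBBBB (sext_inreg by 16) and
// hi = 0xFFFFAAAA (arithmetic shift right by 16).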
1745static std::pair<Register, Register>
1746unpackV2S16ToS32(MachineIRBuilder &B, Register Src, unsigned ExtOpcode) {
1747 const LLT S32 = LLT::scalar(32);
1748 auto Bitcast = B.buildBitcast(S32, Src);
1749
1750 if (ExtOpcode == TargetOpcode::G_SEXT) {
1751 auto ExtLo = B.buildSExtInReg(S32, Bitcast, 16);
1752 auto ShiftHi = B.buildAShr(S32, Bitcast, B.buildConstant(S32, 16));
1753 return std::pair(ExtLo.getReg(0), ShiftHi.getReg(0));
1754 }
1755
1756 auto ShiftHi = B.buildLShr(S32, Bitcast, B.buildConstant(S32, 16));
1757 if (ExtOpcode == TargetOpcode::G_ZEXT) {
1758 auto ExtLo = B.buildAnd(S32, Bitcast, B.buildConstant(S32, 0xffff));
1759 return std::pair(ExtLo.getReg(0), ShiftHi.getReg(0));
1760 }
1761
1762 assert(ExtOpcode == TargetOpcode::G_ANYEXT);
1763 return std::pair(Bitcast.getReg(0), ShiftHi.getReg(0));
1764}
1765
1766// For cases where only a single copy is inserted for matching register banks,
1767// replace the register in the instruction operand.
1768static bool substituteSimpleCopyRegs(
1769 const AMDGPURegisterBankInfo::OperandsMapper &OpdMapper, unsigned OpIdx) {
1770 SmallVector<unsigned, 1> SrcReg(OpdMapper.getVRegs(OpIdx));
1771 if (!SrcReg.empty()) {
1772 assert(SrcReg.size() == 1);
1773 OpdMapper.getMI().getOperand(OpIdx).setReg(SrcReg[0]);
1774 return true;
1775 }
1776
1777 return false;
1778}
1779
1780/// Handle register layout difference for f16 images for some subtargets.
1781Register AMDGPURegisterBankInfo::handleD16VData(MachineIRBuilder &B,
1782 MachineRegisterInfo &MRI,
1783 Register Reg) const {
1784 if (!Subtarget.hasUnpackedD16VMem())
1785 return Reg;
1786
1787 const LLT S16 = LLT::scalar(16);
1788 LLT StoreVT = MRI.getType(Reg);
1789 if (!StoreVT.isVector() || StoreVT.getElementType() != S16)
1790 return Reg;
1791
1792 auto Unmerge = B.buildUnmerge(S16, Reg);
1793
1794
1795 SmallVector<Register, 4> WideRegs;
1796 for (int I = 0, E = Unmerge->getNumOperands() - 1; I != E; ++I)
1797 WideRegs.push_back(Unmerge.getReg(I));
1798
1799 const LLT S32 = LLT::scalar(32);
1800 int NumElts = StoreVT.getNumElements();
1801
1802 return B.buildMergeLikeInstr(LLT::fixed_vector(NumElts, S32), WideRegs)
1803 .getReg(0);
1804}
1805
1806static std::pair<Register, unsigned>
1807getBaseWithConstantOffset(MachineRegisterInfo &MRI, Register Reg) {
1808 int64_t Const;
1809 if (mi_match(Reg, MRI, m_ICst(Const)))
1810 return std::pair(Register(), Const);
1811
1812 Register Base;
1813 if (mi_match(Reg, MRI, m_GAdd(m_Reg(Base), m_ICst(Const))))
1814 return std::pair(Base, Const);
1815
1816 // TODO: Handle G_OR used for add case
1817 return std::pair(Reg, 0);
1818}
1819
1820std::pair<Register, unsigned>
1821AMDGPURegisterBankInfo::splitBufferOffsets(MachineIRBuilder &B,
1822 Register OrigOffset) const {
1823 const unsigned MaxImm = SIInstrInfo::getMaxMUBUFImmOffset(Subtarget);
1824 Register BaseReg;
1825 unsigned ImmOffset;
1826 const LLT S32 = LLT::scalar(32);
1827
1828 // TODO: Use AMDGPU::getBaseWithConstantOffset() instead.
1829 std::tie(BaseReg, ImmOffset) = getBaseWithConstantOffset(*B.getMRI(),
1830 OrigOffset);
1831
1832 unsigned C1 = 0;
1833 if (ImmOffset != 0) {
1834 // If the immediate value is too big for the immoffset field, put only bits
1835 // that would normally fit in the immoffset field. The remaining value that
1836 // is copied/added for the voffset field is a large power of 2, and it
1837 // stands more chance of being CSEd with the copy/add for another similar
1838 // load/store.
1839 // However, do not do that rounding down if the resulting voffset part
1840 // would be negative, as it appears to be illegal to have a negative offset
1841 // in the vgpr, even if adding the immediate offset makes it positive.
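 // Illustrative split (assuming MaxImm = 0xFFF on the current subtarget): an
 // offset of 0x1234 becomes ImmOffset = 0x234 with Overflow = 0x1000 folded
 // into the base/voffset.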
1842 unsigned Overflow = ImmOffset & ~MaxImm;
1843 ImmOffset -= Overflow;
1844 if ((int32_t)Overflow < 0) {
1845 Overflow += ImmOffset;
1846 ImmOffset = 0;
1847 }
1848
1849 C1 = ImmOffset;
1850 if (Overflow != 0) {
1851 if (!BaseReg)
1852 BaseReg = B.buildConstant(S32, Overflow).getReg(0);
1853 else {
1854 auto OverflowVal = B.buildConstant(S32, Overflow);
1855 BaseReg = B.buildAdd(S32, BaseReg, OverflowVal).getReg(0);
1856 }
1857 }
1858 }
1859
1860 if (!BaseReg)
1861 BaseReg = B.buildConstant(S32, 0).getReg(0);
1862
1863 return {BaseReg, C1};
1864}
1865
1866bool AMDGPURegisterBankInfo::buildVCopy(MachineIRBuilder &B, Register DstReg,
1867 Register SrcReg) const {
1868 MachineRegisterInfo &MRI = *B.getMRI();
1869 LLT SrcTy = MRI.getType(SrcReg);
1870 if (SrcTy.getSizeInBits() == 32) {
1871 // Use a v_mov_b32 here to make the exec dependency explicit.
1872 B.buildInstr(AMDGPU::V_MOV_B32_e32)
1873 .addDef(DstReg)
1874 .addUse(SrcReg);
1875 return constrainGenericRegister(DstReg, AMDGPU::VGPR_32RegClass, MRI) &&
1876 constrainGenericRegister(SrcReg, AMDGPU::SReg_32RegClass, MRI);
1877 }
1878
1879 Register TmpReg0 = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
1880 Register TmpReg1 = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
1881
1882 B.buildInstr(AMDGPU::V_MOV_B32_e32)
1883 .addDef(TmpReg0)
1884 .addUse(SrcReg, 0, AMDGPU::sub0);
1885 B.buildInstr(AMDGPU::V_MOV_B32_e32)
1886 .addDef(TmpReg1)
1887 .addUse(SrcReg, 0, AMDGPU::sub1);
1888 B.buildInstr(AMDGPU::REG_SEQUENCE)
1889 .addDef(DstReg)
1890 .addUse(TmpReg0)
1891 .addImm(AMDGPU::sub0)
1892 .addUse(TmpReg1)
1893 .addImm(AMDGPU::sub1);
1894
1895 return constrainGenericRegister(SrcReg, AMDGPU::SReg_64RegClass, MRI) &&
1896 constrainGenericRegister(DstReg, AMDGPU::VReg_64RegClass, MRI);
1897}
1898
1899/// Utility function for pushing dynamic vector indexes with a constant offset
1900/// into waterfall loops.
1901static void reinsertVectorIndexAdd(MachineIRBuilder &B,
1902 MachineInstr &IdxUseInstr,
1903 unsigned OpIdx,
1904 unsigned ConstOffset) {
1905 MachineRegisterInfo &MRI = *B.getMRI();
1906 const LLT S32 = LLT::scalar(32);
1907 Register WaterfallIdx = IdxUseInstr.getOperand(OpIdx).getReg();
1908 B.setInsertPt(*IdxUseInstr.getParent(), IdxUseInstr.getIterator());
1909
1910 auto MaterializedOffset = B.buildConstant(S32, ConstOffset);
1911
1912 auto Add = B.buildAdd(S32, WaterfallIdx, MaterializedOffset);
1913 MRI.setRegBank(MaterializedOffset.getReg(0), AMDGPU::SGPRRegBank);
1914 MRI.setRegBank(Add.getReg(0), AMDGPU::SGPRRegBank);
1915 IdxUseInstr.getOperand(OpIdx).setReg(Add.getReg(0));
1916}
1917
1918/// Implement extending a 32-bit value to a 64-bit value. \p Lo32Reg is the
1919/// original 32-bit source value (to be inserted in the low part of the combined
1920/// 64-bit result), and \p Hi32Reg is the high half of the combined 64-bit
1921/// value.
1922static void extendLow32IntoHigh32(MachineIRBuilder &B,
1923 Register Hi32Reg, Register Lo32Reg,
1924 unsigned ExtOpc,
1925 const RegisterBank &RegBank,
1926 bool IsBooleanSrc = false) {
1927 if (ExtOpc == AMDGPU::G_ZEXT) {
1928 B.buildConstant(Hi32Reg, 0);
1929 } else if (ExtOpc == AMDGPU::G_SEXT) {
1930 if (IsBooleanSrc) {
1931 // If we know the original source was an s1, the high half is the same as
1932 // the low.
1933 B.buildCopy(Hi32Reg, Lo32Reg);
1934 } else {
1935 // Replicate sign bit from 32-bit extended part.
1936 auto ShiftAmt = B.buildConstant(LLT::scalar(32), 31);
1937 B.getMRI()->setRegBank(ShiftAmt.getReg(0), RegBank);
1938 B.buildAShr(Hi32Reg, Lo32Reg, ShiftAmt);
1939 }
1940 } else {
1941 assert(ExtOpc == AMDGPU::G_ANYEXT && "not an integer extension");
1942 B.buildUndef(Hi32Reg);
1943 }
1944}
1945
1946bool AMDGPURegisterBankInfo::foldExtractEltToCmpSelect(
1947 MachineIRBuilder &B, MachineInstr &MI,
1948 const OperandsMapper &OpdMapper) const {
1949 MachineRegisterInfo &MRI = *B.getMRI();
1950
1951 Register VecReg = MI.getOperand(1).getReg();
1952 Register Idx = MI.getOperand(2).getReg();
1953
1954 const RegisterBank &IdxBank =
1955 *OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
1956
1957 bool IsDivergentIdx = IdxBank != AMDGPU::SGPRRegBank;
1958
1959 LLT VecTy = MRI.getType(VecReg);
1960 unsigned EltSize = VecTy.getScalarSizeInBits();
1961 unsigned NumElem = VecTy.getNumElements();
1962
1963 if (!SITargetLowering::shouldExpandVectorDynExt(EltSize, NumElem,
1964 IsDivergentIdx, &Subtarget))
1965 return false;
1966
1967 LLT S32 = LLT::scalar(32);
1968
1969 const RegisterBank &DstBank =
1970 *OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
1971 const RegisterBank &SrcBank =
1972 *OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
1973
1974 const RegisterBank &CCBank =
1975 (DstBank == AMDGPU::SGPRRegBank &&
1976 SrcBank == AMDGPU::SGPRRegBank &&
1977 IdxBank == AMDGPU::SGPRRegBank) ? AMDGPU::SGPRRegBank
1978 : AMDGPU::VCCRegBank;
1979 LLT CCTy = (CCBank == AMDGPU::SGPRRegBank) ? S32 : LLT::scalar(1);
1980
1981 if (CCBank == AMDGPU::VCCRegBank && IdxBank == AMDGPU::SGPRRegBank) {
1982 Idx = B.buildCopy(S32, Idx)->getOperand(0).getReg();
1983 MRI.setRegBank(Idx, AMDGPU::VGPRRegBank);
1984 }
1985
1986 LLT EltTy = VecTy.getScalarType();
1987 SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));
1988 unsigned NumLanes = DstRegs.size();
1989 if (!NumLanes)
1990 NumLanes = 1;
1991 else
1992 EltTy = MRI.getType(DstRegs[0]);
1993
1994 auto UnmergeToEltTy = B.buildUnmerge(EltTy, VecReg);
1995 SmallVector<Register, 2> Res(NumLanes);
1996 for (unsigned L = 0; L < NumLanes; ++L)
1997 Res[L] = UnmergeToEltTy.getReg(L);
1998
1999 for (unsigned I = 1; I < NumElem; ++I) {
2000 auto IC = B.buildConstant(S32, I);
2001 MRI.setRegBank(IC->getOperand(0).getReg(), AMDGPU::SGPRRegBank);
2002 auto Cmp = B.buildICmp(CmpInst::ICMP_EQ, CCTy, Idx, IC);
2003 MRI.setRegBank(Cmp->getOperand(0).getReg(), CCBank);
2004
2005 for (unsigned L = 0; L < NumLanes; ++L) {
2006 auto S = B.buildSelect(EltTy, Cmp,
2007 UnmergeToEltTy.getReg(I * NumLanes + L), Res[L]);
2008
2009 for (unsigned N : { 0, 2, 3 })
2010 MRI.setRegBank(S->getOperand(N).getReg(), DstBank);
2011
2012 Res[L] = S->getOperand(0).getReg();
2013 }
2014 }
2015
2016 for (unsigned L = 0; L < NumLanes; ++L) {
2017 Register DstReg = (NumLanes == 1) ? MI.getOperand(0).getReg() : DstRegs[L];
2018 B.buildCopy(DstReg, Res[L]);
2019 MRI.setRegBank(DstReg, DstBank);
2020 }
2021
2022 MRI.setRegBank(MI.getOperand(0).getReg(), DstBank);
2023 MI.eraseFromParent();
2024
2025 return true;
2026}
2027
2028// Insert a cross regbank copy for a register if it already has a bank that
2029// differs from the one we want to set.
2030static Register constrainRegToBank(MachineRegisterInfo &MRI,
2031 MachineIRBuilder &B, Register &Reg,
2032 const RegisterBank &Bank) {
2033 const RegisterBank *CurrBank = MRI.getRegBankOrNull(Reg);
2034 if (CurrBank && *CurrBank != Bank) {
2035 Register Copy = B.buildCopy(MRI.getType(Reg), Reg).getReg(0);
2036 MRI.setRegBank(Copy, Bank);
2037 return Copy;
2038 }
2039
2040 MRI.setRegBank(Reg, Bank);
2041 return Reg;
2042}
2043
2044bool AMDGPURegisterBankInfo::foldInsertEltToCmpSelect(
2045 MachineIRBuilder &B, MachineInstr &MI,
2046 const OperandsMapper &OpdMapper) const {
2047
2048 MachineRegisterInfo &MRI = *B.getMRI();
2049 Register VecReg = MI.getOperand(1).getReg();
2050 Register Idx = MI.getOperand(3).getReg();
2051
2052 const RegisterBank &IdxBank =
2053 *OpdMapper.getInstrMapping().getOperandMapping(3).BreakDown[0].RegBank;
2054
2055 bool IsDivergentIdx = IdxBank != AMDGPU::SGPRRegBank;
2056
2057 LLT VecTy = MRI.getType(VecReg);
2058 unsigned EltSize = VecTy.getScalarSizeInBits();
2059 unsigned NumElem = VecTy.getNumElements();
2060
2061 if (!SITargetLowering::shouldExpandVectorDynExt(EltSize, NumElem,
2062 IsDivergentIdx, &Subtarget))
2063 return false;
2064
2065 LLT S32 = LLT::scalar(32);
2066
2067 const RegisterBank &DstBank =
2068 *OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2069 const RegisterBank &SrcBank =
2070 *OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
2071 const RegisterBank &InsBank =
2072 *OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
2073
2074 const RegisterBank &CCBank =
2075 (DstBank == AMDGPU::SGPRRegBank &&
2076 SrcBank == AMDGPU::SGPRRegBank &&
2077 InsBank == AMDGPU::SGPRRegBank &&
2078 IdxBank == AMDGPU::SGPRRegBank) ? AMDGPU::SGPRRegBank
2079 : AMDGPU::VCCRegBank;
2080 LLT CCTy = (CCBank == AMDGPU::SGPRRegBank) ? S32 : LLT::scalar(1);
2081
2082 if (CCBank == AMDGPU::VCCRegBank && IdxBank == AMDGPU::SGPRRegBank) {
2083 Idx = B.buildCopy(S32, Idx)->getOperand(0).getReg();
2084 MRI.setRegBank(Idx, AMDGPU::VGPRRegBank);
2085 }
2086
2087 LLT EltTy = VecTy.getScalarType();
2088 SmallVector<Register, 2> InsRegs(OpdMapper.getVRegs(2));
2089 unsigned NumLanes = InsRegs.size();
2090 if (!NumLanes) {
2091 NumLanes = 1;
2092 InsRegs.push_back(MI.getOperand(2).getReg());
2093 } else {
2094 EltTy = MRI.getType(InsRegs[0]);
2095 }
2096
2097 auto UnmergeToEltTy = B.buildUnmerge(EltTy, VecReg);
2098 SmallVector<Register, 16> Ops(NumElem * NumLanes);
2099
2100 for (unsigned I = 0; I < NumElem; ++I) {
2101 auto IC = B.buildConstant(S32, I);
2102 MRI.setRegBank(IC->getOperand(0).getReg(), AMDGPU::SGPRRegBank);
2103 auto Cmp = B.buildICmp(CmpInst::ICMP_EQ, CCTy, Idx, IC);
2104 MRI.setRegBank(Cmp->getOperand(0).getReg(), CCBank);
2105
2106 for (unsigned L = 0; L < NumLanes; ++L) {
2107 Register Op0 = constrainRegToBank(MRI, B, InsRegs[L], DstBank);
2108 Register Op1 = UnmergeToEltTy.getReg(I * NumLanes + L);
2109 Op1 = constrainRegToBank(MRI, B, Op1, DstBank);
2110
2111 Register Select = B.buildSelect(EltTy, Cmp, Op0, Op1).getReg(0);
2112 MRI.setRegBank(Select, DstBank);
2113
2114 Ops[I * NumLanes + L] = Select;
2115 }
2116 }
2117
2118 LLT MergeTy = LLT::fixed_vector(Ops.size(), EltTy);
2119 if (MergeTy == MRI.getType(MI.getOperand(0).getReg())) {
2120 B.buildBuildVector(MI.getOperand(0), Ops);
2121 } else {
2122 auto Vec = B.buildBuildVector(MergeTy, Ops);
2123 MRI.setRegBank(Vec->getOperand(0).getReg(), DstBank);
2124 B.buildBitcast(MI.getOperand(0).getReg(), Vec);
2125 }
2126
2127 MRI.setRegBank(MI.getOperand(0).getReg(), DstBank);
2128 MI.eraseFromParent();
2129
2130 return true;
2131}
2132
2133// Break s_mul_u64 into 32-bit vector operations.
2134void AMDGPURegisterBankInfo::applyMappingSMULU64(
2135 MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
2136 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2137 SmallVector<Register, 2> Src0Regs(OpdMapper.getVRegs(1));
2138 SmallVector<Register, 2> Src1Regs(OpdMapper.getVRegs(2));
2139
2140 // All inputs are SGPRs, nothing special to do.
2141 if (DefRegs.empty()) {
2142 assert(Src0Regs.empty() && Src1Regs.empty());
2143 applyDefaultMapping(OpdMapper);
2144 return;
2145 }
2146
2147 assert(DefRegs.size() == 2);
2148 assert(Src0Regs.size() == Src1Regs.size() &&
2149 (Src0Regs.empty() || Src0Regs.size() == 2));
2150
2151 MachineRegisterInfo &MRI = OpdMapper.getMRI();
2152 MachineInstr &MI = OpdMapper.getMI();
2153 Register DstReg = MI.getOperand(0).getReg();
2154 LLT HalfTy = LLT::scalar(32);
2155
2156 // Depending on where the source registers came from, the generic code may
2157 // have decided to split the inputs already or not. If not, we still need to
2158 // extract the values.
2159
2160 if (Src0Regs.empty())
2161 split64BitValueForMapping(B, Src0Regs, HalfTy, MI.getOperand(1).getReg());
2162 else
2163 setRegsToType(MRI, Src0Regs, HalfTy);
2164
2165 if (Src1Regs.empty())
2166 split64BitValueForMapping(B, Src1Regs, HalfTy, MI.getOperand(2).getReg());
2167 else
2168 setRegsToType(MRI, Src1Regs, HalfTy);
2169
2170 setRegsToType(MRI, DefRegs, HalfTy);
2171
2172 // The multiplication is done as follows:
2173 //
2174 // Op1H Op1L
2175 // * Op0H Op0L
2176 // --------------------
2177 // Op1H*Op0L Op1L*Op0L
2178 // + Op1H*Op0H Op1L*Op0H
2179 // -----------------------------------------
2180 // (Op1H*Op0L + Op1L*Op0H + carry) Op1L*Op0L
2181 //
2182 // We drop Op1H*Op0H because it only contributes to bits above bit 63,
2183 // which are discarded in the 64-bit result.
2184 // The low 32-bit value is Op1L*Op0L.
2185 // The high 32-bit value is Op1H*Op0L + Op1L*Op0H + carry (from
2186 // Op1L*Op0L).
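 // Illustrative check with assumed operands: Op0 = 0x0000000100000002 and
 // Op1 = 0x0000000300000004 give lo = 2 * 4 = 8 and
 // hi = umulh(2, 4) + 2 * 3 + 1 * 4 = 0 + 6 + 4 = 10, i.e. 0x0000000A00000008,
 // which matches the product truncated to 64 bits.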
2187
2188 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank);
2189
2190 Register Hi = B.buildUMulH(HalfTy, Src0Regs[0], Src1Regs[0]).getReg(0);
2191 Register MulLoHi = B.buildMul(HalfTy, Src0Regs[0], Src1Regs[1]).getReg(0);
2192 Register Add = B.buildAdd(HalfTy, Hi, MulLoHi).getReg(0);
2193 Register MulHiLo = B.buildMul(HalfTy, Src0Regs[1], Src1Regs[0]).getReg(0);
2194 B.buildAdd(DefRegs[1], Add, MulHiLo);
2195 B.buildMul(DefRegs[0], Src0Regs[0], Src1Regs[0]);
2196
2197 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
2198 MI.eraseFromParent();
2199}
2200
2201void AMDGPURegisterBankInfo::applyMappingImpl(
2202 MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
2203 MachineInstr &MI = OpdMapper.getMI();
2204 B.setInstrAndDebugLoc(MI);
2205 unsigned Opc = MI.getOpcode();
2206 MachineRegisterInfo &MRI = OpdMapper.getMRI();
2207 switch (Opc) {
2208 case AMDGPU::G_CONSTANT:
2209 case AMDGPU::G_IMPLICIT_DEF: {
2210 Register DstReg = MI.getOperand(0).getReg();
2211 LLT DstTy = MRI.getType(DstReg);
2212 if (DstTy != LLT::scalar(1))
2213 break;
2214
2215 const RegisterBank *DstBank =
2216 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2217 if (DstBank == &AMDGPU::VCCRegBank)
2218 break;
2219 SmallVector<Register, 1> DefRegs(OpdMapper.getVRegs(0));
2220 if (DefRegs.empty())
2221 DefRegs.push_back(DstReg);
2222
2223 B.setInsertPt(*MI.getParent(), ++MI.getIterator());
2224
2225 Register NewDstReg = MRI.createGenericVirtualRegister(LLT::scalar(32));
2226 LLVMContext &Ctx = B.getMF().getFunction().getContext();
2227
2228 MI.getOperand(0).setReg(NewDstReg);
2229 if (Opc != AMDGPU::G_IMPLICIT_DEF) {
2230 uint64_t ConstVal = MI.getOperand(1).getCImm()->getZExtValue();
2231 MI.getOperand(1).setCImm(
2232 ConstantInt::get(IntegerType::getInt32Ty(Ctx), ConstVal));
2233 }
2234
2235 MRI.setRegBank(NewDstReg, *DstBank);
2236 B.buildTrunc(DefRegs[0], NewDstReg);
2237 return;
2238 }
2239 case AMDGPU::G_PHI: {
2240 Register DstReg = MI.getOperand(0).getReg();
2241 LLT DstTy = MRI.getType(DstReg);
2242 if (DstTy != LLT::scalar(1))
2243 break;
2244
2245 const LLT S32 = LLT::scalar(32);
2246 const RegisterBank *DstBank =
2247 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2248 if (DstBank == &AMDGPU::VCCRegBank) {
2249 applyDefaultMapping(OpdMapper);
2250 // The standard handling only considers the result register bank for
2251 // phis. For VCC, blindly inserting a copy when the phi is lowered will
2252 // produce an invalid copy. We can only copy with some kind of compare to
2253 // get a vector boolean result. Insert a register bank copy that will be
2254 // correctly lowered to a compare.
2255 for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2) {
2256 Register SrcReg = MI.getOperand(I).getReg();
2257 const RegisterBank *SrcBank = getRegBank(SrcReg, MRI, *TRI);
2258
2259 if (SrcBank != &AMDGPU::VCCRegBank) {
2260 MachineBasicBlock *SrcMBB = MI.getOperand(I + 1).getMBB();
2261 B.setInsertPt(*SrcMBB, SrcMBB->getFirstTerminator());
2262
2263 auto Copy = B.buildCopy(LLT::scalar(1), SrcReg);
2264 MRI.setRegBank(Copy.getReg(0), AMDGPU::VCCRegBank);
2265 MI.getOperand(I).setReg(Copy.getReg(0));
2266 }
2267 }
2268
2269 return;
2270 }
2271
2272 // Phi handling is strange and only considers the bank of the destination.
2273 substituteSimpleCopyRegs(OpdMapper, 0);
2274
2275 // Promote SGPR/VGPR booleans to s32
2276 ApplyRegBankMapping ApplyBank(B, *this, MRI, DstBank);
2277 B.setInsertPt(B.getMBB(), MI);
2278 LegalizerHelper Helper(B.getMF(), ApplyBank, B);
2279
2280 if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
2281 llvm_unreachable("widen scalar should have succeeded");
2282
2283 return;
2284 }
2285 case AMDGPU::G_FCMP:
2286 if (!Subtarget.hasSALUFloatInsts())
2287 break;
2288 [[fallthrough]];
2289 case AMDGPU::G_ICMP:
2290 case AMDGPU::G_UADDO:
2291 case AMDGPU::G_USUBO:
2292 case AMDGPU::G_UADDE:
2293 case AMDGPU::G_SADDE:
2294 case AMDGPU::G_USUBE:
2295 case AMDGPU::G_SSUBE: {
2296 unsigned BoolDstOp =
2297 (Opc == AMDGPU::G_ICMP || Opc == AMDGPU::G_FCMP) ? 0 : 1;
2298 Register DstReg = MI.getOperand(BoolDstOp).getReg();
2299
2300 const RegisterBank *DstBank =
2301 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2302 if (DstBank != &AMDGPU::SGPRRegBank)
2303 break;
2304
2305 const bool HasCarryIn = MI.getNumOperands() == 5;
2306
2307 // If this is a scalar compare, promote the result to s32, as the selection
2308 // will end up using a copy to a 32-bit vreg.
2309 const LLT S32 = LLT::scalar(32);
2310 Register NewDstReg = MRI.createGenericVirtualRegister(S32);
2311 MRI.setRegBank(NewDstReg, AMDGPU::SGPRRegBank);
2312 MI.getOperand(BoolDstOp).setReg(NewDstReg);
2313
2314 if (HasCarryIn) {
2315 Register NewSrcReg = MRI.createGenericVirtualRegister(S32);
2316 MRI.setRegBank(NewSrcReg, AMDGPU::SGPRRegBank);
2317 B.buildZExt(NewSrcReg, MI.getOperand(4).getReg());
2318 MI.getOperand(4).setReg(NewSrcReg);
2319 }
2320
2321 MachineBasicBlock *MBB = MI.getParent();
2322 B.setInsertPt(*MBB, std::next(MI.getIterator()));
2323
2324 // If we had a constrained VCC result register, a copy was inserted to VCC
2325 // from SGPR.
2326 SmallVector<Register, 1> DefRegs(OpdMapper.getVRegs(0));
2327 if (DefRegs.empty())
2328 DefRegs.push_back(DstReg);
2329 B.buildTrunc(DefRegs[0], NewDstReg);
2330 return;
2331 }
2332 case AMDGPU::G_SELECT: {
2333 Register DstReg = MI.getOperand(0).getReg();
2334 LLT DstTy = MRI.getType(DstReg);
2335
2336 SmallVector<Register, 1> CondRegs(OpdMapper.getVRegs(1));
2337 if (CondRegs.empty())
2338 CondRegs.push_back(MI.getOperand(1).getReg());
2339 else {
2340 assert(CondRegs.size() == 1);
2341 }
2342
2343 const RegisterBank *CondBank = getRegBank(CondRegs[0], MRI, *TRI);
2344 if (CondBank == &AMDGPU::SGPRRegBank) {
2345 const LLT S32 = LLT::scalar(32);
2346 Register NewCondReg = MRI.createGenericVirtualRegister(S32);
2347 MRI.setRegBank(NewCondReg, AMDGPU::SGPRRegBank);
2348
2349 MI.getOperand(1).setReg(NewCondReg);
2350 B.buildZExt(NewCondReg, CondRegs[0]);
2351 }
2352
2353 if (DstTy.getSizeInBits() != 64)
2354 break;
2355
2356 LLT HalfTy = getHalfSizedType(DstTy);
2357
2358 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2359 SmallVector<Register, 2> Src1Regs(OpdMapper.getVRegs(2));
2360 SmallVector<Register, 2> Src2Regs(OpdMapper.getVRegs(3));
2361
2362 // All inputs are SGPRs, nothing special to do.
2363 if (DefRegs.empty()) {
2364 assert(Src1Regs.empty() && Src2Regs.empty());
2365 break;
2366 }
2367
2368 if (Src1Regs.empty())
2369 split64BitValueForMapping(B, Src1Regs, HalfTy, MI.getOperand(2).getReg());
2370 else {
2371 setRegsToType(MRI, Src1Regs, HalfTy);
2372 }
2373
2374 if (Src2Regs.empty())
2375 split64BitValueForMapping(B, Src2Regs, HalfTy, MI.getOperand(3).getReg());
2376 else
2377 setRegsToType(MRI, Src2Regs, HalfTy);
2378
2379 setRegsToType(MRI, DefRegs, HalfTy);
2380
2381 auto Flags = MI.getFlags();
2382 B.buildSelect(DefRegs[0], CondRegs[0], Src1Regs[0], Src2Regs[0], Flags);
2383 B.buildSelect(DefRegs[1], CondRegs[0], Src1Regs[1], Src2Regs[1], Flags);
2384
2385 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
2386 MI.eraseFromParent();
2387 return;
2388 }
2389 case AMDGPU::G_BRCOND: {
2390 Register CondReg = MI.getOperand(0).getReg();
2391 // FIXME: Should use legalizer helper, but should change bool ext type.
2392 const RegisterBank *CondBank =
2393 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2394
2395 if (CondBank == &AMDGPU::SGPRRegBank) {
2396 const LLT S32 = LLT::scalar(32);
2397 Register NewCondReg = MRI.createGenericVirtualRegister(S32);
2398 MRI.setRegBank(NewCondReg, AMDGPU::SGPRRegBank);
2399
2400 MI.getOperand(0).setReg(NewCondReg);
2401 B.buildZExt(NewCondReg, CondReg);
2402 return;
2403 }
2404
2405 break;
2406 }
2407 case AMDGPU::G_AND:
2408 case AMDGPU::G_OR:
2409 case AMDGPU::G_XOR: {
2410 // 64-bit and is only available on the SALU, so split into 2 32-bit ops if
2411 // there is a VGPR input.
2412 Register DstReg = MI.getOperand(0).getReg();
2413 LLT DstTy = MRI.getType(DstReg);
2414
2415 if (DstTy.getSizeInBits() == 1) {
2416 const RegisterBank *DstBank =
2417 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2418 if (DstBank == &AMDGPU::VCCRegBank)
2419 break;
2420
2421 MachineFunction *MF = MI.getParent()->getParent();
2422 ApplyRegBankMapping ApplyBank(B, *this, MRI, DstBank);
2423 LegalizerHelper Helper(*MF, ApplyBank, B);
2424
2425 if (Helper.widenScalar(MI, 0, LLT::scalar(32)) !=
2426 LegalizerHelper::Legalized)
2427 llvm_unreachable("widen scalar should have succeeded");
2428 return;
2429 }
2430
2431 if (DstTy.getSizeInBits() != 64)
2432 break;
2433
2434 LLT HalfTy = getHalfSizedType(DstTy);
2435 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2436 SmallVector<Register, 2> Src0Regs(OpdMapper.getVRegs(1));
2437 SmallVector<Register, 2> Src1Regs(OpdMapper.getVRegs(2));
2438
2439 // All inputs are SGPRs, nothing special to do.
2440 if (DefRegs.empty()) {
2441 assert(Src0Regs.empty() && Src1Regs.empty());
2442 break;
2443 }
2444
2445 assert(DefRegs.size() == 2);
2446 assert(Src0Regs.size() == Src1Regs.size() &&
2447 (Src0Regs.empty() || Src0Regs.size() == 2));
2448
2449 // Depending on where the source registers came from, the generic code may
2450 // have decided to split the inputs already or not. If not, we still need to
2451 // extract the values.
2452
2453 if (Src0Regs.empty())
2454 split64BitValueForMapping(B, Src0Regs, HalfTy, MI.getOperand(1).getReg());
2455 else
2456 setRegsToType(MRI, Src0Regs, HalfTy);
2457
2458 if (Src1Regs.empty())
2459 split64BitValueForMapping(B, Src1Regs, HalfTy, MI.getOperand(2).getReg());
2460 else
2461 setRegsToType(MRI, Src1Regs, HalfTy);
2462
2463 setRegsToType(MRI, DefRegs, HalfTy);
2464
2465 auto Flags = MI.getFlags();
2466 B.buildInstr(Opc, {DefRegs[0]}, {Src0Regs[0], Src1Regs[0]}, Flags);
2467 B.buildInstr(Opc, {DefRegs[1]}, {Src0Regs[1], Src1Regs[1]}, Flags);
2468
2469 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
2470 MI.eraseFromParent();
2471 return;
2472 }
2473 case AMDGPU::G_ABS: {
2474 Register SrcReg = MI.getOperand(1).getReg();
2475 const RegisterBank *SrcBank = MRI.getRegBankOrNull(SrcReg);
2476
2477 // There is no VALU abs instruction so we need to replace it with a sub and
2478 // max combination.
2479 if (SrcBank && SrcBank == &AMDGPU::VGPRRegBank) {
2480 MachineFunction *MF = MI.getParent()->getParent();
2481 ApplyRegBankMapping Apply(B, *this, MRI, &AMDGPU::VGPRRegBank);
2482 LegalizerHelper Helper(*MF, Apply, B);
2483
2484 if (Helper.lowerAbsToMaxNeg(MI) != LegalizerHelper::Legalized)
2485 llvm_unreachable("lowerAbsToMaxNeg should have succeeded");
2486 return;
2487 }
2488 [[fallthrough]];
2489 }
2490 case AMDGPU::G_ADD:
2491 case AMDGPU::G_SUB:
2492 case AMDGPU::G_MUL:
2493 case AMDGPU::G_SHL:
2494 case AMDGPU::G_LSHR:
2495 case AMDGPU::G_ASHR:
2496 case AMDGPU::G_SMIN:
2497 case AMDGPU::G_SMAX:
2498 case AMDGPU::G_UMIN:
2499 case AMDGPU::G_UMAX: {
2500 Register DstReg = MI.getOperand(0).getReg();
2501 LLT DstTy = MRI.getType(DstReg);
2502
2503 // Special case for s_mul_u64. There is no vector equivalent of
2504 // s_mul_u64. Hence, we have to break down s_mul_u64 into 32-bit vector
2505 // multiplications.
2506 if (Opc == AMDGPU::G_MUL && DstTy.getSizeInBits() == 64) {
2507 applyMappingSMULU64(B, OpdMapper);
2508 return;
2509 }
2510
2511 // 16-bit operations are VALU only, but can be promoted to 32-bit SALU.
2512 // Packed 16-bit operations need to be scalarized and promoted.
2513 if (DstTy != LLT::scalar(16) && DstTy != LLT::fixed_vector(2, 16))
2514 break;
2515
2516 const RegisterBank *DstBank =
2517 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2518 if (DstBank == &AMDGPU::VGPRRegBank)
2519 break;
2520
2521 const LLT S32 = LLT::scalar(32);
2522 MachineBasicBlock *MBB = MI.getParent();
2523 MachineFunction *MF = MBB->getParent();
2524 ApplyRegBankMapping ApplySALU(B, *this, MRI, &AMDGPU::SGPRRegBank);
2525
2526 if (DstTy.isVector() && Opc == AMDGPU::G_ABS) {
2527 Register WideSrcLo, WideSrcHi;
2528
2529 std::tie(WideSrcLo, WideSrcHi) =
2530 unpackV2S16ToS32(B, MI.getOperand(1).getReg(), TargetOpcode::G_SEXT);
2531 auto Lo = B.buildInstr(AMDGPU::G_ABS, {S32}, {WideSrcLo});
2532 auto Hi = B.buildInstr(AMDGPU::G_ABS, {S32}, {WideSrcHi});
2533 B.buildBuildVectorTrunc(DstReg, {Lo.getReg(0), Hi.getReg(0)});
2534 MI.eraseFromParent();
2535 return;
2536 }
2537
2538 if (DstTy.isVector()) {
2539 Register WideSrc0Lo, WideSrc0Hi;
2540 Register WideSrc1Lo, WideSrc1Hi;
2541
2542 unsigned ExtendOp = getExtendOp(MI.getOpcode());
2543 std::tie(WideSrc0Lo, WideSrc0Hi)
2544 = unpackV2S16ToS32(B, MI.getOperand(1).getReg(), ExtendOp);
2545 std::tie(WideSrc1Lo, WideSrc1Hi)
2546 = unpackV2S16ToS32(B, MI.getOperand(2).getReg(), ExtendOp);
2547 auto Lo = B.buildInstr(MI.getOpcode(), {S32}, {WideSrc0Lo, WideSrc1Lo});
2548 auto Hi = B.buildInstr(MI.getOpcode(), {S32}, {WideSrc0Hi, WideSrc1Hi});
2549 B.buildBuildVectorTrunc(DstReg, {Lo.getReg(0), Hi.getReg(0)});
2550 MI.eraseFromParent();
2551 } else {
2552 LegalizerHelper Helper(*MF, ApplySALU, B);
2553
2554 if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
2555 llvm_unreachable("widen scalar should have succeeded");
2556
2557 // FIXME: s16 shift amounts should be legal.
2558 if (Opc == AMDGPU::G_SHL || Opc == AMDGPU::G_LSHR ||
2559 Opc == AMDGPU::G_ASHR) {
2560 B.setInsertPt(*MBB, MI.getIterator());
2561 if (Helper.widenScalar(MI, 1, S32) != LegalizerHelper::Legalized)
2562 llvm_unreachable("widen scalar should have succeeded");
2563 }
2564 }
2565
2566 return;
2567 }
2568 case AMDGPU::G_AMDGPU_S_MUL_I64_I32:
2569 case AMDGPU::G_AMDGPU_S_MUL_U64_U32: {
2570 // This is a special case for s_mul_u64. We use
2571 // G_AMDGPU_S_MUL_I64_I32 opcode to represent an s_mul_u64 operation
2572 // where the 33 higher bits are sign-extended and
2573 // G_AMDGPU_S_MUL_U64_U32 opcode to represent an s_mul_u64 operation
2574 // where the 32 higher bits are zero-extended. If scalar registers are
2575 // selected, both opcodes are lowered as s_mul_u64. If vector registers
2576 // are selected, then G_AMDGPU_S_MUL_I64_I32 and
2577 // G_AMDGPU_S_MUL_U64_U32 are lowered with a vector mad instruction.
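 // For illustration (assumed inputs): with 32-bit values a and b implicitly
 // zero-extended to 64 bits, G_AMDGPU_S_MUL_U64_U32 computes the full 64-bit
 // product a * b, which the VGPR path below forms as mad_u64_u32(a, b, 0).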
2578
2579 // Insert basic copies.
2580 applyDefaultMapping(OpdMapper);
2581
2582 Register DstReg = MI.getOperand(0).getReg();
2583 Register SrcReg0 = MI.getOperand(1).getReg();
2584 Register SrcReg1 = MI.getOperand(2).getReg();
2585 const LLT S32 = LLT::scalar(32);
2586 const LLT S64 = LLT::scalar(64);
2587 assert(MRI.getType(DstReg) == S64 && "This is a special case for s_mul_u64 "
2588 "that handles only 64-bit operands.");
2589 const RegisterBank *DstBank =
2590 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2591
2592 // Replace G_AMDGPU_S_MUL_I64_I32 and G_AMDGPU_S_MUL_U64_U32
2593 // with s_mul_u64 operation.
2594 if (DstBank == &AMDGPU::SGPRRegBank) {
2595 MI.setDesc(TII->get(AMDGPU::S_MUL_U64));
2596 MRI.setRegClass(DstReg, &AMDGPU::SGPR_64RegClass);
2597 MRI.setRegClass(SrcReg0, &AMDGPU::SGPR_64RegClass);
2598 MRI.setRegClass(SrcReg1, &AMDGPU::SGPR_64RegClass);
2599 return;
2600 }
2601
2602 // Replace G_AMDGPU_S_MUL_I64_I32 and G_AMDGPU_S_MUL_U64_U32
2603 // with a vector mad.
2604 assert(MRI.getRegBankOrNull(DstReg) == &AMDGPU::VGPRRegBank &&
2605 "The destination operand should be in vector registers.");
2606
2607 DebugLoc DL = MI.getDebugLoc();
2608
2609 // Extract the lower subregister from the first operand.
2610 Register Op0L = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
2611 MRI.setRegClass(Op0L, &AMDGPU::VGPR_32RegClass);
2612 MRI.setType(Op0L, S32);
2613 B.buildTrunc(Op0L, SrcReg0);
2614
2615 // Extract the lower subregister from the second operand.
2616 Register Op1L = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
2617 MRI.setRegClass(Op1L, &AMDGPU::VGPR_32RegClass);
2618 MRI.setType(Op1L, S32);
2619 B.buildTrunc(Op1L, SrcReg1);
2620
2621 unsigned NewOpc = Opc == AMDGPU::G_AMDGPU_S_MUL_U64_U32
2622 ? AMDGPU::G_AMDGPU_MAD_U64_U32
2623 : AMDGPU::G_AMDGPU_MAD_I64_I32;
2624
2626 Register Zero64 = B.buildConstant(S64, 0).getReg(0);
2627 MRI.setRegClass(Zero64, &AMDGPU::VReg_64RegClass);
2628 Register CarryOut = MRI.createVirtualRegister(&AMDGPU::VReg_64RegClass);
2629 MRI.setRegClass(CarryOut, &AMDGPU::VReg_64RegClass);
2630 B.buildInstr(NewOpc, {DstReg, CarryOut}, {Op0L, Op1L, Zero64});
2631 MI.eraseFromParent();
2632 return;
2633 }
2634 case AMDGPU::G_SEXT_INREG: {
2635 SmallVector<Register, 2> SrcRegs(OpdMapper.getVRegs(1));
2636 if (SrcRegs.empty())
2637 break; // Nothing to repair
2638
2639 const LLT S32 = LLT::scalar(32);
2640 ApplyRegBankMapping O(B, *this, MRI, &AMDGPU::VGPRRegBank);
2641
2642 // Don't use LegalizerHelper's narrowScalar. It produces unwanted G_SEXTs
2643 // we would need to further expand, and doesn't let us directly set the
2644 // result registers.
2645 SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));
2646
2647 int Amt = MI.getOperand(2).getImm();
2648 if (Amt <= 32) {
2649 // Downstream users have expectations for the high bit behavior, so freeze
2650 // incoming undefined bits.
2651 if (Amt == 32) {
2652 // The low bits are unchanged.
2653 B.buildFreeze(DstRegs[0], SrcRegs[0]);
2654 } else {
2655 auto Freeze = B.buildFreeze(S32, SrcRegs[0]);
2656 // Extend in the low bits and propagate the sign bit to the high half.
2657 B.buildSExtInReg(DstRegs[0], Freeze, Amt);
2658 }
2659
2660 B.buildAShr(DstRegs[1], DstRegs[0], B.buildConstant(S32, 31));
2661 } else {
2662 // The low bits are unchanged, and extend in the high bits.
2663 // No freeze required
2664 B.buildCopy(DstRegs[0], SrcRegs[0]);
2665 B.buildSExtInReg(DstRegs[1], DstRegs[0], Amt - 32);
2666 }
2667
2668 Register DstReg = MI.getOperand(0).getReg();
2669 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
2670 MI.eraseFromParent();
2671 return;
2672 }
2673 case AMDGPU::G_CTPOP:
2674 case AMDGPU::G_BITREVERSE: {
2675 const RegisterBank *DstBank =
2676 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2677 if (DstBank == &AMDGPU::SGPRRegBank)
2678 break;
2679
2680 Register SrcReg = MI.getOperand(1).getReg();
2681 const LLT S32 = LLT::scalar(32);
2682 LLT Ty = MRI.getType(SrcReg);
2683 if (Ty == S32)
2684 break;
2685
2686 ApplyRegBankMapping ApplyVALU(B, *this, MRI, &AMDGPU::VGPRRegBank);
2687
2688 MachineFunction &MF = B.getMF();
2689 LegalizerHelper Helper(MF, ApplyVALU, B);
2690
2691 if (Helper.narrowScalar(MI, 1, S32) != LegalizerHelper::Legalized)
2692 llvm_unreachable("narrowScalar should have succeeded");
2693 return;
2694 }
2695 case AMDGPU::G_AMDGPU_FFBH_U32:
2696 case AMDGPU::G_AMDGPU_FFBL_B32:
2697 case AMDGPU::G_CTLZ_ZERO_UNDEF:
2698 case AMDGPU::G_CTTZ_ZERO_UNDEF: {
2699 const RegisterBank *DstBank =
2700 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2701 if (DstBank == &AMDGPU::SGPRRegBank)
2702 break;
2703
2704 Register SrcReg = MI.getOperand(1).getReg();
2705 const LLT S32 = LLT::scalar(32);
2706 LLT Ty = MRI.getType(SrcReg);
2707 if (Ty == S32)
2708 break;
2709
2710 // We can narrow this more efficiently than Helper can by using ffbh/ffbl
2711 // which return -1 when the input is zero:
2712 // (ctlz_zero_undef hi:lo) -> (umin (ffbh hi), (add (ffbh lo), 32))
2713 // (cttz_zero_undef hi:lo) -> (umin (add (ffbl hi), 32), (ffbl lo))
2714 // (ffbh hi:lo) -> (umin (ffbh hi), (uaddsat (ffbh lo), 32))
2715 // (ffbl hi:lo) -> (umin (uaddsat (ffbl hi), 32), (ffbl lo))
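 // E.g. (illustrative input) for ctlz_zero_undef with hi = 0 and lo != 0:
 // ffbh(hi) returns -1 (0xFFFFFFFF), so the umin correctly selects
 // ffbh(lo) + 32 as the 64-bit count.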
2716 ApplyRegBankMapping ApplyVALU(B, *this, MRI, &AMDGPU::VGPRRegBank);
2717 SmallVector<Register, 2> SrcRegs(OpdMapper.getVRegs(1));
2718 unsigned NewOpc = Opc == AMDGPU::G_CTLZ_ZERO_UNDEF
2719 ? (unsigned)AMDGPU::G_AMDGPU_FFBH_U32
2720 : Opc == AMDGPU::G_CTTZ_ZERO_UNDEF
2721 ? (unsigned)AMDGPU::G_AMDGPU_FFBL_B32
2722 : Opc;
2723 unsigned Idx = NewOpc == AMDGPU::G_AMDGPU_FFBH_U32;
2724 auto X = B.buildInstr(NewOpc, {S32}, {SrcRegs[Idx]});
2725 auto Y = B.buildInstr(NewOpc, {S32}, {SrcRegs[Idx ^ 1]});
2726 unsigned AddOpc =
2727 Opc == AMDGPU::G_CTLZ_ZERO_UNDEF || Opc == AMDGPU::G_CTTZ_ZERO_UNDEF
2728 ? AMDGPU::G_ADD
2729 : AMDGPU::G_UADDSAT;
2730 Y = B.buildInstr(AddOpc, {S32}, {Y, B.buildConstant(S32, 32)});
2731 Register DstReg = MI.getOperand(0).getReg();
2732 B.buildUMin(DstReg, X, Y);
2733 MI.eraseFromParent();
2734 return;
2735 }
2736 case AMDGPU::G_SEXT:
2737 case AMDGPU::G_ZEXT:
2738 case AMDGPU::G_ANYEXT: {
2739 Register SrcReg = MI.getOperand(1).getReg();
2740 LLT SrcTy = MRI.getType(SrcReg);
2741 const bool Signed = Opc == AMDGPU::G_SEXT;
2742
2743 assert(OpdMapper.getVRegs(1).empty());
2744
2745 const RegisterBank *SrcBank =
2746 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
2747
2748 Register DstReg = MI.getOperand(0).getReg();
2749 LLT DstTy = MRI.getType(DstReg);
2750 if (DstTy.isScalar() &&
2751 SrcBank != &AMDGPU::SGPRRegBank &&
2752 SrcBank != &AMDGPU::VCCRegBank &&
2753 // FIXME: Should handle any type that round to s64 when irregular
2754 // breakdowns supported.
2755 DstTy.getSizeInBits() == 64 &&
2756 SrcTy.getSizeInBits() <= 32) {
2757 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2758
2759 // Extend to 32-bit, and then extend the low half.
2760 if (Signed) {
2761 // TODO: Should really be buildSExtOrCopy
2762 B.buildSExtOrTrunc(DefRegs[0], SrcReg);
2763 } else if (Opc == AMDGPU::G_ZEXT) {
2764 B.buildZExtOrTrunc(DefRegs[0], SrcReg);
2765 } else {
2766 B.buildAnyExtOrTrunc(DefRegs[0], SrcReg);
2767 }
2768
2769 extendLow32IntoHigh32(B, DefRegs[1], DefRegs[0], Opc, *SrcBank);
2770 MRI.setRegBank(DstReg, *SrcBank);
2771 MI.eraseFromParent();
2772 return;
2773 }
2774
2775 if (SrcTy != LLT::scalar(1))
2776 return;
2777
2778 // It is not legal to have a legalization artifact with a VCC source. Rather
2779 // than introducing a copy, insert the select we would have to select the
2780 // copy to.
2781 if (SrcBank == &AMDGPU::VCCRegBank) {
2782 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2783
2784 const RegisterBank *DstBank = &AMDGPU::VGPRRegBank;
2785
2786 unsigned DstSize = DstTy.getSizeInBits();
2787 // 64-bit select is SGPR only
2788 const bool UseSel64 = DstSize > 32 &&
2789 SrcBank->getID() == AMDGPU::SGPRRegBankID;
2790
2791 // TODO: Should s16 select be legal?
2792 LLT SelType = UseSel64 ? LLT::scalar(64) : LLT::scalar(32);
2793 auto True = B.buildConstant(SelType, Signed ? -1 : 1);
2794 auto False = B.buildConstant(SelType, 0);
2795
2796 MRI.setRegBank(True.getReg(0), *DstBank);
2797 MRI.setRegBank(False.getReg(0), *DstBank);
2798 MRI.setRegBank(DstReg, *DstBank);
2799
2800 if (DstSize > 32) {
2801 B.buildSelect(DefRegs[0], SrcReg, True, False);
2802 extendLow32IntoHigh32(B, DefRegs[1], DefRegs[0], Opc, *SrcBank, true);
2803 } else if (DstSize < 32) {
2804 auto Sel = B.buildSelect(SelType, SrcReg, True, False);
2805 MRI.setRegBank(Sel.getReg(0), *DstBank);
2806 B.buildTrunc(DstReg, Sel);
2807 } else {
2808 B.buildSelect(DstReg, SrcReg, True, False);
2809 }
2810
2811 MI.eraseFromParent();
2812 return;
2813 }
2814
2815 break;
2816 }
2817 case AMDGPU::G_EXTRACT_VECTOR_ELT: {
2818 SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));
2819
2820 assert(OpdMapper.getVRegs(1).empty() && OpdMapper.getVRegs(2).empty());
2821
2822 Register DstReg = MI.getOperand(0).getReg();
2823 Register SrcReg = MI.getOperand(1).getReg();
2824
2825 const LLT S32 = LLT::scalar(32);
2826 LLT DstTy = MRI.getType(DstReg);
2827 LLT SrcTy = MRI.getType(SrcReg);
2828
2829 if (foldExtractEltToCmpSelect(B, MI, OpdMapper))
2830 return;
2831
2832 const ValueMapping &DstMapping
2833 = OpdMapper.getInstrMapping().getOperandMapping(0);
2834 const RegisterBank *DstBank = DstMapping.BreakDown[0].RegBank;
2835 const RegisterBank *SrcBank =
2836 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
2837 const RegisterBank *IdxBank =
2838 OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
2839
2840 Register BaseIdxReg;
2841 unsigned ConstOffset;
2842 std::tie(BaseIdxReg, ConstOffset) =
2843 AMDGPU::getBaseWithConstantOffset(MRI, MI.getOperand(2).getReg());
2844
2845 // See if the index is an add of a constant which will be foldable by moving
2846 // the base register of the index later if this is going to be executed in a
2847 // waterfall loop. This is essentially to reassociate the add of a constant
2848 // with the readfirstlane.
2849 bool ShouldMoveIndexIntoLoop = IdxBank != &AMDGPU::SGPRRegBank &&
2850 ConstOffset > 0 &&
2851 ConstOffset < SrcTy.getNumElements();
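 // E.g. (assumed shapes, divergent index): an index of (BaseIdx + 2) into a
 // 4-element vector keeps only BaseIdx as the operand readfirstlane'd by the
 // waterfall loop; the + 2 is re-inserted inside the loop afterwards.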
2852
2853 // Move the base register. We'll re-insert the add later.
2854 if (ShouldMoveIndexIntoLoop)
2855 MI.getOperand(2).setReg(BaseIdxReg);
2856
2857 // If this is a VGPR result only because the index was a VGPR result, the
2858 // actual indexing will be done on the SGPR source vector, which will
2859 // produce a scalar result. We need to copy to the VGPR result inside the
2860 // waterfall loop.
2861 const bool NeedCopyToVGPR = DstBank == &AMDGPU::VGPRRegBank &&
2862 SrcBank == &AMDGPU::SGPRRegBank;
2863 if (DstRegs.empty()) {
2864 applyDefaultMapping(OpdMapper);
2865
2866 executeInWaterfallLoop(B, MI, {2});
2867
2868 if (NeedCopyToVGPR) {
2869 // We don't want a phi for this temporary reg.
2870 Register TmpReg = MRI.createGenericVirtualRegister(DstTy);
2871 MRI.setRegBank(TmpReg, AMDGPU::SGPRRegBank);
2872 MI.getOperand(0).setReg(TmpReg);
2873 B.setInsertPt(*MI.getParent(), ++MI.getIterator());
2874
2875 // Use a v_mov_b32 here to make the exec dependency explicit.
2876 buildVCopy(B, DstReg, TmpReg);
2877 }
2878
2879 // Re-insert the constant offset add inside the waterfall loop.
2880 if (ShouldMoveIndexIntoLoop)
2881 reinsertVectorIndexAdd(B, MI, 2, ConstOffset);
2882
2883 return;
2884 }
2885
2886 assert(DstTy.getSizeInBits() == 64);
2887
2888 LLT Vec32 = LLT::fixed_vector(2 * SrcTy.getNumElements(), 32);
2889
2890 auto CastSrc = B.buildBitcast(Vec32, SrcReg);
2891 auto One = B.buildConstant(S32, 1);
2892
2893 MachineBasicBlock::iterator MII = MI.getIterator();
2894
2895 // Split the vector index into 32-bit pieces. Prepare to move all of the
2896 // new instructions into a waterfall loop if necessary.
2897 //
2898 // Don't put the bitcast or constant in the loop.
2899 MachineInstrSpan Span(MII, &B.getMBB());
2900
2901 // Compute 32-bit element indices, (2 * OrigIdx, 2 * OrigIdx + 1).
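 // E.g. (illustrative index): OrigIdx = 3 selects 32-bit elements 6 and 7 of
 // the bitcast vector, i.e. the low and high halves of 64-bit element 3.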
2902 auto IdxLo = B.buildShl(S32, BaseIdxReg, One);
2903 auto IdxHi = B.buildAdd(S32, IdxLo, One);
2904
2905 auto Extract0 = B.buildExtractVectorElement(DstRegs[0], CastSrc, IdxLo);
2906 auto Extract1 = B.buildExtractVectorElement(DstRegs[1], CastSrc, IdxHi);
2907
2908 MRI.setRegBank(DstReg, *DstBank);
2909 MRI.setRegBank(CastSrc.getReg(0), *SrcBank);
2910 MRI.setRegBank(One.getReg(0), AMDGPU::SGPRRegBank);
2911 MRI.setRegBank(IdxLo.getReg(0), AMDGPU::SGPRRegBank);
2912 MRI.setRegBank(IdxHi.getReg(0), AMDGPU::SGPRRegBank);
2913
2914 SmallSet<Register, 4> OpsToWaterfall;
2915 if (!collectWaterfallOperands(OpsToWaterfall, MI, MRI, { 2 })) {
2916 MI.eraseFromParent();
2917 return;
2918 }
2919
2920 // Remove the original instruction to avoid potentially confusing the
2921 // waterfall loop logic.
2922 B.setInstr(*Span.begin());
2923 MI.eraseFromParent();
2924 executeInWaterfallLoop(B, make_range(Span.begin(), Span.end()),
2925 OpsToWaterfall);
2926
2927 if (NeedCopyToVGPR) {
2928 MachineBasicBlock *LoopBB = Extract1->getParent();
2929 Register TmpReg0 = MRI.createGenericVirtualRegister(S32);
2930 Register TmpReg1 = MRI.createGenericVirtualRegister(S32);
2931 MRI.setRegBank(TmpReg0, AMDGPU::SGPRRegBank);
2932 MRI.setRegBank(TmpReg1, AMDGPU::SGPRRegBank);
2933
2934 Extract0->getOperand(0).setReg(TmpReg0);
2935 Extract1->getOperand(0).setReg(TmpReg1);
2936
2937 B.setInsertPt(*LoopBB, ++Extract1->getIterator());
2938
2939 buildVCopy(B, DstRegs[0], TmpReg0);
2940 buildVCopy(B, DstRegs[1], TmpReg1);
2941 }
2942
2943 if (ShouldMoveIndexIntoLoop)
2944 reinsertVectorIndexAdd(B, *IdxLo, 1, ConstOffset);
2945
2946 return;
2947 }
2948 case AMDGPU::G_INSERT_VECTOR_ELT: {
2949 SmallVector<Register, 2> InsRegs(OpdMapper.getVRegs(2));
2950
2951 Register DstReg = MI.getOperand(0).getReg();
2952 LLT VecTy = MRI.getType(DstReg);
2953
2954 assert(OpdMapper.getVRegs(0).empty());
2955 assert(OpdMapper.getVRegs(3).empty());
2956
2957 if (substituteSimpleCopyRegs(OpdMapper, 1))
2958 MRI.setType(MI.getOperand(1).getReg(), VecTy);
2959
2960 if (foldInsertEltToCmpSelect(B, MI, OpdMapper))
2961 return;
2962
2963 const RegisterBank *IdxBank =
2964 OpdMapper.getInstrMapping().getOperandMapping(3).BreakDown[0].RegBank;
2965
2966 Register SrcReg = MI.getOperand(1).getReg();
2967 Register InsReg = MI.getOperand(2).getReg();
2968 LLT InsTy = MRI.getType(InsReg);
2969 (void)InsTy;
2970
2971 Register BaseIdxReg;
2972 unsigned ConstOffset;
2973 std::tie(BaseIdxReg, ConstOffset) =
2974 AMDGPU::getBaseWithConstantOffset(MRI, MI.getOperand(3).getReg());
2975
2976 // See if the index is an add of a constant which will be foldable by moving
2977 // the base register of the index later if this is going to be executed in a
2978 // waterfall loop. This is essentially to reassociate the add of a constant
2979 // with the readfirstlane.
2980 bool ShouldMoveIndexIntoLoop = IdxBank != &AMDGPU::SGPRRegBank &&
2981 ConstOffset > 0 &&
2982 ConstOffset < VecTy.getNumElements();
2983
2984 // Move the base register. We'll re-insert the add later.
2985 if (ShouldMoveIndexIntoLoop)
2986 MI.getOperand(3).setReg(BaseIdxReg);
2987
2988
2989 if (InsRegs.empty()) {
2990 executeInWaterfallLoop(B, MI, {3});
2991
2992 // Re-insert the constant offset add inside the waterfall loop.
2993 if (ShouldMoveIndexIntoLoop) {
2994 reinsertVectorIndexAdd(B, MI, 3, ConstOffset);
2995 }
2996
2997 return;
2998 }
2999
3000 assert(InsTy.getSizeInBits() == 64);
3001
3002 const LLT S32 = LLT::scalar(32);
3003 LLT Vec32 = LLT::fixed_vector(2 * VecTy.getNumElements(), 32);
3004
3005 auto CastSrc = B.buildBitcast(Vec32, SrcReg);
3006 auto One = B.buildConstant(S32, 1);
3007
3008 // Split the vector index into 32-bit pieces. Prepare to move all of the
3009 // new instructions into a waterfall loop if necessary.
3010 //
3011 // Don't put the bitcast or constant in the loop.
3012 MachineInstrSpan Span(MachineBasicBlock::iterator(&MI), &B.getMBB());
3013
3014 // Compute 32-bit element indices, (2 * OrigIdx, 2 * OrigIdx + 1).
3015 auto IdxLo = B.buildShl(S32, BaseIdxReg, One);
3016 auto IdxHi = B.buildAdd(S32, IdxLo, One);
3017
3018 auto InsLo = B.buildInsertVectorElement(Vec32, CastSrc, InsRegs[0], IdxLo);
3019 auto InsHi = B.buildInsertVectorElement(Vec32, InsLo, InsRegs[1], IdxHi);
3020
3021 const RegisterBank *DstBank =
3022 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
3023 const RegisterBank *SrcBank =
3024 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
3025 const RegisterBank *InsSrcBank =
3026 OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
3027
3028 MRI.setRegBank(InsReg, *InsSrcBank);
3029 MRI.setRegBank(CastSrc.getReg(0), *SrcBank);
3030 MRI.setRegBank(InsLo.getReg(0), *DstBank);
3031 MRI.setRegBank(InsHi.getReg(0), *DstBank);
3032 MRI.setRegBank(One.getReg(0), AMDGPU::SGPRRegBank);
3033 MRI.setRegBank(IdxLo.getReg(0), AMDGPU::SGPRRegBank);
3034 MRI.setRegBank(IdxHi.getReg(0), AMDGPU::SGPRRegBank);
3035
3036
3037 SmallSet<Register, 4> OpsToWaterfall;
3038 if (!collectWaterfallOperands(OpsToWaterfall, MI, MRI, { 3 })) {
3039 B.setInsertPt(B.getMBB(), MI);
3040 B.buildBitcast(DstReg, InsHi);
3041 MI.eraseFromParent();
3042 return;
3043 }
3044
3045 B.setInstr(*Span.begin());
3046 MI.eraseFromParent();
3047
3048 // Figure out the point after the waterfall loop before mangling the control
3049 // flow.
3050 executeInWaterfallLoop(B, make_range(Span.begin(), Span.end()),
3051 OpsToWaterfall);
3052
3053 // The insertion point is now right after the original instruction.
3054 //
3055 // Keep the bitcast to the original vector type out of the loop. Doing this
3056 // saves an extra phi we don't need inside the loop.
3057 B.buildBitcast(DstReg, InsHi);
3058
3059 // Re-insert the constant offset add inside the waterfall loop.
3060 if (ShouldMoveIndexIntoLoop)
3061 reinsertVectorIndexAdd(B, *IdxLo, 1, ConstOffset);
3062
3063 return;
3064 }
3065 case AMDGPU::G_AMDGPU_BUFFER_LOAD:
3066 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT:
3067 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT:
3068 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE:
3069 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE:
3070 case AMDGPU::G_AMDGPU_BUFFER_LOAD_TFE:
3071 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT_TFE:
3072 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT_TFE:
3073 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE_TFE:
3074 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE_TFE:
3075 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT:
3076 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_TFE:
3077 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_D16:
3078 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT:
3079 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT_D16:
3080 case AMDGPU::G_AMDGPU_BUFFER_STORE:
3081 case AMDGPU::G_AMDGPU_BUFFER_STORE_BYTE:
3082 case AMDGPU::G_AMDGPU_BUFFER_STORE_SHORT:
3083 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT:
3084 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT_D16:
3085 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT:
3086 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT_D16: {
3087 applyDefaultMapping(OpdMapper);
3088 executeInWaterfallLoop(B, MI, {1, 4});
3089 return;
3090 }
3091 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SWAP:
3092 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_ADD:
3093 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB:
3094 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMIN:
3095 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMIN:
3096 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMAX:
3097 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMAX:
3098 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_AND:
3099 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_OR:
3100 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_XOR:
3101 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_INC:
3102 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_DEC:
3103 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FADD:
3104 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMIN:
3105 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMAX: {
3106 applyDefaultMapping(OpdMapper);
3107 executeInWaterfallLoop(B, MI, {2, 5});
3108 return;
3109 }
3110 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_CMPSWAP: {
3111 applyDefaultMapping(OpdMapper);
3112 executeInWaterfallLoop(B, MI, {3, 6});
3113 return;
3114 }
3115 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD:
3116 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_UBYTE:
3117 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SBYTE:
3118 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_USHORT:
3119 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SSHORT: {
3120 applyMappingSBufferLoad(B, OpdMapper);
3121 return;
3122 }
3123 case AMDGPU::G_AMDGPU_S_BUFFER_PREFETCH:
3126 return;
3127 case AMDGPU::G_INTRINSIC:
3128 case AMDGPU::G_INTRINSIC_CONVERGENT: {
3129 switch (cast<GIntrinsic>(MI).getIntrinsicID()) {
3130 case Intrinsic::amdgcn_readlane: {
3131 substituteSimpleCopyRegs(OpdMapper, 2);
3132
3133 assert(OpdMapper.getVRegs(0).empty());
3134 assert(OpdMapper.getVRegs(3).empty());
3135
3136 // Make sure the index is an SGPR. It doesn't make sense to run this in a
3137 // waterfall loop, so assume it's a uniform value.
3138 constrainOpWithReadfirstlane(B, MI, 3); // Index
3139 return;
3140 }
3141 case Intrinsic::amdgcn_writelane: {
3142 assert(OpdMapper.getVRegs(0).empty());
3143 assert(OpdMapper.getVRegs(2).empty());
3144 assert(OpdMapper.getVRegs(3).empty());
3145
3146 substituteSimpleCopyRegs(OpdMapper, 4); // VGPR input val
3147 constrainOpWithReadfirstlane(B, MI, 2); // Source value
3148 constrainOpWithReadfirstlane(B, MI, 3); // Index
3149 return;
3150 }
3151 case Intrinsic::amdgcn_interp_p1:
3152 case Intrinsic::amdgcn_interp_p2:
3153 case Intrinsic::amdgcn_interp_mov:
3154 case Intrinsic::amdgcn_interp_p1_f16:
3155 case Intrinsic::amdgcn_interp_p2_f16:
3156 case Intrinsic::amdgcn_lds_param_load: {
3157 applyDefaultMapping(OpdMapper);
3158
3159 // Readlane for m0 value, which is always the last operand.
3160 // FIXME: Should this be a waterfall loop instead?
3161 constrainOpWithReadfirstlane(B, MI, MI.getNumOperands() - 1); // Index
3162 return;
3163 }
3164 case Intrinsic::amdgcn_interp_inreg_p10:
3165 case Intrinsic::amdgcn_interp_inreg_p2:
3166 case Intrinsic::amdgcn_interp_inreg_p10_f16:
3167 case Intrinsic::amdgcn_interp_inreg_p2_f16:
3168 case Intrinsic::amdgcn_interp_p10_rtz_f16:
3169 case Intrinsic::amdgcn_interp_p2_rtz_f16:
3170 case Intrinsic::amdgcn_permlane16_swap:
3171 case Intrinsic::amdgcn_permlane32_swap:
3172 applyDefaultMapping(OpdMapper);
3173 return;
3174 case Intrinsic::amdgcn_permlane16:
3175 case Intrinsic::amdgcn_permlanex16: {
3176 // Doing a waterfall loop over these wouldn't make any sense.
3177 substituteSimpleCopyRegs(OpdMapper, 2);
3178 substituteSimpleCopyRegs(OpdMapper, 3);
3179 constrainOpWithReadfirstlane(B, MI, 4);
3180 constrainOpWithReadfirstlane(B, MI, 5);
3181 return;
3182 }
3183 case Intrinsic::amdgcn_sbfe:
3184 applyMappingBFE(B, OpdMapper, true);
3185 return;
3186 case Intrinsic::amdgcn_ubfe:
3187 applyMappingBFE(B, OpdMapper, false);
3188 return;
3189 case Intrinsic::amdgcn_inverse_ballot:
3190 case Intrinsic::amdgcn_s_bitreplicate:
3191 case Intrinsic::amdgcn_s_quadmask:
3192 case Intrinsic::amdgcn_s_wqm:
3193 applyDefaultMapping(OpdMapper);
3194 constrainOpWithReadfirstlane(B, MI, 2); // Mask
3195 return;
3196 case Intrinsic::amdgcn_ballot:
3197 // Use default handling and insert copy to vcc source.
3198 break;
3199 }
3200 break;
3201 }
3202 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD:
3203 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_D16:
3204 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_NORET:
3205 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE:
3206 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE_D16: {
3207 const AMDGPU::RsrcIntrinsic *RSrcIntrin =
3208 AMDGPU::lookupRsrcIntrinsic(AMDGPU::getIntrinsicID(MI));
3209 assert(RSrcIntrin && RSrcIntrin->IsImage);
3210 // Non-images can have complications from operands that allow both SGPR
3211 // and VGPR. For now it's too complicated to figure out the final opcode
3212 // to derive the register bank from the MCInstrDesc.
3213 applyMappingImage(B, MI, OpdMapper, RSrcIntrin->RsrcArg);
3214 return;
3215 }
3216 case AMDGPU::G_AMDGPU_INTRIN_BVH_INTERSECT_RAY: {
3217 unsigned N = MI.getNumExplicitOperands() - 2;
3218 applyDefaultMapping(OpdMapper);
3219 executeInWaterfallLoop(B, MI, {N});
3220 return;
3221 }
3222 case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS:
3223 case AMDGPU::G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS: {
3224 auto IntrID = cast<GIntrinsic>(MI).getIntrinsicID();
3225 switch (IntrID) {
3226 case Intrinsic::amdgcn_ds_ordered_add:
3227 case Intrinsic::amdgcn_ds_ordered_swap: {
3228 // This is only allowed to execute with 1 lane, so readfirstlane is safe.
3229 assert(OpdMapper.getVRegs(0).empty());
3230 substituteSimpleCopyRegs(OpdMapper, 3);
3231 constrainOpWithReadfirstlane(B, MI, 2); // M0
3232 return;
3233 }
3234 case Intrinsic::amdgcn_ds_gws_init:
3235 case Intrinsic::amdgcn_ds_gws_barrier:
3236 case Intrinsic::amdgcn_ds_gws_sema_br: {
3237 // Only the first lane executes, so readfirstlane is safe.
3238 substituteSimpleCopyRegs(OpdMapper, 1);
3239 constrainOpWithReadfirstlane(B, MI, 2); // M0
3240 return;
3241 }
3242 case Intrinsic::amdgcn_ds_gws_sema_v:
3243 case Intrinsic::amdgcn_ds_gws_sema_p:
3244 case Intrinsic::amdgcn_ds_gws_sema_release_all: {
3245 // Only the first lane executes, so readfirstlane is safe.
3246 constrainOpWithReadfirstlane(B, MI, 1); // M0
3247 return;
3248 }
3249 case Intrinsic::amdgcn_ds_append:
3250 case Intrinsic::amdgcn_ds_consume: {
3251 constrainOpWithReadfirstlane(B, MI, 2); // M0
3252 return;
3253 }
3254 case Intrinsic::amdgcn_s_sendmsg:
3255 case Intrinsic::amdgcn_s_sendmsghalt: {
3256 // FIXME: Should this use a waterfall loop?
3257 constrainOpWithReadfirstlane(B, MI, 2);
3258 return;
3259 }
3260 case Intrinsic::amdgcn_s_setreg: {
3261 constrainOpWithReadfirstlane(B, MI, 2);
3262 return;
3263 }
3264 case Intrinsic::amdgcn_s_ttracedata:
3265 constrainOpWithReadfirstlane(B, MI, 1);
3266 return;
3267 case Intrinsic::amdgcn_raw_buffer_load_lds:
3268 case Intrinsic::amdgcn_raw_ptr_buffer_load_lds: {
3269 applyDefaultMapping(OpdMapper);
3270 constrainOpWithReadfirstlane(B, MI, 1); // rsrc
3271 constrainOpWithReadfirstlane(B, MI, 2); // M0
3272 constrainOpWithReadfirstlane(B, MI, 5); // soffset
3273 return;
3274 }
3275 case Intrinsic::amdgcn_struct_buffer_load_lds:
3276 case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
3277 applyDefaultMapping(OpdMapper);
3278 constrainOpWithReadfirstlane(B, MI, 1); // rsrc
3279 constrainOpWithReadfirstlane(B, MI, 2); // M0
3280 constrainOpWithReadfirstlane(B, MI, 6); // soffset
3281 return;
3282 }
3283 case Intrinsic::amdgcn_global_load_lds: {
3284 applyDefaultMapping(OpdMapper);
3285 constrainOpWithReadfirstlane(B, MI, 2); // M0
3286 return;
3287 }
3288 case Intrinsic::amdgcn_lds_direct_load: {
3289 applyDefaultMapping(OpdMapper);
3290 // Readlane for m0 value, which is always the last operand.
3291 constrainOpWithReadfirstlane(B, MI, MI.getNumOperands() - 1); // Index
3292 return;
3293 }
3294 case Intrinsic::amdgcn_exp_row:
3295 applyDefaultMapping(OpdMapper);
3296 constrainOpWithReadfirstlane(B, MI, 8); // M0
3297 return;
3298 case Intrinsic::amdgcn_s_sleep_var:
3299 assert(OpdMapper.getVRegs(1).empty());
3300 constrainOpWithReadfirstlane(B, MI, 1);
3301 return;
3302 case Intrinsic::amdgcn_s_barrier_join:
3303 case Intrinsic::amdgcn_s_wakeup_barrier:
3304 constrainOpWithReadfirstlane(B, MI, 1);
3305 return;
3306 case Intrinsic::amdgcn_s_barrier_init:
3307 case Intrinsic::amdgcn_s_barrier_signal_var:
3308 constrainOpWithReadfirstlane(B, MI, 1);
3309 constrainOpWithReadfirstlane(B, MI, 2);
3310 return;
3311 case Intrinsic::amdgcn_s_get_barrier_state:
3312 case Intrinsic::amdgcn_s_get_named_barrier_state: {
3313 constrainOpWithReadfirstlane(B, MI, 2);
3314 return;
3315 }
3316 case Intrinsic::amdgcn_s_prefetch_data: {
3317 Register PtrReg = MI.getOperand(1).getReg();
3318 unsigned AS = MRI.getType(PtrReg).getAddressSpace();
3319 if (AMDGPU::isFlatGlobalAddrSpace(AS)) {
3320 applyDefaultMapping(OpdMapper);
3321 constrainOpWithReadfirstlane(B, MI, 2);
3322 } else
3323 MI.eraseFromParent();
3324 return;
3325 }
3326 default: {
3327 if (const AMDGPU::RsrcIntrinsic *RSrcIntrin =
3328 AMDGPU::lookupRsrcIntrinsic(IntrID)) {
3329 // Non-images can have complications from operands that allow both SGPR
3330 // and VGPR. For now it's too complicated to figure out the final opcode
3331 // to derive the register bank from the MCInstrDesc.
3332 if (RSrcIntrin->IsImage) {
3333 applyMappingImage(B, MI, OpdMapper, RSrcIntrin->RsrcArg);
3334 return;
3335 }
3336 }
3337
3338 break;
3339 }
3340 }
3341 break;
3342 }
3343 case AMDGPU::G_SI_CALL: {
3344 // Use a set to avoid extra readfirstlanes in the case where multiple
3345 // operands are the same register.
3346 SmallSet<Register, 4> SGPROperandRegs;
3347
3348 if (!collectWaterfallOperands(SGPROperandRegs, MI, MRI, {1}))
3349 break;
3350
3351 // Move all copies to physical SGPRs that are used by the call instruction
3352 // into the loop block. Search backwards from the call for these copies,
3353 // stopping at the ADJCALLSTACKUP.
3354 unsigned FrameSetupOpcode = AMDGPU::ADJCALLSTACKUP;
3355 unsigned FrameDestroyOpcode = AMDGPU::ADJCALLSTACKDOWN;
3356
3357 // Move all non-copies before the copies, so that a complete range can be
3358 // moved into the waterfall loop.
3359 SmallVector<MachineInstr *, 4> NonCopyInstrs;
3360 // Count of NonCopyInstrs found until the current LastCopy.
3361 unsigned NonCopyInstrsLen = 0;
3362 MachineBasicBlock::iterator Start(&MI);
3363 MachineBasicBlock::iterator LastCopy = Start;
3364 MachineBasicBlock *MBB = MI.getParent();
3365 const SIMachineFunctionInfo *Info =
3366 MBB->getParent()->getInfo<SIMachineFunctionInfo>();
3367 while (Start->getOpcode() != FrameSetupOpcode) {
3368 --Start;
3369 bool IsCopy = false;
3370 if (Start->getOpcode() == AMDGPU::COPY) {
3371 auto &Dst = Start->getOperand(0);
3372 if (Dst.isReg()) {
3373 Register Reg = Dst.getReg();
3374 if (Reg.isPhysical() && MI.readsRegister(Reg, TRI)) {
3375 IsCopy = true;
3376 } else {
3377 // Also move the copy from the scratch rsrc descriptor into the loop
3378 // to allow it to be optimized away.
3379 auto &Src = Start->getOperand(1);
3380 if (Src.isReg()) {
3381 Reg = Src.getReg();
3382 IsCopy = Info->getScratchRSrcReg() == Reg;
3383 }
3384 }
3385 }
3386 }
3387
3388 if (IsCopy) {
3389 LastCopy = Start;
3390 NonCopyInstrsLen = NonCopyInstrs.size();
3391 } else {
3392 NonCopyInstrs.push_back(&*Start);
3393 }
3394 }
3395 NonCopyInstrs.resize(NonCopyInstrsLen);
3396
3397 for (auto *NonCopy : reverse(NonCopyInstrs)) {
3398 MBB->splice(LastCopy, MBB, NonCopy->getIterator());
3399 }
3400 Start = LastCopy;
3401
3402 // Do the same for copies after the loop
3403 NonCopyInstrs.clear();
3404 NonCopyInstrsLen = 0;
3405 MachineBasicBlock::iterator End(&MI);
3406 LastCopy = End;
3407 while (End->getOpcode() != FrameDestroyOpcode) {
3408 ++End;
3409 bool IsCopy = false;
3410 if (End->getOpcode() == AMDGPU::COPY) {
3411 auto &Src = End->getOperand(1);
3412 if (Src.isReg()) {
3413 Register Reg = Src.getReg();
3414 IsCopy = Reg.isPhysical() && MI.modifiesRegister(Reg, TRI);
3415 }
3416 }
3417
3418 if (IsCopy) {
3419 LastCopy = End;
3420 NonCopyInstrsLen = NonCopyInstrs.size();
3421 } else {
3422 NonCopyInstrs.push_back(&*End);
3423 }
3424 }
3425 NonCopyInstrs.resize(NonCopyInstrsLen);
3426
3427 End = LastCopy;
3428 ++LastCopy;
3429 for (auto *NonCopy : reverse(NonCopyInstrs)) {
3430 MBB->splice(LastCopy, MBB, NonCopy->getIterator());
3431 }
3432
3433 ++End;
3434 B.setInsertPt(B.getMBB(), Start);
3435 executeInWaterfallLoop(B, make_range(Start, End), SGPROperandRegs);
3436 break;
3437 }
3438 case AMDGPU::G_LOAD:
3439 case AMDGPU::G_ZEXTLOAD:
3440 case AMDGPU::G_SEXTLOAD: {
3441 if (applyMappingLoad(B, OpdMapper, MI))
3442 return;
3443 break;
3444 }
3445 case AMDGPU::G_DYN_STACKALLOC:
3446 applyMappingDynStackAlloc(B, OpdMapper, MI);
3447 return;
3448 case AMDGPU::G_STACKRESTORE: {
3449 applyDefaultMapping(OpdMapper);
3450 constrainOpWithReadfirstlane(B, MI, 0);
3451 return;
3452 }
3453 case AMDGPU::G_SBFX:
3454 applyMappingBFE(B, OpdMapper, /*Signed*/ true);
3455 return;
3456 case AMDGPU::G_UBFX:
3457 applyMappingBFE(B, OpdMapper, /*Signed*/ false);
3458 return;
3459 case AMDGPU::G_AMDGPU_MAD_U64_U32:
3460 case AMDGPU::G_AMDGPU_MAD_I64_I32:
3461 applyMappingMAD_64_32(B, OpdMapper);
3462 return;
3463 case AMDGPU::G_PREFETCH: {
3464 if (!Subtarget.hasPrefetch()) {
3465 MI.eraseFromParent();
3466 return;
3467 }
3468 Register PtrReg = MI.getOperand(0).getReg();
3469 unsigned PtrBank = getRegBankID(PtrReg, MRI, AMDGPU::SGPRRegBankID);
3470 if (PtrBank == AMDGPU::VGPRRegBankID) {
3471 MI.eraseFromParent();
3472 return;
3473 }
3474 unsigned AS = MRI.getType(PtrReg).getAddressSpace();
3475 if (!AMDGPU::isFlatGlobalAddrSpace(AS) &&
3476 AS != AMDGPUAS::CONSTANT_ADDRESS_32BIT) {
3477 MI.eraseFromParent();
3478 return;
3479 }
3480 applyDefaultMapping(OpdMapper);
3481 return;
3482 }
3483 default:
3484 break;
3485 }
3486
3487 return applyDefaultMapping(OpdMapper);
3488}
3489
3490// vgpr, sgpr -> vgpr
3491// vgpr, agpr -> vgpr
3492// agpr, agpr -> agpr
3493// agpr, sgpr -> vgpr
3494static unsigned regBankUnion(unsigned RB0, unsigned RB1) {
3495 if (RB0 == AMDGPU::InvalidRegBankID)
3496 return RB1;
3497 if (RB1 == AMDGPU::InvalidRegBankID)
3498 return RB0;
3499
3500 if (RB0 == AMDGPU::SGPRRegBankID && RB1 == AMDGPU::SGPRRegBankID)
3501 return AMDGPU::SGPRRegBankID;
3502
3503 if (RB0 == AMDGPU::AGPRRegBankID && RB1 == AMDGPU::AGPRRegBankID)
3504 return AMDGPU::AGPRRegBankID;
3505
3506 return AMDGPU::VGPRRegBankID;
3507}
3508
3509static unsigned regBankBoolUnion(unsigned RB0, unsigned RB1) {
3510 if (RB0 == AMDGPU::InvalidRegBankID)
3511 return RB1;
3512 if (RB1 == AMDGPU::InvalidRegBankID)
3513 return RB0;
3514
3515 // vcc, vcc -> vcc
3516 // vcc, sgpr -> vcc
3517 // vcc, vgpr -> vcc
3518 if (RB0 == AMDGPU::VCCRegBankID || RB1 == AMDGPU::VCCRegBankID)
3519 return AMDGPU::VCCRegBankID;
3520
3521 // vcc, vgpr -> vgpr
3522 return regBankUnion(RB0, RB1);
3523}
3524
3525 unsigned AMDGPURegisterBankInfo::getMappingType(const MachineRegisterInfo &MRI,
3526 const MachineInstr &MI) const {
3527 unsigned RegBank = AMDGPU::InvalidRegBankID;
3528
3529 for (const MachineOperand &MO : MI.operands()) {
3530 if (!MO.isReg())
3531 continue;
3532 Register Reg = MO.getReg();
3533 if (const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI)) {
3534 RegBank = regBankUnion(RegBank, Bank->getID());
3535 if (RegBank == AMDGPU::VGPRRegBankID)
3536 break;
3537 }
3538 }
3539
3540 return RegBank;
3541}
3542
3543 bool AMDGPURegisterBankInfo::isSALUMapping(const MachineInstr &MI) const {
3544 const MachineFunction &MF = *MI.getParent()->getParent();
3545 const MachineRegisterInfo &MRI = MF.getRegInfo();
3546 for (const MachineOperand &MO : MI.operands()) {
3547 if (!MO.isReg())
3548 continue;
3549 Register Reg = MO.getReg();
3550 if (const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI)) {
3551 if (Bank->getID() != AMDGPU::SGPRRegBankID)
3552 return false;
3553 }
3554 }
3555 return true;
3556}
3557
3558 const RegisterBankInfo::InstructionMapping &
3559 AMDGPURegisterBankInfo::getDefaultMappingSOP(const MachineInstr &MI) const {
3560 const MachineFunction &MF = *MI.getParent()->getParent();
3561 const MachineRegisterInfo &MRI = MF.getRegInfo();
3562 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3563
3564 for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
3565 const MachineOperand &SrcOp = MI.getOperand(i);
3566 if (!SrcOp.isReg())
3567 continue;
3568
3569 unsigned Size = getSizeInBits(SrcOp.getReg(), MRI, *TRI);
3570 OpdsMapping[i] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
3571 }
3572 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
3573 MI.getNumOperands());
3574}
3575
3576 const RegisterBankInfo::InstructionMapping &
3577 AMDGPURegisterBankInfo::getDefaultMappingVOP(const MachineInstr &MI) const {
3578 const MachineFunction &MF = *MI.getParent()->getParent();
3579 const MachineRegisterInfo &MRI = MF.getRegInfo();
3580 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3581
3582 // Even though we technically could use SGPRs, this would require knowledge of
3583 // the constant bus restriction. Force all sources to VGPR (except for VCC).
3584 //
3585 // TODO: Unary ops are trivially OK, so accept SGPRs?
3586 for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
3587 const MachineOperand &Src = MI.getOperand(i);
3588 if (!Src.isReg())
3589 continue;
3590
3591 unsigned Size = getSizeInBits(Src.getReg(), MRI, *TRI);
3592 unsigned BankID = Size == 1 ? AMDGPU::VCCRegBankID : AMDGPU::VGPRRegBankID;
3593 OpdsMapping[i] = AMDGPU::getValueMapping(BankID, Size);
3594 }
3595
3596 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
3597 MI.getNumOperands());
3598}
3599
3600 const RegisterBankInfo::InstructionMapping &
3601 AMDGPURegisterBankInfo::getDefaultMappingAllVGPR(const MachineInstr &MI) const {
3602 const MachineFunction &MF = *MI.getParent()->getParent();
3603 const MachineRegisterInfo &MRI = MF.getRegInfo();
3604 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3605
3606 for (unsigned I = 0, E = MI.getNumOperands(); I != E; ++I) {
3607 const MachineOperand &Op = MI.getOperand(I);
3608 if (!Op.isReg())
3609 continue;
3610
3611 unsigned Size = getSizeInBits(Op.getReg(), MRI, *TRI);
3612 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3613 }
3614
3615 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
3616 MI.getNumOperands());
3617}
3618
3619 const RegisterBankInfo::InstructionMapping &
3620 AMDGPURegisterBankInfo::getImageMapping(const MachineRegisterInfo &MRI,
3621 const MachineInstr &MI,
3622 int RsrcIdx) const {
3623 // The reported argument index is relative to the IR intrinsic call arguments,
3624 // so we need to shift by the number of defs and the intrinsic ID.
3625 RsrcIdx += MI.getNumExplicitDefs() + 1;
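// Worked example of the shift above: with one explicit def, machine operand 0
// is the def and operand 1 is the intrinsic ID, so an IR-relative rsrc index
// of 2 becomes machine operand index 2 + 1 + 1 = 4.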
3626
3627 const int NumOps = MI.getNumOperands();
3628 SmallVector<const ValueMapping *, 8> OpdsMapping(NumOps);
3629
3630 // TODO: Should packed/unpacked D16 difference be reported here as part of
3631 // the value mapping?
3632 for (int I = 0; I != NumOps; ++I) {
3633 if (!MI.getOperand(I).isReg())
3634 continue;
3635
3636 Register OpReg = MI.getOperand(I).getReg();
3637 // We replace some dead address operands with $noreg
3638 if (!OpReg)
3639 continue;
3640
3641 unsigned Size = getSizeInBits(OpReg, MRI, *TRI);
3642
3643 // FIXME: Probably need a new intrinsic register bank searchable table to
3644 // handle arbitrary intrinsics easily.
3645 //
3646 // If this has a sampler, it immediately follows rsrc.
3647 const bool MustBeSGPR = I == RsrcIdx || I == RsrcIdx + 1;
3648
3649 if (MustBeSGPR) {
3650 // If this must be an SGPR, we must report whatever it is as legal.
3651 unsigned NewBank = getRegBankID(OpReg, MRI, AMDGPU::SGPRRegBankID);
3652 OpdsMapping[I] = AMDGPU::getValueMapping(NewBank, Size);
3653 } else {
3654 // Some operands must be VGPR, and these are easy to copy to.
3655 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3656 }
3657 }
3658
3659 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping), NumOps);
3660}
3661
3662/// Return the mapping for a pointer argument.
3663 const RegisterBankInfo::ValueMapping *
3664 AMDGPURegisterBankInfo::getValueMappingForPtr(const MachineRegisterInfo &MRI,
3665 Register PtrReg) const {
3666 LLT PtrTy = MRI.getType(PtrReg);
3667 unsigned Size = PtrTy.getSizeInBits();
3668 if (Subtarget.useFlatForGlobal() ||
3669 !AMDGPU::isFlatGlobalAddrSpace(PtrTy.getAddressSpace()))
3670 return AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3671
3672 // If we're using MUBUF instructions for global memory, an SGPR base register
3673 // is possible. Otherwise this needs to be a VGPR.
3674 const RegisterBank *PtrBank = getRegBank(PtrReg, MRI, *TRI);
3675 return AMDGPU::getValueMapping(PtrBank->getID(), Size);
3676}
3677
3678 const RegisterBankInfo::InstructionMapping &
3679 AMDGPURegisterBankInfo::getInstrMappingForLoad(const MachineInstr &MI) const {
3680 
3681 const MachineFunction &MF = *MI.getParent()->getParent();
3682 const MachineRegisterInfo &MRI = MF.getRegInfo();
3683 SmallVector<const ValueMapping*, 2> OpdsMapping(2);
3684 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
3685 Register PtrReg = MI.getOperand(1).getReg();
3686 LLT PtrTy = MRI.getType(PtrReg);
3687 unsigned AS = PtrTy.getAddressSpace();
3688 unsigned PtrSize = PtrTy.getSizeInBits();
3689
3690 const ValueMapping *ValMapping;
3691 const ValueMapping *PtrMapping;
3692
3693 const RegisterBank *PtrBank = getRegBank(PtrReg, MRI, *TRI);
3694
3695 if (PtrBank == &AMDGPU::SGPRRegBank && AMDGPU::isFlatGlobalAddrSpace(AS)) {
3696 if (isScalarLoadLegal(MI)) {
3697 // We have a uniform instruction so we want to use an SMRD load
3698 ValMapping = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
3699 PtrMapping = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, PtrSize);
3700 } else {
3701 ValMapping = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3702
3703 // If we're using MUBUF instructions for global memory, an SGPR base
3704 // register is possible. Otherwise this needs to be a VGPR.
3705 unsigned PtrBankID = Subtarget.useFlatForGlobal() ?
3706 AMDGPU::VGPRRegBankID : AMDGPU::SGPRRegBankID;
3707
3708 PtrMapping = AMDGPU::getValueMapping(PtrBankID, PtrSize);
3709 }
3710 } else {
3711 ValMapping = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3712 PtrMapping = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, PtrSize);
3713 }
3714
3715 OpdsMapping[0] = ValMapping;
3716 OpdsMapping[1] = PtrMapping;
3717 const RegisterBankInfo::InstructionMapping &Mapping = getInstructionMapping(
3718 1, 1, getOperandsMapping(OpdsMapping), MI.getNumOperands());
3719 return Mapping;
3720
3721 // FIXME: Do we want to add a mapping for FLAT load, or should we just
3722 // handle that during instruction selection?
3723}
3724
3725unsigned
3726 AMDGPURegisterBankInfo::getRegBankID(Register Reg,
3727 const MachineRegisterInfo &MRI,
3728 unsigned Default) const {
3729 const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI);
3730 return Bank ? Bank->getID() : Default;
3731}
3732
3733 const RegisterBankInfo::ValueMapping *
3734 AMDGPURegisterBankInfo::getSGPROpMapping(Register Reg,
3735 const MachineRegisterInfo &MRI,
3736 const TargetRegisterInfo &TRI) const {
3737 // Lie and claim anything is legal, even though this needs to be an SGPR.
3738 // applyMapping will have to deal with it as a waterfall loop.
3739 unsigned Bank = getRegBankID(Reg, MRI, AMDGPU::SGPRRegBankID);
3740 unsigned Size = getSizeInBits(Reg, MRI, TRI);
3741 return AMDGPU::getValueMapping(Bank, Size);
3742}
3743
3744 const RegisterBankInfo::ValueMapping *
3745 AMDGPURegisterBankInfo::getVGPROpMapping(Register Reg,
3746 const MachineRegisterInfo &MRI,
3747 const TargetRegisterInfo &TRI) const {
3748 unsigned Size = getSizeInBits(Reg, MRI, TRI);
3749 return AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3750}
3751
3752 const RegisterBankInfo::ValueMapping *
3753 AMDGPURegisterBankInfo::getAGPROpMapping(Register Reg,
3754 const MachineRegisterInfo &MRI,
3755 const TargetRegisterInfo &TRI) const {
3756 unsigned Size = getSizeInBits(Reg, MRI, TRI);
3757 return AMDGPU::getValueMapping(AMDGPU::AGPRRegBankID, Size);
3758}
3759
3760///
3761/// This function must return a legal mapping, because
3762/// AMDGPURegisterBankInfo::getInstrAlternativeMappings() is not called
3763/// in RegBankSelect::Mode::Fast. Any mapping that would cause a
3764 /// VGPR-to-SGPR copy to be generated is illegal.
3765///
3766// Operands that must be SGPRs must accept potentially divergent VGPRs as
3767// legal. These will be dealt with in applyMappingImpl.
3768//
3769 const RegisterBankInfo::InstructionMapping &
3770 AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
3771 const MachineFunction &MF = *MI.getParent()->getParent();
3772 const MachineRegisterInfo &MRI = MF.getRegInfo();
3773
3774 if (MI.isCopy() || MI.getOpcode() == AMDGPU::G_FREEZE) {
3775 Register DstReg = MI.getOperand(0).getReg();
3776 Register SrcReg = MI.getOperand(1).getReg();
3777
3778 // The default logic bothers to analyze impossible alternative mappings. We
3779 // want the most straightforward mapping, so just directly handle this.
3780 const RegisterBank *DstBank = getRegBank(DstReg, MRI, *TRI);
3781 const RegisterBank *SrcBank = getRegBank(SrcReg, MRI, *TRI);
3782 assert(SrcBank && "src bank should have been assigned already");
3783
3784 // For COPY between a physical reg and an s1, there is no type associated so
3785 // we need to take the virtual register's type as a hint on how to interpret
3786 // s1 values.
3787 if (!SrcReg.isVirtual() && !DstBank &&
3788 MRI.getType(DstReg) == LLT::scalar(1))
3789 DstBank = &AMDGPU::VCCRegBank;
3790 else if (!DstReg.isVirtual() && MRI.getType(SrcReg) == LLT::scalar(1))
3791 DstBank = &AMDGPU::VCCRegBank;
3792
3793 if (!DstBank)
3794 DstBank = SrcBank;
3795
3796 unsigned Size = getSizeInBits(DstReg, MRI, *TRI);
3797 if (MI.getOpcode() != AMDGPU::G_FREEZE &&
3798 cannotCopy(*DstBank, *SrcBank, TypeSize::getFixed(Size)))
3799 return getInvalidInstructionMapping();
3800 
3801 const ValueMapping &ValMap = getValueMapping(0, Size, *DstBank);
3802 unsigned OpdsMappingSize = MI.isCopy() ? 1 : 2;
3803 SmallVector<const ValueMapping *, 1> OpdsMapping(OpdsMappingSize);
3804 OpdsMapping[0] = &ValMap;
3805 if (MI.getOpcode() == AMDGPU::G_FREEZE)
3806 OpdsMapping[1] = &ValMap;
3807
3808 return getInstructionMapping(
3809 1, /*Cost*/ 1,
3810 /*OperandsMapping*/ getOperandsMapping(OpdsMapping), OpdsMappingSize);
3811 }
3812
3813 if (MI.isRegSequence()) {
3814 // If any input is a VGPR, the result must be a VGPR. The default handling
3815 // assumes any copy between banks is legal.
3816 unsigned BankID = AMDGPU::SGPRRegBankID;
3817
3818 for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2) {
3819 auto OpBank = getRegBankID(MI.getOperand(I).getReg(), MRI);
3820 // It doesn't make sense to use vcc or scc banks here, so just ignore
3821 // them.
3822 if (OpBank != AMDGPU::SGPRRegBankID) {
3823 BankID = AMDGPU::VGPRRegBankID;
3824 break;
3825 }
3826 }
3827 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
3828
3829 const ValueMapping &ValMap = getValueMapping(0, Size, getRegBank(BankID));
3830 return getInstructionMapping(
3831 1, /*Cost*/ 1,
3832 /*OperandsMapping*/ getOperandsMapping({&ValMap}), 1);
3833 }
3834
3835 // The default handling is broken and doesn't handle illegal SGPR->VGPR copies
3836 // properly.
3837 //
3838 // TODO: There are additional exec masking dependencies to analyze.
3839 if (auto *PHI = dyn_cast<GPhi>(&MI)) {
3840 unsigned ResultBank = AMDGPU::InvalidRegBankID;
3841 Register DstReg = PHI->getReg(0);
3842
3843 // Sometimes the result may have already been assigned a bank.
3844 if (const RegisterBank *DstBank = getRegBank(DstReg, MRI, *TRI))
3845 ResultBank = DstBank->getID();
3846
3847 for (unsigned I = 0; I < PHI->getNumIncomingValues(); ++I) {
3848 Register Reg = PHI->getIncomingValue(I);
3849 const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI);
3850
3851 // FIXME: Assuming VGPR for any undetermined inputs.
3852 if (!Bank || Bank->getID() == AMDGPU::VGPRRegBankID) {
3853 ResultBank = AMDGPU::VGPRRegBankID;
3854 break;
3855 }
3856
3857 // FIXME: Need to promote SGPR case to s32
3858 unsigned OpBank = Bank->getID();
3859 ResultBank = regBankBoolUnion(ResultBank, OpBank);
3860 }
3861
3862 assert(ResultBank != AMDGPU::InvalidRegBankID);
3863
3864 unsigned Size = MRI.getType(DstReg).getSizeInBits();
3865
3866 const ValueMapping &ValMap =
3867 getValueMapping(0, Size, getRegBank(ResultBank));
3868 return getInstructionMapping(
3869 1, /*Cost*/ 1,
3870 /*OperandsMapping*/ getOperandsMapping({&ValMap}), 1);
3871 }
3872
3873 const RegisterBankInfo::InstructionMapping &Mapping = getInstrMappingImpl(MI);
3874 if (Mapping.isValid())
3875 return Mapping;
3876
3877 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3878
3879 switch (MI.getOpcode()) {
3880 default:
3881 return getInvalidInstructionMapping();
3882 
3883 case AMDGPU::G_AND:
3884 case AMDGPU::G_OR:
3885 case AMDGPU::G_XOR:
3886 case AMDGPU::G_MUL: {
3887 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
3888 if (Size == 1) {
3889 const RegisterBank *DstBank
3890 = getRegBank(MI.getOperand(0).getReg(), MRI, *TRI);
3891
3892 unsigned TargetBankID = AMDGPU::InvalidRegBankID;
3893 unsigned BankLHS = AMDGPU::InvalidRegBankID;
3894 unsigned BankRHS = AMDGPU::InvalidRegBankID;
3895 if (DstBank) {
3896 TargetBankID = DstBank->getID();
3897 if (DstBank == &AMDGPU::VCCRegBank) {
3898 TargetBankID = AMDGPU::VCCRegBankID;
3899 BankLHS = AMDGPU::VCCRegBankID;
3900 BankRHS = AMDGPU::VCCRegBankID;
3901 } else {
3902 BankLHS = getRegBankID(MI.getOperand(1).getReg(), MRI,
3903 AMDGPU::SGPRRegBankID);
3904 BankRHS = getRegBankID(MI.getOperand(2).getReg(), MRI,
3905 AMDGPU::SGPRRegBankID);
3906 }
3907 } else {
3908 BankLHS = getRegBankID(MI.getOperand(1).getReg(), MRI,
3909 AMDGPU::VCCRegBankID);
3910 BankRHS = getRegBankID(MI.getOperand(2).getReg(), MRI,
3911 AMDGPU::VCCRegBankID);
3912
3913 // Both inputs should be true booleans to produce a boolean result.
3914 if (BankLHS == AMDGPU::VGPRRegBankID || BankRHS == AMDGPU::VGPRRegBankID) {
3915 TargetBankID = AMDGPU::VGPRRegBankID;
3916 } else if (BankLHS == AMDGPU::VCCRegBankID || BankRHS == AMDGPU::VCCRegBankID) {
3917 TargetBankID = AMDGPU::VCCRegBankID;
3918 BankLHS = AMDGPU::VCCRegBankID;
3919 BankRHS = AMDGPU::VCCRegBankID;
3920 } else if (BankLHS == AMDGPU::SGPRRegBankID && BankRHS == AMDGPU::SGPRRegBankID) {
3921 TargetBankID = AMDGPU::SGPRRegBankID;
3922 }
3923 }
3924
3925 OpdsMapping[0] = AMDGPU::getValueMapping(TargetBankID, Size);
3926 OpdsMapping[1] = AMDGPU::getValueMapping(BankLHS, Size);
3927 OpdsMapping[2] = AMDGPU::getValueMapping(BankRHS, Size);
3928 break;
3929 }
3930
3931 if (Size == 64) {
3932
3933 if (isSALUMapping(MI)) {
3934 OpdsMapping[0] = getValueMappingSGPR64Only(AMDGPU::SGPRRegBankID, Size);
3935 OpdsMapping[1] = OpdsMapping[2] = OpdsMapping[0];
3936 } else {
3937 OpdsMapping[0] = getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size);
3938 unsigned Bank1 = getRegBankID(MI.getOperand(1).getReg(), MRI /*, DefaultBankID*/);
3939 OpdsMapping[1] = AMDGPU::getValueMapping(Bank1, Size);
3940
3941 unsigned Bank2 = getRegBankID(MI.getOperand(2).getReg(), MRI /*, DefaultBankID*/);
3942 OpdsMapping[2] = AMDGPU::getValueMapping(Bank2, Size);
3943 }
3944
3945 break;
3946 }
3947
3948 [[fallthrough]];
3949 }
3950 case AMDGPU::G_PTR_ADD:
3951 case AMDGPU::G_PTRMASK:
3952 case AMDGPU::G_ADD:
3953 case AMDGPU::G_SUB:
3954 case AMDGPU::G_SHL:
3955 case AMDGPU::G_LSHR:
3956 case AMDGPU::G_ASHR:
3957 case AMDGPU::G_UADDO:
3958 case AMDGPU::G_USUBO:
3959 case AMDGPU::G_UADDE:
3960 case AMDGPU::G_SADDE:
3961 case AMDGPU::G_USUBE:
3962 case AMDGPU::G_SSUBE:
3963 case AMDGPU::G_SMIN:
3964 case AMDGPU::G_SMAX:
3965 case AMDGPU::G_UMIN:
3966 case AMDGPU::G_UMAX:
3967 case AMDGPU::G_ABS:
3968 case AMDGPU::G_SHUFFLE_VECTOR:
3969 case AMDGPU::G_SBFX:
3970 case AMDGPU::G_UBFX:
3971 case AMDGPU::G_AMDGPU_S_MUL_I64_I32:
3972 case AMDGPU::G_AMDGPU_S_MUL_U64_U32:
3973 if (isSALUMapping(MI))
3974 return getDefaultMappingSOP(MI);
3975 return getDefaultMappingVOP(MI);
3976 case AMDGPU::G_FADD:
3977 case AMDGPU::G_FSUB:
3978 case AMDGPU::G_FMUL:
3979 case AMDGPU::G_FMA:
3980 case AMDGPU::G_FFLOOR:
3981 case AMDGPU::G_FCEIL:
3982 case AMDGPU::G_INTRINSIC_ROUNDEVEN:
3983 case AMDGPU::G_FMINNUM:
3984 case AMDGPU::G_FMAXNUM:
3985 case AMDGPU::G_FMINIMUM:
3986 case AMDGPU::G_FMAXIMUM:
3987 case AMDGPU::G_INTRINSIC_TRUNC:
3988 case AMDGPU::G_STRICT_FADD:
3989 case AMDGPU::G_STRICT_FSUB:
3990 case AMDGPU::G_STRICT_FMUL:
3991 case AMDGPU::G_STRICT_FMA: {
3992 LLT Ty = MRI.getType(MI.getOperand(0).getReg());
3993 unsigned Size = Ty.getSizeInBits();
3994 if (Subtarget.hasSALUFloatInsts() && Ty.isScalar() &&
3995 (Size == 32 || Size == 16) && isSALUMapping(MI))
3996 return getDefaultMappingSOP(MI);
3997 return getDefaultMappingVOP(MI);
3998 }
3999 case AMDGPU::G_FPTOSI:
4000 case AMDGPU::G_FPTOUI:
4001 case AMDGPU::G_SITOFP:
4002 case AMDGPU::G_UITOFP: {
4003 unsigned SizeDst = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4004 unsigned SizeSrc = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4005 if (Subtarget.hasSALUFloatInsts() && SizeDst == 32 && SizeSrc == 32 &&
4006 isSALUMapping(MI))
4007 return getDefaultMappingSOP(MI);
4008 return getDefaultMappingVOP(MI);
4009 }
4010 case AMDGPU::G_FPTRUNC:
4011 case AMDGPU::G_FPEXT: {
4012 unsigned SizeDst = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4013 unsigned SizeSrc = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4014 if (Subtarget.hasSALUFloatInsts() && SizeDst != 64 && SizeSrc != 64 &&
4015 isSALUMapping(MI))
4016 return getDefaultMappingSOP(MI);
4017 return getDefaultMappingVOP(MI);
4018 }
4019 case AMDGPU::G_FSQRT:
4020 case AMDGPU::G_FEXP2:
4021 case AMDGPU::G_FLOG2: {
4022 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4023 if (Subtarget.hasPseudoScalarTrans() && (Size == 16 || Size == 32) &&
4024 isSALUMapping(MI))
4025 return getDefaultMappingSOP(MI);
4026 return getDefaultMappingVOP(MI);
4027 }
4028 case AMDGPU::G_SADDSAT: // FIXME: Could lower sat ops for SALU
4029 case AMDGPU::G_SSUBSAT:
4030 case AMDGPU::G_UADDSAT:
4031 case AMDGPU::G_USUBSAT:
4032 case AMDGPU::G_FMAD:
4033 case AMDGPU::G_FLDEXP:
4034 case AMDGPU::G_FMINNUM_IEEE:
4035 case AMDGPU::G_FMAXNUM_IEEE:
4036 case AMDGPU::G_FCANONICALIZE:
4037 case AMDGPU::G_STRICT_FLDEXP:
4038 case AMDGPU::G_BSWAP: // TODO: Somehow expand for scalar?
4039 case AMDGPU::G_FSHR: // TODO: Expand for scalar
4040 case AMDGPU::G_AMDGPU_FMIN_LEGACY:
4041 case AMDGPU::G_AMDGPU_FMAX_LEGACY:
4042 case AMDGPU::G_AMDGPU_RCP_IFLAG:
4043 case AMDGPU::G_AMDGPU_CVT_F32_UBYTE0:
4044 case AMDGPU::G_AMDGPU_CVT_F32_UBYTE1:
4045 case AMDGPU::G_AMDGPU_CVT_F32_UBYTE2:
4046 case AMDGPU::G_AMDGPU_CVT_F32_UBYTE3:
4047 case AMDGPU::G_AMDGPU_CVT_PK_I16_I32:
4048 case AMDGPU::G_AMDGPU_SMED3:
4049 case AMDGPU::G_AMDGPU_FMED3:
4050 return getDefaultMappingVOP(MI);
4051 case AMDGPU::G_UMULH:
4052 case AMDGPU::G_SMULH: {
4053 if (Subtarget.hasScalarMulHiInsts() && isSALUMapping(MI))
4054 return getDefaultMappingSOP(MI);
4055 return getDefaultMappingVOP(MI);
4056 }
4057 case AMDGPU::G_AMDGPU_MAD_U64_U32:
4058 case AMDGPU::G_AMDGPU_MAD_I64_I32: {
4059 // Three possible mappings:
4060 //
4061 // - Default SOP
4062 // - Default VOP
4063 // - Scalar multiply: src0 and src1 are SGPRs, the rest is VOP.
4064 //
4065 // This allows instruction selection to keep the multiplication part of the
4066 // instruction on the SALU.
4067 bool AllSalu = true;
4068 bool MulSalu = true;
4069 for (unsigned i = 0; i < 5; ++i) {
4070 Register Reg = MI.getOperand(i).getReg();
4071 if (const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI)) {
4072 if (Bank->getID() != AMDGPU::SGPRRegBankID) {
4073 AllSalu = false;
4074 if (i == 2 || i == 3) {
4075 MulSalu = false;
4076 break;
4077 }
4078 }
4079 }
4080 }
4081
4082 if (AllSalu)
4083 return getDefaultMappingSOP(MI);
4084
4085 // If the multiply-add is full-rate in VALU, use that even if the
4086 // multiplication part is scalar. Accumulating separately on the VALU would
4087 // take two instructions.
4088 if (!MulSalu || Subtarget.hasFullRate64Ops())
4089 return getDefaultMappingVOP(MI);
4090
4091 // Keep the multiplication on the SALU, then accumulate on the VALU.
4092 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 64);
4093 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
4094 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4095 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4096 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 64);
4097 break;
4098 }
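// An illustrative sketch (generic-MIR-like pseudo syntax) of the mixed mapping
// chosen above: a G_AMDGPU_MAD_U64_U32 whose 32-bit factors are uniform but
// whose 64-bit addend lives in VGPRs is reported roughly as
//
//   %d:vgpr(s64), %c:vcc(s1) = G_AMDGPU_MAD_U64_U32 %f0:sgpr(s32),
//                                                   %f1:sgpr(s32),
//                                                   %acc:vgpr(s64)
//
// which lets instruction selection keep the 32x32 multiply on the SALU and
// only the 64-bit accumulate on the VALU.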
4099 case AMDGPU::G_IMPLICIT_DEF: {
4100 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4101 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4102 break;
4103 }
4104 case AMDGPU::G_FCONSTANT:
4105 case AMDGPU::G_CONSTANT:
4106 case AMDGPU::G_GLOBAL_VALUE:
4107 case AMDGPU::G_FRAME_INDEX:
4108 case AMDGPU::G_BLOCK_ADDR:
4109 case AMDGPU::G_READSTEADYCOUNTER:
4110 case AMDGPU::G_READCYCLECOUNTER: {
4111 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4112 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4113 break;
4114 }
4115 case AMDGPU::G_DYN_STACKALLOC: {
4116 // Result is always uniform, and a wave reduction is needed for the source.
4117 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4118 unsigned SrcBankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4119 OpdsMapping[1] = AMDGPU::getValueMapping(SrcBankID, 32);
4120 break;
4121 }
4122 case AMDGPU::G_AMDGPU_WAVE_ADDRESS: {
4123 // This case is weird because we expect a physical register in the source,
4124 // but need to set a bank anyway.
4125 //
4126 // TODO: We could select the result to SGPR or VGPR
4127 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4128 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4129 break;
4130 }
4131 case AMDGPU::G_INSERT: {
4132 unsigned BankID = getMappingType(MRI, MI);
4133 unsigned DstSize = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4134 unsigned SrcSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
4135 unsigned EltSize = getSizeInBits(MI.getOperand(2).getReg(), MRI, *TRI);
4136 OpdsMapping[0] = AMDGPU::getValueMapping(BankID, DstSize);
4137 OpdsMapping[1] = AMDGPU::getValueMapping(BankID, SrcSize);
4138 OpdsMapping[2] = AMDGPU::getValueMapping(BankID, EltSize);
4139 OpdsMapping[3] = nullptr;
4140 break;
4141 }
4142 case AMDGPU::G_EXTRACT: {
4143 unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4144 unsigned DstSize = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4145 unsigned SrcSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
4146 OpdsMapping[0] = AMDGPU::getValueMapping(BankID, DstSize);
4147 OpdsMapping[1] = AMDGPU::getValueMapping(BankID, SrcSize);
4148 OpdsMapping[2] = nullptr;
4149 break;
4150 }
4151 case AMDGPU::G_BUILD_VECTOR:
4152 case AMDGPU::G_BUILD_VECTOR_TRUNC: {
4153 LLT DstTy = MRI.getType(MI.getOperand(0).getReg());
4154 if (DstTy == LLT::fixed_vector(2, 16)) {
4155 unsigned DstSize = DstTy.getSizeInBits();
4156 unsigned SrcSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4157 unsigned Src0BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4158 unsigned Src1BankID = getRegBankID(MI.getOperand(2).getReg(), MRI);
4159 unsigned DstBankID = regBankUnion(Src0BankID, Src1BankID);
4160
4161 OpdsMapping[0] = AMDGPU::getValueMapping(DstBankID, DstSize);
4162 OpdsMapping[1] = AMDGPU::getValueMapping(Src0BankID, SrcSize);
4163 OpdsMapping[2] = AMDGPU::getValueMapping(Src1BankID, SrcSize);
4164 break;
4165 }
4166
4167 [[fallthrough]];
4168 }
4169 case AMDGPU::G_MERGE_VALUES:
4170 case AMDGPU::G_CONCAT_VECTORS: {
4171 unsigned Bank = getMappingType(MRI, MI);
4172 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4173 unsigned SrcSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4174
4175 OpdsMapping[0] = AMDGPU::getValueMapping(Bank, DstSize);
4176 // Op1 and Dst should use the same register bank.
4177 for (unsigned i = 1, e = MI.getNumOperands(); i != e; ++i)
4178 OpdsMapping[i] = AMDGPU::getValueMapping(Bank, SrcSize);
4179 break;
4180 }
4181 case AMDGPU::G_BITREVERSE:
4182 case AMDGPU::G_BITCAST:
4183 case AMDGPU::G_INTTOPTR:
4184 case AMDGPU::G_PTRTOINT:
4185 case AMDGPU::G_FABS:
4186 case AMDGPU::G_FNEG: {
4187 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4188 unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4189 OpdsMapping[0] = OpdsMapping[1] = AMDGPU::getValueMapping(BankID, Size);
4190 break;
4191 }
4192 case AMDGPU::G_AMDGPU_FFBH_U32:
4193 case AMDGPU::G_AMDGPU_FFBL_B32:
4194 case AMDGPU::G_CTLZ_ZERO_UNDEF:
4195 case AMDGPU::G_CTTZ_ZERO_UNDEF: {
4196 unsigned Size = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4197 unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4198 OpdsMapping[0] = AMDGPU::getValueMapping(BankID, 32);
4199 OpdsMapping[1] = AMDGPU::getValueMappingSGPR64Only(BankID, Size);
4200 break;
4201 }
4202 case AMDGPU::G_CTPOP: {
4203 unsigned Size = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4204 unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4205 OpdsMapping[0] = AMDGPU::getValueMapping(BankID, 32);
4206
4207 // This should really be getValueMappingSGPR64Only, but allowing the generic
4208 // code to handle the register split just makes using LegalizerHelper more
4209 // difficult.
4210 OpdsMapping[1] = AMDGPU::getValueMapping(BankID, Size);
4211 break;
4212 }
4213 case AMDGPU::G_TRUNC: {
4214 Register Dst = MI.getOperand(0).getReg();
4215 Register Src = MI.getOperand(1).getReg();
4216 unsigned Bank = getRegBankID(Src, MRI);
4217 unsigned DstSize = getSizeInBits(Dst, MRI, *TRI);
4218 unsigned SrcSize = getSizeInBits(Src, MRI, *TRI);
4219 OpdsMapping[0] = AMDGPU::getValueMapping(Bank, DstSize);
4220 OpdsMapping[1] = AMDGPU::getValueMapping(Bank, SrcSize);
4221 break;
4222 }
4223 case AMDGPU::G_ZEXT:
4224 case AMDGPU::G_SEXT:
4225 case AMDGPU::G_ANYEXT:
4226 case AMDGPU::G_SEXT_INREG: {
4227 Register Dst = MI.getOperand(0).getReg();
4228 Register Src = MI.getOperand(1).getReg();
4229 unsigned DstSize = getSizeInBits(Dst, MRI, *TRI);
4230 unsigned SrcSize = getSizeInBits(Src, MRI, *TRI);
4231
4232 unsigned DstBank;
4233 const RegisterBank *SrcBank = getRegBank(Src, MRI, *TRI);
4234 assert(SrcBank);
4235 switch (SrcBank->getID()) {
4236 case AMDGPU::SGPRRegBankID:
4237 DstBank = AMDGPU::SGPRRegBankID;
4238 break;
4239 default:
4240 DstBank = AMDGPU::VGPRRegBankID;
4241 break;
4242 }
4243
4244 // Scalar extend can use 64-bit BFE, but VGPRs require extending to
4245 // 32-bits, and then to 64.
4246 OpdsMapping[0] = AMDGPU::getValueMappingSGPR64Only(DstBank, DstSize);
4247 OpdsMapping[1] = AMDGPU::getValueMappingSGPR64Only(SrcBank->getID(),
4248 SrcSize);
4249 break;
4250 }
4251 case AMDGPU::G_IS_FPCLASS: {
4252 Register SrcReg = MI.getOperand(1).getReg();
4253 unsigned SrcSize = MRI.getType(SrcReg).getSizeInBits();
4254 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4255 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, DstSize);
4256 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4257 break;
4258 }
4259 case AMDGPU::G_STORE: {
4260 assert(MI.getOperand(0).isReg());
4261 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4262
4263 // FIXME: We need to specify a different reg bank once scalar stores are
4264 // supported.
4265 const ValueMapping *ValMapping =
4266 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4267 OpdsMapping[0] = ValMapping;
4268 OpdsMapping[1] = getValueMappingForPtr(MRI, MI.getOperand(1).getReg());
4269 break;
4270 }
4271 case AMDGPU::G_ICMP:
4272 case AMDGPU::G_FCMP: {
4273 unsigned Size = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4274
4275 // See if the result register has already been constrained to vcc, which may
4276 // happen due to control flow intrinsic lowering.
4277 unsigned DstBank = getRegBankID(MI.getOperand(0).getReg(), MRI,
4278 AMDGPU::SGPRRegBankID);
4279 unsigned Op2Bank = getRegBankID(MI.getOperand(2).getReg(), MRI);
4280 unsigned Op3Bank = getRegBankID(MI.getOperand(3).getReg(), MRI);
4281
4282 auto canUseSCCICMP = [&]() {
4283 auto Pred =
4284 static_cast<CmpInst::Predicate>(MI.getOperand(1).getPredicate());
4285 return Size == 32 ||
4286 (Size == 64 &&
4287 (Pred == CmpInst::ICMP_EQ || Pred == CmpInst::ICMP_NE) &&
4288 Subtarget.hasScalarCompareEq64());
4289 };
4290 auto canUseSCCFCMP = [&]() {
4291 return Subtarget.hasSALUFloatInsts() && (Size == 32 || Size == 16);
4292 };
4293
4294 bool isICMP = MI.getOpcode() == AMDGPU::G_ICMP;
4295 bool CanUseSCC = DstBank == AMDGPU::SGPRRegBankID &&
4296 Op2Bank == AMDGPU::SGPRRegBankID &&
4297 Op3Bank == AMDGPU::SGPRRegBankID &&
4298 (isICMP ? canUseSCCICMP() : canUseSCCFCMP());
4299
4300 DstBank = CanUseSCC ? AMDGPU::SGPRRegBankID : AMDGPU::VCCRegBankID;
4301 unsigned SrcBank = CanUseSCC ? AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
4302
4303 // TODO: Use 32-bit for scalar output size.
4304 // SCC results will need to be copied to a 32-bit SGPR virtual register.
4305 const unsigned ResultSize = 1;
4306
4307 OpdsMapping[0] = AMDGPU::getValueMapping(DstBank, ResultSize);
4308 OpdsMapping[1] = nullptr; // Predicate Operand.
4309 OpdsMapping[2] = AMDGPU::getValueMapping(SrcBank, Size);
4310 OpdsMapping[3] = AMDGPU::getValueMapping(SrcBank, Size);
4311 break;
4312 }
4313 case AMDGPU::G_EXTRACT_VECTOR_ELT: {
4314 // VGPR index can be used for waterfall when indexing a SGPR vector.
4315 unsigned SrcBankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4316 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4317 unsigned SrcSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4318 unsigned IdxSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4319 unsigned IdxBank = getRegBankID(MI.getOperand(2).getReg(), MRI);
4320 unsigned OutputBankID = regBankUnion(SrcBankID, IdxBank);
4321
4322 OpdsMapping[0] = AMDGPU::getValueMappingSGPR64Only(OutputBankID, DstSize);
4323 OpdsMapping[1] = AMDGPU::getValueMapping(SrcBankID, SrcSize);
4324
4325 // The index can be in either bank if the source vector is a VGPR.
4326 OpdsMapping[2] = AMDGPU::getValueMapping(IdxBank, IdxSize);
4327 break;
4328 }
4329 case AMDGPU::G_INSERT_VECTOR_ELT: {
4330 unsigned OutputBankID = isSALUMapping(MI) ?
4331 AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
4332
4333 unsigned VecSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4334 unsigned InsertSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4335 unsigned IdxSize = MRI.getType(MI.getOperand(3).getReg()).getSizeInBits();
4336 unsigned InsertEltBankID = getRegBankID(MI.getOperand(2).getReg(), MRI);
4337 unsigned IdxBankID = getRegBankID(MI.getOperand(3).getReg(), MRI);
4338
4339 OpdsMapping[0] = AMDGPU::getValueMapping(OutputBankID, VecSize);
4340 OpdsMapping[1] = AMDGPU::getValueMapping(OutputBankID, VecSize);
4341
4342 // This is a weird case, because we need to break down the mapping based on
4343 // the register bank of a different operand.
4344 if (InsertSize == 64 && OutputBankID == AMDGPU::VGPRRegBankID) {
4345 OpdsMapping[2] = AMDGPU::getValueMappingSplit64(InsertEltBankID,
4346 InsertSize);
4347 } else {
4348 assert(InsertSize == 32 || InsertSize == 64);
4349 OpdsMapping[2] = AMDGPU::getValueMapping(InsertEltBankID, InsertSize);
4350 }
4351
4352 // The index can be in either bank if the source vector is a VGPR.
4353 OpdsMapping[3] = AMDGPU::getValueMapping(IdxBankID, IdxSize);
4354 break;
4355 }
4356 case AMDGPU::G_UNMERGE_VALUES: {
4357 unsigned Bank = getMappingType(MRI, MI);
4358
4359 // Op1 and Dst should use the same register bank.
4360 // FIXME: Shouldn't this be the default? Why do we need to handle this?
4361 for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
4362 unsigned Size = getSizeInBits(MI.getOperand(i).getReg(), MRI, *TRI);
4363 OpdsMapping[i] = AMDGPU::getValueMapping(Bank, Size);
4364 }
4365 break;
4366 }
4367 case AMDGPU::G_AMDGPU_BUFFER_LOAD:
4368 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE:
4369 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE:
4370 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT:
4371 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT:
4372 case AMDGPU::G_AMDGPU_BUFFER_LOAD_TFE:
4373 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE_TFE:
4374 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE_TFE:
4375 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT_TFE:
4376 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT_TFE:
4377 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT:
4378 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_TFE:
4379 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_D16:
4380 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT:
4381 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT_D16:
4382 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT:
4383 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT_D16:
4384 case AMDGPU::G_AMDGPU_BUFFER_STORE:
4385 case AMDGPU::G_AMDGPU_BUFFER_STORE_BYTE:
4386 case AMDGPU::G_AMDGPU_BUFFER_STORE_SHORT:
4387 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT:
4388 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT_D16: {
4389 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4390
4391 // rsrc
4392 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4393
4394 // vindex
4395 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4396
4397 // voffset
4398 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4399
4400 // soffset
4401 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4402
4403 // Any remaining operands are immediates and were correctly null
4404 // initialized.
4405 break;
4406 }
4407 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SWAP:
4408 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_ADD:
4409 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB:
4410 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMIN:
4411 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMIN:
4412 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMAX:
4413 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMAX:
4414 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_AND:
4415 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_OR:
4416 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_XOR:
4417 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_INC:
4418 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_DEC:
4419 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FADD:
4420 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMIN:
4421 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMAX: {
4422 // vdata_out
4423 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4424
4425 // vdata_in
4426 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4427
4428 // rsrc
4429 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4430
4431 // vindex
4432 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4433
4434 // voffset
4435 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4436
4437 // soffset
4438 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
4439
4440 // Any remaining operands are immediates and were correctly null
4441 // initialized.
4442 break;
4443 }
4444 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_CMPSWAP: {
4445 // vdata_out
4446 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4447
4448 // vdata_in
4449 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4450
4451 // cmp
4452 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4453
4454 // rsrc
4455 OpdsMapping[3] = getSGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4456
4457 // vindex
4458 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4459
4460 // voffset
4461 OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
4462
4463 // soffset
4464 OpdsMapping[6] = getSGPROpMapping(MI.getOperand(6).getReg(), MRI, *TRI);
4465
4466 // Any remaining operands are immediates and were correctly null
4467 // initialized.
4468 break;
4469 }
4470 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD:
4471 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_UBYTE:
4472 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SBYTE:
4473 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_USHORT:
4474 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SSHORT: {
4475 // Lie and claim everything is legal, even though some need to be
4476 // SGPRs. applyMapping will have to deal with it as a waterfall loop.
4477 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4478 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4479
4480 // We need to convert this to a MUBUF if either the resource or offset is
4481 // a VGPR.
4482 unsigned RSrcBank = OpdsMapping[1]->BreakDown[0].RegBank->getID();
4483 unsigned OffsetBank = OpdsMapping[2]->BreakDown[0].RegBank->getID();
4484 unsigned ResultBank = regBankUnion(RSrcBank, OffsetBank);
4485
4486 unsigned Size0 = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4487 OpdsMapping[0] = AMDGPU::getValueMapping(ResultBank, Size0);
4488 break;
4489 }
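// An illustrative note on the case above: the result bank of a
// G_AMDGPU_S_BUFFER_LOAD mirrors its inputs. With both rsrc and soffset in
// SGPRs the load stays scalar, while a divergent operand forces a VGPR result
// for the later MUBUF/waterfall expansion, e.g.
//
//   regBankUnion(SGPRRegBankID, SGPRRegBankID) -> SGPR result (stays scalar)
//   regBankUnion(SGPRRegBankID, VGPRRegBankID) -> VGPR result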
4490 case AMDGPU::G_AMDGPU_S_BUFFER_PREFETCH:
4491 OpdsMapping[0] = getSGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4492 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4493 break;
4494 case AMDGPU::G_INTRINSIC:
4495 case AMDGPU::G_INTRINSIC_CONVERGENT: {
4496 switch (cast<GIntrinsic>(MI).getIntrinsicID()) {
4497 default:
4498 return getInvalidInstructionMapping();
4499 case Intrinsic::amdgcn_div_fmas:
4500 case Intrinsic::amdgcn_div_fixup:
4501 case Intrinsic::amdgcn_trig_preop:
4502 case Intrinsic::amdgcn_sin:
4503 case Intrinsic::amdgcn_cos:
4504 case Intrinsic::amdgcn_log_clamp:
4505 case Intrinsic::amdgcn_rcp_legacy:
4506 case Intrinsic::amdgcn_rsq_legacy:
4507 case Intrinsic::amdgcn_rsq_clamp:
4508 case Intrinsic::amdgcn_fmul_legacy:
4509 case Intrinsic::amdgcn_fma_legacy:
4510 case Intrinsic::amdgcn_frexp_mant:
4511 case Intrinsic::amdgcn_frexp_exp:
4512 case Intrinsic::amdgcn_fract:
4513 case Intrinsic::amdgcn_cvt_pknorm_i16:
4514 case Intrinsic::amdgcn_cvt_pknorm_u16:
4515 case Intrinsic::amdgcn_cvt_pk_i16:
4516 case Intrinsic::amdgcn_cvt_pk_u16:
4517 case Intrinsic::amdgcn_fmed3:
4518 case Intrinsic::amdgcn_cubeid:
4519 case Intrinsic::amdgcn_cubema:
4520 case Intrinsic::amdgcn_cubesc:
4521 case Intrinsic::amdgcn_cubetc:
4522 case Intrinsic::amdgcn_sffbh:
4523 case Intrinsic::amdgcn_fmad_ftz:
4524 case Intrinsic::amdgcn_mbcnt_lo:
4525 case Intrinsic::amdgcn_mbcnt_hi:
4526 case Intrinsic::amdgcn_mul_u24:
4527 case Intrinsic::amdgcn_mul_i24:
4528 case Intrinsic::amdgcn_mulhi_u24:
4529 case Intrinsic::amdgcn_mulhi_i24:
4530 case Intrinsic::amdgcn_lerp:
4531 case Intrinsic::amdgcn_sad_u8:
4532 case Intrinsic::amdgcn_msad_u8:
4533 case Intrinsic::amdgcn_sad_hi_u8:
4534 case Intrinsic::amdgcn_sad_u16:
4535 case Intrinsic::amdgcn_qsad_pk_u16_u8:
4536 case Intrinsic::amdgcn_mqsad_pk_u16_u8:
4537 case Intrinsic::amdgcn_mqsad_u32_u8:
4538 case Intrinsic::amdgcn_cvt_pk_u8_f32:
4539 case Intrinsic::amdgcn_alignbyte:
4540 case Intrinsic::amdgcn_perm:
4541 case Intrinsic::amdgcn_prng_b32:
4542 case Intrinsic::amdgcn_fdot2:
4543 case Intrinsic::amdgcn_sdot2:
4544 case Intrinsic::amdgcn_udot2:
4545 case Intrinsic::amdgcn_sdot4:
4546 case Intrinsic::amdgcn_udot4:
4547 case Intrinsic::amdgcn_sdot8:
4548 case Intrinsic::amdgcn_udot8:
4549 case Intrinsic::amdgcn_fdot2_bf16_bf16:
4550 case Intrinsic::amdgcn_fdot2_f16_f16:
4551 case Intrinsic::amdgcn_fdot2_f32_bf16:
4552 case Intrinsic::amdgcn_fdot2c_f32_bf16:
4553 case Intrinsic::amdgcn_sudot4:
4554 case Intrinsic::amdgcn_sudot8:
4555 case Intrinsic::amdgcn_dot4_f32_fp8_bf8:
4556 case Intrinsic::amdgcn_dot4_f32_bf8_fp8:
4557 case Intrinsic::amdgcn_dot4_f32_fp8_fp8:
4558 case Intrinsic::amdgcn_dot4_f32_bf8_bf8:
4559 case Intrinsic::amdgcn_cvt_f32_fp8:
4560 case Intrinsic::amdgcn_cvt_f32_bf8:
4561 case Intrinsic::amdgcn_cvt_pk_f32_fp8:
4562 case Intrinsic::amdgcn_cvt_pk_f32_bf8:
4563 case Intrinsic::amdgcn_cvt_pk_fp8_f32:
4564 case Intrinsic::amdgcn_cvt_pk_bf8_f32:
4565 case Intrinsic::amdgcn_cvt_sr_fp8_f32:
4566 case Intrinsic::amdgcn_cvt_sr_bf8_f32:
4567 case Intrinsic::amdgcn_cvt_sr_bf16_f32:
4568 case Intrinsic::amdgcn_cvt_sr_f16_f32:
4569 case Intrinsic::amdgcn_cvt_scalef32_pk32_fp6_f16:
4570 case Intrinsic::amdgcn_cvt_scalef32_pk32_bf6_f16:
4571 case Intrinsic::amdgcn_cvt_scalef32_pk32_fp6_bf16:
4572 case Intrinsic::amdgcn_cvt_scalef32_pk32_bf6_bf16:
4573 case Intrinsic::amdgcn_cvt_scalef32_f16_fp8:
4574 case Intrinsic::amdgcn_cvt_scalef32_f16_bf8:
4575 case Intrinsic::amdgcn_cvt_scalef32_f32_fp8:
4576 case Intrinsic::amdgcn_cvt_scalef32_f32_bf8:
4577 case Intrinsic::amdgcn_cvt_scalef32_pk_fp8_f32:
4578 case Intrinsic::amdgcn_cvt_scalef32_pk_bf8_f32:
4579 case Intrinsic::amdgcn_cvt_scalef32_pk_f32_fp8:
4580 case Intrinsic::amdgcn_cvt_scalef32_pk_f32_bf8:
4581 case Intrinsic::amdgcn_cvt_scalef32_pk_fp8_f16:
4582 case Intrinsic::amdgcn_cvt_scalef32_pk_fp8_bf16:
4583 case Intrinsic::amdgcn_cvt_scalef32_pk_bf8_f16:
4584 case Intrinsic::amdgcn_cvt_scalef32_pk_bf8_bf16:
4585 case Intrinsic::amdgcn_cvt_scalef32_pk_f32_fp4:
4586 case Intrinsic::amdgcn_cvt_scalef32_pk_fp4_f32:
4587 case Intrinsic::amdgcn_cvt_scalef32_pk_f16_fp4:
4588 case Intrinsic::amdgcn_cvt_scalef32_pk_bf16_fp4:
4589 case Intrinsic::amdgcn_cvt_scalef32_pk32_f32_fp6:
4590 case Intrinsic::amdgcn_cvt_scalef32_pk32_f32_bf6:
4591 case Intrinsic::amdgcn_cvt_scalef32_pk32_f16_bf6:
4592 case Intrinsic::amdgcn_cvt_scalef32_pk32_bf16_bf6:
4593 case Intrinsic::amdgcn_cvt_scalef32_pk32_f16_fp6:
4594 case Intrinsic::amdgcn_cvt_scalef32_pk32_bf16_fp6:
4595 case Intrinsic::amdgcn_cvt_scalef32_pk_f16_bf8:
4596 case Intrinsic::amdgcn_cvt_scalef32_pk_bf16_bf8:
4597 case Intrinsic::amdgcn_cvt_scalef32_pk_f16_fp8:
4598 case Intrinsic::amdgcn_cvt_scalef32_pk_bf16_fp8:
4599 case Intrinsic::amdgcn_cvt_scalef32_pk_fp4_f16:
4600 case Intrinsic::amdgcn_cvt_scalef32_pk_fp4_bf16:
4601 case Intrinsic::amdgcn_cvt_scalef32_sr_pk_fp4_f16:
4602 case Intrinsic::amdgcn_cvt_scalef32_sr_pk_fp4_bf16:
4603 case Intrinsic::amdgcn_cvt_scalef32_sr_pk_fp4_f32:
4604 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_bf6_bf16:
4605 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_bf6_f16:
4606 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_bf6_f32:
4607 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_fp6_bf16:
4608 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_fp6_f16:
4609 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_fp6_f32:
4610 case Intrinsic::amdgcn_cvt_scalef32_sr_bf8_bf16:
4611 case Intrinsic::amdgcn_cvt_scalef32_sr_bf8_f16:
4612 case Intrinsic::amdgcn_cvt_scalef32_sr_bf8_f32:
4613 case Intrinsic::amdgcn_cvt_scalef32_sr_fp8_bf16:
4614 case Intrinsic::amdgcn_cvt_scalef32_sr_fp8_f16:
4615 case Intrinsic::amdgcn_cvt_scalef32_sr_fp8_f32:
4616 case Intrinsic::amdgcn_ashr_pk_i8_i32:
4617 case Intrinsic::amdgcn_ashr_pk_u8_i32:
4618 case Intrinsic::amdgcn_cvt_scalef32_2xpk16_fp6_f32:
4619 case Intrinsic::amdgcn_cvt_scalef32_2xpk16_bf6_f32:
4620 case Intrinsic::amdgcn_wmma_bf16_16x16x16_bf16:
4621 case Intrinsic::amdgcn_wmma_f16_16x16x16_f16:
4622 case Intrinsic::amdgcn_wmma_bf16_16x16x16_bf16_tied:
4623 case Intrinsic::amdgcn_wmma_f16_16x16x16_f16_tied:
4624 case Intrinsic::amdgcn_wmma_f32_16x16x16_bf16:
4625 case Intrinsic::amdgcn_wmma_f32_16x16x16_f16:
4626 case Intrinsic::amdgcn_wmma_i32_16x16x16_iu4:
4627 case Intrinsic::amdgcn_wmma_i32_16x16x16_iu8:
4628 case Intrinsic::amdgcn_wmma_f32_16x16x16_fp8_fp8:
4629 case Intrinsic::amdgcn_wmma_f32_16x16x16_fp8_bf8:
4630 case Intrinsic::amdgcn_wmma_f32_16x16x16_bf8_fp8:
4631 case Intrinsic::amdgcn_wmma_f32_16x16x16_bf8_bf8:
4632 case Intrinsic::amdgcn_wmma_i32_16x16x32_iu4:
4633 case Intrinsic::amdgcn_swmmac_f32_16x16x32_f16:
4634 case Intrinsic::amdgcn_swmmac_f32_16x16x32_bf16:
4635 case Intrinsic::amdgcn_swmmac_f16_16x16x32_f16:
4636 case Intrinsic::amdgcn_swmmac_bf16_16x16x32_bf16:
4637 case Intrinsic::amdgcn_swmmac_i32_16x16x32_iu8:
4638 case Intrinsic::amdgcn_swmmac_i32_16x16x32_iu4:
4639 case Intrinsic::amdgcn_swmmac_i32_16x16x64_iu4:
4640 case Intrinsic::amdgcn_swmmac_f32_16x16x32_fp8_fp8:
4641 case Intrinsic::amdgcn_swmmac_f32_16x16x32_fp8_bf8:
4642 case Intrinsic::amdgcn_swmmac_f32_16x16x32_bf8_fp8:
4643 case Intrinsic::amdgcn_swmmac_f32_16x16x32_bf8_bf8:
4644 return getDefaultMappingVOP(MI);
4645 case Intrinsic::amdgcn_log:
4646 case Intrinsic::amdgcn_exp2:
4647 case Intrinsic::amdgcn_rcp:
4648 case Intrinsic::amdgcn_rsq:
4649 case Intrinsic::amdgcn_sqrt: {
4650 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4651 if (Subtarget.hasPseudoScalarTrans() && (Size == 16 || Size == 32) &&
4652 isSALUMapping(MI))
4653 return getDefaultMappingSOP(MI);
4654 return getDefaultMappingVOP(MI);
4655 }
4656 case Intrinsic::amdgcn_sbfe:
4657 case Intrinsic::amdgcn_ubfe:
4658 if (isSALUMapping(MI))
4659 return getDefaultMappingSOP(MI);
4660 return getDefaultMappingVOP(MI);
4661 case Intrinsic::amdgcn_ds_swizzle:
4662 case Intrinsic::amdgcn_ds_permute:
4663 case Intrinsic::amdgcn_ds_bpermute:
4664 case Intrinsic::amdgcn_update_dpp:
4665 case Intrinsic::amdgcn_mov_dpp8:
4666 case Intrinsic::amdgcn_mov_dpp:
4667 case Intrinsic::amdgcn_strict_wwm:
4668 case Intrinsic::amdgcn_wwm:
4669 case Intrinsic::amdgcn_strict_wqm:
4670 case Intrinsic::amdgcn_wqm:
4671 case Intrinsic::amdgcn_softwqm:
4672 case Intrinsic::amdgcn_set_inactive:
4673 case Intrinsic::amdgcn_set_inactive_chain_arg:
4674 case Intrinsic::amdgcn_permlane64:
4675 return getDefaultMappingAllVGPR(MI);
4676 case Intrinsic::amdgcn_cvt_pkrtz:
4677 if (Subtarget.hasSALUFloatInsts() && isSALUMapping(MI))
4678 return getDefaultMappingSOP(MI);
4679 return getDefaultMappingVOP(MI);
4680 case Intrinsic::amdgcn_kernarg_segment_ptr:
4681 case Intrinsic::amdgcn_s_getpc:
4682 case Intrinsic::amdgcn_groupstaticsize:
4683 case Intrinsic::amdgcn_reloc_constant:
4684 case Intrinsic::returnaddress: {
4685 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4686 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4687 break;
4688 }
4689 case Intrinsic::amdgcn_wqm_vote: {
4690 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4691 OpdsMapping[0] = OpdsMapping[2]
4692 = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size);
4693 break;
4694 }
4695 case Intrinsic::amdgcn_ps_live: {
4696 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
4697 break;
4698 }
4699 case Intrinsic::amdgcn_div_scale: {
4700 unsigned Dst0Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4701 unsigned Dst1Size = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4702 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Dst0Size);
4703 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Dst1Size);
4704
4705 unsigned SrcSize = MRI.getType(MI.getOperand(3).getReg()).getSizeInBits();
4706 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4707 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4708 break;
4709 }
4710 case Intrinsic::amdgcn_class: {
4711 Register Src0Reg = MI.getOperand(2).getReg();
4712 Register Src1Reg = MI.getOperand(3).getReg();
4713 unsigned Src0Size = MRI.getType(Src0Reg).getSizeInBits();
4714 unsigned Src1Size = MRI.getType(Src1Reg).getSizeInBits();
4715 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4716 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, DstSize);
4717 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Src0Size);
4718 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Src1Size);
4719 break;
4720 }
4721 case Intrinsic::amdgcn_icmp:
4722 case Intrinsic::amdgcn_fcmp: {
4723 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4724 // This is not VCCRegBank because this is not used in boolean contexts.
4725 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
4726 unsigned OpSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4727 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
4728 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
4729 break;
4730 }
4731 case Intrinsic::amdgcn_readlane: {
4732 // This must be an SGPR, but accept a VGPR.
4733 Register IdxReg = MI.getOperand(3).getReg();
4734 unsigned IdxSize = MRI.getType(IdxReg).getSizeInBits();
4735 unsigned IdxBank = getRegBankID(IdxReg, MRI, AMDGPU::SGPRRegBankID);
4736 OpdsMapping[3] = AMDGPU::getValueMapping(IdxBank, IdxSize);
4737 [[fallthrough]];
4738 }
4739 case Intrinsic::amdgcn_readfirstlane: {
4740 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4741 unsigned SrcSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4742 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
4743 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4744 break;
4745 }
4746 case Intrinsic::amdgcn_writelane: {
4747 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4748 Register SrcReg = MI.getOperand(2).getReg();
4749 unsigned SrcSize = MRI.getType(SrcReg).getSizeInBits();
4750 unsigned SrcBank = getRegBankID(SrcReg, MRI, AMDGPU::SGPRRegBankID);
4751 Register IdxReg = MI.getOperand(3).getReg();
4752 unsigned IdxSize = MRI.getType(IdxReg).getSizeInBits();
4753 unsigned IdxBank = getRegBankID(IdxReg, MRI, AMDGPU::SGPRRegBankID);
4754 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
4755
4756 // These 2 must be SGPRs, but accept VGPRs. Readfirstlane will be inserted
4757 // to legalize.
4758 OpdsMapping[2] = AMDGPU::getValueMapping(SrcBank, SrcSize);
4759 OpdsMapping[3] = AMDGPU::getValueMapping(IdxBank, IdxSize);
4760 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4761 break;
4762 }
4763 case Intrinsic::amdgcn_if_break: {
4764 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4765 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4766 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
4767 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4768 break;
4769 }
4770 case Intrinsic::amdgcn_permlane16:
4771 case Intrinsic::amdgcn_permlanex16: {
4772 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4773 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4774 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4775 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4776 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4777 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4778 break;
4779 }
4780 case Intrinsic::amdgcn_permlane16_var:
4781 case Intrinsic::amdgcn_permlanex16_var: {
4782 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4783 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4784 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4785 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4786 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4787 break;
4788 }
4789 case Intrinsic::amdgcn_mfma_f32_4x4x1f32:
4790 case Intrinsic::amdgcn_mfma_f32_4x4x4f16:
4791 case Intrinsic::amdgcn_mfma_i32_4x4x4i8:
4792 case Intrinsic::amdgcn_mfma_f32_4x4x2bf16:
4793 case Intrinsic::amdgcn_mfma_f32_16x16x1f32:
4794 case Intrinsic::amdgcn_mfma_f32_16x16x4f32:
4795 case Intrinsic::amdgcn_mfma_f32_16x16x4f16:
4796 case Intrinsic::amdgcn_mfma_f32_16x16x16f16:
4797 case Intrinsic::amdgcn_mfma_i32_16x16x4i8:
4798 case Intrinsic::amdgcn_mfma_i32_16x16x16i8:
4799 case Intrinsic::amdgcn_mfma_f32_16x16x2bf16:
4800 case Intrinsic::amdgcn_mfma_f32_16x16x8bf16:
4801 case Intrinsic::amdgcn_mfma_f32_32x32x1f32:
4802 case Intrinsic::amdgcn_mfma_f32_32x32x2f32:
4803 case Intrinsic::amdgcn_mfma_f32_32x32x4f16:
4804 case Intrinsic::amdgcn_mfma_f32_32x32x8f16:
4805 case Intrinsic::amdgcn_mfma_i32_32x32x4i8:
4806 case Intrinsic::amdgcn_mfma_i32_32x32x8i8:
4807 case Intrinsic::amdgcn_mfma_f32_32x32x2bf16:
4808 case Intrinsic::amdgcn_mfma_f32_32x32x4bf16:
4809 case Intrinsic::amdgcn_mfma_f32_32x32x4bf16_1k:
4810 case Intrinsic::amdgcn_mfma_f32_16x16x4bf16_1k:
4811 case Intrinsic::amdgcn_mfma_f32_4x4x4bf16_1k:
4812 case Intrinsic::amdgcn_mfma_f32_32x32x8bf16_1k:
4813 case Intrinsic::amdgcn_mfma_f32_16x16x16bf16_1k:
4814 case Intrinsic::amdgcn_mfma_f64_16x16x4f64:
4815 case Intrinsic::amdgcn_mfma_f64_4x4x4f64:
4816 case Intrinsic::amdgcn_mfma_i32_16x16x32_i8:
4817 case Intrinsic::amdgcn_mfma_i32_32x32x16_i8:
4818 case Intrinsic::amdgcn_mfma_f32_16x16x8_xf32:
4819 case Intrinsic::amdgcn_mfma_f32_32x32x4_xf32:
4820 case Intrinsic::amdgcn_mfma_f32_16x16x32_bf8_bf8:
4821 case Intrinsic::amdgcn_mfma_f32_16x16x32_bf8_fp8:
4822 case Intrinsic::amdgcn_mfma_f32_16x16x32_fp8_bf8:
4823 case Intrinsic::amdgcn_mfma_f32_16x16x32_fp8_fp8:
4824 case Intrinsic::amdgcn_mfma_f32_32x32x16_bf8_bf8:
4825 case Intrinsic::amdgcn_mfma_f32_32x32x16_bf8_fp8:
4826 case Intrinsic::amdgcn_mfma_f32_32x32x16_fp8_bf8:
4827 case Intrinsic::amdgcn_mfma_f32_32x32x16_fp8_fp8:
4828 case Intrinsic::amdgcn_mfma_f32_16x16x32_f16:
4829 case Intrinsic::amdgcn_mfma_f32_32x32x16_f16:
4830 case Intrinsic::amdgcn_mfma_i32_16x16x64_i8:
4831 case Intrinsic::amdgcn_mfma_i32_32x32x32_i8:
4832 case Intrinsic::amdgcn_mfma_f32_16x16x32_bf16: {
4833 // Default for MAI intrinsics.
4834 // srcC can also be an immediate which can be folded later.
4835 // FIXME: Should we eventually add an alternative mapping with AGPR src
4836 // for srcA/srcB?
4837 //
4838 // vdst, srcA, srcB, srcC
4839 const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
4840 OpdsMapping[0] =
4841 Info->mayNeedAGPRs()
4842 ? getAGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI)
4843 : getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4844 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4845 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4846 OpdsMapping[4] =
4847 Info->mayNeedAGPRs()
4848 ? getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI)
4849 : getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4850 break;
4851 }
4852 case Intrinsic::amdgcn_mfma_scale_f32_16x16x128_f8f6f4:
4853 case Intrinsic::amdgcn_mfma_scale_f32_32x32x64_f8f6f4: {
4854 const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
4855 OpdsMapping[0] =
4856 Info->mayNeedAGPRs()
4857 ? getAGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI)
4858 : getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4859
4860 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4861 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4862 OpdsMapping[4] =
4863 Info->mayNeedAGPRs()
4864 ? getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI)
4865 : getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4866
4867 OpdsMapping[8] = getVGPROpMapping(MI.getOperand(8).getReg(), MRI, *TRI);
4868 OpdsMapping[10] = getVGPROpMapping(MI.getOperand(10).getReg(), MRI, *TRI);
4869 break;
4870 }
4871 case Intrinsic::amdgcn_smfmac_f32_16x16x32_f16:
4872 case Intrinsic::amdgcn_smfmac_f32_32x32x16_f16:
4873 case Intrinsic::amdgcn_smfmac_f32_16x16x32_bf16:
4874 case Intrinsic::amdgcn_smfmac_f32_32x32x16_bf16:
4875 case Intrinsic::amdgcn_smfmac_i32_16x16x64_i8:
4876 case Intrinsic::amdgcn_smfmac_i32_32x32x32_i8:
4877 case Intrinsic::amdgcn_smfmac_f32_16x16x64_bf8_bf8:
4878 case Intrinsic::amdgcn_smfmac_f32_16x16x64_bf8_fp8:
4879 case Intrinsic::amdgcn_smfmac_f32_16x16x64_fp8_bf8:
4880 case Intrinsic::amdgcn_smfmac_f32_16x16x64_fp8_fp8:
4881 case Intrinsic::amdgcn_smfmac_f32_32x32x32_bf8_bf8:
4882 case Intrinsic::amdgcn_smfmac_f32_32x32x32_bf8_fp8:
4883 case Intrinsic::amdgcn_smfmac_f32_32x32x32_fp8_bf8:
4884 case Intrinsic::amdgcn_smfmac_f32_32x32x32_fp8_fp8:
4885 case Intrinsic::amdgcn_smfmac_f32_16x16x64_f16:
4886 case Intrinsic::amdgcn_smfmac_f32_32x32x32_f16:
4887 case Intrinsic::amdgcn_smfmac_f32_16x16x64_bf16:
4888 case Intrinsic::amdgcn_smfmac_f32_32x32x32_bf16:
4889 case Intrinsic::amdgcn_smfmac_i32_16x16x128_i8:
4890 case Intrinsic::amdgcn_smfmac_i32_32x32x64_i8:
4891 case Intrinsic::amdgcn_smfmac_f32_16x16x128_bf8_bf8:
4892 case Intrinsic::amdgcn_smfmac_f32_16x16x128_bf8_fp8:
4893 case Intrinsic::amdgcn_smfmac_f32_16x16x128_fp8_bf8:
4894 case Intrinsic::amdgcn_smfmac_f32_16x16x128_fp8_fp8:
4895 case Intrinsic::amdgcn_smfmac_f32_32x32x64_bf8_bf8:
4896 case Intrinsic::amdgcn_smfmac_f32_32x32x64_bf8_fp8:
4897 case Intrinsic::amdgcn_smfmac_f32_32x32x64_fp8_bf8:
4898 case Intrinsic::amdgcn_smfmac_f32_32x32x64_fp8_fp8: {
4899 // vdst, srcA, srcB, srcC, idx
4900 OpdsMapping[0] = getAGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4901 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4902 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4903 OpdsMapping[4] = getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4904 OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
4905 break;
4906 }
4907 case Intrinsic::amdgcn_interp_p1:
4908 case Intrinsic::amdgcn_interp_p2:
4909 case Intrinsic::amdgcn_interp_mov:
4910 case Intrinsic::amdgcn_interp_p1_f16:
4911 case Intrinsic::amdgcn_interp_p2_f16:
4912 case Intrinsic::amdgcn_lds_param_load: {
4913 const int M0Idx = MI.getNumOperands() - 1;
4914 Register M0Reg = MI.getOperand(M0Idx).getReg();
4915 unsigned M0Bank = getRegBankID(M0Reg, MRI, AMDGPU::SGPRRegBankID);
4916 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4917
4918 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
4919 for (int I = 2; I != M0Idx && MI.getOperand(I).isReg(); ++I)
4920 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
4921
4922 // Must be SGPR, but we must take whatever the original bank is and fix it
4923 // later.
4924 OpdsMapping[M0Idx] = AMDGPU::getValueMapping(M0Bank, 32);
4925 break;
4926 }
4927 case Intrinsic::amdgcn_interp_inreg_p10:
4928 case Intrinsic::amdgcn_interp_inreg_p2:
4929 case Intrinsic::amdgcn_interp_inreg_p10_f16:
4930 case Intrinsic::amdgcn_interp_inreg_p2_f16:
4931 case Intrinsic::amdgcn_interp_p10_rtz_f16:
4932 case Intrinsic::amdgcn_interp_p2_rtz_f16: {
4933 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4934 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
4935 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
4936 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
4937 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
4938 break;
4939 }
4940 case Intrinsic::amdgcn_permlane16_swap:
4941 case Intrinsic::amdgcn_permlane32_swap: {
4942 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4943 OpdsMapping[0] = OpdsMapping[1] = OpdsMapping[3] = OpdsMapping[4] =
4944 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
4945 break;
4946 }
4947 case Intrinsic::amdgcn_ballot: {
4948 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4949 unsigned SrcSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4950 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
4951 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, SrcSize);
4952 break;
4953 }
4954 case Intrinsic::amdgcn_inverse_ballot: {
4955 // This must be an SGPR, but accept a VGPR.
4956 Register MaskReg = MI.getOperand(2).getReg();
4957 unsigned MaskSize = MRI.getType(MaskReg).getSizeInBits();
4958 unsigned MaskBank = getRegBankID(MaskReg, MRI, AMDGPU::SGPRRegBankID);
4959 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
4960 OpdsMapping[2] = AMDGPU::getValueMapping(MaskBank, MaskSize);
4961 break;
4962 }
4963 case Intrinsic::amdgcn_bitop3: {
4964 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4965 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4966 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4967 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4968 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4969 break;
4970 }
4971 case Intrinsic::amdgcn_s_quadmask:
4972 case Intrinsic::amdgcn_s_wqm: {
4973 Register MaskReg = MI.getOperand(2).getReg();
4974 unsigned MaskSize = MRI.getType(MaskReg).getSizeInBits();
4975 unsigned MaskBank = getRegBankID(MaskReg, MRI, AMDGPU::SGPRRegBankID);
4976 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, MaskSize);
4977 OpdsMapping[2] = AMDGPU::getValueMapping(MaskBank, MaskSize);
4978 break;
4979 }
4980 case Intrinsic::amdgcn_wave_reduce_umin:
4981 case Intrinsic::amdgcn_wave_reduce_umax: {
4982 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4983 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
4984 unsigned OpSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4985 auto regBankID =
4986 isSALUMapping(MI) ? AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
4987 OpdsMapping[2] = AMDGPU::getValueMapping(regBankID, OpSize);
4988 break;
4989 }
4990 case Intrinsic::amdgcn_s_bitreplicate:
4991 Register MaskReg = MI.getOperand(2).getReg();
4992 unsigned MaskBank = getRegBankID(MaskReg, MRI, AMDGPU::SGPRRegBankID);
4993 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 64);
4994 OpdsMapping[2] = AMDGPU::getValueMapping(MaskBank, 32);
4995 }
4996 break;
4997 }
4998 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD:
4999 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_D16:
5000 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_NORET:
5001 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE:
5002 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE_D16: {
5003 auto IntrID = AMDGPU::getIntrinsicID(MI);
5004 const AMDGPU::RsrcIntrinsic *RSrcIntrin = AMDGPU::lookupRsrcIntrinsic(IntrID);
5005 assert(RSrcIntrin && "missing RsrcIntrinsic for image intrinsic");
5006 // Non-images can have complications from operands that allow both SGPR
5007 // and VGPR. For now it's too complicated to figure out the final opcode
5008 // to derive the register bank from the MCInstrDesc.
5009 assert(RSrcIntrin->IsImage);
5010 return getImageMapping(MRI, MI, RSrcIntrin->RsrcArg);
5011 }
5012 case AMDGPU::G_AMDGPU_INTRIN_BVH_INTERSECT_RAY: {
5013 unsigned N = MI.getNumExplicitOperands() - 2;
5014 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 128);
5015 OpdsMapping[N] = getSGPROpMapping(MI.getOperand(N).getReg(), MRI, *TRI);
5016 if (N == 3) {
5017 // Sequential form: all operands combined into VGPR256/VGPR512
5018 unsigned Size = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
5019 if (Size > 256)
5020 Size = 512;
5021 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5022 } else {
5023 // NSA form
5024 for (unsigned I = 2; I < N; ++I) {
5025 unsigned Size = MRI.getType(MI.getOperand(I).getReg()).getSizeInBits();
5026 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5027 }
5028 }
5029 break;
5030 }
5031 case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS:
5032 case AMDGPU::G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS: {
5033 auto IntrID = cast<GIntrinsic>(MI).getIntrinsicID();
5034 switch (IntrID) {
5035 case Intrinsic::amdgcn_s_getreg:
5036 case Intrinsic::amdgcn_s_memtime:
5037 case Intrinsic::amdgcn_s_memrealtime:
5038 case Intrinsic::amdgcn_s_get_waveid_in_workgroup:
5039 case Intrinsic::amdgcn_s_sendmsg_rtn: {
5040 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5041 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
5042 break;
5043 }
5044 case Intrinsic::amdgcn_global_atomic_csub:
5045 case Intrinsic::amdgcn_global_atomic_fmin_num:
5046 case Intrinsic::amdgcn_global_atomic_fmax_num:
5047 case Intrinsic::amdgcn_flat_atomic_fmin_num:
5048 case Intrinsic::amdgcn_flat_atomic_fmax_num:
5049 case Intrinsic::amdgcn_atomic_cond_sub_u32:
5050 case Intrinsic::amdgcn_global_atomic_ordered_add_b64:
5051 case Intrinsic::amdgcn_global_load_tr_b64:
5052 case Intrinsic::amdgcn_global_load_tr_b128:
5053 case Intrinsic::amdgcn_ds_read_tr4_b64:
5054 case Intrinsic::amdgcn_ds_read_tr6_b96:
5055 case Intrinsic::amdgcn_ds_read_tr8_b64:
5056 case Intrinsic::amdgcn_ds_read_tr16_b64:
5057 return getDefaultMappingAllVGPR(MI);
5058 case Intrinsic::amdgcn_ds_ordered_add:
5059 case Intrinsic::amdgcn_ds_ordered_swap: {
5060 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5061 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5062 unsigned M0Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5063 AMDGPU::SGPRRegBankID);
5064 OpdsMapping[2] = AMDGPU::getValueMapping(M0Bank, 32);
5065 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5066 break;
5067 }
5068 case Intrinsic::amdgcn_ds_append:
5069 case Intrinsic::amdgcn_ds_consume: {
5070 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5071 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5072 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5073 break;
5074 }
5075 case Intrinsic::amdgcn_exp_compr:
5076 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5077 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5078 break;
5079 case Intrinsic::amdgcn_exp:
5080 // FIXME: Could we support packed types here?
5081 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5082 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5083 OpdsMapping[5] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5084 OpdsMapping[6] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5085 break;
5086 case Intrinsic::amdgcn_exp_row:
5087 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5088 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5089 OpdsMapping[5] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5090 OpdsMapping[6] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5091 OpdsMapping[8] = getSGPROpMapping(MI.getOperand(8).getReg(), MRI, *TRI);
5092 break;
5093 case Intrinsic::amdgcn_s_sendmsg:
5094 case Intrinsic::amdgcn_s_sendmsghalt: {
5095 // This must be an SGPR, but accept a VGPR.
5096 unsigned Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5097 AMDGPU::SGPRRegBankID);
5098 OpdsMapping[2] = AMDGPU::getValueMapping(Bank, 32);
5099 break;
5100 }
5101 case Intrinsic::amdgcn_s_setreg: {
5102 // This must be an SGPR, but accept a VGPR.
5103 unsigned Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5104 AMDGPU::SGPRRegBankID);
5105 OpdsMapping[2] = AMDGPU::getValueMapping(Bank, 32);
5106 break;
5107 }
5108 case Intrinsic::amdgcn_s_ttracedata: {
5109 // This must be an SGPR, but accept a VGPR.
5110 unsigned Bank =
5111 getRegBankID(MI.getOperand(1).getReg(), MRI, AMDGPU::SGPRRegBankID);
5112 OpdsMapping[1] = AMDGPU::getValueMapping(Bank, 32);
5113 break;
5114 }
5115 case Intrinsic::amdgcn_end_cf: {
5116 unsigned Size = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
5117 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
5118 break;
5119 }
5120 case Intrinsic::amdgcn_else: {
5121 unsigned WaveSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
5122 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5123 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, WaveSize);
5124 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, WaveSize);
5125 break;
5126 }
5127 case Intrinsic::amdgcn_init_whole_wave:
5128 case Intrinsic::amdgcn_live_mask: {
5129 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5130 break;
5131 }
5132 case Intrinsic::amdgcn_wqm_demote:
5133 case Intrinsic::amdgcn_kill: {
5134 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5135 break;
5136 }
5137 case Intrinsic::amdgcn_raw_buffer_load:
5138 case Intrinsic::amdgcn_raw_ptr_buffer_load:
5139 case Intrinsic::amdgcn_raw_atomic_buffer_load:
5140 case Intrinsic::amdgcn_raw_ptr_atomic_buffer_load:
5141 case Intrinsic::amdgcn_raw_tbuffer_load:
5142 case Intrinsic::amdgcn_raw_ptr_tbuffer_load: {
5143 // FIXME: Should make intrinsic ID the last operand of the instruction,
5144 // then this would be the same as store
5145 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5146 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5147 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5148 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5149 break;
5150 }
5151 case Intrinsic::amdgcn_raw_buffer_load_lds:
5152 case Intrinsic::amdgcn_raw_ptr_buffer_load_lds: {
5153 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5154 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5155 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5156 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5157 break;
5158 }
5159 case Intrinsic::amdgcn_raw_buffer_store:
5160 case Intrinsic::amdgcn_raw_ptr_buffer_store:
5161 case Intrinsic::amdgcn_raw_buffer_store_format:
5162 case Intrinsic::amdgcn_raw_ptr_buffer_store_format:
5163 case Intrinsic::amdgcn_raw_tbuffer_store:
5164 case Intrinsic::amdgcn_raw_ptr_tbuffer_store: {
5165 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5166 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5167 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5168 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5169 break;
5170 }
5171 case Intrinsic::amdgcn_struct_buffer_load:
5172 case Intrinsic::amdgcn_struct_ptr_buffer_load:
5173 case Intrinsic::amdgcn_struct_tbuffer_load:
5174 case Intrinsic::amdgcn_struct_ptr_tbuffer_load:
5175 case Intrinsic::amdgcn_struct_atomic_buffer_load:
5176 case Intrinsic::amdgcn_struct_ptr_atomic_buffer_load: {
5177 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5178 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5179 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5180 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5181 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5182 break;
5183 }
5184 case Intrinsic::amdgcn_struct_buffer_load_lds:
5185 case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
5186 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5187 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5188 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5189 OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5190 OpdsMapping[6] = getSGPROpMapping(MI.getOperand(6).getReg(), MRI, *TRI);
5191 break;
5192 }
5193 case Intrinsic::amdgcn_struct_buffer_store:
5194 case Intrinsic::amdgcn_struct_ptr_buffer_store:
5195 case Intrinsic::amdgcn_struct_tbuffer_store:
5196 case Intrinsic::amdgcn_struct_ptr_tbuffer_store: {
5197 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5198 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5199 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5200 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5201 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5202 break;
5203 }
5204 case Intrinsic::amdgcn_init_exec_from_input: {
5205 unsigned Size = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
5206 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
5207 break;
5208 }
5209 case Intrinsic::amdgcn_ds_gws_init:
5210 case Intrinsic::amdgcn_ds_gws_barrier:
5211 case Intrinsic::amdgcn_ds_gws_sema_br: {
5212 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5213
5214 // This must be an SGPR, but accept a VGPR.
5215 unsigned Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5216 AMDGPU::SGPRRegBankID);
5217 OpdsMapping[2] = AMDGPU::getValueMapping(Bank, 32);
5218 break;
5219 }
5220 case Intrinsic::amdgcn_ds_gws_sema_v:
5221 case Intrinsic::amdgcn_ds_gws_sema_p:
5222 case Intrinsic::amdgcn_ds_gws_sema_release_all: {
5223 // This must be an SGPR, but accept a VGPR.
5224 unsigned Bank = getRegBankID(MI.getOperand(1).getReg(), MRI,
5225 AMDGPU::SGPRRegBankID);
5226 OpdsMapping[1] = AMDGPU::getValueMapping(Bank, 32);
5227 break;
5228 }
5229 case Intrinsic::amdgcn_global_load_lds: {
5230 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5231 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5232 break;
5233 }
5234 case Intrinsic::amdgcn_lds_direct_load: {
5235 const int M0Idx = MI.getNumOperands() - 1;
5236 Register M0Reg = MI.getOperand(M0Idx).getReg();
5237 unsigned M0Bank = getRegBankID(M0Reg, MRI, AMDGPU::SGPRRegBankID);
5238 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5239
5240 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5241 for (int I = 2; I != M0Idx && MI.getOperand(I).isReg(); ++I)
5242 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5243
5244 // Must be SGPR, but we must take whatever the original bank is and fix it
5245 // later.
5246 OpdsMapping[M0Idx] = AMDGPU::getValueMapping(M0Bank, 32);
5247 break;
5248 }
5249 case Intrinsic::amdgcn_ds_add_gs_reg_rtn:
5250 case Intrinsic::amdgcn_ds_sub_gs_reg_rtn:
5251 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5252 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5253 break;
5254 case Intrinsic::amdgcn_ds_bvh_stack_rtn: {
5255 OpdsMapping[0] =
5256 getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI); // %vdst
5257 OpdsMapping[1] =
5258 getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI); // %addr
5259 OpdsMapping[3] =
5260 getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI); // %addr
5261 OpdsMapping[4] =
5262 getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI); // %data0
5263 OpdsMapping[5] =
5264 getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI); // %data1
5265 break;
5266 }
5267 case Intrinsic::amdgcn_s_sleep_var:
5268 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5269 break;
5270 case Intrinsic::amdgcn_s_barrier_join:
5271 case Intrinsic::amdgcn_s_wakeup_barrier:
5272 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5273 break;
5274 case Intrinsic::amdgcn_s_barrier_init:
5275 case Intrinsic::amdgcn_s_barrier_signal_var:
5276 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5277 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5278 break;
5279 case Intrinsic::amdgcn_s_barrier_signal_isfirst: {
5280 const unsigned ResultSize = 1;
5281 OpdsMapping[0] =
5282 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, ResultSize);
5283 break;
5284 }
5285 case Intrinsic::amdgcn_s_get_barrier_state:
5286 case Intrinsic::amdgcn_s_get_named_barrier_state: {
5287 OpdsMapping[0] = getSGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5288 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5289 break;
5290 }
5291 case Intrinsic::amdgcn_pops_exiting_wave_id:
5292 return getDefaultMappingSOP(MI);
5293 case Intrinsic::amdgcn_s_prefetch_data: {
5294 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5295 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5296 break;
5297 }
5298 default:
5299 return getInvalidInstructionMapping();
5300 }
5301 break;
5302 }
5303 case AMDGPU::G_SELECT: {
5304 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5305 unsigned Op2Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5306 AMDGPU::SGPRRegBankID);
5307 unsigned Op3Bank = getRegBankID(MI.getOperand(3).getReg(), MRI,
5308 AMDGPU::SGPRRegBankID);
5309 bool SGPRSrcs = Op2Bank == AMDGPU::SGPRRegBankID &&
5310 Op3Bank == AMDGPU::SGPRRegBankID;
5311
5312 unsigned CondBankDefault = SGPRSrcs ?
5313 AMDGPU::SGPRRegBankID : AMDGPU::VCCRegBankID;
5314 unsigned CondBank = getRegBankID(MI.getOperand(1).getReg(), MRI,
5315 CondBankDefault);
5316 if (CondBank == AMDGPU::SGPRRegBankID)
5317 CondBank = SGPRSrcs ? AMDGPU::SGPRRegBankID : AMDGPU::VCCRegBankID;
5318 else if (CondBank == AMDGPU::VGPRRegBankID)
5319 CondBank = AMDGPU::VCCRegBankID;
5320
5321 unsigned Bank = SGPRSrcs && CondBank == AMDGPU::SGPRRegBankID ?
5322 AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
5323
5324 assert(CondBank == AMDGPU::VCCRegBankID || CondBank == AMDGPU::SGPRRegBankID);
5325
5326 // TODO: Should report 32-bit for scalar condition type.
5327 if (Size == 64) {
5328 OpdsMapping[0] = AMDGPU::getValueMappingSGPR64Only(Bank, Size);
5329 OpdsMapping[1] = AMDGPU::getValueMapping(CondBank, 1);
5330 OpdsMapping[2] = AMDGPU::getValueMappingSGPR64Only(Bank, Size);
5331 OpdsMapping[3] = AMDGPU::getValueMappingSGPR64Only(Bank, Size);
5332 } else {
5333 OpdsMapping[0] = AMDGPU::getValueMapping(Bank, Size);
5334 OpdsMapping[1] = AMDGPU::getValueMapping(CondBank, 1);
5335 OpdsMapping[2] = AMDGPU::getValueMapping(Bank, Size);
5336 OpdsMapping[3] = AMDGPU::getValueMapping(Bank, Size);
5337 }
5338
5339 break;
5340 }
5341
5342 case AMDGPU::G_SI_CALL: {
5343 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 64);
5344 // Lie and claim everything is legal, even though some need to be
5345 // SGPRs. applyMapping will have to deal with it as a waterfall loop.
5346 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5347
5348 // Allow anything for implicit arguments
5349 for (unsigned I = 4; I < MI.getNumOperands(); ++I) {
5350 if (MI.getOperand(I).isReg()) {
5351 Register Reg = MI.getOperand(I).getReg();
5352 auto OpBank = getRegBankID(Reg, MRI);
5353 unsigned Size = getSizeInBits(Reg, MRI, *TRI);
5354 OpdsMapping[I] = AMDGPU::getValueMapping(OpBank, Size);
5355 }
5356 }
5357 break;
5358 }
5359 case AMDGPU::G_LOAD:
5360 case AMDGPU::G_ZEXTLOAD:
5361 case AMDGPU::G_SEXTLOAD:
5362 return getInstrMappingForLoad(MI);
5363
5364 case AMDGPU::G_ATOMICRMW_XCHG:
5365 case AMDGPU::G_ATOMICRMW_ADD:
5366 case AMDGPU::G_ATOMICRMW_SUB:
5367 case AMDGPU::G_ATOMICRMW_AND:
5368 case AMDGPU::G_ATOMICRMW_OR:
5369 case AMDGPU::G_ATOMICRMW_XOR:
5370 case AMDGPU::G_ATOMICRMW_MAX:
5371 case AMDGPU::G_ATOMICRMW_MIN:
5372 case AMDGPU::G_ATOMICRMW_UMAX:
5373 case AMDGPU::G_ATOMICRMW_UMIN:
5374 case AMDGPU::G_ATOMICRMW_FADD:
5375 case AMDGPU::G_ATOMICRMW_FMIN:
5376 case AMDGPU::G_ATOMICRMW_FMAX:
5377 case AMDGPU::G_ATOMICRMW_UINC_WRAP:
5378 case AMDGPU::G_ATOMICRMW_UDEC_WRAP:
5379 case AMDGPU::G_AMDGPU_ATOMIC_CMPXCHG: {
5380 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5381 OpdsMapping[1] = getValueMappingForPtr(MRI, MI.getOperand(1).getReg());
5382 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5383 break;
5384 }
5385 case AMDGPU::G_ATOMIC_CMPXCHG: {
5386 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5387 OpdsMapping[1] = getValueMappingForPtr(MRI, MI.getOperand(1).getReg());
5388 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5389 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5390 break;
5391 }
5392 case AMDGPU::G_BRCOND: {
5393 unsigned Bank = getRegBankID(MI.getOperand(0).getReg(), MRI,
5394 AMDGPU::SGPRRegBankID);
5395 assert(MRI.getType(MI.getOperand(0).getReg()).getSizeInBits() == 1);
5396 if (Bank != AMDGPU::SGPRRegBankID)
5397 Bank = AMDGPU::VCCRegBankID;
5398
5399 OpdsMapping[0] = AMDGPU::getValueMapping(Bank, 1);
5400 break;
5401 }
5402 case AMDGPU::G_INTRINSIC_FPTRUNC_ROUND:
5403 return getDefaultMappingVOP(MI);
5404 case AMDGPU::G_PREFETCH:
5405 OpdsMapping[0] = getSGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5406 break;
5407 }
5408
5409 return getInstructionMapping(/*ID*/1, /*Cost*/1,
5410 getOperandsMapping(OpdsMapping),
5411 MI.getNumOperands());
5412}
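
Several of the mappings above deliberately report "must be an SGPR, but accept a VGPR" (amdgcn_readlane, amdgcn_writelane, amdgcn_s_sendmsg, the M0 operands, and the G_SI_CALL comment "Lie and claim everything is legal"): getInstrMapping records whatever bank the operand currently has, and the repair happens later in applyMapping, either by inserting a readfirstlane when the value is known uniform or by building a waterfall loop otherwise. The sketch below is illustrative only and is not part of this file; readFirstLaneSketch is a hypothetical helper that assumes the usual AMDGPU backend headers and roughly mirrors what buildReadFirstLane() does for a single 32-bit piece.

// Illustrative sketch (not part of AMDGPURegisterBankInfo.cpp): repair a
// 32-bit operand that must be scalar but was assigned the VGPR bank, by
// reading the value of the lowest active lane into a fresh SGPR.
// Hypothetical helper; assumes the usual AMDGPU backend includes
// (MachineIRBuilder.h, MachineRegisterInfo.h, the generated AMDGPU
// register classes).
static void readFirstLaneSketch(MachineIRBuilder &B, MachineRegisterInfo &MRI,
                                MachineOperand &Op) {
  Register Src = Op.getReg();
  assert(MRI.getType(Src).getSizeInBits() == 32 && "sketch handles s32 only");

  // V_READFIRSTLANE_B32 copies the value held by the lowest active lane of
  // a VGPR into an SGPR. The real code splits wider values into 32-bit
  // pieces and also constrains the source to a VGPR_32 register class.
  Register Dst = MRI.createVirtualRegister(&AMDGPU::SReg_32RegClass);
  B.buildInstr(AMDGPU::V_READFIRSTLANE_B32)
      .addDef(Dst)
      .addUse(Src);

  // Rewrite the instruction to consume the now-scalar value.
  Op.setReg(Dst);
}

This transformation only preserves semantics when the VGPR value is uniform across the active lanes; when it may diverge, the operation has to be repeated per unique value instead, which is what executeInWaterfallLoop() implements.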