LLVM 23.0.0git — AMDGPURegisterBankInfo.cpp
//===- AMDGPURegisterBankInfo.cpp -------------------------------*- C++ -*-==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
/// \file
/// This file implements the targeting of the RegisterBankInfo class for
/// AMDGPU.
///
/// \par
///
/// AMDGPU has unique register bank constraints that require special high level
/// strategies to deal with. There are two main true physical register banks,
/// VGPR (vector) and SGPR (scalar). Additionally the VCC register bank is a
/// sort of pseudo-register bank needed to represent SGPRs used in a vector
/// boolean context. There is also the AGPR bank, which is a special purpose
/// physical register bank present on some subtargets.
///
/// Copying from VGPR to SGPR is generally illegal, unless the value is known to
/// be uniform. It is generally not valid to legalize operands by inserting
/// copies as on other targets. Operations which require uniform, SGPR operands
/// generally require scalarization by repeatedly executing the instruction,
/// activating each set of lanes using a unique set of input values. This is
/// referred to as a waterfall loop.
///
/// \par Booleans
///
/// Booleans (s1 values) require special consideration. A vector compare result
/// is naturally a bitmask with one bit per lane, in a 32 or 64-bit
/// register. These are represented with the VCC bank. During selection, we need
/// to be able to unambiguously go back from a register class to a register
/// bank. To distinguish whether an SGPR should use the SGPR or VCC register
/// bank, we need to know the use context type. An SGPR s1 value always means a
/// VCC bank value, otherwise it will be the SGPR bank. A scalar compare sets
/// SCC, which is a 1-bit unaddressable register. This will need to be copied to
/// a 32-bit virtual register. Taken together, this means we need to adjust the
/// type of boolean operations to be regbank legal. All SALU booleans need to be
/// widened to 32-bits, and all VALU booleans need to be s1 values.
///
/// A noteworthy exception to the s1-means-vcc rule is for legalization artifact
/// casts. G_TRUNC s1 results, and G_SEXT/G_ZEXT/G_ANYEXT sources are never vcc
/// bank. A non-boolean source (such as a truncate from a 1-bit load from
/// memory) will require a copy to the VCC bank which will require clearing the
/// high bits and inserting a compare.
///
/// \par Constant bus restriction
///
/// VALU instructions have a limitation known as the constant bus
/// restriction. Most VALU instructions can use SGPR operands, but may read at
/// most 1 SGPR or constant literal value (this is raised to 2 in gfx10 for most
/// instructions). This is one unique SGPR, so the same SGPR may be used for
/// multiple operands. From a register bank perspective, any combination of
/// operands should be legal as an SGPR, but this is contextually dependent on
/// the SGPR operands all being the same register. It is therefore optimal to
/// choose the SGPR with the most uses to minimize the number of copies.
///
/// We avoid trying to solve this problem in RegBankSelect. Any VALU G_*
/// operation should have its source operands all mapped to VGPRs (except for
/// VCC), inserting copies from any SGPR operands. This is the most trivial
/// legal mapping. Anything beyond the simplest 1:1 instruction selection would
/// be too complicated to solve here. Every optimization pattern or instruction
/// selected to multiple outputs would have to enforce this rule, and there
/// would be additional complexity in tracking this rule for every G_*
/// operation. By forcing all inputs to VGPRs, it also simplifies the task of
/// picking the optimal operand combination from a post-isel optimization pass.
///
//===----------------------------------------------------------------------===//

#include "AMDGPURegisterBankInfo.h"

#include "AMDGPU.h"
#include "AMDGPUInstrInfo.h"
#include "AMDGPULaneMaskUtils.h"
#include "GCNSubtarget.h"
#include "SIRegisterInfo.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"

#define GET_TARGET_REGBANK_IMPL
#include "AMDGPUGenRegisterBank.inc"

// This file will be TableGen'ed at some point.
#include "AMDGPUGenRegisterBankInfo.def"

using namespace llvm;
using namespace MIPatternMatch;

namespace {

// Observer to apply a register bank to new registers created by
// LegalizerHelper.
class ApplyRegBankMapping final : public GISelChangeObserver {
private:
  MachineIRBuilder &B;
  const AMDGPURegisterBankInfo &RBI;
  MachineRegisterInfo &MRI;
  const RegisterBank *NewBank;
  SmallVector<MachineInstr *, 4> NewInsts;

public:
  ApplyRegBankMapping(MachineIRBuilder &B, const AMDGPURegisterBankInfo &RBI_,
                      MachineRegisterInfo &MRI_, const RegisterBank *RB)
      : B(B), RBI(RBI_), MRI(MRI_), NewBank(RB) {
    assert(!B.isObservingChanges());
    B.setChangeObserver(*this);
  }

  ~ApplyRegBankMapping() override {
    for (MachineInstr *MI : NewInsts)
      applyBank(*MI);

    B.stopObservingChanges();
  }

  /// Set any registers that don't have a set register class or bank to SALU.
  void applyBank(MachineInstr &MI) {
    const unsigned Opc = MI.getOpcode();
    if (Opc == AMDGPU::G_ANYEXT || Opc == AMDGPU::G_ZEXT ||
        Opc == AMDGPU::G_SEXT) {
      // LegalizerHelper wants to use the basic legalization artifacts when
      // widening etc. We don't handle selection with vcc in artifact sources,
      // so we need to use a select instead to handle these properly.
      Register DstReg = MI.getOperand(0).getReg();
      Register SrcReg = MI.getOperand(1).getReg();
      const RegisterBank *SrcBank = RBI.getRegBank(SrcReg, MRI, *RBI.TRI);
      if (SrcBank == &AMDGPU::VCCRegBank) {
        const LLT S32 = LLT::scalar(32);
        assert(MRI.getType(SrcReg) == LLT::scalar(1));
        assert(MRI.getType(DstReg) == S32);
        assert(NewBank == &AMDGPU::VGPRRegBank);

        // Replace the extension with a select, which really uses the boolean
        // source.
        B.setInsertPt(*MI.getParent(), MI);

        auto True = B.buildConstant(S32, Opc == AMDGPU::G_SEXT ? -1 : 1);
        auto False = B.buildConstant(S32, 0);
        B.buildSelect(DstReg, SrcReg, True, False);
        MRI.setRegBank(True.getReg(0), *NewBank);
        MRI.setRegBank(False.getReg(0), *NewBank);
        MI.eraseFromParent();
      }

      assert(!MRI.getRegClassOrRegBank(DstReg));
      MRI.setRegBank(DstReg, *NewBank);
      return;
    }

#ifndef NDEBUG
    if (Opc == AMDGPU::G_TRUNC) {
      Register DstReg = MI.getOperand(0).getReg();
      const RegisterBank *DstBank = RBI.getRegBank(DstReg, MRI, *RBI.TRI);
      assert(DstBank != &AMDGPU::VCCRegBank);
    }
#endif

    for (MachineOperand &Op : MI.operands()) {
      if (!Op.isReg())
        continue;

      // We may see physical registers if building a real MI.
      Register Reg = Op.getReg();
      if (Reg.isPhysical() || MRI.getRegClassOrRegBank(Reg))
        continue;

      const RegisterBank *RB = NewBank;
      if (MRI.getType(Reg) == LLT::scalar(1)) {
        assert(NewBank == &AMDGPU::VGPRRegBank &&
               "s1 operands should only be used for vector bools");
        assert((MI.getOpcode() != AMDGPU::G_TRUNC &&
                MI.getOpcode() != AMDGPU::G_ANYEXT) &&
               "not expecting legalization artifacts here");
        RB = &AMDGPU::VCCRegBank;
      }

      MRI.setRegBank(Reg, *RB);
    }
  }

  void erasingInstr(MachineInstr &MI) override {}

  void createdInstr(MachineInstr &MI) override {
    // At this point, the instruction was just inserted and has no operands.
    NewInsts.push_back(&MI);
  }

  void changingInstr(MachineInstr &MI) override {}
  void changedInstr(MachineInstr &MI) override {
    // FIXME: In principle we should probably add the instruction to NewInsts,
    // but the way the LegalizerHelper uses the observer, we will always see the
    // registers we need to set the regbank on also referenced in a new
    // instruction.
  }
};

} // anonymous namespace

AMDGPURegisterBankInfo::AMDGPURegisterBankInfo(const GCNSubtarget &ST)
    : Subtarget(ST), TRI(Subtarget.getRegisterInfo()),
      TII(Subtarget.getInstrInfo()) {

  // HACK: Until this is fully tablegen'd.
  static llvm::once_flag InitializeRegisterBankFlag;

  static auto InitializeRegisterBankOnce = [this]() {
    assert(&getRegBank(AMDGPU::SGPRRegBankID) == &AMDGPU::SGPRRegBank &&
           &getRegBank(AMDGPU::VGPRRegBankID) == &AMDGPU::VGPRRegBank &&
           &getRegBank(AMDGPU::AGPRRegBankID) == &AMDGPU::AGPRRegBank);
    (void)this;
  };

  llvm::call_once(InitializeRegisterBankFlag, InitializeRegisterBankOnce);
}

static bool isVectorRegisterBank(const RegisterBank &Bank) {
  unsigned BankID = Bank.getID();
  return BankID == AMDGPU::VGPRRegBankID || BankID == AMDGPU::AGPRRegBankID;
}

bool AMDGPURegisterBankInfo::isDivergentRegBank(const RegisterBank *RB) const {
  return RB != &AMDGPU::SGPRRegBank;
}

unsigned AMDGPURegisterBankInfo::copyCost(const RegisterBank &Dst,
                                          const RegisterBank &Src,
                                          TypeSize Size) const {
  // TODO: Should there be a UniformVGPRRegBank which can use readfirstlane?
  if (Dst.getID() == AMDGPU::SGPRRegBankID &&
      (isVectorRegisterBank(Src) || Src.getID() == AMDGPU::VCCRegBankID)) {
    return std::numeric_limits<unsigned>::max();
  }

  // Bool values are tricky, because the meaning is based on context. The SCC
  // and VCC banks are for the natural scalar and vector conditions produced by
  // a compare.
  //
  // Legalization doesn't know about the necessary context, so an s1 use may
  // have been a truncate from an arbitrary value, in which case a copy (lowered
  // as a compare with 0) needs to be inserted.
  if (Size == 1 &&
      (Dst.getID() == AMDGPU::SGPRRegBankID) &&
      (isVectorRegisterBank(Src) ||
       Src.getID() == AMDGPU::SGPRRegBankID ||
       Src.getID() == AMDGPU::VCCRegBankID))
    return std::numeric_limits<unsigned>::max();

  // There is no direct copy between AGPRs.
  if (Dst.getID() == AMDGPU::AGPRRegBankID &&
      Src.getID() == AMDGPU::AGPRRegBankID)
    return 4;

  return RegisterBankInfo::copyCost(Dst, Src, Size);
}

unsigned AMDGPURegisterBankInfo::getBreakDownCost(
    const ValueMapping &ValMapping,
    const RegisterBank *CurBank) const {
  // Check if this is a breakdown for G_LOAD to move the pointer from SGPR to
  // VGPR.
  // FIXME: Is there a better way to do this?
  if (ValMapping.NumBreakDowns >= 2 || ValMapping.BreakDown[0].Length >= 64)
    return 10; // This is expensive.

  assert(ValMapping.NumBreakDowns == 2 &&
         ValMapping.BreakDown[0].Length == 32 &&
         ValMapping.BreakDown[0].StartIdx == 0 &&
         ValMapping.BreakDown[1].Length == 32 &&
         ValMapping.BreakDown[1].StartIdx == 32 &&
         ValMapping.BreakDown[0].RegBank == ValMapping.BreakDown[1].RegBank);

  // 32-bit extract of a 64-bit value is just access of a subregister, so free.
  // TODO: Cost of 0 hits assert, though it's not clear it's what we really
  // want.

  // TODO: 32-bit insert to a 64-bit SGPR may incur a non-free copy due to SGPR
  // alignment restrictions, but this probably isn't important.
  return 1;
}

const RegisterBank &
AMDGPURegisterBankInfo::getRegBankFromRegClass(const TargetRegisterClass &RC,
                                               LLT Ty) const {
  // We promote real scalar booleans to SReg_32. Any SGPR using s1 is really a
  // VCC-like use.
  if (TRI->isSGPRClass(&RC)) {
    // FIXME: This probably came from a copy from a physical register, which
    // should be inferable from the copied to-type. We don't have many boolean
    // physical register constraints so just assume a normal SGPR for now.
    if (!Ty.isValid())
      return AMDGPU::SGPRRegBank;

    return Ty == LLT::scalar(1) ? AMDGPU::VCCRegBank : AMDGPU::SGPRRegBank;
  }

  return TRI->isAGPRClass(&RC) ? AMDGPU::AGPRRegBank : AMDGPU::VGPRRegBank;
}

template <unsigned NumOps>
RegisterBankInfo::InstructionMappings
AMDGPURegisterBankInfo::addMappingFromTable(
    const MachineInstr &MI, const MachineRegisterInfo &MRI,
    const std::array<unsigned, NumOps> RegSrcOpIdx,
    ArrayRef<OpRegBankEntry<NumOps>> Table) const {

  InstructionMappings AltMappings;

  SmallVector<const ValueMapping *, 10> Operands(MI.getNumOperands());

  unsigned Sizes[NumOps];
  for (unsigned I = 0; I < NumOps; ++I) {
    Register Reg = MI.getOperand(RegSrcOpIdx[I]).getReg();
    Sizes[I] = getSizeInBits(Reg, MRI, *TRI);
  }

  for (unsigned I = 0, E = MI.getNumExplicitDefs(); I != E; ++I) {
    unsigned SizeI = getSizeInBits(MI.getOperand(I).getReg(), MRI, *TRI);
    Operands[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SizeI);
  }

  // getInstrMapping's default mapping uses ID 1, so start at 2.
  unsigned MappingID = 2;
  for (const auto &Entry : Table) {
    for (unsigned I = 0; I < NumOps; ++I) {
      int OpIdx = RegSrcOpIdx[I];
      Operands[OpIdx] = AMDGPU::getValueMapping(Entry.RegBanks[I], Sizes[I]);
    }

    AltMappings.push_back(&getInstructionMapping(MappingID++, Entry.Cost,
                                                 getOperandsMapping(Operands),
                                                 Operands.size()));
  }

  return AltMappings;
}

RegisterBankInfo::InstructionMappings
AMDGPURegisterBankInfo::getInstrAlternativeMappingsIntrinsic(
    const MachineInstr &MI, const MachineRegisterInfo &MRI) const {
  switch (cast<GIntrinsic>(MI).getIntrinsicID()) {
  case Intrinsic::amdgcn_readlane: {
    static const OpRegBankEntry<3> Table[2] = {
      // Perfectly legal.
      { { AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID }, 1 },

      // Need a readfirstlane for the index.
      { { AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 }
    };

    const std::array<unsigned, 3> RegSrcOpIdx = { { 0, 2, 3 } };
    return addMappingFromTable<3>(MI, MRI, RegSrcOpIdx, Table);
  }
  case Intrinsic::amdgcn_writelane: {
    static const OpRegBankEntry<4> Table[4] = {
      // Perfectly legal.
      { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 1 },

      // Need readfirstlane of first op
      { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 },

      // Need readfirstlane of second op
      { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 },

      // Need readfirstlane of both ops
      { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 3 }
    };

    const std::array<unsigned, 4> RegSrcOpIdx = { { 0, 2, 3, 4 } };
    return addMappingFromTable<4>(MI, MRI, RegSrcOpIdx, Table);
  }
  default:
    return RegisterBankInfo::getInstrAlternativeMappings(MI);
  }
}

RegisterBankInfo::InstructionMappings
AMDGPURegisterBankInfo::getInstrAlternativeMappingsIntrinsicWSideEffects(
    const MachineInstr &MI, const MachineRegisterInfo &MRI) const {

  switch (cast<GIntrinsic>(MI).getIntrinsicID()) {
  case Intrinsic::amdgcn_s_buffer_load: {
    static const OpRegBankEntry<2> Table[4] = {
      // Perfectly legal.
      { { AMDGPU::SGPRRegBankID, AMDGPU::SGPRRegBankID }, 1 },

      // Only need 1 register in loop
      { { AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 300 },

      // Have to waterfall the resource.
      { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID }, 1000 },

      // Have to waterfall the resource, and the offset.
      { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 1500 }
    };

    // rsrc, offset
    const std::array<unsigned, 2> RegSrcOpIdx = { { 2, 3 } };
    return addMappingFromTable<2>(MI, MRI, RegSrcOpIdx, Table);
  }
  case Intrinsic::amdgcn_ds_ordered_add:
  case Intrinsic::amdgcn_ds_ordered_swap: {
    // VGPR = M0, VGPR
    static const OpRegBankEntry<3> Table[2] = {
      // Perfectly legal.
      { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 1 },

      // Need a readfirstlane for m0
      { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 }
    };

    const std::array<unsigned, 3> RegSrcOpIdx = { { 0, 2, 3 } };
    return addMappingFromTable<3>(MI, MRI, RegSrcOpIdx, Table);
  }
  case Intrinsic::amdgcn_s_sendmsg:
  case Intrinsic::amdgcn_s_sendmsghalt: {
    // FIXME: Should have no register for immediate
    static const OpRegBankEntry<1> Table[2] = {
      // Perfectly legal.
      { { AMDGPU::SGPRRegBankID }, 1 },

      // Need readlane
      { { AMDGPU::VGPRRegBankID }, 3 }
    };

    const std::array<unsigned, 1> RegSrcOpIdx = { { 2 } };
    return addMappingFromTable<1>(MI, MRI, RegSrcOpIdx, Table);
  }
  default:
    return RegisterBankInfo::getInstrAlternativeMappings(MI);
  }
}

// FIXME: Returns uniform if there's no source value information. This is
// probably wrong.
bool AMDGPURegisterBankInfo::isScalarLoadLegal(const MachineInstr &MI) const {
  if (!MI.hasOneMemOperand())
    return false;

  const MachineMemOperand *MMO = *MI.memoperands_begin();
  const unsigned AS = MMO->getAddrSpace();
  const bool IsConst = AS == AMDGPUAS::CONSTANT_ADDRESS ||
                       AS == AMDGPUAS::CONSTANT_ADDRESS_32BIT;
  const unsigned MemSize = 8 * MMO->getSize().getValue();

  // Require 4-byte alignment.
  return (MMO->getAlign() >= Align(4) ||
          (Subtarget.hasScalarSubwordLoads() &&
           ((MemSize == 16 && MMO->getAlign() >= Align(2)) ||
            (MemSize == 8 && MMO->getAlign() >= Align(1))))) &&
         // Can't do a scalar atomic load.
         !MMO->isAtomic() &&
         // Don't use scalar loads for volatile accesses to non-constant
         // address spaces.
         (IsConst || !MMO->isVolatile()) &&
         // Memory must be known constant, or not written before this load.
         (IsConst || MMO->isInvariant() || (MMO->getFlags() & MONoClobber)) &&
         AMDGPUInstrInfo::isUniformMMO(MMO);
}

RegisterBankInfo::InstructionMappings
AMDGPURegisterBankInfo::getInstrAlternativeMappings(
    const MachineInstr &MI) const {

  const MachineFunction &MF = *MI.getMF();
  const MachineRegisterInfo &MRI = MF.getRegInfo();

  InstructionMappings AltMappings;
  switch (MI.getOpcode()) {
  case TargetOpcode::G_CONSTANT:
  case TargetOpcode::G_IMPLICIT_DEF: {
    unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
    if (Size == 1) {
      static const OpRegBankEntry<1> Table[3] = {
        { { AMDGPU::VGPRRegBankID }, 1 },
        { { AMDGPU::SGPRRegBankID }, 1 },
        { { AMDGPU::VCCRegBankID }, 1 }
      };

      return addMappingFromTable<1>(MI, MRI, {{ 0 }}, Table);
    }

    [[fallthrough]];
  }
  case TargetOpcode::G_FCONSTANT:
  case TargetOpcode::G_FRAME_INDEX:
  case TargetOpcode::G_GLOBAL_VALUE: {
    static const OpRegBankEntry<1> Table[2] = {
      { { AMDGPU::VGPRRegBankID }, 1 },
      { { AMDGPU::SGPRRegBankID }, 1 }
    };

    return addMappingFromTable<1>(MI, MRI, {{ 0 }}, Table);
  }
  case TargetOpcode::G_AND:
  case TargetOpcode::G_OR:
  case TargetOpcode::G_XOR: {
    unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);

    if (Size == 1) {
      // s_{and|or|xor}_b32 set scc when the result of the 32-bit op is not 0.
      const InstructionMapping &SCCMapping = getInstructionMapping(
        1, 1, getOperandsMapping(
          {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32),
           AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32),
           AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32)}),
        3); // Num Operands
      AltMappings.push_back(&SCCMapping);

      const InstructionMapping &VCCMapping0 = getInstructionMapping(
        2, 1, getOperandsMapping(
          {AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size),
           AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size),
           AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size)}),
        3); // Num Operands
      AltMappings.push_back(&VCCMapping0);
      return AltMappings;
    }

    if (Size != 64)
      break;

    const InstructionMapping &SSMapping = getInstructionMapping(
      1, 1, getOperandsMapping(
        {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
         AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
         AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size)}),
      3); // Num Operands
    AltMappings.push_back(&SSMapping);

    const InstructionMapping &VVMapping = getInstructionMapping(
      2, 2, getOperandsMapping(
        {AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
         AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
         AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size)}),
      3); // Num Operands
    AltMappings.push_back(&VVMapping);
    break;
  }
  case TargetOpcode::G_LOAD:
  case TargetOpcode::G_ZEXTLOAD:
  case TargetOpcode::G_SEXTLOAD: {
    unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
    LLT PtrTy = MRI.getType(MI.getOperand(1).getReg());
    unsigned PtrSize = PtrTy.getSizeInBits();
    unsigned AS = PtrTy.getAddressSpace();

    if ((AS != AMDGPUAS::LOCAL_ADDRESS && AS != AMDGPUAS::REGION_ADDRESS &&
         AS != AMDGPUAS::PRIVATE_ADDRESS) &&
        isScalarLoadLegal(MI)) {
      const InstructionMapping &SSMapping = getInstructionMapping(
        1, 1, getOperandsMapping(
          {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
           AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, PtrSize)}),
        2); // Num Operands
      AltMappings.push_back(&SSMapping);
    }

    const InstructionMapping &VVMapping = getInstructionMapping(
      2, 1,
      getOperandsMapping(
        {AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
         AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, PtrSize)}),
      2); // Num Operands
    AltMappings.push_back(&VVMapping);

    // It may be possible to have a vgpr = load sgpr mapping here, because
    // the mubuf instructions support this kind of load, but probably for only
    // gfx7 and older. However, the addressing mode matching in the instruction
    // selector should be able to do a better job of detecting and selecting
    // these kinds of loads from the vgpr = load vgpr mapping.

    return AltMappings;
  }
  case TargetOpcode::G_SELECT: {
    unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
    const InstructionMapping &SSMapping = getInstructionMapping(1, 1,
      getOperandsMapping({AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
                          AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1),
                          AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
                          AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size)}),
      4); // Num Operands
    AltMappings.push_back(&SSMapping);

    const InstructionMapping &VVMapping = getInstructionMapping(2, 1,
      getOperandsMapping({AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
                          AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1),
                          AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
                          AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size)}),
      4); // Num Operands
    AltMappings.push_back(&VVMapping);

    return AltMappings;
  }
  case TargetOpcode::G_UADDE:
  case TargetOpcode::G_USUBE:
  case TargetOpcode::G_SADDE:
  case TargetOpcode::G_SSUBE: {
    unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
    const InstructionMapping &SSMapping = getInstructionMapping(1, 1,
      getOperandsMapping(
        {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
         AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1),
         AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
         AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
         AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1)}),
      5); // Num Operands
    AltMappings.push_back(&SSMapping);

    const InstructionMapping &VVMapping = getInstructionMapping(2, 1,
      getOperandsMapping({AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
                          AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1),
                          AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
                          AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
                          AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1)}),
      5); // Num Operands
    AltMappings.push_back(&VVMapping);
    return AltMappings;
  }
  case AMDGPU::G_BRCOND: {
    assert(MRI.getType(MI.getOperand(0).getReg()).getSizeInBits() == 1);

    // TODO: Change type to 32 for scalar
    const InstructionMapping &SMapping = getInstructionMapping(
      1, 1, getOperandsMapping(
        {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1), nullptr}),
      2); // Num Operands
    AltMappings.push_back(&SMapping);

    const InstructionMapping &VMapping = getInstructionMapping(
      1, 1, getOperandsMapping(
        {AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1), nullptr}),
      2); // Num Operands
    AltMappings.push_back(&VMapping);
    return AltMappings;
  }
  case AMDGPU::G_INTRINSIC:
  case AMDGPU::G_INTRINSIC_CONVERGENT:
    return getInstrAlternativeMappingsIntrinsic(MI, MRI);
  case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS:
  case AMDGPU::G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS:
    return getInstrAlternativeMappingsIntrinsicWSideEffects(MI, MRI);
  default:
    break;
  }
  return RegisterBankInfo::getInstrAlternativeMappings(MI);
}

void AMDGPURegisterBankInfo::split64BitValueForMapping(
    MachineIRBuilder &B, SmallVector<Register, 2> &Regs, LLT HalfTy,
    Register Reg) const {
  assert(HalfTy.getSizeInBits() == 32);
  MachineRegisterInfo *MRI = B.getMRI();
  Register LoLHS = MRI->createGenericVirtualRegister(HalfTy);
  Register HiLHS = MRI->createGenericVirtualRegister(HalfTy);
  const RegisterBank *Bank = getRegBank(Reg, *MRI, *TRI);
  MRI->setRegBank(LoLHS, *Bank);
  MRI->setRegBank(HiLHS, *Bank);

  Regs.push_back(LoLHS);
  Regs.push_back(HiLHS);

  B.buildInstr(AMDGPU::G_UNMERGE_VALUES)
    .addDef(LoLHS)
    .addDef(HiLHS)
    .addUse(Reg);
}

/// Replace the current type each register in \p Regs has with \p NewTy.
static void setRegsToType(MachineRegisterInfo &MRI, ArrayRef<Register> Regs,
                          LLT NewTy) {
  for (Register Reg : Regs) {
    assert(MRI.getType(Reg).getSizeInBits() == NewTy.getSizeInBits());
    MRI.setType(Reg, NewTy);
  }
}

static LLT getHalfSizedType(LLT Ty) {
  if (Ty.isVector()) {
    assert(Ty.getElementCount().isKnownMultipleOf(2));
    return LLT::scalarOrVector(Ty.getElementCount().divideCoefficientBy(2),
                               Ty.getElementType());
  }

  assert(Ty.getScalarSizeInBits() % 2 == 0);
  return LLT::scalar(Ty.getScalarSizeInBits() / 2);
}

// Build one or more V_READFIRSTLANE_B32 instructions to move the given vector
// source value into a scalar register.
Register AMDGPURegisterBankInfo::buildReadFirstLane(MachineIRBuilder &B,
                                                    MachineRegisterInfo &MRI,
                                                    Register Src) const {
  LLT Ty = MRI.getType(Src);
  const RegisterBank *Bank = getRegBank(Src, MRI, *TRI);

  if (Bank == &AMDGPU::SGPRRegBank)
    return Src;

  unsigned Bits = Ty.getSizeInBits();
  assert(Bits % 32 == 0);

  if (Bank != &AMDGPU::VGPRRegBank) {
    // We need to copy from AGPR to VGPR.
    Src = B.buildCopy(Ty, Src).getReg(0);
    MRI.setRegBank(Src, AMDGPU::VGPRRegBank);
  }

  LLT S32 = LLT::scalar(32);
  unsigned NumParts = Bits / 32;
  SmallVector<Register, 8> SrcParts;
  SmallVector<Register, 8> DstParts;

  if (Bits == 32) {
    SrcParts.push_back(Src);
  } else {
    auto Unmerge = B.buildUnmerge(S32, Src);
    for (unsigned i = 0; i < NumParts; ++i)
      SrcParts.push_back(Unmerge.getReg(i));
  }

  for (unsigned i = 0; i < NumParts; ++i) {
    Register SrcPart = SrcParts[i];
    Register DstPart = MRI.createVirtualRegister(&AMDGPU::SReg_32_XM0RegClass);
    MRI.setType(DstPart, NumParts == 1 ? Ty : S32);

    const TargetRegisterClass *Constrained =
        constrainGenericRegister(SrcPart, AMDGPU::VGPR_32RegClass, MRI);
    (void)Constrained;
    assert(Constrained && "Failed to constrain readfirstlane src reg");

    B.buildInstr(AMDGPU::V_READFIRSTLANE_B32, {DstPart}, {SrcPart});

    DstParts.push_back(DstPart);
  }

  if (Bits == 32)
    return DstParts[0];

  Register Dst = B.buildMergeLikeInstr(Ty, DstParts).getReg(0);
  MRI.setRegBank(Dst, AMDGPU::SGPRRegBank);
  return Dst;
}

/// Legalize instruction \p MI where operands in \p OpIndices must be SGPRs. If
/// any of the required SGPR operands are VGPRs, perform a waterfall loop to
/// execute the instruction for each unique combination of values in all lanes
/// in the wave. The block will be split such that the rest of the instructions
/// are moved to a new block.
///
/// Essentially performs this loop:
///
/// Save Execution Mask
/// For (Lane : Wavefront) {
///   Enable Lane, Disable all other lanes
///   SGPR = read SGPR value for current lane from VGPR
///   VGPRResult[Lane] = use_op SGPR
/// }
/// Restore Execution Mask
///
/// There is additional complexity in comparing values to identify the unique
/// values used.
bool AMDGPURegisterBankInfo::executeInWaterfallLoop(
    MachineIRBuilder &B, iterator_range<MachineBasicBlock::iterator> Range,
    SmallSet<Register, 4> &SGPROperandRegs) const {
  // Track use registers which have already been expanded with a readfirstlane
  // sequence. This may have multiple uses if moving a sequence.
  DenseMap<Register, Register> WaterfalledRegMap;

  MachineBasicBlock &MBB = B.getMBB();
  MachineFunction *MF = &B.getMF();

  const TargetRegisterClass *WaveRC = TRI->getWaveMaskRegClass();
  const AMDGPU::LaneMaskConstants &LMC =
      AMDGPU::LaneMaskConstants::get(Subtarget);

#ifndef NDEBUG
  const int OrigRangeSize = std::distance(Range.begin(), Range.end());
#endif

  MachineRegisterInfo &MRI = *B.getMRI();
  Register SaveExecReg = MRI.createVirtualRegister(WaveRC);
  Register InitSaveExecReg = MRI.createVirtualRegister(WaveRC);

  // Don't bother using generic instructions/registers for the exec mask.
  B.buildInstr(TargetOpcode::IMPLICIT_DEF)
    .addDef(InitSaveExecReg);

  Register PhiExec = MRI.createVirtualRegister(WaveRC);
  Register NewExec = MRI.createVirtualRegister(WaveRC);

  // To insert the loop we need to split the block. Move everything before this
  // point to a new block, and insert a new empty block before this instruction.
  MachineBasicBlock *LoopBB = MF->CreateMachineBasicBlock();
  MachineBasicBlock *BodyBB = MF->CreateMachineBasicBlock();
  MachineBasicBlock *RemainderBB = MF->CreateMachineBasicBlock();
  MachineBasicBlock *RestoreExecBB = MF->CreateMachineBasicBlock();
  MachineFunction::iterator MBBI(MBB);
  ++MBBI;
  MF->insert(MBBI, LoopBB);
  MF->insert(MBBI, BodyBB);
  MF->insert(MBBI, RestoreExecBB);
  MF->insert(MBBI, RemainderBB);

  LoopBB->addSuccessor(BodyBB);
  BodyBB->addSuccessor(RestoreExecBB);
  BodyBB->addSuccessor(LoopBB);

  // Move the rest of the block into a new block.
  RemainderBB->transferSuccessorsAndUpdatePHIs(&MBB);
  RemainderBB->splice(RemainderBB->begin(), &MBB, Range.end(), MBB.end());

  MBB.addSuccessor(LoopBB);
  RestoreExecBB->addSuccessor(RemainderBB);

  B.setInsertPt(*LoopBB, LoopBB->end());

  B.buildInstr(TargetOpcode::PHI)
    .addDef(PhiExec)
    .addReg(InitSaveExecReg)
    .addMBB(&MBB)
    .addReg(NewExec)
    .addMBB(BodyBB);

  const DebugLoc &DL = B.getDL();

  MachineInstr &FirstInst = *Range.begin();

  // Move the instruction into the loop body. Note we moved everything after
  // Range.end() already into a new block, so Range.end() is no longer valid.
  BodyBB->splice(BodyBB->end(), &MBB, Range.begin(), MBB.end());

  // Figure out the iterator range after splicing the instructions.
  MachineBasicBlock::iterator NewBegin = FirstInst.getIterator();
  auto NewEnd = BodyBB->end();

  B.setMBB(*LoopBB);

  LLT S1 = LLT::scalar(1);
  Register CondReg;

  assert(std::distance(NewBegin, NewEnd) == OrigRangeSize);

  for (MachineInstr &MI : make_range(NewBegin, NewEnd)) {
    for (MachineOperand &Op : MI.all_uses()) {
      Register OldReg = Op.getReg();
      if (!SGPROperandRegs.count(OldReg))
        continue;

      // See if we already processed this register in another instruction in
      // the sequence.
      auto OldVal = WaterfalledRegMap.find(OldReg);
      if (OldVal != WaterfalledRegMap.end()) {
        Op.setReg(OldVal->second);
        continue;
      }

      Register OpReg = Op.getReg();
      LLT OpTy = MRI.getType(OpReg);

      const RegisterBank *OpBank = getRegBank(OpReg, MRI, *TRI);
      if (OpBank != &AMDGPU::VGPRRegBank) {
        // Insert copy from AGPR to VGPR before the loop.
        B.setMBB(MBB);
        OpReg = B.buildCopy(OpTy, OpReg).getReg(0);
        MRI.setRegBank(OpReg, AMDGPU::VGPRRegBank);
        B.setMBB(*LoopBB);
      }

      Register CurrentLaneReg = buildReadFirstLane(B, MRI, OpReg);

      // Build the comparison(s).
      unsigned OpSize = OpTy.getSizeInBits();
      bool Is64 = OpSize % 64 == 0;
      unsigned PartSize = Is64 ? 64 : 32;
      LLT PartTy = LLT::scalar(PartSize);
      unsigned NumParts = OpSize / PartSize;
      SmallVector<Register, 8> OpParts;
      SmallVector<Register, 8> CurrentLaneParts;

      if (NumParts == 1) {
        OpParts.push_back(OpReg);
        CurrentLaneParts.push_back(CurrentLaneReg);
      } else {
        auto UnmergeOp = B.buildUnmerge(PartTy, OpReg);
        auto UnmergeCurrentLane = B.buildUnmerge(PartTy, CurrentLaneReg);
        for (unsigned i = 0; i < NumParts; ++i) {
          OpParts.push_back(UnmergeOp.getReg(i));
          CurrentLaneParts.push_back(UnmergeCurrentLane.getReg(i));
          MRI.setRegBank(OpParts[i], AMDGPU::VGPRRegBank);
          MRI.setRegBank(CurrentLaneParts[i], AMDGPU::SGPRRegBank);
        }
      }

      for (unsigned i = 0; i < NumParts; ++i) {
        auto CmpReg = B.buildICmp(CmpInst::ICMP_EQ, S1, CurrentLaneParts[i],
                                  OpParts[i]).getReg(0);
        MRI.setRegBank(CmpReg, AMDGPU::VCCRegBank);

        if (!CondReg) {
          CondReg = CmpReg;
        } else {
          CondReg = B.buildAnd(S1, CondReg, CmpReg).getReg(0);
          MRI.setRegBank(CondReg, AMDGPU::VCCRegBank);
        }
      }

      Op.setReg(CurrentLaneReg);

      // Make sure we don't re-process this register again.
      WaterfalledRegMap.insert(std::pair(OldReg, Op.getReg()));
    }
  }
924
925 // The ballot becomes a no-op during instruction selection.
926 CondReg = B.buildIntrinsic(Intrinsic::amdgcn_ballot,
927 {LLT::scalar(Subtarget.isWave32() ? 32 : 64)})
928 .addReg(CondReg)
929 .getReg(0);
930 MRI.setRegClass(CondReg, WaveRC);
931
932 // Update EXEC, save the original EXEC value to VCC.
933 B.buildInstr(LMC.AndSaveExecOpc)
934 .addDef(NewExec)
935 .addReg(CondReg, RegState::Kill);
936
937 MRI.setSimpleHint(NewExec, CondReg);
938
939 B.setInsertPt(*BodyBB, BodyBB->end());
940
941 // Update EXEC, switch all done bits to 0 and all todo bits to 1.
942 B.buildInstr(LMC.XorTermOpc)
943 .addDef(LMC.ExecReg)
944 .addReg(LMC.ExecReg)
945 .addReg(NewExec);
946
947 // XXX - s_xor_b64 sets scc to 1 if the result is nonzero, so can we use
948 // s_cbranch_scc0?
949
950 // Loop back to V_READFIRSTLANE_B32 if there are still variants to cover.
951 B.buildInstr(AMDGPU::SI_WATERFALL_LOOP).addMBB(LoopBB);
952
953 // Save the EXEC mask before the loop.
954 BuildMI(MBB, MBB.end(), DL, TII->get(LMC.MovOpc), SaveExecReg)
955 .addReg(LMC.ExecReg);
956
957 // Restore the EXEC mask after the loop.
958 B.setMBB(*RestoreExecBB);
959 B.buildInstr(LMC.MovTermOpc).addDef(LMC.ExecReg).addReg(SaveExecReg);
960
961 // Set the insert point after the original instruction, so any new
962 // instructions will be in the remainder.
963 B.setInsertPt(*RemainderBB, RemainderBB->begin());
964
965 return true;
966}
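The waterfall loop built above can be modeled on the host as a scalar loop over the distinct values held by the active lanes. A hypothetical standalone sketch (plain C++, not LLVM code; `countWaterfallIterations` is an invented name) of the EXEC-mask bookkeeping:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simulate one waterfall loop: each iteration reads the value of the first
// still-active lane ("readfirstlane"), executes every lane holding that
// value, and clears those lanes from the pending EXEC mask, until no lanes
// remain.
static int countWaterfallIterations(const std::vector<uint32_t> &LaneVals) {
  uint64_t Pending = (LaneVals.size() >= 64)
                         ? ~0ull
                         : (1ull << LaneVals.size()) - 1; // all lanes active
  int Iterations = 0;
  while (Pending) {
    unsigned FirstLane = 0; // lowest set bit of the pending mask
    while (!((Pending >> FirstLane) & 1))
      ++FirstLane;
    uint32_t Uniform = LaneVals[FirstLane];
    // Disable every lane whose value matches the chosen uniform value.
    for (unsigned L = 0; L < LaneVals.size(); ++L)
      if (LaneVals[L] == Uniform)
        Pending &= ~(1ull << L);
    ++Iterations;
  }
  return Iterations;
}
```

The iteration count equals the number of distinct values across the active lanes, which is why a dynamically uniform operand costs only one pass.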
967
968// Return any unique registers used by \p MI at \p OpIndices that need to be
969// handled in a waterfall loop. Returns these registers in \p
970// SGPROperandRegs. Returns true if there are any operands to handle and a
971// waterfall loop is necessary.
972 bool AMDGPURegisterBankInfo::collectWaterfallOperands(
973 SmallSet<Register, 4> &SGPROperandRegs, MachineInstr &MI,
974 MachineRegisterInfo &MRI, ArrayRef<unsigned> OpIndices) const {
975 for (unsigned Op : OpIndices) {
976 assert(MI.getOperand(Op).isUse());
977 Register Reg = MI.getOperand(Op).getReg();
978 const RegisterBank *OpBank = getRegBank(Reg, MRI, *TRI);
979 if (OpBank->getID() != AMDGPU::SGPRRegBankID)
980 SGPROperandRegs.insert(Reg);
981 }
982
983 // No operands need to be replaced, so no need to loop.
984 return !SGPROperandRegs.empty();
985}
986
987 bool AMDGPURegisterBankInfo::executeInWaterfallLoop(
988 MachineIRBuilder &B, MachineInstr &MI, ArrayRef<unsigned> OpIndices) const {
989 // Use a set to avoid extra readfirstlanes in the case where multiple operands
990 // are the same register.
991 SmallSet<Register, 4> SGPROperandRegs;
992
993 if (!collectWaterfallOperands(SGPROperandRegs, MI, *B.getMRI(), OpIndices))
994 return false;
995
996 MachineBasicBlock::iterator I = MI.getIterator();
997 return executeInWaterfallLoop(B, make_range(I, std::next(I)),
998 SGPROperandRegs);
999}
1000
1001// Legalize an operand that must be an SGPR by inserting a readfirstlane.
1002 void AMDGPURegisterBankInfo::constrainOpWithReadfirstlane(
1003 MachineIRBuilder &B, MachineInstr &MI, unsigned OpIdx) const {
1004 Register Reg = MI.getOperand(OpIdx).getReg();
1005 MachineRegisterInfo &MRI = *B.getMRI();
1006 const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI);
1007 if (Bank == &AMDGPU::SGPRRegBank)
1008 return;
1009
1010 Reg = buildReadFirstLane(B, MRI, Reg);
1011 MI.getOperand(OpIdx).setReg(Reg);
1012}
1013
1014/// Split \p Ty into 2 pieces. The first will have \p FirstSize bits, and the
1015/// rest will be in the remainder.
1016static std::pair<LLT, LLT> splitUnequalType(LLT Ty, unsigned FirstSize) {
1017 unsigned TotalSize = Ty.getSizeInBits();
1018 if (!Ty.isVector())
1019 return {LLT::scalar(FirstSize), LLT::scalar(TotalSize - FirstSize)};
1020
1021 LLT EltTy = Ty.getElementType();
1022 unsigned EltSize = EltTy.getSizeInBits();
1023 assert(FirstSize % EltSize == 0);
1024
1025 unsigned FirstPartNumElts = FirstSize / EltSize;
1026 unsigned RemainderElts = (TotalSize - FirstSize) / EltSize;
1027
1028 return {LLT::scalarOrVector(ElementCount::getFixed(FirstPartNumElts), EltTy),
1029 LLT::scalarOrVector(ElementCount::getFixed(RemainderElts), EltTy)};
1030}
1031
1032 static LLT widen96To128(LLT Ty) {
1033 if (!Ty.isVector())
1034 return LLT::scalar(128);
1035
1036 LLT EltTy = Ty.getElementType();
1037 assert(128 % EltTy.getSizeInBits() == 0);
1038 return LLT::fixed_vector(128 / EltTy.getSizeInBits(), EltTy);
1039}
1040
1041 bool AMDGPURegisterBankInfo::applyMappingLoad(
1042 MachineIRBuilder &B,
1043 const AMDGPURegisterBankInfo::OperandsMapper &OpdMapper,
1044 MachineInstr &MI) const {
1045 MachineRegisterInfo &MRI = *B.getMRI();
1046 Register DstReg = MI.getOperand(0).getReg();
1047 const LLT LoadTy = MRI.getType(DstReg);
1048 unsigned LoadSize = LoadTy.getSizeInBits();
1049 MachineMemOperand *MMO = *MI.memoperands_begin();
1050 const unsigned MaxNonSmrdLoadSize = 128;
1051
1052 const RegisterBank *DstBank =
1053 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
1054 if (DstBank == &AMDGPU::SGPRRegBank) {
1055 // There are some special cases that we need to look at for 32-bit and
1056 // 96-bit SGPR loads; otherwise we have nothing to do.
1057 if (LoadSize != 32 && (LoadSize != 96 || Subtarget.hasScalarDwordx3Loads()))
1058 return false;
1059
1060 const unsigned MemSize = 8 * MMO->getSize().getValue();
1061 // Scalar loads of size 8 or 16 bit with proper alignment may be widened to
1062 // 32 bit. Check to see if we need to widen the memory access: 8- or 16-bit
1063 // scalar loads should have a load size of 32 but a memory access size of
1064 // less than 32.
1065 if (LoadSize == 32 &&
1066 (MemSize == 32 || LoadTy.isVector() || !isScalarLoadLegal(MI)))
1067 return false;
1068
1069 if (LoadSize == 32 &&
1070 ((MemSize == 8 && MMO->getAlign() >= Align(1)) ||
1071 (MemSize == 16 && MMO->getAlign() >= Align(2))) &&
1072 isScalarLoadLegal(MI) &&
1073 Subtarget.getGeneration() >= AMDGPUSubtarget::GFX12)
1074 return false;
1075
1076 Register PtrReg = MI.getOperand(1).getReg();
1077
1078 ApplyRegBankMapping ApplyBank(B, *this, MRI, DstBank);
1079
1080 if (LoadSize == 32) {
1081 // This is an extending load from a sub-dword size. Widen the memory
1082 // access size to 4 bytes and clear the extra high bits appropriately
1083 const LLT S32 = LLT::scalar(32);
1084 if (MI.getOpcode() == AMDGPU::G_SEXTLOAD) {
1085 // Must extend the sign bit into higher bits for a G_SEXTLOAD
1086 auto WideLoad = B.buildLoadFromOffset(S32, PtrReg, *MMO, 0);
1087 B.buildSExtInReg(MI.getOperand(0), WideLoad, MemSize);
1088 } else if (MI.getOpcode() == AMDGPU::G_ZEXTLOAD) {
1089 // Must extend zero into higher bits with an AND for a G_ZEXTLOAD
1090 auto WideLoad = B.buildLoadFromOffset(S32, PtrReg, *MMO, 0);
1091 B.buildZExtInReg(MI.getOperand(0), WideLoad, MemSize);
1092 } else
1093 // We do not need to touch the higher bits for regular loads.
1094 B.buildLoadFromOffset(MI.getOperand(0), PtrReg, *MMO, 0);
1095 } else {
1096 // 96-bit loads are only available for vector loads. We need to split this
1097 // into a 64-bit part and a 32-bit part (unless we can widen to a 128-bit load).
1098 if (MMO->getAlign() < Align(16)) {
1099 LegalizerHelper Helper(B.getMF(), ApplyBank, B);
1100 LLT Part64, Part32;
1101 std::tie(Part64, Part32) = splitUnequalType(LoadTy, 64);
1102 if (Helper.reduceLoadStoreWidth(cast<GAnyLoad>(MI), 0, Part64) !=
1103 LegalizerHelper::Legalized)
1104 return false;
1105 return true;
1106 }
1107 LLT WiderTy = widen96To128(LoadTy);
1108 auto WideLoad = B.buildLoadFromOffset(WiderTy, PtrReg, *MMO, 0);
1109 if (WiderTy.isScalar()) {
1110 B.buildTrunc(MI.getOperand(0), WideLoad);
1111 } else {
1112 B.buildDeleteTrailingVectorElements(MI.getOperand(0).getReg(),
1113 WideLoad);
1114 }
1115 }
1116
1117 MI.eraseFromParent();
1118 return true;
1119 }
1120
1121 // 128-bit loads are supported for all instruction types.
1122 if (LoadSize <= MaxNonSmrdLoadSize)
1123 return false;
1124
1125 SmallVector<Register, 1> SrcRegs(OpdMapper.getVRegs(1));
1126
1127 if (SrcRegs.empty())
1128 SrcRegs.push_back(MI.getOperand(1).getReg());
1129
1130 // RegBankSelect only emits scalar types, so we need to reset the pointer
1131 // operand to a pointer type.
1132 Register BasePtrReg = SrcRegs[0];
1133 LLT PtrTy = MRI.getType(MI.getOperand(1).getReg());
1134 MRI.setType(BasePtrReg, PtrTy);
1135
1136 // The following are loads that were not split sufficiently during
1137 // legalization because it was not clear whether they are smem or vmem loads.
1138 if (MMO->getAddrSpace() == AMDGPUAS::CONSTANT_ADDRESS ||
1139 MMO->getAddrSpace() == AMDGPUAS::CONSTANT_ADDRESS_32BIT) {
1140 assert(LoadSize % MaxNonSmrdLoadSize == 0);
1141 unsigned NumSplitParts = LoadTy.getSizeInBits() / MaxNonSmrdLoadSize;
1142 const LLT LoadSplitTy = LoadTy.divide(NumSplitParts);
1143 ApplyRegBankMapping O(B, *this, MRI, &AMDGPU::VGPRRegBank);
1144 LegalizerHelper Helper(B.getMF(), O, B);
1145 if (LoadTy.isVector()) {
1146 if (Helper.fewerElementsVector(MI, 0, LoadSplitTy) !=
1147 LegalizerHelper::Legalized)
1148 return false;
1149 } else {
1150 if (Helper.narrowScalar(MI, 0, LoadSplitTy) != LegalizerHelper::Legalized)
1151 return false;
1152 }
1153 }
1154
1155 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
1156 return true;
1157}
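When an 8- or 16-bit scalar load is widened to a full 32-bit load, the extra high bits must be rewritten to match the original extending-load semantics. A hypothetical scalar model of that step (plain C++, invented helper names, not LLVM code):

```cpp
#include <cassert>
#include <cstdint>

// Model of G_SEXTLOAD widening: load 32 bits, then sign-extend the low
// MemBits in-register (what buildSExtInReg produces).
static uint32_t sextInReg(uint32_t Wide, unsigned MemBits) {
  unsigned Shift = 32 - MemBits;
  return (uint32_t)((int32_t)(Wide << Shift) >> Shift);
}

// Model of G_ZEXTLOAD widening: clear everything above the low MemBits
// (what buildZExtInReg produces).
static uint32_t zextInReg(uint32_t Wide, unsigned MemBits) {
  return MemBits >= 32 ? Wide : (Wide & ((1u << MemBits) - 1));
}
```

A plain load needs neither fixup, since the high bits of the destination are unspecified.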
1158
1159 bool AMDGPURegisterBankInfo::applyMappingDynStackAlloc(
1160 MachineIRBuilder &B,
1161 const AMDGPURegisterBankInfo::OperandsMapper &OpdMapper,
1162 MachineInstr &MI) const {
1163 MachineRegisterInfo &MRI = *B.getMRI();
1164 const MachineFunction &MF = B.getMF();
1165 const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
1166 const auto &TFI = *ST.getFrameLowering();
1167
1168 // Guard in case the stack growth direction ever changes with scratch
1169 // instructions.
1170 assert(TFI.getStackGrowthDirection() == TargetFrameLowering::StackGrowsUp &&
1171 "Stack grows upwards for AMDGPU");
1172
1173 Register Dst = MI.getOperand(0).getReg();
1174 Register AllocSize = MI.getOperand(1).getReg();
1175 Align Alignment = assumeAligned(MI.getOperand(2).getImm());
1176
1177 const RegisterBank *SizeBank = getRegBank(AllocSize, MRI, *TRI);
1178
1179 if (SizeBank != &AMDGPU::SGPRRegBank) {
1180 auto WaveReduction =
1181 B.buildIntrinsic(Intrinsic::amdgcn_wave_reduce_umax, {LLT::scalar(32)})
1182 .addUse(AllocSize)
1183 .addImm(0);
1184 AllocSize = WaveReduction.getReg(0);
1185 }
1186
1187 LLT PtrTy = MRI.getType(Dst);
1188 LLT IntPtrTy = LLT::scalar(PtrTy.getSizeInBits());
1189
1190 const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
1191 Register SPReg = Info->getStackPtrOffsetReg();
1192 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank);
1193
1194 auto WaveSize = B.buildConstant(LLT::scalar(32), ST.getWavefrontSizeLog2());
1195 auto ScaledSize = B.buildShl(IntPtrTy, AllocSize, WaveSize);
1196
1197 auto OldSP = B.buildCopy(PtrTy, SPReg);
1198 if (Alignment > TFI.getStackAlign()) {
1199 auto StackAlignMask = (Alignment.value() << ST.getWavefrontSizeLog2()) - 1;
1200 auto Tmp1 = B.buildPtrAdd(PtrTy, OldSP,
1201 B.buildConstant(LLT::scalar(32), StackAlignMask));
1202 B.buildMaskLowPtrBits(Dst, Tmp1,
1203 Log2(Alignment) + ST.getWavefrontSizeLog2());
1204 } else {
1205 B.buildCopy(Dst, OldSP);
1206 }
1207 auto PtrAdd = B.buildPtrAdd(PtrTy, Dst, ScaledSize);
1208 B.buildCopy(SPReg, PtrAdd);
1209 MI.eraseFromParent();
1210 return true;
1211}
1212
1213 bool AMDGPURegisterBankInfo::applyMappingImage(
1214 MachineIRBuilder &B, MachineInstr &MI,
1215 const AMDGPURegisterBankInfo::OperandsMapper &OpdMapper,
1216 int RsrcIdx) const {
1217 const int NumDefs = MI.getNumExplicitDefs();
1218
1219 // The reported argument index is relative to the IR intrinsic call arguments,
1220 // so we need to shift by the number of defs and the intrinsic ID.
1221 RsrcIdx += NumDefs + 1;
1222
1223 // Insert copies to VGPR arguments.
1224 applyDefaultMapping(OpdMapper);
1225
1226 // Fixup any SGPR arguments.
1227 SmallVector<unsigned, 4> SGPRIndexes;
1228 for (int I = NumDefs, NumOps = MI.getNumOperands(); I != NumOps; ++I) {
1229 if (!MI.getOperand(I).isReg())
1230 continue;
1231
1232 // If this intrinsic has a sampler, it immediately follows rsrc.
1233 if (I == RsrcIdx || I == RsrcIdx + 1)
1234 SGPRIndexes.push_back(I);
1235 }
1236
1237 executeInWaterfallLoop(B, MI, SGPRIndexes);
1238 return true;
1239}
1240
1241// Analyze a combined offset from an llvm.amdgcn.s.buffer intrinsic and store
1242// the three offsets (voffset, soffset and instoffset)
1243 unsigned AMDGPURegisterBankInfo::setBufferOffsets(
1244 MachineIRBuilder &B, Register CombinedOffset, Register &VOffsetReg,
1245 Register &SOffsetReg, int64_t &InstOffsetVal, Align Alignment) const {
1246 const LLT S32 = LLT::scalar(32);
1247 MachineRegisterInfo *MRI = B.getMRI();
1248
1249 if (std::optional<int64_t> Imm =
1250 getIConstantVRegSExtVal(CombinedOffset, *MRI)) {
1251 uint32_t SOffset, ImmOffset;
1252 if (TII->splitMUBUFOffset(*Imm, SOffset, ImmOffset, Alignment)) {
1253 VOffsetReg = B.buildConstant(S32, 0).getReg(0);
1254 SOffsetReg = B.buildConstant(S32, SOffset).getReg(0);
1255 InstOffsetVal = ImmOffset;
1256
1257 B.getMRI()->setRegBank(VOffsetReg, AMDGPU::VGPRRegBank);
1258 B.getMRI()->setRegBank(SOffsetReg, AMDGPU::SGPRRegBank);
1259 return SOffset + ImmOffset;
1260 }
1261 }
1262
1263 const bool CheckNUW = Subtarget.hasGFX1250Insts();
1264 Register Base;
1265 unsigned Offset;
1266
1267 std::tie(Base, Offset) =
1268 AMDGPU::getBaseWithConstantOffset(*MRI, CombinedOffset,
1269 /*KnownBits=*/nullptr,
1270 /*CheckNUW=*/CheckNUW);
1271
1272 uint32_t SOffset, ImmOffset;
1273 if ((int)Offset > 0 &&
1274 TII->splitMUBUFOffset(Offset, SOffset, ImmOffset, Alignment)) {
1275 if (getRegBank(Base, *MRI, *TRI) == &AMDGPU::VGPRRegBank) {
1276 VOffsetReg = Base;
1277 SOffsetReg = B.buildConstant(S32, SOffset).getReg(0);
1278 B.getMRI()->setRegBank(SOffsetReg, AMDGPU::SGPRRegBank);
1279 InstOffsetVal = ImmOffset;
1280 return 0; // XXX - Why is this 0?
1281 }
1282
1283 // If we have SGPR base, we can use it for soffset.
1284 if (SOffset == 0) {
1285 VOffsetReg = B.buildConstant(S32, 0).getReg(0);
1286 B.getMRI()->setRegBank(VOffsetReg, AMDGPU::VGPRRegBank);
1287 SOffsetReg = Base;
1288 InstOffsetVal = ImmOffset;
1289 return 0; // XXX - Why is this 0?
1290 }
1291 }
1292
1293 // Handle the variable sgpr + vgpr case.
1294 MachineInstr *Add = getOpcodeDef(AMDGPU::G_ADD, CombinedOffset, *MRI);
1295 if (Add && (int)Offset >= 0 &&
1296 (!CheckNUW || Add->getFlag(MachineInstr::NoUWrap))) {
1297 Register Src0 = getSrcRegIgnoringCopies(Add->getOperand(1).getReg(), *MRI);
1298 Register Src1 = getSrcRegIgnoringCopies(Add->getOperand(2).getReg(), *MRI);
1299
1300 const RegisterBank *Src0Bank = getRegBank(Src0, *MRI, *TRI);
1301 const RegisterBank *Src1Bank = getRegBank(Src1, *MRI, *TRI);
1302
1303 if (Src0Bank == &AMDGPU::VGPRRegBank && Src1Bank == &AMDGPU::SGPRRegBank) {
1304 VOffsetReg = Src0;
1305 SOffsetReg = Src1;
1306 return 0;
1307 }
1308
1309 if (Src0Bank == &AMDGPU::SGPRRegBank && Src1Bank == &AMDGPU::VGPRRegBank) {
1310 VOffsetReg = Src1;
1311 SOffsetReg = Src0;
1312 return 0;
1313 }
1314 }
1315
1316 // Ensure we have a VGPR for the combined offset. This could be an issue if we
1317 // have an SGPR offset and a VGPR resource.
1318 if (getRegBank(CombinedOffset, *MRI, *TRI) == &AMDGPU::VGPRRegBank) {
1319 VOffsetReg = CombinedOffset;
1320 } else {
1321 VOffsetReg = B.buildCopy(S32, CombinedOffset).getReg(0);
1322 B.getMRI()->setRegBank(VOffsetReg, AMDGPU::VGPRRegBank);
1323 }
1324
1325 SOffsetReg = B.buildConstant(S32, 0).getReg(0);
1326 B.getMRI()->setRegBank(SOffsetReg, AMDGPU::SGPRRegBank);
1327 return 0;
1328}
1329
1330 static unsigned getSBufferLoadCorrespondingBufferLoadOpcode(unsigned Opc) {
1331 switch (Opc) {
1332 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD:
1333 return AMDGPU::G_AMDGPU_BUFFER_LOAD;
1334 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_UBYTE:
1335 return AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE;
1336 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SBYTE:
1337 return AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE;
1338 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_USHORT:
1339 return AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT;
1340 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SSHORT:
1341 return AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT;
1342 default:
1343 break;
1344 }
1345 llvm_unreachable("Unexpected s_buffer_load opcode");
1346}
1347
1348 bool AMDGPURegisterBankInfo::applyMappingSBufferLoad(
1349 MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
1350 MachineInstr &MI = OpdMapper.getMI();
1351 MachineRegisterInfo &MRI = OpdMapper.getMRI();
1352
1353 const LLT S32 = LLT::scalar(32);
1354 Register Dst = MI.getOperand(0).getReg();
1355 LLT Ty = MRI.getType(Dst);
1356
1357 const RegisterBank *RSrcBank =
1358 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
1359 const RegisterBank *OffsetBank =
1360 OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
1361 if (RSrcBank == &AMDGPU::SGPRRegBank &&
1362 OffsetBank == &AMDGPU::SGPRRegBank)
1363 return true; // Legal mapping
1364
1365 // FIXME: 96-bit case was widened during legalize. We need to narrow it back
1366 // here but don't have an MMO.
1367
1368 unsigned LoadSize = Ty.getSizeInBits();
1369 int NumLoads = 1;
1370 if (LoadSize == 256 || LoadSize == 512) {
1371 NumLoads = LoadSize / 128;
1372 Ty = Ty.divide(NumLoads);
1373 }
1374
1375 // Use the alignment to ensure that the required offsets will fit into the
1376 // immediate offsets.
1377 const Align Alignment = NumLoads > 1 ? Align(16 * NumLoads) : Align(1);
1378
1379 MachineFunction &MF = B.getMF();
1380
1381 Register SOffset;
1382 Register VOffset;
1383 int64_t ImmOffset = 0;
1384
1385 unsigned MMOOffset = setBufferOffsets(B, MI.getOperand(2).getReg(), VOffset,
1386 SOffset, ImmOffset, Alignment);
1387
1388 // TODO: 96-bit loads were widened to 128-bit results. Shrink the result if we
1389 // can, but we need to track an MMO for that.
1390 const unsigned MemSize = (Ty.getSizeInBits() + 7) / 8;
1391 const Align MemAlign(4); // FIXME: ABI type alignment?
1392 MachineMemOperand *BaseMMO = MF.getMachineMemOperand(
1393 MachinePointerInfo(),
1394 MachineMemOperand::MOLoad | MachineMemOperand::MODereferenceable |
1395 MachineMemOperand::MOInvariant,
1396 MemSize, MemAlign);
1397 if (MMOOffset != 0)
1398 BaseMMO = MF.getMachineMemOperand(BaseMMO, MMOOffset, MemSize);
1399
1400 // If only the offset is divergent, emit a MUBUF buffer load instead. We can
1401 // assume that the buffer is unswizzled.
1402
1403 Register RSrc = MI.getOperand(1).getReg();
1404 Register VIndex = B.buildConstant(S32, 0).getReg(0);
1405 B.getMRI()->setRegBank(VIndex, AMDGPU::VGPRRegBank);
1406
1407 SmallVector<Register, 4> LoadParts(NumLoads);
1408
1409 MachineBasicBlock::iterator MII = MI.getIterator();
1410 MachineInstrSpan Span(MII, &B.getMBB());
1411
1412 for (int i = 0; i < NumLoads; ++i) {
1413 if (NumLoads == 1) {
1414 LoadParts[i] = Dst;
1415 } else {
1416 LoadParts[i] = MRI.createGenericVirtualRegister(Ty);
1417 MRI.setRegBank(LoadParts[i], AMDGPU::VGPRRegBank);
1418 }
1419
1420 if (i != 0)
1421 BaseMMO = MF.getMachineMemOperand(BaseMMO, 16, MemSize);
1422
1423 B.buildInstr(getSBufferLoadCorrespondingBufferLoadOpcode(MI.getOpcode()))
1424 .addDef(LoadParts[i]) // vdata
1425 .addUse(RSrc) // rsrc
1426 .addUse(VIndex) // vindex
1427 .addUse(VOffset) // voffset
1428 .addUse(SOffset) // soffset
1429 .addImm(ImmOffset + 16 * i) // offset(imm)
1430 .addImm(0) // cachepolicy, swizzled buffer(imm)
1431 .addImm(0) // idxen(imm)
1432 .addMemOperand(BaseMMO);
1433 }
1434
1435 // TODO: If only the resource is a VGPR, it may be better to execute the
1436 // scalar load in the waterfall loop if the resource is expected to frequently
1437 // be dynamically uniform.
1438 if (RSrcBank != &AMDGPU::SGPRRegBank) {
1439 // Remove the original instruction to avoid potentially confusing the
1440 // waterfall loop logic.
1441 B.setInstr(*Span.begin());
1442 MI.eraseFromParent();
1443
1444 SmallSet<Register, 4> OpsToWaterfall;
1445
1446 OpsToWaterfall.insert(RSrc);
1447 executeInWaterfallLoop(B, make_range(Span.begin(), Span.end()),
1448 OpsToWaterfall);
1449 }
1450
1451 if (NumLoads != 1) {
1452 if (Ty.isVector())
1453 B.buildConcatVectors(Dst, LoadParts);
1454 else
1455 B.buildMergeLikeInstr(Dst, LoadParts);
1456 }
1457
1458 // We removed the instruction earlier with a waterfall loop.
1459 if (RSrcBank == &AMDGPU::SGPRRegBank)
1460 MI.eraseFromParent();
1461
1462 return true;
1463}
1464
1465 bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
1466 const OperandsMapper &OpdMapper,
1467 bool Signed) const {
1468 MachineInstr &MI = OpdMapper.getMI();
1469 MachineRegisterInfo &MRI = OpdMapper.getMRI();
1470
1471 // Insert basic copies
1472 applyDefaultMapping(OpdMapper);
1473
1474 Register DstReg = MI.getOperand(0).getReg();
1475 LLT Ty = MRI.getType(DstReg);
1476
1477 const LLT S32 = LLT::scalar(32);
1478
1479 unsigned FirstOpnd = isa<GIntrinsic>(MI) ? 2 : 1;
1480 Register SrcReg = MI.getOperand(FirstOpnd).getReg();
1481 Register OffsetReg = MI.getOperand(FirstOpnd + 1).getReg();
1482 Register WidthReg = MI.getOperand(FirstOpnd + 2).getReg();
1483
1484 const RegisterBank *DstBank =
1485 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
1486 if (DstBank == &AMDGPU::VGPRRegBank) {
1487 if (Ty == S32)
1488 return true;
1489
1490 // There are no 64-bit VGPR bitfield extract instructions, so the operation
1491 // is expanded to a sequence of instructions that implement it.
1492 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank);
1493
1494 const LLT S64 = LLT::scalar(64);
1495 // Shift the source operand so that extracted bits start at bit 0.
1496 auto ShiftOffset = Signed ? B.buildAShr(S64, SrcReg, OffsetReg)
1497 : B.buildLShr(S64, SrcReg, OffsetReg);
1498 auto UnmergeSOffset = B.buildUnmerge({S32, S32}, ShiftOffset);
1499
1500 // A 64-bit bitfield extract uses the 32-bit bitfield extract instructions
1501 // if the width is a constant.
1502 if (auto ConstWidth = getIConstantVRegValWithLookThrough(WidthReg, MRI)) {
1503 // Use the 32-bit bitfield extract instruction if the width is a constant.
1504 // Depending on the width size, use either the low or high 32-bits.
1505 auto Zero = B.buildConstant(S32, 0);
1506 auto WidthImm = ConstWidth->Value.getZExtValue();
1507 if (WidthImm <= 32) {
1508 // Use bitfield extract on the lower 32-bit source, and then sign-extend
1509 // or clear the upper 32-bits.
1510 auto Extract =
1511 Signed ? B.buildSbfx(S32, UnmergeSOffset.getReg(0), Zero, WidthReg)
1512 : B.buildUbfx(S32, UnmergeSOffset.getReg(0), Zero, WidthReg);
1513 auto Extend =
1514 Signed ? B.buildAShr(S32, Extract, B.buildConstant(S32, 31)) : Zero;
1515 B.buildMergeLikeInstr(DstReg, {Extract, Extend});
1516 } else {
1517 // Use bitfield extract on upper 32-bit source, and combine with lower
1518 // 32-bit source.
1519 auto UpperWidth = B.buildConstant(S32, WidthImm - 32);
1520 auto Extract =
1521 Signed
1522 ? B.buildSbfx(S32, UnmergeSOffset.getReg(1), Zero, UpperWidth)
1523 : B.buildUbfx(S32, UnmergeSOffset.getReg(1), Zero, UpperWidth);
1524 B.buildMergeLikeInstr(DstReg, {UnmergeSOffset.getReg(0), Extract});
1525 }
1526 MI.eraseFromParent();
1527 return true;
1528 }
1529
1530 // Expand to Src >> Offset << (64 - Width) >> (64 - Width) using 64-bit
1531 // operations.
1532 auto ExtShift = B.buildSub(S32, B.buildConstant(S32, 64), WidthReg);
1533 auto SignBit = B.buildShl(S64, ShiftOffset, ExtShift);
1534 if (Signed)
1535 B.buildAShr(S64, SignBit, ExtShift);
1536 else
1537 B.buildLShr(S64, SignBit, ExtShift);
1538 MI.eraseFromParent();
1539 return true;
1540 }
1541
1542 // The scalar form packs the offset and width in a single operand.
1543
1544 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank);
1545
1546 // Ensure the high bits are clear to insert the offset.
1547 auto OffsetMask = B.buildConstant(S32, maskTrailingOnes<unsigned>(6));
1548 auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask);
1549
1550 // Zeros out the low bits, so don't bother clamping the input value.
1551 auto ShiftWidth = B.buildShl(S32, WidthReg, B.buildConstant(S32, 16));
1552
1553 // Pack the offset and width of the BFE into the format expected by
1554 // S_BFE_I32 / S_BFE_U32: in the second source operand, bits [5:0]
1555 // contain the offset and bits [22:16] the width.
1556 auto MergedInputs = B.buildOr(S32, ClampOffset, ShiftWidth);
1557
1558 // TODO: It might be worth using a pseudo here to avoid scc clobber and
1559 // register class constraints.
1560 unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) :
1561 (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64);
1562
1563 auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs});
1564 constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this);
1565
1566 MI.eraseFromParent();
1567 return true;
1568}
1569
1570 bool AMDGPURegisterBankInfo::applyMappingMAD_64_32(
1571 MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
1572 MachineInstr &MI = OpdMapper.getMI();
1573 MachineRegisterInfo &MRI = OpdMapper.getMRI();
1574
1575 // Insert basic copies.
1576 applyDefaultMapping(OpdMapper);
1577
1578 Register Dst0 = MI.getOperand(0).getReg();
1579 Register Dst1 = MI.getOperand(1).getReg();
1580 Register Src0 = MI.getOperand(2).getReg();
1581 Register Src1 = MI.getOperand(3).getReg();
1582 Register Src2 = MI.getOperand(4).getReg();
1583
1584 if (MRI.getRegBankOrNull(Src0) == &AMDGPU::VGPRRegBank)
1585 return true;
1586
1587 bool IsUnsigned = MI.getOpcode() == AMDGPU::G_AMDGPU_MAD_U64_U32;
1588 LLT S1 = LLT::scalar(1);
1589 LLT S32 = LLT::scalar(32);
1590
1591 bool DstOnValu = MRI.getRegBankOrNull(Src2) == &AMDGPU::VGPRRegBank;
1592 bool Accumulate = true;
1593
1594 if (!DstOnValu) {
1595 if (mi_match(Src2, MRI, m_ZeroInt()))
1596 Accumulate = false;
1597 }
1598
1599 // Keep the multiplication on the SALU.
1600 Register DstHi;
1601 Register DstLo = B.buildMul(S32, Src0, Src1).getReg(0);
1602 bool MulHiInVgpr = false;
1603
1604 MRI.setRegBank(DstLo, AMDGPU::SGPRRegBank);
1605
1606 if (Subtarget.hasSMulHi()) {
1607 DstHi = IsUnsigned ? B.buildUMulH(S32, Src0, Src1).getReg(0)
1608 : B.buildSMulH(S32, Src0, Src1).getReg(0);
1609 MRI.setRegBank(DstHi, AMDGPU::SGPRRegBank);
1610 } else {
1611 Register VSrc0 = B.buildCopy(S32, Src0).getReg(0);
1612 Register VSrc1 = B.buildCopy(S32, Src1).getReg(0);
1613
1614 MRI.setRegBank(VSrc0, AMDGPU::VGPRRegBank);
1615 MRI.setRegBank(VSrc1, AMDGPU::VGPRRegBank);
1616
1617 DstHi = IsUnsigned ? B.buildUMulH(S32, VSrc0, VSrc1).getReg(0)
1618 : B.buildSMulH(S32, VSrc0, VSrc1).getReg(0);
1619 MRI.setRegBank(DstHi, AMDGPU::VGPRRegBank);
1620
1621 if (!DstOnValu) {
1622 DstHi = buildReadFirstLane(B, MRI, DstHi);
1623 } else {
1624 MulHiInVgpr = true;
1625 }
1626 }
1627
1628 // Accumulate and produce the "carry-out" bit.
1629 //
1630 // The "carry-out" is defined as bit 64 of the result when computed as a
1631 // big integer. For unsigned multiply-add, this matches the usual definition
1632 // of carry-out. For signed multiply-add, bit 64 is the sign bit of the
1633 // result, which is determined as:
1634 // sign(Src0 * Src1) + sign(Src2) + carry-out from unsigned 64-bit add
1635 LLT CarryType = DstOnValu ? S1 : S32;
1636 const RegisterBank &CarryBank =
1637 DstOnValu ? AMDGPU::VCCRegBank : AMDGPU::SGPRRegBank;
1638 const RegisterBank &DstBank =
1639 DstOnValu ? AMDGPU::VGPRRegBank : AMDGPU::SGPRRegBank;
1640 Register Carry;
1641 Register Zero;
1642
1643 if (!IsUnsigned) {
1644 Zero = B.buildConstant(S32, 0).getReg(0);
1645 MRI.setRegBank(Zero,
1646 MulHiInVgpr ? AMDGPU::VGPRRegBank : AMDGPU::SGPRRegBank);
1647
1648 Carry = B.buildICmp(CmpInst::ICMP_SLT, MulHiInVgpr ? S1 : S32, DstHi, Zero)
1649 .getReg(0);
1650 MRI.setRegBank(Carry, MulHiInVgpr ? AMDGPU::VCCRegBank
1651 : AMDGPU::SGPRRegBank);
1652
1653 if (DstOnValu && !MulHiInVgpr) {
1654 Carry = B.buildTrunc(S1, Carry).getReg(0);
1655 MRI.setRegBank(Carry, AMDGPU::VCCRegBank);
1656 }
1657 }
1658
1659 if (Accumulate) {
1660 if (DstOnValu) {
1661 DstLo = B.buildCopy(S32, DstLo).getReg(0);
1662 DstHi = B.buildCopy(S32, DstHi).getReg(0);
1663 MRI.setRegBank(DstLo, AMDGPU::VGPRRegBank);
1664 MRI.setRegBank(DstHi, AMDGPU::VGPRRegBank);
1665 }
1666
1667 auto Unmerge = B.buildUnmerge(S32, Src2);
1668 Register Src2Lo = Unmerge.getReg(0);
1669 Register Src2Hi = Unmerge.getReg(1);
1670 MRI.setRegBank(Src2Lo, DstBank);
1671 MRI.setRegBank(Src2Hi, DstBank);
1672
1673 if (!IsUnsigned) {
1674 auto Src2Sign = B.buildICmp(CmpInst::ICMP_SLT, CarryType, Src2Hi, Zero);
1675 MRI.setRegBank(Src2Sign.getReg(0), CarryBank);
1676
1677 Carry = B.buildXor(CarryType, Carry, Src2Sign).getReg(0);
1678 MRI.setRegBank(Carry, CarryBank);
1679 }
1680
1681 auto AddLo = B.buildUAddo(S32, CarryType, DstLo, Src2Lo);
1682 DstLo = AddLo.getReg(0);
1683 Register CarryLo = AddLo.getReg(1);
1684 MRI.setRegBank(DstLo, DstBank);
1685 MRI.setRegBank(CarryLo, CarryBank);
1686
1687 auto AddHi = B.buildUAdde(S32, CarryType, DstHi, Src2Hi, CarryLo);
1688 DstHi = AddHi.getReg(0);
1689 MRI.setRegBank(DstHi, DstBank);
1690
1691 Register CarryHi = AddHi.getReg(1);
1692 MRI.setRegBank(CarryHi, CarryBank);
1693
1694 if (IsUnsigned) {
1695 Carry = CarryHi;
1696 } else {
1697 Carry = B.buildXor(CarryType, Carry, CarryHi).getReg(0);
1698 MRI.setRegBank(Carry, CarryBank);
1699 }
1700 } else {
1701 if (IsUnsigned) {
1702 Carry = B.buildConstant(CarryType, 0).getReg(0);
1703 MRI.setRegBank(Carry, CarryBank);
1704 }
1705 }
1706
1707 B.buildMergeLikeInstr(Dst0, {DstLo, DstHi});
1708
1709 if (DstOnValu) {
1710 B.buildCopy(Dst1, Carry);
1711 } else {
1712 B.buildTrunc(Dst1, Carry);
1713 }
1714
1715 MI.eraseFromParent();
1716 return true;
1717}
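The carry-out definition in the comment above (bit 64 of the result computed as a big integer) can be checked against a wide-integer reference. A hypothetical model for the unsigned G_AMDGPU_MAD_U64_U32 form (plain C++ using the GCC/Clang `__int128` extension; `madU64U32Carry` is an invented name, not LLVM code):

```cpp
#include <cassert>
#include <cstdint>

// Reference model: compute A * B + C exactly in 128 bits, then report the
// low 64 bits (Dst0) and bit 64 (Dst1, the carry-out).
static bool madU64U32Carry(uint32_t A, uint32_t B, uint64_t C,
                           uint64_t &Out64) {
  unsigned __int128 Full = (unsigned __int128)A * B + C;
  Out64 = (uint64_t)Full;          // low 64 bits -> Dst0
  return (bool)((Full >> 64) & 1); // bit 64 -> Dst1 (carry-out)
}
```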
1718
1719// Return a suitable opcode for extending the operands of Opc when widening.
1720static unsigned getExtendOp(unsigned Opc) {
1721 switch (Opc) {
1722 case TargetOpcode::G_ASHR:
1723 case TargetOpcode::G_SMIN:
1724 case TargetOpcode::G_SMAX:
1725 return TargetOpcode::G_SEXT;
1726 case TargetOpcode::G_LSHR:
1727 case TargetOpcode::G_UMIN:
1728 case TargetOpcode::G_UMAX:
1729 return TargetOpcode::G_ZEXT;
1730 default:
1731 return TargetOpcode::G_ANYEXT;
1732 }
1733}
1734
1735// Emit a legalized extension from <2 x s16> to 2 32-bit components, avoiding
1736// any illegal vector extend or unmerge operations.
1737static std::pair<Register, Register>
1738unpackV2S16ToS32(MachineIRBuilder &B, Register Src, unsigned ExtOpcode) {
1739 const LLT S32 = LLT::scalar(32);
1740 auto Bitcast = B.buildBitcast(S32, Src);
1741
1742 if (ExtOpcode == TargetOpcode::G_SEXT) {
1743 auto ExtLo = B.buildSExtInReg(S32, Bitcast, 16);
1744 auto ShiftHi = B.buildAShr(S32, Bitcast, B.buildConstant(S32, 16));
1745 return std::pair(ExtLo.getReg(0), ShiftHi.getReg(0));
1746 }
1747
1748 auto ShiftHi = B.buildLShr(S32, Bitcast, B.buildConstant(S32, 16));
1749 if (ExtOpcode == TargetOpcode::G_ZEXT) {
1750 auto ExtLo = B.buildAnd(S32, Bitcast, B.buildConstant(S32, 0xffff));
1751 return std::pair(ExtLo.getReg(0), ShiftHi.getReg(0));
1752 }
1753
1754 assert(ExtOpcode == TargetOpcode::G_ANYEXT);
1755 return std::pair(Bitcast.getReg(0), ShiftHi.getReg(0));
1756}
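The shift-and-mask scheme in unpackV2S16ToS32 has a simple scalar equivalent. A hypothetical model (plain C++, invented names, not LLVM code) of extracting the two 16-bit halves of a packed 32-bit value with each extension kind:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

enum class Ext { Sext, Zext, Anyext };

// Returns {low half, high half} of Packed, each widened to 32 bits.
static std::pair<uint32_t, uint32_t> unpackV2S16(uint32_t Packed, Ext Kind) {
  uint32_t Lo, Hi;
  if (Kind == Ext::Sext) {
    Lo = (uint32_t)((int32_t)(Packed << 16) >> 16); // sign-extend in reg
    Hi = (uint32_t)((int32_t)Packed >> 16);         // arithmetic shift
  } else if (Kind == Ext::Zext) {
    Lo = Packed & 0xffff; // mask off the high half
    Hi = Packed >> 16;    // logical shift
  } else { // Anyext: low half keeps the raw bits, high bits are don't-care
    Lo = Packed;
    Hi = Packed >> 16;
  }
  return {Lo, Hi};
}
```

The anyext case is why the bitcast alone suffices for the low component: its upper 16 bits are unspecified anyway.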
1757
1758 // For cases where only a single copy is inserted for matching register banks,
1759 // replace the register in the instruction operand.
1760 static bool substituteSimpleCopyRegs(
1761 const AMDGPURegisterBankInfo::OperandsMapper &OpdMapper, unsigned OpIdx) {
1762 SmallVector<unsigned, 1> SrcReg(OpdMapper.getVRegs(OpIdx));
1763 if (!SrcReg.empty()) {
1764 assert(SrcReg.size() == 1);
1765 OpdMapper.getMI().getOperand(OpIdx).setReg(SrcReg[0]);
1766 return true;
1767 }
1768
1769 return false;
1770}
1771
1772/// Handle register layout difference for f16 images for some subtargets.
1773 Register AMDGPURegisterBankInfo::handleD16VData(MachineIRBuilder &B,
1774 MachineRegisterInfo &MRI,
1775 Register Reg) const {
1776 if (!Subtarget.hasUnpackedD16VMem())
1777 return Reg;
1778
1779 const LLT S16 = LLT::scalar(16);
1780 LLT StoreVT = MRI.getType(Reg);
1781 if (!StoreVT.isVector() || StoreVT.getElementType() != S16)
1782 return Reg;
1783
1784 auto Unmerge = B.buildUnmerge(S16, Reg);
1785
1786
1787 SmallVector<Register, 4> WideRegs;
1788 for (int I = 0, E = Unmerge->getNumOperands() - 1; I != E; ++I)
1789 WideRegs.push_back(Unmerge.getReg(I));
1790
1791 const LLT S32 = LLT::scalar(32);
1792 int NumElts = StoreVT.getNumElements();
1793
1794 return B.buildMergeLikeInstr(LLT::fixed_vector(NumElts, S32), WideRegs)
1795 .getReg(0);
1796}
1797
1798static std::pair<Register, unsigned>
1799 getBaseWithConstantOffset(MachineRegisterInfo &MRI, Register Reg) {
1800 int64_t Const;
1801 if (mi_match(Reg, MRI, m_ICst(Const)))
1802 return std::pair(Register(), Const);
1803
1804 Register Base;
1805 if (mi_match(Reg, MRI, m_GAdd(m_Reg(Base), m_ICst(Const))))
1806 return std::pair(Base, Const);
1807
1808 // TODO: Handle G_OR used for add case
1809 return std::pair(Reg, 0);
1810}
1811
1812std::pair<Register, unsigned>
1813 AMDGPURegisterBankInfo::splitBufferOffsets(MachineIRBuilder &B,
1814 Register OrigOffset) const {
  const unsigned MaxImm = SIInstrInfo::getMaxMUBUFImmOffset(Subtarget);
  Register BaseReg;
  unsigned ImmOffset;
  const LLT S32 = LLT::scalar(32);

  // TODO: Use AMDGPU::getBaseWithConstantOffset() instead.
  std::tie(BaseReg, ImmOffset) = getBaseWithConstantOffset(*B.getMRI(),
                                                           OrigOffset);

  unsigned C1 = 0;
  if (ImmOffset != 0) {
    // If the immediate value is too big for the immoffset field, put only bits
    // that would normally fit in the immoffset field. The remaining value that
    // is copied/added for the voffset field is a large power of 2, and it
    // stands more chance of being CSEd with the copy/add for another similar
    // load/store.
    // However, do not do that rounding down if that is a negative
    // number, as it appears to be illegal to have a negative offset in the
    // vgpr, even if adding the immediate offset makes it positive.
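    // Illustrative example (MaxImm is subtarget-dependent): assuming a 12-bit
    // immediate field, MaxImm = 4095, an offset of 4100 splits into
    // Overflow = 4100 & ~4095 = 4096 for the voffset register and
    // ImmOffset = 4 for the immediate field.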
    unsigned Overflow = ImmOffset & ~MaxImm;
    ImmOffset -= Overflow;
    if ((int32_t)Overflow < 0) {
      Overflow += ImmOffset;
      ImmOffset = 0;
    }

    C1 = ImmOffset;
    if (Overflow != 0) {
      if (!BaseReg)
        BaseReg = B.buildConstant(S32, Overflow).getReg(0);
      else {
        auto OverflowVal = B.buildConstant(S32, Overflow);
        BaseReg = B.buildAdd(S32, BaseReg, OverflowVal).getReg(0);
      }
    }
  }

  if (!BaseReg)
    BaseReg = B.buildConstant(S32, 0).getReg(0);

  return {BaseReg, C1};
}

bool AMDGPURegisterBankInfo::buildVCopy(MachineIRBuilder &B, Register DstReg,
                                        Register SrcReg) const {
  MachineRegisterInfo &MRI = *B.getMRI();
  LLT SrcTy = MRI.getType(SrcReg);
  if (SrcTy.getSizeInBits() == 32) {
    // Use a v_mov_b32 here to make the exec dependency explicit.
    B.buildInstr(AMDGPU::V_MOV_B32_e32)
        .addDef(DstReg)
        .addUse(SrcReg);
    return constrainGenericRegister(DstReg, AMDGPU::VGPR_32RegClass, MRI) &&
           constrainGenericRegister(SrcReg, AMDGPU::SReg_32RegClass, MRI);
  }

  Register TmpReg0 = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
  Register TmpReg1 = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);

  B.buildInstr(AMDGPU::V_MOV_B32_e32)
      .addDef(TmpReg0)
      .addUse(SrcReg, {}, AMDGPU::sub0);
  B.buildInstr(AMDGPU::V_MOV_B32_e32)
      .addDef(TmpReg1)
      .addUse(SrcReg, {}, AMDGPU::sub1);
  B.buildInstr(AMDGPU::REG_SEQUENCE)
      .addDef(DstReg)
      .addUse(TmpReg0)
      .addImm(AMDGPU::sub0)
      .addUse(TmpReg1)
      .addImm(AMDGPU::sub1);

  return constrainGenericRegister(SrcReg, AMDGPU::SReg_64RegClass, MRI) &&
         constrainGenericRegister(DstReg, AMDGPU::VReg_64RegClass, MRI);
}

/// Utility function for pushing dynamic vector indexes with a constant offset
/// into waterfall loops.
static void reinsertVectorIndexAdd(MachineIRBuilder &B,
                                   MachineInstr &IdxUseInstr, unsigned OpIdx,
                                   unsigned ConstOffset) {
  MachineRegisterInfo &MRI = *B.getMRI();
  const LLT S32 = LLT::scalar(32);
  Register WaterfallIdx = IdxUseInstr.getOperand(OpIdx).getReg();
  B.setInsertPt(*IdxUseInstr.getParent(), IdxUseInstr.getIterator());

  auto MaterializedOffset = B.buildConstant(S32, ConstOffset);

  auto Add = B.buildAdd(S32, WaterfallIdx, MaterializedOffset);
  MRI.setRegBank(MaterializedOffset.getReg(0), AMDGPU::SGPRRegBank);
  MRI.setRegBank(Add.getReg(0), AMDGPU::SGPRRegBank);
  IdxUseInstr.getOperand(OpIdx).setReg(Add.getReg(0));
}

/// Implement extending a 32-bit value to a 64-bit value. \p Lo32Reg is the
/// original 32-bit source value (to be inserted in the low part of the combined
/// 64-bit result), and \p Hi32Reg is the high half of the combined 64-bit
/// value.
static void extendLow32IntoHigh32(MachineIRBuilder &B,
                                  Register Hi32Reg, Register Lo32Reg,
                                  unsigned ExtOpc,
                                  const RegisterBank &RegBank,
                                  bool IsBooleanSrc = false) {
  if (ExtOpc == AMDGPU::G_ZEXT) {
    B.buildConstant(Hi32Reg, 0);
  } else if (ExtOpc == AMDGPU::G_SEXT) {
    if (IsBooleanSrc) {
      // If we know the original source was an s1, the high half is the same as
      // the low.
      B.buildCopy(Hi32Reg, Lo32Reg);
    } else {
      // Replicate sign bit from 32-bit extended part.
      auto ShiftAmt = B.buildConstant(LLT::scalar(32), 31);
      B.getMRI()->setRegBank(ShiftAmt.getReg(0), RegBank);
      B.buildAShr(Hi32Reg, Lo32Reg, ShiftAmt);
    }
  } else {
    assert(ExtOpc == AMDGPU::G_ANYEXT && "not an integer extension");
    B.buildUndef(Hi32Reg);
  }
}

bool AMDGPURegisterBankInfo::foldExtractEltToCmpSelect(
    MachineIRBuilder &B, MachineInstr &MI,
    const OperandsMapper &OpdMapper) const {
  MachineRegisterInfo &MRI = *B.getMRI();

  Register VecReg = MI.getOperand(1).getReg();
  Register Idx = MI.getOperand(2).getReg();

  const RegisterBank &IdxBank =
      *OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;

  bool IsDivergentIdx = IdxBank != AMDGPU::SGPRRegBank;

  LLT VecTy = MRI.getType(VecReg);
  unsigned EltSize = VecTy.getScalarSizeInBits();
  unsigned NumElem = VecTy.getNumElements();

  if (!SITargetLowering::shouldExpandVectorDynExt(EltSize, NumElem,
                                                  IsDivergentIdx, &Subtarget))
    return false;

  LLT S32 = LLT::scalar(32);

  const RegisterBank &DstBank =
      *OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
  const RegisterBank &SrcBank =
      *OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;

  const RegisterBank &CCBank =
    (DstBank == AMDGPU::SGPRRegBank &&
     SrcBank == AMDGPU::SGPRRegBank &&
     IdxBank == AMDGPU::SGPRRegBank) ? AMDGPU::SGPRRegBank
                                     : AMDGPU::VCCRegBank;
  LLT CCTy = (CCBank == AMDGPU::SGPRRegBank) ? S32 : LLT::scalar(1);

  if (CCBank == AMDGPU::VCCRegBank && IdxBank == AMDGPU::SGPRRegBank) {
    Idx = B.buildCopy(S32, Idx)->getOperand(0).getReg();
    MRI.setRegBank(Idx, AMDGPU::VGPRRegBank);
  }

  LLT EltTy = VecTy.getScalarType();
  SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));
  unsigned NumLanes = DstRegs.size();
  if (!NumLanes)
    NumLanes = 1;
  else
    EltTy = MRI.getType(DstRegs[0]);

  auto UnmergeToEltTy = B.buildUnmerge(EltTy, VecReg);
  SmallVector<Register, 2> Res(NumLanes);
  for (unsigned L = 0; L < NumLanes; ++L)
    Res[L] = UnmergeToEltTy.getReg(L);

  for (unsigned I = 1; I < NumElem; ++I) {
    auto IC = B.buildConstant(S32, I);
    MRI.setRegBank(IC->getOperand(0).getReg(), AMDGPU::SGPRRegBank);
    auto Cmp = B.buildICmp(CmpInst::ICMP_EQ, CCTy, Idx, IC);
    MRI.setRegBank(Cmp->getOperand(0).getReg(), CCBank);

    for (unsigned L = 0; L < NumLanes; ++L) {
      auto S = B.buildSelect(EltTy, Cmp,
                             UnmergeToEltTy.getReg(I * NumLanes + L), Res[L]);

      for (unsigned N : { 0, 2, 3 })
        MRI.setRegBank(S->getOperand(N).getReg(), DstBank);

      Res[L] = S->getOperand(0).getReg();
    }
  }

  for (unsigned L = 0; L < NumLanes; ++L) {
    Register DstReg = (NumLanes == 1) ? MI.getOperand(0).getReg() : DstRegs[L];
    B.buildCopy(DstReg, Res[L]);
    MRI.setRegBank(DstReg, DstBank);
  }

  MRI.setRegBank(MI.getOperand(0).getReg(), DstBank);
  MI.eraseFromParent();

  return true;
}

// Insert a cross regbank copy for a register if it already has a bank that
// differs from the one we want to set.
static Register constrainRegToBank(MachineRegisterInfo &MRI,
                                   MachineIRBuilder &B, Register &Reg,
                                   const RegisterBank &Bank) {
  const RegisterBank *CurrBank = MRI.getRegBankOrNull(Reg);
  if (CurrBank && *CurrBank != Bank) {
    Register Copy = B.buildCopy(MRI.getType(Reg), Reg).getReg(0);
    MRI.setRegBank(Copy, Bank);
    return Copy;
  }

  MRI.setRegBank(Reg, Bank);
  return Reg;
}

bool AMDGPURegisterBankInfo::foldInsertEltToCmpSelect(
    MachineIRBuilder &B, MachineInstr &MI,
    const OperandsMapper &OpdMapper) const {

  MachineRegisterInfo &MRI = *B.getMRI();
  Register VecReg = MI.getOperand(1).getReg();
  Register Idx = MI.getOperand(3).getReg();

  const RegisterBank &IdxBank =
      *OpdMapper.getInstrMapping().getOperandMapping(3).BreakDown[0].RegBank;

  bool IsDivergentIdx = IdxBank != AMDGPU::SGPRRegBank;

  LLT VecTy = MRI.getType(VecReg);
  unsigned EltSize = VecTy.getScalarSizeInBits();
  unsigned NumElem = VecTy.getNumElements();

  if (!SITargetLowering::shouldExpandVectorDynExt(EltSize, NumElem,
                                                  IsDivergentIdx, &Subtarget))
    return false;

  LLT S32 = LLT::scalar(32);

  const RegisterBank &DstBank =
      *OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
  const RegisterBank &SrcBank =
      *OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
  const RegisterBank &InsBank =
      *OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;

  const RegisterBank &CCBank =
    (DstBank == AMDGPU::SGPRRegBank &&
     SrcBank == AMDGPU::SGPRRegBank &&
     InsBank == AMDGPU::SGPRRegBank &&
     IdxBank == AMDGPU::SGPRRegBank) ? AMDGPU::SGPRRegBank
                                     : AMDGPU::VCCRegBank;
  LLT CCTy = (CCBank == AMDGPU::SGPRRegBank) ? S32 : LLT::scalar(1);

  if (CCBank == AMDGPU::VCCRegBank && IdxBank == AMDGPU::SGPRRegBank) {
    Idx = B.buildCopy(S32, Idx)->getOperand(0).getReg();
    MRI.setRegBank(Idx, AMDGPU::VGPRRegBank);
  }

  LLT EltTy = VecTy.getScalarType();
  SmallVector<Register, 2> InsRegs(OpdMapper.getVRegs(2));
  unsigned NumLanes = InsRegs.size();
  if (!NumLanes) {
    NumLanes = 1;
    InsRegs.push_back(MI.getOperand(2).getReg());
  } else {
    EltTy = MRI.getType(InsRegs[0]);
  }

  auto UnmergeToEltTy = B.buildUnmerge(EltTy, VecReg);
  SmallVector<Register, 16> Ops(NumElem * NumLanes);

  for (unsigned I = 0; I < NumElem; ++I) {
    auto IC = B.buildConstant(S32, I);
    MRI.setRegBank(IC->getOperand(0).getReg(), AMDGPU::SGPRRegBank);
    auto Cmp = B.buildICmp(CmpInst::ICMP_EQ, CCTy, Idx, IC);
    MRI.setRegBank(Cmp->getOperand(0).getReg(), CCBank);

    for (unsigned L = 0; L < NumLanes; ++L) {
      Register Op0 = constrainRegToBank(MRI, B, InsRegs[L], DstBank);
      Register Op1 = UnmergeToEltTy.getReg(I * NumLanes + L);
      Op1 = constrainRegToBank(MRI, B, Op1, DstBank);

      Register Select = B.buildSelect(EltTy, Cmp, Op0, Op1).getReg(0);
      MRI.setRegBank(Select, DstBank);

      Ops[I * NumLanes + L] = Select;
    }
  }

  LLT MergeTy = LLT::fixed_vector(Ops.size(), EltTy);
  if (MergeTy == MRI.getType(MI.getOperand(0).getReg())) {
    B.buildBuildVector(MI.getOperand(0), Ops);
  } else {
    auto Vec = B.buildBuildVector(MergeTy, Ops);
    MRI.setRegBank(Vec->getOperand(0).getReg(), DstBank);
    B.buildBitcast(MI.getOperand(0).getReg(), Vec);
  }

  MRI.setRegBank(MI.getOperand(0).getReg(), DstBank);
  MI.eraseFromParent();

  return true;
}

// Break s_mul_u64 into 32-bit vector operations.
void AMDGPURegisterBankInfo::applyMappingSMULU64(
    MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
  SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
  SmallVector<Register, 2> Src0Regs(OpdMapper.getVRegs(1));
  SmallVector<Register, 2> Src1Regs(OpdMapper.getVRegs(2));

  // All inputs are SGPRs, nothing special to do.
  if (DefRegs.empty()) {
    assert(Src0Regs.empty() && Src1Regs.empty());
    applyDefaultMapping(OpdMapper);
    return;
  }

  assert(DefRegs.size() == 2);
  assert(Src0Regs.size() == Src1Regs.size() &&
         (Src0Regs.empty() || Src0Regs.size() == 2));

  MachineRegisterInfo &MRI = OpdMapper.getMRI();
  MachineInstr &MI = OpdMapper.getMI();
  Register DstReg = MI.getOperand(0).getReg();
  LLT HalfTy = LLT::scalar(32);

  // Depending on where the source registers came from, the generic code may
  // have decided to split the inputs already or not. If not, we still need to
  // extract the values.

  if (Src0Regs.empty())
    split64BitValueForMapping(B, Src0Regs, HalfTy, MI.getOperand(1).getReg());
  else
    setRegsToType(MRI, Src0Regs, HalfTy);

  if (Src1Regs.empty())
    split64BitValueForMapping(B, Src1Regs, HalfTy, MI.getOperand(2).getReg());
  else
    setRegsToType(MRI, Src1Regs, HalfTy);

  setRegsToType(MRI, DefRegs, HalfTy);

  // The multiplication is done as follows:
  //
  //                            Op1H  Op1L
  //                          * Op0H  Op0L
  //                   --------------------
  //                   Op1H*Op0L  Op1L*Op0L
  //       + Op1H*Op0H  Op1L*Op0H
  // -----------------------------------------
  // (Op1H*Op0L + Op1L*Op0H + carry)  Op1L*Op0L
  //
  // We drop Op1H*Op0H because the result of the multiplication is a 64-bit
  // value and that would overflow.
  // The low 32-bit value is Op1L*Op0L.
  // The high 32-bit value is Op1H*Op0L + Op1L*Op0H + carry (from
  // Op1L*Op0L).
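  // Illustrative check with small values: (2^32 + 3) * (2^32 + 5) mod 2^64
  // = 8 * 2^32 + 15. The low half is 3*5 = 15 with no carry, the high half
  // is umulh(3, 5) + 1*5 + 3*1 = 0 + 5 + 3 = 8, and the Op1H*Op0H term falls
  // entirely outside the 64-bit result.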

  ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank);

  Register Hi = B.buildUMulH(HalfTy, Src0Regs[0], Src1Regs[0]).getReg(0);
  Register MulLoHi = B.buildMul(HalfTy, Src0Regs[0], Src1Regs[1]).getReg(0);
  Register Add = B.buildAdd(HalfTy, Hi, MulLoHi).getReg(0);
  Register MulHiLo = B.buildMul(HalfTy, Src0Regs[1], Src1Regs[0]).getReg(0);
  B.buildAdd(DefRegs[1], Add, MulHiLo);
  B.buildMul(DefRegs[0], Src0Regs[0], Src1Regs[0]);

  MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
  MI.eraseFromParent();
}

void AMDGPURegisterBankInfo::applyMappingImpl(
    MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
  MachineInstr &MI = OpdMapper.getMI();
  B.setInstrAndDebugLoc(MI);
  unsigned Opc = MI.getOpcode();
  MachineRegisterInfo &MRI = OpdMapper.getMRI();
  switch (Opc) {
  case AMDGPU::G_CONSTANT:
  case AMDGPU::G_IMPLICIT_DEF: {
    Register DstReg = MI.getOperand(0).getReg();
    LLT DstTy = MRI.getType(DstReg);
    if (DstTy != LLT::scalar(1))
      break;

    const RegisterBank *DstBank =
        OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
    if (DstBank == &AMDGPU::VCCRegBank)
      break;
    SmallVector<Register, 1> DefRegs(OpdMapper.getVRegs(0));
    if (DefRegs.empty())
      DefRegs.push_back(DstReg);

    B.setInsertPt(*MI.getParent(), ++MI.getIterator());

    Register NewDstReg = MRI.createGenericVirtualRegister(LLT::scalar(32));
    LLVMContext &Ctx = B.getMF().getFunction().getContext();

    MI.getOperand(0).setReg(NewDstReg);
    if (Opc != AMDGPU::G_IMPLICIT_DEF) {
      uint64_t ConstVal = MI.getOperand(1).getCImm()->getZExtValue();
      MI.getOperand(1).setCImm(
          ConstantInt::get(IntegerType::getInt32Ty(Ctx), ConstVal));
    }

    MRI.setRegBank(NewDstReg, *DstBank);
    B.buildTrunc(DefRegs[0], NewDstReg);
    return;
  }
  case AMDGPU::G_PHI: {
    Register DstReg = MI.getOperand(0).getReg();
    LLT DstTy = MRI.getType(DstReg);
    if (DstTy != LLT::scalar(1))
      break;

    const LLT S32 = LLT::scalar(32);
    const RegisterBank *DstBank =
        OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
    if (DstBank == &AMDGPU::VCCRegBank) {
      applyDefaultMapping(OpdMapper);
      // The standard handling only considers the result register bank for
      // phis. For VCC, blindly inserting a copy when the phi is lowered will
      // produce an invalid copy. We can only copy with some kind of compare to
      // get a vector boolean result. Insert a register bank copy that will be
      // correctly lowered to a compare.
      for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2) {
        Register SrcReg = MI.getOperand(I).getReg();
        const RegisterBank *SrcBank = getRegBank(SrcReg, MRI, *TRI);

        if (SrcBank != &AMDGPU::VCCRegBank) {
          MachineBasicBlock *SrcMBB = MI.getOperand(I + 1).getMBB();
          B.setInsertPt(*SrcMBB, SrcMBB->getFirstTerminator());

          auto Copy = B.buildCopy(LLT::scalar(1), SrcReg);
          MRI.setRegBank(Copy.getReg(0), AMDGPU::VCCRegBank);
          MI.getOperand(I).setReg(Copy.getReg(0));
        }
      }

      return;
    }

    // Phi handling is strange and only considers the bank of the destination.
    substituteSimpleCopyRegs(OpdMapper, 0);

    // Promote SGPR/VGPR booleans to s32
    ApplyRegBankMapping ApplyBank(B, *this, MRI, DstBank);
    B.setInsertPt(B.getMBB(), MI);
    LegalizerHelper Helper(B.getMF(), ApplyBank, B);

    if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
      llvm_unreachable("widen scalar should have succeeded");

    return;
  }
  case AMDGPU::G_FCMP:
    if (!Subtarget.hasSALUFloatInsts())
      break;
    [[fallthrough]];
  case AMDGPU::G_ICMP:
  case AMDGPU::G_UADDO:
  case AMDGPU::G_USUBO:
  case AMDGPU::G_UADDE:
  case AMDGPU::G_SADDE:
  case AMDGPU::G_USUBE:
  case AMDGPU::G_SSUBE: {
    unsigned BoolDstOp =
        (Opc == AMDGPU::G_ICMP || Opc == AMDGPU::G_FCMP) ? 0 : 1;
    Register DstReg = MI.getOperand(BoolDstOp).getReg();

    const RegisterBank *DstBank =
        OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
    if (DstBank != &AMDGPU::SGPRRegBank)
      break;

    const bool HasCarryIn = MI.getNumOperands() == 5;

    // If this is a scalar compare, promote the result to s32, as the selection
    // will end up using a copy to a 32-bit vreg.
    const LLT S32 = LLT::scalar(32);
    Register NewDstReg = MRI.createGenericVirtualRegister(S32);
    MRI.setRegBank(NewDstReg, AMDGPU::SGPRRegBank);
    MI.getOperand(BoolDstOp).setReg(NewDstReg);

    if (HasCarryIn) {
      Register NewSrcReg = MRI.createGenericVirtualRegister(S32);
      MRI.setRegBank(NewSrcReg, AMDGPU::SGPRRegBank);
      B.buildZExt(NewSrcReg, MI.getOperand(4).getReg());
      MI.getOperand(4).setReg(NewSrcReg);
    }

    MachineBasicBlock *MBB = MI.getParent();
    B.setInsertPt(*MBB, std::next(MI.getIterator()));

    // If we had a constrained VCC result register, a copy was inserted to VCC
    // from SGPR.
    SmallVector<Register, 1> DefRegs(OpdMapper.getVRegs(0));
    if (DefRegs.empty())
      DefRegs.push_back(DstReg);
    B.buildTrunc(DefRegs[0], NewDstReg);
    return;
  }
  case AMDGPU::G_SELECT: {
    Register DstReg = MI.getOperand(0).getReg();
    LLT DstTy = MRI.getType(DstReg);

    SmallVector<Register, 1> CondRegs(OpdMapper.getVRegs(1));
    if (CondRegs.empty())
      CondRegs.push_back(MI.getOperand(1).getReg());
    else {
      assert(CondRegs.size() == 1);
    }

    const RegisterBank *CondBank = getRegBank(CondRegs[0], MRI, *TRI);
    if (CondBank == &AMDGPU::SGPRRegBank) {
      const LLT S32 = LLT::scalar(32);
      Register NewCondReg = MRI.createGenericVirtualRegister(S32);
      MRI.setRegBank(NewCondReg, AMDGPU::SGPRRegBank);

      MI.getOperand(1).setReg(NewCondReg);
      B.buildZExt(NewCondReg, CondRegs[0]);
    }

    if (DstTy.getSizeInBits() != 64)
      break;

    LLT HalfTy = getHalfSizedType(DstTy);

    SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
    SmallVector<Register, 2> Src1Regs(OpdMapper.getVRegs(2));
    SmallVector<Register, 2> Src2Regs(OpdMapper.getVRegs(3));

    // All inputs are SGPRs, nothing special to do.
    if (DefRegs.empty()) {
      assert(Src1Regs.empty() && Src2Regs.empty());
      break;
    }

    if (Src1Regs.empty())
      split64BitValueForMapping(B, Src1Regs, HalfTy, MI.getOperand(2).getReg());
    else {
      setRegsToType(MRI, Src1Regs, HalfTy);
    }

    if (Src2Regs.empty())
      split64BitValueForMapping(B, Src2Regs, HalfTy, MI.getOperand(3).getReg());
    else
      setRegsToType(MRI, Src2Regs, HalfTy);

    setRegsToType(MRI, DefRegs, HalfTy);

    auto Flags = MI.getFlags();
    B.buildSelect(DefRegs[0], CondRegs[0], Src1Regs[0], Src2Regs[0], Flags);
    B.buildSelect(DefRegs[1], CondRegs[0], Src1Regs[1], Src2Regs[1], Flags);

    MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
    MI.eraseFromParent();
    return;
  }
  case AMDGPU::G_BRCOND: {
    Register CondReg = MI.getOperand(0).getReg();
    // FIXME: Should use legalizer helper, but should change bool ext type.
    const RegisterBank *CondBank =
        OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;

    if (CondBank == &AMDGPU::SGPRRegBank) {
      const LLT S32 = LLT::scalar(32);
      Register NewCondReg = MRI.createGenericVirtualRegister(S32);
      MRI.setRegBank(NewCondReg, AMDGPU::SGPRRegBank);

      MI.getOperand(0).setReg(NewCondReg);
      B.buildZExt(NewCondReg, CondReg);
      return;
    }

    break;
  }
  case AMDGPU::G_AND:
  case AMDGPU::G_OR:
  case AMDGPU::G_XOR: {
    // 64-bit and is only available on the SALU, so split into 2 32-bit ops if
    // there is a VGPR input.
    Register DstReg = MI.getOperand(0).getReg();
    LLT DstTy = MRI.getType(DstReg);

    const RegisterBank *DstBank =
        OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;

    if (DstTy.getSizeInBits() == 1) {
      if (DstBank == &AMDGPU::VCCRegBank)
        break;

      MachineFunction *MF = MI.getMF();
      ApplyRegBankMapping ApplyBank(B, *this, MRI, DstBank);
      LegalizerHelper Helper(*MF, ApplyBank, B);

      if (Helper.widenScalar(MI, 0, LLT::scalar(32)) !=
          LegalizerHelper::Legalized)
        llvm_unreachable("widen scalar should have succeeded");
      return;
    }

    if (DstTy.getSizeInBits() == 16 && DstBank == &AMDGPU::SGPRRegBank) {
      const LLT S32 = LLT::scalar(32);
      MachineBasicBlock *MBB = MI.getParent();
      MachineFunction *MF = MBB->getParent();
      ApplyRegBankMapping ApplySALU(B, *this, MRI, &AMDGPU::SGPRRegBank);
      LegalizerHelper Helper(*MF, ApplySALU, B);
      // Widen to S32, but handle `G_XOR x, -1` differently. Legalizer widening
      // will use a G_ANYEXT to extend the -1 which prevents matching G_XOR -1
      // as "not".
      if (MI.getOpcode() == AMDGPU::G_XOR &&
          mi_match(MI.getOperand(2).getReg(), MRI, m_SpecificICstOrSplat(-1))) {
        Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT);
        Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_SEXT);
        Helper.widenScalarDst(MI, S32);
      } else {
        if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
          llvm_unreachable("widen scalar should have succeeded");
      }
      return;
    }

    if (DstTy.getSizeInBits() != 64)
      break;

    LLT HalfTy = getHalfSizedType(DstTy);
    SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
    SmallVector<Register, 2> Src0Regs(OpdMapper.getVRegs(1));
    SmallVector<Register, 2> Src1Regs(OpdMapper.getVRegs(2));

    // All inputs are SGPRs, nothing special to do.
    if (DefRegs.empty()) {
      assert(Src0Regs.empty() && Src1Regs.empty());
      break;
    }

    assert(DefRegs.size() == 2);
    assert(Src0Regs.size() == Src1Regs.size() &&
           (Src0Regs.empty() || Src0Regs.size() == 2));

    // Depending on where the source registers came from, the generic code may
    // have decided to split the inputs already or not. If not, we still need to
    // extract the values.

    if (Src0Regs.empty())
      split64BitValueForMapping(B, Src0Regs, HalfTy, MI.getOperand(1).getReg());
    else
      setRegsToType(MRI, Src0Regs, HalfTy);

    if (Src1Regs.empty())
      split64BitValueForMapping(B, Src1Regs, HalfTy, MI.getOperand(2).getReg());
    else
      setRegsToType(MRI, Src1Regs, HalfTy);

    setRegsToType(MRI, DefRegs, HalfTy);

    auto Flags = MI.getFlags();
    B.buildInstr(Opc, {DefRegs[0]}, {Src0Regs[0], Src1Regs[0]}, Flags);
    B.buildInstr(Opc, {DefRegs[1]}, {Src0Regs[1], Src1Regs[1]}, Flags);

    MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
    MI.eraseFromParent();
    return;
  }
  case AMDGPU::G_ABS: {
    Register SrcReg = MI.getOperand(1).getReg();
    const RegisterBank *SrcBank = MRI.getRegBankOrNull(SrcReg);

    // There is no VALU abs instruction so we need to replace it with a sub and
    // max combination.
    if (SrcBank && SrcBank == &AMDGPU::VGPRRegBank) {
      MachineFunction *MF = MI.getMF();
      ApplyRegBankMapping Apply(B, *this, MRI, &AMDGPU::VGPRRegBank);
      LegalizerHelper Helper(*MF, Apply, B);

      if (Helper.lowerAbsToMaxNeg(MI) != LegalizerHelper::Legalized)
        llvm_unreachable("lowerAbsToMaxNeg should have succeeded");
2500 return;
2501 }
2502 [[fallthrough]];
2503 }
2504 case AMDGPU::G_ADD:
2505 case AMDGPU::G_SUB:
2506 case AMDGPU::G_MUL:
2507 case AMDGPU::G_SHL:
2508 case AMDGPU::G_LSHR:
2509 case AMDGPU::G_ASHR:
2510 case AMDGPU::G_SMIN:
2511 case AMDGPU::G_SMAX:
2512 case AMDGPU::G_UMIN:
2513 case AMDGPU::G_UMAX: {
2514 Register DstReg = MI.getOperand(0).getReg();
2515 LLT DstTy = MRI.getType(DstReg);
2516
2517 // Special case for s_mul_u64. There is not a vector equivalent of
2518 // s_mul_u64. Hence, we have to break down s_mul_u64 into 32-bit vector
2519 // multiplications.
2520 if (!Subtarget.hasVectorMulU64() && Opc == AMDGPU::G_MUL &&
2521 DstTy.getSizeInBits() == 64) {
2522 applyMappingSMULU64(B, OpdMapper);
2523 return;
2524 }
2525
2526 // 16-bit operations are VALU only, but can be promoted to 32-bit SALU.
2527 // Packed 16-bit operations need to be scalarized and promoted.
2528 if (DstTy != LLT::scalar(16) && DstTy != LLT::fixed_vector(2, 16))
2529 break;
2530
2531 const RegisterBank *DstBank =
2532 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2533 if (DstBank == &AMDGPU::VGPRRegBank)
2534 break;
2535
2536 const LLT S32 = LLT::scalar(32);
2537 MachineBasicBlock *MBB = MI.getParent();
2538 MachineFunction *MF = MBB->getParent();
2539 ApplyRegBankMapping ApplySALU(B, *this, MRI, &AMDGPU::SGPRRegBank);
2540
2541 if (DstTy.isVector() && Opc == AMDGPU::G_ABS) {
2542 Register WideSrcLo, WideSrcHi;
2543
2544 std::tie(WideSrcLo, WideSrcHi) =
2545 unpackV2S16ToS32(B, MI.getOperand(1).getReg(), TargetOpcode::G_SEXT);
2546 auto Lo = B.buildInstr(AMDGPU::G_ABS, {S32}, {WideSrcLo});
2547 auto Hi = B.buildInstr(AMDGPU::G_ABS, {S32}, {WideSrcHi});
2548 B.buildBuildVectorTrunc(DstReg, {Lo.getReg(0), Hi.getReg(0)});
2549 MI.eraseFromParent();
2550 return;
2551 }
2552
2553 if (DstTy.isVector()) {
2554 Register WideSrc0Lo, WideSrc0Hi;
2555 Register WideSrc1Lo, WideSrc1Hi;
2556
2557 unsigned ExtendOp = getExtendOp(MI.getOpcode());
2558 std::tie(WideSrc0Lo, WideSrc0Hi)
2559 = unpackV2S16ToS32(B, MI.getOperand(1).getReg(), ExtendOp);
2560 std::tie(WideSrc1Lo, WideSrc1Hi)
2561 = unpackV2S16ToS32(B, MI.getOperand(2).getReg(), ExtendOp);
2562 auto Lo = B.buildInstr(MI.getOpcode(), {S32}, {WideSrc0Lo, WideSrc1Lo});
2563 auto Hi = B.buildInstr(MI.getOpcode(), {S32}, {WideSrc0Hi, WideSrc1Hi});
2564 B.buildBuildVectorTrunc(DstReg, {Lo.getReg(0), Hi.getReg(0)});
2565 MI.eraseFromParent();
2566 } else {
2567 LegalizerHelper Helper(*MF, ApplySALU, B);
2568
2569 if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
2570 llvm_unreachable("widen scalar should have succeeded");
2571
2572 // FIXME: s16 shift amounts should be legal.
2573 if (Opc == AMDGPU::G_SHL || Opc == AMDGPU::G_LSHR ||
2574 Opc == AMDGPU::G_ASHR) {
2575 B.setInsertPt(*MBB, MI.getIterator());
2576 if (Helper.widenScalar(MI, 1, S32) != LegalizerHelper::Legalized)
2577 llvm_unreachable("widen scalar should have succeeded");
2578 }
2579 }
2580
2581 return;
2582 }
2583 case AMDGPU::G_AMDGPU_S_MUL_I64_I32:
2584 case AMDGPU::G_AMDGPU_S_MUL_U64_U32: {
2585 // This is a special case for s_mul_u64. We use
2586 // G_AMDGPU_S_MUL_I64_I32 opcode to represent an s_mul_u64 operation
2587 // where the 33 higher bits are sign-extended and
2588 // G_AMDGPU_S_MUL_U64_U32 opcode to represent an s_mul_u64 operation
2589 // where the 32 higher bits are zero-extended. In case scalar registers are
2590 // selected, both opcodes are lowered as s_mul_u64. If the vector registers
2591 // are selected, then G_AMDGPU_S_MUL_I64_I32 and
2592 // G_AMDGPU_S_MUL_U64_U32 are lowered with a vector mad instruction.
2593
2594 // Insert basic copies.
2595 applyDefaultMapping(OpdMapper);
2596
2597 Register DstReg = MI.getOperand(0).getReg();
2598 Register SrcReg0 = MI.getOperand(1).getReg();
2599 Register SrcReg1 = MI.getOperand(2).getReg();
2600 const LLT S32 = LLT::scalar(32);
2601 const LLT S64 = LLT::scalar(64);
2602 assert(MRI.getType(DstReg) == S64 && "This is a special case for s_mul_u64 "
2603 "that handles only 64-bit operands.");
2604 const RegisterBank *DstBank =
2605 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2606
2607 // Replace G_AMDGPU_S_MUL_I64_I32 and G_AMDGPU_S_MUL_U64_U32
2608 // with s_mul_u64 operation.
2609 if (DstBank == &AMDGPU::SGPRRegBank) {
2610 MI.setDesc(TII->get(AMDGPU::S_MUL_U64));
2611 MRI.setRegClass(DstReg, &AMDGPU::SGPR_64RegClass);
2612 MRI.setRegClass(SrcReg0, &AMDGPU::SGPR_64RegClass);
2613 MRI.setRegClass(SrcReg1, &AMDGPU::SGPR_64RegClass);
2614 return;
2615 }
2616
2617 // Replace G_AMDGPU_S_MUL_I64_I32 and G_AMDGPU_S_MUL_U64_U32
2618 // with a vector mad.
2619 assert(MRI.getRegBankOrNull(DstReg) == &AMDGPU::VGPRRegBank &&
2620 "The destination operand should be in vector registers.");
2621
2622 // Extract the lower subregister from the first operand.
2623 Register Op0L = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
2624 MRI.setRegClass(Op0L, &AMDGPU::VGPR_32RegClass);
2625 MRI.setType(Op0L, S32);
2626 B.buildTrunc(Op0L, SrcReg0);
2627
2628 // Extract the lower subregister from the second operand.
2629 Register Op1L = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
2630 MRI.setRegClass(Op1L, &AMDGPU::VGPR_32RegClass);
2631 MRI.setType(Op1L, S32);
2632 B.buildTrunc(Op1L, SrcReg1);
2633
2634 unsigned NewOpc = Opc == AMDGPU::G_AMDGPU_S_MUL_U64_U32
2635 ? AMDGPU::G_AMDGPU_MAD_U64_U32
2636 : AMDGPU::G_AMDGPU_MAD_I64_I32;
2637
2639 Register Zero64 = B.buildConstant(S64, 0).getReg(0);
2640 MRI.setRegClass(Zero64, &AMDGPU::VReg_64RegClass);
2641 Register CarryOut = MRI.createVirtualRegister(&AMDGPU::VReg_64RegClass);
2642 MRI.setRegClass(CarryOut, &AMDGPU::VReg_64RegClass);
2643 B.buildInstr(NewOpc, {DstReg, CarryOut}, {Op0L, Op1L, Zero64});
2644 MI.eraseFromParent();
2645 return;
2646 }
2647 case AMDGPU::G_SEXT_INREG: {
2648 SmallVector<Register, 2> SrcRegs(OpdMapper.getVRegs(1));
2649 if (SrcRegs.empty())
2650 break; // Nothing to repair
2651
2652 const LLT S32 = LLT::scalar(32);
2653 ApplyRegBankMapping O(B, *this, MRI, &AMDGPU::VGPRRegBank);
2654
2655 // Don't use LegalizerHelper's narrowScalar. It produces unwanted G_SEXTs
2656 // we would need to further expand, and doesn't let us directly set the
2657 // result registers.
2658 SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));
2659
2660 int Amt = MI.getOperand(2).getImm();
2661 if (Amt <= 32) {
2662 // Downstream users have expectations for the high bit behavior, so freeze
2663 // incoming undefined bits.
2664 if (Amt == 32) {
2665 // The low bits are unchanged.
2666 B.buildFreeze(DstRegs[0], SrcRegs[0]);
2667 } else {
2668 auto Freeze = B.buildFreeze(S32, SrcRegs[0]);
2669 // Extend in the low bits and propagate the sign bit to the high half.
2670 B.buildSExtInReg(DstRegs[0], Freeze, Amt);
2671 }
2672
2673 B.buildAShr(DstRegs[1], DstRegs[0], B.buildConstant(S32, 31));
2674 } else {
2675 // The low bits are unchanged, and extend in the high bits.
2676 // No freeze required
2677 B.buildCopy(DstRegs[0], SrcRegs[0]);
2678 B.buildSExtInReg(DstRegs[1], DstRegs[0], Amt - 32);
2679 }
2680
2681 Register DstReg = MI.getOperand(0).getReg();
2682 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
2683 MI.eraseFromParent();
2684 return;
2685 }
2686 case AMDGPU::G_CTPOP:
2687 case AMDGPU::G_BITREVERSE: {
2688 const RegisterBank *DstBank =
2689 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2690 if (DstBank == &AMDGPU::SGPRRegBank)
2691 break;
2692
2693 Register SrcReg = MI.getOperand(1).getReg();
2694 const LLT S32 = LLT::scalar(32);
2695 LLT Ty = MRI.getType(SrcReg);
2696 if (Ty == S32)
2697 break;
2698
2699 ApplyRegBankMapping ApplyVALU(B, *this, MRI, &AMDGPU::VGPRRegBank);
2700
2701 MachineFunction &MF = B.getMF();
2702 LegalizerHelper Helper(MF, ApplyVALU, B);
2703
2704 if (Helper.narrowScalar(MI, 1, S32) != LegalizerHelper::Legalized)
2705 llvm_unreachable("narrowScalar should have succeeded");
2706 return;
2707 }
2708 case AMDGPU::G_AMDGPU_FFBH_U32:
2709 case AMDGPU::G_AMDGPU_FFBL_B32:
2710 case AMDGPU::G_CTLZ_ZERO_UNDEF:
2711 case AMDGPU::G_CTTZ_ZERO_UNDEF: {
2712 const RegisterBank *DstBank =
2713 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2714 if (DstBank == &AMDGPU::SGPRRegBank)
2715 break;
2716
2717 Register SrcReg = MI.getOperand(1).getReg();
2718 const LLT S32 = LLT::scalar(32);
2719 LLT Ty = MRI.getType(SrcReg);
2720 if (Ty == S32)
2721 break;
2722
2723 // We can narrow this more efficiently than Helper can by using ffbh/ffbl
2724 // which return -1 when the input is zero:
2725 // (ctlz_zero_undef hi:lo) -> (umin (ffbh hi), (add (ffbh lo), 32))
2726 // (cttz_zero_undef hi:lo) -> (umin (add (ffbl hi), 32), (ffbl lo))
2727 // (ffbh hi:lo) -> (umin (ffbh hi), (uaddsat (ffbh lo), 32))
2728 // (ffbl hi:lo) -> (umin (uaddsat (ffbl hi), 32), (ffbl lo))
2729 ApplyRegBankMapping ApplyVALU(B, *this, MRI, &AMDGPU::VGPRRegBank);
2730 SmallVector<Register, 2> SrcRegs(OpdMapper.getVRegs(1));
2731 unsigned NewOpc = Opc == AMDGPU::G_CTLZ_ZERO_UNDEF
2732 ? (unsigned)AMDGPU::G_AMDGPU_FFBH_U32
2733 : Opc == AMDGPU::G_CTTZ_ZERO_UNDEF
2734 ? (unsigned)AMDGPU::G_AMDGPU_FFBL_B32
2735 : Opc;
2736 unsigned Idx = NewOpc == AMDGPU::G_AMDGPU_FFBH_U32;
2737 auto X = B.buildInstr(NewOpc, {S32}, {SrcRegs[Idx]});
2738 auto Y = B.buildInstr(NewOpc, {S32}, {SrcRegs[Idx ^ 1]});
2739 unsigned AddOpc =
2740 Opc == AMDGPU::G_CTLZ_ZERO_UNDEF || Opc == AMDGPU::G_CTTZ_ZERO_UNDEF
2741 ? AMDGPU::G_ADD
2742 : AMDGPU::G_UADDSAT;
2743 Y = B.buildInstr(AddOpc, {S32}, {Y, B.buildConstant(S32, 32)});
2744 Register DstReg = MI.getOperand(0).getReg();
2745 B.buildUMin(DstReg, X, Y);
2746 MI.eraseFromParent();
2747 return;
2748 }
2749 case AMDGPU::G_SEXT:
2750 case AMDGPU::G_ZEXT:
2751 case AMDGPU::G_ANYEXT: {
2752 Register SrcReg = MI.getOperand(1).getReg();
2753 LLT SrcTy = MRI.getType(SrcReg);
2754 const bool Signed = Opc == AMDGPU::G_SEXT;
2755
2756 assert(OpdMapper.getVRegs(1).empty());
2757
2758 const RegisterBank *SrcBank =
2759 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
2760
2761 Register DstReg = MI.getOperand(0).getReg();
2762 LLT DstTy = MRI.getType(DstReg);
2763 if (DstTy.isScalar() &&
2764 SrcBank != &AMDGPU::SGPRRegBank &&
2765 SrcBank != &AMDGPU::VCCRegBank &&
2766 // FIXME: Should handle any type that rounds to s64 when irregular
2767 // breakdowns are supported.
2768 DstTy.getSizeInBits() == 64 &&
2769 SrcTy.getSizeInBits() <= 32) {
2770 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2771
2772 // Extend to 32-bit, and then extend the low half.
2773 if (Signed) {
2774 // TODO: Should really be buildSExtOrCopy
2775 B.buildSExtOrTrunc(DefRegs[0], SrcReg);
2776 } else if (Opc == AMDGPU::G_ZEXT) {
2777 B.buildZExtOrTrunc(DefRegs[0], SrcReg);
2778 } else {
2779 B.buildAnyExtOrTrunc(DefRegs[0], SrcReg);
2780 }
2781
2782 extendLow32IntoHigh32(B, DefRegs[1], DefRegs[0], Opc, *SrcBank);
2783 MRI.setRegBank(DstReg, *SrcBank);
2784 MI.eraseFromParent();
2785 return;
2786 }
2787
2788 if (SrcTy != LLT::scalar(1))
2789 return;
2790
2791 // It is not legal to have a legalization artifact with a VCC source. Rather
2792 // than introducing a copy, directly emit the select that such a copy would
2793 // have been selected to.
2794 if (SrcBank == &AMDGPU::VCCRegBank) {
2795 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2796
2797 const RegisterBank *DstBank = &AMDGPU::VGPRRegBank;
2798
2799 unsigned DstSize = DstTy.getSizeInBits();
2800 // 64-bit select is SGPR only
2801 const bool UseSel64 = DstSize > 32 &&
2802 SrcBank->getID() == AMDGPU::SGPRRegBankID;
2803
2804 // TODO: Should s16 select be legal?
2805 LLT SelType = UseSel64 ? LLT::scalar(64) : LLT::scalar(32);
2806 auto True = B.buildConstant(SelType, Signed ? -1 : 1);
2807 auto False = B.buildConstant(SelType, 0);
2808
2809 MRI.setRegBank(True.getReg(0), *DstBank);
2810 MRI.setRegBank(False.getReg(0), *DstBank);
2811 MRI.setRegBank(DstReg, *DstBank);
2812
2813 if (DstSize > 32) {
2814 B.buildSelect(DefRegs[0], SrcReg, True, False);
2815 extendLow32IntoHigh32(B, DefRegs[1], DefRegs[0], Opc, *SrcBank, true);
2816 } else if (DstSize < 32) {
2817 auto Sel = B.buildSelect(SelType, SrcReg, True, False);
2818 MRI.setRegBank(Sel.getReg(0), *DstBank);
2819 B.buildTrunc(DstReg, Sel);
2820 } else {
2821 B.buildSelect(DstReg, SrcReg, True, False);
2822 }
2823
2824 MI.eraseFromParent();
2825 return;
2826 }
2827
2828 break;
2829 }
2830 case AMDGPU::G_EXTRACT_VECTOR_ELT: {
2831 SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));
2832
2833 assert(OpdMapper.getVRegs(1).empty() && OpdMapper.getVRegs(2).empty());
2834
2835 Register DstReg = MI.getOperand(0).getReg();
2836 Register SrcReg = MI.getOperand(1).getReg();
2837
2838 const LLT S32 = LLT::scalar(32);
2839 LLT DstTy = MRI.getType(DstReg);
2840 LLT SrcTy = MRI.getType(SrcReg);
2841
2842 if (foldExtractEltToCmpSelect(B, MI, OpdMapper))
2843 return;
2844
2845 const ValueMapping &DstMapping
2846 = OpdMapper.getInstrMapping().getOperandMapping(0);
2847 const RegisterBank *DstBank = DstMapping.BreakDown[0].RegBank;
2848 const RegisterBank *SrcBank =
2849 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
2850 const RegisterBank *IdxBank =
2851 OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
2852
2853 Register BaseIdxReg;
2854 unsigned ConstOffset;
2855 std::tie(BaseIdxReg, ConstOffset) =
2856 AMDGPU::getBaseWithConstantOffset(MRI, MI.getOperand(2).getReg());
2857
2858 // See if the index is an add of a constant which will be foldable by moving
2859 // the base register of the index later if this is going to be executed in a
2860 // waterfall loop. This is essentially to reassociate the add of a constant
2861 // with the readfirstlane.
2862 bool ShouldMoveIndexIntoLoop = IdxBank != &AMDGPU::SGPRRegBank &&
2863 ConstOffset > 0 &&
2864 ConstOffset < SrcTy.getNumElements();
2865
2866 // Move the base register. We'll re-insert the add later.
2867 if (ShouldMoveIndexIntoLoop)
2868 MI.getOperand(2).setReg(BaseIdxReg);
2869
2870 // If this is a VGPR result only because the index was a VGPR result, the
2871 // actual indexing will be done on the SGPR source vector, which will
2872 // produce a scalar result. We need to copy to the VGPR result inside the
2873 // waterfall loop.
2874 const bool NeedCopyToVGPR = DstBank == &AMDGPU::VGPRRegBank &&
2875 SrcBank == &AMDGPU::SGPRRegBank;
2876 if (DstRegs.empty()) {
2877 applyDefaultMapping(OpdMapper);
2878
2879 executeInWaterfallLoop(B, MI, {2});
2880
2881 if (NeedCopyToVGPR) {
2882 // We don't want a phi for this temporary reg.
2883 Register TmpReg = MRI.createGenericVirtualRegister(DstTy);
2884 MRI.setRegBank(TmpReg, AMDGPU::SGPRRegBank);
2885 MI.getOperand(0).setReg(TmpReg);
2886 B.setInsertPt(*MI.getParent(), ++MI.getIterator());
2887
2888 // Use a v_mov_b32 here to make the exec dependency explicit.
2889 buildVCopy(B, DstReg, TmpReg);
2890 }
2891
2892 // Re-insert the constant offset add inside the waterfall loop.
2893 if (ShouldMoveIndexIntoLoop)
2894 reinsertVectorIndexAdd(B, MI, 2, ConstOffset);
2895
2896 return;
2897 }
2898
2899 assert(DstTy.getSizeInBits() == 64);
2900
2901 LLT Vec32 = LLT::fixed_vector(2 * SrcTy.getNumElements(), 32);
2902
2903 auto CastSrc = B.buildBitcast(Vec32, SrcReg);
2904 auto One = B.buildConstant(S32, 1);
2905
2906 MachineBasicBlock::iterator MII = MI.getIterator();
2907
2908 // Split the vector index into 32-bit pieces. Prepare to move all of the
2909 // new instructions into a waterfall loop if necessary.
2910 //
2911 // Don't put the bitcast or constant in the loop.
2912 MachineInstrSpan Span(MII, &B.getMBB());
2913
2914 // Compute 32-bit element indices, (2 * OrigIdx, 2 * OrigIdx + 1).
2915 auto IdxLo = B.buildShl(S32, BaseIdxReg, One);
2916 auto IdxHi = B.buildAdd(S32, IdxLo, One);
2917
2918 auto Extract0 = B.buildExtractVectorElement(DstRegs[0], CastSrc, IdxLo);
2919 auto Extract1 = B.buildExtractVectorElement(DstRegs[1], CastSrc, IdxHi);
2920
2921 MRI.setRegBank(DstReg, *DstBank);
2922 MRI.setRegBank(CastSrc.getReg(0), *SrcBank);
2923 MRI.setRegBank(One.getReg(0), AMDGPU::SGPRRegBank);
2924 MRI.setRegBank(IdxLo.getReg(0), AMDGPU::SGPRRegBank);
2925 MRI.setRegBank(IdxHi.getReg(0), AMDGPU::SGPRRegBank);
2926
2927 SmallSet<Register, 4> OpsToWaterfall;
2928 if (!collectWaterfallOperands(OpsToWaterfall, MI, MRI, { 2 })) {
2929 MI.eraseFromParent();
2930 return;
2931 }
2932
2933 // Remove the original instruction to avoid potentially confusing the
2934 // waterfall loop logic.
2935 B.setInstr(*Span.begin());
2936 MI.eraseFromParent();
2937 executeInWaterfallLoop(B, make_range(Span.begin(), Span.end()),
2938 OpsToWaterfall);
2939
2940 if (NeedCopyToVGPR) {
2941 MachineBasicBlock *LoopBB = Extract1->getParent();
2942 Register TmpReg0 = MRI.createGenericVirtualRegister(S32);
2943 Register TmpReg1 = MRI.createGenericVirtualRegister(S32);
2944 MRI.setRegBank(TmpReg0, AMDGPU::SGPRRegBank);
2945 MRI.setRegBank(TmpReg1, AMDGPU::SGPRRegBank);
2946
2947 Extract0->getOperand(0).setReg(TmpReg0);
2948 Extract1->getOperand(0).setReg(TmpReg1);
2949
2950 B.setInsertPt(*LoopBB, ++Extract1->getIterator());
2951
2952 buildVCopy(B, DstRegs[0], TmpReg0);
2953 buildVCopy(B, DstRegs[1], TmpReg1);
2954 }
2955
2956 if (ShouldMoveIndexIntoLoop)
2957 reinsertVectorIndexAdd(B, *IdxLo, 1, ConstOffset);
2958
2959 return;
2960 }
2961 case AMDGPU::G_INSERT_VECTOR_ELT: {
2962 SmallVector<Register, 2> InsRegs(OpdMapper.getVRegs(2));
2963
2964 Register DstReg = MI.getOperand(0).getReg();
2965 LLT VecTy = MRI.getType(DstReg);
2966
2967 assert(OpdMapper.getVRegs(0).empty());
2968 assert(OpdMapper.getVRegs(3).empty());
2969
2970 if (substituteSimpleCopyRegs(OpdMapper, 1))
2971 MRI.setType(MI.getOperand(1).getReg(), VecTy);
2972
2973 if (foldInsertEltToCmpSelect(B, MI, OpdMapper))
2974 return;
2975
2976 const RegisterBank *IdxBank =
2977 OpdMapper.getInstrMapping().getOperandMapping(3).BreakDown[0].RegBank;
2978
2979 Register SrcReg = MI.getOperand(1).getReg();
2980 Register InsReg = MI.getOperand(2).getReg();
2981 LLT InsTy = MRI.getType(InsReg);
2982 (void)InsTy;
2983
2984 Register BaseIdxReg;
2985 unsigned ConstOffset;
2986 std::tie(BaseIdxReg, ConstOffset) =
2987 AMDGPU::getBaseWithConstantOffset(MRI, MI.getOperand(3).getReg());
2988
2989 // See if the index is an add of a constant which will be foldable by moving
2990 // the base register of the index later if this is going to be executed in a
2991 // waterfall loop. This is essentially to reassociate the add of a constant
2992 // with the readfirstlane.
2993 bool ShouldMoveIndexIntoLoop = IdxBank != &AMDGPU::SGPRRegBank &&
2994 ConstOffset > 0 &&
2995 ConstOffset < VecTy.getNumElements();
2996
2997 // Move the base register. We'll re-insert the add later.
2998 if (ShouldMoveIndexIntoLoop)
2999 MI.getOperand(3).setReg(BaseIdxReg);
3000
3001
3002 if (InsRegs.empty()) {
3003 executeInWaterfallLoop(B, MI, {3});
3004
3005 // Re-insert the constant offset add inside the waterfall loop.
3006 if (ShouldMoveIndexIntoLoop) {
3007 reinsertVectorIndexAdd(B, MI, 3, ConstOffset);
3008 }
3009
3010 return;
3011 }
3012
3013 assert(InsTy.getSizeInBits() == 64);
3014
3015 const LLT S32 = LLT::scalar(32);
3016 LLT Vec32 = LLT::fixed_vector(2 * VecTy.getNumElements(), 32);
3017
3018 auto CastSrc = B.buildBitcast(Vec32, SrcReg);
3019 auto One = B.buildConstant(S32, 1);
3020
3021 // Split the vector index into 32-bit pieces. Prepare to move all of the
3022 // new instructions into a waterfall loop if necessary.
3023 //
3024 // Don't put the bitcast or constant in the loop.
3025 MachineInstrSpan Span(MachineBasicBlock::iterator(&MI), &B.getMBB());
3026
3027 // Compute 32-bit element indices, (2 * OrigIdx, 2 * OrigIdx + 1).
3028 auto IdxLo = B.buildShl(S32, BaseIdxReg, One);
3029 auto IdxHi = B.buildAdd(S32, IdxLo, One);
3030
3031 auto InsLo = B.buildInsertVectorElement(Vec32, CastSrc, InsRegs[0], IdxLo);
3032 auto InsHi = B.buildInsertVectorElement(Vec32, InsLo, InsRegs[1], IdxHi);
3033
3034 const RegisterBank *DstBank =
3035 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
3036 const RegisterBank *SrcBank =
3037 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
3038 const RegisterBank *InsSrcBank =
3039 OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
3040
3041 MRI.setRegBank(InsReg, *InsSrcBank);
3042 MRI.setRegBank(CastSrc.getReg(0), *SrcBank);
3043 MRI.setRegBank(InsLo.getReg(0), *DstBank);
3044 MRI.setRegBank(InsHi.getReg(0), *DstBank);
3045 MRI.setRegBank(One.getReg(0), AMDGPU::SGPRRegBank);
3046 MRI.setRegBank(IdxLo.getReg(0), AMDGPU::SGPRRegBank);
3047 MRI.setRegBank(IdxHi.getReg(0), AMDGPU::SGPRRegBank);
3048
3049
3050 SmallSet<Register, 4> OpsToWaterfall;
3051 if (!collectWaterfallOperands(OpsToWaterfall, MI, MRI, { 3 })) {
3052 B.setInsertPt(B.getMBB(), MI);
3053 B.buildBitcast(DstReg, InsHi);
3054 MI.eraseFromParent();
3055 return;
3056 }
3057
3058 B.setInstr(*Span.begin());
3059 MI.eraseFromParent();
3060
3061 // Figure out the point after the waterfall loop before mangling the control
3062 // flow.
3063 executeInWaterfallLoop(B, make_range(Span.begin(), Span.end()),
3064 OpsToWaterfall);
3065
3066 // The insertion point is now right after the original instruction.
3067 //
3068 // Keep the bitcast to the original vector type out of the loop. Doing this
3069 // saves an extra phi we don't need inside the loop.
3070 B.buildBitcast(DstReg, InsHi);
3071
3072 // Re-insert the constant offset add inside the waterfall loop.
3073 if (ShouldMoveIndexIntoLoop)
3074 reinsertVectorIndexAdd(B, *IdxLo, 1, ConstOffset);
3075
3076 return;
3077 }
3078 case AMDGPU::G_AMDGPU_BUFFER_LOAD:
3079 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT:
3080 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT:
3081 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE:
3082 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE:
3083 case AMDGPU::G_AMDGPU_BUFFER_LOAD_TFE:
3084 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT_TFE:
3085 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT_TFE:
3086 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE_TFE:
3087 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE_TFE:
3088 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT:
3089 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_TFE:
3090 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_D16:
3091 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT:
3092 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT_D16:
3093 case AMDGPU::G_AMDGPU_BUFFER_STORE:
3094 case AMDGPU::G_AMDGPU_BUFFER_STORE_BYTE:
3095 case AMDGPU::G_AMDGPU_BUFFER_STORE_SHORT:
3096 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT:
3097 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT_D16:
3098 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT:
3099 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT_D16: {
3100 applyDefaultMapping(OpdMapper);
3101 executeInWaterfallLoop(B, MI, {1, 4});
3102 return;
3103 }
3104 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SWAP:
3105 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_ADD:
3106 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB:
3107 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMIN:
3108 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMIN:
3109 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMAX:
3110 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMAX:
3111 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_AND:
3112 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_OR:
3113 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_XOR:
3114 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_INC:
3115 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_DEC:
3116 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB_CLAMP_U32:
3117 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_COND_SUB_U32:
3118 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FADD:
3119 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMIN:
3120 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMAX: {
3121 applyDefaultMapping(OpdMapper);
3122 executeInWaterfallLoop(B, MI, {2, 5});
3123 return;
3124 }
3125 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_CMPSWAP: {
3126 applyDefaultMapping(OpdMapper);
3127 executeInWaterfallLoop(B, MI, {3, 6});
3128 return;
3129 }
3130 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD:
3131 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_UBYTE:
3132 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SBYTE:
3133 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_USHORT:
3134 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SSHORT: {
3135 applyMappingSBufferLoad(B, OpdMapper);
3136 return;
3137 }
3138 case AMDGPU::G_AMDGPU_S_BUFFER_PREFETCH:
3141 return;
3142 case AMDGPU::G_INTRINSIC:
3143 case AMDGPU::G_INTRINSIC_CONVERGENT: {
3144 switch (cast<GIntrinsic>(MI).getIntrinsicID()) {
3145 case Intrinsic::amdgcn_readlane: {
3146 substituteSimpleCopyRegs(OpdMapper, 2);
3147
3148 assert(OpdMapper.getVRegs(0).empty());
3149 assert(OpdMapper.getVRegs(3).empty());
3150
3151 // Make sure the index is an SGPR. It doesn't make sense to run this in a
3152 // waterfall loop, so assume it's a uniform value.
3153 constrainOpWithReadfirstlane(B, MI, 3); // Index
3154 return;
3155 }
3156 case Intrinsic::amdgcn_writelane: {
3157 assert(OpdMapper.getVRegs(0).empty());
3158 assert(OpdMapper.getVRegs(2).empty());
3159 assert(OpdMapper.getVRegs(3).empty());
3160
3161 substituteSimpleCopyRegs(OpdMapper, 4); // VGPR input val
3162 constrainOpWithReadfirstlane(B, MI, 2); // Source value
3163 constrainOpWithReadfirstlane(B, MI, 3); // Index
3164 return;
3165 }
3166 case Intrinsic::amdgcn_interp_p1:
3167 case Intrinsic::amdgcn_interp_p2:
3168 case Intrinsic::amdgcn_interp_mov:
3169 case Intrinsic::amdgcn_interp_p1_f16:
3170 case Intrinsic::amdgcn_interp_p2_f16:
3171 case Intrinsic::amdgcn_lds_param_load: {
3172 applyDefaultMapping(OpdMapper);
3173
3174 // Readlane for m0 value, which is always the last operand.
3175 // FIXME: Should this be a waterfall loop instead?
3176 constrainOpWithReadfirstlane(B, MI, MI.getNumOperands() - 1); // Index
3177 return;
3178 }
3179 case Intrinsic::amdgcn_interp_inreg_p10:
3180 case Intrinsic::amdgcn_interp_inreg_p2:
3181 case Intrinsic::amdgcn_interp_inreg_p10_f16:
3182 case Intrinsic::amdgcn_interp_inreg_p2_f16:
3183 case Intrinsic::amdgcn_interp_p10_rtz_f16:
3184 case Intrinsic::amdgcn_interp_p2_rtz_f16:
3185 case Intrinsic::amdgcn_permlane16_swap:
3186 case Intrinsic::amdgcn_permlane32_swap:
3187 applyDefaultMapping(OpdMapper);
3188 return;
3189 case Intrinsic::amdgcn_permlane16:
3190 case Intrinsic::amdgcn_permlanex16: {
3191 // Doing a waterfall loop over these wouldn't make any sense.
3192 substituteSimpleCopyRegs(OpdMapper, 2);
3193 substituteSimpleCopyRegs(OpdMapper, 3);
3196 return;
3197 }
3198 case Intrinsic::amdgcn_permlane_bcast:
3199 case Intrinsic::amdgcn_permlane_up:
3200 case Intrinsic::amdgcn_permlane_down:
3201 case Intrinsic::amdgcn_permlane_xor:
3202 // Doing a waterfall loop over these wouldn't make any sense.
3205 return;
3206 case Intrinsic::amdgcn_permlane_idx_gen: {
3208 return;
3209 }
3210 case Intrinsic::amdgcn_sbfe:
3211 applyMappingBFE(B, OpdMapper, true);
3212 return;
3213 case Intrinsic::amdgcn_ubfe:
3214 applyMappingBFE(B, OpdMapper, false);
3215 return;
3216 case Intrinsic::amdgcn_inverse_ballot:
3217 case Intrinsic::amdgcn_s_bitreplicate:
3218 case Intrinsic::amdgcn_s_quadmask:
3219 case Intrinsic::amdgcn_s_wqm:
3220 applyDefaultMapping(OpdMapper);
3221 constrainOpWithReadfirstlane(B, MI, 2); // Mask
3222 return;
3223 case Intrinsic::amdgcn_ballot:
3224 // Use default handling and insert copy to vcc source.
3225 break;
3226 }
3227 break;
3228 }
3229 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD:
3230 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_D16:
3231 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_NORET:
3232 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE:
3233 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE_D16: {
3234 const AMDGPU::RsrcIntrinsic *RSrcIntrin =
3235 AMDGPU::lookupRsrcIntrinsic(AMDGPU::getIntrinsicID(MI));
3236 assert(RSrcIntrin && RSrcIntrin->IsImage);
3237 // Non-images can have complications from operands that allow both SGPR
3238 // and VGPR. For now it's too complicated to figure out the final opcode
3239 // to derive the register bank from the MCInstrDesc.
3240 applyMappingImage(B, MI, OpdMapper, RSrcIntrin->RsrcArg);
3241 return;
3242 }
3243 case AMDGPU::G_AMDGPU_BVH_INTERSECT_RAY:
3244 case AMDGPU::G_AMDGPU_BVH8_INTERSECT_RAY:
3245 case AMDGPU::G_AMDGPU_BVH_DUAL_INTERSECT_RAY: {
3246 bool IsDualOrBVH8 =
3247 MI.getOpcode() == AMDGPU::G_AMDGPU_BVH_DUAL_INTERSECT_RAY ||
3248 MI.getOpcode() == AMDGPU::G_AMDGPU_BVH8_INTERSECT_RAY;
3249 unsigned NumMods = IsDualOrBVH8 ? 0 : 1; // Has A16 modifier
3250 unsigned LastRegOpIdx = MI.getNumExplicitOperands() - 1 - NumMods;
3251 applyDefaultMapping(OpdMapper);
3252 executeInWaterfallLoop(B, MI, {LastRegOpIdx});
3253 return;
3254 }
3255 case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS:
3256 case AMDGPU::G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS: {
3257 auto IntrID = cast<GIntrinsic>(MI).getIntrinsicID();
3258 switch (IntrID) {
3259 case Intrinsic::amdgcn_ds_ordered_add:
3260 case Intrinsic::amdgcn_ds_ordered_swap: {
3261 // This is only allowed to execute with 1 lane, so readfirstlane is safe.
3262 assert(OpdMapper.getVRegs(0).empty());
3263 substituteSimpleCopyRegs(OpdMapper, 3);
3265 return;
3266 }
3267 case Intrinsic::amdgcn_ds_gws_init:
3268 case Intrinsic::amdgcn_ds_gws_barrier:
3269 case Intrinsic::amdgcn_ds_gws_sema_br: {
3270 // Only the first lane executes, so readfirstlane is safe.
3271 substituteSimpleCopyRegs(OpdMapper, 1);
3273 return;
3274 }
3275 case Intrinsic::amdgcn_ds_gws_sema_v:
3276 case Intrinsic::amdgcn_ds_gws_sema_p:
3277 case Intrinsic::amdgcn_ds_gws_sema_release_all: {
3278 // Only the first lane executes, so readfirstlane is safe.
3280 return;
3281 }
3282 case Intrinsic::amdgcn_ds_append:
3283 case Intrinsic::amdgcn_ds_consume: {
3285 return;
3286 }
3287 case Intrinsic::amdgcn_s_alloc_vgpr:
3289 return;
3290 case Intrinsic::amdgcn_s_sendmsg:
3291 case Intrinsic::amdgcn_s_sendmsghalt: {
3292 // FIXME: Should this use a waterfall loop?
3294 return;
3295 }
3296 case Intrinsic::amdgcn_s_setreg: {
3298 return;
3299 }
3300 case Intrinsic::amdgcn_s_ttracedata:
3302 return;
3303 case Intrinsic::amdgcn_raw_buffer_load_lds:
3304 case Intrinsic::amdgcn_raw_buffer_load_async_lds:
3305 case Intrinsic::amdgcn_raw_ptr_buffer_load_lds:
3306 case Intrinsic::amdgcn_raw_ptr_buffer_load_async_lds: {
3307 applyDefaultMapping(OpdMapper);
3308 constrainOpWithReadfirstlane(B, MI, 1); // rsrc
3310 constrainOpWithReadfirstlane(B, MI, 5); // soffset
3311 return;
3312 }
3313 case Intrinsic::amdgcn_struct_buffer_load_lds:
3314 case Intrinsic::amdgcn_struct_buffer_load_async_lds:
3315 case Intrinsic::amdgcn_struct_ptr_buffer_load_lds:
3316 case Intrinsic::amdgcn_struct_ptr_buffer_load_async_lds: {
3317 applyDefaultMapping(OpdMapper);
3318 constrainOpWithReadfirstlane(B, MI, 1); // rsrc
3320 constrainOpWithReadfirstlane(B, MI, 6); // soffset
3321 return;
3322 }
3323 case Intrinsic::amdgcn_cluster_load_async_to_lds_b8:
3324 case Intrinsic::amdgcn_cluster_load_async_to_lds_b32:
3325 case Intrinsic::amdgcn_cluster_load_async_to_lds_b64:
3326 case Intrinsic::amdgcn_cluster_load_async_to_lds_b128: {
3327 applyDefaultMapping(OpdMapper);
3329 return;
3330 }
3331 case Intrinsic::amdgcn_load_to_lds:
3332 case Intrinsic::amdgcn_load_async_to_lds:
3333 case Intrinsic::amdgcn_global_load_lds:
3334 case Intrinsic::amdgcn_global_load_async_lds: {
3335 applyDefaultMapping(OpdMapper);
3337 return;
3338 }
3339 case Intrinsic::amdgcn_lds_direct_load: {
3340 applyDefaultMapping(OpdMapper);
3341 // Readlane for m0 value, which is always the last operand.
3342 constrainOpWithReadfirstlane(B, MI, MI.getNumOperands() - 1); // Index
3343 return;
3344 }
3345 case Intrinsic::amdgcn_exp_row:
3346 applyDefaultMapping(OpdMapper);
3348 return;
3349 case Intrinsic::amdgcn_cluster_load_b32:
3350 case Intrinsic::amdgcn_cluster_load_b64:
3351 case Intrinsic::amdgcn_cluster_load_b128: {
3352 applyDefaultMapping(OpdMapper);
3354 return;
3355 }
3356 case Intrinsic::amdgcn_s_sleep_var:
3357 assert(OpdMapper.getVRegs(1).empty());
3359 return;
3360 case Intrinsic::amdgcn_s_barrier_join:
3361 case Intrinsic::amdgcn_s_wakeup_barrier:
3363 return;
3364 case Intrinsic::amdgcn_s_barrier_init:
3365 case Intrinsic::amdgcn_s_barrier_signal_var:
3368 return;
3369 case Intrinsic::amdgcn_s_get_barrier_state:
3370 case Intrinsic::amdgcn_s_get_named_barrier_state: {
3372 return;
3373 }
3374 case Intrinsic::amdgcn_s_prefetch_data: {
3375 Register PtrReg = MI.getOperand(1).getReg();
3376 unsigned AS = MRI.getType(PtrReg).getAddressSpace();
3380 } else
3381 MI.eraseFromParent();
3382 return;
3383 }
3384 case Intrinsic::amdgcn_tensor_load_to_lds:
3385 case Intrinsic::amdgcn_tensor_store_from_lds: {
3391 return;
3392 }
3393 default: {
3394 if (const AMDGPU::RsrcIntrinsic *RSrcIntrin =
3395 AMDGPU::lookupRsrcIntrinsic(IntrID)) {
3396 // Non-images can have complications from operands that allow both SGPR
3397 // and VGPR. For now it's too complicated to figure out the final opcode
3398 // to derive the register bank from the MCInstrDesc.
3399 if (RSrcIntrin->IsImage) {
3400 applyMappingImage(B, MI, OpdMapper, RSrcIntrin->RsrcArg);
3401 return;
3402 }
3403 }
3404
3405 break;
3406 }
3407 }
3408 break;
3409 }
3410 case AMDGPU::G_SI_CALL: {
3411 // Use a set to avoid extra readfirstlanes in the case where multiple
3412 // operands are the same register.
3413 SmallSet<Register, 4> SGPROperandRegs;
3414
3415 if (!collectWaterfallOperands(SGPROperandRegs, MI, MRI, {1}))
3416 break;
3417
3418 // Move all copies to physical SGPRs that are used by the call instruction
3419 // into the loop block. Search backwards from the call for these copies,
3420 // stopping at the ADJCALLSTACKUP.
3421 unsigned FrameSetupOpcode = AMDGPU::ADJCALLSTACKUP;
3422 unsigned FrameDestroyOpcode = AMDGPU::ADJCALLSTACKDOWN;
3423
3424 // Move all non-copies before the copies, so that a complete range can be
3425 // moved into the waterfall loop.
3426 SmallVector<MachineInstr *, 4> NonCopyInstrs;
3427 // Count of NonCopyInstrs found until the current LastCopy.
3428 unsigned NonCopyInstrsLen = 0;
3429 MachineBasicBlock::iterator Start(&MI);
3430 MachineBasicBlock::iterator LastCopy = Start;
3431 MachineBasicBlock *MBB = MI.getParent();
3432 const SIMachineFunctionInfo *Info =
3433 MBB->getParent()->getInfo<SIMachineFunctionInfo>();
3434 while (Start->getOpcode() != FrameSetupOpcode) {
3435 --Start;
3436 bool IsCopy = false;
3437 if (Start->getOpcode() == AMDGPU::COPY) {
3438 auto &Dst = Start->getOperand(0);
3439 if (Dst.isReg()) {
3440 Register Reg = Dst.getReg();
3441 if (Reg.isPhysical() && MI.readsRegister(Reg, TRI)) {
3442 IsCopy = true;
3443 } else {
3444 // Also move the copy from the scratch rsrc descriptor into the loop
3445 // to allow it to be optimized away.
3446 auto &Src = Start->getOperand(1);
3447 if (Src.isReg()) {
3448 Reg = Src.getReg();
3449 IsCopy = Info->getScratchRSrcReg() == Reg;
3450 }
3451 }
3452 }
3453 }
3454
3455 if (IsCopy) {
3456 LastCopy = Start;
3457 NonCopyInstrsLen = NonCopyInstrs.size();
3458 } else {
3459 NonCopyInstrs.push_back(&*Start);
3460 }
3461 }
3462 NonCopyInstrs.resize(NonCopyInstrsLen);
3463
3464 for (auto *NonCopy : reverse(NonCopyInstrs)) {
3465 MBB->splice(LastCopy, MBB, NonCopy->getIterator());
3466 }
3467 Start = LastCopy;
3468
3469 // Do the same for copies after the loop
3470 NonCopyInstrs.clear();
3471 NonCopyInstrsLen = 0;
3472 MachineBasicBlock::iterator End(&MI);
3473 LastCopy = End;
3474 while (End->getOpcode() != FrameDestroyOpcode) {
3475 ++End;
3476 bool IsCopy = false;
3477 if (End->getOpcode() == AMDGPU::COPY) {
3478 auto &Src = End->getOperand(1);
3479 if (Src.isReg()) {
3480 Register Reg = Src.getReg();
3481 IsCopy = Reg.isPhysical() && MI.modifiesRegister(Reg, TRI);
3482 }
3483 }
3484
3485 if (IsCopy) {
3486 LastCopy = End;
3487 NonCopyInstrsLen = NonCopyInstrs.size();
3488 } else {
3489 NonCopyInstrs.push_back(&*End);
3490 }
3491 }
3492 NonCopyInstrs.resize(NonCopyInstrsLen);
3493
3494 End = LastCopy;
3495 ++LastCopy;
3496 for (auto *NonCopy : reverse(NonCopyInstrs)) {
3497 MBB->splice(LastCopy, MBB, NonCopy->getIterator());
3498 }
3499
3500 ++End;
3501 B.setInsertPt(B.getMBB(), Start);
3502 executeInWaterfallLoop(B, make_range(Start, End), SGPROperandRegs);
3503 break;
3504 }
3505 case AMDGPU::G_AMDGPU_FLAT_LOAD_MONITOR:
3506 case AMDGPU::G_AMDGPU_GLOBAL_LOAD_MONITOR:
3507 case AMDGPU::G_LOAD:
3508 case AMDGPU::G_ZEXTLOAD:
3509 case AMDGPU::G_SEXTLOAD: {
3510 if (applyMappingLoad(B, OpdMapper, MI))
3511 return;
3512 break;
3513 }
3514 case AMDGPU::G_DYN_STACKALLOC:
3515 applyMappingDynStackAlloc(B, OpdMapper, MI);
3516 return;
3517 case AMDGPU::G_STACKRESTORE: {
3518 applyDefaultMapping(OpdMapper);
3520 return;
3521 }
3522 case AMDGPU::G_SBFX:
3523 applyMappingBFE(B, OpdMapper, /*Signed*/ true);
3524 return;
3525 case AMDGPU::G_UBFX:
3526 applyMappingBFE(B, OpdMapper, /*Signed*/ false);
3527 return;
3528 case AMDGPU::G_AMDGPU_MAD_U64_U32:
3529 case AMDGPU::G_AMDGPU_MAD_I64_I32:
3530 applyMappingMAD_64_32(B, OpdMapper);
3531 return;
3532 case AMDGPU::G_PREFETCH: {
3533 if (!Subtarget.hasSafeSmemPrefetch() && !Subtarget.hasVmemPrefInsts()) {
3534 MI.eraseFromParent();
3535 return;
3536 }
3537 Register PtrReg = MI.getOperand(0).getReg();
3538 unsigned PtrBank = getRegBankID(PtrReg, MRI, AMDGPU::SGPRRegBankID);
3539 if (PtrBank == AMDGPU::VGPRRegBankID &&
3540 (!Subtarget.hasVmemPrefInsts() || !MI.getOperand(3).getImm())) {
3541 // Cannot do I$ prefetch with divergent pointer.
3542 MI.eraseFromParent();
3543 return;
3544 }
3545 unsigned AS = MRI.getType(PtrReg).getAddressSpace();
3548 (!Subtarget.hasSafeSmemPrefetch() &&
3550 !MI.getOperand(3).getImm() /* I$ prefetch */))) {
3551 MI.eraseFromParent();
3552 return;
3553 }
3554 applyDefaultMapping(OpdMapper);
3555 return;
3556 }
3557 default:
3558 break;
3559 }
3560
3561 return applyDefaultMapping(OpdMapper);
3562}
3563
3564// vgpr, sgpr -> vgpr
3565// vgpr, agpr -> vgpr
3566// agpr, agpr -> agpr
3567// agpr, sgpr -> vgpr
3568static unsigned regBankUnion(unsigned RB0, unsigned RB1) {
3569 if (RB0 == AMDGPU::InvalidRegBankID)
3570 return RB1;
3571 if (RB1 == AMDGPU::InvalidRegBankID)
3572 return RB0;
3573
3574 if (RB0 == AMDGPU::SGPRRegBankID && RB1 == AMDGPU::SGPRRegBankID)
3575 return AMDGPU::SGPRRegBankID;
3576
3577 if (RB0 == AMDGPU::AGPRRegBankID && RB1 == AMDGPU::AGPRRegBankID)
3578 return AMDGPU::AGPRRegBankID;
3579
3580 return AMDGPU::VGPRRegBankID;
3581}
3582
3583static unsigned regBankBoolUnion(unsigned RB0, unsigned RB1) {
3584 if (RB0 == AMDGPU::InvalidRegBankID)
3585 return RB1;
3586 if (RB1 == AMDGPU::InvalidRegBankID)
3587 return RB0;
3588
3589 // vcc, vcc -> vcc
3590 // vcc, sgpr -> vcc
3591 // vcc, vgpr -> vcc
3592 if (RB0 == AMDGPU::VCCRegBankID || RB1 == AMDGPU::VCCRegBankID)
3593 return AMDGPU::VCCRegBankID;
3594
3595 // Neither bank is vcc; fall back to the plain register bank union.
3596 return regBankUnion(RB0, RB1);
3597}
3598
3600 const MachineInstr &MI) const {
3601 unsigned RegBank = AMDGPU::InvalidRegBankID;
3602
3603 for (const MachineOperand &MO : MI.operands()) {
3604 if (!MO.isReg())
3605 continue;
3606 Register Reg = MO.getReg();
3607 if (const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI)) {
3608 RegBank = regBankUnion(RegBank, Bank->getID());
3609 if (RegBank == AMDGPU::VGPRRegBankID)
3610 break;
3611 }
3612 }
3613
3614 return RegBank;
3615}
3616
3618 const MachineFunction &MF = *MI.getMF();
3619 const MachineRegisterInfo &MRI = MF.getRegInfo();
3620 for (const MachineOperand &MO : MI.operands()) {
3621 if (!MO.isReg())
3622 continue;
3623 Register Reg = MO.getReg();
3624 if (const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI)) {
3625 if (Bank->getID() != AMDGPU::SGPRRegBankID)
3626 return false;
3627 }
3628 }
3629 return true;
3630}
3631
3634 const MachineFunction &MF = *MI.getMF();
3635 const MachineRegisterInfo &MRI = MF.getRegInfo();
3636 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3637
3638 for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
3639 const MachineOperand &SrcOp = MI.getOperand(i);
3640 if (!SrcOp.isReg())
3641 continue;
3642
3643 unsigned Size = getSizeInBits(SrcOp.getReg(), MRI, *TRI);
3644 OpdsMapping[i] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
3645 }
3646 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
3647 MI.getNumOperands());
3648}
3649
3652 const MachineFunction &MF = *MI.getMF();
3653 const MachineRegisterInfo &MRI = MF.getRegInfo();
3654 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3655
3656 // Even though we technically could use SGPRs, this would require knowledge of
3657 // the constant bus restriction. Force all sources to VGPR (except for VCC).
3658 //
3659 // TODO: Unary ops are trivially OK, so accept SGPRs?
3660 for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
3661 const MachineOperand &Src = MI.getOperand(i);
3662 if (!Src.isReg())
3663 continue;
3664
3665 unsigned Size = getSizeInBits(Src.getReg(), MRI, *TRI);
3666 unsigned BankID = Size == 1 ? AMDGPU::VCCRegBankID : AMDGPU::VGPRRegBankID;
3667 OpdsMapping[i] = AMDGPU::getValueMapping(BankID, Size);
3668 }
3669
3670 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
3671 MI.getNumOperands());
3672}
3673
const RegisterBankInfo::InstructionMapping &
AMDGPURegisterBankInfo::getDefaultMappingAllVGPR(const MachineInstr &MI) const {
  const MachineFunction &MF = *MI.getMF();
  const MachineRegisterInfo &MRI = MF.getRegInfo();
  SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());

  for (unsigned I = 0, E = MI.getNumOperands(); I != E; ++I) {
    const MachineOperand &Op = MI.getOperand(I);
    if (!Op.isReg())
      continue;

    unsigned Size = getSizeInBits(Op.getReg(), MRI, *TRI);
    OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
  }

  return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
                               MI.getNumOperands());
}

const RegisterBankInfo::InstructionMapping &
AMDGPURegisterBankInfo::getImageMapping(const MachineRegisterInfo &MRI,
                                        const MachineInstr &MI,
                                        int RsrcIdx) const {
  // The reported argument index is relative to the IR intrinsic call arguments,
  // so we need to shift by the number of defs and the intrinsic ID.
  RsrcIdx += MI.getNumExplicitDefs() + 1;

  const int NumOps = MI.getNumOperands();
  SmallVector<const ValueMapping *, 8> OpdsMapping(NumOps);

  // TODO: Should packed/unpacked D16 difference be reported here as part of
  // the value mapping?
  for (int I = 0; I != NumOps; ++I) {
    if (!MI.getOperand(I).isReg())
      continue;

    Register OpReg = MI.getOperand(I).getReg();
    // We replace some dead address operands with $noreg
    if (!OpReg)
      continue;

    unsigned Size = getSizeInBits(OpReg, MRI, *TRI);

    // FIXME: Probably need a new intrinsic register bank searchable table to
    // handle arbitrary intrinsics easily.
    //
    // If this has a sampler, it immediately follows rsrc.
    const bool MustBeSGPR = I == RsrcIdx || I == RsrcIdx + 1;

    if (MustBeSGPR) {
      // This must be an SGPR, so we have to report whatever it is as legal.
      unsigned NewBank = getRegBankID(OpReg, MRI, AMDGPU::SGPRRegBankID);
      OpdsMapping[I] = AMDGPU::getValueMapping(NewBank, Size);
    } else {
      // Some operands must be VGPR, and these are easy to copy to.
      OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
    }
  }

  return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping), NumOps);
}

/// Return the mapping for a pointer argument.
const RegisterBankInfo::ValueMapping *
AMDGPURegisterBankInfo::getValueMappingForPtr(const MachineRegisterInfo &MRI,
                                              Register PtrReg) const {
  LLT PtrTy = MRI.getType(PtrReg);
  unsigned Size = PtrTy.getSizeInBits();
  if (Subtarget.useFlatForGlobal() ||
      !AMDGPU::isFlatGlobalAddrSpace(PtrTy.getAddressSpace()))
    return AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);

  // If we're using MUBUF instructions for global memory, an SGPR base register
  // is possible. Otherwise this needs to be a VGPR.
  const RegisterBank *PtrBank = getRegBank(PtrReg, MRI, *TRI);
  return AMDGPU::getValueMapping(PtrBank->getID(), Size);
}

const RegisterBankInfo::InstructionMapping &
AMDGPURegisterBankInfo::getInstrMappingForLoad(const MachineInstr &MI) const {

  const MachineFunction &MF = *MI.getMF();
  const MachineRegisterInfo &MRI = MF.getRegInfo();
  SmallVector<const ValueMapping *, 2> OpdsMapping(2);
  unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
  Register PtrReg = MI.getOperand(1).getReg();
  LLT PtrTy = MRI.getType(PtrReg);
  unsigned AS = PtrTy.getAddressSpace();
  unsigned PtrSize = PtrTy.getSizeInBits();

  const ValueMapping *ValMapping;
  const ValueMapping *PtrMapping;

  const RegisterBank *PtrBank = getRegBank(PtrReg, MRI, *TRI);

  if (PtrBank == &AMDGPU::SGPRRegBank && AMDGPU::isFlatGlobalAddrSpace(AS)) {
    if (isScalarLoadLegal(MI)) {
      // We have a uniform instruction so we want to use an SMRD load
      ValMapping = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
      PtrMapping = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, PtrSize);
    } else {
      ValMapping = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);

      // If we're using MUBUF instructions for global memory, an SGPR base
      // register is possible. Otherwise this needs to be a VGPR.
      unsigned PtrBankID = Subtarget.useFlatForGlobal() ?
        AMDGPU::VGPRRegBankID : AMDGPU::SGPRRegBankID;

      PtrMapping = AMDGPU::getValueMapping(PtrBankID, PtrSize);
    }
  } else {
    ValMapping = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
    PtrMapping = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, PtrSize);
  }

  OpdsMapping[0] = ValMapping;
  OpdsMapping[1] = PtrMapping;
  const RegisterBankInfo::InstructionMapping &Mapping = getInstructionMapping(
      1, 1, getOperandsMapping(OpdsMapping), MI.getNumOperands());
  return Mapping;

  // FIXME: Do we want to add a mapping for FLAT load, or should we just
  // handle that during instruction selection?
}

unsigned
AMDGPURegisterBankInfo::getRegBankID(Register Reg,
                                     const MachineRegisterInfo &MRI,
                                     unsigned Default) const {
  const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI);
  return Bank ? Bank->getID() : Default;
}

const RegisterBankInfo::ValueMapping *
AMDGPURegisterBankInfo::getSGPROpMapping(Register Reg,
                                         const MachineRegisterInfo &MRI,
                                         const TargetRegisterInfo &TRI) const {
  // Lie and claim anything is legal, even though this needs to be an SGPR;
  // applyMapping will have to deal with it as a waterfall loop.
  unsigned Bank = getRegBankID(Reg, MRI, AMDGPU::SGPRRegBankID);
  unsigned Size = getSizeInBits(Reg, MRI, TRI);
  return AMDGPU::getValueMapping(Bank, Size);
}

const RegisterBankInfo::ValueMapping *
AMDGPURegisterBankInfo::getVGPROpMapping(Register Reg,
                                         const MachineRegisterInfo &MRI,
                                         const TargetRegisterInfo &TRI) const {
  unsigned Size = getSizeInBits(Reg, MRI, TRI);
  return AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
}

const RegisterBankInfo::ValueMapping *
AMDGPURegisterBankInfo::getAGPROpMapping(Register Reg,
                                         const MachineRegisterInfo &MRI,
                                         const TargetRegisterInfo &TRI) const {
  unsigned Size = getSizeInBits(Reg, MRI, TRI);
  return AMDGPU::getValueMapping(AMDGPU::AGPRRegBankID, Size);
}

3834///
3835/// This function must return a legal mapping, because
3836/// AMDGPURegisterBankInfo::getInstrAlternativeMappings() is not called
3837/// in RegBankSelect::Mode::Fast. Any mapping that would cause a
3838/// VGPR to SGPR generated is illegal.
3839///
3840// Operands that must be SGPRs must accept potentially divergent VGPRs as
3841// legal. These will be dealt with in applyMappingImpl.
3842//
3845 const MachineFunction &MF = *MI.getMF();
3846 const MachineRegisterInfo &MRI = MF.getRegInfo();
3847
  if (MI.isCopy() || MI.getOpcode() == AMDGPU::G_FREEZE) {
    Register DstReg = MI.getOperand(0).getReg();
    Register SrcReg = MI.getOperand(1).getReg();

    // The default logic bothers to analyze impossible alternative mappings. We
    // want the most straightforward mapping, so just directly handle this.
    const RegisterBank *DstBank = getRegBank(DstReg, MRI, *TRI);
    const RegisterBank *SrcBank = getRegBank(SrcReg, MRI, *TRI);

    // For COPY between a physical reg and an s1, there is no type associated so
    // we need to take the virtual register's type as a hint on how to interpret
    // s1 values.
    unsigned Size;
    if (!SrcReg.isVirtual() && !DstBank &&
        MRI.getType(DstReg) == LLT::scalar(1)) {
      DstBank = &AMDGPU::VCCRegBank;
      Size = 1;
    } else if (!DstReg.isVirtual() && MRI.getType(SrcReg) == LLT::scalar(1)) {
      DstBank = &AMDGPU::VCCRegBank;
      Size = 1;
    } else {
      Size = getSizeInBits(DstReg, MRI, *TRI);
    }

    if (!DstBank)
      DstBank = SrcBank;
    else if (!SrcBank)
      SrcBank = DstBank;

    if (MI.getOpcode() != AMDGPU::G_FREEZE &&
        cannotCopy(*DstBank, *SrcBank, TypeSize::getFixed(Size)))
      return getInvalidInstructionMapping();

    const ValueMapping &ValMap = getValueMapping(0, Size, *DstBank);
    unsigned OpdsMappingSize = MI.isCopy() ? 1 : 2;
    SmallVector<const ValueMapping *, 1> OpdsMapping(OpdsMappingSize);
    OpdsMapping[0] = &ValMap;
    if (MI.getOpcode() == AMDGPU::G_FREEZE)
      OpdsMapping[1] = &ValMap;

    return getInstructionMapping(
        1, /*Cost*/ 1,
        /*OperandsMapping*/ getOperandsMapping(OpdsMapping), OpdsMappingSize);
  }

  if (MI.isRegSequence()) {
    // If any input is a VGPR, the result must be a VGPR. The default handling
    // assumes any copy between banks is legal.
    unsigned BankID = AMDGPU::SGPRRegBankID;

    for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2) {
      auto OpBank = getRegBankID(MI.getOperand(I).getReg(), MRI);
      // It doesn't make sense to use vcc or scc banks here, so just ignore
      // them.
      if (OpBank != AMDGPU::SGPRRegBankID) {
        BankID = AMDGPU::VGPRRegBankID;
        break;
      }
    }
    unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);

    const ValueMapping &ValMap = getValueMapping(0, Size, getRegBank(BankID));
    return getInstructionMapping(
        1, /*Cost*/ 1,
        /*OperandsMapping*/ getOperandsMapping({&ValMap}), 1);
  }

  // The default handling is broken and doesn't handle illegal SGPR->VGPR copies
  // properly.
  //
  // TODO: There are additional exec masking dependencies to analyze.
  if (auto *PHI = dyn_cast<GPhi>(&MI)) {
    unsigned ResultBank = AMDGPU::InvalidRegBankID;
    Register DstReg = PHI->getReg(0);

    // Sometimes the result may have already been assigned a bank.
    if (const RegisterBank *DstBank = getRegBank(DstReg, MRI, *TRI))
      ResultBank = DstBank->getID();

    for (unsigned I = 0; I < PHI->getNumIncomingValues(); ++I) {
      Register Reg = PHI->getIncomingValue(I);
      const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI);

      // FIXME: Assuming VGPR for any undetermined inputs.
      if (!Bank || Bank->getID() == AMDGPU::VGPRRegBankID) {
        ResultBank = AMDGPU::VGPRRegBankID;
        break;
      }

      // FIXME: Need to promote SGPR case to s32
      unsigned OpBank = Bank->getID();
      ResultBank = regBankBoolUnion(ResultBank, OpBank);
    }

    assert(ResultBank != AMDGPU::InvalidRegBankID);

    unsigned Size = MRI.getType(DstReg).getSizeInBits();

    const ValueMapping &ValMap =
        getValueMapping(0, Size, getRegBank(ResultBank));
    return getInstructionMapping(
        1, /*Cost*/ 1,
        /*OperandsMapping*/ getOperandsMapping({&ValMap}), 1);
  }

  const RegisterBankInfo::InstructionMapping &Mapping = getInstrMappingImpl(MI);
  if (Mapping.isValid())
    return Mapping;

  SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());

  switch (MI.getOpcode()) {
  default:
    return getInvalidInstructionMapping();

  case AMDGPU::G_AND:
  case AMDGPU::G_OR:
  case AMDGPU::G_XOR:
  case AMDGPU::G_MUL: {
    unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
    if (Size == 1) {
      const RegisterBank *DstBank
        = getRegBank(MI.getOperand(0).getReg(), MRI, *TRI);

      unsigned TargetBankID = AMDGPU::InvalidRegBankID;
      unsigned BankLHS = AMDGPU::InvalidRegBankID;
      unsigned BankRHS = AMDGPU::InvalidRegBankID;
      if (DstBank) {
        TargetBankID = DstBank->getID();
        if (DstBank == &AMDGPU::VCCRegBank) {
          TargetBankID = AMDGPU::VCCRegBankID;
          BankLHS = AMDGPU::VCCRegBankID;
          BankRHS = AMDGPU::VCCRegBankID;
        } else {
          BankLHS = getRegBankID(MI.getOperand(1).getReg(), MRI,
                                 AMDGPU::SGPRRegBankID);
          BankRHS = getRegBankID(MI.getOperand(2).getReg(), MRI,
                                 AMDGPU::SGPRRegBankID);
        }
      } else {
        BankLHS = getRegBankID(MI.getOperand(1).getReg(), MRI,
                               AMDGPU::VCCRegBankID);
        BankRHS = getRegBankID(MI.getOperand(2).getReg(), MRI,
                               AMDGPU::VCCRegBankID);

        // Both inputs should be true booleans to produce a boolean result.
        if (BankLHS == AMDGPU::VGPRRegBankID ||
            BankRHS == AMDGPU::VGPRRegBankID) {
          TargetBankID = AMDGPU::VGPRRegBankID;
        } else if (BankLHS == AMDGPU::VCCRegBankID ||
                   BankRHS == AMDGPU::VCCRegBankID) {
          TargetBankID = AMDGPU::VCCRegBankID;
          BankLHS = AMDGPU::VCCRegBankID;
          BankRHS = AMDGPU::VCCRegBankID;
        } else if (BankLHS == AMDGPU::SGPRRegBankID &&
                   BankRHS == AMDGPU::SGPRRegBankID) {
          TargetBankID = AMDGPU::SGPRRegBankID;
        }
      }

      OpdsMapping[0] = AMDGPU::getValueMapping(TargetBankID, Size);
      OpdsMapping[1] = AMDGPU::getValueMapping(BankLHS, Size);
      OpdsMapping[2] = AMDGPU::getValueMapping(BankRHS, Size);
      break;
    }

    if (Size == 64) {

      if (isSALUMapping(MI)) {
        OpdsMapping[0] = getValueMappingSGPR64Only(AMDGPU::SGPRRegBankID, Size);
        OpdsMapping[1] = OpdsMapping[2] = OpdsMapping[0];
      } else {
        if (MI.getOpcode() == AMDGPU::G_MUL && Subtarget.hasVectorMulU64())
          OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
        else
          OpdsMapping[0] =
              getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size);
        unsigned Bank1 =
            getRegBankID(MI.getOperand(1).getReg(), MRI /*, DefaultBankID*/);
        OpdsMapping[1] = AMDGPU::getValueMapping(Bank1, Size);

        unsigned Bank2 =
            getRegBankID(MI.getOperand(2).getReg(), MRI /*, DefaultBankID*/);
        OpdsMapping[2] = AMDGPU::getValueMapping(Bank2, Size);
      }

      break;
    }

    [[fallthrough]];
  }
  case AMDGPU::G_PTR_ADD:
  case AMDGPU::G_PTRMASK:
  case AMDGPU::G_ADD:
  case AMDGPU::G_SUB:
  case AMDGPU::G_SHL:
  case AMDGPU::G_LSHR:
  case AMDGPU::G_ASHR:
  case AMDGPU::G_UADDO:
  case AMDGPU::G_USUBO:
  case AMDGPU::G_UADDE:
  case AMDGPU::G_SADDE:
  case AMDGPU::G_USUBE:
  case AMDGPU::G_SSUBE:
  case AMDGPU::G_ABS:
  case AMDGPU::G_SHUFFLE_VECTOR:
  case AMDGPU::G_SBFX:
  case AMDGPU::G_UBFX:
  case AMDGPU::G_AMDGPU_S_MUL_I64_I32:
  case AMDGPU::G_AMDGPU_S_MUL_U64_U32:
    if (isSALUMapping(MI))
      return getDefaultMappingSOP(MI);
    return getDefaultMappingVOP(MI);
  case AMDGPU::G_SMIN:
  case AMDGPU::G_SMAX:
  case AMDGPU::G_UMIN:
  case AMDGPU::G_UMAX:
    if (isSALUMapping(MI)) {
      // There are no scalar 64-bit min and max, use vector instruction instead.
      if (MRI.getType(MI.getOperand(0).getReg()).getSizeInBits() == 64 &&
          Subtarget.hasIntMinMax64())
        return getDefaultMappingVOP(MI);
      return getDefaultMappingSOP(MI);
    }
    return getDefaultMappingVOP(MI);
  case AMDGPU::G_FADD:
  case AMDGPU::G_FSUB:
  case AMDGPU::G_FMUL:
  case AMDGPU::G_FMA:
  case AMDGPU::G_FFLOOR:
  case AMDGPU::G_FCEIL:
  case AMDGPU::G_INTRINSIC_ROUNDEVEN:
  case AMDGPU::G_FMINNUM:
  case AMDGPU::G_FMAXNUM:
  case AMDGPU::G_FMINIMUMNUM:
  case AMDGPU::G_FMAXIMUMNUM:
  case AMDGPU::G_INTRINSIC_TRUNC:
  case AMDGPU::G_STRICT_FADD:
  case AMDGPU::G_STRICT_FSUB:
  case AMDGPU::G_STRICT_FMUL:
  case AMDGPU::G_STRICT_FMA: {
    LLT Ty = MRI.getType(MI.getOperand(0).getReg());
    unsigned Size = Ty.getSizeInBits();
    if (Subtarget.hasSALUFloatInsts() && Ty.isScalar() &&
        (Size == 32 || Size == 16) && isSALUMapping(MI))
      return getDefaultMappingSOP(MI);
    return getDefaultMappingVOP(MI);
  }
  case AMDGPU::G_FMINIMUM:
  case AMDGPU::G_FMAXIMUM: {
    LLT Ty = MRI.getType(MI.getOperand(0).getReg());
    unsigned Size = Ty.getSizeInBits();
    if (Subtarget.hasSALUMinimumMaximumInsts() && Ty.isScalar() &&
        (Size == 32 || Size == 16) && isSALUMapping(MI))
      return getDefaultMappingSOP(MI);
    return getDefaultMappingVOP(MI);
  }
  case AMDGPU::G_FPTOSI:
  case AMDGPU::G_FPTOUI:
  case AMDGPU::G_FPTOSI_SAT:
  case AMDGPU::G_FPTOUI_SAT:
  case AMDGPU::G_SITOFP:
  case AMDGPU::G_UITOFP: {
    unsigned SizeDst = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
    unsigned SizeSrc = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
    if (Subtarget.hasSALUFloatInsts() && SizeDst == 32 && SizeSrc == 32 &&
        isSALUMapping(MI))
      return getDefaultMappingSOP(MI);
    return getDefaultMappingVOP(MI);
  }
  case AMDGPU::G_FPTRUNC:
  case AMDGPU::G_FPEXT: {
    unsigned SizeDst = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
    unsigned SizeSrc = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
    if (Subtarget.hasSALUFloatInsts() && SizeDst != 64 && SizeSrc != 64 &&
        isSALUMapping(MI))
      return getDefaultMappingSOP(MI);
    return getDefaultMappingVOP(MI);
  }
  case AMDGPU::G_FSQRT:
  case AMDGPU::G_FEXP2:
  case AMDGPU::G_FLOG2: {
    unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
    if (Subtarget.hasPseudoScalarTrans() && (Size == 16 || Size == 32) &&
        isSALUMapping(MI))
      return getDefaultMappingSOP(MI);
    return getDefaultMappingVOP(MI);
  }
  case AMDGPU::G_SADDSAT: // FIXME: Could lower sat ops for SALU
  case AMDGPU::G_SSUBSAT:
  case AMDGPU::G_UADDSAT:
  case AMDGPU::G_USUBSAT:
  case AMDGPU::G_FMAD:
  case AMDGPU::G_FLDEXP:
  case AMDGPU::G_FMINNUM_IEEE:
  case AMDGPU::G_FMAXNUM_IEEE:
  case AMDGPU::G_FCANONICALIZE:
  case AMDGPU::G_STRICT_FLDEXP:
  case AMDGPU::G_BSWAP: // TODO: Somehow expand for scalar?
  case AMDGPU::G_FSHR: // TODO: Expand for scalar
  case AMDGPU::G_AMDGPU_FMIN_LEGACY:
  case AMDGPU::G_AMDGPU_FMAX_LEGACY:
  case AMDGPU::G_AMDGPU_RCP_IFLAG:
  case AMDGPU::G_AMDGPU_CVT_F32_UBYTE0:
  case AMDGPU::G_AMDGPU_CVT_F32_UBYTE1:
  case AMDGPU::G_AMDGPU_CVT_F32_UBYTE2:
  case AMDGPU::G_AMDGPU_CVT_F32_UBYTE3:
  case AMDGPU::G_AMDGPU_CVT_PK_I16_I32:
  case AMDGPU::G_AMDGPU_SMED3:
  case AMDGPU::G_AMDGPU_FMED3:
    return getDefaultMappingVOP(MI);
  case AMDGPU::G_UMULH:
  case AMDGPU::G_SMULH: {
    if (Subtarget.hasScalarMulHiInsts() && isSALUMapping(MI))
      return getDefaultMappingSOP(MI);
    return getDefaultMappingVOP(MI);
  }
  case AMDGPU::G_AMDGPU_MAD_U64_U32:
  case AMDGPU::G_AMDGPU_MAD_I64_I32: {
    // Three possible mappings:
    //
    // - Default SOP
    // - Default VOP
    // - Scalar multiply: src0 and src1 are SGPRs, the rest is VOP.
    //
    // This allows instruction selection to keep the multiplication part of the
    // instruction on the SALU.
    bool AllSalu = true;
    bool MulSalu = true;
    for (unsigned i = 0; i < 5; ++i) {
      Register Reg = MI.getOperand(i).getReg();
      if (const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI)) {
        if (Bank->getID() != AMDGPU::SGPRRegBankID) {
          AllSalu = false;
          if (i == 2 || i == 3) {
            MulSalu = false;
            break;
          }
        }
      }
    }

    if (AllSalu)
      return getDefaultMappingSOP(MI);

    // If the multiply-add is full-rate in VALU, use that even if the
    // multiplication part is scalar. Accumulating separately on the VALU would
    // take two instructions.
    if (!MulSalu || Subtarget.hasFullRate64Ops())
      return getDefaultMappingVOP(MI);

    // Keep the multiplication on the SALU, then accumulate on the VALU.
    OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 64);
    OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
    OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
    OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
    OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 64);
    break;
  }
  case AMDGPU::G_IMPLICIT_DEF: {
    unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
    OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
    break;
  }
  case AMDGPU::G_FCONSTANT:
  case AMDGPU::G_CONSTANT:
  case AMDGPU::G_GLOBAL_VALUE:
  case AMDGPU::G_FRAME_INDEX:
  case AMDGPU::G_BLOCK_ADDR:
  case AMDGPU::G_READSTEADYCOUNTER:
  case AMDGPU::G_READCYCLECOUNTER: {
    unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
    OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
    break;
  }
  case AMDGPU::G_DYN_STACKALLOC: {
    // Result is always uniform, and a wave reduction is needed for the source.
    OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
    unsigned SrcBankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
    OpdsMapping[1] = AMDGPU::getValueMapping(SrcBankID, 32);
    break;
  }
  case AMDGPU::G_AMDGPU_WAVE_ADDRESS: {
    // This case is weird because we expect a physical register in the source,
    // but need to set a bank anyway.
    //
    // TODO: We could select the result to SGPR or VGPR
    OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
    OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
    break;
  }
  case AMDGPU::G_INSERT: {
    unsigned BankID = getMappingType(MRI, MI);
    unsigned DstSize = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
    unsigned SrcSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
    unsigned EltSize = getSizeInBits(MI.getOperand(2).getReg(), MRI, *TRI);
    OpdsMapping[0] = AMDGPU::getValueMapping(BankID, DstSize);
    OpdsMapping[1] = AMDGPU::getValueMapping(BankID, SrcSize);
    OpdsMapping[2] = AMDGPU::getValueMapping(BankID, EltSize);
    OpdsMapping[3] = nullptr;
    break;
  }
  case AMDGPU::G_EXTRACT: {
    unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
    unsigned DstSize = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
    unsigned SrcSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
    OpdsMapping[0] = AMDGPU::getValueMapping(BankID, DstSize);
    OpdsMapping[1] = AMDGPU::getValueMapping(BankID, SrcSize);
    OpdsMapping[2] = nullptr;
    break;
  }
  case AMDGPU::G_BUILD_VECTOR:
  case AMDGPU::G_BUILD_VECTOR_TRUNC: {
    LLT DstTy = MRI.getType(MI.getOperand(0).getReg());
    if (DstTy == LLT::fixed_vector(2, 16)) {
      unsigned DstSize = DstTy.getSizeInBits();
      unsigned SrcSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
      unsigned Src0BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
      unsigned Src1BankID = getRegBankID(MI.getOperand(2).getReg(), MRI);
      unsigned DstBankID = regBankUnion(Src0BankID, Src1BankID);

      OpdsMapping[0] = AMDGPU::getValueMapping(DstBankID, DstSize);
      OpdsMapping[1] = AMDGPU::getValueMapping(Src0BankID, SrcSize);
      OpdsMapping[2] = AMDGPU::getValueMapping(Src1BankID, SrcSize);
      break;
    }

    [[fallthrough]];
  }
  case AMDGPU::G_MERGE_VALUES:
  case AMDGPU::G_CONCAT_VECTORS: {
    unsigned Bank = getMappingType(MRI, MI);
    unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
    unsigned SrcSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();

    OpdsMapping[0] = AMDGPU::getValueMapping(Bank, DstSize);
    // Op1 and Dst should use the same register bank.
    for (unsigned i = 1, e = MI.getNumOperands(); i != e; ++i)
      OpdsMapping[i] = AMDGPU::getValueMapping(Bank, SrcSize);
    break;
  }
  case AMDGPU::G_BITREVERSE:
  case AMDGPU::G_BITCAST:
  case AMDGPU::G_INTTOPTR:
  case AMDGPU::G_PTRTOINT:
  case AMDGPU::G_FABS:
  case AMDGPU::G_FNEG: {
    unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
    unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
    OpdsMapping[0] = OpdsMapping[1] = AMDGPU::getValueMapping(BankID, Size);
    break;
  }
  case AMDGPU::G_AMDGPU_FFBH_U32:
  case AMDGPU::G_AMDGPU_FFBL_B32:
  case AMDGPU::G_CTLZ_ZERO_UNDEF:
  case AMDGPU::G_CTTZ_ZERO_UNDEF: {
    unsigned Size = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
    unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
    OpdsMapping[0] = AMDGPU::getValueMapping(BankID, 32);
    OpdsMapping[1] = AMDGPU::getValueMappingSGPR64Only(BankID, Size);
    break;
  }
  case AMDGPU::G_CTPOP: {
    unsigned Size = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
    unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
    OpdsMapping[0] = AMDGPU::getValueMapping(BankID, 32);

    // This should really be getValueMappingSGPR64Only, but allowing the generic
    // code to handle the register split just makes using LegalizerHelper more
    // difficult.
    OpdsMapping[1] = AMDGPU::getValueMapping(BankID, Size);
    break;
  }
  case AMDGPU::G_TRUNC: {
    Register Dst = MI.getOperand(0).getReg();
    Register Src = MI.getOperand(1).getReg();
    unsigned Bank = getRegBankID(Src, MRI);
    unsigned DstSize = getSizeInBits(Dst, MRI, *TRI);
    unsigned SrcSize = getSizeInBits(Src, MRI, *TRI);
    OpdsMapping[0] = AMDGPU::getValueMapping(Bank, DstSize);
    OpdsMapping[1] = AMDGPU::getValueMapping(Bank, SrcSize);
    break;
  }
  case AMDGPU::G_ZEXT:
  case AMDGPU::G_SEXT:
  case AMDGPU::G_ANYEXT:
  case AMDGPU::G_SEXT_INREG: {
    Register Dst = MI.getOperand(0).getReg();
    Register Src = MI.getOperand(1).getReg();
    unsigned DstSize = getSizeInBits(Dst, MRI, *TRI);
    unsigned SrcSize = getSizeInBits(Src, MRI, *TRI);

    unsigned DstBank;
    const RegisterBank *SrcBank = getRegBank(Src, MRI, *TRI);
    assert(SrcBank);
    switch (SrcBank->getID()) {
    case AMDGPU::SGPRRegBankID:
      DstBank = AMDGPU::SGPRRegBankID;
      break;
    default:
      DstBank = AMDGPU::VGPRRegBankID;
      break;
    }

    // Scalar extend can use 64-bit BFE, but VGPRs require extending to
    // 32-bits, and then to 64.
    OpdsMapping[0] = AMDGPU::getValueMappingSGPR64Only(DstBank, DstSize);
    OpdsMapping[1] = AMDGPU::getValueMappingSGPR64Only(SrcBank->getID(),
                                                       SrcSize);
    break;
  }
  case AMDGPU::G_IS_FPCLASS: {
    Register SrcReg = MI.getOperand(1).getReg();
    unsigned SrcSize = MRI.getType(SrcReg).getSizeInBits();
    unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
    OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, DstSize);
    OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
    break;
  }
  case AMDGPU::G_STORE: {
    assert(MI.getOperand(0).isReg());
    unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();

    // FIXME: We need to specify a different reg bank once scalar stores are
    // supported.
    const ValueMapping *ValMapping =
        AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
    OpdsMapping[0] = ValMapping;
    OpdsMapping[1] = getValueMappingForPtr(MRI, MI.getOperand(1).getReg());
    break;
  }
  case AMDGPU::G_ICMP:
  case AMDGPU::G_FCMP: {
    unsigned Size = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();

    // See if the result register has already been constrained to vcc, which may
    // happen due to control flow intrinsic lowering.
    unsigned DstBank = getRegBankID(MI.getOperand(0).getReg(), MRI,
                                    AMDGPU::SGPRRegBankID);
    unsigned Op2Bank = getRegBankID(MI.getOperand(2).getReg(), MRI);
    unsigned Op3Bank = getRegBankID(MI.getOperand(3).getReg(), MRI);

    auto canUseSCCICMP = [&]() {
      auto Pred =
          static_cast<CmpInst::Predicate>(MI.getOperand(1).getPredicate());
      return Size == 32 ||
             (Size == 64 &&
              (Pred == CmpInst::ICMP_EQ || Pred == CmpInst::ICMP_NE) &&
              Subtarget.hasScalarCompareEq64());
    };
    auto canUseSCCFCMP = [&]() {
      return Subtarget.hasSALUFloatInsts() && (Size == 32 || Size == 16);
    };

    bool isICMP = MI.getOpcode() == AMDGPU::G_ICMP;
    bool CanUseSCC = DstBank == AMDGPU::SGPRRegBankID &&
                     Op2Bank == AMDGPU::SGPRRegBankID &&
                     Op3Bank == AMDGPU::SGPRRegBankID &&
                     (isICMP ? canUseSCCICMP() : canUseSCCFCMP());

    DstBank = CanUseSCC ? AMDGPU::SGPRRegBankID : AMDGPU::VCCRegBankID;
    unsigned SrcBank =
        CanUseSCC ? AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;

    // TODO: Use 32-bit for scalar output size.
    // SCC results will need to be copied to a 32-bit SGPR virtual register.
    const unsigned ResultSize = 1;

    OpdsMapping[0] = AMDGPU::getValueMapping(DstBank, ResultSize);
    OpdsMapping[1] = nullptr; // Predicate Operand.
    OpdsMapping[2] = AMDGPU::getValueMapping(SrcBank, Size);
    OpdsMapping[3] = AMDGPU::getValueMapping(SrcBank, Size);
    break;
  }
  case AMDGPU::G_EXTRACT_VECTOR_ELT: {
    // VGPR index can be used for waterfall when indexing a SGPR vector.
    unsigned SrcBankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
    unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
    unsigned SrcSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
    unsigned IdxSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
    unsigned IdxBank = getRegBankID(MI.getOperand(2).getReg(), MRI);
    unsigned OutputBankID = regBankUnion(SrcBankID, IdxBank);

    OpdsMapping[0] = AMDGPU::getValueMappingSGPR64Only(OutputBankID, DstSize);
    OpdsMapping[1] = AMDGPU::getValueMapping(SrcBankID, SrcSize);
    // The index can be in either bank if the source vector is VGPR.
    OpdsMapping[2] = AMDGPU::getValueMapping(IdxBank, IdxSize);
    break;
  }
  case AMDGPU::G_INSERT_VECTOR_ELT: {
    unsigned OutputBankID = isSALUMapping(MI) ?
      AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;

    unsigned VecSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
    unsigned InsertSize =
        MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
    unsigned IdxSize = MRI.getType(MI.getOperand(3).getReg()).getSizeInBits();
    unsigned InsertEltBankID = getRegBankID(MI.getOperand(2).getReg(), MRI);
    unsigned IdxBankID = getRegBankID(MI.getOperand(3).getReg(), MRI);

    OpdsMapping[0] = AMDGPU::getValueMapping(OutputBankID, VecSize);
    OpdsMapping[1] = AMDGPU::getValueMapping(OutputBankID, VecSize);

    // This is a weird case, because we need to break down the mapping based on
    // the register bank of a different operand.
    if (InsertSize == 64 && OutputBankID == AMDGPU::VGPRRegBankID) {
      OpdsMapping[2] = AMDGPU::getValueMappingSplit64(InsertEltBankID,
                                                      InsertSize);
    } else {
      assert(InsertSize == 32 || InsertSize == 64);
      OpdsMapping[2] = AMDGPU::getValueMapping(InsertEltBankID, InsertSize);
    }

    // The index can be in either bank if the source vector is VGPR.
    OpdsMapping[3] = AMDGPU::getValueMapping(IdxBankID, IdxSize);
    break;
  }
  case AMDGPU::G_UNMERGE_VALUES: {
    unsigned Bank = getMappingType(MRI, MI);

    // Op1 and Dst should use the same register bank.
    // FIXME: Shouldn't this be the default? Why do we need to handle this?
    for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
      unsigned Size = getSizeInBits(MI.getOperand(i).getReg(), MRI, *TRI);
      OpdsMapping[i] = AMDGPU::getValueMapping(Bank, Size);
    }
    break;
  }
  case AMDGPU::G_AMDGPU_BUFFER_LOAD:
  case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE:
  case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE:
  case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT:
  case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT:
  case AMDGPU::G_AMDGPU_BUFFER_LOAD_TFE:
  case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE_TFE:
  case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE_TFE:
  case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT_TFE:
  case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT_TFE:
  case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT:
  case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_TFE:
  case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_D16:
  case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT:
  case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT_D16:
  case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT:
  case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT_D16:
  case AMDGPU::G_AMDGPU_BUFFER_STORE:
  case AMDGPU::G_AMDGPU_BUFFER_STORE_BYTE:
  case AMDGPU::G_AMDGPU_BUFFER_STORE_SHORT:
  case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT:
  case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT_D16: {
    OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);

    // rsrc
    OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);

    // vindex
    OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);

    // voffset
    OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);

    // soffset
    OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);

    // Any remaining operands are immediates and were correctly null
    // initialized.
    break;
  }
  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SWAP:
  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_ADD:
  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB:
  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMIN:
  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMIN:
  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMAX:
  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMAX:
  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_AND:
  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_OR:
  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_XOR:
  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_INC:
  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_DEC:
  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB_CLAMP_U32:
  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_COND_SUB_U32:
  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FADD:
  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMIN:
  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMAX: {
    // vdata_out
    OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);

    // vdata_in
    OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);

    // rsrc
    OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);

    // vindex
    OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);

    // voffset
    OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);

    // soffset
    OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);

    // Any remaining operands are immediates and were correctly null
    // initialized.
    break;
  }
  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_CMPSWAP: {
    // vdata_out
    OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);

    // vdata_in
    OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);

    // cmp
    OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);

    // rsrc
    OpdsMapping[3] = getSGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);

    // vindex
    OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);

    // voffset
    OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);

    // soffset
    OpdsMapping[6] = getSGPROpMapping(MI.getOperand(6).getReg(), MRI, *TRI);

    // Any remaining operands are immediates and were correctly null
4572 // initialized.
4573 break;
4574 }
4575 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD:
4576 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_UBYTE:
4577 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SBYTE:
4578 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_USHORT:
4579 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SSHORT: {
4580 // Lie and claim everything is legal, even though some need to be
4581 // SGPRs. applyMapping will have to deal with it as a waterfall loop.
4582 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4583 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4584
    // We need to convert this to a MUBUF if either the resource or the
    // offset is a VGPR.
    unsigned RSrcBank = OpdsMapping[1]->BreakDown[0].RegBank->getID();
    unsigned OffsetBank = OpdsMapping[2]->BreakDown[0].RegBank->getID();
    unsigned ResultBank = regBankUnion(RSrcBank, OffsetBank);

    unsigned Size0 = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
    OpdsMapping[0] = AMDGPU::getValueMapping(ResultBank, Size0);
    break;
  }
  case AMDGPU::G_AMDGPU_S_BUFFER_PREFETCH:
    OpdsMapping[0] = getSGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
    OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
    break;
  case AMDGPU::G_AMDGPU_SPONENTRY: {
    unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
    OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
    break;
  }
  case AMDGPU::G_INTRINSIC:
  case AMDGPU::G_INTRINSIC_CONVERGENT: {
    switch (cast<GIntrinsic>(MI).getIntrinsicID()) {
    default:
      return getInvalidInstructionMapping();
    case Intrinsic::amdgcn_div_fmas:
    case Intrinsic::amdgcn_div_fixup:
    case Intrinsic::amdgcn_trig_preop:
    case Intrinsic::amdgcn_sin:
    case Intrinsic::amdgcn_cos:
    case Intrinsic::amdgcn_log_clamp:
    case Intrinsic::amdgcn_rcp_legacy:
    case Intrinsic::amdgcn_rsq_legacy:
    case Intrinsic::amdgcn_rsq_clamp:
    case Intrinsic::amdgcn_tanh:
    case Intrinsic::amdgcn_fmul_legacy:
    case Intrinsic::amdgcn_fma_legacy:
    case Intrinsic::amdgcn_frexp_mant:
    case Intrinsic::amdgcn_frexp_exp:
    case Intrinsic::amdgcn_fract:
    case Intrinsic::amdgcn_cvt_pknorm_i16:
    case Intrinsic::amdgcn_cvt_pknorm_u16:
    case Intrinsic::amdgcn_cvt_pk_i16:
    case Intrinsic::amdgcn_cvt_pk_u16:
    case Intrinsic::amdgcn_cvt_sr_pk_f16_f32:
    case Intrinsic::amdgcn_cvt_sr_pk_bf16_f32:
    case Intrinsic::amdgcn_cvt_pk_f16_fp8:
    case Intrinsic::amdgcn_cvt_pk_f16_bf8:
    case Intrinsic::amdgcn_cvt_pk_fp8_f16:
    case Intrinsic::amdgcn_cvt_pk_bf8_f16:
    case Intrinsic::amdgcn_cvt_sr_fp8_f16:
    case Intrinsic::amdgcn_cvt_sr_bf8_f16:
    case Intrinsic::amdgcn_cvt_scale_pk8_f16_fp8:
    case Intrinsic::amdgcn_cvt_scale_pk8_bf16_fp8:
    case Intrinsic::amdgcn_cvt_scale_pk8_f16_bf8:
    case Intrinsic::amdgcn_cvt_scale_pk8_bf16_bf8:
    case Intrinsic::amdgcn_cvt_scale_pk8_f16_fp4:
    case Intrinsic::amdgcn_cvt_scale_pk8_bf16_fp4:
    case Intrinsic::amdgcn_cvt_scale_pk8_f32_fp8:
    case Intrinsic::amdgcn_cvt_scale_pk8_f32_bf8:
    case Intrinsic::amdgcn_cvt_scale_pk8_f32_fp4:
    case Intrinsic::amdgcn_cvt_scale_pk16_f16_fp6:
    case Intrinsic::amdgcn_cvt_scale_pk16_bf16_fp6:
    case Intrinsic::amdgcn_cvt_scale_pk16_f16_bf6:
    case Intrinsic::amdgcn_cvt_scale_pk16_bf16_bf6:
    case Intrinsic::amdgcn_cvt_scale_pk16_f32_fp6:
    case Intrinsic::amdgcn_cvt_scale_pk16_f32_bf6:
    case Intrinsic::amdgcn_cvt_scalef32_pk8_fp8_bf16:
    case Intrinsic::amdgcn_cvt_scalef32_pk8_bf8_bf16:
    case Intrinsic::amdgcn_cvt_scalef32_pk8_fp8_f16:
    case Intrinsic::amdgcn_cvt_scalef32_pk8_bf8_f16:
    case Intrinsic::amdgcn_cvt_scalef32_pk8_fp8_f32:
    case Intrinsic::amdgcn_cvt_scalef32_pk8_bf8_f32:
    case Intrinsic::amdgcn_cvt_scalef32_pk8_fp4_f32:
    case Intrinsic::amdgcn_cvt_scalef32_pk8_fp4_f16:
    case Intrinsic::amdgcn_cvt_scalef32_pk8_fp4_bf16:
    case Intrinsic::amdgcn_cvt_scalef32_pk16_fp6_f32:
    case Intrinsic::amdgcn_cvt_scalef32_pk16_bf6_f32:
    case Intrinsic::amdgcn_cvt_scalef32_pk16_fp6_f16:
    case Intrinsic::amdgcn_cvt_scalef32_pk16_bf6_f16:
    case Intrinsic::amdgcn_cvt_scalef32_pk16_fp6_bf16:
    case Intrinsic::amdgcn_cvt_scalef32_pk16_bf6_bf16:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp8_bf16:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_bf8_bf16:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp8_f16:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_bf8_f16:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp8_f32:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_bf8_f32:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp4_f32:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp4_f16:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp4_bf16:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_fp6_f32:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_bf6_f32:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_fp6_f16:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_bf6_f16:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_fp6_bf16:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_bf6_bf16:
    case Intrinsic::amdgcn_sat_pk4_i4_i8:
    case Intrinsic::amdgcn_sat_pk4_u4_u8:
    case Intrinsic::amdgcn_fmed3:
    case Intrinsic::amdgcn_cubeid:
    case Intrinsic::amdgcn_cubema:
    case Intrinsic::amdgcn_cubesc:
    case Intrinsic::amdgcn_cubetc:
    case Intrinsic::amdgcn_sffbh:
    case Intrinsic::amdgcn_fmad_ftz:
    case Intrinsic::amdgcn_mbcnt_lo:
    case Intrinsic::amdgcn_mbcnt_hi:
    case Intrinsic::amdgcn_mul_u24:
    case Intrinsic::amdgcn_mul_i24:
    case Intrinsic::amdgcn_mulhi_u24:
    case Intrinsic::amdgcn_mulhi_i24:
    case Intrinsic::amdgcn_lerp:
    case Intrinsic::amdgcn_sad_u8:
    case Intrinsic::amdgcn_msad_u8:
    case Intrinsic::amdgcn_sad_hi_u8:
    case Intrinsic::amdgcn_sad_u16:
    case Intrinsic::amdgcn_qsad_pk_u16_u8:
    case Intrinsic::amdgcn_mqsad_pk_u16_u8:
    case Intrinsic::amdgcn_mqsad_u32_u8:
    case Intrinsic::amdgcn_cvt_pk_u8_f32:
    case Intrinsic::amdgcn_alignbyte:
    case Intrinsic::amdgcn_perm:
    case Intrinsic::amdgcn_prng_b32:
    case Intrinsic::amdgcn_fdot2:
    case Intrinsic::amdgcn_sdot2:
    case Intrinsic::amdgcn_udot2:
    case Intrinsic::amdgcn_sdot4:
    case Intrinsic::amdgcn_udot4:
    case Intrinsic::amdgcn_sdot8:
    case Intrinsic::amdgcn_udot8:
    case Intrinsic::amdgcn_fdot2_bf16_bf16:
    case Intrinsic::amdgcn_fdot2_f16_f16:
    case Intrinsic::amdgcn_fdot2_f32_bf16:
    case Intrinsic::amdgcn_fdot2c_f32_bf16:
    case Intrinsic::amdgcn_sudot4:
    case Intrinsic::amdgcn_sudot8:
    case Intrinsic::amdgcn_dot4_f32_fp8_bf8:
    case Intrinsic::amdgcn_dot4_f32_bf8_fp8:
    case Intrinsic::amdgcn_dot4_f32_fp8_fp8:
    case Intrinsic::amdgcn_dot4_f32_bf8_bf8:
    case Intrinsic::amdgcn_cvt_f32_fp8:
    case Intrinsic::amdgcn_cvt_f32_fp8_e5m3:
    case Intrinsic::amdgcn_cvt_f32_bf8:
    case Intrinsic::amdgcn_cvt_off_f32_i4:
    case Intrinsic::amdgcn_cvt_pk_f32_fp8:
    case Intrinsic::amdgcn_cvt_pk_f32_bf8:
    case Intrinsic::amdgcn_cvt_pk_fp8_f32:
    case Intrinsic::amdgcn_cvt_pk_fp8_f32_e5m3:
    case Intrinsic::amdgcn_cvt_pk_bf8_f32:
    case Intrinsic::amdgcn_cvt_sr_fp8_f32:
    case Intrinsic::amdgcn_cvt_sr_fp8_f32_e5m3:
    case Intrinsic::amdgcn_cvt_sr_bf8_f32:
    case Intrinsic::amdgcn_cvt_sr_bf16_f32:
    case Intrinsic::amdgcn_cvt_sr_f16_f32:
    case Intrinsic::amdgcn_cvt_f16_fp8:
    case Intrinsic::amdgcn_cvt_f16_bf8:
    case Intrinsic::amdgcn_cvt_scalef32_pk32_fp6_f16:
    case Intrinsic::amdgcn_cvt_scalef32_pk32_bf6_f16:
    case Intrinsic::amdgcn_cvt_scalef32_pk32_fp6_bf16:
    case Intrinsic::amdgcn_cvt_scalef32_pk32_bf6_bf16:
    case Intrinsic::amdgcn_cvt_scalef32_f16_fp8:
    case Intrinsic::amdgcn_cvt_scalef32_f16_bf8:
    case Intrinsic::amdgcn_cvt_scalef32_f32_fp8:
    case Intrinsic::amdgcn_cvt_scalef32_f32_bf8:
    case Intrinsic::amdgcn_cvt_scalef32_pk_fp8_f32:
    case Intrinsic::amdgcn_cvt_scalef32_pk_bf8_f32:
    case Intrinsic::amdgcn_cvt_scalef32_pk_f32_fp8:
    case Intrinsic::amdgcn_cvt_scalef32_pk_f32_bf8:
    case Intrinsic::amdgcn_cvt_scalef32_pk_fp8_f16:
    case Intrinsic::amdgcn_cvt_scalef32_pk_fp8_bf16:
    case Intrinsic::amdgcn_cvt_scalef32_pk_bf8_f16:
    case Intrinsic::amdgcn_cvt_scalef32_pk_bf8_bf16:
    case Intrinsic::amdgcn_cvt_scalef32_pk_f32_fp4:
    case Intrinsic::amdgcn_cvt_scalef32_pk_fp4_f32:
    case Intrinsic::amdgcn_cvt_scalef32_pk_f16_fp4:
    case Intrinsic::amdgcn_cvt_scalef32_pk_bf16_fp4:
    case Intrinsic::amdgcn_cvt_scalef32_pk32_f32_fp6:
    case Intrinsic::amdgcn_cvt_scalef32_pk32_f32_bf6:
    case Intrinsic::amdgcn_cvt_scalef32_pk32_f16_bf6:
    case Intrinsic::amdgcn_cvt_scalef32_pk32_bf16_bf6:
    case Intrinsic::amdgcn_cvt_scalef32_pk32_f16_fp6:
    case Intrinsic::amdgcn_cvt_scalef32_pk32_bf16_fp6:
    case Intrinsic::amdgcn_cvt_scalef32_pk_f16_bf8:
    case Intrinsic::amdgcn_cvt_scalef32_pk_bf16_bf8:
    case Intrinsic::amdgcn_cvt_scalef32_pk_f16_fp8:
    case Intrinsic::amdgcn_cvt_scalef32_pk_bf16_fp8:
    case Intrinsic::amdgcn_cvt_scalef32_pk_fp4_f16:
    case Intrinsic::amdgcn_cvt_scalef32_pk_fp4_bf16:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk_fp4_f16:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk_fp4_bf16:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk_fp4_f32:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_bf6_bf16:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_bf6_f16:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_bf6_f32:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_fp6_bf16:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_fp6_f16:
    case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_fp6_f32:
    case Intrinsic::amdgcn_cvt_scalef32_sr_bf8_bf16:
    case Intrinsic::amdgcn_cvt_scalef32_sr_bf8_f16:
    case Intrinsic::amdgcn_cvt_scalef32_sr_bf8_f32:
    case Intrinsic::amdgcn_cvt_scalef32_sr_fp8_bf16:
    case Intrinsic::amdgcn_cvt_scalef32_sr_fp8_f16:
    case Intrinsic::amdgcn_cvt_scalef32_sr_fp8_f32:
    case Intrinsic::amdgcn_ashr_pk_i8_i32:
    case Intrinsic::amdgcn_ashr_pk_u8_i32:
    case Intrinsic::amdgcn_cvt_scalef32_2xpk16_fp6_f32:
    case Intrinsic::amdgcn_cvt_scalef32_2xpk16_bf6_f32:
    case Intrinsic::amdgcn_wmma_bf16_16x16x16_bf16:
    case Intrinsic::amdgcn_wmma_f16_16x16x16_f16:
    case Intrinsic::amdgcn_wmma_bf16_16x16x16_bf16_tied:
    case Intrinsic::amdgcn_wmma_f16_16x16x16_f16_tied:
    case Intrinsic::amdgcn_wmma_f32_16x16x16_bf16:
    case Intrinsic::amdgcn_wmma_f32_16x16x16_f16:
    case Intrinsic::amdgcn_wmma_i32_16x16x16_iu4:
    case Intrinsic::amdgcn_wmma_i32_16x16x16_iu8:
    case Intrinsic::amdgcn_wmma_f32_16x16x16_fp8_fp8:
    case Intrinsic::amdgcn_wmma_f32_16x16x16_fp8_bf8:
    case Intrinsic::amdgcn_wmma_f32_16x16x16_bf8_fp8:
    case Intrinsic::amdgcn_wmma_f32_16x16x16_bf8_bf8:
    case Intrinsic::amdgcn_wmma_i32_16x16x32_iu4:
    case Intrinsic::amdgcn_swmmac_f32_16x16x32_f16:
    case Intrinsic::amdgcn_swmmac_f32_16x16x32_bf16:
    case Intrinsic::amdgcn_swmmac_f16_16x16x32_f16:
    case Intrinsic::amdgcn_swmmac_bf16_16x16x32_bf16:
    case Intrinsic::amdgcn_swmmac_i32_16x16x32_iu8:
    case Intrinsic::amdgcn_swmmac_i32_16x16x32_iu4:
    case Intrinsic::amdgcn_swmmac_i32_16x16x64_iu4:
    case Intrinsic::amdgcn_swmmac_f32_16x16x32_fp8_fp8:
    case Intrinsic::amdgcn_swmmac_f32_16x16x32_fp8_bf8:
    case Intrinsic::amdgcn_swmmac_f32_16x16x32_bf8_fp8:
    case Intrinsic::amdgcn_swmmac_f32_16x16x32_bf8_bf8:
    case Intrinsic::amdgcn_wmma_f32_16x16x4_f32:
    case Intrinsic::amdgcn_wmma_f32_16x16x32_bf16:
    case Intrinsic::amdgcn_wmma_f32_16x16x32_f16:
    case Intrinsic::amdgcn_wmma_f16_16x16x32_f16:
    case Intrinsic::amdgcn_wmma_bf16_16x16x32_bf16:
    case Intrinsic::amdgcn_wmma_bf16f32_16x16x32_bf16:
    case Intrinsic::amdgcn_wmma_f32_16x16x64_fp8_fp8:
    case Intrinsic::amdgcn_wmma_f32_16x16x64_fp8_bf8:
    case Intrinsic::amdgcn_wmma_f32_16x16x64_bf8_fp8:
    case Intrinsic::amdgcn_wmma_f32_16x16x64_bf8_bf8:
    case Intrinsic::amdgcn_wmma_f16_16x16x64_fp8_fp8:
    case Intrinsic::amdgcn_wmma_f16_16x16x64_fp8_bf8:
    case Intrinsic::amdgcn_wmma_f16_16x16x64_bf8_fp8:
    case Intrinsic::amdgcn_wmma_f16_16x16x64_bf8_bf8:
    case Intrinsic::amdgcn_wmma_f16_16x16x128_fp8_fp8:
    case Intrinsic::amdgcn_wmma_f16_16x16x128_fp8_bf8:
    case Intrinsic::amdgcn_wmma_f16_16x16x128_bf8_fp8:
    case Intrinsic::amdgcn_wmma_f16_16x16x128_bf8_bf8:
    case Intrinsic::amdgcn_wmma_f32_16x16x128_fp8_fp8:
    case Intrinsic::amdgcn_wmma_f32_16x16x128_fp8_bf8:
    case Intrinsic::amdgcn_wmma_f32_16x16x128_bf8_fp8:
    case Intrinsic::amdgcn_wmma_f32_16x16x128_bf8_bf8:
    case Intrinsic::amdgcn_wmma_i32_16x16x64_iu8:
    case Intrinsic::amdgcn_wmma_f32_16x16x128_f8f6f4:
    case Intrinsic::amdgcn_wmma_scale_f32_16x16x128_f8f6f4:
    case Intrinsic::amdgcn_wmma_scale16_f32_16x16x128_f8f6f4:
    case Intrinsic::amdgcn_wmma_f32_32x16x128_f4:
    case Intrinsic::amdgcn_wmma_scale_f32_32x16x128_f4:
    case Intrinsic::amdgcn_wmma_scale16_f32_32x16x128_f4:
    case Intrinsic::amdgcn_swmmac_f16_16x16x64_f16:
    case Intrinsic::amdgcn_swmmac_bf16_16x16x64_bf16:
    case Intrinsic::amdgcn_swmmac_f32_16x16x64_bf16:
    case Intrinsic::amdgcn_swmmac_bf16f32_16x16x64_bf16:
    case Intrinsic::amdgcn_swmmac_f32_16x16x64_f16:
    case Intrinsic::amdgcn_swmmac_f32_16x16x128_fp8_fp8:
    case Intrinsic::amdgcn_swmmac_f32_16x16x128_fp8_bf8:
    case Intrinsic::amdgcn_swmmac_f32_16x16x128_bf8_fp8:
    case Intrinsic::amdgcn_swmmac_f32_16x16x128_bf8_bf8:
    case Intrinsic::amdgcn_swmmac_f16_16x16x128_fp8_fp8:
    case Intrinsic::amdgcn_swmmac_f16_16x16x128_fp8_bf8:
    case Intrinsic::amdgcn_swmmac_f16_16x16x128_bf8_fp8:
    case Intrinsic::amdgcn_swmmac_f16_16x16x128_bf8_bf8:
    case Intrinsic::amdgcn_swmmac_i32_16x16x128_iu8:
    case Intrinsic::amdgcn_perm_pk16_b4_u4:
    case Intrinsic::amdgcn_perm_pk16_b6_u4:
    case Intrinsic::amdgcn_perm_pk16_b8_u4:
    case Intrinsic::amdgcn_add_max_i32:
    case Intrinsic::amdgcn_add_max_u32:
    case Intrinsic::amdgcn_add_min_i32:
    case Intrinsic::amdgcn_add_min_u32:
    case Intrinsic::amdgcn_pk_add_max_i16:
    case Intrinsic::amdgcn_pk_add_max_u16:
    case Intrinsic::amdgcn_pk_add_min_i16:
    case Intrinsic::amdgcn_pk_add_min_u16:
      return getDefaultMappingVOP(MI);
    case Intrinsic::amdgcn_log:
    case Intrinsic::amdgcn_exp2:
    case Intrinsic::amdgcn_rcp:
    case Intrinsic::amdgcn_rsq:
    case Intrinsic::amdgcn_sqrt: {
      unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
      if (Subtarget.hasPseudoScalarTrans() && (Size == 16 || Size == 32) &&
          isSALUMapping(MI))
        return getDefaultMappingSOP(MI);
      return getDefaultMappingVOP(MI);
    }
    case Intrinsic::amdgcn_sbfe:
    case Intrinsic::amdgcn_ubfe:
      if (isSALUMapping(MI))
        return getDefaultMappingSOP(MI);
      return getDefaultMappingVOP(MI);
    case Intrinsic::amdgcn_ds_swizzle:
    case Intrinsic::amdgcn_ds_permute:
    case Intrinsic::amdgcn_ds_bpermute:
    case Intrinsic::amdgcn_update_dpp:
    case Intrinsic::amdgcn_mov_dpp8:
    case Intrinsic::amdgcn_mov_dpp:
    case Intrinsic::amdgcn_strict_wwm:
    case Intrinsic::amdgcn_wwm:
    case Intrinsic::amdgcn_strict_wqm:
    case Intrinsic::amdgcn_wqm:
    case Intrinsic::amdgcn_softwqm:
    case Intrinsic::amdgcn_set_inactive:
    case Intrinsic::amdgcn_set_inactive_chain_arg:
    case Intrinsic::amdgcn_permlane64:
    case Intrinsic::amdgcn_ds_bpermute_fi_b32:
      return getDefaultMappingAllVGPR(MI);
    case Intrinsic::amdgcn_cvt_pkrtz:
      if (Subtarget.hasSALUFloatInsts() && isSALUMapping(MI))
        return getDefaultMappingSOP(MI);
      return getDefaultMappingVOP(MI);
    case Intrinsic::amdgcn_kernarg_segment_ptr:
    case Intrinsic::amdgcn_s_getpc:
    case Intrinsic::amdgcn_groupstaticsize:
    case Intrinsic::amdgcn_reloc_constant:
    case Intrinsic::returnaddress: {
      unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
      break;
    }
    case Intrinsic::amdgcn_wqm_vote: {
      unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
      OpdsMapping[0] = OpdsMapping[2]
          = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size);
      break;
    }
    case Intrinsic::amdgcn_ps_live: {
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
      break;
    }
    case Intrinsic::amdgcn_div_scale: {
      unsigned Dst0Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
      unsigned Dst1Size = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Dst0Size);
      OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Dst1Size);

      unsigned SrcSize = MRI.getType(MI.getOperand(3).getReg()).getSizeInBits();
      OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
      OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
      break;
    }
    case Intrinsic::amdgcn_class: {
      Register Src0Reg = MI.getOperand(2).getReg();
      Register Src1Reg = MI.getOperand(3).getReg();
      unsigned Src0Size = MRI.getType(Src0Reg).getSizeInBits();
      unsigned Src1Size = MRI.getType(Src1Reg).getSizeInBits();
      unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, DstSize);
      OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Src0Size);
      OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Src1Size);
      break;
    }
    case Intrinsic::amdgcn_icmp:
    case Intrinsic::amdgcn_fcmp: {
      unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
      // This is not VCCRegBank because this is not used in boolean contexts.
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
      unsigned OpSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
      OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
      OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
      break;
    }
    case Intrinsic::amdgcn_readlane: {
      // This must be an SGPR, but accept a VGPR.
      Register IdxReg = MI.getOperand(3).getReg();
      unsigned IdxSize = MRI.getType(IdxReg).getSizeInBits();
      unsigned IdxBank = getRegBankID(IdxReg, MRI, AMDGPU::SGPRRegBankID);
      OpdsMapping[3] = AMDGPU::getValueMapping(IdxBank, IdxSize);
      [[fallthrough]];
    }
    case Intrinsic::amdgcn_readfirstlane: {
      unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
      unsigned SrcSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
      OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
      break;
    }
    case Intrinsic::amdgcn_writelane: {
      unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
      Register SrcReg = MI.getOperand(2).getReg();
      unsigned SrcSize = MRI.getType(SrcReg).getSizeInBits();
      unsigned SrcBank = getRegBankID(SrcReg, MRI, AMDGPU::SGPRRegBankID);
      Register IdxReg = MI.getOperand(3).getReg();
      unsigned IdxSize = MRI.getType(IdxReg).getSizeInBits();
      unsigned IdxBank = getRegBankID(IdxReg, MRI, AMDGPU::SGPRRegBankID);
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);

      // These 2 must be SGPRs, but accept VGPRs. Readfirstlane will be inserted
      // to legalize.
      OpdsMapping[2] = AMDGPU::getValueMapping(SrcBank, SrcSize);
      OpdsMapping[3] = AMDGPU::getValueMapping(IdxBank, IdxSize);
      OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
      break;
    }
    case Intrinsic::amdgcn_if_break: {
      unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
      OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
      OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
      break;
    }
    case Intrinsic::amdgcn_permlane16:
    case Intrinsic::amdgcn_permlanex16: {
      unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
      OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
      OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
      OpdsMapping[4] = getSGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
      OpdsMapping[5] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
      break;
    }
    case Intrinsic::amdgcn_permlane_bcast:
    case Intrinsic::amdgcn_permlane_up:
    case Intrinsic::amdgcn_permlane_down:
    case Intrinsic::amdgcn_permlane_xor: {
      unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
      OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
      OpdsMapping[3] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
      OpdsMapping[4] = getSGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
      break;
    }
    case Intrinsic::amdgcn_permlane_idx_gen: {
      unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
      OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
      OpdsMapping[3] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
      break;
    }
    case Intrinsic::amdgcn_permlane16_var:
    case Intrinsic::amdgcn_permlanex16_var: {
      unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
      OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
      OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
      OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
      break;
    }
    case Intrinsic::amdgcn_mfma_f32_4x4x1f32:
    case Intrinsic::amdgcn_mfma_f32_4x4x4f16:
    case Intrinsic::amdgcn_mfma_i32_4x4x4i8:
    case Intrinsic::amdgcn_mfma_f32_4x4x2bf16:
    case Intrinsic::amdgcn_mfma_f32_16x16x1f32:
    case Intrinsic::amdgcn_mfma_f32_16x16x4f32:
    case Intrinsic::amdgcn_mfma_f32_16x16x4f16:
    case Intrinsic::amdgcn_mfma_f32_16x16x16f16:
    case Intrinsic::amdgcn_mfma_i32_16x16x4i8:
    case Intrinsic::amdgcn_mfma_i32_16x16x16i8:
    case Intrinsic::amdgcn_mfma_f32_16x16x2bf16:
    case Intrinsic::amdgcn_mfma_f32_16x16x8bf16:
    case Intrinsic::amdgcn_mfma_f32_32x32x1f32:
    case Intrinsic::amdgcn_mfma_f32_32x32x2f32:
    case Intrinsic::amdgcn_mfma_f32_32x32x4f16:
    case Intrinsic::amdgcn_mfma_f32_32x32x8f16:
    case Intrinsic::amdgcn_mfma_i32_32x32x4i8:
    case Intrinsic::amdgcn_mfma_i32_32x32x8i8:
    case Intrinsic::amdgcn_mfma_f32_32x32x2bf16:
    case Intrinsic::amdgcn_mfma_f32_32x32x4bf16:
    case Intrinsic::amdgcn_mfma_f32_32x32x4bf16_1k:
    case Intrinsic::amdgcn_mfma_f32_16x16x4bf16_1k:
    case Intrinsic::amdgcn_mfma_f32_4x4x4bf16_1k:
    case Intrinsic::amdgcn_mfma_f32_32x32x8bf16_1k:
    case Intrinsic::amdgcn_mfma_f32_16x16x16bf16_1k:
    case Intrinsic::amdgcn_mfma_f64_16x16x4f64:
    case Intrinsic::amdgcn_mfma_f64_4x4x4f64:
    case Intrinsic::amdgcn_mfma_i32_16x16x32_i8:
    case Intrinsic::amdgcn_mfma_i32_32x32x16_i8:
    case Intrinsic::amdgcn_mfma_f32_16x16x8_xf32:
    case Intrinsic::amdgcn_mfma_f32_32x32x4_xf32:
    case Intrinsic::amdgcn_mfma_f32_16x16x32_bf8_bf8:
    case Intrinsic::amdgcn_mfma_f32_16x16x32_bf8_fp8:
    case Intrinsic::amdgcn_mfma_f32_16x16x32_fp8_bf8:
    case Intrinsic::amdgcn_mfma_f32_16x16x32_fp8_fp8:
    case Intrinsic::amdgcn_mfma_f32_32x32x16_bf8_bf8:
    case Intrinsic::amdgcn_mfma_f32_32x32x16_bf8_fp8:
    case Intrinsic::amdgcn_mfma_f32_32x32x16_fp8_bf8:
    case Intrinsic::amdgcn_mfma_f32_32x32x16_fp8_fp8:
    case Intrinsic::amdgcn_mfma_f32_16x16x32_f16:
    case Intrinsic::amdgcn_mfma_f32_32x32x16_f16:
    case Intrinsic::amdgcn_mfma_i32_16x16x64_i8:
    case Intrinsic::amdgcn_mfma_i32_32x32x32_i8:
    case Intrinsic::amdgcn_mfma_f32_16x16x32_bf16: {
      unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
      unsigned MinNumRegsRequired = DstSize / 32;

      // Default for MAI intrinsics.
      // srcC can also be an immediate which can be folded later.
      // FIXME: Should we eventually add an alternative mapping with AGPR src
      // for srcA/srcB?
      //
      // vdst, srcA, srcB, srcC
      const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();

      bool UseAGPRForm = !Subtarget.hasGFX90AInsts() ||
                         Info->selectAGPRFormMFMA(MinNumRegsRequired);

      OpdsMapping[0] =
          UseAGPRForm ? getAGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI)
                      : getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
      OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
      OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
      OpdsMapping[4] =
          UseAGPRForm ? getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI)
                      : getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
      break;
    }
    case Intrinsic::amdgcn_mfma_scale_f32_16x16x128_f8f6f4:
    case Intrinsic::amdgcn_mfma_scale_f32_32x32x64_f8f6f4: {
      unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
      unsigned MinNumRegsRequired = DstSize / 32;

      const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
      bool UseAGPRForm = Info->selectAGPRFormMFMA(MinNumRegsRequired);

      OpdsMapping[0] =
          UseAGPRForm ? getAGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI)
                      : getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);

      OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
      OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
      OpdsMapping[4] =
          UseAGPRForm ? getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI)
                      : getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);

      OpdsMapping[8] = getVGPROpMapping(MI.getOperand(8).getReg(), MRI, *TRI);
      OpdsMapping[10] = getVGPROpMapping(MI.getOperand(10).getReg(), MRI, *TRI);
      break;
    }
    case Intrinsic::amdgcn_smfmac_f32_16x16x32_f16:
    case Intrinsic::amdgcn_smfmac_f32_32x32x16_f16:
    case Intrinsic::amdgcn_smfmac_f32_16x16x32_bf16:
    case Intrinsic::amdgcn_smfmac_f32_32x32x16_bf16:
    case Intrinsic::amdgcn_smfmac_i32_16x16x64_i8:
    case Intrinsic::amdgcn_smfmac_i32_32x32x32_i8:
    case Intrinsic::amdgcn_smfmac_f32_16x16x64_bf8_bf8:
    case Intrinsic::amdgcn_smfmac_f32_16x16x64_bf8_fp8:
    case Intrinsic::amdgcn_smfmac_f32_16x16x64_fp8_bf8:
    case Intrinsic::amdgcn_smfmac_f32_16x16x64_fp8_fp8:
    case Intrinsic::amdgcn_smfmac_f32_32x32x32_bf8_bf8:
    case Intrinsic::amdgcn_smfmac_f32_32x32x32_bf8_fp8:
    case Intrinsic::amdgcn_smfmac_f32_32x32x32_fp8_bf8:
    case Intrinsic::amdgcn_smfmac_f32_32x32x32_fp8_fp8:
    case Intrinsic::amdgcn_smfmac_f32_16x16x64_f16:
    case Intrinsic::amdgcn_smfmac_f32_32x32x32_f16:
    case Intrinsic::amdgcn_smfmac_f32_16x16x64_bf16:
    case Intrinsic::amdgcn_smfmac_f32_32x32x32_bf16:
    case Intrinsic::amdgcn_smfmac_i32_16x16x128_i8:
    case Intrinsic::amdgcn_smfmac_i32_32x32x64_i8:
    case Intrinsic::amdgcn_smfmac_f32_16x16x128_bf8_bf8:
    case Intrinsic::amdgcn_smfmac_f32_16x16x128_bf8_fp8:
    case Intrinsic::amdgcn_smfmac_f32_16x16x128_fp8_bf8:
    case Intrinsic::amdgcn_smfmac_f32_16x16x128_fp8_fp8:
    case Intrinsic::amdgcn_smfmac_f32_32x32x64_bf8_bf8:
    case Intrinsic::amdgcn_smfmac_f32_32x32x64_bf8_fp8:
    case Intrinsic::amdgcn_smfmac_f32_32x32x64_fp8_bf8:
    case Intrinsic::amdgcn_smfmac_f32_32x32x64_fp8_fp8: {
      Register DstReg = MI.getOperand(0).getReg();
      unsigned DstSize = MRI.getType(DstReg).getSizeInBits();
      unsigned MinNumRegsRequired = DstSize / 32;
      const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
      bool UseAGPRForm = Info->selectAGPRFormMFMA(MinNumRegsRequired);
5157
5158 // vdst, srcA, srcB, srcC, idx
5159 OpdsMapping[0] = UseAGPRForm ? getAGPROpMapping(DstReg, MRI, *TRI)
5160 : getVGPROpMapping(DstReg, MRI, *TRI);
5161
5162 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5163 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5164 OpdsMapping[4] =
5165 UseAGPRForm ? getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI)
5166 : getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5167 OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5168 break;
5169 }
    case Intrinsic::amdgcn_interp_p1:
    case Intrinsic::amdgcn_interp_p2:
    case Intrinsic::amdgcn_interp_mov:
    case Intrinsic::amdgcn_interp_p1_f16:
    case Intrinsic::amdgcn_interp_p2_f16:
    case Intrinsic::amdgcn_lds_param_load: {
      const int M0Idx = MI.getNumOperands() - 1;
      Register M0Reg = MI.getOperand(M0Idx).getReg();
      unsigned M0Bank = getRegBankID(M0Reg, MRI, AMDGPU::SGPRRegBankID);
      unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();

      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
      for (int I = 2; I != M0Idx && MI.getOperand(I).isReg(); ++I)
        OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);

      // Must be SGPR, but we must take whatever the original bank is and fix it
      // later.
      OpdsMapping[M0Idx] = AMDGPU::getValueMapping(M0Bank, 32);
      break;
    }
    case Intrinsic::amdgcn_interp_inreg_p10:
    case Intrinsic::amdgcn_interp_inreg_p2:
    case Intrinsic::amdgcn_interp_inreg_p10_f16:
    case Intrinsic::amdgcn_interp_inreg_p2_f16:
    case Intrinsic::amdgcn_interp_p10_rtz_f16:
    case Intrinsic::amdgcn_interp_p2_rtz_f16: {
      unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
      OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
      OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
      OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
      break;
    }
    case Intrinsic::amdgcn_permlane16_swap:
    case Intrinsic::amdgcn_permlane32_swap: {
      unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
      OpdsMapping[0] = OpdsMapping[1] = OpdsMapping[3] = OpdsMapping[4] =
          AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
      break;
    }
    case Intrinsic::amdgcn_ballot: {
      unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
      unsigned SrcSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
      OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, SrcSize);
      break;
    }
    case Intrinsic::amdgcn_inverse_ballot: {
      // This must be an SGPR, but accept a VGPR.
      Register MaskReg = MI.getOperand(2).getReg();
      unsigned MaskSize = MRI.getType(MaskReg).getSizeInBits();
      unsigned MaskBank = getRegBankID(MaskReg, MRI, AMDGPU::SGPRRegBankID);
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
      OpdsMapping[2] = AMDGPU::getValueMapping(MaskBank, MaskSize);
      break;
    }
    case Intrinsic::amdgcn_bitop3: {
      unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
      OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
      OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
      OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
      break;
    }
    case Intrinsic::amdgcn_s_quadmask:
    case Intrinsic::amdgcn_s_wqm: {
      Register MaskReg = MI.getOperand(2).getReg();
      unsigned MaskSize = MRI.getType(MaskReg).getSizeInBits();
      unsigned MaskBank = getRegBankID(MaskReg, MRI, AMDGPU::SGPRRegBankID);
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, MaskSize);
      OpdsMapping[2] = AMDGPU::getValueMapping(MaskBank, MaskSize);
      break;
    }
    case Intrinsic::amdgcn_wave_reduce_add:
    case Intrinsic::amdgcn_wave_reduce_fadd:
    case Intrinsic::amdgcn_wave_reduce_sub:
    case Intrinsic::amdgcn_wave_reduce_fsub:
    case Intrinsic::amdgcn_wave_reduce_min:
    case Intrinsic::amdgcn_wave_reduce_umin:
    case Intrinsic::amdgcn_wave_reduce_fmin:
    case Intrinsic::amdgcn_wave_reduce_max:
    case Intrinsic::amdgcn_wave_reduce_umax:
    case Intrinsic::amdgcn_wave_reduce_fmax:
    case Intrinsic::amdgcn_wave_reduce_and:
    case Intrinsic::amdgcn_wave_reduce_or:
    case Intrinsic::amdgcn_wave_reduce_xor: {
      unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
      unsigned OpSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
      auto regBankID =
          isSALUMapping(MI) ? AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
      OpdsMapping[2] = AMDGPU::getValueMapping(regBankID, OpSize);
      break;
    }
    case Intrinsic::amdgcn_s_bitreplicate: {
      Register MaskReg = MI.getOperand(2).getReg();
      unsigned MaskBank = getRegBankID(MaskReg, MRI, AMDGPU::SGPRRegBankID);
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 64);
      OpdsMapping[2] = AMDGPU::getValueMapping(MaskBank, 32);
      break;
    }
    case Intrinsic::amdgcn_wave_shuffle: {
      unsigned OpSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
      OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
      OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
      break;
    }
    }
    break;
  }
  case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD:
  case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_D16:
  case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_NORET:
  case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE:
  case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE_D16: {
    auto IntrID = AMDGPU::getIntrinsicID(MI);
    const AMDGPU::RsrcIntrinsic *RSrcIntrin =
        AMDGPU::lookupRsrcIntrinsic(IntrID);
    assert(RSrcIntrin && "missing RsrcIntrinsic for image intrinsic");
    // Non-images can have complications from operands that allow both SGPR
    // and VGPR. For now it's too complicated to figure out the final opcode
    // to derive the register bank from the MCInstrDesc.
    assert(RSrcIntrin->IsImage);
    return getImageMapping(MRI, MI, RSrcIntrin->RsrcArg);
  }
  case AMDGPU::G_AMDGPU_BVH_INTERSECT_RAY:
  case AMDGPU::G_AMDGPU_BVH8_INTERSECT_RAY:
  case AMDGPU::G_AMDGPU_BVH_DUAL_INTERSECT_RAY: {
    bool IsDualOrBVH8 =
        MI.getOpcode() == AMDGPU::G_AMDGPU_BVH_DUAL_INTERSECT_RAY ||
        MI.getOpcode() == AMDGPU::G_AMDGPU_BVH8_INTERSECT_RAY;
    unsigned NumMods = IsDualOrBVH8 ? 0 : 1; // Has A16 modifier
    unsigned LastRegOpIdx = MI.getNumExplicitOperands() - 1 - NumMods;
    unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
    OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
    if (IsDualOrBVH8) {
      OpdsMapping[1] = AMDGPU::getValueMapping(
          AMDGPU::VGPRRegBankID,
          MRI.getType(MI.getOperand(1).getReg()).getSizeInBits());
      OpdsMapping[2] = AMDGPU::getValueMapping(
          AMDGPU::VGPRRegBankID,
          MRI.getType(MI.getOperand(2).getReg()).getSizeInBits());
    }
    OpdsMapping[LastRegOpIdx] =
        getSGPROpMapping(MI.getOperand(LastRegOpIdx).getReg(), MRI, *TRI);
    if (LastRegOpIdx == 3) {
      // Sequential form: all operands combined into VGPR256/VGPR512
      unsigned Size = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
      if (Size > 256)
        Size = 512;
      OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
    } else {
      // NSA form
      unsigned FirstSrcOpIdx = IsDualOrBVH8 ? 4 : 2;
      for (unsigned I = FirstSrcOpIdx; I < LastRegOpIdx; ++I) {
        unsigned Size = MRI.getType(MI.getOperand(I).getReg()).getSizeInBits();
        OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
      }
    }
    break;
  }
  case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS:
  case AMDGPU::G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS: {
    auto IntrID = cast<GIntrinsic>(MI).getIntrinsicID();
    switch (IntrID) {
    case Intrinsic::amdgcn_s_getreg:
    case Intrinsic::amdgcn_s_memtime:
    case Intrinsic::amdgcn_s_memrealtime:
    case Intrinsic::amdgcn_s_get_waveid_in_workgroup:
    case Intrinsic::amdgcn_s_sendmsg_rtn: {
      unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
      break;
    }
    case Intrinsic::amdgcn_global_atomic_fmin_num:
    case Intrinsic::amdgcn_global_atomic_fmax_num:
    case Intrinsic::amdgcn_flat_atomic_fmin_num:
    case Intrinsic::amdgcn_flat_atomic_fmax_num:
    case Intrinsic::amdgcn_global_atomic_ordered_add_b64:
    case Intrinsic::amdgcn_global_load_tr_b64:
    case Intrinsic::amdgcn_global_load_tr_b128:
    case Intrinsic::amdgcn_global_load_tr4_b64:
    case Intrinsic::amdgcn_global_load_tr6_b96:
    case Intrinsic::amdgcn_ds_load_tr8_b64:
    case Intrinsic::amdgcn_ds_load_tr16_b128:
    case Intrinsic::amdgcn_ds_load_tr4_b64:
    case Intrinsic::amdgcn_ds_load_tr6_b96:
    case Intrinsic::amdgcn_ds_read_tr4_b64:
    case Intrinsic::amdgcn_ds_read_tr6_b96:
    case Intrinsic::amdgcn_ds_read_tr8_b64:
    case Intrinsic::amdgcn_ds_read_tr16_b64:
    case Intrinsic::amdgcn_ds_atomic_async_barrier_arrive_b64:
    case Intrinsic::amdgcn_ds_atomic_barrier_arrive_rtn_b64:
      return getDefaultMappingAllVGPR(MI);
    case Intrinsic::amdgcn_ds_ordered_add:
    case Intrinsic::amdgcn_ds_ordered_swap: {
      unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
      unsigned M0Bank =
          getRegBankID(MI.getOperand(2).getReg(), MRI, AMDGPU::SGPRRegBankID);
      OpdsMapping[2] = AMDGPU::getValueMapping(M0Bank, 32);
      OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
      break;
    }
    case Intrinsic::amdgcn_ds_append:
    case Intrinsic::amdgcn_ds_consume: {
      unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
      OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
      break;
    }
    case Intrinsic::amdgcn_exp_compr:
      OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
      OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
      break;
    case Intrinsic::amdgcn_exp:
      // FIXME: Could we support packed types here?
      OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
      OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
      OpdsMapping[5] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
      OpdsMapping[6] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
      break;
    case Intrinsic::amdgcn_exp_row:
      OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
      OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
      OpdsMapping[5] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
      OpdsMapping[6] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
      OpdsMapping[8] = getSGPROpMapping(MI.getOperand(8).getReg(), MRI, *TRI);
      break;
    case Intrinsic::amdgcn_s_alloc_vgpr:
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1);
      OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
      break;
    case Intrinsic::amdgcn_s_sendmsg:
    case Intrinsic::amdgcn_s_sendmsghalt: {
      // This must be an SGPR, but accept a VGPR.
      unsigned Bank =
          getRegBankID(MI.getOperand(2).getReg(), MRI, AMDGPU::SGPRRegBankID);
      OpdsMapping[2] = AMDGPU::getValueMapping(Bank, 32);
      break;
    }
    case Intrinsic::amdgcn_s_setreg: {
      // This must be an SGPR, but accept a VGPR.
      unsigned Bank =
          getRegBankID(MI.getOperand(2).getReg(), MRI, AMDGPU::SGPRRegBankID);
      OpdsMapping[2] = AMDGPU::getValueMapping(Bank, 32);
      break;
    }
    case Intrinsic::amdgcn_s_ttracedata: {
      // This must be an SGPR, but accept a VGPR.
      unsigned Bank =
          getRegBankID(MI.getOperand(1).getReg(), MRI, AMDGPU::SGPRRegBankID);
      OpdsMapping[1] = AMDGPU::getValueMapping(Bank, 32);
      break;
    }
    case Intrinsic::amdgcn_end_cf: {
      unsigned Size = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
      OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
      break;
    }
    case Intrinsic::amdgcn_else: {
      unsigned WaveSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
      OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, WaveSize);
      OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, WaveSize);
      break;
    }
    case Intrinsic::amdgcn_init_whole_wave:
    case Intrinsic::amdgcn_live_mask: {
      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
      break;
    }
    case Intrinsic::amdgcn_wqm_demote:
    case Intrinsic::amdgcn_kill: {
      OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
      break;
    }
    case Intrinsic::amdgcn_raw_buffer_load:
    case Intrinsic::amdgcn_raw_ptr_buffer_load:
    case Intrinsic::amdgcn_raw_atomic_buffer_load:
    case Intrinsic::amdgcn_raw_ptr_atomic_buffer_load:
    case Intrinsic::amdgcn_raw_tbuffer_load:
    case Intrinsic::amdgcn_raw_ptr_tbuffer_load: {
      // FIXME: Should make intrinsic ID the last operand of the instruction,
      // then this would be the same as store
      OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
      OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
      OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
      OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
      break;
    }
    case Intrinsic::amdgcn_raw_buffer_load_lds:
    case Intrinsic::amdgcn_raw_buffer_load_async_lds:
    case Intrinsic::amdgcn_raw_ptr_buffer_load_lds:
    case Intrinsic::amdgcn_raw_ptr_buffer_load_async_lds: {
      OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
      OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
      OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
      OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
      break;
    }
    case Intrinsic::amdgcn_raw_buffer_store:
    case Intrinsic::amdgcn_raw_ptr_buffer_store:
    case Intrinsic::amdgcn_raw_buffer_store_format:
    case Intrinsic::amdgcn_raw_ptr_buffer_store_format:
    case Intrinsic::amdgcn_raw_tbuffer_store:
    case Intrinsic::amdgcn_raw_ptr_tbuffer_store: {
      OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
      OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
      OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
      OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
      break;
    }
    case Intrinsic::amdgcn_struct_buffer_load:
    case Intrinsic::amdgcn_struct_ptr_buffer_load:
    case Intrinsic::amdgcn_struct_tbuffer_load:
    case Intrinsic::amdgcn_struct_ptr_tbuffer_load:
    case Intrinsic::amdgcn_struct_atomic_buffer_load:
    case Intrinsic::amdgcn_struct_ptr_atomic_buffer_load: {
      OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
      OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
      OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
      OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
      OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
      break;
    }
    case Intrinsic::amdgcn_struct_buffer_load_lds:
    case Intrinsic::amdgcn_struct_buffer_load_async_lds:
    case Intrinsic::amdgcn_struct_ptr_buffer_load_lds:
    case Intrinsic::amdgcn_struct_ptr_buffer_load_async_lds: {
      OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
      OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
      OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
      OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
      OpdsMapping[6] = getSGPROpMapping(MI.getOperand(6).getReg(), MRI, *TRI);
      break;
    }
    case Intrinsic::amdgcn_struct_buffer_store:
    case Intrinsic::amdgcn_struct_ptr_buffer_store:
    case Intrinsic::amdgcn_struct_tbuffer_store:
    case Intrinsic::amdgcn_struct_ptr_tbuffer_store: {
      OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
      OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
      OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
      OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
      OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
      break;
    }
    case Intrinsic::amdgcn_init_exec_from_input: {
      unsigned Size = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
      OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
      break;
    }
    case Intrinsic::amdgcn_ds_gws_init:
    case Intrinsic::amdgcn_ds_gws_barrier:
    case Intrinsic::amdgcn_ds_gws_sema_br: {
      OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);

      // This must be an SGPR, but accept a VGPR.
      unsigned Bank =
          getRegBankID(MI.getOperand(2).getReg(), MRI, AMDGPU::SGPRRegBankID);
      OpdsMapping[2] = AMDGPU::getValueMapping(Bank, 32);
      break;
    }
    case Intrinsic::amdgcn_ds_gws_sema_v:
    case Intrinsic::amdgcn_ds_gws_sema_p:
    case Intrinsic::amdgcn_ds_gws_sema_release_all: {
      // This must be an SGPR, but accept a VGPR.
      unsigned Bank =
          getRegBankID(MI.getOperand(1).getReg(), MRI, AMDGPU::SGPRRegBankID);
      OpdsMapping[1] = AMDGPU::getValueMapping(Bank, 32);
      break;
    }
    case Intrinsic::amdgcn_cluster_load_b32:
    case Intrinsic::amdgcn_cluster_load_b64:
    case Intrinsic::amdgcn_cluster_load_b128: {
      OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
      OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
      unsigned M0Bank =
          getRegBankID(MI.getOperand(4).getReg(), MRI, AMDGPU::SGPRRegBankID);
      OpdsMapping[4] = AMDGPU::getValueMapping(M0Bank, 32);
      break;
    }
    case Intrinsic::amdgcn_cluster_load_async_to_lds_b8:
    case Intrinsic::amdgcn_cluster_load_async_to_lds_b32:
    case Intrinsic::amdgcn_cluster_load_async_to_lds_b64:
    case Intrinsic::amdgcn_cluster_load_async_to_lds_b128: {
      OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
      // LDS address goes into $vdst (VGPR).
      OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
      unsigned M0Bank =
          getRegBankID(MI.getOperand(5).getReg(), MRI, AMDGPU::SGPRRegBankID);
      OpdsMapping[5] = AMDGPU::getValueMapping(M0Bank, 32);
      break;
    }
    case Intrinsic::amdgcn_global_store_async_from_lds_b8:
    case Intrinsic::amdgcn_global_store_async_from_lds_b32:
    case Intrinsic::amdgcn_global_store_async_from_lds_b64:
    case Intrinsic::amdgcn_global_store_async_from_lds_b128:
    case Intrinsic::amdgcn_global_load_async_to_lds_b8:
    case Intrinsic::amdgcn_global_load_async_to_lds_b32:
    case Intrinsic::amdgcn_global_load_async_to_lds_b64:
    case Intrinsic::amdgcn_global_load_async_to_lds_b128: {
      OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
      // LDS address goes into $vdst/$vdata (VGPR).
      OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
      break;
    }
    case Intrinsic::amdgcn_load_to_lds:
    case Intrinsic::amdgcn_load_async_to_lds:
    case Intrinsic::amdgcn_global_load_lds:
    case Intrinsic::amdgcn_global_load_async_lds: {
      OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
      // LDS address goes into M0 (SGPR).
      OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
      break;
    }
    case Intrinsic::amdgcn_lds_direct_load: {
      const int M0Idx = MI.getNumOperands() - 1;
      Register M0Reg = MI.getOperand(M0Idx).getReg();
      unsigned M0Bank = getRegBankID(M0Reg, MRI, AMDGPU::SGPRRegBankID);
      unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();

      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
      for (int I = 2; I != M0Idx && MI.getOperand(I).isReg(); ++I)
        OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);

      // Must be SGPR, but we must take whatever the original bank is and fix it
      // later.
      OpdsMapping[M0Idx] = AMDGPU::getValueMapping(M0Bank, 32);
      break;
    }
    case Intrinsic::amdgcn_ds_add_gs_reg_rtn:
    case Intrinsic::amdgcn_ds_sub_gs_reg_rtn:
      OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
      OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
      break;
    case Intrinsic::amdgcn_ds_bvh_stack_rtn:
    case Intrinsic::amdgcn_ds_bvh_stack_push4_pop1_rtn:
    case Intrinsic::amdgcn_ds_bvh_stack_push8_pop1_rtn:
    case Intrinsic::amdgcn_ds_bvh_stack_push8_pop2_rtn: {
      OpdsMapping[0] =
          getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI); // %vdst
      OpdsMapping[1] =
          getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI); // %addr
      OpdsMapping[3] =
          getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI); // %addr
      OpdsMapping[4] =
          getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI); // %data0
      OpdsMapping[5] =
          getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI); // %data1
      break;
    }
    case Intrinsic::amdgcn_s_sleep_var:
      OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
      break;
    case Intrinsic::amdgcn_s_barrier_join:
    case Intrinsic::amdgcn_s_wakeup_barrier:
      OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
      break;
    case Intrinsic::amdgcn_s_barrier_init:
    case Intrinsic::amdgcn_s_barrier_signal_var:
      OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
      OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
      break;
    case Intrinsic::amdgcn_s_barrier_signal_isfirst: {
      const unsigned ResultSize = 1;
      OpdsMapping[0] =
          AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, ResultSize);
      break;
    }
    case Intrinsic::amdgcn_s_get_barrier_state:
    case Intrinsic::amdgcn_s_get_named_barrier_state: {
      OpdsMapping[0] = getSGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
      OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
      break;
    }
    case Intrinsic::amdgcn_pops_exiting_wave_id:
      return getDefaultMappingSOP(MI);
    case Intrinsic::amdgcn_tensor_load_to_lds:
    case Intrinsic::amdgcn_tensor_store_from_lds: {
      // Lie and claim everything is legal, even if all operands need to be
      // SGPRs. applyMapping will have to deal with it with readfirstlane.
      for (unsigned I = 1; I < MI.getNumOperands(); ++I) {
        if (MI.getOperand(I).isReg()) {
          Register Reg = MI.getOperand(I).getReg();
          auto OpBank = getRegBankID(Reg, MRI);
          unsigned Size = getSizeInBits(Reg, MRI, *TRI);
          OpdsMapping[I] = AMDGPU::getValueMapping(OpBank, Size);
        }
      }
      break;
    }
    case Intrinsic::amdgcn_s_prefetch_data: {
      OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
      OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
      break;
    }
    case Intrinsic::amdgcn_flat_prefetch:
    case Intrinsic::amdgcn_global_prefetch:
      return getDefaultMappingVOP(MI);
    default:
      return getInvalidInstructionMapping();
    }
    break;
  }
  case AMDGPU::G_SELECT: {
    unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
    unsigned Op2Bank =
        getRegBankID(MI.getOperand(2).getReg(), MRI, AMDGPU::SGPRRegBankID);
    unsigned Op3Bank =
        getRegBankID(MI.getOperand(3).getReg(), MRI, AMDGPU::SGPRRegBankID);
    bool SGPRSrcs =
        Op2Bank == AMDGPU::SGPRRegBankID && Op3Bank == AMDGPU::SGPRRegBankID;

    unsigned CondBankDefault =
        SGPRSrcs ? AMDGPU::SGPRRegBankID : AMDGPU::VCCRegBankID;
    unsigned CondBank =
        getRegBankID(MI.getOperand(1).getReg(), MRI, CondBankDefault);
    if (CondBank == AMDGPU::SGPRRegBankID)
      CondBank = SGPRSrcs ? AMDGPU::SGPRRegBankID : AMDGPU::VCCRegBankID;
    else if (CondBank == AMDGPU::VGPRRegBankID)
      CondBank = AMDGPU::VCCRegBankID;

    unsigned Bank = SGPRSrcs && CondBank == AMDGPU::SGPRRegBankID
                        ? AMDGPU::SGPRRegBankID
                        : AMDGPU::VGPRRegBankID;

    assert(CondBank == AMDGPU::VCCRegBankID ||
           CondBank == AMDGPU::SGPRRegBankID);

    // TODO: Should report 32-bit for scalar condition type.
    if (Size == 64) {
      OpdsMapping[0] = AMDGPU::getValueMappingSGPR64Only(Bank, Size);
      OpdsMapping[1] = AMDGPU::getValueMapping(CondBank, 1);
      OpdsMapping[2] = AMDGPU::getValueMappingSGPR64Only(Bank, Size);
      OpdsMapping[3] = AMDGPU::getValueMappingSGPR64Only(Bank, Size);
    } else {
      OpdsMapping[0] = AMDGPU::getValueMapping(Bank, Size);
      OpdsMapping[1] = AMDGPU::getValueMapping(CondBank, 1);
      OpdsMapping[2] = AMDGPU::getValueMapping(Bank, Size);
      OpdsMapping[3] = AMDGPU::getValueMapping(Bank, Size);
    }

    break;
  }

  case AMDGPU::G_SI_CALL: {
    OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 64);
    // Lie and claim everything is legal, even though some need to be
    // SGPRs. applyMapping will have to deal with it as a waterfall loop.
    OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);

    // Allow anything for implicit arguments
    for (unsigned I = 4; I < MI.getNumOperands(); ++I) {
      if (MI.getOperand(I).isReg()) {
        Register Reg = MI.getOperand(I).getReg();
        auto OpBank = getRegBankID(Reg, MRI);
        unsigned Size = getSizeInBits(Reg, MRI, *TRI);
        OpdsMapping[I] = AMDGPU::getValueMapping(OpBank, Size);
      }
    }
    break;
  }
  case AMDGPU::G_LOAD:
  case AMDGPU::G_ZEXTLOAD:
  case AMDGPU::G_SEXTLOAD:
    return getInstrMappingForLoad(MI);

  case AMDGPU::G_ATOMICRMW_XCHG:
  case AMDGPU::G_ATOMICRMW_ADD:
  case AMDGPU::G_ATOMICRMW_SUB:
  case AMDGPU::G_ATOMICRMW_AND:
  case AMDGPU::G_ATOMICRMW_OR:
  case AMDGPU::G_ATOMICRMW_XOR:
  case AMDGPU::G_ATOMICRMW_MAX:
  case AMDGPU::G_ATOMICRMW_MIN:
  case AMDGPU::G_ATOMICRMW_UMAX:
  case AMDGPU::G_ATOMICRMW_UMIN:
  case AMDGPU::G_ATOMICRMW_FADD:
  case AMDGPU::G_ATOMICRMW_FMIN:
  case AMDGPU::G_ATOMICRMW_FMAX:
  case AMDGPU::G_ATOMICRMW_UINC_WRAP:
  case AMDGPU::G_ATOMICRMW_UDEC_WRAP:
  case AMDGPU::G_ATOMICRMW_USUB_COND:
  case AMDGPU::G_ATOMICRMW_USUB_SAT:
  case AMDGPU::G_AMDGPU_ATOMIC_CMPXCHG: {
    OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
    OpdsMapping[1] = getValueMappingForPtr(MRI, MI.getOperand(1).getReg());
    OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
    break;
  }
  case AMDGPU::G_ATOMIC_CMPXCHG: {
    OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
    OpdsMapping[1] = getValueMappingForPtr(MRI, MI.getOperand(1).getReg());
    OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
    OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
    break;
  }
  case AMDGPU::G_BRCOND: {
    unsigned Bank =
        getRegBankID(MI.getOperand(0).getReg(), MRI, AMDGPU::SGPRRegBankID);
    assert(MRI.getType(MI.getOperand(0).getReg()).getSizeInBits() == 1);
    if (Bank != AMDGPU::SGPRRegBankID)
      Bank = AMDGPU::VCCRegBankID;

    OpdsMapping[0] = AMDGPU::getValueMapping(Bank, 1);
    break;
  }
  case AMDGPU::G_INTRINSIC_FPTRUNC_ROUND:
    return getDefaultMappingVOP(MI);
  case AMDGPU::G_PREFETCH:
    OpdsMapping[0] = getSGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
    break;
  case AMDGPU::G_AMDGPU_WHOLE_WAVE_FUNC_SETUP:
  case AMDGPU::G_AMDGPU_WHOLE_WAVE_FUNC_RETURN:
    OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
    break;
  case AMDGPU::G_AMDGPU_FLAT_LOAD_MONITOR:
  case AMDGPU::G_AMDGPU_GLOBAL_LOAD_MONITOR: {
    unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
    unsigned PtrSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
    OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
    OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, PtrSize);
    break;
  }
  }

  return getInstructionMapping(/*ID*/ 1, /*Cost*/ 1,
                               getOperandsMapping(OpdsMapping),
                               MI.getNumOperands());
}
static unsigned getIntrinsicID(const SDNode *N)
assert(UImm &&(UImm !=~static_cast< T >(0)) &&"Invalid immediate!")
Contains the definition of a TargetInstrInfo class that is common to all AMD GPUs.
constexpr LLT S16
constexpr LLT S1
constexpr LLT S32
constexpr LLT S64
AMDGPU Register Bank Select
static bool substituteSimpleCopyRegs(const AMDGPURegisterBankInfo::OperandsMapper &OpdMapper, unsigned OpIdx)
static unsigned regBankBoolUnion(unsigned RB0, unsigned RB1)
static std::pair< Register, unsigned > getBaseWithConstantOffset(MachineRegisterInfo &MRI, Register Reg)
static Register constrainRegToBank(MachineRegisterInfo &MRI, MachineIRBuilder &B, Register &Reg, const RegisterBank &Bank)
static std::pair< Register, Register > unpackV2S16ToS32(MachineIRBuilder &B, Register Src, unsigned ExtOpcode)
static void extendLow32IntoHigh32(MachineIRBuilder &B, Register Hi32Reg, Register Lo32Reg, unsigned ExtOpc, const RegisterBank &RegBank, bool IsBooleanSrc=false)
Implement extending a 32-bit value to a 64-bit value.
static unsigned getExtendOp(unsigned Opc)
static bool isVectorRegisterBank(const RegisterBank &Bank)
static unsigned regBankUnion(unsigned RB0, unsigned RB1)
static std::pair< LLT, LLT > splitUnequalType(LLT Ty, unsigned FirstSize)
Split Ty into 2 pieces.
static void setRegsToType(MachineRegisterInfo &MRI, ArrayRef< Register > Regs, LLT NewTy)
Replace the current type each register in Regs has with NewTy.
static void reinsertVectorIndexAdd(MachineIRBuilder &B, MachineInstr &IdxUseInstr, unsigned OpIdx, unsigned ConstOffset)
Utility function for pushing dynamic vector indexes with a constant offset into waterfall loops.
static LLT widen96To128(LLT Ty)
static LLT getHalfSizedType(LLT Ty)
static unsigned getSBufferLoadCorrespondingBufferLoadOpcode(unsigned Opc)
This file declares the targeting of the RegisterBankInfo class for AMDGPU.
Rewrite undef for PHI
MachineBasicBlock & MBB
MachineBasicBlock MachineBasicBlock::iterator DebugLoc DL
MachineBasicBlock MachineBasicBlock::iterator MBBI
#define X(NUM, ENUM, NAME)
Definition ELF.h:851
static GCRegistry::Add< OcamlGC > B("ocaml", "ocaml 3.10-compatible GC")
AMD GCN specific subclass of TargetSubtarget.
Declares convenience wrapper classes for interpreting MachineInstr instances as specific generic oper...
IRTranslator LLVM IR MI
const size_t AbstractManglingParser< Derived, Alloc >::NumOps
const AbstractManglingParser< Derived, Alloc >::OperatorInfo AbstractManglingParser< Derived, Alloc >::Ops[]
#define I(x, y, z)
Definition MD5.cpp:57
Contains matchers for matching SSA Machine Instructions.
This file declares the MachineIRBuilder class.
Register Reg
Promote Memory to Register
Definition Mem2Reg.cpp:110
static bool isReg(const MCInst &MI, unsigned OpNo)
MachineInstr unsigned OpIdx
ConstantRange Range(APInt(BitWidth, Low), APInt(BitWidth, High))
static constexpr MCPhysReg SPReg
Interface definition for SIRegisterInfo.
static TableGen::Emitter::Opt Y("gen-skeleton-entry", EmitSkeleton, "Generate example skeleton entry")
bool applyMappingDynStackAlloc(MachineIRBuilder &B, const OperandsMapper &OpdMapper, MachineInstr &MI) const
std::pair< Register, unsigned > splitBufferOffsets(MachineIRBuilder &B, Register Offset) const
bool collectWaterfallOperands(SmallSet< Register, 4 > &SGPROperandRegs, MachineInstr &MI, MachineRegisterInfo &MRI, ArrayRef< unsigned > OpIndices) const
const InstructionMapping & getImageMapping(const MachineRegisterInfo &MRI, const MachineInstr &MI, int RsrcIdx) const
InstructionMappings addMappingFromTable(const MachineInstr &MI, const MachineRegisterInfo &MRI, const std::array< unsigned, NumOps > RegSrcOpIdx, ArrayRef< OpRegBankEntry< NumOps > > Table) const
unsigned copyCost(const RegisterBank &A, const RegisterBank &B, TypeSize Size) const override
Get the cost of a copy from B to A, or put differently, get the cost of A = COPY B.
RegisterBankInfo::InstructionMappings getInstrAlternativeMappingsIntrinsicWSideEffects(const MachineInstr &MI, const MachineRegisterInfo &MRI) const
bool buildVCopy(MachineIRBuilder &B, Register DstReg, Register SrcReg) const
bool executeInWaterfallLoop(MachineIRBuilder &B, iterator_range< MachineBasicBlock::iterator > Range, SmallSet< Register, 4 > &SGPROperandRegs) const
Legalize instruction MI where operands in OpIndices must be SGPRs.
const RegisterBank & getRegBankFromRegClass(const TargetRegisterClass &RC, LLT) const override
Get a register bank that covers RC.
AMDGPURegisterBankInfo(const GCNSubtarget &STI)
bool applyMappingMAD_64_32(MachineIRBuilder &B, const OperandsMapper &OpdMapper) const
unsigned getRegBankID(Register Reg, const MachineRegisterInfo &MRI, unsigned Default=AMDGPU::VGPRRegBankID) const
Register handleD16VData(MachineIRBuilder &B, MachineRegisterInfo &MRI, Register Reg) const
Handle register layout difference for f16 images for some subtargets.
const RegisterBankInfo::InstructionMapping & getInstrMappingForLoad(const MachineInstr &MI) const
void applyMappingImpl(MachineIRBuilder &Builder, const OperandsMapper &OpdMapper) const override
See RegisterBankInfo::applyMapping.
bool applyMappingBFE(MachineIRBuilder &B, const OperandsMapper &OpdMapper, bool Signed) const
bool applyMappingImage(MachineIRBuilder &B, MachineInstr &MI, const OperandsMapper &OpdMapper, int RSrcIdx) const
const ValueMapping * getVGPROpMapping(Register Reg, const MachineRegisterInfo &MRI, const TargetRegisterInfo &TRI) const
bool isScalarLoadLegal(const MachineInstr &MI) const
unsigned setBufferOffsets(MachineIRBuilder &B, Register CombinedOffset, Register &VOffsetReg, Register &SOffsetReg, int64_t &InstOffsetVal, Align Alignment) const
const ValueMapping * getSGPROpMapping(Register Reg, const MachineRegisterInfo &MRI, const TargetRegisterInfo &TRI) const
bool applyMappingLoad(MachineIRBuilder &B, const OperandsMapper &OpdMapper, MachineInstr &MI) const
void split64BitValueForMapping(MachineIRBuilder &B, SmallVector< Register, 2 > &Regs, LLT HalfTy, Register Reg) const
Split 64-bit value Reg into two 32-bit halves and populate them into Regs.
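At the value level, the 64-bit split performed by split64BitValueForMapping amounts to extracting the low and high 32-bit halves. A minimal standalone sketch of that bit arithmetic (plain C++, no LLVM dependencies; the function name is illustrative, not the LLVM API):

```cpp
#include <cstdint>
#include <utility>

// Split a 64-bit value into its low and high 32-bit halves, mirroring the
// per-register effect of splitting a 64-bit vreg into two 32-bit vregs.
// Illustrative sketch only; the real helper operates on virtual registers.
std::pair<uint32_t, uint32_t> splitHalves(uint64_t Value) {
  uint32_t Lo = static_cast<uint32_t>(Value & 0xffffffffu);
  uint32_t Hi = static_cast<uint32_t>(Value >> 32);
  return {Lo, Hi};
}
```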
const ValueMapping * getValueMappingForPtr(const MachineRegisterInfo &MRI, Register Ptr) const
Return the mapping for a pointer argument.
unsigned getMappingType(const MachineRegisterInfo &MRI, const MachineInstr &MI) const
RegisterBankInfo::InstructionMappings getInstrAlternativeMappingsIntrinsic(const MachineInstr &MI, const MachineRegisterInfo &MRI) const
bool isDivergentRegBank(const RegisterBank *RB) const override
Returns true if the register bank is considered divergent.
void constrainOpWithReadfirstlane(MachineIRBuilder &B, MachineInstr &MI, unsigned OpIdx) const
InstructionMappings getInstrAlternativeMappings(const MachineInstr &MI) const override
Get the alternative mappings for MI.
const InstructionMapping & getDefaultMappingSOP(const MachineInstr &MI) const
const InstructionMapping & getDefaultMappingAllVGPR(const MachineInstr &MI) const
const InstructionMapping & getInstrMapping(const MachineInstr &MI) const override
This function must return a legal mapping, because AMDGPURegisterBankInfo::getInstrAlternativeMapping...
unsigned getBreakDownCost(const ValueMapping &ValMapping, const RegisterBank *CurBank=nullptr) const override
Get the cost of using ValMapping to decompose a register.
const ValueMapping * getAGPROpMapping(Register Reg, const MachineRegisterInfo &MRI, const TargetRegisterInfo &TRI) const
const InstructionMapping & getDefaultMappingVOP(const MachineInstr &MI) const
bool isSALUMapping(const MachineInstr &MI) const
Register buildReadFirstLane(MachineIRBuilder &B, MachineRegisterInfo &MRI, Register Src) const
bool applyMappingSBufferLoad(MachineIRBuilder &B, const OperandsMapper &OpdMapper) const
void applyMappingSMULU64(MachineIRBuilder &B, const OperandsMapper &OpdMapper) const
static const LaneMaskConstants & get(const GCNSubtarget &ST)
ArrayRef - Represent a constant reference to an array (0 or more elements consecutively in memory),...
Definition ArrayRef.h:40
Predicate
This enumeration lists the possible predicates for CmpInst subclasses.
Definition InstrTypes.h:676
@ ICMP_SLT
signed less than
Definition InstrTypes.h:705
@ ICMP_NE
not equal
Definition InstrTypes.h:698
A debug info location.
Definition DebugLoc.h:123
iterator find(const_arg_type_t< KeyT > Val)
Definition DenseMap.h:178
iterator end()
Definition DenseMap.h:81
std::pair< iterator, bool > insert(const std::pair< KeyT, ValueT > &KV)
Definition DenseMap.h:241
static constexpr ElementCount getFixed(ScalarTy MinVal)
Definition TypeSize.h:309
Abstract class that contains various methods for clients to notify about changes.
constexpr unsigned getScalarSizeInBits() const
constexpr bool isScalar() const
LLT getScalarType() const
static constexpr LLT scalar(unsigned SizeInBits)
Get a low-level scalar or aggregate "bag of bits".
constexpr uint16_t getNumElements() const
Returns the number of elements in a vector LLT.
constexpr bool isVector() const
constexpr TypeSize getSizeInBits() const
Returns the total size of the type. Must only be called on sized types.
LLT divide(int Factor) const
Return a type that is Factor times smaller.
constexpr unsigned getAddressSpace() const
static constexpr LLT fixed_vector(unsigned NumElements, unsigned ScalarSizeInBits)
Get a low-level fixed-width vector of some number of elements and element width.
LLT getElementType() const
Returns the vector's element type. Only valid for vector types.
static constexpr LLT scalarOrVector(ElementCount EC, LLT ScalarTy)
This is an important class for using LLVM in a threaded context.
Definition LLVMContext.h:68
LLVM_ABI void widenScalarSrc(MachineInstr &MI, LLT WideTy, unsigned OpIdx, unsigned ExtOpcode)
Legalize a single operand OpIdx of the machine instruction MI as a Use by extending the operand's typ...
LLVM_ABI LegalizeResult lowerAbsToMaxNeg(MachineInstr &MI)
LLVM_ABI LegalizeResult narrowScalar(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy)
Legalize an instruction by reducing the width of the underlying scalar type.
LLVM_ABI LegalizeResult reduceLoadStoreWidth(GLoadStore &MI, unsigned TypeIdx, LLT NarrowTy)
@ Legalized
Instruction has been legalized and the MachineFunction changed.
LLVM_ABI LegalizeResult fewerElementsVector(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy)
Legalize a vector instruction by splitting into multiple components, each acting on the same scalar t...
LLVM_ABI LegalizeResult widenScalar(MachineInstr &MI, unsigned TypeIdx, LLT WideTy)
Legalize an instruction by performing the operation on a wider scalar type (for example a 16-bit addi...
LLVM_ABI void widenScalarDst(MachineInstr &MI, LLT WideTy, unsigned OpIdx=0, unsigned TruncOpcode=TargetOpcode::G_TRUNC)
Legalize a single operand OpIdx of the machine instruction MI as a Def by extending the operand's typ...
TypeSize getValue() const
LLVM_ABI void transferSuccessorsAndUpdatePHIs(MachineBasicBlock *FromMBB)
Transfers all the successors, as in transferSuccessors, and update PHI operands in the successor bloc...
LLVM_ABI iterator getFirstTerminator()
Returns an iterator to the first terminator instruction of this basic block.
LLVM_ABI void addSuccessor(MachineBasicBlock *Succ, BranchProbability Prob=BranchProbability::getUnknown())
Add Succ as a successor of this MachineBasicBlock.
const MachineFunction * getParent() const
Return the MachineFunction containing this basic block.
void splice(iterator Where, MachineBasicBlock *Other, iterator From)
Take an instruction from MBB 'Other' at the position From, and insert it into this MBB right before '...
MachineInstrBundleIterator< MachineInstr > iterator
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
BasicBlockListType::iterator iterator
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineInstr - Allocate a new MachineInstr.
void insert(iterator MBBI, MachineBasicBlock *MBB)
Helper class to build MachineInstr.
const MachineInstrBuilder & addReg(Register RegNo, RegState Flags={}, unsigned SubReg=0) const
Add a new virtual register operand.
MachineInstrSpan provides an interface to get an iteration range containing the instruction it was in...
MachineBasicBlock::iterator begin()
MachineBasicBlock::iterator end()
Representation of each machine instruction.
const MachineBasicBlock * getParent() const
const MachineOperand & getOperand(unsigned i) const
A description of a memory reference used in the backend.
LocationSize getSize() const
Return the size in bytes of the memory reference.
unsigned getAddrSpace() const
bool isAtomic() const
Returns true if this operation has an atomic ordering requirement of unordered or higher,...
@ MODereferenceable
The memory access is dereferenceable (i.e., doesn't trap).
@ MOLoad
The memory access reads data.
@ MOInvariant
The memory access always returns the same value (or traps).
Flags getFlags() const
Return the raw flags of the source value.
LLVM_ABI Align getAlign() const
Return the minimum known alignment in bytes of the actual memory reference.
MachineOperand class - Representation of each machine instruction operand.
LLVM_ABI void setReg(Register Reg)
Change the register this operand corresponds to.
Register getReg() const
getReg - Returns the register number.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
const RegClassOrRegBank & getRegClassOrRegBank(Register Reg) const
Return the register bank or register class of Reg.
LLVM_ABI Register createVirtualRegister(const TargetRegisterClass *RegClass, StringRef Name="")
createVirtualRegister - Create and return a new virtual register in the function with the specified r...
LLT getType(Register Reg) const
Get the low-level type of Reg or LLT{} if Reg is not a generic (target independent) virtual register.
const RegisterBank * getRegBankOrNull(Register Reg) const
Return the register bank of Reg, or null if Reg has not been assigned a register bank or has been ass...
LLVM_ABI void setRegBank(Register Reg, const RegisterBank &RegBank)
Set the register bank to RegBank for Reg.
LLVM_ABI void setType(Register VReg, LLT Ty)
Set the low-level type of VReg to Ty.
LLVM_ABI void setRegClass(Register Reg, const TargetRegisterClass *RC)
setRegClass - Set the register class of the specified virtual register.
LLVM_ABI Register createGenericVirtualRegister(LLT Ty, StringRef Name="")
Create and return a new generic virtual register with low-level type Ty.
void setSimpleHint(Register VReg, Register PrefReg)
Specify the preferred (target independent) register allocation hint for the specified virtual registe...
Helper class that represents how the value of an instruction may be mapped and what is the related co...
bool isValid() const
Check whether this object is valid.
Helper class used to get/create the virtual registers that will be used to replace the MachineOperand...
const InstructionMapping & getInstrMapping() const
The final mapping of the instruction.
MachineRegisterInfo & getMRI() const
The MachineRegisterInfo we used to realize the mapping.
iterator_range< SmallVectorImpl< Register >::const_iterator > getVRegs(unsigned OpIdx, bool ForDebug=false) const
Get all the virtual registers required to map the OpIdx-th operand of the instruction.
virtual InstructionMappings getInstrAlternativeMappings(const MachineInstr &MI) const
Get the alternative mappings for MI.
static const TargetRegisterClass * constrainGenericRegister(Register Reg, const TargetRegisterClass &RC, MachineRegisterInfo &MRI)
Constrain the (possibly generic) virtual register Reg to RC.
const InstructionMapping & getInstructionMapping(unsigned ID, unsigned Cost, const ValueMapping *OperandsMapping, unsigned NumOperands) const
Method to get a uniquely generated InstructionMapping.
static void applyDefaultMapping(const OperandsMapper &OpdMapper)
Helper method to apply something that is like the default mapping.
const ValueMapping & getValueMapping(unsigned StartIdx, unsigned Length, const RegisterBank &RegBank) const
The most common ValueMapping consists of a single PartialMapping.
const InstructionMapping & getInvalidInstructionMapping() const
Method to get a uniquely generated invalid InstructionMapping.
const RegisterBank & getRegBank(unsigned ID)
Get the register bank identified by ID.
const unsigned * Sizes
Hold the sizes of the register banks for all HwModes.
bool cannotCopy(const RegisterBank &Dst, const RegisterBank &Src, TypeSize Size) const
TypeSize getSizeInBits(Register Reg, const MachineRegisterInfo &MRI, const TargetRegisterInfo &TRI) const
Get the size in bits of Reg.
const ValueMapping * getOperandsMapping(Iterator Begin, Iterator End) const
Get the uniquely generated array of ValueMapping for the elements between Begin and End.
SmallVector< const InstructionMapping *, 4 > InstructionMappings
Convenient type to represent the alternatives for mapping an instruction.
virtual unsigned copyCost(const RegisterBank &A, const RegisterBank &B, TypeSize Size) const
Get the cost of a copy from B to A, or put differently, get the cost of A = COPY B.
const InstructionMapping & getInstrMappingImpl(const MachineInstr &MI) const
Try to get the mapping of MI.
This class implements the register bank concept.
unsigned getID() const
Get the identifier of this register bank.
Wrapper class representing virtual and physical registers.
Definition Register.h:20
constexpr bool isVirtual() const
Return true if the specified register number is in the virtual register namespace.
Definition Register.h:79
static unsigned getMaxMUBUFImmOffset(const GCNSubtarget &ST)
This class keeps track of the SPI_SP_INPUT_ADDR config register, which tells the hardware which inter...
bool selectAGPRFormMFMA(unsigned NumRegs) const
Return true if an MFMA that requires at least NumRegs should select to the AGPR form,...
static bool shouldExpandVectorDynExt(unsigned EltSize, unsigned NumElem, bool IsDivergentIdx, const GCNSubtarget *Subtarget)
Check if EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT (<n x e>, var-idx) should be expanded into a set of cmp...
SmallSet - This maintains a set of unique values, optimizing for the case when the set is small (less...
Definition SmallSet.h:134
size_type count(const T &V) const
count - Return 1 if the element is in the set, 0 otherwise.
Definition SmallSet.h:176
bool empty() const
Definition SmallSet.h:169
std::pair< const_iterator, bool > insert(const T &V)
insert - Insert an element into the set if it isn't already there.
Definition SmallSet.h:184
void resize(size_type N)
void push_back(const T &Elt)
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
Register getReg() const
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
static constexpr TypeSize getFixed(ScalarTy ExactSize)
Definition TypeSize.h:343
static LLVM_ABI IntegerType * getInt32Ty(LLVMContext &C)
Definition Type.cpp:313
self_iterator getIterator()
Definition ilist_node.h:123
A range adaptor for a pair of iterators.
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
@ CONSTANT_ADDRESS_32BIT
Address space for 32-bit constant memory.
@ REGION_ADDRESS
Address space for region memory. (GDS)
@ LOCAL_ADDRESS
Address space for local memory.
@ CONSTANT_ADDRESS
Address space for constant memory (VTX2).
@ PRIVATE_ADDRESS
Address space for private memory.
@ BUFFER_RESOURCE
Address space for 128-bit buffer resources.
bool isFlatGlobalAddrSpace(unsigned AS)
bool isUniformMMO(const MachineMemOperand *MMO)
bool isExtendedGlobalAddrSpace(unsigned AS)
Intrinsic::ID getIntrinsicID(const MachineInstr &I)
Return the intrinsic ID for opcodes with the G_AMDGPU_INTRIN_ prefix.
std::pair< Register, unsigned > getBaseWithConstantOffset(MachineRegisterInfo &MRI, Register Reg, GISelValueTracking *ValueTracking=nullptr, bool CheckNUW=false)
Returns the base register and constant offset.
const RsrcIntrinsic * lookupRsrcIntrinsic(unsigned Intr)
operand_type_match m_Reg()
SpecificConstantMatch m_ZeroInt()
Convenience matchers for specific integer values.
ConstantMatch< APInt > m_ICst(APInt &Cst)
BinaryOp_match< LHS, RHS, TargetOpcode::G_ADD, true > m_GAdd(const LHS &L, const RHS &R)
bool mi_match(Reg R, const MachineRegisterInfo &MRI, Pattern &&P)
SpecificConstantOrSplatMatch m_SpecificICstOrSplat(const APInt &RequestedValue)
Matches a RequestedValue constant or a constant splat of RequestedValue.
This is an optimization pass for GlobalISel generic memory operations.
@ Offset
Definition DWP.cpp:532
LLVM_ABI MachineInstr * getOpcodeDef(unsigned Opcode, Register Reg, const MachineRegisterInfo &MRI)
See if Reg is defined by a single def instruction that is Opcode.
Definition Utils.cpp:652
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
@ Kill
The last use of a register.
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:643
LLVM_ABI void constrainSelectedInstRegOperands(MachineInstr &I, const TargetInstrInfo &TII, const TargetRegisterInfo &TRI, const RegisterBankInfo &RBI)
Mutate the newly-selected instruction I to constrain its (possibly generic) virtual register operands...
Definition Utils.cpp:155
iterator_range< T > make_range(T x, T y)
Convenience function for iterating over sub-ranges.
LLVM_ABI std::optional< int64_t > getIConstantVRegSExtVal(Register VReg, const MachineRegisterInfo &MRI)
If VReg is defined by a G_CONSTANT that fits in int64_t, returns it.
Definition Utils.cpp:313
static const MachineMemOperand::Flags MONoClobber
Mark the MMO of a uniform load if there are no potentially clobbering stores on any path from the sta...
Definition SIInstrInfo.h:44
auto reverse(ContainerTy &&C)
Definition STLExtras.h:408
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
bool isa(const From &Val)
isa<X> - Return true if the parameter to the template is an instance of one of the template type argu...
Definition Casting.h:547
@ Add
Sum of integers.
DWARFExpression::Operation Op
void call_once(once_flag &flag, Function &&F, Args &&... ArgList)
Execute the function specified as a parameter once.
Definition Threading.h:86
decltype(auto) cast(const From &Val)
cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:559
LLVM_ABI std::optional< ValueAndVReg > getIConstantVRegValWithLookThrough(Register VReg, const MachineRegisterInfo &MRI, bool LookThroughInstrs=true)
If VReg is defined by a statically evaluable chain of instructions rooted on a G_CONSTANT returns its...
Definition Utils.cpp:432
Align assumeAligned(uint64_t Value)
Treats the value 0 as a 1, so Align is always at least 1.
Definition Alignment.h:100
unsigned Log2(Align A)
Returns the log2 of the alignment.
Definition Alignment.h:197
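Both alignment helpers above reduce to simple bit arithmetic on power-of-two values: assumeAligned maps 0 to 1, and Log2 of a power of two is its trailing-zero count. A hedged standalone sketch of that arithmetic (plain C++; not the LLVM implementation, which works on the Align wrapper type):

```cpp
#include <cassert>
#include <cstdint>

// Treat an alignment value of 0 as 1, as assumeAligned does.
uint64_t assumeAlignedValue(uint64_t Value) {
  return Value == 0 ? 1 : Value;
}

// Log2 of a power-of-two alignment: count the trailing zero bits.
unsigned alignLog2(uint64_t A) {
  assert(A != 0 && (A & (A - 1)) == 0 && "alignment must be a power of two");
  unsigned Shift = 0;
  while ((A & 1) == 0) {
    A >>= 1;
    ++Shift;
  }
  return Shift;
}
```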
LLVM_ABI Register getSrcRegIgnoringCopies(Register Reg, const MachineRegisterInfo &MRI)
Find the source register for Reg, folding away any trivial copies.
Definition Utils.cpp:500
constexpr T maskTrailingOnes(unsigned N)
Create a bitmask with the N right-most bits set to 1, and all other bits set to 0.
Definition MathExtras.h:77
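The mask construction above has one subtlety: shifting by the full bit width of the type is undefined behavior in C++, so N equal to the width needs a special case. A standalone sketch of the technique (the name carries a Sketch suffix to mark it as illustrative, not LLVM's maskTrailingOnes):

```cpp
#include <cstdint>
#include <limits>

// Create a bitmask with the N right-most bits set to 1, all other bits 0.
// Shifting by the full width of T is UB, so handle N >= Bits explicitly.
template <typename T>
constexpr T maskTrailingOnesSketch(unsigned N) {
  constexpr unsigned Bits = std::numeric_limits<T>::digits;
  return N >= Bits ? ~T(0) : static_cast<T>((T(1) << N) - 1);
}
```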
@ Default
The result value is uniform if and only if all operands are uniform.
Definition Uniformity.h:20
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77
This class contains a discriminated union of information about pointers in memory operands,...
unsigned StartIdx
Number of bits at which this partial mapping starts in the original value.
const RegisterBank * RegBank
Register bank where the partial value lives.
unsigned Length
Length of this mapping in bits.
Helper struct that represents how a value is mapped through different register banks.
unsigned NumBreakDowns
Number of partial mapping to break down this value.
const PartialMapping * BreakDown
How the value is broken down between the different register banks.
The llvm::once_flag structure.
Definition Threading.h:67