AMDGPURegisterBankInfo.cpp
1//===- AMDGPURegisterBankInfo.cpp -------------------------------*- C++ -*-==//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8/// \file
9/// This file implements the targeting of the RegisterBankInfo class for
10/// AMDGPU.
11///
12/// \par
13///
14/// AMDGPU has unique register bank constraints that require special high level
15/// strategies to deal with. There are two main true physical register banks:
16/// VGPR (vector) and SGPR (scalar). Additionally, the VCC register bank is a
17/// sort of pseudo-register bank needed to represent SGPRs used in a vector
18/// boolean context. There is also the AGPR bank, which is a special purpose
19/// physical register bank present on some subtargets.
20///
21/// Copying from VGPR to SGPR is generally illegal, unless the value is known to
22/// be uniform. It is generally not valid to legalize operands by inserting
23/// copies as on other targets. Operations which require uniform, SGPR operands
24/// generally require scalarization by repeatedly executing the instruction,
25/// activating each set of lanes using a unique set of input values. This is
26/// referred to as a waterfall loop.
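/// For example (illustrative only): a buffer load whose resource descriptor
/// was computed in VGPRs cannot simply copy the descriptor to SGPRs. Instead,
/// the load is wrapped in a waterfall loop that reads the descriptor for the
/// first active lane with v_readfirstlane, compares it against every lane's
/// value to form an execution mask, runs the load for the matching lanes, and
/// repeats until all lanes have been covered.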
27///
28/// \par Booleans
29///
30/// Booleans (s1 values) require special consideration. A vector compare result
31/// is naturally a bitmask with one bit per lane, in a 32 or 64-bit
32/// register. These are represented with the VCC bank. During selection, we need
33/// to be able to unambiguously go back from a register class to a register
34/// bank. To distinguish whether an SGPR should use the SGPR or VCC register
35/// bank, we need to know the use context type. An SGPR s1 value always means a
36/// VCC bank value, otherwise it will be the SGPR bank. A scalar compare sets
37/// SCC, which is a 1-bit unaddressable register. This will need to be copied to
38/// a 32-bit virtual register. Taken together, this means we need to adjust the
39/// type of boolean operations to be regbank legal. All SALU booleans need to be
40/// widened to 32-bits, and all VALU booleans need to be s1 values.
41///
42/// A noteworthy exception to the s1-means-vcc rule is for legalization artifact
43/// casts. G_TRUNC s1 results, and G_SEXT/G_ZEXT/G_ANYEXT sources are never vcc
44/// bank. A non-boolean source (such as a truncate from a 1-bit load from
45/// memory) will require a copy to the VCC bank which will require clearing the
46/// high bits and inserting a compare.
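/// A rough sketch of the two compare cases described above, as they look after
/// RegBankSelect (illustrative MIR; register numbering and surrounding code
/// will differ):
///   %sc:sgpr(s32) = G_ICMP intpred(eq), %a:sgpr(s32), %b:sgpr(s32)  ; SALU bool, widened to s32
///   %vc:vcc(s1)   = G_ICMP intpred(eq), %x:vgpr(s32), %y:vgpr(s32)  ; VALU bool, s1 in the VCC bank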
47///
48/// \par Constant bus restriction
49///
50/// VALU instructions have a limitation known as the constant bus
51/// restriction. Most VALU instructions can use SGPR operands, but may read at
52/// most 1 SGPR or constant literal value (this increases to 2 in gfx10 for most
53/// instructions). This is one unique SGPR, so the same SGPR may be used for
54/// multiple operands. From a register bank perspective, any combination of
55/// operands should be legal as an SGPR, but this is contextually dependent on
56/// the SGPR operands all being the same register. It is therefore optimal to
57/// choose the SGPR with the most uses to minimize the number of copies.
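/// For example (sketch, assuming a single constant bus slot): "v_add_f32 v0,
/// s0, s1" would read two different SGPRs and is illegal, so one operand must
/// first be copied to a VGPR:
///   v_mov_b32 v1, s1
///   v_add_f32 v0, s0, v1
/// whereas "v_add_f32 v0, s0, s0" is fine because only one unique SGPR is read.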
58///
59/// We avoid trying to solve this problem in RegBankSelect. Any VALU G_*
60/// operation should have its source operands all mapped to VGPRs (except for
61/// VCC), inserting copies from any SGPR operands. This is the most trivial legal
62/// mapping. Anything beyond the simplest 1:1 instruction selection would be too
63/// complicated to solve here. Every optimization pattern or instruction
64/// selected to multiple outputs would have to enforce this rule, and there
65/// would be additional complexity in tracking this rule for every G_*
66/// operation. By forcing all inputs to VGPRs, it also simplifies the task of
67/// picking the optimal operand combination from a post-isel optimization pass.
68///
69//===----------------------------------------------------------------------===//
70
72
73#include "AMDGPU.h"
75#include "AMDGPUInstrInfo.h"
76#include "AMDGPULaneMaskUtils.h"
77#include "GCNSubtarget.h"
79#include "SIRegisterInfo.h"
85#include "llvm/IR/IntrinsicsAMDGPU.h"
86
87#define GET_TARGET_REGBANK_IMPL
88#include "AMDGPUGenRegisterBank.inc"
89
90// This file will be TableGen'ed at some point.
91#include "AMDGPUGenRegisterBankInfo.def"
92
93using namespace llvm;
94using namespace MIPatternMatch;
95
96namespace {
97
98// Observer to apply a register bank to new registers created by LegalizerHelper.
99class ApplyRegBankMapping final : public GISelChangeObserver {
100private:
102 const AMDGPURegisterBankInfo &RBI;
104 const RegisterBank *NewBank;
106
107public:
108 ApplyRegBankMapping(MachineIRBuilder &B, const AMDGPURegisterBankInfo &RBI_,
109 MachineRegisterInfo &MRI_, const RegisterBank *RB)
110 : B(B), RBI(RBI_), MRI(MRI_), NewBank(RB) {
111 assert(!B.isObservingChanges());
112 B.setChangeObserver(*this);
113 }
114
115 ~ApplyRegBankMapping() override {
116 for (MachineInstr *MI : NewInsts)
117 applyBank(*MI);
118
119 B.stopObservingChanges();
120 }
121
122 /// Apply the mapping's register bank to any registers that don't already have
123 /// a register class or bank set.
123 void applyBank(MachineInstr &MI) {
124 const unsigned Opc = MI.getOpcode();
125 if (Opc == AMDGPU::G_ANYEXT || Opc == AMDGPU::G_ZEXT ||
126 Opc == AMDGPU::G_SEXT) {
127 // LegalizerHelper wants to use the basic legalization artifacts when
128 // widening etc. We don't handle selection with vcc in artifact sources,
129 // so we need to use a select instead to handle these properly.
130 Register DstReg = MI.getOperand(0).getReg();
131 Register SrcReg = MI.getOperand(1).getReg();
132 const RegisterBank *SrcBank = RBI.getRegBank(SrcReg, MRI, *RBI.TRI);
133 if (SrcBank == &AMDGPU::VCCRegBank) {
134 const LLT S32 = LLT::scalar(32);
135 assert(MRI.getType(SrcReg) == LLT::scalar(1));
136 assert(MRI.getType(DstReg) == S32);
137 assert(NewBank == &AMDGPU::VGPRRegBank);
138
139 // Replace the extension with a select, which really uses the boolean
140 // source.
141 B.setInsertPt(*MI.getParent(), MI);
142
143 auto True = B.buildConstant(S32, Opc == AMDGPU::G_SEXT ? -1 : 1);
144 auto False = B.buildConstant(S32, 0);
145 B.buildSelect(DstReg, SrcReg, True, False);
146 MRI.setRegBank(True.getReg(0), *NewBank);
147 MRI.setRegBank(False.getReg(0), *NewBank);
148 MI.eraseFromParent();
149 }
150
151 assert(!MRI.getRegClassOrRegBank(DstReg));
152 MRI.setRegBank(DstReg, *NewBank);
153 return;
154 }
155
156#ifndef NDEBUG
157 if (Opc == AMDGPU::G_TRUNC) {
158 Register DstReg = MI.getOperand(0).getReg();
159 const RegisterBank *DstBank = RBI.getRegBank(DstReg, MRI, *RBI.TRI);
160 assert(DstBank != &AMDGPU::VCCRegBank);
161 }
162#endif
163
164 for (MachineOperand &Op : MI.operands()) {
165 if (!Op.isReg())
166 continue;
167
168 // We may see physical registers if building a real MI
169 Register Reg = Op.getReg();
170 if (Reg.isPhysical() || MRI.getRegClassOrRegBank(Reg))
171 continue;
172
173 const RegisterBank *RB = NewBank;
174 if (MRI.getType(Reg) == LLT::scalar(1)) {
175 assert(NewBank == &AMDGPU::VGPRRegBank &&
176 "s1 operands should only be used for vector bools");
177 assert((MI.getOpcode() != AMDGPU::G_TRUNC &&
178 MI.getOpcode() != AMDGPU::G_ANYEXT) &&
179 "not expecting legalization artifacts here");
180 RB = &AMDGPU::VCCRegBank;
181 }
182
183 MRI.setRegBank(Reg, *RB);
184 }
185 }
186
187 void erasingInstr(MachineInstr &MI) override {}
188
189 void createdInstr(MachineInstr &MI) override {
190 // At this point, the instruction was just inserted and has no operands.
191 NewInsts.push_back(&MI);
192 }
193
194 void changingInstr(MachineInstr &MI) override {}
195 void changedInstr(MachineInstr &MI) override {
196 // FIXME: In principle we should probably add the instruction to NewInsts,
197 // but the way the LegalizerHelper uses the observer, we will always see the
198 // registers we need to set the regbank on also referenced in a new
199 // instruction.
200 }
201};
202
203} // anonymous namespace
204
206 : Subtarget(ST), TRI(Subtarget.getRegisterInfo()),
207 TII(Subtarget.getInstrInfo()) {
208
209 // HACK: Until this is fully tablegen'd.
210 static llvm::once_flag InitializeRegisterBankFlag;
211
212 static auto InitializeRegisterBankOnce = [this]() {
213 assert(&getRegBank(AMDGPU::SGPRRegBankID) == &AMDGPU::SGPRRegBank &&
214 &getRegBank(AMDGPU::VGPRRegBankID) == &AMDGPU::VGPRRegBank &&
215 &getRegBank(AMDGPU::AGPRRegBankID) == &AMDGPU::AGPRRegBank);
216 (void)this;
217 };
218
219 llvm::call_once(InitializeRegisterBankFlag, InitializeRegisterBankOnce);
220}
221
222static bool isVectorRegisterBank(const RegisterBank &Bank) {
223 unsigned BankID = Bank.getID();
224 return BankID == AMDGPU::VGPRRegBankID || BankID == AMDGPU::AGPRRegBankID;
225}
226
228 return RB != &AMDGPU::SGPRRegBank;
229}
230
232 const RegisterBank &Src,
233 TypeSize Size) const {
234 // TODO: Should there be a UniformVGPRRegBank which can use readfirstlane?
235 if (Dst.getID() == AMDGPU::SGPRRegBankID &&
236 (isVectorRegisterBank(Src) || Src.getID() == AMDGPU::VCCRegBankID)) {
237 return std::numeric_limits<unsigned>::max();
238 }
239
240 // Bool values are tricky, because the meaning is based on context. The SCC
241 // and VCC banks are for the natural scalar and vector conditions produced by
242 // a compare.
243 //
244 // Legalization doesn't know about the necessary context, so an s1 use may
245 // have been a truncate from an arbitrary value, in which case a copy (lowered
246 // as a compare with 0) needs to be inserted.
247 if (Size == 1 &&
248 (Dst.getID() == AMDGPU::SGPRRegBankID) &&
249 (isVectorRegisterBank(Src) ||
250 Src.getID() == AMDGPU::SGPRRegBankID ||
251 Src.getID() == AMDGPU::VCCRegBankID))
252 return std::numeric_limits<unsigned>::max();
253
254 // There is no direct copy between AGPRs.
255 if (Dst.getID() == AMDGPU::AGPRRegBankID &&
256 Src.getID() == AMDGPU::AGPRRegBankID)
257 return 4;
258
259 return RegisterBankInfo::copyCost(Dst, Src, Size);
260}
261
263 const ValueMapping &ValMapping,
264 const RegisterBank *CurBank) const {
265 // Check if this is a breakdown for G_LOAD to move the pointer from SGPR to
266 // VGPR.
267 // FIXME: Is there a better way to do this?
268 if (ValMapping.NumBreakDowns >= 2 || ValMapping.BreakDown[0].Length >= 64)
269 return 10; // This is expensive.
270
271 assert(ValMapping.NumBreakDowns == 2 &&
272 ValMapping.BreakDown[0].Length == 32 &&
273 ValMapping.BreakDown[0].StartIdx == 0 &&
274 ValMapping.BreakDown[1].Length == 32 &&
275 ValMapping.BreakDown[1].StartIdx == 32 &&
276 ValMapping.BreakDown[0].RegBank == ValMapping.BreakDown[1].RegBank);
277
278 // 32-bit extract of a 64-bit value is just access of a subregister, so free.
279 // TODO: Cost of 0 hits assert, though it's not clear it's what we really
280 // want.
281
282 // TODO: 32-bit insert to a 64-bit SGPR may incur a non-free copy due to SGPR
283 // alignment restrictions, but this probably isn't important.
284 return 1;
285}
286
287const RegisterBank &
289 LLT Ty) const {
290 // We promote real scalar booleans to SReg_32. Any SGPR using s1 is really a
291 // VCC-like use.
292 if (TRI->isSGPRClass(&RC)) {
293 // FIXME: This probably came from a copy from a physical register, which
294 // should be inferable from the copied to-type. We don't have many boolean
295 // physical register constraints so just assume a normal SGPR for now.
296 if (!Ty.isValid())
297 return AMDGPU::SGPRRegBank;
298
299 return Ty == LLT::scalar(1) ? AMDGPU::VCCRegBank : AMDGPU::SGPRRegBank;
300 }
301
302 return TRI->isAGPRClass(&RC) ? AMDGPU::AGPRRegBank : AMDGPU::VGPRRegBank;
303}
304
305template <unsigned NumOps>
308 const MachineInstr &MI, const MachineRegisterInfo &MRI,
309 const std::array<unsigned, NumOps> RegSrcOpIdx,
310 ArrayRef<OpRegBankEntry<NumOps>> Table) const {
311
312 InstructionMappings AltMappings;
313
314 SmallVector<const ValueMapping *, 10> Operands(MI.getNumOperands());
315
316 unsigned Sizes[NumOps];
317 for (unsigned I = 0; I < NumOps; ++I) {
318 Register Reg = MI.getOperand(RegSrcOpIdx[I]).getReg();
319 Sizes[I] = getSizeInBits(Reg, MRI, *TRI);
320 }
321
322 for (unsigned I = 0, E = MI.getNumExplicitDefs(); I != E; ++I) {
323 unsigned SizeI = getSizeInBits(MI.getOperand(I).getReg(), MRI, *TRI);
324 Operands[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SizeI);
325 }
326
327 // getInstrMapping's default mapping uses ID 1, so start at 2.
328 unsigned MappingID = 2;
329 for (const auto &Entry : Table) {
330 for (unsigned I = 0; I < NumOps; ++I) {
331 int OpIdx = RegSrcOpIdx[I];
332 Operands[OpIdx] = AMDGPU::getValueMapping(Entry.RegBanks[I], Sizes[I]);
333 }
334
335 AltMappings.push_back(&getInstructionMapping(MappingID++, Entry.Cost,
336 getOperandsMapping(Operands),
337 Operands.size()));
338 }
339
340 return AltMappings;
341}
342
345 const MachineInstr &MI, const MachineRegisterInfo &MRI) const {
347 case Intrinsic::amdgcn_readlane: {
348 static const OpRegBankEntry<3> Table[2] = {
349 // Perfectly legal.
350 { { AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID }, 1 },
351
352 // Need a readfirstlane for the index.
353 { { AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 }
354 };
355
356 const std::array<unsigned, 3> RegSrcOpIdx = { { 0, 2, 3 } };
357 return addMappingFromTable<3>(MI, MRI, RegSrcOpIdx, Table);
358 }
359 case Intrinsic::amdgcn_writelane: {
360 static const OpRegBankEntry<4> Table[4] = {
361 // Perfectly legal.
362 { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 1 },
363
364 // Need readfirstlane of first op
365 { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 },
366
367 // Need readfirstlane of second op
368 { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 },
369
370 // Need readfirstlane of both ops
371 { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 3 }
372 };
373
374 // dst, value, lane select, tied vdst_in
375 const std::array<unsigned, 4> RegSrcOpIdx = { { 0, 2, 3, 4 } };
376 return addMappingFromTable<4>(MI, MRI, RegSrcOpIdx, Table);
377 }
378 default:
380 }
381}
382
385 const MachineInstr &MI, const MachineRegisterInfo &MRI) const {
386
388 case Intrinsic::amdgcn_s_buffer_load: {
389 static const OpRegBankEntry<2> Table[4] = {
390 // Perfectly legal.
391 { { AMDGPU::SGPRRegBankID, AMDGPU::SGPRRegBankID }, 1 },
392
393 // Only need 1 register in loop
394 { { AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 300 },
395
396 // Have to waterfall the resource.
397 { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID }, 1000 },
398
399 // Have to waterfall the resource, and the offset.
400 { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 1500 }
401 };
402
403 // rsrc, offset
404 const std::array<unsigned, 2> RegSrcOpIdx = { { 2, 3 } };
405 return addMappingFromTable<2>(MI, MRI, RegSrcOpIdx, Table);
406 }
407 case Intrinsic::amdgcn_ds_ordered_add:
408 case Intrinsic::amdgcn_ds_ordered_swap: {
409 // VGPR = M0, VGPR
410 static const OpRegBankEntry<3> Table[2] = {
411 // Perfectly legal.
412 { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 1 },
413
414 // Need a readfirstlane for m0
415 { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 }
416 };
417
418 const std::array<unsigned, 3> RegSrcOpIdx = { { 0, 2, 3 } };
419 return addMappingFromTable<3>(MI, MRI, RegSrcOpIdx, Table);
420 }
421 case Intrinsic::amdgcn_s_sendmsg:
422 case Intrinsic::amdgcn_s_sendmsghalt: {
423 // FIXME: Should have no register for immediate
424 static const OpRegBankEntry<1> Table[2] = {
425 // Perfectly legal.
426 { { AMDGPU::SGPRRegBankID }, 1 },
427
428 // Need readlane
429 { { AMDGPU::VGPRRegBankID }, 3 }
430 };
431
432 const std::array<unsigned, 1> RegSrcOpIdx = { { 2 } };
433 return addMappingFromTable<1>(MI, MRI, RegSrcOpIdx, Table);
434 }
435 default:
437 }
438}
439
440// FIXME: Returns uniform if there's no source value information. This is
441// probably wrong.
443 if (!MI.hasOneMemOperand())
444 return false;
445
446 const MachineMemOperand *MMO = *MI.memoperands_begin();
447 const unsigned AS = MMO->getAddrSpace();
448 const bool IsConst = AS == AMDGPUAS::CONSTANT_ADDRESS ||
450 const unsigned MemSize = 8 * MMO->getSize().getValue();
451
452 // Require 4-byte alignment.
453 return (MMO->getAlign() >= Align(4) ||
454 (Subtarget.hasScalarSubwordLoads() &&
455 ((MemSize == 16 && MMO->getAlign() >= Align(2)) ||
456 (MemSize == 8 && MMO->getAlign() >= Align(1))))) &&
457 // Can't do a scalar atomic load.
458 !MMO->isAtomic() &&
459 // Don't use scalar loads for volatile accesses to non-constant address
460 // spaces.
461 (IsConst || !MMO->isVolatile()) &&
462 // Memory must be known constant, or not written before this load.
463 (IsConst || MMO->isInvariant() || (MMO->getFlags() & MONoClobber)) &&
465}
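// For example, a naturally aligned, non-atomic, invariant 32-bit load from the
// constant address space passes the checks above, while an atomic load or a
// volatile load from a non-constant address space does not.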
466
469 const MachineInstr &MI) const {
470
471 const MachineFunction &MF = *MI.getMF();
472 const MachineRegisterInfo &MRI = MF.getRegInfo();
473
474
475 InstructionMappings AltMappings;
476 switch (MI.getOpcode()) {
477 case TargetOpcode::G_CONSTANT:
478 case TargetOpcode::G_IMPLICIT_DEF: {
479 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
480 if (Size == 1) {
481 static const OpRegBankEntry<1> Table[3] = {
482 { { AMDGPU::VGPRRegBankID }, 1 },
483 { { AMDGPU::SGPRRegBankID }, 1 },
484 { { AMDGPU::VCCRegBankID }, 1 }
485 };
486
487 return addMappingFromTable<1>(MI, MRI, {{ 0 }}, Table);
488 }
489
490 [[fallthrough]];
491 }
492 case TargetOpcode::G_FCONSTANT:
493 case TargetOpcode::G_FRAME_INDEX:
494 case TargetOpcode::G_GLOBAL_VALUE: {
495 static const OpRegBankEntry<1> Table[2] = {
496 { { AMDGPU::VGPRRegBankID }, 1 },
497 { { AMDGPU::SGPRRegBankID }, 1 }
498 };
499
500 return addMappingFromTable<1>(MI, MRI, {{ 0 }}, Table);
501 }
502 case TargetOpcode::G_AND:
503 case TargetOpcode::G_OR:
504 case TargetOpcode::G_XOR: {
505 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
506
507 if (Size == 1) {
508 // s_{and|or|xor}_b32 set scc when the result of the 32-bit op is not 0.
509 const InstructionMapping &SCCMapping = getInstructionMapping(
510 1, 1, getOperandsMapping(
511 {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32),
512 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32),
513 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32)}),
514 3); // Num Operands
515 AltMappings.push_back(&SCCMapping);
516
517 const InstructionMapping &VCCMapping0 = getInstructionMapping(
518 2, 1, getOperandsMapping(
519 {AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size),
520 AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size),
521 AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size)}),
522 3); // Num Operands
523 AltMappings.push_back(&VCCMapping0);
524 return AltMappings;
525 }
526
527 if (Size != 64)
528 break;
529
530 const InstructionMapping &SSMapping = getInstructionMapping(
531 1, 1, getOperandsMapping(
532 {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
533 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
534 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size)}),
535 3); // Num Operands
536 AltMappings.push_back(&SSMapping);
537
538 const InstructionMapping &VVMapping = getInstructionMapping(
539 2, 2, getOperandsMapping(
540 {AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
541 AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
542 AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size)}),
543 3); // Num Operands
544 AltMappings.push_back(&VVMapping);
545 break;
546 }
547 case TargetOpcode::G_LOAD:
548 case TargetOpcode::G_ZEXTLOAD:
549 case TargetOpcode::G_SEXTLOAD: {
550 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
551 LLT PtrTy = MRI.getType(MI.getOperand(1).getReg());
552 unsigned PtrSize = PtrTy.getSizeInBits();
553 unsigned AS = PtrTy.getAddressSpace();
554
558 const InstructionMapping &SSMapping = getInstructionMapping(
559 1, 1, getOperandsMapping(
560 {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
561 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, PtrSize)}),
562 2); // Num Operands
563 AltMappings.push_back(&SSMapping);
564 }
565
566 const InstructionMapping &VVMapping = getInstructionMapping(
567 2, 1,
569 {AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
570 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, PtrSize)}),
571 2); // Num Operands
572 AltMappings.push_back(&VVMapping);
573
574 // It may be possible to have a vgpr = load sgpr mapping here, because
575 // the mubuf instructions support this kind of load, but probably only for
576 // gfx7 and older. However, the addressing mode matching in the instruction
577 // selector should be able to do a better job of detecting and selecting
578 // these kinds of loads from the vgpr = load vgpr mapping.
579
580 return AltMappings;
581
582 }
583 case TargetOpcode::G_SELECT: {
584 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
585 const InstructionMapping &SSMapping = getInstructionMapping(1, 1,
586 getOperandsMapping({AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
587 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1),
588 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
589 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size)}),
590 4); // Num Operands
591 AltMappings.push_back(&SSMapping);
592
593 const InstructionMapping &VVMapping = getInstructionMapping(2, 1,
594 getOperandsMapping({AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
595 AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1),
596 AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
597 AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size)}),
598 4); // Num Operands
599 AltMappings.push_back(&VVMapping);
600
601 return AltMappings;
602 }
603 case TargetOpcode::G_UADDE:
604 case TargetOpcode::G_USUBE:
605 case TargetOpcode::G_SADDE:
606 case TargetOpcode::G_SSUBE: {
607 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
608 const InstructionMapping &SSMapping = getInstructionMapping(1, 1,
610 {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
611 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1),
612 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
613 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
614 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1)}),
615 5); // Num Operands
616 AltMappings.push_back(&SSMapping);
617
618 const InstructionMapping &VVMapping = getInstructionMapping(2, 1,
619 getOperandsMapping({AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
620 AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1),
621 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
622 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
623 AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1)}),
624 5); // Num Operands
625 AltMappings.push_back(&VVMapping);
626 return AltMappings;
627 }
628 case AMDGPU::G_BRCOND: {
629 assert(MRI.getType(MI.getOperand(0).getReg()).getSizeInBits() == 1);
630
631 // TODO: Change type to 32 for scalar
633 1, 1, getOperandsMapping(
634 {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1), nullptr}),
635 2); // Num Operands
636 AltMappings.push_back(&SMapping);
637
639 1, 1, getOperandsMapping(
640 {AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1), nullptr }),
641 2); // Num Operands
642 AltMappings.push_back(&VMapping);
643 return AltMappings;
644 }
645 case AMDGPU::G_INTRINSIC:
646 case AMDGPU::G_INTRINSIC_CONVERGENT:
648 case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS:
649 case AMDGPU::G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS:
651 default:
652 break;
653 }
655}
656
660 LLT HalfTy,
661 Register Reg) const {
662 assert(HalfTy.getSizeInBits() == 32);
663 MachineRegisterInfo *MRI = B.getMRI();
664 Register LoLHS = MRI->createGenericVirtualRegister(HalfTy);
665 Register HiLHS = MRI->createGenericVirtualRegister(HalfTy);
666 const RegisterBank *Bank = getRegBank(Reg, *MRI, *TRI);
667 MRI->setRegBank(LoLHS, *Bank);
668 MRI->setRegBank(HiLHS, *Bank);
669
670 Regs.push_back(LoLHS);
671 Regs.push_back(HiLHS);
672
673 B.buildInstr(AMDGPU::G_UNMERGE_VALUES)
674 .addDef(LoLHS)
675 .addDef(HiLHS)
676 .addUse(Reg);
677}
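// For example (sketch), splitting a 64-bit VGPR value yields two 32-bit halves
// in the same bank:
//   %lo:vgpr(s32), %hi:vgpr(s32) = G_UNMERGE_VALUES %reg:vgpr(s64)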
678
679/// Replace the current type of each register in \p Regs with \p NewTy
681 LLT NewTy) {
682 for (Register Reg : Regs) {
683 assert(MRI.getType(Reg).getSizeInBits() == NewTy.getSizeInBits());
684 MRI.setType(Reg, NewTy);
685 }
686}
687
689 if (Ty.isVector()) {
690 assert(Ty.getElementCount().isKnownMultipleOf(2));
691 return LLT::scalarOrVector(Ty.getElementCount().divideCoefficientBy(2),
692 Ty.getElementType());
693 }
694
695 assert(Ty.getScalarSizeInBits() % 2 == 0);
696 return LLT::scalar(Ty.getScalarSizeInBits() / 2);
697}
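// For example: getHalfSizedType(<4 x s16>) == <2 x s16>, and
// getHalfSizedType(s64) == s32.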
698
699// Build one or more V_READFIRSTLANE_B32 instructions to move the given vector
700// source value into a scalar register.
703 Register Src) const {
704 LLT Ty = MRI.getType(Src);
705 const RegisterBank *Bank = getRegBank(Src, MRI, *TRI);
706
707 if (Bank == &AMDGPU::SGPRRegBank)
708 return Src;
709
710 unsigned Bits = Ty.getSizeInBits();
711 assert(Bits % 32 == 0);
712
713 if (Bank != &AMDGPU::VGPRRegBank) {
714 // We need to copy from AGPR to VGPR
715 Src = B.buildCopy(Ty, Src).getReg(0);
716 MRI.setRegBank(Src, AMDGPU::VGPRRegBank);
717 }
718
719 LLT S32 = LLT::scalar(32);
720 unsigned NumParts = Bits / 32;
723
724 if (Bits == 32) {
725 SrcParts.push_back(Src);
726 } else {
727 auto Unmerge = B.buildUnmerge(S32, Src);
728 for (unsigned i = 0; i < NumParts; ++i)
729 SrcParts.push_back(Unmerge.getReg(i));
730 }
731
732 for (unsigned i = 0; i < NumParts; ++i) {
733 Register SrcPart = SrcParts[i];
734 Register DstPart = MRI.createVirtualRegister(&AMDGPU::SReg_32_XM0RegClass);
735 MRI.setType(DstPart, NumParts == 1 ? Ty : S32);
736
737 const TargetRegisterClass *Constrained =
738 constrainGenericRegister(SrcPart, AMDGPU::VGPR_32RegClass, MRI);
739 (void)Constrained;
740 assert(Constrained && "Failed to constrain readfirstlane src reg");
741
742 B.buildInstr(AMDGPU::V_READFIRSTLANE_B32, {DstPart}, {SrcPart});
743
744 DstParts.push_back(DstPart);
745 }
746
747 if (Bits == 32)
748 return DstParts[0];
749
750 Register Dst = B.buildMergeLikeInstr(Ty, DstParts).getReg(0);
751 MRI.setRegBank(Dst, AMDGPU::SGPRRegBank);
752 return Dst;
753}
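// For a 64-bit VGPR source the sequence built above looks roughly like this
// (illustrative sketch; register numbering will differ):
//   %lo:vgpr(s32), %hi:vgpr(s32) = G_UNMERGE_VALUES %src:vgpr(s64)
//   %slo:sreg_32_xm0(s32) = V_READFIRSTLANE_B32 %lo
//   %shi:sreg_32_xm0(s32) = V_READFIRSTLANE_B32 %hi
//   %dst:sgpr(s64) = G_MERGE_VALUES %slo, %shi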
754
755/// Legalize instruction \p MI where operands in \p OpIndices must be SGPRs. If
756/// any of the required SGPR operands are VGPRs, perform a waterfall loop to
757/// execute the instruction for each unique combination of values in all lanes
758/// in the wave. The block will be split such that the rest of the instructions are
759/// moved to a new block.
760///
761/// Essentially performs this loop:
762//
763/// Save Execution Mask
764/// For (Lane : Wavefront) {
765/// Enable Lane, Disable all other lanes
766/// SGPR = read SGPR value for current lane from VGPR
767/// VGPRResult[Lane] = use_op SGPR
768/// }
769/// Restore Execution Mask
770///
771/// There is additional complexity in comparing the operand values to identify
772/// the unique values actually used.
775 SmallSet<Register, 4> &SGPROperandRegs) const {
776 // Track use registers which have already been expanded with a readfirstlane
777 // sequence. This may have multiple uses if moving a sequence.
778 DenseMap<Register, Register> WaterfalledRegMap;
779
780 MachineBasicBlock &MBB = B.getMBB();
781 MachineFunction *MF = &B.getMF();
782
783 const TargetRegisterClass *WaveRC = TRI->getWaveMaskRegClass();
784 const AMDGPU::LaneMaskConstants &LMC =
786
787#ifndef NDEBUG
788 const int OrigRangeSize = std::distance(Range.begin(), Range.end());
789#endif
790
791 MachineRegisterInfo &MRI = *B.getMRI();
792 Register SaveExecReg = MRI.createVirtualRegister(WaveRC);
793 Register InitSaveExecReg = MRI.createVirtualRegister(WaveRC);
794
795 // Don't bother using generic instructions/registers for the exec mask.
796 B.buildInstr(TargetOpcode::IMPLICIT_DEF)
797 .addDef(InitSaveExecReg);
798
799 Register PhiExec = MRI.createVirtualRegister(WaveRC);
800 Register NewExec = MRI.createVirtualRegister(WaveRC);
801
802 // To insert the loop we need to split the block. Move everything before this
803 // point to a new block, and insert a new empty block before this instruction.
806 MachineBasicBlock *RemainderBB = MF->CreateMachineBasicBlock();
807 MachineBasicBlock *RestoreExecBB = MF->CreateMachineBasicBlock();
809 ++MBBI;
810 MF->insert(MBBI, LoopBB);
811 MF->insert(MBBI, BodyBB);
812 MF->insert(MBBI, RestoreExecBB);
813 MF->insert(MBBI, RemainderBB);
814
815 LoopBB->addSuccessor(BodyBB);
816 BodyBB->addSuccessor(RestoreExecBB);
817 BodyBB->addSuccessor(LoopBB);
818
819 // Move the rest of the block into a new block.
821 RemainderBB->splice(RemainderBB->begin(), &MBB, Range.end(), MBB.end());
822
823 MBB.addSuccessor(LoopBB);
824 RestoreExecBB->addSuccessor(RemainderBB);
825
826 B.setInsertPt(*LoopBB, LoopBB->end());
827
828 B.buildInstr(TargetOpcode::PHI)
829 .addDef(PhiExec)
830 .addReg(InitSaveExecReg)
831 .addMBB(&MBB)
832 .addReg(NewExec)
833 .addMBB(BodyBB);
834
835 const DebugLoc &DL = B.getDL();
836
837 MachineInstr &FirstInst = *Range.begin();
838
839 // Move the instruction into the loop body. Note we moved everything after
840 // Range.end() already into a new block, so Range.end() is no longer valid.
841 BodyBB->splice(BodyBB->end(), &MBB, Range.begin(), MBB.end());
842
843 // Figure out the iterator range after splicing the instructions.
844 MachineBasicBlock::iterator NewBegin = FirstInst.getIterator();
845 auto NewEnd = BodyBB->end();
846
847 B.setMBB(*LoopBB);
848
849 LLT S1 = LLT::scalar(1);
850 Register CondReg;
851
852 assert(std::distance(NewBegin, NewEnd) == OrigRangeSize);
853
854 for (MachineInstr &MI : make_range(NewBegin, NewEnd)) {
855 for (MachineOperand &Op : MI.all_uses()) {
856 Register OldReg = Op.getReg();
857 if (!SGPROperandRegs.count(OldReg))
858 continue;
859
860 // See if we already processed this register in another instruction in the
861 // sequence.
862 auto OldVal = WaterfalledRegMap.find(OldReg);
863 if (OldVal != WaterfalledRegMap.end()) {
864 Op.setReg(OldVal->second);
865 continue;
866 }
867
868 Register OpReg = Op.getReg();
869 LLT OpTy = MRI.getType(OpReg);
870
871 const RegisterBank *OpBank = getRegBank(OpReg, MRI, *TRI);
872 if (OpBank != &AMDGPU::VGPRRegBank) {
873 // Insert copy from AGPR to VGPR before the loop.
874 B.setMBB(MBB);
875 OpReg = B.buildCopy(OpTy, OpReg).getReg(0);
876 MRI.setRegBank(OpReg, AMDGPU::VGPRRegBank);
877 B.setMBB(*LoopBB);
878 }
879
880 Register CurrentLaneReg = buildReadFirstLane(B, MRI, OpReg);
881
882 // Build the comparison(s).
883 unsigned OpSize = OpTy.getSizeInBits();
884 bool Is64 = OpSize % 64 == 0;
885 unsigned PartSize = Is64 ? 64 : 32;
886 LLT PartTy = LLT::scalar(PartSize);
887 unsigned NumParts = OpSize / PartSize;
889 SmallVector<Register, 8> CurrentLaneParts;
890
891 if (NumParts == 1) {
892 OpParts.push_back(OpReg);
893 CurrentLaneParts.push_back(CurrentLaneReg);
894 } else {
895 auto UnmergeOp = B.buildUnmerge(PartTy, OpReg);
896 auto UnmergeCurrentLane = B.buildUnmerge(PartTy, CurrentLaneReg);
897 for (unsigned i = 0; i < NumParts; ++i) {
898 OpParts.push_back(UnmergeOp.getReg(i));
899 CurrentLaneParts.push_back(UnmergeCurrentLane.getReg(i));
900 MRI.setRegBank(OpParts[i], AMDGPU::VGPRRegBank);
901 MRI.setRegBank(CurrentLaneParts[i], AMDGPU::SGPRRegBank);
902 }
903 }
904
905 for (unsigned i = 0; i < NumParts; ++i) {
906 auto CmpReg = B.buildICmp(CmpInst::ICMP_EQ, S1, CurrentLaneParts[i],
907 OpParts[i]).getReg(0);
908 MRI.setRegBank(CmpReg, AMDGPU::VCCRegBank);
909
910 if (!CondReg) {
911 CondReg = CmpReg;
912 } else {
913 CondReg = B.buildAnd(S1, CondReg, CmpReg).getReg(0);
914 MRI.setRegBank(CondReg, AMDGPU::VCCRegBank);
915 }
916 }
917
918 Op.setReg(CurrentLaneReg);
919
920 // Make sure we don't re-process this register again.
921 WaterfalledRegMap.insert(std::pair(OldReg, Op.getReg()));
922 }
923 }
924
925 // The ballot becomes a no-op during instruction selection.
926 CondReg = B.buildIntrinsic(Intrinsic::amdgcn_ballot,
927 {LLT::scalar(Subtarget.isWave32() ? 32 : 64)})
928 .addReg(CondReg)
929 .getReg(0);
930 MRI.setRegClass(CondReg, WaveRC);
931
932 // Update EXEC, save the original EXEC value to VCC.
933 B.buildInstr(LMC.AndSaveExecOpc)
934 .addDef(NewExec)
935 .addReg(CondReg, RegState::Kill);
936
937 MRI.setSimpleHint(NewExec, CondReg);
938
939 B.setInsertPt(*BodyBB, BodyBB->end());
940
941 // Update EXEC, switch all done bits to 0 and all todo bits to 1.
942 B.buildInstr(LMC.XorTermOpc)
943 .addDef(LMC.ExecReg)
944 .addReg(LMC.ExecReg)
945 .addReg(NewExec);
946
947 // XXX - s_xor_b64 sets scc to 1 if the result is nonzero, so can we use
948 // s_cbranch_scc0?
949
950 // Loop back to V_READFIRSTLANE_B32 if there are still variants to cover.
951 B.buildInstr(AMDGPU::SI_WATERFALL_LOOP).addMBB(LoopBB);
952
953 // Save the EXEC mask before the loop.
954 BuildMI(MBB, MBB.end(), DL, TII->get(LMC.MovOpc), SaveExecReg)
955 .addReg(LMC.ExecReg);
956
957 // Restore the EXEC mask after the loop.
958 B.setMBB(*RestoreExecBB);
959 B.buildInstr(LMC.MovTermOpc).addDef(LMC.ExecReg).addReg(SaveExecReg);
960
961 // Set the insert point after the original instruction, so any new
962 // instructions will be in the remainder.
963 B.setInsertPt(*RemainderBB, RemainderBB->begin());
964
965 return true;
966}
967
968// Return any unique registers used by \p MI at \p OpIndices that need to be
969// handled in a waterfall loop. Returns these registers in \p
970// SGPROperandRegs. Returns true if there are any operands to handle and a
971// waterfall loop is necessary.
973 SmallSet<Register, 4> &SGPROperandRegs, MachineInstr &MI,
974 MachineRegisterInfo &MRI, ArrayRef<unsigned> OpIndices) const {
975 for (unsigned Op : OpIndices) {
976 assert(MI.getOperand(Op).isUse());
977 Register Reg = MI.getOperand(Op).getReg();
978 const RegisterBank *OpBank = getRegBank(Reg, MRI, *TRI);
979 if (OpBank->getID() != AMDGPU::SGPRRegBankID)
980 SGPROperandRegs.insert(Reg);
981 }
982
983 // No operands need to be replaced, so no need to loop.
984 return !SGPROperandRegs.empty();
985}
986
989 // Use a set to avoid extra readfirstlanes in the case where multiple operands
990 // are the same register.
991 SmallSet<Register, 4> SGPROperandRegs;
992
993 if (!collectWaterfallOperands(SGPROperandRegs, MI, *B.getMRI(), OpIndices))
994 return false;
995
996 MachineBasicBlock::iterator I = MI.getIterator();
997 return executeInWaterfallLoop(B, make_range(I, std::next(I)),
998 SGPROperandRegs);
999}
1000
1001// Legalize an operand that must be an SGPR by inserting a readfirstlane.
1003 MachineIRBuilder &B, MachineInstr &MI, unsigned OpIdx) const {
1004 Register Reg = MI.getOperand(OpIdx).getReg();
1005 MachineRegisterInfo &MRI = *B.getMRI();
1006 const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI);
1007 if (Bank == &AMDGPU::SGPRRegBank)
1008 return;
1009
1010 Reg = buildReadFirstLane(B, MRI, Reg);
1011 MI.getOperand(OpIdx).setReg(Reg);
1012}
1013
1014/// Split \p Ty into 2 pieces. The first will have \p FirstSize bits, and the
1015/// rest will be in the remainder.
1016static std::pair<LLT, LLT> splitUnequalType(LLT Ty, unsigned FirstSize) {
1017 unsigned TotalSize = Ty.getSizeInBits();
1018 if (!Ty.isVector())
1019 return {LLT::scalar(FirstSize), LLT::scalar(TotalSize - FirstSize)};
1020
1021 LLT EltTy = Ty.getElementType();
1022 unsigned EltSize = EltTy.getSizeInBits();
1023 assert(FirstSize % EltSize == 0);
1024
1025 unsigned FirstPartNumElts = FirstSize / EltSize;
1026 unsigned RemainderElts = (TotalSize - FirstSize) / EltSize;
1027
1028 return {LLT::scalarOrVector(ElementCount::getFixed(FirstPartNumElts), EltTy),
1029 LLT::scalarOrVector(ElementCount::getFixed(RemainderElts), EltTy)};
1030}
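// For example: splitUnequalType(<3 x s32>, 64) == {<2 x s32>, s32}, and
// splitUnequalType(s96, 64) == {s64, s32}.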
1031
1033 if (!Ty.isVector())
1034 return LLT::scalar(128);
1035
1036 LLT EltTy = Ty.getElementType();
1037 assert(128 % EltTy.getSizeInBits() == 0);
1038 return LLT::fixed_vector(128 / EltTy.getSizeInBits(), EltTy);
1039}
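// For example: widen96To128(s96) == s128, and widen96To128(<3 x s32>) == <4 x s32>.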
1040
1044 MachineInstr &MI) const {
1045 MachineRegisterInfo &MRI = *B.getMRI();
1046 Register DstReg = MI.getOperand(0).getReg();
1047 const LLT LoadTy = MRI.getType(DstReg);
1048 unsigned LoadSize = LoadTy.getSizeInBits();
1049 MachineMemOperand *MMO = *MI.memoperands_begin();
1050 const unsigned MaxNonSmrdLoadSize = 128;
1051
1052 const RegisterBank *DstBank =
1053 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
1054 if (DstBank == &AMDGPU::SGPRRegBank) {
1055 // There are some special cases that we need to look at for 32-bit and
1056 // 96-bit SGPR loads; otherwise we have nothing to do.
1057 if (LoadSize != 32 && (LoadSize != 96 || Subtarget.hasScalarDwordx3Loads()))
1058 return false;
1059
1060 const unsigned MemSize = 8 * MMO->getSize().getValue();
1061 // Scalar loads of 8 or 16 bits with proper alignment may be widened to
1062 // 32 bits. Check to see if we need to widen the memory access: 8 or 16 bit
1063 // scalar loads should have a load size of 32 but a memory access size of
1064 // less than 32.
1065 if (LoadSize == 32 &&
1066 (MemSize == 32 || LoadTy.isVector() || !isScalarLoadLegal(MI)))
1067 return false;
1068
1069 if (LoadSize == 32 &&
1070 ((MemSize == 8 && MMO->getAlign() >= Align(1)) ||
1071 (MemSize == 16 && MMO->getAlign() >= Align(2))) &&
1073 Subtarget.getGeneration() >= AMDGPUSubtarget::GFX12)
1074 return false;
1075
1076 Register PtrReg = MI.getOperand(1).getReg();
1077
1078 ApplyRegBankMapping ApplyBank(B, *this, MRI, DstBank);
1079
1080 if (LoadSize == 32) {
1081 // This is an extending load from a sub-dword size. Widen the memory
1082 // access size to 4 bytes and clear the extra high bits appropriately
1083 const LLT S32 = LLT::scalar(32);
1084 if (MI.getOpcode() == AMDGPU::G_SEXTLOAD) {
1085 // Must extend the sign bit into higher bits for a G_SEXTLOAD
1086 auto WideLoad = B.buildLoadFromOffset(S32, PtrReg, *MMO, 0);
1087 B.buildSExtInReg(MI.getOperand(0), WideLoad, MemSize);
1088 } else if (MI.getOpcode() == AMDGPU::G_ZEXTLOAD) {
1089 // Must extend zero into higher bits with an AND for a G_ZEXTLOAD
1090 auto WideLoad = B.buildLoadFromOffset(S32, PtrReg, *MMO, 0);
1091 B.buildZExtInReg(MI.getOperand(0), WideLoad, MemSize);
1092 } else
1093 // We do not need to touch the higher bits for regular loads.
1094 B.buildLoadFromOffset(MI.getOperand(0), PtrReg, *MMO, 0);
1095 } else {
1096 // 96-bit loads are only available for vector loads. We need to split this
1097 // into a 64-bit part and a 32-bit part (unless we can widen to a 128-bit load).
1098 if (MMO->getAlign() < Align(16)) {
1099 LegalizerHelper Helper(B.getMF(), ApplyBank, B);
1100 LLT Part64, Part32;
1101 std::tie(Part64, Part32) = splitUnequalType(LoadTy, 64);
1102 if (Helper.reduceLoadStoreWidth(cast<GAnyLoad>(MI), 0, Part64) !=
1104 return false;
1105 return true;
1106 }
1107 LLT WiderTy = widen96To128(LoadTy);
1108 auto WideLoad = B.buildLoadFromOffset(WiderTy, PtrReg, *MMO, 0);
1109 if (WiderTy.isScalar()) {
1110 B.buildTrunc(MI.getOperand(0), WideLoad);
1111 } else {
1112 B.buildDeleteTrailingVectorElements(MI.getOperand(0).getReg(),
1113 WideLoad);
1114 }
1115 }
1116
1117 MI.eraseFromParent();
1118 return true;
1119 }
1120
1121 // 128-bit loads are supported for all instruction types.
1122 if (LoadSize <= MaxNonSmrdLoadSize)
1123 return false;
1124
1125 SmallVector<Register, 1> SrcRegs(OpdMapper.getVRegs(1));
1126
1127 if (SrcRegs.empty())
1128 SrcRegs.push_back(MI.getOperand(1).getReg());
1129
1130 // RegBankSelect only emits scalar types, so we need to reset the pointer
1131 // operand to a pointer type.
1132 Register BasePtrReg = SrcRegs[0];
1133 LLT PtrTy = MRI.getType(MI.getOperand(1).getReg());
1134 MRI.setType(BasePtrReg, PtrTy);
1135
1136 // The following are loads that were not split enough during legalization
1137 // because it was not clear whether they are smem or vmem loads.
1140 assert(LoadSize % MaxNonSmrdLoadSize == 0);
1141 unsigned NumSplitParts = LoadTy.getSizeInBits() / MaxNonSmrdLoadSize;
1142 const LLT LoadSplitTy = LoadTy.divide(NumSplitParts);
1143 ApplyRegBankMapping O(B, *this, MRI, &AMDGPU::VGPRRegBank);
1144 LegalizerHelper Helper(B.getMF(), O, B);
1145 if (LoadTy.isVector()) {
1146 if (Helper.fewerElementsVector(MI, 0, LoadSplitTy) !=
1148 return false;
1149 } else {
1150 if (Helper.narrowScalar(MI, 0, LoadSplitTy) != LegalizerHelper::Legalized)
1151 return false;
1152 }
1153 }
1154
1155 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
1156 return true;
1157}
1158
1162 MachineInstr &MI) const {
1163 MachineRegisterInfo &MRI = *B.getMRI();
1164 const MachineFunction &MF = B.getMF();
1165 const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
1166 const auto &TFI = *ST.getFrameLowering();
1167
1168 // Guard in case the stack growth direction ever changes with scratch
1169 // instructions.
1170 assert(TFI.getStackGrowthDirection() == TargetFrameLowering::StackGrowsUp &&
1171 "Stack grows upwards for AMDGPU");
1172
1173 Register Dst = MI.getOperand(0).getReg();
1174 Register AllocSize = MI.getOperand(1).getReg();
1175 Align Alignment = assumeAligned(MI.getOperand(2).getImm());
1176
1177 const RegisterBank *SizeBank = getRegBank(AllocSize, MRI, *TRI);
1178
1179 if (SizeBank != &AMDGPU::SGPRRegBank) {
1180 auto WaveReduction =
1181 B.buildIntrinsic(Intrinsic::amdgcn_wave_reduce_umax, {LLT::scalar(32)})
1182 .addUse(AllocSize)
1183 .addImm(0);
1184 AllocSize = WaveReduction.getReg(0);
1185 }
1186
1187 LLT PtrTy = MRI.getType(Dst);
1188 LLT IntPtrTy = LLT::scalar(PtrTy.getSizeInBits());
1189
1191 Register SPReg = Info->getStackPtrOffsetReg();
1192 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank);
1193
1194 auto WaveSize = B.buildConstant(LLT::scalar(32), ST.getWavefrontSizeLog2());
1195 auto ScaledSize = B.buildShl(IntPtrTy, AllocSize, WaveSize);
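 // For example: on a wave64 target WavefrontSizeLog2 is 6, so a per-lane
 // allocation of N bytes advances the scratch stack pointer by N << 6 bytes.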
1196
1197 auto OldSP = B.buildCopy(PtrTy, SPReg);
1198 if (Alignment > TFI.getStackAlign()) {
1199 auto StackAlignMask = (Alignment.value() << ST.getWavefrontSizeLog2()) - 1;
1200 auto Tmp1 = B.buildPtrAdd(PtrTy, OldSP,
1201 B.buildConstant(LLT::scalar(32), StackAlignMask));
1202 B.buildMaskLowPtrBits(Dst, Tmp1,
1203 Log2(Alignment) + ST.getWavefrontSizeLog2());
1204 } else {
1205 B.buildCopy(Dst, OldSP);
1206 }
1207 auto PtrAdd = B.buildPtrAdd(PtrTy, Dst, ScaledSize);
1208 B.buildCopy(SPReg, PtrAdd);
1209 MI.eraseFromParent();
1210 return true;
1211}
1212
1216 int RsrcIdx) const {
1217 const int NumDefs = MI.getNumExplicitDefs();
1218
1219 // The reported argument index is relative to the IR intrinsic call arguments,
1220 // so we need to shift by the number of defs and the intrinsic ID.
1221 RsrcIdx += NumDefs + 1;
1222
1223 // Insert copies to VGPR arguments.
1224 applyDefaultMapping(OpdMapper);
1225
1226 // Fixup any SGPR arguments.
1227 SmallVector<unsigned, 4> SGPRIndexes;
1228 for (int I = NumDefs, NumOps = MI.getNumOperands(); I != NumOps; ++I) {
1229 if (!MI.getOperand(I).isReg())
1230 continue;
1231
1232 // If this intrinsic has a sampler, it immediately follows rsrc.
1233 if (I == RsrcIdx || I == RsrcIdx + 1)
1234 SGPRIndexes.push_back(I);
1235 }
1236
1237 executeInWaterfallLoop(B, MI, SGPRIndexes);
1238 return true;
1239}
1240
1241// Analyze a combined offset from an llvm.amdgcn.s.buffer.load intrinsic and
1242// store the three offsets (voffset, soffset and instoffset).
1244 MachineIRBuilder &B, Register CombinedOffset, Register &VOffsetReg,
1245 Register &SOffsetReg, int64_t &InstOffsetVal, Align Alignment) const {
1246 const LLT S32 = LLT::scalar(32);
1247 MachineRegisterInfo *MRI = B.getMRI();
1248
1249 if (std::optional<int64_t> Imm =
1250 getIConstantVRegSExtVal(CombinedOffset, *MRI)) {
1251 uint32_t SOffset, ImmOffset;
1252 if (TII->splitMUBUFOffset(*Imm, SOffset, ImmOffset, Alignment)) {
1253 VOffsetReg = B.buildConstant(S32, 0).getReg(0);
1254 SOffsetReg = B.buildConstant(S32, SOffset).getReg(0);
1255 InstOffsetVal = ImmOffset;
1256
1257 B.getMRI()->setRegBank(VOffsetReg, AMDGPU::VGPRRegBank);
1258 B.getMRI()->setRegBank(SOffsetReg, AMDGPU::SGPRRegBank);
1259 return SOffset + ImmOffset;
1260 }
1261 }
1262
1263 Register Base;
1264 unsigned Offset;
1265
1266 std::tie(Base, Offset) =
1267 AMDGPU::getBaseWithConstantOffset(*MRI, CombinedOffset);
1268
1269 uint32_t SOffset, ImmOffset;
1270 if ((int)Offset > 0 &&
1271 TII->splitMUBUFOffset(Offset, SOffset, ImmOffset, Alignment)) {
1272 if (getRegBank(Base, *MRI, *TRI) == &AMDGPU::VGPRRegBank) {
1273 VOffsetReg = Base;
1274 SOffsetReg = B.buildConstant(S32, SOffset).getReg(0);
1275 B.getMRI()->setRegBank(SOffsetReg, AMDGPU::SGPRRegBank);
1276 InstOffsetVal = ImmOffset;
1277 return 0; // XXX - Why is this 0?
1278 }
1279
1280 // If we have SGPR base, we can use it for soffset.
1281 if (SOffset == 0) {
1282 VOffsetReg = B.buildConstant(S32, 0).getReg(0);
1283 B.getMRI()->setRegBank(VOffsetReg, AMDGPU::VGPRRegBank);
1284 SOffsetReg = Base;
1285 InstOffsetVal = ImmOffset;
1286 return 0; // XXX - Why is this 0?
1287 }
1288 }
1289
1290 // Handle the variable sgpr + vgpr case.
1291 MachineInstr *Add = getOpcodeDef(AMDGPU::G_ADD, CombinedOffset, *MRI);
1292 if (Add && (int)Offset >= 0) {
1293 Register Src0 = getSrcRegIgnoringCopies(Add->getOperand(1).getReg(), *MRI);
1294 Register Src1 = getSrcRegIgnoringCopies(Add->getOperand(2).getReg(), *MRI);
1295
1296 const RegisterBank *Src0Bank = getRegBank(Src0, *MRI, *TRI);
1297 const RegisterBank *Src1Bank = getRegBank(Src1, *MRI, *TRI);
1298
1299 if (Src0Bank == &AMDGPU::VGPRRegBank && Src1Bank == &AMDGPU::SGPRRegBank) {
1300 VOffsetReg = Src0;
1301 SOffsetReg = Src1;
1302 return 0;
1303 }
1304
1305 if (Src0Bank == &AMDGPU::SGPRRegBank && Src1Bank == &AMDGPU::VGPRRegBank) {
1306 VOffsetReg = Src1;
1307 SOffsetReg = Src0;
1308 return 0;
1309 }
1310 }
1311
1312 // Ensure we have a VGPR for the combined offset. This could be an issue if we
1313 // have an SGPR offset and a VGPR resource.
1314 if (getRegBank(CombinedOffset, *MRI, *TRI) == &AMDGPU::VGPRRegBank) {
1315 VOffsetReg = CombinedOffset;
1316 } else {
1317 VOffsetReg = B.buildCopy(S32, CombinedOffset).getReg(0);
1318 B.getMRI()->setRegBank(VOffsetReg, AMDGPU::VGPRRegBank);
1319 }
1320
1321 SOffsetReg = B.buildConstant(S32, 0).getReg(0);
1322 B.getMRI()->setRegBank(SOffsetReg, AMDGPU::SGPRRegBank);
1323 return 0;
1324}
1325
1327 switch (Opc) {
1328 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD:
1329 return AMDGPU::G_AMDGPU_BUFFER_LOAD;
1330 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_UBYTE:
1331 return AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE;
1332 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SBYTE:
1333 return AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE;
1334 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_USHORT:
1335 return AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT;
1336 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SSHORT:
1337 return AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT;
1338 default:
1339 break;
1340 }
1341 llvm_unreachable("Unexpected s_buffer_load opcode");
1342}
1343
1345 MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
1346 MachineInstr &MI = OpdMapper.getMI();
1347 MachineRegisterInfo &MRI = OpdMapper.getMRI();
1348
1349 const LLT S32 = LLT::scalar(32);
1350 Register Dst = MI.getOperand(0).getReg();
1351 LLT Ty = MRI.getType(Dst);
1352
1353 const RegisterBank *RSrcBank =
1354 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
1355 const RegisterBank *OffsetBank =
1356 OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
1357 if (RSrcBank == &AMDGPU::SGPRRegBank &&
1358 OffsetBank == &AMDGPU::SGPRRegBank)
1359 return true; // Legal mapping
1360
1361 // FIXME: The 96-bit case was widened during legalization. We need to narrow
1362 // it back here but don't have an MMO.
1363
1364 unsigned LoadSize = Ty.getSizeInBits();
1365 int NumLoads = 1;
1366 if (LoadSize == 256 || LoadSize == 512) {
1367 NumLoads = LoadSize / 128;
1368 Ty = Ty.divide(NumLoads);
1369 }
1370
1371 // Use the alignment to ensure that the required offsets will fit into the
1372 // immediate offsets.
1373 const Align Alignment = NumLoads > 1 ? Align(16 * NumLoads) : Align(1);
1374
1375 MachineFunction &MF = B.getMF();
1376
1377 Register SOffset;
1378 Register VOffset;
1379 int64_t ImmOffset = 0;
1380
1381 unsigned MMOOffset = setBufferOffsets(B, MI.getOperand(2).getReg(), VOffset,
1382 SOffset, ImmOffset, Alignment);
1383
1384 // TODO: 96-bit loads were widened to 128-bit results. Shrink the result if we
1385 // can, but we need to track an MMO for that.
1386 const unsigned MemSize = (Ty.getSizeInBits() + 7) / 8;
1387 const Align MemAlign(4); // FIXME: ABI type alignment?
1392 MemSize, MemAlign);
1393 if (MMOOffset != 0)
1394 BaseMMO = MF.getMachineMemOperand(BaseMMO, MMOOffset, MemSize);
1395
1396 // If only the offset is divergent, emit a MUBUF buffer load instead. We can
1397 // assume that the buffer is unswizzled.
1398
1399 Register RSrc = MI.getOperand(1).getReg();
1400 Register VIndex = B.buildConstant(S32, 0).getReg(0);
1401 B.getMRI()->setRegBank(VIndex, AMDGPU::VGPRRegBank);
1402
1403 SmallVector<Register, 4> LoadParts(NumLoads);
1404
1405 MachineBasicBlock::iterator MII = MI.getIterator();
1406 MachineInstrSpan Span(MII, &B.getMBB());
1407
1408 for (int i = 0; i < NumLoads; ++i) {
1409 if (NumLoads == 1) {
1410 LoadParts[i] = Dst;
1411 } else {
1412 LoadParts[i] = MRI.createGenericVirtualRegister(Ty);
1413 MRI.setRegBank(LoadParts[i], AMDGPU::VGPRRegBank);
1414 }
1415
1416 MachineMemOperand *MMO = BaseMMO;
1417 if (i != 0)
1418 BaseMMO = MF.getMachineMemOperand(BaseMMO, MMOOffset + 16 * i, MemSize);
1419
1420 B.buildInstr(getSBufferLoadCorrespondingBufferLoadOpcode(MI.getOpcode()))
1421 .addDef(LoadParts[i]) // vdata
1422 .addUse(RSrc) // rsrc
1423 .addUse(VIndex) // vindex
1424 .addUse(VOffset) // voffset
1425 .addUse(SOffset) // soffset
1426 .addImm(ImmOffset + 16 * i) // offset(imm)
1427 .addImm(0) // cachepolicy, swizzled buffer(imm)
1428 .addImm(0) // idxen(imm)
1429 .addMemOperand(MMO);
1430 }
1431
1432 // TODO: If only the resource is a VGPR, it may be better to execute the
1433 // scalar load in the waterfall loop if the resource is expected to frequently
1434 // be dynamically uniform.
1435 if (RSrcBank != &AMDGPU::SGPRRegBank) {
1436 // Remove the original instruction to avoid potentially confusing the
1437 // waterfall loop logic.
1438 B.setInstr(*Span.begin());
1439 MI.eraseFromParent();
1440
1441 SmallSet<Register, 4> OpsToWaterfall;
1442
1443 OpsToWaterfall.insert(RSrc);
1444 executeInWaterfallLoop(B, make_range(Span.begin(), Span.end()),
1445 OpsToWaterfall);
1446 }
1447
1448 if (NumLoads != 1) {
1449 if (Ty.isVector())
1450 B.buildConcatVectors(Dst, LoadParts);
1451 else
1452 B.buildMergeLikeInstr(Dst, LoadParts);
1453 }
1454
1455 // We removed the instruction earlier with a waterfall loop.
1456 if (RSrcBank == &AMDGPU::SGPRRegBank)
1457 MI.eraseFromParent();
1458
1459 return true;
1460}
1461
1463 const OperandsMapper &OpdMapper,
1464 bool Signed) const {
1465 MachineInstr &MI = OpdMapper.getMI();
1466 MachineRegisterInfo &MRI = OpdMapper.getMRI();
1467
1468 // Insert basic copies
1469 applyDefaultMapping(OpdMapper);
1470
1471 Register DstReg = MI.getOperand(0).getReg();
1472 LLT Ty = MRI.getType(DstReg);
1473
1474 const LLT S32 = LLT::scalar(32);
1475
1476 unsigned FirstOpnd = isa<GIntrinsic>(MI) ? 2 : 1;
1477 Register SrcReg = MI.getOperand(FirstOpnd).getReg();
1478 Register OffsetReg = MI.getOperand(FirstOpnd + 1).getReg();
1479 Register WidthReg = MI.getOperand(FirstOpnd + 2).getReg();
1480
1481 const RegisterBank *DstBank =
1482 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
1483 if (DstBank == &AMDGPU::VGPRRegBank) {
1484 if (Ty == S32)
1485 return true;
1486
1487 // There are no 64-bit vgpr bitfield extract instructions, so the operation
1488 // is expanded to a sequence of instructions that implement the operation.
1489 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank);
1490
1491 const LLT S64 = LLT::scalar(64);
1492 // Shift the source operand so that extracted bits start at bit 0.
1493 auto ShiftOffset = Signed ? B.buildAShr(S64, SrcReg, OffsetReg)
1494 : B.buildLShr(S64, SrcReg, OffsetReg);
1495 auto UnmergeSOffset = B.buildUnmerge({S32, S32}, ShiftOffset);
1496
1497 // A 64-bit bitfield extract uses the 32-bit bitfield extract instructions
1498 // if the width is a constant.
1499 if (auto ConstWidth = getIConstantVRegValWithLookThrough(WidthReg, MRI)) {
1500 // Use the 32-bit bitfield extract instruction if the width is a constant.
1501 // Depending on the width size, use either the low or high 32-bits.
1502 auto Zero = B.buildConstant(S32, 0);
1503 auto WidthImm = ConstWidth->Value.getZExtValue();
1504 if (WidthImm <= 32) {
1505 // Use bitfield extract on the lower 32-bit source, and then sign-extend
1506 // or clear the upper 32-bits.
1507 auto Extract =
1508 Signed ? B.buildSbfx(S32, UnmergeSOffset.getReg(0), Zero, WidthReg)
1509 : B.buildUbfx(S32, UnmergeSOffset.getReg(0), Zero, WidthReg);
1510 auto Extend =
1511 Signed ? B.buildAShr(S32, Extract, B.buildConstant(S32, 31)) : Zero;
1512 B.buildMergeLikeInstr(DstReg, {Extract, Extend});
1513 } else {
1514 // Use bitfield extract on upper 32-bit source, and combine with lower
1515 // 32-bit source.
1516 auto UpperWidth = B.buildConstant(S32, WidthImm - 32);
1517 auto Extract =
1518 Signed
1519 ? B.buildSbfx(S32, UnmergeSOffset.getReg(1), Zero, UpperWidth)
1520 : B.buildUbfx(S32, UnmergeSOffset.getReg(1), Zero, UpperWidth);
1521 B.buildMergeLikeInstr(DstReg, {UnmergeSOffset.getReg(0), Extract});
1522 }
1523 MI.eraseFromParent();
1524 return true;
1525 }
1526
1527 // Expand to Src >> Offset << (64 - Width) >> (64 - Width) using 64-bit
1528 // operations.
1529 auto ExtShift = B.buildSub(S32, B.buildConstant(S32, 64), WidthReg);
1530 auto SignBit = B.buildShl(S64, ShiftOffset, ExtShift);
1531 if (Signed)
1532 B.buildAShr(S64, SignBit, ExtShift);
1533 else
1534 B.buildLShr(S64, SignBit, ExtShift);
1535 MI.eraseFromParent();
1536 return true;
1537 }
1538
1539 // The scalar form packs the offset and width in a single operand.
1540
1541 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank);
1542
1543 // Ensure the high bits are clear to insert the offset.
1544 auto OffsetMask = B.buildConstant(S32, maskTrailingOnes<unsigned>(6));
1545 auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask);
1546
1547 // Zeros out the low bits, so don't bother clamping the input value.
1548 auto ShiftWidth = B.buildShl(S32, WidthReg, B.buildConstant(S32, 16));
1549
1550 // Pack the offset and width of the BFE into the format expected by
1551 // S_BFE_I32 / S_BFE_U32: in the second source operand, bits [5:0] contain
1552 // the offset and bits [22:16] the width.
1553 auto MergedInputs = B.buildOr(S32, ClampOffset, ShiftWidth);
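 // For example: offset = 8, width = 5 packs to (5 << 16) | 8 = 0x50008.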
1554
1555 // TODO: It might be worth using a pseudo here to avoid scc clobber and
1556 // register class constraints.
1557 unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) :
1558 (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64);
1559
1560 auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs});
1561 if (!constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this))
1562 llvm_unreachable("failed to constrain BFE");
1563
1564 MI.eraseFromParent();
1565 return true;
1566}
1567
1569 MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
1570 MachineInstr &MI = OpdMapper.getMI();
1571 MachineRegisterInfo &MRI = OpdMapper.getMRI();
1572
1573 // Insert basic copies.
1574 applyDefaultMapping(OpdMapper);
1575
1576 Register Dst0 = MI.getOperand(0).getReg();
1577 Register Dst1 = MI.getOperand(1).getReg();
1578 Register Src0 = MI.getOperand(2).getReg();
1579 Register Src1 = MI.getOperand(3).getReg();
1580 Register Src2 = MI.getOperand(4).getReg();
1581
1582 if (MRI.getRegBankOrNull(Src0) == &AMDGPU::VGPRRegBank)
1583 return true;
1584
1585 bool IsUnsigned = MI.getOpcode() == AMDGPU::G_AMDGPU_MAD_U64_U32;
1586 LLT S1 = LLT::scalar(1);
1587 LLT S32 = LLT::scalar(32);
1588
1589 bool DstOnValu = MRI.getRegBankOrNull(Src2) == &AMDGPU::VGPRRegBank;
1590 bool Accumulate = true;
1591
1592 if (!DstOnValu) {
1593 if (mi_match(Src2, MRI, m_ZeroInt()))
1594 Accumulate = false;
1595 }
1596
1597 // Keep the multiplication on the SALU.
1598 Register DstHi;
1599 Register DstLo = B.buildMul(S32, Src0, Src1).getReg(0);
1600 bool MulHiInVgpr = false;
1601
1602 MRI.setRegBank(DstLo, AMDGPU::SGPRRegBank);
1603
1604 if (Subtarget.hasSMulHi()) {
1605 DstHi = IsUnsigned ? B.buildUMulH(S32, Src0, Src1).getReg(0)
1606 : B.buildSMulH(S32, Src0, Src1).getReg(0);
1607 MRI.setRegBank(DstHi, AMDGPU::SGPRRegBank);
1608 } else {
1609 Register VSrc0 = B.buildCopy(S32, Src0).getReg(0);
1610 Register VSrc1 = B.buildCopy(S32, Src1).getReg(0);
1611
1612 MRI.setRegBank(VSrc0, AMDGPU::VGPRRegBank);
1613 MRI.setRegBank(VSrc1, AMDGPU::VGPRRegBank);
1614
1615 DstHi = IsUnsigned ? B.buildUMulH(S32, VSrc0, VSrc1).getReg(0)
1616 : B.buildSMulH(S32, VSrc0, VSrc1).getReg(0);
1617 MRI.setRegBank(DstHi, AMDGPU::VGPRRegBank);
1618
1619 if (!DstOnValu) {
1620 DstHi = buildReadFirstLane(B, MRI, DstHi);
1621 } else {
1622 MulHiInVgpr = true;
1623 }
1624 }
1625
1626 // Accumulate and produce the "carry-out" bit.
1627 //
1628 // The "carry-out" is defined as bit 64 of the result when computed as a
1629 // big integer. For unsigned multiply-add, this matches the usual definition
1630 // of carry-out. For signed multiply-add, bit 64 is the sign bit of the
1631 // result, which is determined as:
1632 // sign(Src0 * Src1) + sign(Src2) + carry-out from unsigned 64-bit add
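  // The '+' above is addition modulo 2, which is why the code below combines
  // the three terms with XOR.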
1633 LLT CarryType = DstOnValu ? S1 : S32;
1634 const RegisterBank &CarryBank =
1635 DstOnValu ? AMDGPU::VCCRegBank : AMDGPU::SGPRRegBank;
1636 const RegisterBank &DstBank =
1637 DstOnValu ? AMDGPU::VGPRRegBank : AMDGPU::SGPRRegBank;
1638 Register Carry;
1639 Register Zero;
1640
1641 if (!IsUnsigned) {
1642 Zero = B.buildConstant(S32, 0).getReg(0);
1643 MRI.setRegBank(Zero,
1644 MulHiInVgpr ? AMDGPU::VGPRRegBank : AMDGPU::SGPRRegBank);
1645
1646 Carry = B.buildICmp(CmpInst::ICMP_SLT, MulHiInVgpr ? S1 : S32, DstHi, Zero)
1647 .getReg(0);
1648 MRI.setRegBank(Carry, MulHiInVgpr ? AMDGPU::VCCRegBank
1649 : AMDGPU::SGPRRegBank);
1650
1651 if (DstOnValu && !MulHiInVgpr) {
1652 Carry = B.buildTrunc(S1, Carry).getReg(0);
1653 MRI.setRegBank(Carry, AMDGPU::VCCRegBank);
1654 }
1655 }
1656
1657 if (Accumulate) {
1658 if (DstOnValu) {
1659 DstLo = B.buildCopy(S32, DstLo).getReg(0);
1660 DstHi = B.buildCopy(S32, DstHi).getReg(0);
1661 MRI.setRegBank(DstLo, AMDGPU::VGPRRegBank);
1662 MRI.setRegBank(DstHi, AMDGPU::VGPRRegBank);
1663 }
1664
1665 auto Unmerge = B.buildUnmerge(S32, Src2);
1666 Register Src2Lo = Unmerge.getReg(0);
1667 Register Src2Hi = Unmerge.getReg(1);
1668 MRI.setRegBank(Src2Lo, DstBank);
1669 MRI.setRegBank(Src2Hi, DstBank);
1670
1671 if (!IsUnsigned) {
1672 auto Src2Sign = B.buildICmp(CmpInst::ICMP_SLT, CarryType, Src2Hi, Zero);
1673 MRI.setRegBank(Src2Sign.getReg(0), CarryBank);
1674
1675 Carry = B.buildXor(CarryType, Carry, Src2Sign).getReg(0);
1676 MRI.setRegBank(Carry, CarryBank);
1677 }
1678
1679 auto AddLo = B.buildUAddo(S32, CarryType, DstLo, Src2Lo);
1680 DstLo = AddLo.getReg(0);
1681 Register CarryLo = AddLo.getReg(1);
1682 MRI.setRegBank(DstLo, DstBank);
1683 MRI.setRegBank(CarryLo, CarryBank);
1684
1685 auto AddHi = B.buildUAdde(S32, CarryType, DstHi, Src2Hi, CarryLo);
1686 DstHi = AddHi.getReg(0);
1687 MRI.setRegBank(DstHi, DstBank);
1688
1689 Register CarryHi = AddHi.getReg(1);
1690 MRI.setRegBank(CarryHi, CarryBank);
1691
1692 if (IsUnsigned) {
1693 Carry = CarryHi;
1694 } else {
1695 Carry = B.buildXor(CarryType, Carry, CarryHi).getReg(0);
1696 MRI.setRegBank(Carry, CarryBank);
1697 }
1698 } else {
1699 if (IsUnsigned) {
1700 Carry = B.buildConstant(CarryType, 0).getReg(0);
1701 MRI.setRegBank(Carry, CarryBank);
1702 }
1703 }
1704
1705 B.buildMergeLikeInstr(Dst0, {DstLo, DstHi});
1706
1707 if (DstOnValu) {
1708 B.buildCopy(Dst1, Carry);
1709 } else {
1710 B.buildTrunc(Dst1, Carry);
1711 }
1712
1713 MI.eraseFromParent();
1714 return true;
1715}
1716
1717// Return a suitable opcode for extending the operands of Opc when widening.
1718static unsigned getExtendOp(unsigned Opc) {
1719 switch (Opc) {
1720 case TargetOpcode::G_ASHR:
1721 case TargetOpcode::G_SMIN:
1722 case TargetOpcode::G_SMAX:
1723 return TargetOpcode::G_SEXT;
1724 case TargetOpcode::G_LSHR:
1725 case TargetOpcode::G_UMIN:
1726 case TargetOpcode::G_UMAX:
1727 return TargetOpcode::G_ZEXT;
1728 default:
1729 return TargetOpcode::G_ANYEXT;
1730 }
1731}
1732
1733// Emit a legalized extension from <2 x s16> to 2 32-bit components, avoiding
1734// any illegal vector extend or unmerge operations.
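// For example, zero-extending a <2 x s16> value with low half 0xBBBB and high
// half 0xAAAA yields the pair (0x0000BBBB, 0x0000AAAA).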
1735static std::pair<Register, Register>
1736unpackV2S16ToS32(MachineIRBuilder &B, Register Src, unsigned ExtOpcode) {
1737 const LLT S32 = LLT::scalar(32);
1738 auto Bitcast = B.buildBitcast(S32, Src);
1739
1740 if (ExtOpcode == TargetOpcode::G_SEXT) {
1741 auto ExtLo = B.buildSExtInReg(S32, Bitcast, 16);
1742 auto ShiftHi = B.buildAShr(S32, Bitcast, B.buildConstant(S32, 16));
1743 return std::pair(ExtLo.getReg(0), ShiftHi.getReg(0));
1744 }
1745
1746 auto ShiftHi = B.buildLShr(S32, Bitcast, B.buildConstant(S32, 16));
1747 if (ExtOpcode == TargetOpcode::G_ZEXT) {
1748 auto ExtLo = B.buildAnd(S32, Bitcast, B.buildConstant(S32, 0xffff));
1749 return std::pair(ExtLo.getReg(0), ShiftHi.getReg(0));
1750 }
1751
1752 assert(ExtOpcode == TargetOpcode::G_ANYEXT);
1753 return std::pair(Bitcast.getReg(0), ShiftHi.getReg(0));
1754}
1755
1756// For cases where only a single copy is inserted for matching register banks,
1757// replace the register in the instruction operand.
1758static bool substituteSimpleCopyRegs(
1759 const AMDGPURegisterBankInfo::OperandsMapper &OpdMapper, unsigned OpIdx) {
1760 SmallVector<unsigned, 1> SrcReg(OpdMapper.getVRegs(OpIdx));
1761 if (!SrcReg.empty()) {
1762 assert(SrcReg.size() == 1);
1763 OpdMapper.getMI().getOperand(OpIdx).setReg(SrcReg[0]);
1764 return true;
1765 }
1766
1767 return false;
1768}
1769
1770/// Handle register layout difference for f16 images for some subtargets.
1771Register AMDGPURegisterBankInfo::handleD16VData(MachineIRBuilder &B,
1772                                                MachineRegisterInfo &MRI,
1773 Register Reg) const {
1774 if (!Subtarget.hasUnpackedD16VMem())
1775 return Reg;
1776
1777 const LLT S16 = LLT::scalar(16);
1778 LLT StoreVT = MRI.getType(Reg);
1779 if (!StoreVT.isVector() || StoreVT.getElementType() != S16)
1780 return Reg;
1781
1782 auto Unmerge = B.buildUnmerge(S16, Reg);
1783
1784
1785 SmallVector<Register, 4> WideRegs;
1786 for (int I = 0, E = Unmerge->getNumOperands() - 1; I != E; ++I)
1787 WideRegs.push_back(Unmerge.getReg(I));
1788
1789 const LLT S32 = LLT::scalar(32);
1790 int NumElts = StoreVT.getNumElements();
1791
1792 return B.buildMergeLikeInstr(LLT::fixed_vector(NumElts, S32), WideRegs)
1793 .getReg(0);
1794}
1795
1796static std::pair<Register, unsigned>
1797getBaseWithConstantOffset(MachineRegisterInfo &MRI, Register Reg) {
1798 int64_t Const;
1799 if (mi_match(Reg, MRI, m_ICst(Const)))
1800 return std::pair(Register(), Const);
1801
1802 Register Base;
1803 if (mi_match(Reg, MRI, m_GAdd(m_Reg(Base), m_ICst(Const))))
1804 return std::pair(Base, Const);
1805
1806 // TODO: Handle G_OR used for add case
1807 return std::pair(Reg, 0);
1808}
1809
1810std::pair<Register, unsigned>
1811AMDGPURegisterBankInfo::splitBufferOffsets(MachineIRBuilder &B,
1812 Register OrigOffset) const {
1813 const unsigned MaxImm = SIInstrInfo::getMaxMUBUFImmOffset(Subtarget);
1814 Register BaseReg;
1815 unsigned ImmOffset;
1816 const LLT S32 = LLT::scalar(32);
1817
1818 // TODO: Use AMDGPU::getBaseWithConstantOffset() instead.
1819 std::tie(BaseReg, ImmOffset) = getBaseWithConstantOffset(*B.getMRI(),
1820 OrigOffset);
1821
1822 unsigned C1 = 0;
1823 if (ImmOffset != 0) {
1824 // If the immediate value is too big for the immoffset field, put only bits
1825 // that would normally fit in the immoffset field. The remaining value that
1826 // is copied/added for the voffset field is a large power of 2, and it
1827 // stands more chance of being CSEd with the copy/add for another similar
1828 // load/store.
1829    // However, do not do the rounding down if the resulting overflow value
1830    // is negative, as it appears to be illegal to have a negative offset in
1831    // the vgpr, even if adding the immediate offset makes it positive.
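    // For example, assuming a 12-bit immediate field (MaxImm = 0xfff), an
    // offset of 0x1234 splits into ImmOffset = 0x234 and a voffset add of
    // Overflow = 0x1000.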
1832 unsigned Overflow = ImmOffset & ~MaxImm;
1833 ImmOffset -= Overflow;
1834 if ((int32_t)Overflow < 0) {
1835 Overflow += ImmOffset;
1836 ImmOffset = 0;
1837 }
1838
1839 C1 = ImmOffset;
1840 if (Overflow != 0) {
1841 if (!BaseReg)
1842 BaseReg = B.buildConstant(S32, Overflow).getReg(0);
1843 else {
1844 auto OverflowVal = B.buildConstant(S32, Overflow);
1845 BaseReg = B.buildAdd(S32, BaseReg, OverflowVal).getReg(0);
1846 }
1847 }
1848 }
1849
1850 if (!BaseReg)
1851 BaseReg = B.buildConstant(S32, 0).getReg(0);
1852
1853 return {BaseReg, C1};
1854}
1855
1856bool AMDGPURegisterBankInfo::buildVCopy(MachineIRBuilder &B, Register DstReg,
1857 Register SrcReg) const {
1858 MachineRegisterInfo &MRI = *B.getMRI();
1859 LLT SrcTy = MRI.getType(SrcReg);
1860 if (SrcTy.getSizeInBits() == 32) {
1861 // Use a v_mov_b32 here to make the exec dependency explicit.
1862 B.buildInstr(AMDGPU::V_MOV_B32_e32)
1863 .addDef(DstReg)
1864 .addUse(SrcReg);
1865 return constrainGenericRegister(DstReg, AMDGPU::VGPR_32RegClass, MRI) &&
1866 constrainGenericRegister(SrcReg, AMDGPU::SReg_32RegClass, MRI);
1867 }
1868
1869 Register TmpReg0 = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
1870 Register TmpReg1 = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
1871
1872 B.buildInstr(AMDGPU::V_MOV_B32_e32)
1873 .addDef(TmpReg0)
1874 .addUse(SrcReg, 0, AMDGPU::sub0);
1875 B.buildInstr(AMDGPU::V_MOV_B32_e32)
1876 .addDef(TmpReg1)
1877 .addUse(SrcReg, 0, AMDGPU::sub1);
1878 B.buildInstr(AMDGPU::REG_SEQUENCE)
1879 .addDef(DstReg)
1880 .addUse(TmpReg0)
1881 .addImm(AMDGPU::sub0)
1882 .addUse(TmpReg1)
1883 .addImm(AMDGPU::sub1);
1884
1885 return constrainGenericRegister(SrcReg, AMDGPU::SReg_64RegClass, MRI) &&
1886 constrainGenericRegister(DstReg, AMDGPU::VReg_64RegClass, MRI);
1887}
1888
1889/// Utility function for pushing dynamic vector indexes with a constant offset
1890/// into waterfall loops.
1891static void reinsertVectorIndexAdd(MachineIRBuilder &B,
1892 MachineInstr &IdxUseInstr,
1893 unsigned OpIdx,
1894 unsigned ConstOffset) {
1895 MachineRegisterInfo &MRI = *B.getMRI();
1896 const LLT S32 = LLT::scalar(32);
1897 Register WaterfallIdx = IdxUseInstr.getOperand(OpIdx).getReg();
1898 B.setInsertPt(*IdxUseInstr.getParent(), IdxUseInstr.getIterator());
1899
1900 auto MaterializedOffset = B.buildConstant(S32, ConstOffset);
1901
1902 auto Add = B.buildAdd(S32, WaterfallIdx, MaterializedOffset);
1903 MRI.setRegBank(MaterializedOffset.getReg(0), AMDGPU::SGPRRegBank);
1904 MRI.setRegBank(Add.getReg(0), AMDGPU::SGPRRegBank);
1905 IdxUseInstr.getOperand(OpIdx).setReg(Add.getReg(0));
1906}
1907
1908/// Implement extending a 32-bit value to a 64-bit value. \p Lo32Reg is the
1909/// original 32-bit source value (to be inserted in the low part of the combined
1910/// 64-bit result), and \p Hi32Reg is the high half of the combined 64-bit
1911/// value.
1912static void extendLow32IntoHigh32(MachineIRBuilder &B,
1913 Register Hi32Reg, Register Lo32Reg,
1914 unsigned ExtOpc,
1915 const RegisterBank &RegBank,
1916 bool IsBooleanSrc = false) {
1917 if (ExtOpc == AMDGPU::G_ZEXT) {
1918 B.buildConstant(Hi32Reg, 0);
1919 } else if (ExtOpc == AMDGPU::G_SEXT) {
1920 if (IsBooleanSrc) {
1921 // If we know the original source was an s1, the high half is the same as
1922 // the low.
1923 B.buildCopy(Hi32Reg, Lo32Reg);
1924 } else {
1925 // Replicate sign bit from 32-bit extended part.
1926 auto ShiftAmt = B.buildConstant(LLT::scalar(32), 31);
1927 B.getMRI()->setRegBank(ShiftAmt.getReg(0), RegBank);
1928 B.buildAShr(Hi32Reg, Lo32Reg, ShiftAmt);
1929 }
1930 } else {
1931 assert(ExtOpc == AMDGPU::G_ANYEXT && "not an integer extension");
1932 B.buildUndef(Hi32Reg);
1933 }
1934}
1935
1936bool AMDGPURegisterBankInfo::foldExtractEltToCmpSelect(
1937    MachineIRBuilder &B, MachineInstr &MI,
1938 const OperandsMapper &OpdMapper) const {
1939 MachineRegisterInfo &MRI = *B.getMRI();
1940
1941 Register VecReg = MI.getOperand(1).getReg();
1942 Register Idx = MI.getOperand(2).getReg();
1943
1944 const RegisterBank &IdxBank =
1945 *OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
1946
1947 bool IsDivergentIdx = IdxBank != AMDGPU::SGPRRegBank;
1948
1949 LLT VecTy = MRI.getType(VecReg);
1950 unsigned EltSize = VecTy.getScalarSizeInBits();
1951 unsigned NumElem = VecTy.getNumElements();
1952
1953 if (!SITargetLowering::shouldExpandVectorDynExt(EltSize, NumElem,
1954 IsDivergentIdx, &Subtarget))
1955 return false;
1956
1957 LLT S32 = LLT::scalar(32);
1958
1959 const RegisterBank &DstBank =
1960 *OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
1961 const RegisterBank &SrcBank =
1962 *OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
1963
1964 const RegisterBank &CCBank =
1965 (DstBank == AMDGPU::SGPRRegBank &&
1966 SrcBank == AMDGPU::SGPRRegBank &&
1967 IdxBank == AMDGPU::SGPRRegBank) ? AMDGPU::SGPRRegBank
1968 : AMDGPU::VCCRegBank;
1969 LLT CCTy = (CCBank == AMDGPU::SGPRRegBank) ? S32 : LLT::scalar(1);
1970
1971 if (CCBank == AMDGPU::VCCRegBank && IdxBank == AMDGPU::SGPRRegBank) {
1972 Idx = B.buildCopy(S32, Idx)->getOperand(0).getReg();
1973 MRI.setRegBank(Idx, AMDGPU::VGPRRegBank);
1974 }
1975
1976 LLT EltTy = VecTy.getScalarType();
1977 SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));
1978 unsigned NumLanes = DstRegs.size();
1979 if (!NumLanes)
1980 NumLanes = 1;
1981 else
1982 EltTy = MRI.getType(DstRegs[0]);
1983
1984 auto UnmergeToEltTy = B.buildUnmerge(EltTy, VecReg);
1985 SmallVector<Register, 2> Res(NumLanes);
1986 for (unsigned L = 0; L < NumLanes; ++L)
1987 Res[L] = UnmergeToEltTy.getReg(L);
1988
1989 for (unsigned I = 1; I < NumElem; ++I) {
1990 auto IC = B.buildConstant(S32, I);
1991 MRI.setRegBank(IC->getOperand(0).getReg(), AMDGPU::SGPRRegBank);
1992 auto Cmp = B.buildICmp(CmpInst::ICMP_EQ, CCTy, Idx, IC);
1993 MRI.setRegBank(Cmp->getOperand(0).getReg(), CCBank);
1994
1995 for (unsigned L = 0; L < NumLanes; ++L) {
1996 auto S = B.buildSelect(EltTy, Cmp,
1997 UnmergeToEltTy.getReg(I * NumLanes + L), Res[L]);
1998
1999 for (unsigned N : { 0, 2, 3 })
2000 MRI.setRegBank(S->getOperand(N).getReg(), DstBank);
2001
2002 Res[L] = S->getOperand(0).getReg();
2003 }
2004 }
2005
2006 for (unsigned L = 0; L < NumLanes; ++L) {
2007 Register DstReg = (NumLanes == 1) ? MI.getOperand(0).getReg() : DstRegs[L];
2008 B.buildCopy(DstReg, Res[L]);
2009 MRI.setRegBank(DstReg, DstBank);
2010 }
2011
2012 MRI.setRegBank(MI.getOperand(0).getReg(), DstBank);
2013 MI.eraseFromParent();
2014
2015 return true;
2016}
2017
2018// Insert a cross regbank copy for a register if it already has a bank that
2019// differs from the one we want to set.
2020static Register constrainRegToBank(MachineRegisterInfo &MRI,
2021                                   MachineIRBuilder &B, Register &Reg,
2022 const RegisterBank &Bank) {
2023 const RegisterBank *CurrBank = MRI.getRegBankOrNull(Reg);
2024 if (CurrBank && *CurrBank != Bank) {
2025 Register Copy = B.buildCopy(MRI.getType(Reg), Reg).getReg(0);
2026 MRI.setRegBank(Copy, Bank);
2027 return Copy;
2028 }
2029
2030 MRI.setRegBank(Reg, Bank);
2031 return Reg;
2032}
2033
2034bool AMDGPURegisterBankInfo::foldInsertEltToCmpSelect(
2035    MachineIRBuilder &B, MachineInstr &MI,
2036 const OperandsMapper &OpdMapper) const {
2037
2038 MachineRegisterInfo &MRI = *B.getMRI();
2039 Register VecReg = MI.getOperand(1).getReg();
2040 Register Idx = MI.getOperand(3).getReg();
2041
2042 const RegisterBank &IdxBank =
2043 *OpdMapper.getInstrMapping().getOperandMapping(3).BreakDown[0].RegBank;
2044
2045 bool IsDivergentIdx = IdxBank != AMDGPU::SGPRRegBank;
2046
2047 LLT VecTy = MRI.getType(VecReg);
2048 unsigned EltSize = VecTy.getScalarSizeInBits();
2049 unsigned NumElem = VecTy.getNumElements();
2050
2051 if (!SITargetLowering::shouldExpandVectorDynExt(EltSize, NumElem,
2052 IsDivergentIdx, &Subtarget))
2053 return false;
2054
2055 LLT S32 = LLT::scalar(32);
2056
2057 const RegisterBank &DstBank =
2058 *OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2059 const RegisterBank &SrcBank =
2060 *OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
2061 const RegisterBank &InsBank =
2062 *OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
2063
2064 const RegisterBank &CCBank =
2065 (DstBank == AMDGPU::SGPRRegBank &&
2066 SrcBank == AMDGPU::SGPRRegBank &&
2067 InsBank == AMDGPU::SGPRRegBank &&
2068 IdxBank == AMDGPU::SGPRRegBank) ? AMDGPU::SGPRRegBank
2069 : AMDGPU::VCCRegBank;
2070 LLT CCTy = (CCBank == AMDGPU::SGPRRegBank) ? S32 : LLT::scalar(1);
2071
2072 if (CCBank == AMDGPU::VCCRegBank && IdxBank == AMDGPU::SGPRRegBank) {
2073 Idx = B.buildCopy(S32, Idx)->getOperand(0).getReg();
2074 MRI.setRegBank(Idx, AMDGPU::VGPRRegBank);
2075 }
2076
2077 LLT EltTy = VecTy.getScalarType();
2078 SmallVector<Register, 2> InsRegs(OpdMapper.getVRegs(2));
2079 unsigned NumLanes = InsRegs.size();
2080 if (!NumLanes) {
2081 NumLanes = 1;
2082 InsRegs.push_back(MI.getOperand(2).getReg());
2083 } else {
2084 EltTy = MRI.getType(InsRegs[0]);
2085 }
2086
2087 auto UnmergeToEltTy = B.buildUnmerge(EltTy, VecReg);
2088 SmallVector<Register, 16> Ops(NumElem * NumLanes);
2089
2090 for (unsigned I = 0; I < NumElem; ++I) {
2091 auto IC = B.buildConstant(S32, I);
2092 MRI.setRegBank(IC->getOperand(0).getReg(), AMDGPU::SGPRRegBank);
2093 auto Cmp = B.buildICmp(CmpInst::ICMP_EQ, CCTy, Idx, IC);
2094 MRI.setRegBank(Cmp->getOperand(0).getReg(), CCBank);
2095
2096 for (unsigned L = 0; L < NumLanes; ++L) {
2097 Register Op0 = constrainRegToBank(MRI, B, InsRegs[L], DstBank);
2098 Register Op1 = UnmergeToEltTy.getReg(I * NumLanes + L);
2099 Op1 = constrainRegToBank(MRI, B, Op1, DstBank);
2100
2101 Register Select = B.buildSelect(EltTy, Cmp, Op0, Op1).getReg(0);
2102 MRI.setRegBank(Select, DstBank);
2103
2104 Ops[I * NumLanes + L] = Select;
2105 }
2106 }
2107
2108 LLT MergeTy = LLT::fixed_vector(Ops.size(), EltTy);
2109 if (MergeTy == MRI.getType(MI.getOperand(0).getReg())) {
2110 B.buildBuildVector(MI.getOperand(0), Ops);
2111 } else {
2112 auto Vec = B.buildBuildVector(MergeTy, Ops);
2113 MRI.setRegBank(Vec->getOperand(0).getReg(), DstBank);
2114 B.buildBitcast(MI.getOperand(0).getReg(), Vec);
2115 }
2116
2117 MRI.setRegBank(MI.getOperand(0).getReg(), DstBank);
2118 MI.eraseFromParent();
2119
2120 return true;
2121}
2122
2123// Break s_mul_u64 into 32-bit vector operations.
2124void AMDGPURegisterBankInfo::applyMappingSMULU64(
2125 MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
2126 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2127 SmallVector<Register, 2> Src0Regs(OpdMapper.getVRegs(1));
2128 SmallVector<Register, 2> Src1Regs(OpdMapper.getVRegs(2));
2129
2130 // All inputs are SGPRs, nothing special to do.
2131 if (DefRegs.empty()) {
2132 assert(Src0Regs.empty() && Src1Regs.empty());
2133 applyDefaultMapping(OpdMapper);
2134 return;
2135 }
2136
2137 assert(DefRegs.size() == 2);
2138 assert(Src0Regs.size() == Src1Regs.size() &&
2139 (Src0Regs.empty() || Src0Regs.size() == 2));
2140
2141 MachineRegisterInfo &MRI = OpdMapper.getMRI();
2142 MachineInstr &MI = OpdMapper.getMI();
2143 Register DstReg = MI.getOperand(0).getReg();
2144 LLT HalfTy = LLT::scalar(32);
2145
2146 // Depending on where the source registers came from, the generic code may
2147 // have decided to split the inputs already or not. If not, we still need to
2148 // extract the values.
2149
2150 if (Src0Regs.empty())
2151 split64BitValueForMapping(B, Src0Regs, HalfTy, MI.getOperand(1).getReg());
2152 else
2153 setRegsToType(MRI, Src0Regs, HalfTy);
2154
2155 if (Src1Regs.empty())
2156 split64BitValueForMapping(B, Src1Regs, HalfTy, MI.getOperand(2).getReg());
2157 else
2158 setRegsToType(MRI, Src1Regs, HalfTy);
2159
2160 setRegsToType(MRI, DefRegs, HalfTy);
2161
2162 // The multiplication is done as follows:
2163 //
2164 // Op1H Op1L
2165 // * Op0H Op0L
2166 // --------------------
2167 // Op1H*Op0L Op1L*Op0L
2168 // + Op1H*Op0H Op1L*Op0H
2169 // -----------------------------------------
2170 // (Op1H*Op0L + Op1L*Op0H + carry) Op1L*Op0L
2171 //
2172 // We drop Op1H*Op0H because the result of the multiplication is a 64-bit
2173 // value and that would overflow.
2174 // The low 32-bit value is Op1L*Op0L.
2175 // The high 32-bit value is Op1H*Op0L + Op1L*Op0H + carry (from
2176 // Op1L*Op0L).
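  // Concretely, the code below computes:
  //   lo(Result) = mul(Op0L, Op1L)
  //   hi(Result) = add(add(umulh(Op0L, Op1L), mul(Op0L, Op1H)), mul(Op0H, Op1L))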
2177
2178 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank);
2179
2180 Register Hi = B.buildUMulH(HalfTy, Src0Regs[0], Src1Regs[0]).getReg(0);
2181 Register MulLoHi = B.buildMul(HalfTy, Src0Regs[0], Src1Regs[1]).getReg(0);
2182 Register Add = B.buildAdd(HalfTy, Hi, MulLoHi).getReg(0);
2183 Register MulHiLo = B.buildMul(HalfTy, Src0Regs[1], Src1Regs[0]).getReg(0);
2184 B.buildAdd(DefRegs[1], Add, MulHiLo);
2185 B.buildMul(DefRegs[0], Src0Regs[0], Src1Regs[0]);
2186
2187 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
2188 MI.eraseFromParent();
2189}
2190
2191void AMDGPURegisterBankInfo::applyMappingImpl(
2192 MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
2193 MachineInstr &MI = OpdMapper.getMI();
2194 B.setInstrAndDebugLoc(MI);
2195 unsigned Opc = MI.getOpcode();
2196 MachineRegisterInfo &MRI = OpdMapper.getMRI();
2197 switch (Opc) {
2198 case AMDGPU::G_CONSTANT:
2199 case AMDGPU::G_IMPLICIT_DEF: {
2200 Register DstReg = MI.getOperand(0).getReg();
2201 LLT DstTy = MRI.getType(DstReg);
2202 if (DstTy != LLT::scalar(1))
2203 break;
2204
2205 const RegisterBank *DstBank =
2206 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2207 if (DstBank == &AMDGPU::VCCRegBank)
2208 break;
2209 SmallVector<Register, 1> DefRegs(OpdMapper.getVRegs(0));
2210 if (DefRegs.empty())
2211 DefRegs.push_back(DstReg);
2212
2213 B.setInsertPt(*MI.getParent(), ++MI.getIterator());
2214
2215 Register NewDstReg = MRI.createGenericVirtualRegister(LLT::scalar(32));
2216 LLVMContext &Ctx = B.getMF().getFunction().getContext();
2217
2218 MI.getOperand(0).setReg(NewDstReg);
2219 if (Opc != AMDGPU::G_IMPLICIT_DEF) {
2220 uint64_t ConstVal = MI.getOperand(1).getCImm()->getZExtValue();
2221 MI.getOperand(1).setCImm(
2222 ConstantInt::get(IntegerType::getInt32Ty(Ctx), ConstVal));
2223 }
2224
2225 MRI.setRegBank(NewDstReg, *DstBank);
2226 B.buildTrunc(DefRegs[0], NewDstReg);
2227 return;
2228 }
2229 case AMDGPU::G_PHI: {
2230 Register DstReg = MI.getOperand(0).getReg();
2231 LLT DstTy = MRI.getType(DstReg);
2232 if (DstTy != LLT::scalar(1))
2233 break;
2234
2235 const LLT S32 = LLT::scalar(32);
2236 const RegisterBank *DstBank =
2237 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2238 if (DstBank == &AMDGPU::VCCRegBank) {
2239 applyDefaultMapping(OpdMapper);
2240 // The standard handling only considers the result register bank for
2241 // phis. For VCC, blindly inserting a copy when the phi is lowered will
2242 // produce an invalid copy. We can only copy with some kind of compare to
2243 // get a vector boolean result. Insert a register bank copy that will be
2244 // correctly lowered to a compare.
2245 for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2) {
2246 Register SrcReg = MI.getOperand(I).getReg();
2247 const RegisterBank *SrcBank = getRegBank(SrcReg, MRI, *TRI);
2248
2249 if (SrcBank != &AMDGPU::VCCRegBank) {
2250 MachineBasicBlock *SrcMBB = MI.getOperand(I + 1).getMBB();
2251 B.setInsertPt(*SrcMBB, SrcMBB->getFirstTerminator());
2252
2253 auto Copy = B.buildCopy(LLT::scalar(1), SrcReg);
2254 MRI.setRegBank(Copy.getReg(0), AMDGPU::VCCRegBank);
2255 MI.getOperand(I).setReg(Copy.getReg(0));
2256 }
2257 }
2258
2259 return;
2260 }
2261
2262 // Phi handling is strange and only considers the bank of the destination.
2263 substituteSimpleCopyRegs(OpdMapper, 0);
2264
2265 // Promote SGPR/VGPR booleans to s32
2266 ApplyRegBankMapping ApplyBank(B, *this, MRI, DstBank);
2267 B.setInsertPt(B.getMBB(), MI);
2268 LegalizerHelper Helper(B.getMF(), ApplyBank, B);
2269
2270 if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
2271 llvm_unreachable("widen scalar should have succeeded");
2272
2273 return;
2274 }
2275 case AMDGPU::G_FCMP:
2276 if (!Subtarget.hasSALUFloatInsts())
2277 break;
2278 [[fallthrough]];
2279 case AMDGPU::G_ICMP:
2280 case AMDGPU::G_UADDO:
2281 case AMDGPU::G_USUBO:
2282 case AMDGPU::G_UADDE:
2283 case AMDGPU::G_SADDE:
2284 case AMDGPU::G_USUBE:
2285 case AMDGPU::G_SSUBE: {
2286 unsigned BoolDstOp =
2287 (Opc == AMDGPU::G_ICMP || Opc == AMDGPU::G_FCMP) ? 0 : 1;
2288 Register DstReg = MI.getOperand(BoolDstOp).getReg();
2289
2290 const RegisterBank *DstBank =
2291 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2292 if (DstBank != &AMDGPU::SGPRRegBank)
2293 break;
2294
2295 const bool HasCarryIn = MI.getNumOperands() == 5;
2296
2297 // If this is a scalar compare, promote the result to s32, as the selection
2298 // will end up using a copy to a 32-bit vreg.
2299 const LLT S32 = LLT::scalar(32);
2300 Register NewDstReg = MRI.createGenericVirtualRegister(S32);
2301 MRI.setRegBank(NewDstReg, AMDGPU::SGPRRegBank);
2302 MI.getOperand(BoolDstOp).setReg(NewDstReg);
2303
2304 if (HasCarryIn) {
2305 Register NewSrcReg = MRI.createGenericVirtualRegister(S32);
2306 MRI.setRegBank(NewSrcReg, AMDGPU::SGPRRegBank);
2307 B.buildZExt(NewSrcReg, MI.getOperand(4).getReg());
2308 MI.getOperand(4).setReg(NewSrcReg);
2309 }
2310
2311 MachineBasicBlock *MBB = MI.getParent();
2312 B.setInsertPt(*MBB, std::next(MI.getIterator()));
2313
2314 // If we had a constrained VCC result register, a copy was inserted to VCC
2315 // from SGPR.
2316 SmallVector<Register, 1> DefRegs(OpdMapper.getVRegs(0));
2317 if (DefRegs.empty())
2318 DefRegs.push_back(DstReg);
2319 B.buildTrunc(DefRegs[0], NewDstReg);
2320 return;
2321 }
2322 case AMDGPU::G_SELECT: {
2323 Register DstReg = MI.getOperand(0).getReg();
2324 LLT DstTy = MRI.getType(DstReg);
2325
2326 SmallVector<Register, 1> CondRegs(OpdMapper.getVRegs(1));
2327 if (CondRegs.empty())
2328 CondRegs.push_back(MI.getOperand(1).getReg());
2329 else {
2330 assert(CondRegs.size() == 1);
2331 }
2332
2333 const RegisterBank *CondBank = getRegBank(CondRegs[0], MRI, *TRI);
2334 if (CondBank == &AMDGPU::SGPRRegBank) {
2335 const LLT S32 = LLT::scalar(32);
2336 Register NewCondReg = MRI.createGenericVirtualRegister(S32);
2337 MRI.setRegBank(NewCondReg, AMDGPU::SGPRRegBank);
2338
2339 MI.getOperand(1).setReg(NewCondReg);
2340 B.buildZExt(NewCondReg, CondRegs[0]);
2341 }
2342
2343 if (DstTy.getSizeInBits() != 64)
2344 break;
2345
2346 LLT HalfTy = getHalfSizedType(DstTy);
2347
2348 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2349 SmallVector<Register, 2> Src1Regs(OpdMapper.getVRegs(2));
2350 SmallVector<Register, 2> Src2Regs(OpdMapper.getVRegs(3));
2351
2352 // All inputs are SGPRs, nothing special to do.
2353 if (DefRegs.empty()) {
2354 assert(Src1Regs.empty() && Src2Regs.empty());
2355 break;
2356 }
2357
2358 if (Src1Regs.empty())
2359 split64BitValueForMapping(B, Src1Regs, HalfTy, MI.getOperand(2).getReg());
2360 else {
2361 setRegsToType(MRI, Src1Regs, HalfTy);
2362 }
2363
2364 if (Src2Regs.empty())
2365 split64BitValueForMapping(B, Src2Regs, HalfTy, MI.getOperand(3).getReg());
2366 else
2367 setRegsToType(MRI, Src2Regs, HalfTy);
2368
2369 setRegsToType(MRI, DefRegs, HalfTy);
2370
2371 auto Flags = MI.getFlags();
2372 B.buildSelect(DefRegs[0], CondRegs[0], Src1Regs[0], Src2Regs[0], Flags);
2373 B.buildSelect(DefRegs[1], CondRegs[0], Src1Regs[1], Src2Regs[1], Flags);
2374
2375 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
2376 MI.eraseFromParent();
2377 return;
2378 }
2379 case AMDGPU::G_BRCOND: {
2380 Register CondReg = MI.getOperand(0).getReg();
2381 // FIXME: Should use legalizer helper, but should change bool ext type.
2382 const RegisterBank *CondBank =
2383 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2384
2385 if (CondBank == &AMDGPU::SGPRRegBank) {
2386 const LLT S32 = LLT::scalar(32);
2387 Register NewCondReg = MRI.createGenericVirtualRegister(S32);
2388 MRI.setRegBank(NewCondReg, AMDGPU::SGPRRegBank);
2389
2390 MI.getOperand(0).setReg(NewCondReg);
2391 B.buildZExt(NewCondReg, CondReg);
2392 return;
2393 }
2394
2395 break;
2396 }
2397 case AMDGPU::G_AND:
2398 case AMDGPU::G_OR:
2399 case AMDGPU::G_XOR: {
2400 // 64-bit and is only available on the SALU, so split into 2 32-bit ops if
2401 // there is a VGPR input.
2402 Register DstReg = MI.getOperand(0).getReg();
2403 LLT DstTy = MRI.getType(DstReg);
2404
2405 const RegisterBank *DstBank =
2406 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2407
2408 if (DstTy.getSizeInBits() == 1) {
2409 if (DstBank == &AMDGPU::VCCRegBank)
2410 break;
2411
2412 MachineFunction *MF = MI.getMF();
2413 ApplyRegBankMapping ApplyBank(B, *this, MRI, DstBank);
2414 LegalizerHelper Helper(*MF, ApplyBank, B);
2415
2416      if (Helper.widenScalar(MI, 0, LLT::scalar(32)) !=
2417          LegalizerHelper::Legalized)
2418 llvm_unreachable("widen scalar should have succeeded");
2419 return;
2420 }
2421
2422 if (DstTy.getSizeInBits() == 16 && DstBank == &AMDGPU::SGPRRegBank) {
2423 const LLT S32 = LLT::scalar(32);
2424 MachineBasicBlock *MBB = MI.getParent();
2425 MachineFunction *MF = MBB->getParent();
2426 ApplyRegBankMapping ApplySALU(B, *this, MRI, &AMDGPU::SGPRRegBank);
2427 LegalizerHelper Helper(*MF, ApplySALU, B);
2428 // Widen to S32, but handle `G_XOR x, -1` differently. Legalizer widening
2429 // will use a G_ANYEXT to extend the -1 which prevents matching G_XOR -1
2430 // as "not".
2431 if (MI.getOpcode() == AMDGPU::G_XOR &&
2432 mi_match(MI.getOperand(2).getReg(), MRI, m_SpecificICstOrSplat(-1))) {
2433 Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT);
2434 Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_SEXT);
2435 Helper.widenScalarDst(MI, S32);
2436 } else {
2437 if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
2438 llvm_unreachable("widen scalar should have succeeded");
2439 }
2440 return;
2441 }
2442
2443 if (DstTy.getSizeInBits() != 64)
2444 break;
2445
2446 LLT HalfTy = getHalfSizedType(DstTy);
2447 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2448 SmallVector<Register, 2> Src0Regs(OpdMapper.getVRegs(1));
2449 SmallVector<Register, 2> Src1Regs(OpdMapper.getVRegs(2));
2450
2451 // All inputs are SGPRs, nothing special to do.
2452 if (DefRegs.empty()) {
2453 assert(Src0Regs.empty() && Src1Regs.empty());
2454 break;
2455 }
2456
2457 assert(DefRegs.size() == 2);
2458 assert(Src0Regs.size() == Src1Regs.size() &&
2459 (Src0Regs.empty() || Src0Regs.size() == 2));
2460
2461 // Depending on where the source registers came from, the generic code may
2462 // have decided to split the inputs already or not. If not, we still need to
2463 // extract the values.
2464
2465 if (Src0Regs.empty())
2466 split64BitValueForMapping(B, Src0Regs, HalfTy, MI.getOperand(1).getReg());
2467 else
2468 setRegsToType(MRI, Src0Regs, HalfTy);
2469
2470 if (Src1Regs.empty())
2471 split64BitValueForMapping(B, Src1Regs, HalfTy, MI.getOperand(2).getReg());
2472 else
2473 setRegsToType(MRI, Src1Regs, HalfTy);
2474
2475 setRegsToType(MRI, DefRegs, HalfTy);
2476
2477 auto Flags = MI.getFlags();
2478 B.buildInstr(Opc, {DefRegs[0]}, {Src0Regs[0], Src1Regs[0]}, Flags);
2479 B.buildInstr(Opc, {DefRegs[1]}, {Src0Regs[1], Src1Regs[1]}, Flags);
2480
2481 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
2482 MI.eraseFromParent();
2483 return;
2484 }
2485 case AMDGPU::G_ABS: {
2486 Register SrcReg = MI.getOperand(1).getReg();
2487 const RegisterBank *SrcBank = MRI.getRegBankOrNull(SrcReg);
2488
2489 // There is no VALU abs instruction so we need to replace it with a sub and
2490 // max combination.
2491 if (SrcBank && SrcBank == &AMDGPU::VGPRRegBank) {
2492 MachineFunction *MF = MI.getMF();
2493 ApplyRegBankMapping Apply(B, *this, MRI, &AMDGPU::VGPRRegBank);
2494 LegalizerHelper Helper(*MF, Apply, B);
2495
2496      if (Helper.lowerAbsToMaxNeg(MI) != LegalizerHelper::Legalized)
2497 llvm_unreachable("lowerAbsToMaxNeg should have succeeded");
2498 return;
2499 }
2500 [[fallthrough]];
2501 }
2502 case AMDGPU::G_ADD:
2503 case AMDGPU::G_SUB:
2504 case AMDGPU::G_MUL:
2505 case AMDGPU::G_SHL:
2506 case AMDGPU::G_LSHR:
2507 case AMDGPU::G_ASHR:
2508 case AMDGPU::G_SMIN:
2509 case AMDGPU::G_SMAX:
2510 case AMDGPU::G_UMIN:
2511 case AMDGPU::G_UMAX: {
2512 Register DstReg = MI.getOperand(0).getReg();
2513 LLT DstTy = MRI.getType(DstReg);
2514
2515 // Special case for s_mul_u64. There is not a vector equivalent of
2516 // s_mul_u64. Hence, we have to break down s_mul_u64 into 32-bit vector
2517 // multiplications.
2518 if (!Subtarget.hasVectorMulU64() && Opc == AMDGPU::G_MUL &&
2519 DstTy.getSizeInBits() == 64) {
2520 applyMappingSMULU64(B, OpdMapper);
2521 return;
2522 }
2523
2524 // 16-bit operations are VALU only, but can be promoted to 32-bit SALU.
2525 // Packed 16-bit operations need to be scalarized and promoted.
2526 if (DstTy != LLT::scalar(16) && DstTy != LLT::fixed_vector(2, 16))
2527 break;
2528
2529 const RegisterBank *DstBank =
2530 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2531 if (DstBank == &AMDGPU::VGPRRegBank)
2532 break;
2533
2534 const LLT S32 = LLT::scalar(32);
2535 MachineBasicBlock *MBB = MI.getParent();
2536 MachineFunction *MF = MBB->getParent();
2537 ApplyRegBankMapping ApplySALU(B, *this, MRI, &AMDGPU::SGPRRegBank);
2538
2539 if (DstTy.isVector() && Opc == AMDGPU::G_ABS) {
2540 Register WideSrcLo, WideSrcHi;
2541
2542 std::tie(WideSrcLo, WideSrcHi) =
2543 unpackV2S16ToS32(B, MI.getOperand(1).getReg(), TargetOpcode::G_SEXT);
2544 auto Lo = B.buildInstr(AMDGPU::G_ABS, {S32}, {WideSrcLo});
2545 auto Hi = B.buildInstr(AMDGPU::G_ABS, {S32}, {WideSrcHi});
2546 B.buildBuildVectorTrunc(DstReg, {Lo.getReg(0), Hi.getReg(0)});
2547 MI.eraseFromParent();
2548 return;
2549 }
2550
2551 if (DstTy.isVector()) {
2552 Register WideSrc0Lo, WideSrc0Hi;
2553 Register WideSrc1Lo, WideSrc1Hi;
2554
2555 unsigned ExtendOp = getExtendOp(MI.getOpcode());
2556 std::tie(WideSrc0Lo, WideSrc0Hi)
2557 = unpackV2S16ToS32(B, MI.getOperand(1).getReg(), ExtendOp);
2558 std::tie(WideSrc1Lo, WideSrc1Hi)
2559 = unpackV2S16ToS32(B, MI.getOperand(2).getReg(), ExtendOp);
2560 auto Lo = B.buildInstr(MI.getOpcode(), {S32}, {WideSrc0Lo, WideSrc1Lo});
2561 auto Hi = B.buildInstr(MI.getOpcode(), {S32}, {WideSrc0Hi, WideSrc1Hi});
2562 B.buildBuildVectorTrunc(DstReg, {Lo.getReg(0), Hi.getReg(0)});
2563 MI.eraseFromParent();
2564 } else {
2565 LegalizerHelper Helper(*MF, ApplySALU, B);
2566
2567 if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
2568 llvm_unreachable("widen scalar should have succeeded");
2569
2570 // FIXME: s16 shift amounts should be legal.
2571 if (Opc == AMDGPU::G_SHL || Opc == AMDGPU::G_LSHR ||
2572 Opc == AMDGPU::G_ASHR) {
2573 B.setInsertPt(*MBB, MI.getIterator());
2574 if (Helper.widenScalar(MI, 1, S32) != LegalizerHelper::Legalized)
2575 llvm_unreachable("widen scalar should have succeeded");
2576 }
2577 }
2578
2579 return;
2580 }
2581 case AMDGPU::G_AMDGPU_S_MUL_I64_I32:
2582 case AMDGPU::G_AMDGPU_S_MUL_U64_U32: {
2583 // This is a special case for s_mul_u64. We use
2584 // G_AMDGPU_S_MUL_I64_I32 opcode to represent an s_mul_u64 operation
2585 // where the 33 higher bits are sign-extended and
2586 // G_AMDGPU_S_MUL_U64_U32 opcode to represent an s_mul_u64 operation
2587 // where the 32 higher bits are zero-extended. In case scalar registers are
2588 // selected, both opcodes are lowered as s_mul_u64. If the vector registers
2589 // are selected, then G_AMDGPU_S_MUL_I64_I32 and
2590 // G_AMDGPU_S_MUL_U64_U32 are lowered with a vector mad instruction.
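    // For example, with VGPR operands the product is formed below as
    // G_AMDGPU_MAD_U64_U32 (or the I64_I32 variant) of the low 32 bits of each
    // source with a zero addend.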
2591
2592 // Insert basic copies.
2593 applyDefaultMapping(OpdMapper);
2594
2595 Register DstReg = MI.getOperand(0).getReg();
2596 Register SrcReg0 = MI.getOperand(1).getReg();
2597 Register SrcReg1 = MI.getOperand(2).getReg();
2598 const LLT S32 = LLT::scalar(32);
2599 const LLT S64 = LLT::scalar(64);
2600 assert(MRI.getType(DstReg) == S64 && "This is a special case for s_mul_u64 "
2601 "that handles only 64-bit operands.");
2602 const RegisterBank *DstBank =
2603 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2604
2605 // Replace G_AMDGPU_S_MUL_I64_I32 and G_AMDGPU_S_MUL_U64_U32
2606 // with s_mul_u64 operation.
2607 if (DstBank == &AMDGPU::SGPRRegBank) {
2608 MI.setDesc(TII->get(AMDGPU::S_MUL_U64));
2609 MRI.setRegClass(DstReg, &AMDGPU::SGPR_64RegClass);
2610 MRI.setRegClass(SrcReg0, &AMDGPU::SGPR_64RegClass);
2611 MRI.setRegClass(SrcReg1, &AMDGPU::SGPR_64RegClass);
2612 return;
2613 }
2614
2615 // Replace G_AMDGPU_S_MUL_I64_I32 and G_AMDGPU_S_MUL_U64_U32
2616 // with a vector mad.
2617 assert(MRI.getRegBankOrNull(DstReg) == &AMDGPU::VGPRRegBank &&
2618 "The destination operand should be in vector registers.");
2619
2620 // Extract the lower subregister from the first operand.
2621 Register Op0L = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
2622 MRI.setRegClass(Op0L, &AMDGPU::VGPR_32RegClass);
2623 MRI.setType(Op0L, S32);
2624 B.buildTrunc(Op0L, SrcReg0);
2625
2626 // Extract the lower subregister from the second operand.
2627 Register Op1L = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
2628 MRI.setRegClass(Op1L, &AMDGPU::VGPR_32RegClass);
2629 MRI.setType(Op1L, S32);
2630 B.buildTrunc(Op1L, SrcReg1);
2631
2632 unsigned NewOpc = Opc == AMDGPU::G_AMDGPU_S_MUL_U64_U32
2633 ? AMDGPU::G_AMDGPU_MAD_U64_U32
2634 : AMDGPU::G_AMDGPU_MAD_I64_I32;
2635
2637 Register Zero64 = B.buildConstant(S64, 0).getReg(0);
2638 MRI.setRegClass(Zero64, &AMDGPU::VReg_64RegClass);
2639 Register CarryOut = MRI.createVirtualRegister(&AMDGPU::VReg_64RegClass);
2640 MRI.setRegClass(CarryOut, &AMDGPU::VReg_64RegClass);
2641 B.buildInstr(NewOpc, {DstReg, CarryOut}, {Op0L, Op1L, Zero64});
2642 MI.eraseFromParent();
2643 return;
2644 }
2645 case AMDGPU::G_SEXT_INREG: {
2646 SmallVector<Register, 2> SrcRegs(OpdMapper.getVRegs(1));
2647 if (SrcRegs.empty())
2648 break; // Nothing to repair
2649
2650 const LLT S32 = LLT::scalar(32);
2651 ApplyRegBankMapping O(B, *this, MRI, &AMDGPU::VGPRRegBank);
2652
2653 // Don't use LegalizerHelper's narrowScalar. It produces unwanted G_SEXTs
2654 // we would need to further expand, and doesn't let us directly set the
2655 // result registers.
2656 SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));
2657
2658 int Amt = MI.getOperand(2).getImm();
2659 if (Amt <= 32) {
2660 // Downstream users have expectations for the high bit behavior, so freeze
2661 // incoming undefined bits.
2662 if (Amt == 32) {
2663 // The low bits are unchanged.
2664 B.buildFreeze(DstRegs[0], SrcRegs[0]);
2665 } else {
2666 auto Freeze = B.buildFreeze(S32, SrcRegs[0]);
2667 // Extend in the low bits and propagate the sign bit to the high half.
2668 B.buildSExtInReg(DstRegs[0], Freeze, Amt);
2669 }
2670
2671 B.buildAShr(DstRegs[1], DstRegs[0], B.buildConstant(S32, 31));
2672 } else {
2673 // The low bits are unchanged, and extend in the high bits.
2674 // No freeze required
2675 B.buildCopy(DstRegs[0], SrcRegs[0]);
2676 B.buildSExtInReg(DstRegs[1], DstRegs[0], Amt - 32);
2677 }
2678
2679 Register DstReg = MI.getOperand(0).getReg();
2680 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
2681 MI.eraseFromParent();
2682 return;
2683 }
2684 case AMDGPU::G_CTPOP:
2685 case AMDGPU::G_BITREVERSE: {
2686 const RegisterBank *DstBank =
2687 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2688 if (DstBank == &AMDGPU::SGPRRegBank)
2689 break;
2690
2691 Register SrcReg = MI.getOperand(1).getReg();
2692 const LLT S32 = LLT::scalar(32);
2693 LLT Ty = MRI.getType(SrcReg);
2694 if (Ty == S32)
2695 break;
2696
2697 ApplyRegBankMapping ApplyVALU(B, *this, MRI, &AMDGPU::VGPRRegBank);
2698
2699 MachineFunction &MF = B.getMF();
2700 LegalizerHelper Helper(MF, ApplyVALU, B);
2701
2702 if (Helper.narrowScalar(MI, 1, S32) != LegalizerHelper::Legalized)
2703 llvm_unreachable("narrowScalar should have succeeded");
2704 return;
2705 }
2706 case AMDGPU::G_AMDGPU_FFBH_U32:
2707 case AMDGPU::G_AMDGPU_FFBL_B32:
2708 case AMDGPU::G_CTLZ_ZERO_UNDEF:
2709 case AMDGPU::G_CTTZ_ZERO_UNDEF: {
2710 const RegisterBank *DstBank =
2711 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2712 if (DstBank == &AMDGPU::SGPRRegBank)
2713 break;
2714
2715 Register SrcReg = MI.getOperand(1).getReg();
2716 const LLT S32 = LLT::scalar(32);
2717 LLT Ty = MRI.getType(SrcReg);
2718 if (Ty == S32)
2719 break;
2720
2721 // We can narrow this more efficiently than Helper can by using ffbh/ffbl
2722 // which return -1 when the input is zero:
2723 // (ctlz_zero_undef hi:lo) -> (umin (ffbh hi), (add (ffbh lo), 32))
2724 // (cttz_zero_undef hi:lo) -> (umin (add (ffbl hi), 32), (ffbl lo))
2725 // (ffbh hi:lo) -> (umin (ffbh hi), (uaddsat (ffbh lo), 32))
2726 // (ffbl hi:lo) -> (umin (uaddsat (ffbh hi), 32), (ffbh lo))
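    // For example, if the high half is zero, (ffbh hi) is -1 (i.e. UINT32_MAX),
    // so the umin built below selects (ffbh lo) + 32 as the 64-bit result.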
2727 ApplyRegBankMapping ApplyVALU(B, *this, MRI, &AMDGPU::VGPRRegBank);
2728 SmallVector<Register, 2> SrcRegs(OpdMapper.getVRegs(1));
2729 unsigned NewOpc = Opc == AMDGPU::G_CTLZ_ZERO_UNDEF
2730 ? (unsigned)AMDGPU::G_AMDGPU_FFBH_U32
2731 : Opc == AMDGPU::G_CTTZ_ZERO_UNDEF
2732 ? (unsigned)AMDGPU::G_AMDGPU_FFBL_B32
2733 : Opc;
2734 unsigned Idx = NewOpc == AMDGPU::G_AMDGPU_FFBH_U32;
2735 auto X = B.buildInstr(NewOpc, {S32}, {SrcRegs[Idx]});
2736 auto Y = B.buildInstr(NewOpc, {S32}, {SrcRegs[Idx ^ 1]});
2737 unsigned AddOpc =
2738 Opc == AMDGPU::G_CTLZ_ZERO_UNDEF || Opc == AMDGPU::G_CTTZ_ZERO_UNDEF
2739 ? AMDGPU::G_ADD
2740 : AMDGPU::G_UADDSAT;
2741 Y = B.buildInstr(AddOpc, {S32}, {Y, B.buildConstant(S32, 32)});
2742 Register DstReg = MI.getOperand(0).getReg();
2743 B.buildUMin(DstReg, X, Y);
2744 MI.eraseFromParent();
2745 return;
2746 }
2747 case AMDGPU::G_SEXT:
2748 case AMDGPU::G_ZEXT:
2749 case AMDGPU::G_ANYEXT: {
2750 Register SrcReg = MI.getOperand(1).getReg();
2751 LLT SrcTy = MRI.getType(SrcReg);
2752 const bool Signed = Opc == AMDGPU::G_SEXT;
2753
2754 assert(OpdMapper.getVRegs(1).empty());
2755
2756 const RegisterBank *SrcBank =
2757 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
2758
2759 Register DstReg = MI.getOperand(0).getReg();
2760 LLT DstTy = MRI.getType(DstReg);
2761 if (DstTy.isScalar() &&
2762 SrcBank != &AMDGPU::SGPRRegBank &&
2763 SrcBank != &AMDGPU::VCCRegBank &&
2764         // FIXME: Should handle any type that rounds to s64 when irregular
2765         // breakdowns are supported.
2766 DstTy.getSizeInBits() == 64 &&
2767 SrcTy.getSizeInBits() <= 32) {
2768 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2769
2770 // Extend to 32-bit, and then extend the low half.
2771 if (Signed) {
2772 // TODO: Should really be buildSExtOrCopy
2773 B.buildSExtOrTrunc(DefRegs[0], SrcReg);
2774 } else if (Opc == AMDGPU::G_ZEXT) {
2775 B.buildZExtOrTrunc(DefRegs[0], SrcReg);
2776 } else {
2777 B.buildAnyExtOrTrunc(DefRegs[0], SrcReg);
2778 }
2779
2780 extendLow32IntoHigh32(B, DefRegs[1], DefRegs[0], Opc, *SrcBank);
2781 MRI.setRegBank(DstReg, *SrcBank);
2782 MI.eraseFromParent();
2783 return;
2784 }
2785
2786 if (SrcTy != LLT::scalar(1))
2787 return;
2788
2789 // It is not legal to have a legalization artifact with a VCC source. Rather
2790 // than introducing a copy, insert the select we would have to select the
2791 // copy to.
2792 if (SrcBank == &AMDGPU::VCCRegBank) {
2793 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2794
2795 const RegisterBank *DstBank = &AMDGPU::VGPRRegBank;
2796
2797 unsigned DstSize = DstTy.getSizeInBits();
2798 // 64-bit select is SGPR only
2799 const bool UseSel64 = DstSize > 32 &&
2800 SrcBank->getID() == AMDGPU::SGPRRegBankID;
2801
2802 // TODO: Should s16 select be legal?
2803 LLT SelType = UseSel64 ? LLT::scalar(64) : LLT::scalar(32);
2804 auto True = B.buildConstant(SelType, Signed ? -1 : 1);
2805 auto False = B.buildConstant(SelType, 0);
2806
2807 MRI.setRegBank(True.getReg(0), *DstBank);
2808 MRI.setRegBank(False.getReg(0), *DstBank);
2809 MRI.setRegBank(DstReg, *DstBank);
2810
2811 if (DstSize > 32) {
2812 B.buildSelect(DefRegs[0], SrcReg, True, False);
2813 extendLow32IntoHigh32(B, DefRegs[1], DefRegs[0], Opc, *SrcBank, true);
2814 } else if (DstSize < 32) {
2815 auto Sel = B.buildSelect(SelType, SrcReg, True, False);
2816 MRI.setRegBank(Sel.getReg(0), *DstBank);
2817 B.buildTrunc(DstReg, Sel);
2818 } else {
2819 B.buildSelect(DstReg, SrcReg, True, False);
2820 }
2821
2822 MI.eraseFromParent();
2823 return;
2824 }
2825
2826 break;
2827 }
2828 case AMDGPU::G_EXTRACT_VECTOR_ELT: {
2829 SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));
2830
2831 assert(OpdMapper.getVRegs(1).empty() && OpdMapper.getVRegs(2).empty());
2832
2833 Register DstReg = MI.getOperand(0).getReg();
2834 Register SrcReg = MI.getOperand(1).getReg();
2835
2836 const LLT S32 = LLT::scalar(32);
2837 LLT DstTy = MRI.getType(DstReg);
2838 LLT SrcTy = MRI.getType(SrcReg);
2839
2840 if (foldExtractEltToCmpSelect(B, MI, OpdMapper))
2841 return;
2842
2843 const ValueMapping &DstMapping
2844 = OpdMapper.getInstrMapping().getOperandMapping(0);
2845 const RegisterBank *DstBank = DstMapping.BreakDown[0].RegBank;
2846 const RegisterBank *SrcBank =
2847 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
2848 const RegisterBank *IdxBank =
2849 OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
2850
2851 Register BaseIdxReg;
2852 unsigned ConstOffset;
2853 std::tie(BaseIdxReg, ConstOffset) =
2854 AMDGPU::getBaseWithConstantOffset(MRI, MI.getOperand(2).getReg());
2855
2856 // See if the index is an add of a constant which will be foldable by moving
2857 // the base register of the index later if this is going to be executed in a
2858 // waterfall loop. This is essentially to reassociate the add of a constant
2859 // with the readfirstlane.
2860 bool ShouldMoveIndexIntoLoop = IdxBank != &AMDGPU::SGPRRegBank &&
2861 ConstOffset > 0 &&
2862 ConstOffset < SrcTy.getNumElements();
2863
2864 // Move the base register. We'll re-insert the add later.
2865 if (ShouldMoveIndexIntoLoop)
2866 MI.getOperand(2).setReg(BaseIdxReg);
2867
2868 // If this is a VGPR result only because the index was a VGPR result, the
2869 // actual indexing will be done on the SGPR source vector, which will
2870 // produce a scalar result. We need to copy to the VGPR result inside the
2871 // waterfall loop.
2872 const bool NeedCopyToVGPR = DstBank == &AMDGPU::VGPRRegBank &&
2873 SrcBank == &AMDGPU::SGPRRegBank;
2874 if (DstRegs.empty()) {
2875 applyDefaultMapping(OpdMapper);
2876
2877      executeInWaterfallLoop(B, MI, {2});
2878
2879 if (NeedCopyToVGPR) {
2880 // We don't want a phi for this temporary reg.
2881 Register TmpReg = MRI.createGenericVirtualRegister(DstTy);
2882 MRI.setRegBank(TmpReg, AMDGPU::SGPRRegBank);
2883 MI.getOperand(0).setReg(TmpReg);
2884 B.setInsertPt(*MI.getParent(), ++MI.getIterator());
2885
2886 // Use a v_mov_b32 here to make the exec dependency explicit.
2887 buildVCopy(B, DstReg, TmpReg);
2888 }
2889
2890 // Re-insert the constant offset add inside the waterfall loop.
2891 if (ShouldMoveIndexIntoLoop)
2892 reinsertVectorIndexAdd(B, MI, 2, ConstOffset);
2893
2894 return;
2895 }
2896
2897 assert(DstTy.getSizeInBits() == 64);
2898
2899 LLT Vec32 = LLT::fixed_vector(2 * SrcTy.getNumElements(), 32);
2900
2901 auto CastSrc = B.buildBitcast(Vec32, SrcReg);
2902 auto One = B.buildConstant(S32, 1);
2903
2904 MachineBasicBlock::iterator MII = MI.getIterator();
2905
2906 // Split the vector index into 32-bit pieces. Prepare to move all of the
2907 // new instructions into a waterfall loop if necessary.
2908 //
2909 // Don't put the bitcast or constant in the loop.
2910 MachineInstrSpan Span(MII, &B.getMBB());
2911
2912 // Compute 32-bit element indices, (2 * OrigIdx, 2 * OrigIdx + 1).
2913 auto IdxLo = B.buildShl(S32, BaseIdxReg, One);
2914 auto IdxHi = B.buildAdd(S32, IdxLo, One);
2915
2916 auto Extract0 = B.buildExtractVectorElement(DstRegs[0], CastSrc, IdxLo);
2917 auto Extract1 = B.buildExtractVectorElement(DstRegs[1], CastSrc, IdxHi);
2918
2919 MRI.setRegBank(DstReg, *DstBank);
2920 MRI.setRegBank(CastSrc.getReg(0), *SrcBank);
2921 MRI.setRegBank(One.getReg(0), AMDGPU::SGPRRegBank);
2922 MRI.setRegBank(IdxLo.getReg(0), AMDGPU::SGPRRegBank);
2923 MRI.setRegBank(IdxHi.getReg(0), AMDGPU::SGPRRegBank);
2924
2925 SmallSet<Register, 4> OpsToWaterfall;
2926 if (!collectWaterfallOperands(OpsToWaterfall, MI, MRI, { 2 })) {
2927 MI.eraseFromParent();
2928 return;
2929 }
2930
2931 // Remove the original instruction to avoid potentially confusing the
2932 // waterfall loop logic.
2933 B.setInstr(*Span.begin());
2934 MI.eraseFromParent();
2935 executeInWaterfallLoop(B, make_range(Span.begin(), Span.end()),
2936 OpsToWaterfall);
2937
2938 if (NeedCopyToVGPR) {
2939 MachineBasicBlock *LoopBB = Extract1->getParent();
2940 Register TmpReg0 = MRI.createGenericVirtualRegister(S32);
2941 Register TmpReg1 = MRI.createGenericVirtualRegister(S32);
2942 MRI.setRegBank(TmpReg0, AMDGPU::SGPRRegBank);
2943 MRI.setRegBank(TmpReg1, AMDGPU::SGPRRegBank);
2944
2945 Extract0->getOperand(0).setReg(TmpReg0);
2946 Extract1->getOperand(0).setReg(TmpReg1);
2947
2948 B.setInsertPt(*LoopBB, ++Extract1->getIterator());
2949
2950 buildVCopy(B, DstRegs[0], TmpReg0);
2951 buildVCopy(B, DstRegs[1], TmpReg1);
2952 }
2953
2954 if (ShouldMoveIndexIntoLoop)
2955 reinsertVectorIndexAdd(B, *IdxLo, 1, ConstOffset);
2956
2957 return;
2958 }
2959 case AMDGPU::G_INSERT_VECTOR_ELT: {
2960 SmallVector<Register, 2> InsRegs(OpdMapper.getVRegs(2));
2961
2962 Register DstReg = MI.getOperand(0).getReg();
2963 LLT VecTy = MRI.getType(DstReg);
2964
2965 assert(OpdMapper.getVRegs(0).empty());
2966 assert(OpdMapper.getVRegs(3).empty());
2967
2968 if (substituteSimpleCopyRegs(OpdMapper, 1))
2969 MRI.setType(MI.getOperand(1).getReg(), VecTy);
2970
2971 if (foldInsertEltToCmpSelect(B, MI, OpdMapper))
2972 return;
2973
2974 const RegisterBank *IdxBank =
2975 OpdMapper.getInstrMapping().getOperandMapping(3).BreakDown[0].RegBank;
2976
2977 Register SrcReg = MI.getOperand(1).getReg();
2978 Register InsReg = MI.getOperand(2).getReg();
2979 LLT InsTy = MRI.getType(InsReg);
2980 (void)InsTy;
2981
2982 Register BaseIdxReg;
2983 unsigned ConstOffset;
2984 std::tie(BaseIdxReg, ConstOffset) =
2985 AMDGPU::getBaseWithConstantOffset(MRI, MI.getOperand(3).getReg());
2986
2987 // See if the index is an add of a constant which will be foldable by moving
2988 // the base register of the index later if this is going to be executed in a
2989 // waterfall loop. This is essentially to reassociate the add of a constant
2990 // with the readfirstlane.
2991 bool ShouldMoveIndexIntoLoop = IdxBank != &AMDGPU::SGPRRegBank &&
2992 ConstOffset > 0 &&
2993 ConstOffset < VecTy.getNumElements();
2994
2995 // Move the base register. We'll re-insert the add later.
2996 if (ShouldMoveIndexIntoLoop)
2997 MI.getOperand(3).setReg(BaseIdxReg);
2998
2999
3000    if (InsRegs.empty()) {
3001      executeInWaterfallLoop(B, MI, {3});
3002
3003 // Re-insert the constant offset add inside the waterfall loop.
3004 if (ShouldMoveIndexIntoLoop) {
3005 reinsertVectorIndexAdd(B, MI, 3, ConstOffset);
3006 }
3007
3008 return;
3009 }
3010
3011 assert(InsTy.getSizeInBits() == 64);
3012
3013 const LLT S32 = LLT::scalar(32);
3014 LLT Vec32 = LLT::fixed_vector(2 * VecTy.getNumElements(), 32);
3015
3016 auto CastSrc = B.buildBitcast(Vec32, SrcReg);
3017 auto One = B.buildConstant(S32, 1);
3018
3019 // Split the vector index into 32-bit pieces. Prepare to move all of the
3020 // new instructions into a waterfall loop if necessary.
3021 //
3022    // Don't put the bitcast or constant in the loop.
3023    MachineInstrSpan Span(MI.getIterator(), &B.getMBB());
3024
3025 // Compute 32-bit element indices, (2 * OrigIdx, 2 * OrigIdx + 1).
3026 auto IdxLo = B.buildShl(S32, BaseIdxReg, One);
3027 auto IdxHi = B.buildAdd(S32, IdxLo, One);
3028
3029 auto InsLo = B.buildInsertVectorElement(Vec32, CastSrc, InsRegs[0], IdxLo);
3030 auto InsHi = B.buildInsertVectorElement(Vec32, InsLo, InsRegs[1], IdxHi);
3031
3032 const RegisterBank *DstBank =
3033 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
3034 const RegisterBank *SrcBank =
3035 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
3036 const RegisterBank *InsSrcBank =
3037 OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
3038
3039 MRI.setRegBank(InsReg, *InsSrcBank);
3040 MRI.setRegBank(CastSrc.getReg(0), *SrcBank);
3041 MRI.setRegBank(InsLo.getReg(0), *DstBank);
3042 MRI.setRegBank(InsHi.getReg(0), *DstBank);
3043 MRI.setRegBank(One.getReg(0), AMDGPU::SGPRRegBank);
3044 MRI.setRegBank(IdxLo.getReg(0), AMDGPU::SGPRRegBank);
3045 MRI.setRegBank(IdxHi.getReg(0), AMDGPU::SGPRRegBank);
3046
3047
3048 SmallSet<Register, 4> OpsToWaterfall;
3049 if (!collectWaterfallOperands(OpsToWaterfall, MI, MRI, { 3 })) {
3050 B.setInsertPt(B.getMBB(), MI);
3051 B.buildBitcast(DstReg, InsHi);
3052 MI.eraseFromParent();
3053 return;
3054 }
3055
3056 B.setInstr(*Span.begin());
3057 MI.eraseFromParent();
3058
3059 // Figure out the point after the waterfall loop before mangling the control
3060 // flow.
3061 executeInWaterfallLoop(B, make_range(Span.begin(), Span.end()),
3062 OpsToWaterfall);
3063
3064 // The insertion point is now right after the original instruction.
3065 //
3066 // Keep the bitcast to the original vector type out of the loop. Doing this
3067 // saved an extra phi we don't need inside the loop.
3068 B.buildBitcast(DstReg, InsHi);
3069
3070 // Re-insert the constant offset add inside the waterfall loop.
3071 if (ShouldMoveIndexIntoLoop)
3072 reinsertVectorIndexAdd(B, *IdxLo, 1, ConstOffset);
3073
3074 return;
3075 }
3076 case AMDGPU::G_AMDGPU_BUFFER_LOAD:
3077 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT:
3078 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT:
3079 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE:
3080 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE:
3081 case AMDGPU::G_AMDGPU_BUFFER_LOAD_TFE:
3082 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT_TFE:
3083 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT_TFE:
3084 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE_TFE:
3085 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE_TFE:
3086 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT:
3087 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_TFE:
3088 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_D16:
3089 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT:
3090 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT_D16:
3091 case AMDGPU::G_AMDGPU_BUFFER_STORE:
3092 case AMDGPU::G_AMDGPU_BUFFER_STORE_BYTE:
3093 case AMDGPU::G_AMDGPU_BUFFER_STORE_SHORT:
3094 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT:
3095 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT_D16:
3096 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT:
3097 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT_D16: {
3098 applyDefaultMapping(OpdMapper);
3099 executeInWaterfallLoop(B, MI, {1, 4});
3100 return;
3101 }
3102 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SWAP:
3103 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_ADD:
3104 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB:
3105 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMIN:
3106 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMIN:
3107 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMAX:
3108 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMAX:
3109 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_AND:
3110 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_OR:
3111 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_XOR:
3112 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_INC:
3113 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_DEC:
3114 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FADD:
3115 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMIN:
3116 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMAX: {
3117 applyDefaultMapping(OpdMapper);
3118 executeInWaterfallLoop(B, MI, {2, 5});
3119 return;
3120 }
3121 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_CMPSWAP: {
3122 applyDefaultMapping(OpdMapper);
3123 executeInWaterfallLoop(B, MI, {3, 6});
3124 return;
3125 }
3126 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD:
3127 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_UBYTE:
3128 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SBYTE:
3129 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_USHORT:
3130 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SSHORT: {
3131 applyMappingSBufferLoad(B, OpdMapper);
3132 return;
3133 }
3134  case AMDGPU::G_AMDGPU_S_BUFFER_PREFETCH:
3135    constrainOpWithReadfirstlane(B, MI, 0);
3136    constrainOpWithReadfirstlane(B, MI, 2);
3137 return;
3138 case AMDGPU::G_INTRINSIC:
3139 case AMDGPU::G_INTRINSIC_CONVERGENT: {
3140 switch (cast<GIntrinsic>(MI).getIntrinsicID()) {
3141 case Intrinsic::amdgcn_readlane: {
3142 substituteSimpleCopyRegs(OpdMapper, 2);
3143
3144 assert(OpdMapper.getVRegs(0).empty());
3145 assert(OpdMapper.getVRegs(3).empty());
3146
3147 // Make sure the index is an SGPR. It doesn't make sense to run this in a
3148 // waterfall loop, so assume it's a uniform value.
3149 constrainOpWithReadfirstlane(B, MI, 3); // Index
3150 return;
3151 }
3152 case Intrinsic::amdgcn_writelane: {
3153 assert(OpdMapper.getVRegs(0).empty());
3154 assert(OpdMapper.getVRegs(2).empty());
3155 assert(OpdMapper.getVRegs(3).empty());
3156
3157 substituteSimpleCopyRegs(OpdMapper, 4); // VGPR input val
3158 constrainOpWithReadfirstlane(B, MI, 2); // Source value
3159 constrainOpWithReadfirstlane(B, MI, 3); // Index
3160 return;
3161 }
3162 case Intrinsic::amdgcn_interp_p1:
3163 case Intrinsic::amdgcn_interp_p2:
3164 case Intrinsic::amdgcn_interp_mov:
3165 case Intrinsic::amdgcn_interp_p1_f16:
3166 case Intrinsic::amdgcn_interp_p2_f16:
3167 case Intrinsic::amdgcn_lds_param_load: {
3168 applyDefaultMapping(OpdMapper);
3169
3170 // Readlane for m0 value, which is always the last operand.
3171 // FIXME: Should this be a waterfall loop instead?
3172 constrainOpWithReadfirstlane(B, MI, MI.getNumOperands() - 1); // Index
3173 return;
3174 }
3175 case Intrinsic::amdgcn_interp_inreg_p10:
3176 case Intrinsic::amdgcn_interp_inreg_p2:
3177 case Intrinsic::amdgcn_interp_inreg_p10_f16:
3178 case Intrinsic::amdgcn_interp_inreg_p2_f16:
3179 case Intrinsic::amdgcn_interp_p10_rtz_f16:
3180 case Intrinsic::amdgcn_interp_p2_rtz_f16:
3181 case Intrinsic::amdgcn_permlane16_swap:
3182 case Intrinsic::amdgcn_permlane32_swap:
3183 applyDefaultMapping(OpdMapper);
3184 return;
3185 case Intrinsic::amdgcn_permlane16:
3186 case Intrinsic::amdgcn_permlanex16: {
3187 // Doing a waterfall loop over these wouldn't make any sense.
3188 substituteSimpleCopyRegs(OpdMapper, 2);
3189      substituteSimpleCopyRegs(OpdMapper, 3);
3190      constrainOpWithReadfirstlane(B, MI, 4);
3191      constrainOpWithReadfirstlane(B, MI, 5);
3192 return;
3193 }
3194 case Intrinsic::amdgcn_permlane_bcast:
3195 case Intrinsic::amdgcn_permlane_up:
3196 case Intrinsic::amdgcn_permlane_down:
3197 case Intrinsic::amdgcn_permlane_xor:
3198 // Doing a waterfall loop over these wouldn't make any sense.
3201 return;
3202 case Intrinsic::amdgcn_permlane_idx_gen: {
3204 return;
3205 }
3206 case Intrinsic::amdgcn_sbfe:
3207 applyMappingBFE(B, OpdMapper, true);
3208 return;
3209 case Intrinsic::amdgcn_ubfe:
3210 applyMappingBFE(B, OpdMapper, false);
3211 return;
3212 case Intrinsic::amdgcn_inverse_ballot:
3213 case Intrinsic::amdgcn_s_bitreplicate:
3214 case Intrinsic::amdgcn_s_quadmask:
3215 case Intrinsic::amdgcn_s_wqm:
3216 applyDefaultMapping(OpdMapper);
3217 constrainOpWithReadfirstlane(B, MI, 2); // Mask
3218 return;
3219 case Intrinsic::amdgcn_ballot:
3220 // Use default handling and insert copy to vcc source.
3221 break;
3222 }
3223 break;
3224 }
3225 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD:
3226 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_D16:
3227 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_NORET:
3228 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE:
3229 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE_D16: {
3230 const AMDGPU::RsrcIntrinsic *RSrcIntrin =
3231 AMDGPU::lookupRsrcIntrinsic(AMDGPU::getIntrinsicID(MI));
3232 assert(RSrcIntrin && RSrcIntrin->IsImage);
3233 // Non-images can have complications from operands that allow both SGPR
3234 // and VGPR. For now it's too complicated to figure out the final opcode
3235 // to derive the register bank from the MCInstrDesc.
3236 applyMappingImage(B, MI, OpdMapper, RSrcIntrin->RsrcArg);
3237 return;
3238 }
3239 case AMDGPU::G_AMDGPU_BVH_INTERSECT_RAY:
3240 case AMDGPU::G_AMDGPU_BVH8_INTERSECT_RAY:
3241 case AMDGPU::G_AMDGPU_BVH_DUAL_INTERSECT_RAY: {
3242 bool IsDualOrBVH8 =
3243 MI.getOpcode() == AMDGPU::G_AMDGPU_BVH_DUAL_INTERSECT_RAY ||
3244 MI.getOpcode() == AMDGPU::G_AMDGPU_BVH8_INTERSECT_RAY;
3245 unsigned NumMods = IsDualOrBVH8 ? 0 : 1; // Has A16 modifier
3246 unsigned LastRegOpIdx = MI.getNumExplicitOperands() - 1 - NumMods;
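// The trailing register operand is the resource descriptor, which must be
// uniform; it is the only operand that may need a waterfall loop here.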
3247 applyDefaultMapping(OpdMapper);
3248 executeInWaterfallLoop(B, MI, {LastRegOpIdx});
3249 return;
3250 }
3251 case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS:
3252 case AMDGPU::G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS: {
3253 auto IntrID = cast<GIntrinsic>(MI).getIntrinsicID();
3254 switch (IntrID) {
3255 case Intrinsic::amdgcn_ds_ordered_add:
3256 case Intrinsic::amdgcn_ds_ordered_swap: {
3257 // This is only allowed to execute with 1 lane, so readfirstlane is safe.
3258 assert(OpdMapper.getVRegs(0).empty());
3259 substituteSimpleCopyRegs(OpdMapper, 3);
3261 return;
3262 }
3263 case Intrinsic::amdgcn_ds_gws_init:
3264 case Intrinsic::amdgcn_ds_gws_barrier:
3265 case Intrinsic::amdgcn_ds_gws_sema_br: {
3266 // Only the first lane executes, so readfirstlane is safe.
3267 substituteSimpleCopyRegs(OpdMapper, 1);
3269 return;
3270 }
3271 case Intrinsic::amdgcn_ds_gws_sema_v:
3272 case Intrinsic::amdgcn_ds_gws_sema_p:
3273 case Intrinsic::amdgcn_ds_gws_sema_release_all: {
3274 // Only the first lane executes, so readfirstlane is safe.
3276 return;
3277 }
3278 case Intrinsic::amdgcn_ds_append:
3279 case Intrinsic::amdgcn_ds_consume: {
3281 return;
3282 }
3283 case Intrinsic::amdgcn_s_sendmsg:
3284 case Intrinsic::amdgcn_s_sendmsghalt: {
3285 // FIXME: Should this use a waterfall loop?
3287 return;
3288 }
3289 case Intrinsic::amdgcn_s_setreg: {
3291 return;
3292 }
3293 case Intrinsic::amdgcn_s_ttracedata:
3295 return;
3296 case Intrinsic::amdgcn_raw_buffer_load_lds:
3297 case Intrinsic::amdgcn_raw_ptr_buffer_load_lds: {
3298 applyDefaultMapping(OpdMapper);
3299 constrainOpWithReadfirstlane(B, MI, 1); // rsrc
3301 constrainOpWithReadfirstlane(B, MI, 5); // soffset
3302 return;
3303 }
3304 case Intrinsic::amdgcn_struct_buffer_load_lds:
3305 case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
3306 applyDefaultMapping(OpdMapper);
3307 constrainOpWithReadfirstlane(B, MI, 1); // rsrc
3309 constrainOpWithReadfirstlane(B, MI, 6); // soffset
3310 return;
3311 }
3312 case Intrinsic::amdgcn_cluster_load_async_to_lds_b8:
3313 case Intrinsic::amdgcn_cluster_load_async_to_lds_b32:
3314 case Intrinsic::amdgcn_cluster_load_async_to_lds_b64:
3315 case Intrinsic::amdgcn_cluster_load_async_to_lds_b128: {
3316 applyDefaultMapping(OpdMapper);
3318 return;
3319 }
3320 case Intrinsic::amdgcn_load_to_lds:
3321 case Intrinsic::amdgcn_global_load_lds: {
3322 applyDefaultMapping(OpdMapper);
3324 return;
3325 }
3326 case Intrinsic::amdgcn_lds_direct_load: {
3327 applyDefaultMapping(OpdMapper);
3328 // Readlane for m0 value, which is always the last operand.
3329 constrainOpWithReadfirstlane(B, MI, MI.getNumOperands() - 1); // Index
3330 return;
3331 }
3332 case Intrinsic::amdgcn_exp_row:
3333 applyDefaultMapping(OpdMapper);
3335 return;
3336 case Intrinsic::amdgcn_cluster_load_b32:
3337 case Intrinsic::amdgcn_cluster_load_b64:
3338 case Intrinsic::amdgcn_cluster_load_b128: {
3339 applyDefaultMapping(OpdMapper);
3341 return;
3342 }
3343 case Intrinsic::amdgcn_s_sleep_var:
3344 assert(OpdMapper.getVRegs(1).empty());
3346 return;
3347 case Intrinsic::amdgcn_s_barrier_join:
3349 return;
3350 case Intrinsic::amdgcn_s_barrier_init:
3351 case Intrinsic::amdgcn_s_barrier_signal_var:
3354 return;
3355 case Intrinsic::amdgcn_s_get_barrier_state:
3356 case Intrinsic::amdgcn_s_get_named_barrier_state: {
3358 return;
3359 }
3360 case Intrinsic::amdgcn_s_prefetch_data: {
3361 Register PtrReg = MI.getOperand(1).getReg();
3362 unsigned AS = MRI.getType(PtrReg).getAddressSpace();
3366 } else
3367 MI.eraseFromParent();
3368 return;
3369 }
3370 case Intrinsic::amdgcn_tensor_load_to_lds:
3371 case Intrinsic::amdgcn_tensor_store_from_lds: {
3376 return;
3377 }
3378 case Intrinsic::amdgcn_tensor_load_to_lds_d2:
3379 case Intrinsic::amdgcn_tensor_store_from_lds_d2: {
3382 return;
3383 }
3384 default: {
3385 if (const AMDGPU::RsrcIntrinsic *RSrcIntrin =
3386 AMDGPU::lookupRsrcIntrinsic(IntrID)) {
3387 // Non-images can have complications from operands that allow both SGPR
3388 // and VGPR. For now it's too complicated to figure out the final opcode
3389 // to derive the register bank from the MCInstrDesc.
3390 if (RSrcIntrin->IsImage) {
3391 applyMappingImage(B, MI, OpdMapper, RSrcIntrin->RsrcArg);
3392 return;
3393 }
3394 }
3395
3396 break;
3397 }
3398 }
3399 break;
3400 }
3401 case AMDGPU::G_SI_CALL: {
3402 // Use a set to avoid extra readfirstlanes in the case where multiple
3403 // operands are the same register.
3404 SmallSet<Register, 4> SGPROperandRegs;
3405
3406 if (!collectWaterfallOperands(SGPROperandRegs, MI, MRI, {1}))
3407 break;
3408
3409 // Move all copies to physical SGPRs that are used by the call instruction
3410 // into the loop block. Search backwards for these copies, stopping at the
3411 // ADJCALLSTACKUP.
3412 unsigned FrameSetupOpcode = AMDGPU::ADJCALLSTACKUP;
3413 unsigned FrameDestroyOpcode = AMDGPU::ADJCALLSTACKDOWN;
3414
3415 // Move all non-copies before the copies, so that a complete range can be
3416 // moved into the waterfall loop.
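// The call and its argument copy-ins must end up as one contiguous range, so
// unrelated instructions between the frame setup and the call are shuffled out
// of the way first.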
3417 SmallVector<MachineInstr *, 4> NonCopyInstrs;
3418 // Count of NonCopyInstrs found until the current LastCopy.
3419 unsigned NonCopyInstrsLen = 0;
3420 MachineBasicBlock::iterator Start(&MI);
3421 MachineBasicBlock::iterator LastCopy = Start;
3422 MachineBasicBlock *MBB = MI.getParent();
3423 const SIMachineFunctionInfo *Info =
3424 MBB->getParent()->getInfo<SIMachineFunctionInfo>();
3425 while (Start->getOpcode() != FrameSetupOpcode) {
3426 --Start;
3427 bool IsCopy = false;
3428 if (Start->getOpcode() == AMDGPU::COPY) {
3429 auto &Dst = Start->getOperand(0);
3430 if (Dst.isReg()) {
3431 Register Reg = Dst.getReg();
3432 if (Reg.isPhysical() && MI.readsRegister(Reg, TRI)) {
3433 IsCopy = true;
3434 } else {
3435 // Also move the copy from the scratch rsrc descriptor into the loop
3436 // to allow it to be optimized away.
3437 auto &Src = Start->getOperand(1);
3438 if (Src.isReg()) {
3439 Reg = Src.getReg();
3440 IsCopy = Info->getScratchRSrcReg() == Reg;
3441 }
3442 }
3443 }
3444 }
3445
3446 if (IsCopy) {
3447 LastCopy = Start;
3448 NonCopyInstrsLen = NonCopyInstrs.size();
3449 } else {
3450 NonCopyInstrs.push_back(&*Start);
3451 }
3452 }
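// Keep only the non-copies that sit between the earliest argument copy and the
// call; they are spliced ahead of that copy below so the copies and the call
// form one contiguous range.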
3453 NonCopyInstrs.resize(NonCopyInstrsLen);
3454
3455 for (auto *NonCopy : reverse(NonCopyInstrs)) {
3456 MBB->splice(LastCopy, MBB, NonCopy->getIterator());
3457 }
3458 Start = LastCopy;
3459
3460 // Do the same for copies after the loop
3461 NonCopyInstrs.clear();
3462 NonCopyInstrsLen = 0;
3463 MachineBasicBlock::iterator End(&MI);
3464 LastCopy = End;
3465 while (End->getOpcode() != FrameDestroyOpcode) {
3466 ++End;
3467 bool IsCopy = false;
3468 if (End->getOpcode() == AMDGPU::COPY) {
3469 auto &Src = End->getOperand(1);
3470 if (Src.isReg()) {
3471 Register Reg = Src.getReg();
3472 IsCopy = Reg.isPhysical() && MI.modifiesRegister(Reg, TRI);
3473 }
3474 }
3475
3476 if (IsCopy) {
3477 LastCopy = End;
3478 NonCopyInstrsLen = NonCopyInstrs.size();
3479 } else {
3480 NonCopyInstrs.push_back(&*End);
3481 }
3482 }
3483 NonCopyInstrs.resize(NonCopyInstrsLen);
3484
3485 End = LastCopy;
3486 ++LastCopy;
3487 for (auto *NonCopy : reverse(NonCopyInstrs)) {
3488 MBB->splice(LastCopy, MBB, NonCopy->getIterator());
3489 }
3490
3491 ++End;
3492 B.setInsertPt(B.getMBB(), Start);
3493 executeInWaterfallLoop(B, make_range(Start, End), SGPROperandRegs);
3494 break;
3495 }
3496 case AMDGPU::G_LOAD:
3497 case AMDGPU::G_ZEXTLOAD:
3498 case AMDGPU::G_SEXTLOAD: {
3499 if (applyMappingLoad(B, OpdMapper, MI))
3500 return;
3501 break;
3502 }
3503 case AMDGPU::G_DYN_STACKALLOC:
3504 applyMappingDynStackAlloc(B, OpdMapper, MI);
3505 return;
3506 case AMDGPU::G_STACKRESTORE: {
3507 applyDefaultMapping(OpdMapper);
3509 return;
3510 }
3511 case AMDGPU::G_SBFX:
3512 applyMappingBFE(B, OpdMapper, /*Signed*/ true);
3513 return;
3514 case AMDGPU::G_UBFX:
3515 applyMappingBFE(B, OpdMapper, /*Signed*/ false);
3516 return;
3517 case AMDGPU::G_AMDGPU_MAD_U64_U32:
3518 case AMDGPU::G_AMDGPU_MAD_I64_I32:
3519 applyMappingMAD_64_32(B, OpdMapper);
3520 return;
3521 case AMDGPU::G_PREFETCH: {
3522 if (!Subtarget.hasSafeSmemPrefetch() && !Subtarget.hasVmemPrefInsts()) {
3523 MI.eraseFromParent();
3524 return;
3525 }
3526 Register PtrReg = MI.getOperand(0).getReg();
3527 unsigned PtrBank = getRegBankID(PtrReg, MRI, AMDGPU::SGPRRegBankID);
3528 if (PtrBank == AMDGPU::VGPRRegBankID &&
3529 (!Subtarget.hasVmemPrefInsts() || !MI.getOperand(3).getImm())) {
3530 // Cannot do I$ prefetch with divergent pointer.
3531 MI.eraseFromParent();
3532 return;
3533 }
3534 unsigned AS = MRI.getType(PtrReg).getAddressSpace();
3537 (!Subtarget.hasSafeSmemPrefetch() &&
3539 !MI.getOperand(3).getImm() /* I$ prefetch */))) {
3540 MI.eraseFromParent();
3541 return;
3542 }
3543 applyDefaultMapping(OpdMapper);
3544 return;
3545 }
3546 default:
3547 break;
3548 }
3549
3550 return applyDefaultMapping(OpdMapper);
3551}
3552
3553// vgpr, sgpr -> vgpr
3554// vgpr, agpr -> vgpr
3555// agpr, agpr -> agpr
3556// agpr, sgpr -> vgpr
3557static unsigned regBankUnion(unsigned RB0, unsigned RB1) {
3558 if (RB0 == AMDGPU::InvalidRegBankID)
3559 return RB1;
3560 if (RB1 == AMDGPU::InvalidRegBankID)
3561 return RB0;
3562
3563 if (RB0 == AMDGPU::SGPRRegBankID && RB1 == AMDGPU::SGPRRegBankID)
3564 return AMDGPU::SGPRRegBankID;
3565
3566 if (RB0 == AMDGPU::AGPRRegBankID && RB1 == AMDGPU::AGPRRegBankID)
3567 return AMDGPU::AGPRRegBankID;
3568
3569 return AMDGPU::VGPRRegBankID;
3570}
3571
3572static unsigned regBankBoolUnion(unsigned RB0, unsigned RB1) {
3573 if (RB0 == AMDGPU::InvalidRegBankID)
3574 return RB1;
3575 if (RB1 == AMDGPU::InvalidRegBankID)
3576 return RB0;
3577
3578 // vcc, vcc -> vcc
3579 // vcc, sgpr -> vcc
3580 // vcc, vgpr -> vcc
3581 if (RB0 == AMDGPU::VCCRegBankID || RB1 == AMDGPU::VCCRegBankID)
3582 return AMDGPU::VCCRegBankID;
3583
3584 // Neither operand is vcc; fall back to the generic bank union.
3585 return regBankUnion(RB0, RB1);
3586}
3587
3588 unsigned AMDGPURegisterBankInfo::getMappingType(const MachineRegisterInfo &MRI,
3589 const MachineInstr &MI) const {
3590 unsigned RegBank = AMDGPU::InvalidRegBankID;
3591
3592 for (const MachineOperand &MO : MI.operands()) {
3593 if (!MO.isReg())
3594 continue;
3595 Register Reg = MO.getReg();
3596 if (const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI)) {
3597 RegBank = regBankUnion(RegBank, Bank->getID());
3598 if (RegBank == AMDGPU::VGPRRegBankID)
3599 break;
3600 }
3601 }
3602
3603 return RegBank;
3604}
3605
3606 bool AMDGPURegisterBankInfo::isSALUMapping(const MachineInstr &MI) const {
3607 const MachineFunction &MF = *MI.getMF();
3608 const MachineRegisterInfo &MRI = MF.getRegInfo();
3609 for (const MachineOperand &MO : MI.operands()) {
3610 if (!MO.isReg())
3611 continue;
3612 Register Reg = MO.getReg();
3613 if (const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI)) {
3614 if (Bank->getID() != AMDGPU::SGPRRegBankID)
3615 return false;
3616 }
3617 }
3618 return true;
3619}
3620
3621 const RegisterBankInfo::InstructionMapping &
3622 AMDGPURegisterBankInfo::getDefaultMappingSOP(const MachineInstr &MI) const {
3623 const MachineFunction &MF = *MI.getMF();
3624 const MachineRegisterInfo &MRI = MF.getRegInfo();
3625 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3626
3627 for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
3628 const MachineOperand &SrcOp = MI.getOperand(i);
3629 if (!SrcOp.isReg())
3630 continue;
3631
3632 unsigned Size = getSizeInBits(SrcOp.getReg(), MRI, *TRI);
3633 OpdsMapping[i] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
3634 }
3635 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
3636 MI.getNumOperands());
3637}
3638
3639 const RegisterBankInfo::InstructionMapping &
3640 AMDGPURegisterBankInfo::getDefaultMappingVOP(const MachineInstr &MI) const {
3641 const MachineFunction &MF = *MI.getMF();
3642 const MachineRegisterInfo &MRI = MF.getRegInfo();
3643 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3644
3645 // Even though we technically could use SGPRs, this would require knowledge of
3646 // the constant bus restriction. Force all sources to VGPR (except for VCC).
3647 //
3648 // TODO: Unary ops are trivially OK, so accept SGPRs?
3649 for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
3650 const MachineOperand &Src = MI.getOperand(i);
3651 if (!Src.isReg())
3652 continue;
3653
3654 unsigned Size = getSizeInBits(Src.getReg(), MRI, *TRI);
3655 unsigned BankID = Size == 1 ? AMDGPU::VCCRegBankID : AMDGPU::VGPRRegBankID;
3656 OpdsMapping[i] = AMDGPU::getValueMapping(BankID, Size);
3657 }
3658
3659 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
3660 MI.getNumOperands());
3661}
3662
3663 const RegisterBankInfo::InstructionMapping &
3664 AMDGPURegisterBankInfo::getDefaultMappingAllVGPR(const MachineInstr &MI) const {
3665 const MachineFunction &MF = *MI.getMF();
3666 const MachineRegisterInfo &MRI = MF.getRegInfo();
3667 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3668
3669 for (unsigned I = 0, E = MI.getNumOperands(); I != E; ++I) {
3670 const MachineOperand &Op = MI.getOperand(I);
3671 if (!Op.isReg())
3672 continue;
3673
3674 unsigned Size = getSizeInBits(Op.getReg(), MRI, *TRI);
3675 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3676 }
3677
3678 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
3679 MI.getNumOperands());
3680}
3681
3682 const RegisterBankInfo::InstructionMapping &
3683 AMDGPURegisterBankInfo::getImageMapping(const MachineRegisterInfo &MRI,
3684 const MachineInstr &MI,
3685 int RsrcIdx) const {
3686 // The reported argument index is relative to the IR intrinsic call arguments,
3687 // so we need to shift by the number of defs and the intrinsic ID.
3688 RsrcIdx += MI.getNumExplicitDefs() + 1;
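// For example, with one def the intrinsic ID sits at operand 1, so an IR
// argument index of 0 corresponds to machine operand 2.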
3689
3690 const int NumOps = MI.getNumOperands();
3691 SmallVector<const ValueMapping *, 8> OpdsMapping(NumOps);
3692 
3693 // TODO: Should packed/unpacked D16 difference be reported here as part of
3694 // the value mapping?
3695 for (int I = 0; I != NumOps; ++I) {
3696 if (!MI.getOperand(I).isReg())
3697 continue;
3698
3699 Register OpReg = MI.getOperand(I).getReg();
3700 // We replace some dead address operands with $noreg
3701 if (!OpReg)
3702 continue;
3703
3704 unsigned Size = getSizeInBits(OpReg, MRI, *TRI);
3705
3706 // FIXME: Probably need a new intrinsic register bank searchable table to
3707 // handle arbitrary intrinsics easily.
3708 //
3709 // If this has a sampler, it immediately follows rsrc.
3710 const bool MustBeSGPR = I == RsrcIdx || I == RsrcIdx + 1;
3711
3712 if (MustBeSGPR) {
3713 // If this must be an SGPR, report whatever bank it currently has as legal.
3714 unsigned NewBank = getRegBankID(OpReg, MRI, AMDGPU::SGPRRegBankID);
3715 OpdsMapping[I] = AMDGPU::getValueMapping(NewBank, Size);
3716 } else {
3717 // Some operands must be VGPR, and these are easy to copy to.
3718 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3719 }
3720 }
3721
3722 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping), NumOps);
3723}
3724
3725/// Return the mapping for a pointer argument.
3726 const RegisterBankInfo::ValueMapping *
3727 AMDGPURegisterBankInfo::getValueMappingForPtr(const MachineRegisterInfo &MRI,
3728 Register PtrReg) const {
3729 LLT PtrTy = MRI.getType(PtrReg);
3730 unsigned Size = PtrTy.getSizeInBits();
3731 if (Subtarget.useFlatForGlobal() ||
3732 !AMDGPU::isFlatGlobalAddrSpace(PtrTy.getAddressSpace()))
3733 return AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3734
3735 // If we're using MUBUF instructions for global memory, an SGPR base register
3736 // is possible. Otherwise this needs to be a VGPR.
3737 const RegisterBank *PtrBank = getRegBank(PtrReg, MRI, *TRI);
3738 return AMDGPU::getValueMapping(PtrBank->getID(), Size);
3739}
3740
3741 const RegisterBankInfo::InstructionMapping &
3742 AMDGPURegisterBankInfo::getInstrMappingForLoad(const MachineInstr &MI) const {
3743 
3744 const MachineFunction &MF = *MI.getMF();
3745 const MachineRegisterInfo &MRI = MF.getRegInfo();
3746 SmallVector<const ValueMapping*, 2> OpdsMapping(2);
3747 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
3748 Register PtrReg = MI.getOperand(1).getReg();
3749 LLT PtrTy = MRI.getType(PtrReg);
3750 unsigned AS = PtrTy.getAddressSpace();
3751 unsigned PtrSize = PtrTy.getSizeInBits();
3752
3753 const ValueMapping *ValMapping;
3754 const ValueMapping *PtrMapping;
3755
3756 const RegisterBank *PtrBank = getRegBank(PtrReg, MRI, *TRI);
3757
3758 if (PtrBank == &AMDGPU::SGPRRegBank && AMDGPU::isFlatGlobalAddrSpace(AS)) {
3759 if (isScalarLoadLegal(MI)) {
3760 // We have a uniform instruction so we want to use an SMRD load
3761 ValMapping = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
3762 PtrMapping = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, PtrSize);
3763 } else {
3764 ValMapping = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3765
3766 // If we're using MUBUF instructions for global memory, an SGPR base
3767 // register is possible. Otherwise this needs to be a VGPR.
3768 unsigned PtrBankID = Subtarget.useFlatForGlobal() ?
3769 AMDGPU::VGPRRegBankID : AMDGPU::SGPRRegBankID;
3770
3771 PtrMapping = AMDGPU::getValueMapping(PtrBankID, PtrSize);
3772 }
3773 } else {
3774 ValMapping = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3775 PtrMapping = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, PtrSize);
3776 }
3777
3778 OpdsMapping[0] = ValMapping;
3779 OpdsMapping[1] = PtrMapping;
3780 const RegisterBankInfo::InstructionMapping &Mapping = getInstructionMapping(
3781 1, 1, getOperandsMapping(OpdsMapping), MI.getNumOperands());
3782 return Mapping;
3783
3784 // FIXME: Do we want to add a mapping for FLAT load, or should we just
3785 // handle that during instruction selection?
3786}
3787
3788unsigned
3789 AMDGPURegisterBankInfo::getRegBankID(Register Reg,
3790 const MachineRegisterInfo &MRI,
3791 unsigned Default) const {
3792 const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI);
3793 return Bank ? Bank->getID() : Default;
3794}
3795
3796 const RegisterBankInfo::ValueMapping *
3797 AMDGPURegisterBankInfo::getSGPROpMapping(Register Reg,
3798 const MachineRegisterInfo &MRI,
3799 const TargetRegisterInfo &TRI) const {
3800 // Lie and claim anything is legal, even though this needs to be an SGPR.
3801 // applyMapping will have to deal with it as a waterfall loop.
3802 unsigned Bank = getRegBankID(Reg, MRI, AMDGPU::SGPRRegBankID);
3803 unsigned Size = getSizeInBits(Reg, MRI, TRI);
3804 return AMDGPU::getValueMapping(Bank, Size);
3805}
3806
3807 const RegisterBankInfo::ValueMapping *
3808 AMDGPURegisterBankInfo::getVGPROpMapping(Register Reg,
3809 const MachineRegisterInfo &MRI,
3810 const TargetRegisterInfo &TRI) const {
3811 unsigned Size = getSizeInBits(Reg, MRI, TRI);
3812 return AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3813}
3814
3815 const RegisterBankInfo::ValueMapping *
3816 AMDGPURegisterBankInfo::getAGPROpMapping(Register Reg,
3817 const MachineRegisterInfo &MRI,
3818 const TargetRegisterInfo &TRI) const {
3819 unsigned Size = getSizeInBits(Reg, MRI, TRI);
3820 return AMDGPU::getValueMapping(AMDGPU::AGPRRegBankID, Size);
3821}
3822
3823///
3824/// This function must return a legal mapping, because
3825/// AMDGPURegisterBankInfo::getInstrAlternativeMappings() is not called
3826 /// in RegBankSelect::Mode::Fast. Any mapping that would cause a
3827 /// VGPR to SGPR copy to be generated is illegal.
3828///
3829// Operands that must be SGPRs must accept potentially divergent VGPRs as
3830// legal. These will be dealt with in applyMappingImpl.
3831//
3832 const RegisterBankInfo::InstructionMapping &
3833 AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
3834 const MachineFunction &MF = *MI.getMF();
3835 const MachineRegisterInfo &MRI = MF.getRegInfo();
3836
3837 if (MI.isCopy() || MI.getOpcode() == AMDGPU::G_FREEZE) {
3838 Register DstReg = MI.getOperand(0).getReg();
3839 Register SrcReg = MI.getOperand(1).getReg();
3840
3841 // The default logic bothers to analyze impossible alternative mappings. We
3842 // want the most straightforward mapping, so just directly handle this.
3843 const RegisterBank *DstBank = getRegBank(DstReg, MRI, *TRI);
3844 const RegisterBank *SrcBank = getRegBank(SrcReg, MRI, *TRI);
3845
3846 // For COPY between a physical reg and an s1, there is no type associated so
3847 // we need to take the virtual register's type as a hint on how to interpret
3848 // s1 values.
3849 unsigned Size;
3850 if (!SrcReg.isVirtual() && !DstBank &&
3851 MRI.getType(DstReg) == LLT::scalar(1)) {
3852 DstBank = &AMDGPU::VCCRegBank;
3853 Size = 1;
3854 } else if (!DstReg.isVirtual() && MRI.getType(SrcReg) == LLT::scalar(1)) {
3855 DstBank = &AMDGPU::VCCRegBank;
3856 Size = 1;
3857 } else {
3858 Size = getSizeInBits(DstReg, MRI, *TRI);
3859 }
3860
3861 if (!DstBank)
3862 DstBank = SrcBank;
3863 else if (!SrcBank)
3864 SrcBank = DstBank;
3865
3866 if (MI.getOpcode() != AMDGPU::G_FREEZE &&
3867 cannotCopy(*DstBank, *SrcBank, TypeSize::getFixed(Size)))
3869
3870 const ValueMapping &ValMap = getValueMapping(0, Size, *DstBank);
3871 unsigned OpdsMappingSize = MI.isCopy() ? 1 : 2;
3872 SmallVector<const ValueMapping *, 1> OpdsMapping(OpdsMappingSize);
3873 OpdsMapping[0] = &ValMap;
3874 if (MI.getOpcode() == AMDGPU::G_FREEZE)
3875 OpdsMapping[1] = &ValMap;
3876
3877 return getInstructionMapping(
3878 1, /*Cost*/ 1,
3879 /*OperandsMapping*/ getOperandsMapping(OpdsMapping), OpdsMappingSize);
3880 }
3881
3882 if (MI.isRegSequence()) {
3883 // If any input is a VGPR, the result must be a VGPR. The default handling
3884 // assumes any copy between banks is legal.
3885 unsigned BankID = AMDGPU::SGPRRegBankID;
3886
3887 for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2) {
3888 auto OpBank = getRegBankID(MI.getOperand(I).getReg(), MRI);
3889 // It doesn't make sense to use vcc or scc banks here, so just ignore
3890 // them.
3891 if (OpBank != AMDGPU::SGPRRegBankID) {
3892 BankID = AMDGPU::VGPRRegBankID;
3893 break;
3894 }
3895 }
3896 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
3897
3898 const ValueMapping &ValMap = getValueMapping(0, Size, getRegBank(BankID));
3899 return getInstructionMapping(
3900 1, /*Cost*/ 1,
3901 /*OperandsMapping*/ getOperandsMapping({&ValMap}), 1);
3902 }
3903
3904 // The default handling is broken and doesn't handle illegal SGPR->VGPR copies
3905 // properly.
3906 //
3907 // TODO: There are additional exec masking dependencies to analyze.
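// A phi with any VGPR (or still unassigned) input is forced to VGPR; otherwise
// the incoming banks are merged with regBankBoolUnion, so a mix of SGPR and
// VCC inputs resolves to VCC.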
3908 if (auto *PHI = dyn_cast<GPhi>(&MI)) {
3909 unsigned ResultBank = AMDGPU::InvalidRegBankID;
3910 Register DstReg = PHI->getReg(0);
3911
3912 // Sometimes the result may have already been assigned a bank.
3913 if (const RegisterBank *DstBank = getRegBank(DstReg, MRI, *TRI))
3914 ResultBank = DstBank->getID();
3915
3916 for (unsigned I = 0; I < PHI->getNumIncomingValues(); ++I) {
3917 Register Reg = PHI->getIncomingValue(I);
3918 const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI);
3919
3920 // FIXME: Assuming VGPR for any undetermined inputs.
3921 if (!Bank || Bank->getID() == AMDGPU::VGPRRegBankID) {
3922 ResultBank = AMDGPU::VGPRRegBankID;
3923 break;
3924 }
3925
3926 // FIXME: Need to promote SGPR case to s32
3927 unsigned OpBank = Bank->getID();
3928 ResultBank = regBankBoolUnion(ResultBank, OpBank);
3929 }
3930
3931 assert(ResultBank != AMDGPU::InvalidRegBankID);
3932
3933 unsigned Size = MRI.getType(DstReg).getSizeInBits();
3934
3935 const ValueMapping &ValMap =
3936 getValueMapping(0, Size, getRegBank(ResultBank));
3937 return getInstructionMapping(
3938 1, /*Cost*/ 1,
3939 /*OperandsMapping*/ getOperandsMapping({&ValMap}), 1);
3940 }
3941
3943 if (Mapping.isValid())
3944 return Mapping;
3945
3946 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3947
3948 switch (MI.getOpcode()) {
3949 default:
3950 return getInvalidInstructionMapping();
3951 
3952 case AMDGPU::G_AND:
3953 case AMDGPU::G_OR:
3954 case AMDGPU::G_XOR:
3955 case AMDGPU::G_MUL: {
3956 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
3957 if (Size == 1) {
3958 const RegisterBank *DstBank
3959 = getRegBank(MI.getOperand(0).getReg(), MRI, *TRI);
3960
3961 unsigned TargetBankID = AMDGPU::InvalidRegBankID;
3962 unsigned BankLHS = AMDGPU::InvalidRegBankID;
3963 unsigned BankRHS = AMDGPU::InvalidRegBankID;
3964 if (DstBank) {
3965 TargetBankID = DstBank->getID();
3966 if (DstBank == &AMDGPU::VCCRegBank) {
3967 TargetBankID = AMDGPU::VCCRegBankID;
3968 BankLHS = AMDGPU::VCCRegBankID;
3969 BankRHS = AMDGPU::VCCRegBankID;
3970 } else {
3971 BankLHS = getRegBankID(MI.getOperand(1).getReg(), MRI,
3972 AMDGPU::SGPRRegBankID);
3973 BankRHS = getRegBankID(MI.getOperand(2).getReg(), MRI,
3974 AMDGPU::SGPRRegBankID);
3975 }
3976 } else {
3977 BankLHS = getRegBankID(MI.getOperand(1).getReg(), MRI,
3978 AMDGPU::VCCRegBankID);
3979 BankRHS = getRegBankID(MI.getOperand(2).getReg(), MRI,
3980 AMDGPU::VCCRegBankID);
3981
3982 // Both inputs should be true booleans to produce a boolean result.
3983 if (BankLHS == AMDGPU::VGPRRegBankID || BankRHS == AMDGPU::VGPRRegBankID) {
3984 TargetBankID = AMDGPU::VGPRRegBankID;
3985 } else if (BankLHS == AMDGPU::VCCRegBankID || BankRHS == AMDGPU::VCCRegBankID) {
3986 TargetBankID = AMDGPU::VCCRegBankID;
3987 BankLHS = AMDGPU::VCCRegBankID;
3988 BankRHS = AMDGPU::VCCRegBankID;
3989 } else if (BankLHS == AMDGPU::SGPRRegBankID && BankRHS == AMDGPU::SGPRRegBankID) {
3990 TargetBankID = AMDGPU::SGPRRegBankID;
3991 }
3992 }
3993
3994 OpdsMapping[0] = AMDGPU::getValueMapping(TargetBankID, Size);
3995 OpdsMapping[1] = AMDGPU::getValueMapping(BankLHS, Size);
3996 OpdsMapping[2] = AMDGPU::getValueMapping(BankRHS, Size);
3997 break;
3998 }
3999
4000 if (Size == 64) {
4001
4002 if (isSALUMapping(MI)) {
4003 OpdsMapping[0] = getValueMappingSGPR64Only(AMDGPU::SGPRRegBankID, Size);
4004 OpdsMapping[1] = OpdsMapping[2] = OpdsMapping[0];
4005 } else {
4006 if (MI.getOpcode() == AMDGPU::G_MUL && Subtarget.hasVectorMulU64())
4007 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4008 else
4009 OpdsMapping[0] =
4010 getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size);
4011 unsigned Bank1 = getRegBankID(MI.getOperand(1).getReg(), MRI /*, DefaultBankID*/);
4012 OpdsMapping[1] = AMDGPU::getValueMapping(Bank1, Size);
4013
4014 unsigned Bank2 = getRegBankID(MI.getOperand(2).getReg(), MRI /*, DefaultBankID*/);
4015 OpdsMapping[2] = AMDGPU::getValueMapping(Bank2, Size);
4016 }
4017
4018 break;
4019 }
4020
4021 [[fallthrough]];
4022 }
4023 case AMDGPU::G_PTR_ADD:
4024 case AMDGPU::G_PTRMASK:
4025 case AMDGPU::G_ADD:
4026 case AMDGPU::G_SUB:
4027 case AMDGPU::G_SHL:
4028 case AMDGPU::G_LSHR:
4029 case AMDGPU::G_ASHR:
4030 case AMDGPU::G_UADDO:
4031 case AMDGPU::G_USUBO:
4032 case AMDGPU::G_UADDE:
4033 case AMDGPU::G_SADDE:
4034 case AMDGPU::G_USUBE:
4035 case AMDGPU::G_SSUBE:
4036 case AMDGPU::G_ABS:
4037 case AMDGPU::G_SHUFFLE_VECTOR:
4038 case AMDGPU::G_SBFX:
4039 case AMDGPU::G_UBFX:
4040 case AMDGPU::G_AMDGPU_S_MUL_I64_I32:
4041 case AMDGPU::G_AMDGPU_S_MUL_U64_U32:
4042 if (isSALUMapping(MI))
4043 return getDefaultMappingSOP(MI);
4044 return getDefaultMappingVOP(MI);
4045 case AMDGPU::G_SMIN:
4046 case AMDGPU::G_SMAX:
4047 case AMDGPU::G_UMIN:
4048 case AMDGPU::G_UMAX:
4049 if (isSALUMapping(MI)) {
4050 // There are no scalar 64-bit min and max; use the vector instructions instead.
4051 if (MRI.getType(MI.getOperand(0).getReg()).getSizeInBits() == 64 &&
4052 Subtarget.hasIntMinMax64())
4053 return getDefaultMappingVOP(MI);
4054 return getDefaultMappingSOP(MI);
4055 }
4056 return getDefaultMappingVOP(MI);
4057 case AMDGPU::G_FADD:
4058 case AMDGPU::G_FSUB:
4059 case AMDGPU::G_FMUL:
4060 case AMDGPU::G_FMA:
4061 case AMDGPU::G_FFLOOR:
4062 case AMDGPU::G_FCEIL:
4063 case AMDGPU::G_INTRINSIC_ROUNDEVEN:
4064 case AMDGPU::G_FMINNUM:
4065 case AMDGPU::G_FMAXNUM:
4066 case AMDGPU::G_FMINIMUM:
4067 case AMDGPU::G_FMAXIMUM:
4068 case AMDGPU::G_FMINIMUMNUM:
4069 case AMDGPU::G_FMAXIMUMNUM:
4070 case AMDGPU::G_INTRINSIC_TRUNC:
4071 case AMDGPU::G_STRICT_FADD:
4072 case AMDGPU::G_STRICT_FSUB:
4073 case AMDGPU::G_STRICT_FMUL:
4074 case AMDGPU::G_STRICT_FMA: {
4075 LLT Ty = MRI.getType(MI.getOperand(0).getReg());
4076 unsigned Size = Ty.getSizeInBits();
4077 if (Subtarget.hasSALUFloatInsts() && Ty.isScalar() &&
4078 (Size == 32 || Size == 16) && isSALUMapping(MI))
4079 return getDefaultMappingSOP(MI);
4080 return getDefaultMappingVOP(MI);
4081 }
4082 case AMDGPU::G_FPTOSI:
4083 case AMDGPU::G_FPTOUI:
4084 case AMDGPU::G_SITOFP:
4085 case AMDGPU::G_UITOFP: {
4086 unsigned SizeDst = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4087 unsigned SizeSrc = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4088 if (Subtarget.hasSALUFloatInsts() && SizeDst == 32 && SizeSrc == 32 &&
4089 isSALUMapping(MI))
4090 return getDefaultMappingSOP(MI);
4091 return getDefaultMappingVOP(MI);
4092 }
4093 case AMDGPU::G_FPTRUNC:
4094 case AMDGPU::G_FPEXT: {
4095 unsigned SizeDst = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4096 unsigned SizeSrc = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4097 if (Subtarget.hasSALUFloatInsts() && SizeDst != 64 && SizeSrc != 64 &&
4098 isSALUMapping(MI))
4099 return getDefaultMappingSOP(MI);
4100 return getDefaultMappingVOP(MI);
4101 }
4102 case AMDGPU::G_FSQRT:
4103 case AMDGPU::G_FEXP2:
4104 case AMDGPU::G_FLOG2: {
4105 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4106 if (Subtarget.hasPseudoScalarTrans() && (Size == 16 || Size == 32) &&
4107 isSALUMapping(MI))
4108 return getDefaultMappingSOP(MI);
4109 return getDefaultMappingVOP(MI);
4110 }
4111 case AMDGPU::G_SADDSAT: // FIXME: Could lower sat ops for SALU
4112 case AMDGPU::G_SSUBSAT:
4113 case AMDGPU::G_UADDSAT:
4114 case AMDGPU::G_USUBSAT:
4115 case AMDGPU::G_FMAD:
4116 case AMDGPU::G_FLDEXP:
4117 case AMDGPU::G_FMINNUM_IEEE:
4118 case AMDGPU::G_FMAXNUM_IEEE:
4119 case AMDGPU::G_FCANONICALIZE:
4120 case AMDGPU::G_STRICT_FLDEXP:
4121 case AMDGPU::G_BSWAP: // TODO: Somehow expand for scalar?
4122 case AMDGPU::G_FSHR: // TODO: Expand for scalar
4123 case AMDGPU::G_AMDGPU_FMIN_LEGACY:
4124 case AMDGPU::G_AMDGPU_FMAX_LEGACY:
4125 case AMDGPU::G_AMDGPU_RCP_IFLAG:
4126 case AMDGPU::G_AMDGPU_CVT_F32_UBYTE0:
4127 case AMDGPU::G_AMDGPU_CVT_F32_UBYTE1:
4128 case AMDGPU::G_AMDGPU_CVT_F32_UBYTE2:
4129 case AMDGPU::G_AMDGPU_CVT_F32_UBYTE3:
4130 case AMDGPU::G_AMDGPU_CVT_PK_I16_I32:
4131 case AMDGPU::G_AMDGPU_SMED3:
4132 case AMDGPU::G_AMDGPU_FMED3:
4133 return getDefaultMappingVOP(MI);
4134 case AMDGPU::G_UMULH:
4135 case AMDGPU::G_SMULH: {
4136 if (Subtarget.hasScalarMulHiInsts() && isSALUMapping(MI))
4137 return getDefaultMappingSOP(MI);
4138 return getDefaultMappingVOP(MI);
4139 }
4140 case AMDGPU::G_AMDGPU_MAD_U64_U32:
4141 case AMDGPU::G_AMDGPU_MAD_I64_I32: {
4142 // Three possible mappings:
4143 //
4144 // - Default SOP
4145 // - Default VOP
4146 // - Scalar multiply: src0 and src1 are SGPRs, the rest is VOP.
4147 //
4148 // This allows instruction selection to keep the multiplication part of the
4149 // instruction on the SALU.
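// Operand layout: 0 is the 64-bit result, 1 the carry-out, 2 and 3 the 32-bit
// multiply sources, and 4 the 64-bit addend (see the mapping chosen below).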
4150 bool AllSalu = true;
4151 bool MulSalu = true;
4152 for (unsigned i = 0; i < 5; ++i) {
4153 Register Reg = MI.getOperand(i).getReg();
4154 if (const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI)) {
4155 if (Bank->getID() != AMDGPU::SGPRRegBankID) {
4156 AllSalu = false;
4157 if (i == 2 || i == 3) {
4158 MulSalu = false;
4159 break;
4160 }
4161 }
4162 }
4163 }
4164
4165 if (AllSalu)
4166 return getDefaultMappingSOP(MI);
4167
4168 // If the multiply-add is full-rate in VALU, use that even if the
4169 // multiplication part is scalar. Accumulating separately on the VALU would
4170 // take two instructions.
4171 if (!MulSalu || Subtarget.hasFullRate64Ops())
4172 return getDefaultMappingVOP(MI);
4173
4174 // Keep the multiplication on the SALU, then accumulate on the VALU.
4175 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 64);
4176 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
4177 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4178 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4179 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 64);
4180 break;
4181 }
4182 case AMDGPU::G_IMPLICIT_DEF: {
4183 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4184 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4185 break;
4186 }
4187 case AMDGPU::G_FCONSTANT:
4188 case AMDGPU::G_CONSTANT:
4189 case AMDGPU::G_GLOBAL_VALUE:
4190 case AMDGPU::G_FRAME_INDEX:
4191 case AMDGPU::G_BLOCK_ADDR:
4192 case AMDGPU::G_READSTEADYCOUNTER:
4193 case AMDGPU::G_READCYCLECOUNTER: {
4194 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4195 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4196 break;
4197 }
4198 case AMDGPU::G_DYN_STACKALLOC: {
4199 // Result is always uniform, and a wave reduction is needed for the source.
4200 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4201 unsigned SrcBankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4202 OpdsMapping[1] = AMDGPU::getValueMapping(SrcBankID, 32);
4203 break;
4204 }
4205 case AMDGPU::G_AMDGPU_WAVE_ADDRESS: {
4206 // This case is weird because we expect a physical register in the source,
4207 // but need to set a bank anyway.
4208 //
4209 // TODO: We could select the result to SGPR or VGPR
4210 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4211 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4212 break;
4213 }
4214 case AMDGPU::G_INSERT: {
4215 unsigned BankID = getMappingType(MRI, MI);
4216 unsigned DstSize = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4217 unsigned SrcSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
4218 unsigned EltSize = getSizeInBits(MI.getOperand(2).getReg(), MRI, *TRI);
4219 OpdsMapping[0] = AMDGPU::getValueMapping(BankID, DstSize);
4220 OpdsMapping[1] = AMDGPU::getValueMapping(BankID, SrcSize);
4221 OpdsMapping[2] = AMDGPU::getValueMapping(BankID, EltSize);
4222 OpdsMapping[3] = nullptr;
4223 break;
4224 }
4225 case AMDGPU::G_EXTRACT: {
4226 unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4227 unsigned DstSize = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4228 unsigned SrcSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
4229 OpdsMapping[0] = AMDGPU::getValueMapping(BankID, DstSize);
4230 OpdsMapping[1] = AMDGPU::getValueMapping(BankID, SrcSize);
4231 OpdsMapping[2] = nullptr;
4232 break;
4233 }
4234 case AMDGPU::G_BUILD_VECTOR:
4235 case AMDGPU::G_BUILD_VECTOR_TRUNC: {
4236 LLT DstTy = MRI.getType(MI.getOperand(0).getReg());
4237 if (DstTy == LLT::fixed_vector(2, 16)) {
4238 unsigned DstSize = DstTy.getSizeInBits();
4239 unsigned SrcSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4240 unsigned Src0BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4241 unsigned Src1BankID = getRegBankID(MI.getOperand(2).getReg(), MRI);
4242 unsigned DstBankID = regBankUnion(Src0BankID, Src1BankID);
4243
4244 OpdsMapping[0] = AMDGPU::getValueMapping(DstBankID, DstSize);
4245 OpdsMapping[1] = AMDGPU::getValueMapping(Src0BankID, SrcSize);
4246 OpdsMapping[2] = AMDGPU::getValueMapping(Src1BankID, SrcSize);
4247 break;
4248 }
4249
4250 [[fallthrough]];
4251 }
4252 case AMDGPU::G_MERGE_VALUES:
4253 case AMDGPU::G_CONCAT_VECTORS: {
4254 unsigned Bank = getMappingType(MRI, MI);
4255 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4256 unsigned SrcSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4257
4258 OpdsMapping[0] = AMDGPU::getValueMapping(Bank, DstSize);
4259 // Op1 and Dst should use the same register bank.
4260 for (unsigned i = 1, e = MI.getNumOperands(); i != e; ++i)
4261 OpdsMapping[i] = AMDGPU::getValueMapping(Bank, SrcSize);
4262 break;
4263 }
4264 case AMDGPU::G_BITREVERSE:
4265 case AMDGPU::G_BITCAST:
4266 case AMDGPU::G_INTTOPTR:
4267 case AMDGPU::G_PTRTOINT:
4268 case AMDGPU::G_FABS:
4269 case AMDGPU::G_FNEG: {
4270 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4271 unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4272 OpdsMapping[0] = OpdsMapping[1] = AMDGPU::getValueMapping(BankID, Size);
4273 break;
4274 }
4275 case AMDGPU::G_AMDGPU_FFBH_U32:
4276 case AMDGPU::G_AMDGPU_FFBL_B32:
4277 case AMDGPU::G_CTLZ_ZERO_UNDEF:
4278 case AMDGPU::G_CTTZ_ZERO_UNDEF: {
4279 unsigned Size = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4280 unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4281 OpdsMapping[0] = AMDGPU::getValueMapping(BankID, 32);
4282 OpdsMapping[1] = AMDGPU::getValueMappingSGPR64Only(BankID, Size);
4283 break;
4284 }
4285 case AMDGPU::G_CTPOP: {
4286 unsigned Size = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4287 unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4288 OpdsMapping[0] = AMDGPU::getValueMapping(BankID, 32);
4289
4290 // This should really be getValueMappingSGPR64Only, but allowing the generic
4291 // code to handle the register split just makes using LegalizerHelper more
4292 // difficult.
4293 OpdsMapping[1] = AMDGPU::getValueMapping(BankID, Size);
4294 break;
4295 }
4296 case AMDGPU::G_TRUNC: {
4297 Register Dst = MI.getOperand(0).getReg();
4298 Register Src = MI.getOperand(1).getReg();
4299 unsigned Bank = getRegBankID(Src, MRI);
4300 unsigned DstSize = getSizeInBits(Dst, MRI, *TRI);
4301 unsigned SrcSize = getSizeInBits(Src, MRI, *TRI);
4302 OpdsMapping[0] = AMDGPU::getValueMapping(Bank, DstSize);
4303 OpdsMapping[1] = AMDGPU::getValueMapping(Bank, SrcSize);
4304 break;
4305 }
4306 case AMDGPU::G_ZEXT:
4307 case AMDGPU::G_SEXT:
4308 case AMDGPU::G_ANYEXT:
4309 case AMDGPU::G_SEXT_INREG: {
4310 Register Dst = MI.getOperand(0).getReg();
4311 Register Src = MI.getOperand(1).getReg();
4312 unsigned DstSize = getSizeInBits(Dst, MRI, *TRI);
4313 unsigned SrcSize = getSizeInBits(Src, MRI, *TRI);
4314
4315 unsigned DstBank;
4316 const RegisterBank *SrcBank = getRegBank(Src, MRI, *TRI);
4317 assert(SrcBank);
4318 switch (SrcBank->getID()) {
4319 case AMDGPU::SGPRRegBankID:
4320 DstBank = AMDGPU::SGPRRegBankID;
4321 break;
4322 default:
4323 DstBank = AMDGPU::VGPRRegBankID;
4324 break;
4325 }
4326
4327 // Scalar extend can use 64-bit BFE, but VGPRs require extending to
4328 // 32-bits, and then to 64.
4329 OpdsMapping[0] = AMDGPU::getValueMappingSGPR64Only(DstBank, DstSize);
4330 OpdsMapping[1] = AMDGPU::getValueMappingSGPR64Only(SrcBank->getID(),
4331 SrcSize);
4332 break;
4333 }
4334 case AMDGPU::G_IS_FPCLASS: {
4335 Register SrcReg = MI.getOperand(1).getReg();
4336 unsigned SrcSize = MRI.getType(SrcReg).getSizeInBits();
4337 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4338 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, DstSize);
4339 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4340 break;
4341 }
4342 case AMDGPU::G_STORE: {
4343 assert(MI.getOperand(0).isReg());
4344 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4345
4346 // FIXME: We need to specify a different reg bank once scalar stores are
4347 // supported.
4348 const ValueMapping *ValMapping =
4349 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4350 OpdsMapping[0] = ValMapping;
4351 OpdsMapping[1] = getValueMappingForPtr(MRI, MI.getOperand(1).getReg());
4352 break;
4353 }
4354 case AMDGPU::G_ICMP:
4355 case AMDGPU::G_FCMP: {
4356 unsigned Size = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4357
4358 // See if the result register has already been constrained to vcc, which may
4359 // happen due to control flow intrinsic lowering.
4360 unsigned DstBank = getRegBankID(MI.getOperand(0).getReg(), MRI,
4361 AMDGPU::SGPRRegBankID);
4362 unsigned Op2Bank = getRegBankID(MI.getOperand(2).getReg(), MRI);
4363 unsigned Op3Bank = getRegBankID(MI.getOperand(3).getReg(), MRI);
4364
4365 auto canUseSCCICMP = [&]() {
4366 auto Pred =
4367 static_cast<CmpInst::Predicate>(MI.getOperand(1).getPredicate());
4368 return Size == 32 ||
4369 (Size == 64 &&
4370 (Pred == CmpInst::ICMP_EQ || Pred == CmpInst::ICMP_NE) &&
4371 Subtarget.hasScalarCompareEq64());
4372 };
4373 auto canUseSCCFCMP = [&]() {
4374 return Subtarget.hasSALUFloatInsts() && (Size == 32 || Size == 16);
4375 };
4376
4377 bool isICMP = MI.getOpcode() == AMDGPU::G_ICMP;
4378 bool CanUseSCC = DstBank == AMDGPU::SGPRRegBankID &&
4379 Op2Bank == AMDGPU::SGPRRegBankID &&
4380 Op3Bank == AMDGPU::SGPRRegBankID &&
4381 (isICMP ? canUseSCCICMP() : canUseSCCFCMP());
4382
4383 DstBank = CanUseSCC ? AMDGPU::SGPRRegBankID : AMDGPU::VCCRegBankID;
4384 unsigned SrcBank = CanUseSCC ? AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
4385
4386 // TODO: Use 32-bit for scalar output size.
4387 // SCC results will need to be copied to a 32-bit SGPR virtual register.
4388 const unsigned ResultSize = 1;
4389
4390 OpdsMapping[0] = AMDGPU::getValueMapping(DstBank, ResultSize);
4391 OpdsMapping[1] = nullptr; // Predicate Operand.
4392 OpdsMapping[2] = AMDGPU::getValueMapping(SrcBank, Size);
4393 OpdsMapping[3] = AMDGPU::getValueMapping(SrcBank, Size);
4394 break;
4395 }
4396 case AMDGPU::G_EXTRACT_VECTOR_ELT: {
4397 // VGPR index can be used for waterfall when indexing a SGPR vector.
4398 unsigned SrcBankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4399 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4400 unsigned SrcSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4401 unsigned IdxSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4402 unsigned IdxBank = getRegBankID(MI.getOperand(2).getReg(), MRI);
4403 unsigned OutputBankID = regBankUnion(SrcBankID, IdxBank);
4404
4405 OpdsMapping[0] = AMDGPU::getValueMappingSGPR64Only(OutputBankID, DstSize);
4406 OpdsMapping[1] = AMDGPU::getValueMapping(SrcBankID, SrcSize);
4407
4408 // The index can be in either bank if the source vector is VGPR.
4409 OpdsMapping[2] = AMDGPU::getValueMapping(IdxBank, IdxSize);
4410 break;
4411 }
4412 case AMDGPU::G_INSERT_VECTOR_ELT: {
4413 unsigned OutputBankID = isSALUMapping(MI) ?
4414 AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
4415
4416 unsigned VecSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4417 unsigned InsertSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4418 unsigned IdxSize = MRI.getType(MI.getOperand(3).getReg()).getSizeInBits();
4419 unsigned InsertEltBankID = getRegBankID(MI.getOperand(2).getReg(), MRI);
4420 unsigned IdxBankID = getRegBankID(MI.getOperand(3).getReg(), MRI);
4421
4422 OpdsMapping[0] = AMDGPU::getValueMapping(OutputBankID, VecSize);
4423 OpdsMapping[1] = AMDGPU::getValueMapping(OutputBankID, VecSize);
4424
4425 // This is a weird case, because we need to break down the mapping based on
4426 // the register bank of a different operand.
4427 if (InsertSize == 64 && OutputBankID == AMDGPU::VGPRRegBankID) {
4428 OpdsMapping[2] = AMDGPU::getValueMappingSplit64(InsertEltBankID,
4429 InsertSize);
4430 } else {
4431 assert(InsertSize == 32 || InsertSize == 64);
4432 OpdsMapping[2] = AMDGPU::getValueMapping(InsertEltBankID, InsertSize);
4433 }
4434
4435 // The index can be in either bank if the source vector is VGPR.
4436 OpdsMapping[3] = AMDGPU::getValueMapping(IdxBankID, IdxSize);
4437 break;
4438 }
4439 case AMDGPU::G_UNMERGE_VALUES: {
4440 unsigned Bank = getMappingType(MRI, MI);
4441
4442 // Op1 and Dst should use the same register bank.
4443 // FIXME: Shouldn't this be the default? Why do we need to handle this?
4444 for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
4445 unsigned Size = getSizeInBits(MI.getOperand(i).getReg(), MRI, *TRI);
4446 OpdsMapping[i] = AMDGPU::getValueMapping(Bank, Size);
4447 }
4448 break;
4449 }
4450 case AMDGPU::G_AMDGPU_BUFFER_LOAD:
4451 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE:
4452 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE:
4453 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT:
4454 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT:
4455 case AMDGPU::G_AMDGPU_BUFFER_LOAD_TFE:
4456 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE_TFE:
4457 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE_TFE:
4458 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT_TFE:
4459 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT_TFE:
4460 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT:
4461 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_TFE:
4462 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_D16:
4463 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT:
4464 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT_D16:
4465 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT:
4466 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT_D16:
4467 case AMDGPU::G_AMDGPU_BUFFER_STORE:
4468 case AMDGPU::G_AMDGPU_BUFFER_STORE_BYTE:
4469 case AMDGPU::G_AMDGPU_BUFFER_STORE_SHORT:
4470 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT:
4471 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT_D16: {
4472 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4473
4474 // rsrc
4475 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4476
4477 // vindex
4478 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4479
4480 // voffset
4481 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4482
4483 // soffset
4484 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4485
4486 // Any remaining operands are immediates and were correctly null
4487 // initialized.
4488 break;
4489 }
4490 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SWAP:
4491 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_ADD:
4492 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB:
4493 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMIN:
4494 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMIN:
4495 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMAX:
4496 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMAX:
4497 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_AND:
4498 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_OR:
4499 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_XOR:
4500 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_INC:
4501 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_DEC:
4502 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FADD:
4503 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMIN:
4504 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMAX: {
4505 // vdata_out
4506 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4507
4508 // vdata_in
4509 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4510
4511 // rsrc
4512 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4513
4514 // vindex
4515 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4516
4517 // voffset
4518 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4519
4520 // soffset
4521 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
4522
4523 // Any remaining operands are immediates and were correctly null
4524 // initialized.
4525 break;
4526 }
4527 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_CMPSWAP: {
4528 // vdata_out
4529 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4530
4531 // vdata_in
4532 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4533
4534 // cmp
4535 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4536
4537 // rsrc
4538 OpdsMapping[3] = getSGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4539
4540 // vindex
4541 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4542
4543 // voffset
4544 OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
4545
4546 // soffset
4547 OpdsMapping[6] = getSGPROpMapping(MI.getOperand(6).getReg(), MRI, *TRI);
4548
4549 // Any remaining operands are immediates and were correctly null
4550 // initialized.
4551 break;
4552 }
4553 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD:
4554 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_UBYTE:
4555 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SBYTE:
4556 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_USHORT:
4557 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SSHORT: {
4558 // Lie and claim everything is legal, even though some need to be
4559 // SGPRs. applyMapping will have to deal with it as a waterfall loop.
4560 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4561 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4562
4563 // We need to convert this to a MUBUF if either the resource or the offset is
4564 // a VGPR.
4565 unsigned RSrcBank = OpdsMapping[1]->BreakDown[0].RegBank->getID();
4566 unsigned OffsetBank = OpdsMapping[2]->BreakDown[0].RegBank->getID();
4567 unsigned ResultBank = regBankUnion(RSrcBank, OffsetBank);
4568
4569 unsigned Size0 = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4570 OpdsMapping[0] = AMDGPU::getValueMapping(ResultBank, Size0);
4571 break;
4572 }
4573 case AMDGPU::G_AMDGPU_S_BUFFER_PREFETCH:
4574 OpdsMapping[0] = getSGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4575 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4576 break;
4577 case AMDGPU::G_INTRINSIC:
4578 case AMDGPU::G_INTRINSIC_CONVERGENT: {
4579 switch (cast<GIntrinsic>(MI).getIntrinsicID()) {
4580 default:
4581 return getInvalidInstructionMapping();
4582 case Intrinsic::amdgcn_div_fmas:
4583 case Intrinsic::amdgcn_div_fixup:
4584 case Intrinsic::amdgcn_trig_preop:
4585 case Intrinsic::amdgcn_sin:
4586 case Intrinsic::amdgcn_cos:
4587 case Intrinsic::amdgcn_log_clamp:
4588 case Intrinsic::amdgcn_rcp_legacy:
4589 case Intrinsic::amdgcn_rsq_legacy:
4590 case Intrinsic::amdgcn_rsq_clamp:
4591 case Intrinsic::amdgcn_tanh:
4592 case Intrinsic::amdgcn_fmul_legacy:
4593 case Intrinsic::amdgcn_fma_legacy:
4594 case Intrinsic::amdgcn_frexp_mant:
4595 case Intrinsic::amdgcn_frexp_exp:
4596 case Intrinsic::amdgcn_fract:
4597 case Intrinsic::amdgcn_cvt_pknorm_i16:
4598 case Intrinsic::amdgcn_cvt_pknorm_u16:
4599 case Intrinsic::amdgcn_cvt_pk_i16:
4600 case Intrinsic::amdgcn_cvt_pk_u16:
4601 case Intrinsic::amdgcn_cvt_sr_pk_f16_f32:
4602 case Intrinsic::amdgcn_cvt_sr_pk_bf16_f32:
4603 case Intrinsic::amdgcn_cvt_pk_f16_fp8:
4604 case Intrinsic::amdgcn_cvt_pk_f16_bf8:
4605 case Intrinsic::amdgcn_cvt_pk_fp8_f16:
4606 case Intrinsic::amdgcn_cvt_pk_bf8_f16:
4607 case Intrinsic::amdgcn_cvt_sr_fp8_f16:
4608 case Intrinsic::amdgcn_cvt_sr_bf8_f16:
4609 case Intrinsic::amdgcn_cvt_scale_pk8_f16_fp8:
4610 case Intrinsic::amdgcn_cvt_scale_pk8_bf16_fp8:
4611 case Intrinsic::amdgcn_cvt_scale_pk8_f16_bf8:
4612 case Intrinsic::amdgcn_cvt_scale_pk8_bf16_bf8:
4613 case Intrinsic::amdgcn_cvt_scale_pk8_f16_fp4:
4614 case Intrinsic::amdgcn_cvt_scale_pk8_bf16_fp4:
4615 case Intrinsic::amdgcn_cvt_scale_pk8_f32_fp8:
4616 case Intrinsic::amdgcn_cvt_scale_pk8_f32_bf8:
4617 case Intrinsic::amdgcn_cvt_scale_pk8_f32_fp4:
4618 case Intrinsic::amdgcn_cvt_scale_pk16_f16_fp6:
4619 case Intrinsic::amdgcn_cvt_scale_pk16_bf16_fp6:
4620 case Intrinsic::amdgcn_cvt_scale_pk16_f16_bf6:
4621 case Intrinsic::amdgcn_cvt_scale_pk16_bf16_bf6:
4622 case Intrinsic::amdgcn_cvt_scale_pk16_f32_fp6:
4623 case Intrinsic::amdgcn_cvt_scale_pk16_f32_bf6:
4624 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp8_bf16:
4625 case Intrinsic::amdgcn_cvt_scalef32_pk8_bf8_bf16:
4626 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp8_f16:
4627 case Intrinsic::amdgcn_cvt_scalef32_pk8_bf8_f16:
4628 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp8_f32:
4629 case Intrinsic::amdgcn_cvt_scalef32_pk8_bf8_f32:
4630 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp4_f32:
4631 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp4_f16:
4632 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp4_bf16:
4633 case Intrinsic::amdgcn_cvt_scalef32_pk16_fp6_f32:
4634 case Intrinsic::amdgcn_cvt_scalef32_pk16_bf6_f32:
4635 case Intrinsic::amdgcn_cvt_scalef32_pk16_fp6_f16:
4636 case Intrinsic::amdgcn_cvt_scalef32_pk16_bf6_f16:
4637 case Intrinsic::amdgcn_cvt_scalef32_pk16_fp6_bf16:
4638 case Intrinsic::amdgcn_cvt_scalef32_pk16_bf6_bf16:
4639 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp8_bf16:
4640 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_bf8_bf16:
4641 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp8_f16:
4642 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_bf8_f16:
4643 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp8_f32:
4644 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_bf8_f32:
4645 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp4_f32:
4646 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp4_f16:
4647 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp4_bf16:
4648 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_fp6_f32:
4649 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_bf6_f32:
4650 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_fp6_f16:
4651 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_bf6_f16:
4652 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_fp6_bf16:
4653 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_bf6_bf16:
4654 case Intrinsic::amdgcn_sat_pk4_i4_i8:
4655 case Intrinsic::amdgcn_sat_pk4_u4_u8:
4656 case Intrinsic::amdgcn_fmed3:
4657 case Intrinsic::amdgcn_cubeid:
4658 case Intrinsic::amdgcn_cubema:
4659 case Intrinsic::amdgcn_cubesc:
4660 case Intrinsic::amdgcn_cubetc:
4661 case Intrinsic::amdgcn_sffbh:
4662 case Intrinsic::amdgcn_fmad_ftz:
4663 case Intrinsic::amdgcn_mbcnt_lo:
4664 case Intrinsic::amdgcn_mbcnt_hi:
4665 case Intrinsic::amdgcn_mul_u24:
4666 case Intrinsic::amdgcn_mul_i24:
4667 case Intrinsic::amdgcn_mulhi_u24:
4668 case Intrinsic::amdgcn_mulhi_i24:
4669 case Intrinsic::amdgcn_lerp:
4670 case Intrinsic::amdgcn_sad_u8:
4671 case Intrinsic::amdgcn_msad_u8:
4672 case Intrinsic::amdgcn_sad_hi_u8:
4673 case Intrinsic::amdgcn_sad_u16:
4674 case Intrinsic::amdgcn_qsad_pk_u16_u8:
4675 case Intrinsic::amdgcn_mqsad_pk_u16_u8:
4676 case Intrinsic::amdgcn_mqsad_u32_u8:
4677 case Intrinsic::amdgcn_cvt_pk_u8_f32:
4678 case Intrinsic::amdgcn_alignbyte:
4679 case Intrinsic::amdgcn_perm:
4680 case Intrinsic::amdgcn_prng_b32:
4681 case Intrinsic::amdgcn_fdot2:
4682 case Intrinsic::amdgcn_sdot2:
4683 case Intrinsic::amdgcn_udot2:
4684 case Intrinsic::amdgcn_sdot4:
4685 case Intrinsic::amdgcn_udot4:
4686 case Intrinsic::amdgcn_sdot8:
4687 case Intrinsic::amdgcn_udot8:
4688 case Intrinsic::amdgcn_fdot2_bf16_bf16:
4689 case Intrinsic::amdgcn_fdot2_f16_f16:
4690 case Intrinsic::amdgcn_fdot2_f32_bf16:
4691 case Intrinsic::amdgcn_fdot2c_f32_bf16:
4692 case Intrinsic::amdgcn_sudot4:
4693 case Intrinsic::amdgcn_sudot8:
4694 case Intrinsic::amdgcn_dot4_f32_fp8_bf8:
4695 case Intrinsic::amdgcn_dot4_f32_bf8_fp8:
4696 case Intrinsic::amdgcn_dot4_f32_fp8_fp8:
4697 case Intrinsic::amdgcn_dot4_f32_bf8_bf8:
4698 case Intrinsic::amdgcn_cvt_f32_fp8:
4699 case Intrinsic::amdgcn_cvt_f32_fp8_e5m3:
4700 case Intrinsic::amdgcn_cvt_f32_bf8:
4701 case Intrinsic::amdgcn_cvt_off_f32_i4:
4702 case Intrinsic::amdgcn_cvt_pk_f32_fp8:
4703 case Intrinsic::amdgcn_cvt_pk_f32_bf8:
4704 case Intrinsic::amdgcn_cvt_pk_fp8_f32:
4705 case Intrinsic::amdgcn_cvt_pk_fp8_f32_e5m3:
4706 case Intrinsic::amdgcn_cvt_pk_bf8_f32:
4707 case Intrinsic::amdgcn_cvt_sr_fp8_f32:
4708 case Intrinsic::amdgcn_cvt_sr_fp8_f32_e5m3:
4709 case Intrinsic::amdgcn_cvt_sr_bf8_f32:
4710 case Intrinsic::amdgcn_cvt_sr_bf16_f32:
4711 case Intrinsic::amdgcn_cvt_sr_f16_f32:
4712 case Intrinsic::amdgcn_cvt_f16_fp8:
4713 case Intrinsic::amdgcn_cvt_f16_bf8:
4714 case Intrinsic::amdgcn_cvt_scalef32_pk32_fp6_f16:
4715 case Intrinsic::amdgcn_cvt_scalef32_pk32_bf6_f16:
4716 case Intrinsic::amdgcn_cvt_scalef32_pk32_fp6_bf16:
4717 case Intrinsic::amdgcn_cvt_scalef32_pk32_bf6_bf16:
4718 case Intrinsic::amdgcn_cvt_scalef32_f16_fp8:
4719 case Intrinsic::amdgcn_cvt_scalef32_f16_bf8:
4720 case Intrinsic::amdgcn_cvt_scalef32_f32_fp8:
4721 case Intrinsic::amdgcn_cvt_scalef32_f32_bf8:
4722 case Intrinsic::amdgcn_cvt_scalef32_pk_fp8_f32:
4723 case Intrinsic::amdgcn_cvt_scalef32_pk_bf8_f32:
4724 case Intrinsic::amdgcn_cvt_scalef32_pk_f32_fp8:
4725 case Intrinsic::amdgcn_cvt_scalef32_pk_f32_bf8:
4726 case Intrinsic::amdgcn_cvt_scalef32_pk_fp8_f16:
4727 case Intrinsic::amdgcn_cvt_scalef32_pk_fp8_bf16:
4728 case Intrinsic::amdgcn_cvt_scalef32_pk_bf8_f16:
4729 case Intrinsic::amdgcn_cvt_scalef32_pk_bf8_bf16:
4730 case Intrinsic::amdgcn_cvt_scalef32_pk_f32_fp4:
4731 case Intrinsic::amdgcn_cvt_scalef32_pk_fp4_f32:
4732 case Intrinsic::amdgcn_cvt_scalef32_pk_f16_fp4:
4733 case Intrinsic::amdgcn_cvt_scalef32_pk_bf16_fp4:
4734 case Intrinsic::amdgcn_cvt_scalef32_pk32_f32_fp6:
4735 case Intrinsic::amdgcn_cvt_scalef32_pk32_f32_bf6:
4736 case Intrinsic::amdgcn_cvt_scalef32_pk32_f16_bf6:
4737 case Intrinsic::amdgcn_cvt_scalef32_pk32_bf16_bf6:
4738 case Intrinsic::amdgcn_cvt_scalef32_pk32_f16_fp6:
4739 case Intrinsic::amdgcn_cvt_scalef32_pk32_bf16_fp6:
4740 case Intrinsic::amdgcn_cvt_scalef32_pk_f16_bf8:
4741 case Intrinsic::amdgcn_cvt_scalef32_pk_bf16_bf8:
4742 case Intrinsic::amdgcn_cvt_scalef32_pk_f16_fp8:
4743 case Intrinsic::amdgcn_cvt_scalef32_pk_bf16_fp8:
4744 case Intrinsic::amdgcn_cvt_scalef32_pk_fp4_f16:
4745 case Intrinsic::amdgcn_cvt_scalef32_pk_fp4_bf16:
4746 case Intrinsic::amdgcn_cvt_scalef32_sr_pk_fp4_f16:
4747 case Intrinsic::amdgcn_cvt_scalef32_sr_pk_fp4_bf16:
4748 case Intrinsic::amdgcn_cvt_scalef32_sr_pk_fp4_f32:
4749 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_bf6_bf16:
4750 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_bf6_f16:
4751 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_bf6_f32:
4752 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_fp6_bf16:
4753 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_fp6_f16:
4754 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_fp6_f32:
4755 case Intrinsic::amdgcn_cvt_scalef32_sr_bf8_bf16:
4756 case Intrinsic::amdgcn_cvt_scalef32_sr_bf8_f16:
4757 case Intrinsic::amdgcn_cvt_scalef32_sr_bf8_f32:
4758 case Intrinsic::amdgcn_cvt_scalef32_sr_fp8_bf16:
4759 case Intrinsic::amdgcn_cvt_scalef32_sr_fp8_f16:
4760 case Intrinsic::amdgcn_cvt_scalef32_sr_fp8_f32:
4761 case Intrinsic::amdgcn_ashr_pk_i8_i32:
4762 case Intrinsic::amdgcn_ashr_pk_u8_i32:
4763 case Intrinsic::amdgcn_cvt_scalef32_2xpk16_fp6_f32:
4764 case Intrinsic::amdgcn_cvt_scalef32_2xpk16_bf6_f32:
4765 case Intrinsic::amdgcn_wmma_bf16_16x16x16_bf16:
4766 case Intrinsic::amdgcn_wmma_f16_16x16x16_f16:
4767 case Intrinsic::amdgcn_wmma_bf16_16x16x16_bf16_tied:
4768 case Intrinsic::amdgcn_wmma_f16_16x16x16_f16_tied:
4769 case Intrinsic::amdgcn_wmma_f32_16x16x16_bf16:
4770 case Intrinsic::amdgcn_wmma_f32_16x16x16_f16:
4771 case Intrinsic::amdgcn_wmma_i32_16x16x16_iu4:
4772 case Intrinsic::amdgcn_wmma_i32_16x16x16_iu8:
4773 case Intrinsic::amdgcn_wmma_f32_16x16x16_fp8_fp8:
4774 case Intrinsic::amdgcn_wmma_f32_16x16x16_fp8_bf8:
4775 case Intrinsic::amdgcn_wmma_f32_16x16x16_bf8_fp8:
4776 case Intrinsic::amdgcn_wmma_f32_16x16x16_bf8_bf8:
4777 case Intrinsic::amdgcn_wmma_i32_16x16x32_iu4:
4778 case Intrinsic::amdgcn_swmmac_f32_16x16x32_f16:
4779 case Intrinsic::amdgcn_swmmac_f32_16x16x32_bf16:
4780 case Intrinsic::amdgcn_swmmac_f16_16x16x32_f16:
4781 case Intrinsic::amdgcn_swmmac_bf16_16x16x32_bf16:
4782 case Intrinsic::amdgcn_swmmac_i32_16x16x32_iu8:
4783 case Intrinsic::amdgcn_swmmac_i32_16x16x32_iu4:
4784 case Intrinsic::amdgcn_swmmac_i32_16x16x64_iu4:
4785 case Intrinsic::amdgcn_swmmac_f32_16x16x32_fp8_fp8:
4786 case Intrinsic::amdgcn_swmmac_f32_16x16x32_fp8_bf8:
4787 case Intrinsic::amdgcn_swmmac_f32_16x16x32_bf8_fp8:
4788 case Intrinsic::amdgcn_swmmac_f32_16x16x32_bf8_bf8:
4789 case Intrinsic::amdgcn_wmma_f32_16x16x4_f32:
4790 case Intrinsic::amdgcn_wmma_f32_16x16x32_bf16:
4791 case Intrinsic::amdgcn_wmma_f32_16x16x32_f16:
4792 case Intrinsic::amdgcn_wmma_f16_16x16x32_f16:
4793 case Intrinsic::amdgcn_wmma_bf16_16x16x32_bf16:
4794 case Intrinsic::amdgcn_wmma_bf16f32_16x16x32_bf16:
4795 case Intrinsic::amdgcn_wmma_f32_16x16x64_fp8_fp8:
4796 case Intrinsic::amdgcn_wmma_f32_16x16x64_fp8_bf8:
4797 case Intrinsic::amdgcn_wmma_f32_16x16x64_bf8_fp8:
4798 case Intrinsic::amdgcn_wmma_f32_16x16x64_bf8_bf8:
4799 case Intrinsic::amdgcn_wmma_f16_16x16x64_fp8_fp8:
4800 case Intrinsic::amdgcn_wmma_f16_16x16x64_fp8_bf8:
4801 case Intrinsic::amdgcn_wmma_f16_16x16x64_bf8_fp8:
4802 case Intrinsic::amdgcn_wmma_f16_16x16x64_bf8_bf8:
4803 case Intrinsic::amdgcn_wmma_f16_16x16x128_fp8_fp8:
4804 case Intrinsic::amdgcn_wmma_f16_16x16x128_fp8_bf8:
4805 case Intrinsic::amdgcn_wmma_f16_16x16x128_bf8_fp8:
4806 case Intrinsic::amdgcn_wmma_f16_16x16x128_bf8_bf8:
4807 case Intrinsic::amdgcn_wmma_f32_16x16x128_fp8_fp8:
4808 case Intrinsic::amdgcn_wmma_f32_16x16x128_fp8_bf8:
4809 case Intrinsic::amdgcn_wmma_f32_16x16x128_bf8_fp8:
4810 case Intrinsic::amdgcn_wmma_f32_16x16x128_bf8_bf8:
4811 case Intrinsic::amdgcn_wmma_i32_16x16x64_iu8:
4812 case Intrinsic::amdgcn_wmma_f32_16x16x128_f8f6f4:
4813 case Intrinsic::amdgcn_wmma_scale_f32_16x16x128_f8f6f4:
4814 case Intrinsic::amdgcn_wmma_scale16_f32_16x16x128_f8f6f4:
4815 case Intrinsic::amdgcn_wmma_f32_32x16x128_f4:
4816 case Intrinsic::amdgcn_wmma_scale_f32_32x16x128_f4:
4817 case Intrinsic::amdgcn_wmma_scale16_f32_32x16x128_f4:
4818 case Intrinsic::amdgcn_swmmac_f16_16x16x64_f16:
4819 case Intrinsic::amdgcn_swmmac_bf16_16x16x64_bf16:
4820 case Intrinsic::amdgcn_swmmac_f32_16x16x64_bf16:
4821 case Intrinsic::amdgcn_swmmac_bf16f32_16x16x64_bf16:
4822 case Intrinsic::amdgcn_swmmac_f32_16x16x64_f16:
4823 case Intrinsic::amdgcn_swmmac_f32_16x16x128_fp8_fp8:
4824 case Intrinsic::amdgcn_swmmac_f32_16x16x128_fp8_bf8:
4825 case Intrinsic::amdgcn_swmmac_f32_16x16x128_bf8_fp8:
4826 case Intrinsic::amdgcn_swmmac_f32_16x16x128_bf8_bf8:
4827 case Intrinsic::amdgcn_swmmac_f16_16x16x128_fp8_fp8:
4828 case Intrinsic::amdgcn_swmmac_f16_16x16x128_fp8_bf8:
4829 case Intrinsic::amdgcn_swmmac_f16_16x16x128_bf8_fp8:
4830 case Intrinsic::amdgcn_swmmac_f16_16x16x128_bf8_bf8:
4831 case Intrinsic::amdgcn_swmmac_i32_16x16x128_iu8:
4832 case Intrinsic::amdgcn_perm_pk16_b4_u4:
4833 case Intrinsic::amdgcn_perm_pk16_b6_u4:
4834 case Intrinsic::amdgcn_perm_pk16_b8_u4:
4835 case Intrinsic::amdgcn_add_max_i32:
4836 case Intrinsic::amdgcn_add_max_u32:
4837 case Intrinsic::amdgcn_add_min_i32:
4838 case Intrinsic::amdgcn_add_min_u32:
4839 case Intrinsic::amdgcn_pk_add_max_i16:
4840 case Intrinsic::amdgcn_pk_add_max_u16:
4841 case Intrinsic::amdgcn_pk_add_min_i16:
4842 case Intrinsic::amdgcn_pk_add_min_u16:
4843 return getDefaultMappingVOP(MI);
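// Descriptive note: the unary transcendentals below stay on the SALU path
// only when the subtarget has pseudo-scalar transcendental instructions, the
// result is 16 or 32 bits, and the operands already have an SGPR mapping;
// otherwise they take the default VALU mapping.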
4844 case Intrinsic::amdgcn_log:
4845 case Intrinsic::amdgcn_exp2:
4846 case Intrinsic::amdgcn_rcp:
4847 case Intrinsic::amdgcn_rsq:
4848 case Intrinsic::amdgcn_sqrt: {
4849 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4850 if (Subtarget.hasPseudoScalarTrans() && (Size == 16 || Size == 32) &&
4851 isSALUMapping(MI))
4852 return getDefaultMappingSOP(MI);
4853 return getDefaultMappingVOP(MI);
4854 }
4855 case Intrinsic::amdgcn_sbfe:
4856 case Intrinsic::amdgcn_ubfe:
4857 if (isSALUMapping(MI))
4858 return getDefaultMappingSOP(MI);
4859 return getDefaultMappingVOP(MI);
4860 case Intrinsic::amdgcn_ds_swizzle:
4861 case Intrinsic::amdgcn_ds_permute:
4862 case Intrinsic::amdgcn_ds_bpermute:
4863 case Intrinsic::amdgcn_update_dpp:
4864 case Intrinsic::amdgcn_mov_dpp8:
4865 case Intrinsic::amdgcn_mov_dpp:
4866 case Intrinsic::amdgcn_strict_wwm:
4867 case Intrinsic::amdgcn_wwm:
4868 case Intrinsic::amdgcn_strict_wqm:
4869 case Intrinsic::amdgcn_wqm:
4870 case Intrinsic::amdgcn_softwqm:
4871 case Intrinsic::amdgcn_set_inactive:
4872 case Intrinsic::amdgcn_set_inactive_chain_arg:
4873 case Intrinsic::amdgcn_permlane64:
4874 case Intrinsic::amdgcn_ds_bpermute_fi_b32:
4875 return getDefaultMappingAllVGPR(MI);
4876 case Intrinsic::amdgcn_cvt_pkrtz:
4877 if (Subtarget.hasSALUFloatInsts() && isSALUMapping(MI))
4878 return getDefaultMappingSOP(MI);
4879 return getDefaultMappingVOP(MI);
4880 case Intrinsic::amdgcn_kernarg_segment_ptr:
4881 case Intrinsic::amdgcn_s_getpc:
4882 case Intrinsic::amdgcn_groupstaticsize:
4883 case Intrinsic::amdgcn_reloc_constant:
4884 case Intrinsic::returnaddress: {
4885 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4886 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4887 break;
4888 }
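// wqm_vote consumes and produces a per-lane boolean, so both the result and
// the source operand are mapped to the VCC bank.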
4889 case Intrinsic::amdgcn_wqm_vote: {
4890 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4891 OpdsMapping[0] = OpdsMapping[2]
4892 = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size);
4893 break;
4894 }
4895 case Intrinsic::amdgcn_ps_live: {
4896 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
4897 break;
4898 }
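// div_scale has two results: the numeric result is a VGPR, while the second
// result is a per-lane flag and therefore uses the VCC bank.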
4899 case Intrinsic::amdgcn_div_scale: {
4900 unsigned Dst0Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4901 unsigned Dst1Size = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4902 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Dst0Size);
4903 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Dst1Size);
4904
4905 unsigned SrcSize = MRI.getType(MI.getOperand(3).getReg()).getSizeInBits();
4906 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4907 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4908 break;
4909 }
4910 case Intrinsic::amdgcn_class: {
4911 Register Src0Reg = MI.getOperand(2).getReg();
4912 Register Src1Reg = MI.getOperand(3).getReg();
4913 unsigned Src0Size = MRI.getType(Src0Reg).getSizeInBits();
4914 unsigned Src1Size = MRI.getType(Src1Reg).getSizeInBits();
4915 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4916 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, DstSize);
4917 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Src0Size);
4918 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Src1Size);
4919 break;
4920 }
4921 case Intrinsic::amdgcn_icmp:
4922 case Intrinsic::amdgcn_fcmp: {
4923 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4924 // The result is not VCCRegBank because it is not used in a boolean context.
4925 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
4926 unsigned OpSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4927 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
4928 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
4929 break;
4930 }
4931 case Intrinsic::amdgcn_readlane: {
4932 // This must be an SGPR, but accept a VGPR.
4933 Register IdxReg = MI.getOperand(3).getReg();
4934 unsigned IdxSize = MRI.getType(IdxReg).getSizeInBits();
4935 unsigned IdxBank = getRegBankID(IdxReg, MRI, AMDGPU::SGPRRegBankID);
4936 OpdsMapping[3] = AMDGPU::getValueMapping(IdxBank, IdxSize);
4937 [[fallthrough]];
4938 }
4939 case Intrinsic::amdgcn_readfirstlane: {
4940 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4941 unsigned SrcSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4942 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
4943 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4944 break;
4945 }
4946 case Intrinsic::amdgcn_writelane: {
4947 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4948 Register SrcReg = MI.getOperand(2).getReg();
4949 unsigned SrcSize = MRI.getType(SrcReg).getSizeInBits();
4950 unsigned SrcBank = getRegBankID(SrcReg, MRI, AMDGPU::SGPRRegBankID);
4951 Register IdxReg = MI.getOperand(3).getReg();
4952 unsigned IdxSize = MRI.getType(IdxReg).getSizeInBits();
4953 unsigned IdxBank = getRegBankID(IdxReg, MRI, AMDGPU::SGPRRegBankID);
4954 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
4955
4956 // These 2 must be SGPRs, but accept VGPRs. Readfirstlane will be inserted
4957 // to legalize.
4958 OpdsMapping[2] = AMDGPU::getValueMapping(SrcBank, SrcSize);
4959 OpdsMapping[3] = AMDGPU::getValueMapping(IdxBank, IdxSize);
4960 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4961 break;
4962 }
4963 case Intrinsic::amdgcn_if_break: {
4964 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4965 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4966 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
4967 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4968 break;
4969 }
4970 case Intrinsic::amdgcn_permlane16:
4971 case Intrinsic::amdgcn_permlanex16: {
4972 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4973 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4974 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4975 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4976 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4977 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4978 break;
4979 }
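// permlane_bcast/up/down/xor and permlane_idx_gen below keep the data
// operands in VGPRs and map the lane-select/control operands to SGPRs.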
4980 case Intrinsic::amdgcn_permlane_bcast:
4981 case Intrinsic::amdgcn_permlane_up:
4982 case Intrinsic::amdgcn_permlane_down:
4983 case Intrinsic::amdgcn_permlane_xor: {
4984 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4985 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4986 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4987 OpdsMapping[3] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4988 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4989 break;
4990 }
4991 case Intrinsic::amdgcn_permlane_idx_gen: {
4992 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4993 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4994 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4995 OpdsMapping[3] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4996 break;
4997 }
4998 case Intrinsic::amdgcn_permlane16_var:
4999 case Intrinsic::amdgcn_permlanex16_var: {
5000 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
5001 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5002 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5003 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5004 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5005 break;
5006 }
5007 case Intrinsic::amdgcn_mfma_f32_4x4x1f32:
5008 case Intrinsic::amdgcn_mfma_f32_4x4x4f16:
5009 case Intrinsic::amdgcn_mfma_i32_4x4x4i8:
5010 case Intrinsic::amdgcn_mfma_f32_4x4x2bf16:
5011 case Intrinsic::amdgcn_mfma_f32_16x16x1f32:
5012 case Intrinsic::amdgcn_mfma_f32_16x16x4f32:
5013 case Intrinsic::amdgcn_mfma_f32_16x16x4f16:
5014 case Intrinsic::amdgcn_mfma_f32_16x16x16f16:
5015 case Intrinsic::amdgcn_mfma_i32_16x16x4i8:
5016 case Intrinsic::amdgcn_mfma_i32_16x16x16i8:
5017 case Intrinsic::amdgcn_mfma_f32_16x16x2bf16:
5018 case Intrinsic::amdgcn_mfma_f32_16x16x8bf16:
5019 case Intrinsic::amdgcn_mfma_f32_32x32x1f32:
5020 case Intrinsic::amdgcn_mfma_f32_32x32x2f32:
5021 case Intrinsic::amdgcn_mfma_f32_32x32x4f16:
5022 case Intrinsic::amdgcn_mfma_f32_32x32x8f16:
5023 case Intrinsic::amdgcn_mfma_i32_32x32x4i8:
5024 case Intrinsic::amdgcn_mfma_i32_32x32x8i8:
5025 case Intrinsic::amdgcn_mfma_f32_32x32x2bf16:
5026 case Intrinsic::amdgcn_mfma_f32_32x32x4bf16:
5027 case Intrinsic::amdgcn_mfma_f32_32x32x4bf16_1k:
5028 case Intrinsic::amdgcn_mfma_f32_16x16x4bf16_1k:
5029 case Intrinsic::amdgcn_mfma_f32_4x4x4bf16_1k:
5030 case Intrinsic::amdgcn_mfma_f32_32x32x8bf16_1k:
5031 case Intrinsic::amdgcn_mfma_f32_16x16x16bf16_1k:
5032 case Intrinsic::amdgcn_mfma_f64_16x16x4f64:
5033 case Intrinsic::amdgcn_mfma_f64_4x4x4f64:
5034 case Intrinsic::amdgcn_mfma_i32_16x16x32_i8:
5035 case Intrinsic::amdgcn_mfma_i32_32x32x16_i8:
5036 case Intrinsic::amdgcn_mfma_f32_16x16x8_xf32:
5037 case Intrinsic::amdgcn_mfma_f32_32x32x4_xf32:
5038 case Intrinsic::amdgcn_mfma_f32_16x16x32_bf8_bf8:
5039 case Intrinsic::amdgcn_mfma_f32_16x16x32_bf8_fp8:
5040 case Intrinsic::amdgcn_mfma_f32_16x16x32_fp8_bf8:
5041 case Intrinsic::amdgcn_mfma_f32_16x16x32_fp8_fp8:
5042 case Intrinsic::amdgcn_mfma_f32_32x32x16_bf8_bf8:
5043 case Intrinsic::amdgcn_mfma_f32_32x32x16_bf8_fp8:
5044 case Intrinsic::amdgcn_mfma_f32_32x32x16_fp8_bf8:
5045 case Intrinsic::amdgcn_mfma_f32_32x32x16_fp8_fp8:
5046 case Intrinsic::amdgcn_mfma_f32_16x16x32_f16:
5047 case Intrinsic::amdgcn_mfma_f32_32x32x16_f16:
5048 case Intrinsic::amdgcn_mfma_i32_16x16x64_i8:
5049 case Intrinsic::amdgcn_mfma_i32_32x32x32_i8:
5050 case Intrinsic::amdgcn_mfma_f32_16x16x32_bf16: {
5051 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5052 unsigned MinNumRegsRequired = DstSize / 32;
5053
5054 // Default for MAI intrinsics.
5055 // srcC can also be an immediate which can be folded later.
5056 // FIXME: Should we eventually add an alternative mapping with AGPR src
5057 // for srcA/srcB?
5058 //
5059 // vdst, srcA, srcB, srcC
5060 const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
5061
5062 bool UseAGPRForm = !Subtarget.hasGFX90AInsts() ||
5063 Info->selectAGPRFormMFMA(MinNumRegsRequired);
5064
5065 OpdsMapping[0] =
5066 UseAGPRForm ? getAGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI)
5067 : getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5068 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5069 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5070 OpdsMapping[4] =
5071 UseAGPRForm ? getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI)
5072 : getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5073 break;
5074 }
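// The scaled MFMA variants below make the same AGPR-vs-VGPR choice for vdst
// and srcC, and additionally map register operands 8 and 10 (presumably the
// per-source scale values) to VGPRs.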
5075 case Intrinsic::amdgcn_mfma_scale_f32_16x16x128_f8f6f4:
5076 case Intrinsic::amdgcn_mfma_scale_f32_32x32x64_f8f6f4: {
5077 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5078 unsigned MinNumRegsRequired = DstSize / 32;
5079
5080 const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
5081 bool UseAGPRForm = Info->selectAGPRFormMFMA(MinNumRegsRequired);
5082
5083 OpdsMapping[0] =
5084 UseAGPRForm ? getAGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI)
5085 : getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5086
5087 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5088 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5089 OpdsMapping[4] =
5090 UseAGPRForm ? getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI)
5091 : getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5092
5093 OpdsMapping[8] = getVGPROpMapping(MI.getOperand(8).getReg(), MRI, *TRI);
5094 OpdsMapping[10] = getVGPROpMapping(MI.getOperand(10).getReg(), MRI, *TRI);
5095 break;
5096 }
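// Sparse MFMA (smfmac) intrinsics: vdst and srcC follow the AGPR/VGPR form
// decision, srcA/srcB are VGPRs, and the index operand (operand 5) is also a
// VGPR.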
5097 case Intrinsic::amdgcn_smfmac_f32_16x16x32_f16:
5098 case Intrinsic::amdgcn_smfmac_f32_32x32x16_f16:
5099 case Intrinsic::amdgcn_smfmac_f32_16x16x32_bf16:
5100 case Intrinsic::amdgcn_smfmac_f32_32x32x16_bf16:
5101 case Intrinsic::amdgcn_smfmac_i32_16x16x64_i8:
5102 case Intrinsic::amdgcn_smfmac_i32_32x32x32_i8:
5103 case Intrinsic::amdgcn_smfmac_f32_16x16x64_bf8_bf8:
5104 case Intrinsic::amdgcn_smfmac_f32_16x16x64_bf8_fp8:
5105 case Intrinsic::amdgcn_smfmac_f32_16x16x64_fp8_bf8:
5106 case Intrinsic::amdgcn_smfmac_f32_16x16x64_fp8_fp8:
5107 case Intrinsic::amdgcn_smfmac_f32_32x32x32_bf8_bf8:
5108 case Intrinsic::amdgcn_smfmac_f32_32x32x32_bf8_fp8:
5109 case Intrinsic::amdgcn_smfmac_f32_32x32x32_fp8_bf8:
5110 case Intrinsic::amdgcn_smfmac_f32_32x32x32_fp8_fp8:
5111 case Intrinsic::amdgcn_smfmac_f32_16x16x64_f16:
5112 case Intrinsic::amdgcn_smfmac_f32_32x32x32_f16:
5113 case Intrinsic::amdgcn_smfmac_f32_16x16x64_bf16:
5114 case Intrinsic::amdgcn_smfmac_f32_32x32x32_bf16:
5115 case Intrinsic::amdgcn_smfmac_i32_16x16x128_i8:
5116 case Intrinsic::amdgcn_smfmac_i32_32x32x64_i8:
5117 case Intrinsic::amdgcn_smfmac_f32_16x16x128_bf8_bf8:
5118 case Intrinsic::amdgcn_smfmac_f32_16x16x128_bf8_fp8:
5119 case Intrinsic::amdgcn_smfmac_f32_16x16x128_fp8_bf8:
5120 case Intrinsic::amdgcn_smfmac_f32_16x16x128_fp8_fp8:
5121 case Intrinsic::amdgcn_smfmac_f32_32x32x64_bf8_bf8:
5122 case Intrinsic::amdgcn_smfmac_f32_32x32x64_bf8_fp8:
5123 case Intrinsic::amdgcn_smfmac_f32_32x32x64_fp8_bf8:
5124 case Intrinsic::amdgcn_smfmac_f32_32x32x64_fp8_fp8: {
5125 Register DstReg = MI.getOperand(0).getReg();
5126 unsigned DstSize = MRI.getType(DstReg).getSizeInBits();
5127 unsigned MinNumRegsRequired = DstSize / 32;
5128 const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
5129 bool UseAGPRForm = Info->selectAGPRFormMFMA(MinNumRegsRequired);
5130
5131 // vdst, srcA, srcB, srcC, idx
5132 OpdsMapping[0] = UseAGPRForm ? getAGPROpMapping(DstReg, MRI, *TRI)
5133 : getVGPROpMapping(DstReg, MRI, *TRI);
5134
5135 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5136 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5137 OpdsMapping[4] =
5138 UseAGPRForm ? getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI)
5139 : getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5140 OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5141 break;
5142 }
5143 case Intrinsic::amdgcn_interp_p1:
5144 case Intrinsic::amdgcn_interp_p2:
5145 case Intrinsic::amdgcn_interp_mov:
5146 case Intrinsic::amdgcn_interp_p1_f16:
5147 case Intrinsic::amdgcn_interp_p2_f16:
5148 case Intrinsic::amdgcn_lds_param_load: {
5149 const int M0Idx = MI.getNumOperands() - 1;
5150 Register M0Reg = MI.getOperand(M0Idx).getReg();
5151 unsigned M0Bank = getRegBankID(M0Reg, MRI, AMDGPU::SGPRRegBankID);
5152 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5153
5154 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5155 for (int I = 2; I != M0Idx && MI.getOperand(I).isReg(); ++I)
5156 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5157
5158 // Must be SGPR, but we must take whatever the original bank is and fix it
5159 // later.
5160 OpdsMapping[M0Idx] = AMDGPU::getValueMapping(M0Bank, 32);
5161 break;
5162 }
5163 case Intrinsic::amdgcn_interp_inreg_p10:
5164 case Intrinsic::amdgcn_interp_inreg_p2:
5165 case Intrinsic::amdgcn_interp_inreg_p10_f16:
5166 case Intrinsic::amdgcn_interp_inreg_p2_f16:
5167 case Intrinsic::amdgcn_interp_p10_rtz_f16:
5168 case Intrinsic::amdgcn_interp_p2_rtz_f16: {
5169 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5170 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5171 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5172 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5173 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5174 break;
5175 }
5176 case Intrinsic::amdgcn_permlane16_swap:
5177 case Intrinsic::amdgcn_permlane32_swap: {
5178 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5179 OpdsMapping[0] = OpdsMapping[1] = OpdsMapping[3] = OpdsMapping[4] =
5180 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5181 break;
5182 }
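// ballot reads a VCC-bank (per-lane boolean) condition and produces the
// resulting wave mask as an SGPR value.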
5183 case Intrinsic::amdgcn_ballot: {
5184 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5185 unsigned SrcSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
5186 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
5187 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, SrcSize);
5188 break;
5189 }
5190 case Intrinsic::amdgcn_inverse_ballot: {
5191 // This must be an SGPR, but accept a VGPR.
5192 Register MaskReg = MI.getOperand(2).getReg();
5193 unsigned MaskSize = MRI.getType(MaskReg).getSizeInBits();
5194 unsigned MaskBank = getRegBankID(MaskReg, MRI, AMDGPU::SGPRRegBankID);
5195 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5196 OpdsMapping[2] = AMDGPU::getValueMapping(MaskBank, MaskSize);
5197 break;
5198 }
5199 case Intrinsic::amdgcn_bitop3: {
5200 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
5201 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5202 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5203 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5204 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5205 break;
5206 }
5207 case Intrinsic::amdgcn_s_quadmask:
5208 case Intrinsic::amdgcn_s_wqm: {
5209 Register MaskReg = MI.getOperand(2).getReg();
5210 unsigned MaskSize = MRI.getType(MaskReg).getSizeInBits();
5211 unsigned MaskBank = getRegBankID(MaskReg, MRI, AMDGPU::SGPRRegBankID);
5212 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, MaskSize);
5213 OpdsMapping[2] = AMDGPU::getValueMapping(MaskBank, MaskSize);
5214 break;
5215 }
5216 case Intrinsic::amdgcn_wave_reduce_add:
5217 case Intrinsic::amdgcn_wave_reduce_fadd:
5218 case Intrinsic::amdgcn_wave_reduce_sub:
5219 case Intrinsic::amdgcn_wave_reduce_fsub:
5220 case Intrinsic::amdgcn_wave_reduce_min:
5221 case Intrinsic::amdgcn_wave_reduce_umin:
5222 case Intrinsic::amdgcn_wave_reduce_fmin:
5223 case Intrinsic::amdgcn_wave_reduce_max:
5224 case Intrinsic::amdgcn_wave_reduce_umax:
5225 case Intrinsic::amdgcn_wave_reduce_fmax:
5226 case Intrinsic::amdgcn_wave_reduce_and:
5227 case Intrinsic::amdgcn_wave_reduce_or:
5228 case Intrinsic::amdgcn_wave_reduce_xor: {
5229 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5230 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
5231 unsigned OpSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
5232 auto regBankID =
5233 isSALUMapping(MI) ? AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
5234 OpdsMapping[2] = AMDGPU::getValueMapping(regBankID, OpSize);
5235 break;
5236 }
5237 case Intrinsic::amdgcn_s_bitreplicate:
5238 Register MaskReg = MI.getOperand(2).getReg();
5239 unsigned MaskBank = getRegBankID(MaskReg, MRI, AMDGPU::SGPRRegBankID);
5240 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 64);
5241 OpdsMapping[2] = AMDGPU::getValueMapping(MaskBank, 32);
5242 }
5243 break;
5244 }
5245 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD:
5246 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_D16:
5247 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_NORET:
5248 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE:
5249 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE_D16: {
5250 auto IntrID = AMDGPU::getIntrinsicID(MI);
5251 const AMDGPU::RsrcIntrinsic *RSrcIntrin = AMDGPU::lookupRsrcIntrinsic(IntrID);
5252 assert(RSrcIntrin && "missing RsrcIntrinsic for image intrinsic");
5253 // Non-images can have complications from operands that allow both SGPR
5254 // and VGPR. For now it's too complicated to figure out the final opcode
5255 // to derive the register bank from the MCInstrDesc.
5256 assert(RSrcIntrin->IsImage);
5257 return getImageMapping(MRI, MI, RSrcIntrin->RsrcArg);
5258 }
5259 case AMDGPU::G_AMDGPU_BVH_INTERSECT_RAY:
5260 case AMDGPU::G_AMDGPU_BVH8_INTERSECT_RAY:
5261 case AMDGPU::G_AMDGPU_BVH_DUAL_INTERSECT_RAY: {
5262 bool IsDualOrBVH8 =
5263 MI.getOpcode() == AMDGPU::G_AMDGPU_BVH_DUAL_INTERSECT_RAY ||
5264 MI.getOpcode() == AMDGPU::G_AMDGPU_BVH8_INTERSECT_RAY;
5265 unsigned NumMods = IsDualOrBVH8 ? 0 : 1; // Has A16 modifier
5266 unsigned LastRegOpIdx = MI.getNumExplicitOperands() - 1 - NumMods;
5267 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5268 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5269 if (IsDualOrBVH8) {
5270 OpdsMapping[1] = AMDGPU::getValueMapping(
5271 AMDGPU::VGPRRegBankID,
5272 MRI.getType(MI.getOperand(1).getReg()).getSizeInBits());
5273 OpdsMapping[2] = AMDGPU::getValueMapping(
5274 AMDGPU::VGPRRegBankID,
5275 MRI.getType(MI.getOperand(2).getReg()).getSizeInBits());
5276 }
5277 OpdsMapping[LastRegOpIdx] =
5278 getSGPROpMapping(MI.getOperand(LastRegOpIdx).getReg(), MRI, *TRI);
5279 if (LastRegOpIdx == 3) {
5280 // Sequential form: all operands combined into VGPR256/VGPR512
5281 unsigned Size = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
5282 if (Size > 256)
5283 Size = 512;
5284 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5285 } else {
5286 // NSA form
5287 unsigned FirstSrcOpIdx = IsDualOrBVH8 ? 4 : 2;
5288 for (unsigned I = FirstSrcOpIdx; I < LastRegOpIdx; ++I) {
5289 unsigned Size = MRI.getType(MI.getOperand(I).getReg()).getSizeInBits();
5290 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5291 }
5292 }
5293 break;
5294 }
5295 case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS:
5296 case AMDGPU::G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS: {
5297 auto IntrID = cast<GIntrinsic>(MI).getIntrinsicID();
5298 switch (IntrID) {
5299 case Intrinsic::amdgcn_s_getreg:
5300 case Intrinsic::amdgcn_s_memtime:
5301 case Intrinsic::amdgcn_s_memrealtime:
5302 case Intrinsic::amdgcn_s_get_waveid_in_workgroup:
5303 case Intrinsic::amdgcn_s_sendmsg_rtn: {
5304 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5305 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
5306 break;
5307 }
5308 case Intrinsic::amdgcn_global_atomic_csub:
5309 case Intrinsic::amdgcn_global_atomic_fmin_num:
5310 case Intrinsic::amdgcn_global_atomic_fmax_num:
5311 case Intrinsic::amdgcn_flat_atomic_fmin_num:
5312 case Intrinsic::amdgcn_flat_atomic_fmax_num:
5313 case Intrinsic::amdgcn_atomic_cond_sub_u32:
5314 case Intrinsic::amdgcn_global_atomic_ordered_add_b64:
5315 case Intrinsic::amdgcn_global_load_tr_b64:
5316 case Intrinsic::amdgcn_global_load_tr_b128:
5317 case Intrinsic::amdgcn_global_load_tr4_b64:
5318 case Intrinsic::amdgcn_global_load_tr6_b96:
5319 case Intrinsic::amdgcn_ds_load_tr8_b64:
5320 case Intrinsic::amdgcn_ds_load_tr16_b128:
5321 case Intrinsic::amdgcn_ds_load_tr4_b64:
5322 case Intrinsic::amdgcn_ds_load_tr6_b96:
5323 case Intrinsic::amdgcn_flat_load_monitor_b32:
5324 case Intrinsic::amdgcn_flat_load_monitor_b64:
5325 case Intrinsic::amdgcn_flat_load_monitor_b128:
5326 case Intrinsic::amdgcn_global_load_monitor_b32:
5327 case Intrinsic::amdgcn_global_load_monitor_b64:
5328 case Intrinsic::amdgcn_global_load_monitor_b128:
5329 case Intrinsic::amdgcn_ds_read_tr4_b64:
5330 case Intrinsic::amdgcn_ds_read_tr6_b96:
5331 case Intrinsic::amdgcn_ds_read_tr8_b64:
5332 case Intrinsic::amdgcn_ds_read_tr16_b64:
5333 case Intrinsic::amdgcn_ds_atomic_async_barrier_arrive_b64:
5334 case Intrinsic::amdgcn_ds_atomic_barrier_arrive_rtn_b64:
5335 return getDefaultMappingAllVGPR(MI);
5336 case Intrinsic::amdgcn_ds_ordered_add:
5337 case Intrinsic::amdgcn_ds_ordered_swap: {
5338 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5339 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5340 unsigned M0Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5341 AMDGPU::SGPRRegBankID);
5342 OpdsMapping[2] = AMDGPU::getValueMapping(M0Bank, 32);
5343 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5344 break;
5345 }
5346 case Intrinsic::amdgcn_ds_append:
5347 case Intrinsic::amdgcn_ds_consume: {
5348 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5349 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5350 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5351 break;
5352 }
5353 case Intrinsic::amdgcn_exp_compr:
5354 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5355 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5356 break;
5357 case Intrinsic::amdgcn_exp:
5358 // FIXME: Could we support packed types here?
5359 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5360 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5361 OpdsMapping[5] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5362 OpdsMapping[6] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5363 break;
5364 case Intrinsic::amdgcn_exp_row:
5365 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5366 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5367 OpdsMapping[5] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5368 OpdsMapping[6] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5369 OpdsMapping[8] = getSGPROpMapping(MI.getOperand(8).getReg(), MRI, *TRI);
5370 break;
5371 case Intrinsic::amdgcn_s_sendmsg:
5372 case Intrinsic::amdgcn_s_sendmsghalt: {
5373 // This must be an SGPR, but accept a VGPR.
5374 unsigned Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5375 AMDGPU::SGPRRegBankID);
5376 OpdsMapping[2] = AMDGPU::getValueMapping(Bank, 32);
5377 break;
5378 }
5379 case Intrinsic::amdgcn_s_setreg: {
5380 // This must be an SGPR, but accept a VGPR.
5381 unsigned Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5382 AMDGPU::SGPRRegBankID);
5383 OpdsMapping[2] = AMDGPU::getValueMapping(Bank, 32);
5384 break;
5385 }
5386 case Intrinsic::amdgcn_s_ttracedata: {
5387 // This must be an SGPR, but accept a VGPR.
5388 unsigned Bank =
5389 getRegBankID(MI.getOperand(1).getReg(), MRI, AMDGPU::SGPRRegBankID);
5390 OpdsMapping[1] = AMDGPU::getValueMapping(Bank, 32);
5391 break;
5392 }
5393 case Intrinsic::amdgcn_end_cf: {
5394 unsigned Size = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
5395 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
5396 break;
5397 }
5398 case Intrinsic::amdgcn_else: {
5399 unsigned WaveSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
5400 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5401 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, WaveSize);
5402 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, WaveSize);
5403 break;
5404 }
5405 case Intrinsic::amdgcn_init_whole_wave:
5406 case Intrinsic::amdgcn_live_mask: {
5407 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5408 break;
5409 }
5410 case Intrinsic::amdgcn_wqm_demote:
5411 case Intrinsic::amdgcn_kill: {
5412 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5413 break;
5414 }
5415 case Intrinsic::amdgcn_raw_buffer_load:
5416 case Intrinsic::amdgcn_raw_ptr_buffer_load:
5417 case Intrinsic::amdgcn_raw_atomic_buffer_load:
5418 case Intrinsic::amdgcn_raw_ptr_atomic_buffer_load:
5419 case Intrinsic::amdgcn_raw_tbuffer_load:
5420 case Intrinsic::amdgcn_raw_ptr_tbuffer_load: {
5421 // FIXME: Should make intrinsic ID the last operand of the instruction,
5422 // then this would be the same as store
5423 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5424 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5425 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5426 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5427 break;
5428 }
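// For the buffer load/store intrinsics below, the resource descriptor and
// scalar offset operands use the SGPR mapping while the data and per-lane
// offset operands are VGPRs; a divergent descriptor or offset is presumably
// legalized later (e.g. with a waterfall loop) in applyMapping.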
5429 case Intrinsic::amdgcn_raw_buffer_load_lds:
5430 case Intrinsic::amdgcn_raw_ptr_buffer_load_lds: {
5431 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5432 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5433 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5434 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5435 break;
5436 }
5437 case Intrinsic::amdgcn_raw_buffer_store:
5438 case Intrinsic::amdgcn_raw_ptr_buffer_store:
5439 case Intrinsic::amdgcn_raw_buffer_store_format:
5440 case Intrinsic::amdgcn_raw_ptr_buffer_store_format:
5441 case Intrinsic::amdgcn_raw_tbuffer_store:
5442 case Intrinsic::amdgcn_raw_ptr_tbuffer_store: {
5443 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5444 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5445 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5446 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5447 break;
5448 }
5449 case Intrinsic::amdgcn_struct_buffer_load:
5450 case Intrinsic::amdgcn_struct_ptr_buffer_load:
5451 case Intrinsic::amdgcn_struct_tbuffer_load:
5452 case Intrinsic::amdgcn_struct_ptr_tbuffer_load:
5453 case Intrinsic::amdgcn_struct_atomic_buffer_load:
5454 case Intrinsic::amdgcn_struct_ptr_atomic_buffer_load: {
5455 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5456 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5457 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5458 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5459 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5460 break;
5461 }
5462 case Intrinsic::amdgcn_struct_buffer_load_lds:
5463 case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
5464 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5465 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5466 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5467 OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5468 OpdsMapping[6] = getSGPROpMapping(MI.getOperand(6).getReg(), MRI, *TRI);
5469 break;
5470 }
5471 case Intrinsic::amdgcn_struct_buffer_store:
5472 case Intrinsic::amdgcn_struct_ptr_buffer_store:
5473 case Intrinsic::amdgcn_struct_tbuffer_store:
5474 case Intrinsic::amdgcn_struct_ptr_tbuffer_store: {
5475 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5476 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5477 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5478 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5479 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5480 break;
5481 }
5482 case Intrinsic::amdgcn_init_exec_from_input: {
5483 unsigned Size = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
5484 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
5485 break;
5486 }
5487 case Intrinsic::amdgcn_ds_gws_init:
5488 case Intrinsic::amdgcn_ds_gws_barrier:
5489 case Intrinsic::amdgcn_ds_gws_sema_br: {
5490 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5491
5492 // This must be an SGPR, but accept a VGPR.
5493 unsigned Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5494 AMDGPU::SGPRRegBankID);
5495 OpdsMapping[2] = AMDGPU::getValueMapping(Bank, 32);
5496 break;
5497 }
5498 case Intrinsic::amdgcn_ds_gws_sema_v:
5499 case Intrinsic::amdgcn_ds_gws_sema_p:
5500 case Intrinsic::amdgcn_ds_gws_sema_release_all: {
5501 // This must be an SGPR, but accept a VGPR.
5502 unsigned Bank = getRegBankID(MI.getOperand(1).getReg(), MRI,
5503 AMDGPU::SGPRRegBankID);
5504 OpdsMapping[1] = AMDGPU::getValueMapping(Bank, 32);
5505 break;
5506 }
5507 case Intrinsic::amdgcn_cluster_load_b32:
5508 case Intrinsic::amdgcn_cluster_load_b64:
5509 case Intrinsic::amdgcn_cluster_load_b128: {
5510 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5511 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5512 unsigned M0Bank =
5513 getRegBankID(MI.getOperand(4).getReg(), MRI, AMDGPU::SGPRRegBankID);
5514 OpdsMapping[4] = AMDGPU::getValueMapping(M0Bank, 32);
5515 break;
5516 }
5517 case Intrinsic::amdgcn_cluster_load_async_to_lds_b8:
5518 case Intrinsic::amdgcn_cluster_load_async_to_lds_b32:
5519 case Intrinsic::amdgcn_cluster_load_async_to_lds_b64:
5520 case Intrinsic::amdgcn_cluster_load_async_to_lds_b128: {
5521 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5522 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5523 unsigned M0Bank =
5524 getRegBankID(MI.getOperand(5).getReg(), MRI, AMDGPU::SGPRRegBankID);
5525 OpdsMapping[5] = AMDGPU::getValueMapping(M0Bank, 32);
5526 break;
5527 }
5528 case Intrinsic::amdgcn_global_store_async_from_lds_b8:
5529 case Intrinsic::amdgcn_global_store_async_from_lds_b32:
5530 case Intrinsic::amdgcn_global_store_async_from_lds_b64:
5531 case Intrinsic::amdgcn_global_store_async_from_lds_b128:
5532 case Intrinsic::amdgcn_global_load_async_to_lds_b8:
5533 case Intrinsic::amdgcn_global_load_async_to_lds_b32:
5534 case Intrinsic::amdgcn_global_load_async_to_lds_b64:
5535 case Intrinsic::amdgcn_global_load_async_to_lds_b128:
5536 case Intrinsic::amdgcn_load_to_lds:
5537 case Intrinsic::amdgcn_global_load_lds: {
5538 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5539 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5540 break;
5541 }
5542 case Intrinsic::amdgcn_lds_direct_load: {
5543 const int M0Idx = MI.getNumOperands() - 1;
5544 Register M0Reg = MI.getOperand(M0Idx).getReg();
5545 unsigned M0Bank = getRegBankID(M0Reg, MRI, AMDGPU::SGPRRegBankID);
5546 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5547
5548 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5549 for (int I = 2; I != M0Idx && MI.getOperand(I).isReg(); ++I)
5550 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5551
5552 // Must be SGPR, but we must take whatever the original bank is and fix it
5553 // later.
5554 OpdsMapping[M0Idx] = AMDGPU::getValueMapping(M0Bank, 32);
5555 break;
5556 }
5557 case Intrinsic::amdgcn_ds_add_gs_reg_rtn:
5558 case Intrinsic::amdgcn_ds_sub_gs_reg_rtn:
5559 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5560 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5561 break;
5562 case Intrinsic::amdgcn_ds_bvh_stack_rtn:
5563 case Intrinsic::amdgcn_ds_bvh_stack_push4_pop1_rtn:
5564 case Intrinsic::amdgcn_ds_bvh_stack_push8_pop1_rtn:
5565 case Intrinsic::amdgcn_ds_bvh_stack_push8_pop2_rtn: {
5566 OpdsMapping[0] =
5567 getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI); // %vdst
5568 OpdsMapping[1] =
5569 getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI); // %addr
5570 OpdsMapping[3] =
5571 getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI); // %addr
5572 OpdsMapping[4] =
5573 getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI); // %data0
5574 OpdsMapping[5] =
5575 getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI); // %data1
5576 break;
5577 }
5578 case Intrinsic::amdgcn_s_sleep_var:
5579 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5580 break;
5581 case Intrinsic::amdgcn_s_barrier_join:
5582 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5583 break;
5584 case Intrinsic::amdgcn_s_barrier_init:
5585 case Intrinsic::amdgcn_s_barrier_signal_var:
5586 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5587 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5588 break;
5589 case Intrinsic::amdgcn_s_barrier_signal_isfirst: {
5590 const unsigned ResultSize = 1;
5591 OpdsMapping[0] =
5592 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, ResultSize);
5593 break;
5594 }
5595 case Intrinsic::amdgcn_s_get_barrier_state:
5596 case Intrinsic::amdgcn_s_get_named_barrier_state: {
5597 OpdsMapping[0] = getSGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5598 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5599 break;
5600 }
5601 case Intrinsic::amdgcn_pops_exiting_wave_id:
5602 return getDefaultMappingSOP(MI);
5603 case Intrinsic::amdgcn_tensor_load_to_lds_d2:
5604 case Intrinsic::amdgcn_tensor_store_from_lds_d2:
5605 case Intrinsic::amdgcn_tensor_load_to_lds:
5606 case Intrinsic::amdgcn_tensor_store_from_lds: {
5607 // Lie and claim everything is legal, even though all operands need to be
5608 // SGPRs. applyMapping will have to deal with them using readfirstlane.
5609 for (unsigned I = 1; I < MI.getNumOperands(); ++I) {
5610 if (MI.getOperand(I).isReg()) {
5611 Register Reg = MI.getOperand(I).getReg();
5612 auto OpBank = getRegBankID(Reg, MRI);
5613 unsigned Size = getSizeInBits(Reg, MRI, *TRI);
5614 OpdsMapping[I] = AMDGPU::getValueMapping(OpBank, Size);
5615 }
5616 }
5617 break;
5618 }
5619 case Intrinsic::amdgcn_s_prefetch_data: {
5620 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5621 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5622 break;
5623 }
5624 case Intrinsic::amdgcn_flat_prefetch:
5625 case Intrinsic::amdgcn_global_prefetch:
5626 return getDefaultMappingVOP(MI);
5627 default:
5628 return getInvalidInstructionMapping();
5629 }
5630 break;
5631 }
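// For G_SELECT, the condition keeps a scalar (SGPR) boolean mapping only when
// both value inputs are already SGPRs and the condition is not a VCC value;
// otherwise the condition maps to VCC and the value operands become VGPRs.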
5632 case AMDGPU::G_SELECT: {
5633 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5634 unsigned Op2Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5635 AMDGPU::SGPRRegBankID);
5636 unsigned Op3Bank = getRegBankID(MI.getOperand(3).getReg(), MRI,
5637 AMDGPU::SGPRRegBankID);
5638 bool SGPRSrcs = Op2Bank == AMDGPU::SGPRRegBankID &&
5639 Op3Bank == AMDGPU::SGPRRegBankID;
5640
5641 unsigned CondBankDefault = SGPRSrcs ?
5642 AMDGPU::SGPRRegBankID : AMDGPU::VCCRegBankID;
5643 unsigned CondBank = getRegBankID(MI.getOperand(1).getReg(), MRI,
5644 CondBankDefault);
5645 if (CondBank == AMDGPU::SGPRRegBankID)
5646 CondBank = SGPRSrcs ? AMDGPU::SGPRRegBankID : AMDGPU::VCCRegBankID;
5647 else if (CondBank == AMDGPU::VGPRRegBankID)
5648 CondBank = AMDGPU::VCCRegBankID;
5649
5650 unsigned Bank = SGPRSrcs && CondBank == AMDGPU::SGPRRegBankID ?
5651 AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
5652
5653 assert(CondBank == AMDGPU::VCCRegBankID || CondBank == AMDGPU::SGPRRegBankID);
5654
5655 // TODO: Should report 32-bit for scalar condition type.
5656 if (Size == 64) {
5657 OpdsMapping[0] = AMDGPU::getValueMappingSGPR64Only(Bank, Size);
5658 OpdsMapping[1] = AMDGPU::getValueMapping(CondBank, 1);
5659 OpdsMapping[2] = AMDGPU::getValueMappingSGPR64Only(Bank, Size);
5660 OpdsMapping[3] = AMDGPU::getValueMappingSGPR64Only(Bank, Size);
5661 } else {
5662 OpdsMapping[0] = AMDGPU::getValueMapping(Bank, Size);
5663 OpdsMapping[1] = AMDGPU::getValueMapping(CondBank, 1);
5664 OpdsMapping[2] = AMDGPU::getValueMapping(Bank, Size);
5665 OpdsMapping[3] = AMDGPU::getValueMapping(Bank, Size);
5666 }
5667
5668 break;
5669 }
5670
5671 case AMDGPU::G_SI_CALL: {
5672 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 64);
5673 // Lie and claim everything is legal, even though some need to be
5674 // SGPRs. applyMapping will have to deal with it as a waterfall loop.
5675 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5676
5677 // Allow anything for implicit arguments
5678 for (unsigned I = 4; I < MI.getNumOperands(); ++I) {
5679 if (MI.getOperand(I).isReg()) {
5680 Register Reg = MI.getOperand(I).getReg();
5681 auto OpBank = getRegBankID(Reg, MRI);
5682 unsigned Size = getSizeInBits(Reg, MRI, *TRI);
5683 OpdsMapping[I] = AMDGPU::getValueMapping(OpBank, Size);
5684 }
5685 }
5686 break;
5687 }
5688 case AMDGPU::G_LOAD:
5689 case AMDGPU::G_ZEXTLOAD:
5690 case AMDGPU::G_SEXTLOAD:
5691 return getInstrMappingForLoad(MI);
5692
5693 case AMDGPU::G_ATOMICRMW_XCHG:
5694 case AMDGPU::G_ATOMICRMW_ADD:
5695 case AMDGPU::G_ATOMICRMW_SUB:
5696 case AMDGPU::G_ATOMICRMW_AND:
5697 case AMDGPU::G_ATOMICRMW_OR:
5698 case AMDGPU::G_ATOMICRMW_XOR:
5699 case AMDGPU::G_ATOMICRMW_MAX:
5700 case AMDGPU::G_ATOMICRMW_MIN:
5701 case AMDGPU::G_ATOMICRMW_UMAX:
5702 case AMDGPU::G_ATOMICRMW_UMIN:
5703 case AMDGPU::G_ATOMICRMW_FADD:
5704 case AMDGPU::G_ATOMICRMW_FMIN:
5705 case AMDGPU::G_ATOMICRMW_FMAX:
5706 case AMDGPU::G_ATOMICRMW_UINC_WRAP:
5707 case AMDGPU::G_ATOMICRMW_UDEC_WRAP:
5708 case AMDGPU::G_AMDGPU_ATOMIC_CMPXCHG: {
5709 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5710 OpdsMapping[1] = getValueMappingForPtr(MRI, MI.getOperand(1).getReg());
5711 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5712 break;
5713 }
5714 case AMDGPU::G_ATOMIC_CMPXCHG: {
5715 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5716 OpdsMapping[1] = getValueMappingForPtr(MRI, MI.getOperand(1).getReg());
5717 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5718 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5719 break;
5720 }
5721 case AMDGPU::G_BRCOND: {
5722 unsigned Bank = getRegBankID(MI.getOperand(0).getReg(), MRI,
5723 AMDGPU::SGPRRegBankID);
5724 assert(MRI.getType(MI.getOperand(0).getReg()).getSizeInBits() == 1);
5725 if (Bank != AMDGPU::SGPRRegBankID)
5726 Bank = AMDGPU::VCCRegBankID;
5727
5728 OpdsMapping[0] = AMDGPU::getValueMapping(Bank, 1);
5729 break;
5730 }
5731 case AMDGPU::G_INTRINSIC_FPTRUNC_ROUND:
5732 return getDefaultMappingVOP(MI);
5733 case AMDGPU::G_PREFETCH:
5734 OpdsMapping[0] = getSGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5735 break;
5736 case AMDGPU::G_AMDGPU_WHOLE_WAVE_FUNC_SETUP:
5737 case AMDGPU::G_AMDGPU_WHOLE_WAVE_FUNC_RETURN:
5738 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5739 break;
5740 }
5741
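// Every operand that needed a bank now has a value mapping; package them into
// a single InstructionMapping with unit cost.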
5742 return getInstructionMapping(/*ID*/1, /*Cost*/1,
5743 getOperandsMapping(OpdsMapping),
5744 MI.getNumOperands());
5745}
unsigned const MachineRegisterInfo * MRI
static unsigned getIntrinsicID(const SDNode *N)
assert(UImm &&(UImm !=~static_cast< T >(0)) &&"Invalid immediate!")
Contains the definition of a TargetInstrInfo class that is common to all AMD GPUs.
constexpr LLT S16
constexpr LLT S1
constexpr LLT S32
constexpr LLT S64
AMDGPU Register Bank Select
static bool substituteSimpleCopyRegs(const AMDGPURegisterBankInfo::OperandsMapper &OpdMapper, unsigned OpIdx)
static unsigned regBankBoolUnion(unsigned RB0, unsigned RB1)
static std::pair< Register, unsigned > getBaseWithConstantOffset(MachineRegisterInfo &MRI, Register Reg)
static Register constrainRegToBank(MachineRegisterInfo &MRI, MachineIRBuilder &B, Register &Reg, const RegisterBank &Bank)
static std::pair< Register, Register > unpackV2S16ToS32(MachineIRBuilder &B, Register Src, unsigned ExtOpcode)
static void extendLow32IntoHigh32(MachineIRBuilder &B, Register Hi32Reg, Register Lo32Reg, unsigned ExtOpc, const RegisterBank &RegBank, bool IsBooleanSrc=false)
Implement extending a 32-bit value to a 64-bit value.
static unsigned getExtendOp(unsigned Opc)
static bool isVectorRegisterBank(const RegisterBank &Bank)
static unsigned regBankUnion(unsigned RB0, unsigned RB1)
static std::pair< LLT, LLT > splitUnequalType(LLT Ty, unsigned FirstSize)
Split Ty into 2 pieces.
static void setRegsToType(MachineRegisterInfo &MRI, ArrayRef< Register > Regs, LLT NewTy)
Replace the current type each register in Regs has with NewTy.
static void reinsertVectorIndexAdd(MachineIRBuilder &B, MachineInstr &IdxUseInstr, unsigned OpIdx, unsigned ConstOffset)
Utility function for pushing dynamic vector indexes with a constant offset into waterfall loops.
static LLT widen96To128(LLT Ty)
static LLT getHalfSizedType(LLT Ty)
static unsigned getSBufferLoadCorrespondingBufferLoadOpcode(unsigned Opc)
This file declares the targeting of the RegisterBankInfo class for AMDGPU.
Rewrite undef for PHI
MachineBasicBlock & MBB
MachineBasicBlock MachineBasicBlock::iterator DebugLoc DL
MachineBasicBlock MachineBasicBlock::iterator MBBI
static GCRegistry::Add< OcamlGC > B("ocaml", "ocaml 3.10-compatible GC")
AMD GCN specific subclass of TargetSubtarget.
Declares convenience wrapper classes for interpreting MachineInstr instances as specific generic oper...
IRTranslator LLVM IR MI
const size_t AbstractManglingParser< Derived, Alloc >::NumOps
const AbstractManglingParser< Derived, Alloc >::OperatorInfo AbstractManglingParser< Derived, Alloc >::Ops[]
#define I(x, y, z)
Definition MD5.cpp:57
Contains matchers for matching SSA Machine Instructions.
This file declares the MachineIRBuilder class.
Register Reg
Promote Memory to Register
Definition Mem2Reg.cpp:110
static bool isReg(const MCInst &MI, unsigned OpNo)
MachineInstr unsigned OpIdx
ConstantRange Range(APInt(BitWidth, Low), APInt(BitWidth, High))
static constexpr MCPhysReg SPReg
Interface definition for SIRegisterInfo.
static TableGen::Emitter::Opt Y("gen-skeleton-entry", EmitSkeleton, "Generate example skeleton entry")
static TableGen::Emitter::OptClass< SkeletonEmitter > X("gen-skeleton-class", "Generate example skeleton class")
bool applyMappingDynStackAlloc(MachineIRBuilder &B, const OperandsMapper &OpdMapper, MachineInstr &MI) const
std::pair< Register, unsigned > splitBufferOffsets(MachineIRBuilder &B, Register Offset) const
bool collectWaterfallOperands(SmallSet< Register, 4 > &SGPROperandRegs, MachineInstr &MI, MachineRegisterInfo &MRI, ArrayRef< unsigned > OpIndices) const
const InstructionMapping & getImageMapping(const MachineRegisterInfo &MRI, const MachineInstr &MI, int RsrcIdx) const
InstructionMappings addMappingFromTable(const MachineInstr &MI, const MachineRegisterInfo &MRI, const std::array< unsigned, NumOps > RegSrcOpIdx, ArrayRef< OpRegBankEntry< NumOps > > Table) const
unsigned copyCost(const RegisterBank &A, const RegisterBank &B, TypeSize Size) const override
Get the cost of a copy from B to A, or put differently, get the cost of A = COPY B.
RegisterBankInfo::InstructionMappings getInstrAlternativeMappingsIntrinsicWSideEffects(const MachineInstr &MI, const MachineRegisterInfo &MRI) const
bool buildVCopy(MachineIRBuilder &B, Register DstReg, Register SrcReg) const
bool executeInWaterfallLoop(MachineIRBuilder &B, iterator_range< MachineBasicBlock::iterator > Range, SmallSet< Register, 4 > &SGPROperandRegs) const
Legalize instruction MI where operands in OpIndices must be SGPRs.
const RegisterBank & getRegBankFromRegClass(const TargetRegisterClass &RC, LLT) const override
Get a register bank that covers RC.
AMDGPURegisterBankInfo(const GCNSubtarget &STI)
bool applyMappingMAD_64_32(MachineIRBuilder &B, const OperandsMapper &OpdMapper) const
unsigned getRegBankID(Register Reg, const MachineRegisterInfo &MRI, unsigned Default=AMDGPU::VGPRRegBankID) const
Register handleD16VData(MachineIRBuilder &B, MachineRegisterInfo &MRI, Register Reg) const
Handle register layout difference for f16 images for some subtargets.
const RegisterBankInfo::InstructionMapping & getInstrMappingForLoad(const MachineInstr &MI) const
void applyMappingImpl(MachineIRBuilder &Builder, const OperandsMapper &OpdMapper) const override
See RegisterBankInfo::applyMapping.
bool applyMappingBFE(MachineIRBuilder &B, const OperandsMapper &OpdMapper, bool Signed) const
bool applyMappingImage(MachineIRBuilder &B, MachineInstr &MI, const OperandsMapper &OpdMapper, int RSrcIdx) const
const ValueMapping * getVGPROpMapping(Register Reg, const MachineRegisterInfo &MRI, const TargetRegisterInfo &TRI) const
bool isScalarLoadLegal(const MachineInstr &MI) const
unsigned setBufferOffsets(MachineIRBuilder &B, Register CombinedOffset, Register &VOffsetReg, Register &SOffsetReg, int64_t &InstOffsetVal, Align Alignment) const
const ValueMapping * getSGPROpMapping(Register Reg, const MachineRegisterInfo &MRI, const TargetRegisterInfo &TRI) const
bool applyMappingLoad(MachineIRBuilder &B, const OperandsMapper &OpdMapper, MachineInstr &MI) const
void split64BitValueForMapping(MachineIRBuilder &B, SmallVector< Register, 2 > &Regs, LLT HalfTy, Register Reg) const
Split 64-bit value Reg into two 32-bit halves and populate them into Regs.
const ValueMapping * getValueMappingForPtr(const MachineRegisterInfo &MRI, Register Ptr) const
Return the mapping for a pointer argument.
unsigned getMappingType(const MachineRegisterInfo &MRI, const MachineInstr &MI) const
RegisterBankInfo::InstructionMappings getInstrAlternativeMappingsIntrinsic(const MachineInstr &MI, const MachineRegisterInfo &MRI) const
bool isDivergentRegBank(const RegisterBank *RB) const override
Returns true if the register bank is considered divergent.
void constrainOpWithReadfirstlane(MachineIRBuilder &B, MachineInstr &MI, unsigned OpIdx) const
InstructionMappings getInstrAlternativeMappings(const MachineInstr &MI) const override
Get the alternative mappings for MI.
const InstructionMapping & getDefaultMappingSOP(const MachineInstr &MI) const
const InstructionMapping & getDefaultMappingAllVGPR(const MachineInstr &MI) const
const InstructionMapping & getInstrMapping(const MachineInstr &MI) const override
This function must return a legal mapping, because AMDGPURegisterBankInfo::getInstrAlternativeMapping...
unsigned getBreakDownCost(const ValueMapping &ValMapping, const RegisterBank *CurBank=nullptr) const override
Get the cost of using ValMapping to decompose a register.
const ValueMapping * getAGPROpMapping(Register Reg, const MachineRegisterInfo &MRI, const TargetRegisterInfo &TRI) const
const InstructionMapping & getDefaultMappingVOP(const MachineInstr &MI) const
bool isSALUMapping(const MachineInstr &MI) const
Register buildReadFirstLane(MachineIRBuilder &B, MachineRegisterInfo &MRI, Register Src) const
bool applyMappingSBufferLoad(MachineIRBuilder &B, const OperandsMapper &OpdMapper) const
void applyMappingSMULU64(MachineIRBuilder &B, const OperandsMapper &OpdMapper) const
static const LaneMaskConstants & get(const GCNSubtarget &ST)
ArrayRef - Represent a constant reference to an array (0 or more elements consecutively in memory),...
Definition ArrayRef.h:40
Predicate
This enumeration lists the possible predicates for CmpInst subclasses.
Definition InstrTypes.h:676
@ ICMP_SLT
signed less than
Definition InstrTypes.h:705
@ ICMP_NE
not equal
Definition InstrTypes.h:698
A debug info location.
Definition DebugLoc.h:124
iterator find(const_arg_type_t< KeyT > Val)
Definition DenseMap.h:178
iterator end()
Definition DenseMap.h:81
std::pair< iterator, bool > insert(const std::pair< KeyT, ValueT > &KV)
Definition DenseMap.h:241
static constexpr ElementCount getFixed(ScalarTy MinVal)
Definition TypeSize.h:309
Abstract class that contains various methods for clients to notify about changes.
constexpr unsigned getScalarSizeInBits() const
constexpr bool isScalar() const
static constexpr LLT scalar(unsigned SizeInBits)
Get a low-level scalar or aggregate "bag of bits".
constexpr uint16_t getNumElements() const
Returns the number of elements in a vector LLT.
constexpr bool isVector() const
constexpr TypeSize getSizeInBits() const
Returns the total size of the type. Must only be called on sized types.
constexpr LLT getElementType() const
Returns the vector's element type. Only valid for vector types.
constexpr unsigned getAddressSpace() const
static constexpr LLT fixed_vector(unsigned NumElements, unsigned ScalarSizeInBits)
Get a low-level fixed-width vector of some number of elements and element width.
constexpr LLT getScalarType() const
static constexpr LLT scalarOrVector(ElementCount EC, LLT ScalarTy)
constexpr LLT divide(int Factor) const
Return a type that is Factor times smaller.
This is an important class for using LLVM in a threaded context.
Definition LLVMContext.h:68
LLVM_ABI void widenScalarSrc(MachineInstr &MI, LLT WideTy, unsigned OpIdx, unsigned ExtOpcode)
Legalize a single operand OpIdx of the machine instruction MI as a Use by extending the operand's typ...
LLVM_ABI LegalizeResult lowerAbsToMaxNeg(MachineInstr &MI)
LLVM_ABI LegalizeResult narrowScalar(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy)
Legalize an instruction by reducing the width of the underlying scalar type.
LLVM_ABI LegalizeResult reduceLoadStoreWidth(GLoadStore &MI, unsigned TypeIdx, LLT NarrowTy)
@ Legalized
Instruction has been legalized and the MachineFunction changed.
LLVM_ABI LegalizeResult fewerElementsVector(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy)
Legalize a vector instruction by splitting into multiple components, each acting on the same scalar t...
LLVM_ABI LegalizeResult widenScalar(MachineInstr &MI, unsigned TypeIdx, LLT WideTy)
Legalize an instruction by performing the operation on a wider scalar type (for example a 16-bit addi...
LLVM_ABI void widenScalarDst(MachineInstr &MI, LLT WideTy, unsigned OpIdx=0, unsigned TruncOpcode=TargetOpcode::G_TRUNC)
Legalize a single operand OpIdx of the machine instruction MI as a Def by extending the operand's typ...
TypeSize getValue() const
LLVM_ABI void transferSuccessorsAndUpdatePHIs(MachineBasicBlock *FromMBB)
Transfers all the successors, as in transferSuccessors, and update PHI operands in the successor bloc...
LLVM_ABI iterator getFirstTerminator()
Returns an iterator to the first terminator instruction of this basic block.
LLVM_ABI void addSuccessor(MachineBasicBlock *Succ, BranchProbability Prob=BranchProbability::getUnknown())
Add Succ as a successor of this MachineBasicBlock.
const MachineFunction * getParent() const
Return the MachineFunction containing this basic block.
void splice(iterator Where, MachineBasicBlock *Other, iterator From)
Take an instruction from MBB 'Other' at the position From, and insert it into this MBB right before '...
MachineInstrBundleIterator< MachineInstr > iterator
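These MachineBasicBlock/MachineFunction primitives are the building blocks of the CFG surgery a waterfall loop needs. A hedged sketch of splitting a block around an insertion point (block names and the exact splice range are assumptions, not this file's implementation):

  #include <iterator>
  #include "llvm/CodeGen/MachineBasicBlock.h"
  #include "llvm/CodeGen/MachineFunction.h"
  using namespace llvm;

  // Split MBB at SplitPt: everything after SplitPt moves to a new RemainderBB,
  // with a new LoopBB inserted in between that loops back on itself.
  static void splitForLoop(MachineFunction &MF, MachineBasicBlock &MBB,
                           MachineBasicBlock::iterator SplitPt) {
    MachineBasicBlock *LoopBB = MF.CreateMachineBasicBlock();
    MachineBasicBlock *RemainderBB = MF.CreateMachineBasicBlock();
    MF.insert(std::next(MBB.getIterator()), LoopBB);
    MF.insert(std::next(LoopBB->getIterator()), RemainderBB);

    // Move the tail of MBB into RemainderBB, then rewire CFG edges and PHIs.
    RemainderBB->splice(RemainderBB->begin(), &MBB, SplitPt, MBB.end());
    RemainderBB->transferSuccessorsAndUpdatePHIs(&MBB);
    MBB.addSuccessor(LoopBB);
    LoopBB->addSuccessor(LoopBB);       // the waterfall iterates until done
    LoopBB->addSuccessor(RemainderBB);
  }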
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
BasicBlockListType::iterator iterator
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineBasicBlock - Allocate a new MachineBasicBlock.
void insert(iterator MBBI, MachineBasicBlock *MBB)
Helper class to build MachineInstr.
const MachineInstrBuilder & addReg(Register RegNo, unsigned flags=0, unsigned SubReg=0) const
Add a new virtual register operand.
MachineInstrSpan provides an interface to get an iteration range containing the instruction it was in...
MachineBasicBlock::iterator begin()
MachineBasicBlock::iterator end()
Representation of each machine instruction.
const MachineBasicBlock * getParent() const
const MachineOperand & getOperand(unsigned i) const
A description of a memory reference used in the backend.
LocationSize getSize() const
Return the size in bytes of the memory reference.
unsigned getAddrSpace() const
bool isAtomic() const
Returns true if this operation has an atomic ordering requirement of unordered or higher,...
@ MODereferenceable
The memory access is dereferenceable (i.e., doesn't trap).
@ MOLoad
The memory access reads data.
@ MOInvariant
The memory access always returns the same value (or traps).
Flags getFlags() const
Return the raw flags of the source value.
LLVM_ABI Align getAlign() const
Return the minimum known alignment in bytes of the actual memory reference.
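A short sketch of allocating an MMO with the flag enumerators listed above; the size, alignment, and flag combination are illustrative assumptions:

  #include "llvm/CodeGen/MachineFunction.h"
  #include "llvm/CodeGen/MachineMemOperand.h"
  #include "llvm/CodeGenTypes/LowLevelType.h"
  #include "llvm/Support/Alignment.h"
  using namespace llvm;

  static MachineMemOperand *makeInvariantLoadMMO(MachineFunction &MF,
                                                 MachinePointerInfo PtrInfo) {
    // A 4-byte, 4-aligned load that never traps and never changes value.
    return MF.getMachineMemOperand(PtrInfo,
                                   MachineMemOperand::MOLoad |
                                       MachineMemOperand::MODereferenceable |
                                       MachineMemOperand::MOInvariant,
                                   LLT::scalar(32), Align(4));
  }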
MachineOperand class - Representation of each machine instruction operand.
LLVM_ABI void setReg(Register Reg)
Change the register this operand corresponds to.
Register getReg() const
getReg - Returns the register number.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
Helper class that represents how the value of an instruction may be mapped and what is the related co...
bool isValid() const
Check whether this object is valid.
Helper class used to get/create the virtual registers that will be used to replace the MachineOperand...
const InstructionMapping & getInstrMapping() const
The final mapping of the instruction.
MachineRegisterInfo & getMRI() const
The MachineRegisterInfo we used to realize the mapping.
iterator_range< SmallVectorImpl< Register >::const_iterator > getVRegs(unsigned OpIdx, bool ForDebug=false) const
Get all the virtual registers required to map the OpIdx-th operand of the instruction.
virtual InstructionMappings getInstrAlternativeMappings(const MachineInstr &MI) const
Get the alternative mappings for MI.
static const TargetRegisterClass * constrainGenericRegister(Register Reg, const TargetRegisterClass &RC, MachineRegisterInfo &MRI)
Constrain the (possibly generic) virtual register Reg to RC.
const InstructionMapping & getInstructionMapping(unsigned ID, unsigned Cost, const ValueMapping *OperandsMapping, unsigned NumOperands) const
Method to get a uniquely generated InstructionMapping.
static void applyDefaultMapping(const OperandsMapper &OpdMapper)
Helper method to apply something that is like the default mapping.
const ValueMapping & getValueMapping(unsigned StartIdx, unsigned Length, const RegisterBank &RegBank) const
The most common ValueMapping consists of a single PartialMapping.
const InstructionMapping & getInvalidInstructionMapping() const
Method to get a uniquely generated invalid InstructionMapping.
const RegisterBank & getRegBank(unsigned ID)
Get the register bank identified by ID.
const unsigned * Sizes
Hold the sizes of the register banks for all HwModes.
bool cannotCopy(const RegisterBank &Dst, const RegisterBank &Src, TypeSize Size) const
TypeSize getSizeInBits(Register Reg, const MachineRegisterInfo &MRI, const TargetRegisterInfo &TRI) const
Get the size in bits of Reg.
const ValueMapping * getOperandsMapping(Iterator Begin, Iterator End) const
Get the uniquely generated array of ValueMapping for the elements between Begin and End.
SmallVector< const InstructionMapping *, 4 > InstructionMappings
Convenient type to represent the alternatives for mapping an instruction.
virtual unsigned copyCost(const RegisterBank &A, const RegisterBank &B, TypeSize Size) const
Get the cost of a copy from B to A, or put differently, get the cost of A = COPY B.
const InstructionMapping & getInstrMappingImpl(const MachineInstr &MI) const
Try to get the mapping of MI.
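The mapping-construction helpers above are protected members of RegisterBankInfo, so they are only callable from a subclass. A hedged fragment showing how they typically compose inside a getInstrMapping-style hook, assuming a hypothetical subclass MyRegBankInfo that declares this method; the single-bank policy is illustrative only:

  #include "llvm/CodeGen/MachineRegisterInfo.h"
  #include "llvm/CodeGen/RegisterBankInfo.h"
  using namespace llvm;

  const RegisterBankInfo::InstructionMapping &
  MyRegBankInfo::getUniformBankMapping(const MachineInstr &MI,
                                       const RegisterBank &Bank) const {
    const MachineRegisterInfo &MRI = MI.getMF()->getRegInfo();
    SmallVector<const ValueMapping *, 8> OpdsMapping(MI.getNumOperands());
    for (unsigned I = 0, E = MI.getNumOperands(); I != E; ++I) {
      const MachineOperand &MO = MI.getOperand(I);
      if (!MO.isReg() || !MO.getReg())
        continue; // non-register operands keep a null ValueMapping entry
      // Assumes every register operand carries a sized LLT.
      unsigned Size = MRI.getType(MO.getReg()).getSizeInBits().getFixedValue();
      OpdsMapping[I] = &getValueMapping(0, Size, Bank); // one chunk on Bank
    }
    return getInstructionMapping(/*ID=*/1, /*Cost=*/1,
                                 getOperandsMapping(OpdsMapping),
                                 MI.getNumOperands());
  }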
This class implements the register bank concept.
unsigned getID() const
Get the identifier of this register bank.
Wrapper class representing virtual and physical registers.
Definition Register.h:20
constexpr bool isVirtual() const
Return true if the specified register number is in the virtual register namespace.
Definition Register.h:79
static unsigned getMaxMUBUFImmOffset(const GCNSubtarget &ST)
This class keeps track of the SPI_SP_INPUT_ADDR config register, which tells the hardware which inter...
bool selectAGPRFormMFMA(unsigned NumRegs) const
Return true if an MFMA that requires at least NumRegs should select to the AGPR form,...
static bool shouldExpandVectorDynExt(unsigned EltSize, unsigned NumElem, bool IsDivergentIdx, const GCNSubtarget *Subtarget)
Check if EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT (<n x e>, var-idx) should be expanded into a set of cmp...
SmallSet - This maintains a set of unique values, optimizing for the case when the set is small (less...
Definition SmallSet.h:133
size_type count(const T &V) const
count - Return 1 if the element is in the set, 0 otherwise.
Definition SmallSet.h:175
bool empty() const
Definition SmallSet.h:168
std::pair< const_iterator, bool > insert(const T &V)
insert - Insert an element into the set if it isn't already there.
Definition SmallSet.h:183
void resize(size_type N)
void push_back(const T &Elt)
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
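A quick sketch of the SmallSet and SmallVector operations above; the element type and values are arbitrary examples:

  #include "llvm/ADT/SmallSet.h"
  #include "llvm/ADT/SmallVector.h"
  using namespace llvm;

  static void dedupSketch(SmallVectorImpl<unsigned> &Out) {
    SmallSet<unsigned, 4> Seen;
    Out.resize(0);
    const unsigned Vals[] = {3, 7, 3};
    for (unsigned V : Vals) {
      // insert() reports whether the value was newly added.
      if (Seen.insert(V).second)
        Out.push_back(V);
    }
    // Out now holds {3, 7}; Seen.count(3) == 1 and Seen.empty() == false.
  }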
Register getReg() const
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
static constexpr TypeSize getFixed(ScalarTy ExactSize)
Definition TypeSize.h:343
static LLVM_ABI IntegerType * getInt32Ty(LLVMContext &C)
Definition Type.cpp:296
self_iterator getIterator()
Definition ilist_node.h:123
A range adaptor for a pair of iterators.
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
@ CONSTANT_ADDRESS_32BIT
Address space for 32-bit constant memory.
@ REGION_ADDRESS
Address space for region memory. (GDS)
@ LOCAL_ADDRESS
Address space for local memory.
@ CONSTANT_ADDRESS
Address space for constant memory (VTX2).
@ PRIVATE_ADDRESS
Address space for private memory.
@ BUFFER_RESOURCE
Address space for 128-bit buffer resources.
bool isFlatGlobalAddrSpace(unsigned AS)
bool isUniformMMO(const MachineMemOperand *MMO)
bool isExtendedGlobalAddrSpace(unsigned AS)
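A deliberately oversimplified sketch of how the address-space and MMO helpers above might be combined; this is not the file's actual scalar-load predicate, and the helper name and exact condition are assumptions:

  #include "llvm/CodeGen/MachineMemOperand.h"
  // In-tree AMDGPU headers are assumed to declare llvm::AMDGPU::isFlatGlobalAddrSpace
  // and llvm::AMDGPU::isUniformMMO.

  static bool mayUseScalarLoad(unsigned AS, const llvm::MachineMemOperand *MMO) {
    // Only flat/global-like address spaces are candidates, and the MMO must
    // additionally be known uniform across the wave.
    return llvm::AMDGPU::isFlatGlobalAddrSpace(AS) && llvm::AMDGPU::isUniformMMO(MMO);
  }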
Intrinsic::ID getIntrinsicID(const MachineInstr &I)
Return the intrinsic ID for opcodes with the G_AMDGPU_INTRIN_ prefix.
std::pair< Register, unsigned > getBaseWithConstantOffset(MachineRegisterInfo &MRI, Register Reg, GISelValueTracking *ValueTracking=nullptr, bool CheckNUW=false)
Returns base register and constant offset.
const RsrcIntrinsic * lookupRsrcIntrinsic(unsigned Intr)
operand_type_match m_Reg()
SpecificConstantMatch m_ZeroInt()
Convenience matchers for specific integer values.
ConstantMatch< APInt > m_ICst(APInt &Cst)
BinaryOp_match< LHS, RHS, TargetOpcode::G_ADD, true > m_GAdd(const LHS &L, const RHS &R)
bool mi_match(Reg R, const MachineRegisterInfo &MRI, Pattern &&P)
SpecificConstantOrSplatMatch m_SpecificICstOrSplat(const APInt &RequestedValue)
Matches a RequestedValue constant or a constant splat of RequestedValue.
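The matchers above are used through mi_match. A small sketch that recognizes an add-of-constant; the helper name is a placeholder, not one of this file's combines:

  #include "llvm/ADT/APInt.h"
  #include "llvm/CodeGen/GlobalISel/MIPatternMatch.h"
  using namespace llvm;
  using namespace llvm::MIPatternMatch;

  // Returns true and fills Base/Cst if Reg is defined by G_ADD(Base, constant),
  // in either operand order (m_GAdd matches commutatively).
  static bool matchAddOfConstant(Register Reg, const MachineRegisterInfo &MRI,
                                 Register &Base, APInt &Cst) {
    return mi_match(Reg, MRI, m_GAdd(m_Reg(Base), m_ICst(Cst)));
  }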
@ Kill
The last use of a register.
This is an optimization pass for GlobalISel generic memory operations.
@ Offset
Definition DWP.cpp:532
LLVM_ABI MachineInstr * getOpcodeDef(unsigned Opcode, Register Reg, const MachineRegisterInfo &MRI)
See if Reg is defined by a single def instruction that is Opcode.
Definition Utils.cpp:651
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:643
LLVM_ABI bool constrainSelectedInstRegOperands(MachineInstr &I, const TargetInstrInfo &TII, const TargetRegisterInfo &TRI, const RegisterBankInfo &RBI)
Mutate the newly-selected instruction I to constrain its (possibly generic) virtual register operands...
Definition Utils.cpp:155
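BuildMI and constrainSelectedInstRegOperands commonly pair up when a target instruction is emitted directly while applying a mapping. A hedged sketch, assuming in-tree AMDGPU headers for the opcode; the helper name and error handling are illustrative:

  #include "llvm/CodeGen/GlobalISel/Utils.h"
  #include "llvm/CodeGen/MachineInstrBuilder.h"
  #include "llvm/CodeGen/TargetInstrInfo.h"
  using namespace llvm;

  // Emit Dst = V_READFIRSTLANE_B32 Src and constrain its register operands to
  // the classes the instruction description demands.
  static MachineInstr *emitReadFirstLane(MachineBasicBlock &MBB,
                                         MachineBasicBlock::iterator I,
                                         const DebugLoc &DL, Register Dst,
                                         Register Src, const TargetInstrInfo &TII,
                                         const TargetRegisterInfo &TRI,
                                         const RegisterBankInfo &RBI) {
    MachineInstr *MI =
        BuildMI(MBB, I, DL, TII.get(AMDGPU::V_READFIRSTLANE_B32), Dst).addReg(Src);
    if (!constrainSelectedInstRegOperands(*MI, TII, TRI, RBI))
      return nullptr; // constraint failure; a real caller would report an error
    return MI;
  }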
iterator_range< T > make_range(T x, T y)
Convenience function for iterating over sub-ranges.
LLVM_ABI std::optional< int64_t > getIConstantVRegSExtVal(Register VReg, const MachineRegisterInfo &MRI)
If VReg is defined by a G_CONSTANT that fits in int64_t, returns it.
Definition Utils.cpp:314
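A small sketch combining getOpcodeDef and getIConstantVRegSExtVal from the entries above; the helper name and the G_ZEXT pattern are illustrative assumptions:

  #include <optional>
  #include "llvm/CodeGen/GlobalISel/Utils.h"
  #include "llvm/CodeGen/TargetOpcodes.h"
  using namespace llvm;

  // If Reg is produced by G_ZEXT of a constant, return that constant's value.
  static std::optional<int64_t>
  getZExtSrcConstant(Register Reg, const MachineRegisterInfo &MRI) {
    if (MachineInstr *Ext = getOpcodeDef(TargetOpcode::G_ZEXT, Reg, MRI))
      return getIConstantVRegSExtVal(Ext->getOperand(1).getReg(), MRI);
    return std::nullopt;
  }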
static const MachineMemOperand::Flags MONoClobber
Mark the MMO of a uniform load if there are no potentially clobbering stores on any path from the sta...
Definition SIInstrInfo.h:44
auto reverse(ContainerTy &&C)
Definition STLExtras.h:406
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
bool isa(const From &Val)
isa<X> - Return true if the parameter to the template is an instance of one of the template type argu...
Definition Casting.h:547
@ Add
Sum of integers.
DWARFExpression::Operation Op
void call_once(once_flag &flag, Function &&F, Args &&... ArgList)
Execute the function specified as a parameter once.
Definition Threading.h:86
decltype(auto) cast(const From &Val)
cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:559
LLVM_ABI std::optional< ValueAndVReg > getIConstantVRegValWithLookThrough(Register VReg, const MachineRegisterInfo &MRI, bool LookThroughInstrs=true)
If VReg is defined by a statically evaluable chain of instructions rooted on a G_CONSTANT returns its...
Definition Utils.cpp:433
Align assumeAligned(uint64_t Value)
Treats the value 0 as a 1, so Align is always at least 1.
Definition Alignment.h:100
unsigned Log2(Align A)
Returns the log2 of the alignment.
Definition Alignment.h:197
LLVM_ABI Register getSrcRegIgnoringCopies(Register Reg, const MachineRegisterInfo &MRI)
Find the source register for Reg, folding away any trivial copies.
Definition Utils.cpp:499
constexpr T maskTrailingOnes(unsigned N)
Create a bitmask with the N right-most bits set to 1, and all other bits set to 0.
Definition MathExtras.h:77
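The bit and alignment helpers above behave as in this tiny sketch; the specific values are just examples:

  #include "llvm/Support/Alignment.h"
  #include "llvm/Support/MathExtras.h"
  using namespace llvm;

  static void bitMathSketch() {
    Align A = assumeAligned(0);                      // 0 is promoted to Align(1)
    unsigned ShiftAmt = Log2(Align(16));             // 4
    uint32_t Low12 = maskTrailingOnes<uint32_t>(12); // 0xFFF
    (void)A; (void)ShiftAmt; (void)Low12;
  }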
@ Default
The result values are uniform if and only if all operands are uniform.
Definition Uniformity.h:20
#define N
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77
This class contains a discriminated union of information about pointers in memory operands,...
unsigned StartIdx
Number of bits at which this partial mapping starts in the original value.
const RegisterBank * RegBank
Register bank where the partial value lives.
unsigned Length
Length of this mapping in bits.
Helper struct that represents how a value is mapped through different register banks.
unsigned NumBreakDowns
Number of partial mappings used to break down this value.
const PartialMapping * BreakDown
How the value is broken down between the different register banks.
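The PartialMapping/ValueMapping fields above describe how a value is split across register banks. A conceptual sketch of a 64-bit value broken into two 32-bit pieces on one bank; real code obtains these through the getValueMapping/getOperandsMapping helpers (which manage uniqued storage) rather than hand-building them:

  #include "llvm/CodeGen/RegisterBankInfo.h"
  using namespace llvm;

  static void describeSplit64(const RegisterBank &Bank) {
    // Two 32-bit pieces covering bits [0,32) and [32,64) of a 64-bit value.
    const RegisterBankInfo::PartialMapping Parts[2] = {
        {/*StartIdx=*/0, /*Length=*/32, Bank},
        {/*StartIdx=*/32, /*Length=*/32, Bank}};
    RegisterBankInfo::ValueMapping VM{/*BreakDown=*/Parts, /*NumBreakDowns=*/2};
    (void)VM; // VM only stays valid as long as Parts does
  }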
The llvm::once_flag structure.
Definition Threading.h:67