1//===-- AMDGPULowerModuleLDSPass.cpp ------------------------------*- C++ -*-=//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This pass eliminates local data store, LDS, uses from non-kernel functions.
10// LDS is contiguous memory allocated per kernel execution.
11//
12// Background.
13//
14// The programming model is global variables, or equivalently function local
15// static variables, accessible from kernels or other functions. For uses from
16// kernels this is straightforward - assign an integer to the kernel for the
17// memory required by all the variables combined, allocate them within that.
18// For uses from functions there are performance tradeoffs to choose between.
19//
20// This model means the GPU runtime can specify the amount of memory allocated.
21// If this is more than the kernel assumed, the excess can be made available
22// using a language specific feature, which IR represents as a variable with
23// no initializer. This feature is referred to here as "Dynamic LDS" and is
24// lowered slightly differently to the normal case.
25//
26// Consequences of this GPU feature:
27// - memory is limited and exceeding it halts compilation
28// - a global accessed by one kernel exists independent of other kernels
29// - a global exists independent of simultaneous execution of the same kernel
30// - the address of the global may be different from different kernels as they
31// do not alias, which permits only allocating variables they use
32// - if the address is allowed to differ, functions need help to find it
33//
34// Uses from kernels are implemented here by grouping them in a per-kernel
35// struct instance. This duplicates the variables, accurately modelling their
36// aliasing properties relative to a single global representation. It also
37// permits control over alignment via padding.
38//
39// Uses from functions are more complicated and the primary purpose of this
40// IR pass. Several different lowerings are chosen between to meet requirements
41// to avoid allocating any LDS where it is not necessary, as that impacts
42// occupancy and may fail the compilation, while not imposing overhead on a
43// feature whose primary advantage over global memory is performance. The basic
44// design goal is to avoid one kernel imposing overhead on another.
45//
46// Implementation.
47//
48// LDS variables with constant annotation or non-undef initializer are passed
49// through unchanged for simplification or error diagnostics in later passes.
50// Non-undef initializers are not yet implemented for LDS.
51//
52// LDS variables that are always allocated at the same address can be found
53// by lookup at that address. Otherwise runtime information/cost is required.
54//
55// The simplest strategy possible is to group all LDS variables in a single
56// struct and allocate that struct in every kernel such that the original
57// variables are always at the same address. LDS is however a limited resource
58// so this strategy is unusable in practice. It is not implemented here.
59//
60// Strategy | Precise allocation | Zero runtime cost | General purpose |
61// --------+--------------------+-------------------+-----------------+
62// Module | No | Yes | Yes |
63// Table | Yes | No | Yes |
64// Kernel | Yes | Yes | No |
65// Hybrid | Yes | Partial | Yes |
66//
67// "Module" spends LDS memory to save cycles. "Table" spends cycles and global
68// memory to save LDS. "Kernel" is as fast as kernel allocation but only works
69// for variables that are known reachable from a single kernel. "Hybrid" picks
70// between all three. When forced to choose between LDS and cycles we minimise
71// LDS use.
72
73// The "module" lowering implemented here finds LDS variables which are used by
74// non-kernel functions and creates a new struct with a field for each of those
75// LDS variables. Variables that are only used from kernels are excluded.
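//
// As an illustrative sketch (the variable names and field order below are
// invented), two LDS globals used from a non-kernel function,
//   @a = internal addrspace(3) global i32 poison
//   @b = internal addrspace(3) global i64 poison
// are gathered into a single struct instance,
//   %llvm.amdgcn.module.lds.t = type { i64, i32 }
//   @llvm.amdgcn.module.lds = internal addrspace(3) global %llvm.amdgcn.module.lds.t poison
// and each use from a function is rewritten to a constant GEP of the matching
// field of that instance.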
76//
77// The "table" lowering implemented here has three components.
78// First kernels are assigned a unique integer identifier which is available in
79// functions it calls through the intrinsic amdgcn_lds_kernel_id. The integer
80// is passed through a specific SGPR, thus works with indirect calls.
81// Second, each kernel allocates LDS variables independent of other kernels and
82// writes the addresses it chose for each variable into an array in consistent
83// order. If the kernel does not allocate a given variable, it writes undef to
84// the corresponding array location. These arrays are written to a constant
85// table in the order matching the kernel unique integer identifier.
86// Third, uses from non-kernel functions are replaced with a table lookup using
87// the intrinsic function to find the address of the variable.
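//
// For illustration only (the table shape and names are invented for a module
// with two kernels and one variable), a use from a non-kernel function ends up
// looking roughly like:
//   @llvm.amdgcn.lds.offset.table = internal addrspace(4) constant [2 x [1 x i32]] [...]
//   %id = call i32 @llvm.amdgcn.lds.kernel.id()
//   %gep = getelementptr inbounds [2 x [1 x i32]], ptr addrspace(4) @llvm.amdgcn.lds.offset.table, i32 0, i32 %id, i32 0
//   %off = load i32, ptr addrspace(4) %gep
//   %ptr = inttoptr i32 %off to ptr addrspace(3)
// where each row of the table holds the addresses the corresponding kernel
// assigned to its variables.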
88//
89// "Kernel" lowering is only applicable for variables that are unambiguously
90// reachable from exactly one kernel. For those cases, accesses to the variable
91// can be lowered to ConstantExpr address of a struct instance specific to that
92// one kernel. This is zero cost in space and in compute. It will raise a fatal
93// error on any variable that might be reachable from multiple kernels and is
94// thus most easily used as part of the hybrid lowering strategy.
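//
// Sketch (the kernel name "k" and the field index are hypothetical): a variable
// reachable only from kernel k is folded into that kernel's struct and the use
// becomes a plain constant expression,
//   getelementptr inbounds (%llvm.amdgcn.kernel.k.lds.t,
//                           ptr addrspace(3) @llvm.amdgcn.kernel.k.lds, i32 0, i32 1)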
95//
96// Hybrid lowering is a mixture of the above. It uses the zero cost kernel
97// lowering where it can. It lowers the variable accessed by the greatest
98// number of kernels using the module strategy as that is free for the first
99// variable. Any further variables that can be lowered with the module strategy
100// without incurring LDS memory overhead are. The remaining ones are lowered
101// via table.
102//
103// Consequences
104// - No heuristics or user controlled magic numbers, hybrid is the right choice
105// - Kernels that don't use functions (or have had them all inlined) are not
106// affected by any lowering for kernels that do.
107// - Kernels that don't make indirect function calls are not affected by those
108// that do.
109// - Variables which are used by lots of kernels, e.g. those injected by a
110// language runtime in most kernels, are expected to have no overhead
111// - Implementations that instantiate templates per-kernel where those templates
112// use LDS are expected to hit the "Kernel" lowering strategy
113// - The runtime properties impose a cost in compiler implementation complexity
114//
115// Dynamic LDS implementation
116// Dynamic LDS is lowered similarly to the "table" strategy above and uses the
117// same intrinsic to identify which kernel is at the root of the dynamic call
118// graph. This relies on the specified behaviour that all dynamic LDS variables
119// alias one another, i.e. are at the same address, with respect to a given
120// kernel. Therefore this pass creates new dynamic LDS variables for each kernel
121// that allocates any dynamic LDS and builds a table of addresses out of those.
122// The AMDGPUPromoteAlloca pass skips kernels that use dynamic LDS.
123// The corresponding optimisation for "kernel" lowering where the table lookup
124// is elided is not implemented.
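//
// Sketch of the result (the kernel name "k" is hypothetical): each such kernel
// gets a representative variable whose address is recorded in a table,
//   @llvm.amdgcn.k.dynlds = external addrspace(3) global [0 x i8]
//   @llvm.amdgcn.dynlds.offset.table = internal addrspace(4) constant [1 x i32] [...]
// and non-kernel uses of any dynamic LDS variable read their address from that
// table using the same kernel id intrinsic as the "table" strategy.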
125//
126//
127// Implementation notes / limitations
128// A single LDS global variable represents an instance per kernel that can reach
129// said variable. This pass essentially specialises those variables per kernel.
130// Handling ConstantExpr during the pass complicated this significantly so now
131// all ConstantExpr uses of LDS variables are expanded to instructions. This
132// may need amending when implementing non-undef initialisers.
133//
134// Lowering is split between this IR pass and the back end. This pass chooses
135// where given variables should be allocated and marks them with metadata,
136// MD_absolute_symbol. The backend places the variables in coincidentally the
137// same location and raises a fatal error if something has gone awry. This works
138// in practice because the only pass between this one and the backend that
139// changes LDS is PromoteAlloca and the changes it makes do not conflict.
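//
// For illustration, the metadata encodes a half open [Address, Address+1) range
// on the variable; the module struct, always placed at zero, would carry:
//   @llvm.amdgcn.module.lds = ... !absolute_symbol !0
//   !0 = !{i32 0, i32 1}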
140//
141// Addresses are written to constant global arrays based on the same metadata.
142//
143// The backend lowers LDS variables in the order of traversal of the function.
144// This is at odds with the deterministic layout required. The workaround is to
145// allocate the fixed-address variables immediately upon starting the function
146// where they can be placed as intended. This requires a means of mapping from
147// the function to the variables that it allocates. For the module scope lds,
148// this is via metadata indicating whether the variable is not required. If a
149// pass deletes that metadata, a fatal error on disagreement with the absolute
150// symbol metadata will occur. For kernel scope and dynamic, this is by _name_
151// correspondence between the function and the variable. It requires the
152// kernel to have a name (which is only a limitation for tests in practice) and
153// for nothing to rename the corresponding symbols. This is a hazard if the pass
154// is run multiple times during debugging. Alternative schemes considered all
155// involve bespoke metadata.
156//
157// If the name correspondence can be replaced, multiple distinct kernels that
158// have the same memory layout can map to the same kernel id (as the address
159// itself is handled by the absolute symbol metadata) and that will allow more
160// uses of the "kernel" style faster lowering and reduce the size of the lookup
161// tables.
162//
163// There is a test that checks this does not fire for a graphics shader. This
164// lowering is expected to work for graphics if the isKernel test is changed.
165//
166// The current markUsedByKernel is sufficient for PromoteAlloca but is elided
167// before codegen. Replacing this with an equivalent intrinsic which lasts until
168// shortly after the machine function lowering of LDS would help break the name
169// mapping. The other part needed is probably to amend PromoteAlloca to embed
170// the LDS variables it creates in the same struct created here. That avoids the
171// current hazard where a PromoteAlloca LDS variable might be allocated before
172// the kernel scope (and thus error on the address check). Given a new invariant
173// that no LDS variables exist outside of the structs managed here, and an
174// intrinsic that lasts until after the LDS frame lowering, it should be
175// possible to drop the name mapping and fold equivalent memory layouts.
176//
177//===----------------------------------------------------------------------===//
178
179#include "AMDGPU.h"
180#include "AMDGPUTargetMachine.h"
181#include "Utils/AMDGPUBaseInfo.h"
183#include "llvm/ADT/BitVector.h"
184#include "llvm/ADT/DenseMap.h"
185#include "llvm/ADT/DenseSet.h"
186#include "llvm/ADT/STLExtras.h"
190#include "llvm/IR/Constants.h"
191#include "llvm/IR/DerivedTypes.h"
192#include "llvm/IR/IRBuilder.h"
193#include "llvm/IR/InlineAsm.h"
194#include "llvm/IR/Instructions.h"
195#include "llvm/IR/IntrinsicsAMDGPU.h"
196#include "llvm/IR/MDBuilder.h"
199#include "llvm/Pass.h"
201#include "llvm/Support/Debug.h"
202#include "llvm/Support/Format.h"
207
208#include <tuple>
209#include <vector>
210
211#include <cstdio>
212
213#define DEBUG_TYPE "amdgpu-lower-module-lds"
214
215using namespace llvm;
216
217namespace {
218
219cl::opt<bool> SuperAlignLDSGlobals(
220 "amdgpu-super-align-lds-globals",
221 cl::desc("Increase alignment of LDS if it is not on align boundary"),
222 cl::init(true), cl::Hidden);
223
224enum class LoweringKind { module, table, kernel, hybrid };
225cl::opt<LoweringKind> LoweringKindLoc(
226 "amdgpu-lower-module-lds-strategy",
227 cl::desc("Specify lowering strategy for function LDS access:"), cl::Hidden,
228 cl::init(LoweringKind::hybrid),
229 cl::values(
230 clEnumValN(LoweringKind::table, "table", "Lower via table lookup"),
231 clEnumValN(LoweringKind::module, "module", "Lower via module struct"),
232 clEnumValN(
233 LoweringKind::kernel, "kernel",
234 "Lower variables reachable from one kernel, otherwise abort"),
235 clEnumValN(LoweringKind::hybrid, "hybrid",
236 "Lower via mixture of above strategies")));
237
238bool isKernelLDS(const Function *F) {
239 // Some weirdness here. AMDGPU::isKernelCC does not call into
240 // AMDGPU::isKernel with the calling conv, it instead calls into
241 // isModuleEntryFunction which returns true for more calling conventions
242 // than AMDGPU::isKernel does. There's a FIXME on AMDGPU::isKernel.
243 // There's also a test that checks that the LDS lowering does not hit on
244 // a graphics shader, denoted amdgpu_ps, so stay with the limited case.
245 // Putting LDS in the name of the function to draw attention to this.
246 return AMDGPU::isKernel(F->getCallingConv());
247}
248
249template <typename T> std::vector<T> sortByName(std::vector<T> &&V) {
250 llvm::sort(V.begin(), V.end(), [](const auto *L, const auto *R) {
251 return L->getName() < R->getName();
252 });
253 return {std::move(V)};
254}
255
256class AMDGPULowerModuleLDS {
257 const AMDGPUTargetMachine &TM;
258
259 static void
260 removeLocalVarsFromUsedLists(Module &M,
261 const DenseSet<GlobalVariable *> &LocalVars) {
262 // The verifier rejects used lists containing an inttoptr of a constant
263 // so remove the variables from these lists before replaceAllUsesWith
264 SmallPtrSet<Constant *, 8> LocalVarsSet;
265 for (GlobalVariable *LocalVar : LocalVars)
266 LocalVarsSet.insert(cast<Constant>(LocalVar->stripPointerCasts()));
267
268 removeFromUsedLists(
269 M, [&LocalVarsSet](Constant *C) { return LocalVarsSet.count(C); });
270
271 for (GlobalVariable *LocalVar : LocalVars)
272 LocalVar->removeDeadConstantUsers();
273 }
274
275 static void markUsedByKernel(Function *Func, GlobalVariable *SGV) {
276 // The llvm.amdgcn.module.lds instance is implicitly used by all kernels
277 // that might call a function which accesses a field within it. This is
278 // presently approximated to 'all kernels' if there are any such functions
279 // in the module. This implicit use is redefined as an explicit use here so
280 // that later passes, specifically PromoteAlloca, account for the required
281 // memory without any knowledge of this transform.
282
283 // An operand bundle on llvm.donothing works because the call instruction
284 // survives until after the last pass that needs to account for LDS. It is
285 // better than inline asm as the latter survives until the end of codegen. A
286 // totally robust solution would be a function with the same semantics as
287 // llvm.donothing that takes a pointer to the instance and is lowered to a
288 // no-op after LDS is allocated, but that is not presently necessary.
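    //
    // For illustration, the marker emitted below looks roughly like this in IR
    // (the struct name depends on the variable passed in):
    //   call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]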
289
290 // This intrinsic is eliminated shortly before instruction selection. It
291 // does not suffice to indicate to ISel that a given global which is not
292 // immediately used by the kernel must still be allocated by it. An
293 // equivalent target specific intrinsic which lasts until immediately after
294 // codegen would suffice for that, but one would still need to ensure that
295 // the variables are allocated in the anticipated order.
296 BasicBlock *Entry = &Func->getEntryBlock();
297 IRBuilder<> Builder(Entry, Entry->getFirstNonPHIIt());
298
299 Function *Decl =
300 Intrinsic::getDeclaration(Func->getParent(), Intrinsic::donothing, {});
301
302 Value *UseInstance[1] = {
303 Builder.CreateConstInBoundsGEP1_32(SGV->getValueType(), SGV, 0)};
304
305 Builder.CreateCall(
306 Decl, {}, {OperandBundleDefT<Value *>("ExplicitUse", UseInstance)});
307 }
308
309 static bool eliminateConstantExprUsesOfLDSFromAllInstructions(Module &M) {
310 // Constants are uniqued within LLVM. A ConstantExpr referring to a LDS
311 // global may have uses from multiple different functions as a result.
312 // This pass specialises LDS variables with respect to the kernel that
313 // allocates them.
314
315 // This is semantically equivalent to (the unimplemented as slow):
316 // for (auto &F : M.functions())
317 // for (auto &BB : F)
318 // for (auto &I : BB)
319 // for (Use &Op : I.operands())
320 // if (constantExprUsesLDS(Op))
321 // replaceConstantExprInFunction(I, Op);
322
323 SmallVector<Constant *> LDSGlobals;
324 for (auto &GV : M.globals())
325 if (AMDGPU::isLDSVariableToLower(GV))
326 LDSGlobals.push_back(&GV);
327
328 return convertUsersOfConstantsToInstructions(LDSGlobals);
329 }
330
331public:
332 AMDGPULowerModuleLDS(const AMDGPUTargetMachine &TM_) : TM(TM_) {}
333
334 using FunctionVariableMap = DenseMap<Function *, DenseSet<GlobalVariable *>>;
335
336 using VariableFunctionMap = DenseMap<GlobalVariable *, DenseSet<Function *>>;
337
338 static void getUsesOfLDSByFunction(CallGraph const &CG, Module &M,
339 FunctionVariableMap &kernels,
340 FunctionVariableMap &functions) {
341
342 // Get uses from the current function, excluding uses by called functions
343 // Two output variables to avoid walking the globals list twice
344 for (auto &GV : M.globals()) {
345 if (!AMDGPU::isLDSVariableToLower(GV)) {
346 continue;
347 }
348
349 if (GV.isAbsoluteSymbolRef()) {
351 "LDS variables with absolute addresses are unimplemented.");
352 }
353
354 for (User *V : GV.users()) {
355 if (auto *I = dyn_cast<Instruction>(V)) {
356 Function *F = I->getFunction();
357 if (isKernelLDS(F)) {
358 kernels[F].insert(&GV);
359 } else {
360 functions[F].insert(&GV);
361 }
362 }
363 }
364 }
365 }
366
367 struct LDSUsesInfoTy {
368 FunctionVariableMap direct_access;
369 FunctionVariableMap indirect_access;
370 };
371
372 static LDSUsesInfoTy getTransitiveUsesOfLDS(CallGraph const &CG, Module &M) {
373
374 FunctionVariableMap direct_map_kernel;
375 FunctionVariableMap direct_map_function;
376 getUsesOfLDSByFunction(CG, M, direct_map_kernel, direct_map_function);
377
378 // Collect variables that are used by functions whose address has escaped
379 DenseSet<GlobalVariable *> VariablesReachableThroughFunctionPointer;
380 for (Function &F : M.functions()) {
381 if (!isKernelLDS(&F))
382 if (F.hasAddressTaken(nullptr,
383 /* IgnoreCallbackUses */ false,
384 /* IgnoreAssumeLikeCalls */ false,
385 /* IgnoreLLVMUsed */ true,
386 /* IgnoreArcAttachedCall */ false)) {
387 set_union(VariablesReachableThroughFunctionPointer,
388 direct_map_function[&F]);
389 }
390 }
391
392 auto functionMakesUnknownCall = [&](const Function *F) -> bool {
393 assert(!F->isDeclaration());
394 for (const CallGraphNode::CallRecord &R : *CG[F]) {
395 if (!R.second->getFunction()) {
396 return true;
397 }
398 }
399 return false;
400 };
401
402 // Work out which variables are reachable through function calls
403 FunctionVariableMap transitive_map_function = direct_map_function;
404
405 // If the function makes any unknown call, assume the worst case that it can
406 // access all variables accessed by functions whose address escaped
407 for (Function &F : M.functions()) {
408 if (!F.isDeclaration() && functionMakesUnknownCall(&F)) {
409 if (!isKernelLDS(&F)) {
410 set_union(transitive_map_function[&F],
411 VariablesReachableThroughFunctionPointer);
412 }
413 }
414 }
415
416 // Direct implementation of collecting all variables reachable from each
417 // function
418 for (Function &Func : M.functions()) {
419 if (Func.isDeclaration() || isKernelLDS(&Func))
420 continue;
421
422 DenseSet<Function *> seen; // catches cycles
423 SmallVector<Function *, 4> wip = {&Func};
424
425 while (!wip.empty()) {
426 Function *F = wip.pop_back_val();
427
428 // Can accelerate this by referring to transitive map for functions that
429 // have already been computed, with more care than this
430 set_union(transitive_map_function[&Func], direct_map_function[F]);
431
432 for (const CallGraphNode::CallRecord &R : *CG[F]) {
433 Function *ith = R.second->getFunction();
434 if (ith) {
435 if (!seen.contains(ith)) {
436 seen.insert(ith);
437 wip.push_back(ith);
438 }
439 }
440 }
441 }
442 }
443
444 // direct_map_kernel lists which variables are used by the kernel
445 // find the variables which are used through a function call
446 FunctionVariableMap indirect_map_kernel;
447
448 for (Function &Func : M.functions()) {
449 if (Func.isDeclaration() || !isKernelLDS(&Func))
450 continue;
451
452 for (const CallGraphNode::CallRecord &R : *CG[&Func]) {
453 Function *ith = R.second->getFunction();
454 if (ith) {
455 set_union(indirect_map_kernel[&Func], transitive_map_function[ith]);
456 } else {
457 set_union(indirect_map_kernel[&Func],
458 VariablesReachableThroughFunctionPointer);
459 }
460 }
461 }
462
463 return {std::move(direct_map_kernel), std::move(indirect_map_kernel)};
464 }
465
466 struct LDSVariableReplacement {
467 GlobalVariable *SGV = nullptr;
468 DenseMap<GlobalVariable *, Constant *> LDSVarsToConstantGEP;
469 };
470
471 // remap from lds global to a constantexpr gep to where it has been moved to
472 // for each kernel
473 // an array with an element for each kernel containing where the corresponding
474 // variable was remapped to
475
476 static Constant *getAddressesOfVariablesInKernel(
477 LLVMContext &Ctx, ArrayRef<GlobalVariable *> Variables,
478 const DenseMap<GlobalVariable *, Constant *> &LDSVarsToConstantGEP) {
479 // Create a ConstantArray containing the address of each Variable within the
480 // kernel corresponding to LDSVarsToConstantGEP, or poison if that kernel
481 // does not allocate it
482 // TODO: Drop the ptrtoint conversion
483
484 Type *I32 = Type::getInt32Ty(Ctx);
485
486 ArrayType *KernelOffsetsType = ArrayType::get(I32, Variables.size());
487
488 SmallVector<Constant *> Elements;
489 for (size_t i = 0; i < Variables.size(); i++) {
490 GlobalVariable *GV = Variables[i];
491 auto ConstantGepIt = LDSVarsToConstantGEP.find(GV);
492 if (ConstantGepIt != LDSVarsToConstantGEP.end()) {
493 auto elt = ConstantExpr::getPtrToInt(ConstantGepIt->second, I32);
494 Elements.push_back(elt);
495 } else {
496 Elements.push_back(PoisonValue::get(I32));
497 }
498 }
499 return ConstantArray::get(KernelOffsetsType, Elements);
500 }
501
502 static GlobalVariable *buildLookupTable(
503 Module &M, ArrayRef<GlobalVariable *> Variables,
504 ArrayRef<Function *> kernels,
505 DenseMap<Function *, LDSVariableReplacement> &KernelToReplacement) {
506 if (Variables.empty()) {
507 return nullptr;
508 }
509 LLVMContext &Ctx = M.getContext();
510
511 const size_t NumberVariables = Variables.size();
512 const size_t NumberKernels = kernels.size();
513
514 ArrayType *KernelOffsetsType =
515 ArrayType::get(Type::getInt32Ty(Ctx), NumberVariables);
516
517 ArrayType *AllKernelsOffsetsType =
518 ArrayType::get(KernelOffsetsType, NumberKernels);
519
520 Constant *Missing = PoisonValue::get(KernelOffsetsType);
521 std::vector<Constant *> overallConstantExprElts(NumberKernels);
522 for (size_t i = 0; i < NumberKernels; i++) {
523 auto Replacement = KernelToReplacement.find(kernels[i]);
524 overallConstantExprElts[i] =
525 (Replacement == KernelToReplacement.end())
526 ? Missing
527 : getAddressesOfVariablesInKernel(
528 Ctx, Variables, Replacement->second.LDSVarsToConstantGEP);
529 }
530
531 Constant *init =
532 ConstantArray::get(AllKernelsOffsetsType, overallConstantExprElts);
533
534 return new GlobalVariable(
535 M, AllKernelsOffsetsType, true, GlobalValue::InternalLinkage, init,
536 "llvm.amdgcn.lds.offset.table", nullptr, GlobalValue::NotThreadLocal,
537 AMDGPUAS::CONSTANT_ADDRESS);
538 }
539
540 void replaceUseWithTableLookup(Module &M, IRBuilder<> &Builder,
541 GlobalVariable *LookupTable,
542 GlobalVariable *GV, Use &U,
543 Value *OptionalIndex) {
544 // Table is a constant array of the same length as OrderedKernels
545 LLVMContext &Ctx = M.getContext();
546 Type *I32 = Type::getInt32Ty(Ctx);
547 auto *I = cast<Instruction>(U.getUser());
548
549 Value *tableKernelIndex = getTableLookupKernelIndex(M, I->getFunction());
550
551 if (auto *Phi = dyn_cast<PHINode>(I)) {
552 BasicBlock *BB = Phi->getIncomingBlock(U);
553 Builder.SetInsertPoint(&(*(BB->getFirstInsertionPt())));
554 } else {
555 Builder.SetInsertPoint(I);
556 }
557
558 SmallVector<Value *, 3> GEPIdx = {
559 ConstantInt::get(I32, 0),
560 tableKernelIndex,
561 };
562 if (OptionalIndex)
563 GEPIdx.push_back(OptionalIndex);
564
565 Value *Address = Builder.CreateInBoundsGEP(
566 LookupTable->getValueType(), LookupTable, GEPIdx, GV->getName());
567
568 Value *loaded = Builder.CreateLoad(I32, Address);
569
570 Value *replacement =
571 Builder.CreateIntToPtr(loaded, GV->getType(), GV->getName());
572
573 U.set(replacement);
574 }
575
576 void replaceUsesInInstructionsWithTableLookup(
577 Module &M, ArrayRef<GlobalVariable *> ModuleScopeVariables,
578 GlobalVariable *LookupTable) {
579
580 LLVMContext &Ctx = M.getContext();
581 IRBuilder<> Builder(Ctx);
582 Type *I32 = Type::getInt32Ty(Ctx);
583
584 for (size_t Index = 0; Index < ModuleScopeVariables.size(); Index++) {
585 auto *GV = ModuleScopeVariables[Index];
586
587 for (Use &U : make_early_inc_range(GV->uses())) {
588 auto *I = dyn_cast<Instruction>(U.getUser());
589 if (!I)
590 continue;
591
592 replaceUseWithTableLookup(M, Builder, LookupTable, GV, U,
593 ConstantInt::get(I32, Index));
594 }
595 }
596 }
597
598 static DenseSet<Function *> kernelsThatIndirectlyAccessAnyOfPassedVariables(
599 Module &M, LDSUsesInfoTy &LDSUsesInfo,
600 DenseSet<GlobalVariable *> const &VariableSet) {
601
602 DenseSet<Function *> KernelSet;
603
604 if (VariableSet.empty())
605 return KernelSet;
606
607 for (Function &Func : M.functions()) {
608 if (Func.isDeclaration() || !isKernelLDS(&Func))
609 continue;
610 for (GlobalVariable *GV : LDSUsesInfo.indirect_access[&Func]) {
611 if (VariableSet.contains(GV)) {
612 KernelSet.insert(&Func);
613 break;
614 }
615 }
616 }
617
618 return KernelSet;
619 }
620
621 static GlobalVariable *
622 chooseBestVariableForModuleStrategy(const DataLayout &DL,
623 VariableFunctionMap &LDSVars) {
624 // Find the global variable with the most indirect uses from kernels
625
626 struct CandidateTy {
627 GlobalVariable *GV = nullptr;
628 size_t UserCount = 0;
629 size_t Size = 0;
630
631 CandidateTy() = default;
632
633 CandidateTy(GlobalVariable *GV, uint64_t UserCount, uint64_t AllocSize)
634 : GV(GV), UserCount(UserCount), Size(AllocSize) {}
635
636 bool operator<(const CandidateTy &Other) const {
637 // Fewer users makes module scope variable less attractive
638 if (UserCount < Other.UserCount) {
639 return true;
640 }
641 if (UserCount > Other.UserCount) {
642 return false;
643 }
644
645 // Bigger makes module scope variable less attractive
646 if (Size < Other.Size) {
647 return false;
648 }
649
650 if (Size > Other.Size) {
651 return true;
652 }
653
654 // Arbitrary but consistent
655 return GV->getName() < Other.GV->getName();
656 }
657 };
658
659 CandidateTy MostUsed;
660
661 for (auto &K : LDSVars) {
662 GlobalVariable *GV = K.first;
663 if (K.second.size() <= 1) {
664 // A variable reachable by only one kernel is best lowered with kernel
665 // strategy
666 continue;
667 }
668 CandidateTy Candidate(
669 GV, K.second.size(),
670 DL.getTypeAllocSize(GV->getValueType()).getFixedValue());
671 if (MostUsed < Candidate)
672 MostUsed = Candidate;
673 }
674
675 return MostUsed.GV;
676 }
677
678 static void recordLDSAbsoluteAddress(Module *M, GlobalVariable *GV,
679 uint32_t Address) {
680 // Write the specified address into metadata where it can be retrieved by
681 // the assembler. Format is a half open range, [Address Address+1)
682 LLVMContext &Ctx = M->getContext();
683 auto *IntTy =
684 M->getDataLayout().getIntPtrType(Ctx, AMDGPUAS::LOCAL_ADDRESS);
685 auto *MinC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address));
686 auto *MaxC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address + 1));
687 GV->setMetadata(LLVMContext::MD_absolute_symbol,
688 MDNode::get(Ctx, {MinC, MaxC}));
689 }
690
691 DenseMap<Function *, Value *> tableKernelIndexCache;
692 Value *getTableLookupKernelIndex(Module &M, Function *F) {
693 // Accesses from a function use the amdgcn_lds_kernel_id intrinsic which
694 // lowers to a read from a live in register. Emit it once in the entry
695 // block to spare deduplicating it later.
696 auto [It, Inserted] = tableKernelIndexCache.try_emplace(F);
697 if (Inserted) {
698 Function *Decl =
699 Intrinsic::getDeclaration(&M, Intrinsic::amdgcn_lds_kernel_id, {});
700
701 auto InsertAt = F->getEntryBlock().getFirstNonPHIOrDbgOrAlloca();
702 IRBuilder<> Builder(&*InsertAt);
703
704 It->second = Builder.CreateCall(Decl, {});
705 }
706
707 return It->second;
708 }
709
710 static std::vector<Function *> assignLDSKernelIDToEachKernel(
711 Module *M, DenseSet<Function *> const &KernelsThatAllocateTableLDS,
712 DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS) {
713 // Associate kernels in the set with an arbitrary but reproducible order and
714 // annotate them with that order in metadata. This metadata is recognised by
715 // the backend and lowered to a SGPR which can be read from using
716 // amdgcn_lds_kernel_id.
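    //
    // For illustration, a kernel assigned id 0 ends up carrying (the metadata
    // node number is arbitrary):
    //   define amdgpu_kernel void @k() !llvm.amdgcn.lds.kernel.id !0 { ... }
    //   !0 = !{i32 0}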
717
718 std::vector<Function *> OrderedKernels;
719 if (!KernelsThatAllocateTableLDS.empty() ||
720 !KernelsThatIndirectlyAllocateDynamicLDS.empty()) {
721
722 for (Function &Func : M->functions()) {
723 if (Func.isDeclaration())
724 continue;
725 if (!isKernelLDS(&Func))
726 continue;
727
728 if (KernelsThatAllocateTableLDS.contains(&Func) ||
729 KernelsThatIndirectlyAllocateDynamicLDS.contains(&Func)) {
730 assert(Func.hasName()); // else fatal error earlier
731 OrderedKernels.push_back(&Func);
732 }
733 }
734
735 // Put them in an arbitrary but reproducible order
736 OrderedKernels = sortByName(std::move(OrderedKernels));
737
738 // Annotate the kernels with their order in this vector
739 LLVMContext &Ctx = M->getContext();
740 IRBuilder<> Builder(Ctx);
741
742 if (OrderedKernels.size() > UINT32_MAX) {
743 // 32 bit keeps it in one SGPR. > 2**32 kernels won't fit on the GPU
744 report_fatal_error("Unimplemented LDS lowering for > 2**32 kernels");
745 }
746
747 for (size_t i = 0; i < OrderedKernels.size(); i++) {
748 Metadata *AttrMDArgs[1] = {
749 ConstantAsMetadata::get(Builder.getInt32(i)),
750 };
751 OrderedKernels[i]->setMetadata("llvm.amdgcn.lds.kernel.id",
752 MDNode::get(Ctx, AttrMDArgs));
753 }
754 }
755 return OrderedKernels;
756 }
757
758 static void partitionVariablesIntoIndirectStrategies(
759 Module &M, LDSUsesInfoTy const &LDSUsesInfo,
760 VariableFunctionMap &LDSToKernelsThatNeedToAccessItIndirectly,
761 DenseSet<GlobalVariable *> &ModuleScopeVariables,
762 DenseSet<GlobalVariable *> &TableLookupVariables,
763 DenseSet<GlobalVariable *> &KernelAccessVariables,
764 DenseSet<GlobalVariable *> &DynamicVariables) {
765
766 GlobalVariable *HybridModuleRoot =
767 LoweringKindLoc != LoweringKind::hybrid
768 ? nullptr
769 : chooseBestVariableForModuleStrategy(
770 M.getDataLayout(), LDSToKernelsThatNeedToAccessItIndirectly);
771
772 DenseSet<Function *> const EmptySet;
773 DenseSet<Function *> const &HybridModuleRootKernels =
774 HybridModuleRoot
775 ? LDSToKernelsThatNeedToAccessItIndirectly[HybridModuleRoot]
776 : EmptySet;
777
778 for (auto &K : LDSToKernelsThatNeedToAccessItIndirectly) {
779 // Each iteration of this loop assigns exactly one global variable to
780 // exactly one of the implementation strategies.
781
782 GlobalVariable *GV = K.first;
783 assert(AMDGPU::isLDSVariableToLower(*GV));
784 assert(K.second.size() != 0);
785
786 if (AMDGPU::isDynamicLDS(*GV)) {
787 DynamicVariables.insert(GV);
788 continue;
789 }
790
791 switch (LoweringKindLoc) {
792 case LoweringKind::module:
793 ModuleScopeVariables.insert(GV);
794 break;
795
796 case LoweringKind::table:
797 TableLookupVariables.insert(GV);
798 break;
799
800 case LoweringKind::kernel:
801 if (K.second.size() == 1) {
802 KernelAccessVariables.insert(GV);
803 } else {
805 "cannot lower LDS '" + GV->getName() +
806 "' to kernel access as it is reachable from multiple kernels");
807 }
808 break;
809
810 case LoweringKind::hybrid: {
811 if (GV == HybridModuleRoot) {
812 assert(K.second.size() != 1);
813 ModuleScopeVariables.insert(GV);
814 } else if (K.second.size() == 1) {
815 KernelAccessVariables.insert(GV);
816 } else if (set_is_subset(K.second, HybridModuleRootKernels)) {
817 ModuleScopeVariables.insert(GV);
818 } else {
819 TableLookupVariables.insert(GV);
820 }
821 break;
822 }
823 }
824 }
825
826 // All LDS variables accessed indirectly have now been partitioned into
827 // the distinct lowering strategies.
828 assert(ModuleScopeVariables.size() + TableLookupVariables.size() +
829 KernelAccessVariables.size() + DynamicVariables.size() ==
830 LDSToKernelsThatNeedToAccessItIndirectly.size());
831 }
832
833 static GlobalVariable *lowerModuleScopeStructVariables(
834 Module &M, DenseSet<GlobalVariable *> const &ModuleScopeVariables,
835 DenseSet<Function *> const &KernelsThatAllocateModuleLDS) {
836 // Create a struct to hold the ModuleScopeVariables
837 // Replace all uses of those variables from non-kernel functions with the
838 // new struct instance Replace only the uses from kernel functions that will
839 // allocate this instance. That is a space optimisation - kernels that use a
840 // subset of the module scope struct and do not need to allocate it for
841 // indirect calls will only allocate the subset they use (they do so as part
842 // of the per-kernel lowering).
843 if (ModuleScopeVariables.empty()) {
844 return nullptr;
845 }
846
847 LLVMContext &Ctx = M.getContext();
848
849 LDSVariableReplacement ModuleScopeReplacement =
850 createLDSVariableReplacement(M, "llvm.amdgcn.module.lds",
851 ModuleScopeVariables);
852
853 appendToCompilerUsed(M, {static_cast<GlobalValue *>(
854 ConstantExpr::getPointerBitCastOrAddrSpaceCast(
855 cast<Constant>(ModuleScopeReplacement.SGV),
856 PointerType::getUnqual(Ctx)))});
857
858 // module.lds will be allocated at zero in any kernel that allocates it
859 recordLDSAbsoluteAddress(&M, ModuleScopeReplacement.SGV, 0);
860
861 // historic
862 removeLocalVarsFromUsedLists(M, ModuleScopeVariables);
863
864 // Replace all uses of module scope variable from non-kernel functions
865 replaceLDSVariablesWithStruct(
866 M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
867 Instruction *I = dyn_cast<Instruction>(U.getUser());
868 if (!I) {
869 return false;
870 }
871 Function *F = I->getFunction();
872 return !isKernelLDS(F);
873 });
874
875 // Replace uses of module scope variable from kernel functions that
876 // allocate the module scope variable, otherwise leave them unchanged
877 // Record on each kernel whether the module scope global is used by it
878
879 for (Function &Func : M.functions()) {
880 if (Func.isDeclaration() || !isKernelLDS(&Func))
881 continue;
882
883 if (KernelsThatAllocateModuleLDS.contains(&Func)) {
884 replaceLDSVariablesWithStruct(
885 M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
886 Instruction *I = dyn_cast<Instruction>(U.getUser());
887 if (!I) {
888 return false;
889 }
890 Function *F = I->getFunction();
891 return F == &Func;
892 });
893
894 markUsedByKernel(&Func, ModuleScopeReplacement.SGV);
895 }
896 }
897
898 return ModuleScopeReplacement.SGV;
899 }
900
901 static DenseMap<Function *, LDSVariableReplacement>
902 lowerKernelScopeStructVariables(
903 Module &M, LDSUsesInfoTy &LDSUsesInfo,
904 DenseSet<GlobalVariable *> const &ModuleScopeVariables,
905 DenseSet<Function *> const &KernelsThatAllocateModuleLDS,
906 GlobalVariable *MaybeModuleScopeStruct) {
907
908 // Create a struct for each kernel for the non-module-scope variables.
909
910 DenseMap<Function *, LDSVariableReplacement> KernelToReplacement;
911 for (Function &Func : M.functions()) {
912 if (Func.isDeclaration() || !isKernelLDS(&Func))
913 continue;
914
915 DenseSet<GlobalVariable *> KernelUsedVariables;
916 // Allocating variables that are used directly in this struct to get
917 // alignment aware allocation and predictable frame size.
918 for (auto &v : LDSUsesInfo.direct_access[&Func]) {
919 if (!AMDGPU::isDynamicLDS(*v)) {
920 KernelUsedVariables.insert(v);
921 }
922 }
923
924 // Allocating variables that are accessed indirectly so that a lookup of
925 // this struct instance can find them from nested functions.
926 for (auto &v : LDSUsesInfo.indirect_access[&Func]) {
927 if (!AMDGPU::isDynamicLDS(*v)) {
928 KernelUsedVariables.insert(v);
929 }
930 }
931
932 // Variables allocated in module lds must all resolve to that struct,
933 // not to the per-kernel instance.
934 if (KernelsThatAllocateModuleLDS.contains(&Func)) {
935 for (GlobalVariable *v : ModuleScopeVariables) {
936 KernelUsedVariables.erase(v);
937 }
938 }
939
940 if (KernelUsedVariables.empty()) {
941 // Either used no LDS, or the LDS it used was all in the module struct
942 // or dynamically sized
943 continue;
944 }
945
946 // The association between kernel function and LDS struct is done by
947 // symbol name, which only works if the function in question has a
948 // name. This is not expected to be a problem in practice as kernels
949 // are called by name making anonymous ones (which are named by the
950 // backend) difficult to use. This does mean that llvm test cases need
951 // to name the kernels.
952 if (!Func.hasName()) {
953 report_fatal_error("Anonymous kernels cannot use LDS variables");
954 }
955
956 std::string VarName =
957 (Twine("llvm.amdgcn.kernel.") + Func.getName() + ".lds").str();
958
959 auto Replacement =
960 createLDSVariableReplacement(M, VarName, KernelUsedVariables);
961
962 // If any indirect uses, create a direct use to ensure allocation
963 // TODO: Simpler to unconditionally mark used but that regresses
964 // codegen in test/CodeGen/AMDGPU/noclobber-barrier.ll
965 auto Accesses = LDSUsesInfo.indirect_access.find(&Func);
966 if ((Accesses != LDSUsesInfo.indirect_access.end()) &&
967 !Accesses->second.empty())
968 markUsedByKernel(&Func, Replacement.SGV);
969
970 // remove preserves existing codegen
971 removeLocalVarsFromUsedLists(M, KernelUsedVariables);
972 KernelToReplacement[&Func] = Replacement;
973
974 // Rewrite uses within kernel to the new struct
975 replaceLDSVariablesWithStruct(
976 M, KernelUsedVariables, Replacement, [&Func](Use &U) {
977 Instruction *I = dyn_cast<Instruction>(U.getUser());
978 return I && I->getFunction() == &Func;
979 });
980 }
981 return KernelToReplacement;
982 }
983
984 static GlobalVariable *
985 buildRepresentativeDynamicLDSInstance(Module &M, LDSUsesInfoTy &LDSUsesInfo,
986 Function *func) {
987 // Create a dynamic lds variable with a name associated with the passed
988 // function that has the maximum alignment of any dynamic lds variable
989 // reachable from this kernel. Dynamic LDS is allocated after the static LDS
990 // allocation, possibly after alignment padding. The representative variable
991 // created here has the maximum alignment of any other dynamic variable
992 // reachable by that kernel. All dynamic LDS variables are allocated at the
993 // same address in each kernel in order to provide the documented aliasing
994 // semantics. Setting the alignment here allows this IR pass to accurately
995 // predict the exact constant at which it will be allocated.
996
997 assert(isKernelLDS(func));
998
999 LLVMContext &Ctx = M.getContext();
1000 const DataLayout &DL = M.getDataLayout();
1001 Align MaxDynamicAlignment(1);
1002
1003 auto UpdateMaxAlignment = [&MaxDynamicAlignment, &DL](GlobalVariable *GV) {
1004 if (AMDGPU::isDynamicLDS(*GV)) {
1005 MaxDynamicAlignment =
1006 std::max(MaxDynamicAlignment, AMDGPU::getAlign(DL, GV));
1007 }
1008 };
1009
1010 for (GlobalVariable *GV : LDSUsesInfo.indirect_access[func]) {
1011 UpdateMaxAlignment(GV);
1012 }
1013
1014 for (GlobalVariable *GV : LDSUsesInfo.direct_access[func]) {
1015 UpdateMaxAlignment(GV);
1016 }
1017
1018 assert(func->hasName()); // Checked by caller
1019 auto emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
1020 GlobalVariable *N = new GlobalVariable(
1021 M, emptyCharArray, false, GlobalValue::ExternalLinkage, nullptr,
1022 Twine("llvm.amdgcn." + func->getName() + ".dynlds"), nullptr, GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS,
1023 false);
1024 N->setAlignment(MaxDynamicAlignment);
1025
1026 assert(AMDGPU::isDynamicLDS(*N));
1027 return N;
1028 }
1029
1030 DenseMap<Function *, GlobalVariable *> lowerDynamicLDSVariables(
1031 Module &M, LDSUsesInfoTy &LDSUsesInfo,
1032 DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS,
1033 DenseSet<GlobalVariable *> const &DynamicVariables,
1034 std::vector<Function *> const &OrderedKernels) {
1035 DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS;
1036 if (!KernelsThatIndirectlyAllocateDynamicLDS.empty()) {
1037 LLVMContext &Ctx = M.getContext();
1038 IRBuilder<> Builder(Ctx);
1039 Type *I32 = Type::getInt32Ty(Ctx);
1040
1041 std::vector<Constant *> newDynamicLDS;
1042
1043 // Table is built in the same order as OrderedKernels
1044 for (auto &func : OrderedKernels) {
1045
1046 if (KernelsThatIndirectlyAllocateDynamicLDS.contains(func)) {
1047 assert(isKernelLDS(func));
1048 if (!func->hasName()) {
1049 report_fatal_error("Anonymous kernels cannot use LDS variables");
1050 }
1051
1052 GlobalVariable *N =
1053 buildRepresentativeDynamicLDSInstance(M, LDSUsesInfo, func);
1054
1055 KernelToCreatedDynamicLDS[func] = N;
1056
1057 markUsedByKernel(func, N);
1058
1059 auto emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
1060 auto GEP = ConstantExpr::getGetElementPtr(
1061 emptyCharArray, N, ConstantInt::get(I32, 0), true);
1062 newDynamicLDS.push_back(ConstantExpr::getPtrToInt(GEP, I32));
1063 } else {
1064 newDynamicLDS.push_back(PoisonValue::get(I32));
1065 }
1066 }
1067 assert(OrderedKernels.size() == newDynamicLDS.size());
1068
1069 ArrayType *t = ArrayType::get(I32, newDynamicLDS.size());
1070 Constant *init = ConstantArray::get(t, newDynamicLDS);
1071 GlobalVariable *table = new GlobalVariable(
1072 M, t, true, GlobalValue::InternalLinkage, init,
1073 "llvm.amdgcn.dynlds.offset.table", nullptr,
1075
1076 for (GlobalVariable *GV : DynamicVariables) {
1077 for (Use &U : make_early_inc_range(GV->uses())) {
1078 auto *I = dyn_cast<Instruction>(U.getUser());
1079 if (!I)
1080 continue;
1081 if (isKernelLDS(I->getFunction()))
1082 continue;
1083
1084 replaceUseWithTableLookup(M, Builder, table, GV, U, nullptr);
1085 }
1086 }
1087 }
1088 return KernelToCreatedDynamicLDS;
1089 }
1090
1091 bool runOnModule(Module &M) {
1092 CallGraph CG = CallGraph(M);
1093 bool Changed = superAlignLDSGlobals(M);
1094
1095 Changed |= eliminateConstantExprUsesOfLDSFromAllInstructions(M);
1096
1097 Changed = true; // todo: narrow this down
1098
1099 // For each kernel, what variables does it access directly or through
1100 // callees
1101 LDSUsesInfoTy LDSUsesInfo = getTransitiveUsesOfLDS(CG, M);
1102
1103 // For each variable accessed through callees, which kernels access it
1104 VariableFunctionMap LDSToKernelsThatNeedToAccessItIndirectly;
1105 for (auto &K : LDSUsesInfo.indirect_access) {
1106 Function *F = K.first;
1107 assert(isKernelLDS(F));
1108 for (GlobalVariable *GV : K.second) {
1109 LDSToKernelsThatNeedToAccessItIndirectly[GV].insert(F);
1110 }
1111 }
1112
1113 // Partition variables accessed indirectly into the different strategies
1114 DenseSet<GlobalVariable *> ModuleScopeVariables;
1115 DenseSet<GlobalVariable *> TableLookupVariables;
1116 DenseSet<GlobalVariable *> KernelAccessVariables;
1117 DenseSet<GlobalVariable *> DynamicVariables;
1118 partitionVariablesIntoIndirectStrategies(
1119 M, LDSUsesInfo, LDSToKernelsThatNeedToAccessItIndirectly,
1120 ModuleScopeVariables, TableLookupVariables, KernelAccessVariables,
1121 DynamicVariables);
1122
1123 // If the kernel accesses a variable that is going to be stored in the
1124 // module instance through a call then that kernel needs to allocate the
1125 // module instance
1126 const DenseSet<Function *> KernelsThatAllocateModuleLDS =
1127 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1128 ModuleScopeVariables);
1129 const DenseSet<Function *> KernelsThatAllocateTableLDS =
1130 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1131 TableLookupVariables);
1132
1133 const DenseSet<Function *> KernelsThatIndirectlyAllocateDynamicLDS =
1134 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1135 DynamicVariables);
1136
1137 GlobalVariable *MaybeModuleScopeStruct = lowerModuleScopeStructVariables(
1138 M, ModuleScopeVariables, KernelsThatAllocateModuleLDS);
1139
1140 DenseMap<Function *, LDSVariableReplacement> KernelToReplacement =
1141 lowerKernelScopeStructVariables(M, LDSUsesInfo, ModuleScopeVariables,
1142 KernelsThatAllocateModuleLDS,
1143 MaybeModuleScopeStruct);
1144
1145 // Lower zero cost accesses to the kernel instances just created
1146 for (auto &GV : KernelAccessVariables) {
1147 auto &funcs = LDSToKernelsThatNeedToAccessItIndirectly[GV];
1148 assert(funcs.size() == 1); // Only one kernel can access it
1149 LDSVariableReplacement Replacement =
1150 KernelToReplacement[*(funcs.begin())];
1151
1152 DenseSet<GlobalVariable *> Vec;
1153 Vec.insert(GV);
1154
1155 replaceLDSVariablesWithStruct(M, Vec, Replacement, [](Use &U) {
1156 return isa<Instruction>(U.getUser());
1157 });
1158 }
1159
1160 // The ith element of this vector is kernel id i
1161 std::vector<Function *> OrderedKernels =
1162 assignLDSKernelIDToEachKernel(&M, KernelsThatAllocateTableLDS,
1163 KernelsThatIndirectlyAllocateDynamicLDS);
1164
1165 if (!KernelsThatAllocateTableLDS.empty()) {
1166 LLVMContext &Ctx = M.getContext();
1167 IRBuilder<> Builder(Ctx);
1168
1169 // The order must be consistent between lookup table and accesses to
1170 // lookup table
1171 auto TableLookupVariablesOrdered =
1172 sortByName(std::vector<GlobalVariable *>(TableLookupVariables.begin(),
1173 TableLookupVariables.end()));
1174
1175 GlobalVariable *LookupTable = buildLookupTable(
1176 M, TableLookupVariablesOrdered, OrderedKernels, KernelToReplacement);
1177 replaceUsesInInstructionsWithTableLookup(M, TableLookupVariablesOrdered,
1178 LookupTable);
1179 }
1180
1181 DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS =
1182 lowerDynamicLDSVariables(M, LDSUsesInfo,
1183 KernelsThatIndirectlyAllocateDynamicLDS,
1184 DynamicVariables, OrderedKernels);
1185
1186 // All kernel frames have been allocated. Calculate and record the
1187 // addresses.
1188 {
1189 const DataLayout &DL = M.getDataLayout();
1190
1191 for (Function &Func : M.functions()) {
1192 if (Func.isDeclaration() || !isKernelLDS(&Func))
1193 continue;
1194
1195 // All three of these are optional. The first variable is allocated at
1196 // zero. They are allocated by AMDGPUMachineFunction as one block.
1197 // Layout:
1198 //{
1199 // module.lds
1200 // alignment padding
1201 // kernel instance
1202 // alignment padding
1203 // dynamic lds variables
1204 //}
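        //
        // Worked example with invented sizes: a 12 byte module.lds followed by a
        // kernel struct requiring align 16 gives Offset = alignTo(12, 16) = 16;
        // if that struct is 20 bytes the kernel is annotated "amdgpu-lds-size"="36".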
1205
1206 const bool AllocateModuleScopeStruct =
1207 MaybeModuleScopeStruct &&
1208 KernelsThatAllocateModuleLDS.contains(&Func);
1209
1210 auto Replacement = KernelToReplacement.find(&Func);
1211 const bool AllocateKernelScopeStruct =
1212 Replacement != KernelToReplacement.end();
1213
1214 const bool AllocateDynamicVariable =
1215 KernelToCreatedDynamicLDS.contains(&Func);
1216
1217 uint32_t Offset = 0;
1218
1219 if (AllocateModuleScopeStruct) {
1220 // Allocated at zero, recorded once on construction, not once per
1221 // kernel
1222 Offset += DL.getTypeAllocSize(MaybeModuleScopeStruct->getValueType());
1223 }
1224
1225 if (AllocateKernelScopeStruct) {
1226 GlobalVariable *KernelStruct = Replacement->second.SGV;
1227 Offset = alignTo(Offset, AMDGPU::getAlign(DL, KernelStruct));
1228 recordLDSAbsoluteAddress(&M, KernelStruct, Offset);
1229 Offset += DL.getTypeAllocSize(KernelStruct->getValueType());
1230 }
1231
1232 // If there is dynamic allocation, the alignment needed is included in
1233 // the static frame size. There may be no reference to the dynamic
1234 // variable in the kernel itself, so without including it here, that
1235 // alignment padding could be missed.
1236 if (AllocateDynamicVariable) {
1237 GlobalVariable *DynamicVariable = KernelToCreatedDynamicLDS[&Func];
1238 Offset = alignTo(Offset, AMDGPU::getAlign(DL, DynamicVariable));
1239 recordLDSAbsoluteAddress(&M, DynamicVariable, Offset);
1240 }
1241
1242 if (Offset != 0) {
1243 (void)TM; // TODO: Account for target maximum LDS
1244 std::string Buffer;
1245 raw_string_ostream SS{Buffer};
1246 SS << format("%u", Offset);
1247
1248 // Instead of explicitly marking kernels that access dynamic variables
1249 // using special case metadata, annotate with min-lds == max-lds, i.e.
1250 // that there is no more space available for allocating more static
1251 // LDS variables. That is the right condition to prevent allocating
1252 // more variables which would collide with the addresses assigned to
1253 // dynamic variables.
1254 if (AllocateDynamicVariable)
1255 SS << format(",%u", Offset);
1256
1257 Func.addFnAttr("amdgpu-lds-size", Buffer);
1258 }
1259 }
1260 }
1261
1262 for (auto &GV : make_early_inc_range(M.globals()))
1263 if (AMDGPU::isLDSVariableToLower(GV)) {
1264 // probably want to remove from used lists
1265 GV.removeDeadConstantUsers();
1266 if (GV.use_empty())
1267 GV.eraseFromParent();
1268 }
1269
1270 return Changed;
1271 }
1272
1273private:
1274 // Increase the alignment of LDS globals if necessary to maximise the chance
1275 // that we can use aligned LDS instructions to access them.
1276 static bool superAlignLDSGlobals(Module &M) {
1277 const DataLayout &DL = M.getDataLayout();
1278 bool Changed = false;
1279 if (!SuperAlignLDSGlobals) {
1280 return Changed;
1281 }
1282
1283 for (auto &GV : M.globals()) {
1284 if (GV.getType()->getPointerAddressSpace() != AMDGPUAS::LOCAL_ADDRESS) {
1285 // Only changing alignment of LDS variables
1286 continue;
1287 }
1288 if (!GV.hasInitializer()) {
1289 // cuda/hip extern __shared__ variable, leave alignment alone
1290 continue;
1291 }
1292
1293 Align Alignment = AMDGPU::getAlign(DL, &GV);
1294 TypeSize GVSize = DL.getTypeAllocSize(GV.getValueType());
1295
1296 if (GVSize > 8) {
1297 // We might want to use a b96 or b128 load/store
1298 Alignment = std::max(Alignment, Align(16));
1299 } else if (GVSize > 4) {
1300 // We might want to use a b64 load/store
1301 Alignment = std::max(Alignment, Align(8));
1302 } else if (GVSize > 2) {
1303 // We might want to use a b32 load/store
1304 Alignment = std::max(Alignment, Align(4));
1305 } else if (GVSize > 1) {
1306 // We might want to use a b16 load/store
1307 Alignment = std::max(Alignment, Align(2));
1308 }
1309
1310 if (Alignment != AMDGPU::getAlign(DL, &GV)) {
1311 Changed = true;
1312 GV.setAlignment(Alignment);
1313 }
1314 }
1315 return Changed;
1316 }
1317
1318 static LDSVariableReplacement createLDSVariableReplacement(
1319 Module &M, std::string VarName,
1320 DenseSet<GlobalVariable *> const &LDSVarsToTransform) {
1321 // Create a struct instance containing LDSVarsToTransform and map from those
1322 // variables to ConstantExprGEP
1323 // Variables may be introduced to meet alignment requirements. No aliasing
1324 // metadata is useful for these as they have no uses. Erased before return.
1325
1326 LLVMContext &Ctx = M.getContext();
1327 const DataLayout &DL = M.getDataLayout();
1328 assert(!LDSVarsToTransform.empty());
1329
1330 SmallVector<OptimizedStructLayoutField, 8> LayoutFields;
1331 LayoutFields.reserve(LDSVarsToTransform.size());
1332 {
1333 // The order of fields in this struct depends on the order of
1334 // variables in the argument which varies when changing how they
1335 // are identified, leading to spurious test breakage.
1336 auto Sorted = sortByName(std::vector<GlobalVariable *>(
1337 LDSVarsToTransform.begin(), LDSVarsToTransform.end()));
1338
1339 for (GlobalVariable *GV : Sorted) {
1340 OptimizedStructLayoutField F(GV,
1341 DL.getTypeAllocSize(GV->getValueType()),
1342 AMDGPU::getAlign(DL, GV));
1343 LayoutFields.emplace_back(F);
1344 }
1345 }
1346
1347 performOptimizedStructLayout(LayoutFields);
1348
1349 std::vector<GlobalVariable *> LocalVars;
1350 BitVector IsPaddingField;
1351 LocalVars.reserve(LDSVarsToTransform.size()); // will be at least this large
1352 IsPaddingField.reserve(LDSVarsToTransform.size());
1353 {
1354 uint64_t CurrentOffset = 0;
1355 for (size_t I = 0; I < LayoutFields.size(); I++) {
1356 GlobalVariable *FGV = static_cast<GlobalVariable *>(
1357 const_cast<void *>(LayoutFields[I].Id));
1358 Align DataAlign = LayoutFields[I].Alignment;
1359
1360 uint64_t DataAlignV = DataAlign.value();
1361 if (uint64_t Rem = CurrentOffset % DataAlignV) {
1362 uint64_t Padding = DataAlignV - Rem;
1363
1364 // Append an array of padding bytes to meet alignment requested
1365 // Note (o + (a - (o % a)) ) % a == 0
1366 // (offset + Padding ) % align == 0
1367
1368 Type *ATy = ArrayType::get(Type::getInt8Ty(Ctx), Padding);
1369 LocalVars.push_back(new GlobalVariable(
1370 M, ATy, false, GlobalValue::InternalLinkage,
1371 PoisonValue::get(ATy), "", nullptr, GlobalValue::NotThreadLocal,
1372 AMDGPUAS::LOCAL_ADDRESS, false));
1373 IsPaddingField.push_back(true);
1374 CurrentOffset += Padding;
1375 }
1376
1377 LocalVars.push_back(FGV);
1378 IsPaddingField.push_back(false);
1379 CurrentOffset += LayoutFields[I].Size;
1380 }
1381 }
1382
1383 std::vector<Type *> LocalVarTypes;
1384 LocalVarTypes.reserve(LocalVars.size());
1385 std::transform(
1386 LocalVars.cbegin(), LocalVars.cend(), std::back_inserter(LocalVarTypes),
1387 [](const GlobalVariable *V) -> Type * { return V->getValueType(); });
1388
1389 StructType *LDSTy = StructType::create(Ctx, LocalVarTypes, VarName + ".t");
1390
1391 Align StructAlign = AMDGPU::getAlign(DL, LocalVars[0]);
1392
1393 GlobalVariable *SGV = new GlobalVariable(
1394 M, LDSTy, false, GlobalValue::InternalLinkage, PoisonValue::get(LDSTy),
1395 VarName, nullptr, GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS,
1396 false);
1397 SGV->setAlignment(StructAlign);
1398
1399 DenseMap<GlobalVariable *, Constant *> Map;
1400 Type *I32 = Type::getInt32Ty(Ctx);
1401 for (size_t I = 0; I < LocalVars.size(); I++) {
1402 GlobalVariable *GV = LocalVars[I];
1403 Constant *GEPIdx[] = {ConstantInt::get(I32, 0), ConstantInt::get(I32, I)};
1404 Constant *GEP = ConstantExpr::getGetElementPtr(LDSTy, SGV, GEPIdx, true);
1405 if (IsPaddingField[I]) {
1406 assert(GV->use_empty());
1407 GV->eraseFromParent();
1408 } else {
1409 Map[GV] = GEP;
1410 }
1411 }
1412 assert(Map.size() == LDSVarsToTransform.size());
1413 return {SGV, std::move(Map)};
1414 }
1415
1416 template <typename PredicateTy>
1417 static void replaceLDSVariablesWithStruct(
1418 Module &M, DenseSet<GlobalVariable *> const &LDSVarsToTransformArg,
1419 const LDSVariableReplacement &Replacement, PredicateTy Predicate) {
1420 LLVMContext &Ctx = M.getContext();
1421 const DataLayout &DL = M.getDataLayout();
1422
1423 // A hack... we need to insert the aliasing info in a predictable order for
1424 // lit tests. Would like to have them in a stable order already, ideally the
1425 // same order they get allocated, which might mean an ordered set container
1426 auto LDSVarsToTransform = sortByName(std::vector<GlobalVariable *>(
1427 LDSVarsToTransformArg.begin(), LDSVarsToTransformArg.end()));
1428
1429 // Create alias.scope and their lists. Each field in the new structure
1430 // does not alias with all other fields.
1431 SmallVector<MDNode *> AliasScopes;
1432 SmallVector<Metadata *> NoAliasList;
1433 const size_t NumberVars = LDSVarsToTransform.size();
1434 if (NumberVars > 1) {
1435 MDBuilder MDB(Ctx);
1436 AliasScopes.reserve(NumberVars);
1437 MDNode *Domain = MDB.createAnonymousAliasScopeDomain();
1438 for (size_t I = 0; I < NumberVars; I++) {
1439 MDNode *Scope = MDB.createAnonymousAliasScope(Domain);
1440 AliasScopes.push_back(Scope);
1441 }
1442 NoAliasList.append(&AliasScopes[1], AliasScopes.end());
1443 }
1444
1445 // Replace uses of ith variable with a constantexpr to the corresponding
1446 // field of the instance that will be allocated by AMDGPUMachineFunction
1447 for (size_t I = 0; I < NumberVars; I++) {
1448 GlobalVariable *GV = LDSVarsToTransform[I];
1449 Constant *GEP = Replacement.LDSVarsToConstantGEP.at(GV);
1450
1451 GV->replaceUsesWithIf(GEP, Predicate);
1452
1453 APInt APOff(DL.getIndexTypeSizeInBits(GEP->getType()), 0);
1454 GEP->stripAndAccumulateInBoundsConstantOffsets(DL, APOff);
1455 uint64_t Offset = APOff.getZExtValue();
1456
1457 Align A =
1458 commonAlignment(Replacement.SGV->getAlign().valueOrOne(), Offset);
1459
1460 if (I)
1461 NoAliasList[I - 1] = AliasScopes[I - 1];
1462 MDNode *NoAlias =
1463 NoAliasList.empty() ? nullptr : MDNode::get(Ctx, NoAliasList);
1464 MDNode *AliasScope =
1465 AliasScopes.empty() ? nullptr : MDNode::get(Ctx, {AliasScopes[I]});
1466
1467 refineUsesAlignmentAndAA(GEP, A, DL, AliasScope, NoAlias);
1468 }
1469 }
1470
1471 static void refineUsesAlignmentAndAA(Value *Ptr, Align A,
1472 const DataLayout &DL, MDNode *AliasScope,
1473 MDNode *NoAlias, unsigned MaxDepth = 5) {
1474 if (!MaxDepth || (A == 1 && !AliasScope))
1475 return;
1476
1477 for (User *U : Ptr->users()) {
1478 if (auto *I = dyn_cast<Instruction>(U)) {
1479 if (AliasScope && I->mayReadOrWriteMemory()) {
1480 MDNode *AS = I->getMetadata(LLVMContext::MD_alias_scope);
1481 AS = (AS ? MDNode::getMostGenericAliasScope(AS, AliasScope)
1482 : AliasScope);
1483 I->setMetadata(LLVMContext::MD_alias_scope, AS);
1484
1485 MDNode *NA = I->getMetadata(LLVMContext::MD_noalias);
1486 NA = (NA ? MDNode::intersect(NA, NoAlias) : NoAlias);
1487 I->setMetadata(LLVMContext::MD_noalias, NA);
1488 }
1489 }
1490
1491 if (auto *LI = dyn_cast<LoadInst>(U)) {
1492 LI->setAlignment(std::max(A, LI->getAlign()));
1493 continue;
1494 }
1495 if (auto *SI = dyn_cast<StoreInst>(U)) {
1496 if (SI->getPointerOperand() == Ptr)
1497 SI->setAlignment(std::max(A, SI->getAlign()));
1498 continue;
1499 }
1500 if (auto *AI = dyn_cast<AtomicRMWInst>(U)) {
1501 // None of atomicrmw operations can work on pointers, but let's
1502 // check it anyway in case it will or we will process ConstantExpr.
1503 if (AI->getPointerOperand() == Ptr)
1504 AI->setAlignment(std::max(A, AI->getAlign()));
1505 continue;
1506 }
1507 if (auto *AI = dyn_cast<AtomicCmpXchgInst>(U)) {
1508 if (AI->getPointerOperand() == Ptr)
1509 AI->setAlignment(std::max(A, AI->getAlign()));
1510 continue;
1511 }
1512 if (auto *GEP = dyn_cast<GetElementPtrInst>(U)) {
1513 unsigned BitWidth = DL.getIndexTypeSizeInBits(GEP->getType());
1514 APInt Off(BitWidth, 0);
1515 if (GEP->getPointerOperand() == Ptr) {
1516 Align GA;
1517 if (GEP->accumulateConstantOffset(DL, Off))
1518 GA = commonAlignment(A, Off.getLimitedValue());
1519 refineUsesAlignmentAndAA(GEP, GA, DL, AliasScope, NoAlias,
1520 MaxDepth - 1);
1521 }
1522 continue;
1523 }
1524 if (auto *I = dyn_cast<Instruction>(U)) {
1525 if (I->getOpcode() == Instruction::BitCast ||
1526 I->getOpcode() == Instruction::AddrSpaceCast)
1527 refineUsesAlignmentAndAA(I, A, DL, AliasScope, NoAlias, MaxDepth - 1);
1528 }
1529 }
1530 }
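
Two details of refineUsesAlignmentAndAA are easy to miss. First, the alignment it propagates comes from commonAlignment(Base, Offset): a field that starts Offset bytes into an instance aligned to Base can only be assumed aligned to the largest power of two dividing both, and a constant-offset GEP refines that again the same way. Second, the walk is bounded by MaxDepth (default 5) and only descends through GEPs, bitcasts and addrspacecasts. A tiny self-contained check of the alignment arithmetic, not taken from this file:

#include "llvm/Support/Alignment.h"
#include <cassert>
using namespace llvm;

int main() {
  // Struct instance aligned to 16 bytes: a field at byte offset 4 can only
  // be assumed 4-byte aligned.
  assert(commonAlignment(Align(16), 4) == Align(4));
  // A field at offset 0 inherits the full alignment of the instance.
  assert(commonAlignment(Align(16), 0) == Align(16));
  // A larger, more-aligned offset cannot raise the base alignment.
  assert(commonAlignment(Align(8), 32) == Align(8));
  return 0;
}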
1531};
1532
1533class AMDGPULowerModuleLDSLegacy : public ModulePass {
1534public:
1535 const AMDGPUTargetMachine *TM;
1536 static char ID;
1537
1538 AMDGPULowerModuleLDSLegacy(const AMDGPUTargetMachine *TM_ = nullptr)
1539 : ModulePass(ID), TM(TM_) {
1540 initializeAMDGPULowerModuleLDSLegacyPass(*PassRegistry::getPassRegistry());
1541 }
1542
1543 void getAnalysisUsage(AnalysisUsage &AU) const override {
1544 if (!TM)
1545 AU.addRequired<TargetPassConfig>();
1546 }
1547
1548 bool runOnModule(Module &M) override {
1549 if (!TM) {
1550 auto &TPC = getAnalysis<TargetPassConfig>();
1551 TM = &TPC.getTM<AMDGPUTargetMachine>();
1552 }
1553
1554 return AMDGPULowerModuleLDS(*TM).runOnModule(M);
1555 }
1556};
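
The wrapper above is glue for the legacy pass manager: it recovers the AMDGPUTargetMachine from TargetPassConfig when none was supplied and forwards to the shared AMDGPULowerModuleLDS implementation. A minimal sketch of driving it through the legacy pass manager follows; it is illustration only, and the surrounding setup that produces the Module and the target machine is assumed rather than shown.

#include "AMDGPU.h" // declares createAMDGPULowerModuleLDSLegacyPass
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Module.h"
using namespace llvm;

// Assumed to exist in the caller: a parsed Module and the AMDGPU target machine.
bool lowerLDSWithLegacyPM(Module &M, const AMDGPUTargetMachine &TM) {
  legacy::PassManager PM;
  PM.add(createAMDGPULowerModuleLDSLegacyPass(&TM));
  return PM.run(M); // true if the module was modified
}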
1557
1558} // namespace
1559char AMDGPULowerModuleLDSLegacy::ID = 0;
1560
1561char &llvm::AMDGPULowerModuleLDSLegacyPassID = AMDGPULowerModuleLDSLegacy::ID;
1562
1563INITIALIZE_PASS_BEGIN(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
1564 "Lower uses of LDS variables from non-kernel functions",
1565 false, false)
1566 INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)
1567 INITIALIZE_PASS_END(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
1568 "Lower uses of LDS variables from non-kernel functions",
1569 false, false)
1570
1571 ModulePass *
1572 llvm::createAMDGPULowerModuleLDSLegacyPass(const AMDGPUTargetMachine *TM) {
1573 return new AMDGPULowerModuleLDSLegacy(TM);
1574 }
1575
1576 PreservedAnalyses AMDGPULowerModuleLDSPass::run(Module &M,
1577 ModuleAnalysisManager &) {
1578 return AMDGPULowerModuleLDS(TM).runOnModule(M) ? PreservedAnalyses::none()
1579 : PreservedAnalyses::all();
1580 }
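
For the new pass manager the pass is a thin wrapper around the same implementation, as the run method above shows. Below is a sketch of invoking it directly; it assumes the usual PassBuilder boilerplate to populate the analysis managers, and it assumes the AMDGPULowerModuleLDSPass constructor takes the target machine by reference (as its TM member suggests), so treat it as illustration rather than code from this file.

#include "AMDGPU.h" // declares AMDGPULowerModuleLDSPass
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Passes/PassBuilder.h"
using namespace llvm;

void lowerLDSWithNewPM(Module &M, const AMDGPUTargetMachine &TM) {
  // Standard new-PM setup: build and cross-register the analysis managers.
  LoopAnalysisManager LAM;
  FunctionAnalysisManager FAM;
  CGSCCAnalysisManager CGAM;
  ModuleAnalysisManager MAM;
  PassBuilder PB;
  PB.registerModuleAnalyses(MAM);
  PB.registerCGSCCAnalyses(CGAM);
  PB.registerFunctionAnalyses(FAM);
  PB.registerLoopAnalyses(LAM);
  PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

  ModulePassManager MPM;
  MPM.addPass(AMDGPULowerModuleLDSPass(TM)); // assumed constructor signature
  MPM.run(M, MAM);
}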