1//===-- AMDGPULowerModuleLDSPass.cpp ------------------------------*- C++ -*-=//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This pass eliminates local data store, LDS, uses from non-kernel functions.
10// LDS is contiguous memory allocated per kernel execution.
11//
12// Background.
13//
14// The programming model is global variables, or equivalently function local
15// static variables, accessible from kernels or other functions. For uses from
16// kernels this is straightforward - assign an integer to the kernel for the
17// memory required by all the variables combined, allocate them within that.
18// For uses from functions there are performance tradeoffs to choose between.
19//
20// This model means the GPU runtime can specify the amount of memory allocated.
21// If this is more than the kernel assumed, the excess can be made available
22// using a language specific feature, which IR represents as a variable with
23// no initializer. This feature is referred to here as "Dynamic LDS" and is
24// lowered slightly differently to the normal case.
25//
26// Consequences of this GPU feature:
27// - memory is limited and exceeding it halts compilation
28// - a global accessed by one kernel exists independent of other kernels
29// - a global exists independent of simultaneous execution of the same kernel
30// - the address of the global may be different from different kernels as they
31// do not alias, which permits only allocating variables they use
32// - if the address is allowed to differ, functions need help to find it
33//
34// Uses from kernels are implemented here by grouping them in a per-kernel
35// struct instance. This duplicates the variables, accurately modelling their
36// aliasing properties relative to a single global representation. It also
37// permits control over alignment via padding.
38//
39// Uses from functions are more complicated and the primary purpose of this
40// IR pass. Several different lowerings are chosen between to meet requirements
41// to avoid allocating any LDS where it is not necessary, as that impacts
42// occupancy and may fail the compilation, while not imposing overhead on a
43// feature whose primary advantage over global memory is performance. The basic
44// design goal is to avoid one kernel imposing overhead on another.
45//
46// Implementation.
47//
48// LDS variables with constant annotation or non-undef initializer are passed
49// through unchanged for simplification or error diagnostics in later passes.
50// Non-undef initializers are not yet implemented for LDS.
51//
52// LDS variables that are always allocated at the same address can be found
53// by lookup at that address. Otherwise runtime information/cost is required.
54//
55// The simplest strategy possible is to group all LDS variables in a single
56// struct and allocate that struct in every kernel such that the original
57// variables are always at the same address. LDS is however a limited resource
58// so this strategy is unusable in practice. It is not implemented here.
59//
60// Strategy | Precise allocation | Zero runtime cost | General purpose |
61// --------+--------------------+-------------------+-----------------+
62// Module | No | Yes | Yes |
63// Table | Yes | No | Yes |
64// Kernel | Yes | Yes | No |
65// Hybrid | Yes | Partial | Yes |
66//
67// "Module" spends LDS memory to save cycles. "Table" spends cycles and global
68// memory to save LDS. "Kernel" is as fast as kernel allocation but only works
69// for variables that are known reachable from a single kernel. "Hybrid" picks
70// between all three. When forced to choose between LDS and cycles we minimise
71// LDS use.
72
73// The "module" lowering implemented here finds LDS variables which are used by
74// non-kernel functions and creates a new struct with a field for each of those
75// LDS variables. Variables that are only used from kernels are excluded.
76// Kernels that do not use this struct are annotated with the attribute
77// amdgpu-elide-module-lds which allows the back end to elide the allocation.
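//
// As an illustration (names and types invented here, not drawn from any test),
// module lowering of a float used from a non-kernel function produces
// something along the lines of:
//
//   %llvm.amdgcn.module.lds.t = type { float }
//   @llvm.amdgcn.module.lds = internal addrspace(3)
//       global %llvm.amdgcn.module.lds.t undef
//
// with the original variable's uses rewritten to constant geps into that
// instance.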
78//
79// The "table" lowering implemented here has three components.
80// First kernels are assigned a unique integer identifier which is available in
81// functions it calls through the intrinsic amdgcn_lds_kernel_id. The integer
82// is passed through a specific SGPR, thus works with indirect calls.
83// Second, each kernel allocates LDS variables independent of other kernels and
84// writes the addresses it chose for each variable into an array in consistent
85// order. If the kernel does not allocate a given variable, it writes undef to
86// the corresponding array location. These arrays are written to a constant
87// table in the order matching the kernel unique integer identifier.
88// Third, uses from non-kernel functions are replaced with a table lookup using
89// the intrinsic function to find the address of the variable.
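//
// Sketched in IR (kernel count N, variable count M and the concrete types are
// placeholders), the table and a rewritten non-kernel use look roughly like:
//
//   @llvm.amdgcn.lds.offset.table = internal addrspace(4) constant
//       [N x [M x i32]] [...]
//
//   %id   = call i32 @llvm.amdgcn.lds.kernel.id()
//   %slot = getelementptr inbounds [N x [M x i32]],
//           ptr addrspace(4) @llvm.amdgcn.lds.offset.table, i32 0, i32 %id, i32 V
//   %off  = load i32, ptr addrspace(4) %slot
//   %ptr  = inttoptr i32 %off to ptr addrspace(3)
//
// where V is the index assigned to the variable being replaced.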
90//
91// "Kernel" lowering is only applicable for variables that are unambiguously
92// reachable from exactly one kernel. For those cases, accesses to the variable
93// can be lowered to ConstantExpr address of a struct instance specific to that
94// one kernel. This is zero cost in space and in compute. It will raise a fatal
95// error on any variable that might be reachable from multiple kernels and is
96// thus most easily used as part of the hybrid lowering strategy.
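//
// As a sketch (kernel and struct names assumed for illustration), a use
// lowered with the "kernel" strategy becomes a plain constant expression:
//
//   getelementptr inbounds (%llvm.amdgcn.kernel.k.lds.t,
//       ptr addrspace(3) @llvm.amdgcn.kernel.k.lds, i32 0, i32 2)
//
// so no instructions and no additional LDS are introduced.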
97//
98// Hybrid lowering is a mixture of the above. It uses the zero cost kernel
99// lowering where it can. It lowers the variable accessed by the greatest
100// number of kernels using the module strategy as that is free for the first
101// variable. Any further variables that can be lowered with the module strategy
102// without incurring LDS memory overhead are. The remaining ones are lowered
103// via table.
104//
105// Consequences
106// - No heuristics or user controlled magic numbers, hybrid is the right choice
107// - Kernels that don't use functions (or have had them all inlined) are not
108// affected by any lowering for kernels that do.
109// - Kernels that don't make indirect function calls are not affected by those
110// that do.
111// - Variables which are used by lots of kernels, e.g. those injected by a
112// language runtime in most kernels, are expected to have no overhead
113// - Implementations that instantiate templates per-kernel where those templates
114// use LDS are expected to hit the "Kernel" lowering strategy
115// - The runtime properties impose a cost in compiler implementation complexity
116//
117// Dynamic LDS implementation
118// Dynamic LDS is lowered similarly to the "table" strategy above and uses the
119// same intrinsic to identify which kernel is at the root of the dynamic call
120// graph. This relies on the specified behaviour that all dynamic LDS variables
121// alias one another, i.e. are at the same address, with respect to a given
122// kernel. Therefore this pass creates new dynamic LDS variables for each kernel
123// that allocates any dynamic LDS and builds a table of addresses out of those.
124// The AMDGPUPromoteAlloca pass skips kernels that use dynamic LDS.
125// The corresponding optimisation for "kernel" lowering where the table lookup
126// is elided is not implemented.
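//
// Sketched in IR for a kernel named "k" (alignment value is a placeholder),
// the per-kernel representative variable and the address table are roughly:
//
//   @llvm.amdgcn.k.dynlds = external addrspace(3) global [0 x i8], align 4
//   @llvm.amdgcn.dynlds.offset.table = internal addrspace(4) constant
//       [N x i32] [...]
//
// and non-kernel uses of any dynamic LDS variable load the address recorded
// for the kernel returned by llvm.amdgcn.lds.kernel.id.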
127//
128//
129// Implementation notes / limitations
130// A single LDS global variable represents an instance per kernel that can reach
131// said variable. This pass essentially specialises those variables per kernel.
132// Handling ConstantExpr during the pass complicated this significantly so now
133// all ConstantExpr uses of LDS variables are expanded to instructions. This
134// may need amending when implementing non-undef initialisers.
135//
136// Lowering is split between this IR pass and the back end. This pass chooses
137// where given variables should be allocated and marks them with metadata,
138// MD_absolute_symbol. The backend places the variables in coincidentally the
139// same location and raises a fatal error if something has gone awry. This works
140// in practice because the only pass between this one and the backend that
141// changes LDS is PromoteAlloca and the changes it makes do not conflict.
142//
143// Addresses are written to constant global arrays based on the same metadata.
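//
// For example (offset invented for illustration), a variable placed at
// address 8 carries the half open range [8, 9):
//
//   @llvm.amdgcn.kernel.k.lds = ..., !absolute_symbol !0
//   !0 = !{i32 8, i32 9}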
144//
145// The backend lowers LDS variables in the order of traversal of the function.
146// This is at odds with the deterministic layout required. The workaround is to
147// allocate the fixed-address variables immediately upon starting the function
148// where they can be placed as intended. This requires a means of mapping from
149// the function to the variables that it allocates. For the module scope lds,
150// this is via metadata indicating whether the variable is not required. If a
151// pass deletes that metadata, a fatal error on disagreement with the absolute
152// symbol metadata will occur. For kernel scope and dynamic, this is by _name_
153// correspondence between the function and the variable. It requires the
154// kernel to have a name (which is only a limitation for tests in practice) and
155// for nothing to rename the corresponding symbols. This is a hazard if the pass
156// is run multiple times during debugging. Alternative schemes considered all
157// involve bespoke metadata.
158//
159// If the name correspondence can be replaced, multiple distinct kernels that
160// have the same memory layout can map to the same kernel id (as the address
161// itself is handled by the absolute symbol metadata) and that will allow more
162// uses of the "kernel" style faster lowering and reduce the size of the lookup
163// tables.
164//
165// There is a test that checks this does not fire for a graphics shader. This
166// lowering is expected to work for graphics if the isKernel test is changed.
167//
168// The current markUsedByKernel is sufficient for PromoteAlloca but is elided
169// before codegen. Replacing this with an equivalent intrinsic which lasts until
170// shortly after the machine function lowering of LDS would help break the name
171// mapping. The other part needed is probably to amend PromoteAlloca to embed
172// the LDS variables it creates in the same struct created here. That avoids the
173// current hazard where a PromoteAlloca LDS variable might be allocated before
174// the kernel scope (and thus error on the address check). Given a new invariant
175// that no LDS variables exist outside of the structs managed here, and an
176// intrinsic that lasts until after the LDS frame lowering, it should be
177// possible to drop the name mapping and fold equivalent memory layouts.
178//
179//===----------------------------------------------------------------------===//
180
181#include "AMDGPU.h"
182#include "Utils/AMDGPUBaseInfo.h"
183#include "Utils/AMDGPUMemoryUtils.h"
184#include "llvm/ADT/BitVector.h"
185#include "llvm/ADT/DenseMap.h"
186#include "llvm/ADT/DenseSet.h"
187#include "llvm/ADT/STLExtras.h"
188#include "llvm/ADT/SetOperations.h"
189#include "llvm/ADT/SetVector.h"
190#include "llvm/Analysis/CallGraph.h"
191#include "llvm/IR/Constants.h"
192#include "llvm/IR/DerivedTypes.h"
193#include "llvm/IR/IRBuilder.h"
194#include "llvm/IR/InlineAsm.h"
195#include "llvm/IR/Instructions.h"
196#include "llvm/IR/IntrinsicsAMDGPU.h"
197#include "llvm/IR/MDBuilder.h"
198#include "llvm/IR/ReplaceConstant.h"
199#include "llvm/InitializePasses.h"
200#include "llvm/Pass.h"
201#include "llvm/Support/CommandLine.h"
202#include "llvm/Support/Debug.h"
203#include "llvm/Support/OptimizedStructLayout.h"
204#include "llvm/Transforms/Utils/ModuleUtils.h"
206
207#include <tuple>
208#include <vector>
209
210#include <cstdio>
211
212#define DEBUG_TYPE "amdgpu-lower-module-lds"
213
214using namespace llvm;
215
216namespace {
217
218cl::opt<bool> SuperAlignLDSGlobals(
219 "amdgpu-super-align-lds-globals",
220 cl::desc("Increase alignment of LDS if it is not on align boundary"),
221 cl::init(true), cl::Hidden);
222
223enum class LoweringKind { module, table, kernel, hybrid };
224cl::opt<LoweringKind> LoweringKindLoc(
225 "amdgpu-lower-module-lds-strategy",
226 cl::desc("Specify lowering strategy for function LDS access:"), cl::Hidden,
227    cl::init(LoweringKind::hybrid),
228    cl::values(
229        clEnumValN(LoweringKind::table, "table", "Lower via table lookup"),
230        clEnumValN(LoweringKind::module, "module", "Lower via module struct"),
231        clEnumValN(
232            LoweringKind::kernel, "kernel",
233 "Lower variables reachable from one kernel, otherwise abort"),
234 clEnumValN(LoweringKind::hybrid, "hybrid",
235 "Lower via mixture of above strategies")));
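// For example (invocation sketch, not taken from a test), the table strategy
// can be forced for experimentation by passing
// -amdgpu-lower-module-lds-strategy=table on the opt or llc command line.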
236
237bool isKernelLDS(const Function *F) {
238 // Some weirdness here. AMDGPU::isKernelCC does not call into
239 // AMDGPU::isKernel with the calling conv, it instead calls into
240 // isModuleEntryFunction which returns true for more calling conventions
241 // than AMDGPU::isKernel does. There's a FIXME on AMDGPU::isKernel.
242 // There's also a test that checks that the LDS lowering does not hit on
243 // a graphics shader, denoted amdgpu_ps, so stay with the limited case.
244 // Putting LDS in the name of the function to draw attention to this.
245 return AMDGPU::isKernel(F->getCallingConv());
246}
247
248class AMDGPULowerModuleLDS : public ModulePass {
249
250 static void
251 removeLocalVarsFromUsedLists(Module &M,
252 const DenseSet<GlobalVariable *> &LocalVars) {
253 // The verifier rejects used lists containing an inttoptr of a constant
254 // so remove the variables from these lists before replaceAllUsesWith
255 SmallPtrSet<Constant *, 8> LocalVarsSet;
256 for (GlobalVariable *LocalVar : LocalVars)
257 LocalVarsSet.insert(cast<Constant>(LocalVar->stripPointerCasts()));
258
259    removeFromUsedLists(
260        M, [&LocalVarsSet](Constant *C) { return LocalVarsSet.count(C); });
261
262 for (GlobalVariable *LocalVar : LocalVars)
263 LocalVar->removeDeadConstantUsers();
264 }
265
266 static void markUsedByKernel(IRBuilder<> &Builder, Function *Func,
267 GlobalVariable *SGV) {
268 // The llvm.amdgcn.module.lds instance is implicitly used by all kernels
269 // that might call a function which accesses a field within it. This is
270 // presently approximated to 'all kernels' if there are any such functions
271 // in the module. This implicit use is redefined as an explicit use here so
272 // that later passes, specifically PromoteAlloca, account for the required
273 // memory without any knowledge of this transform.
274
275 // An operand bundle on llvm.donothing works because the call instruction
276 // survives until after the last pass that needs to account for LDS. It is
277 // better than inline asm as the latter survives until the end of codegen. A
278 // totally robust solution would be a function with the same semantics as
279 // llvm.donothing that takes a pointer to the instance and is lowered to a
280 // no-op after LDS is allocated, but that is not presently necessary.
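    // For illustration, the marker this function emits prints roughly as
    // (operand shown is the struct instance created elsewhere in this pass):
    //   call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3)
    //                                   @llvm.amdgcn.module.lds) ]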
281
282 // This intrinsic is eliminated shortly before instruction selection. It
283 // does not suffice to indicate to ISel that a given global which is not
284 // immediately used by the kernel must still be allocated by it. An
285 // equivalent target specific intrinsic which lasts until immediately after
286 // codegen would suffice for that, but one would still need to ensure that
287    // the variables are allocated in the anticipated order.
288
289 LLVMContext &Ctx = Func->getContext();
290
291 Builder.SetInsertPoint(Func->getEntryBlock().getFirstNonPHI());
292
293    FunctionType *FTy = FunctionType::get(Type::getVoidTy(Ctx), {});
294
295 Function *Decl =
296 Intrinsic::getDeclaration(Func->getParent(), Intrinsic::donothing, {});
297
298 Value *UseInstance[1] = {Builder.CreateInBoundsGEP(
299 SGV->getValueType(), SGV, ConstantInt::get(Type::getInt32Ty(Ctx), 0))};
300
301 Builder.CreateCall(FTy, Decl, {},
302 {OperandBundleDefT<Value *>("ExplicitUse", UseInstance)},
303 "");
304 }
305
306 static bool eliminateConstantExprUsesOfLDSFromAllInstructions(Module &M) {
307 // Constants are uniqued within LLVM. A ConstantExpr referring to a LDS
308 // global may have uses from multiple different functions as a result.
309 // This pass specialises LDS variables with respect to the kernel that
310 // allocates them.
311
312 // This is semantically equivalent to (the unimplemented as slow):
313 // for (auto &F : M.functions())
314 // for (auto &BB : F)
315 // for (auto &I : BB)
316 // for (Use &Op : I.operands())
317 // if (constantExprUsesLDS(Op))
318 // replaceConstantExprInFunction(I, Op);
319
320 SmallVector<Constant *> LDSGlobals;
321 for (auto &GV : M.globals())
322      if (AMDGPU::isLDSVariableToLower(GV))
323        LDSGlobals.push_back(&GV);
324
325 return convertUsersOfConstantsToInstructions(LDSGlobals);
326 }
327
328public:
329 static char ID;
330
331 AMDGPULowerModuleLDS() : ModulePass(ID) {
332    initializeAMDGPULowerModuleLDSPass(*PassRegistry::getPassRegistry());
333  }
334
335 using FunctionVariableMap = DenseMap<Function *, DenseSet<GlobalVariable *>>;
336
337 using VariableFunctionMap = DenseMap<GlobalVariable *, DenseSet<Function *>>;
338
339 static void getUsesOfLDSByFunction(CallGraph const &CG, Module &M,
340 FunctionVariableMap &kernels,
341 FunctionVariableMap &functions) {
342
343 // Get uses from the current function, excluding uses by called functions
344 // Two output variables to avoid walking the globals list twice
345 for (auto &GV : M.globals()) {
346      if (!AMDGPU::isLDSVariableToLower(GV)) {
347        continue;
348 }
349
350 SmallVector<User *, 16> Stack(GV.users());
351 for (User *V : GV.users()) {
352 if (auto *I = dyn_cast<Instruction>(V)) {
353 Function *F = I->getFunction();
354 if (isKernelLDS(F)) {
355 kernels[F].insert(&GV);
356 } else {
357 functions[F].insert(&GV);
358 }
359 }
360 }
361 }
362 }
363
364 struct LDSUsesInfoTy {
365 FunctionVariableMap direct_access;
366 FunctionVariableMap indirect_access;
367 };
368
369 static LDSUsesInfoTy getTransitiveUsesOfLDS(CallGraph const &CG, Module &M) {
370
371 FunctionVariableMap direct_map_kernel;
372 FunctionVariableMap direct_map_function;
373 getUsesOfLDSByFunction(CG, M, direct_map_kernel, direct_map_function);
374
375 // Collect variables that are used by functions whose address has escaped
376 DenseSet<GlobalVariable *> VariablesReachableThroughFunctionPointer;
377 for (Function &F : M.functions()) {
378 if (!isKernelLDS(&F))
379 if (F.hasAddressTaken(nullptr,
380 /* IgnoreCallbackUses */ false,
381 /* IgnoreAssumeLikeCalls */ false,
382 /* IgnoreLLVMUsed */ true,
383 /* IgnoreArcAttachedCall */ false)) {
384 set_union(VariablesReachableThroughFunctionPointer,
385 direct_map_function[&F]);
386 }
387 }
388
389 auto functionMakesUnknownCall = [&](const Function *F) -> bool {
390 assert(!F->isDeclaration());
391 for (CallGraphNode::CallRecord R : *CG[F]) {
392 if (!R.second->getFunction()) {
393 return true;
394 }
395 }
396 return false;
397 };
398
399 // Work out which variables are reachable through function calls
400 FunctionVariableMap transitive_map_function = direct_map_function;
401
402 // If the function makes any unknown call, assume the worst case that it can
403 // access all variables accessed by functions whose address escaped
404 for (Function &F : M.functions()) {
405 if (!F.isDeclaration() && functionMakesUnknownCall(&F)) {
406 if (!isKernelLDS(&F)) {
407 set_union(transitive_map_function[&F],
408 VariablesReachableThroughFunctionPointer);
409 }
410 }
411 }
412
413 // Direct implementation of collecting all variables reachable from each
414 // function
415 for (Function &Func : M.functions()) {
416 if (Func.isDeclaration() || isKernelLDS(&Func))
417 continue;
418
419 DenseSet<Function *> seen; // catches cycles
420      SmallVector<Function *, 4> wip = {&Func};
421
422 while (!wip.empty()) {
423 Function *F = wip.pop_back_val();
424
425 // Can accelerate this by referring to transitive map for functions that
426 // have already been computed, with more care than this
427 set_union(transitive_map_function[&Func], direct_map_function[F]);
428
429 for (CallGraphNode::CallRecord R : *CG[F]) {
430 Function *ith = R.second->getFunction();
431 if (ith) {
432 if (!seen.contains(ith)) {
433 seen.insert(ith);
434 wip.push_back(ith);
435 }
436 }
437 }
438 }
439 }
440
441 // direct_map_kernel lists which variables are used by the kernel
442 // find the variables which are used through a function call
443 FunctionVariableMap indirect_map_kernel;
444
445 for (Function &Func : M.functions()) {
446 if (Func.isDeclaration() || !isKernelLDS(&Func))
447 continue;
448
449 for (CallGraphNode::CallRecord R : *CG[&Func]) {
450 Function *ith = R.second->getFunction();
451 if (ith) {
452 set_union(indirect_map_kernel[&Func], transitive_map_function[ith]);
453 } else {
454 set_union(indirect_map_kernel[&Func],
455 VariablesReachableThroughFunctionPointer);
456 }
457 }
458 }
459
460 return {std::move(direct_map_kernel), std::move(indirect_map_kernel)};
461 }
462
463 struct LDSVariableReplacement {
464 GlobalVariable *SGV = nullptr;
465 DenseMap<GlobalVariable *, Constant *> LDSVarsToConstantGEP;
466 };
467
468 // remap from lds global to a constantexpr gep to where it has been moved to
469 // for each kernel
470 // an array with an element for each kernel containing where the corresponding
471 // variable was remapped to
472
473 static Constant *getAddressesOfVariablesInKernel(
474      LLVMContext &Ctx, ArrayRef<GlobalVariable *> Variables,
475      DenseMap<GlobalVariable *, Constant *> &LDSVarsToConstantGEP) {
476 // Create a ConstantArray containing the address of each Variable within the
477 // kernel corresponding to LDSVarsToConstantGEP, or poison if that kernel
478 // does not allocate it
479 // TODO: Drop the ptrtoint conversion
480
481 Type *I32 = Type::getInt32Ty(Ctx);
482
483 ArrayType *KernelOffsetsType = ArrayType::get(I32, Variables.size());
484
485    SmallVector<Constant *> Elements;
486 for (size_t i = 0; i < Variables.size(); i++) {
487 GlobalVariable *GV = Variables[i];
488 if (LDSVarsToConstantGEP.count(GV) != 0) {
489 auto elt = ConstantExpr::getPtrToInt(LDSVarsToConstantGEP[GV], I32);
490 Elements.push_back(elt);
491 } else {
492 Elements.push_back(PoisonValue::get(I32));
493 }
494 }
495 return ConstantArray::get(KernelOffsetsType, Elements);
496 }
497
498 static GlobalVariable *buildLookupTable(
499      Module &M, ArrayRef<GlobalVariable *> Variables,
500      ArrayRef<Function *> kernels,
501      DenseMap<Function *, LDSVariableReplacement> &KernelToReplacement) {
502 if (Variables.empty()) {
503 return nullptr;
504 }
505 LLVMContext &Ctx = M.getContext();
506
507 const size_t NumberVariables = Variables.size();
508 const size_t NumberKernels = kernels.size();
509
510 ArrayType *KernelOffsetsType =
511 ArrayType::get(Type::getInt32Ty(Ctx), NumberVariables);
512
513 ArrayType *AllKernelsOffsetsType =
514 ArrayType::get(KernelOffsetsType, NumberKernels);
515
516 std::vector<Constant *> overallConstantExprElts(NumberKernels);
517 for (size_t i = 0; i < NumberKernels; i++) {
518 LDSVariableReplacement Replacement = KernelToReplacement[kernels[i]];
519 overallConstantExprElts[i] = getAddressesOfVariablesInKernel(
520 Ctx, Variables, Replacement.LDSVarsToConstantGEP);
521 }
522
523 Constant *init =
524 ConstantArray::get(AllKernelsOffsetsType, overallConstantExprElts);
525
526 return new GlobalVariable(
527 M, AllKernelsOffsetsType, true, GlobalValue::InternalLinkage, init,
528 "llvm.amdgcn.lds.offset.table", nullptr, GlobalValue::NotThreadLocal,
529        AMDGPUAS::CONSTANT_ADDRESS);
530  }
531
532 void replaceUseWithTableLookup(Module &M, IRBuilder<> &Builder,
533 GlobalVariable *LookupTable,
534 GlobalVariable *GV, Use &U,
535 Value *OptionalIndex) {
536 // Table is a constant array of the same length as OrderedKernels
537 LLVMContext &Ctx = M.getContext();
538 Type *I32 = Type::getInt32Ty(Ctx);
539 auto *I = cast<Instruction>(U.getUser());
540
541 Value *tableKernelIndex = getTableLookupKernelIndex(M, I->getFunction());
542
543 if (auto *Phi = dyn_cast<PHINode>(I)) {
544 BasicBlock *BB = Phi->getIncomingBlock(U);
545 Builder.SetInsertPoint(&(*(BB->getFirstInsertionPt())));
546 } else {
547 Builder.SetInsertPoint(I);
548 }
549
550 SmallVector<Value *, 3> GEPIdx = {
551 ConstantInt::get(I32, 0),
552 tableKernelIndex,
553 };
554 if (OptionalIndex)
555 GEPIdx.push_back(OptionalIndex);
556
557 Value *Address = Builder.CreateInBoundsGEP(
558 LookupTable->getValueType(), LookupTable, GEPIdx, GV->getName());
559
560 Value *loaded = Builder.CreateLoad(I32, Address);
561
562 Value *replacement =
563 Builder.CreateIntToPtr(loaded, GV->getType(), GV->getName());
564
565 U.set(replacement);
566 }
567
568 void replaceUsesInInstructionsWithTableLookup(
569 Module &M, ArrayRef<GlobalVariable *> ModuleScopeVariables,
570 GlobalVariable *LookupTable) {
571
572 LLVMContext &Ctx = M.getContext();
573 IRBuilder<> Builder(Ctx);
574 Type *I32 = Type::getInt32Ty(Ctx);
575
576 for (size_t Index = 0; Index < ModuleScopeVariables.size(); Index++) {
577 auto *GV = ModuleScopeVariables[Index];
578
579 for (Use &U : make_early_inc_range(GV->uses())) {
580 auto *I = dyn_cast<Instruction>(U.getUser());
581 if (!I)
582 continue;
583
584 replaceUseWithTableLookup(M, Builder, LookupTable, GV, U,
585 ConstantInt::get(I32, Index));
586 }
587 }
588 }
589
590 static DenseSet<Function *> kernelsThatIndirectlyAccessAnyOfPassedVariables(
591 Module &M, LDSUsesInfoTy &LDSUsesInfo,
592 DenseSet<GlobalVariable *> const &VariableSet) {
593
594 DenseSet<Function *> KernelSet;
595
596 if (VariableSet.empty())
597 return KernelSet;
598
599 for (Function &Func : M.functions()) {
600 if (Func.isDeclaration() || !isKernelLDS(&Func))
601 continue;
602 for (GlobalVariable *GV : LDSUsesInfo.indirect_access[&Func]) {
603 if (VariableSet.contains(GV)) {
604 KernelSet.insert(&Func);
605 break;
606 }
607 }
608 }
609
610 return KernelSet;
611 }
612
613 static GlobalVariable *
614 chooseBestVariableForModuleStrategy(const DataLayout &DL,
615 VariableFunctionMap &LDSVars) {
616 // Find the global variable with the most indirect uses from kernels
617
618 struct CandidateTy {
619 GlobalVariable *GV = nullptr;
620 size_t UserCount = 0;
621 size_t Size = 0;
622
623 CandidateTy() = default;
624
625 CandidateTy(GlobalVariable *GV, uint64_t UserCount, uint64_t AllocSize)
626 : GV(GV), UserCount(UserCount), Size(AllocSize) {}
627
628 bool operator<(const CandidateTy &Other) const {
629 // Fewer users makes module scope variable less attractive
630 if (UserCount < Other.UserCount) {
631 return true;
632 }
633 if (UserCount > Other.UserCount) {
634 return false;
635 }
636
637 // Bigger makes module scope variable less attractive
638 if (Size < Other.Size) {
639 return false;
640 }
641
642 if (Size > Other.Size) {
643 return true;
644 }
645
646 // Arbitrary but consistent
647 return GV->getName() < Other.GV->getName();
648 }
649 };
650
651 CandidateTy MostUsed;
652
653 for (auto &K : LDSVars) {
654 GlobalVariable *GV = K.first;
655 if (K.second.size() <= 1) {
656 // A variable reachable by only one kernel is best lowered with kernel
657 // strategy
658 continue;
659 }
660 CandidateTy Candidate(
661 GV, K.second.size(),
662 DL.getTypeAllocSize(GV->getValueType()).getFixedValue());
663 if (MostUsed < Candidate)
664 MostUsed = Candidate;
665 }
666
667 return MostUsed.GV;
668 }
669
670 static void recordLDSAbsoluteAddress(Module *M, GlobalVariable *GV,
671 uint32_t Address) {
672 // Write the specified address into metadata where it can be retrieved by
673 // the assembler. Format is a half open range, [Address Address+1)
674 LLVMContext &Ctx = M->getContext();
675 auto *IntTy =
676 M->getDataLayout().getIntPtrType(Ctx, AMDGPUAS::LOCAL_ADDRESS);
677 auto *MinC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address));
678 auto *MaxC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address + 1));
679 GV->setMetadata(LLVMContext::MD_absolute_symbol,
680 MDNode::get(Ctx, {MinC, MaxC}));
681 }
682
683 DenseMap<Function *, Value *> tableKernelIndexCache;
684 Value *getTableLookupKernelIndex(Module &M, Function *F) {
685 // Accesses from a function use the amdgcn_lds_kernel_id intrinsic which
686 // lowers to a read from a live in register. Emit it once in the entry
687 // block to spare deduplicating it later.
688 if (tableKernelIndexCache.count(F) == 0) {
689 LLVMContext &Ctx = M.getContext();
690      IRBuilder<> Builder(Ctx);
691      FunctionType *FTy = FunctionType::get(Type::getInt32Ty(Ctx), {});
692      Function *Decl =
693          Intrinsic::getDeclaration(&M, Intrinsic::amdgcn_lds_kernel_id, {});
694
695      auto it =
696          F->getEntryBlock().getFirstNonPHIOrDbgOrAlloca();
697 Instruction &i = *it;
698 Builder.SetInsertPoint(&i);
699
700 tableKernelIndexCache[F] = Builder.CreateCall(FTy, Decl, {});
701 }
702
703 return tableKernelIndexCache[F];
704 }
705
706 static std::vector<Function *> assignLDSKernelIDToEachKernel(
707 Module *M, DenseSet<Function *> const &KernelsThatAllocateTableLDS,
708 DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS) {
709    // Associate kernels in the set with an arbitrary but reproducible order and
710 // annotate them with that order in metadata. This metadata is recognised by
711 // the backend and lowered to a SGPR which can be read from using
712 // amdgcn_lds_kernel_id.
713
714 std::vector<Function *> OrderedKernels;
715 if (!KernelsThatAllocateTableLDS.empty() ||
716 !KernelsThatIndirectlyAllocateDynamicLDS.empty()) {
717
718 for (Function &Func : M->functions()) {
719 if (Func.isDeclaration())
720 continue;
721 if (!isKernelLDS(&Func))
722 continue;
723
724 if (KernelsThatAllocateTableLDS.contains(&Func) ||
725 KernelsThatIndirectlyAllocateDynamicLDS.contains(&Func)) {
726 assert(Func.hasName()); // else fatal error earlier
727 OrderedKernels.push_back(&Func);
728 }
729 }
730
731 // Put them in an arbitrary but reproducible order
732 llvm::sort(OrderedKernels.begin(), OrderedKernels.end(),
733 [](const Function *lhs, const Function *rhs) -> bool {
734 return lhs->getName() < rhs->getName();
735 });
736
737 // Annotate the kernels with their order in this vector
738 LLVMContext &Ctx = M->getContext();
739 IRBuilder<> Builder(Ctx);
740
741 if (OrderedKernels.size() > UINT32_MAX) {
742 // 32 bit keeps it in one SGPR. > 2**32 kernels won't fit on the GPU
743 report_fatal_error("Unimplemented LDS lowering for > 2**32 kernels");
744 }
745
746 for (size_t i = 0; i < OrderedKernels.size(); i++) {
747 Metadata *AttrMDArgs[1] = {
748 ConstantAsMetadata::get(Builder.getInt32(i)),
749 };
750 OrderedKernels[i]->setMetadata("llvm.amdgcn.lds.kernel.id",
751 MDNode::get(Ctx, AttrMDArgs));
752 }
753 }
754 return OrderedKernels;
755 }
756
757 static void partitionVariablesIntoIndirectStrategies(
758 Module &M, LDSUsesInfoTy const &LDSUsesInfo,
759 VariableFunctionMap &LDSToKernelsThatNeedToAccessItIndirectly,
760 DenseSet<GlobalVariable *> &ModuleScopeVariables,
761 DenseSet<GlobalVariable *> &TableLookupVariables,
762 DenseSet<GlobalVariable *> &KernelAccessVariables,
763 DenseSet<GlobalVariable *> &DynamicVariables) {
764
765 GlobalVariable *HybridModuleRoot =
766 LoweringKindLoc != LoweringKind::hybrid
767 ? nullptr
768 : chooseBestVariableForModuleStrategy(
769 M.getDataLayout(), LDSToKernelsThatNeedToAccessItIndirectly);
770
771 DenseSet<Function *> const EmptySet;
772 DenseSet<Function *> const &HybridModuleRootKernels =
773 HybridModuleRoot
774 ? LDSToKernelsThatNeedToAccessItIndirectly[HybridModuleRoot]
775 : EmptySet;
776
777 for (auto &K : LDSToKernelsThatNeedToAccessItIndirectly) {
778 // Each iteration of this loop assigns exactly one global variable to
779 // exactly one of the implementation strategies.
780
781 GlobalVariable *GV = K.first;
783 assert(K.second.size() != 0);
784
785 if (AMDGPU::isDynamicLDS(*GV)) {
786 DynamicVariables.insert(GV);
787 continue;
788 }
789
790 switch (LoweringKindLoc) {
791 case LoweringKind::module:
792 ModuleScopeVariables.insert(GV);
793 break;
794
795 case LoweringKind::table:
796 TableLookupVariables.insert(GV);
797 break;
798
799 case LoweringKind::kernel:
800 if (K.second.size() == 1) {
801 KernelAccessVariables.insert(GV);
802 } else {
803          report_fatal_error(
804              "cannot lower LDS '" + GV->getName() +
805 "' to kernel access as it is reachable from multiple kernels");
806 }
807 break;
808
809 case LoweringKind::hybrid: {
810 if (GV == HybridModuleRoot) {
811 assert(K.second.size() != 1);
812 ModuleScopeVariables.insert(GV);
813 } else if (K.second.size() == 1) {
814 KernelAccessVariables.insert(GV);
815 } else if (set_is_subset(K.second, HybridModuleRootKernels)) {
816 ModuleScopeVariables.insert(GV);
817 } else {
818 TableLookupVariables.insert(GV);
819 }
820 break;
821 }
822 }
823 }
824
825 // All LDS variables accessed indirectly have now been partitioned into
826 // the distinct lowering strategies.
827 assert(ModuleScopeVariables.size() + TableLookupVariables.size() +
828 KernelAccessVariables.size() + DynamicVariables.size() ==
829 LDSToKernelsThatNeedToAccessItIndirectly.size());
830 }
831
832 static GlobalVariable *lowerModuleScopeStructVariables(
833 Module &M, DenseSet<GlobalVariable *> const &ModuleScopeVariables,
834 DenseSet<Function *> const &KernelsThatAllocateModuleLDS) {
835 // Create a struct to hold the ModuleScopeVariables
836 // Replace all uses of those variables from non-kernel functions with the
837 // new struct instance Replace only the uses from kernel functions that will
838 // allocate this instance. That is a space optimisation - kernels that use a
839 // subset of the module scope struct and do not need to allocate it for
840 // indirect calls will only allocate the subset they use (they do so as part
841 // of the per-kernel lowering).
842 if (ModuleScopeVariables.empty()) {
843 return nullptr;
844 }
845
846 LLVMContext &Ctx = M.getContext();
847
848 LDSVariableReplacement ModuleScopeReplacement =
849 createLDSVariableReplacement(M, "llvm.amdgcn.module.lds",
850 ModuleScopeVariables);
851
852 appendToCompilerUsed(M, {static_cast<GlobalValue *>(
853                               ConstantExpr::getPointerBitCastOrAddrSpaceCast(
854                                   cast<Constant>(ModuleScopeReplacement.SGV),
855 Type::getInt8PtrTy(Ctx)))});
856
857 // module.lds will be allocated at zero in any kernel that allocates it
858 recordLDSAbsoluteAddress(&M, ModuleScopeReplacement.SGV, 0);
859
860 // historic
861 removeLocalVarsFromUsedLists(M, ModuleScopeVariables);
862
863 // Replace all uses of module scope variable from non-kernel functions
864 replaceLDSVariablesWithStruct(
865 M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
866 Instruction *I = dyn_cast<Instruction>(U.getUser());
867 if (!I) {
868 return false;
869 }
870 Function *F = I->getFunction();
871 return !isKernelLDS(F);
872 });
873
874 // Replace uses of module scope variable from kernel functions that
875 // allocate the module scope variable, otherwise leave them unchanged
876 // Record on each kernel whether the module scope global is used by it
877
878 IRBuilder<> Builder(Ctx);
879
880 for (Function &Func : M.functions()) {
881 if (Func.isDeclaration() || !isKernelLDS(&Func))
882 continue;
883
884 if (KernelsThatAllocateModuleLDS.contains(&Func)) {
885 replaceLDSVariablesWithStruct(
886 M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
887 Instruction *I = dyn_cast<Instruction>(U.getUser());
888 if (!I) {
889 return false;
890 }
891 Function *F = I->getFunction();
892 return F == &Func;
893 });
894
895 markUsedByKernel(Builder, &Func, ModuleScopeReplacement.SGV);
896
897 } else {
898 markElideModuleLDS(Func);
899 }
900 }
901
902 return ModuleScopeReplacement.SGV;
903 }
904
905  static DenseMap<Function *, LDSVariableReplacement>
906  lowerKernelScopeStructVariables(
907 Module &M, LDSUsesInfoTy &LDSUsesInfo,
908 DenseSet<GlobalVariable *> const &ModuleScopeVariables,
909 DenseSet<Function *> const &KernelsThatAllocateModuleLDS,
910 GlobalVariable *MaybeModuleScopeStruct) {
911
912 // Create a struct for each kernel for the non-module-scope variables.
913
914    DenseMap<Function *, LDSVariableReplacement> KernelToReplacement;
915 for (Function &Func : M.functions()) {
916 if (Func.isDeclaration() || !isKernelLDS(&Func))
917 continue;
918
919 DenseSet<GlobalVariable *> KernelUsedVariables;
920 // Allocating variables that are used directly in this struct to get
921 // alignment aware allocation and predictable frame size.
922 for (auto &v : LDSUsesInfo.direct_access[&Func]) {
923 if (!AMDGPU::isDynamicLDS(*v)) {
924 KernelUsedVariables.insert(v);
925 }
926 }
927
928 // Allocating variables that are accessed indirectly so that a lookup of
929 // this struct instance can find them from nested functions.
930 for (auto &v : LDSUsesInfo.indirect_access[&Func]) {
931 if (!AMDGPU::isDynamicLDS(*v)) {
932 KernelUsedVariables.insert(v);
933 }
934 }
935
936 // Variables allocated in module lds must all resolve to that struct,
937 // not to the per-kernel instance.
938 if (KernelsThatAllocateModuleLDS.contains(&Func)) {
939 for (GlobalVariable *v : ModuleScopeVariables) {
940 KernelUsedVariables.erase(v);
941 }
942 }
943
944 if (KernelUsedVariables.empty()) {
945 // Either used no LDS, or the LDS it used was all in the module struct
946 // or dynamically sized
947 continue;
948 }
949
950 // The association between kernel function and LDS struct is done by
951 // symbol name, which only works if the function in question has a
952      // name. This is not expected to be a problem in practice as kernels
953 // are called by name making anonymous ones (which are named by the
954 // backend) difficult to use. This does mean that llvm test cases need
955 // to name the kernels.
956 if (!Func.hasName()) {
957 report_fatal_error("Anonymous kernels cannot use LDS variables");
958 }
959
960 std::string VarName =
961 (Twine("llvm.amdgcn.kernel.") + Func.getName() + ".lds").str();
962
963 auto Replacement =
964 createLDSVariableReplacement(M, VarName, KernelUsedVariables);
965
966 // remove preserves existing codegen
967 removeLocalVarsFromUsedLists(M, KernelUsedVariables);
968 KernelToReplacement[&Func] = Replacement;
969
970 // Rewrite uses within kernel to the new struct
971 replaceLDSVariablesWithStruct(
972 M, KernelUsedVariables, Replacement, [&Func](Use &U) {
973 Instruction *I = dyn_cast<Instruction>(U.getUser());
974 return I && I->getFunction() == &Func;
975 });
976 }
977 return KernelToReplacement;
978 }
979
980 static GlobalVariable *
981 buildRepresentativeDynamicLDSInstance(Module &M, LDSUsesInfoTy &LDSUsesInfo,
982 Function *func) {
983 // Create a dynamic lds variable with a name associated with the passed
984 // function that has the maximum alignment of any dynamic lds variable
985 // reachable from this kernel. Dynamic LDS is allocated after the static LDS
986 // allocation, possibly after alignment padding. The representative variable
987 // created here has the maximum alignment of any other dynamic variable
988 // reachable by that kernel. All dynamic LDS variables are allocated at the
989 // same address in each kernel in order to provide the documented aliasing
990 // semantics. Setting the alignment here allows this IR pass to accurately
991 // predict the exact constant at which it will be allocated.
992
993 assert(isKernelLDS(func));
994
995 LLVMContext &Ctx = M.getContext();
996 const DataLayout &DL = M.getDataLayout();
997 Align MaxDynamicAlignment(1);
998
999 auto UpdateMaxAlignment = [&MaxDynamicAlignment, &DL](GlobalVariable *GV) {
1000 if (AMDGPU::isDynamicLDS(*GV)) {
1001 MaxDynamicAlignment =
1002 std::max(MaxDynamicAlignment, AMDGPU::getAlign(DL, GV));
1003 }
1004 };
1005
1006 for (GlobalVariable *GV : LDSUsesInfo.indirect_access[func]) {
1007 UpdateMaxAlignment(GV);
1008 }
1009
1010 for (GlobalVariable *GV : LDSUsesInfo.direct_access[func]) {
1011 UpdateMaxAlignment(GV);
1012 }
1013
1014 assert(func->hasName()); // Checked by caller
1015 auto emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
1016    GlobalVariable *N = new GlobalVariable(
1017        M, emptyCharArray, false, GlobalValue::ExternalLinkage, nullptr,
1018 Twine("llvm.amdgcn." + func->getName() + ".dynlds"), nullptr, GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS,
1019 false);
1020 N->setAlignment(MaxDynamicAlignment);
1021
1023 return N;
1024 }
1025
1026 DenseMap<Function *, GlobalVariable *> lowerDynamicLDSVariables(
1027 Module &M, LDSUsesInfoTy &LDSUsesInfo,
1028 DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS,
1029 DenseSet<GlobalVariable *> const &DynamicVariables,
1030 std::vector<Function *> const &OrderedKernels) {
1031 DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS;
1032 if (!KernelsThatIndirectlyAllocateDynamicLDS.empty()) {
1033 LLVMContext &Ctx = M.getContext();
1034 IRBuilder<> Builder(Ctx);
1035 Type *I32 = Type::getInt32Ty(Ctx);
1036
1037 std::vector<Constant *> newDynamicLDS;
1038
1039 // Table is built in the same order as OrderedKernels
1040 for (auto &func : OrderedKernels) {
1041
1042 if (KernelsThatIndirectlyAllocateDynamicLDS.contains(func)) {
1043 assert(isKernelLDS(func));
1044 if (!func->hasName()) {
1045 report_fatal_error("Anonymous kernels cannot use LDS variables");
1046 }
1047
1048 GlobalVariable *N =
1049 buildRepresentativeDynamicLDSInstance(M, LDSUsesInfo, func);
1050
1051 KernelToCreatedDynamicLDS[func] = N;
1052
1053 markUsedByKernel(Builder, func, N);
1054
1055 auto emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
1056          Constant *GEP = ConstantExpr::getGetElementPtr(
1057              emptyCharArray, N, ConstantInt::get(I32, 0), true);
1058 newDynamicLDS.push_back(ConstantExpr::getPtrToInt(GEP, I32));
1059 } else {
1060 newDynamicLDS.push_back(PoisonValue::get(I32));
1061 }
1062 }
1063 assert(OrderedKernels.size() == newDynamicLDS.size());
1064
1065 ArrayType *t = ArrayType::get(I32, newDynamicLDS.size());
1066 Constant *init = ConstantArray::get(t, newDynamicLDS);
1067 GlobalVariable *table = new GlobalVariable(
1068 M, t, true, GlobalValue::InternalLinkage, init,
1069 "llvm.amdgcn.dynlds.offset.table", nullptr,
1070          GlobalValue::NotThreadLocal, AMDGPUAS::CONSTANT_ADDRESS);
1071
1072 for (GlobalVariable *GV : DynamicVariables) {
1073 for (Use &U : make_early_inc_range(GV->uses())) {
1074 auto *I = dyn_cast<Instruction>(U.getUser());
1075 if (!I)
1076 continue;
1077 if (isKernelLDS(I->getFunction()))
1078 continue;
1079
1080 replaceUseWithTableLookup(M, Builder, table, GV, U, nullptr);
1081 }
1082 }
1083 }
1084 return KernelToCreatedDynamicLDS;
1085 }
1086
1087 static bool canElideModuleLDS(const Function &F) {
1088 return F.hasFnAttribute("amdgpu-elide-module-lds");
1089 }
1090
1091 static void markElideModuleLDS(Function &F) {
1092 F.addFnAttr("amdgpu-elide-module-lds");
1093 }
1094
1095 bool runOnModule(Module &M) override {
1096 CallGraph CG = CallGraph(M);
1097 bool Changed = superAlignLDSGlobals(M);
1098
1099 Changed |= eliminateConstantExprUsesOfLDSFromAllInstructions(M);
1100
1101 Changed = true; // todo: narrow this down
1102
1103 // For each kernel, what variables does it access directly or through
1104 // callees
1105 LDSUsesInfoTy LDSUsesInfo = getTransitiveUsesOfLDS(CG, M);
1106
1107 // For each variable accessed through callees, which kernels access it
1108 VariableFunctionMap LDSToKernelsThatNeedToAccessItIndirectly;
1109 for (auto &K : LDSUsesInfo.indirect_access) {
1110 Function *F = K.first;
1111 assert(isKernelLDS(F));
1112 for (GlobalVariable *GV : K.second) {
1113 LDSToKernelsThatNeedToAccessItIndirectly[GV].insert(F);
1114 }
1115 }
1116
1117 // Partition variables accessed indirectly into the different strategies
1118 DenseSet<GlobalVariable *> ModuleScopeVariables;
1119 DenseSet<GlobalVariable *> TableLookupVariables;
1120 DenseSet<GlobalVariable *> KernelAccessVariables;
1121 DenseSet<GlobalVariable *> DynamicVariables;
1122 partitionVariablesIntoIndirectStrategies(
1123 M, LDSUsesInfo, LDSToKernelsThatNeedToAccessItIndirectly,
1124 ModuleScopeVariables, TableLookupVariables, KernelAccessVariables,
1125 DynamicVariables);
1126
1127 // If the kernel accesses a variable that is going to be stored in the
1128 // module instance through a call then that kernel needs to allocate the
1129 // module instance
1130 DenseSet<Function *> KernelsThatAllocateModuleLDS =
1131 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1132 ModuleScopeVariables);
1133 DenseSet<Function *> KernelsThatAllocateTableLDS =
1134 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1135 TableLookupVariables);
1136
1137 DenseSet<Function *> KernelsThatIndirectlyAllocateDynamicLDS =
1138 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1139 DynamicVariables);
1140
1141 GlobalVariable *MaybeModuleScopeStruct = lowerModuleScopeStructVariables(
1142 M, ModuleScopeVariables, KernelsThatAllocateModuleLDS);
1143
1144    DenseMap<Function *, LDSVariableReplacement> KernelToReplacement =
1145 lowerKernelScopeStructVariables(M, LDSUsesInfo, ModuleScopeVariables,
1146 KernelsThatAllocateModuleLDS,
1147 MaybeModuleScopeStruct);
1148
1149 // Lower zero cost accesses to the kernel instances just created
1150 for (auto &GV : KernelAccessVariables) {
1151 auto &funcs = LDSToKernelsThatNeedToAccessItIndirectly[GV];
1152 assert(funcs.size() == 1); // Only one kernel can access it
1153 LDSVariableReplacement Replacement =
1154 KernelToReplacement[*(funcs.begin())];
1155
1156      DenseSet<GlobalVariable *> Vec;
1157      Vec.insert(GV);
1158
1159 // TODO: Looks like a latent bug, Replacement may not be marked
1160 // UsedByKernel here
1161 replaceLDSVariablesWithStruct(M, Vec, Replacement, [](Use &U) {
1162 return isa<Instruction>(U.getUser());
1163 });
1164 }
1165
1166 // The ith element of this vector is kernel id i
1167 std::vector<Function *> OrderedKernels =
1168 assignLDSKernelIDToEachKernel(&M, KernelsThatAllocateTableLDS,
1169 KernelsThatIndirectlyAllocateDynamicLDS);
1170
1171 if (!KernelsThatAllocateTableLDS.empty()) {
1172 LLVMContext &Ctx = M.getContext();
1173 IRBuilder<> Builder(Ctx);
1174
1175 for (size_t i = 0; i < OrderedKernels.size(); i++) {
1176 markUsedByKernel(Builder, OrderedKernels[i],
1177 KernelToReplacement[OrderedKernels[i]].SGV);
1178 }
1179
1180 // The order must be consistent between lookup table and accesses to
1181 // lookup table
1182 std::vector<GlobalVariable *> TableLookupVariablesOrdered(
1183 TableLookupVariables.begin(), TableLookupVariables.end());
1184 llvm::sort(TableLookupVariablesOrdered.begin(),
1185 TableLookupVariablesOrdered.end(),
1186 [](const GlobalVariable *lhs, const GlobalVariable *rhs) {
1187 return lhs->getName() < rhs->getName();
1188 });
1189
1190 GlobalVariable *LookupTable = buildLookupTable(
1191 M, TableLookupVariablesOrdered, OrderedKernels, KernelToReplacement);
1192 replaceUsesInInstructionsWithTableLookup(M, TableLookupVariablesOrdered,
1193 LookupTable);
1194 }
1195
1196 DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS =
1197 lowerDynamicLDSVariables(M, LDSUsesInfo,
1198 KernelsThatIndirectlyAllocateDynamicLDS,
1199 DynamicVariables, OrderedKernels);
1200
1201 // All kernel frames have been allocated. Calculate and record the
1202 // addresses.
1203
1204 {
1205 const DataLayout &DL = M.getDataLayout();
1206
1207 for (Function &Func : M.functions()) {
1208 if (Func.isDeclaration() || !isKernelLDS(&Func))
1209 continue;
1210
1211 // All three of these are optional. The first variable is allocated at
1212 // zero. They are allocated by allocateKnownAddressLDSGlobal in the
1213 // following order:
1214 //{
1215 // module.lds
1216 // alignment padding
1217 // kernel instance
1218 // alignment padding
1219 // dynamic lds variables
1220 //}
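        // Worked example (sizes invented for illustration): if module.lds
        // occupies 4 bytes and the kernel struct requires align 8, Offset is 4
        // after the module struct, alignTo below rounds it up to 8, the kernel
        // struct is recorded at 8, and any dynamic LDS variable is placed
        // after the struct's size plus further alignment padding.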
1221
1222 const bool AllocateModuleScopeStruct =
1223 MaybeModuleScopeStruct && !canElideModuleLDS(Func);
1224
1225 const bool AllocateKernelScopeStruct =
1226 KernelToReplacement.contains(&Func);
1227
1228 const bool AllocateDynamicVariable =
1229 KernelToCreatedDynamicLDS.contains(&Func);
1230
1231 uint32_t Offset = 0;
1232
1233 if (AllocateModuleScopeStruct) {
1234 // Allocated at zero, recorded once on construction, not once per
1235 // kernel
1236 Offset += DL.getTypeAllocSize(MaybeModuleScopeStruct->getValueType());
1237 }
1238
1239 if (AllocateKernelScopeStruct) {
1240 GlobalVariable *KernelStruct = KernelToReplacement[&Func].SGV;
1241
1242 Offset = alignTo(Offset, AMDGPU::getAlign(DL, KernelStruct));
1243
1244 recordLDSAbsoluteAddress(&M, KernelStruct, Offset);
1245
1246 Offset += DL.getTypeAllocSize(KernelStruct->getValueType());
1247
1248 }
1249
1250 if (AllocateDynamicVariable) {
1251 GlobalVariable *DynamicVariable = KernelToCreatedDynamicLDS[&Func];
1252
1253 Offset = alignTo(Offset, AMDGPU::getAlign(DL, DynamicVariable));
1254
1255 recordLDSAbsoluteAddress(&M, DynamicVariable, Offset);
1256 }
1257 }
1258 }
1259
1260 for (auto &GV : make_early_inc_range(M.globals()))
1261      if (AMDGPU::isLDSVariableToLower(GV)) {
1262        // probably want to remove from used lists
1263        GV.removeDeadConstantUsers();
1264        if (GV.use_empty())
1265 GV.eraseFromParent();
1266 }
1267
1268 return Changed;
1269 }
1270
1271private:
1272 // Increase the alignment of LDS globals if necessary to maximise the chance
1273 // that we can use aligned LDS instructions to access them.
1274 static bool superAlignLDSGlobals(Module &M) {
1275 const DataLayout &DL = M.getDataLayout();
1276 bool Changed = false;
1277 if (!SuperAlignLDSGlobals) {
1278 return Changed;
1279 }
1280
1281 for (auto &GV : M.globals()) {
1282      if (!AMDGPU::isLDSVariableToLower(GV)) {
1283        // Only changing alignment of LDS variables
1284 continue;
1285 }
1286 if (!GV.hasInitializer()) {
1287 // cuda/hip extern __shared__ variable, leave alignment alone
1288 continue;
1289 }
1290
1291 Align Alignment = AMDGPU::getAlign(DL, &GV);
1292 TypeSize GVSize = DL.getTypeAllocSize(GV.getValueType());
1293
1294 if (GVSize > 8) {
1295 // We might want to use a b96 or b128 load/store
1296 Alignment = std::max(Alignment, Align(16));
1297 } else if (GVSize > 4) {
1298 // We might want to use a b64 load/store
1299 Alignment = std::max(Alignment, Align(8));
1300 } else if (GVSize > 2) {
1301 // We might want to use a b32 load/store
1302 Alignment = std::max(Alignment, Align(4));
1303 } else if (GVSize > 1) {
1304 // We might want to use a b16 load/store
1305 Alignment = std::max(Alignment, Align(2));
1306 }
1307
1308 if (Alignment != AMDGPU::getAlign(DL, &GV)) {
1309 Changed = true;
1310 GV.setAlignment(Alignment);
1311 }
1312 }
1313 return Changed;
1314 }
1315
1316 static LDSVariableReplacement createLDSVariableReplacement(
1317 Module &M, std::string VarName,
1318 DenseSet<GlobalVariable *> const &LDSVarsToTransform) {
1319 // Create a struct instance containing LDSVarsToTransform and map from those
1320 // variables to ConstantExprGEP
1321 // Variables may be introduced to meet alignment requirements. No aliasing
1322 // metadata is useful for these as they have no uses. Erased before return.
1323
1324 LLVMContext &Ctx = M.getContext();
1325 const DataLayout &DL = M.getDataLayout();
1326 assert(!LDSVarsToTransform.empty());
1327
1328    SmallVector<OptimizedStructLayoutField, 8> LayoutFields;
1329    LayoutFields.reserve(LDSVarsToTransform.size());
1330 {
1331 // The order of fields in this struct depends on the order of
1332      // variables in the argument which varies when changing how they
1333 // are identified, leading to spurious test breakage.
1334 std::vector<GlobalVariable *> Sorted(LDSVarsToTransform.begin(),
1335 LDSVarsToTransform.end());
1336 llvm::sort(Sorted.begin(), Sorted.end(),
1337 [](const GlobalVariable *lhs, const GlobalVariable *rhs) {
1338 return lhs->getName() < rhs->getName();
1339 });
1340 for (GlobalVariable *GV : Sorted) {
1341        OptimizedStructLayoutField F(GV,
1342                                     DL.getTypeAllocSize(GV->getValueType()),
1343 AMDGPU::getAlign(DL, GV));
1344 LayoutFields.emplace_back(F);
1345 }
1346 }
1347
1348 performOptimizedStructLayout(LayoutFields);
1349
1350 std::vector<GlobalVariable *> LocalVars;
1351 BitVector IsPaddingField;
1352 LocalVars.reserve(LDSVarsToTransform.size()); // will be at least this large
1353 IsPaddingField.reserve(LDSVarsToTransform.size());
1354 {
1355 uint64_t CurrentOffset = 0;
1356 for (size_t I = 0; I < LayoutFields.size(); I++) {
1357 GlobalVariable *FGV = static_cast<GlobalVariable *>(
1358 const_cast<void *>(LayoutFields[I].Id));
1359 Align DataAlign = LayoutFields[I].Alignment;
1360
1361 uint64_t DataAlignV = DataAlign.value();
1362 if (uint64_t Rem = CurrentOffset % DataAlignV) {
1363 uint64_t Padding = DataAlignV - Rem;
1364
1365 // Append an array of padding bytes to meet alignment requested
1366 // Note (o + (a - (o % a)) ) % a == 0
1367 // (offset + Padding ) % align == 0
1368
1369 Type *ATy = ArrayType::get(Type::getInt8Ty(Ctx), Padding);
1370 LocalVars.push_back(new GlobalVariable(
1371 M, ATy, false, GlobalValue::InternalLinkage, UndefValue::get(ATy),
1372            "", nullptr, GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS,
1373            false));
1374 IsPaddingField.push_back(true);
1375 CurrentOffset += Padding;
1376 }
1377
1378 LocalVars.push_back(FGV);
1379 IsPaddingField.push_back(false);
1380 CurrentOffset += LayoutFields[I].Size;
1381 }
1382 }
1383
1384 std::vector<Type *> LocalVarTypes;
1385 LocalVarTypes.reserve(LocalVars.size());
1386 std::transform(
1387 LocalVars.cbegin(), LocalVars.cend(), std::back_inserter(LocalVarTypes),
1388 [](const GlobalVariable *V) -> Type * { return V->getValueType(); });
1389
1390 StructType *LDSTy = StructType::create(Ctx, LocalVarTypes, VarName + ".t");
1391
1392 Align StructAlign = AMDGPU::getAlign(DL, LocalVars[0]);
1393
1394 GlobalVariable *SGV = new GlobalVariable(
1395 M, LDSTy, false, GlobalValue::InternalLinkage, UndefValue::get(LDSTy),
1396        VarName, nullptr, GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS,
1397        false);
1398 SGV->setAlignment(StructAlign);
1399
1400    DenseMap<GlobalVariable *, Constant *> Map;
1401    Type *I32 = Type::getInt32Ty(Ctx);
1402 for (size_t I = 0; I < LocalVars.size(); I++) {
1403 GlobalVariable *GV = LocalVars[I];
1404 Constant *GEPIdx[] = {ConstantInt::get(I32, 0), ConstantInt::get(I32, I)};
1405 Constant *GEP = ConstantExpr::getGetElementPtr(LDSTy, SGV, GEPIdx, true);
1406 if (IsPaddingField[I]) {
1407 assert(GV->use_empty());
1408 GV->eraseFromParent();
1409 } else {
1410 Map[GV] = GEP;
1411 }
1412 }
1413 assert(Map.size() == LDSVarsToTransform.size());
1414 return {SGV, std::move(Map)};
1415 }
1416
1417 template <typename PredicateTy>
1418 static void replaceLDSVariablesWithStruct(
1419 Module &M, DenseSet<GlobalVariable *> const &LDSVarsToTransformArg,
1420 LDSVariableReplacement Replacement, PredicateTy Predicate) {
1421 LLVMContext &Ctx = M.getContext();
1422 const DataLayout &DL = M.getDataLayout();
1423
1424 // A hack... we need to insert the aliasing info in a predictable order for
1425 // lit tests. Would like to have them in a stable order already, ideally the
1426 // same order they get allocated, which might mean an ordered set container
1427 std::vector<GlobalVariable *> LDSVarsToTransform(
1428 LDSVarsToTransformArg.begin(), LDSVarsToTransformArg.end());
1429 llvm::sort(LDSVarsToTransform.begin(), LDSVarsToTransform.end(),
1430 [](const GlobalVariable *lhs, const GlobalVariable *rhs) {
1431 return lhs->getName() < rhs->getName();
1432 });
1433
1434 // Create alias.scope and their lists. Each field in the new structure
1435 // does not alias with all other fields.
1436 SmallVector<MDNode *> AliasScopes;
1437 SmallVector<Metadata *> NoAliasList;
1438 const size_t NumberVars = LDSVarsToTransform.size();
1439 if (NumberVars > 1) {
1440 MDBuilder MDB(Ctx);
1441 AliasScopes.reserve(NumberVars);
1442      MDNode *Domain = MDB.createAnonymousAliasScopeDomain();
1443      for (size_t I = 0; I < NumberVars; I++) {
1444        MDNode *Scope = MDB.createAnonymousAliasScope(Domain);
1445        AliasScopes.push_back(Scope);
1446 }
1447 NoAliasList.append(&AliasScopes[1], AliasScopes.end());
1448 }
1449
1450 // Replace uses of ith variable with a constantexpr to the corresponding
1451 // field of the instance that will be allocated by AMDGPUMachineFunction
1452 for (size_t I = 0; I < NumberVars; I++) {
1453 GlobalVariable *GV = LDSVarsToTransform[I];
1454 Constant *GEP = Replacement.LDSVarsToConstantGEP[GV];
1455
1456 GV->replaceUsesWithIf(GEP, Predicate);
1457
1458 APInt APOff(DL.getIndexTypeSizeInBits(GEP->getType()), 0);
1459 GEP->stripAndAccumulateInBoundsConstantOffsets(DL, APOff);
1460 uint64_t Offset = APOff.getZExtValue();
1461
1462 Align A =
1463 commonAlignment(Replacement.SGV->getAlign().valueOrOne(), Offset);
1464
1465 if (I)
1466 NoAliasList[I - 1] = AliasScopes[I - 1];
1467 MDNode *NoAlias =
1468 NoAliasList.empty() ? nullptr : MDNode::get(Ctx, NoAliasList);
1469 MDNode *AliasScope =
1470 AliasScopes.empty() ? nullptr : MDNode::get(Ctx, {AliasScopes[I]});
1471
1472 refineUsesAlignmentAndAA(GEP, A, DL, AliasScope, NoAlias);
1473 }
1474 }
1475
1476 static void refineUsesAlignmentAndAA(Value *Ptr, Align A,
1477 const DataLayout &DL, MDNode *AliasScope,
1478 MDNode *NoAlias, unsigned MaxDepth = 5) {
1479 if (!MaxDepth || (A == 1 && !AliasScope))
1480 return;
1481
1482 for (User *U : Ptr->users()) {
1483 if (auto *I = dyn_cast<Instruction>(U)) {
1484 if (AliasScope && I->mayReadOrWriteMemory()) {
1485 MDNode *AS = I->getMetadata(LLVMContext::MD_alias_scope);
1486 AS = (AS ? MDNode::getMostGenericAliasScope(AS, AliasScope)
1487 : AliasScope);
1488 I->setMetadata(LLVMContext::MD_alias_scope, AS);
1489
1490 MDNode *NA = I->getMetadata(LLVMContext::MD_noalias);
1491 NA = (NA ? MDNode::intersect(NA, NoAlias) : NoAlias);
1492 I->setMetadata(LLVMContext::MD_noalias, NA);
1493 }
1494 }
1495
1496 if (auto *LI = dyn_cast<LoadInst>(U)) {
1497 LI->setAlignment(std::max(A, LI->getAlign()));
1498 continue;
1499 }
1500 if (auto *SI = dyn_cast<StoreInst>(U)) {
1501 if (SI->getPointerOperand() == Ptr)
1502 SI->setAlignment(std::max(A, SI->getAlign()));
1503 continue;
1504 }
1505 if (auto *AI = dyn_cast<AtomicRMWInst>(U)) {
1506        // None of the atomicrmw operations can work on pointers, but let's
1507 // check it anyway in case it will or we will process ConstantExpr.
1508 if (AI->getPointerOperand() == Ptr)
1509 AI->setAlignment(std::max(A, AI->getAlign()));
1510 continue;
1511 }
1512 if (auto *AI = dyn_cast<AtomicCmpXchgInst>(U)) {
1513 if (AI->getPointerOperand() == Ptr)
1514 AI->setAlignment(std::max(A, AI->getAlign()));
1515 continue;
1516 }
1517 if (auto *GEP = dyn_cast<GetElementPtrInst>(U)) {
1518 unsigned BitWidth = DL.getIndexTypeSizeInBits(GEP->getType());
1519 APInt Off(BitWidth, 0);
1520 if (GEP->getPointerOperand() == Ptr) {
1521 Align GA;
1522 if (GEP->accumulateConstantOffset(DL, Off))
1523 GA = commonAlignment(A, Off.getLimitedValue());
1524 refineUsesAlignmentAndAA(GEP, GA, DL, AliasScope, NoAlias,
1525 MaxDepth - 1);
1526 }
1527 continue;
1528 }
1529 if (auto *I = dyn_cast<Instruction>(U)) {
1530 if (I->getOpcode() == Instruction::BitCast ||
1531 I->getOpcode() == Instruction::AddrSpaceCast)
1532 refineUsesAlignmentAndAA(I, A, DL, AliasScope, NoAlias, MaxDepth - 1);
1533 }
1534 }
1535 }
1536};
1537
1538} // namespace
1539char AMDGPULowerModuleLDS::ID = 0;
1540
1541char &llvm::AMDGPULowerModuleLDSID = AMDGPULowerModuleLDS::ID;
1542
1543INITIALIZE_PASS(AMDGPULowerModuleLDS, DEBUG_TYPE,
1544 "Lower uses of LDS variables from non-kernel functions", false,
1545 false)
1546
1547ModulePass *llvm::createAMDGPULowerModuleLDSPass() {
1548 return new AMDGPULowerModuleLDS();
1549}
1550
1551PreservedAnalyses AMDGPULowerModuleLDSPass::run(Module &M,
1552 ModuleAnalysisManager &) {
1553 return AMDGPULowerModuleLDS().runOnModule(M) ? PreservedAnalyses::none()
1554 : PreservedAnalyses::all();
1555}
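// Usage sketch (flags are illustrative; the pass name is assumed to match
// DEBUG_TYPE): the pass runs automatically in the AMDGPU codegen pipeline, and
// it can be exercised standalone through the new pass manager, e.g.
//   opt -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds -S in.ll -o out.ll
// The legacy pass manager reaches the same code via
// createAMDGPULowerModuleLDSPass(), registered by INITIALIZE_PASS above.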