//===-- AMDGPULowerModuleLDSPass.cpp ------------------------------*- C++ -*-=//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This pass eliminates local data store, LDS, uses from non-kernel functions.
// LDS is contiguous memory allocated per kernel execution.
//
// Background.
//
// The programming model is global variables, or equivalently function local
// static variables, accessible from kernels or other functions. For uses from
// kernels this is straightforward - assign an integer to the kernel for the
// memory required by all the variables combined, allocate them within that.
// For uses from functions there are performance tradeoffs to choose between.
//
// This model means the GPU runtime can specify the amount of memory allocated.
// If this is more than the kernel assumed, the excess can be made available
// using a language specific feature, which IR represents as a variable with
// no initializer. This feature is referred to here as "Dynamic LDS" and is
// lowered slightly differently to the normal case.
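// As a concrete illustration (a sketch, not tied to any particular test), a
// HIP "extern __shared__" array reaches this pass as a zero sized external
// array in the LDS address space:
//   @dynamic_shared = external addrspace(3) global [0 x i8], align 4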
//
// Consequences of this GPU feature:
// - memory is limited and exceeding it halts compilation
// - a global accessed by one kernel exists independent of other kernels
// - a global exists independent of simultaneous execution of the same kernel
// - the address of the global may differ between kernels as they do not
//   alias, which permits only allocating the variables each kernel uses
// - if the address is allowed to differ, functions need help to find it
//
// Uses from kernels are implemented here by grouping them in a per-kernel
// struct instance. This duplicates the variables, accurately modelling their
// aliasing properties relative to a single global representation. It also
// permits control over alignment via padding.
//
// Uses from functions are more complicated and are the primary purpose of this
// IR pass. Several different lowerings are chosen between to meet requirements
// to avoid allocating any LDS where it is not necessary, as that impacts
// occupancy and may fail the compilation, while not imposing overhead on a
// feature whose primary advantage over global memory is performance. The basic
// design goal is to avoid one kernel imposing overhead on another.
//
// Implementation.
//
// LDS variables with constant annotation or non-undef initializer are passed
// through unchanged for simplification or error diagnostics in later passes.
// Non-undef initializers are not yet implemented for LDS.
//
// LDS variables that are always allocated at the same address can be found
// by lookup at that address. Otherwise runtime information/cost is required.
//
// The simplest strategy possible is to group all LDS variables in a single
// struct and allocate that struct in every kernel such that the original
// variables are always at the same address. LDS is however a limited resource
// so this strategy is unusable in practice. It is not implemented here.
//
// Strategy | Precise allocation | Zero runtime cost | General purpose |
//  --------+--------------------+-------------------+-----------------+
//   Module | No                 | Yes               | Yes             |
//    Table | Yes                | No                | Yes             |
//   Kernel | Yes                | Yes               | No              |
//   Hybrid | Yes                | Partial           | Yes             |
//
// "Module" spends LDS memory to save cycles. "Table" spends cycles and global
// memory to save LDS. "Kernel" is as fast as kernel allocation but only works
// for variables that are known reachable from a single kernel. "Hybrid" picks
// between all three. When forced to choose between LDS and cycles we minimise
// LDS use.

// The "module" lowering implemented here finds LDS variables which are used by
// non-kernel functions and creates a new struct with a field for each of those
// LDS variables. Variables that are only used from kernels are excluded.
//
// The "table" lowering implemented here has three components.
// First, each kernel is assigned a unique integer identifier, which is
// available in functions it calls through the intrinsic amdgcn_lds_kernel_id.
// The integer is passed through a specific SGPR, and thus works with indirect
// calls.
// Second, each kernel allocates LDS variables independently of other kernels
// and writes the addresses it chose for each variable into an array in a
// consistent order. If the kernel does not allocate a given variable, it
// writes undef to the corresponding array location. These arrays are written
// to a constant table in the order matching the kernel unique integer
// identifier.
// Third, uses from non-kernel functions are replaced with a table lookup using
// the intrinsic function to find the address of the variable.
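// As a sketch with invented names, two kernels and one variable, a use of @v
// from a function is rewritten to roughly:
//   %id = call i32 @llvm.amdgcn.lds.kernel.id()
//   %p = getelementptr inbounds [2 x [1 x i32]], ptr addrspace(4)
//          @llvm.amdgcn.lds.offset.table, i32 0, i32 %id, i32 0
//   %off = load i32, ptr addrspace(4) %p
//   %v = inttoptr i32 %off to ptr addrspace(3)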
//
// "Kernel" lowering is only applicable for variables that are unambiguously
// reachable from exactly one kernel. For those cases, accesses to the variable
// can be lowered to a ConstantExpr address of a struct instance specific to
// that one kernel. This is zero cost in space and in compute. It will raise a
// fatal error on any variable that might be reachable from multiple kernels
// and is thus most easily used as part of the hybrid lowering strategy.
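// As a sketch with invented names, a use of @v reachable only from kernel @k
// becomes the constant expression
//   getelementptr inbounds (%llvm.amdgcn.kernel.k.lds.t,
//     ptr addrspace(3) @llvm.amdgcn.kernel.k.lds, i32 0, i32 <index of v>)
// with no instructions emitted at all.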
//
// Hybrid lowering is a mixture of the above. It uses the zero cost kernel
// lowering where it can. It lowers the variable accessed by the greatest
// number of kernels using the module strategy as that is free for the first
// variable. Any further variables that can be lowered with the module strategy
// without incurring LDS memory overhead are. The remaining ones are lowered
// via table.
//
// Consequences
// - No heuristics or user controlled magic numbers, hybrid is the right choice
// - Kernels that don't use functions (or have had them all inlined) are not
//   affected by any lowering for kernels that do.
// - Kernels that don't make indirect function calls are not affected by those
//   that do.
// - Variables which are used by lots of kernels, e.g. those injected by a
//   language runtime in most kernels, are expected to have no overhead
// - Implementations that instantiate templates per-kernel where those templates
//   use LDS are expected to hit the "Kernel" lowering strategy
// - The runtime properties impose a cost in compiler implementation complexity
//
// Dynamic LDS implementation
// Dynamic LDS is lowered similarly to the "table" strategy above and uses the
// same intrinsic to identify which kernel is at the root of the dynamic call
// graph. This relies on the specified behaviour that all dynamic LDS variables
// alias one another, i.e. are at the same address, with respect to a given
// kernel. Therefore this pass creates new dynamic LDS variables for each kernel
// that allocates any dynamic LDS and builds a table of addresses out of those.
// The AMDGPUPromoteAlloca pass skips kernels that use dynamic LDS.
// The corresponding optimisation for "kernel" lowering where the table lookup
// is elided is not implemented.
//
//
// Implementation notes / limitations
// A single LDS global variable represents an instance per kernel that can
// reach it. This pass essentially specialises those variables per kernel.
// Handling ConstantExpr during the pass complicated this significantly so now
// all ConstantExpr uses of LDS variables are expanded to instructions. This
// may need amending when implementing non-undef initialisers.
//
// Lowering is split between this IR pass and the back end. This pass chooses
// where given variables should be allocated and marks them with metadata,
// MD_absolute_symbol. The backend places the variables in coincidentally the
// same location and raises a fatal error if something has gone awry. This works
// in practice because the only pass between this one and the backend that
// changes LDS is PromoteAlloca and the changes it makes do not conflict.
//
// Addresses are written to constant global arrays based on the same metadata.
//
// The backend lowers LDS variables in the order of traversal of the function.
// This is at odds with the deterministic layout required. The workaround is to
// allocate the fixed-address variables immediately upon starting the function
// where they can be placed as intended. This requires a means of mapping from
// the function to the variables that it allocates. For the module scope lds,
// this is via metadata indicating whether the variable is not required. If a
// pass deletes that metadata, a fatal error on disagreement with the absolute
// symbol metadata will occur. For kernel scope and dynamic, this is by _name_
// correspondence between the function and the variable. It requires the
// kernel to have a name (which is only a limitation for tests in practice) and
// for nothing to rename the corresponding symbols. This is a hazard if the pass
// is run multiple times during debugging. Alternative schemes considered all
// involve bespoke metadata.
//
// If the name correspondence can be replaced, multiple distinct kernels that
// have the same memory layout can map to the same kernel id (as the address
// itself is handled by the absolute symbol metadata) and that will allow more
// uses of the "kernel" style faster lowering and reduce the size of the lookup
// tables.
//
// There is a test that checks this does not fire for a graphics shader. This
// lowering is expected to work for graphics if the isKernel test is changed.
//
// The current markUsedByKernel is sufficient for PromoteAlloca but is elided
// before codegen. Replacing this with an equivalent intrinsic which lasts until
// shortly after the machine function lowering of LDS would help break the name
// mapping. The other part needed is probably to amend PromoteAlloca to embed
// the LDS variables it creates in the same struct created here. That avoids the
// current hazard where a PromoteAlloca LDS variable might be allocated before
// the kernel scope (and thus error on the address check). Given a new invariant
// that no LDS variables exist outside of the structs managed here, and an
// intrinsic that lasts until after the LDS frame lowering, it should be
// possible to drop the name mapping and fold equivalent memory layouts.
//
//===----------------------------------------------------------------------===//

#include "AMDGPU.h"
#include "AMDGPUTargetMachine.h"
#include "Utils/AMDGPUBaseInfo.h"
#include "Utils/AMDGPUMemoryUtils.h"
#include "llvm/ADT/BitVector.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SetOperations.h"
#include "llvm/Analysis/CallGraph.h"
#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InlineAsm.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/ReplaceConstant.h"
#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/Format.h"
#include "llvm/Support/OptimizedStructLayout.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"

#include <vector>

#include <cstdio>

#define DEBUG_TYPE "amdgpu-lower-module-lds"

using namespace llvm;
using namespace AMDGPU;

namespace {

cl::opt<bool> SuperAlignLDSGlobals(
    "amdgpu-super-align-lds-globals",
    cl::desc("Increase alignment of LDS if it is not on align boundary"),
    cl::init(true), cl::Hidden);

enum class LoweringKind { module, table, kernel, hybrid };
cl::opt<LoweringKind> LoweringKindLoc(
    "amdgpu-lower-module-lds-strategy",
    cl::desc("Specify lowering strategy for function LDS access:"), cl::Hidden,
    cl::init(LoweringKind::hybrid),
    cl::values(
        clEnumValN(LoweringKind::table, "table", "Lower via table lookup"),
        clEnumValN(LoweringKind::module, "module", "Lower via module struct"),
        clEnumValN(
            LoweringKind::kernel, "kernel",
            "Lower variables reachable from one kernel, otherwise abort"),
        clEnumValN(LoweringKind::hybrid, "hybrid",
                   "Lower via mixture of above strategies")));

template <typename T> std::vector<T> sortByName(std::vector<T> &&V) {
  llvm::sort(V.begin(), V.end(), [](const auto *L, const auto *R) {
    return L->getName() < R->getName();
  });
  return {std::move(V)};
}

class AMDGPULowerModuleLDS {
  const AMDGPUTargetMachine &TM;

  static void
  removeLocalVarsFromUsedLists(Module &M,
                               const DenseSet<GlobalVariable *> &LocalVars) {
    // The verifier rejects used lists containing an inttoptr of a constant
    // so remove the variables from these lists before replaceAllUsesWith
    SmallPtrSet<Constant *, 8> LocalVarsSet;
    for (GlobalVariable *LocalVar : LocalVars)
      LocalVarsSet.insert(cast<Constant>(LocalVar->stripPointerCasts()));

    removeFromUsedLists(
        M, [&LocalVarsSet](Constant *C) { return LocalVarsSet.count(C); });

    for (GlobalVariable *LocalVar : LocalVars)
      LocalVar->removeDeadConstantUsers();
  }

  static void markUsedByKernel(Function *Func, GlobalVariable *SGV) {
    // The llvm.amdgcn.module.lds instance is implicitly used by all kernels
    // that might call a function which accesses a field within it. This is
    // presently approximated to 'all kernels' if there are any such functions
    // in the module. This implicit use is redefined as an explicit use here so
    // that later passes, specifically PromoteAlloca, account for the required
    // memory without any knowledge of this transform.

    // An operand bundle on llvm.donothing works because the call instruction
    // survives until after the last pass that needs to account for LDS. It is
    // better than inline asm as the latter survives until the end of codegen. A
    // totally robust solution would be a function with the same semantics as
    // llvm.donothing that takes a pointer to the instance and is lowered to a
    // no-op after LDS is allocated, but that is not presently necessary.

    // This intrinsic is eliminated shortly before instruction selection. It
    // does not suffice to indicate to ISel that a given global which is not
    // immediately used by the kernel must still be allocated by it. An
    // equivalent target specific intrinsic which lasts until immediately after
    // codegen would suffice for that, but one would still need to ensure that
    // the variables are allocated in the anticipated order.
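    // The marker emitted here looks roughly like the following, where the
    // bundled pointer is the struct instance created by this pass (module
    // scope case shown as a sketch):
    //   call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3)
    //     @llvm.amdgcn.module.lds) ]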
    BasicBlock *Entry = &Func->getEntryBlock();
    IRBuilder<> Builder(Entry, Entry->getFirstNonPHIIt());

    Function *Decl =
        Intrinsic::getDeclaration(Func->getParent(), Intrinsic::donothing, {});

    Value *UseInstance[1] = {
        Builder.CreateConstInBoundsGEP1_32(SGV->getValueType(), SGV, 0)};

    Builder.CreateCall(
        Decl, {}, {OperandBundleDefT<Value *>("ExplicitUse", UseInstance)});
  }

public:
  AMDGPULowerModuleLDS(const AMDGPUTargetMachine &TM_) : TM(TM_) {}

  struct LDSVariableReplacement {
    GlobalVariable *SGV = nullptr;
    DenseMap<GlobalVariable *, Constant *> LDSVarsToConstantGEP;
  };

  // Remap from an LDS global to the ConstantExpr GEP it has been moved to for
  // each kernel, building an array with an element for each kernel containing
  // where the corresponding variable was remapped to.

  static Constant *getAddressesOfVariablesInKernel(
      LLVMContext &Ctx, ArrayRef<GlobalVariable *> Variables,
      const DenseMap<GlobalVariable *, Constant *> &LDSVarsToConstantGEP) {
    // Create a ConstantArray containing the address of each Variable within the
    // kernel corresponding to LDSVarsToConstantGEP, or poison if that kernel
    // does not allocate it
    // TODO: Drop the ptrtoint conversion

    Type *I32 = Type::getInt32Ty(Ctx);

    ArrayType *KernelOffsetsType = ArrayType::get(I32, Variables.size());

    SmallVector<Constant *> Elements;
    for (size_t i = 0; i < Variables.size(); i++) {
      GlobalVariable *GV = Variables[i];
      auto ConstantGepIt = LDSVarsToConstantGEP.find(GV);
      if (ConstantGepIt != LDSVarsToConstantGEP.end()) {
        auto elt = ConstantExpr::getPtrToInt(ConstantGepIt->second, I32);
        Elements.push_back(elt);
      } else {
        Elements.push_back(PoisonValue::get(I32));
      }
    }
    return ConstantArray::get(KernelOffsetsType, Elements);
  }

  static GlobalVariable *buildLookupTable(
      Module &M, ArrayRef<GlobalVariable *> Variables,
      ArrayRef<Function *> kernels,
      DenseMap<Function *, LDSVariableReplacement> &KernelToReplacement) {
    if (Variables.empty()) {
      return nullptr;
    }
    LLVMContext &Ctx = M.getContext();

    const size_t NumberVariables = Variables.size();
    const size_t NumberKernels = kernels.size();

    ArrayType *KernelOffsetsType =
        ArrayType::get(Type::getInt32Ty(Ctx), NumberVariables);

    ArrayType *AllKernelsOffsetsType =
        ArrayType::get(KernelOffsetsType, NumberKernels);

    Constant *Missing = PoisonValue::get(KernelOffsetsType);
    std::vector<Constant *> overallConstantExprElts(NumberKernels);
    for (size_t i = 0; i < NumberKernels; i++) {
      auto Replacement = KernelToReplacement.find(kernels[i]);
      overallConstantExprElts[i] =
          (Replacement == KernelToReplacement.end())
              ? Missing
              : getAddressesOfVariablesInKernel(
                    Ctx, Variables, Replacement->second.LDSVarsToConstantGEP);
    }

    Constant *init =
        ConstantArray::get(AllKernelsOffsetsType, overallConstantExprElts);

    return new GlobalVariable(
        M, AllKernelsOffsetsType, true, GlobalValue::InternalLinkage, init,
        "llvm.amdgcn.lds.offset.table", nullptr, GlobalValue::NotThreadLocal,
        AMDGPUAS::CONSTANT_ADDRESS);
  }

  void replaceUseWithTableLookup(Module &M, IRBuilder<> &Builder,
                                 GlobalVariable *LookupTable,
                                 GlobalVariable *GV, Use &U,
                                 Value *OptionalIndex) {
    // Table is a constant array of the same length as OrderedKernels
    LLVMContext &Ctx = M.getContext();
    Type *I32 = Type::getInt32Ty(Ctx);
    auto *I = cast<Instruction>(U.getUser());

    Value *tableKernelIndex = getTableLookupKernelIndex(M, I->getFunction());

    if (auto *Phi = dyn_cast<PHINode>(I)) {
      BasicBlock *BB = Phi->getIncomingBlock(U);
      Builder.SetInsertPoint(&(*(BB->getFirstInsertionPt())));
    } else {
      Builder.SetInsertPoint(I);
    }

    SmallVector<Value *, 3> GEPIdx = {
        ConstantInt::get(I32, 0),
        tableKernelIndex,
    };
    if (OptionalIndex)
      GEPIdx.push_back(OptionalIndex);

    Value *Address = Builder.CreateInBoundsGEP(
        LookupTable->getValueType(), LookupTable, GEPIdx, GV->getName());

    Value *loaded = Builder.CreateLoad(I32, Address);

    Value *replacement =
        Builder.CreateIntToPtr(loaded, GV->getType(), GV->getName());

    U.set(replacement);
  }

  void replaceUsesInInstructionsWithTableLookup(
      Module &M, ArrayRef<GlobalVariable *> ModuleScopeVariables,
      GlobalVariable *LookupTable) {

    LLVMContext &Ctx = M.getContext();
    IRBuilder<> Builder(Ctx);
    Type *I32 = Type::getInt32Ty(Ctx);

    for (size_t Index = 0; Index < ModuleScopeVariables.size(); Index++) {
      auto *GV = ModuleScopeVariables[Index];

      for (Use &U : make_early_inc_range(GV->uses())) {
        auto *I = dyn_cast<Instruction>(U.getUser());
        if (!I)
          continue;

        replaceUseWithTableLookup(M, Builder, LookupTable, GV, U,
                                  ConstantInt::get(I32, Index));
      }
    }
  }

  static DenseSet<Function *> kernelsThatIndirectlyAccessAnyOfPassedVariables(
      Module &M, LDSUsesInfoTy &LDSUsesInfo,
      DenseSet<GlobalVariable *> const &VariableSet) {

    DenseSet<Function *> KernelSet;

    if (VariableSet.empty())
      return KernelSet;

    for (Function &Func : M.functions()) {
      if (Func.isDeclaration() || !isKernelLDS(&Func))
        continue;
      for (GlobalVariable *GV : LDSUsesInfo.indirect_access[&Func]) {
        if (VariableSet.contains(GV)) {
          KernelSet.insert(&Func);
          break;
        }
      }
    }

    return KernelSet;
  }

  static GlobalVariable *
  chooseBestVariableForModuleStrategy(const DataLayout &DL,
                                      VariableFunctionMap &LDSVars) {
    // Find the global variable with the most indirect uses from kernels

    struct CandidateTy {
      GlobalVariable *GV = nullptr;
      size_t UserCount = 0;
      size_t Size = 0;

      CandidateTy() = default;

      CandidateTy(GlobalVariable *GV, uint64_t UserCount, uint64_t AllocSize)
          : GV(GV), UserCount(UserCount), Size(AllocSize) {}

      bool operator<(const CandidateTy &Other) const {
        // Fewer users makes module scope variable less attractive
        if (UserCount < Other.UserCount) {
          return true;
        }
        if (UserCount > Other.UserCount) {
          return false;
        }

        // Bigger makes module scope variable less attractive
        if (Size < Other.Size) {
          return false;
        }

        if (Size > Other.Size) {
          return true;
        }

        // Arbitrary but consistent
        return GV->getName() < Other.GV->getName();
      }
    };

    CandidateTy MostUsed;

    for (auto &K : LDSVars) {
      GlobalVariable *GV = K.first;
      if (K.second.size() <= 1) {
        // A variable reachable by only one kernel is best lowered with kernel
        // strategy
        continue;
      }
      CandidateTy Candidate(
          GV, K.second.size(),
          DL.getTypeAllocSize(GV->getValueType()).getFixedValue());
      if (MostUsed < Candidate)
        MostUsed = Candidate;
    }

    return MostUsed.GV;
  }

  static void recordLDSAbsoluteAddress(Module *M, GlobalVariable *GV,
                                       uint32_t Address) {
    // Write the specified address into metadata where it can be retrieved by
    // the assembler. Format is a half open range, [Address Address+1)
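    // For example, a variable placed at address 8 ends up annotated with
    //   !absolute_symbol !{i32 8, i32 9}
    // (i32 because pointers to LDS are 32 bits wide in the data layout).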
    LLVMContext &Ctx = M->getContext();
    auto *IntTy =
        M->getDataLayout().getIntPtrType(Ctx, AMDGPUAS::LOCAL_ADDRESS);
    auto *MinC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address));
    auto *MaxC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address + 1));
    GV->setMetadata(LLVMContext::MD_absolute_symbol,
                    MDNode::get(Ctx, {MinC, MaxC}));
  }

  DenseMap<Function *, Value *> tableKernelIndexCache;
  Value *getTableLookupKernelIndex(Module &M, Function *F) {
    // Accesses from a function use the amdgcn_lds_kernel_id intrinsic which
    // lowers to a read from a live in register. Emit it once in the entry
    // block to spare deduplicating it later.
    auto [It, Inserted] = tableKernelIndexCache.try_emplace(F);
    if (Inserted) {
      Function *Decl =
          Intrinsic::getDeclaration(&M, Intrinsic::amdgcn_lds_kernel_id, {});

      auto InsertAt = F->getEntryBlock().getFirstNonPHIOrDbgOrAlloca();
      IRBuilder<> Builder(&*InsertAt);

      It->second = Builder.CreateCall(Decl, {});
    }

    return It->second;
  }

  static std::vector<Function *> assignLDSKernelIDToEachKernel(
      Module *M, DenseSet<Function *> const &KernelsThatAllocateTableLDS,
      DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS) {
    // Associate kernels in the set with an arbitrary but reproducible order and
    // annotate them with that order in metadata. This metadata is recognised by
    // the backend and lowered to a SGPR which can be read from using
    // amdgcn_lds_kernel_id.
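    // For example, the kernel assigned id 2 is annotated as (a sketch, kernel
    // name invented):
    //   define amdgpu_kernel void @k() !llvm.amdgcn.lds.kernel.id !0 { ... }
    //   !0 = !{i32 2}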

    std::vector<Function *> OrderedKernels;
    if (!KernelsThatAllocateTableLDS.empty() ||
        !KernelsThatIndirectlyAllocateDynamicLDS.empty()) {

      for (Function &Func : M->functions()) {
        if (Func.isDeclaration())
          continue;
        if (!isKernelLDS(&Func))
          continue;

        if (KernelsThatAllocateTableLDS.contains(&Func) ||
            KernelsThatIndirectlyAllocateDynamicLDS.contains(&Func)) {
          assert(Func.hasName()); // else fatal error earlier
          OrderedKernels.push_back(&Func);
        }
      }

      // Put them in an arbitrary but reproducible order
      OrderedKernels = sortByName(std::move(OrderedKernels));

      // Annotate the kernels with their order in this vector
      LLVMContext &Ctx = M->getContext();
      IRBuilder<> Builder(Ctx);

      if (OrderedKernels.size() > UINT32_MAX) {
        // 32 bit keeps it in one SGPR. > 2**32 kernels won't fit on the GPU
        report_fatal_error("Unimplemented LDS lowering for > 2**32 kernels");
      }

      for (size_t i = 0; i < OrderedKernels.size(); i++) {
        Metadata *AttrMDArgs[1] = {
            ConstantAsMetadata::get(Builder.getInt32(i)),
        };
        OrderedKernels[i]->setMetadata("llvm.amdgcn.lds.kernel.id",
                                       MDNode::get(Ctx, AttrMDArgs));
      }
    }
    return OrderedKernels;
  }

  static void partitionVariablesIntoIndirectStrategies(
      Module &M, LDSUsesInfoTy const &LDSUsesInfo,
      VariableFunctionMap &LDSToKernelsThatNeedToAccessItIndirectly,
      DenseSet<GlobalVariable *> &ModuleScopeVariables,
      DenseSet<GlobalVariable *> &TableLookupVariables,
      DenseSet<GlobalVariable *> &KernelAccessVariables,
      DenseSet<GlobalVariable *> &DynamicVariables) {

    GlobalVariable *HybridModuleRoot =
        LoweringKindLoc != LoweringKind::hybrid
            ? nullptr
            : chooseBestVariableForModuleStrategy(
                  M.getDataLayout(), LDSToKernelsThatNeedToAccessItIndirectly);

    DenseSet<Function *> const EmptySet;
    DenseSet<Function *> const &HybridModuleRootKernels =
        HybridModuleRoot
            ? LDSToKernelsThatNeedToAccessItIndirectly[HybridModuleRoot]
            : EmptySet;

    for (auto &K : LDSToKernelsThatNeedToAccessItIndirectly) {
      // Each iteration of this loop assigns exactly one global variable to
      // exactly one of the implementation strategies.

      GlobalVariable *GV = K.first;
      assert(AMDGPU::isLDSVariableToLower(*GV));
      assert(K.second.size() != 0);

      if (AMDGPU::isDynamicLDS(*GV)) {
        DynamicVariables.insert(GV);
        continue;
      }

      switch (LoweringKindLoc) {
      case LoweringKind::module:
        ModuleScopeVariables.insert(GV);
        break;

      case LoweringKind::table:
        TableLookupVariables.insert(GV);
        break;

      case LoweringKind::kernel:
        if (K.second.size() == 1) {
          KernelAccessVariables.insert(GV);
        } else {
          report_fatal_error(
              "cannot lower LDS '" + GV->getName() +
              "' to kernel access as it is reachable from multiple kernels");
        }
        break;

      case LoweringKind::hybrid: {
        if (GV == HybridModuleRoot) {
          assert(K.second.size() != 1);
          ModuleScopeVariables.insert(GV);
        } else if (K.second.size() == 1) {
          KernelAccessVariables.insert(GV);
        } else if (set_is_subset(K.second, HybridModuleRootKernels)) {
          ModuleScopeVariables.insert(GV);
        } else {
          TableLookupVariables.insert(GV);
        }
        break;
      }
      }
    }

    // All LDS variables accessed indirectly have now been partitioned into
    // the distinct lowering strategies.
    assert(ModuleScopeVariables.size() + TableLookupVariables.size() +
               KernelAccessVariables.size() + DynamicVariables.size() ==
           LDSToKernelsThatNeedToAccessItIndirectly.size());
  }

  static GlobalVariable *lowerModuleScopeStructVariables(
      Module &M, DenseSet<GlobalVariable *> const &ModuleScopeVariables,
      DenseSet<Function *> const &KernelsThatAllocateModuleLDS) {
    // Create a struct to hold the ModuleScopeVariables.
    // Replace all uses of those variables from non-kernel functions with the
    // new struct instance. Replace only the uses from kernel functions that
    // will allocate this instance. That is a space optimisation - kernels that
    // use a subset of the module scope struct and do not need to allocate it
    // for indirect calls will only allocate the subset they use (they do so as
    // part of the per-kernel lowering).
    if (ModuleScopeVariables.empty()) {
      return nullptr;
    }

    LLVMContext &Ctx = M.getContext();

    LDSVariableReplacement ModuleScopeReplacement =
        createLDSVariableReplacement(M, "llvm.amdgcn.module.lds",
                                     ModuleScopeVariables);

    appendToCompilerUsed(M, {static_cast<GlobalValue *>(
                             ConstantExpr::getPointerBitCastOrAddrSpaceCast(
                                 cast<Constant>(ModuleScopeReplacement.SGV),
                                 PointerType::getUnqual(Ctx)))});

    // module.lds will be allocated at zero in any kernel that allocates it
    recordLDSAbsoluteAddress(&M, ModuleScopeReplacement.SGV, 0);

    // historic
    removeLocalVarsFromUsedLists(M, ModuleScopeVariables);

    // Replace all uses of module scope variable from non-kernel functions
    replaceLDSVariablesWithStruct(
        M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
          Instruction *I = dyn_cast<Instruction>(U.getUser());
          if (!I) {
            return false;
          }
          Function *F = I->getFunction();
          return !isKernelLDS(F);
        });

    // Replace uses of module scope variable from kernel functions that
    // allocate the module scope variable, otherwise leave them unchanged
    // Record on each kernel whether the module scope global is used by it

    for (Function &Func : M.functions()) {
      if (Func.isDeclaration() || !isKernelLDS(&Func))
        continue;

      if (KernelsThatAllocateModuleLDS.contains(&Func)) {
        replaceLDSVariablesWithStruct(
            M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
              Instruction *I = dyn_cast<Instruction>(U.getUser());
              if (!I) {
                return false;
              }
              Function *F = I->getFunction();
              return F == &Func;
            });

        markUsedByKernel(&Func, ModuleScopeReplacement.SGV);
      }
    }

    return ModuleScopeReplacement.SGV;
  }

  static DenseMap<Function *, LDSVariableReplacement>
  lowerKernelScopeStructVariables(
      Module &M, LDSUsesInfoTy &LDSUsesInfo,
      DenseSet<GlobalVariable *> const &ModuleScopeVariables,
      DenseSet<Function *> const &KernelsThatAllocateModuleLDS,
      GlobalVariable *MaybeModuleScopeStruct) {

    // Create a struct for each kernel for the non-module-scope variables.

    DenseMap<Function *, LDSVariableReplacement> KernelToReplacement;
    for (Function &Func : M.functions()) {
      if (Func.isDeclaration() || !isKernelLDS(&Func))
        continue;

      DenseSet<GlobalVariable *> KernelUsedVariables;
      // Allocating variables that are used directly in this struct to get
      // alignment aware allocation and predictable frame size.
      for (auto &v : LDSUsesInfo.direct_access[&Func]) {
        if (!AMDGPU::isDynamicLDS(*v)) {
          KernelUsedVariables.insert(v);
        }
      }

      // Allocating variables that are accessed indirectly so that a lookup of
      // this struct instance can find them from nested functions.
      for (auto &v : LDSUsesInfo.indirect_access[&Func]) {
        if (!AMDGPU::isDynamicLDS(*v)) {
          KernelUsedVariables.insert(v);
        }
      }

      // Variables allocated in module lds must all resolve to that struct,
      // not to the per-kernel instance.
      if (KernelsThatAllocateModuleLDS.contains(&Func)) {
        for (GlobalVariable *v : ModuleScopeVariables) {
          KernelUsedVariables.erase(v);
        }
      }

      if (KernelUsedVariables.empty()) {
        // Either used no LDS, or the LDS it used was all in the module struct
        // or dynamically sized
        continue;
      }

      // The association between kernel function and LDS struct is done by
      // symbol name, which only works if the function in question has a name.
      // This is not expected to be a problem in practice as kernels are
      // called by name, making anonymous ones (which are named by the
      // backend) difficult to use. This does mean that llvm test cases need
      // to name the kernels.
      if (!Func.hasName()) {
        report_fatal_error("Anonymous kernels cannot use LDS variables");
      }

      std::string VarName =
          (Twine("llvm.amdgcn.kernel.") + Func.getName() + ".lds").str();

      auto Replacement =
          createLDSVariableReplacement(M, VarName, KernelUsedVariables);

      // If any indirect uses, create a direct use to ensure allocation
      // TODO: Simpler to unconditionally mark used but that regresses
      // codegen in test/CodeGen/AMDGPU/noclobber-barrier.ll
      auto Accesses = LDSUsesInfo.indirect_access.find(&Func);
      if ((Accesses != LDSUsesInfo.indirect_access.end()) &&
          !Accesses->second.empty())
        markUsedByKernel(&Func, Replacement.SGV);

      // remove preserves existing codegen
      removeLocalVarsFromUsedLists(M, KernelUsedVariables);
      KernelToReplacement[&Func] = Replacement;

      // Rewrite uses within kernel to the new struct
      replaceLDSVariablesWithStruct(
          M, KernelUsedVariables, Replacement, [&Func](Use &U) {
            Instruction *I = dyn_cast<Instruction>(U.getUser());
            return I && I->getFunction() == &Func;
          });
    }
    return KernelToReplacement;
  }

  static GlobalVariable *
  buildRepresentativeDynamicLDSInstance(Module &M, LDSUsesInfoTy &LDSUsesInfo,
                                        Function *func) {
    // Create a dynamic lds variable with a name associated with the passed
    // function that has the maximum alignment of any dynamic lds variable
    // reachable from this kernel. Dynamic LDS is allocated after the static LDS
    // allocation, possibly after alignment padding. The representative variable
    // created here has the maximum alignment of any other dynamic variable
    // reachable by that kernel. All dynamic LDS variables are allocated at the
    // same address in each kernel in order to provide the documented aliasing
    // semantics. Setting the alignment here allows this IR pass to accurately
    // predict the exact constant at which it will be allocated.

    assert(isKernelLDS(func));

    LLVMContext &Ctx = M.getContext();
    const DataLayout &DL = M.getDataLayout();
    Align MaxDynamicAlignment(1);

    auto UpdateMaxAlignment = [&MaxDynamicAlignment, &DL](GlobalVariable *GV) {
      if (AMDGPU::isDynamicLDS(*GV)) {
        MaxDynamicAlignment =
            std::max(MaxDynamicAlignment, AMDGPU::getAlign(DL, GV));
      }
    };

    for (GlobalVariable *GV : LDSUsesInfo.indirect_access[func]) {
      UpdateMaxAlignment(GV);
    }

    for (GlobalVariable *GV : LDSUsesInfo.direct_access[func]) {
      UpdateMaxAlignment(GV);
    }

    assert(func->hasName()); // Checked by caller
    auto emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
    GlobalVariable *N = new GlobalVariable(
        M, emptyCharArray, false, GlobalValue::ExternalLinkage, nullptr,
        Twine("llvm.amdgcn." + func->getName() + ".dynlds"), nullptr,
        GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS, false);
    N->setAlignment(MaxDynamicAlignment);

    assert(AMDGPU::isDynamicLDS(*N));
    return N;
  }

  /// Strip "amdgpu-no-lds-kernel-id" from any functions where we may have
  /// introduced its use. If AMDGPUAttributor ran prior to the pass, we inferred
  /// the lack of llvm.amdgcn.lds.kernel.id calls.
  void removeNoLdsKernelIdFromReachable(CallGraph &CG, Function *KernelRoot) {
    KernelRoot->removeFnAttr("amdgpu-no-lds-kernel-id");

    SmallVector<Function *> WorkList({CG[KernelRoot]->getFunction()});
    DenseSet<Function *> Visited;
    bool SeenUnknownCall = false;

    while (!WorkList.empty()) {
      Function *F = WorkList.pop_back_val();

      for (auto &CallRecord : *CG[F]) {
        if (!CallRecord.second)
          continue;

        Function *Callee = CallRecord.second->getFunction();
        if (!Callee) {
          if (!SeenUnknownCall) {
            SeenUnknownCall = true;

            // If we see any indirect calls, assume nothing about potential
            // targets.
            // TODO: This could be refined to possible LDS global users.
            for (auto &ExternalCallRecord : *CG.getExternalCallingNode()) {
              Function *PotentialCallee =
                  ExternalCallRecord.second->getFunction();
              assert(PotentialCallee);
              if (!isKernelLDS(PotentialCallee))
                PotentialCallee->removeFnAttr("amdgpu-no-lds-kernel-id");
            }
          }
        } else {
          Callee->removeFnAttr("amdgpu-no-lds-kernel-id");
          if (Visited.insert(Callee).second)
            WorkList.push_back(Callee);
        }
      }
    }
  }

  DenseMap<Function *, GlobalVariable *> lowerDynamicLDSVariables(
      Module &M, LDSUsesInfoTy &LDSUsesInfo,
      DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS,
      DenseSet<GlobalVariable *> const &DynamicVariables,
      std::vector<Function *> const &OrderedKernels) {
    DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS;
    if (!KernelsThatIndirectlyAllocateDynamicLDS.empty()) {
      LLVMContext &Ctx = M.getContext();
      IRBuilder<> Builder(Ctx);
      Type *I32 = Type::getInt32Ty(Ctx);

      std::vector<Constant *> newDynamicLDS;

      // Table is built in the same order as OrderedKernels
      for (auto &func : OrderedKernels) {

        if (KernelsThatIndirectlyAllocateDynamicLDS.contains(func)) {
          assert(isKernelLDS(func));
          if (!func->hasName()) {
            report_fatal_error("Anonymous kernels cannot use LDS variables");
          }

          GlobalVariable *N =
              buildRepresentativeDynamicLDSInstance(M, LDSUsesInfo, func);

          KernelToCreatedDynamicLDS[func] = N;

          markUsedByKernel(func, N);

          auto emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
          auto GEP = ConstantExpr::getGetElementPtr(
              emptyCharArray, N, ConstantInt::get(I32, 0), true);
          newDynamicLDS.push_back(ConstantExpr::getPtrToInt(GEP, I32));
        } else {
          newDynamicLDS.push_back(PoisonValue::get(I32));
        }
      }
      assert(OrderedKernels.size() == newDynamicLDS.size());

      ArrayType *t = ArrayType::get(I32, newDynamicLDS.size());
      Constant *init = ConstantArray::get(t, newDynamicLDS);
      GlobalVariable *table = new GlobalVariable(
          M, t, true, GlobalValue::InternalLinkage, init,
          "llvm.amdgcn.dynlds.offset.table", nullptr,
          GlobalValue::NotThreadLocal, AMDGPUAS::CONSTANT_ADDRESS);

      for (GlobalVariable *GV : DynamicVariables) {
        for (Use &U : make_early_inc_range(GV->uses())) {
          auto *I = dyn_cast<Instruction>(U.getUser());
          if (!I)
            continue;
          if (isKernelLDS(I->getFunction()))
            continue;

          replaceUseWithTableLookup(M, Builder, table, GV, U, nullptr);
        }
      }
    }
    return KernelToCreatedDynamicLDS;
  }

  bool runOnModule(Module &M) {
    CallGraph CG = CallGraph(M);
    bool Changed = superAlignLDSGlobals(M);

    Changed |= eliminateConstantExprUsesOfLDSFromAllInstructions(M);

    Changed = true; // todo: narrow this down

    // For each kernel, what variables does it access directly or through
    // callees
    LDSUsesInfoTy LDSUsesInfo = getTransitiveUsesOfLDS(CG, M);

    // For each variable accessed through callees, which kernels access it
    VariableFunctionMap LDSToKernelsThatNeedToAccessItIndirectly;
    for (auto &K : LDSUsesInfo.indirect_access) {
      Function *F = K.first;
      assert(isKernelLDS(F));
      for (GlobalVariable *GV : K.second) {
        LDSToKernelsThatNeedToAccessItIndirectly[GV].insert(F);
      }
    }

    // Partition variables accessed indirectly into the different strategies
    DenseSet<GlobalVariable *> ModuleScopeVariables;
    DenseSet<GlobalVariable *> TableLookupVariables;
    DenseSet<GlobalVariable *> KernelAccessVariables;
    DenseSet<GlobalVariable *> DynamicVariables;
    partitionVariablesIntoIndirectStrategies(
        M, LDSUsesInfo, LDSToKernelsThatNeedToAccessItIndirectly,
        ModuleScopeVariables, TableLookupVariables, KernelAccessVariables,
        DynamicVariables);

    // If the kernel accesses a variable that is going to be stored in the
    // module instance through a call then that kernel needs to allocate the
    // module instance
    const DenseSet<Function *> KernelsThatAllocateModuleLDS =
        kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
                                                        ModuleScopeVariables);
    const DenseSet<Function *> KernelsThatAllocateTableLDS =
        kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
                                                        TableLookupVariables);

    const DenseSet<Function *> KernelsThatIndirectlyAllocateDynamicLDS =
        kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
                                                        DynamicVariables);

    GlobalVariable *MaybeModuleScopeStruct = lowerModuleScopeStructVariables(
        M, ModuleScopeVariables, KernelsThatAllocateModuleLDS);

    DenseMap<Function *, LDSVariableReplacement> KernelToReplacement =
        lowerKernelScopeStructVariables(M, LDSUsesInfo, ModuleScopeVariables,
                                        KernelsThatAllocateModuleLDS,
                                        MaybeModuleScopeStruct);

    // Lower zero cost accesses to the kernel instances just created
    for (auto &GV : KernelAccessVariables) {
      auto &funcs = LDSToKernelsThatNeedToAccessItIndirectly[GV];
      assert(funcs.size() == 1); // Only one kernel can access it
      LDSVariableReplacement Replacement =
          KernelToReplacement[*(funcs.begin())];

      DenseSet<GlobalVariable *> Vec;
      Vec.insert(GV);

      replaceLDSVariablesWithStruct(M, Vec, Replacement, [](Use &U) {
        return isa<Instruction>(U.getUser());
      });
    }

    // The ith element of this vector is kernel id i
    std::vector<Function *> OrderedKernels =
        assignLDSKernelIDToEachKernel(&M, KernelsThatAllocateTableLDS,
                                      KernelsThatIndirectlyAllocateDynamicLDS);

    if (!KernelsThatAllocateTableLDS.empty()) {
      LLVMContext &Ctx = M.getContext();
      IRBuilder<> Builder(Ctx);

      // The order must be consistent between lookup table and accesses to
      // lookup table
      auto TableLookupVariablesOrdered =
          sortByName(std::vector<GlobalVariable *>(TableLookupVariables.begin(),
                                                   TableLookupVariables.end()));

      GlobalVariable *LookupTable = buildLookupTable(
          M, TableLookupVariablesOrdered, OrderedKernels, KernelToReplacement);
      replaceUsesInInstructionsWithTableLookup(M, TableLookupVariablesOrdered,
                                               LookupTable);

      // Strip amdgpu-no-lds-kernel-id from all functions reachable from the
      // kernel. We may have inferred this wasn't used prior to the pass.
      //
      // TODO: We could filter out subgraphs that do not access LDS globals.
      for (Function *F : KernelsThatAllocateTableLDS)
        removeNoLdsKernelIdFromReachable(CG, F);
    }

    DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS =
        lowerDynamicLDSVariables(M, LDSUsesInfo,
                                 KernelsThatIndirectlyAllocateDynamicLDS,
                                 DynamicVariables, OrderedKernels);

    // All kernel frames have been allocated. Calculate and record the
    // addresses.
    {
      const DataLayout &DL = M.getDataLayout();

      for (Function &Func : M.functions()) {
        if (Func.isDeclaration() || !isKernelLDS(&Func))
          continue;

        // All three of these are optional. The first variable is allocated at
        // zero. They are allocated by AMDGPUMachineFunction as one block.
        // Layout:
        //{
        //  module.lds
        //  alignment padding
        //  kernel instance
        //  alignment padding
        //  dynamic lds variables
        //}

        const bool AllocateModuleScopeStruct =
            MaybeModuleScopeStruct &&
            KernelsThatAllocateModuleLDS.contains(&Func);

        auto Replacement = KernelToReplacement.find(&Func);
        const bool AllocateKernelScopeStruct =
            Replacement != KernelToReplacement.end();

        const bool AllocateDynamicVariable =
            KernelToCreatedDynamicLDS.contains(&Func);

        uint32_t Offset = 0;

        if (AllocateModuleScopeStruct) {
          // Allocated at zero, recorded once on construction, not once per
          // kernel
          Offset += DL.getTypeAllocSize(MaybeModuleScopeStruct->getValueType());
        }

        if (AllocateKernelScopeStruct) {
          GlobalVariable *KernelStruct = Replacement->second.SGV;
          Offset = alignTo(Offset, AMDGPU::getAlign(DL, KernelStruct));
          recordLDSAbsoluteAddress(&M, KernelStruct, Offset);
          Offset += DL.getTypeAllocSize(KernelStruct->getValueType());
        }

        // If there is dynamic allocation, the alignment needed is included in
        // the static frame size. There may be no reference to the dynamic
        // variable in the kernel itself, so without including it here, that
        // alignment padding could be missed.
        if (AllocateDynamicVariable) {
          GlobalVariable *DynamicVariable = KernelToCreatedDynamicLDS[&Func];
          Offset = alignTo(Offset, AMDGPU::getAlign(DL, DynamicVariable));
          recordLDSAbsoluteAddress(&M, DynamicVariable, Offset);
        }

        if (Offset != 0) {
          (void)TM; // TODO: Account for target maximum LDS
          std::string Buffer;
          raw_string_ostream SS{Buffer};
          SS << format("%u", Offset);

          // Instead of explicitly marking kernels that access dynamic
          // variables using special case metadata, annotate with min-lds ==
          // max-lds, i.e. that there is no more space available for allocating
          // more static LDS variables. That is the right condition to prevent
          // allocating more variables which would collide with the addresses
          // assigned to dynamic variables.
          if (AllocateDynamicVariable)
            SS << format(",%u", Offset);

          Func.addFnAttr("amdgpu-lds-size", Buffer);
        }
      }
    }

    for (auto &GV : make_early_inc_range(M.globals()))
      if (AMDGPU::isLDSVariableToLower(GV)) {
        // probably want to remove from used lists
        GV.removeDeadConstantUsers();
        if (GV.use_empty())
          GV.eraseFromParent();
      }

    return Changed;
  }

private:
  // Increase the alignment of LDS globals if necessary to maximise the chance
  // that we can use aligned LDS instructions to access them.
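  // For example, under the rules below a 6 byte LDS variable with align 1 has
  // its alignment raised to 8 so that a b64 access is possible, and anything
  // larger than 8 bytes is raised to align 16 for b96/b128 access.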
  static bool superAlignLDSGlobals(Module &M) {
    const DataLayout &DL = M.getDataLayout();
    bool Changed = false;
    if (!SuperAlignLDSGlobals) {
      return Changed;
    }

    for (auto &GV : M.globals()) {
      if (GV.getType()->getPointerAddressSpace() != AMDGPUAS::LOCAL_ADDRESS) {
        // Only changing alignment of LDS variables
        continue;
      }
      if (!GV.hasInitializer()) {
        // cuda/hip extern __shared__ variable, leave alignment alone
        continue;
      }

      Align Alignment = AMDGPU::getAlign(DL, &GV);
      TypeSize GVSize = DL.getTypeAllocSize(GV.getValueType());

      if (GVSize > 8) {
        // We might want to use a b96 or b128 load/store
        Alignment = std::max(Alignment, Align(16));
      } else if (GVSize > 4) {
        // We might want to use a b64 load/store
        Alignment = std::max(Alignment, Align(8));
      } else if (GVSize > 2) {
        // We might want to use a b32 load/store
        Alignment = std::max(Alignment, Align(4));
      } else if (GVSize > 1) {
        // We might want to use a b16 load/store
        Alignment = std::max(Alignment, Align(2));
      }

      if (Alignment != AMDGPU::getAlign(DL, &GV)) {
        Changed = true;
        GV.setAlignment(Alignment);
      }
    }
    return Changed;
  }

  static LDSVariableReplacement createLDSVariableReplacement(
      Module &M, std::string VarName,
      DenseSet<GlobalVariable *> const &LDSVarsToTransform) {
    // Create a struct instance containing LDSVarsToTransform and map from those
    // variables to ConstantExprGEP
    // Variables may be introduced to meet alignment requirements. No aliasing
    // metadata is useful for these as they have no uses. Erased before return.

    LLVMContext &Ctx = M.getContext();
    const DataLayout &DL = M.getDataLayout();
    assert(!LDSVarsToTransform.empty());

    SmallVector<OptimizedStructLayoutField, 8> LayoutFields;
    LayoutFields.reserve(LDSVarsToTransform.size());
    {
      // The order of fields in this struct depends on the order of
      // variables in the argument which varies when changing how they
      // are identified, leading to spurious test breakage.
      auto Sorted = sortByName(std::vector<GlobalVariable *>(
          LDSVarsToTransform.begin(), LDSVarsToTransform.end()));

      for (GlobalVariable *GV : Sorted) {
        OptimizedStructLayoutField F(GV,
                                     DL.getTypeAllocSize(GV->getValueType()),
                                     AMDGPU::getAlign(DL, GV));
        LayoutFields.emplace_back(F);
      }
    }

    performOptimizedStructLayout(LayoutFields);

    std::vector<GlobalVariable *> LocalVars;
    BitVector IsPaddingField;
    LocalVars.reserve(LDSVarsToTransform.size()); // will be at least this large
    IsPaddingField.reserve(LDSVarsToTransform.size());
    {
      uint64_t CurrentOffset = 0;
      for (size_t I = 0; I < LayoutFields.size(); I++) {
        GlobalVariable *FGV = static_cast<GlobalVariable *>(
            const_cast<void *>(LayoutFields[I].Id));
        Align DataAlign = LayoutFields[I].Alignment;

        uint64_t DataAlignV = DataAlign.value();
        if (uint64_t Rem = CurrentOffset % DataAlignV) {
          uint64_t Padding = DataAlignV - Rem;

          // Append an array of padding bytes to meet alignment requested
          // Note (o + (a - (o % a)) ) % a == 0
          //      (offset + Padding  ) % align == 0

          Type *ATy = ArrayType::get(Type::getInt8Ty(Ctx), Padding);
          LocalVars.push_back(new GlobalVariable(
              M, ATy, false, GlobalValue::InternalLinkage,
              PoisonValue::get(ATy), "", nullptr, GlobalValue::NotThreadLocal,
              AMDGPUAS::LOCAL_ADDRESS, false));
          IsPaddingField.push_back(true);
          CurrentOffset += Padding;
        }

        LocalVars.push_back(FGV);
        IsPaddingField.push_back(false);
        CurrentOffset += LayoutFields[I].Size;
      }
    }

    std::vector<Type *> LocalVarTypes;
    LocalVarTypes.reserve(LocalVars.size());
    std::transform(
        LocalVars.cbegin(), LocalVars.cend(), std::back_inserter(LocalVarTypes),
        [](const GlobalVariable *V) -> Type * { return V->getValueType(); });

    StructType *LDSTy = StructType::create(Ctx, LocalVarTypes, VarName + ".t");

    Align StructAlign = AMDGPU::getAlign(DL, LocalVars[0]);

    GlobalVariable *SGV = new GlobalVariable(
        M, LDSTy, false, GlobalValue::InternalLinkage, PoisonValue::get(LDSTy),
        VarName, nullptr, GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS,
        false);
    SGV->setAlignment(StructAlign);

    DenseMap<GlobalVariable *, Constant *> Map;
    Type *I32 = Type::getInt32Ty(Ctx);
    for (size_t I = 0; I < LocalVars.size(); I++) {
      GlobalVariable *GV = LocalVars[I];
      Constant *GEPIdx[] = {ConstantInt::get(I32, 0), ConstantInt::get(I32, I)};
      Constant *GEP = ConstantExpr::getGetElementPtr(LDSTy, SGV, GEPIdx, true);
      if (IsPaddingField[I]) {
        assert(GV->use_empty());
        GV->eraseFromParent();
      } else {
        Map[GV] = GEP;
      }
    }
    assert(Map.size() == LDSVarsToTransform.size());
    return {SGV, std::move(Map)};
  }

  template <typename PredicateTy>
  static void replaceLDSVariablesWithStruct(
      Module &M, DenseSet<GlobalVariable *> const &LDSVarsToTransformArg,
      const LDSVariableReplacement &Replacement, PredicateTy Predicate) {
    LLVMContext &Ctx = M.getContext();
    const DataLayout &DL = M.getDataLayout();

    // A hack... we need to insert the aliasing info in a predictable order for
    // lit tests. Would like to have them in a stable order already, ideally the
    // same order they get allocated, which might mean an ordered set container
    auto LDSVarsToTransform = sortByName(std::vector<GlobalVariable *>(
        LDSVarsToTransformArg.begin(), LDSVarsToTransformArg.end()));

    // Create alias.scope and their lists. Each field in the new structure
    // does not alias with all other fields.
    SmallVector<MDNode *> AliasScopes;
    SmallVector<Metadata *> NoAliasList;
    const size_t NumberVars = LDSVarsToTransform.size();
    if (NumberVars > 1) {
      MDBuilder MDB(Ctx);
      AliasScopes.reserve(NumberVars);
      MDNode *Domain = MDB.createAnonymousAliasScopeDomain();
      for (size_t I = 0; I < NumberVars; I++) {
        MDNode *Scope = MDB.createAnonymousAliasScope(Domain);
        AliasScopes.push_back(Scope);
      }
      NoAliasList.append(&AliasScopes[1], AliasScopes.end());
    }

    // Replace uses of ith variable with a constantexpr to the corresponding
    // field of the instance that will be allocated by AMDGPUMachineFunction
    for (size_t I = 0; I < NumberVars; I++) {
      GlobalVariable *GV = LDSVarsToTransform[I];
      Constant *GEP = Replacement.LDSVarsToConstantGEP.at(GV);

      GV->replaceUsesWithIf(GEP, Predicate);

      APInt APOff(DL.getIndexTypeSizeInBits(GEP->getType()), 0);
      GEP->stripAndAccumulateInBoundsConstantOffsets(DL, APOff);
      uint64_t Offset = APOff.getZExtValue();

      Align A =
          commonAlignment(Replacement.SGV->getAlign().valueOrOne(), Offset);

      if (I)
        NoAliasList[I - 1] = AliasScopes[I - 1];
      MDNode *NoAlias =
          NoAliasList.empty() ? nullptr : MDNode::get(Ctx, NoAliasList);
      MDNode *AliasScope =
          AliasScopes.empty() ? nullptr : MDNode::get(Ctx, {AliasScopes[I]});

      refineUsesAlignmentAndAA(GEP, A, DL, AliasScope, NoAlias);
    }
  }

  static void refineUsesAlignmentAndAA(Value *Ptr, Align A,
                                       const DataLayout &DL, MDNode *AliasScope,
                                       MDNode *NoAlias, unsigned MaxDepth = 5) {
    if (!MaxDepth || (A == 1 && !AliasScope))
      return;

    for (User *U : Ptr->users()) {
      if (auto *I = dyn_cast<Instruction>(U)) {
        if (AliasScope && I->mayReadOrWriteMemory()) {
          MDNode *AS = I->getMetadata(LLVMContext::MD_alias_scope);
          AS = (AS ? MDNode::getMostGenericAliasScope(AS, AliasScope)
                   : AliasScope);
          I->setMetadata(LLVMContext::MD_alias_scope, AS);

          MDNode *NA = I->getMetadata(LLVMContext::MD_noalias);
          NA = (NA ? MDNode::intersect(NA, NoAlias) : NoAlias);
          I->setMetadata(LLVMContext::MD_noalias, NA);
        }
      }

      if (auto *LI = dyn_cast<LoadInst>(U)) {
        LI->setAlignment(std::max(A, LI->getAlign()));
        continue;
      }
      if (auto *SI = dyn_cast<StoreInst>(U)) {
        if (SI->getPointerOperand() == Ptr)
          SI->setAlignment(std::max(A, SI->getAlign()));
        continue;
      }
      if (auto *AI = dyn_cast<AtomicRMWInst>(U)) {
        // None of atomicrmw operations can work on pointers, but let's
        // check it anyway in case it will or we will process ConstantExpr.
        if (AI->getPointerOperand() == Ptr)
          AI->setAlignment(std::max(A, AI->getAlign()));
        continue;
      }
      if (auto *AI = dyn_cast<AtomicCmpXchgInst>(U)) {
        if (AI->getPointerOperand() == Ptr)
          AI->setAlignment(std::max(A, AI->getAlign()));
        continue;
      }
      if (auto *GEP = dyn_cast<GetElementPtrInst>(U)) {
        unsigned BitWidth = DL.getIndexTypeSizeInBits(GEP->getType());
        APInt Off(BitWidth, 0);
        if (GEP->getPointerOperand() == Ptr) {
          Align GA;
          if (GEP->accumulateConstantOffset(DL, Off))
            GA = commonAlignment(A, Off.getLimitedValue());
          refineUsesAlignmentAndAA(GEP, GA, DL, AliasScope, NoAlias,
                                   MaxDepth - 1);
        }
        continue;
      }
      if (auto *I = dyn_cast<Instruction>(U)) {
        if (I->getOpcode() == Instruction::BitCast ||
            I->getOpcode() == Instruction::AddrSpaceCast)
          refineUsesAlignmentAndAA(I, A, DL, AliasScope, NoAlias, MaxDepth - 1);
      }
    }
  }
};

class AMDGPULowerModuleLDSLegacy : public ModulePass {
public:
  const AMDGPUTargetMachine *TM;
  static char ID;

  AMDGPULowerModuleLDSLegacy(const AMDGPUTargetMachine *TM_ = nullptr)
      : ModulePass(ID), TM(TM_) {
    initializeAMDGPULowerModuleLDSLegacyPass(*PassRegistry::getPassRegistry());
  }

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    if (!TM)
      AU.addRequired<TargetPassConfig>();
  }

  bool runOnModule(Module &M) override {
    if (!TM) {
      auto &TPC = getAnalysis<TargetPassConfig>();
      TM = &TPC.getTM<AMDGPUTargetMachine>();
    }

    return AMDGPULowerModuleLDS(*TM).runOnModule(M);
  }
};

} // namespace
char AMDGPULowerModuleLDSLegacy::ID = 0;

char &llvm::AMDGPULowerModuleLDSLegacyPassID = AMDGPULowerModuleLDSLegacy::ID;

INITIALIZE_PASS_BEGIN(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
                      "Lower uses of LDS variables from non-kernel functions",
                      false, false)
INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)
INITIALIZE_PASS_END(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
                    "Lower uses of LDS variables from non-kernel functions",
                    false, false)

ModulePass *
llvm::createAMDGPULowerModuleLDSLegacyPass(const AMDGPUTargetMachine *TM) {
  return new AMDGPULowerModuleLDSLegacy(TM);
}

PreservedAnalyses AMDGPULowerModuleLDSPass::run(Module &M,
                                                ModuleAnalysisManager &) {
  return AMDGPULowerModuleLDS(TM).runOnModule(M) ? PreservedAnalyses::none()
                                                 : PreservedAnalyses::all();
}
Definition: Globals.cpp:467
Value * CreateIntToPtr(Value *V, Type *DestTy, const Twine &Name="")
Definition: IRBuilder.h:2122
Value * CreateConstInBoundsGEP1_32(Type *Ty, Value *Ptr, unsigned Idx0, const Twine &Name="")
Definition: IRBuilder.h:1891
Value * CreateInBoundsGEP(Type *Ty, Value *Ptr, ArrayRef< Value * > IdxList, const Twine &Name="")
Definition: IRBuilder.h:1876
ConstantInt * getInt32(uint32_t C)
Get a constant 32-bit value.
Definition: IRBuilder.h:486
LoadInst * CreateLoad(Type *Ty, Value *Ptr, const char *Name)
Provided to resolve 'CreateLoad(Ty, Ptr, "...")' correctly, instead of converting the string to 'bool...
Definition: IRBuilder.h:1790
void SetInsertPoint(BasicBlock *TheBB)
This specifies that created instructions should be appended to the end of the specified block.
Definition: IRBuilder.h:180
CallInst * CreateCall(FunctionType *FTy, Value *Callee, ArrayRef< Value * > Args=std::nullopt, const Twine &Name="", MDNode *FPMathTag=nullptr)
Definition: IRBuilder.h:2412
This provides a uniform API for creating instructions and inserting them into a basic block: either a...
Definition: IRBuilder.h:2666
This is an important class for using LLVM in a threaded context.
Definition: LLVMContext.h:67
MDNode * createAnonymousAliasScope(MDNode *Domain, StringRef Name=StringRef())
Return metadata appropriate for an alias scope root node.
Definition: MDBuilder.h:167
MDNode * createAnonymousAliasScopeDomain(StringRef Name=StringRef())
Return metadata appropriate for an alias scope domain node.
Definition: MDBuilder.h:160
Metadata node.
Definition: Metadata.h:1067
static MDNode * getMostGenericAliasScope(MDNode *A, MDNode *B)
Definition: Metadata.cpp:1135
static MDTuple * get(LLVMContext &Context, ArrayRef< Metadata * > MDs)
Definition: Metadata.h:1541
static MDNode * intersect(MDNode *A, MDNode *B)
Definition: Metadata.cpp:1122
Root of the metadata hierarchy.
Definition: Metadata.h:62
ModulePass class - This class is used to implement unstructured interprocedural optimizations and ana...
Definition: Pass.h:251
virtual bool runOnModule(Module &M)=0
runOnModule - Virtual method overriden by subclasses to process the module being operated on.
A Module instance is used to store all the information related to an LLVM module.
Definition: Module.h:65
A container for an operand bundle being viewed as a set of values rather than a set of uses.
Definition: InstrTypes.h:1447
static PassRegistry * getPassRegistry()
getPassRegistry - Access the global registry object, which is automatically initialized at applicatio...
virtual void getAnalysisUsage(AnalysisUsage &) const
getAnalysisUsage - This function should be overriden by passes that need analysis information to do t...
Definition: Pass.cpp:98
static PoisonValue * get(Type *T)
Static factory methods - Return an 'poison' object of the specified type.
Definition: Constants.cpp:1827
A set of analyses that are preserved following a run of a transformation pass.
Definition: Analysis.h:109
static PreservedAnalyses none()
Convenience factory function for the empty preserved set.
Definition: Analysis.h:112
static PreservedAnalyses all()
Construct a special preserved set that preserves all passes.
Definition: Analysis.h:115
size_type count(ConstPtrType Ptr) const
count - Return 1 if the specified pointer is in the set, 0 otherwise.
Definition: SmallPtrSet.h:360
std::pair< iterator, bool > insert(PtrType Ptr)
Inserts Ptr if and only if there is no element in the container equal to Ptr.
Definition: SmallPtrSet.h:342
SmallPtrSet - This class implements a set which is optimized for holding SmallSize or less elements.
Definition: SmallPtrSet.h:427
bool empty() const
Definition: SmallVector.h:94
size_t size() const
Definition: SmallVector.h:91
reference emplace_back(ArgTypes &&... Args)
Definition: SmallVector.h:950
void reserve(size_type N)
Definition: SmallVector.h:676
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
Definition: SmallVector.h:696
iterator insert(iterator I, T &&Elt)
Definition: SmallVector.h:818
void push_back(const T &Elt)
Definition: SmallVector.h:426
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
Definition: SmallVector.h:1209
Class to represent struct types.
Definition: DerivedTypes.h:216
static StructType * create(LLVMContext &Context, StringRef Name)
This creates an identified struct.
Definition: Type.cpp:513
Target-Independent Code Generator Pass Configuration Options.
Twine - A lightweight data structure for efficiently representing the concatenation of temporary valu...
Definition: Twine.h:81
The instances of the Type class are immutable: once they are created, they are never changed.
Definition: Type.h:45
unsigned getPointerAddressSpace() const
Get the address space of this pointer or pointer vector type.
static IntegerType * getInt8Ty(LLVMContext &C)
static IntegerType * getInt32Ty(LLVMContext &C)
A Use represents the edge between a Value definition and its users.
Definition: Use.h:43
LLVM Value Representation.
Definition: Value.h:74
void replaceUsesWithIf(Value *New, llvm::function_ref< bool(Use &U)> ShouldReplace)
Go through the uses list for this definition and make each use point to "V" if the callback ShouldRep...
Definition: Value.cpp:542
bool use_empty() const
Definition: Value.h:344
iterator_range< use_iterator > uses()
Definition: Value.h:376
bool hasName() const
Definition: Value.h:261
StringRef getName() const
Return a constant reference to the value's name.
Definition: Value.cpp:309
std::pair< iterator, bool > insert(const ValueT &V)
Definition: DenseSet.h:206
size_type size() const
Definition: DenseSet.h:81
bool contains(const_arg_type_t< ValueT > V) const
Check if the set contains the given element.
Definition: DenseSet.h:185
bool erase(const ValueT &V)
Definition: DenseSet.h:101
A raw_ostream that writes to an std::string.
Definition: raw_ostream.h:660
@ LOCAL_ADDRESS
Address space for local memory.
@ CONSTANT_ADDRESS
Address space for constant memory (VTX2).
bool isDynamicLDS(const GlobalVariable &GV)
LDSUsesInfoTy getTransitiveUsesOfLDS(const CallGraph &CG, Module &M)
bool isLDSVariableToLower(const GlobalVariable &GV)
bool eliminateConstantExprUsesOfLDSFromAllInstructions(Module &M)
Align getAlign(const DataLayout &DL, const GlobalVariable *GV)
bool isKernelLDS(const Function *F)
unsigned ID
LLVM IR allows to use arbitrary numbers as calling convention identifiers.
Definition: CallingConv.h:24
@ C
The default llvm calling convention, compatible with C.
Definition: CallingConv.h:34
Function * getDeclaration(Module *M, ID id, ArrayRef< Type * > Tys=std::nullopt)
Create or insert an LLVM Function declaration for an intrinsic, and return it.
Definition: Function.cpp:1469
ValuesClass values(OptsTy... Options)
Helper to build a ValuesClass by forwarding a variable number of arguments as an initializer list to ...
Definition: CommandLine.h:718
initializer< Ty > init(const Ty &Val)
Definition: CommandLine.h:450
This is an optimization pass for GlobalISel generic memory operations.
Definition: AddressRanges.h:18
@ Offset
Definition: DWP.cpp:456
bool operator<(int64_t V1, const APSInt &V2)
Definition: APSInt.h:361
bool set_is_subset(const S1Ty &S1, const S2Ty &S2)
set_is_subset(A, B) - Return true iff A in B
iterator_range< early_inc_iterator_impl< detail::IterOfRange< RangeT > > > make_early_inc_range(RangeT &&Range)
Make a range that does early increment to allow mutation of the underlying range without disrupting i...
Definition: STLExtras.h:656
void initializeAMDGPULowerModuleLDSLegacyPass(PassRegistry &)
void sort(IteratorTy Start, IteratorTy End)
Definition: STLExtras.h:1647
void report_fatal_error(Error Err, bool gen_crash_diag=true)
Report a serious error, calling any installed error handler.
Definition: Error.cpp:156
char & AMDGPULowerModuleLDSLegacyPassID
void removeFromUsedLists(Module &M, function_ref< bool(Constant *)> ShouldRemove)
Removes global values from the llvm.used and llvm.compiler.used arrays.
format_object< Ts... > format(const char *Fmt, const Ts &... Vals)
These are helper functions used to produce formatted output.
Definition: Format.h:125
ModulePass * createAMDGPULowerModuleLDSLegacyPass(const AMDGPUTargetMachine *TM=nullptr)
void appendToCompilerUsed(Module &M, ArrayRef< GlobalValue * > Values)
Adds global values to the llvm.compiler.used list.
std::pair< uint64_t, Align > performOptimizedStructLayout(MutableArrayRef< OptimizedStructLayoutField > Fields)
Compute a layout for a struct containing the given fields, making a best-effort attempt to minimize t...
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition: Alignment.h:155
constexpr unsigned BitWidth
Definition: BitmaskEnum.h:191
Align commonAlignment(Align A, uint64_t Offset)
Returns the alignment that satisfies both alignments.
Definition: Alignment.h:212
#define N
PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM)
const AMDGPUTargetMachine & TM
Definition: AMDGPU.h:134
FunctionVariableMap direct_access
FunctionVariableMap indirect_access
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition: Alignment.h:39
uint64_t value() const
This is a hole in the type system and should not be abused.
Definition: Alignment.h:85