//===-- AMDGPULowerBufferFatPointers.cpp ----------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This pass lowers operations on buffer fat pointers (addrspace 7) to
// operations on buffer resources (addrspace 8) and is needed for correct
// codegen.
//
// # Background
//
// Address space 7 (the buffer fat pointer) is a 160-bit pointer that consists
// of a 128-bit buffer descriptor and a 32-bit offset into that descriptor.
// The buffer resource part needs to be a "raw" buffer resource (it must have
// a stride of 0 and bounds checks must be in raw buffer mode or disabled).
//
// When these requirements are met, a buffer resource can be treated as a
// typical (though quite wide) pointer that follows typical LLVM pointer
// semantics. This allows the frontend to reason about such buffers (which are
// often encountered in the context of SPIR-V kernels).
//
// However, because of their non-power-of-2 size, these fat pointers cannot be
// present during translation to MIR (though this restriction may be lifted
// during the transition to GlobalISel). Therefore, this pass is needed in
// order to correctly implement these fat pointers.
//
// The resource intrinsics take the resource part (the address space 8 pointer)
// and the offset part (the 32-bit integer) as separate arguments. In addition,
// many users of these buffers manipulate the offset while leaving the resource
// part alone. For these reasons, we want to typically separate the resource
// and offset parts into separate variables, but combine them together when
// encountering cases where this is required, such as by inserting these values
// into aggregates or moving them to memory.
//
// Therefore, at a high level, `ptr addrspace(7) %x` becomes `ptr addrspace(8)
// %x.rsrc` and `i32 %x.off`, which will be combined into `{ptr addrspace(8),
// i32} %x = {%x.rsrc, %x.off}` if needed. Similarly, `vector<Nxp7>` becomes
// `{vector<Nxp8>, vector<Nxi32>}` and its component parts.
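//
// For example (an illustrative sketch; the value names are invented):
// ```
// ; Before: one SSA value carries the whole fat pointer.
// %x = addrspacecast ptr addrspace(8) %buf to ptr addrspace(7)
// ; After: the two components travel separately.
// %x.rsrc = ... ; the ptr addrspace(8) resource part, here %buf
// %x.off = ... ; the i32 offset part, here 0
// ```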
//
// # Implementation
//
// This pass proceeds in three main phases:
//
// ## Rewriting loads and stores of p7
//
// The first phase is to rewrite away all loads and stores of `ptr
// addrspace(7)`, including aggregates containing such pointers, to ones that
// use `i160`. This is handled by `StoreFatPtrsAsIntsVisitor`, which visits
// loads, stores, and allocas and, if the loaded or stored type contains `ptr
// addrspace(7)`, rewrites that type to one where the p7s are replaced by
// i160s, copying other parts of aggregates as needed. In the case of a store,
// each pointer is `ptrtoint`d to i160 before storing, and loaded integers are
// `inttoptr`d back. This same transformation is applied to vectors of
// pointers.
//
// Such a transformation allows the later phases of the pass to not need
// to handle buffer fat pointers moving to and from memory, where we would
// have to handle the incompatibility between a `{Nxp8, Nxi32}` representation
// and `Nxi160` directly. Instead, that transposing action (where the vectors
// of resources and vectors of offsets are concatenated before being stored to
// memory) is handled through implementing `inttoptr` and `ptrtoint` only.
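//
// Concretely, a store of a fat pointer is rewritten roughly as follows (an
// illustrative sketch; the value names are invented):
// ```
// ; Before
// store ptr addrspace(7) %p, ptr %slot
// ; After
// %p.int = ptrtoint ptr addrspace(7) %p to i160
// store i160 %p.int, ptr %slot
// ```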
//
// Atomic operations on `ptr addrspace(7)` values are not supported, as the
// hardware does not include a 160-bit atomic.
//
// ## Type remapping
//
// We use a `ValueMapper` to mangle uses of [vectors of] buffer fat pointers
// to the corresponding struct type, which has a resource part and an offset
// part.
//
// This uses a `BufferFatPtrToStructTypeMap` and a `FatPtrConstMaterializer`
// to rewrite types and constants, usually by way of `setType`ing values.
// Constants are handled here because there isn't a good way to fix them up
// later.
//
// This has the downside of leaving the IR in an invalid state (for example,
// the instruction `getelementptr {ptr addrspace(8), i32} %p, ...` will exist),
// but all such invalid states will be resolved by the third phase.
//
// Functions that don't take buffer fat pointers are modified in place. Those
// that do take such pointers have their basic blocks moved to a new function
// whose arguments and return values use `{ptr addrspace(8), i32}` in place of
// `ptr addrspace(7)`. This phase also records intrinsics so that they can be
// remangled or deleted later.
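//
// For instance, a signature would be remapped roughly as follows (an
// illustrative sketch):
// ```
// ; Before
// define ptr addrspace(7) @f(ptr addrspace(7) %p)
// ; After
// define {ptr addrspace(8), i32} @f({ptr addrspace(8), i32} %p)
// ```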
//
// ## Splitting pointer structs
//
// The meat of this pass consists of defining semantics for operations that
// produce or consume [vectors of] buffer fat pointers in terms of their
// resource and offset parts. This is accomplished through the
// `SplitPtrStructs` visitor.
//
// In the first pass through each function that is being lowered, the splitter
// inserts new instructions to implement the split-structures behavior, which
// is needed for correctness and performance. It records a list of "split
// users", instructions that are being replaced by operations on the resource
// and offset parts.
//
// Split users do not necessarily need to produce parts themselves (a
// `load float, ptr addrspace(7)` does not, for example), but, if they do not
// generate fat buffer pointers, they must RAUW in their replacement
// instructions during the initial visit.
//
// When these new instructions are created, they use the split parts recorded
// for their initial arguments in order to generate their replacements,
// creating a parallel set of instructions that does not refer to the original
// fat pointer values but instead to their resource and offset components.
//
// Instructions, such as `extractvalue`, that produce buffer fat pointers from
// sources that do not have split parts, have such parts generated using
// `extractvalue`. This is also the initial handling of PHI nodes, which
// are then cleaned up.
//
// ### Conditionals
//
// PHI nodes are initially given resource parts via `extractvalue`. However,
// this is not an efficient rewrite of such nodes, as, in most cases, the
// resource part in a conditional or loop remains constant throughout the loop
// and only the offset varies. Failing to optimize away these constant
// resources would cause additional registers to be sent around loops and
// might lead to waterfall loops being generated for buffer operations due to
// the "non-uniform" resource argument.
//
// Therefore, after all instructions have been visited, the pointer splitter
// post-processes all encountered conditionals. Given a PHI node or select,
// getPossibleRsrcRoots() collects all values that the resource parts of that
// conditional's input could come from as well as collecting all conditional
// instructions encountered during the search. If, after filtering out the
// initial node itself, the set of encountered conditionals is a subset of the
// potential roots and there is a single potential resource that isn't in the
// conditional set, that value is the only possible value the resource
// argument could have throughout the control flow.
//
// If that condition is met, then a PHI node can have its resource part
// changed to the singleton value and then be replaced by a PHI on the
// offsets. Otherwise, each PHI node is split into two, one for the resource
// part and one for the offset part, which replace the temporary
// `extractvalue` instructions that were added during the first pass, as
// sketched below.
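//
// An illustrative sketch of the common case, where a loop's resource part is
// invariant (the value names are invented):
// ```
// ; Before
// %p = phi ptr addrspace(7) [ %init, %entry ], [ %next, %loop ]
// ; After, when %init and %next share the resource part %base.rsrc:
// ; %base.rsrc is used directly, so no PHI is needed for the resource, and
// %p.off = phi i32 [ %init.off, %entry ], [ %next.off, %loop ]
// ```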
//
// Similar logic applies to `select`, where
// `%z = select i1 %cond, ptr addrspace(7) %x, ptr addrspace(7) %y`
// can be split into `%z.rsrc = %x.rsrc` and
// `%z.off = select i1 %cond, i32 %x.off, i32 %y.off`
// if both `%x` and `%y` have the same resource part, but two `select`
// operations will be needed if they do not.
//
// ### Final processing
//
// After conditionals have been cleaned up, the IR for each function is
// rewritten to remove all the old instructions that have been split up.
//
// Any instruction that used to produce a buffer fat pointer (and therefore
// now produces a resource-and-offset struct after type remapping) is
// replaced as follows:
// 1. All debug value annotations are cloned to reflect that the resource part
//    and offset parts are computed separately and constitute different
//    fragments of the underlying source language variable.
// 2. All uses that were themselves split are replaced by a `poison` of the
//    struct type, as they will themselves be erased soon. This rule, combined
//    with debug handling, should leave the use lists of split instructions
//    empty in almost all cases.
// 3. If a user of the original struct-valued result remains, the structure
//    needed for the new types to work is constructed out of the newly-defined
//    parts, and the original instruction is replaced by this structure
//    before being erased. Instructions requiring this construction include
//    `ret` and `insertvalue`; see the sketch below.
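//
// That is, for a remaining struct-typed use, the reconstruction looks roughly
// like (an illustrative sketch):
// ```
// %tmp = insertvalue {ptr addrspace(8), i32} poison, ptr addrspace(8) %rsrc, 0
// %val = insertvalue {ptr addrspace(8), i32} %tmp, i32 %off, 1
// ```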
//
// # Consequences
//
// This pass does not alter the CFG.
//
// Alias analysis information will become coarser, as the LLVM alias analyzer
// cannot handle the buffer intrinsics. Specifically, while we can determine
// that the following two loads do not alias:
// ```
// %y = getelementptr i32, ptr addrspace(7) %x, i32 1
// %a = load i32, ptr addrspace(7) %x
// %b = load i32, ptr addrspace(7) %y
// ```
// we cannot (except through some code that runs during scheduling) determine
// that the rewritten loads below do not alias.
// ```
// %y.off = add i32 %x.off, 1
// %a = call @llvm.amdgcn.raw.ptr.buffer.load(ptr addrspace(8) %x.rsrc, i32
//     %x.off, ...)
// %b = call @llvm.amdgcn.raw.ptr.buffer.load(ptr addrspace(8)
//     %x.rsrc, i32 %y.off, ...)
// ```
// However, existing alias information is preserved.
//===----------------------------------------------------------------------===//

#include "AMDGPU.h"
#include "AMDGPUTargetMachine.h"
#include "GCNSubtarget.h"
#include "SIDefines.h"
#include "llvm/ADT/SetOperations.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/Utils/Local.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DebugInfo.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/InstVisitor.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Operator.h"
#include "llvm/IR/PatternMatch.h"
#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"
#include "llvm/Support/AtomicOrdering.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Transforms/Utils/ValueMapper.h"

#define DEBUG_TYPE "amdgpu-lower-buffer-fat-pointers"

using namespace llvm;

static constexpr unsigned BufferOffsetWidth = 32;

namespace {
/// Recursively replace instances of ptr addrspace(7) and vector<Nxptr
/// addrspace(7)> with some other type as defined by the relevant subclass.
class BufferFatPtrTypeLoweringBase : public ValueMapTypeRemapper {
  DenseMap<Type *, Type *> Map;

  Type *remapTypeImpl(Type *Ty, SmallPtrSetImpl<StructType *> &Seen);

protected:
  virtual Type *remapScalar(PointerType *PT) = 0;
  virtual Type *remapVector(VectorType *VT) = 0;

  const DataLayout &DL;

public:
  BufferFatPtrTypeLoweringBase(const DataLayout &DL) : DL(DL) {}
  Type *remapType(Type *SrcTy) override;
  void clear() { Map.clear(); }
};

/// Remap ptr addrspace(7) to i160 and vector<Nxptr addrspace(7)> to
/// vector<Nxi160> in order to correctly handle loading/storing these values
/// from memory.
class BufferFatPtrToIntTypeMap : public BufferFatPtrTypeLoweringBase {
  using BufferFatPtrTypeLoweringBase::BufferFatPtrTypeLoweringBase;

protected:
  Type *remapScalar(PointerType *PT) override { return DL.getIntPtrType(PT); }
  Type *remapVector(VectorType *VT) override { return DL.getIntPtrType(VT); }
};

/// Remap ptr addrspace(7) to {ptr addrspace(8), i32} (the resource and offset
/// parts of the pointer) so that we can easily rewrite operations on these
/// values that aren't loading them from or storing them to memory.
class BufferFatPtrToStructTypeMap : public BufferFatPtrTypeLoweringBase {
  using BufferFatPtrTypeLoweringBase::BufferFatPtrTypeLoweringBase;

protected:
  Type *remapScalar(PointerType *PT) override;
  Type *remapVector(VectorType *VT) override;
};
} // namespace

// This code is adapted from the type remapper in lib/Linker/IRMover.cpp
Type *BufferFatPtrTypeLoweringBase::remapTypeImpl(
    Type *Ty, SmallPtrSetImpl<StructType *> &Seen) {
  Type **Entry = &Map[Ty];
  if (*Entry)
    return *Entry;
  if (auto *PT = dyn_cast<PointerType>(Ty)) {
    if (PT->getAddressSpace() == AMDGPUAS::BUFFER_FAT_POINTER) {
      return *Entry = remapScalar(PT);
    }
  }
  if (auto *VT = dyn_cast<VectorType>(Ty)) {
    auto *PT = dyn_cast<PointerType>(VT->getElementType());
    if (PT && PT->getAddressSpace() == AMDGPUAS::BUFFER_FAT_POINTER) {
      return *Entry = remapVector(VT);
    }
    return *Entry = Ty;
  }
  // Whether the type is one that is structurally uniqued - that is, if it is
  // not a named struct (the only kind of type where multiple structurally
  // identical types can have distinct `Type *`s).
  StructType *TyAsStruct = dyn_cast<StructType>(Ty);
  bool IsUniqued = !TyAsStruct || TyAsStruct->isLiteral();
  // Base case for ints, floats, opaque pointers, and so on, which don't
  // require recursion.
  if (Ty->getNumContainedTypes() == 0 && IsUniqued)
    return *Entry = Ty;
  if (!IsUniqued) {
    // Create a dummy type for recursion purposes.
    if (!Seen.insert(TyAsStruct).second) {
      StructType *Placeholder = StructType::create(Ty->getContext());
      return *Entry = Placeholder;
    }
  }
  bool Changed = false;
  SmallVector<Type *> ElementTypes(Ty->getNumContainedTypes(), nullptr);
  for (unsigned int I = 0, E = Ty->getNumContainedTypes(); I < E; ++I) {
    Type *OldElem = Ty->getContainedType(I);
    Type *NewElem = remapTypeImpl(OldElem, Seen);
    ElementTypes[I] = NewElem;
    Changed |= (OldElem != NewElem);
  }
  // Recursive calls to remapTypeImpl() may have invalidated the pointer.
  Entry = &Map[Ty];
  if (!Changed) {
    return *Entry = Ty;
  }
  if (auto *ArrTy = dyn_cast<ArrayType>(Ty))
    return *Entry = ArrayType::get(ElementTypes[0], ArrTy->getNumElements());
  if (auto *FnTy = dyn_cast<FunctionType>(Ty))
    return *Entry = FunctionType::get(ElementTypes[0],
                                      ArrayRef(ElementTypes).slice(1),
                                      FnTy->isVarArg());
  if (auto *STy = dyn_cast<StructType>(Ty)) {
    // Genuine opaque types don't have a remapping.
    if (STy->isOpaque())
      return *Entry = Ty;
    bool IsPacked = STy->isPacked();
    if (IsUniqued)
      return *Entry = StructType::get(Ty->getContext(), ElementTypes, IsPacked);
    SmallString<16> Name(STy->getName());
    STy->setName("");
    Type **RecursionEntry = &Map[Ty];
    if (*RecursionEntry) {
      auto *Placeholder = cast<StructType>(*RecursionEntry);
      Placeholder->setBody(ElementTypes, IsPacked);
      Placeholder->setName(Name);
      return *Entry = Placeholder;
    }
    return *Entry = StructType::create(Ty->getContext(), ElementTypes, Name,
                                       IsPacked);
  }
  llvm_unreachable("Unknown type of type that contains elements");
}

Type *BufferFatPtrTypeLoweringBase::remapType(Type *SrcTy) {
  SmallPtrSet<StructType *, 2> Visited;
  return remapTypeImpl(SrcTy, Visited);
}

Type *BufferFatPtrToStructTypeMap::remapScalar(PointerType *PT) {
  LLVMContext &Ctx = PT->getContext();
  return StructType::get(PointerType::get(Ctx, AMDGPUAS::BUFFER_RESOURCE),
                         IntegerType::get(Ctx, BufferOffsetWidth));
}

Type *BufferFatPtrToStructTypeMap::remapVector(VectorType *VT) {
  ElementCount EC = VT->getElementCount();
  LLVMContext &Ctx = VT->getContext();
  Type *RsrcVec =
      VectorType::get(PointerType::get(Ctx, AMDGPUAS::BUFFER_RESOURCE), EC);
  Type *OffVec = VectorType::get(IntegerType::get(Ctx, BufferOffsetWidth), EC);
  return StructType::get(RsrcVec, OffVec);
}

static bool isBufferFatPtrOrVector(Type *Ty) {
  if (auto *PT = dyn_cast<PointerType>(Ty->getScalarType()))
    return PT->getAddressSpace() == AMDGPUAS::BUFFER_FAT_POINTER;
  return false;
}

// True if the type is {ptr addrspace(8), i32} or a struct containing vectors
// of those types. Used to quickly skip instructions we don't need to process.
static bool isSplitFatPtr(Type *Ty) {
  auto *ST = dyn_cast<StructType>(Ty);
  if (!ST)
    return false;
  if (!ST->isLiteral() || ST->getNumElements() != 2)
    return false;
  auto *MaybeRsrc =
      dyn_cast<PointerType>(ST->getElementType(0)->getScalarType());
  auto *MaybeOff =
      dyn_cast<IntegerType>(ST->getElementType(1)->getScalarType());
  return MaybeRsrc && MaybeOff &&
         MaybeRsrc->getAddressSpace() == AMDGPUAS::BUFFER_RESOURCE &&
         MaybeOff->getBitWidth() == BufferOffsetWidth;
}

// True if the result type or any argument types are buffer fat pointers.
static bool isBufferFatPtrConst(Constant *C) {
  Type *T = C->getType();
  return isBufferFatPtrOrVector(T) || any_of(C->operands(), [](const Use &U) {
           return isBufferFatPtrOrVector(U.get()->getType());
         });
}

namespace {
/// Convert [vectors of] buffer fat pointers to integers when they are read
/// from or stored to memory. This ensures that these pointers will have the
/// same memory layout as before they are lowered, even though they will no
/// longer have their previous layout in registers/in the program (they'll be
/// broken down into resource and offset parts). This has the downside of
/// imposing marshalling costs when reading or storing these values, but since
/// placing such pointers into memory is an uncommon operation at best, we feel
/// that this cost is acceptable for better performance in the common case.
class StoreFatPtrsAsIntsVisitor
    : public InstVisitor<StoreFatPtrsAsIntsVisitor, bool> {
  BufferFatPtrToIntTypeMap *TypeMap;

  ValueToValueMapTy ConvertedForStore;

  IRBuilder<> IRB;

  // Convert all the buffer fat pointers within the input value to integers
  // so that it can be stored in memory.
  Value *fatPtrsToInts(Value *V, Type *From, Type *To, const Twine &Name);
  // Convert all the i160s that need to be buffer fat pointers (as specified
  // by the To type) into those pointers to preserve the semantics of the rest
  // of the program.
  Value *intsToFatPtrs(Value *V, Type *From, Type *To, const Twine &Name);

public:
  StoreFatPtrsAsIntsVisitor(BufferFatPtrToIntTypeMap *TypeMap, LLVMContext &Ctx)
      : TypeMap(TypeMap), IRB(Ctx) {}
  bool processFunction(Function &F);

  bool visitInstruction(Instruction &I) { return false; }
  bool visitAllocaInst(AllocaInst &I);
  bool visitLoadInst(LoadInst &LI);
  bool visitStoreInst(StoreInst &SI);
  bool visitGetElementPtrInst(GetElementPtrInst &I);
};
} // namespace

Value *StoreFatPtrsAsIntsVisitor::fatPtrsToInts(Value *V, Type *From, Type *To,
                                                const Twine &Name) {
  if (From == To)
    return V;
  ValueToValueMapTy::iterator Find = ConvertedForStore.find(V);
  if (Find != ConvertedForStore.end())
    return Find->second;
  if (isBufferFatPtrOrVector(From)) {
    Value *Cast = IRB.CreatePtrToInt(V, To, Name + ".int");
    ConvertedForStore[V] = Cast;
    return Cast;
  }
  if (From->getNumContainedTypes() == 0)
    return V;
  // Structs, arrays, and other compound types.
  Value *Ret = PoisonValue::get(To);
  if (auto *AT = dyn_cast<ArrayType>(From)) {
    Type *FromPart = AT->getArrayElementType();
    Type *ToPart = cast<ArrayType>(To)->getElementType();
    for (uint64_t I = 0, E = AT->getArrayNumElements(); I < E; ++I) {
      Value *Field = IRB.CreateExtractValue(V, I);
      Value *NewField =
          fatPtrsToInts(Field, FromPart, ToPart, Name + "." + Twine(I));
      Ret = IRB.CreateInsertValue(Ret, NewField, I);
    }
  } else {
    for (auto [Idx, FromPart, ToPart] :
         enumerate(From->subtypes(), To->subtypes())) {
      Value *Field = IRB.CreateExtractValue(V, Idx);
      Value *NewField =
          fatPtrsToInts(Field, FromPart, ToPart, Name + "." + Twine(Idx));
      Ret = IRB.CreateInsertValue(Ret, NewField, Idx);
    }
  }
  ConvertedForStore[V] = Ret;
  return Ret;
}

Value *StoreFatPtrsAsIntsVisitor::intsToFatPtrs(Value *V, Type *From, Type *To,
                                                const Twine &Name) {
  if (From == To)
    return V;
  if (isBufferFatPtrOrVector(To)) {
    Value *Cast = IRB.CreateIntToPtr(V, To, Name + ".ptr");
    return Cast;
  }
  if (From->getNumContainedTypes() == 0)
    return V;
  // Structs, arrays, and other compound types.
  Value *Ret = PoisonValue::get(To);
  if (auto *AT = dyn_cast<ArrayType>(From)) {
    Type *FromPart = AT->getArrayElementType();
    Type *ToPart = cast<ArrayType>(To)->getElementType();
    for (uint64_t I = 0, E = AT->getArrayNumElements(); I < E; ++I) {
      Value *Field = IRB.CreateExtractValue(V, I);
      Value *NewField =
          intsToFatPtrs(Field, FromPart, ToPart, Name + "." + Twine(I));
      Ret = IRB.CreateInsertValue(Ret, NewField, I);
    }
  } else {
    for (auto [Idx, FromPart, ToPart] :
         enumerate(From->subtypes(), To->subtypes())) {
      Value *Field = IRB.CreateExtractValue(V, Idx);
      Value *NewField =
          intsToFatPtrs(Field, FromPart, ToPart, Name + "." + Twine(Idx));
      Ret = IRB.CreateInsertValue(Ret, NewField, Idx);
    }
  }
  return Ret;
}

bool StoreFatPtrsAsIntsVisitor::processFunction(Function &F) {
  bool Changed = false;
  // The visitors will mutate GEPs and allocas, but will push loads and stores
  // to the worklist to avoid invalidation.
  for (Instruction &I : make_early_inc_range(instructions(F))) {
    Changed |= visit(I);
  }
  ConvertedForStore.clear();
  return Changed;
}

bool StoreFatPtrsAsIntsVisitor::visitAllocaInst(AllocaInst &I) {
  Type *Ty = I.getAllocatedType();
  Type *NewTy = TypeMap->remapType(Ty);
  if (Ty == NewTy)
    return false;
  I.setAllocatedType(NewTy);
  return true;
}

bool StoreFatPtrsAsIntsVisitor::visitGetElementPtrInst(GetElementPtrInst &I) {
  Type *Ty = I.getSourceElementType();
  Type *NewTy = TypeMap->remapType(Ty);
  if (Ty == NewTy)
    return false;
  // We'll be rewriting the type `ptr addrspace(7)` out of existence soon, so
  // make sure GEPs don't have different semantics with the new type.
  I.setSourceElementType(NewTy);
  I.setResultElementType(TypeMap->remapType(I.getResultElementType()));
  return true;
}

bool StoreFatPtrsAsIntsVisitor::visitLoadInst(LoadInst &LI) {
  Type *Ty = LI.getType();
  Type *IntTy = TypeMap->remapType(Ty);
  if (Ty == IntTy)
    return false;

  IRB.SetInsertPoint(&LI);
  auto *NLI = cast<LoadInst>(LI.clone());
  NLI->mutateType(IntTy);
  NLI = IRB.Insert(NLI);
  copyMetadataForLoad(*NLI, LI);
  NLI->takeName(&LI);

  Value *CastBack = intsToFatPtrs(NLI, IntTy, Ty, NLI->getName());
  LI.replaceAllUsesWith(CastBack);
  LI.eraseFromParent();
  return true;
}

bool StoreFatPtrsAsIntsVisitor::visitStoreInst(StoreInst &SI) {
  Value *V = SI.getValueOperand();
  Type *Ty = V->getType();
  Type *IntTy = TypeMap->remapType(Ty);
  if (Ty == IntTy)
    return false;

  IRB.SetInsertPoint(&SI);
  Value *IntV = fatPtrsToInts(V, Ty, IntTy, V->getName());
  for (auto *Dbg : at::getAssignmentMarkers(&SI))
    Dbg->setValue(IntV);

  SI.setOperand(0, IntV);
  return true;
}

/// Return the ptr addrspace(8) and i32 (resource and offset parts) in a
/// lowered buffer fat pointer constant.
static std::pair<Constant *, Constant *>
splitLoweredFatBufferConst(Constant *C) {
  assert(isSplitFatPtr(C->getType()) && "Not a split fat buffer pointer");
  return std::make_pair(C->getAggregateElement(0u), C->getAggregateElement(1u));
}

namespace {
/// Handle the remapping of ptr addrspace(7) constants.
class FatPtrConstMaterializer final : public ValueMaterializer {
  BufferFatPtrToStructTypeMap *TypeMap;
  // An internal mapper that is used to recurse into the arguments of
  // constants. While the documentation for `ValueMapper` specifies not to use
  // it recursively, examination of the logic in mapValue() shows that it can
  // safely be used recursively when handling constants, as mapValue() itself
  // does.
  ValueMapper InternalMapper;

  Constant *materializeBufferFatPtrConst(Constant *C);

public:
  // UnderlyingMap is the value map this materializer will be filling.
  FatPtrConstMaterializer(BufferFatPtrToStructTypeMap *TypeMap,
                          ValueToValueMapTy &UnderlyingMap)
      : TypeMap(TypeMap),
        InternalMapper(UnderlyingMap, RF_None, TypeMap, this) {}
  virtual ~FatPtrConstMaterializer() = default;

  Value *materialize(Value *V) override;
};
} // namespace

Constant *FatPtrConstMaterializer::materializeBufferFatPtrConst(Constant *C) {
  Type *SrcTy = C->getType();
  auto *NewTy = dyn_cast<StructType>(TypeMap->remapType(SrcTy));
  if (C->isNullValue())
    return ConstantAggregateZero::getNullValue(NewTy);
  if (isa<PoisonValue>(C)) {
    return ConstantStruct::get(NewTy,
                               {PoisonValue::get(NewTy->getElementType(0)),
                                PoisonValue::get(NewTy->getElementType(1))});
  }
  if (isa<UndefValue>(C)) {
    return ConstantStruct::get(NewTy,
                               {UndefValue::get(NewTy->getElementType(0)),
                                UndefValue::get(NewTy->getElementType(1))});
  }

  if (auto *VC = dyn_cast<ConstantVector>(C)) {
    if (Constant *S = VC->getSplatValue()) {
      Constant *NewS = InternalMapper.mapConstant(*S);
      if (!NewS)
        return nullptr;
      auto [Rsrc, Off] = splitLoweredFatBufferConst(NewS);
      auto EC = VC->getType()->getElementCount();
      return ConstantStruct::get(NewTy, {ConstantVector::getSplat(EC, Rsrc),
                                         ConstantVector::getSplat(EC, Off)});
    }
    SmallVector<Constant *> Rsrcs;
    SmallVector<Constant *> Offs;
    for (Value *Op : VC->operand_values()) {
      auto *NewOp = dyn_cast_or_null<Constant>(InternalMapper.mapValue(*Op));
      if (!NewOp)
        return nullptr;
      auto [Rsrc, Off] = splitLoweredFatBufferConst(NewOp);
      Rsrcs.push_back(Rsrc);
      Offs.push_back(Off);
    }
    Constant *RsrcVec = ConstantVector::get(Rsrcs);
    Constant *OffVec = ConstantVector::get(Offs);
    return ConstantStruct::get(NewTy, {RsrcVec, OffVec});
  }

  if (isa<GlobalValue>(C))
    report_fatal_error("Global values containing ptr addrspace(7) (buffer "
                       "fat pointer) values are not supported");

  if (isa<ConstantExpr>(C))
    report_fatal_error("Constant exprs containing ptr addrspace(7) (buffer "
                       "fat pointer) values should have been expanded earlier");

  return nullptr;
}

Value *FatPtrConstMaterializer::materialize(Value *V) {
  Constant *C = dyn_cast<Constant>(V);
  if (!C)
    return nullptr;
  // Structs and other types that happen to contain fat pointers get remapped
  // by the mapValue() logic.
  if (!isBufferFatPtrConst(C))
    return nullptr;
  return materializeBufferFatPtrConst(C);
}

using PtrParts = std::pair<Value *, Value *>;
namespace {
// The visitor returns the resource and offset parts for an instruction if
// they can be computed, or (nullptr, nullptr) for cases that don't have a
// meaningful value mapping.
class SplitPtrStructs : public InstVisitor<SplitPtrStructs, PtrParts> {
  ValueToValueMapTy RsrcParts;
  ValueToValueMapTy OffParts;

  // Track instructions that have been rewritten into a user of the component
  // parts of their ptr addrspace(7) input. Instructions that produced
  // ptr addrspace(7) parts should **not** be RAUW'd before being added to
  // this set, as that replacement will be handled in a post-visit step.
  // However, instructions that yield values that aren't fat pointers (ex.
  // ptrtoint) should RAUW themselves with new instructions that use the split
  // parts of their arguments during processing.
  DenseSet<Instruction *> SplitUsers;

  // Nodes that need a second look once we've computed the parts for all other
  // instructions to see if, for example, we really need to phi on the
  // resource part.
  SmallVector<Instruction *> Conditionals;
  // Temporary instructions produced while lowering conditionals that should
  // be killed.
  SmallVector<Instruction *> ConditionalTemps;

  // Subtarget info, needed for determining what cache control bits to set.
  const TargetMachine *TM;
  const GCNSubtarget *ST = nullptr;

  IRBuilder<> IRB;

  // Copy metadata between instructions if applicable.
  void copyMetadata(Value *Dest, Value *Src);

  // Get the resource and offset parts of the value V, inserting appropriate
  // extractvalue calls if needed.
  PtrParts getPtrParts(Value *V);

  // Given an instruction that could produce multiple resource parts (a PHI
  // or select), collect the set of values that could have provided its
  // resource part (the `Roots`) and the set of conditional instructions
  // visited during the search (`Seen`). If, after removing the root of the
  // search from `Seen` and `Roots`, `Seen` is a subset of `Roots` and
  // `Roots - Seen` contains one element, the resource part of that element
  // can replace the resource part of all other elements in `Seen`.
  void getPossibleRsrcRoots(Instruction *I, SmallPtrSetImpl<Value *> &Roots,
                            SmallPtrSetImpl<Value *> &Seen);
  void processConditionals();

  // If an instruction has been split into resource and offset parts,
  // delete that instruction. If any of its uses have not themselves been
  // split into parts (for example, an insertvalue), construct the struct
  // that the type rewrites declared the dying instruction should produce,
  // and use that.
  // Also, kill the temporary extractvalue operations produced by the
  // two-stage lowering of PHIs and conditionals.
  void killAndReplaceSplitInstructions(SmallVectorImpl<Instruction *> &Origs);

  void setAlign(CallInst *Intr, Align A, unsigned RsrcArgIdx);
  void insertPreMemOpFence(AtomicOrdering Order, SyncScope::ID SSID);
  void insertPostMemOpFence(AtomicOrdering Order, SyncScope::ID SSID);
  Value *handleMemoryInst(Instruction *I, Value *Arg, Value *Ptr, Type *Ty,
                          Align Alignment, AtomicOrdering Order,
                          bool IsVolatile, SyncScope::ID SSID);

public:
  SplitPtrStructs(LLVMContext &Ctx, const TargetMachine *TM)
      : TM(TM), IRB(Ctx) {}

  void processFunction(Function &F);

  PtrParts visitInstruction(Instruction &I);
  PtrParts visitLoadInst(LoadInst &LI);
  PtrParts visitStoreInst(StoreInst &SI);
  PtrParts visitAtomicRMWInst(AtomicRMWInst &AI);
  PtrParts visitAtomicCmpXchgInst(AtomicCmpXchgInst &AI);
  PtrParts visitGetElementPtrInst(GetElementPtrInst &GEP);

  PtrParts visitPtrToIntInst(PtrToIntInst &PI);
  PtrParts visitIntToPtrInst(IntToPtrInst &IP);
  PtrParts visitAddrSpaceCastInst(AddrSpaceCastInst &I);
  PtrParts visitICmpInst(ICmpInst &Cmp);
  PtrParts visitFreezeInst(FreezeInst &I);

  PtrParts visitExtractElementInst(ExtractElementInst &I);
  PtrParts visitInsertElementInst(InsertElementInst &I);
  PtrParts visitShuffleVectorInst(ShuffleVectorInst &I);

  PtrParts visitPHINode(PHINode &PHI);
  PtrParts visitSelectInst(SelectInst &SI);

  PtrParts visitIntrinsicInst(IntrinsicInst &I);
};
} // namespace

void SplitPtrStructs::copyMetadata(Value *Dest, Value *Src) {
  auto *DestI = dyn_cast<Instruction>(Dest);
  auto *SrcI = dyn_cast<Instruction>(Src);

  if (!DestI || !SrcI)
    return;

  DestI->copyMetadata(*SrcI);
}

PtrParts SplitPtrStructs::getPtrParts(Value *V) {
  assert(isSplitFatPtr(V->getType()) && "it's not meaningful to get the parts "
                                        "of something that wasn't rewritten");
  auto *RsrcEntry = &RsrcParts[V];
  auto *OffEntry = &OffParts[V];
  if (*RsrcEntry && *OffEntry)
    return {*RsrcEntry, *OffEntry};

  if (auto *C = dyn_cast<Constant>(V)) {
    auto [Rsrc, Off] = splitLoweredFatBufferConst(C);
    return {*RsrcEntry = Rsrc, *OffEntry = Off};
  }

  IRBuilder<>::InsertPointGuard Guard(IRB);
  if (auto *I = dyn_cast<Instruction>(V)) {
    LLVM_DEBUG(dbgs() << "Recursing to split parts of " << *I << "\n");
    auto [Rsrc, Off] = visit(*I);
    if (Rsrc && Off)
      return {*RsrcEntry = Rsrc, *OffEntry = Off};
    // We'll be creating the new values after the relevant instruction.
    // This instruction generates a value and so isn't a terminator.
    IRB.SetInsertPoint(*I->getInsertionPointAfterDef());
    IRB.SetCurrentDebugLocation(I->getDebugLoc());
  } else if (auto *A = dyn_cast<Argument>(V)) {
    IRB.SetInsertPointPastAllocas(A->getParent());
    IRB.SetCurrentDebugLocation(DebugLoc());
  }
  Value *Rsrc = IRB.CreateExtractValue(V, 0, V->getName() + ".rsrc");
  Value *Off = IRB.CreateExtractValue(V, 1, V->getName() + ".off");
  return {*RsrcEntry = Rsrc, *OffEntry = Off};
}

/// Returns the instruction that defines the resource part of the value V.
/// Note that this is not getUnderlyingObject(), since that looks through
/// operations like ptrmask which might modify the resource part.
///
/// We can limit ourselves to just looking through GEPs followed by looking
/// through addrspacecasts because only those two operations preserve the
/// resource part, and because operations on an `addrspace(8)` (which is the
/// legal input to this addrspacecast) would produce a different resource part.
static Value *rsrcPartRoot(Value *V) {
  while (auto *GEP = dyn_cast<GEPOperator>(V))
    V = GEP->getPointerOperand();
  while (auto *ASC = dyn_cast<AddrSpaceCastOperator>(V))
    V = ASC->getPointerOperand();
  return V;
}

void SplitPtrStructs::getPossibleRsrcRoots(Instruction *I,
                                           SmallPtrSetImpl<Value *> &Roots,
                                           SmallPtrSetImpl<Value *> &Seen) {
  if (auto *PHI = dyn_cast<PHINode>(I)) {
    if (!Seen.insert(I).second)
      return;
    for (Value *In : PHI->incoming_values()) {
      In = rsrcPartRoot(In);
      Roots.insert(In);
      if (isa<PHINode, SelectInst>(In))
        getPossibleRsrcRoots(cast<Instruction>(In), Roots, Seen);
    }
  } else if (auto *SI = dyn_cast<SelectInst>(I)) {
    if (!Seen.insert(SI).second)
      return;
    Value *TrueVal = rsrcPartRoot(SI->getTrueValue());
    Value *FalseVal = rsrcPartRoot(SI->getFalseValue());
    Roots.insert(TrueVal);
    Roots.insert(FalseVal);
    if (isa<PHINode, SelectInst>(TrueVal))
      getPossibleRsrcRoots(cast<Instruction>(TrueVal), Roots, Seen);
    if (isa<PHINode, SelectInst>(FalseVal))
      getPossibleRsrcRoots(cast<Instruction>(FalseVal), Roots, Seen);
  } else {
    llvm_unreachable("getPossibleRsrcParts() only works on phi and select");
  }
}

void SplitPtrStructs::processConditionals() {
  SmallDenseMap<Instruction *, Value *> FoundRsrcs;
  SmallPtrSet<Value *, 4> Roots;
  SmallPtrSet<Value *, 4> Seen;
  for (Instruction *I : Conditionals) {
    // These have to exist by now because we've visited these nodes.
    Value *Rsrc = RsrcParts[I];
    Value *Off = OffParts[I];
    assert(Rsrc && Off && "must have visited conditionals by now");

    std::optional<Value *> MaybeRsrc;
    auto MaybeFoundRsrc = FoundRsrcs.find(I);
    if (MaybeFoundRsrc != FoundRsrcs.end()) {
      MaybeRsrc = MaybeFoundRsrc->second;
    } else {
      Roots.clear();
      Seen.clear();
      getPossibleRsrcRoots(I, Roots, Seen);
      LLVM_DEBUG(dbgs() << "Processing conditional: " << *I << "\n");
#ifndef NDEBUG
      for (Value *V : Roots)
        LLVM_DEBUG(dbgs() << "Root: " << *V << "\n");
      for (Value *V : Seen)
        LLVM_DEBUG(dbgs() << "Seen: " << *V << "\n");
#endif
      // If we are our own possible root, then we shouldn't block our
      // replacement with a valid incoming value.
      Roots.erase(I);
      // We don't want to block the optimization for conditionals that don't
      // refer to themselves but did see themselves during the traversal.
      Seen.erase(I);

      if (set_is_subset(Seen, Roots)) {
        auto Diff = set_difference(Roots, Seen);
        if (Diff.size() == 1) {
          Value *RootVal = *Diff.begin();
          // Handle the case where previous loops already looked through
          // an addrspacecast.
          if (isSplitFatPtr(RootVal->getType()))
            MaybeRsrc = std::get<0>(getPtrParts(RootVal));
          else
            MaybeRsrc = RootVal;
        }
      }
    }

    if (auto *PHI = dyn_cast<PHINode>(I)) {
      Value *NewRsrc;
      StructType *PHITy = cast<StructType>(PHI->getType());
      IRB.SetInsertPoint(*PHI->getInsertionPointAfterDef());
      IRB.SetCurrentDebugLocation(PHI->getDebugLoc());
      if (MaybeRsrc) {
        NewRsrc = *MaybeRsrc;
      } else {
        Type *RsrcTy = PHITy->getElementType(0);
        auto *RsrcPHI = IRB.CreatePHI(RsrcTy, PHI->getNumIncomingValues());
        RsrcPHI->takeName(Rsrc);
        for (auto [V, BB] : llvm::zip(PHI->incoming_values(), PHI->blocks())) {
          Value *VRsrc = std::get<0>(getPtrParts(V));
          RsrcPHI->addIncoming(VRsrc, BB);
        }
        copyMetadata(RsrcPHI, PHI);
        NewRsrc = RsrcPHI;
      }

      Type *OffTy = PHITy->getElementType(1);
      auto *NewOff = IRB.CreatePHI(OffTy, PHI->getNumIncomingValues());
      NewOff->takeName(Off);
      for (auto [V, BB] : llvm::zip(PHI->incoming_values(), PHI->blocks())) {
        assert(OffParts.count(V) && "An offset part had to be created by now");
        Value *VOff = std::get<1>(getPtrParts(V));
        NewOff->addIncoming(VOff, BB);
      }
      copyMetadata(NewOff, PHI);

      // Note: We don't eraseFromParent() the temporaries because we don't
      // want to put the correction maps in an inconsistent state. That'll be
      // handled during the rest of the killing. Also, `ValueToValueMapTy`
      // guarantees that references in that map will be updated as well.
      ConditionalTemps.push_back(cast<Instruction>(Rsrc));
      ConditionalTemps.push_back(cast<Instruction>(Off));
      Rsrc->replaceAllUsesWith(NewRsrc);
      Off->replaceAllUsesWith(NewOff);

      // Save on recomputing the cycle traversals in known-root cases.
      if (MaybeRsrc)
        for (Value *V : Seen)
          FoundRsrcs[cast<Instruction>(V)] = NewRsrc;
    } else if (isa<SelectInst>(I)) {
      if (MaybeRsrc) {
        ConditionalTemps.push_back(cast<Instruction>(Rsrc));
        Rsrc->replaceAllUsesWith(*MaybeRsrc);
        for (Value *V : Seen)
          FoundRsrcs[cast<Instruction>(V)] = *MaybeRsrc;
      }
    } else {
      llvm_unreachable("Only PHIs and selects go in the conditionals list");
    }
  }
}

void SplitPtrStructs::killAndReplaceSplitInstructions(
    SmallVectorImpl<Instruction *> &Origs) {
  for (Instruction *I : ConditionalTemps)
    I->eraseFromParent();

  for (Instruction *I : Origs) {
    if (!SplitUsers.contains(I))
      continue;

    SmallVector<DbgValueInst *> Dbgs;
    findDbgValues(Dbgs, I);
    for (auto *Dbg : Dbgs) {
      IRB.SetInsertPoint(Dbg);
      auto &DL = I->getDataLayout();
      assert(isSplitFatPtr(I->getType()) &&
             "We should've RAUW'd away loads, stores, etc. at this point");
      auto *OffDbg = cast<DbgValueInst>(Dbg->clone());
      copyMetadata(OffDbg, Dbg);
      auto [Rsrc, Off] = getPtrParts(I);

      int64_t RsrcSz = DL.getTypeSizeInBits(Rsrc->getType());
      int64_t OffSz = DL.getTypeSizeInBits(Off->getType());

      std::optional<DIExpression *> RsrcExpr =
          DIExpression::createFragmentExpression(Dbg->getExpression(), 0,
                                                 RsrcSz);
      std::optional<DIExpression *> OffExpr =
          DIExpression::createFragmentExpression(Dbg->getExpression(), RsrcSz,
                                                 OffSz);
      if (OffExpr) {
        OffDbg->setExpression(*OffExpr);
        OffDbg->replaceVariableLocationOp(I, Off);
        IRB.Insert(OffDbg);
      } else {
        OffDbg->deleteValue();
      }
      if (RsrcExpr) {
        Dbg->setExpression(*RsrcExpr);
        Dbg->replaceVariableLocationOp(I, Rsrc);
      } else {
        Dbg->replaceVariableLocationOp(I, UndefValue::get(I->getType()));
      }
    }

    Value *Poison = PoisonValue::get(I->getType());
    I->replaceUsesWithIf(Poison, [&](const Use &U) -> bool {
      if (const auto *UI = dyn_cast<Instruction>(U.getUser()))
        return SplitUsers.contains(UI);
      return false;
    });

    if (I->use_empty()) {
      I->eraseFromParent();
      continue;
    }
    IRB.SetInsertPoint(*I->getInsertionPointAfterDef());
    IRB.SetCurrentDebugLocation(I->getDebugLoc());
    auto [Rsrc, Off] = getPtrParts(I);
    Value *Struct = PoisonValue::get(I->getType());
    Struct = IRB.CreateInsertValue(Struct, Rsrc, 0);
    Struct = IRB.CreateInsertValue(Struct, Off, 1);
    copyMetadata(Struct, I);
    Struct->takeName(I);
    I->replaceAllUsesWith(Struct);
    I->eraseFromParent();
  }
}

void SplitPtrStructs::setAlign(CallInst *Intr, Align A, unsigned RsrcArgIdx) {
  LLVMContext &Ctx = Intr->getContext();
  Intr->addParamAttr(RsrcArgIdx, Attribute::getWithAlignment(Ctx, A));
}

void SplitPtrStructs::insertPreMemOpFence(AtomicOrdering Order,
                                          SyncScope::ID SSID) {
  switch (Order) {
  case AtomicOrdering::Release:
  case AtomicOrdering::AcquireRelease:
  case AtomicOrdering::SequentiallyConsistent:
    IRB.CreateFence(AtomicOrdering::Release, SSID);
    break;
  default:
    break;
  }
}

void SplitPtrStructs::insertPostMemOpFence(AtomicOrdering Order,
                                           SyncScope::ID SSID) {
  switch (Order) {
  case AtomicOrdering::Acquire:
  case AtomicOrdering::AcquireRelease:
  case AtomicOrdering::SequentiallyConsistent:
    IRB.CreateFence(AtomicOrdering::Acquire, SSID);
    break;
  default:
    break;
  }
}

Value *SplitPtrStructs::handleMemoryInst(Instruction *I, Value *Arg, Value *Ptr,
                                         Type *Ty, Align Alignment,
                                         AtomicOrdering Order, bool IsVolatile,
                                         SyncScope::ID SSID) {
  IRB.SetInsertPoint(I);

  auto [Rsrc, Off] = getPtrParts(Ptr);
  SmallVector<Value *, 5> Args;
  if (Arg)
    Args.push_back(Arg);
  Args.push_back(Rsrc);
  Args.push_back(Off);
  insertPreMemOpFence(Order, SSID);
  // soffset is always 0 for these cases, where we always want any offset to
  // be part of bounds checking and we don't know which parts of the GEP are
  // uniform.
  Args.push_back(IRB.getInt32(0));

  uint32_t Aux = 0;
  if (IsVolatile)
    Aux |= AMDGPU::CPol::VOLATILE;
  Args.push_back(IRB.getInt32(Aux));

  Intrinsic::ID IID = Intrinsic::not_intrinsic;
  if (isa<LoadInst>(I))
    IID = Order == AtomicOrdering::NotAtomic
              ? Intrinsic::amdgcn_raw_ptr_buffer_load
              : Intrinsic::amdgcn_raw_ptr_atomic_buffer_load;
  else if (isa<StoreInst>(I))
    IID = Intrinsic::amdgcn_raw_ptr_buffer_store;
  else if (auto *RMW = dyn_cast<AtomicRMWInst>(I)) {
    switch (RMW->getOperation()) {
    case AtomicRMWInst::Xchg:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_swap;
      break;
    case AtomicRMWInst::Add:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_add;
      break;
    case AtomicRMWInst::Sub:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_sub;
      break;
    case AtomicRMWInst::And:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_and;
      break;
    case AtomicRMWInst::Or:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_or;
      break;
    case AtomicRMWInst::Xor:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_xor;
      break;
    case AtomicRMWInst::Max:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_smax;
      break;
    case AtomicRMWInst::Min:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_smin;
      break;
    case AtomicRMWInst::UMax:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_umax;
      break;
    case AtomicRMWInst::UMin:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_umin;
      break;
    case AtomicRMWInst::FAdd:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_fadd;
      break;
    case AtomicRMWInst::FMax:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_fmax;
      break;
    case AtomicRMWInst::FMin:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_fmin;
      break;
    case AtomicRMWInst::FSub: {
      report_fatal_error("atomic floating point subtraction not supported for "
                         "buffer resources and should've been expanded away");
      break;
    }
    case AtomicRMWInst::Nand:
      report_fatal_error("atomic nand not supported for buffer resources and "
                         "should've been expanded away");
      break;
    case AtomicRMWInst::UIncWrap:
    case AtomicRMWInst::UDecWrap:
      report_fatal_error("wrapping increment/decrement not supported for "
                         "buffer resources and should've been expanded away");
      break;
    case AtomicRMWInst::BAD_BINOP:
      llvm_unreachable("Not sure how we got a bad binop");
      break;
    }
  }

  auto *Call = IRB.CreateIntrinsic(IID, Ty, Args);
  copyMetadata(Call, I);
  setAlign(Call, Alignment, Arg ? 1 : 0);
  Call->takeName(I);

  insertPostMemOpFence(Order, SSID);
  // The "no moving p7 directly" rewrites ensure that this load or store won't
  // itself need to be split into parts.
  SplitUsers.insert(I);
  I->replaceAllUsesWith(Call);
  return Call;
}

PtrParts SplitPtrStructs::visitInstruction(Instruction &I) {
  return {nullptr, nullptr};
}

PtrParts SplitPtrStructs::visitLoadInst(LoadInst &LI) {
  if (!isSplitFatPtr(LI.getPointerOperandType()))
    return {nullptr, nullptr};
  handleMemoryInst(&LI, nullptr, LI.getPointerOperand(), LI.getType(),
                   LI.getAlign(), LI.getOrdering(), LI.isVolatile(),
                   LI.getSyncScopeID());
  return {nullptr, nullptr};
}

PtrParts SplitPtrStructs::visitStoreInst(StoreInst &SI) {
  if (!isSplitFatPtr(SI.getPointerOperandType()))
    return {nullptr, nullptr};
  Value *Arg = SI.getValueOperand();
  handleMemoryInst(&SI, Arg, SI.getPointerOperand(), Arg->getType(),
                   SI.getAlign(), SI.getOrdering(), SI.isVolatile(),
                   SI.getSyncScopeID());
  return {nullptr, nullptr};
}

PtrParts SplitPtrStructs::visitAtomicRMWInst(AtomicRMWInst &AI) {
  if (!isSplitFatPtr(AI.getPointerOperand()->getType()))
    return {nullptr, nullptr};
  Value *Arg = AI.getValOperand();
  handleMemoryInst(&AI, Arg, AI.getPointerOperand(), Arg->getType(),
                   AI.getAlign(), AI.getOrdering(), AI.isVolatile(),
                   AI.getSyncScopeID());
  return {nullptr, nullptr};
}

// Unlike load, store, and RMW, cmpxchg needs special handling to account
// for the boolean argument.
PtrParts SplitPtrStructs::visitAtomicCmpXchgInst(AtomicCmpXchgInst &AI) {
  Value *Ptr = AI.getPointerOperand();
  if (!isSplitFatPtr(Ptr->getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(&AI);

  Type *Ty = AI.getNewValOperand()->getType();
  AtomicOrdering Order = AI.getMergedOrdering();
  SyncScope::ID SSID = AI.getSyncScopeID();
  bool IsNonTemporal = AI.getMetadata(LLVMContext::MD_nontemporal);

  auto [Rsrc, Off] = getPtrParts(Ptr);
  insertPreMemOpFence(Order, SSID);

  uint32_t Aux = 0;
  if (IsNonTemporal)
    Aux |= AMDGPU::CPol::SLC;
  if (AI.isVolatile())
    Aux |= AMDGPU::CPol::VOLATILE;
  auto *Call =
      IRB.CreateIntrinsic(Intrinsic::amdgcn_raw_ptr_buffer_atomic_cmpswap, Ty,
                          {AI.getNewValOperand(), AI.getCompareOperand(), Rsrc,
                           Off, IRB.getInt32(0), IRB.getInt32(Aux)});
  copyMetadata(Call, &AI);
  setAlign(Call, AI.getAlign(), 2);
  Call->takeName(&AI);
  insertPostMemOpFence(Order, SSID);

  Value *Res = PoisonValue::get(AI.getType());
  Res = IRB.CreateInsertValue(Res, Call, 0);
  if (!AI.isWeak()) {
    Value *Succeeded = IRB.CreateICmpEQ(Call, AI.getCompareOperand());
    Res = IRB.CreateInsertValue(Res, Succeeded, 1);
  }
  SplitUsers.insert(&AI);
  AI.replaceAllUsesWith(Res);
  return {nullptr, nullptr};
}

PtrParts SplitPtrStructs::visitGetElementPtrInst(GetElementPtrInst &GEP) {
  using namespace llvm::PatternMatch;
  Value *Ptr = GEP.getPointerOperand();
  if (!isSplitFatPtr(Ptr->getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(&GEP);

  auto [Rsrc, Off] = getPtrParts(Ptr);
  const DataLayout &DL = GEP.getDataLayout();
  bool IsNUW = GEP.hasNoUnsignedWrap();
  bool IsNUSW = GEP.hasNoUnsignedSignedWrap();

  // In order to call emitGEPOffset() and thus not have to reimplement it,
  // we need the GEP result to have ptr addrspace(7) type.
  Type *FatPtrTy = IRB.getPtrTy(AMDGPUAS::BUFFER_FAT_POINTER);
  if (auto *VT = dyn_cast<VectorType>(Off->getType()))
    FatPtrTy = VectorType::get(FatPtrTy, VT->getElementCount());
  GEP.mutateType(FatPtrTy);
  Value *OffAccum = emitGEPOffset(&IRB, DL, &GEP);
  GEP.mutateType(Ptr->getType());
  if (match(OffAccum, m_Zero())) { // Constant-zero offset
    SplitUsers.insert(&GEP);
    return {Rsrc, Off};
  }

  bool HasNonNegativeOff = false;
  if (auto *CI = dyn_cast<ConstantInt>(OffAccum)) {
    HasNonNegativeOff = !CI->isNegative();
  }
  Value *NewOff;
  if (match(Off, m_Zero())) {
    NewOff = OffAccum;
  } else {
    NewOff = IRB.CreateAdd(Off, OffAccum, "",
                           /*hasNUW=*/IsNUW || (IsNUSW && HasNonNegativeOff),
                           /*hasNSW=*/false);
  }
  copyMetadata(NewOff, &GEP);
  NewOff->takeName(&GEP);
  SplitUsers.insert(&GEP);
  return {Rsrc, NewOff};
}

PtrParts SplitPtrStructs::visitPtrToIntInst(PtrToIntInst &PI) {
  Value *Ptr = PI.getPointerOperand();
  if (!isSplitFatPtr(Ptr->getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(&PI);

  Type *ResTy = PI.getType();
  unsigned Width = ResTy->getScalarSizeInBits();

  auto [Rsrc, Off] = getPtrParts(Ptr);
  const DataLayout &DL = PI.getDataLayout();
  unsigned FatPtrWidth = DL.getPointerSizeInBits(AMDGPUAS::BUFFER_FAT_POINTER);

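  // The integer form of a fat pointer keeps the resource in the bits above
  // the 32-bit offset, so a result no wider than the offset only needs the
  // offset itself, while a wider one is (rsrc << 32) | off.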
  Value *Res;
  if (Width <= BufferOffsetWidth) {
    Res = IRB.CreateIntCast(Off, ResTy, /*isSigned=*/false,
                            PI.getName() + ".off");
  } else {
    Value *RsrcInt = IRB.CreatePtrToInt(Rsrc, ResTy, PI.getName() + ".rsrc");
    Value *Shl = IRB.CreateShl(
        RsrcInt,
        ConstantExpr::getIntegerValue(ResTy, APInt(Width, BufferOffsetWidth)),
        "", Width >= FatPtrWidth, Width > FatPtrWidth);
    Value *OffCast = IRB.CreateIntCast(Off, ResTy, /*isSigned=*/false,
                                       PI.getName() + ".off");
    Res = IRB.CreateOr(Shl, OffCast);
  }

  copyMetadata(Res, &PI);
  Res->takeName(&PI);
  SplitUsers.insert(&PI);
  PI.replaceAllUsesWith(Res);
  return {nullptr, nullptr};
}

PtrParts SplitPtrStructs::visitIntToPtrInst(IntToPtrInst &IP) {
  if (!isSplitFatPtr(IP.getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(&IP);
  const DataLayout &DL = IP.getDataLayout();
  unsigned RsrcPtrWidth = DL.getPointerSizeInBits(AMDGPUAS::BUFFER_RESOURCE);
  Value *Int = IP.getOperand(0);
  Type *IntTy = Int->getType();
  Type *RsrcIntTy = IntTy->getWithNewBitWidth(RsrcPtrWidth);
  unsigned Width = IntTy->getScalarSizeInBits();

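  // Invert the ptrtoint encoding: the resource lives in the bits above the
  // 32-bit offset, and the offset is the truncation of the input integer.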
  auto *RetTy = cast<StructType>(IP.getType());
  Type *RsrcTy = RetTy->getElementType(0);
  Type *OffTy = RetTy->getElementType(1);
  Value *RsrcPart = IRB.CreateLShr(
      Int,
      ConstantExpr::getIntegerValue(IntTy, APInt(Width, BufferOffsetWidth)));
  Value *RsrcInt = IRB.CreateIntCast(RsrcPart, RsrcIntTy, /*isSigned=*/false);
  Value *Rsrc = IRB.CreateIntToPtr(RsrcInt, RsrcTy, IP.getName() + ".rsrc");
  Value *Off =
      IRB.CreateIntCast(Int, OffTy, /*IsSigned=*/false, IP.getName() + ".off");

  copyMetadata(Rsrc, &IP);
  SplitUsers.insert(&IP);
  return {Rsrc, Off};
}

PtrParts SplitPtrStructs::visitAddrSpaceCastInst(AddrSpaceCastInst &I) {
  if (!isSplitFatPtr(I.getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(&I);
  Value *In = I.getPointerOperand();
  // No-op casts preserve parts
  if (In->getType() == I.getType()) {
    auto [Rsrc, Off] = getPtrParts(In);
    SplitUsers.insert(&I);
    return {Rsrc, Off};
  }
  if (I.getSrcAddressSpace() != AMDGPUAS::BUFFER_RESOURCE)
    report_fatal_error("Only buffer resources (addrspace 8) can be cast to "
                       "buffer fat pointers (addrspace 7)");
  Type *OffTy = cast<StructType>(I.getType())->getElementType(1);
  Value *ZeroOff = Constant::getNullValue(OffTy);
  SplitUsers.insert(&I);
  return {In, ZeroOff};
}

PtrParts SplitPtrStructs::visitICmpInst(ICmpInst &Cmp) {
  Value *Lhs = Cmp.getOperand(0);
  if (!isSplitFatPtr(Lhs->getType()))
    return {nullptr, nullptr};
  Value *Rhs = Cmp.getOperand(1);
  IRB.SetInsertPoint(&Cmp);
  ICmpInst::Predicate Pred = Cmp.getPredicate();

  assert((Pred == ICmpInst::ICMP_EQ || Pred == ICmpInst::ICMP_NE) &&
         "Pointer comparison is only equal or unequal");
  auto [LhsRsrc, LhsOff] = getPtrParts(Lhs);
  auto [RhsRsrc, RhsOff] = getPtrParts(Rhs);
  Value *RsrcCmp =
      IRB.CreateICmp(Pred, LhsRsrc, RhsRsrc, Cmp.getName() + ".rsrc");
  copyMetadata(RsrcCmp, &Cmp);
  Value *OffCmp = IRB.CreateICmp(Pred, LhsOff, RhsOff, Cmp.getName() + ".off");
  copyMetadata(OffCmp, &Cmp);

  Value *Res = nullptr;
  if (Pred == ICmpInst::ICMP_EQ)
    Res = IRB.CreateAnd(RsrcCmp, OffCmp);
  else if (Pred == ICmpInst::ICMP_NE)
    Res = IRB.CreateOr(RsrcCmp, OffCmp);
  copyMetadata(Res, &Cmp);
  Res->takeName(&Cmp);
  SplitUsers.insert(&Cmp);
  Cmp.replaceAllUsesWith(Res);
  return {nullptr, nullptr};
}

PtrParts SplitPtrStructs::visitFreezeInst(FreezeInst &I) {
  if (!isSplitFatPtr(I.getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(&I);
  auto [Rsrc, Off] = getPtrParts(I.getOperand(0));

  Value *RsrcRes = IRB.CreateFreeze(Rsrc, I.getName() + ".rsrc");
  copyMetadata(RsrcRes, &I);
  Value *OffRes = IRB.CreateFreeze(Off, I.getName() + ".off");
  copyMetadata(OffRes, &I);
  SplitUsers.insert(&I);
  return {RsrcRes, OffRes};
}

PtrParts SplitPtrStructs::visitExtractElementInst(ExtractElementInst &I) {
  if (!isSplitFatPtr(I.getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(&I);
  Value *Vec = I.getVectorOperand();
  Value *Idx = I.getIndexOperand();
  auto [Rsrc, Off] = getPtrParts(Vec);

  Value *RsrcRes = IRB.CreateExtractElement(Rsrc, Idx, I.getName() + ".rsrc");
  copyMetadata(RsrcRes, &I);
  Value *OffRes = IRB.CreateExtractElement(Off, Idx, I.getName() + ".off");
  copyMetadata(OffRes, &I);
  SplitUsers.insert(&I);
  return {RsrcRes, OffRes};
}

PtrParts SplitPtrStructs::visitInsertElementInst(InsertElementInst &I) {
  // The mutated instructions temporarily don't return vectors, and so
  // we need the generic getType() here to avoid crashes.
  if (!isSplitFatPtr(cast<Instruction>(I).getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(&I);
  Value *Vec = I.getOperand(0);
  Value *Elem = I.getOperand(1);
  Value *Idx = I.getOperand(2);
  auto [VecRsrc, VecOff] = getPtrParts(Vec);
  auto [ElemRsrc, ElemOff] = getPtrParts(Elem);

  Value *RsrcRes =
      IRB.CreateInsertElement(VecRsrc, ElemRsrc, Idx, I.getName() + ".rsrc");
  copyMetadata(RsrcRes, &I);
  Value *OffRes =
      IRB.CreateInsertElement(VecOff, ElemOff, Idx, I.getName() + ".off");
  copyMetadata(OffRes, &I);
  SplitUsers.insert(&I);
  return {RsrcRes, OffRes};
}

PtrParts SplitPtrStructs::visitShuffleVectorInst(ShuffleVectorInst &I) {
  // Cast is needed for the same reason as insertelement's.
  if (!isSplitFatPtr(cast<Instruction>(I).getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(&I);

  Value *V1 = I.getOperand(0);
  Value *V2 = I.getOperand(1);
  ArrayRef<int> Mask = I.getShuffleMask();
  auto [V1Rsrc, V1Off] = getPtrParts(V1);
  auto [V2Rsrc, V2Off] = getPtrParts(V2);

  Value *RsrcRes =
      IRB.CreateShuffleVector(V1Rsrc, V2Rsrc, Mask, I.getName() + ".rsrc");
  copyMetadata(RsrcRes, &I);
  Value *OffRes =
      IRB.CreateShuffleVector(V1Off, V2Off, Mask, I.getName() + ".off");
  copyMetadata(OffRes, &I);
  SplitUsers.insert(&I);
  return {RsrcRes, OffRes};
}

PtrParts SplitPtrStructs::visitPHINode(PHINode &PHI) {
  if (!isSplitFatPtr(PHI.getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(*PHI.getInsertionPointAfterDef());
  // Phi nodes will be handled in post-processing after we've visited every
  // instruction. However, instead of just returning {nullptr, nullptr},
  // we explicitly create the temporary extractvalue operations that are our
  // temporary results so that they end up at the beginning of the block with
  // the PHIs.
  Value *TmpRsrc = IRB.CreateExtractValue(&PHI, 0, PHI.getName() + ".rsrc");
  Value *TmpOff = IRB.CreateExtractValue(&PHI, 1, PHI.getName() + ".off");
  Conditionals.push_back(&PHI);
  SplitUsers.insert(&PHI);
  return {TmpRsrc, TmpOff};
}

PtrParts SplitPtrStructs::visitSelectInst(SelectInst &SI) {
  if (!isSplitFatPtr(SI.getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(&SI);

  Value *Cond = SI.getCondition();
  Value *True = SI.getTrueValue();
  Value *False = SI.getFalseValue();
  auto [TrueRsrc, TrueOff] = getPtrParts(True);
  auto [FalseRsrc, FalseOff] = getPtrParts(False);

  Value *RsrcRes =
      IRB.CreateSelect(Cond, TrueRsrc, FalseRsrc, SI.getName() + ".rsrc", &SI);
  copyMetadata(RsrcRes, &SI);
  Conditionals.push_back(&SI);
  Value *OffRes =
      IRB.CreateSelect(Cond, TrueOff, FalseOff, SI.getName() + ".off", &SI);
  copyMetadata(OffRes, &SI);
  SplitUsers.insert(&SI);
  return {RsrcRes, OffRes};
}

/// Returns true if this intrinsic needs to be removed when it is
/// applied to `ptr addrspace(7)` values. Calls to these intrinsics are
/// rewritten into calls to versions of that intrinsic on the resource
/// descriptor.
static bool isRemovablePointerIntrinsic(Intrinsic::ID IID) {
  switch (IID) {
  default:
    return false;
  case Intrinsic::ptrmask:
  case Intrinsic::invariant_start:
  case Intrinsic::invariant_end:
  case Intrinsic::launder_invariant_group:
  case Intrinsic::strip_invariant_group:
    return true;
  }
}
1521
1522PtrParts SplitPtrStructs::visitIntrinsicInst(IntrinsicInst &I) {
1523 Intrinsic::ID IID = I.getIntrinsicID();
1524 switch (IID) {
1525 default:
1526 break;
1527 case Intrinsic::ptrmask: {
1528 Value *Ptr = I.getArgOperand(0);
1529 if (!isSplitFatPtr(Ptr->getType()))
1530 return {nullptr, nullptr};
1531 Value *Mask = I.getArgOperand(1);
1532 IRB.SetInsertPoint(&I);
1533 auto [Rsrc, Off] = getPtrParts(Ptr);
1534 if (Mask->getType() != Off->getType())
1535 report_fatal_error("offset width is not equal to index width of fat "
1536 "pointer (data layout not set up correctly?)");
1537 Value *OffRes = IRB.CreateAnd(Off, Mask, I.getName() + ".off");
1538 copyMetadata(OffRes, &I);
1539 SplitUsers.insert(&I);
1540 return {Rsrc, OffRes};
1541 }
1542  // Pointer annotation intrinsics that, given their object-wide nature,
1543  // operate on the resource part.
1544 case Intrinsic::invariant_start: {
1545 Value *Ptr = I.getArgOperand(1);
1546 if (!isSplitFatPtr(Ptr->getType()))
1547 return {nullptr, nullptr};
1548 IRB.SetInsertPoint(&I);
1549 auto [Rsrc, Off] = getPtrParts(Ptr);
1550 Type *NewTy = PointerType::get(I.getContext(), AMDGPUAS::BUFFER_RESOURCE);
1551 auto *NewRsrc = IRB.CreateIntrinsic(IID, {NewTy}, {I.getOperand(0), Rsrc});
1552 copyMetadata(NewRsrc, &I);
1553 NewRsrc->takeName(&I);
1554 SplitUsers.insert(&I);
1555 I.replaceAllUsesWith(NewRsrc);
1556 return {nullptr, nullptr};
1557 }
1558 case Intrinsic::invariant_end: {
1559 Value *RealPtr = I.getArgOperand(2);
1560 if (!isSplitFatPtr(RealPtr->getType()))
1561 return {nullptr, nullptr};
1562 IRB.SetInsertPoint(&I);
1563 Value *RealRsrc = getPtrParts(RealPtr).first;
1564 Value *InvPtr = I.getArgOperand(0);
1565 Value *Size = I.getArgOperand(1);
1566 Value *NewRsrc = IRB.CreateIntrinsic(IID, {RealRsrc->getType()},
1567 {InvPtr, Size, RealRsrc});
1568 copyMetadata(NewRsrc, &I);
1569 NewRsrc->takeName(&I);
1570 SplitUsers.insert(&I);
1571 I.replaceAllUsesWith(NewRsrc);
1572 return {nullptr, nullptr};
1573 }
1574 case Intrinsic::launder_invariant_group:
1575 case Intrinsic::strip_invariant_group: {
1576 Value *Ptr = I.getArgOperand(0);
1577 if (!isSplitFatPtr(Ptr->getType()))
1578 return {nullptr, nullptr};
1579 IRB.SetInsertPoint(&I);
1580 auto [Rsrc, Off] = getPtrParts(Ptr);
1581 Value *NewRsrc = IRB.CreateIntrinsic(IID, {Rsrc->getType()}, {Rsrc});
1582 copyMetadata(NewRsrc, &I);
1583 NewRsrc->takeName(&I);
1584 SplitUsers.insert(&I);
1585 return {NewRsrc, Off};
1586 }
1587 }
1588 return {nullptr, nullptr};
1589}
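// A sketch of the ptrmask case above (hypothetical names): aligning a fat
// pointer down to a 16-byte boundary,
//   %q = call ptr addrspace(7) @llvm.ptrmask.p7.i32(ptr addrspace(7) %p,
//                                                   i32 -16)
// leaves the resource part of %p untouched and masks only the 32-bit offset:
//   %q.off = and i32 %p.off, -16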
1590
1591void SplitPtrStructs::processFunction(Function &F) {
1592  ST = &TM->getSubtarget<GCNSubtarget>(F);
1593  SmallVector<Instruction *, 0> Originals;
1594 LLVM_DEBUG(dbgs() << "Splitting pointer structs in function: " << F.getName()
1595 << "\n");
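  // The function's instructions are copied into Originals up front because
  // visiting splits values by inserting new instructions; the snapshot keeps
  // those replacements from being visited in turn.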
1596 for (Instruction &I : instructions(F))
1597 Originals.push_back(&I);
1598 for (Instruction *I : Originals) {
1599 auto [Rsrc, Off] = visit(I);
1600 assert(((Rsrc && Off) || (!Rsrc && !Off)) &&
1601 "Can't have a resource but no offset");
1602 if (Rsrc)
1603 RsrcParts[I] = Rsrc;
1604 if (Off)
1605 OffParts[I] = Off;
1606 }
1607 processConditionals();
1608 killAndReplaceSplitInstructions(Originals);
1609
1610 // Clean up after ourselves to save on memory.
1611 RsrcParts.clear();
1612 OffParts.clear();
1613 SplitUsers.clear();
1614 Conditionals.clear();
1615 ConditionalTemps.clear();
1616}
1617
1618namespace {
1619class AMDGPULowerBufferFatPointers : public ModulePass {
1620public:
1621 static char ID;
1622
1623  AMDGPULowerBufferFatPointers() : ModulePass(ID) {
1624    initializeAMDGPULowerBufferFatPointersPass(
1625        *PassRegistry::getPassRegistry());
1626 }
1627
1628 bool run(Module &M, const TargetMachine &TM);
1629 bool runOnModule(Module &M) override;
1630
1631 void getAnalysisUsage(AnalysisUsage &AU) const override;
1632};
1633} // namespace
1634
1635/// Returns true if there are values that have a buffer fat pointer in them,
1636/// which means we'll need to perform rewrites on this function. As a side
1637/// effect, this will populate the type remapping cache.
1638static bool containsBufferFatPointers(const Function &F,
1639 BufferFatPtrToStructTypeMap *TypeMap) {
1640 bool HasFatPointers = false;
1641 for (const BasicBlock &BB : F)
1642 for (const Instruction &I : BB)
1643 HasFatPointers |= (I.getType() != TypeMap->remapType(I.getType()));
1644 return HasFatPointers;
1645}
1646
1647static bool hasFatPointerInterface(const Function &F,
1648 BufferFatPtrToStructTypeMap *TypeMap) {
1649 Type *Ty = F.getFunctionType();
1650 return Ty != TypeMap->remapType(Ty);
1651}
1652
1653/// Move the body of `OldF` into a new function, returning it.
1654static Function *moveFunctionAdaptingType(Function *OldF, FunctionType *NewTy,
1655 ValueToValueMapTy &CloneMap) {
1656 bool IsIntrinsic = OldF->isIntrinsic();
1657 Function *NewF =
1658      Function::Create(NewTy, OldF->getLinkage(), OldF->getAddressSpace());
1659  NewF->IsNewDbgInfoFormat = OldF->IsNewDbgInfoFormat;
1660 NewF->copyAttributesFrom(OldF);
1661 NewF->copyMetadata(OldF, 0);
1662 NewF->takeName(OldF);
1663  NewF->updateAfterNameChange();
1664  NewF->setDLLStorageClass(OldF->getDLLStorageClass());
1665 OldF->getParent()->getFunctionList().insertAfter(OldF->getIterator(), NewF);
1666
1667 while (!OldF->empty()) {
1668 BasicBlock *BB = &OldF->front();
1669 BB->removeFromParent();
1670 BB->insertInto(NewF);
1671 CloneMap[BB] = BB;
1672 for (Instruction &I : *BB) {
1673 CloneMap[&I] = &I;
1674 }
1675 }
1676
1677 AttributeMask PtrOnlyAttrs;
1678 for (auto K :
1679 {Attribute::Dereferenceable, Attribute::DereferenceableOrNull,
1680 Attribute::NoAlias, Attribute::NoCapture, Attribute::NoFree,
1681 Attribute::NonNull, Attribute::NullPointerIsValid, Attribute::ReadNone,
1682 Attribute::ReadOnly, Attribute::WriteOnly}) {
1683 PtrOnlyAttrs.addAttribute(K);
1684  }
1685  SmallVector<AttributeSet> ArgAttrs;
1686 AttributeList OldAttrs = OldF->getAttributes();
1687
1688 for (auto [I, OldArg, NewArg] : enumerate(OldF->args(), NewF->args())) {
1689 CloneMap[&NewArg] = &OldArg;
1690 NewArg.takeName(&OldArg);
1691 Type *OldArgTy = OldArg.getType(), *NewArgTy = NewArg.getType();
1692 // Temporarily mutate type of `NewArg` to allow RAUW to work.
1693 NewArg.mutateType(OldArgTy);
1694 OldArg.replaceAllUsesWith(&NewArg);
1695 NewArg.mutateType(NewArgTy);
1696
1697 AttributeSet ArgAttr = OldAttrs.getParamAttrs(I);
1698 // Intrinsics get their attributes fixed later.
1699 if (OldArgTy != NewArgTy && !IsIntrinsic)
1700 ArgAttr = ArgAttr.removeAttributes(NewF->getContext(), PtrOnlyAttrs);
1701 ArgAttrs.push_back(ArgAttr);
1702 }
1703 AttributeSet RetAttrs = OldAttrs.getRetAttrs();
1704 if (OldF->getReturnType() != NewF->getReturnType() && !IsIntrinsic)
1705    RetAttrs = RetAttrs.removeAttributes(NewF->getContext(), PtrOnlyAttrs);
1706  NewF->setAttributes(AttributeList::get(
1707 NewF->getContext(), OldAttrs.getFnAttrs(), RetAttrs, ArgAttrs));
1708 return NewF;
1709}
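// To illustrate the overall effect (with a hypothetical function @f): under
// the struct remapping, a declaration such as
//   declare ptr addrspace(7) @f(ptr addrspace(7))
// is moved into
//   declare { ptr addrspace(8), i32 } @f({ ptr addrspace(8), i32 })
// with the pointer-only attributes listed above (noalias, nonnull, ...)
// dropped from any argument or return whose type changed.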
1710
1711static void makeCloneInPraceMap(Function *F, ValueToValueMapTy &CloneMap) {
1712 for (Argument &A : F->args())
1713 CloneMap[&A] = &A;
1714 for (BasicBlock &BB : *F) {
1715 CloneMap[&BB] = &BB;
1716 for (Instruction &I : BB)
1717 CloneMap[&I] = &I;
1718 }
1719}
1720
1721bool AMDGPULowerBufferFatPointers::run(Module &M, const TargetMachine &TM) {
1722 bool Changed = false;
1723 const DataLayout &DL = M.getDataLayout();
1724 // Record the functions which need to be remapped.
1725 // The second element of the pair indicates whether the function has to have
1726  // its arguments or return types adjusted.
1727  SmallVector<std::pair<Function *, bool>> NeedsRemap;
1728
1729 BufferFatPtrToStructTypeMap StructTM(DL);
1730 BufferFatPtrToIntTypeMap IntTM(DL);
1731 for (const GlobalVariable &GV : M.globals()) {
1732 if (GV.getAddressSpace() == AMDGPUAS::BUFFER_FAT_POINTER)
1733 report_fatal_error("Global variables with a buffer fat pointer address "
1734 "space (7) are not supported");
1735 Type *VT = GV.getValueType();
1736 if (VT != StructTM.remapType(VT))
1737 report_fatal_error("Global variables that contain buffer fat pointers "
1738 "(address space 7 pointers) are unsupported. Use "
1739 "buffer resource pointers (address space 8) instead.");
1740 }
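  // For example, either of these (hypothetical) globals would trigger one of
  // the fatal errors above:
  //   @g = addrspace(7) global i32 0
  //   @h = global ptr addrspace(7) null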
1741
1742 {
1743    // Collect all constant exprs and aggregates referenced by any function.
1744    SmallVector<Constant *> Worklist;
1745 for (Function &F : M.functions())
1746 for (Instruction &I : instructions(F))
1747 for (Value *Op : I.operands())
1748 if (isa<ConstantExpr>(Op) || isa<ConstantAggregate>(Op))
1749 Worklist.push_back(cast<Constant>(Op));
1750
1751    // Recursively look for any referenced buffer pointer constants.
1752    SmallPtrSet<Constant *, 8> Visited;
1753 SetVector<Constant *> BufferFatPtrConsts;
1754 while (!Worklist.empty()) {
1755 Constant *C = Worklist.pop_back_val();
1756 if (!Visited.insert(C).second)
1757 continue;
1758 if (isBufferFatPtrOrVector(C->getType()))
1759 BufferFatPtrConsts.insert(C);
1760 for (Value *Op : C->operands())
1761 if (isa<ConstantExpr>(Op) || isa<ConstantAggregate>(Op))
1762 Worklist.push_back(cast<Constant>(Op));
1763 }
1764
1765 // Expand all constant expressions using fat buffer pointers to
1766    // instructions.
1767    convertUsersOfConstantsToInstructions(
1768 BufferFatPtrConsts.getArrayRef(), /*RestrictToFunc=*/nullptr,
1769 /*RemoveDeadConstants=*/false, /*IncludeSelf=*/true);
1770 }
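  // For instance (hypothetical, and assuming inttoptr constant expressions),
  // the fat-pointer constant in
  //   store i32 0, ptr addrspace(7) inttoptr (i160 0 to ptr addrspace(7))
  // is expanded to a plain inttoptr instruction here so that the splitting
  // phase below only ever has to deal with instructions.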
1771
1772 StoreFatPtrsAsIntsVisitor MemOpsRewrite(&IntTM, M.getContext());
1773 for (Function &F : M.functions()) {
1774 bool InterfaceChange = hasFatPointerInterface(F, &StructTM);
1775 bool BodyChanges = containsBufferFatPointers(F, &StructTM);
1776 Changed |= MemOpsRewrite.processFunction(F);
1777 if (InterfaceChange || BodyChanges)
1778 NeedsRemap.push_back(std::make_pair(&F, InterfaceChange));
1779 }
1780 if (NeedsRemap.empty())
1781 return Changed;
1782
1783 SmallVector<Function *> NeedsPostProcess;
1784 SmallVector<Function *> Intrinsics;
1785 // Keep one big map so as to memoize constants across functions.
1786 ValueToValueMapTy CloneMap;
1787 FatPtrConstMaterializer Materializer(&StructTM, CloneMap);
1788
1789 ValueMapper LowerInFuncs(CloneMap, RF_None, &StructTM, &Materializer);
1790 for (auto [F, InterfaceChange] : NeedsRemap) {
1791 Function *NewF = F;
1792    if (InterfaceChange)
1793      NewF = moveFunctionAdaptingType(
1794 F, cast<FunctionType>(StructTM.remapType(F->getFunctionType())),
1795 CloneMap);
1796 else
1797 makeCloneInPraceMap(F, CloneMap);
1798 LowerInFuncs.remapFunction(*NewF);
1799 if (NewF->isIntrinsic())
1800 Intrinsics.push_back(NewF);
1801 else
1802 NeedsPostProcess.push_back(NewF);
1803 if (InterfaceChange) {
1804 F->replaceAllUsesWith(NewF);
1805 F->eraseFromParent();
1806 }
1807 Changed = true;
1808 }
1809 StructTM.clear();
1810 IntTM.clear();
1811 CloneMap.clear();
1812
1813 SplitPtrStructs Splitter(M.getContext(), &TM);
1814 for (Function *F : NeedsPostProcess)
1815 Splitter.processFunction(*F);
1816 for (Function *F : Intrinsics) {
1817 if (isRemovablePointerIntrinsic(F->getIntrinsicID())) {
1818 F->eraseFromParent();
1819 } else {
1820 std::optional<Function *> NewF = Intrinsic::remangleIntrinsicFunction(F);
1821 if (NewF)
1822 F->replaceAllUsesWith(*NewF);
1823 }
1824 }
1825 return Changed;
1826}
1827
1828bool AMDGPULowerBufferFatPointers::runOnModule(Module &M) {
1829 TargetPassConfig &TPC = getAnalysis<TargetPassConfig>();
1830 const TargetMachine &TM = TPC.getTM<TargetMachine>();
1831 return run(M, TM);
1832}
1833
1834char AMDGPULowerBufferFatPointers::ID = 0;
1835
1836char &llvm::AMDGPULowerBufferFatPointersID = AMDGPULowerBufferFatPointers::ID;
1837
1838void AMDGPULowerBufferFatPointers::getAnalysisUsage(AnalysisUsage &AU) const {
1839  AU.addRequired<TargetPassConfig>();
1840}
1841
1842#define PASS_DESC "Lower buffer fat pointer operations to buffer resources"
1843INITIALIZE_PASS_BEGIN(AMDGPULowerBufferFatPointers, DEBUG_TYPE, PASS_DESC,
1844                      false, false)
1845INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)
1846INITIALIZE_PASS_END(AMDGPULowerBufferFatPointers, DEBUG_TYPE, PASS_DESC, false,
1847 false)
1848#undef PASS_DESC
1849
1850ModulePass *llvm::createAMDGPULowerBufferFatPointersPass() {
1851 return new AMDGPULowerBufferFatPointers();
1852}
1853
1854PreservedAnalyses AMDGPULowerBufferFatPointersPass::run(Module &M,
1855                                                        ModuleAnalysisManager &MA) {
1856  return AMDGPULowerBufferFatPointers().run(M, TM) ? PreservedAnalyses::none()
1857                                                   : PreservedAnalyses::all();
1858}