LLVM 20.0.0git
AMDGPULowerBufferFatPointers.cpp
//===-- AMDGPULowerBufferFatPointers.cpp ---------------------------=//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This pass lowers operations on buffer fat pointers (addrspace 7) to
// operations on buffer resources (addrspace 8) and is needed for correct
// codegen.
//
// # Background
//
// Address space 7 (the buffer fat pointer) is a 160-bit pointer that consists
// of a 128-bit buffer descriptor and a 32-bit offset into that descriptor.
// The buffer resource part needs to be a "raw" buffer resource (it must have
// a stride of 0 and bounds checks must be in raw buffer mode or disabled).
//
// When these requirements are met, a buffer resource can be treated as a
// typical (though quite wide) pointer that follows typical LLVM pointer
// semantics. This allows the frontend to reason about such buffers (which are
// often encountered in the context of SPIR-V kernels).
//
// However, because of their non-power-of-2 size, these fat pointers cannot be
// present during translation to MIR (though this restriction may be lifted
// during the transition to GlobalISel). Therefore, this pass is needed in order
// to correctly implement these fat pointers.
//
// The resource intrinsics take the resource part (the address space 8 pointer)
// and the offset part (the 32-bit integer) as separate arguments. In addition,
// many users of these buffers manipulate the offset while leaving the resource
// part alone. For these reasons, we typically want to keep the resource and
// offset parts in separate variables, but combine them when this is required,
// such as when inserting these values into aggregates or moving them to
// memory.
//
// Therefore, at a high level, `ptr addrspace(7) %x` becomes `ptr addrspace(8)
// %x.rsrc` and `i32 %x.off`, which will be combined into `{ptr addrspace(8),
// i32} %x = {%x.rsrc, %x.off}` if needed. Similarly, `vector<Nxp7>` becomes
// `{vector<Nxp8>, vector<Nxi32>}` and its component parts.
//
// # Implementation
//
// This pass proceeds in three main phases:
//
// ## Rewriting loads and stores of p7
//
// The first phase is to rewrite away all loads and stores of
// `ptr addrspace(7)`, including aggregates containing such pointers, to ones
// that use `i160`. This is handled by `StoreFatPtrsAsIntsVisitor`, which
// visits loads, stores, and allocas and, if the loaded or stored type contains
// `ptr addrspace(7)`, rewrites that type to one where the p7s are replaced by
// i160s, copying other parts of aggregates as needed. In the case of a store,
// each pointer is `ptrtoint`d to i160 before storing, and loaded integers are
// `inttoptr`d back. This same transformation is applied to vectors of pointers.
//
// Such a transformation means the later phases of the pass do not need to
// handle buffer fat pointers moving to and from memory, where they would
// otherwise have to handle the incompatibility between a `{Nxp8, Nxi32}`
// representation and `Nxi160` directly. Instead, that transposing action (where
// the vectors of resources and vectors of offsets are concatenated before being
// stored to memory) is handled by implementing `inttoptr` and `ptrtoint` only.
//
// Atomic operations on `ptr addrspace(7)` values are not supported, as the
// hardware does not include a 160-bit atomic.
//
// ## Type remapping
//
// We use a `ValueMapper` to mangle uses of [vectors of] buffer fat pointers
// to the corresponding struct type, which has a resource part and an offset
// part.
//
// This uses a `BufferFatPtrToStructTypeMap` and a `FatPtrConstMaterializer`
// to perform the remapping, usually by way of `setType`ing values. Constants
// are handled here because there isn't a good way to fix them up later.
//
// This has the downside of leaving the IR in an invalid state (for example,
// the instruction `getelementptr {ptr addrspace(8), i32} %p, ...` will exist),
// but all such invalid states will be resolved by the third phase.
//
// Functions that don't take buffer fat pointers are modified in place. Those
// that do take such pointers have their basic blocks moved to a new function
// whose arguments and return values use `{ptr addrspace(8), i32}` in place of
// buffer fat pointers. This phase also records intrinsics so that they can be
// remangled or deleted later.
//
// ## Splitting pointer structs
//
// The meat of this pass consists of defining semantics for operations that
// produce or consume [vectors of] buffer fat pointers in terms of their
// resource and offset parts. This is accomplished through the `SplitPtrStructs`
// visitor.
//
// In the first pass through each function that is being lowered, the splitter
// inserts new instructions to implement the split-structures behavior, which is
// needed for correctness and performance. It records a list of "split users",
// instructions that are being replaced by operations on the resource and offset
// parts.
//
// Split users do not necessarily need to produce parts themselves (a
// `load float, ptr addrspace(7)` does not, for example), but, if they do not
// generate fat buffer pointers, they must RAUW in their replacement
// instructions during the initial visit.
//
// When these new instructions are created, they use the split parts recorded
// for their initial arguments in order to generate their replacements, creating
// a parallel set of instructions that does not refer to the original fat
// pointer values but instead to their resource and offset components.
//
// Instructions such as `extractvalue` that produce buffer fat pointers from
// sources that do not have split parts have such parts generated using
// `extractvalue`. This is also the initial handling of PHI nodes, which
// are then cleaned up.
//
117//
118// ### Conditionals
119//
120// PHI nodes are initially given resource parts via `extractvalue`. However,
121// this is not an efficient rewrite of such nodes, as, in most cases, the
122// resource part in a conditional or loop remains constant throughout the loop
123// and only the offset varies. Failing to optimize away these constant resources
124// would cause additional registers to be sent around loops and might lead to
125// waterfall loops being generated for buffer operations due to the
126// "non-uniform" resource argument.
127//
128// Therefore, after all instructions have been visited, the pointer splitter
129// post-processes all encountered conditionals. Given a PHI node or select,
130// getPossibleRsrcRoots() collects all values that the resource parts of that
131// conditional's input could come from as well as collecting all conditional
132// instructions encountered during the search. If, after filtering out the
133// initial node itself, the set of encountered conditionals is a subset of the
134// potential roots and there is a single potential resource that isn't in the
135// conditional set, that value is the only possible value the resource argument
136// could have throughout the control flow.
137//
138// If that condition is met, then a PHI node can have its resource part changed
139// to the singleton value and then be replaced by a PHI on the offsets.
140// Otherwise, each PHI node is split into two, one for the resource part and one
141// for the offset part, which replace the temporary `extractvalue` instructions
142// that were added during the first pass.
143//
144// Similar logic applies to `select`, where
145// `%z = select i1 %cond, %cond, ptr addrspace(7) %x, ptr addrspace(7) %y`
146// can be split into `%z.rsrc = %x.rsrc` and
147// `%z.off = select i1 %cond, ptr i32 %x.off, i32 %y.off`
148// if both `%x` and `%y` have the same resource part, but two `select`
149// operations will be needed if they do not.
150//
// ### Final processing
//
// After conditionals have been cleaned up, the IR for each function is
// rewritten to remove all the old instructions that have been split up.
//
// Any instruction that used to produce a buffer fat pointer (and therefore now
// produces a resource-and-offset struct after type remapping) is
// replaced as follows:
// 1. All debug value annotations are cloned to reflect that the resource and
//    offset parts are computed separately and constitute different
//    fragments of the underlying source language variable.
// 2. All uses that were themselves split are replaced by a `poison` of the
//    struct type, as they will themselves be erased soon. This rule, combined
//    with debug handling, should leave the use lists of split instructions
//    empty in almost all cases.
// 3. If a user of the original struct-valued result remains, the structure
//    needed for the new types to work is constructed out of the newly-defined
//    parts, and the original instruction is replaced by this structure
//    before being erased. Instructions requiring this construction include
//    `ret` and `insertvalue`.
//
// # Consequences
//
// This pass does not alter the CFG.
//
// Alias analysis information will become coarser, as the LLVM alias analyzer
// cannot handle the buffer intrinsics. Specifically, while we can determine
// that the following two loads do not alias:
// ```
// %y = getelementptr i32, ptr addrspace(7) %x, i32 1
// %a = load i32, ptr addrspace(7) %x
// %b = load i32, ptr addrspace(7) %y
// ```
// we cannot (except through some code that runs during scheduling) determine
// that the rewritten loads below do not alias.
// ```
// %y.off = add i32 %x.off, 4 ; one i32 element is 4 bytes
// %a = call @llvm.amdgcn.raw.ptr.buffer.load(ptr addrspace(8) %x.rsrc,
//                                            i32 %x.off, ...)
// %b = call @llvm.amdgcn.raw.ptr.buffer.load(ptr addrspace(8) %x.rsrc,
//                                            i32 %y.off, ...)
// ```
// However, existing alias information is preserved.
//===----------------------------------------------------------------------===//
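In the split form, a GEP on a buffer fat pointer leaves the resource part alone and only advances the 32-bit byte offset by `Index * sizeof(ElementType)`, wrapping like an `i32` add. The sketch below shows that arithmetic for a single index; `gepOffset` is a hypothetical helper written for illustration, not a function from this pass, and multi-index or struct-field GEPs are out of scope.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: new offset part for a single-index GEP on a buffer fat
// pointer. The resource part is unchanged; only the byte offset moves.
// Unsigned arithmetic wraps modulo 2^32, matching i32 `mul`/`add`.
uint32_t gepOffset(uint32_t BaseOff, int32_t Index, uint32_t ElemSize) {
  return BaseOff + static_cast<uint32_t>(Index) * ElemSize;
}
```

For example, `getelementptr i32, ptr addrspace(7) %x, i32 1` corresponds to `gepOffset(XOff, 1, 4)`, i.e. four bytes past `%x.off`.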

#include "AMDGPU.h"
#include "AMDGPUTargetMachine.h"
#include "GCNSubtarget.h"
#include "SIDefines.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DebugInfo.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/InstVisitor.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Operator.h"
#include "llvm/IR/PatternMatch.h"
#include "llvm/Pass.h"
#include "llvm/Support/Debug.h"

#define DEBUG_TYPE "amdgpu-lower-buffer-fat-pointers"

using namespace llvm;

static constexpr unsigned BufferOffsetWidth = 32;

namespace {
/// Recursively replace instances of ptr addrspace(7) and vector<Nxptr
/// addrspace(7)> with some other type as defined by the relevant subclass.
class BufferFatPtrTypeLoweringBase : public ValueMapTypeRemapper {

  Type *remapTypeImpl(Type *Ty, SmallPtrSetImpl<StructType *> &Seen);

protected:
  virtual Type *remapScalar(PointerType *PT) = 0;
  virtual Type *remapVector(VectorType *VT) = 0;

  const DataLayout &DL;

public:
  BufferFatPtrTypeLoweringBase(const DataLayout &DL) : DL(DL) {}
  Type *remapType(Type *SrcTy) override;
  void clear() { Map.clear(); }
};

/// Remap ptr addrspace(7) to i160 and vector<Nxptr addrspace(7)> to
/// vector<Nxi160> in order to correctly handle loading/storing these values
/// from memory.
class BufferFatPtrToIntTypeMap : public BufferFatPtrTypeLoweringBase {
  using BufferFatPtrTypeLoweringBase::BufferFatPtrTypeLoweringBase;

protected:
  Type *remapScalar(PointerType *PT) override { return DL.getIntPtrType(PT); }
  Type *remapVector(VectorType *VT) override { return DL.getIntPtrType(VT); }
};

/// Remap ptr addrspace(7) to {ptr addrspace(8), i32} (the resource and offset
/// parts of the pointer) so that we can easily rewrite operations on these
/// values that aren't loading them from or storing them to memory.
class BufferFatPtrToStructTypeMap : public BufferFatPtrTypeLoweringBase {
  using BufferFatPtrTypeLoweringBase::BufferFatPtrTypeLoweringBase;

protected:
  Type *remapScalar(PointerType *PT) override;
  Type *remapVector(VectorType *VT) override;
};
} // namespace

// This code is adapted from the type remapper in lib/Linker/IRMover.cpp
Type *BufferFatPtrTypeLoweringBase::remapTypeImpl(
    Type *Ty, SmallPtrSetImpl<StructType *> &Seen) {
  Type **Entry = &Map[Ty];
  if (*Entry)
    return *Entry;
  if (auto *PT = dyn_cast<PointerType>(Ty)) {
    if (PT->getAddressSpace() == AMDGPUAS::BUFFER_FAT_POINTER) {
      return *Entry = remapScalar(PT);
    }
  }
  if (auto *VT = dyn_cast<VectorType>(Ty)) {
    auto *PT = dyn_cast<PointerType>(VT->getElementType());
    if (PT && PT->getAddressSpace() == AMDGPUAS::BUFFER_FAT_POINTER) {
      return *Entry = remapVector(VT);
    }
    return *Entry = Ty;
  }
  // Whether the type is one that is structurally uniqued - that is, if it is
  // not a named struct (the only kind of type where multiple structurally
  // identical types can have distinct `Type*`s).
  StructType *TyAsStruct = dyn_cast<StructType>(Ty);
  bool IsUniqued = !TyAsStruct || TyAsStruct->isLiteral();
  // Base case for ints, floats, opaque pointers, and so on, which don't
  // require recursion.
  if (Ty->getNumContainedTypes() == 0 && IsUniqued)
    return *Entry = Ty;
  if (!IsUniqued) {
    // Create a dummy type for recursion purposes.
    if (!Seen.insert(TyAsStruct).second) {
      StructType *Placeholder = StructType::create(Ty->getContext());
      return *Entry = Placeholder;
    }
  }
  bool Changed = false;
  SmallVector<Type *> ElementTypes(Ty->getNumContainedTypes(), nullptr);
  for (unsigned int I = 0, E = Ty->getNumContainedTypes(); I < E; ++I) {
    Type *OldElem = Ty->getContainedType(I);
    Type *NewElem = remapTypeImpl(OldElem, Seen);
    ElementTypes[I] = NewElem;
    Changed |= (OldElem != NewElem);
  }
  // Recursive calls to remapTypeImpl() may have invalidated the pointer.
  Entry = &Map[Ty];
  if (!Changed) {
    return *Entry = Ty;
  }
  if (auto *ArrTy = dyn_cast<ArrayType>(Ty))
    return *Entry = ArrayType::get(ElementTypes[0], ArrTy->getNumElements());
  if (auto *FnTy = dyn_cast<FunctionType>(Ty))
    return *Entry = FunctionType::get(ElementTypes[0],
                                      ArrayRef(ElementTypes).slice(1),
                                      FnTy->isVarArg());
  if (auto *STy = dyn_cast<StructType>(Ty)) {
    // Genuine opaque types don't have a remapping.
    if (STy->isOpaque())
      return *Entry = Ty;
    bool IsPacked = STy->isPacked();
    if (IsUniqued)
      return *Entry = StructType::get(Ty->getContext(), ElementTypes, IsPacked);
    SmallString<16> Name(STy->getName());
    STy->setName("");
    Type **RecursionEntry = &Map[Ty];
    if (*RecursionEntry) {
      auto *Placeholder = cast<StructType>(*RecursionEntry);
      Placeholder->setBody(ElementTypes, IsPacked);
      Placeholder->setName(Name);
      return *Entry = Placeholder;
    }
    return *Entry = StructType::create(Ty->getContext(), ElementTypes, Name,
                                       IsPacked);
  }
  llvm_unreachable("Unknown type of type that contains elements");
}

Type *BufferFatPtrTypeLoweringBase::remapType(Type *SrcTy) {
  return remapTypeImpl(SrcTy, Visited);
}
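As a rough, self-contained illustration of the memoized recursive remapping performed by `remapTypeImpl()`, the sketch below uses a hypothetical miniature type representation; `Ty`, `mk`, `str`, and `FatPtrToStructRemapper` are invented for this example and are not LLVM APIs. It mirrors the `BufferFatPtrToStructTypeMap` rules (`p7` becomes `{p8, i32}`, `<N x p7>` becomes `{<N x p8>, <N x i32>}`, and aggregates are rebuilt only when an element changed) but deliberately omits the named-struct placeholder handling.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical miniature type system, loosely modeled on llvm::Type.
struct Ty;
using TyRef = std::shared_ptr<const Ty>;
struct Ty {
  enum Kind { Int, Ptr, Vec, Array, Struct } K;
  unsigned Param = 0;       // bit width (Int), addrspace (Ptr), count (Vec/Array)
  std::vector<TyRef> Elems; // contained types
};

TyRef mk(Ty::Kind K, unsigned P, std::vector<TyRef> E = {}) {
  return std::make_shared<const Ty>(Ty{K, P, std::move(E)});
}

// Render a type in an IR-like syntax, e.g. "{p8, i32}".
std::string str(const TyRef &T) {
  switch (T->K) {
  case Ty::Int:
    return "i" + std::to_string(T->Param);
  case Ty::Ptr:
    return "p" + std::to_string(T->Param);
  case Ty::Vec:
    return "<" + std::to_string(T->Param) + " x " + str(T->Elems[0]) + ">";
  case Ty::Array:
    return "[" + std::to_string(T->Param) + " x " + str(T->Elems[0]) + "]";
  case Ty::Struct: {
    std::string S = "{";
    for (size_t I = 0; I < T->Elems.size(); ++I)
      S += (I ? ", " : "") + str(T->Elems[I]);
    return S + "}";
  }
  }
  return "?";
}

// Memoizing remapper: p7 -> {p8, i32}, <N x p7> -> {<N x p8>, <N x i32>}.
class FatPtrToStructRemapper {
  std::map<TyRef, TyRef> Memo; // keyed by type identity

public:
  TyRef remap(const TyRef &T) {
    auto It = Memo.find(T);
    if (It != Memo.end())
      return It->second;
    TyRef Result = T;
    if (T->K == Ty::Ptr && T->Param == 7) {
      Result = mk(Ty::Struct, 0, {mk(Ty::Ptr, 8), mk(Ty::Int, 32)});
    } else if (T->K == Ty::Vec && T->Elems[0]->K == Ty::Ptr &&
               T->Elems[0]->Param == 7) {
      unsigned N = T->Param;
      Result = mk(Ty::Struct, 0,
                  {mk(Ty::Vec, N, {mk(Ty::Ptr, 8)}),
                   mk(Ty::Vec, N, {mk(Ty::Int, 32)})});
    } else if (!T->Elems.empty()) {
      // Recurse into aggregates; rebuild only if some element changed.
      std::vector<TyRef> NewElems;
      bool Changed = false;
      for (const TyRef &E : T->Elems) {
        NewElems.push_back(remap(E));
        Changed |= NewElems.back() != E;
      }
      if (Changed)
        Result = mk(T->K, T->Param, std::move(NewElems));
    }
    Memo[T] = Result;
    return Result;
  }
};
```

Returning the original `TyRef` when nothing changed matches the pass's `if (!Changed) return *Entry = Ty;` shortcut, so unaffected types keep their identity.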

Type *BufferFatPtrToStructTypeMap::remapScalar(PointerType *PT) {
  LLVMContext &Ctx = PT->getContext();
  return StructType::get(PointerType::get(Ctx, AMDGPUAS::BUFFER_RESOURCE),
                         IntegerType::get(Ctx, BufferOffsetWidth));
}

Type *BufferFatPtrToStructTypeMap::remapVector(VectorType *VT) {
  ElementCount EC = VT->getElementCount();
  LLVMContext &Ctx = VT->getContext();
  Type *RsrcVec =
      VectorType::get(PointerType::get(Ctx, AMDGPUAS::BUFFER_RESOURCE), EC);
  Type *OffVec = VectorType::get(IntegerType::get(Ctx, BufferOffsetWidth), EC);
  return StructType::get(RsrcVec, OffVec);
}

static bool isBufferFatPtrOrVector(Type *Ty) {
  if (auto *PT = dyn_cast<PointerType>(Ty->getScalarType()))
    return PT->getAddressSpace() == AMDGPUAS::BUFFER_FAT_POINTER;
  return false;
}

// True if the type is {ptr addrspace(8), i32} or a struct of vectors of
// those types. Used to quickly skip instructions we don't need to process.
static bool isSplitFatPtr(Type *Ty) {
  auto *ST = dyn_cast<StructType>(Ty);
  if (!ST)
    return false;
  if (!ST->isLiteral() || ST->getNumElements() != 2)
    return false;
  auto *MaybeRsrc =
      dyn_cast<PointerType>(ST->getElementType(0)->getScalarType());
  auto *MaybeOff =
      dyn_cast<IntegerType>(ST->getElementType(1)->getScalarType());
  return MaybeRsrc && MaybeOff &&
         MaybeRsrc->getAddressSpace() == AMDGPUAS::BUFFER_RESOURCE &&
         MaybeOff->getBitWidth() == BufferOffsetWidth;
}

// True if the result type or any operand types are buffer fat pointers.
  Type *T = C->getType();
  return isBufferFatPtrOrVector(T) || any_of(C->operands(), [](const Use &U) {
           return isBufferFatPtrOrVector(U.get()->getType());
         });
}

namespace {
/// Convert [vectors of] buffer fat pointers to integers when they are read from
/// or stored to memory. This ensures that these pointers will have the same
/// memory layout as before they are lowered, even though they will no longer
/// have their previous layout in registers/in the program (they'll be broken
/// down into resource and offset parts). This has the downside of imposing
/// marshalling costs when reading or storing these values, but since placing
/// such pointers into memory is an uncommon operation at best, we feel that
/// this cost is acceptable for better performance in the common case.
class StoreFatPtrsAsIntsVisitor
    : public InstVisitor<StoreFatPtrsAsIntsVisitor, bool> {
  BufferFatPtrToIntTypeMap *TypeMap;

  ValueToValueMapTy ConvertedForStore;

  IRBuilder<> IRB;

  // Convert all the buffer fat pointers within the input value to integers
  // so that it can be stored in memory.
  Value *fatPtrsToInts(Value *V, Type *From, Type *To, const Twine &Name);
  // Convert all the i160s that need to be buffer fat pointers (as specified
  // by the To type) into those pointers to preserve the semantics of the rest
  // of the program.
  Value *intsToFatPtrs(Value *V, Type *From, Type *To, const Twine &Name);

public:
  StoreFatPtrsAsIntsVisitor(BufferFatPtrToIntTypeMap *TypeMap, LLVMContext &Ctx)
      : TypeMap(TypeMap), IRB(Ctx) {}
  bool processFunction(Function &F);

  bool visitInstruction(Instruction &I) { return false; }
  bool visitLoadInst(LoadInst &LI);
  bool visitStoreInst(StoreInst &SI);
};
} // namespace

Value *StoreFatPtrsAsIntsVisitor::fatPtrsToInts(Value *V, Type *From, Type *To,
                                                const Twine &Name) {
  if (From == To)
    return V;
  ValueToValueMapTy::iterator Find = ConvertedForStore.find(V);
  if (Find != ConvertedForStore.end())
    return Find->second;
  if (isBufferFatPtrOrVector(From)) {
    Value *Cast = IRB.CreatePtrToInt(V, To, Name + ".int");
    ConvertedForStore[V] = Cast;
    return Cast;
  }
  if (From->getNumContainedTypes() == 0)
    return V;
  // Structs, arrays, and other compound types.
  if (auto *AT = dyn_cast<ArrayType>(From)) {
    Type *FromPart = AT->getArrayElementType();
    Type *ToPart = cast<ArrayType>(To)->getElementType();
    for (uint64_t I = 0, E = AT->getArrayNumElements(); I < E; ++I) {
      Value *Field = IRB.CreateExtractValue(V, I);
      Value *NewField =
          fatPtrsToInts(Field, FromPart, ToPart, Name + "." + Twine(I));
      Ret = IRB.CreateInsertValue(Ret, NewField, I);
    }
  } else {
    for (auto [Idx, FromPart, ToPart] :
         enumerate(From->subtypes(), To->subtypes())) {
      Value *Field = IRB.CreateExtractValue(V, Idx);
      Value *NewField =
          fatPtrsToInts(Field, FromPart, ToPart, Name + "." + Twine(Idx));
      Ret = IRB.CreateInsertValue(Ret, NewField, Idx);
    }
  }
  ConvertedForStore[V] = Ret;
  return Ret;
}

Value *StoreFatPtrsAsIntsVisitor::intsToFatPtrs(Value *V, Type *From, Type *To,
                                                const Twine &Name) {
  if (From == To)
    return V;
  if (isBufferFatPtrOrVector(To)) {
    Value *Cast = IRB.CreateIntToPtr(V, To, Name + ".ptr");
    return Cast;
  }
  if (From->getNumContainedTypes() == 0)
    return V;
  // Structs, arrays, and other compound types.
  if (auto *AT = dyn_cast<ArrayType>(From)) {
    Type *FromPart = AT->getArrayElementType();
    Type *ToPart = cast<ArrayType>(To)->getElementType();
    for (uint64_t I = 0, E = AT->getArrayNumElements(); I < E; ++I) {
      Value *Field = IRB.CreateExtractValue(V, I);
      Value *NewField =
          intsToFatPtrs(Field, FromPart, ToPart, Name + "." + Twine(I));
      Ret = IRB.CreateInsertValue(Ret, NewField, I);
    }
  } else {
    for (auto [Idx, FromPart, ToPart] :
         enumerate(From->subtypes(), To->subtypes())) {
      Value *Field = IRB.CreateExtractValue(V, Idx);
      Value *NewField =
          intsToFatPtrs(Field, FromPart, ToPart, Name + "." + Twine(Idx));
      Ret = IRB.CreateInsertValue(Ret, NewField, Idx);
    }
  }
  return Ret;
}

bool StoreFatPtrsAsIntsVisitor::processFunction(Function &F) {
  bool Changed = false;
  // The visitors will mutate GEPs and allocas, but will push loads and stores
  // to the worklist to avoid invalidation.
    Changed |= visit(I);
  }
  ConvertedForStore.clear();
  return Changed;
}

bool StoreFatPtrsAsIntsVisitor::visitAllocaInst(AllocaInst &I) {
  Type *Ty = I.getAllocatedType();
  Type *NewTy = TypeMap->remapType(Ty);
  if (Ty == NewTy)
    return false;
  I.setAllocatedType(NewTy);
  return true;
}

bool StoreFatPtrsAsIntsVisitor::visitGetElementPtrInst(GetElementPtrInst &I) {
  Type *Ty = I.getSourceElementType();
  Type *NewTy = TypeMap->remapType(Ty);
  if (Ty == NewTy)
    return false;
  // We'll be rewriting the type `ptr addrspace(7)` out of existence soon, so
  // make sure GEPs don't have different semantics with the new type.
  I.setSourceElementType(NewTy);
  I.setResultElementType(TypeMap->remapType(I.getResultElementType()));
  return true;
}

bool StoreFatPtrsAsIntsVisitor::visitLoadInst(LoadInst &LI) {
  Type *Ty = LI.getType();
  Type *IntTy = TypeMap->remapType(Ty);
  if (Ty == IntTy)
    return false;

  IRB.SetInsertPoint(&LI);
  auto *NLI = cast<LoadInst>(LI.clone());
  NLI->mutateType(IntTy);
  NLI = IRB.Insert(NLI);
  copyMetadataForLoad(*NLI, LI);
  NLI->takeName(&LI);

  Value *CastBack = intsToFatPtrs(NLI, IntTy, Ty, NLI->getName());
  LI.replaceAllUsesWith(CastBack);
  LI.eraseFromParent();
  return true;
}

bool StoreFatPtrsAsIntsVisitor::visitStoreInst(StoreInst &SI) {
  Value *V = SI.getValueOperand();
  Type *Ty = V->getType();
  Type *IntTy = TypeMap->remapType(Ty);
  if (Ty == IntTy)
    return false;

  IRB.SetInsertPoint(&SI);
  Value *IntV = fatPtrsToInts(V, Ty, IntTy, V->getName());
  for (auto *Dbg : at::getAssignmentMarkers(&SI))
    Dbg->setValue(IntV);

  SI.setOperand(0, IntV);
  return true;
}
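The load/store rewriting above moves `ptr addrspace(7)` values to and from memory through `ptrtoint`/`inttoptr` with `i160`. The sketch below models that round trip with plain 32-bit words so the bit layout is explicit. The layout is an assumption not stated in this excerpt: the offset occupies the low 32 bits and the 128-bit resource the high bits, i.e. `(zext(rsrc) << 32) | zext(off)`. The names `I160`, `Rsrc`, `fatPtrToInt`, and `intToFatPtr` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical models: an i160 as five 32-bit words and a 128-bit buffer
// resource as four 32-bit words, both with word 0 least significant.
using I160 = std::array<uint32_t, 5>;
using Rsrc = std::array<uint32_t, 4>;

// Assumed layout: i160 = (zext(rsrc) << 32) | zext(off), so the offset is
// word 0 and the resource occupies words 1..4.
I160 fatPtrToInt(const Rsrc &R, uint32_t Off) {
  return {Off, R[0], R[1], R[2], R[3]};
}

// Inverse: the low 32 bits (trunc) give the offset, and the high 128 bits
// (lshr by 32, then trunc) give the resource.
void intToFatPtr(const I160 &V, Rsrc &R, uint32_t &Off) {
  Off = V[0];
  for (int I = 0; I < 4; ++I)
    R[I] = V[I + 1];
}
```

Because the two functions are exact inverses, a value stored as `i160` and loaded back yields the same resource and offset, which is the property the first phase relies on.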

/// Return the ptr addrspace(8) and i32 (resource and offset parts) in a lowered
/// buffer fat pointer constant.
static std::pair<Constant *, Constant *>
splitLoweredFatBufferConst(Constant *C) {
  assert(isSplitFatPtr(C->getType()) && "Not a split fat buffer pointer");
  return std::make_pair(C->getAggregateElement(0u), C->getAggregateElement(1u));
}

namespace {
/// Handle the remapping of ptr addrspace(7) constants.
class FatPtrConstMaterializer final : public ValueMaterializer {
  BufferFatPtrToStructTypeMap *TypeMap;
  // An internal mapper that is used to recurse into the arguments of constants.
  // While the documentation for `ValueMapper` specifies not to use it
  // recursively, examination of the logic in mapValue() shows that it can
  // safely be used recursively when handling constants, like it does in its own
  // logic.
  ValueMapper InternalMapper;

  Constant *materializeBufferFatPtrConst(Constant *C);

public:
  // UnderlyingMap is the value map this materializer will be filling.
  FatPtrConstMaterializer(BufferFatPtrToStructTypeMap *TypeMap,
                          ValueToValueMapTy &UnderlyingMap)
      : TypeMap(TypeMap),
        InternalMapper(UnderlyingMap, RF_None, TypeMap, this) {}
  virtual ~FatPtrConstMaterializer() = default;

  Value *materialize(Value *V) override;
};
} // namespace

Constant *FatPtrConstMaterializer::materializeBufferFatPtrConst(Constant *C) {
  Type *SrcTy = C->getType();
  auto *NewTy = dyn_cast<StructType>(TypeMap->remapType(SrcTy));
  if (C->isNullValue())
    return ConstantAggregateZero::getNullValue(NewTy);
  if (isa<PoisonValue>(C)) {
    return ConstantStruct::get(NewTy,
                               {PoisonValue::get(NewTy->getElementType(0)),
                                PoisonValue::get(NewTy->getElementType(1))});
  }
  if (isa<UndefValue>(C)) {
    return ConstantStruct::get(NewTy,
                               {UndefValue::get(NewTy->getElementType(0)),
                                UndefValue::get(NewTy->getElementType(1))});
  }

  if (auto *VC = dyn_cast<ConstantVector>(C)) {
    if (Constant *S = VC->getSplatValue()) {
      Constant *NewS = InternalMapper.mapConstant(*S);
      if (!NewS)
        return nullptr;
      auto [Rsrc, Off] = splitLoweredFatBufferConst(NewS);
      auto EC = VC->getType()->getElementCount();
      return ConstantStruct::get(NewTy, {ConstantVector::getSplat(EC, Rsrc),
                                         ConstantVector::getSplat(EC, Off)});
    }
    for (Value *Op : VC->operand_values()) {
      auto *NewOp = dyn_cast_or_null<Constant>(InternalMapper.mapValue(*Op));
      if (!NewOp)
        return nullptr;
      auto [Rsrc, Off] = splitLoweredFatBufferConst(NewOp);
      Rsrcs.push_back(Rsrc);
      Offs.push_back(Off);
    }
    Constant *RsrcVec = ConstantVector::get(Rsrcs);
    Constant *OffVec = ConstantVector::get(Offs);
    return ConstantStruct::get(NewTy, {RsrcVec, OffVec});
  }

  if (isa<GlobalValue>(C))
    report_fatal_error("Global values containing ptr addrspace(7) (buffer "
                       "fat pointer) values are not supported");

  if (isa<ConstantExpr>(C))
    report_fatal_error("Constant exprs containing ptr addrspace(7) (buffer "
                       "fat pointer) values should have been expanded earlier");

  return nullptr;
}
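For vector constants, `materializeBufferFatPtrConst()` maps each lane to a `{rsrc, off}` pair and regroups the pairs into one vector of resources and one vector of offsets (or broadcasts a single pair for splats). The sketch below shows that array-of-structs to struct-of-vectors transposition, and its inverse, on plain C++ data. `FatPtr`, `SplitFatPtrVec`, `splitVector`, `splitSplat`, and `joinVector` are hypothetical stand-ins, with a resource modeled as four 32-bit words.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins: a 128-bit resource as four 32-bit words, plus a
// 32-bit offset, i.e. one {ptr addrspace(8), i32} pair.
using Rsrc = std::array<uint32_t, 4>;
struct FatPtr {
  Rsrc R;
  uint32_t Off;
};

// Struct-of-vectors form, analogous to {<N x ptr addrspace(8)>, <N x i32>}.
struct SplitFatPtrVec {
  std::vector<Rsrc> Rsrcs;
  std::vector<uint32_t> Offs;
};

// Per-lane split, like the loop over VC->operand_values().
SplitFatPtrVec splitVector(const std::vector<FatPtr> &Elts) {
  SplitFatPtrVec Out;
  for (const FatPtr &P : Elts) {
    Out.Rsrcs.push_back(P.R);
    Out.Offs.push_back(P.Off);
  }
  return Out;
}

// Splat case: one pair broadcast to N lanes, like ConstantVector::getSplat.
SplitFatPtrVec splitSplat(const FatPtr &P, size_t N) {
  return {std::vector<Rsrc>(N, P.R), std::vector<uint32_t>(N, P.Off)};
}

// Inverse transposition, re-pairing lane I of each vector.
std::vector<FatPtr> joinVector(const SplitFatPtrVec &S) {
  std::vector<FatPtr> Out;
  for (size_t I = 0; I < S.Offs.size(); ++I)
    Out.push_back({S.Rsrcs[I], S.Offs[I]});
  return Out;
}
```

Handling splats separately avoids materializing N identical pairs, which is presumably why the pass checks `getSplatValue()` before falling back to the per-lane loop.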

Value *FatPtrConstMaterializer::materialize(Value *V) {
  Constant *C = dyn_cast<Constant>(V);
  if (!C)
    return nullptr;
  // Structs and other types that happen to contain fat pointers get remapped
  // by the mapValue() logic.
    return nullptr;
  return materializeBufferFatPtrConst(C);
}

using PtrParts = std::pair<Value *, Value *>;
namespace {
// The visitor returns the resource and offset parts for an instruction if they
// can be computed, or (nullptr, nullptr) for cases that don't have a meaningful
// value mapping.
class SplitPtrStructs : public InstVisitor<SplitPtrStructs, PtrParts> {
  ValueToValueMapTy RsrcParts;
  ValueToValueMapTy OffParts;

  // Track instructions that have been rewritten into a user of the component
  // parts of their ptr addrspace(7) input. Instructions that produced
  // ptr addrspace(7) parts should **not** be RAUW'd before being added to this
  // set, as that replacement will be handled in a post-visit step. However,
  // instructions that yield values that aren't fat pointers (e.g. ptrtoint)
  // should RAUW themselves with new instructions that use the split parts
  // of their arguments during processing.
  DenseSet<Instruction *> SplitUsers;

  // Nodes that need a second look once we've computed the parts for all other
  // instructions to see if, for example, we really need to phi on the resource
  // part.
  SmallVector<Instruction *> Conditionals;
  // Temporary instructions produced while lowering conditionals that should be
  // killed.
  SmallVector<Instruction *> ConditionalTemps;

  // Subtarget info, needed for determining what cache control bits to set.
  const TargetMachine *TM;
  const GCNSubtarget *ST = nullptr;

  IRBuilder<> IRB;

  // Copy metadata between instructions if applicable.
  void copyMetadata(Value *Dest, Value *Src);

  // Get the resource and offset parts of the value V, inserting appropriate
  // extractvalue calls if needed.
  PtrParts getPtrParts(Value *V);

  // Given an instruction that could produce multiple resource parts (a PHI or
  // select), collect the set of values that could have provided its resource
  // part (the `Roots`) and the set of conditional instructions visited during
  // the search (`Seen`). If, after removing the root of the search from `Seen`
  // and `Roots`, `Seen` is a subset of `Roots` and `Roots - Seen` contains one
  // element, the resource part of that element can replace the resource part
  // of all other elements in `Seen`.
  void getPossibleRsrcRoots(Instruction *I, SmallPtrSetImpl<Value *> &Roots,
  void processConditionals();

  // If an instruction has been split into resource and offset parts,
  // delete that instruction. If any of its uses have not themselves been split
  // into parts (for example, an insertvalue), construct the structure
  // that the type rewrites declared should be produced by the dying instruction
  // and use that.
  // Also, kill the temporary extractvalue operations produced by the two-stage
  // lowering of PHIs and conditionals.
  void killAndReplaceSplitInstructions(SmallVectorImpl<Instruction *> &Origs);

  void setAlign(CallInst *Intr, Align A, unsigned RsrcArgIdx);
  void insertPreMemOpFence(AtomicOrdering Order, SyncScope::ID SSID);
  void insertPostMemOpFence(AtomicOrdering Order, SyncScope::ID SSID);
  Value *handleMemoryInst(Instruction *I, Value *Arg, Value *Ptr, Type *Ty,
                          Align Alignment, AtomicOrdering Order,
                          bool IsVolatile, SyncScope::ID SSID);

public:
  SplitPtrStructs(LLVMContext &Ctx, const TargetMachine *TM)
      : TM(TM), IRB(Ctx) {}

  void processFunction(Function &F);

};
} // namespace
770
771void SplitPtrStructs::copyMetadata(Value *Dest, Value *Src) {
772 auto *DestI = dyn_cast<Instruction>(Dest);
773 auto *SrcI = dyn_cast<Instruction>(Src);
774
775 if (!DestI || !SrcI)
776 return;
777
778 DestI->copyMetadata(*SrcI);
779}
780
781PtrParts SplitPtrStructs::getPtrParts(Value *V) {
782 assert(isSplitFatPtr(V->getType()) && "it's not meaningful to get the parts "
783 "of something that wasn't rewritten");
784 auto *RsrcEntry = &RsrcParts[V];
785 auto *OffEntry = &OffParts[V];
786 if (*RsrcEntry && *OffEntry)
787 return {*RsrcEntry, *OffEntry};
788
789 if (auto *C = dyn_cast<Constant>(V)) {
790 auto [Rsrc, Off] = splitLoweredFatBufferConst(C);
791 return {*RsrcEntry = Rsrc, *OffEntry = Off};
792 }
793
795 if (auto *I = dyn_cast<Instruction>(V)) {
796 LLVM_DEBUG(dbgs() << "Recursing to split parts of " << *I << "\n");
797 auto [Rsrc, Off] = visit(*I);
798 if (Rsrc && Off)
799 return {*RsrcEntry = Rsrc, *OffEntry = Off};
800 // We'll be creating the new values after the relevant instruction.
801 // This instruction generates a value and so isn't a terminator.
802 IRB.SetInsertPoint(*I->getInsertionPointAfterDef());
803 IRB.SetCurrentDebugLocation(I->getDebugLoc());
804 } else if (auto *A = dyn_cast<Argument>(V)) {
805 IRB.SetInsertPointPastAllocas(A->getParent());
806 IRB.SetCurrentDebugLocation(DebugLoc());
807 }
808 Value *Rsrc = IRB.CreateExtractValue(V, 0, V->getName() + ".rsrc");
809 Value *Off = IRB.CreateExtractValue(V, 1, V->getName() + ".off");
810 return {*RsrcEntry = Rsrc, *OffEntry = Off};
811}
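// Example for getPtrParts() above (an illustrative sketch, not from the
// upstream sources; value names are approximate and a 32-bit offset is
// assumed). For a fat pointer %p that no visitor split, such as a function
// argument or the result of an ordinary call, the two halves are materialized
// once and cached in RsrcParts/OffParts:
//   %p.rsrc = extractvalue { ptr addrspace(8), i32 } %p, 0
//   %p.off = extractvalue { ptr addrspace(8), i32 } %p, 1
// Arguments get these right after the entry block's allocas, and instructions
// get them right after their definition. Constants are split by
// splitLoweredFatBufferConst() instead and emit no instructions here.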
812
813/// Returns the instruction that defines the resource part of the value V.
814/// Note that this is not getUnderlyingObject(), since that looks through
815/// operations like ptrmask which might modify the resource part.
816///
817/// We can limit ourselves to just looking through GEPs followed by looking
818/// through addrspacecasts because only those two operations preserve the
819/// resource part, and because operations on an `addrspace(8)` (which is the
820/// legal input to this addrspacecast) would produce a different resource part.
821static Value *rsrcPartRoot(Value *V) {
822 while (auto *GEP = dyn_cast<GEPOperator>(V))
823 V = GEP->getPointerOperand();
824 while (auto *ASC = dyn_cast<AddrSpaceCastOperator>(V))
825 V = ASC->getPointerOperand();
826 return V;
827}
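// Example for rsrcPartRoot() above (an illustrative sketch; the value names
// are hypothetical). Given
//   %p = addrspacecast ptr addrspace(8) %r to ptr addrspace(7)
//   %q = getelementptr i8, ptr addrspace(7) %p, i32 %i
// rsrcPartRoot(%q) steps through the GEP to %p, then through the
// addrspacecast to %r, which is the resource part %q carries. A ptrmask call
// on %p would stop the walk, because it is neither a GEP nor an
// addrspacecast.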
828
829void SplitPtrStructs::getPossibleRsrcRoots(Instruction *I,
830 SmallPtrSetImpl<Value *> &Roots,
831 SmallPtrSetImpl<Value *> &Seen) {
832 if (auto *PHI = dyn_cast<PHINode>(I)) {
833 if (!Seen.insert(I).second)
834 return;
835 for (Value *In : PHI->incoming_values()) {
836 In = rsrcPartRoot(In);
837 Roots.insert(In);
838 if (isa<PHINode, SelectInst>(In))
839 getPossibleRsrcRoots(cast<Instruction>(In), Roots, Seen);
840 }
841 } else if (auto *SI = dyn_cast<SelectInst>(I)) {
842 if (!Seen.insert(SI).second)
843 return;
844 Value *TrueVal = rsrcPartRoot(SI->getTrueValue());
845 Value *FalseVal = rsrcPartRoot(SI->getFalseValue());
846 Roots.insert(TrueVal);
847 Roots.insert(FalseVal);
848 if (isa<PHINode, SelectInst>(TrueVal))
849 getPossibleRsrcRoots(cast<Instruction>(TrueVal), Roots, Seen);
850 if (isa<PHINode, SelectInst>(FalseVal))
851 getPossibleRsrcRoots(cast<Instruction>(FalseVal), Roots, Seen);
852 } else {
853 llvm_unreachable("getPossibleRsrcRoots() only works on phi and select");
854 }
855}
856
857void SplitPtrStructs::processConditionals() {
861 for (Instruction *I : Conditionals) {
862 // These have to exist by now because we've visited these nodes.
863 Value *Rsrc = RsrcParts[I];
864 Value *Off = OffParts[I];
865 assert(Rsrc && Off && "must have visited conditionals by now");
866
867 std::optional<Value *> MaybeRsrc;
868 auto MaybeFoundRsrc = FoundRsrcs.find(I);
869 if (MaybeFoundRsrc != FoundRsrcs.end()) {
870 MaybeRsrc = MaybeFoundRsrc->second;
871 } else {
873 Roots.clear();
874 Seen.clear();
875 getPossibleRsrcRoots(I, Roots, Seen);
876 LLVM_DEBUG(dbgs() << "Processing conditional: " << *I << "\n");
877#ifndef NDEBUG
878 for (Value *V : Roots)
879 LLVM_DEBUG(dbgs() << "Root: " << *V << "\n");
880 for (Value *V : Seen)
881 LLVM_DEBUG(dbgs() << "Seen: " << *V << "\n");
882#endif
883 // If we are our own possible root, then we shouldn't block our
884 // replacement with a valid incoming value.
885 Roots.erase(I);
886 // We don't want to block the optimization for conditionals that don't
887 // refer to themselves but did see themselves during the traversal.
888 Seen.erase(I);
889
890 if (set_is_subset(Seen, Roots)) {
891 auto Diff = set_difference(Roots, Seen);
892 if (Diff.size() == 1) {
893 Value *RootVal = *Diff.begin();
894 // Handle the case where previous loops already looked through
895 // an addrspacecast.
896 if (isSplitFatPtr(RootVal->getType()))
897 MaybeRsrc = std::get<0>(getPtrParts(RootVal));
898 else
899 MaybeRsrc = RootVal;
900 }
901 }
902 }
903
904 if (auto *PHI = dyn_cast<PHINode>(I)) {
905 Value *NewRsrc;
906 StructType *PHITy = cast<StructType>(PHI->getType());
907 IRB.SetInsertPoint(*PHI->getInsertionPointAfterDef());
908 IRB.SetCurrentDebugLocation(PHI->getDebugLoc());
909 if (MaybeRsrc) {
910 NewRsrc = *MaybeRsrc;
911 } else {
912 Type *RsrcTy = PHITy->getElementType(0);
913 auto *RsrcPHI = IRB.CreatePHI(RsrcTy, PHI->getNumIncomingValues());
914 RsrcPHI->takeName(Rsrc);
915 for (auto [V, BB] : llvm::zip(PHI->incoming_values(), PHI->blocks())) {
916 Value *VRsrc = std::get<0>(getPtrParts(V));
917 RsrcPHI->addIncoming(VRsrc, BB);
918 }
919 copyMetadata(RsrcPHI, PHI);
920 NewRsrc = RsrcPHI;
921 }
922
923 Type *OffTy = PHITy->getElementType(1);
924 auto *NewOff = IRB.CreatePHI(OffTy, PHI->getNumIncomingValues());
925 NewOff->takeName(Off);
926 for (auto [V, BB] : llvm::zip(PHI->incoming_values(), PHI->blocks())) {
927 assert(OffParts.count(V) && "An offset part had to be created by now");
928 Value *VOff = std::get<1>(getPtrParts(V));
929 NewOff->addIncoming(VOff, BB);
930 }
931 copyMetadata(NewOff, PHI);
932
933 // Note: We don't eraseFromParent() the temporaries because we don't want
934 // to put the correspondence maps in an inconsistent state. That'll be handled
935 // during the rest of the killing. Also, `ValueToValueMapTy` guarantees
936 // that references in that map will be updated as well.
937 ConditionalTemps.push_back(cast<Instruction>(Rsrc));
938 ConditionalTemps.push_back(cast<Instruction>(Off));
939 Rsrc->replaceAllUsesWith(NewRsrc);
940 Off->replaceAllUsesWith(NewOff);
941
942 // Save on recomputing the cycle traversals in known-root cases.
943 if (MaybeRsrc)
944 for (Value *V : Seen)
945 FoundRsrcs[cast<Instruction>(V)] = NewRsrc;
946 } else if (isa<SelectInst>(I)) {
947 if (MaybeRsrc) {
948 ConditionalTemps.push_back(cast<Instruction>(Rsrc));
949 Rsrc->replaceAllUsesWith(*MaybeRsrc);
950 for (Value *V : Seen)
951 FoundRsrcs[cast<Instruction>(V)] = *MaybeRsrc;
952 }
953 } else {
954 llvm_unreachable("Only PHIs and selects go in the conditionals list");
955 }
956 }
957}
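// Example for processConditionals() above (an illustrative sketch; value names
// are hypothetical and approximate). In a loop such as
//   loop:
//     %ptr = phi ptr addrspace(7) [ %base, %entry ], [ %next, %loop ]
//     %next = getelementptr i8, ptr addrspace(7) %ptr, i32 4
// getPossibleRsrcRoots(%ptr) collects Roots = {%base, %ptr} (the GEP is looked
// through) and Seen = {%ptr}. Once %ptr is erased from both sets, Seen is
// empty and Roots = {%base}, so the resource part is known to be %base's and
// only the offset needs a new PHI:
//     %ptr.off = phi i32 [ %base.off, %entry ], [ %next, %loop ]
//     %next = add i32 %ptr.off, 4
// The temporary %ptr.rsrc extractvalue is RAUW'd with %base.rsrc.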
958
959void SplitPtrStructs::killAndReplaceSplitInstructions(
960 SmallVectorImpl<Instruction *> &Origs) {
961 for (Instruction *I : ConditionalTemps)
962 I->eraseFromParent();
963
964 for (Instruction *I : Origs) {
965 if (!SplitUsers.contains(I))
966 continue;
967
968 SmallVector<DbgValueInst *> Dbgs;
969 findDbgValues(Dbgs, I);
970 for (auto *Dbg : Dbgs) {
971 IRB.SetInsertPoint(Dbg);
972 auto &DL = I->getDataLayout();
973 assert(isSplitFatPtr(I->getType()) &&
974 "We should've RAUW'd away loads, stores, etc. at this point");
975 auto *OffDbg = cast<DbgValueInst>(Dbg->clone());
976 copyMetadata(OffDbg, Dbg);
977 auto [Rsrc, Off] = getPtrParts(I);
978
979 int64_t RsrcSz = DL.getTypeSizeInBits(Rsrc->getType());
980 int64_t OffSz = DL.getTypeSizeInBits(Off->getType());
981
982 std::optional<DIExpression *> RsrcExpr =
983 DIExpression::createFragmentExpression(Dbg->getExpression(), 0,
984 RsrcSz);
985 std::optional<DIExpression *> OffExpr =
986 DIExpression::createFragmentExpression(Dbg->getExpression(), RsrcSz,
987 OffSz);
988 if (OffExpr) {
989 OffDbg->setExpression(*OffExpr);
990 OffDbg->replaceVariableLocationOp(I, Off);
991 IRB.Insert(OffDbg);
992 } else {
993 OffDbg->deleteValue();
994 }
995 if (RsrcExpr) {
996 Dbg->setExpression(*RsrcExpr);
997 Dbg->replaceVariableLocationOp(I, Rsrc);
998 } else {
999 Dbg->replaceVariableLocationOp(I, UndefValue::get(I->getType()));
1000 }
1001 }
1002
1003 Value *Poison = PoisonValue::get(I->getType());
1004 I->replaceUsesWithIf(Poison, [&](const Use &U) -> bool {
1005 if (const auto *UI = dyn_cast<Instruction>(U.getUser()))
1006 return SplitUsers.contains(UI);
1007 return false;
1008 });
1009
1010 if (I->use_empty()) {
1011 I->eraseFromParent();
1012 continue;
1013 }
1014 IRB.SetInsertPoint(*I->getInsertionPointAfterDef());
1015 IRB.SetCurrentDebugLocation(I->getDebugLoc());
1016 auto [Rsrc, Off] = getPtrParts(I);
1017 Value *Struct = PoisonValue::get(I->getType());
1018 Struct = IRB.CreateInsertValue(Struct, Rsrc, 0);
1019 Struct = IRB.CreateInsertValue(Struct, Off, 1);
1020 copyMetadata(Struct, I);
1021 Struct->takeName(I);
1022 I->replaceAllUsesWith(Struct);
1023 I->eraseFromParent();
1024 }
1025}
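// Example for killAndReplaceSplitInstructions() above (an illustrative
// sketch; assumes a 128-bit resource and a 32-bit offset). A debug record for
// a fat pointer %p with an empty expression, schematically
//   dbg.value(%p, !var, !DIExpression())
// is split into one record per part, each describing a fragment of the
// 160-bit variable:
//   dbg.value(%p.rsrc, !var, !DIExpression(DW_OP_LLVM_fragment, 0, 128))
//   dbg.value(%p.off, !var, !DIExpression(DW_OP_LLVM_fragment, 128, 32))
// Uses of %p by split users become poison; any remaining uses are fed a
// struct rebuilt from the two parts with a pair of insertvalue instructions.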
1026
1027void SplitPtrStructs::setAlign(CallInst *Intr, Align A, unsigned RsrcArgIdx) {
1028 LLVMContext &Ctx = Intr->getContext();
1029 Intr->addParamAttr(RsrcArgIdx, Attribute::getWithAlignment(Ctx, A));
1030}
1031
1032void SplitPtrStructs::insertPreMemOpFence(AtomicOrdering Order,
1033 SyncScope::ID SSID) {
1034 switch (Order) {
1035 case AtomicOrdering::Release:
1036 case AtomicOrdering::AcquireRelease:
1037 case AtomicOrdering::SequentiallyConsistent:
1038 IRB.CreateFence(AtomicOrdering::Release, SSID);
1039 break;
1040 default:
1041 break;
1042 }
1043}
1044
1045void SplitPtrStructs::insertPostMemOpFence(AtomicOrdering Order,
1046 SyncScope::ID SSID) {
1047 switch (Order) {
1048 case AtomicOrdering::Acquire:
1049 case AtomicOrdering::AcquireRelease:
1050 case AtomicOrdering::SequentiallyConsistent:
1051 IRB.CreateFence(AtomicOrdering::Acquire, SSID);
1052 break;
1053 default:
1054 break;
1055 }
1056}
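// Example for the two fence helpers above (an illustrative sketch; value and
// intrinsic names are approximate). Ordering is expressed with separate fences
// around the buffer intrinsic, so a sequentially consistent atomic load
//   %v = load atomic i32, ptr addrspace(7) %p syncscope("agent") seq_cst, align 4
// is expected to become roughly
//   fence syncscope("agent") release
//   %v = call i32 @llvm.amdgcn.raw.ptr.atomic.buffer.load.i32(
//            ptr addrspace(8) align 4 %p.rsrc, i32 %p.off, i32 0, i32 1)
//   fence syncscope("agent") acquire
// where the trailing i32 1 is the GLC cache-policy bit that handleMemoryInst()
// sets for atomic loads and stores (GFX10 loads also get DLC). A monotonic
// access gets no fences at all.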
1057
1058Value *SplitPtrStructs::handleMemoryInst(Instruction *I, Value *Arg, Value *Ptr,
1059 Type *Ty, Align Alignment,
1060 AtomicOrdering Order, bool IsVolatile,
1061 SyncScope::ID SSID) {
1062 IRB.SetInsertPoint(I);
1063
1064 auto [Rsrc, Off] = getPtrParts(Ptr);
1065 SmallVector<Value *, 5> Args;
1066 if (Arg)
1067 Args.push_back(Arg);
1068 Args.push_back(Rsrc);
1069 Args.push_back(Off);
1070 insertPreMemOpFence(Order, SSID);
1071 // soffset is always 0 for these cases: we want the entire offset to be
1072 // subject to bounds checking, and we don't know which parts of the GEPs
1073 // are uniform.
1074 Args.push_back(IRB.getInt32(0));
1075
1076 uint32_t Aux = 0;
1077 bool IsInvariant =
1078 (isa<LoadInst>(I) && I->getMetadata(LLVMContext::MD_invariant_load));
1079 bool IsNonTemporal = I->getMetadata(LLVMContext::MD_nontemporal);
1080 // Atomic loads and stores need glc, atomic read-modify-write doesn't.
1081 bool IsOneWayAtomic =
1082 !isa<AtomicRMWInst>(I) && Order != AtomicOrdering::NotAtomic;
1083 if (IsOneWayAtomic)
1084 Aux |= AMDGPU::CPol::GLC;
1085 if (IsNonTemporal && !IsInvariant)
1086 Aux |= AMDGPU::CPol::SLC;
1087 if (isa<LoadInst>(I) && ST->getGeneration() == AMDGPUSubtarget::GFX10)
1088 Aux |= (Aux & AMDGPU::CPol::GLC ? AMDGPU::CPol::DLC : 0);
1089 if (IsVolatile)
1090 Aux |= AMDGPU::CPol::VOLATILE;
1091 Args.push_back(IRB.getInt32(Aux));
1092
1093 Intrinsic::ID IID = Intrinsic::not_intrinsic;
1094 if (isa<LoadInst>(I))
1095 IID = Order == AtomicOrdering::NotAtomic
1096 ? Intrinsic::amdgcn_raw_ptr_buffer_load
1097 : Intrinsic::amdgcn_raw_ptr_atomic_buffer_load;
1098 else if (isa<StoreInst>(I))
1099 IID = Intrinsic::amdgcn_raw_ptr_buffer_store;
1100 else if (auto *RMW = dyn_cast<AtomicRMWInst>(I)) {
1101 switch (RMW->getOperation()) {
1102 case AtomicRMWInst::Xchg:
1103 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_swap;
1104 break;
1105 case AtomicRMWInst::Add:
1106 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_add;
1107 break;
1108 case AtomicRMWInst::Sub:
1109 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_sub;
1110 break;
1111 case AtomicRMWInst::And:
1112 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_and;
1113 break;
1114 case AtomicRMWInst::Or:
1115 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_or;
1116 break;
1117 case AtomicRMWInst::Xor:
1118 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_xor;
1119 break;
1120 case AtomicRMWInst::Max:
1121 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_smax;
1122 break;
1123 case AtomicRMWInst::Min:
1124 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_smin;
1125 break;
1126 case AtomicRMWInst::UMax:
1127 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_umax;
1128 break;
1129 case AtomicRMWInst::UMin:
1130 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_umin;
1131 break;
1132 case AtomicRMWInst::FAdd:
1133 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_fadd;
1134 break;
1135 case AtomicRMWInst::FMax:
1136 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_fmax;
1137 break;
1138 case AtomicRMWInst::FMin:
1139 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_fmin;
1140 break;
1141 case AtomicRMWInst::FSub: {
1142 report_fatal_error("atomic floating point subtraction not supported for "
1143 "buffer resources and should've been expanded away");
1144 break;
1145 }
1146 case AtomicRMWInst::Nand:
1147 report_fatal_error("atomic nand not supported for buffer resources and "
1148 "should've been expanded away");
1149 break;
1150 case AtomicRMWInst::UIncWrap:
1151 case AtomicRMWInst::UDecWrap:
1152 report_fatal_error("wrapping increment/decrement not supported for "
1153 "buffer resources and should've been expanded away");
1154 break;
1155 case AtomicRMWInst::BAD_BINOP:
1156 llvm_unreachable("Not sure how we got a bad binop");
1159 break;
1160 }
1161 }
1162
1163 auto *Call = IRB.CreateIntrinsic(IID, Ty, Args);
1164 copyMetadata(Call, I);
1165 setAlign(Call, Alignment, Arg ? 1 : 0);
1166 Call->takeName(I);
1167
1168 insertPostMemOpFence(Order, SSID);
1169 // The "no moving p7 directly" rewrites ensure that this load or store won't
1170 // itself need to be split into parts.
1171 SplitUsers.insert(I);
1172 I->replaceAllUsesWith(Call);
1173 return Call;
1174}
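// Example for handleMemoryInst() above (an illustrative sketch; value and
// intrinsic names are approximate). A plain store
//   store i32 %v, ptr addrspace(7) %p, align 4
// becomes a call whose operands are the stored value, the resource part
// (carrying the access alignment), the offset part, a zero soffset, and the
// cache-policy word:
//   call void @llvm.amdgcn.raw.ptr.buffer.store.i32(i32 %v,
//       ptr addrspace(8) align 4 %p.rsrc, i32 %p.off, i32 0, i32 0)
// Attaching !nontemporal metadata to the store would set CPol::SLC in the
// last operand.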
1175
1176PtrParts SplitPtrStructs::visitInstruction(Instruction &I) {
1177 return {nullptr, nullptr};
1178}
1179
1180PtrParts SplitPtrStructs::visitLoadInst(LoadInst &LI) {
1181 if (!isSplitFatPtr(LI.getPointerOperandType()))
1182 return {nullptr, nullptr};
1183 handleMemoryInst(&LI, nullptr, LI.getPointerOperand(), LI.getType(),
1184 LI.getAlign(), LI.getOrdering(), LI.isVolatile(),
1185 LI.getSyncScopeID());
1186 return {nullptr, nullptr};
1187}
1188
1189PtrParts SplitPtrStructs::visitStoreInst(StoreInst &SI) {
1190 if (!isSplitFatPtr(SI.getPointerOperandType()))
1191 return {nullptr, nullptr};
1192 Value *Arg = SI.getValueOperand();
1193 handleMemoryInst(&SI, Arg, SI.getPointerOperand(), Arg->getType(),
1194 SI.getAlign(), SI.getOrdering(), SI.isVolatile(),
1195 SI.getSyncScopeID());
1196 return {nullptr, nullptr};
1197}
1198
1199PtrParts SplitPtrStructs::visitAtomicRMWInst(AtomicRMWInst &AI) {
1200 if (!isSplitFatPtr(AI.getPointerOperand()->getType()))
1201 return {nullptr, nullptr};
1202 Value *Arg = AI.getValOperand();
1203 handleMemoryInst(&AI, Arg, AI.getPointerOperand(), Arg->getType(),
1204 AI.getAlign(), AI.getOrdering(), AI.isVolatile(),
1205 AI.getSyncScopeID());
1206 return {nullptr, nullptr};
1207}
1208
1209// Unlike load, store, and RMW, cmpxchg needs special handling to account
1210// for the boolean success flag in its result.
1211PtrParts SplitPtrStructs::visitAtomicCmpXchgInst(AtomicCmpXchgInst &AI) {
1212 Value *Ptr = AI.getPointerOperand();
1213 if (!isSplitFatPtr(Ptr->getType()))
1214 return {nullptr, nullptr};
1215 IRB.SetInsertPoint(&AI);
1216
1217 Type *Ty = AI.getNewValOperand()->getType();
1218 AtomicOrdering Order = AI.getMergedOrdering();
1219 SyncScope::ID SSID = AI.getSyncScopeID();
1220 bool IsNonTemporal = AI.getMetadata(LLVMContext::MD_nontemporal);
1221
1222 auto [Rsrc, Off] = getPtrParts(Ptr);
1223 insertPreMemOpFence(Order, SSID);
1224
1225 uint32_t Aux = 0;
1226 if (IsNonTemporal)
1227 Aux |= AMDGPU::CPol::SLC;
1228 if (AI.isVolatile())
1229 Aux |= AMDGPU::CPol::VOLATILE;
1230 auto *Call =
1231 IRB.CreateIntrinsic(Intrinsic::amdgcn_raw_ptr_buffer_atomic_cmpswap, Ty,
1232 {AI.getNewValOperand(), AI.getCompareOperand(), Rsrc,
1233 Off, IRB.getInt32(0), IRB.getInt32(Aux)});
1234 copyMetadata(Call, &AI);
1235 setAlign(Call, AI.getAlign(), 2);
1236 Call->takeName(&AI);
1237 insertPostMemOpFence(Order, SSID);
1238
1239 Value *Res = PoisonValue::get(AI.getType());
1240 Res = IRB.CreateInsertValue(Res, Call, 0);
1241 if (!AI.isWeak()) {
1242 Value *Succeeded = IRB.CreateICmpEQ(Call, AI.getCompareOperand());
1243 Res = IRB.CreateInsertValue(Res, Succeeded, 1);
1244 }
1245 SplitUsers.insert(&AI);
1246 AI.replaceAllUsesWith(Res);
1247 return {nullptr, nullptr};
1248}
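// Example for visitAtomicCmpXchgInst() above (an illustrative sketch; value
// and intrinsic names are approximate). A strong, monotonic cmpxchg
//   %r = cmpxchg ptr addrspace(7) %p, i32 %cmp, i32 %new monotonic monotonic, align 4
// is rewritten to roughly
//   %r = call i32 @llvm.amdgcn.raw.ptr.buffer.atomic.cmpswap.i32(i32 %new,
//            i32 %cmp, ptr addrspace(8) align 4 %p.rsrc, i32 %p.off, i32 0, i32 0)
//   %1 = insertvalue { i32, i1 } poison, i32 %r, 0
//   %2 = icmp eq i32 %r, %cmp
//   %3 = insertvalue { i32, i1 } %1, i1 %2, 1
// and uses of the original { i32, i1 } result are redirected to %3. For a weak
// cmpxchg, the icmp and second insertvalue are skipped, so the success flag
// is left as poison.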
1249
1250PtrParts SplitPtrStructs::visitGetElementPtrInst(GetElementPtrInst &GEP) {
1251 using namespace llvm::PatternMatch;
1252 Value *Ptr = GEP.getPointerOperand();
1253 if (!isSplitFatPtr(Ptr->getType()))
1254 return {nullptr, nullptr};
1255 IRB.SetInsertPoint(&GEP);
1256
1257 auto [Rsrc, Off] = getPtrParts(Ptr);
1258 const DataLayout &DL = GEP.getDataLayout();
1259 bool IsNUW = GEP.hasNoUnsignedWrap();
1260 bool IsNUSW = GEP.hasNoUnsignedSignedWrap();
1261
1262 // In order to call emitGEPOffset() and thus not have to reimplement it,
1263 // we need the GEP result to have ptr addrspace(7) type.
1264 Type *FatPtrTy = IRB.getPtrTy(AMDGPUAS::BUFFER_FAT_POINTER);
1265 if (auto *VT = dyn_cast<VectorType>(Off->getType()))
1266 FatPtrTy = VectorType::get(FatPtrTy, VT->getElementCount());
1267 GEP.mutateType(FatPtrTy);
1268 Value *OffAccum = emitGEPOffset(&IRB, DL, &GEP);
1269 GEP.mutateType(Ptr->getType());
1270 if (match(OffAccum, m_Zero())) { // Constant-zero offset
1271 SplitUsers.insert(&GEP);
1272 return {Rsrc, Off};
1273 }
1274
1275 bool HasNonNegativeOff = false;
1276 if (auto *CI = dyn_cast<ConstantInt>(OffAccum)) {
1277 HasNonNegativeOff = !CI->isNegative();
1278 }
1279 Value *NewOff;
1280 if (match(Off, m_Zero())) {
1281 NewOff = OffAccum;
1282 } else {
1283 NewOff = IRB.CreateAdd(Off, OffAccum, "",
1284 /*hasNUW=*/IsNUW || (IsNUSW && HasNonNegativeOff),
1285 /*hasNSW=*/false);
1286 }
1287 copyMetadata(NewOff, &GEP);
1288 NewOff->takeName(&GEP);
1289 SplitUsers.insert(&GEP);
1290 return {Rsrc, NewOff};
1291}
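// Example for visitGetElementPtrInst() above (an illustrative sketch; value
// names are approximate). A GEP only moves the offset part, so
//   %q = getelementptr inbounds i8, ptr addrspace(7) %p, i32 16
// lowers to
//   %q = add nuw i32 %p.off, 16
// while %q's resource part is simply %p's. The `nuw` comes from inbounds
// (which implies nusw) combined with a non-negative constant offset. A GEP
// without wrap flags yields a plain `add`, and a GEP whose offset folds to
// zero emits no instruction at all.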
1292
1293PtrParts SplitPtrStructs::visitPtrToIntInst(PtrToIntInst &PI) {
1294 Value *Ptr = PI.getPointerOperand();
1295 if (!isSplitFatPtr(Ptr->getType()))
1296 return {nullptr, nullptr};
1297 IRB.SetInsertPoint(&PI);
1298
1299 Type *ResTy = PI.getType();
1300 unsigned Width = ResTy->getScalarSizeInBits();
1301
1302 auto [Rsrc, Off] = getPtrParts(Ptr);
1303 const DataLayout &DL = PI.getDataLayout();
1304 unsigned FatPtrWidth = DL.getPointerSizeInBits(AMDGPUAS::BUFFER_FAT_POINTER);
1305
1306 Value *Res;
1307 if (Width <= BufferOffsetWidth) {
1308 Res = IRB.CreateIntCast(Off, ResTy, /*isSigned=*/false,
1309 PI.getName() + ".off");
1310 } else {
1311 Value *RsrcInt = IRB.CreatePtrToInt(Rsrc, ResTy, PI.getName() + ".rsrc");
1312 Value *Shl = IRB.CreateShl(
1313 RsrcInt,
1314 ConstantExpr::getIntegerValue(ResTy, APInt(Width, BufferOffsetWidth)),
1315 "", Width >= FatPtrWidth, Width > FatPtrWidth);
1316 Value *OffCast = IRB.CreateIntCast(Off, ResTy, /*isSigned=*/false,
1317 PI.getName() + ".off");
1318 Res = IRB.CreateOr(Shl, OffCast);
1319 }
1320
1321 copyMetadata(Res, &PI);
1322 Res->takeName(&PI);
1323 SplitUsers.insert(&PI);
1324 PI.replaceAllUsesWith(Res);
1325 return {nullptr, nullptr};
1326}
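// Example for visitPtrToIntInst() above (an illustrative sketch; assumes
// 160-bit fat pointers and a 32-bit BufferOffsetWidth, with approximate
// value names). A full-width conversion
//   %i = ptrtoint ptr addrspace(7) %p to i160
// becomes (rsrc << 32) | off:
//   %i.rsrc = ptrtoint ptr addrspace(8) %p.rsrc to i160
//   %1 = shl nuw i160 %i.rsrc, 32
//   %i.off = zext i32 %p.off to i160
//   %i = or i160 %1, %i.off
// For results of 32 bits or fewer, only the offset survives: it is cast
// (truncated if narrower) to the result type and the resource is dropped.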
1327
1328PtrParts SplitPtrStructs::visitIntToPtrInst(IntToPtrInst &IP) {
1329 if (!isSplitFatPtr(IP.getType()))
1330 return {nullptr, nullptr};
1331 IRB.SetInsertPoint(&IP);
1332 const DataLayout &DL = IP.getDataLayout();
1333 unsigned RsrcPtrWidth = DL.getPointerSizeInBits(AMDGPUAS::BUFFER_RESOURCE);
1334 Value *Int = IP.getOperand(0);
1335 Type *IntTy = Int->getType();
1336 Type *RsrcIntTy = IntTy->getWithNewBitWidth(RsrcPtrWidth);
1337 unsigned Width = IntTy->getScalarSizeInBits();
1338
1339 auto *RetTy = cast<StructType>(IP.getType());
1340 Type *RsrcTy = RetTy->getElementType(0);
1341 Type *OffTy = RetTy->getElementType(1);
1342 Value *RsrcPart = IRB.CreateLShr(
1343 Int,
1344 ConstantExpr::getIntegerValue(IntTy, APInt(Width, BufferOffsetWidth)));
1345 Value *RsrcInt = IRB.CreateIntCast(RsrcPart, RsrcIntTy, /*isSigned=*/false);
1346 Value *Rsrc = IRB.CreateIntToPtr(RsrcInt, RsrcTy, IP.getName() + ".rsrc");
1347 Value *Off =
1348 IRB.CreateIntCast(Int, OffTy, /*IsSigned=*/false, IP.getName() + ".off");
1349
1350 copyMetadata(Rsrc, &IP);
1351 SplitUsers.insert(&IP);
1352 return {Rsrc, Off};
1353}
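// Example for visitIntToPtrInst() above (an illustrative sketch; assumes a
// 128-bit resource pointer and a 32-bit offset, with approximate names). The
// inverse of the ptrtoint packing,
//   %p = inttoptr i160 %x to ptr addrspace(7)
// splits into
//   %1 = lshr i160 %x, 32
//   %2 = trunc i160 %1 to i128
//   %p.rsrc = inttoptr i128 %2 to ptr addrspace(8)
//   %p.off = trunc i160 %x to i32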
1354
1355PtrParts SplitPtrStructs::visitAddrSpaceCastInst(AddrSpaceCastInst &I) {
1356 if (!isSplitFatPtr(I.getType()))
1357 return {nullptr, nullptr};
1358 IRB.SetInsertPoint(&I);
1359 Value *In = I.getPointerOperand();
1360 // No-op casts preserve parts
1361 if (In->getType() == I.getType()) {
1362 auto [Rsrc, Off] = getPtrParts(In);
1363 SplitUsers.insert(&I);
1364 return {Rsrc, Off};
1365 }
1366 if (I.getSrcAddressSpace() != AMDGPUAS::BUFFER_RESOURCE)
1367 report_fatal_error("Only buffer resources (addrspace 8) can be cast to "
1368 "buffer fat pointers (addrspace 7)");
1369 Type *OffTy = cast<StructType>(I.getType())->getElementType(1);
1370 Value *ZeroOff = Constant::getNullValue(OffTy);
1371 SplitUsers.insert(&I);
1372 return {In, ZeroOff};
1373}
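// Example for visitAddrSpaceCastInst() above (an illustrative sketch; names
// are hypothetical). For
//   %p = addrspacecast ptr addrspace(8) %r to ptr addrspace(7)
// no instructions are emitted: %p's parts are simply {%r, i32 0}. Casting
// from any address space other than 8 is reported as a fatal error.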
1374
1375PtrParts SplitPtrStructs::visitICmpInst(ICmpInst &Cmp) {
1376 Value *Lhs = Cmp.getOperand(0);
1377 if (!isSplitFatPtr(Lhs->getType()))
1378 return {nullptr, nullptr};
1379 Value *Rhs = Cmp.getOperand(1);
1380 IRB.SetInsertPoint(&Cmp);
1381 ICmpInst::Predicate Pred = Cmp.getPredicate();
1382
1383 assert((Pred == ICmpInst::ICMP_EQ || Pred == ICmpInst::ICMP_NE) &&
1384 "Pointer comparison is only equal or unequal");
1385 auto [LhsRsrc, LhsOff] = getPtrParts(Lhs);
1386 auto [RhsRsrc, RhsOff] = getPtrParts(Rhs);
1387 Value *RsrcCmp =
1388 IRB.CreateICmp(Pred, LhsRsrc, RhsRsrc, Cmp.getName() + ".rsrc");
1389 copyMetadata(RsrcCmp, &Cmp);
1390 Value *OffCmp = IRB.CreateICmp(Pred, LhsOff, RhsOff, Cmp.getName() + ".off");
1391 copyMetadata(OffCmp, &Cmp);
1392
1393 Value *Res = nullptr;
1394 if (Pred == ICmpInst::ICMP_EQ)
1395 Res = IRB.CreateAnd(RsrcCmp, OffCmp);
1396 else if (Pred == ICmpInst::ICMP_NE)
1397 Res = IRB.CreateOr(RsrcCmp, OffCmp);
1398 copyMetadata(Res, &Cmp);
1399 Res->takeName(&Cmp);
1400 SplitUsers.insert(&Cmp);
1401 Cmp.replaceAllUsesWith(Res);
1402 return {nullptr, nullptr};
1403}
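// Example for visitICmpInst() above (an illustrative sketch; value names are
// approximate). Equality of fat pointers requires both parts to match, so
//   %c = icmp eq ptr addrspace(7) %a, %b
// becomes
//   %c.rsrc = icmp eq ptr addrspace(8) %a.rsrc, %b.rsrc
//   %c.off = icmp eq i32 %a.off, %b.off
//   %c = and i1 %c.rsrc, %c.off
// while `icmp ne` combines the two halves with `or` instead.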
1404
1405PtrParts SplitPtrStructs::visitFreezeInst(FreezeInst &I) {
1406 if (!isSplitFatPtr(I.getType()))
1407 return {nullptr, nullptr};
1408 IRB.SetInsertPoint(&I);
1409 auto [Rsrc, Off] = getPtrParts(I.getOperand(0));
1410
1411 Value *RsrcRes = IRB.CreateFreeze(Rsrc, I.getName() + ".rsrc");
1412 copyMetadata(RsrcRes, &I);
1413 Value *OffRes = IRB.CreateFreeze(Off, I.getName() + ".off");
1414 copyMetadata(OffRes, &I);
1415 SplitUsers.insert(&I);
1416 return {RsrcRes, OffRes};
1417}
1418
1419PtrParts SplitPtrStructs::visitExtractElementInst(ExtractElementInst &I) {
1420 if (!isSplitFatPtr(I.getType()))
1421 return {nullptr, nullptr};
1422 IRB.SetInsertPoint(&I);
1423 Value *Vec = I.getVectorOperand();
1424 Value *Idx = I.getIndexOperand();
1425 auto [Rsrc, Off] = getPtrParts(Vec);
1426
1427 Value *RsrcRes = IRB.CreateExtractElement(Rsrc, Idx, I.getName() + ".rsrc");
1428 copyMetadata(RsrcRes, &I);
1429 Value *OffRes = IRB.CreateExtractElement(Off, Idx, I.getName() + ".off");
1430 copyMetadata(OffRes, &I);
1431 SplitUsers.insert(&I);
1432 return {RsrcRes, OffRes};
1433}
1434
1435PtrParts SplitPtrStructs::visitInsertElementInst(InsertElementInst &I) {
1436 // The mutated instructions temporarily don't return vectors, and so
1437 // we need the generic getType() here to avoid crashes.
1438 if (!isSplitFatPtr(cast<Instruction>(I).getType()))
1439 return {nullptr, nullptr};
1440 IRB.SetInsertPoint(&I);
1441 Value *Vec = I.getOperand(0);
1442 Value *Elem = I.getOperand(1);
1443 Value *Idx = I.getOperand(2);
1444 auto [VecRsrc, VecOff] = getPtrParts(Vec);
1445 auto [ElemRsrc, ElemOff] = getPtrParts(Elem);
1446
1447 Value *RsrcRes =
1448 IRB.CreateInsertElement(VecRsrc, ElemRsrc, Idx, I.getName() + ".rsrc");
1449 copyMetadata(RsrcRes, &I);
1450 Value *OffRes =
1451 IRB.CreateInsertElement(VecOff, ElemOff, Idx, I.getName() + ".off");
1452 copyMetadata(OffRes, &I);
1453 SplitUsers.insert(&I);
1454 return {RsrcRes, OffRes};
1455}
1456
1457PtrParts SplitPtrStructs::visitShuffleVectorInst(ShuffleVectorInst &I) {
1458 // Cast is needed for the same reason as insertelement's.
1459 if (!isSplitFatPtr(cast<Instruction>(I).getType()))
1460 return {nullptr, nullptr};
1461 IRB.SetInsertPoint(&I);
1462
1463 Value *V1 = I.getOperand(0);
1464 Value *V2 = I.getOperand(1);
1465 ArrayRef<int> Mask = I.getShuffleMask();
1466 auto [V1Rsrc, V1Off] = getPtrParts(V1);
1467 auto [V2Rsrc, V2Off] = getPtrParts(V2);
1468
1469 Value *RsrcRes =
1470 IRB.CreateShuffleVector(V1Rsrc, V2Rsrc, Mask, I.getName() + ".rsrc");
1471 copyMetadata(RsrcRes, &I);
1472 Value *OffRes =
1473 IRB.CreateShuffleVector(V1Off, V2Off, Mask, I.getName() + ".off");
1474 copyMetadata(OffRes, &I);
1475 SplitUsers.insert(&I);
1476 return {RsrcRes, OffRes};
1477}
1478
1479PtrParts SplitPtrStructs::visitPHINode(PHINode &PHI) {
1480 if (!isSplitFatPtr(PHI.getType()))
1481 return {nullptr, nullptr};
1482 IRB.SetInsertPoint(*PHI.getInsertionPointAfterDef());
1483 // Phi nodes will be handled in post-processing after we've visited every
1484 // instruction. However, instead of just returning {nullptr, nullptr},
1485 // we explicitly create the temporary extractvalue operations that are our
1486 // temporary results so that they end up at the beginning of the block with
1487 // the PHIs.
1488 Value *TmpRsrc = IRB.CreateExtractValue(&PHI, 0, PHI.getName() + ".rsrc");
1489 Value *TmpOff = IRB.CreateExtractValue(&PHI, 1, PHI.getName() + ".off");
1490 Conditionals.push_back(&PHI);
1491 SplitUsers.insert(&PHI);
1492 return {TmpRsrc, TmpOff};
1493}
1494
1495PtrParts SplitPtrStructs::visitSelectInst(SelectInst &SI) {
1496 if (!isSplitFatPtr(SI.getType()))
1497 return {nullptr, nullptr};
1498 IRB.SetInsertPoint(&SI);
1499
1500 Value *Cond = SI.getCondition();
1501 Value *True = SI.getTrueValue();
1502 Value *False = SI.getFalseValue();
1503 auto [TrueRsrc, TrueOff] = getPtrParts(True);
1504 auto [FalseRsrc, FalseOff] = getPtrParts(False);
1505
1506 Value *RsrcRes =
1507 IRB.CreateSelect(Cond, TrueRsrc, FalseRsrc, SI.getName() + ".rsrc", &SI);
1508 copyMetadata(RsrcRes, &SI);
1509 Conditionals.push_back(&SI);
1510 Value *OffRes =
1511 IRB.CreateSelect(Cond, TrueOff, FalseOff, SI.getName() + ".off", &SI);
1512 copyMetadata(OffRes, &SI);
1513 SplitUsers.insert(&SI);
1514 return {RsrcRes, OffRes};
1515}
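// Example for visitSelectInst() above (an illustrative sketch; value names are
// approximate). A select between fat pointers is split component-wise:
//   %s = select i1 %c, ptr addrspace(7) %a, ptr addrspace(7) %b
// becomes
//   %s.rsrc = select i1 %c, ptr addrspace(8) %a.rsrc, ptr addrspace(8) %b.rsrc
//   %s.off = select i1 %c, i32 %a.off, i32 %b.off
// If %a and %b are both GEPs off the same base, processConditionals() later
// replaces %s.rsrc with that base's resource part and erases the select.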
1516
1517/// Returns true if this intrinsic needs to be removed when it is applied to
1518/// `ptr addrspace(7)` values. Calls to these intrinsics are rewritten to act
1519/// on the split parts: ptrmask becomes an `and` on the offset, and the others
1520/// become calls to the same intrinsic on the resource descriptor.
1522 switch (IID) {
1523 default:
1524 return false;
1525 case Intrinsic::ptrmask:
1526 case Intrinsic::invariant_start:
1527 case Intrinsic::invariant_end:
1528 case Intrinsic::launder_invariant_group:
1529 case Intrinsic::strip_invariant_group:
1530 return true;
1531 }
1532}
1533
1534PtrParts SplitPtrStructs::visitIntrinsicInst(IntrinsicInst &I) {
1535 Intrinsic::ID IID = I.getIntrinsicID();
1536 switch (IID) {
1537 default:
1538 break;
1539 case Intrinsic::ptrmask: {
1540 Value *Ptr = I.getArgOperand(0);
1541 if (!isSplitFatPtr(Ptr->getType()))
1542 return {nullptr, nullptr};
1543 Value *Mask = I.getArgOperand(1);
1544 IRB.SetInsertPoint(&I);
1545 auto [Rsrc, Off] = getPtrParts(Ptr);
1546 if (Mask->getType() != Off->getType())
1547 report_fatal_error("offset width is not equal to index width of fat "
1548 "pointer (data layout not set up correctly?)");
1549 Value *OffRes = IRB.CreateAnd(Off, Mask, I.getName() + ".off");
1550 copyMetadata(OffRes, &I);
1551 SplitUsers.insert(&I);
1552 return {Rsrc, OffRes};
1553 }
1554 // Pointer annotation intrinsics that, given their object-wide nature,
1555 // operate on the resource part.
1556 case Intrinsic::invariant_start: {
1557 Value *Ptr = I.getArgOperand(1);
1558 if (!isSplitFatPtr(Ptr->getType()))
1559 return {nullptr, nullptr};
1560 IRB.SetInsertPoint(&I);
1561 auto [Rsrc, Off] = getPtrParts(Ptr);
1562 Type *NewTy = PointerType::get(I.getContext(), AMDGPUAS::BUFFER_RESOURCE);
1563 auto *NewRsrc = IRB.CreateIntrinsic(IID, {NewTy}, {I.getOperand(0), Rsrc});
1564 copyMetadata(NewRsrc, &I);
1565 NewRsrc->takeName(&I);
1566 SplitUsers.insert(&I);
1567 I.replaceAllUsesWith(NewRsrc);
1568 return {nullptr, nullptr};
1569 }
1570 case Intrinsic::invariant_end: {
1571 Value *RealPtr = I.getArgOperand(2);
1572 if (!isSplitFatPtr(RealPtr->getType()))
1573 return {nullptr, nullptr};
1574 IRB.SetInsertPoint(&I);
1575 Value *RealRsrc = getPtrParts(RealPtr).first;
1576 Value *InvPtr = I.getArgOperand(0);
1577 Value *Size = I.getArgOperand(1);
1578 Value *NewRsrc = IRB.CreateIntrinsic(IID, {RealRsrc->getType()},
1579 {InvPtr, Size, RealRsrc});
1580 copyMetadata(NewRsrc, &I);
1581 NewRsrc->takeName(&I);
1582 SplitUsers.insert(&I);
1583 I.replaceAllUsesWith(NewRsrc);
1584 return {nullptr, nullptr};
1585 }
1586 case Intrinsic::launder_invariant_group:
1587 case Intrinsic::strip_invariant_group: {
1588 Value *Ptr = I.getArgOperand(0);
1589 if (!isSplitFatPtr(Ptr->getType()))
1590 return {nullptr, nullptr};
1591 IRB.SetInsertPoint(&I);
1592 auto [Rsrc, Off] = getPtrParts(Ptr);
1593 Value *NewRsrc = IRB.CreateIntrinsic(IID, {Rsrc->getType()}, {Rsrc});
1594 copyMetadata(NewRsrc, &I);
1595 NewRsrc->takeName(&I);
1596 SplitUsers.insert(&I);
1597 return {NewRsrc, Off};
1598 }
1599 }
1600 return {nullptr, nullptr};
1601}
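// Example for the ptrmask case of visitIntrinsicInst() above (an illustrative
// sketch; value names are approximate). Masking only touches the offset, so
//   %q = call ptr addrspace(7) @llvm.ptrmask.p7.i32(ptr addrspace(7) %p, i32 -16)
// becomes
//   %q.off = and i32 %p.off, -16
// with %q's resource part being %p's. The mask must have the offset's type
// (i32 here); otherwise the pass reports a data layout error.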
1602
1603void SplitPtrStructs::processFunction(Function &F) {
1604 ST = &TM->getSubtarget<GCNSubtarget>(F);
1606 LLVM_DEBUG(dbgs() << "Splitting pointer structs in function: " << F.getName()
1607 << "\n");
1608 for (Instruction &I : instructions(F))
1609 Originals.push_back(&I);
1610 for (Instruction *I : Originals) {
1611 auto [Rsrc, Off] = visit(I);
1612 assert(((Rsrc && Off) || (!Rsrc && !Off)) &&
1613 "Can't have a resource but no offset");
1614 if (Rsrc)
1615 RsrcParts[I] = Rsrc;
1616 if (Off)
1617 OffParts[I] = Off;
1618 }
1619 processConditionals();
1620 killAndReplaceSplitInstructions(Originals);
1621
1622 // Clean up after ourselves to save on memory.
1623 RsrcParts.clear();
1624 OffParts.clear();
1625 SplitUsers.clear();
1626 Conditionals.clear();
1627 ConditionalTemps.clear();
1628}
1629
1630namespace {
1631class AMDGPULowerBufferFatPointers : public ModulePass {
1632public:
1633 static char ID;
1634
1635 AMDGPULowerBufferFatPointers() : ModulePass(ID) {
1638 }
1639
1640 bool run(Module &M, const TargetMachine &TM);
1641 bool runOnModule(Module &M) override;
1642
1643 void getAnalysisUsage(AnalysisUsage &AU) const override;
1644};
1645} // namespace
1646
1647/// Returns true if there are values that have a buffer fat pointer in them,
1648/// which means we'll need to perform rewrites on this function. As a side
1649/// effect, this will populate the type remapping cache.
1651 BufferFatPtrToStructTypeMap *TypeMap) {
1652 bool HasFatPointers = false;
1653 for (const BasicBlock &BB : F)
1654 for (const Instruction &I : BB)
1655 HasFatPointers |= (I.getType() != TypeMap->remapType(I.getType()));
1656 return HasFatPointers;
1657}
1658
1660 BufferFatPtrToStructTypeMap *TypeMap) {
1661 Type *Ty = F.getFunctionType();
1662 return Ty != TypeMap->remapType(Ty);
1663}
1664
1665/// Move the body of `OldF` into a new function, returning it.
1667 ValueToValueMapTy &CloneMap) {
1668 bool IsIntrinsic = OldF->isIntrinsic();
1669 Function *NewF =
1670 Function::Create(NewTy, OldF->getLinkage(), OldF->getAddressSpace());
1672 NewF->copyAttributesFrom(OldF);
1673 NewF->copyMetadata(OldF, 0);
1674 NewF->takeName(OldF);
1675 NewF->updateAfterNameChange();
1677 OldF->getParent()->getFunctionList().insertAfter(OldF->getIterator(), NewF);
1678
1679 while (!OldF->empty()) {
1680 BasicBlock *BB = &OldF->front();
1681 BB->removeFromParent();
1682 BB->insertInto(NewF);
1683 CloneMap[BB] = BB;
1684 for (Instruction &I : *BB) {
1685 CloneMap[&I] = &I;
1686 }
1687 }
1688
1689 AttributeMask PtrOnlyAttrs;
1690 for (auto K :
1691 {Attribute::Dereferenceable, Attribute::DereferenceableOrNull,
1692 Attribute::NoAlias, Attribute::NoCapture, Attribute::NoFree,
1693 Attribute::NonNull, Attribute::NullPointerIsValid, Attribute::ReadNone,
1694 Attribute::ReadOnly, Attribute::WriteOnly}) {
1695 PtrOnlyAttrs.addAttribute(K);
1696 }
1697 SmallVector<AttributeSet> ArgAttrs;
1698 AttributeList OldAttrs = OldF->getAttributes();
1699
1700 for (auto [I, OldArg, NewArg] : enumerate(OldF->args(), NewF->args())) {
1701 CloneMap[&NewArg] = &OldArg;
1702 NewArg.takeName(&OldArg);
1703 Type *OldArgTy = OldArg.getType(), *NewArgTy = NewArg.getType();
1704 // Temporarily mutate type of `NewArg` to allow RAUW to work.
1705 NewArg.mutateType(OldArgTy);
1706 OldArg.replaceAllUsesWith(&NewArg);
1707 NewArg.mutateType(NewArgTy);
1708
1709 AttributeSet ArgAttr = OldAttrs.getParamAttrs(I);
1710 // Intrinsics get their attributes fixed later.
1711 if (OldArgTy != NewArgTy && !IsIntrinsic)
1712 ArgAttr = ArgAttr.removeAttributes(NewF->getContext(), PtrOnlyAttrs);
1713 ArgAttrs.push_back(ArgAttr);
1714 }
1715 AttributeSet RetAttrs = OldAttrs.getRetAttrs();
1716 if (OldF->getReturnType() != NewF->getReturnType() && !IsIntrinsic)
1717 RetAttrs = RetAttrs.removeAttributes(NewF->getContext(), PtrOnlyAttrs);
1718 NewF->setAttributes(AttributeList::get(
1719 NewF->getContext(), OldAttrs.getFnAttrs(), RetAttrs, ArgAttrs));
1720 return NewF;
1721}
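// Example for the function-moving helper above (an illustrative sketch;
// assumes NewTy comes from the struct type remapping and a 32-bit offset).
// A function such as
//   define void @f(ptr addrspace(7) noalias %p)
// is recreated with the fat pointer parameter replaced by its lowered type,
// roughly
//   define void @f({ ptr addrspace(8), i32 } %p)
// and `noalias` is dropped because it is in PtrOnlyAttrs and the parameter is
// no longer a pointer. Attributes on unchanged parameters, and on any
// intrinsic's parameters, are kept. The basic blocks are moved, not cloned.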
1722
1724 for (Argument &A : F->args())
1725 CloneMap[&A] = &A;
1726 for (BasicBlock &BB : *F) {
1727 CloneMap[&BB] = &BB;
1728 for (Instruction &I : BB)
1729 CloneMap[&I] = &I;
1730 }
1731}
bool AMDGPULowerBufferFatPointers::run(Module &M, const TargetMachine &TM) {
  bool Changed = false;
  const DataLayout &DL = M.getDataLayout();
  // Record the functions which need to be remapped.
  // The second element of the pair indicates whether the function has to have
  // its arguments or return types adjusted.
  SmallVector<std::pair<Function *, bool>> NeedsRemap;

  BufferFatPtrToStructTypeMap StructTM(DL);
  BufferFatPtrToIntTypeMap IntTM(DL);
  for (const GlobalVariable &GV : M.globals()) {
    if (GV.getAddressSpace() == AMDGPUAS::BUFFER_FAT_POINTER)
      report_fatal_error("Global variables with a buffer fat pointer address "
                         "space (7) are not supported");
    Type *VT = GV.getValueType();
    if (VT != StructTM.remapType(VT))
      report_fatal_error("Global variables that contain buffer fat pointers "
                         "(address space 7 pointers) are unsupported. Use "
                         "buffer resource pointers (address space 8) instead.");
  }

  {
    // Collect all constant exprs and aggregates referenced by any function.
    SmallVector<Constant *, 8> Worklist;
    for (Function &F : M.functions())
      for (Instruction &I : instructions(F))
        for (Value *Op : I.operands())
          if (isa<ConstantExpr>(Op) || isa<ConstantAggregate>(Op))
            Worklist.push_back(cast<Constant>(Op));

    // Recursively look for any referenced buffer pointer constants.
    SmallPtrSet<Constant *, 8> Visited;
    SetVector<Constant *> BufferFatPtrConsts;
    while (!Worklist.empty()) {
      Constant *C = Worklist.pop_back_val();
      if (!Visited.insert(C).second)
        continue;
      if (isBufferFatPtrOrVector(C->getType()))
        BufferFatPtrConsts.insert(C);
      for (Value *Op : C->operands())
        if (isa<ConstantExpr>(Op) || isa<ConstantAggregate>(Op))
          Worklist.push_back(cast<Constant>(Op));
    }

    // Expand all constant expressions using fat buffer pointers to
    // instructions.
    Changed |= convertUsersOfConstantsToInstructions(
        BufferFatPtrConsts.getArrayRef(), /*RestrictToFunc=*/nullptr,
        /*RemoveDeadConstants=*/false, /*IncludeSelf=*/true);
  }

  StoreFatPtrsAsIntsVisitor MemOpsRewrite(&IntTM, M.getContext());
  for (Function &F : M.functions()) {
    bool InterfaceChange = hasFatPointerInterface(F, &StructTM);
    bool BodyChanges = containsBufferFatPointers(F, &StructTM);
    Changed |= MemOpsRewrite.processFunction(F);
    if (InterfaceChange || BodyChanges)
      NeedsRemap.push_back(std::make_pair(&F, InterfaceChange));
  }
  if (NeedsRemap.empty())
    return Changed;

  SmallVector<Function *> NeedsPostProcess;
  SmallVector<Function *> Intrinsics;
  // Keep one big map so as to memoize constants across functions.
  ValueToValueMapTy CloneMap;
  FatPtrConstMaterializer Materializer(&StructTM, CloneMap);

  ValueMapper LowerInFuncs(CloneMap, RF_None, &StructTM, &Materializer);
  for (auto [F, InterfaceChange] : NeedsRemap) {
    Function *NewF = F;
    if (InterfaceChange)
      NewF = moveFunctionAdaptingType(
          F, cast<FunctionType>(StructTM.remapType(F->getFunctionType())),
          CloneMap);
    else
      makeCloneInPraceMap(F, CloneMap);
    LowerInFuncs.remapFunction(*NewF);
    if (NewF->isIntrinsic())
      Intrinsics.push_back(NewF);
    else
      NeedsPostProcess.push_back(NewF);
    if (InterfaceChange) {
      F->replaceAllUsesWith(NewF);
      F->eraseFromParent();
    }
    Changed = true;
  }
  StructTM.clear();
  IntTM.clear();
  CloneMap.clear();

  SplitPtrStructs Splitter(M.getContext(), &TM);
  for (Function *F : NeedsPostProcess)
    Splitter.processFunction(*F);
  for (Function *F : Intrinsics) {
    if (isRemovablePointerIntrinsic(F->getIntrinsicID())) {
      F->eraseFromParent();
    } else {
      std::optional<Function *> NewF = Intrinsic::remangleIntrinsicFunction(F);
      if (NewF)
        F->replaceAllUsesWith(*NewF);
    }
  }
  return Changed;
}
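The constant scan at the start of `run()` is a standard worklist traversal: seed it with every `ConstantExpr` or `ConstantAggregate` operand of an instruction, visit each constant once, and record those whose type is a buffer fat pointer (or a vector of them) so that their users can later be expanded into instructions. The sketch below reproduces only the traversal on a hypothetical `ToyConst` graph. The `IsFatPtr` flag stands in for `isBufferFatPtrOrVector()`, and operands that are neither constant expressions nor aggregates (such as globals) are assumed to be left out of `Ops`, since the pass does not walk into them.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy constant node.
struct ToyConst {
  std::string Name;
  bool IsFatPtr;               // stands in for isBufferFatPtrOrVector()
  std::vector<ToyConst *> Ops; // only ConstantExpr/ConstantAggregate operands
};

// Worklist walk mirroring run(): pop a constant, skip it if already visited,
// record it if it has fat-pointer type, then push its operands. The Visited
// set already guarantees each constant is recorded at most once, which is
// the deduplication SetVector provides in the pass.
std::vector<ToyConst *> collectFatPtrConsts(std::vector<ToyConst *> Worklist) {
  std::set<ToyConst *> Visited;
  std::vector<ToyConst *> Result;
  while (!Worklist.empty()) {
    ToyConst *C = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(C).second)
      continue;
    if (C->IsFatPtr)
      Result.push_back(C);
    for (ToyConst *Op : C->Ops)
      Worklist.push_back(Op);
  }
  return Result;
}
```

A struct aggregate is not itself recorded, but a fat-pointer GEP nested inside it still is, which is why the walk recurses through aggregates rather than stopping at them.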
bool AMDGPULowerBufferFatPointers::runOnModule(Module &M) {
  TargetPassConfig &TPC = getAnalysis<TargetPassConfig>();
  const TargetMachine &TM = TPC.getTM<TargetMachine>();
  return run(M, TM);
}

char AMDGPULowerBufferFatPointers::ID = 0;

char &llvm::AMDGPULowerBufferFatPointersID = AMDGPULowerBufferFatPointers::ID;

void AMDGPULowerBufferFatPointers::getAnalysisUsage(AnalysisUsage &AU) const {
  AU.addRequired<TargetPassConfig>();
}

#define PASS_DESC "Lower buffer fat pointer operations to buffer resources"
INITIALIZE_PASS_BEGIN(AMDGPULowerBufferFatPointers, DEBUG_TYPE, PASS_DESC,
                      false, false)
INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)
INITIALIZE_PASS_END(AMDGPULowerBufferFatPointers, DEBUG_TYPE, PASS_DESC, false,
                    false)
#undef PASS_DESC

ModulePass *llvm::createAMDGPULowerBufferFatPointersPass() {
  return new AMDGPULowerBufferFatPointers();
}

PreservedAnalyses
AMDGPULowerBufferFatPointersPass::run(Module &M, ModuleAnalysisManager &MA) {
  return AMDGPULowerBufferFatPointers().run(M, TM) ? PreservedAnalyses::none()
                                                   : PreservedAnalyses::all();
}
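Both pass-manager entry points above delegate to the same `run(M, TM)`. The legacy pass obtains the `TargetMachine` through `TargetPassConfig`, while the new-PM pass uses a `TM` that is presumably a member of `AMDGPULowerBufferFatPointersPass`; only the new-PM wrapper has to translate the `bool` result into `PreservedAnalyses`. A minimal sketch of that wrapper pattern, with hypothetical `ToyTM`, `ToyModule`, `ToyLowering`, and `Preserved` types standing in for the LLVM classes:

```cpp
#include <cassert>

// Hypothetical stand-ins for TargetMachine, Module, and PreservedAnalyses.
struct ToyTM {};
struct ToyModule {
  int FatPtrUses = 0;
};
enum class Preserved { None, All };

// Shared implementation: lowers everything and reports whether it changed
// the module, like AMDGPULowerBufferFatPointers::run(M, TM).
struct ToyLowering {
  bool run(ToyModule &M, const ToyTM &) {
    bool Changed = M.FatPtrUses != 0;
    M.FatPtrUses = 0;
    return Changed;
  }
};

// Legacy-PM-style wrapper: the bool is the pass result.
bool runLegacy(ToyModule &M, const ToyTM &TM) {
  return ToyLowering().run(M, TM);
}

// New-PM-style wrapper: a change invalidates everything, no change
// preserves everything.
Preserved runNewPM(ToyModule &M, const ToyTM &TM) {
  return ToyLowering().run(M, TM) ? Preserved::None : Preserved::All;
}
```

Keeping the logic in one shared `run()` means the two pass managers cannot drift apart; the wrappers differ only in how they obtain the target machine and report the result.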