LLVM 23.0.0git
AMDGPULowerBufferFatPointers.cpp
1//===-- AMDGPULowerBufferFatPointers.cpp ---------------------------=//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This pass lowers operations on buffer fat pointers (addrspace 7) to
10// operations on buffer resources (addrspace 8) and is needed for correct
11// codegen.
12//
13// # Background
14//
15// Address space 7 (the buffer fat pointer) is a 160-bit pointer that consists
16// of a 128-bit buffer descriptor and a 32-bit offset into that descriptor.
17// The buffer resource part needs to be a "raw" buffer resource
18// (it must have a stride of 0 and bounds checks must be in raw buffer mode
19// or disabled).
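To make the layout concrete, here is a minimal standalone C++ model of the 160-bit fat pointer described above. The struct and field names are illustrative only, not part of the pass or of any LLVM API:

```cpp
#include <cstdint>

// Hypothetical model of an addrspace(7) buffer fat pointer: a 128-bit
// "raw" buffer descriptor plus a 32-bit byte offset, 160 bits in total.
struct BufferFatPtr {
  uint64_t RsrcLo; // low half of the 128-bit buffer descriptor
  uint64_t RsrcHi; // high half; the stride bits must encode stride == 0
  uint32_t Off;    // 32-bit byte offset into the buffer
};

// Pointer arithmetic only changes the offset; the resource part is untouched,
// e.g. `getelementptr i8, ptr addrspace(7) %p, i32 %bytes`.
inline BufferFatPtr gep(BufferFatPtr P, uint32_t Bytes) {
  P.Off += Bytes;
  return P;
}
```

Note that only the offset participates in arithmetic, which is why the pass prefers to keep the two parts in separate variables.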
20//
21// When these requirements are met, a buffer resource can be treated as a
22// typical (though quite wide) pointer that follows typical LLVM pointer
23// semantics. This allows the frontend to reason about such buffers (which are
24// often encountered in the context of SPIR-V kernels).
25//
26// However, because of their non-power-of-2 size, these fat pointers cannot be
27// present during translation to MIR (though this restriction may be lifted
28// during the transition to GlobalISel). Therefore, this pass is needed in order
29// to correctly implement these fat pointers.
30//
31// The resource intrinsics take the resource part (the address space 8 pointer)
32// and the offset part (the 32-bit integer) as separate arguments. In addition,
33// many users of these buffers manipulate the offset while leaving the resource
34// part alone. For these reasons, we want to typically separate the resource
35// and offset parts into separate variables, but combine them together when
36// encountering cases where this is required, such as by inserting these values
37// into aggregates or moving them to memory.
38//
39// Therefore, at a high level, `ptr addrspace(7) %x` becomes `ptr addrspace(8)
40// %x.rsrc` and `i32 %x.off`, which will be combined into `{ptr addrspace(8),
41// i32} %x = {%x.rsrc, %x.off}` if needed. Similarly, `vector<Nxp7>` becomes
42// `{vector<Nxp8>, vector<Nxi32>}` and its component parts.
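A rough structure-of-arrays sketch of that vector representation, under the assumption of a lane-by-lane model (all names hypothetical, `__uint128_t` standing in for the p8 lane):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model: a vector of N fat pointers is kept as a structure of
// arrays - one vector of 128-bit resource parts and one vector of 32-bit
// offsets - mirroring `vector<Nxp7>` -> `{vector<Nxp8>, vector<Nxi32>}`.
template <std::size_t N> struct SplitFatPtrVec {
  std::array<__uint128_t, N> Rsrc; // the p8 lanes
  std::array<uint32_t, N> Off;     // the i32 offset lanes
};

// Lane-wise GEP: only the offset vector changes.
template <std::size_t N>
SplitFatPtrVec<N> gepLanes(SplitFatPtrVec<N> V, uint32_t Bytes) {
  for (std::size_t I = 0; I < N; ++I)
    V.Off[I] += Bytes;
  return V;
}
```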
43//
44// # Implementation
45//
46// This pass proceeds in three main phases:
47//
48// ## Rewriting loads and stores of p7 and memcpy()-like handling
49//
50// The first phase is to rewrite away all loads and stores of `ptr addrspace(7)`,
51// including aggregates containing such pointers, to ones that use `i160`. This
52// is handled by `StoreFatPtrsAsIntsAndExpandMemcpyVisitor`, which visits
53// loads, stores, and allocas and, if the loaded or stored type contains `ptr
54// addrspace(7)`, rewrites that type to one where the p7s are replaced by i160s,
55// copying other parts of aggregates as needed. In the case of a store, each
56// pointer is `ptrtoint`d to i160 before storing, and loaded integers are
57// `inttoptr`d back. This same transformation is applied to vectors of pointers.
58//
59// Such a transformation allows the later phases of the pass to not need
60// to handle buffer fat pointers moving to and from memory, where we would
61// have to handle the incompatibility between a `{Nxp8, Nxi32}` representation
62// and `Nxi160` directly. Instead, that transposing action (where the vectors
63// of resources and vectors of offsets are concatenated before being stored to
64// memory) is handled through implementing `inttoptr` and `ptrtoint` only.
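A standalone sketch of that `ptrtoint`/`inttoptr` round trip, modeling the i160 as an explicit 128+32-bit pair. The layout (resource part in the high bits, offset in the low bits) and all names are illustrative assumptions, not LLVM API:

```cpp
#include <cstdint>

struct FatPtr {
  __uint128_t Rsrc; // 128-bit resource part
  uint32_t Off;     // 32-bit offset part
};

struct I160 { // a 160-bit integer, split as high 128 bits + low 32 bits
  __uint128_t Hi;
  uint32_t Lo;
};

// Store path: `ptrtoint ptr addrspace(7) %p to i160`.
inline I160 fatPtrToInt(FatPtr P) { return {P.Rsrc, P.Off}; }
// Load path: `inttoptr i160 %i to ptr addrspace(7)`.
inline FatPtr intToFatPtr(I160 I) { return {I.Hi, I.Lo}; }
```

The round trip is lossless, which is what lets the pass confine all memory traffic to the integer form.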
65//
66// Atomic operations on `ptr addrspace(7)` values are not supported, as the
67// hardware does not include a 160-bit atomic.
68//
69// In order to save on O(N) work and to ensure that the contents type
70// legalizer correctly splits up wide loads, we also unconditionally lower
71// memcpy-like intrinsics into loops here.
72//
73// ## Buffer contents type legalization
74//
75// The underlying buffer intrinsics only support types up to 128 bits long,
76// and don't support complex types. If buffer operations were
77// standard pointer operations that could be represented as MIR-level loads,
78// this would be handled by the various legalization schemes in instruction
79// selection. However, because we have to do the conversion from `load` and
80// `store` to intrinsics at LLVM IR level, we must perform that legalization
81// ourselves.
82//
83// This involves a combination of
84// - Converting arrays to vectors where possible
85// - Otherwise, splitting loads and stores of aggregates into loads/stores of
86// each component.
87// - Zero-extending things to fill a whole number of bytes
88// - Casting values of types that don't neatly correspond to supported machine
89// value types
90// (for example, an i96 or i256) into ones that would work
91// (like <3 x i32> and <8 x i32>, respectively)
92// - Splitting values that are too long (such as aforementioned <8 x i32>) into
93// multiple operations.
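The element-width choice in the casting and splitting steps above can be sketched as a small helper. This is an illustrative model only (the real logic lives in `legalNonAggregateFor` below):

```cpp
// Sketch of the "legal non-aggregate" rule: round the bit width up to whole
// bytes, then slice it with the widest of i32/i16/i8 that divides it evenly,
// so an i96 becomes <3 x i32> and an i256 becomes <8 x i32>.
// Returns the chosen element width in bits.
inline unsigned legalElemBits(unsigned Bits) {
  unsigned Padded = (Bits + 7) / 8 * 8; // implicit zero-extension to bytes
  if (Padded % 32 == 0)
    return 32;
  if (Padded % 16 == 0)
    return 16;
  return 8;
}
```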
94//
95// ## Type remapping
96//
97// We use a `ValueMapper` to mangle uses of [vectors of] buffer fat pointers
98// to the corresponding struct type, which has a resource part and an offset
99// part.
100//
101// This uses a `BufferFatPtrToStructTypeMap` and a `FatPtrConstMaterializer`
102// to perform the remapping, usually by way of `setType`ing values. Constants
103// are handled here because there isn't a good way to fix them up later.
104//
105// This has the downside of leaving the IR in an invalid state (for example,
106// the instruction `getelementptr {ptr addrspace(8), i32} %p, ...` will exist),
107// but all such invalid states will be resolved by the third phase.
108//
109// Functions that don't take buffer fat pointers are modified in place. Those
110// that do take such pointers have their basic blocks moved to a new function
111// whose arguments and return values use {ptr addrspace(8), i32} instead.
112// This phase also records intrinsics so that they can be remangled or deleted
113// later.
114//
115// ## Splitting pointer structs
116//
117// The meat of this pass consists of defining semantics for operations that
118// produce or consume [vectors of] buffer fat pointers in terms of their
119// resource and offset parts. This is accomplished through the `SplitPtrStructs`
120// visitor.
121//
122// In the first pass through each function that is being lowered, the splitter
123// inserts new instructions to implement the split-structures behavior, which is
124// needed for correctness and performance. It records a list of "split users",
125// instructions that are being replaced by operations on the resource and offset
126// parts.
127//
128// Split users do not necessarily need to produce parts themselves (
129// a `load float, ptr addrspace(7)` does not, for example), but, if they do not
130// generate buffer fat pointers, they must RAUW in their replacement
131// instructions during the initial visit.
132//
133// When these new instructions are created, they use the split parts recorded
134// for their initial arguments in order to generate their replacements, creating
135// a parallel set of instructions that does not refer to the original fat
136// pointer values but instead to their resource and offset components.
137//
138// Instructions, such as `extractvalue`, that produce buffer fat pointers from
139// sources that do not have split parts, have such parts generated using
140// `extractvalue`. This is also the initial handling of PHI nodes, which
141// are then cleaned up.
142//
143// ### Conditionals
144//
145// PHI nodes are initially given resource parts via `extractvalue`. However,
146// this is not an efficient rewrite of such nodes, as, in most cases, the
147// resource part in a conditional or loop remains constant throughout the loop
148// and only the offset varies. Failing to optimize away these constant resources
149// would cause additional registers to be sent around loops and might lead to
150// waterfall loops being generated for buffer operations due to the
151// "non-uniform" resource argument.
152//
153// Therefore, after all instructions have been visited, the pointer splitter
154// post-processes all encountered conditionals. Given a PHI node or select,
155// getPossibleRsrcRoots() collects all values that the resource parts of that
156// conditional's input could come from as well as collecting all conditional
157// instructions encountered during the search. If, after filtering out the
158// initial node itself, the set of encountered conditionals is a subset of the
159// potential roots and there is a single potential resource that isn't in the
160// conditional set, that value is the only possible value the resource argument
161// could have throughout the control flow.
162//
163// If that condition is met, then a PHI node can have its resource part changed
164// to the singleton value and then be replaced by a PHI on the offsets.
165// Otherwise, each PHI node is split into two, one for the resource part and one
166// for the offset part, which replace the temporary `extractvalue` instructions
167// that were added during the first pass.
168//
169// Similar logic applies to `select`, where
170// `%z = select i1 %cond, ptr addrspace(7) %x, ptr addrspace(7) %y`
171// can be split into `%z.rsrc = %x.rsrc` and
172// `%z.off = select i1 %cond, i32 %x.off, i32 %y.off`
173// if both `%x` and `%y` have the same resource part, but two `select`
174// operations will be needed if they do not.
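The root-finding check described above can be modeled as a small set computation. This is a sketch under simplifying assumptions (values modeled as ints, `-1` standing in for "no unique root"; names are hypothetical, not the pass's actual helpers):

```cpp
#include <set>
#include <vector>

// Given the values a conditional's resource part could come from (Roots) and
// the conditional instructions seen while collecting them (Conditionals),
// a single-resource rewrite is possible when, after dropping the node itself,
// every conditional is itself a root and exactly one root is not a
// conditional. Returns that root, or -1 if none exists.
inline int uniqueRsrcRoot(int Node, std::set<int> Roots,
                          std::set<int> Conditionals) {
  Roots.erase(Node);
  Conditionals.erase(Node);
  std::vector<int> NonCond;
  for (int R : Roots)
    if (!Conditionals.count(R))
      NonCond.push_back(R); // a genuine resource candidate
  for (int C : Conditionals)
    if (!Roots.count(C))
      return -1; // a conditional we cannot see through
  return NonCond.size() == 1 ? NonCond.front() : -1;
}
```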
175//
176// ### Final processing
177//
178// After conditionals have been cleaned up, the IR for each function is
179// rewritten to remove all the old instructions that have been split up.
180//
181// Any instruction that used to produce a buffer fat pointer (and therefore now
182// produces a resource-and-offset struct after type remapping) is
183// replaced as follows:
184// 1. All debug value annotations are cloned to reflect that the resource part
185// and offset parts are computed separately and constitute different
186// fragments of the underlying source language variable.
187// 2. All uses that were themselves split are replaced by a `poison` of the
188// struct type, as they will themselves be erased soon. This rule, combined
189// with debug handling, should leave the use lists of split instructions
190// empty in almost all cases.
191// 3. If a user of the original struct-valued result remains, the structure
192// needed for the new types to work is constructed out of the newly-defined
193// parts, and the original instruction is replaced by this structure
194// before being erased. Instructions requiring this construction include
195// `ret` and `insertvalue`.
196//
197// # Consequences
198//
199// This pass does not alter the CFG.
200//
201// Alias analysis information will become coarser, as the LLVM alias analyzer
202// cannot handle the buffer intrinsics. Specifically, while we can determine
203// that the following two loads do not alias:
204// ```
205// %y = getelementptr i32, ptr addrspace(7) %x, i32 1
206// %a = load i32, ptr addrspace(7) %x
207// %b = load i32, ptr addrspace(7) %y
208// ```
209// we cannot (except through some code that runs during scheduling) determine
210// that the rewritten loads below do not alias.
211// ```
212// %y.off = add i32 %x.off, 1
213// %a = call @llvm.amdgcn.raw.ptr.buffer.load(ptr addrspace(8) %x.rsrc,
214// i32 %x.off, ...)
215// %b = call @llvm.amdgcn.raw.ptr.buffer.load(ptr addrspace(8) %x.rsrc,
216// i32 %y.off, ...)
217// ```
218// However, existing alias information is preserved.
219//===----------------------------------------------------------------------===//
220
221#include "AMDGPU.h"
222#include "AMDGPUTargetMachine.h"
223#include "GCNSubtarget.h"
224#include "SIDefines.h"
226#include "llvm/ADT/SmallVector.h"
232#include "llvm/IR/Constants.h"
233#include "llvm/IR/DebugInfo.h"
234#include "llvm/IR/DerivedTypes.h"
235#include "llvm/IR/IRBuilder.h"
236#include "llvm/IR/InstIterator.h"
237#include "llvm/IR/InstVisitor.h"
238#include "llvm/IR/Instructions.h"
240#include "llvm/IR/Intrinsics.h"
241#include "llvm/IR/IntrinsicsAMDGPU.h"
242#include "llvm/IR/Metadata.h"
243#include "llvm/IR/Operator.h"
244#include "llvm/IR/PatternMatch.h"
246#include "llvm/IR/ValueHandle.h"
248#include "llvm/Pass.h"
252#include "llvm/Support/Debug.h"
258
259#define DEBUG_TYPE "amdgpu-lower-buffer-fat-pointers"
260
261using namespace llvm;
262
263static constexpr unsigned BufferOffsetWidth = 32;
264
265namespace {
266/// Recursively replace instances of ptr addrspace(7) and vector<Nxptr
267/// addrspace(7)> with some other type as defined by the relevant subclass.
268class BufferFatPtrTypeLoweringBase : public ValueMapTypeRemapper {
269 DenseMap<Type *, Type *> Map;
270
271 Type *remapTypeImpl(Type *Ty);
272
273protected:
274 virtual Type *remapScalar(PointerType *PT) = 0;
275 virtual Type *remapVector(VectorType *VT) = 0;
276
277 const DataLayout &DL;
278
279public:
280 BufferFatPtrTypeLoweringBase(const DataLayout &DL) : DL(DL) {}
281 Type *remapType(Type *SrcTy) override;
282 void clear() { Map.clear(); }
283};
284
285/// Remap ptr addrspace(7) to i160 and vector<Nxptr addrspace(7)> to
286// vector<Nxi160> in order to correctly handle loading/storing these values
287/// from memory.
288class BufferFatPtrToIntTypeMap : public BufferFatPtrTypeLoweringBase {
289 using BufferFatPtrTypeLoweringBase::BufferFatPtrTypeLoweringBase;
290
291protected:
292 Type *remapScalar(PointerType *PT) override { return DL.getIntPtrType(PT); }
293 Type *remapVector(VectorType *VT) override { return DL.getIntPtrType(VT); }
294};
295
296/// Remap ptr addrspace(7) to {ptr addrspace(8), i32} (the resource and offset
297/// parts of the pointer) so that we can easily rewrite operations on these
298/// values that aren't loading them from or storing them to memory.
299class BufferFatPtrToStructTypeMap : public BufferFatPtrTypeLoweringBase {
300 using BufferFatPtrTypeLoweringBase::BufferFatPtrTypeLoweringBase;
301
302protected:
303 Type *remapScalar(PointerType *PT) override;
304 Type *remapVector(VectorType *VT) override;
305};
306} // namespace
307
308// This code is adapted from the type remapper in lib/Linker/IRMover.cpp
309Type *BufferFatPtrTypeLoweringBase::remapTypeImpl(Type *Ty) {
310 Type **Entry = &Map[Ty];
311 if (*Entry)
312 return *Entry;
313 if (auto *PT = dyn_cast<PointerType>(Ty)) {
314 if (PT->getAddressSpace() == AMDGPUAS::BUFFER_FAT_POINTER) {
315 return *Entry = remapScalar(PT);
316 }
317 }
318 if (auto *VT = dyn_cast<VectorType>(Ty)) {
319 auto *PT = dyn_cast<PointerType>(VT->getElementType());
320 if (PT && PT->getAddressSpace() == AMDGPUAS::BUFFER_FAT_POINTER) {
321 return *Entry = remapVector(VT);
322 }
323 return *Entry = Ty;
324 }
325 // Whether the type is one that is structurally uniqued - that is, any type
326 // other than a named struct (named structs are the only kind of type where
327 // multiple structurally identical types can have distinct `Type*`s)
328 StructType *TyAsStruct = dyn_cast<StructType>(Ty);
329 bool IsUniqued = !TyAsStruct || TyAsStruct->isLiteral();
330 // Base case for ints, floats, opaque pointers, and so on, which don't
331 // require recursion.
332 if (Ty->getNumContainedTypes() == 0 && IsUniqued)
333 return *Entry = Ty;
334 bool Changed = false;
335 SmallVector<Type *> ElementTypes(Ty->getNumContainedTypes(), nullptr);
336 for (unsigned int I = 0, E = Ty->getNumContainedTypes(); I < E; ++I) {
337 Type *OldElem = Ty->getContainedType(I);
338 Type *NewElem = remapTypeImpl(OldElem);
339 ElementTypes[I] = NewElem;
340 Changed |= (OldElem != NewElem);
341 }
342 // Recursive calls to remapTypeImpl() may have invalidated the `Entry` pointer.
343 Entry = &Map[Ty];
344 if (!Changed) {
345 return *Entry = Ty;
346 }
347 if (auto *ArrTy = dyn_cast<ArrayType>(Ty))
348 return *Entry = ArrayType::get(ElementTypes[0], ArrTy->getNumElements());
349 if (auto *FnTy = dyn_cast<FunctionType>(Ty))
350 return *Entry = FunctionType::get(ElementTypes[0],
351 ArrayRef(ElementTypes).slice(1),
352 FnTy->isVarArg());
353 if (auto *STy = dyn_cast<StructType>(Ty)) {
354 // Genuine opaque types don't have a remapping.
355 if (STy->isOpaque())
356 return *Entry = Ty;
357 bool IsPacked = STy->isPacked();
358 if (IsUniqued)
359 return *Entry = StructType::get(Ty->getContext(), ElementTypes, IsPacked);
360 SmallString<16> Name(STy->getName());
361 STy->setName("");
362 return *Entry = StructType::create(Ty->getContext(), ElementTypes, Name,
363 IsPacked);
364 }
365 llvm_unreachable("Unknown type of type that contains elements");
366}
367
368Type *BufferFatPtrTypeLoweringBase::remapType(Type *SrcTy) {
369 return remapTypeImpl(SrcTy);
370}
371
372Type *BufferFatPtrToStructTypeMap::remapScalar(PointerType *PT) {
373 LLVMContext &Ctx = PT->getContext();
374 return StructType::get(PointerType::get(Ctx, AMDGPUAS::BUFFER_RESOURCE),
375 IntegerType::get(Ctx, BufferOffsetWidth));
376}
377
378Type *BufferFatPtrToStructTypeMap::remapVector(VectorType *VT) {
379 ElementCount EC = VT->getElementCount();
380 LLVMContext &Ctx = VT->getContext();
381 Type *RsrcVec =
382 VectorType::get(PointerType::get(Ctx, AMDGPUAS::BUFFER_RESOURCE), EC);
383 Type *OffVec = VectorType::get(IntegerType::get(Ctx, BufferOffsetWidth), EC);
384 return StructType::get(RsrcVec, OffVec);
385}
386
387static bool isBufferFatPtrOrVector(Type *Ty) {
388 if (auto *PT = dyn_cast<PointerType>(Ty->getScalarType()))
389 return PT->getAddressSpace() == AMDGPUAS::BUFFER_FAT_POINTER;
390 return false;
391}
392
393// True if the type is {ptr addrspace(8), i32} or a struct containing vectors of
394// those types. Used to quickly skip instructions we don't need to process.
395static bool isSplitFatPtr(Type *Ty) {
396 auto *ST = dyn_cast<StructType>(Ty);
397 if (!ST)
398 return false;
399 if (!ST->isLiteral() || ST->getNumElements() != 2)
400 return false;
401 auto *MaybeRsrc =
402 dyn_cast<PointerType>(ST->getElementType(0)->getScalarType());
403 auto *MaybeOff =
404 dyn_cast<IntegerType>(ST->getElementType(1)->getScalarType());
405 return MaybeRsrc && MaybeOff &&
406 MaybeRsrc->getAddressSpace() == AMDGPUAS::BUFFER_RESOURCE &&
407 MaybeOff->getBitWidth() == BufferOffsetWidth;
408}
409
410// True if the result type or any argument types are buffer fat pointers.
411static bool isBufferFatPtrConst(Constant *C) {
412 Type *T = C->getType();
413 return isBufferFatPtrOrVector(T) || any_of(C->operands(), [](const Use &U) {
414 return isBufferFatPtrOrVector(U.get()->getType());
415 });
416}
417
418namespace {
419/// Convert [vectors of] buffer fat pointers to integers when they are read from
420/// or stored to memory. This ensures that these pointers will have the same
421/// memory layout as before they are lowered, even though they will no longer
422/// have their previous layout in registers/in the program (they'll be broken
423/// down into resource and offset parts). This has the downside of imposing
424/// marshalling costs when reading or storing these values, but since placing
425/// such pointers into memory is an uncommon operation at best, we feel that
426/// this cost is acceptable for better performance in the common case.
427class StoreFatPtrsAsIntsAndExpandMemcpyVisitor
428 : public InstVisitor<StoreFatPtrsAsIntsAndExpandMemcpyVisitor, bool> {
429 BufferFatPtrToIntTypeMap *TypeMap;
430
431 ValueToValueMapTy ConvertedForStore;
432
433 IRBuilder<InstSimplifyFolder> IRB;
434
435 const TargetMachine *TM;
436
437 // Convert all the buffer fat pointers within the input value to integers
438 // so that it can be stored in memory.
439 Value *fatPtrsToInts(Value *V, Type *From, Type *To, const Twine &Name);
440 // Convert all the i160s that need to be buffer fat pointers (as specified
441 // by the To type) into those pointers to preserve the semantics of the rest
442 // of the program.
443 Value *intsToFatPtrs(Value *V, Type *From, Type *To, const Twine &Name);
444
445public:
446 StoreFatPtrsAsIntsAndExpandMemcpyVisitor(BufferFatPtrToIntTypeMap *TypeMap,
447 const DataLayout &DL,
448 LLVMContext &Ctx,
449 const TargetMachine *TM)
450 : TypeMap(TypeMap), IRB(Ctx, InstSimplifyFolder(DL)), TM(TM) {}
451 bool processFunction(Function &F);
452
453 bool visitInstruction(Instruction &I) { return false; }
454 bool visitAllocaInst(AllocaInst &I);
455 bool visitLoadInst(LoadInst &LI);
456 bool visitStoreInst(StoreInst &SI);
457 bool visitGetElementPtrInst(GetElementPtrInst &I);
458
459 bool visitMemCpyInst(MemCpyInst &MCI);
460 bool visitMemMoveInst(MemMoveInst &MMI);
461 bool visitMemSetInst(MemSetInst &MSI);
462 bool visitMemSetPatternInst(MemSetPatternInst &MSPI);
463};
464} // namespace
465
466Value *StoreFatPtrsAsIntsAndExpandMemcpyVisitor::fatPtrsToInts(
467 Value *V, Type *From, Type *To, const Twine &Name) {
468 if (From == To)
469 return V;
470 ValueToValueMapTy::iterator Find = ConvertedForStore.find(V);
471 if (Find != ConvertedForStore.end())
472 return Find->second;
473 if (isBufferFatPtrOrVector(From)) {
474 Value *Cast = IRB.CreatePtrToInt(V, To, Name + ".int");
475 ConvertedForStore[V] = Cast;
476 return Cast;
477 }
478 if (From->getNumContainedTypes() == 0)
479 return V;
480 // Structs, arrays, and other compound types.
481 Value *Ret = PoisonValue::get(To);
482 if (auto *AT = dyn_cast<ArrayType>(From)) {
483 Type *FromPart = AT->getArrayElementType();
484 Type *ToPart = cast<ArrayType>(To)->getElementType();
485 for (uint64_t I = 0, E = AT->getArrayNumElements(); I < E; ++I) {
486 Value *Field = IRB.CreateExtractValue(V, I);
487 Value *NewField =
488 fatPtrsToInts(Field, FromPart, ToPart, Name + "." + Twine(I));
489 Ret = IRB.CreateInsertValue(Ret, NewField, I);
490 }
491 } else {
492 for (auto [Idx, FromPart, ToPart] :
493 enumerate(From->subtypes(), To->subtypes())) {
494 Value *Field = IRB.CreateExtractValue(V, Idx);
495 Value *NewField =
496 fatPtrsToInts(Field, FromPart, ToPart, Name + "." + Twine(Idx));
497 Ret = IRB.CreateInsertValue(Ret, NewField, Idx);
498 }
499 }
500 ConvertedForStore[V] = Ret;
501 return Ret;
502}
503
504Value *StoreFatPtrsAsIntsAndExpandMemcpyVisitor::intsToFatPtrs(
505 Value *V, Type *From, Type *To, const Twine &Name) {
506 if (From == To)
507 return V;
508 if (isBufferFatPtrOrVector(To)) {
509 Value *Cast = IRB.CreateIntToPtr(V, To, Name + ".ptr");
510 return Cast;
511 }
512 if (From->getNumContainedTypes() == 0)
513 return V;
514 // Structs, arrays, and other compound types.
515 Value *Ret = PoisonValue::get(To);
516 if (auto *AT = dyn_cast<ArrayType>(From)) {
517 Type *FromPart = AT->getArrayElementType();
518 Type *ToPart = cast<ArrayType>(To)->getElementType();
519 for (uint64_t I = 0, E = AT->getArrayNumElements(); I < E; ++I) {
520 Value *Field = IRB.CreateExtractValue(V, I);
521 Value *NewField =
522 intsToFatPtrs(Field, FromPart, ToPart, Name + "." + Twine(I));
523 Ret = IRB.CreateInsertValue(Ret, NewField, I);
524 }
525 } else {
526 for (auto [Idx, FromPart, ToPart] :
527 enumerate(From->subtypes(), To->subtypes())) {
528 Value *Field = IRB.CreateExtractValue(V, Idx);
529 Value *NewField =
530 intsToFatPtrs(Field, FromPart, ToPart, Name + "." + Twine(Idx));
531 Ret = IRB.CreateInsertValue(Ret, NewField, Idx);
532 }
533 }
534 return Ret;
535}
536
537bool StoreFatPtrsAsIntsAndExpandMemcpyVisitor::processFunction(Function &F) {
538 bool Changed = false;
539 // Process memcpy-like instructions after the main iteration because they can
540 // invalidate iterators.
541 SmallVector<WeakTrackingVH> CanBecomeLoops;
542 for (Instruction &I : make_early_inc_range(instructions(F))) {
543 if (isa<MemTransferInst, MemSetInst, MemSetPatternInst>(I))
544 CanBecomeLoops.push_back(&I);
545 else
546 Changed |= visit(I);
547 }
548 for (WeakTrackingVH VH : make_early_inc_range(CanBecomeLoops)) {
549 Changed |= visit(cast<Instruction>(VH));
550 }
551 ConvertedForStore.clear();
552 return Changed;
553}
554
555bool StoreFatPtrsAsIntsAndExpandMemcpyVisitor::visitAllocaInst(AllocaInst &I) {
556 Type *Ty = I.getAllocatedType();
557 Type *NewTy = TypeMap->remapType(Ty);
558 if (Ty == NewTy)
559 return false;
560 I.setAllocatedType(NewTy);
561 return true;
562}
563
564bool StoreFatPtrsAsIntsAndExpandMemcpyVisitor::visitGetElementPtrInst(
565 GetElementPtrInst &I) {
566 Type *Ty = I.getSourceElementType();
567 Type *NewTy = TypeMap->remapType(Ty);
568 if (Ty == NewTy)
569 return false;
570 // We'll be rewriting the type `ptr addrspace(7)` out of existence soon, so
571 // make sure GEPs don't have different semantics with the new type.
572 I.setSourceElementType(NewTy);
573 I.setResultElementType(TypeMap->remapType(I.getResultElementType()));
574 return true;
575}
576
577bool StoreFatPtrsAsIntsAndExpandMemcpyVisitor::visitLoadInst(LoadInst &LI) {
578 Type *Ty = LI.getType();
579 Type *IntTy = TypeMap->remapType(Ty);
580 if (Ty == IntTy)
581 return false;
582
583 IRB.SetInsertPoint(&LI);
584 auto *NLI = cast<LoadInst>(LI.clone());
585 NLI->mutateType(IntTy);
586 NLI = IRB.Insert(NLI);
587 NLI->takeName(&LI);
588
589 Value *CastBack = intsToFatPtrs(NLI, IntTy, Ty, NLI->getName());
590 LI.replaceAllUsesWith(CastBack);
591 LI.eraseFromParent();
592 return true;
593}
594
595bool StoreFatPtrsAsIntsAndExpandMemcpyVisitor::visitStoreInst(StoreInst &SI) {
596 Value *V = SI.getValueOperand();
597 Type *Ty = V->getType();
598 Type *IntTy = TypeMap->remapType(Ty);
599 if (Ty == IntTy)
600 return false;
601
602 IRB.SetInsertPoint(&SI);
603 Value *IntV = fatPtrsToInts(V, Ty, IntTy, V->getName());
604 for (auto *Dbg : at::getDVRAssignmentMarkers(&SI))
605 Dbg->setRawLocation(ValueAsMetadata::get(IntV));
606
607 SI.setOperand(0, IntV);
608 return true;
609}
610
611bool StoreFatPtrsAsIntsAndExpandMemcpyVisitor::visitMemCpyInst(
612 MemCpyInst &MCI) {
613 // TODO: Allow memcpy.p7.p3 as a synonym for the direct-to-LDS copy, which'll
614 // need loop expansion here.
615 if (MCI.getSourceAddressSpace() != AMDGPUAS::BUFFER_FAT_POINTER &&
616 MCI.getDestAddressSpace() != AMDGPUAS::BUFFER_FAT_POINTER)
617 return false;
618 llvm::expandMemCpyAsLoop(&MCI,
619 TM->getTargetTransformInfo(*MCI.getFunction()));
620 MCI.eraseFromParent();
621 return true;
622}
623
624bool StoreFatPtrsAsIntsAndExpandMemcpyVisitor::visitMemMoveInst(
625 MemMoveInst &MMI) {
626 if (MMI.getSourceAddressSpace() != AMDGPUAS::BUFFER_FAT_POINTER &&
627 MMI.getDestAddressSpace() != AMDGPUAS::BUFFER_FAT_POINTER)
628 return false;
629 reportFatalUsageError(
630 "memmove() on buffer descriptors is not implemented because pointer "
631 "comparison on buffer descriptors isn't implemented\n");
632}
633
634bool StoreFatPtrsAsIntsAndExpandMemcpyVisitor::visitMemSetInst(
635 MemSetInst &MSI) {
636 if (MSI.getDestAddressSpace() != AMDGPUAS::BUFFER_FAT_POINTER)
637 return false;
638 llvm::expandMemSetAsLoop(&MSI);
640 MSI.eraseFromParent();
641 return true;
642}
643
644bool StoreFatPtrsAsIntsAndExpandMemcpyVisitor::visitMemSetPatternInst(
645 MemSetPatternInst &MSPI) {
646 if (MSPI.getDestAddressSpace() != AMDGPUAS::BUFFER_FAT_POINTER)
647 return false;
648 llvm::expandMemSetPatternAsLoop(&MSPI);
649 MSPI.eraseFromParent();
650 return true;
651}
652
653namespace {
654/// Convert loads/stores of types that the buffer intrinsics can't handle into
655/// one or more such loads/stores that consist of legal types.
656///
657/// Do this by
658/// 1. Recursing into structs (and arrays that don't share a memory layout with
659/// vectors) since the intrinsics can't handle complex types.
660/// 2. Converting arrays of non-aggregate, byte-sized types into their
661/// corresponding vectors
662/// 3. Bitcasting unsupported types, namely overly-long scalars and byte
663/// vectors, into vectors of supported types.
664/// 4. Splitting up excessively long reads/writes into multiple operations.
665///
666/// Note that this doesn't handle complex data structures, but, in the future,
667/// the aggregate load splitter from SROA could be refactored to allow for that
668/// case.
669class LegalizeBufferContentTypesVisitor
670 : public InstVisitor<LegalizeBufferContentTypesVisitor, bool> {
671 friend class InstVisitor<LegalizeBufferContentTypesVisitor, bool>;
672
673 IRBuilder<InstSimplifyFolder> IRB;
674
675 const DataLayout &DL;
676
677 /// If T is [N x U], where U is a scalar type, return the vector type
678 /// <N x U>, otherwise, return T.
679 Type *scalarArrayTypeAsVector(Type *MaybeArrayType);
680 Value *arrayToVector(Value *V, Type *TargetType, const Twine &Name);
681 Value *vectorToArray(Value *V, Type *OrigType, const Twine &Name);
682
683 /// Break up the loads of a struct into the loads of its components
684
685 /// Convert a vector or scalar type that can't be operated on by buffer
686 /// intrinsics to one that would be legal through bitcasts and/or truncation.
687 /// Uses the wider of i32, i16, or i8 where possible.
688 Type *legalNonAggregateFor(Type *T);
689 Value *makeLegalNonAggregate(Value *V, Type *TargetType, const Twine &Name);
690 Value *makeIllegalNonAggregate(Value *V, Type *OrigType, const Twine &Name);
691
692 struct VecSlice {
693 uint64_t Index = 0;
694 uint64_t Length = 0;
695 VecSlice() = delete;
696 // Needed for some Clangs
697 VecSlice(uint64_t Index, uint64_t Length) : Index(Index), Length(Length) {}
698 };
699 /// Return the [index, length] pairs into which `T` needs to be cut to form
700 /// legal buffer load or store operations. Clears `Slices`. Creates an empty
701 /// `Slices` for non-vector inputs and creates one slice if no slicing will be
702 /// needed.
703 void getVecSlices(Type *T, SmallVectorImpl<VecSlice> &Slices);
704
705 Value *extractSlice(Value *Vec, VecSlice S, const Twine &Name);
706 Value *insertSlice(Value *Whole, Value *Part, VecSlice S, const Twine &Name);
707
708 /// In most cases, return `LegalType`. However, when given an input that would
709 /// normally be a legal type for the buffer intrinsics to return but that
710 /// isn't hooked up through SelectionDAG, return a type of the same width that
711 /// can be used with the relevant intrinsics. Specifically, handle the cases:
712 /// - <1 x T> => T for all T
713 /// - <N x i8> <=> i16, i32, 2xi32, 4xi32 (as needed)
714 /// - <N x T> where T is under 32 bits and the total size is 96 bits <=> <3 x
715 /// i32>
716 Type *intrinsicTypeFor(Type *LegalType);
717
718 bool visitLoadImpl(LoadInst &OrigLI, Type *PartType,
719 SmallVectorImpl<uint32_t> &AggIdxs, uint64_t AggByteOffset,
720 Value *&Result, const Twine &Name);
721 /// Return value is (Changed, ModifiedInPlace)
722 std::pair<bool, bool> visitStoreImpl(StoreInst &OrigSI, Type *PartType,
723 SmallVectorImpl<uint32_t> &AggIdxs,
724 uint64_t AggByteOffset,
725 const Twine &Name);
726
727 bool visitInstruction(Instruction &I) { return false; }
728 bool visitLoadInst(LoadInst &LI);
729 bool visitStoreInst(StoreInst &SI);
730
731public:
732 LegalizeBufferContentTypesVisitor(const DataLayout &DL, LLVMContext &Ctx)
733 : IRB(Ctx, InstSimplifyFolder(DL)), DL(DL) {}
734 bool processFunction(Function &F);
735};
736} // namespace
737
738Type *LegalizeBufferContentTypesVisitor::scalarArrayTypeAsVector(Type *T) {
739 auto *AT = dyn_cast<ArrayType>(T);
740 if (!AT)
741 return T;
742 Type *ET = AT->getElementType();
743 if (!ET->isSingleValueType() || isa<VectorType>(ET))
744 reportFatalUsageError("loading non-scalar arrays from buffer fat pointers "
745 "should have recursed");
746 if (!DL.typeSizeEqualsStoreSize(AT))
747 reportFatalUsageError(
748 "loading padded arrays from buffer fat pointers should have recursed");
749 return FixedVectorType::get(ET, AT->getNumElements());
750}
751
752Value *LegalizeBufferContentTypesVisitor::arrayToVector(Value *V,
753 Type *TargetType,
754 const Twine &Name) {
755 Value *VectorRes = PoisonValue::get(TargetType);
756 auto *VT = cast<FixedVectorType>(TargetType);
757 unsigned EC = VT->getNumElements();
758 for (auto I : iota_range<unsigned>(0, EC, /*Inclusive=*/false)) {
759 Value *Elem = IRB.CreateExtractValue(V, I, Name + ".elem." + Twine(I));
760 VectorRes = IRB.CreateInsertElement(VectorRes, Elem, I,
761 Name + ".as.vec." + Twine(I));
762 }
763 return VectorRes;
764}
765
766Value *LegalizeBufferContentTypesVisitor::vectorToArray(Value *V,
767 Type *OrigType,
768 const Twine &Name) {
769 Value *ArrayRes = PoisonValue::get(OrigType);
770 ArrayType *AT = cast<ArrayType>(OrigType);
771 unsigned EC = AT->getNumElements();
772 for (auto I : iota_range<unsigned>(0, EC, /*Inclusive=*/false)) {
773 Value *Elem = IRB.CreateExtractElement(V, I, Name + ".elem." + Twine(I));
774 ArrayRes = IRB.CreateInsertValue(ArrayRes, Elem, I,
775 Name + ".as.array." + Twine(I));
776 }
777 return ArrayRes;
778}
779
780Type *LegalizeBufferContentTypesVisitor::legalNonAggregateFor(Type *T) {
781 TypeSize Size = DL.getTypeStoreSizeInBits(T);
782 // Implicitly zero-extend to the next byte if needed
783 if (!DL.typeSizeEqualsStoreSize(T))
784 T = IRB.getIntNTy(Size.getFixedValue());
785 Type *ElemTy = T->getScalarType();
786 if (isa<PointerType, ScalableVectorType>(ElemTy)) {
787 // Pointers are always big enough, and we'll let scalable vectors through to
788 // fail in codegen.
789 return T;
790 }
791 unsigned ElemSize = DL.getTypeSizeInBits(ElemTy).getFixedValue();
792 if (isPowerOf2_32(ElemSize) && ElemSize >= 16 && ElemSize <= 128) {
793 // [vectors of] anything that's 16/32/64/128 bits can be cast and split into
794 // legal buffer operations.
795 return T;
796 }
797 Type *BestVectorElemType = nullptr;
798 if (Size.isKnownMultipleOf(32))
799 BestVectorElemType = IRB.getInt32Ty();
800 else if (Size.isKnownMultipleOf(16))
801 BestVectorElemType = IRB.getInt16Ty();
802 else
803 BestVectorElemType = IRB.getInt8Ty();
804 unsigned NumCastElems =
805 Size.getFixedValue() / BestVectorElemType->getIntegerBitWidth();
806 if (NumCastElems == 1)
807 return BestVectorElemType;
808 return FixedVectorType::get(BestVectorElemType, NumCastElems);
809}
810
811Value *LegalizeBufferContentTypesVisitor::makeLegalNonAggregate(
812 Value *V, Type *TargetType, const Twine &Name) {
813 Type *SourceType = V->getType();
814 TypeSize SourceSize = DL.getTypeSizeInBits(SourceType);
815 TypeSize TargetSize = DL.getTypeSizeInBits(TargetType);
816 if (SourceSize != TargetSize) {
817 Type *ShortScalarTy = IRB.getIntNTy(SourceSize.getFixedValue());
818 Type *ByteScalarTy = IRB.getIntNTy(TargetSize.getFixedValue());
819 Value *AsScalar = IRB.CreateBitCast(V, ShortScalarTy, Name + ".as.scalar");
820 Value *Zext = IRB.CreateZExt(AsScalar, ByteScalarTy, Name + ".zext");
821 V = Zext;
822 SourceType = ByteScalarTy;
823 }
824 return IRB.CreateBitCast(V, TargetType, Name + ".legal");
825}
826
827Value *LegalizeBufferContentTypesVisitor::makeIllegalNonAggregate(
828 Value *V, Type *OrigType, const Twine &Name) {
829 Type *LegalType = V->getType();
830 TypeSize LegalSize = DL.getTypeSizeInBits(LegalType);
831 TypeSize OrigSize = DL.getTypeSizeInBits(OrigType);
832 if (LegalSize != OrigSize) {
833 Type *ShortScalarTy = IRB.getIntNTy(OrigSize.getFixedValue());
834 Type *ByteScalarTy = IRB.getIntNTy(LegalSize.getFixedValue());
835 Value *AsScalar = IRB.CreateBitCast(V, ByteScalarTy, Name + ".bytes.cast");
836 Value *Trunc = IRB.CreateTrunc(AsScalar, ShortScalarTy, Name + ".trunc");
837 return IRB.CreateBitCast(Trunc, OrigType, Name + ".orig");
838 }
839 return IRB.CreateBitCast(V, OrigType, Name + ".real.ty");
840}
841
842Type *LegalizeBufferContentTypesVisitor::intrinsicTypeFor(Type *LegalType) {
843 auto *VT = dyn_cast<FixedVectorType>(LegalType);
844 if (!VT)
845 return LegalType;
846 Type *ET = VT->getElementType();
847 // Explicitly return the element type of 1-element vectors because the
848 // underlying intrinsics don't like <1 x T> even though it's a synonym for T.
849 if (VT->getNumElements() == 1)
850 return ET;
851 if (DL.getTypeSizeInBits(LegalType) == 96 && DL.getTypeSizeInBits(ET) < 32)
852 return FixedVectorType::get(IRB.getInt32Ty(), 3);
853 if (ET->isIntegerTy(8)) {
854 switch (VT->getNumElements()) {
855 default:
856 return LegalType; // Let it crash later
857 case 1:
858 return IRB.getInt8Ty();
859 case 2:
860 return IRB.getInt16Ty();
861 case 4:
862 return IRB.getInt32Ty();
863 case 8:
864 return FixedVectorType::get(IRB.getInt32Ty(), 2);
865 case 16:
866 return FixedVectorType::get(IRB.getInt32Ty(), 4);
867 }
868 }
869 return LegalType;
870}
871
872void LegalizeBufferContentTypesVisitor::getVecSlices(
873 Type *T, SmallVectorImpl<VecSlice> &Slices) {
874 Slices.clear();
875 auto *VT = dyn_cast<FixedVectorType>(T);
876 if (!VT)
877 return;
878
879 uint64_t ElemBitWidth =
880 DL.getTypeSizeInBits(VT->getElementType()).getFixedValue();
881
882 uint64_t ElemsPer4Words = 128 / ElemBitWidth;
883 uint64_t ElemsPer2Words = ElemsPer4Words / 2;
884 uint64_t ElemsPerWord = ElemsPer2Words / 2;
885 uint64_t ElemsPerShort = ElemsPerWord / 2;
886 uint64_t ElemsPerByte = ElemsPerShort / 2;
887 // If the elements evenly pack into 32-bit words, we can use 3-word stores,
888 // such as for <6 x bfloat> or <3 x i32>, but we can't do this for, for
889 // example, <3 x i64>, since that's not slicing.
890 uint64_t ElemsPer3Words = ElemsPerWord * 3;
891
892 uint64_t TotalElems = VT->getNumElements();
893 uint64_t Index = 0;
894 auto TrySlice = [&](unsigned MaybeLen) {
895 if (MaybeLen > 0 && Index + MaybeLen <= TotalElems) {
896 VecSlice Slice{/*Index=*/Index, /*Length=*/MaybeLen};
897 Slices.push_back(Slice);
898 Index += MaybeLen;
899 return true;
900 }
901 return false;
902 };
903 while (Index < TotalElems) {
904 TrySlice(ElemsPer4Words) || TrySlice(ElemsPer3Words) ||
905 TrySlice(ElemsPer2Words) || TrySlice(ElemsPerWord) ||
906 TrySlice(ElemsPerShort) || TrySlice(ElemsPerByte);
907 }
908}
909
910Value *LegalizeBufferContentTypesVisitor::extractSlice(Value *Vec, VecSlice S,
911 const Twine &Name) {
912 auto *VecVT = dyn_cast<FixedVectorType>(Vec->getType());
913 if (!VecVT)
914 return Vec;
915 if (S.Length == VecVT->getNumElements() && S.Index == 0)
916 return Vec;
917 if (S.Length == 1)
918 return IRB.CreateExtractElement(Vec, S.Index,
919 Name + ".slice." + Twine(S.Index));
920 SmallVector<int> Mask = llvm::to_vector(
921 llvm::iota_range<int>(S.Index, S.Index + S.Length, /*Inclusive=*/false));
922 return IRB.CreateShuffleVector(Vec, Mask, Name + ".slice." + Twine(S.Index));
923}
924
925Value *LegalizeBufferContentTypesVisitor::insertSlice(Value *Whole, Value *Part,
926 VecSlice S,
927 const Twine &Name) {
928 auto *WholeVT = dyn_cast<FixedVectorType>(Whole->getType());
929 if (!WholeVT)
930 return Part;
931 if (S.Length == WholeVT->getNumElements() && S.Index == 0)
932 return Part;
933 if (S.Length == 1) {
934 return IRB.CreateInsertElement(Whole, Part, S.Index,
935 Name + ".slice." + Twine(S.Index));
936 }
937 int NumElems = cast<FixedVectorType>(Whole->getType())->getNumElements();
938
939 // Extend the slice with poisons to make the main shufflevector happy.
940 SmallVector<int> ExtPartMask(NumElems, -1);
941 for (auto [I, E] : llvm::enumerate(
942 MutableArrayRef<int>(ExtPartMask).take_front(S.Length))) {
943 E = I;
944 }
945 Value *ExtPart = IRB.CreateShuffleVector(Part, ExtPartMask,
946 Name + ".ext." + Twine(S.Index));
947
948 SmallVector<int> Mask =
949 llvm::to_vector(llvm::iota_range<int>(0, NumElems, /*Inclusive=*/false));
950 for (auto [I, E] :
951 llvm::enumerate(MutableArrayRef<int>(Mask).slice(S.Index, S.Length)))
952 E = I + NumElems;
953 return IRB.CreateShuffleVector(Whole, ExtPart, Mask,
954 Name + ".parts." + Twine(S.Index));
955}
956
957bool LegalizeBufferContentTypesVisitor::visitLoadImpl(
958 LoadInst &OrigLI, Type *PartType, SmallVectorImpl<uint32_t> &AggIdxs,
959 uint64_t AggByteOff, Value *&Result, const Twine &Name) {
960 if (auto *ST = dyn_cast<StructType>(PartType)) {
961 const StructLayout *Layout = DL.getStructLayout(ST);
962 bool Changed = false;
963 for (auto [I, ElemTy, Offset] :
964 llvm::enumerate(ST->elements(), Layout->getMemberOffsets())) {
965 AggIdxs.push_back(I);
966 Changed |= visitLoadImpl(OrigLI, ElemTy, AggIdxs,
967 AggByteOff + Offset.getFixedValue(), Result,
968 Name + "." + Twine(I));
969 AggIdxs.pop_back();
970 }
971 return Changed;
972 }
973 if (auto *AT = dyn_cast<ArrayType>(PartType)) {
974 Type *ElemTy = AT->getElementType();
975 if (!ElemTy->isSingleValueType() || !DL.typeSizeEqualsStoreSize(ElemTy) ||
976 ElemTy->isVectorTy()) {
977 TypeSize ElemStoreSize = DL.getTypeStoreSize(ElemTy);
978 bool Changed = false;
979 for (auto I : llvm::iota_range<uint32_t>(0, AT->getNumElements(),
980 /*Inclusive=*/false)) {
981 AggIdxs.push_back(I);
982 Changed |= visitLoadImpl(OrigLI, ElemTy, AggIdxs,
983 AggByteOff + I * ElemStoreSize.getFixedValue(),
984 Result, Name + Twine(I));
985 AggIdxs.pop_back();
986 }
987 return Changed;
988 }
989 }
990
991 // Typical case
992
993 Type *ArrayAsVecType = scalarArrayTypeAsVector(PartType);
994 Type *LegalType = legalNonAggregateFor(ArrayAsVecType);
995
996 SmallVector<VecSlice> Slices;
997 getVecSlices(LegalType, Slices);
998 bool HasSlices = Slices.size() > 1;
999 bool IsAggPart = !AggIdxs.empty();
1000 Value *LoadsRes;
1001 if (!HasSlices && !IsAggPart) {
1002 Type *LoadableType = intrinsicTypeFor(LegalType);
1003 if (LoadableType == PartType)
1004 return false;
1005
1006 IRB.SetInsertPoint(&OrigLI);
1007 auto *NLI = cast<LoadInst>(OrigLI.clone());
1008 NLI->mutateType(LoadableType);
1009 NLI = IRB.Insert(NLI);
1010 NLI->setName(Name + ".loadable");
1011
1012 LoadsRes = IRB.CreateBitCast(NLI, LegalType, Name + ".from.loadable");
1013 } else {
1014 IRB.SetInsertPoint(&OrigLI);
1015 LoadsRes = PoisonValue::get(LegalType);
1016 Value *OrigPtr = OrigLI.getPointerOperand();
1017 // If we need to split something into more than one load, its legal
1018 // type will be a vector (ex. an i256 load will have LegalType = <8 x i32>).
1019 // But if we're already a scalar (which can happen if we're splitting up a
1020 // struct), the element type will be the legal type itself.
1021 Type *ElemType = LegalType->getScalarType();
1022 unsigned ElemBytes = DL.getTypeStoreSize(ElemType);
1023 AAMDNodes AANodes = OrigLI.getAAMetadata();
1024 if (IsAggPart && Slices.empty())
1025 Slices.push_back(VecSlice{/*Index=*/0, /*Length=*/1});
1026 for (VecSlice S : Slices) {
1027 Type *SliceType =
1028 S.Length != 1 ? FixedVectorType::get(ElemType, S.Length) : ElemType;
1029 int64_t ByteOffset = AggByteOff + S.Index * ElemBytes;
1030 // You can't reasonably expect loads to wrap around the edge of memory.
1031 Value *NewPtr = IRB.CreateGEP(
1032 IRB.getInt8Ty(), OrigLI.getPointerOperand(), IRB.getInt32(ByteOffset),
1033 OrigPtr->getName() + ".off.ptr." + Twine(ByteOffset),
1034 GEPNoWrapFlags::noUnsignedWrap());
1035 Type *LoadableType = intrinsicTypeFor(SliceType);
1036 LoadInst *NewLI = IRB.CreateAlignedLoad(
1037 LoadableType, NewPtr, commonAlignment(OrigLI.getAlign(), ByteOffset),
1038 Name + ".off." + Twine(ByteOffset));
1039 copyMetadataForLoad(*NewLI, OrigLI);
1040 NewLI->setAAMetadata(
1041 AANodes.adjustForAccess(ByteOffset, LoadableType, DL));
1042 NewLI->setAtomic(OrigLI.getOrdering(), OrigLI.getSyncScopeID());
1043 NewLI->setVolatile(OrigLI.isVolatile());
1044 Value *Loaded = IRB.CreateBitCast(NewLI, SliceType,
1045 NewLI->getName() + ".from.loadable");
1046 LoadsRes = insertSlice(LoadsRes, Loaded, S, Name);
1047 }
1048 }
1049 if (LegalType != ArrayAsVecType)
1050 LoadsRes = makeIllegalNonAggregate(LoadsRes, ArrayAsVecType, Name);
1051 if (ArrayAsVecType != PartType)
1052 LoadsRes = vectorToArray(LoadsRes, PartType, Name);
1053
1054 if (IsAggPart)
1055 Result = IRB.CreateInsertValue(Result, LoadsRes, AggIdxs, Name);
1056 else
1057 Result = LoadsRes;
1058 return true;
1059}
1060
1061bool LegalizeBufferContentTypesVisitor::visitLoadInst(LoadInst &LI) {
1062 if (LI.getPointerAddressSpace() != AMDGPUAS::BUFFER_FAT_POINTER)
1063 return false;
1064
1065 SmallVector<uint32_t> AggIdxs;
1066 Type *OrigType = LI.getType();
1067 Value *Result = PoisonValue::get(OrigType);
1068 bool Changed = visitLoadImpl(LI, OrigType, AggIdxs, 0, Result, LI.getName());
1069 if (!Changed)
1070 return false;
1071 Result->takeName(&LI);
1072 LI.replaceAllUsesWith(Result);
1073 LI.eraseFromParent();
1074 return Changed;
1075}
1076
1077std::pair<bool, bool> LegalizeBufferContentTypesVisitor::visitStoreImpl(
1078 StoreInst &OrigSI, Type *PartType, SmallVectorImpl<uint32_t> &AggIdxs,
1079 uint64_t AggByteOff, const Twine &Name) {
1080 if (auto *ST = dyn_cast<StructType>(PartType)) {
1081 const StructLayout *Layout = DL.getStructLayout(ST);
1082 bool Changed = false;
1083 for (auto [I, ElemTy, Offset] :
1084 llvm::enumerate(ST->elements(), Layout->getMemberOffsets())) {
1085 AggIdxs.push_back(I);
1086 Changed |= std::get<0>(visitStoreImpl(OrigSI, ElemTy, AggIdxs,
1087 AggByteOff + Offset.getFixedValue(),
1088 Name + "." + Twine(I)));
1089 AggIdxs.pop_back();
1090 }
1091 return std::make_pair(Changed, /*ModifiedInPlace=*/false);
1092 }
1093 if (auto *AT = dyn_cast<ArrayType>(PartType)) {
1094 Type *ElemTy = AT->getElementType();
1095 if (!ElemTy->isSingleValueType() || !DL.typeSizeEqualsStoreSize(ElemTy) ||
1096 ElemTy->isVectorTy()) {
1097 TypeSize ElemStoreSize = DL.getTypeStoreSize(ElemTy);
1098 bool Changed = false;
1099 for (auto I : llvm::iota_range<uint32_t>(0, AT->getNumElements(),
1100 /*Inclusive=*/false)) {
1101 AggIdxs.push_back(I);
1102 Changed |= std::get<0>(visitStoreImpl(
1103 OrigSI, ElemTy, AggIdxs,
1104 AggByteOff + I * ElemStoreSize.getFixedValue(), Name + Twine(I)));
1105 AggIdxs.pop_back();
1106 }
1107 return std::make_pair(Changed, /*ModifiedInPlace=*/false);
1108 }
1109 }
1110
1111 Value *OrigData = OrigSI.getValueOperand();
1112 Value *NewData = OrigData;
1113
1114 bool IsAggPart = !AggIdxs.empty();
1115 if (IsAggPart)
1116 NewData = IRB.CreateExtractValue(NewData, AggIdxs, Name);
1117
1118 Type *ArrayAsVecType = scalarArrayTypeAsVector(PartType);
1119 if (ArrayAsVecType != PartType) {
1120 NewData = arrayToVector(NewData, ArrayAsVecType, Name);
1121 }
1122
1123 Type *LegalType = legalNonAggregateFor(ArrayAsVecType);
1124 if (LegalType != ArrayAsVecType) {
1125 NewData = makeLegalNonAggregate(NewData, LegalType, Name);
1126 }
1127
1128 SmallVector<VecSlice> Slices;
1129 getVecSlices(LegalType, Slices);
1130 bool NeedToSplit = Slices.size() > 1 || IsAggPart;
1131 if (!NeedToSplit) {
1132 Type *StorableType = intrinsicTypeFor(LegalType);
1133 if (StorableType == PartType)
1134 return std::make_pair(/*Changed=*/false, /*ModifiedInPlace=*/false);
1135 NewData = IRB.CreateBitCast(NewData, StorableType, Name + ".storable");
1136 OrigSI.setOperand(0, NewData);
1137 return std::make_pair(/*Changed=*/true, /*ModifiedInPlace=*/true);
1138 }
1139
1140 Value *OrigPtr = OrigSI.getPointerOperand();
1141 Type *ElemType = LegalType->getScalarType();
1142 if (IsAggPart && Slices.empty())
1143 Slices.push_back(VecSlice{/*Index=*/0, /*Length=*/1});
1144 unsigned ElemBytes = DL.getTypeStoreSize(ElemType);
1145 AAMDNodes AANodes = OrigSI.getAAMetadata();
1146 for (VecSlice S : Slices) {
1147 Type *SliceType =
1148 S.Length != 1 ? FixedVectorType::get(ElemType, S.Length) : ElemType;
1149 int64_t ByteOffset = AggByteOff + S.Index * ElemBytes;
1150 Value *NewPtr =
1151 IRB.CreateGEP(IRB.getInt8Ty(), OrigPtr, IRB.getInt32(ByteOffset),
1152 OrigPtr->getName() + ".part." + Twine(S.Index),
1153 GEPNoWrapFlags::noUnsignedWrap());
1154 Value *DataSlice = extractSlice(NewData, S, Name);
1155 Type *StorableType = intrinsicTypeFor(SliceType);
1156 DataSlice = IRB.CreateBitCast(DataSlice, StorableType,
1157 DataSlice->getName() + ".storable");
1158 auto *NewSI = cast<StoreInst>(OrigSI.clone());
1159 NewSI->setAlignment(commonAlignment(OrigSI.getAlign(), ByteOffset));
1160 IRB.Insert(NewSI);
1161 NewSI->setOperand(0, DataSlice);
1162 NewSI->setOperand(1, NewPtr);
1163 NewSI->setAAMetadata(AANodes.adjustForAccess(ByteOffset, StorableType, DL));
1164 }
1165 return std::make_pair(/*Changed=*/true, /*ModifiedInPlace=*/false);
1166}
1167
1168bool LegalizeBufferContentTypesVisitor::visitStoreInst(StoreInst &SI) {
1169 if (SI.getPointerAddressSpace() != AMDGPUAS::BUFFER_FAT_POINTER)
1170 return false;
1171 IRB.SetInsertPoint(&SI);
1172 SmallVector<uint32_t> AggIdxs;
1173 Value *OrigData = SI.getValueOperand();
1174 auto [Changed, ModifiedInPlace] =
1175 visitStoreImpl(SI, OrigData->getType(), AggIdxs, 0, OrigData->getName());
1176 if (Changed && !ModifiedInPlace)
1177 SI.eraseFromParent();
1178 return Changed;
1179}
1180
1181bool LegalizeBufferContentTypesVisitor::processFunction(Function &F) {
1182 bool Changed = false;
1183 // Note, memory transfer intrinsics won't
1184 for (Instruction &I : make_early_inc_range(instructions(F))) {
1185 Changed |= visit(I);
1186 }
1187 return Changed;
1188}
1189
1190/// Return the ptr addrspace(8) and i32 (resource and offset parts) in a lowered
1191/// buffer fat pointer constant.
1192static std::pair<Constant *, Constant *>
1193 splitLoweredFatBufferConst(Constant *C) {
1194 assert(isSplitFatPtr(C->getType()) && "Not a split fat buffer pointer");
1195 return std::make_pair(C->getAggregateElement(0u), C->getAggregateElement(1u));
1196}
1197
1198namespace {
1199/// Handle the remapping of ptr addrspace(7) constants.
1200class FatPtrConstMaterializer final : public ValueMaterializer {
1201 BufferFatPtrToStructTypeMap *TypeMap;
1202 // An internal mapper that is used to recurse into the arguments of constants.
1203 // While the documentation for `ValueMapper` specifies not to use it
1204 // recursively, examination of the logic in mapValue() shows that it can
1205 // safely be used recursively when handling constants, like it does in its own
1206 // logic.
1207 ValueMapper InternalMapper;
1208
1209 Constant *materializeBufferFatPtrConst(Constant *C);
1210
1211public:
1212 // UnderlyingMap is the value map this materializer will be filling.
1213 FatPtrConstMaterializer(BufferFatPtrToStructTypeMap *TypeMap,
1214 ValueToValueMapTy &UnderlyingMap)
1215 : TypeMap(TypeMap),
1216 InternalMapper(UnderlyingMap, RF_None, TypeMap, this) {}
1217 ~FatPtrConstMaterializer() = default;
1218
1219 Value *materialize(Value *V) override;
1220};
1221} // namespace
1222
1223Constant *FatPtrConstMaterializer::materializeBufferFatPtrConst(Constant *C) {
1224 Type *SrcTy = C->getType();
1225 auto *NewTy = dyn_cast<StructType>(TypeMap->remapType(SrcTy));
1226 if (C->isNullValue())
1227 return ConstantAggregateZero::getNullValue(NewTy);
1228 if (isa<PoisonValue>(C)) {
1229 return ConstantStruct::get(NewTy,
1230 {PoisonValue::get(NewTy->getElementType(0)),
1231 PoisonValue::get(NewTy->getElementType(1))});
1232 }
1233 if (isa<UndefValue>(C)) {
1234 return ConstantStruct::get(NewTy,
1235 {UndefValue::get(NewTy->getElementType(0)),
1236 UndefValue::get(NewTy->getElementType(1))});
1237 }
1238
1239 if (auto *VC = dyn_cast<ConstantVector>(C)) {
1240 if (Constant *S = VC->getSplatValue()) {
1241 Constant *NewS = InternalMapper.mapConstant(*S);
1242 if (!NewS)
1243 return nullptr;
1244 auto [Rsrc, Off] = splitLoweredFatBufferConst(NewS);
1245 auto EC = VC->getType()->getElementCount();
1246 return ConstantStruct::get(NewTy, {ConstantVector::getSplat(EC, Rsrc),
1247 ConstantVector::getSplat(EC, Off)});
1248 }
1249 SmallVector<Constant *> Rsrcs;
1250 SmallVector<Constant *> Offs;
1251 for (Value *Op : VC->operand_values()) {
1252 auto *NewOp = dyn_cast_or_null<Constant>(InternalMapper.mapValue(*Op));
1253 if (!NewOp)
1254 return nullptr;
1255 auto [Rsrc, Off] = splitLoweredFatBufferConst(NewOp);
1256 Rsrcs.push_back(Rsrc);
1257 Offs.push_back(Off);
1258 }
1259 Constant *RsrcVec = ConstantVector::get(Rsrcs);
1260 Constant *OffVec = ConstantVector::get(Offs);
1261 return ConstantStruct::get(NewTy, {RsrcVec, OffVec});
1262 }
1263
1264 if (isa<GlobalValue>(C))
1265 reportFatalUsageError("global values containing ptr addrspace(7) (buffer "
1266 "fat pointer) values are not supported");
1267
1268 if (isa<ConstantExpr>(C))
1269 reportFatalUsageError(
1270 "constant exprs containing ptr addrspace(7) (buffer "
1271 "fat pointer) values should have been expanded earlier");
1272
1273 return nullptr;
1274}
1275
1276Value *FatPtrConstMaterializer::materialize(Value *V) {
1277 auto *C = dyn_cast<Constant>(V);
1278 if (!C)
1279 return nullptr;
1280 // Structs and other types that happen to contain fat pointers get remapped
1281 // by the mapValue() logic.
1282 if (!isBufferFatPtrConst(C))
1283 return nullptr;
1284 return materializeBufferFatPtrConst(C);
1285}
1286
1287using PtrParts = std::pair<Value *, Value *>;
1288namespace {
1289// The visitor returns the resource and offset parts for an instruction if they
1290// can be computed, or (nullptr, nullptr) for cases that don't have a meaningful
1291// value mapping.
1292class SplitPtrStructs : public InstVisitor<SplitPtrStructs, PtrParts> {
1293 ValueToValueMapTy RsrcParts;
1294 ValueToValueMapTy OffParts;
1295
1296 // Track instructions that have been rewritten into a user of the component
1297 // parts of their ptr addrspace(7) input. Instructions that produced
1298 // ptr addrspace(7) parts should **not** be RAUW'd before being added to this
1299 // set, as that replacement will be handled in a post-visit step. However,
1300 // instructions that yield values that aren't fat pointers (ex. ptrtoint)
1301 // should RAUW themselves with new instructions that use the split parts
1302 // of their arguments during processing.
1303 DenseSet<Instruction *> SplitUsers;
1304
1305 // Nodes that need a second look once we've computed the parts for all other
1306 // instructions to see if, for example, we really need to phi on the resource
1307 // part.
1308 SmallVector<Instruction *> Conditionals;
1309 // Temporary instructions produced while lowering conditionals that should be
1310 // killed.
1311 SmallVector<Instruction *> ConditionalTemps;
1312
1313 // Subtarget info, needed for determining what cache control bits to set.
1314 const TargetMachine *TM;
1315 const GCNSubtarget *ST = nullptr;
1316
1318
1319 // Copy metadata between instructions if applicable.
1320 void copyMetadata(Value *Dest, Value *Src);
1321
1322 // Get the resource and offset parts of the value V, inserting appropriate
1323 // extractvalue calls if needed.
1324 PtrParts getPtrParts(Value *V);
1325
1326 // Given an instruction that could produce multiple resource parts (a PHI or
1327 // select), collect the set of possible instructions that could have provided
1328 // its resource part (the `Roots`) and the set of
1329 // conditional instructions visited during the search (`Seen`). If, after
1330 // removing the root of the search from `Seen` and `Roots`, `Seen` is a subset
1331 // of `Roots` and `Roots - Seen` contains one element, the resource part of
1332 // that element can replace the resource part of all other elements in `Seen`.
1333 void getPossibleRsrcRoots(Instruction *I, SmallPtrSetImpl<Value *> &Roots,
1334 SmallPtrSetImpl<Value *> &Seen);
1335 void processConditionals();
1336
1337 // If an instruction has been split into resource and offset parts,
1338 // delete that instruction. If any of its uses have not themselves been split
1339 // into parts (for example, an insertvalue), construct the structure
1340 // that the type rewrites declared should be produced by the dying instruction
1341 // and use that.
1342 // Also, kill the temporary extractvalue operations produced by the two-stage
1343 // lowering of PHIs and conditionals.
1344 void killAndReplaceSplitInstructions(SmallVectorImpl<Instruction *> &Origs);
1345
1346 void setAlign(CallInst *Intr, Align A, unsigned RsrcArgIdx);
1347 void insertPreMemOpFence(AtomicOrdering Order, SyncScope::ID SSID);
1348 void insertPostMemOpFence(AtomicOrdering Order, SyncScope::ID SSID);
1349 Value *handleMemoryInst(Instruction *I, Value *Arg, Value *Ptr, Type *Ty,
1350 Align Alignment, AtomicOrdering Order,
1351 bool IsVolatile, SyncScope::ID SSID);
1352
1353public:
1354 SplitPtrStructs(const DataLayout &DL, LLVMContext &Ctx,
1355 const TargetMachine *TM)
1356 : TM(TM), IRB(Ctx, InstSimplifyFolder(DL)) {}
1357
1358 void processFunction(Function &F);
1359
1360 PtrParts visitInstruction(Instruction &I);
1361 PtrParts visitLoadInst(LoadInst &LI);
1362 PtrParts visitStoreInst(StoreInst &SI);
1363 PtrParts visitAtomicRMWInst(AtomicRMWInst &AI);
1364 PtrParts visitAtomicCmpXchgInst(AtomicCmpXchgInst &AI);
1365 PtrParts visitGetElementPtrInst(GetElementPtrInst &GEP);
1366
1367 PtrParts visitPtrToAddrInst(PtrToAddrInst &PA);
1368 PtrParts visitPtrToIntInst(PtrToIntInst &PI);
1369 PtrParts visitIntToPtrInst(IntToPtrInst &IP);
1370 PtrParts visitAddrSpaceCastInst(AddrSpaceCastInst &I);
1371 PtrParts visitICmpInst(ICmpInst &Cmp);
1372 PtrParts visitFreezeInst(FreezeInst &I);
1373
1374 PtrParts visitExtractElementInst(ExtractElementInst &I);
1375 PtrParts visitInsertElementInst(InsertElementInst &I);
1376 PtrParts visitShuffleVectorInst(ShuffleVectorInst &I);
1377
1378 PtrParts visitPHINode(PHINode &PHI);
1379 PtrParts visitSelectInst(SelectInst &SI);
1380
1381 PtrParts visitIntrinsicInst(IntrinsicInst &II);
1382};
1383} // namespace
1384
1385void SplitPtrStructs::copyMetadata(Value *Dest, Value *Src) {
1386 auto *DestI = dyn_cast<Instruction>(Dest);
1387 auto *SrcI = dyn_cast<Instruction>(Src);
1388
1389 if (!DestI || !SrcI)
1390 return;
1391
1392 DestI->copyMetadata(*SrcI);
1393}
1394
1395PtrParts SplitPtrStructs::getPtrParts(Value *V) {
1396 assert(isSplitFatPtr(V->getType()) && "it's not meaningful to get the parts "
1397 "of something that wasn't rewritten");
1398 auto *RsrcEntry = &RsrcParts[V];
1399 auto *OffEntry = &OffParts[V];
1400 if (*RsrcEntry && *OffEntry)
1401 return {*RsrcEntry, *OffEntry};
1402
1403 if (auto *C = dyn_cast<Constant>(V)) {
1404 auto [Rsrc, Off] = splitLoweredFatBufferConst(C);
1405 return {*RsrcEntry = Rsrc, *OffEntry = Off};
1406 }
1407
1408 IRBuilder<InstSimplifyFolder>::InsertPointGuard Guard(IRB);
1409 if (auto *I = dyn_cast<Instruction>(V)) {
1410 LLVM_DEBUG(dbgs() << "Recursing to split parts of " << *I << "\n");
1411 auto [Rsrc, Off] = visit(*I);
1412 if (Rsrc && Off)
1413 return {*RsrcEntry = Rsrc, *OffEntry = Off};
1414 // We'll be creating the new values after the relevant instruction.
1415 // This instruction generates a value and so isn't a terminator.
1416 IRB.SetInsertPoint(*I->getInsertionPointAfterDef());
1417 IRB.SetCurrentDebugLocation(I->getDebugLoc());
1418 } else if (auto *A = dyn_cast<Argument>(V)) {
1419 IRB.SetInsertPointPastAllocas(A->getParent());
1420 IRB.SetCurrentDebugLocation(DebugLoc());
1421 }
1422 Value *Rsrc = IRB.CreateExtractValue(V, 0, V->getName() + ".rsrc");
1423 Value *Off = IRB.CreateExtractValue(V, 1, V->getName() + ".off");
1424 return {*RsrcEntry = Rsrc, *OffEntry = Off};
1425}
1426
1427/// Returns the instruction that defines the resource part of the value V.
1428/// Note that this is not getUnderlyingObject(), since that looks through
1429/// operations like ptrmask which might modify the resource part.
1430///
1431/// We can limit ourselves to just looking through GEPs followed by looking
1432/// through addrspacecasts because only those two operations preserve the
1433/// resource part, and because operations on an `addrspace(8)` (which is the
1434/// legal input to this addrspacecast) would produce a different resource part.
1435 static Value *rsrcPartRoot(Value *V) {
1436 while (auto *GEP = dyn_cast<GEPOperator>(V))
1437 V = GEP->getPointerOperand();
1438 while (auto *ASC = dyn_cast<AddrSpaceCastOperator>(V))
1439 V = ASC->getPointerOperand();
1440 return V;
1441}
1442
1443void SplitPtrStructs::getPossibleRsrcRoots(Instruction *I,
1444 SmallPtrSetImpl<Value *> &Roots,
1445 SmallPtrSetImpl<Value *> &Seen) {
1446 if (auto *PHI = dyn_cast<PHINode>(I)) {
1447 if (!Seen.insert(I).second)
1448 return;
1449 for (Value *In : PHI->incoming_values()) {
1450 In = rsrcPartRoot(In);
1451 Roots.insert(In);
1452 if (isa<PHINode, SelectInst>(In))
1453 getPossibleRsrcRoots(cast<Instruction>(In), Roots, Seen);
1454 }
1455 } else if (auto *SI = dyn_cast<SelectInst>(I)) {
1456 if (!Seen.insert(SI).second)
1457 return;
1458 Value *TrueVal = rsrcPartRoot(SI->getTrueValue());
1459 Value *FalseVal = rsrcPartRoot(SI->getFalseValue());
1460 Roots.insert(TrueVal);
1461 Roots.insert(FalseVal);
1462 if (isa<PHINode, SelectInst>(TrueVal))
1463 getPossibleRsrcRoots(cast<Instruction>(TrueVal), Roots, Seen);
1464 if (isa<PHINode, SelectInst>(FalseVal))
1465 getPossibleRsrcRoots(cast<Instruction>(FalseVal), Roots, Seen);
1466 } else {
1467 llvm_unreachable("getPossibleRsrcParts() only works on phi and select");
1468 }
1469}
1470
1471void SplitPtrStructs::processConditionals() {
1472 SmallDenseMap<Value *, Value *> FoundRsrcs;
1473 SmallPtrSet<Value *, 4> Roots;
1474 SmallPtrSet<Value *, 4> Seen;
1475 for (Instruction *I : Conditionals) {
1476 // These have to exist by now because we've visited these nodes.
1477 Value *Rsrc = RsrcParts[I];
1478 Value *Off = OffParts[I];
1479 assert(Rsrc && Off && "must have visited conditionals by now");
1480
1481 std::optional<Value *> MaybeRsrc;
1482 auto MaybeFoundRsrc = FoundRsrcs.find(I);
1483 if (MaybeFoundRsrc != FoundRsrcs.end()) {
1484 MaybeRsrc = MaybeFoundRsrc->second;
1485 } else {
1486 IRBuilder<InstSimplifyFolder>::InsertPointGuard Guard(IRB);
1487 Roots.clear();
1488 Seen.clear();
1489 getPossibleRsrcRoots(I, Roots, Seen);
1490 LLVM_DEBUG(dbgs() << "Processing conditional: " << *I << "\n");
1491#ifndef NDEBUG
1492 for (Value *V : Roots)
1493 LLVM_DEBUG(dbgs() << "Root: " << *V << "\n");
1494 for (Value *V : Seen)
1495 LLVM_DEBUG(dbgs() << "Seen: " << *V << "\n");
1496#endif
1497 // If we are our own possible root, then we shouldn't block our
1498 // replacement with a valid incoming value.
1499 Roots.erase(I);
1500 // We don't want to block the optimization for conditionals that don't
1501 // refer to themselves but did see themselves during the traversal.
1502 Seen.erase(I);
1503
1504 if (set_is_subset(Seen, Roots)) {
1505 auto Diff = set_difference(Roots, Seen);
1506 if (Diff.size() == 1) {
1507 Value *RootVal = *Diff.begin();
1508 // Handle the case where previous loops already looked through
1509 // an addrspacecast.
1510 if (isSplitFatPtr(RootVal->getType()))
1511 MaybeRsrc = std::get<0>(getPtrParts(RootVal));
1512 else
1513 MaybeRsrc = RootVal;
1514 }
1515 }
1516 }
1517
1518 if (auto *PHI = dyn_cast<PHINode>(I)) {
1519 Value *NewRsrc;
1520 StructType *PHITy = cast<StructType>(PHI->getType());
1521 IRB.SetInsertPoint(*PHI->getInsertionPointAfterDef());
1522 IRB.SetCurrentDebugLocation(PHI->getDebugLoc());
1523 if (MaybeRsrc) {
1524 NewRsrc = *MaybeRsrc;
1525 } else {
1526 Type *RsrcTy = PHITy->getElementType(0);
1527 auto *RsrcPHI = IRB.CreatePHI(RsrcTy, PHI->getNumIncomingValues());
1528 RsrcPHI->takeName(Rsrc);
1529 for (auto [V, BB] : llvm::zip(PHI->incoming_values(), PHI->blocks())) {
1530 Value *VRsrc = std::get<0>(getPtrParts(V));
1531 RsrcPHI->addIncoming(VRsrc, BB);
1532 }
1533 copyMetadata(RsrcPHI, PHI);
1534 NewRsrc = RsrcPHI;
1535 }
1536
1537 Type *OffTy = PHITy->getElementType(1);
1538 auto *NewOff = IRB.CreatePHI(OffTy, PHI->getNumIncomingValues());
1539 NewOff->takeName(Off);
1540 for (auto [V, BB] : llvm::zip(PHI->incoming_values(), PHI->blocks())) {
1541 assert(OffParts.count(V) && "An offset part had to be created by now");
1542 Value *VOff = std::get<1>(getPtrParts(V));
1543 NewOff->addIncoming(VOff, BB);
1544 }
1545 copyMetadata(NewOff, PHI);
1546
1547 // Note: We don't eraseFromParent() the temporaries because we don't want
1548 // to put the correction maps in an inconsistent state. That'll be handled
1549 // during the rest of the killing. Also, `ValueToValueMapTy` guarantees
1550 // that references in that map will be updated as well.
1551 // Note that if the temporary instruction got `InstSimplify`'d away, it
1552 // might be something like a block argument.
1553 if (auto *RsrcInst = dyn_cast<Instruction>(Rsrc)) {
1554 ConditionalTemps.push_back(RsrcInst);
1555 RsrcInst->replaceAllUsesWith(NewRsrc);
1556 }
1557 if (auto *OffInst = dyn_cast<Instruction>(Off)) {
1558 ConditionalTemps.push_back(OffInst);
1559 OffInst->replaceAllUsesWith(NewOff);
1560 }
1561
1562 // Save on recomputing the cycle traversals in known-root cases.
1563 if (MaybeRsrc)
1564 for (Value *V : Seen)
1565 FoundRsrcs[V] = NewRsrc;
1566 } else if (isa<SelectInst>(I)) {
1567 if (MaybeRsrc) {
1568 if (auto *RsrcInst = dyn_cast<Instruction>(Rsrc)) {
1569 // Guard against conditionals that were already folded away.
1570 if (RsrcInst != *MaybeRsrc) {
1571 ConditionalTemps.push_back(RsrcInst);
1572 RsrcInst->replaceAllUsesWith(*MaybeRsrc);
1573 }
1574 }
1575 for (Value *V : Seen)
1576 FoundRsrcs[V] = *MaybeRsrc;
1577 }
1578 } else {
1579 llvm_unreachable("Only PHIs and selects go in the conditionals list");
1580 }
1581 }
1582}
1583
1584void SplitPtrStructs::killAndReplaceSplitInstructions(
1585 SmallVectorImpl<Instruction *> &Origs) {
1586 for (Instruction *I : ConditionalTemps)
1587 I->eraseFromParent();
1588
1589 for (Instruction *I : Origs) {
1590 if (!SplitUsers.contains(I))
1591 continue;
1592
1593 SmallVector<DbgVariableRecord *> Dbgs;
1594 findDbgValues(I, Dbgs);
1595 for (DbgVariableRecord *Dbg : Dbgs) {
1596 auto &DL = I->getDataLayout();
1597 assert(isSplitFatPtr(I->getType()) &&
1598 "We should've RAUW'd away loads, stores, etc. at this point");
1599 DbgVariableRecord *OffDbg = Dbg->clone();
1600 auto [Rsrc, Off] = getPtrParts(I);
1601
1602 int64_t RsrcSz = DL.getTypeSizeInBits(Rsrc->getType());
1603 int64_t OffSz = DL.getTypeSizeInBits(Off->getType());
1604
1605 std::optional<DIExpression *> RsrcExpr =
1606 DIExpression::createFragmentExpression(Dbg->getExpression(), 0,
1607 RsrcSz);
1608 std::optional<DIExpression *> OffExpr =
1609 DIExpression::createFragmentExpression(Dbg->getExpression(), RsrcSz,
1610 OffSz);
1611 if (OffExpr) {
1612 OffDbg->setExpression(*OffExpr);
1613 OffDbg->replaceVariableLocationOp(I, Off);
1614 OffDbg->insertBefore(Dbg);
1615 } else {
1616 OffDbg->eraseFromParent();
1617 }
1618 if (RsrcExpr) {
1619 Dbg->setExpression(*RsrcExpr);
1620 Dbg->replaceVariableLocationOp(I, Rsrc);
1621 } else {
1622 Dbg->replaceVariableLocationOp(I, PoisonValue::get(I->getType()));
1623 }
1624 }
1625
1626 Value *Poison = PoisonValue::get(I->getType());
1627 I->replaceUsesWithIf(Poison, [&](const Use &U) -> bool {
1628 if (const auto *UI = dyn_cast<Instruction>(U.getUser()))
1629 return SplitUsers.contains(UI);
1630 return false;
1631 });
1632
1633 if (I->use_empty()) {
1634 I->eraseFromParent();
1635 continue;
1636 }
1637 IRB.SetInsertPoint(*I->getInsertionPointAfterDef());
1638 IRB.SetCurrentDebugLocation(I->getDebugLoc());
1639 auto [Rsrc, Off] = getPtrParts(I);
1640 Value *Struct = PoisonValue::get(I->getType());
1641 Struct = IRB.CreateInsertValue(Struct, Rsrc, 0);
1642 Struct = IRB.CreateInsertValue(Struct, Off, 1);
1643 copyMetadata(Struct, I);
1644 Struct->takeName(I);
1645 I->replaceAllUsesWith(Struct);
1646 I->eraseFromParent();
1647 }
1648}
1649
void SplitPtrStructs::setAlign(CallInst *Intr, Align A, unsigned RsrcArgIdx) {
  LLVMContext &Ctx = Intr->getContext();
  Intr->addParamAttr(RsrcArgIdx, Attribute::getWithAlignment(Ctx, A));
}

void SplitPtrStructs::insertPreMemOpFence(AtomicOrdering Order,
                                          SyncScope::ID SSID) {
  switch (Order) {
  case AtomicOrdering::Release:
  case AtomicOrdering::AcquireRelease:
  case AtomicOrdering::SequentiallyConsistent:
    IRB.CreateFence(AtomicOrdering::Release, SSID);
    break;
  default:
    break;
  }
}

void SplitPtrStructs::insertPostMemOpFence(AtomicOrdering Order,
                                           SyncScope::ID SSID) {
  switch (Order) {
  case AtomicOrdering::Acquire:
  case AtomicOrdering::AcquireRelease:
  case AtomicOrdering::SequentiallyConsistent:
    IRB.CreateFence(AtomicOrdering::Acquire, SSID);
    break;
  default:
    break;
  }
}

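// Illustrative sketch (not part of the source): these two helpers bracket the
// rewritten memory operation, so a sequentially consistent atomic load such as
//   %v = load atomic i32, ptr addrspace(7) %p seq_cst, align 4
// is lowered, roughly, to
//   fence release
//   %v = call i32 @llvm.amdgcn.raw.ptr.atomic.buffer.load.i32(...)
//   fence acquire
// with both fences carrying the original operation's sync scope (SSID).
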
Value *SplitPtrStructs::handleMemoryInst(Instruction *I, Value *Arg, Value *Ptr,
                                         Type *Ty, Align Alignment,
                                         AtomicOrdering Order, bool IsVolatile,
                                         SyncScope::ID SSID) {
  IRB.SetInsertPoint(I);

  auto [Rsrc, Off] = getPtrParts(Ptr);
  SmallVector<Value *, 5> Args;
  if (Arg)
    Args.push_back(Arg);
  Args.push_back(Rsrc);
  Args.push_back(Off);
  insertPreMemOpFence(Order, SSID);
  // soffset is always 0 for these cases, where we always want any offset to be
  // part of bounds checking and we don't know which parts of the GEPs are
  // uniform.
  Args.push_back(IRB.getInt32(0));

  uint32_t Aux = 0;
  if (IsVolatile)
    Aux |= AMDGPU::CPol::VOLATILE;
  Args.push_back(IRB.getInt32(Aux));

  Intrinsic::ID IID = Intrinsic::not_intrinsic;
  if (isa<LoadInst>(I))
    IID = Order == AtomicOrdering::NotAtomic
              ? Intrinsic::amdgcn_raw_ptr_buffer_load
              : Intrinsic::amdgcn_raw_ptr_atomic_buffer_load;
  else if (isa<StoreInst>(I))
    IID = Intrinsic::amdgcn_raw_ptr_buffer_store;
  else if (auto *RMW = dyn_cast<AtomicRMWInst>(I)) {
    switch (RMW->getOperation()) {
    case AtomicRMWInst::Xchg:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_swap;
      break;
    case AtomicRMWInst::Add:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_add;
      break;
    case AtomicRMWInst::Sub:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_sub;
      break;
    case AtomicRMWInst::And:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_and;
      break;
    case AtomicRMWInst::Or:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_or;
      break;
    case AtomicRMWInst::Xor:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_xor;
      break;
    case AtomicRMWInst::Max:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_smax;
      break;
    case AtomicRMWInst::Min:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_smin;
      break;
    case AtomicRMWInst::UMax:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_umax;
      break;
    case AtomicRMWInst::UMin:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_umin;
      break;
    case AtomicRMWInst::FAdd:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_fadd;
      break;
    case AtomicRMWInst::FMax:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_fmax;
      break;
    case AtomicRMWInst::FMin:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_fmin;
      break;
    case AtomicRMWInst::USubCond:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_cond_sub_u32;
      break;
    case AtomicRMWInst::USubSat:
      IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_sub_clamp_u32;
      break;
    case AtomicRMWInst::FSub: {
      reportFatalUsageError(
          "atomic floating point subtraction not supported for "
          "buffer resources and should've been expanded away");
      break;
    }
    case AtomicRMWInst::FMaximum: {
      reportFatalUsageError(
          "atomic floating point fmaximum not supported for "
          "buffer resources and should've been expanded away");
      break;
    }
    case AtomicRMWInst::FMinimum: {
      reportFatalUsageError(
          "atomic floating point fminimum not supported for "
          "buffer resources and should've been expanded away");
      break;
    }
    case AtomicRMWInst::Nand:
      reportFatalUsageError(
          "atomic nand not supported for buffer resources and "
          "should've been expanded away");
      break;
    case AtomicRMWInst::UIncWrap:
    case AtomicRMWInst::UDecWrap:
      reportFatalUsageError(
          "wrapping increment/decrement not supported for "
          "buffer resources and should've been expanded away");
      break;
    case AtomicRMWInst::BAD_BINOP:
      llvm_unreachable("Not sure how we got a bad binop");
    }
  }

  auto *Call = IRB.CreateIntrinsic(IID, Ty, Args);
  copyMetadata(Call, I);
  setAlign(Call, Alignment, Arg ? 1 : 0);
  Call->takeName(I);

  insertPostMemOpFence(Order, SSID);
  // The "no moving p7 directly" rewrites ensure that this load or store won't
  // itself need to be split into parts.
  SplitUsers.insert(I);
  I->replaceAllUsesWith(Call);
  return Call;
}

PtrParts SplitPtrStructs::visitInstruction(Instruction &I) {
  return {nullptr, nullptr};
}

PtrParts SplitPtrStructs::visitLoadInst(LoadInst &LI) {
  if (!isSplitFatPtr(LI.getPointerOperandType()))
    return {nullptr, nullptr};
  handleMemoryInst(&LI, nullptr, LI.getPointerOperand(), LI.getType(),
                   LI.getAlign(), LI.getOrdering(), LI.isVolatile(),
                   LI.getSyncScopeID());
  return {nullptr, nullptr};
}

PtrParts SplitPtrStructs::visitStoreInst(StoreInst &SI) {
  if (!isSplitFatPtr(SI.getPointerOperandType()))
    return {nullptr, nullptr};
  Value *Arg = SI.getValueOperand();
  handleMemoryInst(&SI, Arg, SI.getPointerOperand(), Arg->getType(),
                   SI.getAlign(), SI.getOrdering(), SI.isVolatile(),
                   SI.getSyncScopeID());
  return {nullptr, nullptr};
}

PtrParts SplitPtrStructs::visitAtomicRMWInst(AtomicRMWInst &AI) {
  if (!isSplitFatPtr(AI.getPointerOperand()->getType()))
    return {nullptr, nullptr};
  Value *Arg = AI.getValOperand();
  handleMemoryInst(&AI, Arg, AI.getPointerOperand(), Arg->getType(),
                   AI.getAlign(), AI.getOrdering(), AI.isVolatile(),
                   AI.getSyncScopeID());
  return {nullptr, nullptr};
}

// Unlike load, store, and RMW, cmpxchg needs special handling to account
// for the boolean argument.
PtrParts SplitPtrStructs::visitAtomicCmpXchgInst(AtomicCmpXchgInst &AI) {
  Value *Ptr = AI.getPointerOperand();
  if (!isSplitFatPtr(Ptr->getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(&AI);

  Type *Ty = AI.getNewValOperand()->getType();
  AtomicOrdering Order = AI.getMergedOrdering();
  SyncScope::ID SSID = AI.getSyncScopeID();
  bool IsNonTemporal = AI.getMetadata(LLVMContext::MD_nontemporal);

  auto [Rsrc, Off] = getPtrParts(Ptr);
  insertPreMemOpFence(Order, SSID);

  uint32_t Aux = 0;
  if (IsNonTemporal)
    Aux |= AMDGPU::CPol::SLC;
  if (AI.isVolatile())
    Aux |= AMDGPU::CPol::VOLATILE;
  auto *Call =
      IRB.CreateIntrinsic(Intrinsic::amdgcn_raw_ptr_buffer_atomic_cmpswap, Ty,
                          {AI.getNewValOperand(), AI.getCompareOperand(), Rsrc,
                           Off, IRB.getInt32(0), IRB.getInt32(Aux)});
  copyMetadata(Call, &AI);
  setAlign(Call, AI.getAlign(), 2);
  Call->takeName(&AI);
  insertPostMemOpFence(Order, SSID);

  Value *Res = PoisonValue::get(AI.getType());
  Res = IRB.CreateInsertValue(Res, Call, 0);
  if (!AI.isWeak()) {
    Value *Succeeded = IRB.CreateICmpEQ(Call, AI.getCompareOperand());
    Res = IRB.CreateInsertValue(Res, Succeeded, 1);
  }
  SplitUsers.insert(&AI);
  AI.replaceAllUsesWith(Res);
  return {nullptr, nullptr};
}

PtrParts SplitPtrStructs::visitGetElementPtrInst(GetElementPtrInst &GEP) {
  using namespace llvm::PatternMatch;
  Value *Ptr = GEP.getPointerOperand();
  if (!isSplitFatPtr(Ptr->getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(&GEP);

  auto [Rsrc, Off] = getPtrParts(Ptr);
  const DataLayout &DL = GEP.getDataLayout();
  bool IsNUW = GEP.hasNoUnsignedWrap();
  bool IsNUSW = GEP.hasNoUnsignedSignedWrap();

  StructType *ResTy = cast<StructType>(GEP.getType());
  Type *ResRsrcTy = ResTy->getElementType(0);
  VectorType *ResRsrcVecTy = dyn_cast<VectorType>(ResRsrcTy);
  bool BroadcastsPtr = ResRsrcVecTy && !isa<VectorType>(Off->getType());

  // In order to call emitGEPOffset() and thus not have to reimplement it,
  // we need the GEP result to have ptr addrspace(7) type.
  Type *FatPtrTy =
      ResRsrcTy->getWithNewType(IRB.getPtrTy(AMDGPUAS::BUFFER_FAT_POINTER));
  GEP.mutateType(FatPtrTy);
  Value *OffAccum = emitGEPOffset(&IRB, DL, &GEP);
  GEP.mutateType(ResTy);

  if (BroadcastsPtr) {
    Rsrc = IRB.CreateVectorSplat(ResRsrcVecTy->getElementCount(), Rsrc,
                                 Rsrc->getName());
    Off = IRB.CreateVectorSplat(ResRsrcVecTy->getElementCount(), Off,
                                Off->getName());
  }
  if (match(OffAccum, m_Zero())) { // Constant-zero offset
    SplitUsers.insert(&GEP);
    return {Rsrc, Off};
  }

  bool HasNonNegativeOff = false;
  if (auto *CI = dyn_cast<ConstantInt>(OffAccum)) {
    HasNonNegativeOff = !CI->isNegative();
  }
  Value *NewOff;
  if (match(Off, m_Zero())) {
    NewOff = OffAccum;
  } else {
    NewOff = IRB.CreateAdd(Off, OffAccum, "",
                           /*hasNUW=*/IsNUW || (IsNUSW && HasNonNegativeOff),
                           /*hasNSW=*/false);
  }
  copyMetadata(NewOff, &GEP);
  NewOff->takeName(&GEP);
  SplitUsers.insert(&GEP);
  return {Rsrc, NewOff};
}

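// Illustrative sketch (not part of the source): since pointer arithmetic only
// changes the 32-bit offset, a GEP such as
//   %q = getelementptr i32, ptr addrspace(7) %p, i32 4
// splits, roughly, into
//   %q.rsrc = %p.rsrc            ; resource part is unchanged
//   %q.off = add i32 %p.off, 16  ; byte offset computed by emitGEPOffset()
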
PtrParts SplitPtrStructs::visitPtrToIntInst(PtrToIntInst &PI) {
  Value *Ptr = PI.getPointerOperand();
  if (!isSplitFatPtr(Ptr->getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(&PI);

  Type *ResTy = PI.getType();
  unsigned Width = ResTy->getScalarSizeInBits();

  auto [Rsrc, Off] = getPtrParts(Ptr);
  const DataLayout &DL = PI.getDataLayout();
  unsigned FatPtrWidth = DL.getPointerSizeInBits(AMDGPUAS::BUFFER_FAT_POINTER);

  Value *Res;
  if (Width <= BufferOffsetWidth) {
    Res = IRB.CreateIntCast(Off, ResTy, /*isSigned=*/false,
                            PI.getName() + ".off");
  } else {
    Value *RsrcInt = IRB.CreatePtrToInt(Rsrc, ResTy, PI.getName() + ".rsrc");
    Value *Shl = IRB.CreateShl(
        RsrcInt,
        ConstantExpr::getIntegerValue(ResTy, APInt(Width, BufferOffsetWidth)),
        "", Width >= FatPtrWidth, Width > FatPtrWidth);
    Value *OffCast = IRB.CreateIntCast(Off, ResTy, /*isSigned=*/false,
                                       PI.getName() + ".off");
    Res = IRB.CreateOr(Shl, OffCast);
  }

  copyMetadata(Res, &PI);
  Res->takeName(&PI);
  SplitUsers.insert(&PI);
  PI.replaceAllUsesWith(Res);
  return {nullptr, nullptr};
}

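// Illustrative sketch (not part of the source): for a full-width (160-bit)
// ptrtoint result, the resource part occupies the high bits and the offset
// the low BufferOffsetWidth (32) bits, i.e. roughly
//   result = (zext(rsrc) << 32) | zext(off)
// while results of 32 bits or fewer are taken from the offset alone.
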
PtrParts SplitPtrStructs::visitPtrToAddrInst(PtrToAddrInst &PA) {
  Value *Ptr = PA.getPointerOperand();
  if (!isSplitFatPtr(Ptr->getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(&PA);

  auto [Rsrc, Off] = getPtrParts(Ptr);
  Value *Res = IRB.CreateIntCast(Off, PA.getType(), /*isSigned=*/false);
  copyMetadata(Res, &PA);
  Res->takeName(&PA);
  SplitUsers.insert(&PA);
  PA.replaceAllUsesWith(Res);
  return {nullptr, nullptr};
}

PtrParts SplitPtrStructs::visitIntToPtrInst(IntToPtrInst &IP) {
  if (!isSplitFatPtr(IP.getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(&IP);
  const DataLayout &DL = IP.getDataLayout();
  unsigned RsrcPtrWidth = DL.getPointerSizeInBits(AMDGPUAS::BUFFER_RESOURCE);
  Value *Int = IP.getOperand(0);
  Type *IntTy = Int->getType();
  Type *RsrcIntTy = IntTy->getWithNewBitWidth(RsrcPtrWidth);
  unsigned Width = IntTy->getScalarSizeInBits();

  auto *RetTy = cast<StructType>(IP.getType());
  Type *RsrcTy = RetTy->getElementType(0);
  Type *OffTy = RetTy->getElementType(1);
  Value *RsrcPart = IRB.CreateLShr(
      Int,
      ConstantExpr::getIntegerValue(IntTy, APInt(Width, BufferOffsetWidth)));
  Value *RsrcInt = IRB.CreateIntCast(RsrcPart, RsrcIntTy, /*isSigned=*/false);
  Value *Rsrc = IRB.CreateIntToPtr(RsrcInt, RsrcTy, IP.getName() + ".rsrc");
  Value *Off =
      IRB.CreateIntCast(Int, OffTy, /*IsSigned=*/false, IP.getName() + ".off");

  copyMetadata(Rsrc, &IP);
  SplitUsers.insert(&IP);
  return {Rsrc, Off};
}

PtrParts SplitPtrStructs::visitAddrSpaceCastInst(AddrSpaceCastInst &I) {
  // TODO(krzysz00): handle casts from ptr addrspace(7) to global pointers
  // by computing the effective address.
  if (!isSplitFatPtr(I.getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(&I);
  Value *In = I.getPointerOperand();
  // No-op casts preserve parts
  if (In->getType() == I.getType()) {
    auto [Rsrc, Off] = getPtrParts(In);
    SplitUsers.insert(&I);
    return {Rsrc, Off};
  }

  auto *ResTy = cast<StructType>(I.getType());
  Type *RsrcTy = ResTy->getElementType(0);
  Type *OffTy = ResTy->getElementType(1);
  Value *ZeroOff = Constant::getNullValue(OffTy);

  // Special case for null pointers, undef, and poison, which can be created by
  // address space propagation.
  auto *InConst = dyn_cast<Constant>(In);
  if (InConst && InConst->isNullValue()) {
    Value *NullRsrc = Constant::getNullValue(RsrcTy);
    SplitUsers.insert(&I);
    return {NullRsrc, ZeroOff};
  }
  if (isa<PoisonValue>(In)) {
    Value *PoisonRsrc = PoisonValue::get(RsrcTy);
    Value *PoisonOff = PoisonValue::get(OffTy);
    SplitUsers.insert(&I);
    return {PoisonRsrc, PoisonOff};
  }
  if (isa<UndefValue>(In)) {
    Value *UndefRsrc = UndefValue::get(RsrcTy);
    Value *UndefOff = UndefValue::get(OffTy);
    SplitUsers.insert(&I);
    return {UndefRsrc, UndefOff};
  }

  if (I.getSrcAddressSpace() != AMDGPUAS::BUFFER_RESOURCE)
    reportFatalUsageError(
        "only buffer resources (addrspace 8) and null/poison pointers can be "
        "cast to buffer fat pointers (addrspace 7)");
  SplitUsers.insert(&I);
  return {In, ZeroOff};
}

PtrParts SplitPtrStructs::visitICmpInst(ICmpInst &Cmp) {
  Value *Lhs = Cmp.getOperand(0);
  if (!isSplitFatPtr(Lhs->getType()))
    return {nullptr, nullptr};
  Value *Rhs = Cmp.getOperand(1);
  IRB.SetInsertPoint(&Cmp);
  ICmpInst::Predicate Pred = Cmp.getPredicate();

  assert((Pred == ICmpInst::ICMP_EQ || Pred == ICmpInst::ICMP_NE) &&
         "Pointer comparison is only equal or unequal");
  auto [LhsRsrc, LhsOff] = getPtrParts(Lhs);
  auto [RhsRsrc, RhsOff] = getPtrParts(Rhs);
  Value *Res = IRB.CreateICmp(Pred, LhsOff, RhsOff);
  copyMetadata(Res, &Cmp);
  Res->takeName(&Cmp);
  SplitUsers.insert(&Cmp);
  Cmp.replaceAllUsesWith(Res);
  return {nullptr, nullptr};
}

PtrParts SplitPtrStructs::visitFreezeInst(FreezeInst &I) {
  if (!isSplitFatPtr(I.getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(&I);
  auto [Rsrc, Off] = getPtrParts(I.getOperand(0));

  Value *RsrcRes = IRB.CreateFreeze(Rsrc, I.getName() + ".rsrc");
  copyMetadata(RsrcRes, &I);
  Value *OffRes = IRB.CreateFreeze(Off, I.getName() + ".off");
  copyMetadata(OffRes, &I);
  SplitUsers.insert(&I);
  return {RsrcRes, OffRes};
}

PtrParts SplitPtrStructs::visitExtractElementInst(ExtractElementInst &I) {
  if (!isSplitFatPtr(I.getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(&I);
  Value *Vec = I.getVectorOperand();
  Value *Idx = I.getIndexOperand();
  auto [Rsrc, Off] = getPtrParts(Vec);

  Value *RsrcRes = IRB.CreateExtractElement(Rsrc, Idx, I.getName() + ".rsrc");
  copyMetadata(RsrcRes, &I);
  Value *OffRes = IRB.CreateExtractElement(Off, Idx, I.getName() + ".off");
  copyMetadata(OffRes, &I);
  SplitUsers.insert(&I);
  return {RsrcRes, OffRes};
}

PtrParts SplitPtrStructs::visitInsertElementInst(InsertElementInst &I) {
  // The mutated instructions temporarily don't return vectors, and so
  // we need the generic getType() here to avoid crashes.
  if (!isSplitFatPtr(cast<Instruction>(I).getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(&I);
  Value *Vec = I.getOperand(0);
  Value *Elem = I.getOperand(1);
  Value *Idx = I.getOperand(2);
  auto [VecRsrc, VecOff] = getPtrParts(Vec);
  auto [ElemRsrc, ElemOff] = getPtrParts(Elem);

  Value *RsrcRes =
      IRB.CreateInsertElement(VecRsrc, ElemRsrc, Idx, I.getName() + ".rsrc");
  copyMetadata(RsrcRes, &I);
  Value *OffRes =
      IRB.CreateInsertElement(VecOff, ElemOff, Idx, I.getName() + ".off");
  copyMetadata(OffRes, &I);
  SplitUsers.insert(&I);
  return {RsrcRes, OffRes};
}

PtrParts SplitPtrStructs::visitShuffleVectorInst(ShuffleVectorInst &I) {
  // Cast is needed for the same reason as insertelement's.
  if (!isSplitFatPtr(cast<Instruction>(I).getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(&I);

  Value *V1 = I.getOperand(0);
  Value *V2 = I.getOperand(1);
  ArrayRef<int> Mask = I.getShuffleMask();
  auto [V1Rsrc, V1Off] = getPtrParts(V1);
  auto [V2Rsrc, V2Off] = getPtrParts(V2);

  Value *RsrcRes =
      IRB.CreateShuffleVector(V1Rsrc, V2Rsrc, Mask, I.getName() + ".rsrc");
  copyMetadata(RsrcRes, &I);
  Value *OffRes =
      IRB.CreateShuffleVector(V1Off, V2Off, Mask, I.getName() + ".off");
  copyMetadata(OffRes, &I);
  SplitUsers.insert(&I);
  return {RsrcRes, OffRes};
}

PtrParts SplitPtrStructs::visitPHINode(PHINode &PHI) {
  if (!isSplitFatPtr(PHI.getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(*PHI.getInsertionPointAfterDef());
  // Phi nodes will be handled in post-processing after we've visited every
  // instruction. However, instead of just returning {nullptr, nullptr},
  // we explicitly create the temporary extractvalue operations that are our
  // temporary results so that they end up at the beginning of the block with
  // the PHIs.
  Value *TmpRsrc = IRB.CreateExtractValue(&PHI, 0, PHI.getName() + ".rsrc");
  Value *TmpOff = IRB.CreateExtractValue(&PHI, 1, PHI.getName() + ".off");
  Conditionals.push_back(&PHI);
  SplitUsers.insert(&PHI);
  return {TmpRsrc, TmpOff};
}

PtrParts SplitPtrStructs::visitSelectInst(SelectInst &SI) {
  if (!isSplitFatPtr(SI.getType()))
    return {nullptr, nullptr};
  IRB.SetInsertPoint(&SI);

  Value *Cond = SI.getCondition();
  Value *True = SI.getTrueValue();
  Value *False = SI.getFalseValue();
  auto [TrueRsrc, TrueOff] = getPtrParts(True);
  auto [FalseRsrc, FalseOff] = getPtrParts(False);

  Value *RsrcRes =
      IRB.CreateSelect(Cond, TrueRsrc, FalseRsrc, SI.getName() + ".rsrc", &SI);
  copyMetadata(RsrcRes, &SI);
  Conditionals.push_back(&SI);
  Value *OffRes =
      IRB.CreateSelect(Cond, TrueOff, FalseOff, SI.getName() + ".off", &SI);
  copyMetadata(OffRes, &SI);
  SplitUsers.insert(&SI);
  return {RsrcRes, OffRes};
}

/// Returns true if this intrinsic needs to be removed when it is
/// applied to `ptr addrspace(7)` values. Calls to these intrinsics are
/// rewritten into calls to versions of that intrinsic on the resource
/// descriptor.
static bool isRemovablePointerIntrinsic(Intrinsic::ID IID) {
  switch (IID) {
  default:
    return false;
  case Intrinsic::amdgcn_make_buffer_rsrc:
  case Intrinsic::ptrmask:
  case Intrinsic::invariant_start:
  case Intrinsic::invariant_end:
  case Intrinsic::launder_invariant_group:
  case Intrinsic::strip_invariant_group:
  case Intrinsic::memcpy:
  case Intrinsic::memcpy_inline:
  case Intrinsic::memmove:
  case Intrinsic::memset:
  case Intrinsic::memset_inline:
  case Intrinsic::experimental_memset_pattern:
  case Intrinsic::amdgcn_load_to_lds:
  case Intrinsic::amdgcn_load_async_to_lds:
    return true;
  }
}

PtrParts SplitPtrStructs::visitIntrinsicInst(IntrinsicInst &I) {
  Intrinsic::ID IID = I.getIntrinsicID();
  switch (IID) {
  default:
    break;
  case Intrinsic::amdgcn_make_buffer_rsrc: {
    if (!isSplitFatPtr(I.getType()))
      return {nullptr, nullptr};
    Value *Base = I.getArgOperand(0);
    Value *Stride = I.getArgOperand(1);
    Value *NumRecords = I.getArgOperand(2);
    Value *Flags = I.getArgOperand(3);
    auto *SplitType = cast<StructType>(I.getType());
    Type *RsrcType = SplitType->getElementType(0);
    Type *OffType = SplitType->getElementType(1);
    IRB.SetInsertPoint(&I);
    Value *Rsrc = IRB.CreateIntrinsic(IID, {RsrcType, Base->getType()},
                                      {Base, Stride, NumRecords, Flags});
    copyMetadata(Rsrc, &I);
    Rsrc->takeName(&I);
    Value *Zero = Constant::getNullValue(OffType);
    SplitUsers.insert(&I);
    return {Rsrc, Zero};
  }
  case Intrinsic::ptrmask: {
    Value *Ptr = I.getArgOperand(0);
    if (!isSplitFatPtr(Ptr->getType()))
      return {nullptr, nullptr};
    Value *Mask = I.getArgOperand(1);
    IRB.SetInsertPoint(&I);
    auto [Rsrc, Off] = getPtrParts(Ptr);
    if (Mask->getType() != Off->getType())
      reportFatalUsageError("offset width is not equal to index width of fat "
                            "pointer (data layout not set up correctly?)");
    Value *OffRes = IRB.CreateAnd(Off, Mask, I.getName() + ".off");
    copyMetadata(OffRes, &I);
    SplitUsers.insert(&I);
    return {Rsrc, OffRes};
  }
  // Pointer annotation intrinsics that, given their object-wide nature,
  // operate on the resource part.
  case Intrinsic::invariant_start: {
    Value *Ptr = I.getArgOperand(1);
    if (!isSplitFatPtr(Ptr->getType()))
      return {nullptr, nullptr};
    IRB.SetInsertPoint(&I);
    auto [Rsrc, Off] = getPtrParts(Ptr);
    Type *NewTy = PointerType::get(I.getContext(), AMDGPUAS::BUFFER_RESOURCE);
    auto *NewRsrc = IRB.CreateIntrinsic(IID, {NewTy}, {I.getOperand(0), Rsrc});
    copyMetadata(NewRsrc, &I);
    NewRsrc->takeName(&I);
    SplitUsers.insert(&I);
    I.replaceAllUsesWith(NewRsrc);
    return {nullptr, nullptr};
  }
  case Intrinsic::invariant_end: {
    Value *RealPtr = I.getArgOperand(2);
    if (!isSplitFatPtr(RealPtr->getType()))
      return {nullptr, nullptr};
    IRB.SetInsertPoint(&I);
    Value *RealRsrc = getPtrParts(RealPtr).first;
    Value *InvPtr = I.getArgOperand(0);
    Value *Size = I.getArgOperand(1);
    Value *NewRsrc = IRB.CreateIntrinsic(IID, {RealRsrc->getType()},
                                         {InvPtr, Size, RealRsrc});
    copyMetadata(NewRsrc, &I);
    NewRsrc->takeName(&I);
    SplitUsers.insert(&I);
    I.replaceAllUsesWith(NewRsrc);
    return {nullptr, nullptr};
  }
  case Intrinsic::launder_invariant_group:
  case Intrinsic::strip_invariant_group: {
    Value *Ptr = I.getArgOperand(0);
    if (!isSplitFatPtr(Ptr->getType()))
      return {nullptr, nullptr};
    IRB.SetInsertPoint(&I);
    auto [Rsrc, Off] = getPtrParts(Ptr);
    Value *NewRsrc = IRB.CreateIntrinsic(IID, {Rsrc->getType()}, {Rsrc});
    copyMetadata(NewRsrc, &I);
    NewRsrc->takeName(&I);
    SplitUsers.insert(&I);
    return {NewRsrc, Off};
  }
  case Intrinsic::amdgcn_load_to_lds:
  case Intrinsic::amdgcn_load_async_to_lds: {
    Value *Ptr = I.getArgOperand(0);
    if (!isSplitFatPtr(Ptr->getType()))
      return {nullptr, nullptr};
    IRB.SetInsertPoint(&I);
    auto [Rsrc, Off] = getPtrParts(Ptr);
    Value *LDSPtr = I.getArgOperand(1);
    Value *LoadSize = I.getArgOperand(2);
    Value *ImmOff = I.getArgOperand(3);
    Value *Aux = I.getArgOperand(4);
    Value *SOffset = IRB.getInt32(0);
    Intrinsic::ID NewIntr =
        IID == Intrinsic::amdgcn_load_to_lds
            ? Intrinsic::amdgcn_raw_ptr_buffer_load_lds
            : Intrinsic::amdgcn_raw_ptr_buffer_load_async_lds;
    Instruction *NewLoad = IRB.CreateIntrinsic(
        NewIntr, {}, {Rsrc, LDSPtr, LoadSize, Off, SOffset, ImmOff, Aux});
    copyMetadata(NewLoad, &I);
    SplitUsers.insert(&I);
    I.replaceAllUsesWith(NewLoad);
    return {nullptr, nullptr};
  }
  }
  return {nullptr, nullptr};
}

void SplitPtrStructs::processFunction(Function &F) {
  ST = &TM->getSubtarget<GCNSubtarget>(F);
  SmallVector<Instruction *, 0> Originals(
      llvm::make_pointer_range(instructions(F)));
  LLVM_DEBUG(dbgs() << "Splitting pointer structs in function: " << F.getName()
                    << "\n");
  for (Instruction *I : Originals) {
    // In some cases, instruction order doesn't reflect program order,
    // so the visit() call will have already visited certain instructions
    // by the time this loop gets to them. Avoid re-visiting these so as to,
    // for example, avoid processing the same conditional twice.
    if (SplitUsers.contains(I))
      continue;
    auto [Rsrc, Off] = visit(I);
    assert(((Rsrc && Off) || (!Rsrc && !Off)) &&
           "Can't have a resource but no offset");
    if (Rsrc)
      RsrcParts[I] = Rsrc;
    if (Off)
      OffParts[I] = Off;
  }
  processConditionals();
  killAndReplaceSplitInstructions(Originals);

  // Clean up after ourselves to save on memory.
  RsrcParts.clear();
  OffParts.clear();
  SplitUsers.clear();
  Conditionals.clear();
  ConditionalTemps.clear();
}

namespace {
class AMDGPULowerBufferFatPointers : public ModulePass {
public:
  static char ID;

  AMDGPULowerBufferFatPointers() : ModulePass(ID) {}

  bool run(Module &M, const TargetMachine &TM);
  bool runOnModule(Module &M) override;

  void getAnalysisUsage(AnalysisUsage &AU) const override;
};
} // namespace

/// Returns true if there are values that have a buffer fat pointer in them,
/// which means we'll need to perform rewrites on this function. As a side
/// effect, this will populate the type remapping cache.
static bool containsBufferFatPointers(const Function &F,
                                      BufferFatPtrToStructTypeMap *TypeMap) {
  bool HasFatPointers = false;
  for (const BasicBlock &BB : F)
    for (const Instruction &I : BB) {
      HasFatPointers |= (I.getType() != TypeMap->remapType(I.getType()));
      // Catch null pointer constants in loads, stores, etc.
      for (const Value *V : I.operand_values())
        HasFatPointers |= (V->getType() != TypeMap->remapType(V->getType()));
    }
  return HasFatPointers;
}

static bool hasFatPointerInterface(const Function &F,
                                   BufferFatPtrToStructTypeMap *TypeMap) {
  Type *Ty = F.getFunctionType();
  return Ty != TypeMap->remapType(Ty);
}

/// Move the body of `OldF` into a new function, returning it.
static Function *moveFunctionAdaptingType(Function *OldF, FunctionType *NewTy,
                                          ValueToValueMapTy &CloneMap) {
  bool IsIntrinsic = OldF->isIntrinsic();
  Function *NewF =
      Function::Create(NewTy, OldF->getLinkage(), OldF->getAddressSpace());
  NewF->copyAttributesFrom(OldF);
  NewF->copyMetadata(OldF, 0);
  NewF->takeName(OldF);
  NewF->updateAfterNameChange();
  OldF->getParent()->getFunctionList().insertAfter(OldF->getIterator(), NewF);

  while (!OldF->empty()) {
    BasicBlock *BB = &OldF->front();
    BB->removeFromParent();
    BB->insertInto(NewF);
    CloneMap[BB] = BB;
    for (Instruction &I : *BB) {
      CloneMap[&I] = &I;
    }
  }

  SmallVector<AttributeSet> ArgAttrs;
  AttributeList OldAttrs = OldF->getAttributes();

  for (auto [I, OldArg, NewArg] : enumerate(OldF->args(), NewF->args())) {
    CloneMap[&NewArg] = &OldArg;
    NewArg.takeName(&OldArg);
    Type *OldArgTy = OldArg.getType(), *NewArgTy = NewArg.getType();
    // Temporarily mutate type of `NewArg` to allow RAUW to work.
    NewArg.mutateType(OldArgTy);
    OldArg.replaceAllUsesWith(&NewArg);
    NewArg.mutateType(NewArgTy);

    AttributeSet ArgAttr = OldAttrs.getParamAttrs(I);
    // Intrinsics get their attributes fixed later.
    if (OldArgTy != NewArgTy && !IsIntrinsic)
      ArgAttr = ArgAttr.removeAttributes(
          NewF->getContext(),
          AttributeFuncs::typeIncompatible(NewArgTy, ArgAttr));
    ArgAttrs.push_back(ArgAttr);
  }
  AttributeSet RetAttrs = OldAttrs.getRetAttrs();
  if (OldF->getReturnType() != NewF->getReturnType() && !IsIntrinsic)
    RetAttrs = RetAttrs.removeAttributes(
        NewF->getContext(),
        AttributeFuncs::typeIncompatible(NewF->getReturnType(), RetAttrs));
  NewF->setAttributes(AttributeList::get(
      NewF->getContext(), OldAttrs.getFnAttrs(), RetAttrs, ArgAttrs));
  return NewF;
}

static void makeCloneInPraceMap(Function *F, ValueToValueMapTy &CloneMap) {
  for (Argument &A : F->args())
    CloneMap[&A] = &A;
  for (BasicBlock &BB : *F) {
    CloneMap[&BB] = &BB;
    for (Instruction &I : BB)
      CloneMap[&I] = &I;
  }
}

bool AMDGPULowerBufferFatPointers::run(Module &M, const TargetMachine &TM) {
  bool Changed = false;
  const DataLayout &DL = M.getDataLayout();
  // Record the functions which need to be remapped.
  // The second element of the pair indicates whether the function has to have
  // its arguments or return types adjusted.
  SmallVector<std::pair<Function *, bool>> NeedsRemap;

  LLVMContext &Ctx = M.getContext();

  BufferFatPtrToStructTypeMap StructTM(DL);
  BufferFatPtrToIntTypeMap IntTM(DL);
  for (const GlobalVariable &GV : M.globals()) {
    if (GV.getAddressSpace() == AMDGPUAS::BUFFER_FAT_POINTER) {
      // FIXME: Use DiagnosticInfo unsupported but it requires a Function
      Ctx.emitError("global variables with a buffer fat pointer address "
                    "space (7) are not supported");
      continue;
    }

    Type *VT = GV.getValueType();
    if (VT != StructTM.remapType(VT)) {
      // FIXME: Use DiagnosticInfo unsupported but it requires a Function
      Ctx.emitError("global variables that contain buffer fat pointers "
                    "(address space 7 pointers) are unsupported. Use "
                    "buffer resource pointers (address space 8) instead");
      continue;
    }
  }

  {
    // Collect all constant exprs and aggregates referenced by any function.
    SmallVector<Constant *, 8> Worklist;
    for (Function &F : M.functions())
      for (Instruction &I : instructions(F))
        for (Value *Op : I.operands())
          if (isa<ConstantExpr>(Op) || isa<ConstantAggregate>(Op))
            Worklist.push_back(cast<Constant>(Op));

    // Recursively look for any referenced buffer pointer constants.
    SmallPtrSet<Constant *, 8> Visited;
    SetVector<Constant *> BufferFatPtrConsts;
    while (!Worklist.empty()) {
      Constant *C = Worklist.pop_back_val();
      if (!Visited.insert(C).second)
        continue;
      if (isBufferFatPtrOrVector(C->getType()))
        BufferFatPtrConsts.insert(C);
      for (Value *Op : C->operands())
        if (isa<ConstantExpr>(Op) || isa<ConstantAggregate>(Op))
          Worklist.push_back(cast<Constant>(Op));
    }

    // Expand all constant expressions using fat buffer pointers to
    // instructions.
    Changed |= convertUsersOfConstantsToInstructions(
        BufferFatPtrConsts.getArrayRef(), /*RestrictToFunc=*/nullptr,
        /*RemoveDeadConstants=*/false, /*IncludeSelf=*/true);
  }

  StoreFatPtrsAsIntsAndExpandMemcpyVisitor MemOpsRewrite(&IntTM, DL,
                                                         M.getContext(), &TM);
  LegalizeBufferContentTypesVisitor BufferContentsTypeRewrite(DL,
                                                              M.getContext());
  for (Function &F : M.functions()) {
    bool InterfaceChange = hasFatPointerInterface(F, &StructTM);
    bool BodyChanges = containsBufferFatPointers(F, &StructTM);
    Changed |= MemOpsRewrite.processFunction(F);
    if (InterfaceChange || BodyChanges) {
      NeedsRemap.push_back(std::make_pair(&F, InterfaceChange));
      Changed |= BufferContentsTypeRewrite.processFunction(F);
    }
  }
  if (NeedsRemap.empty())
    return Changed;

  SmallVector<Function *> NeedsPostProcess;
  SmallVector<Function *> Intrinsics;
  // Keep one big map so as to memoize constants across functions.
  ValueToValueMapTy CloneMap;
  FatPtrConstMaterializer Materializer(&StructTM, CloneMap);

  ValueMapper LowerInFuncs(CloneMap, RF_None, &StructTM, &Materializer);
  for (auto [F, InterfaceChange] : NeedsRemap) {
    Function *NewF = F;
    if (InterfaceChange)
      NewF = moveFunctionAdaptingType(
          F, cast<FunctionType>(StructTM.remapType(F->getFunctionType())),
          CloneMap);
    else
      makeCloneInPraceMap(F, CloneMap);
    LowerInFuncs.remapFunction(*NewF);
    if (NewF->isIntrinsic())
      Intrinsics.push_back(NewF);
    else
      NeedsPostProcess.push_back(NewF);
    if (InterfaceChange) {
      F->replaceAllUsesWith(NewF);
      F->eraseFromParent();
    }
    Changed = true;
  }
  StructTM.clear();
  IntTM.clear();
  CloneMap.clear();

  SplitPtrStructs Splitter(DL, M.getContext(), &TM);
  for (Function *F : NeedsPostProcess)
    Splitter.processFunction(*F);
  for (Function *F : Intrinsics) {
    // use_empty() can also occur with cases like masked load, which will
    // have been rewritten out of the module by now but not erased.
    if (F->use_empty() || isRemovablePointerIntrinsic(F->getIntrinsicID())) {
      F->eraseFromParent();
    } else {
      std::optional<Function *> NewF = Intrinsic::remangleIntrinsicFunction(F);
      if (NewF)
        F->replaceAllUsesWith(*NewF);
    }
  }
  return Changed;
}

bool AMDGPULowerBufferFatPointers::runOnModule(Module &M) {
  TargetPassConfig &TPC = getAnalysis<TargetPassConfig>();
  const TargetMachine &TM = TPC.getTM<TargetMachine>();
  return run(M, TM);
}

char AMDGPULowerBufferFatPointers::ID = 0;

char &llvm::AMDGPULowerBufferFatPointersID = AMDGPULowerBufferFatPointers::ID;

void AMDGPULowerBufferFatPointers::getAnalysisUsage(AnalysisUsage &AU) const {
  AU.addRequired<TargetPassConfig>();
}

#define PASS_DESC "Lower buffer fat pointer operations to buffer resources"
INITIALIZE_PASS_BEGIN(AMDGPULowerBufferFatPointers, DEBUG_TYPE, PASS_DESC,
                      false, false)
INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)
INITIALIZE_PASS_END(AMDGPULowerBufferFatPointers, DEBUG_TYPE, PASS_DESC, false,
                    false)
#undef PASS_DESC

ModulePass *llvm::createAMDGPULowerBufferFatPointersPass() {
  return new AMDGPULowerBufferFatPointers();
}

PreservedAnalyses AMDGPULowerBufferFatPointersPass::run(Module &M,
                                                        ModuleAnalysisManager &AM) {
  return AMDGPULowerBufferFatPointers().run(M, TM) ? PreservedAnalyses::none()
                                                   : PreservedAnalyses::all();
}