IsModuleEntryFunction(
NoSignedZerosFPMath(MF.getTarget().Options.NoSignedZerosFPMath) {
Attribute MemBoundAttr = F.getFnAttribute("amdgpu-memory-bound");
Attribute WaveLimitAttr = F.getFnAttribute("amdgpu-wave-limiter");
StringRef S = F.getFnAttribute("amdgpu-gds-size").getValueAsString();
auto Entry = LocalMemoryObjects.insert(std::make_pair(&GV, 0));
return Entry.first->second;
"expected region address space");
Entry.first->second = Offset;
return F.hasFnAttribute("amdgpu-elide-module-lds");
"Module LDS expected to be allocated before other LDS");
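The fragments that touch LocalMemoryObjects (insert a zero placeholder, return the cached value, then store Offset) read like a memoized bump allocator: each variable gets an aligned offset the first time it is seen, and later requests return the same offset. The following is only a sketch of that pattern under stated assumptions. LdsAllocatorSketch, its string keys, and the explicit Size and Alignment parameters are hypothetical stand-ins for the GlobalVariable and DataLayout queries, and the lines of the real function that are not shown in this listing may differ.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the LDS bookkeeping; not the LLVM class.
struct LdsAllocatorSketch {
  // Maps a variable name to its assigned byte offset.
  std::unordered_map<std::string, uint32_t> Offsets;
  // Bytes allocated so far.
  uint32_t StaticSize = 0;

  // Alignment is assumed to be a non-zero power of two.
  uint32_t allocate(const std::string &Name, uint32_t Size,
                    uint32_t Alignment) {
    // Insert a placeholder; if the key already exists, reuse its offset.
    auto Entry = Offsets.insert(std::make_pair(Name, 0u));
    if (!Entry.second)
      return Entry.first->second;

    // Round the current size up to the requested alignment.
    uint32_t Offset = (StaticSize + Alignment - 1) & ~(Alignment - 1);
    StaticSize = Offset + Size;

    // Record the final offset in the placeholder entry.
    Entry.first->second = Offset;
    return Offset;
  }
};
```

Inserting the placeholder first means a single map lookup covers both the "already allocated" and "new allocation" cases.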
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
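As an illustration of the rounding that alignTo describes, here is a minimal sketch. alignToSketch is a hypothetical helper that takes the alignment as a plain integer rather than an llvm::Align, and it assumes the alignment is a non-zero power of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::alignTo using a raw integer alignment.
// Returns the smallest multiple of A that is greater than or equal to Size.
inline uint64_t alignToSketch(uint64_t Size, uint64_t A) {
  assert(A != 0 && (A & (A - 1)) == 0 && "alignment must be a power of two");
  // Adding A - 1 and clearing the low bits rounds up to a multiple of A.
  return (Size + A - 1) & ~(A - 1);
}
```

For example, alignToSketch(9, 4) yields 12, while a size that is already a multiple of the alignment is returned unchanged.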
unsigned allocateLDSGlobal(const DataLayout &DL, const GlobalVariable &GV)
uint64_t ExplicitKernArgSize
A parsed version of the target data layout string, and methods for querying it.
static Function * getFunction(Constant *C)
MaybeAlign getAlign() const
Returns the alignment of the given variable or function.
AMDGPUMachineFunction(const MachineFunction &MF)
uint32_t LDSSize
Number of bytes in the LDS that are being used.
bool getValueAsBool() const
Return the attribute's value as a boolean.
static const AMDGPUSubtarget & get(const MachineFunction &MF)
@ LOCAL_ADDRESS
Address space for local memory.
This struct is a compact representation of a valid (non-zero power of two) alignment.
unsigned ID
LLVM IR allows arbitrary numbers to be used as calling convention identifiers.
bool isEntryFunctionCC(CallingConv::ID CC)
void allocateModuleLDSGlobal(const Function &F)
Analyzes if a function is potentially memory bound and if a kernel may benefit from limiting numb...
static bool canElideModuleLDS(const Function &F)
A Module instance is used to store all the information related to an LLVM module.
StringRef - Represent a constant reference to a string, i.e.
bool isModuleEntryFunctionCC(CallingConv::ID CC)
uint32_t StaticLDSSize
Number of bytes in the LDS allocated statically.
@ AMDGPU_KERNEL
Calling convention for AMDGPU code object kernels.
Function & getFunction()
Return the LLVM function that this machine code represents.
bool isModuleEntryFunction() const
@ REGION_ADDRESS
Address space for region memory (GDS).
unsigned getAddressSpace() const
@ SPIR_KERNEL
SPIR_KERNEL - Calling convention for SPIR kernel functions.
Type * getValueType() const
Align DynLDSAlign
Align for dynamic shared memory if any.
void setDynLDSAlign(const DataLayout &DL, const GlobalVariable &GV)