LLVM  16.0.0git
SIWholeQuadMode.cpp File Reference
#include "AMDGPU.h"
#include "GCNSubtarget.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/CodeGen/LiveIntervals.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachinePostDominators.h"
#include "llvm/IR/CallingConv.h"
#include "llvm/InitializePasses.h"
#include "llvm/Support/raw_ostream.h"


#define DEBUG_TYPE   "si-wqm"


 INITIALIZE_PASS_BEGIN (SIWholeQuadMode, DEBUG_TYPE, "SI Whole Quad Mode", false, false) INITIALIZE_PASS_END(SIWholeQuadMode


SI Whole Quad Mode
SI Whole Quad false

Detailed Description

This pass adds instructions to enable whole quad mode (strict or non-strict) for pixel shaders, and strict whole wavefront mode for all programs.

The "strict" prefix indicates that inactive lanes do not take part in control flow: an inactive lane enabled by strict WQM/WWM will always be enabled, irrespective of control flow decisions. Conversely, in non-strict WQM, inactive lanes may take part in control flow decisions.

Whole quad mode is required for derivative computations, but it interferes with shader side effects (stores and atomics). This pass ensures that WQM is enabled when necessary but disabled around stores and atomics.
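To make "whole quad mode" concrete, the sketch below models a wave64 EXEC mask as a `std::uint64_t`. It assumes that bit i is lane i and that lanes 4k to 4k+3 form one quad. The names `Mask`, `wqm`, `helperLanes` and `storeMask` are illustrative only and are not LLVM APIs. The sketch shows which extra (helper) lanes WQM turns on, and why those lanes have to be masked off around stores and atomics.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative wave64 model (assumption): bit i is lane i, and
// lanes 4k..4k+3 form one quad.
using Mask = std::uint64_t;

// Whole quad mode: any quad containing at least one enabled lane is
// enabled in full, so derivatives can be computed across the quad.
Mask wqm(Mask m) {
  Mask out = 0;
  for (int q = 0; q < 64; q += 4) {
    const Mask quad = Mask{0xF} << q;
    if (m & quad)
      out |= quad;
  }
  return out;
}

// Helper lanes are enabled by WQM but are not live. Their results exist only
// to feed derivatives, so they must not perform stores or atomics.
Mask helperLanes(Mask live) { return wqm(live) & ~live; }

// Hypothetical helper: the mask a store should run with (live lanes only).
Mask storeMask(Mask wqmExec, Mask live) { return wqmExec & live; }
```

For example, if lanes 0 and 5 are live (`0x21`), `wqm` enables quads 0 and 1 (`0xFF`), and the six helper lanes are `0xDE`.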

When necessary, this pass creates a function prolog

  S_MOV_B64 LiveMask, EXEC
  S_WQM_B64 EXEC, EXEC

to enter WQM at the top of the function and surrounds blocks of Exact instructions by

  S_AND_SAVEEXEC_B64 Tmp, LiveMask
  ...
  S_MOV_B64 EXEC, Tmp
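The prolog and the bracketing of Exact blocks can be simulated on plain integers. This is a minimal sketch that rests on two assumptions. First, the prolog copies EXEC into `LiveMask` and then applies `S_WQM_B64` to EXEC. Second, `S_AND_SAVEEXEC_B64` writes the old EXEC to its destination before ANDing EXEC with its source. The `Wave` struct and its methods are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

using Mask = std::uint64_t;  // bit i = lane i of a wave64 EXEC mask

// Enable every quad (lanes 4k..4k+3) that has at least one enabled lane.
Mask wqm(Mask m) {
  Mask out = 0;
  for (int q = 0; q < 64; q += 4) {
    const Mask quad = Mask{0xF} << q;
    if (m & quad)
      out |= quad;
  }
  return out;
}

// Hypothetical model of the EXEC-related state of one wave.
struct Wave {
  Mask exec;          // current EXEC
  Mask liveMask = 0;  // lanes that are really live, recorded by the prolog
  Mask tmp = 0;       // saved EXEC while an Exact block runs

  explicit Wave(Mask initialExec) : exec(initialExec) {}

  // Prolog (assumed): remember the live lanes, then switch EXEC to WQM.
  void prolog() {
    liveMask = exec;
    exec = wqm(exec);
  }

  // S_AND_SAVEEXEC_B64 Tmp, LiveMask: save EXEC, then keep only live lanes.
  void enterExact() {
    tmp = exec;
    exec &= liveMask;
  }

  // S_MOV_B64 EXEC, Tmp: go back to the saved WQM mask.
  void leaveExact() { exec = tmp; }
};
```

Between `enterExact()` and `leaveExact()` only live lanes execute, so a store placed there does not write from helper lanes.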

We also compute when a sequence of instructions requires strict whole wavefront mode (StrictWWM), and we insert instructions that save EXEC, enable all lanes for the sequence, and restore EXEC afterwards:

  S_OR_SAVEEXEC_B64 Tmp, -1
  ...
  S_MOV_B64 EXEC, Tmp
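The StrictWWM bracket has the same save/modify/restore shape. The following is a minimal sketch. It assumes that `S_OR_SAVEEXEC_B64 Tmp, -1` stores the old EXEC in `Tmp` and then ORs EXEC with all ones. The helper names are hypothetical. The lane sum shows why WWM is useful: it lets a value be computed over every lane of the wave, including lanes that are inactive in the surrounding code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Mask = std::uint64_t;  // bit i = lane i of a wave64 EXEC mask

// S_OR_SAVEEXEC_B64 Tmp, -1: return the old EXEC (Tmp) and enable all lanes.
Mask enterStrictWWM(Mask &exec) {
  const Mask tmp = exec;
  exec |= ~Mask{0};
  return tmp;
}

// S_MOV_B64 EXEC, Tmp: restore the mask that was active before the region.
void leaveStrictWWM(Mask &exec, Mask tmp) { exec = tmp; }

// Hypothetical per-lane operation: sum a value over the lanes enabled in EXEC.
std::uint64_t sumEnabledLanes(const std::array<std::uint32_t, 64> &v, Mask exec) {
  std::uint64_t sum = 0;
  for (int lane = 0; lane < 64; ++lane)
    if ((exec >> lane) & 1)
      sum += v[lane];
  return sum;
}
```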

When a sequence of instructions requires strict whole quad mode (StrictWQM), we use a similar save and restore mechanism and force whole quad mode for those instructions:

  S_MOV_B64 Tmp, EXEC
  S_WQM_B64 EXEC, EXEC
  ...
  S_MOV_B64 EXEC, Tmp
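StrictWQM can be sketched in the same way. This sketch assumes that the region is entered by copying EXEC into a temporary and applying `S_WQM_B64` to EXEC, and that it is left by moving the temporary back into EXEC. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using Mask = std::uint64_t;  // bit i = lane i of a wave64 EXEC mask

// Enable every quad (lanes 4k..4k+3) that has at least one enabled lane.
Mask wqm(Mask m) {
  Mask out = 0;
  for (int q = 0; q < 64; q += 4) {
    const Mask quad = Mask{0xF} << q;
    if (m & quad)
      out |= quad;
  }
  return out;
}

// Save EXEC, then force whole quad mode for the region; returns the saved mask.
Mask enterStrictWQM(Mask &exec) {
  const Mask tmp = exec;  // assumed: S_MOV_B64 Tmp, EXEC
  exec = wqm(exec);       // assumed: S_WQM_B64 EXEC, EXEC
  return tmp;
}

// Restore the mask that was active before the region (S_MOV_B64 EXEC, Tmp).
void leaveStrictWQM(Mask &exec, Mask tmp) { exec = tmp; }
```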


In order to avoid excessive switching during sequences of Exact instructions, the pass first analyzes which instructions must be run in WQM, that is, which instructions produce values that lead to derivative computations.
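Deciding which instructions must run in WQM is a backward dataflow problem. An instruction needs WQM if it computes derivatives, or if its result feeds an instruction that does. The sketch below solves this problem for a single straight-line block in SSA form using one reverse pass. `Instr`, `markWQM` and the register names are hypothetical. A real implementation would also need to handle control flow and the other modes described above.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical straight-line SSA instruction.
struct Instr {
  std::string def;                // register written ("" if none, e.g. a store)
  std::vector<std::string> uses;  // registers read
  bool derivative;                // e.g. an image sample with implicit derivatives
};

// Returns, for each instruction, whether it must execute in WQM.
std::vector<bool> markWQM(const std::vector<Instr> &prog) {
  std::vector<bool> needs(prog.size(), false);
  std::set<std::string> wqmRegs;  // values that eventually feed a derivative
  // Walk backwards: in SSA every use is seen before its (earlier) definition.
  for (std::size_t i = prog.size(); i-- > 0;) {
    const Instr &I = prog[i];
    if (I.derivative || (!I.def.empty() && wqmRegs.count(I.def))) {
      needs[i] = true;
      for (const std::string &u : I.uses)
        wqmRegs.insert(u);  // operands must also be computed in WQM
    }
  }
  return needs;
}
```

In the example below, the coordinate computation that feeds the sample is marked. The unrelated value and the store are not marked, so the store can stay in Exact mode.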

Basic blocks are always exited in WQM as long as some successor needs WQM.
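The exit rule can be computed as a small fixpoint over the CFG. This sketch shows one possible approach and is not the pass's own code. Each block is summarized by two flags: whether it needs WQM on entry, and whether it contains only don't-care instructions. A don't-care block that must exit in WQM is also treated as needing WQM on entry. That extra rule is an assumption of this sketch, and it is what lets the requirement propagate backwards through such blocks and around loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-block summary used only for this sketch.
struct Block {
  std::vector<std::size_t> succs;  // successor block indices
  bool needsWQMAtEntry;            // first mode-sensitive instruction needs WQM
  bool onlyDontCare;               // no instruction requires a particular mode
};

// exitWQM[b] is true when block b should be left in WQM.
std::vector<bool> computeExitWQM(const std::vector<Block> &cfg) {
  const std::size_t n = cfg.size();
  std::vector<bool> entryWQM(n), exitWQM(n, false);
  for (std::size_t b = 0; b < n; ++b)
    entryWQM[b] = cfg[b].needsWQMAtEntry;
  bool changed = true;
  while (changed) {  // flags only go from false to true, so this terminates
    changed = false;
    for (std::size_t b = 0; b < n; ++b) {
      bool out = false;
      for (std::size_t s : cfg[b].succs)
        out = out || entryWQM[s];  // some successor needs WQM on entry
      if (out && !exitWQM[b]) {
        exitWQM[b] = true;
        changed = true;
      }
      // A don't-care block that exits in WQM can simply run in WQM throughout.
      if (out && cfg[b].onlyDontCare && !entryWQM[b]) {
        entryWQM[b] = true;
        changed = true;
      }
    }
  }
  return exitWQM;
}
```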

There is room for improvement given better control flow analysis:

(1) at the top level (outside of control flow statements, and as long as kill hasn't been used), one SGPR can be saved by recovering WQM from the LiveMask (this is implemented for the entry block).
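Point (1) relies on a simple identity. At the top level, the WQM mask is exactly the WQM of the live mask, so after an Exact block the WQM mask can be recomputed from the live lanes instead of being kept in a saved SGPR. The sketch below checks that condition and uses hypothetical names. It also shows why the condition fails inside control flow: a branch may have disabled whole quads, and recomputing from `LiveMask` would wrongly re-enable them.

```cpp
#include <cassert>
#include <cstdint>

using Mask = std::uint64_t;  // bit i = lane i of a wave64 EXEC mask

// Enable every quad (lanes 4k..4k+3) that has at least one enabled lane.
Mask wqm(Mask m) {
  Mask out = 0;
  for (int q = 0; q < 64; q += 4) {
    const Mask quad = Mask{0xF} << q;
    if (m & quad)
      out |= quad;
  }
  return out;
}

// Recomputing WQM from LiveMask (instead of restoring a saved Tmp) is only
// correct when the WQM mask to return to is exactly wqm(LiveMask).
bool canRecoverFromLiveMask(Mask wqmExecBeforeExact, Mask liveMask) {
  return wqmExecBeforeExact == wqm(liveMask);
}
```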

(2) when entire regions (e.g. if-else blocks or entire loops) consist only of Exact and don't-care instructions, the mode switch only has to be done at the region's entry and exit points rather than potentially in each block of the region.

Definition in file SIWholeQuadMode.cpp.

Macro Definition Documentation


#define DEBUG_TYPE   "si-wqm"

Definition at line 87 of file SIWholeQuadMode.cpp.

Function Documentation


"SI Whole Quad Mode ,
false  ,

Variable Documentation



Definition at line 264 of file SIWholeQuadMode.cpp.

◆ false

SI Whole Quad false

Definition at line 264 of file SIWholeQuadMode.cpp.

◆ Mode

SI Whole Quad Mode

Definition at line 264 of file SIWholeQuadMode.cpp.

Referenced by applyDebugify(), ARM64EmitUnwindCode(), llvm::AAResults::canInstructionRangeModRef(), canReduceVMulWidth(), llvm::FileOutputBuffer::create(), llvm::sys::fs::TempFile::create(), createCheckDebugifyFunctionPass(), createCheckDebugifyModulePass(), createDebugifyFunctionPass(), createDebugifyModulePass(), createInMemoryBuffer(), createOnDiskBuffer(), llvm::remarks::createRemarkSerializer(), createUniqueEntity(), llvm::sys::fs::createUniqueFile(), Status::delta(), llvm::denormalModeKindName(), llvm::PPCTargetLowering::EmitInstrWithCustomInserter(), ExpandHorizontalBinOp(), llvm::FlushFPConstant(), foldFabsWithFcmpZero(), llvm::GCNTTIImpl::GCNTTIImpl(), llvm::ARM_AM::getAM4SubMode(), llvm::ARM_AM::getAMSubModeStr(), llvm::AMDGPU::SIModeRegisterDefaults::getDefaultForCallingConv(), getFPMode(), getLoadStoreMultipleOpcode(), getPostIndexedLoadStoreOpcode(), getPreIndexedLoadStoreOpcode(), getSPDenormModeValue(), llvm::TargetLowering::getSqrtInputTest(), getUpdatingLSMultipleOpcode(), getVectorComparison(), getVectorComparisonOrInvert(), Status::intersect(), Status::isCompatible(), llvm::TargetTransformInfo::isIndexedLoadLegal(), llvm::TargetTransformInfo::isIndexedStoreLegal(), isMemberPointer(), llvm::RISCVFPRndMode::isValidRoundingMode(), llvm::AMDGPULegalizerInfo::legalizeFDIV32(), LLVMSetThreadLocalMode(), llvm::yaml::MappingTraits< SIMode >::mapping(), matchPMADDWD(), Status::merge(), Status::mergeUnknown(), llvm::orc::rt_bootstrap::SimpleExecutorDylibManager::open(), llvm::sys::fs::openFileForReadWrite(), llvm::sys::fs::openFileForWrite(), llvm::sys::fs::openNativeFileForReadWrite(), llvm::sys::fs::openNativeFileForWrite(), llvm::operator<<(), Status::operator==(), llvm::parseDenormalFPAttribute(), llvm::SMDiagnostic::print(), printAsmMRegister(), printAsmVRegister(), llvm::ARMInstPrinter::printLdStmModeOperand(), reduceVMULWidth(), NewPMDebugifyPass::run(), NewPMCheckDebugifyPass::run(), llvm::PPCTargetLowering::SelectForceXFormMode(), 
llvm::PPCTargetLowering::SelectOptimalAddrMode(), llvm::TargetOptions::setFP32DenormalMode(), llvm::TargetOptions::setFPDenormalMode(), llvm::TargetMachine::setGlobalISelAbort(), setXFormForUnalignedFI(), llvm::yaml::SIMode::SIMode(), simplifyNvvmIntrinsic(), Status::Status(), toggleSPDenormMode(), llvm::InlineAdvisorAnalysis::Result::tryCreate(), llvm::writeToOutput(), and llvm::object::writeUniversalBinary().