LLVM 20.0.0git
This pass adds instructions to enable whole quad mode (strict or non-strict) for pixel shaders, and strict whole wavefront mode for all programs.
#include "AMDGPU.h"
#include "GCNSubtarget.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/CodeGen/LiveIntervals.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachinePostDominators.h"
#include "llvm/IR/CallingConv.h"
#include "llvm/InitializePasses.h"
#include "llvm/Support/raw_ostream.h"
Macros
#define DEBUG_TYPE "si-wqm"
Functions
INITIALIZE_PASS_BEGIN(SIWholeQuadMode, DEBUG_TYPE, "SI Whole Quad Mode", false, false)
INITIALIZE_PASS_END(SIWholeQuadMode, DEBUG_TYPE, "SI Whole Quad Mode", false, false)
This pass adds instructions to enable whole quad mode (strict or non-strict) for pixel shaders, and strict whole wavefront mode for all programs.
The "strict" prefix indicates that inactive lanes do not take part in control flow. Specifically, an inactive lane enabled by a strict WQM/WWM will always be enabled irrespective of control flow decisions. Conversely, in non-strict WQM inactive lanes may take part in control flow decisions.
Whole quad mode is required for derivative computations, but it interferes with shader side effects (stores and atomics). This pass ensures that WQM is enabled when necessary, but disabled around stores and atomics.
When necessary, this pass creates a function prolog
    S_MOV_B64 LiveMask, EXEC
    S_WQM_B64 EXEC, EXEC
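The effect of this prolog on the execution mask can be modelled in a few lines of C++. This is only a sketch: `LaneMask`, `WaveState`, `wholeQuadMask`, and `enterWQMProlog` are hypothetical names introduced for illustration and are not part of LLVM, and a 64-lane wave is assumed.

```cpp
#include <cstdint>

// Model of a 64-lane execution mask: bit i set means lane i is enabled.
// (Hypothetical alias for illustration, not an LLVM type.)
using LaneMask = std::uint64_t;

// Whole-quad mask: if any lane of a 4-lane quad is set, enable all four.
// This models what S_WQM_B64 computes for its source operand.
LaneMask wholeQuadMask(LaneMask Exec) {
  LaneMask Result = 0;
  for (unsigned Quad = 0; Quad < 16; ++Quad) {
    LaneMask QuadBits = LaneMask{0xF} << (Quad * 4);
    if (Exec & QuadBits)
      Result |= QuadBits;
  }
  return Result;
}

// Hypothetical holder for the two masks the prolog touches.
struct WaveState {
  LaneMask Exec;     // current EXEC
  LaneMask LiveMask; // lanes that were live on entry
};

// Models: S_MOV_B64 LiveMask, EXEC ; S_WQM_B64 EXEC, EXEC
void enterWQMProlog(WaveState &S) {
  S.LiveMask = S.Exec;            // remember the truly live lanes
  S.Exec = wholeQuadMask(S.Exec); // enable helper lanes in partial quads
}
```

For example, starting with lanes 0 and 5 enabled, the prolog leaves `LiveMask` with just those two lanes while `EXEC` covers both of their quads (lanes 0 through 7).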
to enter WQM at the top of the function and surrounds blocks of Exact instructions by
    S_AND_SAVEEXEC_B64 Tmp, LiveMask
    ...
    S_MOV_B64 EXEC, Tmp
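As a rough model of that bracket, the sketch below assumes `S_AND_SAVEEXEC_B64` saves the old `EXEC` into its destination and then ANDs `EXEC` with its source. The type and function names (`ExactRegionState`, `andSaveExec`, `restoreExec`) are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// 64-lane execution mask (hypothetical alias for illustration).
using LaneMask = std::uint64_t;

// Hypothetical holder for the masks involved in an Exact region.
struct ExactRegionState {
  LaneMask Exec;     // current EXEC, assumed to be in WQM on entry
  LaneMask LiveMask; // lanes that are really live (no helper lanes)
};

// Models: S_AND_SAVEEXEC_B64 Tmp, LiveMask
// Returns the saved EXEC (Tmp) and narrows EXEC to the live lanes.
LaneMask andSaveExec(ExactRegionState &S) {
  LaneMask Tmp = S.Exec;
  S.Exec = S.LiveMask & S.Exec;
  return Tmp;
}

// Models: S_MOV_B64 EXEC, Tmp
void restoreExec(ExactRegionState &S, LaneMask Tmp) { S.Exec = Tmp; }
```

Between the two calls only live lanes execute, so stores and atomics issued there are not performed by helper lanes; afterwards the WQM mask is back in place.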
We also compute when a sequence of instructions requires strict whole wavefront mode (StrictWWM) and insert instructions to save and restore it:
    S_OR_SAVEEXEC_B64 Tmp, -1
    ...
    S_MOV_B64 EXEC, Tmp
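The StrictWWM bracket can be modelled the same way. The sketch assumes `S_OR_SAVEEXEC_B64` saves the old `EXEC` and then ORs `EXEC` with its source, so a source of -1 enables every lane. `StrictWWMState`, `orSaveExecAllOnes`, and `leaveStrictWWM` are hypothetical names used only here.

```cpp
#include <cstdint>

// 64-lane execution mask (hypothetical alias for illustration).
using LaneMask = std::uint64_t;

// Hypothetical holder for EXEC while modelling a StrictWWM region.
struct StrictWWMState {
  LaneMask Exec;
};

// Models: S_OR_SAVEEXEC_B64 Tmp, -1
// Returns the saved EXEC (Tmp); EXEC | -1 enables all 64 lanes.
LaneMask orSaveExecAllOnes(StrictWWMState &S) {
  LaneMask Tmp = S.Exec;
  S.Exec = S.Exec | ~LaneMask{0};
  return Tmp;
}

// Models: S_MOV_B64 EXEC, Tmp
void leaveStrictWWM(StrictWWMState &S, LaneMask Tmp) { S.Exec = Tmp; }
```

Because the OR operand is all ones, the mask inside the region does not depend on the incoming `EXEC`, which matches the description of strict modes ignoring earlier control flow decisions.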
When a sequence of instructions requires strict whole quad mode (StrictWQM) we use a similar save and restore mechanism and force whole quad mode for those instructions:
    S_MOV_B64 Tmp, EXEC
    S_WQM_B64 EXEC, EXEC
    ...
    S_MOV_B64 EXEC, Tmp
To avoid excessive switching during sequences of Exact instructions, the pass first analyzes which instructions must be run in WQM (that is, which instructions produce values that lead to derivative computations).
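One way to picture this analysis is a backward worklist propagation over a toy instruction list. The sketch below is not the pass's actual implementation: `ToyInstr` and `markWQM` are hypothetical, instructions are identified by index, and each instruction simply lists the indices of the instructions that define its inputs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction; not an LLVM MachineInstr.
struct ToyInstr {
  bool NeedsWQM;                     // e.g. a derivative computation
  std::vector<std::size_t> Operands; // indices of instructions defining inputs
};

// Returns one flag per instruction: true if it must run in WQM because it
// needs WQM itself or produces a value that (transitively) feeds one that does.
std::vector<bool> markWQM(const std::vector<ToyInstr> &Prog) {
  std::vector<bool> InWQM(Prog.size(), false);
  std::vector<std::size_t> Worklist;

  // Seed with the instructions that inherently require WQM.
  for (std::size_t I = 0; I < Prog.size(); ++I) {
    if (Prog[I].NeedsWQM) {
      InWQM[I] = true;
      Worklist.push_back(I);
    }
  }

  // Propagate the requirement backwards to the producers of their inputs.
  while (!Worklist.empty()) {
    std::size_t I = Worklist.back();
    Worklist.pop_back();
    for (std::size_t Def : Prog[I].Operands) {
      if (!InWQM[Def]) {
        InWQM[Def] = true;
        Worklist.push_back(Def);
      }
    }
  }
  return InWQM;
}
```

Instructions left unmarked are either Exact or don't-care, so only the marked ones constrain where WQM has to be active.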
Basic blocks are always exited in WQM as long as some successor needs WQM.
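That exit rule amounts to a check over a block's successors. The following sketch uses a hypothetical `ToyBlock` record and `mustExitInWQM` helper, with blocks identified by index; it illustrates the rule as stated rather than the pass's own data structures.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified basic block; not an LLVM MachineBasicBlock.
struct ToyBlock {
  bool NeedsWQMOnEntry;                // block expects EXEC to be in WQM
  std::vector<std::size_t> Successors; // indices of successor blocks
};

// A block must be exited in WQM if at least one successor needs WQM.
bool mustExitInWQM(const std::vector<ToyBlock> &CFG, std::size_t Block) {
  for (std::size_t Succ : CFG[Block].Successors)
    if (CFG[Succ].NeedsWQMOnEntry)
      return true;
  return false;
}
```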
There is room for improvement given better control flow analysis:
(1) at the top level (outside of control flow statements, and as long as kill hasn't been used), one SGPR can be saved by recovering WQM from the LiveMask (this is implemented for the entry block).
(2) when entire regions (e.g. if-else blocks or entire loops) only consist of exact and don't-care instructions, the switch only has to be done at the entry and exit points rather than potentially in each block of the region.
Definition in file SIWholeQuadMode.cpp.
#define DEBUG_TYPE "si-wqm"
Definition at line 87 of file SIWholeQuadMode.cpp.
INITIALIZE_PASS_BEGIN(SIWholeQuadMode, DEBUG_TYPE, "SI Whole Quad Mode", false, false)
INITIALIZE_PASS_END(SIWholeQuadMode, DEBUG_TYPE, "SI Whole Quad Mode", false, false)
Definition at line 263 of file SIWholeQuadMode.cpp.