LLVM 23.0.0git
AArch64FrameLowering.cpp
1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function so that the particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) isn't created until the main
33// function body, after the prologue has run. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | (Win64 only) callee-saved SVE reg |
48// | |
49// |-----------------------------------|
50// | |
51// | callee-saved gpr registers | <--.
52// | | | On Darwin platforms these
53// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
54// | prev_lr | | (frame record first)
55// | prev_fp | <--'
56// | async context if needed |
57// | (a.k.a. "frame record") |
58// |-----------------------------------| <- fp(=x29)
59// Default SVE stack layout Split SVE objects
60// (aarch64-split-sve-objects=false) (aarch64-split-sve-objects=true)
61// |-----------------------------------| |-----------------------------------|
62// | <hazard padding> | | callee-saved PPR registers |
63// |-----------------------------------| |-----------------------------------|
64// | | | PPR stack objects |
65// | callee-saved fp/simd/SVE regs | |-----------------------------------|
66// | | | <hazard padding> |
67// |-----------------------------------| |-----------------------------------|
68// | | | callee-saved ZPR/FPR registers |
69// | SVE stack objects | |-----------------------------------|
70// | | | ZPR stack objects |
71// |-----------------------------------| |-----------------------------------|
72// ^ NB: FPR CSRs are promoted to ZPRs
73// |-----------------------------------|
74// |.empty.space.to.make.part.below....|
75// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
76// |.the.standard.16-byte.alignment....| compile time; if present)
77// |-----------------------------------|
78// | local variables of fixed size |
79// | including spill slots |
80// | <FPR> |
81// | <hazard padding> |
82// | <GPR> |
83// |-----------------------------------| <- bp(not defined by ABI,
84// |.variable-sized.local.variables....| LLVM chooses X19)
85// |.(VLAs)............................| (size of this area is unknown at
86// |...................................| compile time)
87// |-----------------------------------| <- sp
88// | | Lower address
89//
90//
91// To access data in a frame, a constant offset to it must be computable at
92// compile time from one of the pointers (fp, bp, sp). The size of the areas
93// with a dotted background cannot be computed at compile time if they are
94// present, so fp, bp and sp must all be set up in order to access the
95// contents of every frame area, assuming all of the frame areas are
96// non-empty.
97//
98// For most functions, some of the frame areas are empty. For those functions,
99// it may not be necessary to set up fp or bp:
100// * A base pointer is definitely needed when there are both VLAs and local
101// variables with more-than-default alignment requirements.
102// * A frame pointer is definitely needed when there are local variables with
103// more-than-default alignment requirements.
104//
105// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
106// callee-saved area, since the unwind encoding does not allow for encoding
107// this dynamically and existing tools depend on this layout. For other
108// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
109// area to allow SVE stack objects (allocated directly below the callee-saves,
110// if available) to be accessed directly from the framepointer.
111// The SVE spill/fill instructions have VL-scaled addressing modes such
112// as:
113// ldr z8, [fp, #-7 mul vl]
114// For SVE the vector length (VL) is not known at compile-time, so
115// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
116// layout, we don't need to add an unscaled offset to the framepointer before
117// accessing the SVE object in the frame.
118//
119// In some cases when a base pointer is not strictly needed, it is generated
120// anyway when offsets from the frame pointer to access local variables become
121// so large that the offset can't be encoded in the immediate fields of loads
122// or stores.
123//
124// Outgoing function arguments must be at the bottom of the stack frame when
125// calling another function. If we do not have variable-sized stack objects, we
126// can allocate a "reserved call frame" area at the bottom of the local
127// variable area, large enough for all outgoing calls. If we do have VLAs, then
128// the stack pointer must be decremented and incremented around each call to
129// make space for the arguments below the VLAs.
130//
131// FIXME: also explain the redzone concept.
132//
133// About stack hazards: Under some SME contexts, a coprocessor with its own
134// separate cache can be used for FP operations. This can create hazards if the CPU
135// and the SME unit try to access the same area of memory, including if the
136// access is to an area of the stack. To try to alleviate this we attempt to
137// introduce extra padding into the stack frame between FP and GPR accesses,
138// controlled by the aarch64-stack-hazard-size option. Without changing the
139// layout of the stack frame in the diagram above, a stack object of size
140// aarch64-stack-hazard-size is added between GPR and FPR CSRs. Another is added
141// to the stack objects section, and stack objects are sorted so that FPR >
142// Hazard padding slot > GPRs (where possible). Unfortunately some things are
143// not handled well (VLA area, arguments on the stack, objects with both GPR and
144// FPR accesses), but if those are controlled by the user then the entire stack
145// frame becomes GPR at the start/end with FPR in the middle, surrounded by
146// Hazard padding.
147//
148// An example of the prologue:
149//
150// .globl __foo
151// .align 2
152// __foo:
153// Ltmp0:
154// .cfi_startproc
155// .cfi_personality 155, ___gxx_personality_v0
156// Leh_func_begin:
157// .cfi_lsda 16, Lexception33
158//
159// stp xa, xb, [sp, #-offset]!
160// ...
161// stp x28, x27, [sp, #offset-32]
162// stp fp, lr, [sp, #offset-16]
163// add fp, sp, #offset - 16
164// sub sp, sp, #1360
165//
166// The Stack:
167// +-------------------------------------------+
168// 10000 | ........ | ........ | ........ | ........ |
169// 10004 | ........ | ........ | ........ | ........ |
170// +-------------------------------------------+
171// 10008 | ........ | ........ | ........ | ........ |
172// 1000c | ........ | ........ | ........ | ........ |
173// +===========================================+
174// 10010 | X28 Register |
175// 10014 | X28 Register |
176// +-------------------------------------------+
177// 10018 | X27 Register |
178// 1001c | X27 Register |
179// +===========================================+
180// 10020 | Frame Pointer |
181// 10024 | Frame Pointer |
182// +-------------------------------------------+
183// 10028 | Link Register |
184// 1002c | Link Register |
185// +===========================================+
186// 10030 | ........ | ........ | ........ | ........ |
187// 10034 | ........ | ........ | ........ | ........ |
188// +-------------------------------------------+
189// 10038 | ........ | ........ | ........ | ........ |
190// 1003c | ........ | ........ | ........ | ........ |
191// +-------------------------------------------+
192//
193// [sp] = 10030 :: >>initial value<<
194// sp = 10020 :: stp fp, lr, [sp, #-16]!
195// fp = sp == 10020 :: mov fp, sp
196// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
197// sp == 10010 :: >>final value<<
198//
199// The frame pointer (w29) points to address 10020. If we use an offset of
200// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
201// for w27, and -32 for w28:
202//
203// Ltmp1:
204// .cfi_def_cfa w29, 16
205// Ltmp2:
206// .cfi_offset w30, -8
207// Ltmp3:
208// .cfi_offset w29, -16
209// Ltmp4:
210// .cfi_offset w27, -24
211// Ltmp5:
212// .cfi_offset w28, -32
213//
214//===----------------------------------------------------------------------===//
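The two "definitely needed" rules from the header comment can be written as a small predicate. This is only a sketch of those two rules: `FrameTraits`, `NeededPointers` and `neededPointers` are hypothetical names, not LLVM types, and the real decision in this file takes many more conditions into account.

```cpp
#include <cassert>

// Hypothetical summary of a function's frame; not an LLVM type.
struct FrameTraits {
  bool HasVLAs;              // variable-sized local variables are present
  bool HasOverAlignedLocals; // locals need more than the default alignment
};

// Which pointers the two rules from the header comment require.
struct NeededPointers {
  bool FramePointer;
  bool BasePointer;
};

// Encodes only the two "definitely needed" rules:
//  * bp is needed when there are both VLAs and over-aligned locals;
//  * fp is needed when there are over-aligned locals.
constexpr NeededPointers neededPointers(FrameTraits T) {
  return NeededPointers{
      /*FramePointer=*/T.HasOverAlignedLocals,
      /*BasePointer=*/T.HasVLAs && T.HasOverAlignedLocals};
}
```

A frame with VLAs but no over-aligned locals does not trigger either rule in this sketch, although other conditions in the real implementation may still force a frame pointer.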
215
216#include "AArch64FrameLowering.h"
217#include "AArch64InstrInfo.h"
220#include "AArch64RegisterInfo.h"
221#include "AArch64SMEAttributes.h"
222#include "AArch64Subtarget.h"
225#include "llvm/ADT/ScopeExit.h"
226#include "llvm/ADT/SmallVector.h"
244#include "llvm/IR/Attributes.h"
245#include "llvm/IR/CallingConv.h"
246#include "llvm/IR/DataLayout.h"
247#include "llvm/IR/DebugLoc.h"
248#include "llvm/IR/Function.h"
249#include "llvm/MC/MCAsmInfo.h"
250#include "llvm/MC/MCDwarf.h"
252#include "llvm/Support/Debug.h"
259#include <cassert>
260#include <cstdint>
261#include <iterator>
262#include <optional>
263#include <vector>
264
265using namespace llvm;
266
267#define DEBUG_TYPE "frame-info"
268
269static cl::opt<bool> EnableRedZone("aarch64-redzone",
270 cl::desc("enable use of redzone on AArch64"),
271 cl::init(false), cl::Hidden);
272
274 "stack-tagging-merge-settag",
275 cl::desc("merge settag instruction in function epilog"), cl::init(true),
276 cl::Hidden);
277
278static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
279 cl::desc("sort stack allocations"),
280 cl::init(true), cl::Hidden);
281
282static cl::opt<bool>
283 SplitSVEObjects("aarch64-split-sve-objects",
284 cl::desc("Split allocation of ZPR & PPR objects"),
285 cl::init(true), cl::Hidden);
286
288 "homogeneous-prolog-epilog", cl::Hidden,
289 cl::desc("Emit homogeneous prologue and epilogue for the size "
290 "optimization (default = off)"));
291
292// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
294 StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
295 cl::Hidden);
296// Whether to insert padding into non-streaming functions (for testing).
297static cl::opt<bool>
298 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
299 cl::init(false), cl::Hidden);
300
302 "aarch64-disable-multivector-spill-fill",
303 cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false),
304 cl::Hidden);
305
306int64_t
308 MachineBasicBlock &MBB) const {
309 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
311 bool IsTailCallReturn = (MBB.end() != MBBI)
313 : false;
314
315 int64_t ArgumentPopSize = 0;
316 if (IsTailCallReturn) {
317 MachineOperand &StackAdjust = MBBI->getOperand(1);
318
319 // For a tail-call in a callee-pops-arguments environment, some or all of
320 // the stack may actually be in use for the call's arguments, this is
321 // calculated during LowerCall and consumed here...
322 ArgumentPopSize = StackAdjust.getImm();
323 } else {
324 // ... otherwise the amount to pop is *all* of the argument space,
325 // conveniently stored in the MachineFunctionInfo by
326 // LowerFormalArguments. This will, of course, be zero for the C calling
327 // convention.
328 ArgumentPopSize = AFI->getArgumentStackToRestore();
329 }
330
331 return ArgumentPopSize;
332}
333
335 MachineFunction &MF);
336
337enum class AssignObjectOffsets { No, Yes };
338/// Process all the SVE stack objects and determine the SVE stack size and the
339/// offsets for each object. If AssignOffsets is "Yes", the offsets get assigned
340/// (and SVE stack sizes set). Returns the size of the SVE stack.
342 AssignObjectOffsets AssignOffsets);
343
344static unsigned getStackHazardSize(const MachineFunction &MF) {
345 return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
346}
347
353
356 // With split SVE objects, the hazard padding is added to the PPR region,
357 // which places it between the [GPR, PPR] area and the [ZPR, FPR] area. This
358 // avoids hazards between both GPRs and FPRs and ZPRs and PPRs.
361 : 0,
362 AFI->getStackSizePPR());
363}
364
365// Conservatively, returns true if the function is likely to have SVE vectors
366// on the stack. This function is safe to be called before callee-saves or
367// object offsets have been determined.
369 const MachineFunction &MF) {
370 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
371 if (AFI->isSVECC())
372 return true;
373
374 if (AFI->hasCalculatedStackSizeSVE())
375 return bool(AFL.getSVEStackSize(MF));
376
377 const MachineFrameInfo &MFI = MF.getFrameInfo();
378 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd(); FI++) {
379 if (MFI.hasScalableStackID(FI))
380 return true;
381 }
382
383 return false;
384}
385
386static bool isTargetWindows(const MachineFunction &MF) {
387 // TODO: Should this include targets like UEFI (which use Windows CFI)?
388 // Note: Currently, there is no AArch64 support for UEFI. The value returned
389 // here must align with the predicate used for returning the list of callee
390 // saved regs in AArch64RegisterInfo::getCalleeSavedRegs(), so that we use
391 // invalidateWindowsRegisterPairing() where appropriate.
393}
394
396 const MachineFunction &MF) const {
397 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
398 return isTargetWindows(MF) && AFI->getSVECalleeSavedStackSize();
399}
400
401/// Returns true if a homogeneous prolog or epilog code can be emitted
402/// for the size optimization. If possible, a frame helper call is injected.
403/// When Exit block is given, this check is for epilog.
404bool AArch64FrameLowering::homogeneousPrologEpilog(
405 MachineFunction &MF, MachineBasicBlock *Exit) const {
406 if (!MF.getFunction().hasMinSize())
407 return false;
409 return false;
410 if (EnableRedZone)
411 return false;
412
413 // TODO: Windows is not supported yet.
414 if (isTargetWindows(MF))
415 return false;
416
417 // TODO: SVE is not supported yet.
418 if (isLikelyToHaveSVEStack(*this, MF))
419 return false;
420
421 // Bail on stack adjustment needed on return for simplicity.
422 const MachineFrameInfo &MFI = MF.getFrameInfo();
423 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
424 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
425 return false;
426 if (Exit && getArgumentStackToRestore(MF, *Exit))
427 return false;
428
429 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
430 if (AFI->hasSwiftAsyncContext() || AFI->hasStreamingModeChanges())
431 return false;
432
433 // If there are an odd number of GPRs before LR and FP in the CSRs list,
434 // they will not be paired into one RegPairInfo, which is incompatible with
435 // the assumption made by the homogeneous prolog epilog pass.
436 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
437 unsigned NumGPRs = 0;
438 for (unsigned I = 0; CSRegs[I]; ++I) {
439 Register Reg = CSRegs[I];
440 if (Reg == AArch64::LR) {
441 assert(CSRegs[I + 1] == AArch64::FP);
442 if (NumGPRs % 2 != 0)
443 return false;
444 break;
445 }
446 if (AArch64::GPR64RegClass.contains(Reg))
447 ++NumGPRs;
448 }
449
450 return true;
451}
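The pairing check at the end of homogeneousPrologEpilog can be illustrated in isolation. The sketch below assumes a simplified register list: `Reg` and `gprsBeforeLRArePairable` are hypothetical stand-ins, and a `std::vector` replaces the zero-terminated `MCPhysReg` array used above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical register kinds standing in for entries of the CSR list.
enum class Reg { GPR, FPR, LR, FP };

// Returns true when the GPRs listed before LR can all be paired, i.e. when
// their count is even. Lists without LR are accepted unchanged.
bool gprsBeforeLRArePairable(const std::vector<Reg> &CSRs) {
  unsigned NumGPRs = 0;
  for (std::size_t I = 0; I < CSRs.size(); ++I) {
    if (CSRs[I] == Reg::LR) {
      // The frame record is expected to be listed as LR followed by FP.
      assert(I + 1 < CSRs.size() && CSRs[I + 1] == Reg::FP);
      return NumGPRs % 2 == 0;
    }
    if (CSRs[I] == Reg::GPR)
      ++NumGPRs;
  }
  return true;
}
```

With one GPR ahead of the frame record, that GPR has no partner, so the list is rejected; with two, both fit into a single pair.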
452
453/// Returns true if CSRs should be paired.
454bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
455 return produceCompactUnwindFrame(*this, MF) || homogeneousPrologEpilog(MF);
456}
457
458/// This is the biggest offset to the stack pointer we can encode in aarch64
459/// instructions (without using a separate calculation and a temp register).
460/// Note that the exceptions here are vector stores/loads, which cannot encode any
461/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
462static const unsigned DefaultSafeSPDisplacement = 255;
463
464/// Look at each instruction that references stack frames and return the stack
465/// size limit beyond which some of these instructions will require a scratch
466/// register during their expansion later.
468 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
469 // range. We'll end up allocating an unnecessary spill slot a lot, but
470 // realistically that's not a big deal at this stage of the game.
471 for (MachineBasicBlock &MBB : MF) {
472 for (MachineInstr &MI : MBB) {
473 if (MI.isDebugInstr() || MI.isPseudo() ||
474 MI.getOpcode() == AArch64::ADDXri ||
475 MI.getOpcode() == AArch64::ADDSXri)
476 continue;
477
478 for (const MachineOperand &MO : MI.operands()) {
479 if (!MO.isFI())
480 continue;
481
483 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
485 return 0;
486 }
487 }
488 }
490}
491
496
497unsigned
498AArch64FrameLowering::getFixedObjectSize(const MachineFunction &MF,
499 const AArch64FunctionInfo *AFI,
500 bool IsWin64, bool IsFunclet) const {
501 assert(AFI->getTailCallReservedStack() % 16 == 0 &&
502 "Tail call reserved stack must be aligned to 16 bytes");
503 if (!IsWin64 || IsFunclet) {
504 return AFI->getTailCallReservedStack();
505 } else {
506 if (AFI->getTailCallReservedStack() != 0 &&
507 !MF.getFunction().getAttributes().hasAttrSomewhere(
508 Attribute::SwiftAsync))
509 report_fatal_error("cannot generate ABI-changing tail call for Win64");
510 unsigned FixedObjectSize = AFI->getTailCallReservedStack();
511
512 // Var args are stored here in the primary function.
513 FixedObjectSize += AFI->getVarArgsGPRSize();
514
515 if (MF.hasEHFunclets()) {
516 // Catch objects are stored here in the primary function.
517 const MachineFrameInfo &MFI = MF.getFrameInfo();
518 const WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
519 SmallSetVector<int, 8> CatchObjFrameIndices;
520 for (const WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
521 for (const WinEHHandlerType &H : TBME.HandlerArray) {
522 int FrameIndex = H.CatchObj.FrameIndex;
523 if ((FrameIndex != INT_MAX) &&
524 CatchObjFrameIndices.insert(FrameIndex)) {
525 FixedObjectSize = alignTo(FixedObjectSize,
526 MFI.getObjectAlign(FrameIndex).value()) +
527 MFI.getObjectSize(FrameIndex);
528 }
529 }
530 }
531 // To support EH funclets we allocate an UnwindHelp object
532 FixedObjectSize += 8;
533 }
534 return alignTo(FixedObjectSize, 16);
535 }
536}
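The Win64 branch of getFixedObjectSize is mostly alignment arithmetic, which a worked sketch makes easier to follow. Everything named here is hypothetical (`alignUp`, `CatchObject`, `win64FixedObjectSize`), and the sketch assumes the catch objects have already been de-duplicated by frame index.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Rounds Value up to the next multiple of Align (Align must be nonzero).
constexpr uint64_t alignUp(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Hypothetical description of one catch object; not an LLVM type.
struct CatchObject {
  uint64_t Size;
  uint64_t Align;
};

// Accumulates the fixed-object area in the same order as the code above:
// tail-call reserve, varargs GPR save area, catch objects, UnwindHelp slot,
// then a final round-up to 16 bytes.
uint64_t win64FixedObjectSize(uint64_t TailCallReserved,
                              uint64_t VarArgsGPRSize,
                              const std::vector<CatchObject> &CatchObjects,
                              bool HasEHFunclets) {
  assert(TailCallReserved % 16 == 0 && "reserve must be 16-byte aligned");
  uint64_t Size = TailCallReserved + VarArgsGPRSize;
  if (HasEHFunclets) {
    for (const CatchObject &Obj : CatchObjects)
      Size = alignUp(Size, Obj.Align) + Obj.Size;
    Size += 8; // 8-byte UnwindHelp object
  }
  return alignUp(Size, 16);
}
```

For example, 56 bytes of varargs plus catch objects of 4 and 8 bytes give 56 + 4 = 60, rounded to 64, plus 8 = 72, plus the 8-byte UnwindHelp slot = 80, which is already a multiple of 16.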
537
539 if (!EnableRedZone)
540 return false;
541
542 // Don't use the red zone if the function explicitly asks us not to.
543 // This is typically used for kernel code.
544 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
545 const unsigned RedZoneSize =
547 if (!RedZoneSize)
548 return false;
549
550 const MachineFrameInfo &MFI = MF.getFrameInfo();
552 uint64_t NumBytes = AFI->getLocalStackSize();
553
554 // If neither NEON nor SVE is available, a COPY from one Q-reg to
555 // another requires a spill -> reload sequence. We can do that
556 // using a pre-decrementing store/post-incrementing load, but
557 // if we do so, we can't use the Red Zone.
558 bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
559 !Subtarget.isNeonAvailable() &&
560 !Subtarget.hasSVE();
561
562 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
563 AFI->hasSVEStackSize() || LowerQRegCopyThroughMem);
564}
565
566/// hasFPImpl - Return true if the specified function should have a dedicated
567/// frame pointer register.
569 const MachineFrameInfo &MFI = MF.getFrameInfo();
570 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
572
573 // Win64 EH requires a frame pointer if funclets are present, as the locals
574 // are accessed off the frame pointer in both the parent function and the
575 // funclets.
576 if (MF.hasEHFunclets())
577 return true;
578
579 // When the stack guard is mixed with the frame pointer, a dedicated FP is
580 // required so the guard value remains stable in the presence of dynamic
581 // stack allocations (e.g. _alloca on MSVCRT).
582 if (MFI.hasStackProtectorIndex()) {
583 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
584 if (Subtarget.getTargetLowering()->useStackGuardMixFP())
585 return true;
586 }
587
588 // Retain behavior of always omitting the FP for leaf functions when possible.
590 return true;
591 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
592 MFI.hasStackMap() || MFI.hasPatchPoint() ||
593 RegInfo->hasStackRealignment(MF))
594 return true;
595
596 // If we:
597 //
598 // 1. Have streaming mode changes
599 // OR:
600 // 2. Have a streaming body with SVE stack objects
601 //
602 // Then the value of VG restored when unwinding to this function may not match
603 // the value of VG used to set up the stack.
604 //
605 // This is a problem as the CFA can be described with an expression of the
606 // form: CFA = SP + NumBytes + VG * NumScalableBytes.
607 //
608 // If the value of VG used in that expression does not match the value used to
609 // set up the stack, an incorrect address for the CFA will be computed, and
610 // unwinding will fail.
611 //
612 // We work around this issue by ensuring the frame-pointer can describe the
613 // CFA in either of these cases.
614 if (AFI.needsDwarfUnwindInfo(MF) &&
617 return true;
618 // With large callframes around we may need to use FP to access the scavenging
619 // emergency spillslot.
620 //
621 // Unfortunately some calls to hasFP() like machine verifier ->
622 // getReservedReg() -> hasFP in the middle of global isel are too early
623 // to know the max call frame size. Hopefully conservatively returning "true"
624 // in those cases is fine.
625 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
626 if (!MFI.isMaxCallFrameSizeComputed() ||
628 return true;
629
630 return false;
631}
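The CFA expression quoted in the comment above can be evaluated by hand to see why a mismatched VG breaks unwinding. The sketch below takes the formula exactly as written; `computeCFA` is a hypothetical helper and all input values are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Evaluates CFA = SP + NumBytes + VG * NumScalableBytes, as quoted above.
constexpr uint64_t computeCFA(uint64_t SP, uint64_t NumBytes, uint64_t VG,
                              uint64_t NumScalableBytes) {
  return SP + NumBytes + VG * NumScalableBytes;
}

// Made-up example: the stack was set up with VG == 2, but the unwinder
// restores VG == 4 for this frame.
constexpr uint64_t CFAAtSetup = computeCFA(0x1000, 32, /*VG=*/2, 8);
constexpr uint64_t CFAAtUnwind = computeCFA(0x1000, 32, /*VG=*/4, 8);
static_assert(CFAAtSetup != CFAAtUnwind,
              "a different VG yields a different CFA");
```

A CFA described relative to the frame pointer has no VG term, which is why forcing a frame pointer sidesteps the problem in these cases.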
632
633/// Should the Frame Pointer be reserved for the current function?
635 const TargetMachine &TM = MF.getTarget();
636 const Triple &TT = TM.getTargetTriple();
637
638 // These OSes require the frame chain to be valid, even if the current frame does
639 // not use a frame pointer.
640 if (TT.isOSDarwin() || TT.isOSWindows())
641 return true;
642
643 // If the function has a frame pointer, it is reserved.
644 if (hasFP(MF))
645 return true;
646
647 // Frontend has requested to preserve the frame pointer.
649 return true;
650
651 return false;
652}
653
654/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
655/// not required, we reserve argument space for call sites in the function
656/// immediately on entry to the current function. This eliminates the need for
657/// add/sub sp brackets around call sites. Returns true if the call frame is
658/// included as part of the stack frame.
660 const MachineFunction &MF) const {
661 // The stack probing code for the dynamically allocated outgoing arguments
662 // area assumes that the stack is probed at the top - either by the prologue
663 // code, which issues a probe if `hasVarSizedObjects` returns true, or by the
664 // most recent variable-sized object allocation. Changing the condition here
665 // may need to be followed up by changes to the probe issuing logic.
666 return !MF.getFrameInfo().hasVarSizedObjects();
667}
668
672
673 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
674 const AArch64InstrInfo *TII = Subtarget.getInstrInfo();
675 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
676 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
677 DebugLoc DL = I->getDebugLoc();
678 unsigned Opc = I->getOpcode();
679 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
680 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
681
682 if (!hasReservedCallFrame(MF)) {
683 int64_t Amount = I->getOperand(0).getImm();
684 Amount = alignTo(Amount, getStackAlign());
685 if (!IsDestroy)
686 Amount = -Amount;
687
688 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
689 // doesn't have to pop anything), then the first operand will be zero too so
690 // this adjustment is a no-op.
691 if (CalleePopAmount == 0) {
692 // FIXME: in-function stack adjustment for calls is limited to 24-bits
693 // because there's no guaranteed temporary register available.
694 //
695 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
696 // 1) For offset <= 12-bit, we use LSL #0
697 // 2) For 12-bit < offset <= 24-bit, we use two instructions. One uses
698 // LSL #0, and the other uses LSL #12.
699 //
700 // Most call frames will be allocated at the start of a function so
701 // this is OK, but it is a limitation that needs dealing with.
702 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
703
704 if (TLI->hasInlineStackProbe(MF) &&
706 // When stack probing is enabled, the decrement of SP may need to be
707 // probed. We only need to do this if the call site needs 1024 bytes of
708 // space or more, because a region smaller than that is allowed to be
709 // unprobed at an ABI boundary. We rely on the fact that SP has been
710 // probed exactly at this point, either by the prologue or most recent
711 // dynamic allocation.
713 "non-reserved call frame without var sized objects?");
714 Register ScratchReg =
715 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
716 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
717 } else {
718 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
719 StackOffset::getFixed(Amount), TII);
720 }
721 }
722 } else if (CalleePopAmount != 0) {
723 // If the calling convention demands that the callee pops arguments from the
724 // stack, we want to add it back if we have a reserved call frame.
725 assert(CalleePopAmount < 0xffffff && "call frame too large");
726 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
727 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
728 }
729 return MBB.erase(I);
730}
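The 24-bit limit mentioned in the FIXME above follows from how ADD/SUB (immediate) encodes its operand: a 12-bit value, optionally shifted left by 12. A minimal sketch of the split, assuming a non-negative magnitude; `SplitImm` and `splitAddSubImm` are hypothetical names and do not reflect how emitFrameOffset is implemented.

```cpp
#include <cassert>
#include <cstdint>

// The two immediates needed to add or subtract a 24-bit amount.
struct SplitImm {
  uint32_t Low12;  // used with LSL #0
  uint32_t High12; // used with LSL #12
};

// Splits Amount into a low and a high 12-bit part, e.g. for
//   sub sp, sp, #High12, lsl #12
//   sub sp, sp, #Low12
SplitImm splitAddSubImm(uint32_t Amount) {
  assert(Amount < 0xffffffu && "call frame too large");
  return SplitImm{Amount & 0xfffu, Amount >> 12};
}
```

An amount of 0x12345 splits into 0x12 (shifted) and 0x345 (unshifted); an amount that fits in 12 bits leaves the high part zero, so a single instruction suffices.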
731
733 MachineBasicBlock &MBB) const {
734
735 MachineFunction &MF = *MBB.getParent();
736 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
737 const auto &TRI = *Subtarget.getRegisterInfo();
738 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
739
740 CFIInstBuilder CFIBuilder(MBB, MBB.begin(), MachineInstr::NoFlags);
741
742 // Reset the CFA to `SP + 0`.
743 CFIBuilder.buildDefCFA(AArch64::SP, 0);
744
745 // Flip the RA sign state.
746 if (MFI.shouldSignReturnAddress(MF)) {
747 if (MFI.branchProtectionPAuthLR()) {
748 CFIBuilder.buildNegateRAStateWithPC();
749 } else if (!MF.getTarget().getTargetTriple().isOSBinFormatMachO()) {
750 CFIBuilder.buildNegateRAState();
751 }
752 }
753
754 // Shadow call stack uses X18, reset it.
755 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
756 CFIBuilder.buildSameValue(AArch64::X18);
757
758 // Emit .cfi_same_value for callee-saved registers.
759 const std::vector<CalleeSavedInfo> &CSI =
761 for (const auto &Info : CSI) {
762 MCRegister Reg = Info.getReg();
763 if (!TRI.regNeedsCFI(Reg, Reg))
764 continue;
765 CFIBuilder.buildSameValue(Reg);
766 }
767}
768
770 switch (Reg.id()) {
771 default:
772 // The called routine is expected to preserve r19-r28.
773 // r29 and r30 are used as frame pointer and link register, respectively.
774 return 0;
775
776 // GPRs
777#define CASE(n) \
778 case AArch64::W##n: \
779 case AArch64::X##n: \
780 return AArch64::X##n
781 CASE(0);
782 CASE(1);
783 CASE(2);
784 CASE(3);
785 CASE(4);
786 CASE(5);
787 CASE(6);
788 CASE(7);
789 CASE(8);
790 CASE(9);
791 CASE(10);
792 CASE(11);
793 CASE(12);
794 CASE(13);
795 CASE(14);
796 CASE(15);
797 CASE(16);
798 CASE(17);
799 CASE(18);
800#undef CASE
801
802 // FPRs
803#define CASE(n) \
804 case AArch64::B##n: \
805 case AArch64::H##n: \
806 case AArch64::S##n: \
807 case AArch64::D##n: \
808 case AArch64::Q##n: \
809 return HasSVE ? AArch64::Z##n : AArch64::Q##n
810 CASE(0);
811 CASE(1);
812 CASE(2);
813 CASE(3);
814 CASE(4);
815 CASE(5);
816 CASE(6);
817 CASE(7);
818 CASE(8);
819 CASE(9);
820 CASE(10);
821 CASE(11);
822 CASE(12);
823 CASE(13);
824 CASE(14);
825 CASE(15);
826 CASE(16);
827 CASE(17);
828 CASE(18);
829 CASE(19);
830 CASE(20);
831 CASE(21);
832 CASE(22);
833 CASE(23);
834 CASE(24);
835 CASE(25);
836 CASE(26);
837 CASE(27);
838 CASE(28);
839 CASE(29);
840 CASE(30);
841 CASE(31);
842#undef CASE
843 }
844}
845
846void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
847 MachineBasicBlock &MBB) const {
848 // Insertion point.
850
851 // Fake a debug loc.
852 DebugLoc DL;
853 if (MBBI != MBB.end())
854 DL = MBBI->getDebugLoc();
855
856 const MachineFunction &MF = *MBB.getParent();
857 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
858 const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();
859
860 BitVector GPRsToZero(TRI.getNumRegs());
861 BitVector FPRsToZero(TRI.getNumRegs());
862 bool HasSVE = STI.isSVEorStreamingSVEAvailable();
863 for (MCRegister Reg : RegsToZero.set_bits()) {
864 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
865 // For GPRs, we only care to clear out the 64-bit register.
866 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
867 GPRsToZero.set(XReg);
868 } else if (AArch64InstrInfo::isFpOrNEON(Reg)) {
869 // For FPRs, clear the widest register available: Z with SVE, else Q.
870 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
871 FPRsToZero.set(XReg);
872 }
873 }
874
875 const AArch64InstrInfo &TII = *STI.getInstrInfo();
876
877 // Zero out GPRs.
878 for (MCRegister Reg : GPRsToZero.set_bits())
879 TII.buildClearRegister(Reg, MBB, MBBI, DL);
880
881 // Zero out FP/vector registers.
882 for (MCRegister Reg : FPRsToZero.set_bits())
883 TII.buildClearRegister(Reg, MBB, MBBI, DL);
884
885 if (HasSVE) {
886 for (MCRegister PReg :
887 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
888 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
889 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
890 AArch64::P15}) {
891 if (RegsToZero[PReg])
892 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
893 }
894 }
895}
896
897bool AArch64FrameLowering::windowsRequiresStackProbe(
898 const MachineFunction &MF, uint64_t StackSizeInBytes) const {
899 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
900 const AArch64FunctionInfo &MFI = *MF.getInfo<AArch64FunctionInfo>();
901 // TODO: When implementing stack protectors, take that into account
902 // for the probe threshold.
903 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
904 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
905}
906
908 const MachineBasicBlock &MBB) {
909 const MachineFunction *MF = MBB.getParent();
910 LiveRegs.addLiveIns(MBB);
911 // Mark callee saved registers as used so we will not choose them.
912 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
913 for (unsigned i = 0; CSRegs[i]; ++i)
914 LiveRegs.addReg(CSRegs[i]);
915}
916
918AArch64FrameLowering::findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB,
919 bool HasCall) const {
920 MachineFunction *MF = MBB->getParent();
921
922 // If MBB is an entry block, use X9 as the scratch register. However,
923 // preserve_none functions may be using X9 to pass arguments,
924 // so prefer to pick an available register below.
925 if (&MF->front() == MBB &&
927 return AArch64::X9;
928
929 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
930 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
931 LivePhysRegs LiveRegs(TRI);
932 getLiveRegsForEntryMBB(LiveRegs, *MBB);
933 if (HasCall) {
934 LiveRegs.addReg(AArch64::X16);
935 LiveRegs.addReg(AArch64::X17);
936 LiveRegs.addReg(AArch64::X18);
937 }
938
939 // Prefer X9 since it was historically used for the prologue scratch reg.
940 const MachineRegisterInfo &MRI = MF->getRegInfo();
941 if (LiveRegs.available(MRI, AArch64::X9))
942 return AArch64::X9;
943
944 for (unsigned Reg : AArch64::GPR64RegClass) {
945 if (LiveRegs.available(MRI, Reg))
946 return Reg;
947 }
948 return AArch64::NoRegister;
949}
950
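// A minimal sketch of the selection order used above: take the preferred
// register when it is free, otherwise the first free candidate. The names
// `Reg`, `NoReg` and `pickScratch` are hypothetical, a plain std::set stands
// in for LivePhysRegs, and reserved-register handling is not modelled.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for a physical register number.
using Reg = unsigned;
// Hypothetical sentinel meaning "no register is available".
constexpr Reg NoReg = ~0u;

// Returns Preferred if it is not live, otherwise the first candidate that is
// not live (in the order given), or NoReg if every candidate is live.
Reg pickScratch(const std::set<Reg> &Live, Reg Preferred,
                const std::vector<Reg> &Candidates) {
  if (Live.count(Preferred) == 0)
    return Preferred;
  for (Reg R : Candidates)
    if (Live.count(R) == 0)
      return R;
  return NoReg;
}
```

// In the function above the live set would also contain the block's live-ins
// and the callee-saved registers, plus X16-X18 when HasCall is set.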
952 const MachineBasicBlock &MBB) const {
953 const MachineFunction *MF = MBB.getParent();
954 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
955 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
956 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
957 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
959
960 if (AFI->hasSwiftAsyncContext()) {
961 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
962 const MachineRegisterInfo &MRI = MF->getRegInfo();
965 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
966 // available.
967 if (!LiveRegs.available(MRI, AArch64::X16) ||
968 !LiveRegs.available(MRI, AArch64::X17))
969 return false;
970 }
971
972 // Certain stack probing sequences might clobber flags, so we can't use
973 // the block as a prologue if the flags register is a live-in.
975 MBB.isLiveIn(AArch64::NZCV))
976 return false;
977
978 if (RegInfo->hasStackRealignment(*MF) || TLI->hasInlineStackProbe(*MF))
979 if (findScratchNonCalleeSaveRegister(TmpMBB) == AArch64::NoRegister)
980 return false;
981
982 // We may need a scratch register (for the return value) if a special call
983 // is required.
984 if (requiresSaveVG(*MF) ||
985 windowsRequiresStackProbe(*MF, std::numeric_limits<uint64_t>::max()))
986 if (findScratchNonCalleeSaveRegister(TmpMBB, true) == AArch64::NoRegister)
987 return false;
988
989 return true;
990}
991
993 const Function &F = MF.getFunction();
994 return MF.getTarget().getMCAsmInfo().usesWindowsCFI() &&
995 F.needsUnwindTableEntry();
996}
997
998bool AArch64FrameLowering::shouldSignReturnAddressEverywhere(
999 const MachineFunction &MF) const {
1000 // FIXME: With WinCFI, extra care should be taken to place SEH_PACSignLR
1001 // and SEH_EpilogEnd instructions in the correct order.
1003 return false;
1006}
1007
1008// Given a load or a store instruction, generate an appropriate unwinding SEH
1009// code on Windows.
1011AArch64FrameLowering::insertSEH(MachineBasicBlock::iterator MBBI,
1012 const AArch64InstrInfo &TII,
1013 MachineInstr::MIFlag Flag) const {
1014 unsigned Opc = MBBI->getOpcode();
1015 MachineBasicBlock *MBB = MBBI->getParent();
1016 MachineFunction &MF = *MBB->getParent();
1017 DebugLoc DL = MBBI->getDebugLoc();
1018 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1019 int Imm = MBBI->getOperand(ImmIdx).getImm();
1021 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1022 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1023
1024 switch (Opc) {
1025 default:
1026 report_fatal_error("No SEH Opcode for this instruction");
1027 case AArch64::STR_ZXI:
1028 case AArch64::LDR_ZXI: {
1029 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1030 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveZReg))
1031 .addImm(Reg0)
1032 .addImm(Imm)
1033 .setMIFlag(Flag);
1034 break;
1035 }
1036 case AArch64::STR_PXI:
1037 case AArch64::LDR_PXI: {
1038 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1039 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SavePReg))
1040 .addImm(Reg0)
1041 .addImm(Imm)
1042 .setMIFlag(Flag);
1043 break;
1044 }
1045 case AArch64::LDPDpost:
1046 Imm = -Imm;
1047 [[fallthrough]];
1048 case AArch64::STPDpre: {
1049 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1050 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1051 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1052 .addImm(Reg0)
1053 .addImm(Reg1)
1054 .addImm(Imm * 8)
1055 .setMIFlag(Flag);
1056 break;
1057 }
1058 case AArch64::LDPXpost:
1059 Imm = -Imm;
1060 [[fallthrough]];
1061 case AArch64::STPXpre: {
1062 Register Reg0 = MBBI->getOperand(1).getReg();
1063 Register Reg1 = MBBI->getOperand(2).getReg();
1064 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1065 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1066 .addImm(Imm * 8)
1067 .setMIFlag(Flag);
1068 else
1069 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1070 .addImm(RegInfo->getSEHRegNum(Reg0))
1071 .addImm(RegInfo->getSEHRegNum(Reg1))
1072 .addImm(Imm * 8)
1073 .setMIFlag(Flag);
1074 break;
1075 }
1076 case AArch64::LDRDpost:
1077 Imm = -Imm;
1078 [[fallthrough]];
1079 case AArch64::STRDpre: {
1080 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1081 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1082 .addImm(Reg)
1083 .addImm(Imm)
1084 .setMIFlag(Flag);
1085 break;
1086 }
1087 case AArch64::LDRXpost:
1088 Imm = -Imm;
1089 [[fallthrough]];
1090 case AArch64::STRXpre: {
1091 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1092 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1093 .addImm(Reg)
1094 .addImm(Imm)
1095 .setMIFlag(Flag);
1096 break;
1097 }
1098 case AArch64::STPDi:
1099 case AArch64::LDPDi: {
1100 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1101 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1102 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1103 .addImm(Reg0)
1104 .addImm(Reg1)
1105 .addImm(Imm * 8)
1106 .setMIFlag(Flag);
1107 break;
1108 }
1109 case AArch64::STPXi:
1110 case AArch64::LDPXi: {
1111 Register Reg0 = MBBI->getOperand(0).getReg();
1112 Register Reg1 = MBBI->getOperand(1).getReg();
1113
1114 int SEHReg0 = RegInfo->getSEHRegNum(Reg0);
1115 int SEHReg1 = RegInfo->getSEHRegNum(Reg1);
1116
1117 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1118 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1119 .addImm(Imm * 8)
1120 .setMIFlag(Flag);
1121 else if (SEHReg0 >= 19 && SEHReg1 >= 19)
1122 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1123 .addImm(SEHReg0)
1124 .addImm(SEHReg1)
1125 .addImm(Imm * 8)
1126 .setMIFlag(Flag);
1127 else
1128 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegIP))
1129 .addImm(SEHReg0)
1130 .addImm(SEHReg1)
1131 .addImm(Imm * 8)
1132 .setMIFlag(Flag);
1133 break;
1134 }
1135 case AArch64::STRXui:
1136 case AArch64::LDRXui: {
1137 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1138 if (Reg >= 19)
1139 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1140 .addImm(Reg)
1141 .addImm(Imm * 8)
1142 .setMIFlag(Flag);
1143 else
1144 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegI))
1145 .addImm(Reg)
1146 .addImm(Imm * 8)
1147 .setMIFlag(Flag);
1148 break;
1149 }
1150 case AArch64::STRDui:
1151 case AArch64::LDRDui: {
1152 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1153 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1154 .addImm(Reg)
1155 .addImm(Imm * 8)
1156 .setMIFlag(Flag);
1157 break;
1158 }
1159 case AArch64::STPQi:
1160 case AArch64::LDPQi: {
1161 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1162 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1163 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1164 .addImm(Reg0)
1165 .addImm(Reg1)
1166 .addImm(Imm * 16)
1167 .setMIFlag(Flag);
1168 break;
1169 }
1170 case AArch64::LDPQpost:
1171 Imm = -Imm;
1172 [[fallthrough]];
1173 case AArch64::STPQpre: {
1174 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1175 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1176 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1177 .addImm(Reg0)
1178 .addImm(Reg1)
1179 .addImm(Imm * 16)
1180 .setMIFlag(Flag);
1181 break;
1182 }
1183 }
1184 auto I = MBB->insertAfter(MBBI, MIB);
1185 return I;
1186}
1187
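// A minimal sketch of the offset arithmetic in the switch above, assuming the
// scales visible in the listing: the immediate is multiplied by 8 for X/D
// forms and by 16 for Q pairs, and is passed through unchanged where the
// listing does not scale it (for example STRXpre). The helper name
// `sehOffsetBytes` is hypothetical.

```cpp
#include <cassert>

// Converts a load/store immediate into the byte offset recorded on the
// unwind pseudo-instruction.
//   Imm             - immediate operand of the load/store
//   Scale           - bytes per immediate unit (8, 16, or 1 if already bytes)
//   IsPostIndexLoad - true for a post-indexed reload in the epilogue
int sehOffsetBytes(int Imm, int Scale, bool IsPostIndexLoad) {
  // A post-indexed load adds its immediate to SP, so negate it to match the
  // pre-indexed store that it undoes.
  if (IsPostIndexLoad)
    Imm = -Imm;
  return Imm * Scale;
}
```

// With these assumptions a pre-indexed pair store with immediate -4 and the
// matching post-indexed pair load with immediate 4 both yield -32.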
1190 if (!AFI->needsDwarfUnwindInfo(MF) || !AFI->hasStreamingModeChanges())
1191 return false;
1192 // For Darwin platforms we don't save VG for non-SVE functions, even if SME
1193 // is enabled with streaming mode changes.
1194 auto &ST = MF.getSubtarget<AArch64Subtarget>();
1195 if (ST.isTargetDarwin())
1196 return ST.hasSVE();
1197 return true;
1198}
1199
1201 MachineFunction &MF) const {
1202 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1203 const AArch64InstrInfo *TII = Subtarget.getInstrInfo();
1204
1205 auto EmitSignRA = [&](MachineBasicBlock &MBB) {
1206 DebugLoc DL; // Set debug location to unknown.
1208
1209 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1211 };
1212
1213 auto EmitAuthRA = [&](MachineBasicBlock &MBB) {
1214 DebugLoc DL;
1215 MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
1216 if (MBBI != MBB.end())
1217 DL = MBBI->getDebugLoc();
1218
1219 TII->createPauthEpilogueInstr(MBB, DL);
1220 };
1221
1222 // This should be in sync with PEIImpl::calculateSaveRestoreBlocks.
1223 EmitSignRA(MF.front());
1224 for (MachineBasicBlock &MBB : MF) {
1225 if (MBB.isEHFuncletEntry())
1226 EmitSignRA(MBB);
1227 if (MBB.isReturnBlock())
1228 EmitAuthRA(MBB);
1229 }
1230}
1231
1233 MachineBasicBlock &MBB) const {
1234 AArch64PrologueEmitter PrologueEmitter(MF, MBB, *this);
1235 PrologueEmitter.emitPrologue();
1236}
1237
1239 MachineBasicBlock &MBB) const {
1240 AArch64EpilogueEmitter EpilogueEmitter(MF, MBB, *this);
1241 EpilogueEmitter.emitEpilogue();
1242}
1243
1246 MF.getInfo<AArch64FunctionInfo>()->needsDwarfUnwindInfo(MF);
1247}
1248
1250 return enableCFIFixup(MF) &&
1251 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
1252}
1253
1254/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
1255/// debug info. It's the same as what we use for resolving the code-gen
1256/// references for now. FIXME: This can go wrong when references are
1257/// SP-relative and simple call frames aren't used.
1260 Register &FrameReg) const {
1262 MF, FI, FrameReg,
1263 /*PreferFP=*/
1264 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
1265 MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
1266 /*ForSimm=*/false);
1267}
1268
1271 int FI) const {
1272 // This function serves to provide a comparable offset from a single reference
1273 // point (the value of SP at function entry) that can be used for analysis,
1274 // e.g. the stack-frame-layout analysis pass. It is not guaranteed to be
1275 // correct for all objects in the presence of VLA-area objects or dynamic
1276 // stack re-alignment.
1277
1278 const auto &MFI = MF.getFrameInfo();
1279
1280 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1281 StackOffset ZPRStackSize = getZPRStackSize(MF);
1282 StackOffset PPRStackSize = getPPRStackSize(MF);
1283 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1284
1285 // For VLA-area objects, just emit an offset at the end of the stack frame.
1286 // Whilst not quite correct, these objects do live at the end of the frame and
1287 // so it is more useful for analysis for the offset to reflect this.
1288 if (MFI.isVariableSizedObjectIndex(FI)) {
1289 return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
1290 }
1291
1292 // This is correct in the absence of any SVE stack objects.
1293 if (!SVEStackSize)
1294 return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
1295
1296 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1297 bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
1298 if (MFI.hasScalableStackID(FI)) {
1299 if (FPAfterSVECalleeSaves &&
1300 -ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1301 assert(!AFI->hasSplitSVEObjects() &&
1302 "split-sve-objects not supported with FPAfterSVECalleeSaves");
1303 return StackOffset::getScalable(ObjectOffset);
1304 }
1305 StackOffset AccessOffset{};
1306 // The scalable vectors are below (lower address) the scalable predicates
1307 // with split SVE objects, so we must subtract the size of the predicates.
1308 if (AFI->hasSplitSVEObjects() &&
1309 MFI.getStackID(FI) == TargetStackID::ScalableVector)
1310 AccessOffset = -PPRStackSize;
1311 return AccessOffset +
1312 StackOffset::get(-((int64_t)AFI->getCalleeSavedStackSize()),
1313 ObjectOffset);
1314 }
1315
1316 bool IsFixed = MFI.isFixedObjectIndex(FI);
1317 bool IsCSR =
1318 !IsFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1319
1320 StackOffset ScalableOffset = {};
1321 if (!IsFixed && !IsCSR) {
1322 ScalableOffset = -SVEStackSize;
1323 } else if (FPAfterSVECalleeSaves && IsCSR) {
1324 ScalableOffset =
1326 }
1327
1328 return StackOffset::getFixed(ObjectOffset) + ScalableOffset;
1329}
1330
1336
1337StackOffset AArch64FrameLowering::getFPOffset(const MachineFunction &MF,
1338 int64_t ObjectOffset) const {
1339 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1340 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1341 const Function &F = MF.getFunction();
1342 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
1343 unsigned FixedObject =
1344 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
1345 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
1346 int64_t FPAdjust =
1347 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
1348 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
1349}
1350
1351StackOffset AArch64FrameLowering::getStackOffset(const MachineFunction &MF,
1352 int64_t ObjectOffset) const {
1353 const auto &MFI = MF.getFrameInfo();
1354 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
1355}
1356
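// A worked sketch of the two formulas above on plain integers. The struct
// `FrameSizes` and the free functions are hypothetical; each field stands in
// for the value returned by the accessor named in its comment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bundle of the frame quantities read by the two functions.
struct FrameSizes {
  int64_t FixedObject;         // getFixedObjectSize(...)
  int64_t CalleeSaveSize;      // getCalleeSavedStackSize(...)
  int64_t CSBaseToFrameRecord; // getCalleeSaveBaseToFrameRecordOffset()
  int64_t StackSize;           // MachineFrameInfo::getStackSize()
};

// FP-relative offset: ObjectOffset + FixedObject + FPAdjust.
int64_t fpOffset(const FrameSizes &F, int64_t ObjectOffset) {
  int64_t FPAdjust = F.CalleeSaveSize - F.CSBaseToFrameRecord;
  return ObjectOffset + F.FixedObject + FPAdjust;
}

// SP-relative offset: ObjectOffset + StackSize.
int64_t spOffset(const FrameSizes &F, int64_t ObjectOffset) {
  return ObjectOffset + F.StackSize;
}
```

// For an illustrative frame with no fixed objects, 32 bytes of callee saves,
// the frame record at the callee-save base and a 48-byte stack, an object at
// offset -40 resolves to FP-8 or SP+8.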
1357// TODO: This function currently does not work for scalable vectors.
1359 int FI) const {
1360 const AArch64RegisterInfo *RegInfo =
1361 MF.getSubtarget<AArch64Subtarget>().getRegisterInfo();
1362 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
1363 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
1364 ? getFPOffset(MF, ObjectOffset).getFixed()
1365 : getStackOffset(MF, ObjectOffset).getFixed();
1366}
1367
1369 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
1370 bool ForSimm) const {
1371 const auto &MFI = MF.getFrameInfo();
1372 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1373 bool isFixed = MFI.isFixedObjectIndex(FI);
1374 auto StackID = static_cast<TargetStackID::Value>(MFI.getStackID(FI));
1375 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, StackID,
1376 FrameReg, PreferFP, ForSimm);
1377}
1378
1380 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed,
1381 TargetStackID::Value StackID, Register &FrameReg, bool PreferFP,
1382 bool ForSimm) const {
1383 const auto &MFI = MF.getFrameInfo();
1384 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1385 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1386 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1387
1388 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
1389 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
1390 bool isCSR =
1391 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1392 bool isSVE = MFI.isScalableStackID(StackID);
1393
1394 StackOffset ZPRStackSize = getZPRStackSize(MF);
1395 StackOffset PPRStackSize = getPPRStackSize(MF);
1396 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1397
1398 // Use frame pointer to reference fixed objects. Use it for locals if
1399 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
1400 // reliable as a base). Make sure useFPForScavengingIndex() does the
1401 // right thing for the emergency spill slot.
1402 bool UseFP = false;
1403 if (AFI->hasStackFrame() && !isSVE) {
1404 // We shouldn't prefer using the FP to access fixed-sized stack objects when
1405 // there are scalable (SVE) objects in between the FP and the fixed-sized
1406 // objects.
1407 PreferFP &= !SVEStackSize;
1408
1409 // Note: Keeping the following as multiple 'if' statements rather than
1410 // merging to a single expression for readability.
1411 //
1412 // Argument access should always use the FP.
1413 if (isFixed) {
1414 UseFP = hasFP(MF);
1415 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
1416 // References to the CSR area must use FP if we're re-aligning the stack
1417 // since the dynamically-sized alignment padding is between the SP/BP and
1418 // the CSR area.
1419 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
1420 UseFP = true;
1421 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
1422 // If the FPOffset is negative and we're producing a signed immediate, we
1423 // have to keep in mind that the available offset range for negative
1424 // offsets is smaller than for positive ones. If an offset is available
1425 // via the FP and the SP, use whichever is closest.
1426 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
1427 PreferFP |= Offset > -FPOffset && !SVEStackSize;
1428
1429 if (FPOffset >= 0) {
1430 // If the FPOffset is positive, that'll always be best, as the SP/BP
1431 // will be even further away.
1432 UseFP = true;
1433 } else if (MFI.hasVarSizedObjects()) {
1434 // If we have variable sized objects, we can use either FP or BP, as the
1435 // SP offset is unknown. We can use the base pointer if we have one and
1436 // FP is not preferred. If not, we're stuck with using FP.
1437 bool CanUseBP = RegInfo->hasBasePointer(MF);
1438 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
1439 UseFP = PreferFP;
1440 else if (!CanUseBP) // Can't use BP. Forced to use FP.
1441 UseFP = true;
1442 // else we can use BP and FP, but the offset from FP won't fit.
1443 // That will make us scavenge registers which we can probably avoid by
1444 // using BP. If it won't fit for BP either, we'll scavenge anyway.
1445 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
1446 // Funclets access the locals contained in the parent's stack frame
1447 // via the frame pointer, so we have to use the FP in the parent
1448 // function.
1449 (void) Subtarget;
1450 assert(Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
1451 MF.getFunction().isVarArg()) &&
1452 "Funclets should only be present on Win64");
1453 UseFP = true;
1454 } else {
1455 // We have the choice between FP and (SP or BP).
1456 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
1457 UseFP = true;
1458 }
1459 }
1460 }
1461
1462 assert(
1463 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
1464 "In the presence of dynamic stack pointer realignment, "
1465 "non-argument/CSR objects cannot be accessed through the frame pointer");
1466
1467 bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
1468
1469 if (isSVE) {
1470 StackOffset FPOffset = StackOffset::get(
1471 -AFI->getCalleeSaveBaseToFrameRecordOffset(), ObjectOffset);
1472 StackOffset SPOffset =
1473 SVEStackSize +
1474 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
1475 ObjectOffset);
1476
1477 // With split SVE objects the ObjectOffset is relative to the split area
1478 // (i.e. the PPR area or ZPR area respectively).
1479 if (AFI->hasSplitSVEObjects() && StackID == TargetStackID::ScalableVector) {
1480 // If we're accessing an SVE vector with split SVE objects...
1481 // - From the FP we need to move down past the PPR area:
1482 FPOffset -= PPRStackSize;
1483 // - From the SP we only need to move up to the ZPR area:
1484 SPOffset -= PPRStackSize;
1485 // Note: `SPOffset = SVEStackSize + ...`, so `-= PPRStackSize` results in
1486 // `SPOffset = ZPRStackSize + ...`.
1487 }
1488
1489 if (FPAfterSVECalleeSaves) {
1491 if (-ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1494 }
1495 }
1496
1497 // Always use the FP for SVE spills if available and beneficial.
1498 if (hasFP(MF) && (SPOffset.getFixed() ||
1499 FPOffset.getScalable() < SPOffset.getScalable() ||
1500 RegInfo->hasStackRealignment(MF))) {
1501 FrameReg = RegInfo->getFrameRegister(MF);
1502 return FPOffset;
1503 }
1504 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
1505 : MCRegister(AArch64::SP);
1506
1507 return SPOffset;
1508 }
1509
1510 StackOffset SVEAreaOffset = {};
1511 if (FPAfterSVECalleeSaves) {
1512 // In this stack layout, the FP is in between the callee saves and other
1513 // SVE allocations.
1514 StackOffset SVECalleeSavedStack =
1516 if (UseFP) {
1517 if (isFixed)
1518 SVEAreaOffset = SVECalleeSavedStack;
1519 else if (!isCSR)
1520 SVEAreaOffset = SVECalleeSavedStack - SVEStackSize;
1521 } else {
1522 if (isFixed)
1523 SVEAreaOffset = SVEStackSize;
1524 else if (isCSR)
1525 SVEAreaOffset = SVEStackSize - SVECalleeSavedStack;
1526 }
1527 } else {
1528 if (UseFP && !(isFixed || isCSR))
1529 SVEAreaOffset = -SVEStackSize;
1530 if (!UseFP && (isFixed || isCSR))
1531 SVEAreaOffset = SVEStackSize;
1532 }
1533
1534 if (UseFP) {
1535 FrameReg = RegInfo->getFrameRegister(MF);
1536 return StackOffset::getFixed(FPOffset) + SVEAreaOffset;
1537 }
1538
1539 // Use the base pointer if we have one.
1540 if (RegInfo->hasBasePointer(MF))
1541 FrameReg = RegInfo->getBaseRegister();
1542 else {
1543 assert(!MFI.hasVarSizedObjects() &&
1544 "Can't use SP when we have var sized objects.");
1545 FrameReg = AArch64::SP;
1546 // If we're using the red zone for this function, the SP won't actually
1547 // be adjusted, so the offsets will be negative. They're also all
1548 // within range of the signed 9-bit immediate instructions.
1549 if (canUseRedZone(MF))
1550 Offset -= AFI->getLocalStackSize();
1551 }
1552
1553 return StackOffset::getFixed(Offset) + SVEAreaOffset;
1554}
1555
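// A condensed sketch of the FP-versus-SP choice made above for a non-fixed,
// non-SVE object, assuming the function has a frame pointer and has no stack
// realignment, no variable-sized objects, no EH funclets and no SVE stack.
// The helper name `useFPForLocal` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if the frame pointer should be used as the base register.
//   FPOffset - offset of the object from FP
//   SPOffset - offset of the object from SP
//   ForSimm  - the offset must fit a signed 9-bit immediate
//   PreferFP - caller's initial preference for FP
bool useFPForLocal(int64_t FPOffset, int64_t SPOffset, bool ForSimm,
                   bool PreferFP) {
  // Negative offsets have a smaller encodable range than positive ones.
  bool FPOffsetFits = !ForSimm || FPOffset >= -256;
  // Prefer FP when the object is closer to FP than to SP.
  PreferFP |= SPOffset > -FPOffset;
  // A non-negative FP offset always wins, since SP is even further away.
  if (FPOffset >= 0)
    return true;
  return FPOffsetFits && PreferFP;
}
```

// Under these assumptions an object 8 bytes below FP and 4096 bytes above SP
// is addressed from FP, while one equally far from both is addressed from SP.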
1557 // Do not set a kill flag on values that are also marked as live-in. This
1558 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
1559 // callee saved registers.
1560 // Omitting the kill flags is conservatively correct even if the live-in
1561 // is not used after all.
1562 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
1563 return getKillRegState(!IsLiveIn);
1564}
1565
1567 MachineFunction &MF) {
1568 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1569 AttributeList Attrs = MF.getFunction().getAttributes();
1571 return Subtarget.isTargetMachO() &&
1572 !(Subtarget.getTargetLowering()->supportSwiftError() &&
1573 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
1575 !AFL.requiresSaveVG(MF) && !AFI->isSVECC();
1576}
1577
1578static bool invalidateWindowsRegisterPairing(bool SpillExtendedVolatile,
1579 unsigned SpillCount, unsigned Reg1,
1580 unsigned Reg2, bool NeedsWinCFI,
1581 const TargetRegisterInfo *TRI) {
1582 // If we are generating register pairs for a Windows function that requires
1583 // EH support, then pair consecutive registers only. There are no unwind
1584 // opcodes for saves/restores of non-consecutive register pairs.
1585 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
1586 // save_lrpair.
1587 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
1588
1589 if (Reg2 == AArch64::FP)
1590 return true;
1591 if (!NeedsWinCFI)
1592 return false;
1593
1594 // ARM64EC introduced `save_any_regp`, which expects 16-byte alignment.
1595 // This is handled by only allowing paired spills for registers spilled at
1596 // even positions (which should be 16-byte aligned, as other GPRs/FPRs are
1597 // 8 bytes). We carve out an exception for {FP,LR}, which does not require
1598 // 16-byte alignment in the uop representation.
1599 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
1600 return SpillExtendedVolatile
1601 ? !((Reg1 == AArch64::FP && Reg2 == AArch64::LR) ||
1602 (SpillCount % 2) == 0)
1603 : false;
1604
1605 // If pairing a GPR with LR, the pair can be described by the save_lrpair
1606 // opcode. The save_lrpair opcode requires the first register to be odd.
1607 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
1608 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR)
1609 return false;
1610 return true;
1611}
1612
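// A small sketch of the final save_lrpair check above, written on plain
// X-register numbers (19 means x19) rather than LLVM register enums. The
// helper name `fitsSaveLRPair` is hypothetical.

```cpp
#include <cassert>

// Returns true if pairing x<FirstXReg> with LR passes the save_lrpair check:
// the first register must be one of x19, x21, x23, x25 or x27.
bool fitsSaveLRPair(unsigned FirstXReg, bool SecondIsLR) {
  return SecondIsLR && FirstXReg >= 19 && FirstXReg <= 27 &&
         (FirstXReg - 19) % 2 == 0;
}
```

// In this sketch x21 paired with LR is accepted, while x20 paired with LR is
// rejected, matching the parity test in the function above.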
1613/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
1614/// WindowsCFI requires that only consecutive registers can be paired.
1615/// LR and FP need to be allocated together when the frame needs to save
1616/// the frame-record. This means any other register pairing with LR is invalid.
1617static bool invalidateRegisterPairing(bool SpillExtendedVolatile,
1618 unsigned SpillCount, unsigned Reg1,
1619 unsigned Reg2, bool UsesWinAAPCS,
1620 bool NeedsWinCFI, bool NeedsFrameRecord,
1621 const TargetRegisterInfo *TRI) {
1622 if (UsesWinAAPCS)
1623 return invalidateWindowsRegisterPairing(SpillExtendedVolatile, SpillCount,
1624 Reg1, Reg2, NeedsWinCFI, TRI);
1625
1626 // If we need to store the frame record, don't pair any register
1627 // with LR other than FP.
1628 if (NeedsFrameRecord)
1629 return Reg2 == AArch64::LR;
1630
1631 return false;
1632}
1633
1634namespace {
1635
1636struct RegPairInfo {
1637 Register Reg1;
1638 Register Reg2;
1639 int FrameIdx;
1640 int Offset;
1641 enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
1642 const TargetRegisterClass *RC;
1643
1644 RegPairInfo() = default;
1645
1646 bool isPaired() const { return Reg2.isValid(); }
1647
1648 bool isScalable() const { return Type == PPR || Type == ZPR; }
1649};
1650
1651} // end anonymous namespace
1652
1654 for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
1655 if (SavedRegs.test(PReg)) {
1656 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
1657 return MCRegister(PNReg);
1658 }
1659 }
1660 return MCRegister();
1661}
1662
1663// The multivector LD/ST are available only for SME or SVE2p1 targets
1665 MachineFunction &MF) {
1667 return false;
1668
1669 SMEAttrs FuncAttrs = MF.getInfo<AArch64FunctionInfo>()->getSMEFnAttrs();
1670 bool IsLocallyStreaming =
1671 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
1672
1673 // SME2 instructions can only be safely used when in streaming mode.
1674 // It is not safe to use SME2 instructions when in streaming compatible or
1675 // locally streaming mode.
1676 return Subtarget.hasSVE2p1() ||
1677 (Subtarget.hasSME2() &&
1678 (!IsLocallyStreaming && Subtarget.isStreaming()));
1679}
1680
1682 MachineFunction &MF,
1684 const TargetRegisterInfo *TRI,
1686 bool NeedsFrameRecord) {
1687
1688 if (CSI.empty())
1689 return;
1690
1691 bool IsWindows = isTargetWindows(MF);
1693 unsigned StackHazardSize = getStackHazardSize(MF);
1694 MachineFrameInfo &MFI = MF.getFrameInfo();
1696 unsigned Count = CSI.size();
1697 (void)CC;
1698 // MachO's compact unwind format relies on all registers being stored in
1699 // pairs.
1700 assert((!produceCompactUnwindFrame(AFL, MF) ||
1703 (Count & 1) == 0) &&
1704 "Odd number of callee-saved regs to spill!");
1705 int ByteOffset = AFI->getCalleeSavedStackSize();
1706 int StackFillDir = -1;
1707 int RegInc = 1;
1708 unsigned FirstReg = 0;
1709 if (IsWindows) {
1710 // For WinCFI, fill the stack from the bottom up.
1711 ByteOffset = 0;
1712 StackFillDir = 1;
1713 // As the CSI array is reversed to match PrologEpilogInserter, iterate
1714 // backwards, to pair up registers starting from lower numbered registers.
1715 RegInc = -1;
1716 FirstReg = Count - 1;
1717 }
1718
1719 bool FPAfterSVECalleeSaves = AFL.hasSVECalleeSavesAboveFrameRecord(MF);
1720 // Windows AAPCS has x9-x15 as volatile registers, x16-x17 as intra-procedural
1721 // scratch, x18 as platform reserved. However, clang has extended calling
1722 // conventions such as preserve_most and preserve_all which treat these as
1723 // CSR. As such, the ARM64 unwind uOPs bias registers by 19. We use ARM64EC
1724 // uOPs which have separate restrictions. We need to check for that.
1725 //
1726 // NOTE: we currently do not account for the D registers as LLVM does not
1727 // support non-ABI compliant D register spills.
1728 bool SpillExtendedVolatile =
1729 IsWindows && llvm::any_of(CSI, [](const CalleeSavedInfo &CSI) {
1730 const auto &Reg = CSI.getReg();
1731 return Reg >= AArch64::X0 && Reg <= AArch64::X18;
1732 });
1733
1734 int ZPRByteOffset = 0;
1735 int PPRByteOffset = 0;
1736 bool SplitPPRs = AFI->hasSplitSVEObjects();
1737 if (SplitPPRs) {
1738 ZPRByteOffset = AFI->getZPRCalleeSavedStackSize();
1739 PPRByteOffset = AFI->getPPRCalleeSavedStackSize();
1740 } else if (!FPAfterSVECalleeSaves) {
1741 ZPRByteOffset =
1743 // Unused: Everything goes in ZPR space.
1744 PPRByteOffset = 0;
1745 }
1746
1747 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
1748 Register LastReg = 0;
1749 bool HasCSHazardPadding = AFI->hasStackHazardSlotIndex() && !SplitPPRs;
1750
1751 auto AlignOffset = [StackFillDir](int Offset, int Align) {
1752 if (StackFillDir < 0)
1753 return alignDown(Offset, Align);
1754 return alignTo(Offset, Align);
1755 };
1756
1757 // When iterating backwards, the loop condition relies on unsigned wraparound.
1758 for (unsigned i = FirstReg; i < Count; i += RegInc) {
1759 RegPairInfo RPI;
1760 RPI.Reg1 = CSI[i].getReg();
1761
1762 if (AArch64::GPR64RegClass.contains(RPI.Reg1)) {
1763 RPI.Type = RegPairInfo::GPR;
1764 RPI.RC = &AArch64::GPR64RegClass;
1765 } else if (AArch64::FPR64RegClass.contains(RPI.Reg1)) {
1766 RPI.Type = RegPairInfo::FPR64;
1767 RPI.RC = &AArch64::FPR64RegClass;
1768 } else if (AArch64::FPR128RegClass.contains(RPI.Reg1)) {
1769 RPI.Type = RegPairInfo::FPR128;
1770 RPI.RC = &AArch64::FPR128RegClass;
1771 } else if (AArch64::ZPRRegClass.contains(RPI.Reg1)) {
1772 RPI.Type = RegPairInfo::ZPR;
1773 RPI.RC = &AArch64::ZPRRegClass;
1774 } else if (AArch64::PPRRegClass.contains(RPI.Reg1)) {
1775 RPI.Type = RegPairInfo::PPR;
1776 RPI.RC = &AArch64::PPRRegClass;
1777 } else if (RPI.Reg1 == AArch64::VG) {
1778 RPI.Type = RegPairInfo::VG;
1779 RPI.RC = &AArch64::FIXED_REGSRegClass;
1780 } else {
1781 llvm_unreachable("Unsupported register class.");
1782 }
1783
1784 int &ScalableByteOffset = RPI.Type == RegPairInfo::PPR && SplitPPRs
1785 ? PPRByteOffset
1786 : ZPRByteOffset;
1787
1788 // Add the stack hazard size as we transition from GPR->FPR CSRs.
1789 if (HasCSHazardPadding &&
1790 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
1792 ByteOffset += StackFillDir * StackHazardSize;
1793 LastReg = RPI.Reg1;
1794
1795 bool NeedsWinCFI = AFL.needsWinCFI(MF);
1796 int Scale = TRI->getSpillSize(*RPI.RC);
1797 // Add the next reg to the pair if it is in the same register class.
1798 if (unsigned(i + RegInc) < Count && !HasCSHazardPadding) {
1799 MCRegister NextReg = CSI[i + RegInc].getReg();
1800 unsigned SpillCount = NeedsWinCFI ? FirstReg - i : i;
1801 switch (RPI.Type) {
1802 case RegPairInfo::GPR:
1803 if (AArch64::GPR64RegClass.contains(NextReg) &&
1804 !invalidateRegisterPairing(SpillExtendedVolatile, SpillCount,
1805 RPI.Reg1, NextReg, IsWindows,
1806 NeedsWinCFI, NeedsFrameRecord, TRI))
1807 RPI.Reg2 = NextReg;
1808 break;
1809 case RegPairInfo::FPR64:
1810 if (AArch64::FPR64RegClass.contains(NextReg) &&
1811 !invalidateRegisterPairing(SpillExtendedVolatile, SpillCount,
1812 RPI.Reg1, NextReg, IsWindows,
1813 NeedsWinCFI, NeedsFrameRecord, TRI))
1814 RPI.Reg2 = NextReg;
1815 break;
1816 case RegPairInfo::FPR128:
1817 if (AArch64::FPR128RegClass.contains(NextReg))
1818 RPI.Reg2 = NextReg;
1819 break;
1820 case RegPairInfo::PPR:
1821 break;
1822 case RegPairInfo::ZPR:
1823 if (AFI->getPredicateRegForFillSpill() != 0 &&
1824 ((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1)) {
1825 // Calculate offset of register pair to see if pair instruction can be
1826 // used.
1827 int Offset = (ScalableByteOffset + StackFillDir * 2 * Scale) / Scale;
1828 if ((-16 <= Offset && Offset <= 14) && (Offset % 2 == 0))
1829 RPI.Reg2 = NextReg;
1830 }
1831 break;
1832 case RegPairInfo::VG:
1833 break;
1834 }
1835 }
1836
1837 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
1838 // list to come in sorted by frame index so that we can issue the store
1839 // pair instructions directly. Assert if we see anything otherwise.
1840 //
1841 // The order of the registers in the list is controlled by
1842 // getCalleeSavedRegs(), so they will always be in-order, as well.
1843 assert((!RPI.isPaired() ||
1844 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
1845 "Out of order callee saved regs!");
1846
1847 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
1848 RPI.Reg1 == AArch64::LR) &&
1849 "FrameRecord must be allocated together with LR");
1850
1851 // Windows AAPCS has FP and LR reversed.
1852 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
1853 RPI.Reg2 == AArch64::LR) &&
1854 "FrameRecord must be allocated together with LR");
1855
1856 // MachO's compact unwind format relies on all registers being stored in
1857 // adjacent register pairs.
1858 assert((!produceCompactUnwindFrame(AFL, MF) ||
1861 (RPI.isPaired() &&
1862 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
1863 RPI.Reg1 + 1 == RPI.Reg2))) &&
1864 "Callee-save registers not saved as adjacent register pair!");
1865
1866 RPI.FrameIdx = CSI[i].getFrameIdx();
1867 if (IsWindows &&
1868 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
1869 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
1870
1871 // Realign the scalable offset if necessary. This is relevant when spilling
1872 // predicates on Windows.
1873 if (RPI.isScalable() && ScalableByteOffset % Scale != 0)
1874 ScalableByteOffset = AlignOffset(ScalableByteOffset, Scale);
1875
1876 // Realign the fixed offset if necessary. This is relevant when spilling Q
1877 // registers after spilling an odd number of X registers.
1878 if (!RPI.isScalable() && ByteOffset % Scale != 0)
1879 ByteOffset = AlignOffset(ByteOffset, Scale);
1880
1881 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1882 assert(OffsetPre % Scale == 0);
1883
1884 if (RPI.isScalable())
1885 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1886 else
1887 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1888
1889 // Swift's async context is directly before FP, so allocate an extra
1890 // 8 bytes for it.
1891 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1892 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1893 (IsWindows && RPI.Reg2 == AArch64::LR)))
1894 ByteOffset += StackFillDir * 8;
1895
1896 // Round up size of non-pair to pair size if we need to pad the
1897 // callee-save area to ensure 16-byte alignment.
1898 if (NeedGapToAlignStack && !IsWindows && !RPI.isScalable() &&
1899 RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
1900 ByteOffset % 16 != 0) {
1901 ByteOffset += 8 * StackFillDir;
1902 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
1903 // A stack frame with a gap looks like this, bottom up:
1904 // d9, d8. x21, gap, x20, x19.
1905 // Set extra alignment on the x21 object to create the gap above it.
1906 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
1907 NeedGapToAlignStack = false;
1908 }
1909
1910 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1911 assert(OffsetPost % Scale == 0);
1912 // If filling top down (default), we want the offset after incrementing it.
1913 // If filling bottom up (WinCFI) we need the original offset.
1914 int Offset = IsWindows ? OffsetPre : OffsetPost;
1915
1916 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
1917 // Swift context can directly precede FP.
1918 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1919 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1920 (IsWindows && RPI.Reg2 == AArch64::LR)))
1921 Offset += 8;
1922 RPI.Offset = Offset / Scale;
1923
1924 assert((!RPI.isPaired() ||
1925 (!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
1926 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
1927 "Offset out of bounds for LDP/STP immediate");
1928
1929 auto isFrameRecord = [&] {
1930 if (RPI.isPaired())
1931 return IsWindows ? RPI.Reg1 == AArch64::FP && RPI.Reg2 == AArch64::LR
1932 : RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP;
1933 // Otherwise, look for the frame record as two unpaired registers. This is
1934 // needed for -aarch64-stack-hazard-size=<val>, which disables register
1935 // pairing (as the padding may be too large for the LDP/STP offset). Note:
1936 // On Windows, this check works out as current reg == FP, next reg == LR,
1937 // and on other platforms current reg == FP, previous reg == LR. This
1938 // works out as the correct pre-increment or post-increment offsets
1939 // respectively.
1940 return i > 0 && RPI.Reg1 == AArch64::FP &&
1941 CSI[i - 1].getReg() == AArch64::LR;
1942 };
1943
1944 // Save the offset to frame record so that the FP register can point to the
1945 // innermost frame record (spilled FP and LR registers).
1946 if (NeedsFrameRecord && isFrameRecord())
1947 AFI->setCalleeSaveBaseToFrameRecordOffset(Offset);
1948
1949 RegPairs.push_back(RPI);
1950 if (RPI.isPaired())
1951 i += RegInc;
1952 }
1953 if (IsWindows) {
1954 // If we need an alignment gap in the stack, align the topmost stack
1955 // object. A stack frame with a gap looks like this, bottom up:
1956 // x19, d8. d9, gap.
1957 // Set extra alignment on the topmost stack object (the first element in
1958 // CSI, which goes top down), to create the gap above it.
1959 if (AFI->hasCalleeSaveStackFreeSpace())
1960 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
1961 // We iterated bottom up over the registers; flip RegPairs back to top
1962 // down order.
1963 std::reverse(RegPairs.begin(), RegPairs.end());
1964 }
1965}
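// Illustrative sketch (not part of the original file): the offset bookkeeping
// in the loop above, reduced to plain integers. SlotSketch, scaledOffsets and
// the starting byte offsets are hypothetical names and values chosen for this
// example; only the pre/post-offset selection and the division by Scale are
// taken from the code above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one callee-save slot: the spill size of its
// register class and whether two registers share the slot.
struct SlotSketch {
  int Scale;   // bytes per register (8 for X/D, 16 for Q)
  bool Paired; // true if the slot holds a register pair
};

// Walks the slots in order and returns the scaled immediate for each one.
// FillDir is -1 when filling top down and +1 when filling bottom up.
std::vector<int> scaledOffsets(int StartByteOffset, int FillDir, bool BottomUp,
                               const std::vector<SlotSketch> &Slots) {
  std::vector<int> Result;
  int ByteOffset = StartByteOffset;
  for (const SlotSketch &S : Slots) {
    int OffsetPre = ByteOffset;
    ByteOffset += FillDir * (S.Paired ? 2 * S.Scale : S.Scale);
    int OffsetPost = ByteOffset;
    // Top down uses the offset after the update; bottom up uses the original.
    int Offset = BottomUp ? OffsetPre : OffsetPost;
    assert(Offset % S.Scale == 0 && "slot must be aligned to its scale");
    Result.push_back(Offset / S.Scale);
  }
  return Result;
}
```

// With three 8-byte pairs in an assumed 48-byte area, the top-down walk yields
// the immediates 4, 2 and 0, and the bottom-up walk yields 0, 2 and 4.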
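1966
1967 bool AArch64FrameLowering::spillCalleeSavedRegisters(
1968 MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
1969 ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {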
1970 MachineFunction &MF = *MBB.getParent();
1971 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1972 auto &TLI = *Subtarget.getTargetLowering();
1973 const AArch64InstrInfo &TII = *Subtarget.getInstrInfo();
1974 bool NeedsWinCFI = needsWinCFI(MF);
1975 DebugLoc DL;
1977
1978 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
1979
1980 MachineRegisterInfo &MRI = MF.getRegInfo();
1981 // Refresh the reserved regs in case there are any potential changes since the
1982 // last freeze.
1983 MRI.freezeReservedRegs();
1984
1985 if (homogeneousPrologEpilog(MF)) {
1986 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
1988
1989 for (auto &RPI : RegPairs) {
1990 MIB.addReg(RPI.Reg1);
1991 MIB.addReg(RPI.Reg2);
1992
1993 // Update register live in.
1994 if (!MRI.isReserved(RPI.Reg1))
1995 MBB.addLiveIn(RPI.Reg1);
1996 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
1997 MBB.addLiveIn(RPI.Reg2);
1998 }
1999 return true;
2000 }
2001 bool PTrueCreated = false;
2002 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
2003 Register Reg1 = RPI.Reg1;
2004 Register Reg2 = RPI.Reg2;
2005 unsigned StrOpc;
2006
2007 // Issue sequence of spills for cs regs. The first spill may be converted
2008 // to a pre-decrement store later by emitPrologue if the callee-save stack
2009 // area allocation can't be combined with the local stack area allocation.
2010 // For example:
2011 // stp x22, x21, [sp, #0] // addImm(+0)
2012 // stp x20, x19, [sp, #16] // addImm(+2)
2013 // stp fp, lr, [sp, #32] // addImm(+4)
2014 // Rationale: This sequence saves uop updates compared to a sequence of
2015 // pre-increment spills like stp xi,xj,[sp,#-16]!
2016 // Note: Similar rationale and sequence for restores in epilog.
2017 unsigned Size = TRI->getSpillSize(*RPI.RC);
2018 Align Alignment = TRI->getSpillAlign(*RPI.RC);
2019 switch (RPI.Type) {
2020 case RegPairInfo::GPR:
2021 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
2022 break;
2023 case RegPairInfo::FPR64:
2024 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
2025 break;
2026 case RegPairInfo::FPR128:
2027 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
2028 break;
2029 case RegPairInfo::ZPR:
2030 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
2031 break;
2032 case RegPairInfo::PPR:
2033 StrOpc = AArch64::STR_PXI;
2034 break;
2035 case RegPairInfo::VG:
2036 StrOpc = AArch64::STRXui;
2037 break;
2038 }
2039
2040 Register X0Scratch;
2041 llvm::scope_exit RestoreX0([&] {
2042 if (X0Scratch != AArch64::NoRegister)
2043 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), AArch64::X0)
2044 .addReg(X0Scratch)
2046 });
2047
2048 if (Reg1 == AArch64::VG) {
2049 // Find an available register to store value of VG to.
2050 Reg1 = findScratchNonCalleeSaveRegister(&MBB, true);
2051 assert(Reg1 != AArch64::NoRegister);
2052 if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
2053 BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
2054 .addImm(31)
2055 .addImm(1)
2057 } else {
2059 if (any_of(MBB.liveins(),
2060 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
2061 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
2062 AArch64::X0, LiveIn.PhysReg);
2063 })) {
2064 X0Scratch = Reg1;
2065 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), X0Scratch)
2066 .addReg(AArch64::X0)
2068 }
2069
2070 RTLIB::Libcall LC = RTLIB::SMEABI_GET_CURRENT_VG;
2071 const uint32_t *RegMask =
2072 TRI->getCallPreservedMask(MF, TLI.getLibcallCallingConv(LC));
2073 BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
2074 .addExternalSymbol(TLI.getLibcallName(LC))
2075 .addRegMask(RegMask)
2076 .addReg(AArch64::X0, RegState::ImplicitDefine)
2078 Reg1 = AArch64::X0;
2079 }
2080 }
2081
2082 LLVM_DEBUG({
2083 dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
2084 if (RPI.isPaired())
2085 dbgs() << ", " << printReg(Reg2, TRI);
2086 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2087 if (RPI.isPaired())
2088 dbgs() << ", " << RPI.FrameIdx + 1;
2089 dbgs() << ")\n";
2090 });
2091
2092 assert((!isTargetWindows(MF) ||
2093 !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
2094 "Windows unwinding requires a consecutive (FP,LR) pair");
2095 // Windows unwind codes require consecutive registers if registers are
2096 // paired. Make the switch here, so that the code below will save (x,x+1)
2097 // and not (x+1,x).
2098 unsigned FrameIdxReg1 = RPI.FrameIdx;
2099 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2100 if (isTargetWindows(MF) && RPI.isPaired()) {
2101 std::swap(Reg1, Reg2);
2102 std::swap(FrameIdxReg1, FrameIdxReg2);
2103 }
2104
2105 if (RPI.isPaired() && RPI.isScalable()) {
2106 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2109 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2110 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2111 "Expects SVE2.1 or SME2 target and a predicate register");
2112#ifdef EXPENSIVE_CHECKS
2113 auto IsPPR = [](const RegPairInfo &c) {
2114 return c.Type == RegPairInfo::PPR;
2115 };
2116 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
2117 auto IsZPR = [](const RegPairInfo &c) {
2118 return c.Type == RegPairInfo::ZPR;
2119 };
2120 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
2121 assert(!(PPRBegin < ZPRBegin) &&
2122 "Expected callee save predicate to be handled first");
2123#endif
2124 if (!PTrueCreated) {
2125 PTrueCreated = true;
2126 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2128 }
2129 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2130 if (!MRI.isReserved(Reg1))
2131 MBB.addLiveIn(Reg1);
2132 if (!MRI.isReserved(Reg2))
2133 MBB.addLiveIn(Reg2);
2134 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
2136 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2137 MachineMemOperand::MOStore, Size, Alignment));
2138 MIB.addReg(PnReg);
2139 MIB.addReg(AArch64::SP)
2140 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale],
2141 // where 2*vscale is implicit
2144 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2145 MachineMemOperand::MOStore, Size, Alignment));
2146 if (NeedsWinCFI)
2147 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2148 } else { // Not a scalable ZPR pair: emit a single or ordinary paired store.
2149 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2150 if (!MRI.isReserved(Reg1))
2151 MBB.addLiveIn(Reg1);
2152 if (RPI.isPaired()) {
2153 if (!MRI.isReserved(Reg2))
2154 MBB.addLiveIn(Reg2);
2155 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
2157 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2158 MachineMemOperand::MOStore, Size, Alignment));
2159 }
2160 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
2161 .addReg(AArch64::SP)
2162 .addImm(RPI.Offset) // [sp, #offset*vscale],
2163 // where factor*vscale is implicit
2166 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2167 MachineMemOperand::MOStore, Size, Alignment));
2168 if (NeedsWinCFI)
2169 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2170 }
2171 // Update the StackIDs of the SVE stack slots.
2172 MachineFrameInfo &MFI = MF.getFrameInfo();
2173 if (RPI.Type == RegPairInfo::ZPR) {
2174 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
2175 if (RPI.isPaired())
2176 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
2177 } else if (RPI.Type == RegPairInfo::PPR) {
2179 if (RPI.isPaired())
2181 }
2182 }
2183 return true;
2184}
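// Illustrative sketch (not part of the original file): the immediate handling
// for paired ZPR spills. The range test is the one used when pairs are formed
// (even, within [-16, 14]), and the halving matches addImm(RPI.Offset / 2)
// above. The function names are hypothetical.

```cpp
#include <cassert>

// Mirrors the range test in the source: the offset, counted in single-vector
// slots, must be even and lie in [-16, 14].
bool fitsZPairImmediate(int OffsetInVectors) {
  return -16 <= OffsetInVectors && OffsetInVectors <= 14 &&
         OffsetInVectors % 2 == 0;
}

// The pair instruction scales its immediate by two vectors, so the operand is
// half the single-vector offset.
int zPairImmediate(int OffsetInVectors) {
  assert(fitsZPairImmediate(OffsetInVectors));
  return OffsetInVectors / 2;
}
```

// Offsets that fail the test are left unpaired and use the single-register
// store instead.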
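2185
2186 bool AArch64FrameLowering::restoreCalleeSavedRegisters(
2187 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
2188 MutableArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {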
2189 MachineFunction &MF = *MBB.getParent();
2190 const AArch64InstrInfo &TII =
2191 *MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
2192 DebugLoc DL;
2194 bool NeedsWinCFI = needsWinCFI(MF);
2195
2196 if (MBBI != MBB.end())
2197 DL = MBBI->getDebugLoc();
2198
2199 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
2200 if (homogeneousPrologEpilog(MF, &MBB)) {
2201 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
2203 for (auto &RPI : RegPairs) {
2204 MIB.addReg(RPI.Reg1, RegState::Define);
2205 MIB.addReg(RPI.Reg2, RegState::Define);
2206 }
2207 return true;
2208 }
2209
2210 // For performance reasons, restore SVE registers in increasing order.
2211 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
2212 auto PPRBegin = llvm::find_if(RegPairs, IsPPR);
2213 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
2214 std::reverse(PPRBegin, PPREnd);
2215 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
2216 auto ZPRBegin = llvm::find_if(RegPairs, IsZPR);
2217 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
2218 std::reverse(ZPRBegin, ZPREnd);
2219
2220 bool PTrueCreated = false;
2221 for (const RegPairInfo &RPI : RegPairs) {
2222 Register Reg1 = RPI.Reg1;
2223 Register Reg2 = RPI.Reg2;
2224
2225 // Issue sequence of restores for cs regs. The last restore may be converted
2226 // to a post-increment load later by emitEpilogue if the callee-save stack
2227 // area allocation can't be combined with the local stack area allocation.
2228 // For example:
2229 // ldp fp, lr, [sp, #32] // addImm(+4)
2230 // ldp x20, x19, [sp, #16] // addImm(+2)
2231 // ldp x22, x21, [sp, #0] // addImm(+0)
2232 // Note: see comment in spillCalleeSavedRegisters()
2233 unsigned LdrOpc;
2234 unsigned Size = TRI->getSpillSize(*RPI.RC);
2235 Align Alignment = TRI->getSpillAlign(*RPI.RC);
2236 switch (RPI.Type) {
2237 case RegPairInfo::GPR:
2238 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
2239 break;
2240 case RegPairInfo::FPR64:
2241 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
2242 break;
2243 case RegPairInfo::FPR128:
2244 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
2245 break;
2246 case RegPairInfo::ZPR:
2247 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
2248 break;
2249 case RegPairInfo::PPR:
2250 LdrOpc = AArch64::LDR_PXI;
2251 break;
2252 case RegPairInfo::VG:
2253 continue;
2254 }
2255 LLVM_DEBUG({
2256 dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
2257 if (RPI.isPaired())
2258 dbgs() << ", " << printReg(Reg2, TRI);
2259 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2260 if (RPI.isPaired())
2261 dbgs() << ", " << RPI.FrameIdx + 1;
2262 dbgs() << ")\n";
2263 });
2264
2265 // Windows unwind codes require consecutive registers if registers are
2266 // paired. Make the switch here, so that the code below will restore (x,x+1)
2267 // and not (x+1,x).
2268 unsigned FrameIdxReg1 = RPI.FrameIdx;
2269 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2270 if (isTargetWindows(MF) && RPI.isPaired()) {
2271 std::swap(Reg1, Reg2);
2272 std::swap(FrameIdxReg1, FrameIdxReg2);
2273 }
2274
2276 if (RPI.isPaired() && RPI.isScalable()) {
2277 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2279 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2280 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2281 "Expects SVE2.1 or SME2 target and a predicate register");
2282#ifdef EXPENSIVE_CHECKS
2283 assert(!(PPRBegin < ZPRBegin) &&
2284 "Expected callee save predicate to be handled first");
2285#endif
2286 if (!PTrueCreated) {
2287 PTrueCreated = true;
2288 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2290 }
2291 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2292 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
2293 getDefRegState(true));
2295 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2296 MachineMemOperand::MOLoad, Size, Alignment));
2297 MIB.addReg(PnReg);
2298 MIB.addReg(AArch64::SP)
2299 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale]
2300 // where 2*vscale is implicit
2303 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2304 MachineMemOperand::MOLoad, Size, Alignment));
2305 if (NeedsWinCFI)
2306 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2307 } else {
2308 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2309 if (RPI.isPaired()) {
2310 MIB.addReg(Reg2, getDefRegState(true));
2312 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2313 MachineMemOperand::MOLoad, Size, Alignment));
2314 }
2315 MIB.addReg(Reg1, getDefRegState(true));
2316 MIB.addReg(AArch64::SP)
2317 .addImm(RPI.Offset) // [sp, #offset*vscale]
2318 // where factor*vscale is implicit
2321 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2322 MachineMemOperand::MOLoad, Size, Alignment));
2323 if (NeedsWinCFI)
2324 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2325 }
2326 }
2327 return true;
2328}
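// Illustrative sketch (not part of the original file): the in-place reversal
// of the PPR and ZPR runs performed before the restore loop, using only the
// standard library. KindSketch, EntrySketch and the helper names are
// hypothetical; the find_if / find_if_not / reverse pattern is the one used
// above, and it assumes each kind forms a single contiguous run.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class KindSketch { GPR, PPR, ZPR };

struct EntrySketch {
  KindSketch Kind;
  int Id;
};

// Reverses the first contiguous run of entries with the given kind.
void reverseRun(std::vector<EntrySketch> &Entries, KindSketch Kind) {
  auto IsKind = [Kind](const EntrySketch &E) { return E.Kind == Kind; };
  auto Begin = std::find_if(Entries.begin(), Entries.end(), IsKind);
  auto End = std::find_if_not(Begin, Entries.end(), IsKind);
  std::reverse(Begin, End);
}

// Applies both reversals and returns the resulting order of ids.
std::vector<int> idsAfterReversal(std::vector<EntrySketch> Entries) {
  reverseRun(Entries, KindSketch::PPR);
  reverseRun(Entries, KindSketch::ZPR);
  std::vector<int> Ids;
  for (const EntrySketch &E : Entries)
    Ids.push_back(E.Id);
  return Ids;
}
```

// Entries of other kinds keep their positions; only the order inside each run
// changes.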
2329
2330// Return the FrameID for a MMO.
2331static std::optional<int> getMMOFrameID(MachineMemOperand *MMO,
2332 const MachineFrameInfo &MFI) {
2333 auto *PSV =
2335 if (PSV)
2336 return std::optional<int>(PSV->getFrameIndex());
2337
2338 if (MMO->getValue()) {
2339 if (auto *Al = dyn_cast<AllocaInst>(getUnderlyingObject(MMO->getValue()))) {
2340 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd();
2341 FI++)
2342 if (MFI.getObjectAllocation(FI) == Al)
2343 return FI;
2344 }
2345 }
2346
2347 return std::nullopt;
2348}
2349
2350// Return the FrameID for a Load/Store instruction by looking at the first MMO.
2351static std::optional<int> getLdStFrameID(const MachineInstr &MI,
2352 const MachineFrameInfo &MFI) {
2353 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
2354 return std::nullopt;
2355
2356 return getMMOFrameID(*MI.memoperands_begin(), MFI);
2357}
2358
2359// Returns true if the LDST MachineInstr \p MI is a PPR access.
2360static bool isPPRAccess(const MachineInstr &MI) {
2361 return AArch64::PPRRegClass.contains(MI.getOperand(0).getReg());
2362}
2363
2364// Check if a Hazard slot is needed for the current function, and if so create
2365// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
2366// which can be used to determine if any hazard padding is needed.
2367void AArch64FrameLowering::determineStackHazardSlot(
2368 MachineFunction &MF, BitVector &SavedRegs) const {
2369 unsigned StackHazardSize = getStackHazardSize(MF);
2370 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2371 if (StackHazardSize == 0 || StackHazardSize % 16 != 0 ||
2373 return;
2374
2375 // Stack hazards are only needed in streaming functions.
2376 SMEAttrs Attrs = AFI->getSMEFnAttrs();
2377 if (!StackHazardInNonStreaming && Attrs.hasNonStreamingInterfaceAndBody())
2378 return;
2379
2380 MachineFrameInfo &MFI = MF.getFrameInfo();
2381
2382 // Add a hazard slot if there are any FPR callee-saved registers or any
2383 // FP-only stack objects.
2384 bool HasFPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2385 return AArch64::FPR64RegClass.contains(Reg) ||
2386 AArch64::FPR128RegClass.contains(Reg) ||
2387 AArch64::ZPRRegClass.contains(Reg);
2388 });
2389 bool HasPPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2390 return AArch64::PPRRegClass.contains(Reg);
2391 });
2392 bool HasFPRStackObjects = false;
2393 bool HasPPRStackObjects = false;
2394 if (!HasFPRCSRs || SplitSVEObjects) {
2395 enum SlotType : uint8_t {
2396 Unknown = 0,
2397 ZPRorFPR = 1 << 0,
2398 PPR = 1 << 1,
2399 GPR = 1 << 2,
2401 };
2402
2403 // Find stack slots solely used for one kind of register (ZPR, PPR, etc.),
2404 // based on the kinds of accesses used in the function.
2405 SmallVector<SlotType> SlotTypes(MFI.getObjectIndexEnd(), SlotType::Unknown);
2406 for (auto &MBB : MF) {
2407 for (auto &MI : MBB) {
2408 std::optional<int> FI = getLdStFrameID(MI, MFI);
2409 if (!FI || FI < 0 || FI >= int(SlotTypes.size()))
2410 continue;
2411 if (MFI.hasScalableStackID(*FI)) {
2412 SlotTypes[*FI] |=
2413 isPPRAccess(MI) ? SlotType::PPR : SlotType::ZPRorFPR;
2414 } else {
2415 SlotTypes[*FI] |= AArch64InstrInfo::isFpOrNEON(MI)
2416 ? SlotType::ZPRorFPR
2417 : SlotType::GPR;
2418 }
2419 }
2420 }
2421
2422 for (int FI = 0; FI < int(SlotTypes.size()); ++FI) {
2423 HasFPRStackObjects |= SlotTypes[FI] == SlotType::ZPRorFPR;
2424 // For SplitSVEObjects, remember that this stack slot is a predicate; this
2425 // will be needed later when determining the frame layout.
2426 if (SlotTypes[FI] == SlotType::PPR) {
2428 HasPPRStackObjects = true;
2429 }
2430 }
2431 }
2432
2433 if (HasFPRCSRs || HasFPRStackObjects) {
2434 int ID = MFI.CreateStackObject(StackHazardSize, Align(16), false);
2435 LLVM_DEBUG(dbgs() << "Created Hazard slot at " << ID << " size "
2436 << StackHazardSize << "\n");
2438 }
2439
2440 if (!AFI->hasStackHazardSlotIndex())
2441 return;
2442
2443 if (SplitSVEObjects) {
2444 CallingConv::ID CC = MF.getFunction().getCallingConv();
2445 if (AFI->isSVECC() || CC == CallingConv::AArch64_SVE_VectorCall) {
2446 AFI->setSplitSVEObjects(true);
2447 LLVM_DEBUG(dbgs() << "Using SplitSVEObjects for SVE CC function\n");
2448 return;
2449 }
2450
2451 // We only use SplitSVEObjects in non-SVE CC functions if there's a
2452 // possibility of a stack hazard between PPRs and ZPRs/FPRs.
2453 LLVM_DEBUG(dbgs() << "Determining if SplitSVEObjects should be used in "
2454 "non-SVE CC function...\n");
2455
2456 // If another calling convention is explicitly set FPRs can't be promoted to
2457 // ZPR callee-saves.
2459 LLVM_DEBUG(
2460 dbgs()
2461 << "Calling convention is not supported with SplitSVEObjects\n");
2462 return;
2463 }
2464
2465 if (!HasPPRCSRs && !HasPPRStackObjects) {
2466 LLVM_DEBUG(
2467 dbgs() << "Not using SplitSVEObjects as no PPRs are on the stack\n");
2468 return;
2469 }
2470
2471 if (!HasFPRCSRs && !HasFPRStackObjects) {
2472 LLVM_DEBUG(
2473 dbgs()
2474 << "Not using SplitSVEObjects as no FPRs or ZPRs are on the stack\n");
2475 return;
2476 }
2477
2478 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2479 MF.getSubtarget<AArch64Subtarget>();
2481 "Expected SVE to be available for PPRs");
2482
2483 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
2484 // With SplitSVEObjects, the CS hazard padding is placed between the
2485 // PPRs and ZPRs. If there were any FPR CSRs, there would be a hazard between
2486 // them and the CS GPRs. Avoid this by promoting all FPR CSRs to ZPRs.
2487 BitVector FPRZRegs(SavedRegs.size());
2488 for (size_t Reg = 0, E = SavedRegs.size(); HasFPRCSRs && Reg < E; ++Reg) {
2489 BitVector::reference RegBit = SavedRegs[Reg];
2490 if (!RegBit)
2491 continue;
2492 unsigned SubRegIdx = 0;
2493 if (AArch64::FPR64RegClass.contains(Reg))
2494 SubRegIdx = AArch64::dsub;
2495 else if (AArch64::FPR128RegClass.contains(Reg))
2496 SubRegIdx = AArch64::zsub;
2497 else
2498 continue;
2499 // Clear the bit for the FPR save.
2500 RegBit = false;
2501 // Mark that we should save the corresponding ZPR.
2502 Register ZReg =
2503 TRI->getMatchingSuperReg(Reg, SubRegIdx, &AArch64::ZPRRegClass);
2504 FPRZRegs.set(ZReg);
2505 }
2506 SavedRegs |= FPRZRegs;
2507
2508 AFI->setSplitSVEObjects(true);
2509 LLVM_DEBUG(dbgs() << "SplitSVEObjects enabled!\n");
2510 }
2511}
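// Illustrative sketch (not part of the original file): the slot
// classification idea from determineStackHazardSlot, using plain bit masks.
// Each access ORs its kind into the slot's mask, and a slot counts as used
// solely for one kind only if its mask equals that single bit. All names here
// are hypothetical, and the (frame index, kind) pairs stand in for the
// per-instruction checks in the real code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

enum SlotKindSketch : uint8_t {
  SK_Unknown = 0,
  SK_ZPRorFPR = 1 << 0,
  SK_PPR = 1 << 1,
  SK_GPR = 1 << 2,
};

// Builds one mask per slot from a list of (frame index, access kind) pairs.
std::vector<uint8_t> accumulateSlotKinds(
    int NumSlots, const std::vector<std::pair<int, SlotKindSketch>> &Accesses) {
  std::vector<uint8_t> Masks(static_cast<std::size_t>(NumSlots), SK_Unknown);
  for (const auto &Access : Accesses) {
    int FI = Access.first;
    if (FI < 0 || FI >= NumSlots)
      continue; // ignore indices outside the tracked range
    Masks[FI] = static_cast<uint8_t>(Masks[FI] | Access.second);
  }
  return Masks;
}

// Returns true if some slot was accessed only with the given kind.
bool hasSlotUsedOnlyAs(const std::vector<uint8_t> &Masks, SlotKindSketch Kind) {
  for (uint8_t M : Masks)
    if (M == Kind)
      return true;
  return false;
}
```

// A slot touched by both FPR and GPR accesses ends up with two bits set, so it
// matches neither kind on its own.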
2512
2513 void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
2514 BitVector &SavedRegs,
2515 RegScavenger *RS) const {
2516 // All calls are tail calls in GHC calling conv, and functions have no
2517 // prologue/epilogue.
2518 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
2519 return;
2520
2521 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2522
2524 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
2526 unsigned UnspilledCSGPR = AArch64::NoRegister;
2527 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
2528
2529 MachineFrameInfo &MFI = MF.getFrameInfo();
2530 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
2531
2532 MCRegister BasePointerReg =
2533 RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister() : MCRegister();
2534
2535 unsigned ExtraCSSpill = 0;
2536 bool HasUnpairedGPR64 = false;
2537 bool HasPairZReg = false;
2538 BitVector UserReservedRegs = RegInfo->getUserReservedRegs(MF);
2539 BitVector ReservedRegs = RegInfo->getReservedRegs(MF);
2540
2541 // Figure out which callee-saved registers to save/restore.
2542 for (unsigned i = 0; CSRegs[i]; ++i) {
2543 const MCRegister Reg = CSRegs[i];
2544
2545 // Add the base pointer register to SavedRegs if it is callee-save.
2546 if (Reg == BasePointerReg)
2547 SavedRegs.set(Reg);
2548
2549 // Don't save manually reserved registers set through +reserve-x#i,
2550 // even for callee-saved registers, as per GCC's behavior.
2551 if (UserReservedRegs[Reg]) {
2552 SavedRegs.reset(Reg);
2553 continue;
2554 }
2555
2556 bool RegUsed = SavedRegs.test(Reg);
2557 MCRegister PairedReg;
2558 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
2559 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
2560 AArch64::FPR128RegClass.contains(Reg)) {
2561 // Compensate for odd numbers of GP CSRs.
2562 // For now, all the known cases of odd number of CSRs are of GPRs.
2563 if (HasUnpairedGPR64)
2564 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
2565 else
2566 PairedReg = CSRegs[i ^ 1];
2567 }
2568
2569 // If the function requires all the GP registers to save (SavedRegs),
2570 // and there are an odd number of GP CSRs at the same time (CSRegs),
2571 // PairedReg could be in a different register class from Reg, which would
2572 // lead to a FPR (usually D8) accidentally being marked saved.
2573 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
2574 PairedReg = AArch64::NoRegister;
2575 HasUnpairedGPR64 = true;
2576 }
2577 assert(PairedReg == AArch64::NoRegister ||
2578 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
2579 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
2580 AArch64::FPR128RegClass.contains(Reg, PairedReg));
2581
2582 if (!RegUsed) {
2583 if (AArch64::GPR64RegClass.contains(Reg) && !ReservedRegs[Reg]) {
2584 UnspilledCSGPR = Reg;
2585 UnspilledCSGPRPaired = PairedReg;
2586 }
2587 continue;
2588 }
2589
2590 // MachO's compact unwind format relies on all registers being stored in
2591 // pairs.
2592 // FIXME: the usual format is actually better if unwinding isn't needed.
2593 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
2594 !SavedRegs.test(PairedReg)) {
2595 SavedRegs.set(PairedReg);
2596 if (AArch64::GPR64RegClass.contains(PairedReg) &&
2597 !ReservedRegs[PairedReg])
2598 ExtraCSSpill = PairedReg;
2599 }
2600 // Check whether there is a pair of ZRegs, so that a predicate register can
2600 // be selected for the multi-vector spill/fill.
2601 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
2602 SavedRegs.test(CSRegs[i ^ 1]));
2603 }
2604
2605 if (HasPairZReg && enableMultiVectorSpillFill(Subtarget, MF)) {
2607 // Find a suitable predicate register for the multi-vector spill/fill
2608 // instructions.
2609 MCRegister PnReg = findFreePredicateReg(SavedRegs);
2610 if (PnReg.isValid())
2611 AFI->setPredicateRegForFillSpill(PnReg);
2612 // If no free callee-save has been found assign one.
2613 if (!AFI->getPredicateRegForFillSpill() &&
2614 MF.getFunction().getCallingConv() ==
2616 SavedRegs.set(AArch64::P8);
2617 AFI->setPredicateRegForFillSpill(AArch64::PN8);
2618 }
2619
2620 assert(!ReservedRegs[AFI->getPredicateRegForFillSpill()] &&
2621 "Predicate cannot be a reserved register");
2622 }
2623
2625 !Subtarget.isTargetWindows()) {
2626 // For Windows calling convention on a non-windows OS, where X18 is treated
2627 // as reserved, back up X18 when entering non-windows code (marked with the
2628 // Windows calling convention) and restore when returning regardless of
2629 // whether the individual function uses it - it might call other functions
2630 // that clobber it.
2631 SavedRegs.set(AArch64::X18);
2632 }
2633
2634 // Determine if a Hazard slot should be used and where it should go.
2635 // If SplitSVEObjects is used, the hazard padding is placed between the PPRs
2636 // and ZPRs. Otherwise, it goes in the callee save area.
2637 determineStackHazardSlot(MF, SavedRegs);
2638
2639 // Calculates the callee saved stack size.
2640 unsigned CSStackSize = 0;
2641 unsigned ZPRCSStackSize = 0;
2642 unsigned PPRCSStackSize = 0;
2644 for (unsigned Reg : SavedRegs.set_bits()) {
2645 auto *RC = TRI->getMinimalPhysRegClass(MCRegister(Reg));
2646 assert(RC && "expected register class!");
2647 auto SpillSize = TRI->getSpillSize(*RC);
2648 bool IsZPR = AArch64::ZPRRegClass.contains(Reg);
2649 bool IsPPR = !IsZPR && AArch64::PPRRegClass.contains(Reg);
2650 if (IsZPR)
2651 ZPRCSStackSize += SpillSize;
2652 else if (IsPPR)
2653 PPRCSStackSize += SpillSize;
2654 else
2655 CSStackSize += SpillSize;
2656 }
2657
2658 // Save number of saved regs, so we can easily update CSStackSize later to
2659 // account for any additional 64-bit GPR saves. Note: After this point
2660 // only 64-bit GPRs can be added to SavedRegs.
2661 unsigned NumSavedRegs = SavedRegs.count();
2662
2663 // If we have hazard padding in the CS area add that to the size.
2665 CSStackSize += getStackHazardSize(MF);
2666
2667 // Increase the callee-saved stack size if the function has streaming mode
2668 // changes, as we will need to spill the value of the VG register.
2669 if (requiresSaveVG(MF))
2670 CSStackSize += 8;
2671
2672 // If we must call __arm_get_current_vg in the prologue preserve the LR.
2673 if (requiresSaveVG(MF) && !Subtarget.hasSVE())
2674 SavedRegs.set(AArch64::LR);
2675
2676 // The frame record needs to be created by saving the appropriate registers
2677 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
2678 if (hasFP(MF) ||
2679 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
2680 SavedRegs.set(AArch64::FP);
2681 SavedRegs.set(AArch64::LR);
2682 }
2683
2684 LLVM_DEBUG({
2685 dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
2686 for (unsigned Reg : SavedRegs.set_bits())
2687 dbgs() << ' ' << printReg(MCRegister(Reg), RegInfo);
2688 dbgs() << "\n";
2689 });
2690
2691 // If any callee-saved registers are used, the frame cannot be eliminated.
2692 auto [ZPRLocalStackSize, PPRLocalStackSize] =
2694 uint64_t SVELocals = ZPRLocalStackSize + PPRLocalStackSize;
2695 uint64_t SVEStackSize =
2696 alignTo(ZPRCSStackSize + PPRCSStackSize + SVELocals, 16);
2697 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
2698
2699 // The CSR spill slots have not been allocated yet, so estimateStackSize
2700 // won't include them.
2701 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
2702
2703 // We may address some of the stack above the canonical frame address, either
2704 // for our own arguments or during a call. Include that in calculating whether
2705 // we have complicated addressing concerns.
2706 int64_t CalleeStackUsed = 0;
2707 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
2708 int64_t FixedOff = MFI.getObjectOffset(I);
2709 if (FixedOff > CalleeStackUsed)
2710 CalleeStackUsed = FixedOff;
2711 }
2712
2713 // Conservatively always assume BigStack when there are SVE spills.
2714 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
2715 CalleeStackUsed) > EstimatedStackSizeLimit;
2716 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
2717 AFI->setHasStackFrame(true);
2718
2719 // Estimate if we might need to scavenge a register at some point in order
2720 // to materialize a stack offset. If so, either spill one additional
2721 // callee-saved register or reserve a special spill slot to facilitate
2722 // register scavenging. If we already spilled an extra callee-saved register
2723 // above to keep the number of spills even, we don't need to do anything else
2724 // here.
2725 if (BigStack) {
2726 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
2727 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
2728 << " to get a scratch register.\n");
2729 SavedRegs.set(UnspilledCSGPR);
2730 ExtraCSSpill = UnspilledCSGPR;
2731
2732 // MachO's compact unwind format relies on all registers being stored in
2733 // pairs, so if we need to spill one extra for BigStack, then we need to
2734 // store the pair.
2735 if (producePairRegisters(MF)) {
2736 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
2737 // Failed to make a pair for compact unwind format, revert spilling.
2738 if (produceCompactUnwindFrame(*this, MF)) {
2739 SavedRegs.reset(UnspilledCSGPR);
2740 ExtraCSSpill = AArch64::NoRegister;
2741 }
2742 } else
2743 SavedRegs.set(UnspilledCSGPRPaired);
2744 }
2745 }
2746
2747 // If we didn't find an extra callee-saved register to spill, create
2748 // an emergency spill slot.
2749 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
2751 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
2752 unsigned Size = TRI->getSpillSize(RC);
2753 Align Alignment = TRI->getSpillAlign(RC);
2754 int FI = MFI.CreateSpillStackObject(Size, Alignment);
2755 RS->addScavengingFrameIndex(FI);
2756 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
2757 << " as the emergency spill slot.\n");
2758 }
2759 }
2760
2761 // Add the size of any additional 64-bit GPR saves.
2762 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
2763
2764 // A Swift asynchronous context extends the frame record with a pointer
2765 // directly before FP.
2766 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
2767 CSStackSize += 8;
2768
2769 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
2770 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
2771 << EstimatedStackSize + AlignedCSStackSize << " bytes.\n");
2772
2774 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
2775 "Should not invalidate callee saved info");
2776
2777 // Round up to register pair alignment to avoid additional SP adjustment
2778 // instructions.
2779 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
2780 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
2781 AFI->setSVECalleeSavedStackSize(ZPRCSStackSize, alignTo(PPRCSStackSize, 16));
2782}
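
// The final steps above round the callee-save size up to 16 bytes and record
// whether the rounding left an unused slot. A minimal standalone sketch of
// that arithmetic follows. The names alignUp, CalleeSaveArea and
// layoutCalleeSaves are hypothetical and are not part of LLVM; the sketch
// assumes every saved register is a 64-bit GPR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::alignTo; Align must be a power of two.
inline uint64_t alignUp(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two expected");
  return (Value + Align - 1) & ~(Align - 1);
}

struct CalleeSaveArea {
  uint64_t AlignedSize; // Size recorded for the callee-save area.
  bool HasFreeSpace;    // True when rounding up left unused bytes.
};

// NumGPRs 64-bit saves of 8 bytes each, plus any extra bytes (for example a
// VG slot or hazard padding in the code above).
inline CalleeSaveArea layoutCalleeSaves(unsigned NumGPRs, uint64_t ExtraBytes) {
  uint64_t Raw = 8ull * NumGPRs + ExtraBytes;
  uint64_t Aligned = alignUp(Raw, 16);
  return {Aligned, Aligned != Raw};
}
```

// With three saved GPRs the raw size is 24 bytes, so the area is rounded up to
// 32 bytes and one 8-byte slot is left free.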
2783
2785 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
2786 std::vector<CalleeSavedInfo> &CSI) const {
2787 bool IsWindows = isTargetWindows(MF);
2788 unsigned StackHazardSize = getStackHazardSize(MF);
2789 // To match the canonical Windows frame layout, reverse the list of
2790 // callee-saved registers so that PrologEpilogInserter lays them out in
2791 // the right order. (PrologEpilogInserter allocates stack objects top
2792 // down. Windows canonical prologs store higher numbered registers at
2793 // the top, thus have the CSI array start from the highest registers.)
2794 if (IsWindows)
2795 std::reverse(CSI.begin(), CSI.end());
2796
2797 if (CSI.empty())
2798 return true; // Early exit if no callee saved registers are modified!
2799
2800 // Now that we know which registers need to be saved and restored, allocate
2801 // stack slots for them.
2802 MachineFrameInfo &MFI = MF.getFrameInfo();
2803 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2804
2805 if (IsWindows && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2806 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
2807 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2808 MFI.setIsCalleeSavedObjectIndex(FrameIdx, true);
2809 }
2810
2811 // Insert VG into the list of CSRs, immediately before LR if saved.
2812 if (requiresSaveVG(MF)) {
2813 CalleeSavedInfo VGInfo(AArch64::VG);
2814 auto It =
2815 find_if(CSI, [](auto &Info) { return Info.getReg() == AArch64::LR; });
2816 if (It != CSI.end())
2817 CSI.insert(It, VGInfo);
2818 else
2819 CSI.push_back(VGInfo);
2820 }
2821
2822 Register LastReg = 0;
2823 int HazardSlotIndex = std::numeric_limits<int>::max();
2824 for (auto &CS : CSI) {
2825 MCRegister Reg = CS.getReg();
2826 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
2827
2828 // Create a hazard slot as we switch between GPR and FPR CSRs.
2830 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
2832 assert(HazardSlotIndex == std::numeric_limits<int>::max() &&
2833 "Unexpected register order for hazard slot");
2834 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2835 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2836 << "\n");
2837 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2838 MFI.setIsCalleeSavedObjectIndex(HazardSlotIndex, true);
2839 }
2840
2841 unsigned Size = RegInfo->getSpillSize(*RC);
2842 Align Alignment(RegInfo->getSpillAlign(*RC));
2843 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
2844 CS.setFrameIdx(FrameIdx);
2845 MFI.setIsCalleeSavedObjectIndex(FrameIdx, true);
2846
2847 // Grab 8 bytes below FP for the extended asynchronous frame info.
2848 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !IsWindows &&
2849 Reg == AArch64::FP) {
2850 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
2851 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2852 MFI.setIsCalleeSavedObjectIndex(FrameIdx, true);
2853 }
2854 LastReg = Reg;
2855 }
2856
2857 // Add hazard slot in the case where no FPR CSRs are present.
2859 HazardSlotIndex == std::numeric_limits<int>::max()) {
2860 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2861 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2862 << "\n");
2863 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2864 MFI.setIsCalleeSavedObjectIndex(HazardSlotIndex, true);
2865 }
2866
2867 return true;
2868}
2869
2871 const MachineFunction &MF) const {
2873 // If the function has streaming-mode changes, don't scavenge a
2874 // spillslot in the callee-save area, as that might require an
2875 // 'addvl' in the streaming-mode-changing call-sequence when the
2876 // function doesn't use a FP.
2877 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
2878 return false;
2879 // Don't allow register salvaging with hazard slots, in case it moves objects
2880 // into the wrong place.
2881 if (AFI->hasStackHazardSlotIndex())
2882 return false;
2883 return AFI->hasCalleeSaveStackFreeSpace();
2884}
2885
2886/// Returns true if there are any SVE callee saves, and sets Min and Max to
2886/// the range of their frame indices.
2888 int &Min, int &Max) {
2889 Min = std::numeric_limits<int>::max();
2890 Max = std::numeric_limits<int>::min();
2891
2892 if (!MFI.isCalleeSavedInfoValid())
2893 return false;
2894
2895 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
2896 for (auto &CS : CSI) {
2897 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
2898 AArch64::PPRRegClass.contains(CS.getReg())) {
2899 assert((Max == std::numeric_limits<int>::min() ||
2900 Max + 1 == CS.getFrameIdx()) &&
2901 "SVE CalleeSaves are not consecutive");
2902 Min = std::min(Min, CS.getFrameIdx());
2903 Max = std::max(Max, CS.getFrameIdx());
2904 }
2905 }
2906 return Min != std::numeric_limits<int>::max();
2907}
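
// The range scan above can be sketched without any LLVM types. SavedSlot and
// findSVESlotRange are hypothetical names; IsSVE stands in for the ZPR/PPR
// register-class checks.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

struct SavedSlot {
  bool IsSVE;   // Stand-in for "register is in ZPRRegClass or PPRRegClass".
  int FrameIdx; // Frame index assigned to the spill slot.
};

// Computes the [Min, Max] frame-index range of the SVE slots and returns
// true if at least one SVE slot exists.
inline bool findSVESlotRange(const std::vector<SavedSlot> &Slots, int &Min,
                             int &Max) {
  Min = std::numeric_limits<int>::max();
  Max = std::numeric_limits<int>::min();
  for (const SavedSlot &S : Slots) {
    if (!S.IsSVE)
      continue;
    // SVE slots are expected to have consecutive frame indices.
    assert((Max == std::numeric_limits<int>::min() || Max + 1 == S.FrameIdx) &&
           "SVE slots are not consecutive");
    Min = std::min(Min, S.FrameIdx);
    Max = std::max(Max, S.FrameIdx);
  }
  return Min != std::numeric_limits<int>::max();
}
```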
2908
2910 AssignObjectOffsets AssignOffsets) {
2911 MachineFrameInfo &MFI = MF.getFrameInfo();
2912 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2913
2914 SVEStackSizes SVEStack{};
2915
2916 // With SplitSVEObjects we maintain separate stack offsets for predicates
2917 // (PPRs) and SVE vectors (ZPRs). When SplitSVEObjects is disabled predicates
2918 // are included in the SVE vector area.
2919 uint64_t &ZPRStackTop = SVEStack.ZPRStackSize;
2920 uint64_t &PPRStackTop =
2921 AFI->hasSplitSVEObjects() ? SVEStack.PPRStackSize : SVEStack.ZPRStackSize;
2922
2923#ifndef NDEBUG
2924 // First process all fixed stack objects.
2925 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
2926 assert(!MFI.hasScalableStackID(I) &&
2927 "SVE vectors should never be passed on the stack by value, only by "
2928 "reference.");
2929#endif
2930
2931 auto AllocateObject = [&](int FI) {
2933 ? ZPRStackTop
2934 : PPRStackTop;
2935
2936 // FIXME: Given that the length of SVE vectors is not necessarily a power of
2937 // two, we'd need to align every object dynamically at runtime if the
2938 // alignment is larger than 16. This is not yet supported.
2939 Align Alignment = MFI.getObjectAlign(FI);
2940 if (Alignment > Align(16))
2942 "Alignment of scalable vectors > 16 bytes is not yet supported");
2943
2944 StackTop += MFI.getObjectSize(FI);
2945 StackTop = alignTo(StackTop, Alignment);
2946
2947 assert(StackTop < (uint64_t)std::numeric_limits<int64_t>::max() &&
2948 "SVE StackTop far too large?!");
2949
2950 int64_t Offset = -int64_t(StackTop);
2951 if (AssignOffsets == AssignObjectOffsets::Yes)
2952 MFI.setObjectOffset(FI, Offset);
2953
2954 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
2955 };
2956
2957 // Then process all callee saved slots.
2958 int MinCSFrameIndex, MaxCSFrameIndex;
2959 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
2960 for (int FI = MinCSFrameIndex; FI <= MaxCSFrameIndex; ++FI)
2961 AllocateObject(FI);
2962 }
2963
2964 // Ensure the CS area is 16-byte aligned.
2965 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2966 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2967
2968 // Create a buffer of SVE objects to allocate and sort it.
2969 SmallVector<int, 8> ObjectsToAllocate;
2970 // If we have a stack protector, and we've previously decided that we have SVE
2971 // objects on the stack and thus need it to go in the SVE stack area, then it
2972 // needs to go first.
2973 int StackProtectorFI = -1;
2974 if (MFI.hasStackProtectorIndex()) {
2975 StackProtectorFI = MFI.getStackProtectorIndex();
2976 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
2977 ObjectsToAllocate.push_back(StackProtectorFI);
2978 }
2979
2980 for (int FI = 0, E = MFI.getObjectIndexEnd(); FI != E; ++FI) {
2981 if (FI == StackProtectorFI || MFI.isDeadObjectIndex(FI) ||
2983 continue;
2984
2987 continue;
2988
2989 ObjectsToAllocate.push_back(FI);
2990 }
2991
2992 // Allocate all SVE locals and spills
2993 for (unsigned FI : ObjectsToAllocate)
2994 AllocateObject(FI);
2995
2996 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2997 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2998
2999 if (AssignOffsets == AssignObjectOffsets::Yes)
3000 AFI->setStackSizeSVE(SVEStack.ZPRStackSize, SVEStack.PPRStackSize);
3001
3002 return SVEStack;
3003}
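
// AllocateObject above is a bump allocator that grows downward: it adds the
// object size, aligns the running total, and uses the negated total as the
// offset. A minimal sketch of that scheme with a single stack area follows.
// ScalableObject and assignScalableOffsets are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct ScalableObject {
  uint64_t Size;  // Object size in bytes.
  uint64_t Align; // Power-of-two alignment, at most 16.
};

// Assigns a negative offset to every object and leaves StackTop rounded up
// to 16 bytes, mirroring the final alignTo calls above.
inline std::vector<int64_t>
assignScalableOffsets(const std::vector<ScalableObject> &Objects,
                      uint64_t &StackTop) {
  std::vector<int64_t> Offsets;
  for (const ScalableObject &O : Objects) {
    assert(O.Align != 0 && (O.Align & (O.Align - 1)) == 0 && O.Align <= 16 &&
           "alignment above 16 bytes is not supported");
    StackTop += O.Size;
    StackTop = (StackTop + O.Align - 1) & ~(O.Align - 1);
    Offsets.push_back(-static_cast<int64_t>(StackTop));
  }
  StackTop = (StackTop + 15) & ~uint64_t(15);
  return Offsets;
}
```

// Because the size is added before aligning, each offset points at the lowest
// address of its object.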
3004
3006 MachineFunction &MF, RegScavenger *RS) const {
3008 "Upwards growing stack unsupported");
3009
3011
3012 // If this function isn't doing Win64-style C++ EH, we don't need to do
3013 // anything.
3014 if (!MF.hasEHFunclets())
3015 return;
3016
3017 MachineFrameInfo &MFI = MF.getFrameInfo();
3018 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3019
3020 // Win64 C++ EH needs to allocate space for the catch objects in the fixed
3021 // object area right next to the UnwindHelp object.
3022 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
3023 int64_t CurrentOffset =
3025 for (WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
3026 for (WinEHHandlerType &H : TBME.HandlerArray) {
3027 int FrameIndex = H.CatchObj.FrameIndex;
3028 if ((FrameIndex != INT_MAX) && MFI.getObjectOffset(FrameIndex) == 0) {
3029 CurrentOffset =
3030 alignTo(CurrentOffset, MFI.getObjectAlign(FrameIndex).value());
3031 CurrentOffset += MFI.getObjectSize(FrameIndex);
3032 MFI.setObjectOffset(FrameIndex, -CurrentOffset);
3033 }
3034 }
3035 }
3036
3037 // Create an UnwindHelp object.
3038 // The UnwindHelp object is allocated at the start of the fixed object area
3039 int64_t UnwindHelpOffset = alignTo(CurrentOffset + 8, Align(16));
3040 assert(UnwindHelpOffset == getFixedObjectSize(MF, AFI, /*IsWin64*/ true,
3041 /*IsFunclet*/ false) &&
3042 "UnwindHelpOffset must be at the start of the fixed object area");
3043 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8, -UnwindHelpOffset,
3044 /*IsImmutable=*/false);
3045 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
3046
3047 MachineBasicBlock &MBB = MF.front();
3048 auto MBBI = MBB.begin();
3049 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
3050 ++MBBI;
3051
3052 // We need to store -2 into the UnwindHelp object at the start of the
3053 // function.
3054 DebugLoc DL;
3055 RS->enterBasicBlockEnd(MBB);
3056 RS->backward(MBBI);
3057 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
3058 assert(DstReg && "There must be a free register after frame setup");
3059 const AArch64InstrInfo &TII =
3060 *MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3061 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
3062 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
3063 .addReg(DstReg, getKillRegState(true))
3064 .addFrameIndex(UnwindHelpFI)
3065 .addImm(0);
3066}
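
// The catch-object loop above aligns a running offset, adds the object size,
// and places the object at the negated total; UnwindHelp then takes the next
// 8 bytes, rounded up to 16. A minimal sketch follows. The names CatchObject
// and placeCatchObjects are hypothetical, and StartOffset is a caller-supplied
// assumption because the initial value of CurrentOffset is not shown in this
// listing.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct CatchObject {
  int64_t Size;  // Object size in bytes.
  int64_t Align; // Power-of-two alignment.
};

// Rounds a non-negative value up to a power-of-two alignment.
inline int64_t alignUp(int64_t Value, int64_t Align) {
  assert(Value >= 0 && Align > 0 && (Align & (Align - 1)) == 0);
  return (Value + Align - 1) & ~(Align - 1);
}

// Fills Offsets with one negative offset per object and returns the
// magnitude of the UnwindHelp offset (the object itself sits at the
// negated value).
inline int64_t placeCatchObjects(int64_t StartOffset,
                                 const std::vector<CatchObject> &Objects,
                                 std::vector<int64_t> &Offsets) {
  int64_t Current = StartOffset;
  Offsets.clear();
  for (const CatchObject &O : Objects) {
    Current = alignUp(Current, O.Align);
    Current += O.Size;
    Offsets.push_back(-Current);
  }
  return alignUp(Current + 8, 16);
}
```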
3067
3068namespace {
3069struct TagStoreInstr {
3071 int64_t Offset, Size;
3072 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
3073 : MI(MI), Offset(Offset), Size(Size) {}
3074};
3075
3076class TagStoreEdit {
3077 MachineFunction *MF;
3078 MachineBasicBlock *MBB;
3079 MachineRegisterInfo *MRI;
3080 // Tag store instructions that are being replaced.
3082 // Combined memref arguments of the above instructions.
3084
3085 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
3086 // FrameRegOffset + Size) with the address tag of SP.
3087 Register FrameReg;
3088 StackOffset FrameRegOffset;
3089 int64_t Size;
3090 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
3091 // end.
3092 std::optional<int64_t> FrameRegUpdate;
3093 // MIFlags for any FrameReg updating instructions.
3094 unsigned FrameRegUpdateFlags;
3095
3096 // Use zeroing instruction variants.
3097 bool ZeroData;
3098 DebugLoc DL;
3099
3100 void emitUnrolled(MachineBasicBlock::iterator InsertI);
3101 void emitLoop(MachineBasicBlock::iterator InsertI);
3102
3103public:
3104 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
3105 : MBB(MBB), ZeroData(ZeroData) {
3106 MF = MBB->getParent();
3107 MRI = &MF->getRegInfo();
3108 }
3109 // Add an instruction to be replaced. Instructions must be added in
3110 // ascending order of Offset and must be adjacent.
3111 void addInstruction(TagStoreInstr I) {
3112 assert((TagStores.empty() ||
3113 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
3114 "Non-adjacent tag store instructions.");
3115 TagStores.push_back(I);
3116 }
3117 void clear() { TagStores.clear(); }
3118 // Emit equivalent code at the given location, and erase the current set of
3119 // instructions. May skip if the replacement is not profitable. May invalidate
3120 // the input iterator and replace it with a valid one.
3121 void emitCode(MachineBasicBlock::iterator &InsertI,
3122 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
3123};
3124
3125void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
3126 const AArch64InstrInfo *TII =
3127 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3128
3129 const int64_t kMinOffset = -256 * 16;
3130 const int64_t kMaxOffset = 255 * 16;
3131
3132 Register BaseReg = FrameReg;
3133 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
3134 if (BaseRegOffsetBytes < kMinOffset ||
3135 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
3136 // BaseReg can be FP, which is not necessarily aligned to 16-bytes. In
3137 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
3138 // is required for the offset of ST2G.
3139 BaseRegOffsetBytes % 16 != 0) {
3140 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3141 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
3142 StackOffset::getFixed(BaseRegOffsetBytes), TII);
3143 BaseReg = ScratchReg;
3144 BaseRegOffsetBytes = 0;
3145 }
3146
3147 MachineInstr *LastI = nullptr;
3148 while (Size) {
3149 int64_t InstrSize = (Size > 16) ? 32 : 16;
3150 unsigned Opcode =
3151 InstrSize == 16
3152 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
3153 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
3154 assert(BaseRegOffsetBytes % 16 == 0);
3155 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
3156 .addReg(AArch64::SP)
3157 .addReg(BaseReg)
3158 .addImm(BaseRegOffsetBytes / 16)
3159 .setMemRefs(CombinedMemRefs);
3160 // A store to [BaseReg, #0] should go last for an opportunity to fold the
3161 // final SP adjustment in the epilogue.
3162 if (BaseRegOffsetBytes == 0)
3163 LastI = I;
3164 BaseRegOffsetBytes += InstrSize;
3165 Size -= InstrSize;
3166 }
3167
3168 if (LastI)
3169 MBB->splice(InsertI, MBB, LastI);
3170}
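
// emitUnrolled above covers the region with 32-byte stores while more than 16
// bytes remain, finishes with a 16-byte store if needed, and moves the store
// at offset 0 to the end. A minimal sketch of that planning step follows.
// TagChunk and planUnrolledTagStores are hypothetical names; no instructions
// are built here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct TagChunk {
  int64_t Offset; // Byte offset from the base register.
  int64_t Size;   // 32 for an ST2G-style store, 16 for an STG-style store.
};

inline std::vector<TagChunk> planUnrolledTagStores(int64_t BaseOffset,
                                                   int64_t Size) {
  assert(Size >= 0 && Size % 16 == 0 && BaseOffset % 16 == 0);
  std::vector<TagChunk> Chunks;
  while (Size) {
    int64_t InstrSize = (Size > 16) ? 32 : 16;
    Chunks.push_back({BaseOffset, InstrSize});
    BaseOffset += InstrSize;
    Size -= InstrSize;
  }
  // Move the store at offset 0 to the end, as the splice above does.
  auto Zero = std::find_if(Chunks.begin(), Chunks.end(),
                           [](const TagChunk &C) { return C.Offset == 0; });
  if (Zero != Chunks.end())
    std::rotate(Zero, Zero + 1, Chunks.end());
  return Chunks;
}
```

// For an 80-byte region at offset 0 the plan is 32 bytes at 32, 16 bytes at
// 64, then 32 bytes at 0.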
3171
3172void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
3173 const AArch64InstrInfo *TII =
3174 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3175
3176 Register BaseReg = FrameRegUpdate
3177 ? FrameReg
3178 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3179 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3180
3181 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
3182
3183 int64_t LoopSize = Size;
3184 // If the loop size is not a multiple of 32, split off one 16-byte store at
3185 // the end to fold BaseReg update into.
3186 if (FrameRegUpdate && *FrameRegUpdate)
3187 LoopSize -= LoopSize % 32;
3188 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
3189 TII->get(ZeroData ? AArch64::STZGloop_wback
3190 : AArch64::STGloop_wback))
3191 .addDef(SizeReg)
3192 .addDef(BaseReg)
3193 .addImm(LoopSize)
3194 .addReg(BaseReg)
3195 .setMemRefs(CombinedMemRefs);
3196 if (FrameRegUpdate)
3197 LoopI->setFlags(FrameRegUpdateFlags);
3198
3199 int64_t ExtraBaseRegUpdate =
3200 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
3201 LLVM_DEBUG(dbgs() << "TagStoreEdit::emitLoop: LoopSize=" << LoopSize
3202 << ", Size=" << Size
3203 << ", ExtraBaseRegUpdate=" << ExtraBaseRegUpdate
3204 << ", FrameRegUpdate=" << FrameRegUpdate
3205 << ", FrameRegOffset.getFixed()="
3206 << FrameRegOffset.getFixed() << "\n");
3207 if (LoopSize < Size) {
3208 assert(FrameRegUpdate);
3209 assert(Size - LoopSize == 16);
3210 // Tag 16 more bytes at BaseReg and update BaseReg.
3211 int64_t STGOffset = ExtraBaseRegUpdate + 16;
3212 assert(STGOffset % 16 == 0 && STGOffset >= -4096 && STGOffset <= 4080 &&
3213 "STG immediate out of range");
3214 BuildMI(*MBB, InsertI, DL,
3215 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
3216 .addDef(BaseReg)
3217 .addReg(BaseReg)
3218 .addReg(BaseReg)
3219 .addImm(STGOffset / 16)
3220 .setMemRefs(CombinedMemRefs)
3221 .setMIFlags(FrameRegUpdateFlags);
3222 } else if (ExtraBaseRegUpdate) {
3223 // Update BaseReg.
3224 int64_t AddSubOffset = std::abs(ExtraBaseRegUpdate);
3225 assert(AddSubOffset <= 4095 && "ADD/SUB immediate out of range");
3226 BuildMI(
3227 *MBB, InsertI, DL,
3228 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
3229 .addDef(BaseReg)
3230 .addReg(BaseReg)
3231 .addImm(AddSubOffset)
3232 .addImm(0)
3233 .setMIFlags(FrameRegUpdateFlags);
3234 }
3235}
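
// The size bookkeeping in emitLoop above can be sketched on its own. When a
// non-zero base-register update is being folded in, the loop size is rounded
// down to a multiple of 32, and any 16 bytes left over go to a trailing
// post-indexed store. LoopPlan and planTagLoop are hypothetical names;
// HasUpdate stands in for FrameRegUpdate holding a value.

```cpp
#include <cassert>
#include <cstdint>

struct LoopPlan {
  int64_t LoopSize;        // Bytes tagged by the loop itself.
  bool NeedsTrailingStore; // True when 16 bytes remain after the loop.
  int64_t ExtraBaseUpdate; // Adjustment still owed to the base register.
};

inline LoopPlan planTagLoop(int64_t Size, int64_t FrameRegOffset,
                            bool HasUpdate, int64_t FrameRegUpdate) {
  assert(Size > 0 && Size % 16 == 0);
  int64_t LoopSize = Size;
  if (HasUpdate && FrameRegUpdate != 0)
    LoopSize -= LoopSize % 32;
  int64_t Extra = HasUpdate ? (FrameRegUpdate - FrameRegOffset - Size) : 0;
  return {LoopSize, LoopSize < Size, Extra};
}
```

// For Size = 208 with an update of 256 from offset 0, the loop tags 192 bytes,
// a trailing store tags the last 16, and the remaining adjustment is 48, so
// the post-index offset is 48 + 16 = 64.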
3236
3237// Check if *II is a register update that can be merged into an STGloop that
3238// ends at (Reg + Size). On success, *TotalOffset is set to the full signed
3239// adjustment that the update instruction applies to Reg.
3240bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
3241 int64_t Size, int64_t *TotalOffset) {
3242 MachineInstr &MI = *II;
3243 if ((MI.getOpcode() == AArch64::ADDXri ||
3244 MI.getOpcode() == AArch64::SUBXri) &&
3245 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
3246 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
3247 int64_t Offset = MI.getOperand(2).getImm() << Shift;
3248 if (MI.getOpcode() == AArch64::SUBXri)
3249 Offset = -Offset;
3250 int64_t PostOffset = Offset - Size;
3251 // TagStoreEdit::emitLoop might emit either an ADD/SUB after the loop, or
3252 // an STGPostIndex which does the last 16 bytes of tag write. Which one is
3253 // chosen depends on the alignment of the loop size, but the difference
3254 // between the valid ranges for the two instructions is small, so we
3255 // conservatively assume that it could be either case here.
3256 //
3257 // Max offset of STGPostIndex, minus the 16 byte tag write folded into that
3258 // instruction.
3259 const int64_t kMaxOffset = 4080 - 16;
3260 // Max offset of SUBXri.
3261 const int64_t kMinOffset = -4095;
3262 if (PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
3263 PostOffset % 16 == 0) {
3264 *TotalOffset = Offset;
3265 return true;
3266 }
3267 }
3268 return false;
3269}
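
// The range test in canMergeRegUpdate above can be sketched in isolation. The
// leftover adjustment after the loop must fit either instruction that emitLoop
// may emit and must be a multiple of 16. fitsAfterTagLoop is a hypothetical
// name, and Offset is assumed to be the already-decoded signed immediate of
// the ADD/SUB.

```cpp
#include <cassert>
#include <cstdint>

inline bool fitsAfterTagLoop(int64_t Offset, int64_t Size,
                             int64_t &TotalOffset) {
  assert(Size >= 0);
  // Largest post-index offset (4080) minus the 16 bytes tagged by that store.
  const int64_t kMaxOffset = 4080 - 16;
  // Largest magnitude a SUB immediate can subtract.
  const int64_t kMinOffset = -4095;
  int64_t PostOffset = Offset - Size;
  if (PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
      PostOffset % 16 == 0) {
    TotalOffset = Offset;
    return true;
  }
  return false;
}
```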
3270
3271void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
3273 MemRefs.clear();
3274 for (auto &TS : TSE) {
3275 MachineInstr *MI = TS.MI;
3276 // An instruction without memory operands may access anything. Be
3277 // conservative and return an empty list.
3278 if (MI->memoperands_empty()) {
3279 MemRefs.clear();
3280 return;
3281 }
3282 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
3283 }
3284}
3285
3286void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
3287 const AArch64FrameLowering *TFI,
3288 bool TryMergeSPUpdate) {
3289 if (TagStores.empty())
3290 return;
3291 TagStoreInstr &FirstTagStore = TagStores[0];
3292 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
3293 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
3294 DL = TagStores[0].MI->getDebugLoc();
3295
3296 Register Reg;
3297 FrameRegOffset = TFI->resolveFrameOffsetReference(
3298 *MF, FirstTagStore.Offset, false /*isFixed*/,
3299 TargetStackID::Default /*StackID*/, Reg,
3300 /*PreferFP=*/false, /*ForSimm=*/true);
3301 FrameReg = Reg;
3302 FrameRegUpdate = std::nullopt;
3303
3304 mergeMemRefs(TagStores, CombinedMemRefs);
3305
3306 LLVM_DEBUG({
3307 dbgs() << "Replacing adjacent STG instructions:\n";
3308 for (const auto &Instr : TagStores) {
3309 dbgs() << " " << *Instr.MI;
3310 }
3311 });
3312
3313 // Size threshold where a loop becomes shorter than a linear sequence of
3314 // tagging instructions.
3315 const int kSetTagLoopThreshold = 176;
3316 if (Size < kSetTagLoopThreshold) {
3317 if (TagStores.size() < 2)
3318 return;
3319 emitUnrolled(InsertI);
3320 } else {
3321 MachineInstr *UpdateInstr = nullptr;
3322 int64_t TotalOffset = 0;
3323 if (TryMergeSPUpdate) {
3324 // See if we can merge base register update into the STGloop.
3325 // This is done in AArch64LoadStoreOptimizer for "normal" stores,
3326 // but STGloop is way too unusual for that, and also it only
3327 // realistically happens in function epilogue. Also, STGloop is expanded
3328 // before that pass.
3329 if (InsertI != MBB->end() &&
3330 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
3331 &TotalOffset)) {
3332 UpdateInstr = &*InsertI++;
3333 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
3334 << *UpdateInstr);
3335 }
3336 }
3337
3338 if (!UpdateInstr && TagStores.size() < 2)
3339 return;
3340
3341 if (UpdateInstr) {
3342 FrameRegUpdate = TotalOffset;
3343 FrameRegUpdateFlags = UpdateInstr->getFlags();
3344 }
3345 emitLoop(InsertI);
3346 if (UpdateInstr)
3347 UpdateInstr->eraseFromParent();
3348 }
3349
3350 for (auto &TS : TagStores)
3351 TS.MI->eraseFromParent();
3352}
3353
3354bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
3355 int64_t &Size, bool &ZeroData) {
3356 MachineFunction &MF = *MI.getParent()->getParent();
3357 const MachineFrameInfo &MFI = MF.getFrameInfo();
3358
3359 unsigned Opcode = MI.getOpcode();
3360 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
3361 Opcode == AArch64::STZ2Gi);
3362
3363 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
3364 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
3365 return false;
3366 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
3367 return false;
3368 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
3369 Size = MI.getOperand(2).getImm();
3370 return true;
3371 }
3372
3373 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
3374 Size = 16;
3375 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
3376 Size = 32;
3377 else
3378 return false;
3379
3380 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
3381 return false;
3382
3383 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
3384 16 * MI.getOperand(2).getImm();
3385 return true;
3386}
3387
3388static size_t countAvailableScavengerSlots(LivePhysRegs &LiveRegs,
3390 RegScavenger *RS) {
3391 auto FreeGPRs =
3392 llvm::count_if(AArch64::GPR64RegClass, [&LiveRegs, &MRI](auto Reg) {
3393 return LiveRegs.available(MRI, Reg);
3394 });
3395
3396 size_t NumEmergencySlots = 0;
3397 if (RS)
3398 NumEmergencySlots = RS->getNumScavengingFrameIndices();
3399
3400 return FreeGPRs + NumEmergencySlots;
3401}
3402
3403// Detect a run of memory tagging instructions for adjacent stack frame slots,
3404// and replace them with a shorter instruction sequence:
3405// * replace STG + STG with ST2G
3406// * replace STGloop + STGloop with STGloop
3407// This code needs to run when stack slot offsets are already known, but before
3408// FrameIndex operands in STG instructions are eliminated.
3410 const AArch64FrameLowering *TFI,
3411 RegScavenger *RS) {
3412 bool FirstZeroData;
3413 int64_t Size, Offset;
3414 MachineInstr &MI = *II;
3415 MachineBasicBlock *MBB = MI.getParent();
3417 if (&MI == &MBB->instr_back())
3418 return II;
3419 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
3420 return II;
3421
3423 Instrs.emplace_back(&MI, Offset, Size);
3424
3425 constexpr int kScanLimit = 10;
3426 int Count = 0;
3428 NextI != E && Count < kScanLimit; ++NextI) {
3429 MachineInstr &MI = *NextI;
3430 bool ZeroData;
3431 int64_t Size, Offset;
3432 // Collect instructions that update memory tags with a FrameIndex operand
3433 // and (when applicable) constant size, and whose output registers are dead
3434 // (the latter is almost always the case in practice). Since these
3435 // instructions effectively have no inputs or outputs, we are free to skip
3436 // any non-aliasing instructions in between without tracking used registers.
3437 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
3438 if (ZeroData != FirstZeroData)
3439 break;
3440 Instrs.emplace_back(&MI, Offset, Size);
3441 continue;
3442 }
3443
3444 // Only count non-transient, non-tagging instructions toward the scan
3445 // limit.
3446 if (!MI.isTransient())
3447 ++Count;
3448
3449 // Just in case, stop before the epilogue code starts.
3450 if (MI.getFlag(MachineInstr::FrameSetup) ||
3452 break;
3453
3454 // Reject anything that may alias the collected instructions.
3455 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects() || MI.isCall())
3456 break;
3457 }
3458
3459 // New code will be inserted after the last tagging instruction we've found.
3460 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
3461
3462 // All the gathered stack tag instructions are merged and placed after the
3463 // last tag store in the list. Check whether the NZCV flags are live at the
3464 // insertion point; if they are, bail out, because an STG loop emitted there
3465 // would clobber them.
3466
3467 // FIXME: Bailing out here is conservative. The liveness check is performed
3468 // even when the merged sequence would not contain an STG loop, in which
3469 // case it is not needed.
3471 LiveRegs.addLiveOuts(*MBB);
3472 for (auto I = MBB->rbegin();; ++I) {
3473 MachineInstr &MI = *I;
3474 if (MI == InsertI)
3475 break;
3476 LiveRegs.stepBackward(*I);
3477 }
3478 InsertI++;
3479 if (LiveRegs.contains(AArch64::NZCV))
3480 return InsertI;
3481
3482 // Emitting an MTE loop requires two physical registers (BaseReg and
3483 // SizeReg). If the function is under register pressure, the register
3484 // scavenger will crash trying to allocate them. If we don't have at least
3485 // two free slots (free registers + emergency slots), bail out and leave the
3486 // tagging instructions unmerged.
3487 if (countAvailableScavengerSlots(LiveRegs, MBB->getParent()->getRegInfo(),
3488 RS) < 2) {
3489 LLVM_DEBUG(
3490 dbgs() << "Failed to merge MTE stack tagging instructions into loop "
3491 << "due to high register pressure.\n");
3492 return InsertI;
3493 }
3494
3495 llvm::stable_sort(Instrs,
3496 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
3497 return Left.Offset < Right.Offset;
3498 });
3499
3500 // Make sure that we don't have any overlapping stores.
3501 int64_t CurOffset = Instrs[0].Offset;
3502 for (auto &Instr : Instrs) {
3503 if (CurOffset > Instr.Offset)
3504 return NextI;
3505 CurOffset = Instr.Offset + Instr.Size;
3506 }
3507
3508 // Find contiguous runs of tagged memory and emit shorter instruction
3509 // sequences for them when possible.
3510 TagStoreEdit TSE(MBB, FirstZeroData);
3511 std::optional<int64_t> EndOffset;
3512 for (auto &Instr : Instrs) {
3513 if (EndOffset && *EndOffset != Instr.Offset) {
3514 // Found a gap.
3515 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
3516 TSE.clear();
3517 }
3518
3519 TSE.addInstruction(Instr);
3520 EndOffset = Instr.Offset + Instr.Size;
3521 }
3522
3523 const MachineFunction *MF = MBB->getParent();
3524 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
3525 TSE.emitCode(
3526 InsertI, TFI, /*TryMergeSPUpdate = */
3528
3529 return InsertI;
3530}
3531} // namespace
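
// The last part of tryMergeAdjacentSTG above sorts the collected stores by
// offset, gives up if any overlap, and splits the rest into contiguous runs.
// A minimal sketch of those three steps follows. TagStore and splitIntoRuns
// are hypothetical names; each run corresponds to one emitCode call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct TagStore {
  int64_t Offset; // Start of the tagged region.
  int64_t Size;   // Length of the tagged region in bytes.
};

// Returns false if any two stores overlap; otherwise fills Runs with groups
// of stores where each store starts exactly where the previous one ended.
inline bool splitIntoRuns(std::vector<TagStore> Stores,
                          std::vector<std::vector<TagStore>> &Runs) {
  assert(!Stores.empty() && "expected at least one tag store");
  Runs.clear();
  std::stable_sort(Stores.begin(), Stores.end(),
                   [](const TagStore &L, const TagStore &R) {
                     return L.Offset < R.Offset;
                   });
  int64_t CurOffset = Stores[0].Offset;
  for (const TagStore &S : Stores) {
    if (CurOffset > S.Offset)
      return false; // Overlapping stores.
    CurOffset = S.Offset + S.Size;
  }
  for (const TagStore &S : Stores) {
    // A gap after the previous store starts a new run.
    if (Runs.empty() ||
        Runs.back().back().Offset + Runs.back().back().Size != S.Offset)
      Runs.emplace_back();
    Runs.back().push_back(S);
  }
  return true;
}
```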
3532
3534 MachineFunction &MF, RegScavenger *RS = nullptr) const {
3535 for (auto &BB : MF)
3536 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
3538 II = tryMergeAdjacentSTG(II, this, RS);
3539 }
3540
3541 // By the time this method is called, most of the prologue/epilogue code is
3542 // already emitted, whether its location was affected by the shrink-wrapping
3543 // optimization or not.
3544 if (!MF.getFunction().hasFnAttribute(Attribute::Naked) &&
3545 shouldSignReturnAddressEverywhere(MF))
3547}
3548
3549/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
3550/// before the update. This is easily retrieved as it is exactly the offset
3551/// that is set in processFunctionBeforeFrameFinalized.
3553 const MachineFunction &MF, int FI, Register &FrameReg,
3554 bool IgnoreSPUpdates) const {
3555 const MachineFrameInfo &MFI = MF.getFrameInfo();
3556 if (IgnoreSPUpdates) {
3557 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
3558 << MFI.getObjectOffset(FI) << "\n");
3559 FrameReg = AArch64::SP;
3560 return StackOffset::getFixed(MFI.getObjectOffset(FI));
3561 }
3562
3563 // Go to common code if we cannot provide sp + offset.
3564 if (MFI.hasVarSizedObjects() ||
3567 return getFrameIndexReference(MF, FI, FrameReg);
3568
3569 FrameReg = AArch64::SP;
3570 return getStackOffset(MF, MFI.getObjectOffset(FI));
3571}
3572
3573/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
3574/// the parent's frame pointer.
3576 const MachineFunction &MF) const {
3577 return 0;
3578}
3579
3580/// Funclets only need to account for space for the callee saved registers,
3581/// as the locals are accounted for in the parent's stack frame.
3582unsigned AArch64FrameLowering::getWinEHFuncletFrameSize(
3583 const MachineFunction &MF) const {
3584 // This is the size of the pushed CSRs.
3585 unsigned CSSize =
3586 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
3587 // This is the amount of stack a funclet needs to allocate.
3588 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
3589 getStackAlign());
3590}
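// The funclet frame-size rule above can be sketched in isolation. This is a
// minimal, standalone illustration, not the LLVM implementation: alignUp and
// funcletFrameSize are hypothetical names, and the default 16-byte stack
// alignment is an assumption standing in for getStackAlign().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::alignTo: round Value up to the next
// multiple of Align, which must be a non-zero power of two.
constexpr uint64_t alignUp(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

// Funclet frame = pushed callee-saved registers + largest outgoing call
// frame, rounded up to the stack alignment (assumed to be 16 bytes here).
constexpr uint64_t funcletFrameSize(uint64_t CSSize, uint64_t MaxCallFrameSize,
                                    uint64_t StackAlign = 16) {
  return alignUp(CSSize + MaxCallFrameSize, StackAlign);
}
```

// Locals never appear in the sum because they live in the parent's frame.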
3591
3592namespace {
3593struct FrameObject {
3594 bool IsValid = false;
3595 // Index of the object in MFI.
3596 int ObjectIndex = 0;
3597 // Group ID this object belongs to.
3598 int GroupIndex = -1;
3599 // This object should be placed first (closest to SP).
3600 bool ObjectFirst = false;
3601 // This object's group (which always contains the object with
3602 // ObjectFirst==true) should be placed first.
3603 bool GroupFirst = false;
3604
3605 // Used to distinguish between FP and GPR accesses. The values are decided so
3606 // that they sort FPR < Hazard < GPR and they can be or'd together.
3607 unsigned Accesses = 0;
3608 enum { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };
3609};
3610
3611class GroupBuilder {
3612 SmallVector<int, 8> CurrentMembers;
3613 int NextGroupIndex = 0;
3614 std::vector<FrameObject> &Objects;
3615
3616public:
3617 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
3618 void AddMember(int Index) { CurrentMembers.push_back(Index); }
3619 void EndCurrentGroup() {
3620 if (CurrentMembers.size() > 1) {
3621 // Create a new group with the current member list. This might remove them
3622 // from their pre-existing groups. That's OK, dealing with overlapping
3623 // groups is too hard and unlikely to make a difference.
3624 LLVM_DEBUG(dbgs() << "group:");
3625 for (int Index : CurrentMembers) {
3626 Objects[Index].GroupIndex = NextGroupIndex;
3627 LLVM_DEBUG(dbgs() << " " << Index);
3628 }
3629 LLVM_DEBUG(dbgs() << "\n");
3630 NextGroupIndex++;
3631 }
3632 CurrentMembers.clear();
3633 }
3634};
3635
3636bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
3637 // Objects at a lower index are closer to FP; objects at a higher index are
3638 // closer to SP.
3639 //
3640 // For consistency in our comparison, all invalid objects are placed
3641 // at the end. This also allows us to stop walking when we hit the
3642 // first invalid item after it's all sorted.
3643 //
3644 // If we want to include a stack hazard region, order FPR accesses < the
3645 // hazard object < GPR accesses in order to create a separation between the
3646 // two. For the Accesses field 1 = FPR, 2 = Hazard Object, 4 = GPR.
3647 //
3648 // Otherwise the "first" object goes first (closest to SP), followed by the
3649 // members of the "first" group.
3650 //
3651 // The rest are sorted by the group index to keep the groups together.
3652 // Higher numbered groups are more likely to be around longer (i.e. untagged
3653 // in the function epilogue and not at some earlier point). Place them closer
3654 // to SP.
3655 //
3656 // If all else equal, sort by the object index to keep the objects in the
3657 // original order.
3658 return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
3659 A.GroupIndex, A.ObjectIndex) <
3660 std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
3661 B.GroupIndex, B.ObjectIndex);
3662}
3663} // namespace
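// The lexicographic tuple ordering used by FrameObjectCompare can be tried
// out on its own. The sketch below is a simplified, standalone illustration:
// Obj, lessThan and sortedOrder are hypothetical names, and the struct only
// mirrors the fields that take part in the comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Simplified copy of the fields that drive the ordering.
struct Obj {
  bool IsValid;
  unsigned Accesses; // 1 = FPR, 2 = hazard slot, 4 = GPR
  bool ObjectFirst;
  bool GroupFirst;
  int GroupIndex;
  int ObjectIndex;
};

// Compare field by field, left to right. Negating IsValid pushes invalid
// entries to the end; a flag that is true sorts after one that is false.
bool lessThan(const Obj &A, const Obj &B) {
  return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
                         A.GroupIndex, A.ObjectIndex) <
         std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
                         B.GroupIndex, B.ObjectIndex);
}

// Sort, then collect object indices until the first invalid entry.
std::vector<int> sortedOrder(std::vector<Obj> Objs) {
  std::stable_sort(Objs.begin(), Objs.end(), lessThan);
  std::vector<int> Order;
  for (const Obj &O : Objs) {
    if (!O.IsValid)
      break;
    Order.push_back(O.ObjectIndex);
  }
  return Order;
}
```

// Because later positions are closer to SP, an object with ObjectFirst set
// ends up last within its access class, which is the slot nearest SP.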
3664
3665void AArch64FrameLowering::orderFrameObjects(
3666 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
3667 const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
3668
3669 if ((!OrderFrameObjects && !AFI.hasSplitSVEObjects()) ||
3670 ObjectsToAllocate.empty())
3671 return;
3672
3673 const MachineFrameInfo &MFI = MF.getFrameInfo();
3674 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
3675 for (auto &Obj : ObjectsToAllocate) {
3676 FrameObjects[Obj].IsValid = true;
3677 FrameObjects[Obj].ObjectIndex = Obj;
3678 }
3679
3680 // Identify FPR vs GPR slots for hazards, and stack slots that are tagged at
3681 // the same time.
3682 GroupBuilder GB(FrameObjects);
3683 for (auto &MBB : MF) {
3684 for (auto &MI : MBB) {
3685 if (MI.isDebugInstr())
3686 continue;
3687
3688 if (AFI.hasStackHazardSlotIndex()) {
3689 std::optional<int> FI = getLdStFrameID(MI, MFI);
3690 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
3691 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
3693 FrameObjects[*FI].Accesses |= FrameObject::AccessFPR;
3694 else
3695 FrameObjects[*FI].Accesses |= FrameObject::AccessGPR;
3696 }
3697 }
3698
3699 int OpIndex;
3700 switch (MI.getOpcode()) {
3701 case AArch64::STGloop:
3702 case AArch64::STZGloop:
3703 OpIndex = 3;
3704 break;
3705 case AArch64::STGi:
3706 case AArch64::STZGi:
3707 case AArch64::ST2Gi:
3708 case AArch64::STZ2Gi:
3709 OpIndex = 1;
3710 break;
3711 default:
3712 OpIndex = -1;
3713 }
3714
3715 int TaggedFI = -1;
3716 if (OpIndex >= 0) {
3717 const MachineOperand &MO = MI.getOperand(OpIndex);
3718 if (MO.isFI()) {
3719 int FI = MO.getIndex();
3720 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
3721 FrameObjects[FI].IsValid)
3722 TaggedFI = FI;
3723 }
3724 }
3725
3726 // If this is a stack tagging instruction for a slot that is not part of a
3727 // group yet, either start a new group or add it to the current one.
3728 if (TaggedFI >= 0)
3729 GB.AddMember(TaggedFI);
3730 else
3731 GB.EndCurrentGroup();
3732 }
3733 // Groups should never span multiple basic blocks.
3734 GB.EndCurrentGroup();
3735 }
3736
3737 if (AFI.hasStackHazardSlotIndex()) {
3738 FrameObjects[AFI.getStackHazardSlotIndex()].Accesses =
3739 FrameObject::AccessHazard;
3740 // If a stack object is unknown or both GPR and FPR, sort it into GPR.
3741 for (auto &Obj : FrameObjects)
3742 if (!Obj.Accesses ||
3743 Obj.Accesses == (FrameObject::AccessGPR | FrameObject::AccessFPR))
3744 Obj.Accesses = FrameObject::AccessGPR;
3745 }
3746
3747 // If the function's tagged base pointer is pinned to a stack slot, we want to
3748 // put that slot first when possible. This will likely place it at SP + 0,
3749 // and save one instruction when generating the base pointer because IRG does
3750 // not allow an immediate offset.
3751 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
3752 if (TBPI) {
3753 FrameObjects[*TBPI].ObjectFirst = true;
3754 FrameObjects[*TBPI].GroupFirst = true;
3755 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
3756 if (FirstGroupIndex >= 0)
3757 for (FrameObject &Object : FrameObjects)
3758 if (Object.GroupIndex == FirstGroupIndex)
3759 Object.GroupFirst = true;
3760 }
3761
3762 llvm::stable_sort(FrameObjects, FrameObjectCompare);
3763
3764 int i = 0;
3765 for (auto &Obj : FrameObjects) {
3766 // All invalid items are sorted at the end, so it's safe to stop.
3767 if (!Obj.IsValid)
3768 break;
3769 ObjectsToAllocate[i++] = Obj.ObjectIndex;
3770 }
3771
3772 LLVM_DEBUG({
3773 dbgs() << "Final frame order:\n";
3774 for (auto &Obj : FrameObjects) {
3775 if (!Obj.IsValid)
3776 break;
3777 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
3778 if (Obj.ObjectFirst)
3779 dbgs() << ", first";
3780 if (Obj.GroupFirst)
3781 dbgs() << ", group-first";
3782 dbgs() << "\n";
3783 }
3784 });
3785}
3786
3787/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
3788/// least every ProbeSize bytes. Returns an iterator of the first instruction
3789/// after the loop. The difference between SP and TargetReg must be an exact
3790/// multiple of ProbeSize.
3791MachineBasicBlock::iterator
3792AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
3793 MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
3794 Register TargetReg) const {
3795 MachineBasicBlock &MBB = *MBBI->getParent();
3796 MachineFunction &MF = *MBB.getParent();
3797 const AArch64InstrInfo *TII =
3798 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3799 DebugLoc DL = MBB.findDebugLoc(MBBI);
3800
3801 MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
3802 MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
3803 MF.insert(MBBInsertPoint, LoopMBB);
3804 MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
3805 MF.insert(MBBInsertPoint, ExitMBB);
3806
3807 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
3808 // in SUB).
3809 emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
3810 StackOffset::getFixed(-ProbeSize), TII,
3812 // LDR XZR, [SP]
3813 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::LDRXui))
3814 .addDef(AArch64::XZR)
3815 .addReg(AArch64::SP)
3816 .addImm(0)
3820 Align(8)))
3822 // CMP SP, TargetReg
3823 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
3824 AArch64::XZR)
3825 .addReg(AArch64::SP)
3826 .addReg(TargetReg)
3829 // B.CC Loop
3830 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
3832 .addMBB(LoopMBB)
3834
3835 LoopMBB->addSuccessor(ExitMBB);
3836 LoopMBB->addSuccessor(LoopMBB);
3837 // Synthesize the exit MBB.
3838 ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
3840 MBB.addSuccessor(LoopMBB);
3841 // Update liveins.
3842 fullyRecomputeLiveIns({ExitMBB, LoopMBB});
3843
3844 return ExitMBB->begin();
3845}
3846
3847void AArch64FrameLowering::inlineStackProbeFixed(
3848 MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
3849 StackOffset CFAOffset) const {
3850 MachineBasicBlock *MBB = MBBI->getParent();
3851 MachineFunction &MF = *MBB->getParent();
3852 const AArch64InstrInfo *TII =
3853 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3854 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3855 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
3856 bool HasFP = hasFP(MF);
3857
3858 DebugLoc DL;
3859 int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
3860 int64_t NumBlocks = FrameSize / ProbeSize;
3861 int64_t ResidualSize = FrameSize % ProbeSize;
3862
3863 LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
3864 << NumBlocks << " blocks of " << ProbeSize
3865 << " bytes, plus " << ResidualSize << " bytes\n");
3866
3867 // Decrement SP by NumBlocks * ProbeSize bytes, using either an unrolled or
3868 // an ordinary loop.
3869 if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
3870 for (int i = 0; i < NumBlocks; ++i) {
3871 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
3872 // encodable in a SUB).
3873 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
3874 StackOffset::getFixed(-ProbeSize), TII,
3875 MachineInstr::FrameSetup, false, false, nullptr,
3876 EmitAsyncCFI && !HasFP, CFAOffset);
3877 CFAOffset += StackOffset::getFixed(ProbeSize);
3878 // LDR XZR, [SP]
3879 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::LDRXui))
3880 .addDef(AArch64::XZR)
3881 .addReg(AArch64::SP)
3882 .addImm(0)
3886 Align(8)))
3888 }
3889 } else if (NumBlocks != 0) {
3890 // SUB ScratchReg, SP, #(ProbeSize * NumBlocks) (or equivalent if the offset
3891 // is not encodable in SUB). ScratchReg may temporarily become the CFA register.
3892 emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
3893 StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
3894 MachineInstr::FrameSetup, false, false, nullptr,
3895 EmitAsyncCFI && !HasFP, CFAOffset);
3896 CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
3897 MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
3898 MBB = MBBI->getParent();
3899 if (EmitAsyncCFI && !HasFP) {
3900 // Set the CFA register back to SP.
3901 CFIInstBuilder(*MBB, MBBI, MachineInstr::FrameSetup)
3902 .buildDefCFARegister(AArch64::SP);
3903 }
3904 }
3905
3906 if (ResidualSize != 0) {
3907 // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
3908 // in SUB).
3909 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
3910 StackOffset::getFixed(-ResidualSize), TII,
3911 MachineInstr::FrameSetup, false, false, nullptr,
3912 EmitAsyncCFI && !HasFP, CFAOffset);
3913 if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
3914 // LDR XZR, [SP]
3915 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::LDRXui))
3916 .addDef(AArch64::XZR)
3917 .addReg(AArch64::SP)
3918 .addImm(0)
3922 Align(8)))
3924 }
3925 }
3926}
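// The arithmetic that inlineStackProbeFixed uses to choose a probing strategy
// can be sketched as a small standalone function. ProbePlan and
// planFixedProbe are hypothetical names, and the two thresholds are passed in
// as parameters because the values of AArch64::StackProbeMaxLoopUnroll and
// AArch64::StackProbeMaxUnprobedStack are not shown in this listing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of the decisions made for a fixed-size allocation.
struct ProbePlan {
  int64_t NumBlocks;    // Whole ProbeSize-sized blocks to allocate and probe.
  int64_t ResidualSize; // Bytes left over after the whole blocks.
  bool UseLoop;         // True: emit a probing loop; false: unroll it.
  bool ProbeResidual;   // True: the residual allocation gets its own probe.
};

ProbePlan planFixedProbe(int64_t FrameSize, int64_t ProbeSize,
                         int64_t MaxLoopUnroll, int64_t MaxUnprobedStack) {
  assert(ProbeSize > 0 && FrameSize >= 0 && "sizes must be sensible");
  ProbePlan P;
  P.NumBlocks = FrameSize / ProbeSize;
  P.ResidualSize = FrameSize % ProbeSize;
  // Small block counts are unrolled; larger ones use a loop.
  P.UseLoop = P.NumBlocks > MaxLoopUnroll;
  // The residual is probed only if it exceeds the unprobed-stack allowance.
  P.ProbeResidual = P.ResidualSize > MaxUnprobedStack;
  return P;
}
```

// For example, with a 4096-byte probe size a 10000-byte frame splits into two
// probed blocks plus an 1808-byte residual.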
3927
3928void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
3929 MachineBasicBlock &MBB) const {
3930 // Get the instructions that need to be replaced. We emit at most two of
3931 // these. Remember them in order to avoid complications coming from the need
3932 // to traverse the block while potentially creating more blocks.
3933 SmallVector<MachineInstr *, 4> ToReplace;
3934 for (MachineInstr &MI : MBB)
3935 if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
3936 MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
3937 ToReplace.push_back(&MI);
3938
3939 for (MachineInstr *MI : ToReplace) {
3940 if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
3941 Register ScratchReg = MI->getOperand(0).getReg();
3942 int64_t FrameSize = MI->getOperand(1).getImm();
3943 StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
3944 MI->getOperand(3).getImm());
3945 inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
3946 CFAOffset);
3947 } else {
3948 assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
3949 "Stack probe pseudo-instruction expected");
3950 const AArch64InstrInfo *TII =
3951 MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
3952 Register TargetReg = MI->getOperand(0).getReg();
3953 (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
3954 }
3955 MI->eraseFromParent();
3956 }
3957}
3958
3959struct StackAccess {
3960 enum AccessType {
3961 NotAccessed = 0, // Stack object not accessed by load/store instructions.
3962 GPR = 1 << 0, // A general purpose register.
3963 PPR = 1 << 1, // A predicate register.
3964 FPR = 1 << 2, // A floating point/Neon/SVE register.
3965 };
3966
3967 int Idx;
3968 StackOffset Offset;
3969 int64_t Size;
3970 unsigned AccessTypes;
3971
3973
3974 bool operator<(const StackAccess &Rhs) const {
3975 return std::make_tuple(start(), Idx) <
3976 std::make_tuple(Rhs.start(), Rhs.Idx);
3977 }
3978
3979 bool isCPU() const {
3980 // Predicate register load and store instructions execute on the CPU.
3981 return AccessTypes & (AccessType::GPR | AccessType::PPR);
3982 }
3983 bool isSME() const { return AccessTypes & AccessType::FPR; }
3984 bool isMixed() const { return isCPU() && isSME(); }
3985
3986 int64_t start() const { return Offset.getFixed() + Offset.getScalable(); }
3987 int64_t end() const { return start() + Size; }
3988
3989 std::string getTypeString() const {
3990 switch (AccessTypes) {
3991 case AccessType::FPR:
3992 return "FPR";
3993 case AccessType::PPR:
3994 return "PPR";
3995 case AccessType::GPR:
3996 return "GPR";
3997 case AccessType::NotAccessed:
3998 return "NA";
3999 default:
4000 return "Mixed";
4001 }
4002 }
4003
4004 void print(raw_ostream &OS) const {
4005 OS << getTypeString() << " stack object at [SP"
4006 << (Offset.getFixed() < 0 ? "" : "+") << Offset.getFixed();
4007 if (Offset.getScalable())
4008 OS << (Offset.getScalable() < 0 ? "" : "+") << Offset.getScalable()
4009 << " * vscale";
4010 OS << "]";
4011 }
4012};
4013
4014static inline raw_ostream &operator<<(raw_ostream &OS, const StackAccess &SA) {
4015 SA.print(OS);
4016 return OS;
4017}
4018
4019void AArch64FrameLowering::emitRemarks(
4020 const MachineFunction &MF, MachineOptimizationRemarkEmitter *ORE) const {
4021
4022 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
4024 return;
4025
4026 unsigned StackHazardSize = getStackHazardSize(MF);
4027 const uint64_t HazardSize =
4028 (StackHazardSize) ? StackHazardSize : StackHazardRemarkSize;
4029
4030 if (HazardSize == 0)
4031 return;
4032
4033 const MachineFrameInfo &MFI = MF.getFrameInfo();
4034 // Bail if function has no stack objects.
4035 if (!MFI.hasStackObjects())
4036 return;
4037
4038 std::vector<StackAccess> StackAccesses(MFI.getNumObjects());
4039
4040 size_t NumFPLdSt = 0;
4041 size_t NumNonFPLdSt = 0;
4042
4043 // Collect stack accesses via Load/Store instructions.
4044 for (const MachineBasicBlock &MBB : MF) {
4045 for (const MachineInstr &MI : MBB) {
4046 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
4047 continue;
4048 for (MachineMemOperand *MMO : MI.memoperands()) {
4049 std::optional<int> FI = getMMOFrameID(MMO, MFI);
4050 if (FI && !MFI.isDeadObjectIndex(*FI)) {
4051 int FrameIdx = *FI;
4052
4053 size_t ArrIdx = FrameIdx + MFI.getNumFixedObjects();
4054 if (StackAccesses[ArrIdx].AccessTypes == StackAccess::NotAccessed) {
4055 StackAccesses[ArrIdx].Idx = FrameIdx;
4056 StackAccesses[ArrIdx].Offset =
4057 getFrameIndexReferenceFromSP(MF, FrameIdx);
4058 StackAccesses[ArrIdx].Size = MFI.getObjectSize(FrameIdx);
4059 }
4060
4061 unsigned RegTy = StackAccess::AccessType::GPR;
4062 if (MFI.hasScalableStackID(FrameIdx))
4065 RegTy = StackAccess::FPR;
4066
4067 StackAccesses[ArrIdx].AccessTypes |= RegTy;
4068
4069 if (RegTy == StackAccess::FPR)
4070 ++NumFPLdSt;
4071 else
4072 ++NumNonFPLdSt;
4073 }
4074 }
4075 }
4076 }
4077
4078 if (NumFPLdSt == 0 || NumNonFPLdSt == 0)
4079 return;
4080
4081 llvm::sort(StackAccesses);
4082 llvm::erase_if(StackAccesses, [](const StackAccess &S) {
4083 return S.AccessTypes == StackAccess::NotAccessed;
4084 });
4085
4086 SmallVector<const StackAccess *> MixedObjects;
4087 SmallVector<std::pair<const StackAccess *, const StackAccess *>> HazardPairs;
4088
4089 if (StackAccesses.front().isMixed())
4090 MixedObjects.push_back(&StackAccesses.front());
4091
4092 for (auto It = StackAccesses.begin(), End = std::prev(StackAccesses.end());
4093 It != End; ++It) {
4094 const auto &First = *It;
4095 const auto &Second = *(It + 1);
4096
4097 if (Second.isMixed())
4098 MixedObjects.push_back(&Second);
4099
4100 if ((First.isSME() && Second.isCPU()) ||
4101 (First.isCPU() && Second.isSME())) {
4102 uint64_t Distance = static_cast<uint64_t>(Second.start() - First.end());
4103 if (Distance < HazardSize)
4104 HazardPairs.emplace_back(&First, &Second);
4105 }
4106 }
4107
4108 auto EmitRemark = [&](llvm::StringRef Str) {
4109 ORE->emit([&]() {
4110 auto R = MachineOptimizationRemarkAnalysis(
4111 "sme", "StackHazard", MF.getFunction().getSubprogram(), &MF.front());
4112 return R << formatv("stack hazard in '{0}': ", MF.getName()).str() << Str;
4113 });
4114 };
4115
4116 for (const auto &P : HazardPairs)
4117 EmitRemark(formatv("{0} is too close to {1}", *P.first, *P.second).str());
4118
4119 for (const auto *Obj : MixedObjects)
4120 EmitRemark(
4121 formatv("{0} accessed by both GP and FP instructions", *Obj).str());
4122}
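// The neighbour scan that emitRemarks uses to find hazard pairs can be
// sketched without any LLVM types. This is an illustrative simplification:
// Access and findHazardPairs are hypothetical names, offsets are plain
// integers, and each reported pair holds the start offsets of the two objects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, reduced view of a stack object and who accesses it.
struct Access {
  int64_t Start; // Offset of the object from SP.
  int64_t Size;  // Object size in bytes.
  bool IsCPU;    // Accessed by GPR or predicate loads/stores.
  bool IsSME;    // Accessed by FP/Neon/SVE loads/stores.
  int64_t end() const { return Start + Size; }
};

// Sort by start offset, then flag adjacent objects that are accessed from
// different sides and sit closer together than HazardSize bytes.
std::vector<std::pair<int64_t, int64_t>>
findHazardPairs(std::vector<Access> Accesses, uint64_t HazardSize) {
  std::sort(Accesses.begin(), Accesses.end(),
            [](const Access &A, const Access &B) { return A.Start < B.Start; });
  std::vector<std::pair<int64_t, int64_t>> Pairs;
  for (std::size_t I = 0; I + 1 < Accesses.size(); ++I) {
    const Access &First = Accesses[I];
    const Access &Second = Accesses[I + 1];
    bool Crosses = (First.IsSME && Second.IsCPU) ||
                   (First.IsCPU && Second.IsSME);
    if (!Crosses)
      continue;
    uint64_t Distance = static_cast<uint64_t>(Second.Start - First.end());
    if (Distance < HazardSize)
      Pairs.emplace_back(First.Start, Second.Start);
  }
  return Pairs;
}
```

// Only neighbours in sorted order are compared, so the scan is linear after
// the sort.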
bool hasCalls() const
Return true if the current function has any function calls.
bool isFrameAddressTaken() const
This method may be called any time after instruction selection is complete to determine if there is a...
void setObjectOffset(int ObjectIdx, int64_t SPOffset)
Set the stack frame offset of the specified object.
bool isCalleeSavedObjectIndex(int ObjectIdx) const
uint64_t getMaxCallFrameSize() const
Return the maximum size of a call frame that must be allocated for an outgoing function call.
bool hasPatchPoint() const
This method may be called any time after instruction selection is complete to determine if there is a...
bool hasScalableStackID(int ObjectIdx) const
int getStackProtectorIndex() const
Return the index for the stack protector object.
LLVM_ABI uint64_t estimateStackSize(const MachineFunction &MF) const
Estimate and return the size of the stack frame.
void setStackID(int ObjectIdx, uint8_t ID)
bool isCalleeSavedInfoValid() const
Has the callee saved info been calculated yet?
Align getObjectAlign(int ObjectIdx) const
Return the alignment of the specified stack object.
int64_t getObjectSize(int ObjectIdx) const
Return the size of the specified object.
bool isMaxCallFrameSizeComputed() const
bool hasStackMap() const
This method may be called any time after instruction selection is complete to determine if there is a...
LLVM_ABI int CreateSpillStackObject(uint64_t Size, Align Alignment, TargetStackID::Value StackID=TargetStackID::Default)
Create a new statically sized stack object that represents a spill slot, returning a nonnegative iden...
const std::vector< CalleeSavedInfo > & getCalleeSavedInfo() const
Returns a reference to the callee-saved info vector for the current function.
unsigned getNumObjects() const
Return the number of objects.
int getObjectIndexEnd() const
Return one past the maximum frame object index.
bool hasStackProtectorIndex() const
bool hasStackObjects() const
Return true if there are any stack objects in this function.
uint8_t getStackID(int ObjectIdx) const
unsigned getNumFixedObjects() const
Return the number of fixed objects.
void setIsCalleeSavedObjectIndex(int ObjectIdx, bool IsCalleeSaved)
int64_t getObjectOffset(int ObjectIdx) const
Return the assigned stack offset of the specified object from the incoming stack pointer.
int getObjectIndexBegin() const
Return the minimum frame object index.
void setObjectAlignment(int ObjectIdx, Align Alignment)
setObjectAlignment - Change the alignment of the specified stack object.
bool isDeadObjectIndex(int ObjectIdx) const
Returns true if the specified index corresponds to a dead object.
const WinEHFuncInfo * getWinEHFuncInfo() const
getWinEHFuncInfo - Return information about how the current function uses Windows exception handling.
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineFrameInfo & getFrameInfo()
getFrameInfo - Return the frame info object for the current function.
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
Function & getFunction()
Return the LLVM function that this machine code represents.
BasicBlockListType::iterator iterator
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
const MachineBasicBlock & front() const
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineBasicBlock - Allocate a new MachineBasicBlock.
void insert(iterator MBBI, MachineBasicBlock *MBB)
const TargetMachine & getTarget() const
getTarget - Return the target machine this machine code is compiled with
const MachineInstrBuilder & setMemRefs(ArrayRef< MachineMemOperand * > MMOs) const
const MachineInstrBuilder & addExternalSymbol(const char *FnName, unsigned TargetFlags=0) const
const MachineInstrBuilder & addReg(Register RegNo, RegState Flags={}, unsigned SubReg=0) const
Add a new virtual register operand.
const MachineInstrBuilder & setMIFlag(MachineInstr::MIFlag Flag) const
const MachineInstrBuilder & addImm(int64_t Val) const
Add a new immediate operand.
const MachineInstrBuilder & addFrameIndex(int Idx) const
const MachineInstrBuilder & addRegMask(const uint32_t *Mask) const
const MachineInstrBuilder & addMBB(MachineBasicBlock *MBB, unsigned TargetFlags=0) const
const MachineInstrBuilder & addDef(Register RegNo, RegState Flags={}, unsigned SubReg=0) const
Add a virtual register definition operand.
const MachineInstrBuilder & setMIFlags(unsigned Flags) const
const MachineInstrBuilder & addMemOperand(MachineMemOperand *MMO) const
Representation of each machine instruction.
void setFlags(unsigned flags)
uint32_t getFlags() const
Return the MI flags bitvector.
LLVM_ABI MachineInstrBundleIterator< MachineInstr > eraseFromParent()
Unlink 'this' from the containing basic block and delete it.
A description of a memory reference used in the backend.
const PseudoSourceValue * getPseudoValue() const
@ MOVolatile
The memory access is volatile.
@ MOLoad
The memory access reads data.
@ MOStore
The memory access writes data.
const Value * getValue() const
Return the base address of the memory access.
MachineOperand class - Representation of each machine instruction operand.
int64_t getImm() const
bool isFI() const
isFI - Tests if this is a MO_FrameIndex operand.
LLVM_ABI void emit(DiagnosticInfoOptimizationBase &OptDiag)
Emit an optimization remark.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
LLVM_ABI void freezeReservedRegs()
freezeReservedRegs - Called by the register allocator to freeze the set of reserved registers before ...
bool isReserved(MCRegister PhysReg) const
isReserved - Returns true when PhysReg is a reserved register.
LLVM_ABI Register createVirtualRegister(const TargetRegisterClass *RegClass, StringRef Name="")
createVirtualRegister - Create and return a new virtual register in the function with the specified r...
LLVM_ABI bool isLiveIn(Register Reg) const
LLVM_ABI const MCPhysReg * getCalleeSavedRegs() const
Returns list of callee saved registers.
LLVM_ABI bool isPhysRegUsed(MCRegister PhysReg, bool SkipRegMaskTest=false) const
Return true if the specified register is modified or read in this function.
Represent a mutable reference to an array (0 or more elements consecutively in memory),...
Definition ArrayRef.h:294
Wrapper class representing virtual and physical registers.
Definition Register.h:20
constexpr bool isValid() const
Definition Register.h:112
SMEAttrs is a utility class to parse the SME ACLE attributes on functions.
bool hasStreamingInterface() const
bool hasNonStreamingInterfaceAndBody() const
bool hasStreamingBody() const
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition SetVector.h:151
A SetVector that performs no allocations if smaller than a certain size.
Definition SetVector.h:339
This class consists of common code factored out of the SmallVector class to reduce code duplication b...
reference emplace_back(ArgTypes &&... Args)
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
void push_back(const T &Elt)
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
StackOffset holds a fixed and a scalable offset in bytes.
Definition TypeSize.h:30
int64_t getFixed() const
Returns the fixed component of the stack offset.
Definition TypeSize.h:46
int64_t getScalable() const
Returns the scalable component of the stack offset.
Definition TypeSize.h:49
static StackOffset get(int64_t Fixed, int64_t Scalable)
Definition TypeSize.h:41
static StackOffset getScalable(int64_t Scalable)
Definition TypeSize.h:40
static StackOffset getFixed(int64_t Fixed)
Definition TypeSize.h:39
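The StackOffset entries above describe an offset with a fixed part and a scalable part. As a rough, standalone illustration (not the LLVM class itself), the sketch below models that idea with a hypothetical `SketchOffset` type and a hypothetical `resolveBytes` helper. It assumes the scalable part is multiplied by a runtime `vscale` factor to obtain bytes, which is how scalable sizes are usually interpreted for SVE.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for illustration only; this is NOT llvm::StackOffset.
struct SketchOffset {
  int64_t Fixed;    // bytes known at compile time
  int64_t Scalable; // bytes per unit of vscale (assumption of this sketch)

  constexpr SketchOffset(int64_t F, int64_t S) : Fixed(F), Scalable(S) {}

  // Named constructors mirroring the listed get/getFixed/getScalable shapes.
  static constexpr SketchOffset get(int64_t F, int64_t S) {
    return SketchOffset(F, S);
  }
  static constexpr SketchOffset getFixed(int64_t F) {
    return SketchOffset(F, 0);
  }
  static constexpr SketchOffset getScalable(int64_t S) {
    return SketchOffset(0, S);
  }

  // Offsets add component-wise; the two parts never mix until resolved.
  constexpr SketchOffset operator+(SketchOffset RHS) const {
    return SketchOffset(Fixed + RHS.Fixed, Scalable + RHS.Scalable);
  }
};

// Hypothetical helper: collapse to a byte count once vscale is known.
constexpr int64_t resolveBytes(SketchOffset Off, int64_t VScale) {
  return Off.Fixed + Off.Scalable * VScale;
}
```

Keeping the two components separate until the last moment is the point of the design: a frame can mix ordinary locals with scalable-vector spills, and only the hardware vector length decides the final byte distance.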
bool hasFP(const MachineFunction &MF) const
hasFP - Return true if the specified function should have a dedicated frame pointer register.
virtual void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS=nullptr) const
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
int getOffsetOfLocalArea() const
getOffsetOfLocalArea - This method returns the offset of the local area from the stack pointer on ent...
Align getStackAlign() const
getStackAlign - This method returns the number of bytes to which the stack pointer must be aligne...
StackDirection getStackGrowthDirection() const
getStackGrowthDirection - Return the direction the stack grows
virtual bool enableCFIFixup(const MachineFunction &MF) const
Returns true if we may need to fix the unwind information for the function.
Primary interface to the complete machine description for the target machine.
const Triple & getTargetTriple() const
const MCAsmInfo & getMCAsmInfo() const
Return target specific asm information.
TargetOptions Options
LLVM_ABI bool FramePointerIsReserved(const MachineFunction &MF) const
FramePointerIsReserved - This returns true if the frame pointer must always either point to a new fra...
LLVM_ABI bool DisableFramePointerElim(const MachineFunction &MF) const
DisableFramePointerElim - This returns true if frame pointer elimination optimization should be disab...
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
bool hasStackRealignment(const MachineFunction &MF) const
True if stack realignment is required and still possible.
virtual const TargetRegisterInfo * getRegisterInfo() const =0
Return the target's register information.
Triple - Helper class for working with autoconf configuration names.
Definition Triple.h:47
bool isOSBinFormatMachO() const
Tests whether the environment is MachO.
Definition Triple.h:786
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition raw_ostream.h:53
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
static unsigned getShiftValue(unsigned Imm)
getShiftValue - Extract the shift value.
static unsigned getArithExtendImm(AArch64_AM::ShiftExtendType ET, unsigned Imm)
getArithExtendImm - Encode the extend type and shift amount for an arithmetic instruction: imm: 3-bit...
const unsigned StackProbeMaxLoopUnroll
Maximum number of iterations to unroll for a constant size probing loop.
const unsigned StackProbeMaxUnprobedStack
Maximum allowed number of unprobed bytes above SP at an ABI boundary.
constexpr char Align[]
Key for Kernel::Arg::Metadata::mAlign.
constexpr char Attrs[]
Key for Kernel::Metadata::mAttrs.
unsigned ID
LLVM IR allows arbitrary numbers to be used as calling convention identifiers.
Definition CallingConv.h:24
@ AArch64_SVE_VectorCall
Used between AArch64 SVE functions.
@ PreserveMost
Used for runtime calls that preserve most registers.
Definition CallingConv.h:63
@ CXX_FAST_TLS
Used for access functions.
Definition CallingConv.h:72
@ GHC
Used by the Glasgow Haskell Compiler (GHC).
Definition CallingConv.h:50
@ PreserveAll
Used for runtime calls that preserve (almost) all registers.
Definition CallingConv.h:66
@ Fast
Attempts to make calls as fast as possible (e.g.
Definition CallingConv.h:41
@ PreserveNone
Used for runtime calls that preserve no general registers.
Definition CallingConv.h:90
@ Win64
The C convention as implemented on Windows/x86-64 and AArch64.
@ SwiftTail
This follows the Swift calling convention in how arguments are passed but guarantees tail calls will ...
Definition CallingConv.h:87
@ C
The default llvm calling convention, compatible with C.
Definition CallingConv.h:34
initializer< Ty > init(const Ty &Val)
NodeAddr< InstrNode * > Instr
Definition RDFGraph.h:389
BaseReg
Stack frame base register. Bit 0 of FREInfo.Info.
Definition SFrame.h:77
This is an optimization pass for GlobalISel generic memory operations.
@ Offset
Definition DWP.cpp:573
void stable_sort(R &&Range)
Definition STLExtras.h:2116
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
int isAArch64FrameOffsetLegal(const MachineInstr &MI, StackOffset &Offset, bool *OutUseUnscaledOp=nullptr, unsigned *OutUnscaledOp=nullptr, int64_t *EmittableOffset=nullptr)
Check if the Offset is a valid frame offset for MI.
RegState
Flags to represent properties of register accesses.
@ Define
Register definition.
constexpr RegState getKillRegState(bool B)
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:643
@ AArch64FrameOffsetCannotUpdate
Offset cannot apply.
constexpr T alignDown(U Value, V Align, W Skew=0)
Returns the largest unsigned integer that is less than or equal to Value and congruent to Skew mod Align.
Definition MathExtras.h:546
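The alignDown description is easier to read with numbers. The following is a minimal standalone sketch of the arithmetic it describes, using a hypothetical `alignDownSketch` function rather than the LLVM template. It assumes `Align` is non-zero and that `Value` is at least `Skew % Align`, so the subtraction does not wrap.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration of the alignDown contract; not the LLVM template.
// Precondition (assumed): Align != 0 and Value >= Skew % Align.
constexpr uint64_t alignDownSketch(uint64_t Value, uint64_t Align,
                                   uint64_t Skew = 0) {
  // Remove the skew, round down to a multiple of Align, then restore the skew.
  return (Value - Skew % Align) / Align * Align + Skew % Align;
}
```

For example, with Value = 37 and Align = 16 the result is 32, and with Skew = 3 the result is 35, which is the largest value not above 37 that leaves remainder 3 when divided by 16.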
auto dyn_cast_or_null(const Y &Val)
Definition Casting.h:753
bool any_of(R &&range, UnaryPredicate P)
Provide wrappers to std::any_of which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1746
auto formatv(bool Validate, const char *Fmt, Ts &&...Vals)
auto reverse(ContainerTy &&C)
Definition STLExtras.h:407
void sort(IteratorTy Start, IteratorTy End)
Definition STLExtras.h:1636
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition Debug.cpp:209
void emitFrameOffset(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, unsigned DestReg, unsigned SrcReg, StackOffset Offset, const TargetInstrInfo *TII, MachineInstr::MIFlag=MachineInstr::NoFlags, bool SetNZCV=false, bool NeedsWinCFI=false, bool *HasWinCFI=nullptr, bool EmitCFAOffset=false, StackOffset InitialOffset={}, unsigned FrameReg=AArch64::SP)
emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg plus Offset.
LLVM_ABI void report_fatal_error(Error Err, bool gen_crash_diag=true)
Definition Error.cpp:163
constexpr uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition Alignment.h:144
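As a small worked example of the rounding that alignTo describes, the sketch below uses a hypothetical `alignToSketch` function over plain integers. The listed alignTo takes an Align value (a non-zero power of two); this sketch only assumes a non-zero alignment and is meant to show the arithmetic, not the LLVM implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration of rounding a size up to an alignment boundary.
// Precondition (assumed): A != 0 and Size + A - 1 does not overflow.
constexpr uint64_t alignToSketch(uint64_t Size, uint64_t A) {
  // Adding A - 1 before the truncating division rounds up instead of down.
  return (Size + A - 1) / A * A;
}
```

With a 16-byte alignment, a size of 17 bytes rounds up to 32, while sizes of 0 and 16 are already aligned and are returned unchanged.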
constexpr RegState getDefRegState(bool B)
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
@ First
Helpers to iterate all locations in the MemoryEffectsBase class.
Definition ModRef.h:74
uint16_t MCPhysReg
An unsigned integer type large enough to represent all physical registers, but not necessarily virtua...
Definition MCRegister.h:21
RelativeUniformCounterPtr ValuesPtrExpr VTableAddr Count
Definition InstrProf.h:145
raw_ostream & operator<<(raw_ostream &OS, const APFixedPoint &FX)
auto count_if(R &&Range, UnaryPredicate P)
Wrapper function around std::count_if to count the number of times an element satisfying a given pred...
Definition STLExtras.h:2019
auto find_if(R &&Range, UnaryPredicate P)
Provide wrappers to std::find_if which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1772
void erase_if(Container &C, UnaryPredicate P)
Provide a container algorithm similar to C++ Library Fundamentals v2's erase_if which is equivalent t...
Definition STLExtras.h:2192
bool is_contained(R &&Range, const E &Element)
Returns true if Element is found in Range.
Definition STLExtras.h:1947
LLVM_ABI const Value * getUnderlyingObject(const Value *V, unsigned MaxLookup=MaxLookupSearchDepth)
This method strips off any GEP address adjustments, pointer casts or llvm.threadlocal....
void fullyRecomputeLiveIns(ArrayRef< MachineBasicBlock * > MBBs)
Convenience function for recomputing live-ins for a set of MBBs until the computation converges.
LLVM_ABI Printable printReg(Register Reg, const TargetRegisterInfo *TRI=nullptr, unsigned SubIdx=0, const MachineRegisterInfo *MRI=nullptr)
Prints virtual and physical registers with or without a TRI instance.
void swap(llvm::BitVector &LHS, llvm::BitVector &RHS)
Implement std::swap in terms of BitVector swap.
Definition BitVector.h:862
bool operator<(const StackAccess &Rhs) const
void print(raw_ostream &OS) const
int64_t start() const
std::string getTypeString() const
int64_t end() const
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77
Pair of physical register and lane mask.
static LLVM_ABI MachinePointerInfo getUnknownStack(MachineFunction &MF)
Stack memory without other information.
static LLVM_ABI MachinePointerInfo getFixedStack(MachineFunction &MF, int FI, int64_t Offset=0)
Return a MachinePointerInfo record that refers to the specified FrameIndex.
SmallVector< WinEHTryBlockMapEntry, 4 > TryBlockMap
SmallVector< WinEHHandlerType, 1 > HandlerArray