LLVM 23.0.0git
AArch64FrameLowering.cpp
1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function so that the particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// |                                   | Higher address
22// |-----------------------------------|
23// |                                   |
24// | arguments passed on the stack     |
25// |                                   |
26// |-----------------------------------| <- sp
27// |                                   | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) isn't created until the main
33// function body, after the prologue has run. However, it's depicted here
34// for completeness.
35//
36// |                                   | Higher address
37// |-----------------------------------|
38// |                                   |
39// | arguments passed on the stack     |
40// |                                   |
41// |-----------------------------------|
42// |                                   |
43// | (Win64 only) varargs from reg     |
44// |                                   |
45// |-----------------------------------|
46// |                                   |
47// | (Win64 only) callee-saved SVE reg |
48// |                                   |
49// |-----------------------------------|
50// |                                   |
51// | callee-saved gpr registers        | <--.
52// |                                   |    | On Darwin platforms these
53// |- - - - - - - - - - - - - - - - - -|    | callee saves are swapped,
54// | prev_lr                           |    | (frame record first)
55// | prev_fp                           | <--'
56// | async context if needed           |
57// | (a.k.a. "frame record")           |
58// |-----------------------------------| <- fp(=x29)
59//       Default SVE stack layout                   Split SVE objects
60//   (aarch64-split-sve-objects=false)      (aarch64-split-sve-objects=true)
61// |-----------------------------------|  |-----------------------------------|
62// |         <hazard padding>          |  | callee-saved PPR registers        |
63// |-----------------------------------|  |-----------------------------------|
64// |                                   |  | PPR stack objects                 |
65// | callee-saved fp/simd/SVE regs     |  |-----------------------------------|
66// |                                   |  |         <hazard padding>          |
67// |-----------------------------------|  |-----------------------------------|
68// |                                   |  | callee-saved ZPR/FPR registers    |
69// |         SVE stack objects         |  |-----------------------------------|
70// |                                   |  | ZPR stack objects                 |
71// |-----------------------------------|  |-----------------------------------|
72//                                        ^ NB: FPR CSRs are promoted to ZPRs
73// |-----------------------------------|
74// |.empty.space.to.make.part.below....|
75// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
76// |.the.standard.16-byte.alignment....|  compile time; if present)
77// |-----------------------------------|
78// | local variables of fixed size     |
79// | including spill slots             |
80// |   <FPR>                           |
81// |   <hazard padding>                |
82// |   <GPR>                           |
83// |-----------------------------------| <- bp(not defined by ABI,
84// |.variable-sized.local.variables....|       LLVM chooses X19)
85// |.(VLAs)............................| (size of this area is unknown at
86// |...................................|  compile time)
87// |-----------------------------------| <- sp
88// |                                   | Lower address
89//
90//
91// To access data in a frame, a constant offset from one of the pointers
92// (fp, bp, sp) must be computable at compile time. The size of the areas
93// with a dotted background cannot be computed at compile time if they are
94// present, so all three of fp, bp and sp must be set up to access the
95// contents of every frame area, assuming all of the frame areas are
96// non-empty.
97//
98// For most functions, some of the frame areas are empty. For those functions,
99// it may not be necessary to set up fp or bp:
100// * A base pointer is definitely needed when there are both VLAs and local
101// variables with more-than-default alignment requirements.
102// * A frame pointer is definitely needed when there are local variables with
103// more-than-default alignment requirements.
104//
105// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
106// callee-saved area, since the unwind encoding does not allow for encoding
107// this dynamically and existing tools depend on this layout. For other
108// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
109// area to allow SVE stack objects (allocated directly below the callee-saves,
110// if available) to be accessed directly from the framepointer.
111// The SVE spill/fill instructions have VL-scaled addressing modes such
112// as:
113// ldr z8, [fp, #-7 mul vl]
114// For SVE the size of the vector length (VL) is not known at compile-time, so
115// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
116// layout, we don't need to add an unscaled offset to the framepointer before
117// accessing the SVE object in the frame.
118//
119// In some cases when a base pointer is not strictly needed, it is generated
120// anyway when offsets from the frame pointer to access local variables become
121// so large that the offset can't be encoded in the immediate fields of loads
122// or stores.
123//
124// Outgoing function arguments must be at the bottom of the stack frame when
125// calling another function. If we do not have variable-sized stack objects, we
126// can allocate a "reserved call frame" area at the bottom of the local
127// variable area, large enough for all outgoing calls. If we do have VLAs, then
128// the stack pointer must be decremented and incremented around each call to
129// make space for the arguments below the VLAs.
130//
131// FIXME: also explain the redzone concept.
132//
133// About stack hazards: Under some SME contexts, a coprocessor with its own
134// separate cache can be used for FP operations. This can create hazards if the CPU
135// and the SME unit try to access the same area of memory, including if the
136// access is to an area of the stack. To try to alleviate this we attempt to
137// introduce extra padding into the stack frame between FP and GPR accesses,
138// controlled by the aarch64-stack-hazard-size option. Without changing the
139// layout of the stack frame in the diagram above, a stack object of size
140// aarch64-stack-hazard-size is added between GPR and FPR CSRs. Another is added
141// to the stack objects section, and stack objects are sorted so that FPR >
142// Hazard padding slot > GPRs (where possible). Unfortunately some things are
143// not handled well (VLA area, arguments on the stack, objects with both GPR and
144// FPR accesses), but if those are controlled by the user then the entire stack
145// frame becomes GPR at the start/end with FPR in the middle, surrounded by
146// Hazard padding.
147//
148// An example of the prologue:
149//
150// .globl __foo
151// .align 2
152// __foo:
153// Ltmp0:
154// .cfi_startproc
155// .cfi_personality 155, ___gxx_personality_v0
156// Leh_func_begin:
157// .cfi_lsda 16, Lexception33
158//
159// stp xa, xb, [sp, #-offset]!
160// ...
161// stp x28, x27, [sp, #offset-32]
162// stp fp, lr, [sp, #offset-16]
163// add fp, sp, #offset - 16
164// sub sp, sp, #1360
165//
166// The Stack:
167//       +-------------------------------------------+
168// 10000 | ........ | ........ | ........ | ........ |
169// 10004 | ........ | ........ | ........ | ........ |
170//       +-------------------------------------------+
171// 10008 | ........ | ........ | ........ | ........ |
172// 1000c | ........ | ........ | ........ | ........ |
173//       +===========================================+
174// 10010 |                X28 Register               |
175// 10014 |                X28 Register               |
176//       +-------------------------------------------+
177// 10018 |                X27 Register               |
178// 1001c |                X27 Register               |
179//       +===========================================+
180// 10020 |                Frame Pointer              |
181// 10024 |                Frame Pointer              |
182//       +-------------------------------------------+
183// 10028 |                Link Register              |
184// 1002c |                Link Register              |
185//       +===========================================+
186// 10030 | ........ | ........ | ........ | ........ |
187// 10034 | ........ | ........ | ........ | ........ |
188//       +-------------------------------------------+
189// 10038 | ........ | ........ | ........ | ........ |
190// 1003c | ........ | ........ | ........ | ........ |
191//       +-------------------------------------------+
192//
193// [sp] = 10030 :: >>initial value<<
194// sp = 10020 :: stp fp, lr, [sp, #-16]!
195// fp = sp == 10020 :: mov fp, sp
196// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
197// sp == 10010 :: >>final value<<
198//
199// The frame pointer (w29) points to address 10020. If we use an offset of
200// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
201// for w27, and -32 for w28:
202//
203// Ltmp1:
204// .cfi_def_cfa w29, 16
205// Ltmp2:
206// .cfi_offset w30, -8
207// Ltmp3:
208// .cfi_offset w29, -16
209// Ltmp4:
210// .cfi_offset w27, -24
211// Ltmp5:
212// .cfi_offset w28, -32
213//
214//===----------------------------------------------------------------------===//
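// Illustration only (not part of this file): the VL-scaled addressing used by
// SVE spill/fill instructions, as described in the header comment, can be
// sketched as plain arithmetic. The helper name sveScaledAddress and its
// parameters are hypothetical. The sketch assumes that, for a Z-register
// access of the form "[base, #imm mul vl]", the immediate is multiplied by the
// runtime vector length in bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an LLVM API: effective address of an SVE
// Z-register access "[Base, #ImmMulVL mul vl]". VLBytes is the runtime
// vector length in bytes, which is unknown at compile time.
inline int64_t sveScaledAddress(int64_t Base, int64_t ImmMulVL,
                                int64_t VLBytes) {
  assert(VLBytes > 0 && VLBytes % 16 == 0 &&
         "SVE vector length is a multiple of 128 bits");
  return Base + ImmMulVL * VLBytes;
}
```

// The same encoded offset (#-7 mul vl) resolves to a different byte distance
// from fp depending on the vector length of the machine running the code,
// which is why no fixed-size object may sit between fp and the SVE area.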
215
216#include "AArch64FrameLowering.h"
217#include "AArch64InstrInfo.h"
220#include "AArch64RegisterInfo.h"
221#include "AArch64SMEAttributes.h"
222#include "AArch64Subtarget.h"
225#include "llvm/ADT/ScopeExit.h"
226#include "llvm/ADT/SmallVector.h"
244#include "llvm/IR/Attributes.h"
245#include "llvm/IR/CallingConv.h"
246#include "llvm/IR/DataLayout.h"
247#include "llvm/IR/DebugLoc.h"
248#include "llvm/IR/Function.h"
249#include "llvm/MC/MCAsmInfo.h"
250#include "llvm/MC/MCDwarf.h"
252#include "llvm/Support/Debug.h"
259#include <cassert>
260#include <cstdint>
261#include <iterator>
262#include <optional>
263#include <vector>
264
265using namespace llvm;
266
267#define DEBUG_TYPE "frame-info"
268
269static cl::opt<bool> EnableRedZone("aarch64-redzone",
270 cl::desc("enable use of redzone on AArch64"),
271 cl::init(false), cl::Hidden);
272
274 "stack-tagging-merge-settag",
275 cl::desc("merge settag instruction in function epilog"), cl::init(true),
276 cl::Hidden);
277
278static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
279 cl::desc("sort stack allocations"),
280 cl::init(true), cl::Hidden);
281
282static cl::opt<bool>
283 SplitSVEObjects("aarch64-split-sve-objects",
284 cl::desc("Split allocation of ZPR & PPR objects"),
285 cl::init(true), cl::Hidden);
286
288 "homogeneous-prolog-epilog", cl::Hidden,
289 cl::desc("Emit homogeneous prologue and epilogue for the size "
290 "optimization (default = off)"));
291
292// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
294 StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
295 cl::Hidden);
296// Whether to insert padding into non-streaming functions (for testing).
297static cl::opt<bool>
298 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
299 cl::init(false), cl::Hidden);
300
302 "aarch64-disable-multivector-spill-fill",
303 cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false),
304 cl::Hidden);
305
306int64_t
308 MachineBasicBlock &MBB) const {
309 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
311 bool IsTailCallReturn = (MBB.end() != MBBI)
313 : false;
314
315 int64_t ArgumentPopSize = 0;
316 if (IsTailCallReturn) {
317 MachineOperand &StackAdjust = MBBI->getOperand(1);
318
319 // For a tail-call in a callee-pops-arguments environment, some or all of
320 // the stack may actually be in use for the call's arguments, this is
321 // calculated during LowerCall and consumed here...
322 ArgumentPopSize = StackAdjust.getImm();
323 } else {
324 // ... otherwise the amount to pop is *all* of the argument space,
325 // conveniently stored in the MachineFunctionInfo by
326 // LowerFormalArguments. This will, of course, be zero for the C calling
327 // convention.
328 ArgumentPopSize = AFI->getArgumentStackToRestore();
329 }
330
331 return ArgumentPopSize;
332}
333
335 MachineFunction &MF);
336
337enum class AssignObjectOffsets { No, Yes };
338/// Process all the SVE stack objects and determine the SVE stack size and
339/// offsets for each object. If AssignOffsets is "Yes", the offsets get assigned
340/// (and SVE stack sizes set). Returns the size of the SVE stack.
342 AssignObjectOffsets AssignOffsets);
343
344static unsigned getStackHazardSize(const MachineFunction &MF) {
345 return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
346}
347
353
356 // With split SVE objects, the hazard padding is added to the PPR region,
357 // which places it between the [GPR, PPR] area and the [ZPR, FPR] area. This
358 // avoids hazards between both GPRs and FPRs and ZPRs and PPRs.
361 : 0,
362 AFI->getStackSizePPR());
363}
364
365// Conservatively, returns true if the function is likely to have SVE vectors
366// on the stack. This function is safe to be called before callee-saves or
367// object offsets have been determined.
369 const MachineFunction &MF) {
370 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
371 if (AFI->isSVECC())
372 return true;
373
374 if (AFI->hasCalculatedStackSizeSVE())
375 return bool(AFL.getSVEStackSize(MF));
376
377 const MachineFrameInfo &MFI = MF.getFrameInfo();
378 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd(); FI++) {
379 if (MFI.hasScalableStackID(FI))
380 return true;
381 }
382
383 return false;
384}
385
386static bool isTargetWindows(const MachineFunction &MF) {
387 return MF.getTarget().getMCAsmInfo().usesWindowsCFI();
388}
389
391 const MachineFunction &MF) const {
392 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
393 return isTargetWindows(MF) && AFI->getSVECalleeSavedStackSize();
394}
395
396/// Returns true if a homogeneous prolog or epilog code can be emitted
397/// for the size optimization. If possible, a frame helper call is injected.
398/// When Exit block is given, this check is for epilog.
399bool AArch64FrameLowering::homogeneousPrologEpilog(
400 MachineFunction &MF, MachineBasicBlock *Exit) const {
401 if (!MF.getFunction().hasMinSize())
402 return false;
404 return false;
405 if (EnableRedZone)
406 return false;
407
408 // TODO: Windows is not supported yet.
409 if (isTargetWindows(MF))
410 return false;
411
412 // TODO: SVE is not supported yet.
413 if (isLikelyToHaveSVEStack(*this, MF))
414 return false;
415
416 // Bail on stack adjustment needed on return for simplicity.
417 const MachineFrameInfo &MFI = MF.getFrameInfo();
418 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
419 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
420 return false;
421 if (Exit && getArgumentStackToRestore(MF, *Exit))
422 return false;
423
424 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
425 if (AFI->hasSwiftAsyncContext() || AFI->hasStreamingModeChanges())
426 return false;
427
428 // If there are an odd number of GPRs before LR and FP in the CSRs list,
429 // they will not be paired into one RegPairInfo, which is incompatible with
430 // the assumption made by the homogeneous prolog epilog pass.
431 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
432 unsigned NumGPRs = 0;
433 for (unsigned I = 0; CSRegs[I]; ++I) {
434 Register Reg = CSRegs[I];
435 if (Reg == AArch64::LR) {
436 assert(CSRegs[I + 1] == AArch64::FP);
437 if (NumGPRs % 2 != 0)
438 return false;
439 break;
440 }
441 if (AArch64::GPR64RegClass.contains(Reg))
442 ++NumGPRs;
443 }
444
445 return true;
446}
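
// Illustration only (not part of this file): the pairing check at the end of
// homogeneousPrologEpilog() can be sketched on a simplified register list.
// RegKind and gprsBeforeLRArePairable are hypothetical stand-ins; the real
// code walks a null-terminated MCPhysReg array and tests
// AArch64::GPR64RegClass.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification of callee-saved registers for this sketch.
enum class RegKind { GPR, FPR, LR, FP };

// Returns true if the GPRs that appear before LR in the callee-saved list
// can all be grouped into pairs, i.e. their count is even. Registers after
// LR are not inspected, mirroring the early exit in the function above.
inline bool gprsBeforeLRArePairable(const std::vector<RegKind> &CSRs) {
  unsigned NumGPRs = 0;
  for (RegKind R : CSRs) {
    if (R == RegKind::LR)
      return NumGPRs % 2 == 0;
    if (R == RegKind::GPR)
      ++NumGPRs;
  }
  return true; // No LR in the list, so there is nothing to reject.
}
```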
447
448/// Returns true if CSRs should be paired.
449bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
450 return produceCompactUnwindFrame(*this, MF) || homogeneousPrologEpilog(MF);
451}
452
453/// This is the biggest offset to the stack pointer we can encode in aarch64
454/// instructions (without using a separate calculation and a temp register).
455/// Note that the exceptions here are vector stores/loads, which cannot encode any
456/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
457static const unsigned DefaultSafeSPDisplacement = 255;
458
459/// Look at each instruction that references stack frames and return the stack
460/// size limit beyond which some of these instructions will require a scratch
461/// register during their expansion later.
463 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
464 // range. We'll end up allocating an unnecessary spill slot a lot, but
465 // realistically that's not a big deal at this stage of the game.
466 for (MachineBasicBlock &MBB : MF) {
467 for (MachineInstr &MI : MBB) {
468 if (MI.isDebugInstr() || MI.isPseudo() ||
469 MI.getOpcode() == AArch64::ADDXri ||
470 MI.getOpcode() == AArch64::ADDSXri)
471 continue;
472
473 for (const MachineOperand &MO : MI.operands()) {
474 if (!MO.isFI())
475 continue;
476
478 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
480 return 0;
481 }
482 }
483 }
485}
486
491
492unsigned
493AArch64FrameLowering::getFixedObjectSize(const MachineFunction &MF,
494 const AArch64FunctionInfo *AFI,
495 bool IsWin64, bool IsFunclet) const {
496 assert(AFI->getTailCallReservedStack() % 16 == 0 &&
497 "Tail call reserved stack must be aligned to 16 bytes");
498 if (!IsWin64 || IsFunclet) {
499 return AFI->getTailCallReservedStack();
500 } else {
501 if (AFI->getTailCallReservedStack() != 0 &&
502 !MF.getFunction().getAttributes().hasAttrSomewhere(
503 Attribute::SwiftAsync))
504 report_fatal_error("cannot generate ABI-changing tail call for Win64");
505 unsigned FixedObjectSize = AFI->getTailCallReservedStack();
506
507 // Var args are stored here in the primary function.
508 FixedObjectSize += AFI->getVarArgsGPRSize();
509
510 if (MF.hasEHFunclets()) {
511 // Catch objects are stored here in the primary function.
512 const MachineFrameInfo &MFI = MF.getFrameInfo();
513 const WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
514 SmallSetVector<int, 8> CatchObjFrameIndices;
515 for (const WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
516 for (const WinEHHandlerType &H : TBME.HandlerArray) {
517 int FrameIndex = H.CatchObj.FrameIndex;
518 if ((FrameIndex != INT_MAX) &&
519 CatchObjFrameIndices.insert(FrameIndex)) {
520 FixedObjectSize = alignTo(FixedObjectSize,
521 MFI.getObjectAlign(FrameIndex).value()) +
522 MFI.getObjectSize(FrameIndex);
523 }
524 }
525 }
526 // To support EH funclets we allocate an UnwindHelp object
527 FixedObjectSize += 8;
528 }
529 return alignTo(FixedObjectSize, 16);
530 }
531}
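
// Illustration only (not part of this file): the Win64 branch of
// getFixedObjectSize() accumulates sizes with alignment padding. The sketch
// below reproduces that arithmetic with hypothetical names (alignUp,
// CatchObject, win64FixedObjectSize) and assumes the catch-object list has
// already been de-duplicated, which the real code does with a SmallSetVector.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Rounds Value up to the next multiple of Align (Align must be non-zero).
inline uint64_t alignUp(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Hypothetical description of one catch object: its size and alignment.
struct CatchObject {
  uint64_t Size;
  uint64_t Align;
};

// Sketch of the Win64 fixed-object size computation for a primary function.
inline uint64_t win64FixedObjectSize(uint64_t TailCallReserved,
                                     uint64_t VarArgsGPRSize,
                                     bool HasEHFunclets,
                                     const std::vector<CatchObject> &Objs) {
  uint64_t Size = TailCallReserved + VarArgsGPRSize;
  if (HasEHFunclets) {
    // Each catch object is placed at its own alignment.
    for (const CatchObject &O : Objs)
      Size = alignUp(Size, O.Align) + O.Size;
    Size += 8; // Room for the UnwindHelp object.
  }
  return alignUp(Size, 16); // The whole area stays 16-byte aligned.
}
```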
532
534 if (!EnableRedZone)
535 return false;
536
537 // Don't use the red zone if the function explicitly asks us not to.
538 // This is typically used for kernel code.
539 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
540 const unsigned RedZoneSize =
542 if (!RedZoneSize)
543 return false;
544
545 const MachineFrameInfo &MFI = MF.getFrameInfo();
547 uint64_t NumBytes = AFI->getLocalStackSize();
548
549 // If neither NEON nor SVE is available, a COPY from one Q-reg to
550 // another requires a spill -> reload sequence. We can do that
551 // using a pre-decrementing store/post-incrementing load, but
552 // if we do so, we can't use the Red Zone.
553 bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
554 !Subtarget.isNeonAvailable() &&
555 !Subtarget.hasSVE();
556
557 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
558 AFI->hasSVEStackSize() || LowerQRegCopyThroughMem);
559}
560
561/// hasFPImpl - Return true if the specified function should have a dedicated
562/// frame pointer register.
564 const MachineFrameInfo &MFI = MF.getFrameInfo();
565 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
567
568 // Win64 EH requires a frame pointer if funclets are present, as the locals
569 // are accessed off the frame pointer in both the parent function and the
570 // funclets.
571 if (MF.hasEHFunclets())
572 return true;
573 // Retain behavior of always omitting the FP for leaf functions when possible.
575 return true;
576 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
577 MFI.hasStackMap() || MFI.hasPatchPoint() ||
578 RegInfo->hasStackRealignment(MF))
579 return true;
580
581 // If we:
582 //
583 // 1. Have streaming mode changes
584 // OR:
585 // 2. Have a streaming body with SVE stack objects
586 //
587 // Then the value of VG restored when unwinding to this function may not match
588 // the value of VG used to set up the stack.
589 //
590 // This is a problem as the CFA can be described with an expression of the
591 // form: CFA = SP + NumBytes + VG * NumScalableBytes.
592 //
593 // If the value of VG used in that expression does not match the value used to
594 // set up the stack, an incorrect address for the CFA will be computed, and
595 // unwinding will fail.
596 //
597 // We work around this issue by ensuring the frame-pointer can describe the
598 // CFA in either of these cases.
599 if (AFI.needsDwarfUnwindInfo(MF) &&
602 return true;
603 // With large callframes around we may need to use FP to access the scavenging
604 // emergency spillslot.
605 //
606 // Unfortunately some calls to hasFP() like machine verifier ->
607 // getReservedReg() -> hasFP in the middle of global isel are too early
608 // to know the max call frame size. Hopefully conservatively returning "true"
609 // in those cases is fine.
610 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
611 if (!MFI.isMaxCallFrameSizeComputed() ||
613 return true;
614
615 return false;
616}
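
// Illustration only (not part of this file): the VG mismatch problem
// described in hasFPImpl() follows directly from the CFA expression
// CFA = SP + NumBytes + VG * NumScalableBytes. The helpers computeCFA and
// cfaMatches are hypothetical, and the numbers used with them are made up
// for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: evaluates CFA = SP + NumBytes + VG * NumScalableBytes.
inline uint64_t computeCFA(uint64_t SP, uint64_t NumBytes, uint64_t VG,
                           uint64_t NumScalableBytes) {
  return SP + NumBytes + VG * NumScalableBytes;
}

// Returns true if an unwinder that observes UnwindVG recovers the same CFA
// as the prologue that laid out the frame using SetupVG.
inline bool cfaMatches(uint64_t SP, uint64_t NumBytes,
                       uint64_t NumScalableBytes, uint64_t SetupVG,
                       uint64_t UnwindVG) {
  return computeCFA(SP, NumBytes, SetupVG, NumScalableBytes) ==
         computeCFA(SP, NumBytes, UnwindVG, NumScalableBytes);
}
```

// When the scalable part is non-zero and the two VG values differ, the
// SP-based expression yields the wrong CFA. Describing the CFA relative to
// the frame pointer avoids the VG term altogether.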
617
618/// Should the Frame Pointer be reserved for the current function?
620 const TargetMachine &TM = MF.getTarget();
621 const Triple &TT = TM.getTargetTriple();
622
623 // These OSes require the frame chain is valid, even if the current frame does
624 // not use a frame pointer.
625 if (TT.isOSDarwin() || TT.isOSWindows())
626 return true;
627
628 // If the function has a frame pointer, it is reserved.
629 if (hasFP(MF))
630 return true;
631
632 // Frontend has requested to preserve the frame pointer.
634 return true;
635
636 return false;
637}
638
639/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
640/// not required, we reserve argument space for call sites in the function
641/// immediately on entry to the current function. This eliminates the need for
642/// add/sub sp brackets around call sites. Returns true if the call frame is
643/// included as part of the stack frame.
645 const MachineFunction &MF) const {
646 // The stack probing code for the dynamically allocated outgoing arguments
647 // area assumes that the stack is probed at the top - either by the prologue
648 // code, which issues a probe if `hasVarSizedObjects` returns true, or by the
649 // most recent variable-sized object allocation. Changing the condition here
650 // may need to be followed up by changes to the probe issuing logic.
651 return !MF.getFrameInfo().hasVarSizedObjects();
652}
653
657
658 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
659 const AArch64InstrInfo *TII = Subtarget.getInstrInfo();
660 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
661 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
662 DebugLoc DL = I->getDebugLoc();
663 unsigned Opc = I->getOpcode();
664 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
665 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
666
667 if (!hasReservedCallFrame(MF)) {
668 int64_t Amount = I->getOperand(0).getImm();
669 Amount = alignTo(Amount, getStackAlign());
670 if (!IsDestroy)
671 Amount = -Amount;
672
673 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
674 // doesn't have to pop anything), then the first operand will be zero too so
675 // this adjustment is a no-op.
676 if (CalleePopAmount == 0) {
677 // FIXME: in-function stack adjustment for calls is limited to 24-bits
678 // because there's no guaranteed temporary register available.
679 //
680 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
681 // 1) For offset <= 12-bit, we use LSL #0
682 // 2) For 12-bit < offset <= 24-bit, we use two instructions. One uses
683 // LSL #0, and the other uses LSL #12.
684 //
685 // Most call frames will be allocated at the start of a function so
686 // this is OK, but it is a limitation that needs dealing with.
687 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
688
689 if (TLI->hasInlineStackProbe(MF) &&
691 // When stack probing is enabled, the decrement of SP may need to be
692 // probed. We only need to do this if the call site needs 1024 bytes of
693 // space or more, because a region smaller than that is allowed to be
694 // unprobed at an ABI boundary. We rely on the fact that SP has been
695 // probed exactly at this point, either by the prologue or most recent
696 // dynamic allocation.
698 "non-reserved call frame without var sized objects?");
699 Register ScratchReg =
700 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
701 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
702 } else {
703 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
704 StackOffset::getFixed(Amount), TII);
705 }
706 }
707 } else if (CalleePopAmount != 0) {
708 // If the calling convention demands that the callee pops arguments from the
709 // stack, we want to add it back if we have a reserved call frame.
710 assert(CalleePopAmount < 0xffffff && "call frame too large");
711 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
712 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
713 }
714 return MBB.erase(I);
715}
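
// Illustration only (not part of this file): the 24-bit limit mentioned in
// eliminateCallFramePseudoInstr() comes from ADD/SUB (immediate) taking a
// 12-bit value with an optional LSL #12. A sketch of how an offset splits
// into the two immediates follows; AddSubImmPair and splitAddSubImm are
// hypothetical names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// The two 12-bit immediates needed to materialize a 24-bit SP adjustment.
struct AddSubImmPair {
  uint32_t Lo12; // Used by the instruction with LSL #0.
  uint32_t Hi12; // Used by the instruction with LSL #12 (0 if not needed).
};

// Splits Offset into its low and high 12-bit halves. Offsets that fit in
// 12 bits need only the LSL #0 instruction, so Hi12 is zero for them.
inline AddSubImmPair splitAddSubImm(uint32_t Offset) {
  assert(Offset <= 0xffffff && "offset must fit in 24 bits");
  return {Offset & 0xfffu, (Offset >> 12) & 0xfffu};
}
```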
716
718 MachineBasicBlock &MBB) const {
719
720 MachineFunction &MF = *MBB.getParent();
721 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
722 const auto &TRI = *Subtarget.getRegisterInfo();
723 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
724
725 CFIInstBuilder CFIBuilder(MBB, MBB.begin(), MachineInstr::NoFlags);
726
727 // Reset the CFA to `SP + 0`.
728 CFIBuilder.buildDefCFA(AArch64::SP, 0);
729
730 // Flip the RA sign state.
731 if (MFI.shouldSignReturnAddress(MF)) {
732 if (MFI.branchProtectionPAuthLR()) {
733 CFIBuilder.buildNegateRAStateWithPC();
734 } else if (!MF.getTarget().getTargetTriple().isOSBinFormatMachO()) {
735 CFIBuilder.buildNegateRAState();
736 }
737 }
738
739 // Shadow call stack uses X18, reset it.
740 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
741 CFIBuilder.buildSameValue(AArch64::X18);
742
743 // Emit .cfi_same_value for callee-saved registers.
744 const std::vector<CalleeSavedInfo> &CSI =
746 for (const auto &Info : CSI) {
747 MCRegister Reg = Info.getReg();
748 if (!TRI.regNeedsCFI(Reg, Reg))
749 continue;
750 CFIBuilder.buildSameValue(Reg);
751 }
752}
753
755 switch (Reg.id()) {
756 default:
757 // The called routine is expected to preserve r19-r28.
758 // r29 and r30 are used as frame pointer and link register, respectively.
759 return 0;
760
761 // GPRs
762#define CASE(n) \
763 case AArch64::W##n: \
764 case AArch64::X##n: \
765 return AArch64::X##n
766 CASE(0);
767 CASE(1);
768 CASE(2);
769 CASE(3);
770 CASE(4);
771 CASE(5);
772 CASE(6);
773 CASE(7);
774 CASE(8);
775 CASE(9);
776 CASE(10);
777 CASE(11);
778 CASE(12);
779 CASE(13);
780 CASE(14);
781 CASE(15);
782 CASE(16);
783 CASE(17);
784 CASE(18);
785#undef CASE
786
787 // FPRs
788#define CASE(n) \
789 case AArch64::B##n: \
790 case AArch64::H##n: \
791 case AArch64::S##n: \
792 case AArch64::D##n: \
793 case AArch64::Q##n: \
794 return HasSVE ? AArch64::Z##n : AArch64::Q##n
795 CASE(0);
796 CASE(1);
797 CASE(2);
798 CASE(3);
799 CASE(4);
800 CASE(5);
801 CASE(6);
802 CASE(7);
803 CASE(8);
804 CASE(9);
805 CASE(10);
806 CASE(11);
807 CASE(12);
808 CASE(13);
809 CASE(14);
810 CASE(15);
811 CASE(16);
812 CASE(17);
813 CASE(18);
814 CASE(19);
815 CASE(20);
816 CASE(21);
817 CASE(22);
818 CASE(23);
819 CASE(24);
820 CASE(25);
821 CASE(26);
822 CASE(27);
823 CASE(28);
824 CASE(29);
825 CASE(30);
826 CASE(31);
827#undef CASE
828 }
829}
830
831void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
832 MachineBasicBlock &MBB) const {
833 // Insertion point.
835
836 // Fake a debug loc.
837 DebugLoc DL;
838 if (MBBI != MBB.end())
839 DL = MBBI->getDebugLoc();
840
841 const MachineFunction &MF = *MBB.getParent();
842 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
843 const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();
844
845 BitVector GPRsToZero(TRI.getNumRegs());
846 BitVector FPRsToZero(TRI.getNumRegs());
847 bool HasSVE = STI.isSVEorStreamingSVEAvailable();
848 for (MCRegister Reg : RegsToZero.set_bits()) {
849 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
850 // For GPRs, we only care to clear out the 64-bit register.
851 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
852 GPRsToZero.set(XReg);
853 } else if (AArch64InstrInfo::isFpOrNEON(Reg)) {
854 // For FPRs, clear the widest register available (Z with SVE, else Q).
855 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
856 FPRsToZero.set(XReg);
857 }
858 }
859
860 const AArch64InstrInfo &TII = *STI.getInstrInfo();
861
862 // Zero out GPRs.
863 for (MCRegister Reg : GPRsToZero.set_bits())
864 TII.buildClearRegister(Reg, MBB, MBBI, DL);
865
866 // Zero out FP/vector registers.
867 for (MCRegister Reg : FPRsToZero.set_bits())
868 TII.buildClearRegister(Reg, MBB, MBBI, DL);
869
870 if (HasSVE) {
871 for (MCRegister PReg :
872 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
873 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
874 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
875 AArch64::P15}) {
876 if (RegsToZero[PReg])
877 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
878 }
879 }
880}
881
882bool AArch64FrameLowering::windowsRequiresStackProbe(
883 const MachineFunction &MF, uint64_t StackSizeInBytes) const {
884 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
885 const AArch64FunctionInfo &MFI = *MF.getInfo<AArch64FunctionInfo>();
886 // TODO: When implementing stack protectors, take that into account
887 // for the probe threshold.
888 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
889 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
890}
891
893 const MachineBasicBlock &MBB) {
894 const MachineFunction *MF = MBB.getParent();
895 LiveRegs.addLiveIns(MBB);
896 // Mark callee saved registers as used so we will not choose them.
897 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
898 for (unsigned i = 0; CSRegs[i]; ++i)
899 LiveRegs.addReg(CSRegs[i]);
900}
901
903AArch64FrameLowering::findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB,
904 bool HasCall) const {
905 MachineFunction *MF = MBB->getParent();
906
907 // If MBB is an entry block, use X9 as the scratch register. However,
908 // preserve_none functions may be using X9 to pass arguments, so for them
909 // prefer to pick an available register below.
910 if (&MF->front() == MBB &&
912 return AArch64::X9;
913
914 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
915 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
916 LivePhysRegs LiveRegs(TRI);
917 getLiveRegsForEntryMBB(LiveRegs, *MBB);
918 if (HasCall) {
919 LiveRegs.addReg(AArch64::X16);
920 LiveRegs.addReg(AArch64::X17);
921 LiveRegs.addReg(AArch64::X18);
922 }
923
924 // Prefer X9 since it was historically used for the prologue scratch reg.
925 const MachineRegisterInfo &MRI = MF->getRegInfo();
926 if (LiveRegs.available(MRI, AArch64::X9))
927 return AArch64::X9;
928
929 for (unsigned Reg : AArch64::GPR64RegClass) {
930 if (LiveRegs.available(MRI, Reg))
931 return Reg;
932 }
933 return AArch64::NoRegister;
934}
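// Illustrative sketch, not part of LLVM: the selection policy above can be
// modelled with a plain bitmask. `LiveMask` and `pickScratch` are hypothetical
// names; the caller is assumed to have already marked the block live-ins and
// the callee-saved registers as live. Unlike the real code, which walks
// GPR64RegClass in its own order, this sketch simply scans x0..x30 upwards.

```cpp
#include <bitset>
#include <cassert>
#include <optional>

// Hypothetical stand-in for LivePhysRegs: bit N set means xN is live.
using LiveMask = std::bitset<31>;

// Returns the architectural number of a free scratch register, or nullopt
// when every register is live (the analogue of AArch64::NoRegister).
inline std::optional<unsigned> pickScratch(LiveMask Live, bool HasCall) {
  if (HasCall) {
    // A call may clobber the intra-procedure scratch registers x16/x17,
    // and x18 is platform reserved, so treat all three as unavailable.
    Live.set(16);
    Live.set(17);
    Live.set(18);
  }

  // Prefer x9, the historical prologue scratch register.
  if (!Live.test(9))
    return 9u;

  // Otherwise take the first register that is not live.
  for (unsigned Reg = 0; Reg < Live.size(); ++Reg)
    if (!Live.test(Reg))
      return Reg;

  return std::nullopt;
}
```

// With x0..x15 live, the sketch yields x16 when HasCall is false and x19 when
// HasCall is true, because x16..x18 are then excluded as well.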
935
937 const MachineBasicBlock &MBB) const {
938 const MachineFunction *MF = MBB.getParent();
939 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
940 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
941 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
942 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
944
945 if (AFI->hasSwiftAsyncContext()) {
946 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
947 const MachineRegisterInfo &MRI = MF->getRegInfo();
950 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
951 // available.
952 if (!LiveRegs.available(MRI, AArch64::X16) ||
953 !LiveRegs.available(MRI, AArch64::X17))
954 return false;
955 }
956
957 // Certain stack probing sequences might clobber flags, in which case we
958 // can't use the block as a prologue if the flags register is a live-in.
960 MBB.isLiveIn(AArch64::NZCV))
961 return false;
962
963 if (RegInfo->hasStackRealignment(*MF) || TLI->hasInlineStackProbe(*MF))
964 if (findScratchNonCalleeSaveRegister(TmpMBB) == AArch64::NoRegister)
965 return false;
966
967 // May need a scratch register (for the return value) if a special call is
968 // required.
969 if (requiresSaveVG(*MF) ||
970 windowsRequiresStackProbe(*MF, std::numeric_limits<uint64_t>::max()))
971 if (findScratchNonCalleeSaveRegister(TmpMBB, true) == AArch64::NoRegister)
972 return false;
973
974 return true;
975}
976
978 const Function &F = MF.getFunction();
979 return MF.getTarget().getMCAsmInfo().usesWindowsCFI() &&
980 F.needsUnwindTableEntry();
981}
982
983bool AArch64FrameLowering::shouldSignReturnAddressEverywhere(
984 const MachineFunction &MF) const {
985 // FIXME: With WinCFI, extra care should be taken to place SEH_PACSignLR
986 // and SEH_EpilogEnd instructions in the correct order.
988 return false;
991}
992
993// Given a load or a store instruction, generate an appropriate unwinding SEH
994// code on Windows.
996AArch64FrameLowering::insertSEH(MachineBasicBlock::iterator MBBI,
997 const AArch64InstrInfo &TII,
998 MachineInstr::MIFlag Flag) const {
999 unsigned Opc = MBBI->getOpcode();
1000 MachineBasicBlock *MBB = MBBI->getParent();
1001 MachineFunction &MF = *MBB->getParent();
1002 DebugLoc DL = MBBI->getDebugLoc();
1003 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1004 int Imm = MBBI->getOperand(ImmIdx).getImm();
1006 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1007 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1008
1009 switch (Opc) {
1010 default:
1011 report_fatal_error("No SEH Opcode for this instruction");
1012 case AArch64::STR_ZXI:
1013 case AArch64::LDR_ZXI: {
1014 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1015 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveZReg))
1016 .addImm(Reg0)
1017 .addImm(Imm)
1018 .setMIFlag(Flag);
1019 break;
1020 }
1021 case AArch64::STR_PXI:
1022 case AArch64::LDR_PXI: {
1023 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1024 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SavePReg))
1025 .addImm(Reg0)
1026 .addImm(Imm)
1027 .setMIFlag(Flag);
1028 break;
1029 }
1030 case AArch64::LDPDpost:
1031 Imm = -Imm;
1032 [[fallthrough]];
1033 case AArch64::STPDpre: {
1034 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1035 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1036 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1037 .addImm(Reg0)
1038 .addImm(Reg1)
1039 .addImm(Imm * 8)
1040 .setMIFlag(Flag);
1041 break;
1042 }
1043 case AArch64::LDPXpost:
1044 Imm = -Imm;
1045 [[fallthrough]];
1046 case AArch64::STPXpre: {
1047 Register Reg0 = MBBI->getOperand(1).getReg();
1048 Register Reg1 = MBBI->getOperand(2).getReg();
1049 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1050 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1051 .addImm(Imm * 8)
1052 .setMIFlag(Flag);
1053 else
1054 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1055 .addImm(RegInfo->getSEHRegNum(Reg0))
1056 .addImm(RegInfo->getSEHRegNum(Reg1))
1057 .addImm(Imm * 8)
1058 .setMIFlag(Flag);
1059 break;
1060 }
1061 case AArch64::LDRDpost:
1062 Imm = -Imm;
1063 [[fallthrough]];
1064 case AArch64::STRDpre: {
1065 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1066 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1067 .addImm(Reg)
1068 .addImm(Imm)
1069 .setMIFlag(Flag);
1070 break;
1071 }
1072 case AArch64::LDRXpost:
1073 Imm = -Imm;
1074 [[fallthrough]];
1075 case AArch64::STRXpre: {
1076 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1077 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1078 .addImm(Reg)
1079 .addImm(Imm)
1080 .setMIFlag(Flag);
1081 break;
1082 }
1083 case AArch64::STPDi:
1084 case AArch64::LDPDi: {
1085 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1086 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1087 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1088 .addImm(Reg0)
1089 .addImm(Reg1)
1090 .addImm(Imm * 8)
1091 .setMIFlag(Flag);
1092 break;
1093 }
1094 case AArch64::STPXi:
1095 case AArch64::LDPXi: {
1096 Register Reg0 = MBBI->getOperand(0).getReg();
1097 Register Reg1 = MBBI->getOperand(1).getReg();
1098
1099 int SEHReg0 = RegInfo->getSEHRegNum(Reg0);
1100 int SEHReg1 = RegInfo->getSEHRegNum(Reg1);
1101
1102 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1103 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1104 .addImm(Imm * 8)
1105 .setMIFlag(Flag);
1106 else if (SEHReg0 >= 19 && SEHReg1 >= 19)
1107 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1108 .addImm(SEHReg0)
1109 .addImm(SEHReg1)
1110 .addImm(Imm * 8)
1111 .setMIFlag(Flag);
1112 else
1113 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegIP))
1114 .addImm(SEHReg0)
1115 .addImm(SEHReg1)
1116 .addImm(Imm * 8)
1117 .setMIFlag(Flag);
1118 break;
1119 }
1120 case AArch64::STRXui:
1121 case AArch64::LDRXui: {
1122 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1123 if (Reg >= 19)
1124 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1125 .addImm(Reg)
1126 .addImm(Imm * 8)
1127 .setMIFlag(Flag);
1128 else
1129 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegI))
1130 .addImm(Reg)
1131 .addImm(Imm * 8)
1132 .setMIFlag(Flag);
1133 break;
1134 }
1135 case AArch64::STRDui:
1136 case AArch64::LDRDui: {
1137 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1138 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1139 .addImm(Reg)
1140 .addImm(Imm * 8)
1141 .setMIFlag(Flag);
1142 break;
1143 }
1144 case AArch64::STPQi:
1145 case AArch64::LDPQi: {
1146 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1147 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1148 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1149 .addImm(Reg0)
1150 .addImm(Reg1)
1151 .addImm(Imm * 16)
1152 .setMIFlag(Flag);
1153 break;
1154 }
1155 case AArch64::LDPQpost:
1156 Imm = -Imm;
1157 [[fallthrough]];
1158 case AArch64::STPQpre: {
1159 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1160 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1161 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1162 .addImm(Reg0)
1163 .addImm(Reg1)
1164 .addImm(Imm * 16)
1165 .setMIFlag(Flag);
1166 break;
1167 }
1168 }
1169 auto I = MBB->insertAfter(MBBI, MIB);
1170 return I;
1171}
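// Illustrative sketch, not part of LLVM: the switch above converts the
// instruction immediate into the byte offset recorded in the SEH opcode.
// `SaveKind` and `sehByteOffset` are hypothetical names that only mirror the
// scaling visible in the cases above: pair and unsigned-offset forms of X/D
// registers scale by 8, Q-register pairs scale by 16, single pre-indexed
// stores already carry a byte offset, and post-indexed loads are negated
// first. The SVE cases (STR_ZXI, STR_PXI) pass the immediate through
// unchanged and are left out here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of the save/restore forms handled above.
enum class SaveKind {
  PairXorD,         // STPXi/STPDi and the pre/post-indexed X/D pair forms
  PairQ,            // STPQi and the pre/post-indexed Q pair forms
  PreIndexedSingle, // STRXpre/STRDpre and their post-indexed loads
  UnsignedSingle    // STRXui/STRDui and the matching loads
};

// Computes the byte offset that would be attached to the SEH opcode.
constexpr int64_t sehByteOffset(SaveKind Kind, int64_t Imm,
                                bool IsPostIndexedLoad) {
  // An epilogue `ldp ..., [sp], #N` undoes a prologue `stp ..., [sp, #-N]!`,
  // so the sign is flipped to describe the same stack adjustment.
  if (IsPostIndexedLoad)
    Imm = -Imm;

  switch (Kind) {
  case SaveKind::PairXorD:
    return Imm * 8;
  case SaveKind::PairQ:
    return Imm * 16;
  case SaveKind::PreIndexedSingle:
    return Imm; // already in bytes
  case SaveKind::UnsignedSingle:
    return Imm * 8;
  }
  return 0;
}
```

// For example, a prologue store pair with immediate -4 and the matching
// post-indexed load pair with immediate 4 both map to a byte offset of -32.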
1172
1175 if (!AFI->needsDwarfUnwindInfo(MF) || !AFI->hasStreamingModeChanges())
1176 return false;
1177 // For Darwin platforms we don't save VG for non-SVE functions, even if SME
1178 // is enabled with streaming mode changes.
1179 auto &ST = MF.getSubtarget<AArch64Subtarget>();
1180 if (ST.isTargetDarwin())
1181 return ST.hasSVE();
1182 return true;
1183}
1184
1186 MachineFunction &MF) const {
1187 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1188 const AArch64InstrInfo *TII = Subtarget.getInstrInfo();
1189
1190 auto EmitSignRA = [&](MachineBasicBlock &MBB) {
1191 DebugLoc DL; // Set debug location to unknown.
1193
1194 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1196 };
1197
1198 auto EmitAuthRA = [&](MachineBasicBlock &MBB) {
1199 DebugLoc DL;
1200 MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
1201 if (MBBI != MBB.end())
1202 DL = MBBI->getDebugLoc();
1203
1204 TII->createPauthEpilogueInstr(MBB, DL);
1205 };
1206
1207 // This should be in sync with PEIImpl::calculateSaveRestoreBlocks.
1208 EmitSignRA(MF.front());
1209 for (MachineBasicBlock &MBB : MF) {
1210 if (MBB.isEHFuncletEntry())
1211 EmitSignRA(MBB);
1212 if (MBB.isReturnBlock())
1213 EmitAuthRA(MBB);
1214 }
1215}
1216
1218 MachineBasicBlock &MBB) const {
1219 AArch64PrologueEmitter PrologueEmitter(MF, MBB, *this);
1220 PrologueEmitter.emitPrologue();
1221}
1222
1224 MachineBasicBlock &MBB) const {
1225 AArch64EpilogueEmitter EpilogueEmitter(MF, MBB, *this);
1226 EpilogueEmitter.emitEpilogue();
1227}
1228
1231 MF.getInfo<AArch64FunctionInfo>()->needsDwarfUnwindInfo(MF);
1232}
1233
1235 return enableCFIFixup(MF) &&
1236 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
1237}
1238
1239/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
1240/// debug info. It's the same as what we use for resolving the code-gen
1241/// references for now. FIXME: This can go wrong when references are
1242/// SP-relative and simple call frames aren't used.
1245 Register &FrameReg) const {
1247 MF, FI, FrameReg,
1248 /*PreferFP=*/
1249 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
1250 MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
1251 /*ForSimm=*/false);
1252}
1253
1256 int FI) const {
1257 // This function serves to provide a comparable offset from a single reference
1258 // point (the value of SP at function entry) that can be used for analysis,
1259 // e.g. the stack-frame-layout analysis pass. It is not guaranteed to be
1260 // correct for all objects in the presence of VLA-area objects or dynamic
1261 // stack re-alignment.
1262
1263 const auto &MFI = MF.getFrameInfo();
1264
1265 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1266 StackOffset ZPRStackSize = getZPRStackSize(MF);
1267 StackOffset PPRStackSize = getPPRStackSize(MF);
1268 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1269
1270 // For VLA-area objects, just emit an offset at the end of the stack frame.
1271 // Whilst not quite correct, these objects do live at the end of the frame and
1272 // so it is more useful for analysis for the offset to reflect this.
1273 if (MFI.isVariableSizedObjectIndex(FI)) {
1274 return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
1275 }
1276
1277 // This is correct in the absence of any SVE stack objects.
1278 if (!SVEStackSize)
1279 return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
1280
1281 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1282 bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
1283 if (MFI.hasScalableStackID(FI)) {
1284 if (FPAfterSVECalleeSaves &&
1285 -ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1286 assert(!AFI->hasSplitSVEObjects() &&
1287 "split-sve-objects not supported with FPAfterSVECalleeSaves");
1288 return StackOffset::getScalable(ObjectOffset);
1289 }
1290 StackOffset AccessOffset{};
1291 // The scalable vectors are below (lower address) the scalable predicates
1292 // with split SVE objects, so we must subtract the size of the predicates.
1293 if (AFI->hasSplitSVEObjects() &&
1294 MFI.getStackID(FI) == TargetStackID::ScalableVector)
1295 AccessOffset = -PPRStackSize;
1296 return AccessOffset +
1297 StackOffset::get(-((int64_t)AFI->getCalleeSavedStackSize()),
1298 ObjectOffset);
1299 }
1300
1301 bool IsFixed = MFI.isFixedObjectIndex(FI);
1302 bool IsCSR =
1303 !IsFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1304
1305 StackOffset ScalableOffset = {};
1306 if (!IsFixed && !IsCSR) {
1307 ScalableOffset = -SVEStackSize;
1308 } else if (FPAfterSVECalleeSaves && IsCSR) {
1309 ScalableOffset =
1311 }
1312
1313 return StackOffset::getFixed(ObjectOffset) + ScalableOffset;
1314}
1315
1321
1322StackOffset AArch64FrameLowering::getFPOffset(const MachineFunction &MF,
1323 int64_t ObjectOffset) const {
1324 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1325 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1326 const Function &F = MF.getFunction();
1327 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
1328 unsigned FixedObject =
1329 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
1330 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
1331 int64_t FPAdjust =
1332 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
1333 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
1334}
1335
1336StackOffset AArch64FrameLowering::getStackOffset(const MachineFunction &MF,
1337 int64_t ObjectOffset) const {
1338 const auto &MFI = MF.getFrameInfo();
1339 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
1340}
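// Illustrative sketch, not part of LLVM: the arithmetic in getFPOffset and
// getStackOffset, written against plain integers. The function and parameter
// names are hypothetical stand-ins for the values queried from
// AArch64FunctionInfo and MachineFrameInfo, and the numbers in the note below
// are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Offset of an object relative to the frame pointer. ObjectOffset is relative
// to the SP value at function entry (negative for locals).
constexpr int64_t fpRelativeOffset(int64_t ObjectOffset, int64_t FixedObject,
                                   int64_t CalleeSaveSize,
                                   int64_t CalleeSaveBaseToFrameRecord) {
  // Distance from the frame record up to the top of the callee-save area.
  int64_t FPAdjust = CalleeSaveSize - CalleeSaveBaseToFrameRecord;
  return ObjectOffset + FixedObject + FPAdjust;
}

// Offset of the same object relative to the SP after the prologue.
constexpr int64_t spRelativeOffset(int64_t ObjectOffset, int64_t StackSize) {
  return ObjectOffset + StackSize;
}
```

// Worked example under those assumptions: with 32 bytes of callee saves, the
// frame record 16 bytes above the base of that area, no fixed objects and a
// 48-byte frame, an object at entry-SP offset -40 sits at FP - 24 and SP + 8.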
1341
1342// TODO: This function currently does not work for scalable vectors.
1344 int FI) const {
1345 const AArch64RegisterInfo *RegInfo =
1346 MF.getSubtarget<AArch64Subtarget>().getRegisterInfo();
1347 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
1348 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
1349 ? getFPOffset(MF, ObjectOffset).getFixed()
1350 : getStackOffset(MF, ObjectOffset).getFixed();
1351}
1352
1354 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
1355 bool ForSimm) const {
1356 const auto &MFI = MF.getFrameInfo();
1357 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1358 bool isFixed = MFI.isFixedObjectIndex(FI);
1359 auto StackID = static_cast<TargetStackID::Value>(MFI.getStackID(FI));
1360 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, StackID,
1361 FrameReg, PreferFP, ForSimm);
1362}
1363
1365 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed,
1366 TargetStackID::Value StackID, Register &FrameReg, bool PreferFP,
1367 bool ForSimm) const {
1368 const auto &MFI = MF.getFrameInfo();
1369 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1370 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1371 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1372
1373 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
1374 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
1375 bool isCSR =
1376 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1377 bool isSVE = MFI.isScalableStackID(StackID);
1378
1379 StackOffset ZPRStackSize = getZPRStackSize(MF);
1380 StackOffset PPRStackSize = getPPRStackSize(MF);
1381 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1382
1383 // Use frame pointer to reference fixed objects. Use it for locals if
1384 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
1385 // reliable as a base). Make sure useFPForScavengingIndex() does the
1386 // right thing for the emergency spill slot.
1387 bool UseFP = false;
1388 if (AFI->hasStackFrame() && !isSVE) {
1389 // We shouldn't prefer using the FP to access fixed-sized stack objects when
1390 // there are scalable (SVE) objects in between the FP and the fixed-sized
1391 // objects.
1392 PreferFP &= !SVEStackSize;
1393
1394 // Note: Keeping the following as multiple 'if' statements rather than
1395 // merging to a single expression for readability.
1396 //
1397 // Argument access should always use the FP.
1398 if (isFixed) {
1399 UseFP = hasFP(MF);
1400 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
1401 // References to the CSR area must use FP if we're re-aligning the stack
1402 // since the dynamically-sized alignment padding is between the SP/BP and
1403 // the CSR area.
1404 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
1405 UseFP = true;
1406 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
1407 // If the FPOffset is negative and we're producing a signed immediate, we
1408 // have to keep in mind that the available offset range for negative
1409 // offsets is smaller than for positive ones. If an offset is available
1410 // via the FP and the SP, use whichever is closest.
1411 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
1412 PreferFP |= Offset > -FPOffset && !SVEStackSize;
1413
1414 if (FPOffset >= 0) {
1415 // If the FPOffset is positive, that'll always be best, as the SP/BP
1416 // will be even further away.
1417 UseFP = true;
1418 } else if (MFI.hasVarSizedObjects()) {
1419 // If we have variable sized objects, we can use either FP or BP, as the
1420 // SP offset is unknown. We can use the base pointer if we have one and
1421 // FP is not preferred. If not, we're stuck with using FP.
1422 bool CanUseBP = RegInfo->hasBasePointer(MF);
1423 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
1424 UseFP = PreferFP;
1425 else if (!CanUseBP) // Can't use BP. Forced to use FP.
1426 UseFP = true;
1427 // else we can use BP and FP, but the offset from FP won't fit.
1428 // That will make us scavenge registers which we can probably avoid by
1429 // using BP. If it won't fit for BP either, we'll scavenge anyway.
1430 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
1431 // Funclets access the locals contained in the parent's stack frame
1432 // via the frame pointer, so we have to use the FP in the parent
1433 // function.
1434 (void) Subtarget;
1435 assert(Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
1436 MF.getFunction().isVarArg()) &&
1437 "Funclets should only be present on Win64");
1438 UseFP = true;
1439 } else {
1440 // We have the choice between FP and (SP or BP).
1441 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
1442 UseFP = true;
1443 }
1444 }
1445 }
1446
1447 assert(
1448 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
1449 "In the presence of dynamic stack pointer realignment, "
1450 "non-argument/CSR objects cannot be accessed through the frame pointer");
1451
1452 bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
1453
1454 if (isSVE) {
1455 StackOffset FPOffset = StackOffset::get(
1456 -AFI->getCalleeSaveBaseToFrameRecordOffset(), ObjectOffset);
1457 StackOffset SPOffset =
1458 SVEStackSize +
1459 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
1460 ObjectOffset);
1461
1462 // With split SVE objects the ObjectOffset is relative to the split area
1463 // (i.e. the PPR area or ZPR area respectively).
1464 if (AFI->hasSplitSVEObjects() && StackID == TargetStackID::ScalableVector) {
1465 // If we're accessing an SVE vector with split SVE objects...
1466 // - From the FP we need to move down past the PPR area:
1467 FPOffset -= PPRStackSize;
1468 // - From the SP we only need to move up to the ZPR area:
1469 SPOffset -= PPRStackSize;
1470 // Note: `SPOffset = SVEStackSize + ...`, so `-= PPRStackSize` results in
1471 // `SPOffset = ZPRStackSize + ...`.
1472 }
1473
1474 if (FPAfterSVECalleeSaves) {
1476 if (-ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1479 }
1480 }
1481
1482 // Always use the FP for SVE spills if available and beneficial.
1483 if (hasFP(MF) && (SPOffset.getFixed() ||
1484 FPOffset.getScalable() < SPOffset.getScalable() ||
1485 RegInfo->hasStackRealignment(MF))) {
1486 FrameReg = RegInfo->getFrameRegister(MF);
1487 return FPOffset;
1488 }
1489 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
1490 : MCRegister(AArch64::SP);
1491
1492 return SPOffset;
1493 }
1494
1495 StackOffset SVEAreaOffset = {};
1496 if (FPAfterSVECalleeSaves) {
1497 // In this stack layout, the FP is in between the callee saves and other
1498 // SVE allocations.
1499 StackOffset SVECalleeSavedStack =
1501 if (UseFP) {
1502 if (isFixed)
1503 SVEAreaOffset = SVECalleeSavedStack;
1504 else if (!isCSR)
1505 SVEAreaOffset = SVECalleeSavedStack - SVEStackSize;
1506 } else {
1507 if (isFixed)
1508 SVEAreaOffset = SVEStackSize;
1509 else if (isCSR)
1510 SVEAreaOffset = SVEStackSize - SVECalleeSavedStack;
1511 }
1512 } else {
1513 if (UseFP && !(isFixed || isCSR))
1514 SVEAreaOffset = -SVEStackSize;
1515 if (!UseFP && (isFixed || isCSR))
1516 SVEAreaOffset = SVEStackSize;
1517 }
1518
1519 if (UseFP) {
1520 FrameReg = RegInfo->getFrameRegister(MF);
1521 return StackOffset::getFixed(FPOffset) + SVEAreaOffset;
1522 }
1523
1524 // Use the base pointer if we have one.
1525 if (RegInfo->hasBasePointer(MF))
1526 FrameReg = RegInfo->getBaseRegister();
1527 else {
1528 assert(!MFI.hasVarSizedObjects() &&
1529 "Can't use SP when we have var sized objects.");
1530 FrameReg = AArch64::SP;
1531 // If we're using the red zone for this function, the SP won't actually
1532 // be adjusted, so the offsets will be negative. They're also all
1533 // within range of the signed 9-bit immediate instructions.
1534 if (canUseRedZone(MF))
1535 Offset -= AFI->getLocalStackSize();
1536 }
1537
1538 return StackOffset::getFixed(Offset) + SVEAreaOffset;
1539}
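// Illustrative sketch, not part of LLVM: the FP-versus-SP decision above,
// reduced to its simplest path. It assumes a non-fixed, non-SVE object in a
// function that has a frame pointer, no SVE stack area, no stack realignment,
// no variable-sized objects, no EH funclets and no red zone. `BaseChoice` and
// `chooseBase` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Result of the choice: which base to use and the offset from it.
struct BaseChoice {
  bool UseFP;
  int64_t Offset;
};

constexpr BaseChoice chooseBase(int64_t FPOffset, int64_t SPOffset,
                                bool PreferFP, bool ForSimm) {
  // A signed 9-bit immediate reaches down to -256, so a more negative FP
  // offset does not fit when a signed immediate is being produced.
  bool FPOffsetFits = !ForSimm || FPOffset >= -256;

  // Prefer whichever base is closer to the object.
  PreferFP |= SPOffset > -FPOffset;

  // A non-negative FP offset is always best; otherwise use the FP only if it
  // both fits and is preferred.
  bool UseFP = FPOffset >= 0 || (FPOffsetFits && PreferFP);
  return {UseFP, UseFP ? FPOffset : SPOffset};
}
```

// An object at FP - 8 and SP + 4000 is addressed through the FP, while one at
// FP - 300 and SP + 4000 falls back to the SP when a signed immediate is
// required, because -300 is outside the signed 9-bit range.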
1540
1542 // Do not set a kill flag on values that are also marked as live-in. This
1543 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
1544 // callee saved registers.
1545 // Omitting the kill flags is conservatively correct even if the live-in
1546 // is not used after all.
1547 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
1548 return getKillRegState(!IsLiveIn);
1549}
1550
1552 MachineFunction &MF) {
1553 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1554 AttributeList Attrs = MF.getFunction().getAttributes();
1556 return Subtarget.isTargetMachO() &&
1557 !(Subtarget.getTargetLowering()->supportSwiftError() &&
1558 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
1560 !AFL.requiresSaveVG(MF) && !AFI->isSVECC();
1561}
1562
1563static bool invalidateWindowsRegisterPairing(bool SpillExtendedVolatile,
1564 unsigned SpillCount, unsigned Reg1,
1565 unsigned Reg2, bool NeedsWinCFI,
1566 const TargetRegisterInfo *TRI) {
1567 // If we are generating register pairs for a Windows function that requires
1568 // EH support, then pair consecutive registers only. There are no unwind
1569 // opcodes for saves/restores of non-consecutive register pairs.
1570 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
1571 // save_lrpair.
1572 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
1573
1574 if (Reg2 == AArch64::FP)
1575 return true;
1576 if (!NeedsWinCFI)
1577 return false;
1578
1579 // ARM64EC introduced `save_any_regp`, which expects 16-byte alignment.
1580 // This is handled by only allowing paired spills for registers spilled at
1581 // even positions (which should be 16-byte aligned, as other GPRs/FPRs are
1582 // 8-bytes). We carve out an exception for {FP,LR}, which does not require
1583 // 16-byte alignment in the uop representation.
1584 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
1585 return SpillExtendedVolatile
1586 ? !((Reg1 == AArch64::FP && Reg2 == AArch64::LR) ||
1587 (SpillCount % 2) == 0)
1588 : false;
1589
1590 // If pairing a GPR with LR, the pair can be described by the save_lrpair
1591 // opcode. The save_lrpair opcode requires the first register to be odd.
1592 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
1593 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR)
1594 return false;
1595 return true;
1596}
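// Illustrative sketch, not part of LLVM: the final save_lrpair check above,
// expressed with architectural register numbers instead of the AArch64::X19
// enum values. `isSaveLRPairCandidate` is a hypothetical name. The opcode can
// only describe x19, x21, x23, x25 or x27 paired with LR.

```cpp
#include <cassert>

// Returns true when <xN, lr> can be described by save_lrpair.
constexpr bool isSaveLRPairCandidate(unsigned XRegNum, bool PairedWithLR) {
  return PairedWithLR && XRegNum >= 19 && XRegNum <= 27 &&
         (XRegNum - 19) % 2 == 0;
}
```

// So <x19, lr> and <x27, lr> qualify, whereas <x20, lr> and <x28, lr> do not
// and would be rejected as a pair by the function above.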
1597
1598/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
1599/// WindowsCFI requires that only consecutive registers can be paired.
1600/// LR and FP need to be allocated together when the frame needs to save
1601/// the frame-record. This means any other register pairing with LR is invalid.
1602static bool invalidateRegisterPairing(bool SpillExtendedVolatile,
1603 unsigned SpillCount, unsigned Reg1,
1604 unsigned Reg2, bool UsesWinAAPCS,
1605 bool NeedsWinCFI, bool NeedsFrameRecord,
1606 const TargetRegisterInfo *TRI) {
1607 if (UsesWinAAPCS)
1608 return invalidateWindowsRegisterPairing(SpillExtendedVolatile, SpillCount,
1609 Reg1, Reg2, NeedsWinCFI, TRI);
1610
1611 // If we need to store the frame record, don't pair any register
1612 // with LR other than FP.
1613 if (NeedsFrameRecord)
1614 return Reg2 == AArch64::LR;
1615
1616 return false;
1617}
1618
1619namespace {
1620
1621struct RegPairInfo {
1622 Register Reg1;
1623 Register Reg2;
1624 int FrameIdx;
1625 int Offset;
1626 enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
1627 const TargetRegisterClass *RC;
1628
1629 RegPairInfo() = default;
1630
1631 bool isPaired() const { return Reg2.isValid(); }
1632
1633 bool isScalable() const { return Type == PPR || Type == ZPR; }
1634};
1635
1636} // end anonymous namespace
1637
1639 for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
1640 if (SavedRegs.test(PReg)) {
1641 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
1642 return MCRegister(PNReg);
1643 }
1644 }
1645 return MCRegister();
1646}
1647
1648// The multivector LD/ST are available only for SME or SVE2p1 targets
1650 MachineFunction &MF) {
1652 return false;
1653
1654 SMEAttrs FuncAttrs = MF.getInfo<AArch64FunctionInfo>()->getSMEFnAttrs();
1655 bool IsLocallyStreaming =
1656 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
1657
1658 // SME2 instructions can only be used safely when in streaming mode.
1659 // It is not safe to use SME2 instructions when in streaming compatible or
1660 // locally streaming mode.
1661 return Subtarget.hasSVE2p1() ||
1662 (Subtarget.hasSME2() &&
1663 (!IsLocallyStreaming && Subtarget.isStreaming()));
1664}
1665
1667 MachineFunction &MF,
1669 const TargetRegisterInfo *TRI,
1671 bool NeedsFrameRecord) {
1672
1673 if (CSI.empty())
1674 return;
1675
1676 bool IsWindows = isTargetWindows(MF);
1678 unsigned StackHazardSize = getStackHazardSize(MF);
1679 MachineFrameInfo &MFI = MF.getFrameInfo();
1681 unsigned Count = CSI.size();
1682 (void)CC;
1683 // MachO's compact unwind format relies on all registers being stored in
1684 // pairs.
1685 assert((!produceCompactUnwindFrame(AFL, MF) ||
1688 (Count & 1) == 0) &&
1689 "Odd number of callee-saved regs to spill!");
1690 int ByteOffset = AFI->getCalleeSavedStackSize();
1691 int StackFillDir = -1;
1692 int RegInc = 1;
1693 unsigned FirstReg = 0;
1694 if (IsWindows) {
1695 // For WinCFI, fill the stack from the bottom up.
1696 ByteOffset = 0;
1697 StackFillDir = 1;
1698 // As the CSI array is reversed to match PrologEpilogInserter, iterate
1699 // backwards, to pair up registers starting from lower numbered registers.
1700 RegInc = -1;
1701 FirstReg = Count - 1;
1702 }
1703
1704 bool FPAfterSVECalleeSaves = AFL.hasSVECalleeSavesAboveFrameRecord(MF);
1705 // Windows AAPCS has x9-x15 as volatile registers, x16-x17 as intra-procedural
1706 // scratch, x18 as platform reserved. However, clang has extended calling
1707 // conventions such as preserve_most and preserve_all which treat these as
1708 // CSR. As such, the ARM64 unwind uOPs bias registers by 19. We use ARM64EC
1709 // uOPs which have separate restrictions. We need to check for that.
1710 //
1711 // NOTE: we currently do not account for the D registers as LLVM does not
1712 // support non-ABI compliant D register spills.
1713 bool SpillExtendedVolatile =
1714 IsWindows && llvm::any_of(CSI, [](const CalleeSavedInfo &CSI) {
1715 const auto &Reg = CSI.getReg();
1716 return Reg >= AArch64::X0 && Reg <= AArch64::X18;
1717 });
1718
1719 int ZPRByteOffset = 0;
1720 int PPRByteOffset = 0;
1721 bool SplitPPRs = AFI->hasSplitSVEObjects();
1722 if (SplitPPRs) {
1723 ZPRByteOffset = AFI->getZPRCalleeSavedStackSize();
1724 PPRByteOffset = AFI->getPPRCalleeSavedStackSize();
1725 } else if (!FPAfterSVECalleeSaves) {
1726 ZPRByteOffset =
1728 // Unused: Everything goes in ZPR space.
1729 PPRByteOffset = 0;
1730 }
1731
1732 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
1733 Register LastReg = 0;
1734 bool HasCSHazardPadding = AFI->hasStackHazardSlotIndex() && !SplitPPRs;
1735
1736 auto AlignOffset = [StackFillDir](int Offset, int Align) {
1737 if (StackFillDir < 0)
1738 return alignDown(Offset, Align);
1739 return alignTo(Offset, Align);
1740 };
1741
1742 // When iterating backwards, the loop condition relies on unsigned wraparound.
1743 for (unsigned i = FirstReg; i < Count; i += RegInc) {
1744 RegPairInfo RPI;
1745 RPI.Reg1 = CSI[i].getReg();
1746
1747 if (AArch64::GPR64RegClass.contains(RPI.Reg1)) {
1748 RPI.Type = RegPairInfo::GPR;
1749 RPI.RC = &AArch64::GPR64RegClass;
1750 } else if (AArch64::FPR64RegClass.contains(RPI.Reg1)) {
1751 RPI.Type = RegPairInfo::FPR64;
1752 RPI.RC = &AArch64::FPR64RegClass;
1753 } else if (AArch64::FPR128RegClass.contains(RPI.Reg1)) {
1754 RPI.Type = RegPairInfo::FPR128;
1755 RPI.RC = &AArch64::FPR128RegClass;
1756 } else if (AArch64::ZPRRegClass.contains(RPI.Reg1)) {
1757 RPI.Type = RegPairInfo::ZPR;
1758 RPI.RC = &AArch64::ZPRRegClass;
1759 } else if (AArch64::PPRRegClass.contains(RPI.Reg1)) {
1760 RPI.Type = RegPairInfo::PPR;
1761 RPI.RC = &AArch64::PPRRegClass;
1762 } else if (RPI.Reg1 == AArch64::VG) {
1763 RPI.Type = RegPairInfo::VG;
1764 RPI.RC = &AArch64::FIXED_REGSRegClass;
1765 } else {
1766 llvm_unreachable("Unsupported register class.");
1767 }
1768
1769 int &ScalableByteOffset = RPI.Type == RegPairInfo::PPR && SplitPPRs
1770 ? PPRByteOffset
1771 : ZPRByteOffset;
1772
1773 // Add the stack hazard size as we transition from GPR->FPR CSRs.
1774 if (HasCSHazardPadding &&
1775 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
1777 ByteOffset += StackFillDir * StackHazardSize;
1778 LastReg = RPI.Reg1;
1779
1780 bool NeedsWinCFI = AFL.needsWinCFI(MF);
1781 int Scale = TRI->getSpillSize(*RPI.RC);
1782 // Add the next reg to the pair if it is in the same register class.
1783 if (unsigned(i + RegInc) < Count && !HasCSHazardPadding) {
1784 MCRegister NextReg = CSI[i + RegInc].getReg();
1785 unsigned SpillCount = NeedsWinCFI ? FirstReg - i : i;
1786 switch (RPI.Type) {
1787 case RegPairInfo::GPR:
1788 if (AArch64::GPR64RegClass.contains(NextReg) &&
1789 !invalidateRegisterPairing(SpillExtendedVolatile, SpillCount,
1790 RPI.Reg1, NextReg, IsWindows,
1791 NeedsWinCFI, NeedsFrameRecord, TRI))
1792 RPI.Reg2 = NextReg;
1793 break;
1794 case RegPairInfo::FPR64:
1795 if (AArch64::FPR64RegClass.contains(NextReg) &&
1796 !invalidateRegisterPairing(SpillExtendedVolatile, SpillCount,
1797 RPI.Reg1, NextReg, IsWindows,
1798 NeedsWinCFI, NeedsFrameRecord, TRI))
1799 RPI.Reg2 = NextReg;
1800 break;
1801 case RegPairInfo::FPR128:
1802 if (AArch64::FPR128RegClass.contains(NextReg))
1803 RPI.Reg2 = NextReg;
1804 break;
1805 case RegPairInfo::PPR:
1806 break;
1807 case RegPairInfo::ZPR:
1808 if (AFI->getPredicateRegForFillSpill() != 0 &&
1809 ((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1)) {
1810 // Calculate offset of register pair to see if pair instruction can be
1811 // used.
1812 int Offset = (ScalableByteOffset + StackFillDir * 2 * Scale) / Scale;
1813 if ((-16 <= Offset && Offset <= 14) && (Offset % 2 == 0))
1814 RPI.Reg2 = NextReg;
1815 }
1816 break;
1817 case RegPairInfo::VG:
1818 break;
1819 }
1820 }
1821
1822 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
1823 // list to come in sorted by frame index so that we can issue the store
1824 // pair instructions directly. Assert if we see anything otherwise.
1825 //
1826 // The order of the registers in the list is controlled by
1827 // getCalleeSavedRegs(), so they will always be in-order, as well.
1828 assert((!RPI.isPaired() ||
1829 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
1830 "Out of order callee saved regs!");
1831
1832 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
1833 RPI.Reg1 == AArch64::LR) &&
1834 "FrameRecord must be allocated together with LR");
1835
1836 // Windows AAPCS has FP and LR reversed.
1837 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
1838 RPI.Reg2 == AArch64::LR) &&
1839 "FrameRecord must be allocated together with LR");
1840
1841 // MachO's compact unwind format relies on all registers being stored in
1842 // adjacent register pairs.
1843 assert((!produceCompactUnwindFrame(AFL, MF) ||
1846 (RPI.isPaired() &&
1847 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
1848 RPI.Reg1 + 1 == RPI.Reg2))) &&
1849 "Callee-save registers not saved as adjacent register pair!");
1850
1851 RPI.FrameIdx = CSI[i].getFrameIdx();
1852 if (IsWindows &&
1853 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
1854 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
1855
1856 // Realign the scalable offset if necessary. This is relevant when spilling
1857 // predicates on Windows.
1858 if (RPI.isScalable() && ScalableByteOffset % Scale != 0)
1859 ScalableByteOffset = AlignOffset(ScalableByteOffset, Scale);
1860
1861 // Realign the fixed offset if necessary. This is relevant when spilling Q
1862 // registers after spilling an odd amount of X registers.
1863 if (!RPI.isScalable() && ByteOffset % Scale != 0)
1864 ByteOffset = AlignOffset(ByteOffset, Scale);
1865
1866 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1867 assert(OffsetPre % Scale == 0);
1868
1869 if (RPI.isScalable())
1870 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1871 else
1872 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1873
1874 // Swift's async context is directly before FP, so allocate an extra
1875 // 8 bytes for it.
1876 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1877 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1878 (IsWindows && RPI.Reg2 == AArch64::LR)))
1879 ByteOffset += StackFillDir * 8;
1880
1881 // Round up size of non-pair to pair size if we need to pad the
1882 // callee-save area to ensure 16-byte alignment.
1883 if (NeedGapToAlignStack && !IsWindows && !RPI.isScalable() &&
1884 RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
1885 ByteOffset % 16 != 0) {
1886 ByteOffset += 8 * StackFillDir;
1887 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
1888 // A stack frame with a gap looks like this, bottom up:
1889 // d9, d8. x21, gap, x20, x19.
1890 // Set extra alignment on the x21 object to create the gap above it.
1891 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
1892 NeedGapToAlignStack = false;
1893 }
1894
1895 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1896 assert(OffsetPost % Scale == 0);
1897 // If filling top down (default), we want the offset after incrementing it.
1898 // If filling bottom up (WinCFI) we need the original offset.
1899 int Offset = IsWindows ? OffsetPre : OffsetPost;
1900
1901 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
1902 // Swift context can directly precede FP.
1903 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1904 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1905 (IsWindows && RPI.Reg2 == AArch64::LR)))
1906 Offset += 8;
1907 RPI.Offset = Offset / Scale;
1908
1909 assert((!RPI.isPaired() ||
1910 (!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
1911 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
1912 "Offset out of bounds for LDP/STP immediate");
1913
1914 auto isFrameRecord = [&] {
1915 if (RPI.isPaired())
1916 return IsWindows ? RPI.Reg1 == AArch64::FP && RPI.Reg2 == AArch64::LR
1917 : RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP;
1918 // Otherwise, look for the frame record as two unpaired registers. This is
1919 // needed for -aarch64-stack-hazard-size=<val>, which disables register
1920 // pairing (as the padding may be too large for the LDP/STP offset). Note:
1921 // On Windows, this check works out as current reg == FP, next reg == LR,
1922 // and on other platforms current reg == FP, previous reg == LR. This
1923 // works out as the correct pre-increment or post-increment offsets
1924 // respectively.
1925 return i > 0 && RPI.Reg1 == AArch64::FP &&
1926 CSI[i - 1].getReg() == AArch64::LR;
1927 };
1928
1929 // Save the offset to frame record so that the FP register can point to the
1930 // innermost frame record (spilled FP and LR registers).
1931 if (NeedsFrameRecord && isFrameRecord())
1933
1934 RegPairs.push_back(RPI);
1935 if (RPI.isPaired())
1936 i += RegInc;
1937 }
1938 if (IsWindows) {
1939 // If we need an alignment gap in the stack, align the topmost stack
1940 // object. A stack frame with a gap looks like this, bottom up:
1941 // x19, d8. d9, gap.
1942 // Set extra alignment on the topmost stack object (the first element in
1943 // CSI, which goes top down), to create the gap above it.
1944 if (AFI->hasCalleeSaveStackFreeSpace())
1945 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
1946 // We iterated bottom up over the registers; flip RegPairs back to top
1947 // down order.
1948 std::reverse(RegPairs.begin(), RegPairs.end());
1949 }
1950}
1951
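// Illustrative sketch (not part of LLVM): how the top-down fill in
// computeCalleeSaveRegisterPairs() turns a running byte offset into scaled
// LDP/STP immediates for 8-byte registers. SlotPair and assignOffsets are
// hypothetical names, and starting the byte offset at the total callee-save
// size is an assumption about the non-Windows case.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for RegPairInfo: one or two 8-byte slots.
struct SlotPair {
  bool Paired;
  int Offset; // Scaled by 8, like an STP/LDP immediate.
};

// Assign scaled offsets top-down, starting from the size of the callee-save
// area in bytes. Each entry of PairedFlags says whether that spill is a pair.
std::vector<SlotPair> assignOffsets(const std::vector<bool> &PairedFlags,
                                    int TotalBytes) {
  const int Scale = 8;         // Spill size of one 64-bit register.
  const int StackFillDir = -1; // Top-down fill.
  assert(TotalBytes % Scale == 0 && "area must be a multiple of the scale");
  int ByteOffset = TotalBytes;
  std::vector<SlotPair> Result;
  for (bool Paired : PairedFlags) {
    // Move the offset first, then record it (the "post" offset).
    ByteOffset += StackFillDir * (Paired ? 2 * Scale : Scale);
    Result.push_back({Paired, ByteOffset / Scale});
  }
  return Result;
}
```

// With three pairs and a 48-byte area this gives the scaled offsets 4, 2, 0,
// the same immediates that appear in the spillCalleeSavedRegisters() example.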
1955 MachineFunction &MF = *MBB.getParent();
1956 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1957 auto &TLI = *Subtarget.getTargetLowering();
1958 const AArch64InstrInfo &TII = *Subtarget.getInstrInfo();
1959 bool NeedsWinCFI = needsWinCFI(MF);
1960 DebugLoc DL;
1962
1963 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
1964
1965 MachineRegisterInfo &MRI = MF.getRegInfo();
1966 // Refresh the reserved regs in case there are any potential changes since the
1967 // last freeze.
1968 MRI.freezeReservedRegs();
1969
1970 if (homogeneousPrologEpilog(MF)) {
1971 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
1973
1974 for (auto &RPI : RegPairs) {
1975 MIB.addReg(RPI.Reg1);
1976 MIB.addReg(RPI.Reg2);
1977
1978 // Update register live in.
1979 if (!MRI.isReserved(RPI.Reg1))
1980 MBB.addLiveIn(RPI.Reg1);
1981 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
1982 MBB.addLiveIn(RPI.Reg2);
1983 }
1984 return true;
1985 }
1986 bool PTrueCreated = false;
1987 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
1988 Register Reg1 = RPI.Reg1;
1989 Register Reg2 = RPI.Reg2;
1990 unsigned StrOpc;
1991
1992 // Issue sequence of spills for cs regs. The first spill may be converted
1993 // to a pre-decrement store later by emitPrologue if the callee-save stack
1994 // area allocation can't be combined with the local stack area allocation.
1995 // For example:
1996 // stp x22, x21, [sp, #0] // addImm(+0)
1997 // stp x20, x19, [sp, #16] // addImm(+2)
1998 // stp fp, lr, [sp, #32] // addImm(+4)
1999 // Rationale: This sequence saves uop updates compared to a sequence of
2000 // pre-decrement spills like stp xi,xj,[sp,#-16]!
2001 // Note: Similar rationale and sequence for restores in epilog.
2002 unsigned Size = TRI->getSpillSize(*RPI.RC);
2003 Align Alignment = TRI->getSpillAlign(*RPI.RC);
2004 switch (RPI.Type) {
2005 case RegPairInfo::GPR:
2006 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
2007 break;
2008 case RegPairInfo::FPR64:
2009 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
2010 break;
2011 case RegPairInfo::FPR128:
2012 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
2013 break;
2014 case RegPairInfo::ZPR:
2015 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
2016 break;
2017 case RegPairInfo::PPR:
2018 StrOpc = AArch64::STR_PXI;
2019 break;
2020 case RegPairInfo::VG:
2021 StrOpc = AArch64::STRXui;
2022 break;
2023 }
2024
2025 Register X0Scratch;
2026 llvm::scope_exit RestoreX0([&] {
2027 if (X0Scratch != AArch64::NoRegister)
2028 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), AArch64::X0)
2029 .addReg(X0Scratch)
2031 });
2032
2033 if (Reg1 == AArch64::VG) {
2034 // Find an available register to store value of VG to.
2035 Reg1 = findScratchNonCalleeSaveRegister(&MBB, true);
2036 assert(Reg1 != AArch64::NoRegister);
2037 if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
2038 BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
2039 .addImm(31)
2040 .addImm(1)
2042 } else {
2044 if (any_of(MBB.liveins(),
2045 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
2046 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
2047 AArch64::X0, LiveIn.PhysReg);
2048 })) {
2049 X0Scratch = Reg1;
2050 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), X0Scratch)
2051 .addReg(AArch64::X0)
2053 }
2054
2055 RTLIB::Libcall LC = RTLIB::SMEABI_GET_CURRENT_VG;
2056 const uint32_t *RegMask =
2057 TRI->getCallPreservedMask(MF, TLI.getLibcallCallingConv(LC));
2058 BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
2059 .addExternalSymbol(TLI.getLibcallName(LC))
2060 .addRegMask(RegMask)
2061 .addReg(AArch64::X0, RegState::ImplicitDefine)
2063 Reg1 = AArch64::X0;
2064 }
2065 }
2066
2067 LLVM_DEBUG({
2068 dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
2069 if (RPI.isPaired())
2070 dbgs() << ", " << printReg(Reg2, TRI);
2071 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2072 if (RPI.isPaired())
2073 dbgs() << ", " << RPI.FrameIdx + 1;
2074 dbgs() << ")\n";
2075 });
2076
2077 assert((!isTargetWindows(MF) ||
2078 !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
2079 "Windows unwinding requires a consecutive (FP,LR) pair");
2080 // Windows unwind codes require consecutive registers if registers are
2081 // paired. Make the switch here, so that the code below will save (x,x+1)
2082 // and not (x+1,x).
2083 unsigned FrameIdxReg1 = RPI.FrameIdx;
2084 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2085 if (isTargetWindows(MF) && RPI.isPaired()) {
2086 std::swap(Reg1, Reg2);
2087 std::swap(FrameIdxReg1, FrameIdxReg2);
2088 }
2089
2090 if (RPI.isPaired() && RPI.isScalable()) {
2091 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2094 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2095 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2096 "Expects SVE2.1 or SME2 target and a predicate register");
2097#ifdef EXPENSIVE_CHECKS
2098 auto IsPPR = [](const RegPairInfo &c) {
2099 return c.Type == RegPairInfo::PPR;
2100 };
2101 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
2102 auto IsZPR = [](const RegPairInfo &c) {
2103 return c.Type == RegPairInfo::ZPR;
2104 };
2105 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
2106 assert(!(PPRBegin < ZPRBegin) &&
2107 "Expected callee save predicate to be handled first");
2108#endif
2109 if (!PTrueCreated) {
2110 PTrueCreated = true;
2111 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2113 }
2114 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2115 if (!MRI.isReserved(Reg1))
2116 MBB.addLiveIn(Reg1);
2117 if (!MRI.isReserved(Reg2))
2118 MBB.addLiveIn(Reg2);
2119 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
2121 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2122 MachineMemOperand::MOStore, Size, Alignment));
2123 MIB.addReg(PnReg);
2124 MIB.addReg(AArch64::SP)
2125 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale],
2126 // where 2*vscale is implicit
2129 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2130 MachineMemOperand::MOStore, Size, Alignment));
2131 if (NeedsWinCFI)
2132 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2133 } else { // The code when the pair of ZReg is not present
2134 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2135 if (!MRI.isReserved(Reg1))
2136 MBB.addLiveIn(Reg1);
2137 if (RPI.isPaired()) {
2138 if (!MRI.isReserved(Reg2))
2139 MBB.addLiveIn(Reg2);
2140 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
2142 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2143 MachineMemOperand::MOStore, Size, Alignment));
2144 }
2145 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
2146 .addReg(AArch64::SP)
2147 .addImm(RPI.Offset) // [sp, #offset*vscale],
2148 // where factor*vscale is implicit
2151 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2152 MachineMemOperand::MOStore, Size, Alignment));
2153 if (NeedsWinCFI)
2154 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2155 }
2156 // Update the StackIDs of the SVE stack slots.
2157 MachineFrameInfo &MFI = MF.getFrameInfo();
2158 if (RPI.Type == RegPairInfo::ZPR) {
2159 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
2160 if (RPI.isPaired())
2161 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
2162 } else if (RPI.Type == RegPairInfo::PPR) {
2164 if (RPI.isPaired())
2166 }
2167 }
2168 return true;
2169}
2170
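// Illustrative sketch (not part of LLVM): the Windows operand swap used in
// spillCalleeSavedRegisters() above. SpillOperands and orderForTarget are
// hypothetical names, and plain integers stand in for registers and frame
// indices.

```cpp
#include <cassert>
#include <utility>

// The two registers of a spill and the frame index each one is stored to.
struct SpillOperands {
  unsigned Reg1, Reg2;
  int FrameIdx1, FrameIdx2;
};

// On Windows, paired spills swap both the registers and their frame indices
// so that the emitted pair is (x, x+1) rather than (x+1, x).
SpillOperands orderForTarget(unsigned Reg1, unsigned Reg2, int FrameIdx,
                             bool IsWindows, bool IsPaired) {
  assert(FrameIdx >= 0 && "sketch only models non-negative frame indices");
  SpillOperands Ops{Reg1, Reg2, FrameIdx, FrameIdx + 1};
  if (IsWindows && IsPaired) {
    std::swap(Ops.Reg1, Ops.Reg2);
    std::swap(Ops.FrameIdx1, Ops.FrameIdx2);
  }
  return Ops;
}
```

// Each register keeps its own frame index after the swap; only the operand
// order changes.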
2174 MachineFunction &MF = *MBB.getParent();
2175 const AArch64InstrInfo &TII =
2176 *MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
2177 DebugLoc DL;
2179 bool NeedsWinCFI = needsWinCFI(MF);
2180
2181 if (MBBI != MBB.end())
2182 DL = MBBI->getDebugLoc();
2183
2184 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
2185 if (homogeneousPrologEpilog(MF, &MBB)) {
2186 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
2188 for (auto &RPI : RegPairs) {
2189 MIB.addReg(RPI.Reg1, RegState::Define);
2190 MIB.addReg(RPI.Reg2, RegState::Define);
2191 }
2192 return true;
2193 }
2194
2195 // For performance reasons, restore SVE registers in increasing order.
2196 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
2197 auto PPRBegin = llvm::find_if(RegPairs, IsPPR);
2198 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
2199 std::reverse(PPRBegin, PPREnd);
2200 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
2201 auto ZPRBegin = llvm::find_if(RegPairs, IsZPR);
2202 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
2203 std::reverse(ZPRBegin, ZPREnd);
2204
2205 bool PTrueCreated = false;
2206 for (const RegPairInfo &RPI : RegPairs) {
2207 Register Reg1 = RPI.Reg1;
2208 Register Reg2 = RPI.Reg2;
2209
2210 // Issue sequence of restores for cs regs. The last restore may be converted
2211 // to a post-increment load later by emitEpilogue if the callee-save stack
2212 // area allocation can't be combined with the local stack area allocation.
2213 // For example:
2214 // ldp fp, lr, [sp, #32] // addImm(+4)
2215 // ldp x20, x19, [sp, #16] // addImm(+2)
2216 // ldp x22, x21, [sp, #0] // addImm(+0)
2217 // Note: see comment in spillCalleeSavedRegisters()
2218 unsigned LdrOpc;
2219 unsigned Size = TRI->getSpillSize(*RPI.RC);
2220 Align Alignment = TRI->getSpillAlign(*RPI.RC);
2221 switch (RPI.Type) {
2222 case RegPairInfo::GPR:
2223 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
2224 break;
2225 case RegPairInfo::FPR64:
2226 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
2227 break;
2228 case RegPairInfo::FPR128:
2229 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
2230 break;
2231 case RegPairInfo::ZPR:
2232 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
2233 break;
2234 case RegPairInfo::PPR:
2235 LdrOpc = AArch64::LDR_PXI;
2236 break;
2237 case RegPairInfo::VG:
2238 continue;
2239 }
2240 LLVM_DEBUG({
2241 dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
2242 if (RPI.isPaired())
2243 dbgs() << ", " << printReg(Reg2, TRI);
2244 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2245 if (RPI.isPaired())
2246 dbgs() << ", " << RPI.FrameIdx + 1;
2247 dbgs() << ")\n";
2248 });
2249
2250 // Windows unwind codes require consecutive registers if registers are
2251 // paired. Make the switch here, so that the code below will restore
2252 // (x,x+1) and not (x+1,x).
2253 unsigned FrameIdxReg1 = RPI.FrameIdx;
2254 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2255 if (isTargetWindows(MF) && RPI.isPaired()) {
2256 std::swap(Reg1, Reg2);
2257 std::swap(FrameIdxReg1, FrameIdxReg2);
2258 }
2259
2261 if (RPI.isPaired() && RPI.isScalable()) {
2262 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2264 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2265 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2266 "Expects SVE2.1 or SME2 target and a predicate register");
2267#ifdef EXPENSIVE_CHECKS
2268 assert(!(PPRBegin < ZPRBegin) &&
2269 "Expected callee save predicate to be handled first");
2270#endif
2271 if (!PTrueCreated) {
2272 PTrueCreated = true;
2273 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2275 }
2276 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2277 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
2278 getDefRegState(true));
2280 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2281 MachineMemOperand::MOLoad, Size, Alignment));
2282 MIB.addReg(PnReg);
2283 MIB.addReg(AArch64::SP)
2284 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale]
2285 // where 2*vscale is implicit
2288 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2289 MachineMemOperand::MOLoad, Size, Alignment));
2290 if (NeedsWinCFI)
2291 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2292 } else {
2293 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2294 if (RPI.isPaired()) {
2295 MIB.addReg(Reg2, getDefRegState(true));
2297 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2298 MachineMemOperand::MOLoad, Size, Alignment));
2299 }
2300 MIB.addReg(Reg1, getDefRegState(true));
2301 MIB.addReg(AArch64::SP)
2302 .addImm(RPI.Offset) // [sp, #offset*vscale]
2303 // where factor*vscale is implicit
2306 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2307 MachineMemOperand::MOLoad, Size, Alignment));
2308 if (NeedsWinCFI)
2309 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2310 }
2311 }
2312 return true;
2313}
2314
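// Illustrative sketch (not part of LLVM): reversing one contiguous run of
// entries of a given kind, as restoreCalleeSavedRegisters() above does for
// the PPR and ZPR runs. Kind, Entry and reverseRun are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class Kind { GPR, PPR, ZPR };

struct Entry {
  Kind Type;
  int Id;
};

// Reverse the first contiguous run of entries whose type is K, in place.
// If there is no such entry, the vector is left unchanged.
void reverseRun(std::vector<Entry> &V, Kind K) {
  auto IsK = [K](const Entry &E) { return E.Type == K; };
  auto Begin = std::find_if(V.begin(), V.end(), IsK);
  auto End = std::find_if_not(Begin, V.end(), IsK);
  // The sketch assumes all entries of one kind sit next to each other.
  assert(std::none_of(End, V.end(), IsK) && "run must be contiguous");
  std::reverse(Begin, End);
}
```

// Entries of other kinds keep their positions; only the order inside the
// matching run changes.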
2315// Return the FrameID for an MMO.
2316static std::optional<int> getMMOFrameID(MachineMemOperand *MMO,
2317 const MachineFrameInfo &MFI) {
2318 auto *PSV =
2320 if (PSV)
2321 return std::optional<int>(PSV->getFrameIndex());
2322
2323 if (MMO->getValue()) {
2324 if (auto *Al = dyn_cast<AllocaInst>(getUnderlyingObject(MMO->getValue()))) {
2325 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd();
2326 FI++)
2327 if (MFI.getObjectAllocation(FI) == Al)
2328 return FI;
2329 }
2330 }
2331
2332 return std::nullopt;
2333}
2334
2335// Return the FrameID for a Load/Store instruction by looking at the first MMO.
2336static std::optional<int> getLdStFrameID(const MachineInstr &MI,
2337 const MachineFrameInfo &MFI) {
2338 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
2339 return std::nullopt;
2340
2341 return getMMOFrameID(*MI.memoperands_begin(), MFI);
2342}
2343
2344// Returns true if the LDST MachineInstr \p MI is a PPR access.
2345static bool isPPRAccess(const MachineInstr &MI) {
2346 return AArch64::PPRRegClass.contains(MI.getOperand(0).getReg());
2347}
2348
2349// Check if a Hazard slot is needed for the current function, and if so create
2350// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
2351// which can be used to determine if any hazard padding is needed.
2352void AArch64FrameLowering::determineStackHazardSlot(
2353 MachineFunction &MF, BitVector &SavedRegs) const {
2354 unsigned StackHazardSize = getStackHazardSize(MF);
2355 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2356 if (StackHazardSize == 0 || StackHazardSize % 16 != 0 ||
2358 return;
2359
2360 // Stack hazards are only needed in streaming functions.
2361 SMEAttrs Attrs = AFI->getSMEFnAttrs();
2362 if (!StackHazardInNonStreaming && Attrs.hasNonStreamingInterfaceAndBody())
2363 return;
2364
2365 MachineFrameInfo &MFI = MF.getFrameInfo();
2366
2367 // Add a hazard slot if there are any FPR CSRs, or any fp-only stack
2368 // objects.
2369 bool HasFPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2370 return AArch64::FPR64RegClass.contains(Reg) ||
2371 AArch64::FPR128RegClass.contains(Reg) ||
2372 AArch64::ZPRRegClass.contains(Reg);
2373 });
2374 bool HasPPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2375 return AArch64::PPRRegClass.contains(Reg);
2376 });
2377 bool HasFPRStackObjects = false;
2378 bool HasPPRStackObjects = false;
2379 if (!HasFPRCSRs || SplitSVEObjects) {
2380 enum SlotType : uint8_t {
2381 Unknown = 0,
2382 ZPRorFPR = 1 << 0,
2383 PPR = 1 << 1,
2384 GPR = 1 << 2,
2386 };
2387
2388 // Find stack slots solely used for one kind of register (ZPR, PPR, etc.),
2389 // based on the kinds of accesses used in the function.
2390 SmallVector<SlotType> SlotTypes(MFI.getObjectIndexEnd(), SlotType::Unknown);
2391 for (auto &MBB : MF) {
2392 for (auto &MI : MBB) {
2393 std::optional<int> FI = getLdStFrameID(MI, MFI);
2394 if (!FI || FI < 0 || FI >= int(SlotTypes.size()))
2395 continue;
2396 if (MFI.hasScalableStackID(*FI)) {
2397 SlotTypes[*FI] |=
2398 isPPRAccess(MI) ? SlotType::PPR : SlotType::ZPRorFPR;
2399 } else {
2400 SlotTypes[*FI] |= AArch64InstrInfo::isFpOrNEON(MI)
2401 ? SlotType::ZPRorFPR
2402 : SlotType::GPR;
2403 }
2404 }
2405 }
2406
2407 for (int FI = 0; FI < int(SlotTypes.size()); ++FI) {
2408 HasFPRStackObjects |= SlotTypes[FI] == SlotType::ZPRorFPR;
2409 // For SplitSVEObjects, remember that this stack slot is a predicate; this
2410 // will be needed later when determining the frame layout.
2411 if (SlotTypes[FI] == SlotType::PPR) {
2413 HasPPRStackObjects = true;
2414 }
2415 }
2416 }
2417
2418 if (HasFPRCSRs || HasFPRStackObjects) {
2419 int ID = MFI.CreateStackObject(StackHazardSize, Align(16), false);
2420 LLVM_DEBUG(dbgs() << "Created Hazard slot at " << ID << " size "
2421 << StackHazardSize << "\n");
2423 }
2424
2425 if (!AFI->hasStackHazardSlotIndex())
2426 return;
2427
2428 if (SplitSVEObjects) {
2429 CallingConv::ID CC = MF.getFunction().getCallingConv();
2430 if (AFI->isSVECC() || CC == CallingConv::AArch64_SVE_VectorCall) {
2431 AFI->setSplitSVEObjects(true);
2432 LLVM_DEBUG(dbgs() << "Using SplitSVEObjects for SVE CC function\n");
2433 return;
2434 }
2435
2436 // We only use SplitSVEObjects in non-SVE CC functions if there's a
2437 // possibility of a stack hazard between PPRs and ZPRs/FPRs.
2438 LLVM_DEBUG(dbgs() << "Determining if SplitSVEObjects should be used in "
2439 "non-SVE CC function...\n");
2440
2441 // If another calling convention is explicitly set FPRs can't be promoted to
2442 // ZPR callee-saves.
2444 LLVM_DEBUG(
2445 dbgs()
2446 << "Calling convention is not supported with SplitSVEObjects\n");
2447 return;
2448 }
2449
2450 if (!HasPPRCSRs && !HasPPRStackObjects) {
2451 LLVM_DEBUG(
2452 dbgs() << "Not using SplitSVEObjects as no PPRs are on the stack\n");
2453 return;
2454 }
2455
2456 if (!HasFPRCSRs && !HasFPRStackObjects) {
2457 LLVM_DEBUG(
2458 dbgs()
2459 << "Not using SplitSVEObjects as no FPRs or ZPRs are on the stack\n");
2460 return;
2461 }
2462
2463 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2464 MF.getSubtarget<AArch64Subtarget>();
2466 "Expected SVE to be available for PPRs");
2467
2468 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
2469 // With SplitSVEObjects the CS hazard padding is placed between the
2470 // PPRs and ZPRs. If there are any FPR CS there would be a hazard between
2471 // them and the CS GPRs. Avoid this by promoting all FPR CS to ZPRs.
2472 BitVector FPRZRegs(SavedRegs.size());
2473 for (size_t Reg = 0, E = SavedRegs.size(); HasFPRCSRs && Reg < E; ++Reg) {
2474 BitVector::reference RegBit = SavedRegs[Reg];
2475 if (!RegBit)
2476 continue;
2477 unsigned SubRegIdx = 0;
2478 if (AArch64::FPR64RegClass.contains(Reg))
2479 SubRegIdx = AArch64::dsub;
2480 else if (AArch64::FPR128RegClass.contains(Reg))
2481 SubRegIdx = AArch64::zsub;
2482 else
2483 continue;
2484 // Clear the bit for the FPR save.
2485 RegBit = false;
2486 // Mark that we should save the corresponding ZPR.
2487 Register ZReg =
2488 TRI->getMatchingSuperReg(Reg, SubRegIdx, &AArch64::ZPRRegClass);
2489 FPRZRegs.set(ZReg);
2490 }
2491 SavedRegs |= FPRZRegs;
2492
2493 AFI->setSplitSVEObjects(true);
2494 LLVM_DEBUG(dbgs() << "SplitSVEObjects enabled!\n");
2495 }
2496}
2497
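// Illustrative sketch (not part of LLVM): the bit-flag slot classification
// used by determineStackHazardSlot() above. A slot counts as FPR-only when
// every recorded access to it was a ZPR/FPR access. Access, classifySlots
// and hasFPROnlySlot are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum SlotType : uint8_t {
  Unknown = 0,
  ZPRorFPR = 1 << 0,
  PPR = 1 << 1,
  GPR = 1 << 2,
};

// One load or store: the frame index it touches and the kind of access.
struct Access {
  int FrameIdx;
  SlotType Type;
};

// OR together the access kinds seen for each slot; out-of-range indices are
// ignored.
std::vector<uint8_t> classifySlots(int NumSlots,
                                   const std::vector<Access> &Accesses) {
  assert(NumSlots >= 0);
  std::vector<uint8_t> Slots(static_cast<std::size_t>(NumSlots), 0);
  for (const Access &A : Accesses) {
    if (A.FrameIdx < 0 || A.FrameIdx >= NumSlots)
      continue;
    Slots[A.FrameIdx] |= A.Type;
  }
  return Slots;
}

// True if some slot was accessed only as ZPR/FPR.
bool hasFPROnlySlot(const std::vector<uint8_t> &Slots) {
  for (uint8_t S : Slots)
    if (S == ZPRorFPR)
      return true;
  return false;
}
```

// A slot touched by both FPR and GPR accesses ends up with two bits set, so
// it does not compare equal to ZPRorFPR and does not count as FPR-only.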
2499 BitVector &SavedRegs,
2500 RegScavenger *RS) const {
2501 // All calls are tail calls in GHC calling conv, and functions have no
2502 // prologue/epilogue.
2504 return;
2505
2506 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2507
2509 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
2511 unsigned UnspilledCSGPR = AArch64::NoRegister;
2512 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
2513
2514 MachineFrameInfo &MFI = MF.getFrameInfo();
2515 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
2516
2517 MCRegister BasePointerReg =
2518 RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister() : MCRegister();
2519
2520 unsigned ExtraCSSpill = 0;
2521 bool HasUnpairedGPR64 = false;
2522 bool HasPairZReg = false;
2523 BitVector UserReservedRegs = RegInfo->getUserReservedRegs(MF);
2524 BitVector ReservedRegs = RegInfo->getReservedRegs(MF);
2525
2526 // Figure out which callee-saved registers to save/restore.
2527 for (unsigned i = 0; CSRegs[i]; ++i) {
2528 const MCRegister Reg = CSRegs[i];
2529
2530 // Add the base pointer register to SavedRegs if it is callee-save.
2531 if (Reg == BasePointerReg)
2532 SavedRegs.set(Reg);
2533
2534 // Don't save manually reserved registers set through +reserve-x#i,
2535 // even for callee-saved registers, as per GCC's behavior.
2536 if (UserReservedRegs[Reg]) {
2537 SavedRegs.reset(Reg);
2538 continue;
2539 }
2540
2541 bool RegUsed = SavedRegs.test(Reg);
2542 MCRegister PairedReg;
2543 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
2544 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
2545 AArch64::FPR128RegClass.contains(Reg)) {
2546 // Compensate for odd numbers of GP CSRs.
2547 // For now, all the known cases of odd number of CSRs are of GPRs.
2548 if (HasUnpairedGPR64)
2549 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
2550 else
2551 PairedReg = CSRegs[i ^ 1];
2552 }
2553
2554 // If the function needs to save all the GP registers (SavedRegs) and
2555 // there is an odd number of GP CSRs (CSRegs), PairedReg could be in a
2556 // different register class from Reg, which would lead to an FPR (usually
2557 // D8) accidentally being marked as saved.
2558 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
2559 PairedReg = AArch64::NoRegister;
2560 HasUnpairedGPR64 = true;
2561 }
2562 assert(PairedReg == AArch64::NoRegister ||
2563 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
2564 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
2565 AArch64::FPR128RegClass.contains(Reg, PairedReg));
2566
2567 if (!RegUsed) {
2568 if (AArch64::GPR64RegClass.contains(Reg) && !ReservedRegs[Reg]) {
2569 UnspilledCSGPR = Reg;
2570 UnspilledCSGPRPaired = PairedReg;
2571 }
2572 continue;
2573 }
2574
2575 // MachO's compact unwind format relies on all registers being stored in
2576 // pairs.
2577 // FIXME: the usual format is actually better if unwinding isn't needed.
2578 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
2579 !SavedRegs.test(PairedReg)) {
2580 SavedRegs.set(PairedReg);
2581 if (AArch64::GPR64RegClass.contains(PairedReg) &&
2582 !ReservedRegs[PairedReg])
2583 ExtraCSSpill = PairedReg;
2584 }
2585 // Check if there is a pair of ZRegs, so it can select PReg for spill/fill
2586 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
2587 SavedRegs.test(CSRegs[i ^ 1]));
2588 }
2589
2590 if (HasPairZReg && enableMultiVectorSpillFill(Subtarget, MF)) {
2592 // Find a suitable predicate register for the multi-vector spill/fill
2593 // instructions.
2594 MCRegister PnReg = findFreePredicateReg(SavedRegs);
2595 if (PnReg.isValid())
2596 AFI->setPredicateRegForFillSpill(PnReg);
2597 // If no free callee-save has been found assign one.
2598 if (!AFI->getPredicateRegForFillSpill() &&
2599 MF.getFunction().getCallingConv() ==
2601 SavedRegs.set(AArch64::P8);
2602 AFI->setPredicateRegForFillSpill(AArch64::PN8);
2603 }
2604
2605 assert(!ReservedRegs[AFI->getPredicateRegForFillSpill()] &&
2606 "Predicate cannot be a reserved register");
2607 }
2608
2610 !Subtarget.isTargetWindows()) {
2611 // For Windows calling convention on a non-windows OS, where X18 is treated
2612 // as reserved, back up X18 when entering non-windows code (marked with the
2613 // Windows calling convention) and restore when returning regardless of
2614 // whether the individual function uses it - it might call other functions
2615 // that clobber it.
2616 SavedRegs.set(AArch64::X18);
2617 }
2618
2619 // Determine if a Hazard slot should be used and where it should go.
2620 // If SplitSVEObjects is used, the hazard padding is placed between the PPRs
2621 // and ZPRs. Otherwise, it goes in the callee save area.
2622 determineStackHazardSlot(MF, SavedRegs);
2623
2624 // Calculate the callee-saved stack size.
2625 unsigned CSStackSize = 0;
2626 unsigned ZPRCSStackSize = 0;
2627 unsigned PPRCSStackSize = 0;
2629 for (unsigned Reg : SavedRegs.set_bits()) {
2630 auto *RC = TRI->getMinimalPhysRegClass(MCRegister(Reg));
2631 assert(RC && "expected register class!");
2632 auto SpillSize = TRI->getSpillSize(*RC);
2633 bool IsZPR = AArch64::ZPRRegClass.contains(Reg);
2634 bool IsPPR = !IsZPR && AArch64::PPRRegClass.contains(Reg);
2635 if (IsZPR)
2636 ZPRCSStackSize += SpillSize;
2637 else if (IsPPR)
2638 PPRCSStackSize += SpillSize;
2639 else
2640 CSStackSize += SpillSize;
2641 }
2642
2643 // Save number of saved regs, so we can easily update CSStackSize later to
2644 // account for any additional 64-bit GPR saves. Note: After this point
2645 // only 64-bit GPRs can be added to SavedRegs.
2646 unsigned NumSavedRegs = SavedRegs.count();
2647
2648 // If we have hazard padding in the CS area add that to the size.
2650 CSStackSize += getStackHazardSize(MF);
2651
2652 // Increase the callee-saved stack size if the function has streaming mode
2653 // changes, as we will need to spill the value of the VG register.
2654 if (requiresSaveVG(MF))
2655 CSStackSize += 8;
2656
2657 // If we must call __arm_get_current_vg in the prologue preserve the LR.
2658 if (requiresSaveVG(MF) && !Subtarget.hasSVE())
2659 SavedRegs.set(AArch64::LR);
2660
2661 // The frame record needs to be created by saving the appropriate registers
2662 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
2663 if (hasFP(MF) ||
2664 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
2665 SavedRegs.set(AArch64::FP);
2666 SavedRegs.set(AArch64::LR);
2667 }
2668
2669 LLVM_DEBUG({
2670 dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
2671 for (unsigned Reg : SavedRegs.set_bits())
2672 dbgs() << ' ' << printReg(MCRegister(Reg), RegInfo);
2673 dbgs() << "\n";
2674 });
2675
2676 // If any callee-saved registers are used, the frame cannot be eliminated.
2677 auto [ZPRLocalStackSize, PPRLocalStackSize] =
2679 uint64_t SVELocals = ZPRLocalStackSize + PPRLocalStackSize;
2680 uint64_t SVEStackSize =
2681 alignTo(ZPRCSStackSize + PPRCSStackSize + SVELocals, 16);
2682 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
2683
2684 // The CSR spill slots have not been allocated yet, so estimateStackSize
2685 // won't include them.
2686 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
2687
2688 // We may address some of the stack above the canonical frame address, either
2689 // for our own arguments or during a call. Include that in calculating whether
2690 // we have complicated addressing concerns.
2691 int64_t CalleeStackUsed = 0;
2692 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
2693 int64_t FixedOff = MFI.getObjectOffset(I);
2694 if (FixedOff > CalleeStackUsed)
2695 CalleeStackUsed = FixedOff;
2696 }
2697
2698 // Conservatively always assume BigStack when there are SVE spills.
2699 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
2700 CalleeStackUsed) > EstimatedStackSizeLimit;
2701 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
2702 AFI->setHasStackFrame(true);
2703
2704 // Estimate if we might need to scavenge a register at some point in order
2705 // to materialize a stack offset. If so, either spill one additional
2706 // callee-saved register or reserve a special spill slot to facilitate
2707 // register scavenging. If we already spilled an extra callee-saved register
2708 // above to keep the number of spills even, we don't need to do anything else
2709 // here.
2710 if (BigStack) {
2711 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
2712 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
2713 << " to get a scratch register.\n");
2714 SavedRegs.set(UnspilledCSGPR);
2715 ExtraCSSpill = UnspilledCSGPR;
2716
2717 // MachO's compact unwind format relies on all registers being stored in
2718 // pairs, so if we need to spill one extra for BigStack, then we need to
2719 // store the pair.
2720 if (producePairRegisters(MF)) {
2721 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
2722 // Failed to make a pair for compact unwind format, revert spilling.
2723 if (produceCompactUnwindFrame(*this, MF)) {
2724 SavedRegs.reset(UnspilledCSGPR);
2725 ExtraCSSpill = AArch64::NoRegister;
2726 }
2727 } else
2728 SavedRegs.set(UnspilledCSGPRPaired);
2729 }
2730 }
2731
2732 // If we didn't find an extra callee-saved register to spill, create
2733 // an emergency spill slot.
2734 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
2736 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
2737 unsigned Size = TRI->getSpillSize(RC);
2738 Align Alignment = TRI->getSpillAlign(RC);
2739 int FI = MFI.CreateSpillStackObject(Size, Alignment);
2740 RS->addScavengingFrameIndex(FI);
2741 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
2742 << " as the emergency spill slot.\n");
2743 }
2744 }
2745
2746 // Add the size of the additional 64-bit GPR saves.
2747 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
2748
2749 // A Swift asynchronous context extends the frame record with a pointer
2750 // directly before FP.
2751 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
2752 CSStackSize += 8;
2753
2754 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
2755 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
2756 << EstimatedStackSize + AlignedCSStackSize << " bytes.\n");
2757
2759 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
2760 "Should not invalidate callee saved info");
2761
2762 // Round up to register pair alignment to avoid additional SP adjustment
2763 // instructions.
2764 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
2765 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
2766 AFI->setSVECalleeSavedStackSize(ZPRCSStackSize, alignTo(PPRCSStackSize, 16));
2767}
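
// Illustrative sketch (not part of AArch64FrameLowering.cpp): the tail of
// determineCalleeSaves above adds 8 bytes per extra GPR save, optionally 8
// more for the Swift async context, and rounds the total up to 16 bytes. Any
// rounding slack is what setCalleeSaveStackHasFreeSpace records. The names
// alignUp, CalleeSaveAreaSketch and computeCalleeSaveArea are hypothetical,
// and the single HasSwiftAsyncContext flag is assumed to stand in for
// hasFP(MF) && AFI->hasSwiftAsyncContext().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::alignTo: round Value up to a multiple of
// Alignment, which must be a nonzero power of two.
inline uint64_t alignUp(uint64_t Value, uint64_t Alignment) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0 &&
         "alignment must be a power of two");
  return (Value + Alignment - 1) & ~(Alignment - 1);
}

// Hypothetical result type, not an LLVM structure.
struct CalleeSaveAreaSketch {
  uint64_t AlignedSize; // size recorded for the callee-save area
  bool HasFreeSpace;    // true if rounding up left unused bytes
};

inline CalleeSaveAreaSketch computeCalleeSaveArea(uint64_t CSStackSize,
                                                  unsigned ExtraGPRSaves,
                                                  bool HasSwiftAsyncContext) {
  // Each additional 64-bit GPR save takes 8 bytes.
  CSStackSize += 8 * static_cast<uint64_t>(ExtraGPRSaves);
  // The Swift async context adds one pointer next to the frame record.
  if (HasSwiftAsyncContext)
    CSStackSize += 8;
  // Round up to register-pair alignment.
  uint64_t Aligned = alignUp(CSStackSize, 16);
  return {Aligned, Aligned != CSStackSize};
}
```

// For example, a 16-byte area plus one extra GPR save gives 24 bytes, which
// rounds up to 32 and leaves 8 bytes of free space in the callee-save area.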
2768
2770 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
2771 std::vector<CalleeSavedInfo> &CSI) const {
2772 bool IsWindows = isTargetWindows(MF);
2773 unsigned StackHazardSize = getStackHazardSize(MF);
2774 // To match the canonical windows frame layout, reverse the list of
2775 // callee saved registers to get them laid out by PrologEpilogInserter
2776 // in the right order. (PrologEpilogInserter allocates stack objects top
2777 // down. Windows canonical prologs store higher numbered registers at
2778 // the top, thus have the CSI array start from the highest registers.)
2779 if (IsWindows)
2780 std::reverse(CSI.begin(), CSI.end());
2781
2782 if (CSI.empty())
2783 return true; // Early exit if no callee saved registers are modified!
2784
2785 // Now that we know which registers need to be saved and restored, allocate
2786 // stack slots for them.
2787 MachineFrameInfo &MFI = MF.getFrameInfo();
2788 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2789
2790 if (IsWindows && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2791 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
2792 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2793 MFI.setIsCalleeSavedObjectIndex(FrameIdx, true);
2794 }
2795
2796 // Insert VG into the list of CSRs, immediately before LR if saved.
2797 if (requiresSaveVG(MF)) {
2798 CalleeSavedInfo VGInfo(AArch64::VG);
2799 auto It =
2800 find_if(CSI, [](auto &Info) { return Info.getReg() == AArch64::LR; });
2801 if (It != CSI.end())
2802 CSI.insert(It, VGInfo);
2803 else
2804 CSI.push_back(VGInfo);
2805 }
2806
2807 Register LastReg = 0;
2808 int HazardSlotIndex = std::numeric_limits<int>::max();
2809 for (auto &CS : CSI) {
2810 MCRegister Reg = CS.getReg();
2811 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
2812
2813 // Create a hazard slot as we switch between GPR and FPR CSRs.
2815 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
2817 assert(HazardSlotIndex == std::numeric_limits<int>::max() &&
2818 "Unexpected register order for hazard slot");
2819 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2820 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2821 << "\n");
2822 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2823 MFI.setIsCalleeSavedObjectIndex(HazardSlotIndex, true);
2824 }
2825
2826 unsigned Size = RegInfo->getSpillSize(*RC);
2827 Align Alignment(RegInfo->getSpillAlign(*RC));
2828 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
2829 CS.setFrameIdx(FrameIdx);
2830 MFI.setIsCalleeSavedObjectIndex(FrameIdx, true);
2831
2832 // Grab 8 bytes below FP for the extended asynchronous frame info.
2833 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !IsWindows &&
2834 Reg == AArch64::FP) {
2835 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
2836 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2837 MFI.setIsCalleeSavedObjectIndex(FrameIdx, true);
2838 }
2839 LastReg = Reg;
2840 }
2841
2842 // Add hazard slot in the case where no FPR CSRs are present.
2844 HazardSlotIndex == std::numeric_limits<int>::max()) {
2845 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2846 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2847 << "\n");
2848 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2849 MFI.setIsCalleeSavedObjectIndex(HazardSlotIndex, true);
2850 }
2851
2852 return true;
2853}
2854
2856 const MachineFunction &MF) const {
2858 // If the function has streaming-mode changes, don't scavenge a
2859 // spillslot in the callee-save area, as that might require an
2860 // 'addvl' in the streaming-mode-changing call-sequence when the
2861 // function doesn't use a FP.
2862 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
2863 return false;
2864 // Don't allow register salvaging with hazard slots, in case it moves objects
2865 // into the wrong place.
2866 if (AFI->hasStackHazardSlotIndex())
2867 return false;
2868 return AFI->hasCalleeSaveStackFreeSpace();
2869}
2870
2871/// Returns true if there are any SVE callee saves.
2873 int &Min, int &Max) {
2874 Min = std::numeric_limits<int>::max();
2875 Max = std::numeric_limits<int>::min();
2876
2877 if (!MFI.isCalleeSavedInfoValid())
2878 return false;
2879
2880 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
2881 for (auto &CS : CSI) {
2882 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
2883 AArch64::PPRRegClass.contains(CS.getReg())) {
2884 assert((Max == std::numeric_limits<int>::min() ||
2885 Max + 1 == CS.getFrameIdx()) &&
2886 "SVE CalleeSaves are not consecutive");
2887 Min = std::min(Min, CS.getFrameIdx());
2888 Max = std::max(Max, CS.getFrameIdx());
2889 }
2890 }
2891 return Min != std::numeric_limits<int>::max();
2892}
2893
2895 AssignObjectOffsets AssignOffsets) {
2896 MachineFrameInfo &MFI = MF.getFrameInfo();
2897 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2898
2899 SVEStackSizes SVEStack{};
2900
2901 // With SplitSVEObjects we maintain separate stack offsets for predicates
2902 // (PPRs) and SVE vectors (ZPRs). When SplitSVEObjects is disabled predicates
2903 // are included in the SVE vector area.
2904 uint64_t &ZPRStackTop = SVEStack.ZPRStackSize;
2905 uint64_t &PPRStackTop =
2906 AFI->hasSplitSVEObjects() ? SVEStack.PPRStackSize : SVEStack.ZPRStackSize;
2907
2908#ifndef NDEBUG
2909 // First process all fixed stack objects.
2910 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
2911 assert(!MFI.hasScalableStackID(I) &&
2912 "SVE vectors should never be passed on the stack by value, only by "
2913 "reference.");
2914#endif
2915
2916 auto AllocateObject = [&](int FI) {
2918 ? ZPRStackTop
2919 : PPRStackTop;
2920
2921 // FIXME: Given that the length of SVE vectors is not necessarily a power of
2922 // two, we'd need to align every object dynamically at runtime if the
2923 // alignment is larger than 16. This is not yet supported.
2924 Align Alignment = MFI.getObjectAlign(FI);
2925 if (Alignment > Align(16))
2927 "Alignment of scalable vectors > 16 bytes is not yet supported");
2928
2929 StackTop += MFI.getObjectSize(FI);
2930 StackTop = alignTo(StackTop, Alignment);
2931
2932 assert(StackTop < (uint64_t)std::numeric_limits<int64_t>::max() &&
2933 "SVE StackTop far too large?!");
2934
2935 int64_t Offset = -int64_t(StackTop);
2936 if (AssignOffsets == AssignObjectOffsets::Yes)
2937 MFI.setObjectOffset(FI, Offset);
2938
2939 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
2940 };
2941
2942 // Then process all callee saved slots.
2943 int MinCSFrameIndex, MaxCSFrameIndex;
2944 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
2945 for (int FI = MinCSFrameIndex; FI <= MaxCSFrameIndex; ++FI)
2946 AllocateObject(FI);
2947 }
2948
2949 // Ensure the CS area is 16-byte aligned.
2950 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2951 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2952
2953 // Create a buffer of SVE objects to allocate and sort it.
2954 SmallVector<int, 8> ObjectsToAllocate;
2955 // If we have a stack protector, and we've previously decided that we have SVE
2956 // objects on the stack and thus need it to go in the SVE stack area, then it
2957 // needs to go first.
2958 int StackProtectorFI = -1;
2959 if (MFI.hasStackProtectorIndex()) {
2960 StackProtectorFI = MFI.getStackProtectorIndex();
2961 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
2962 ObjectsToAllocate.push_back(StackProtectorFI);
2963 }
2964
2965 for (int FI = 0, E = MFI.getObjectIndexEnd(); FI != E; ++FI) {
2966 if (FI == StackProtectorFI || MFI.isDeadObjectIndex(FI) ||
2968 continue;
2969
2972 continue;
2973
2974 ObjectsToAllocate.push_back(FI);
2975 }
2976
2977 // Allocate all SVE locals and spills
2978 for (unsigned FI : ObjectsToAllocate)
2979 AllocateObject(FI);
2980
2981 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2982 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2983
2984 if (AssignOffsets == AssignObjectOffsets::Yes)
2985 AFI->setStackSizeSVE(SVEStack.ZPRStackSize, SVEStack.PPRStackSize);
2986
2987 return SVEStack;
2988}
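
// Illustrative sketch (not part of AArch64FrameLowering.cpp): the
// AllocateObject lambda above bumps a running stack top by the object size,
// aligns it, and uses the negated result as the object's offset. The area is
// then padded to 16 bytes. The sketch below models one such area with plain
// integers. ScalableObjectSketch and assignScalableOffsets are hypothetical
// names, and sizes are assumed to be in "scalable bytes" (to be multiplied by
// the runtime vector-length factor), which the sketch does not model.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one scalable stack object.
struct ScalableObjectSketch {
  uint64_t Size;
  uint64_t Alignment; // power of two, at most 16
};

// Assigns an offset to each object in order and reports the padded area size.
inline std::vector<int64_t>
assignScalableOffsets(const std::vector<ScalableObjectSketch> &Objects,
                      uint64_t &TotalSize) {
  uint64_t StackTop = 0;
  std::vector<int64_t> Offsets;
  for (const ScalableObjectSketch &Obj : Objects) {
    assert(Obj.Alignment != 0 && Obj.Alignment <= 16 &&
           (Obj.Alignment & (Obj.Alignment - 1)) == 0 &&
           "alignment above 16 bytes is not modelled");
    // Grow the area by the object, then align the new top.
    StackTop += Obj.Size;
    StackTop = (StackTop + Obj.Alignment - 1) & ~(Obj.Alignment - 1);
    // The object lives at the (negative) aligned top of the area so far.
    Offsets.push_back(-static_cast<int64_t>(StackTop));
  }
  // Keep the whole area 16-byte aligned.
  TotalSize = (StackTop + 15) & ~static_cast<uint64_t>(15);
  return Offsets;
}
```

// With objects of (size, align) = (2, 2), (16, 16), (2, 2), the offsets come
// out as -2, -32 and -34, and the padded area size is 48.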
2989
2991 MachineFunction &MF, RegScavenger *RS) const {
2993 "Upwards growing stack unsupported");
2994
2996
2997 // If this function isn't doing Win64-style C++ EH, we don't need to do
2998 // anything.
2999 if (!MF.hasEHFunclets())
3000 return;
3001
3002 MachineFrameInfo &MFI = MF.getFrameInfo();
3003 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3004
3005 // Win64 C++ EH needs to allocate space for the catch objects in the fixed
3006 // object area right next to the UnwindHelp object.
3007 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
3008 int64_t CurrentOffset =
3010 for (WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
3011 for (WinEHHandlerType &H : TBME.HandlerArray) {
3012 int FrameIndex = H.CatchObj.FrameIndex;
3013 if ((FrameIndex != INT_MAX) && MFI.getObjectOffset(FrameIndex) == 0) {
3014 CurrentOffset =
3015 alignTo(CurrentOffset, MFI.getObjectAlign(FrameIndex).value());
3016 CurrentOffset += MFI.getObjectSize(FrameIndex);
3017 MFI.setObjectOffset(FrameIndex, -CurrentOffset);
3018 }
3019 }
3020 }
3021
3022 // Create an UnwindHelp object.
3023 // The UnwindHelp object is allocated at the start of the fixed object area.
3024 int64_t UnwindHelpOffset = alignTo(CurrentOffset + 8, Align(16));
3025 assert(UnwindHelpOffset == getFixedObjectSize(MF, AFI, /*IsWin64*/ true,
3026 /*IsFunclet*/ false) &&
3027 "UnwindHelpOffset must be at the start of the fixed object area");
3028 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8, -UnwindHelpOffset,
3029 /*IsImmutable=*/false);
3030 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
3031
3032 MachineBasicBlock &MBB = MF.front();
3033 auto MBBI = MBB.begin();
3034 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
3035 ++MBBI;
3036
3037 // We need to store -2 into the UnwindHelp object at the start of the
3038 // function.
3039 DebugLoc DL;
3040 RS->enterBasicBlockEnd(MBB);
3041 RS->backward(MBBI);
3042 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
3043 assert(DstReg && "There must be a free register after frame setup");
3044 const AArch64InstrInfo &TII =
3045 *MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3046 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
3047 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
3048 .addReg(DstReg, getKillRegState(true))
3049 .addFrameIndex(UnwindHelpFI)
3050 .addImm(0);
3051}
3052
3053namespace {
3054struct TagStoreInstr {
3056 int64_t Offset, Size;
3057 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
3058 : MI(MI), Offset(Offset), Size(Size) {}
3059};
3060
3061class TagStoreEdit {
3062 MachineFunction *MF;
3063 MachineBasicBlock *MBB;
3064 MachineRegisterInfo *MRI;
3065 // Tag store instructions that are being replaced.
3067 // Combined memref arguments of the above instructions.
3069
3070 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
3071 // FrameRegOffset + Size) with the address tag of SP.
3072 Register FrameReg;
3073 StackOffset FrameRegOffset;
3074 int64_t Size;
3075 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
3076 // end.
3077 std::optional<int64_t> FrameRegUpdate;
3078 // MIFlags for any FrameReg updating instructions.
3079 unsigned FrameRegUpdateFlags;
3080
3081 // Use zeroing instruction variants.
3082 bool ZeroData;
3083 DebugLoc DL;
3084
3085 void emitUnrolled(MachineBasicBlock::iterator InsertI);
3086 void emitLoop(MachineBasicBlock::iterator InsertI);
3087
3088public:
3089 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
3090 : MBB(MBB), ZeroData(ZeroData) {
3091 MF = MBB->getParent();
3092 MRI = &MF->getRegInfo();
3093 }
3094 // Add an instruction to be replaced. Instructions must be added in the
3095 // ascending order of Offset, and have to be adjacent.
3096 void addInstruction(TagStoreInstr I) {
3097 assert((TagStores.empty() ||
3098 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
3099 "Non-adjacent tag store instructions.");
3100 TagStores.push_back(I);
3101 }
3102 void clear() { TagStores.clear(); }
3103 // Emit equivalent code at the given location, and erase the current set of
3104 // instructions. May skip if the replacement is not profitable. May invalidate
3105 // the input iterator and replace it with a valid one.
3106 void emitCode(MachineBasicBlock::iterator &InsertI,
3107 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
3108};
3109
3110void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
3111 const AArch64InstrInfo *TII =
3112 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3113
3114 const int64_t kMinOffset = -256 * 16;
3115 const int64_t kMaxOffset = 255 * 16;
3116
3117 Register BaseReg = FrameReg;
3118 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
3119 if (BaseRegOffsetBytes < kMinOffset ||
3120 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
3121 // BaseReg can be FP, which is not necessarily aligned to 16-bytes. In
3122 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
3123 // is required for the offset of ST2G.
3124 BaseRegOffsetBytes % 16 != 0) {
3125 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3126 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
3127 StackOffset::getFixed(BaseRegOffsetBytes), TII);
3128 BaseReg = ScratchReg;
3129 BaseRegOffsetBytes = 0;
3130 }
3131
3132 MachineInstr *LastI = nullptr;
3133 while (Size) {
3134 int64_t InstrSize = (Size > 16) ? 32 : 16;
3135 unsigned Opcode =
3136 InstrSize == 16
3137 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
3138 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
3139 assert(BaseRegOffsetBytes % 16 == 0);
3140 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
3141 .addReg(AArch64::SP)
3142 .addReg(BaseReg)
3143 .addImm(BaseRegOffsetBytes / 16)
3144 .setMemRefs(CombinedMemRefs);
3145 // A store to [BaseReg, #0] should go last for an opportunity to fold the
3146 // final SP adjustment in the epilogue.
3147 if (BaseRegOffsetBytes == 0)
3148 LastI = I;
3149 BaseRegOffsetBytes += InstrSize;
3150 Size -= InstrSize;
3151 }
3152
3153 if (LastI)
3154 MBB->splice(InsertI, MBB, LastI);
3155}
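
// Illustrative sketch (not part of AArch64FrameLowering.cpp): emitUnrolled
// above covers the region with 32-byte stores while more than 16 bytes
// remain, finishes with a 16-byte store if needed, and then moves the store
// at offset 0 to the end of the sequence. The sketch below reproduces only
// that planning step on plain integers. TagStoreChunk and
// planUnrolledTagStores are hypothetical names, and BaseOffset is assumed to
// be the offset after any rebasing through a scratch register.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one planned tag store.
struct TagStoreChunk {
  int64_t Offset; // byte offset from the base register
  int64_t Size;   // 16 (single-granule store) or 32 (two-granule store)
};

inline std::vector<TagStoreChunk> planUnrolledTagStores(int64_t BaseOffset,
                                                        int64_t Size) {
  assert(BaseOffset % 16 == 0 && Size >= 0 && Size % 16 == 0 &&
         "tag stores operate on 16-byte granules");
  std::vector<TagStoreChunk> Chunks;
  while (Size) {
    // Prefer the two-granule form while more than one granule remains.
    int64_t InstrSize = (Size > 16) ? 32 : 16;
    Chunks.push_back({BaseOffset, InstrSize});
    BaseOffset += InstrSize;
    Size -= InstrSize;
  }
  // Mirror the final splice: the store at offset 0, if any, goes last.
  for (std::size_t I = 0; I < Chunks.size(); ++I) {
    if (Chunks[I].Offset == 0) {
      std::rotate(Chunks.begin() + I, Chunks.begin() + I + 1, Chunks.end());
      break;
    }
  }
  return Chunks;
}
```

// For an 80-byte region starting at offset 0, the plan is a 32-byte store at
// offset 32, a 16-byte store at offset 64, and finally the 32-byte store at
// offset 0.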
3156
3157void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
3158 const AArch64InstrInfo *TII =
3159 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3160
3161 Register BaseReg = FrameRegUpdate
3162 ? FrameReg
3163 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3164 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3165
3166 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
3167
3168 int64_t LoopSize = Size;
3169 // If the loop size is not a multiple of 32, split off one 16-byte store at
3170 // the end to fold BaseReg update into.
3171 if (FrameRegUpdate && *FrameRegUpdate)
3172 LoopSize -= LoopSize % 32;
3173 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
3174 TII->get(ZeroData ? AArch64::STZGloop_wback
3175 : AArch64::STGloop_wback))
3176 .addDef(SizeReg)
3177 .addDef(BaseReg)
3178 .addImm(LoopSize)
3179 .addReg(BaseReg)
3180 .setMemRefs(CombinedMemRefs);
3181 if (FrameRegUpdate)
3182 LoopI->setFlags(FrameRegUpdateFlags);
3183
3184 int64_t ExtraBaseRegUpdate =
3185 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
3186 LLVM_DEBUG(dbgs() << "TagStoreEdit::emitLoop: LoopSize=" << LoopSize
3187 << ", Size=" << Size
3188 << ", ExtraBaseRegUpdate=" << ExtraBaseRegUpdate
3189 << ", FrameRegUpdate=" << FrameRegUpdate
3190 << ", FrameRegOffset.getFixed()="
3191 << FrameRegOffset.getFixed() << "\n");
3192 if (LoopSize < Size) {
3193 assert(FrameRegUpdate);
3194 assert(Size - LoopSize == 16);
3195 // Tag 16 more bytes at BaseReg and update BaseReg.
3196 int64_t STGOffset = ExtraBaseRegUpdate + 16;
3197 assert(STGOffset % 16 == 0 && STGOffset >= -4096 && STGOffset <= 4080 &&
3198 "STG immediate out of range");
3199 BuildMI(*MBB, InsertI, DL,
3200 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
3201 .addDef(BaseReg)
3202 .addReg(BaseReg)
3203 .addReg(BaseReg)
3204 .addImm(STGOffset / 16)
3205 .setMemRefs(CombinedMemRefs)
3206 .setMIFlags(FrameRegUpdateFlags);
3207 } else if (ExtraBaseRegUpdate) {
3208 // Update BaseReg.
3209 int64_t AddSubOffset = std::abs(ExtraBaseRegUpdate);
3210 assert(AddSubOffset <= 4095 && "ADD/SUB immediate out of range");
3211 BuildMI(
3212 *MBB, InsertI, DL,
3213 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
3214 .addDef(BaseReg)
3215 .addReg(BaseReg)
3216 .addImm(AddSubOffset)
3217 .addImm(0)
3218 .setMIFlags(FrameRegUpdateFlags);
3219 }
3220}
3221
3222// Check if *II is a register update that can be merged into STGloop that ends
3223// at (Reg + Size). On success, *TotalOffset is set to the total adjustment
3224// that the update instruction applies to Reg.
3225bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
3226 int64_t Size, int64_t *TotalOffset) {
3227 MachineInstr &MI = *II;
3228 if ((MI.getOpcode() == AArch64::ADDXri ||
3229 MI.getOpcode() == AArch64::SUBXri) &&
3230 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
3231 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
3232 int64_t Offset = MI.getOperand(2).getImm() << Shift;
3233 if (MI.getOpcode() == AArch64::SUBXri)
3234 Offset = -Offset;
3235 int64_t PostOffset = Offset - Size;
3236 // TagStoreEdit::emitLoop might emit either an ADD/SUB after the loop, or
3237 // an STGPostIndex which does the last 16 bytes of tag write. Which one is
3238 // chosen depends on the alignment of the loop size, but the difference
3239 // between the valid ranges for the two instructions is small, so we
3240 // conservatively assume that it could be either case here.
3241 //
3242 // Max offset of STGPostIndex, minus the 16 byte tag write folded into that
3243 // instruction.
3244 const int64_t kMaxOffset = 4080 - 16;
3245 // Max offset of SUBXri.
3246 const int64_t kMinOffset = -4095;
3247 if (PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
3248 PostOffset % 16 == 0) {
3249 *TotalOffset = Offset;
3250 return true;
3251 }
3252 }
3253 return false;
3254}
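
// Illustrative sketch (not part of AArch64FrameLowering.cpp): the range test
// in canMergeRegUpdate above accepts an update only if the adjustment left
// over after the loop is 16-byte aligned and fits both follow-up forms that
// emitLoop may pick. The helper name postLoopOffsetInRange is hypothetical.
// The constants are copied from the function above.

```cpp
#include <cassert>
#include <cstdint>

// UpdateOffset: signed adjustment applied by the ADD/SUB being merged.
// Size: distance from the register to the end of the tagged region.
inline bool postLoopOffsetInRange(int64_t UpdateOffset, int64_t Size) {
  assert(Size >= 0 && "tagged region size cannot be negative");
  int64_t PostOffset = UpdateOffset - Size;
  // Upper bound: post-index tag store range minus the 16 bytes it tags.
  const int64_t kMaxOffset = 4080 - 16;
  // Lower bound: largest immediate a single subtract can encode.
  const int64_t kMinOffset = -4095;
  return PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
         PostOffset % 16 == 0;
}
```

// For a 256-byte region, an update of +256 leaves nothing to adjust and is
// accepted, while +264 is rejected because the remaining 8 bytes are not a
// multiple of 16.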
3255
3256void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
3258 MemRefs.clear();
3259 for (auto &TS : TSE) {
3260 MachineInstr *MI = TS.MI;
3261 // An instruction without memory operands may access anything. Be
3262 // conservative and return an empty list.
3263 if (MI->memoperands_empty()) {
3264 MemRefs.clear();
3265 return;
3266 }
3267 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
3268 }
3269}
3270
3271void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
3272 const AArch64FrameLowering *TFI,
3273 bool TryMergeSPUpdate) {
3274 if (TagStores.empty())
3275 return;
3276 TagStoreInstr &FirstTagStore = TagStores[0];
3277 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
3278 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
3279 DL = TagStores[0].MI->getDebugLoc();
3280
3281 Register Reg;
3282 FrameRegOffset = TFI->resolveFrameOffsetReference(
3283 *MF, FirstTagStore.Offset, false /*isFixed*/,
3284 TargetStackID::Default /*StackID*/, Reg,
3285 /*PreferFP=*/false, /*ForSimm=*/true);
3286 FrameReg = Reg;
3287 FrameRegUpdate = std::nullopt;
3288
3289 mergeMemRefs(TagStores, CombinedMemRefs);
3290
3291 LLVM_DEBUG({
3292 dbgs() << "Replacing adjacent STG instructions:\n";
3293 for (const auto &Instr : TagStores) {
3294 dbgs() << " " << *Instr.MI;
3295 }
3296 });
3297
3298 // Size threshold where a loop becomes shorter than a linear sequence of
3299 // tagging instructions.
3300 const int kSetTagLoopThreshold = 176;
3301 if (Size < kSetTagLoopThreshold) {
3302 if (TagStores.size() < 2)
3303 return;
3304 emitUnrolled(InsertI);
3305 } else {
3306 MachineInstr *UpdateInstr = nullptr;
3307 int64_t TotalOffset = 0;
3308 if (TryMergeSPUpdate) {
3309 // See if we can merge base register update into the STGloop.
3310 // This is done in AArch64LoadStoreOptimizer for "normal" stores,
3311 // but STGloop is way too unusual for that, and also it only
3312 // realistically happens in function epilogue. Also, STGloop is expanded
3313 // before that pass.
3314 if (InsertI != MBB->end() &&
3315 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
3316 &TotalOffset)) {
3317 UpdateInstr = &*InsertI++;
3318 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
3319 << *UpdateInstr);
3320 }
3321 }
3322
3323 if (!UpdateInstr && TagStores.size() < 2)
3324 return;
3325
3326 if (UpdateInstr) {
3327 FrameRegUpdate = TotalOffset;
3328 FrameRegUpdateFlags = UpdateInstr->getFlags();
3329 }
3330 emitLoop(InsertI);
3331 if (UpdateInstr)
3332 UpdateInstr->eraseFromParent();
3333 }
3334
3335 for (auto &TS : TagStores)
3336 TS.MI->eraseFromParent();
3337}
3338
3339bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
3340 int64_t &Size, bool &ZeroData) {
3341 MachineFunction &MF = *MI.getParent()->getParent();
3342 const MachineFrameInfo &MFI = MF.getFrameInfo();
3343
3344 unsigned Opcode = MI.getOpcode();
3345 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
3346 Opcode == AArch64::STZ2Gi);
3347
3348 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
3349 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
3350 return false;
3351 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
3352 return false;
3353 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
3354 Size = MI.getOperand(2).getImm();
3355 return true;
3356 }
3357
3358 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
3359 Size = 16;
3360 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
3361 Size = 32;
3362 else
3363 return false;
3364
3365 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
3366 return false;
3367
3368 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
3369 16 * MI.getOperand(2).getImm();
3370 return true;
3371}
3372
3373static size_t countAvailableScavengerSlots(LivePhysRegs &LiveRegs,
3375 RegScavenger *RS) {
3376 auto FreeGPRs =
3377 llvm::count_if(AArch64::GPR64RegClass, [&LiveRegs, &MRI](auto Reg) {
3378 return LiveRegs.available(MRI, Reg);
3379 });
3380
3381 size_t NumEmergencySlots = 0;
3382 if (RS)
3383 NumEmergencySlots = RS->getNumScavengingFrameIndices();
3384
3385 return FreeGPRs + NumEmergencySlots;
3386}
3387
3388// Detect a run of memory tagging instructions for adjacent stack frame slots,
3389// and replace them with a shorter instruction sequence:
3390// * replace STG + STG with ST2G
3391// * replace STGloop + STGloop with STGloop
3392// This code needs to run when stack slot offsets are already known, but before
3393// FrameIndex operands in STG instructions are eliminated.
3395 const AArch64FrameLowering *TFI,
3396 RegScavenger *RS) {
3397 bool FirstZeroData;
3398 int64_t Size, Offset;
3399 MachineInstr &MI = *II;
3400 MachineBasicBlock *MBB = MI.getParent();
3402 if (&MI == &MBB->instr_back())
3403 return II;
3404 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
3405 return II;
3406
3408 Instrs.emplace_back(&MI, Offset, Size);
3409
3410 constexpr int kScanLimit = 10;
3411 int Count = 0;
3413 NextI != E && Count < kScanLimit; ++NextI) {
3414 MachineInstr &MI = *NextI;
3415 bool ZeroData;
3416 int64_t Size, Offset;
3417 // Collect instructions that update memory tags with a FrameIndex operand
3418 // and (when applicable) constant size, and whose output registers are dead
3419 // (the latter is almost always the case in practice). Since these
3420 // instructions effectively have no inputs or outputs, we are free to skip
3421 // any non-aliasing instructions in between without tracking used registers.
3422 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
3423 if (ZeroData != FirstZeroData)
3424 break;
3425 Instrs.emplace_back(&MI, Offset, Size);
3426 continue;
3427 }
3428
3429 // Only count non-transient, non-tagging instructions toward the scan
3430 // limit.
3431 if (!MI.isTransient())
3432 ++Count;
3433
3434 // Just in case, stop before the epilogue code starts.
3435 if (MI.getFlag(MachineInstr::FrameSetup) ||
3437 break;
3438
3439 // Reject anything that may alias the collected instructions.
3440 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects() || MI.isCall())
3441 break;
3442 }
3443
3444 // New code will be inserted after the last tagging instruction we've found.
3445 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
3446
3447 // All the gathered stack tagging instructions are merged and placed after
3448 // the last tag store in the list. An STG loop may clobber NZCV, so check
3449 // whether NZCV is live at the insertion point and bail out of the merge
3450 // if it is.
3451
3452 // FIXME: Bailing out here is conservative: the liveness check is done
3453 // even when the merged sequence would not contain an STG loop, in which
3454 // case it is not needed.
3456 LiveRegs.addLiveOuts(*MBB);
3457 for (auto I = MBB->rbegin();; ++I) {
3458 MachineInstr &MI = *I;
3459 if (MI == InsertI)
3460 break;
3461 LiveRegs.stepBackward(*I);
3462 }
3463 InsertI++;
3464 if (LiveRegs.contains(AArch64::NZCV))
3465 return InsertI;
3466
3467 // Emitting an MTE loop requires two physical registers (BaseReg and
3468 // SizeReg). If the function is under register pressure, the register
3469 // scavenger will crash trying to allocate them. If we don't have at least
3470 // two free slots (free registers + emergency slots), bail out and fall back
3471 // to the unrolled sequence.
3472 if (countAvailableScavengerSlots(LiveRegs, MBB->getParent()->getRegInfo(),
3473 RS) < 2) {
3474 LLVM_DEBUG(
3475 dbgs() << "Failed to merge MTE stack tagging instructions into loop "
3476 << "due to high register pressure.\n");
3477 return InsertI;
3478 }
3479
3480 llvm::stable_sort(Instrs,
3481 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
3482 return Left.Offset < Right.Offset;
3483 });
3484
3485 // Make sure that we don't have any overlapping stores.
3486 int64_t CurOffset = Instrs[0].Offset;
3487 for (auto &Instr : Instrs) {
3488 if (CurOffset > Instr.Offset)
3489 return NextI;
3490 CurOffset = Instr.Offset + Instr.Size;
3491 }
3492
3493 // Find contiguous runs of tagged memory and emit shorter instruction
3494 // sequences for them when possible.
3495 TagStoreEdit TSE(MBB, FirstZeroData);
3496 std::optional<int64_t> EndOffset;
3497 for (auto &Instr : Instrs) {
3498 if (EndOffset && *EndOffset != Instr.Offset) {
3499 // Found a gap.
3500 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
3501 TSE.clear();
3502 }
3503
3504 TSE.addInstruction(Instr);
3505 EndOffset = Instr.Offset + Instr.Size;
3506 }
3507
3508 const MachineFunction *MF = MBB->getParent();
3509 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
3510 TSE.emitCode(
3511 InsertI, TFI, /*TryMergeSPUpdate = */
3513
3514 return InsertI;
3515}
3516} // namespace
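
// Illustrative sketch (not part of AArch64FrameLowering.cpp): after
// collecting tag stores, tryMergeAdjacentSTG above sorts them by offset,
// gives up if any two overlap, and then emits one replacement per contiguous
// run, starting a new run at every gap. The sketch below models that grouping
// on plain (offset, size) pairs. TagRangeSketch and splitIntoContiguousRuns
// are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical (offset, size) view of one tag store.
struct TagRangeSketch {
  int64_t Offset;
  int64_t Size;
};

// Returns false if any ranges overlap; otherwise fills Runs with groups of
// back-to-back ranges in ascending offset order.
inline bool
splitIntoContiguousRuns(std::vector<TagRangeSketch> Ranges,
                        std::vector<std::vector<TagRangeSketch>> &Runs) {
  Runs.clear();
  if (Ranges.empty())
    return true;
  std::stable_sort(Ranges.begin(), Ranges.end(),
                   [](const TagRangeSketch &L, const TagRangeSketch &R) {
                     return L.Offset < R.Offset;
                   });
  // Reject overlapping stores.
  int64_t CurOffset = Ranges.front().Offset;
  for (const TagRangeSketch &R : Ranges) {
    if (CurOffset > R.Offset)
      return false;
    CurOffset = R.Offset + R.Size;
  }
  // Start a new run whenever the next range does not begin where the
  // previous one ended.
  for (const TagRangeSketch &R : Ranges) {
    if (Runs.empty() ||
        Runs.back().back().Offset + Runs.back().back().Size != R.Offset)
      Runs.emplace_back();
    Runs.back().push_back(R);
  }
  return true;
}
```

// Ranges at offsets 32, 0 and 16 (16 bytes each) plus one at 64 (32 bytes)
// yield two runs: [0, 48) built from three stores, and [64, 96) on its own.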
3517
3519 MachineFunction &MF, RegScavenger *RS = nullptr) const {
3520 for (auto &BB : MF)
3521 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
3523 II = tryMergeAdjacentSTG(II, this, RS);
3524 }
3525
3526 // By the time this method is called, most of the prologue/epilogue code is
3527 // already emitted, whether its location was affected by the shrink-wrapping
3528 // optimization or not.
3529 if (!MF.getFunction().hasFnAttribute(Attribute::Naked) &&
3530 shouldSignReturnAddressEverywhere(MF))
3532}
3533
3534/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
3535/// before the update. This is easily retrieved as it is exactly the offset
3536/// that is set in processFunctionBeforeFrameFinalized.
3538 const MachineFunction &MF, int FI, Register &FrameReg,
3539 bool IgnoreSPUpdates) const {
3540 const MachineFrameInfo &MFI = MF.getFrameInfo();
3541 if (IgnoreSPUpdates) {
3542 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
3543 << MFI.getObjectOffset(FI) << "\n");
3544 FrameReg = AArch64::SP;
3545 return StackOffset::getFixed(MFI.getObjectOffset(FI));
3546 }
3547
3548 // Go to common code if we cannot provide sp + offset.
3549 if (MFI.hasVarSizedObjects() ||
3552 return getFrameIndexReference(MF, FI, FrameReg);
3553
3554 FrameReg = AArch64::SP;
3555 return getStackOffset(MF, MFI.getObjectOffset(FI));
3556}
3557
3558/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
3559/// the parent's frame pointer.
3561 const MachineFunction &MF) const {
3562 return 0;
3563}
3564
3565/// Funclets only need to account for space for the callee saved registers,
3566/// as the locals are accounted for in the parent's stack frame.
3568 const MachineFunction &MF) const {
3569 // This is the size of the pushed CSRs.
3570 unsigned CSSize =
3571 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
3572 // This is the amount of stack a funclet needs to allocate.
3573 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
3574 getStackAlign());
3575}
3576
3577namespace {
3578struct FrameObject {
3579 bool IsValid = false;
3580 // Index of the object in MFI.
3581 int ObjectIndex = 0;
3582 // Group ID this object belongs to.
3583 int GroupIndex = -1;
3584 // This object should be placed first (closest to SP).
3585 bool ObjectFirst = false;
3586 // This object's group (which always contains the object with
3587 // ObjectFirst==true) should be placed first.
3588 bool GroupFirst = false;
3589
3590 // Used to distinguish between FP and GPR accesses. The values are decided so
3591 // that they sort FPR < Hazard < GPR and they can be or'd together.
3592 unsigned Accesses = 0;
3593 enum { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };
3594};
3595
3596class GroupBuilder {
3597 SmallVector<int, 8> CurrentMembers;
3598 int NextGroupIndex = 0;
3599 std::vector<FrameObject> &Objects;
3600
3601public:
3602 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
3603 void AddMember(int Index) { CurrentMembers.push_back(Index); }
3604 void EndCurrentGroup() {
3605 if (CurrentMembers.size() > 1) {
3606 // Create a new group with the current member list. This might remove them
3607 // from their pre-existing groups. That's OK, dealing with overlapping
3608 // groups is too hard and unlikely to make a difference.
3609 LLVM_DEBUG(dbgs() << "group:");
3610 for (int Index : CurrentMembers) {
3611 Objects[Index].GroupIndex = NextGroupIndex;
3612 LLVM_DEBUG(dbgs() << " " << Index);
3613 }
3614 LLVM_DEBUG(dbgs() << "\n");
3615 NextGroupIndex++;
3616 }
3617 CurrentMembers.clear();
3618 }
3619};
3620
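// Illustrative sketch, not part of this file: a simplified model of how
// GroupBuilder turns runs of consecutive tagging instructions into groups.
// assignGroups and its input encoding (-1 for "not a tagging instruction")
// are hypothetical names chosen for the example.

```cpp
#include <vector>

// Tagged[i] is the frame index tagged by instruction i, or -1 if instruction
// i is not a tagging instruction. Returns the group ID assigned to each frame
// index (-1 means the object is in no group).
std::vector<int> assignGroups(const std::vector<int> &Tagged, int NumObjects) {
  std::vector<int> Group(NumObjects, -1);
  std::vector<int> Current;
  int NextGroup = 0;
  auto EndGroup = [&] {
    // A run containing a single slot does not form a group.
    if (Current.size() > 1) {
      for (int FI : Current)
        Group[FI] = NextGroup;
      ++NextGroup;
    }
    Current.clear();
  };
  for (int FI : Tagged) {
    if (FI >= 0)
      Current.push_back(FI);
    else
      EndGroup();
  }
  EndGroup(); // Close any run still open at the end of the block.
  return Group;
}
```

// As in EndCurrentGroup, a later run that mentions an already grouped object
// simply overwrites its group ID.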
3621bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
3622 // Objects at a lower index are closer to FP; objects at a higher index are
3623 // closer to SP.
3624 //
3625 // For consistency in our comparison, all invalid objects are placed
3626 // at the end. This also allows us to stop walking when we hit the
3627 // first invalid item after it's all sorted.
3628 //
3629 // If we want to include a stack hazard region, order FPR accesses < the
3630 // hazard object < GPR accesses in order to create a separation between the
3631 // two. For the Accesses field 1 = FPR, 2 = Hazard Object, 4 = GPR.
3632 //
3633 // Otherwise the "first" object goes first (closest to SP), followed by the
3634 // members of the "first" group.
3635 //
3636 // The rest are sorted by the group index to keep the groups together.
3637 // Higher numbered groups are more likely to be around longer (i.e. untagged
3638 // in the function epilogue and not at some earlier point). Place them closer
3639 // to SP.
3640 //
3641 // If all else equal, sort by the object index to keep the objects in the
3642 // original order.
3643 return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
3644 A.GroupIndex, A.ObjectIndex) <
3645 std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
3646 B.GroupIndex, B.ObjectIndex);
3647}
3648} // namespace
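// Illustrative sketch, not part of this file: the comparator above relies on
// lexicographic tuple comparison to apply its sort keys in priority order.
// Obj is a hypothetical, reduced FrameObject that keeps only some of the keys.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// Reduced, hypothetical version of FrameObject with only the sort keys.
struct Obj {
  bool IsValid;
  unsigned Accesses; // 1 = FPR, 2 = hazard slot, 4 = GPR
  int GroupIndex;
  int ObjectIndex;
};

// Build one tuple per object; std::tuple's operator< compares element by
// element, so earlier fields take priority over later ones.
bool objLess(const Obj &A, const Obj &B) {
  return std::make_tuple(!A.IsValid, A.Accesses, A.GroupIndex, A.ObjectIndex) <
         std::make_tuple(!B.IsValid, B.Accesses, B.GroupIndex, B.ObjectIndex);
}

// Returns the object indices of the valid objects in sorted order.
std::vector<int> sortedOrder(std::vector<Obj> Objs) {
  std::stable_sort(Objs.begin(), Objs.end(), objLess);
  std::vector<int> Order;
  for (const Obj &O : Objs) {
    if (!O.IsValid)
      break; // Invalid objects sort last, so it is safe to stop here.
    Order.push_back(O.ObjectIndex);
  }
  return Order;
}
```

// Negating IsValid in the first tuple element is what pushes invalid objects
// to the end, since false compares less than true.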
3649
3650void AArch64FrameLowering::orderFrameObjects(
3651 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
3652 const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
3653
3654 if ((!OrderFrameObjects && !AFI.hasSplitSVEObjects()) ||
3655 ObjectsToAllocate.empty())
3656 return;
3657
3658 const MachineFrameInfo &MFI = MF.getFrameInfo();
3659 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
3660 for (auto &Obj : ObjectsToAllocate) {
3661 FrameObjects[Obj].IsValid = true;
3662 FrameObjects[Obj].ObjectIndex = Obj;
3663 }
3664
3665 // Identify FPR vs GPR slots for hazards, and stack slots that are tagged at
3666 // the same time.
3667 GroupBuilder GB(FrameObjects);
3668 for (auto &MBB : MF) {
3669 for (auto &MI : MBB) {
3670 if (MI.isDebugInstr())
3671 continue;
3672
3673 if (AFI.hasStackHazardSlotIndex()) {
3674 std::optional<int> FI = getLdStFrameID(MI, MFI);
3675 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
3676 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
3678 FrameObjects[*FI].Accesses |= FrameObject::AccessFPR;
3679 else
3680 FrameObjects[*FI].Accesses |= FrameObject::AccessGPR;
3681 }
3682 }
3683
3684 int OpIndex;
3685 switch (MI.getOpcode()) {
3686 case AArch64::STGloop:
3687 case AArch64::STZGloop:
3688 OpIndex = 3;
3689 break;
3690 case AArch64::STGi:
3691 case AArch64::STZGi:
3692 case AArch64::ST2Gi:
3693 case AArch64::STZ2Gi:
3694 OpIndex = 1;
3695 break;
3696 default:
3697 OpIndex = -1;
3698 }
3699
3700 int TaggedFI = -1;
3701 if (OpIndex >= 0) {
3702 const MachineOperand &MO = MI.getOperand(OpIndex);
3703 if (MO.isFI()) {
3704 int FI = MO.getIndex();
3705 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
3706 FrameObjects[FI].IsValid)
3707 TaggedFI = FI;
3708 }
3709 }
3710
3711 // If this is a stack tagging instruction for a slot that is not part of a
3712 // group yet, either start a new group or add it to the current one.
3713 if (TaggedFI >= 0)
3714 GB.AddMember(TaggedFI);
3715 else
3716 GB.EndCurrentGroup();
3717 }
3718 // Groups should never span multiple basic blocks.
3719 GB.EndCurrentGroup();
3720 }
3721
3722 if (AFI.hasStackHazardSlotIndex()) {
3723 FrameObjects[AFI.getStackHazardSlotIndex()].Accesses =
3724 FrameObject::AccessHazard;
3725 // If a stack object is unknown or both GPR and FPR, sort it into GPR.
3726 for (auto &Obj : FrameObjects)
3727 if (!Obj.Accesses ||
3728 Obj.Accesses == (FrameObject::AccessGPR | FrameObject::AccessFPR))
3729 Obj.Accesses = FrameObject::AccessGPR;
3730 }
3731
3732 // If the function's tagged base pointer is pinned to a stack slot, we want to
3733 // put that slot first when possible. This will likely place it at SP + 0,
3734 // and save one instruction when generating the base pointer because IRG does
3735 // not allow an immediate offset.
3736 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
3737 if (TBPI) {
3738 FrameObjects[*TBPI].ObjectFirst = true;
3739 FrameObjects[*TBPI].GroupFirst = true;
3740 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
3741 if (FirstGroupIndex >= 0)
3742 for (FrameObject &Object : FrameObjects)
3743 if (Object.GroupIndex == FirstGroupIndex)
3744 Object.GroupFirst = true;
3745 }
3746
3747 llvm::stable_sort(FrameObjects, FrameObjectCompare);
3748
3749 int i = 0;
3750 for (auto &Obj : FrameObjects) {
3751 // All invalid items are sorted at the end, so it's safe to stop.
3752 if (!Obj.IsValid)
3753 break;
3754 ObjectsToAllocate[i++] = Obj.ObjectIndex;
3755 }
3756
3757 LLVM_DEBUG({
3758 dbgs() << "Final frame order:\n";
3759 for (auto &Obj : FrameObjects) {
3760 if (!Obj.IsValid)
3761 break;
3762 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
3763 if (Obj.ObjectFirst)
3764 dbgs() << ", first";
3765 if (Obj.GroupFirst)
3766 dbgs() << ", group-first";
3767 dbgs() << "\n";
3768 }
3769 });
3770}
3771
3772/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
3773/// least every ProbeSize bytes. Returns an iterator to the first instruction
3774/// after the loop. The difference between SP and TargetReg must be an exact
3775/// multiple of ProbeSize.
3776MachineBasicBlock::iterator
3777AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
3778 MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
3779 Register TargetReg) const {
3780 MachineBasicBlock &MBB = *MBBI->getParent();
3781 MachineFunction &MF = *MBB.getParent();
3782 const AArch64InstrInfo *TII =
3783 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3784 DebugLoc DL = MBB.findDebugLoc(MBBI);
3785
3786 MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
3787 MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
3788 MF.insert(MBBInsertPoint, LoopMBB);
3789 MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
3790 MF.insert(MBBInsertPoint, ExitMBB);
3791
3792 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
3793 // in SUB).
3794 emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
3795 StackOffset::getFixed(-ProbeSize), TII,
3797 // LDR XZR, [SP]
3798 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::LDRXui))
3799 .addDef(AArch64::XZR)
3800 .addReg(AArch64::SP)
3801 .addImm(0)
3805 Align(8)))
3807 // CMP SP, TargetReg
3808 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
3809 AArch64::XZR)
3810 .addReg(AArch64::SP)
3811 .addReg(TargetReg)
3814 // B.CC Loop
3815 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
3817 .addMBB(LoopMBB)
3819
3820 LoopMBB->addSuccessor(ExitMBB);
3821 LoopMBB->addSuccessor(LoopMBB);
3822 // Synthesize the exit MBB.
3823 ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
3825 MBB.addSuccessor(LoopMBB);
3826 // Update liveins.
3827 fullyRecomputeLiveIns({ExitMBB, LoopMBB});
3828
3829 return ExitMBB->begin();
3830}
3831
3832void AArch64FrameLowering::inlineStackProbeFixed(
3833 MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
3834 StackOffset CFAOffset) const {
3835 MachineBasicBlock *MBB = MBBI->getParent();
3836 MachineFunction &MF = *MBB->getParent();
3837 const AArch64InstrInfo *TII =
3838 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3839 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3840 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
3841 bool HasFP = hasFP(MF);
3842
3843 DebugLoc DL;
3844 int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
3845 int64_t NumBlocks = FrameSize / ProbeSize;
3846 int64_t ResidualSize = FrameSize % ProbeSize;
3847
3848 LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
3849 << NumBlocks << " blocks of " << ProbeSize
3850 << " bytes, plus " << ResidualSize << " bytes\n");
3851
3852 // Decrement SP by NumBlocks * ProbeSize bytes, using either an unrolled
3853 // sequence or an ordinary loop.
3854 if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
3855 for (int i = 0; i < NumBlocks; ++i) {
3856 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
3857 // encodable in a SUB).
3858 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
3859 StackOffset::getFixed(-ProbeSize), TII,
3860 MachineInstr::FrameSetup, false, false, nullptr,
3861 EmitAsyncCFI && !HasFP, CFAOffset);
3862 CFAOffset += StackOffset::getFixed(ProbeSize);
3863 // LDR XZR, [SP]
3864 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::LDRXui))
3865 .addDef(AArch64::XZR)
3866 .addReg(AArch64::SP)
3867 .addImm(0)
3871 Align(8)))
3873 }
3874 } else if (NumBlocks != 0) {
3875 // SUB ScratchReg, SP, #(ProbeSize * NumBlocks) (or equivalent if the amount
3876 // is not encodable in SUB). ScratchReg may temporarily become the CFA register.
3877 emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
3878 StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
3879 MachineInstr::FrameSetup, false, false, nullptr,
3880 EmitAsyncCFI && !HasFP, CFAOffset);
3881 CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
3882 MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
3883 MBB = MBBI->getParent();
3884 if (EmitAsyncCFI && !HasFP) {
3885 // Set the CFA register back to SP.
3886 CFIInstBuilder(*MBB, MBBI, MachineInstr::FrameSetup)
3887 .buildDefCFARegister(AArch64::SP);
3888 }
3889 }
3890
3891 if (ResidualSize != 0) {
3892 // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
3893 // in SUB).
3894 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
3895 StackOffset::getFixed(-ResidualSize), TII,
3896 MachineInstr::FrameSetup, false, false, nullptr,
3897 EmitAsyncCFI && !HasFP, CFAOffset);
3898 if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
3899 // LDR XZR, [SP]
3900 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::LDRXui))
3901 .addDef(AArch64::XZR)
3902 .addReg(AArch64::SP)
3903 .addImm(0)
3907 Align(8)))
3909 }
3910 }
3911}
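// Illustrative sketch, not part of this file: the arithmetic that decides how
// inlineStackProbeFixed splits a fixed-size allocation. ProbePlan and
// planProbes are hypothetical names; the default limits (4 and 1024) stand in
// for AArch64::StackProbeMaxLoopUnroll and AArch64::StackProbeMaxUnprobedStack
// and are assumptions made for the example.

```cpp
#include <cstdint>

// Hypothetical summary of what would be emitted for one allocation.
struct ProbePlan {
  int64_t NumBlocks;    // Full ProbeSize-sized blocks.
  int64_t ResidualSize; // Bytes left over after the full blocks.
  bool UseLoop;         // True if the blocks are probed by a loop.
  bool ProbeResidual;   // True if the residual allocation gets its own probe.
};

ProbePlan planProbes(int64_t FrameSize, int64_t ProbeSize,
                     int64_t MaxLoopUnroll = 4,
                     int64_t MaxUnprobedStack = 1024) {
  ProbePlan P;
  P.NumBlocks = FrameSize / ProbeSize;
  P.ResidualSize = FrameSize % ProbeSize;
  // Few blocks are unrolled inline; more than the limit become a loop.
  P.UseLoop = P.NumBlocks > MaxLoopUnroll;
  // A small residual may stay unprobed; a larger one is probed after the SUB.
  P.ProbeResidual = P.ResidualSize > MaxUnprobedStack;
  return P;
}
```

// With a 4096-byte probe size, a 10000-byte frame would give two unrolled
// blocks and a probed 1808-byte residual under these assumed limits.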
3912
3913void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
3914 MachineBasicBlock &MBB) const {
3915 // Get the instructions that need to be replaced. We emit at most two of
3916 // these. Remember them in order to avoid complications coming from the need
3917 // to traverse the block while potentially creating more blocks.
3918 SmallVector<MachineInstr *, 4> ToReplace;
3919 for (MachineInstr &MI : MBB)
3920 if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
3921 MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
3922 ToReplace.push_back(&MI);
3923
3924 for (MachineInstr *MI : ToReplace) {
3925 if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
3926 Register ScratchReg = MI->getOperand(0).getReg();
3927 int64_t FrameSize = MI->getOperand(1).getImm();
3928 StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
3929 MI->getOperand(3).getImm());
3930 inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
3931 CFAOffset);
3932 } else {
3933 assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
3934 "Stack probe pseudo-instruction expected");
3935 const AArch64InstrInfo *TII =
3936 MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
3937 Register TargetReg = MI->getOperand(0).getReg();
3938 (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
3939 }
3940 MI->eraseFromParent();
3941 }
3942}
3943
3944struct StackAccess {
3945 enum AccessType {
3946 NotAccessed = 0, // Stack object not accessed by load/store instructions.
3947 GPR = 1 << 0, // A general purpose register.
3948 PPR = 1 << 1, // A predicate register.
3949 FPR = 1 << 2, // A floating point/Neon/SVE register.
3950 };
3951
3952 int Idx;
3953 StackOffset Offset;
3954 int64_t Size;
3955 unsigned AccessTypes;
3956
3958
3959 bool operator<(const StackAccess &Rhs) const {
3960 return std::make_tuple(start(), Idx) <
3961 std::make_tuple(Rhs.start(), Rhs.Idx);
3962 }
3963
3964 bool isCPU() const {
3965 // Predicate register load and store instructions execute on the CPU.
3966 return AccessTypes & (AccessType::GPR | AccessType::PPR);
3967 }
3968 bool isSME() const { return AccessTypes & AccessType::FPR; }
3969 bool isMixed() const { return isCPU() && isSME(); }
3970
3971 int64_t start() const { return Offset.getFixed() + Offset.getScalable(); }
3972 int64_t end() const { return start() + Size; }
3973
3974 std::string getTypeString() const {
3975 switch (AccessTypes) {
3976 case AccessType::FPR:
3977 return "FPR";
3978 case AccessType::PPR:
3979 return "PPR";
3980 case AccessType::GPR:
3981 return "GPR";
3982 case AccessType::NotAccessed:
3983 return "NA";
3984 default:
3985 return "Mixed";
3986 }
3987 }
3988
3989 void print(raw_ostream &OS) const {
3990 OS << getTypeString() << " stack object at [SP"
3991 << (Offset.getFixed() < 0 ? "" : "+") << Offset.getFixed();
3992 if (Offset.getScalable())
3993 OS << (Offset.getScalable() < 0 ? "" : "+") << Offset.getScalable()
3994 << " * vscale";
3995 OS << "]";
3996 }
3997};
3998
3999static inline raw_ostream &operator<<(raw_ostream &OS, const StackAccess &SA) {
4000 SA.print(OS);
4001 return OS;
4002}
4003
4004void AArch64FrameLowering::emitRemarks(
4005 const MachineFunction &MF, MachineOptimizationRemarkEmitter *ORE) const {
4006
4007 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
4009 return;
4010
4011 unsigned StackHazardSize = getStackHazardSize(MF);
4012 const uint64_t HazardSize =
4013 (StackHazardSize) ? StackHazardSize : StackHazardRemarkSize;
4014
4015 if (HazardSize == 0)
4016 return;
4017
4018 const MachineFrameInfo &MFI = MF.getFrameInfo();
4019 // Bail if function has no stack objects.
4020 if (!MFI.hasStackObjects())
4021 return;
4022
4023 std::vector<StackAccess> StackAccesses(MFI.getNumObjects());
4024
4025 size_t NumFPLdSt = 0;
4026 size_t NumNonFPLdSt = 0;
4027
4028 // Collect stack accesses via Load/Store instructions.
4029 for (const MachineBasicBlock &MBB : MF) {
4030 for (const MachineInstr &MI : MBB) {
4031 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
4032 continue;
4033 for (MachineMemOperand *MMO : MI.memoperands()) {
4034 std::optional<int> FI = getMMOFrameID(MMO, MFI);
4035 if (FI && !MFI.isDeadObjectIndex(*FI)) {
4036 int FrameIdx = *FI;
4037
4038 size_t ArrIdx = FrameIdx + MFI.getNumFixedObjects();
4039 if (StackAccesses[ArrIdx].AccessTypes == StackAccess::NotAccessed) {
4040 StackAccesses[ArrIdx].Idx = FrameIdx;
4041 StackAccesses[ArrIdx].Offset =
4042 getFrameIndexReferenceFromSP(MF, FrameIdx);
4043 StackAccesses[ArrIdx].Size = MFI.getObjectSize(FrameIdx);
4044 }
4045
4046 unsigned RegTy = StackAccess::AccessType::GPR;
4047 if (MFI.hasScalableStackID(FrameIdx))
4050 RegTy = StackAccess::FPR;
4051
4052 StackAccesses[ArrIdx].AccessTypes |= RegTy;
4053
4054 if (RegTy == StackAccess::FPR)
4055 ++NumFPLdSt;
4056 else
4057 ++NumNonFPLdSt;
4058 }
4059 }
4060 }
4061 }
4062
4063 if (NumFPLdSt == 0 || NumNonFPLdSt == 0)
4064 return;
4065
4066 llvm::sort(StackAccesses);
4067 llvm::erase_if(StackAccesses, [](const StackAccess &S) {
4068 return S.AccessTypes == StackAccess::NotAccessed;
4069 });
4070
4073
4074 if (StackAccesses.front().isMixed())
4075 MixedObjects.push_back(&StackAccesses.front());
4076
4077 for (auto It = StackAccesses.begin(), End = std::prev(StackAccesses.end());
4078 It != End; ++It) {
4079 const auto &First = *It;
4080 const auto &Second = *(It + 1);
4081
4082 if (Second.isMixed())
4083 MixedObjects.push_back(&Second);
4084
4085 if ((First.isSME() && Second.isCPU()) ||
4086 (First.isCPU() && Second.isSME())) {
4087 uint64_t Distance = static_cast<uint64_t>(Second.start() - First.end());
4088 if (Distance < HazardSize)
4089 HazardPairs.emplace_back(&First, &Second);
4090 }
4091 }
4092
4093 auto EmitRemark = [&](llvm::StringRef Str) {
4094 ORE->emit([&]() {
4095 auto R = MachineOptimizationRemarkAnalysis(
4096 "sme", "StackHazard", MF.getFunction().getSubprogram(), &MF.front());
4097 return R << formatv("stack hazard in '{0}': ", MF.getName()).str() << Str;
4098 });
4099 };
4100
4101 for (const auto &P : HazardPairs)
4102 EmitRemark(formatv("{0} is too close to {1}", *P.first, *P.second).str());
4103
4104 for (const auto *Obj : MixedObjects)
4105 EmitRemark(
4106 formatv("{0} accessed by both GP and FP instructions", *Obj).str());
4107}
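// Illustrative sketch, not part of this file: the neighbour scan used above
// to find hazard pairs, reduced to plain byte ranges. Access and
// findHazardPairs are hypothetical names, and the result reports start
// offsets rather than pointers to keep the example self-contained.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, reduced StackAccess: a byte range plus who touches it.
struct Access {
  int64_t Start;
  int64_t Size;
  bool CPU; // Accessed by GPR/PPR loads or stores.
  bool SME; // Accessed by FPR/SVE loads or stores.
  int64_t end() const { return Start + Size; }
};

// Sort by start offset, then report the start offsets of adjacent objects
// that are on opposite sides (CPU vs SME) and closer than HazardSize bytes.
std::vector<std::pair<int64_t, int64_t>>
findHazardPairs(std::vector<Access> Accesses, uint64_t HazardSize) {
  std::sort(Accesses.begin(), Accesses.end(),
            [](const Access &A, const Access &B) { return A.Start < B.Start; });
  std::vector<std::pair<int64_t, int64_t>> Pairs;
  for (std::size_t I = 0; I + 1 < Accesses.size(); ++I) {
    const Access &First = Accesses[I];
    const Access &Second = Accesses[I + 1];
    bool Crosses = (First.SME && Second.CPU) || (First.CPU && Second.SME);
    if (!Crosses)
      continue;
    // Gap between the end of the first object and the start of the second.
    uint64_t Distance = static_cast<uint64_t>(Second.Start - First.end());
    if (Distance < HazardSize)
      Pairs.emplace_back(First.Start, Second.Start);
  }
  return Pairs;
}
```

// Only neighbours in the sorted order are compared, so each object is checked
// against the next object by start offset, as in the loop above.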
LLVM_ABI uint64_t estimateStackSize(const MachineFunction &MF) const
Estimate and return the size of the stack frame.
void setStackID(int ObjectIdx, uint8_t ID)
bool isCalleeSavedInfoValid() const
Has the callee saved info been calculated yet?
Align getObjectAlign(int ObjectIdx) const
Return the alignment of the specified stack object.
int64_t getObjectSize(int ObjectIdx) const
Return the size of the specified object.
bool isMaxCallFrameSizeComputed() const
bool hasStackMap() const
This method may be called any time after instruction selection is complete to determine if there is a...
LLVM_ABI int CreateSpillStackObject(uint64_t Size, Align Alignment, TargetStackID::Value StackID=TargetStackID::Default)
Create a new statically sized stack object that represents a spill slot, returning a nonnegative iden...
const std::vector< CalleeSavedInfo > & getCalleeSavedInfo() const
Returns a reference to the callee saved info vector for the current function.
unsigned getNumObjects() const
Return the number of objects.
int getObjectIndexEnd() const
Return one past the maximum frame object index.
bool hasStackProtectorIndex() const
bool hasStackObjects() const
Return true if there are any stack objects in this function.
uint8_t getStackID(int ObjectIdx) const
unsigned getNumFixedObjects() const
Return the number of fixed objects.
void setIsCalleeSavedObjectIndex(int ObjectIdx, bool IsCalleeSaved)
int64_t getObjectOffset(int ObjectIdx) const
Return the assigned stack offset of the specified object from the incoming stack pointer.
int getObjectIndexBegin() const
Return the minimum frame object index.
void setObjectAlignment(int ObjectIdx, Align Alignment)
setObjectAlignment - Change the alignment of the specified stack object.
bool isDeadObjectIndex(int ObjectIdx) const
Returns true if the specified index corresponds to a dead object.
const WinEHFuncInfo * getWinEHFuncInfo() const
getWinEHFuncInfo - Return information about how the current function uses Windows exception handling.
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineFrameInfo & getFrameInfo()
getFrameInfo - Return the frame info object for the current function.
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
Function & getFunction()
Return the LLVM function that this machine code represents.
BasicBlockListType::iterator iterator
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
const MachineBasicBlock & front() const
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineBasicBlock - Allocate a new MachineBasicBlock.
void insert(iterator MBBI, MachineBasicBlock *MBB)
const TargetMachine & getTarget() const
getTarget - Return the target machine this machine code is compiled with
const MachineInstrBuilder & setMemRefs(ArrayRef< MachineMemOperand * > MMOs) const
const MachineInstrBuilder & addExternalSymbol(const char *FnName, unsigned TargetFlags=0) const
const MachineInstrBuilder & addReg(Register RegNo, RegState Flags={}, unsigned SubReg=0) const
Add a new virtual register operand.
const MachineInstrBuilder & setMIFlag(MachineInstr::MIFlag Flag) const
const MachineInstrBuilder & addImm(int64_t Val) const
Add a new immediate operand.
const MachineInstrBuilder & addFrameIndex(int Idx) const
const MachineInstrBuilder & addRegMask(const uint32_t *Mask) const
const MachineInstrBuilder & addMBB(MachineBasicBlock *MBB, unsigned TargetFlags=0) const
const MachineInstrBuilder & addDef(Register RegNo, RegState Flags={}, unsigned SubReg=0) const
Add a virtual register definition operand.
const MachineInstrBuilder & setMIFlags(unsigned Flags) const
const MachineInstrBuilder & addMemOperand(MachineMemOperand *MMO) const
Representation of each machine instruction.
void setFlags(unsigned flags)
uint32_t getFlags() const
Return the MI flags bitvector.
LLVM_ABI MachineInstrBundleIterator< MachineInstr > eraseFromParent()
Unlink 'this' from the containing basic block and delete it.
A description of a memory reference used in the backend.
const PseudoSourceValue * getPseudoValue() const
@ MOVolatile
The memory access is volatile.
@ MOLoad
The memory access reads data.
@ MOStore
The memory access writes data.
const Value * getValue() const
Return the base address of the memory access.
MachineOperand class - Representation of each machine instruction operand.
int64_t getImm() const
bool isFI() const
isFI - Tests if this is a MO_FrameIndex operand.
LLVM_ABI void emit(DiagnosticInfoOptimizationBase &OptDiag)
Emit an optimization remark.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
LLVM_ABI void freezeReservedRegs()
freezeReservedRegs - Called by the register allocator to freeze the set of reserved registers before ...
bool isReserved(MCRegister PhysReg) const
isReserved - Returns true when PhysReg is a reserved register.
LLVM_ABI Register createVirtualRegister(const TargetRegisterClass *RegClass, StringRef Name="")
createVirtualRegister - Create and return a new virtual register in the function with the specified r...
LLVM_ABI bool isLiveIn(Register Reg) const
LLVM_ABI const MCPhysReg * getCalleeSavedRegs() const
Returns list of callee saved registers.
LLVM_ABI bool isPhysRegUsed(MCRegister PhysReg, bool SkipRegMaskTest=false) const
Return true if the specified register is modified or read in this function.
Represent a mutable reference to an array (0 or more elements consecutively in memory),...
Definition ArrayRef.h:294
Wrapper class representing virtual and physical registers.
Definition Register.h:20
constexpr bool isValid() const
Definition Register.h:112
SMEAttrs is a utility class to parse the SME ACLE attributes on functions.
bool hasStreamingInterface() const
bool hasNonStreamingInterfaceAndBody() const
bool hasStreamingBody() const
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition SetVector.h:151
A SetVector that performs no allocations if smaller than a certain size.
Definition SetVector.h:339
This class consists of common code factored out of the SmallVector class to reduce code duplication b...
reference emplace_back(ArgTypes &&... Args)
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
void push_back(const T &Elt)
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
StackOffset holds a fixed and a scalable offset in bytes.
Definition TypeSize.h:30
int64_t getFixed() const
Returns the fixed component of the stack offset.
Definition TypeSize.h:46
int64_t getScalable() const
Returns the scalable component of the stack offset.
Definition TypeSize.h:49
static StackOffset get(int64_t Fixed, int64_t Scalable)
Definition TypeSize.h:41
static StackOffset getScalable(int64_t Scalable)
Definition TypeSize.h:40
static StackOffset getFixed(int64_t Fixed)
Definition TypeSize.h:39
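As the entries above describe, a StackOffset pairs a fixed byte count with a scalable byte count. The following is a minimal sketch of that idea; OffsetSketch, its members, and resolve() are hypothetical names for illustration and are not the llvm::StackOffset API. It assumes the scalable part is multiplied by a runtime vector-length factor (called VScale here) to obtain concrete bytes.

```cpp
#include <cstdint>

// Hypothetical stand-in for illustration only; this is not llvm::StackOffset.
struct OffsetSketch {
  int64_t Fixed = 0;    // bytes known at compile time
  int64_t Scalable = 0; // bytes to be multiplied by the runtime factor

  static OffsetSketch getFixed(int64_t F) { return {F, 0}; }
  static OffsetSketch getScalable(int64_t S) { return {0, S}; }
  static OffsetSketch get(int64_t F, int64_t S) { return {F, S}; }

  // Offsets add component-wise; the two parts never mix.
  OffsetSketch operator+(const OffsetSketch &RHS) const {
    return {Fixed + RHS.Fixed, Scalable + RHS.Scalable};
  }

  // Assumption: the concrete byte offset for a given runtime factor.
  int64_t resolve(int64_t VScale) const { return Fixed + Scalable * VScale; }
};
```

Keeping the two components separate lets an offset stay symbolic until the scalable vector length is known.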
bool hasFP(const MachineFunction &MF) const
hasFP - Return true if the specified function should have a dedicated frame pointer register.
virtual void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS=nullptr) const
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
int getOffsetOfLocalArea() const
getOffsetOfLocalArea - This method returns the offset of the local area from the stack pointer on ent...
Align getStackAlign() const
getStackAlign - This method returns the number of bytes to which the stack pointer must be aligne...
StackDirection getStackGrowthDirection() const
getStackGrowthDirection - Return the direction the stack grows.
virtual bool enableCFIFixup(const MachineFunction &MF) const
Returns true if we may need to fix the unwind information for the function.
Primary interface to the complete machine description for the target machine.
const Triple & getTargetTriple() const
const MCAsmInfo & getMCAsmInfo() const
Return target specific asm information.
TargetOptions Options
LLVM_ABI bool FramePointerIsReserved(const MachineFunction &MF) const
FramePointerIsReserved - This returns true if the frame pointer must always either point to a new fra...
LLVM_ABI bool DisableFramePointerElim(const MachineFunction &MF) const
DisableFramePointerElim - This returns true if frame pointer elimination optimization should be disab...
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
bool hasStackRealignment(const MachineFunction &MF) const
True if stack realignment is required and still possible.
virtual const TargetRegisterInfo * getRegisterInfo() const =0
Return the target's register information.
Triple - Helper class for working with autoconf configuration names.
Definition Triple.h:47
bool isOSBinFormatMachO() const
Tests whether the environment is MachO.
Definition Triple.h:791
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition raw_ostream.h:53
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
static unsigned getShiftValue(unsigned Imm)
getShiftValue - Extract the shift value.
static unsigned getArithExtendImm(AArch64_AM::ShiftExtendType ET, unsigned Imm)
getArithExtendImm - Encode the extend type and shift amount for an arithmetic instruction: imm: 3-bit...
const unsigned StackProbeMaxLoopUnroll
Maximum number of iterations to unroll for a constant size probing loop.
const unsigned StackProbeMaxUnprobedStack
Maximum allowed number of unprobed bytes above SP at an ABI boundary.
constexpr char Align[]
Key for Kernel::Arg::Metadata::mAlign.
constexpr char Attrs[]
Key for Kernel::Metadata::mAttrs.
unsigned ID
LLVM IR allows arbitrary numbers to be used as calling convention identifiers.
Definition CallingConv.h:24
@ AArch64_SVE_VectorCall
Used between AArch64 SVE functions.
@ PreserveMost
Used for runtime calls that preserve most registers.
Definition CallingConv.h:63
@ CXX_FAST_TLS
Used for access functions.
Definition CallingConv.h:72
@ GHC
Used by the Glasgow Haskell Compiler (GHC).
Definition CallingConv.h:50
@ PreserveAll
Used for runtime calls that preserve (almost) all registers.
Definition CallingConv.h:66
@ Fast
Attempts to make calls as fast as possible (e.g.
Definition CallingConv.h:41
@ PreserveNone
Used for runtime calls that preserve no general registers.
Definition CallingConv.h:90
@ Win64
The C convention as implemented on Windows/x86-64 and AArch64.
@ SwiftTail
This follows the Swift calling convention in how arguments are passed but guarantees tail calls will ...
Definition CallingConv.h:87
@ C
The default llvm calling convention, compatible with C.
Definition CallingConv.h:34
initializer< Ty > init(const Ty &Val)
NodeAddr< InstrNode * > Instr
Definition RDFGraph.h:389
BaseReg
Stack frame base register. Bit 0 of FREInfo.Info.
Definition SFrame.h:77
This is an optimization pass for GlobalISel generic memory operations.
@ Offset
Definition DWP.cpp:558
void stable_sort(R &&Range)
Definition STLExtras.h:2115
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
int isAArch64FrameOffsetLegal(const MachineInstr &MI, StackOffset &Offset, bool *OutUseUnscaledOp=nullptr, unsigned *OutUnscaledOp=nullptr, int64_t *EmittableOffset=nullptr)
Check if the Offset is a valid frame offset for MI.
RegState
Flags to represent properties of register accesses.
@ Define
Register definition.
constexpr RegState getKillRegState(bool B)
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:643
@ AArch64FrameOffsetCannotUpdate
Offset cannot apply.
constexpr T alignDown(U Value, V Align, W Skew=0)
Returns the largest unsigned integer that is less than or equal to Value and congruent to Skew mod Align.
Definition MathExtras.h:546
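The alignDown description above can be made concrete with a small worked sketch. The function alignDownSketch below is a hypothetical stand-in written for illustration, not the MathExtras.h implementation; it assumes Align is non-zero and that Value is at least Skew mod Align.

```cpp
#include <cstdint>

// Hypothetical stand-in for illustration only.
// Returns the largest value <= Value that is congruent to Skew modulo Align.
// Assumes Align != 0 and Value >= (Skew % Align).
constexpr uint64_t alignDownSketch(uint64_t Value, uint64_t Align,
                                   uint64_t Skew = 0) {
  Skew %= Align;                                // reduce the skew into [0, Align)
  return (Value - Skew) / Align * Align + Skew; // round down, then re-apply skew
}
```

With no skew this rounds down to a multiple of Align; with a skew of 3 and an alignment of 8, the candidates are 3, 11, 19, and so on.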
auto dyn_cast_or_null(const Y &Val)
Definition Casting.h:753
bool any_of(R &&range, UnaryPredicate P)
Provide wrappers to std::any_of which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1745
auto formatv(bool Validate, const char *Fmt, Ts &&...Vals)
auto reverse(ContainerTy &&C)
Definition STLExtras.h:407
void sort(IteratorTy Start, IteratorTy End)
Definition STLExtras.h:1635
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition Debug.cpp:209
void emitFrameOffset(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, unsigned DestReg, unsigned SrcReg, StackOffset Offset, const TargetInstrInfo *TII, MachineInstr::MIFlag=MachineInstr::NoFlags, bool SetNZCV=false, bool NeedsWinCFI=false, bool *HasWinCFI=nullptr, bool EmitCFAOffset=false, StackOffset InitialOffset={}, unsigned FrameReg=AArch64::SP)
emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg plus Offset.
LLVM_ABI void report_fatal_error(Error Err, bool gen_crash_diag=true)
Definition Error.cpp:163
constexpr uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition Alignment.h:144
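For the alignTo entry above, a sketch of rounding a size up to an alignment may help. alignToSketch is a hypothetical stand-in that takes the alignment as a plain integer rather than an Align object, and it assumes that value is a non-zero power of two, matching the constraint the Align type is documented to enforce.

```cpp
#include <cstdint>

// Hypothetical stand-in for illustration only.
// Returns the smallest multiple of A that is >= Size.
// Assumes A is a non-zero power of two.
constexpr uint64_t alignToSketch(uint64_t Size, uint64_t A) {
  return (Size + A - 1) & ~(A - 1); // add A-1, then clear the low bits
}
```

For example, a 17-byte area rounded to a 16-byte stack alignment occupies 32 bytes, while a 16-byte area is already aligned and stays at 16.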
FunctionAddr VTableAddr Count
Definition InstrProf.h:139
constexpr RegState getDefRegState(bool B)
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
@ First
Helpers to iterate all locations in the MemoryEffectsBase class.
Definition ModRef.h:74
uint16_t MCPhysReg
An unsigned integer type large enough to represent all physical registers, but not necessarily virtua...
Definition MCRegister.h:21
raw_ostream & operator<<(raw_ostream &OS, const APFixedPoint &FX)
auto count_if(R &&Range, UnaryPredicate P)
Wrapper function around std::count_if to count the number of times an element satisfying a given pred...
Definition STLExtras.h:2018
auto find_if(R &&Range, UnaryPredicate P)
Provide wrappers to std::find_if which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1771
void erase_if(Container &C, UnaryPredicate P)
Provide a container algorithm similar to C++ Library Fundamentals v2's erase_if which is equivalent t...
Definition STLExtras.h:2191
bool is_contained(R &&Range, const E &Element)
Returns true if Element is found in Range.
Definition STLExtras.h:1946
LLVM_ABI const Value * getUnderlyingObject(const Value *V, unsigned MaxLookup=MaxLookupSearchDepth)
This method strips off any GEP address adjustments, pointer casts or llvm.threadlocal....
void fullyRecomputeLiveIns(ArrayRef< MachineBasicBlock * > MBBs)
Convenience function for recomputing live-ins for a set of MBBs until the computation converges.
LLVM_ABI Printable printReg(Register Reg, const TargetRegisterInfo *TRI=nullptr, unsigned SubIdx=0, const MachineRegisterInfo *MRI=nullptr)
Prints virtual and physical registers with or without a TRI instance.
void swap(llvm::BitVector &LHS, llvm::BitVector &RHS)
Implement std::swap in terms of BitVector swap.
Definition BitVector.h:862
bool operator<(const StackAccess &Rhs) const
void print(raw_ostream &OS) const
int64_t start() const
std::string getTypeString() const
int64_t end() const
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77
Pair of physical register and lane mask.
static LLVM_ABI MachinePointerInfo getUnknownStack(MachineFunction &MF)
Stack memory without other information.
static LLVM_ABI MachinePointerInfo getFixedStack(MachineFunction &MF, int FI, int64_t Offset=0)
Return a MachinePointerInfo record that refers to the specified FrameIndex.
SmallVector< WinEHTryBlockMapEntry, 4 > TryBlockMap
SmallVector< WinEHHandlerType, 1 > HandlerArray