1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function so that a particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) doesn't get created until the
33// main function body, after the prologue has run. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | (Win64 only) callee-saved SVE reg |
48// | |
49// |-----------------------------------|
50// | |
51// | callee-saved gpr registers | <--.
52// | | | On Darwin platforms these
53// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
54// | prev_lr | | (frame record first)
55// | prev_fp | <--'
56// | async context if needed |
57// | (a.k.a. "frame record") |
58// |-----------------------------------| <- fp(=x29)
59// Default SVE stack layout Split SVE objects
60// (aarch64-split-sve-objects=false) (aarch64-split-sve-objects=true)
61// |-----------------------------------| |-----------------------------------|
62// | <hazard padding> | | callee-saved PPR registers |
63// |-----------------------------------| |-----------------------------------|
64// | | | PPR stack objects |
65// | callee-saved fp/simd/SVE regs | |-----------------------------------|
66// | | | <hazard padding> |
67// |-----------------------------------| |-----------------------------------|
68// | | | callee-saved ZPR/FPR registers |
69// | SVE stack objects | |-----------------------------------|
70// | | | ZPR stack objects |
71// |-----------------------------------| |-----------------------------------|
72// ^ NB: FPR CSRs are promoted to ZPRs
73// |-----------------------------------|
74// |.empty.space.to.make.part.below....|
75// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
76// |.the.standard.16-byte.alignment....| compile time; if present)
77// |-----------------------------------|
78// | local variables of fixed size |
79// | including spill slots |
80// | <FPR> |
81// | <hazard padding> |
82// | <GPR> |
83// |-----------------------------------| <- bp(not defined by ABI,
84// |.variable-sized.local.variables....| LLVM chooses X19)
85// |.(VLAs)............................| (size of this area is unknown at
86// |...................................| compile time)
87// |-----------------------------------| <- sp
88// | | Lower address
89//
90//
91// To access data in a frame, a constant offset from one of the pointers
92// (fp, bp, sp) must be computable at compile time. The sizes of the areas
93// with a dotted background cannot be computed at compile time when those
94// areas are present, so all three of fp, bp and sp must be set up in order
95// to access all contents of the frame areas, assuming all of the frame
96// areas are non-empty.
97//
98// For most functions, some of the frame areas are empty. For those functions,
99// it may not be necessary to set up fp or bp:
100// * A base pointer is definitely needed when there are both VLAs and local
101// variables with more-than-default alignment requirements.
102// * A frame pointer is definitely needed when there are local variables with
103// more-than-default alignment requirements.
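//
// For example (illustrative): a function containing both a variable-length
// alloca and a local requiring 64-byte alignment needs fp to reach the fixed
// incoming-argument area, bp to reach the over-aligned locals (the realignment
// gap below fp has unknown size), and sp to reach the outgoing arguments and
// the VLA area at the bottom of the frame.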
104//
105// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
106// callee-saved area, since the unwind encoding does not allow for encoding
107// this dynamically and existing tools depend on this layout. For other
108// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
109// area to allow SVE stack objects (allocated directly below the callee-saves,
110// if available) to be accessed directly from the framepointer.
111// The SVE spill/fill instructions have VL-scaled addressing modes such
112// as:
113// ldr z8, [fp, #-7 mul vl]
114// For SVE the size of the vector length (VL) is not known at compile-time, so
115// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
116// layout, we don't need to add an unscaled offset to the framepointer before
117// accessing the SVE object in the frame.
118//
119// In some cases when a base pointer is not strictly needed, it is generated
120// anyway when offsets from the frame pointer to access local variables become
121// so large that the offset can't be encoded in the immediate fields of loads
122// or stores.
123//
124// Outgoing function arguments must be at the bottom of the stack frame when
125// calling another function. If we do not have variable-sized stack objects, we
126// can allocate a "reserved call frame" area at the bottom of the local
127// variable area, large enough for all outgoing calls. If we do have VLAs, then
128// the stack pointer must be decremented and incremented around each call to
129// make space for the arguments below the VLAs.
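//
// As a rough illustration (not the exact code emitted), a call site with 32
// bytes of outgoing stack arguments looks like this when VLAs force dynamic
// adjustment:
//
//     sub  sp, sp, #32        // create the argument area below the VLAs
//     str  x8, [sp]           // store a stack-passed argument
//     bl   callee
//     add  sp, sp, #32        // release the argument area again
//
// whereas with a reserved call frame the stores simply address [sp, #imm] and
// the sub/add pair is folded into the prologue/epilogue.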
130//
131// FIXME: also explain the redzone concept.
132//
133// About stack hazards: Under some SME contexts, a coprocessor with its own
134// separate cache can be used for FP operations. This can create hazards if the CPU
135// and the SME unit try to access the same area of memory, including if the
136// access is to an area of the stack. To try to alleviate this we attempt to
137// introduce extra padding into the stack frame between FP and GPR accesses,
138// controlled by the aarch64-stack-hazard-size option. Without changing the
139// layout of the stack frame in the diagram above, a stack object of size
140// aarch64-stack-hazard-size is added between GPR and FPR CSRs. Another is added
141// to the stack objects section, and stack objects are sorted so that FPR >
142// Hazard padding slot > GPRs (where possible). Unfortunately some things are
143// not handled well (VLA area, arguments on the stack, objects with both GPR and
144// FPR accesses), but if those are controlled by the user then the entire stack
145// frame becomes GPR at the start/end with FPR in the middle, surrounded by
146// Hazard padding.
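//
// As an illustration (hypothetical sizes), with aarch64-stack-hazard-size=1024
// the frame is arranged so that GPR and FPR accesses are kept apart:
//
//     | GPR callee-saves            |
//     | <1024-byte hazard padding>  |
//     | FPR/SVE callee-saves        |
//     | FPR locals / spills         |
//     | <1024-byte hazard padding>  |
//     | GPR locals / spills         |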
147//
148// An example of the prologue:
149//
150// .globl __foo
151// .align 2
152// __foo:
153// Ltmp0:
154// .cfi_startproc
155// .cfi_personality 155, ___gxx_personality_v0
156// Leh_func_begin:
157// .cfi_lsda 16, Lexception33
158//
159// stp xa,bx, [sp, -#offset]!
160// ...
161// stp x28, x27, [sp, #offset-32]
162// stp fp, lr, [sp, #offset-16]
163// add fp, sp, #offset - 16
164// sub sp, sp, #1360
165//
166// The Stack:
167// +-------------------------------------------+
168// 10000 | ........ | ........ | ........ | ........ |
169// 10004 | ........ | ........ | ........ | ........ |
170// +-------------------------------------------+
171// 10008 | ........ | ........ | ........ | ........ |
172// 1000c | ........ | ........ | ........ | ........ |
173// +===========================================+
174// 10010 | X28 Register |
175// 10014 | X28 Register |
176// +-------------------------------------------+
177// 10018 | X27 Register |
178// 1001c | X27 Register |
179// +===========================================+
180// 10020 | Frame Pointer |
181// 10024 | Frame Pointer |
182// +-------------------------------------------+
183// 10028 | Link Register |
184// 1002c | Link Register |
185// +===========================================+
186// 10030 | ........ | ........ | ........ | ........ |
187// 10034 | ........ | ........ | ........ | ........ |
188// +-------------------------------------------+
189// 10038 | ........ | ........ | ........ | ........ |
190// 1003c | ........ | ........ | ........ | ........ |
191// +-------------------------------------------+
192//
193// [sp] = 10030 :: >>initial value<<
194// sp = 10020 :: stp fp, lr, [sp, #-16]!
195// fp = sp == 10020 :: mov fp, sp
196// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
197// sp == 10010 :: >>final value<<
198//
199// The frame pointer (w29) points to address 10020. If we use an offset of
200// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
201// for w27, and -32 for w28:
202//
203// Ltmp1:
204// .cfi_def_cfa w29, 16
205// Ltmp2:
206// .cfi_offset w30, -8
207// Ltmp3:
208// .cfi_offset w29, -16
209// Ltmp4:
210// .cfi_offset w27, -24
211// Ltmp5:
212// .cfi_offset w28, -32
213//
214//===----------------------------------------------------------------------===//
215
216#include "AArch64FrameLowering.h"
217#include "AArch64InstrInfo.h"
220#include "AArch64RegisterInfo.h"
221#include "AArch64SMEAttributes.h"
222#include "AArch64Subtarget.h"
225#include "llvm/ADT/ScopeExit.h"
226#include "llvm/ADT/SmallVector.h"
244#include "llvm/IR/Attributes.h"
245#include "llvm/IR/CallingConv.h"
246#include "llvm/IR/DataLayout.h"
247#include "llvm/IR/DebugLoc.h"
248#include "llvm/IR/Function.h"
249#include "llvm/MC/MCAsmInfo.h"
250#include "llvm/MC/MCDwarf.h"
252#include "llvm/Support/Debug.h"
259#include <cassert>
260#include <cstdint>
261#include <iterator>
262#include <optional>
263#include <vector>
264
265using namespace llvm;
266
267#define DEBUG_TYPE "frame-info"
268
269static cl::opt<bool> EnableRedZone("aarch64-redzone",
270 cl::desc("enable use of redzone on AArch64"),
271 cl::init(false), cl::Hidden);
272
274 "stack-tagging-merge-settag",
275 cl::desc("merge settag instruction in function epilog"), cl::init(true),
276 cl::Hidden);
277
278static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
279 cl::desc("sort stack allocations"),
280 cl::init(true), cl::Hidden);
281
282static cl::opt<bool>
283 SplitSVEObjects("aarch64-split-sve-objects",
284 cl::desc("Split allocation of ZPR & PPR objects"),
285 cl::init(true), cl::Hidden);
286
288 "homogeneous-prolog-epilog", cl::Hidden,
289 cl::desc("Emit homogeneous prologue and epilogue for the size "
290 "optimization (default = off)"));
291
292// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
294 StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
295 cl::Hidden);
296// Whether to insert padding into non-streaming functions (for testing).
297static cl::opt<bool>
298 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
299 cl::init(false), cl::Hidden);
300
302 "aarch64-disable-multivector-spill-fill",
303 cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false),
304 cl::Hidden);
305
306int64_t
307AArch64FrameLowering::getArgumentStackToRestore(MachineFunction &MF,
308 MachineBasicBlock &MBB) const {
309 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
311 bool IsTailCallReturn = (MBB.end() != MBBI)
313 : false;
314
315 int64_t ArgumentPopSize = 0;
316 if (IsTailCallReturn) {
317 MachineOperand &StackAdjust = MBBI->getOperand(1);
318
319 // For a tail-call in a callee-pops-arguments environment, some or all of
320 // the stack may actually be in use for the call's arguments; this is
321 // calculated during LowerCall and consumed here...
322 ArgumentPopSize = StackAdjust.getImm();
323 } else {
324 // ... otherwise the amount to pop is *all* of the argument space,
325 // conveniently stored in the MachineFunctionInfo by
326 // LowerFormalArguments. This will, of course, be zero for the C calling
327 // convention.
328 ArgumentPopSize = AFI->getArgumentStackToRestore();
329 }
330
331 return ArgumentPopSize;
332}
333
335 MachineFunction &MF);
336
337enum class AssignObjectOffsets { No, Yes };
338/// Process all the SVE stack objects and determine the SVE stack size and
339/// offsets for each object. If AssignOffsets is "Yes", the offsets get
340/// assigned (and the SVE stack sizes set). Returns the size of the SVE stack.
342 AssignObjectOffsets AssignOffsets);
343
344static unsigned getStackHazardSize(const MachineFunction &MF) {
345 return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
346}
347
353
356 // With split SVE objects, the hazard padding is added to the PPR region,
357 // which places it between the [GPR, PPR] area and the [ZPR, FPR] area. This
358 // avoids hazards between both GPRs and FPRs and ZPRs and PPRs.
361 : 0,
362 AFI->getStackSizePPR());
363}
364
365// Conservatively, returns true if the function is likely to have SVE vectors
366// on the stack. This function is safe to be called before callee-saves or
367// object offsets have been determined.
369 const MachineFunction &MF) {
370 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
371 if (AFI->isSVECC())
372 return true;
373
374 if (AFI->hasCalculatedStackSizeSVE())
375 return bool(AFL.getSVEStackSize(MF));
376
377 const MachineFrameInfo &MFI = MF.getFrameInfo();
378 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd(); FI++) {
379 if (MFI.hasScalableStackID(FI))
380 return true;
381 }
382
383 return false;
384}
385
386/// Returns true if homogeneous prolog or epilog code can be emitted
387/// for the size optimization. If possible, a frame helper call is injected.
388/// When an Exit block is given, this check is for the epilog.
389bool AArch64FrameLowering::homogeneousPrologEpilog(
390 MachineFunction &MF, MachineBasicBlock *Exit) const {
391 if (!MF.getFunction().hasMinSize())
392 return false;
394 return false;
395 if (EnableRedZone)
396 return false;
397
398 // TODO: Windows is not supported yet.
399 if (needsWinCFI(MF))
400 return false;
401
402 // TODO: SVE is not supported yet.
403 if (isLikelyToHaveSVEStack(*this, MF))
404 return false;
405
406 // Bail on stack adjustment needed on return for simplicity.
407 const MachineFrameInfo &MFI = MF.getFrameInfo();
408 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
409 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
410 return false;
411 if (Exit && getArgumentStackToRestore(MF, *Exit))
412 return false;
413
414 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
416 return false;
417
418 // If there are an odd number of GPRs before LR and FP in the CSRs list,
419 // they will not be paired into one RegPairInfo, which is incompatible with
420 // the assumption made by the homogeneous prolog epilog pass.
421 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
422 unsigned NumGPRs = 0;
423 for (unsigned I = 0; CSRegs[I]; ++I) {
424 Register Reg = CSRegs[I];
425 if (Reg == AArch64::LR) {
426 assert(CSRegs[I + 1] == AArch64::FP);
427 if (NumGPRs % 2 != 0)
428 return false;
429 break;
430 }
431 if (AArch64::GPR64RegClass.contains(Reg))
432 ++NumGPRs;
433 }
434
435 return true;
436}
437
438/// Returns true if CSRs should be paired.
439bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
440 return produceCompactUnwindFrame(*this, MF) || homogeneousPrologEpilog(MF);
441}
442
443/// This is the biggest offset to the stack pointer we can encode in aarch64
444/// instructions (without using a separate calculation and a temp register).
445/// Note that the exceptions here are vector stores/loads, which cannot encode any
446/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
447static const unsigned DefaultSafeSPDisplacement = 255;
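
// For illustration (not emitted verbatim): an unscaled access such as
//     ldur x0, [sp, #255]
// still fits the signed 9-bit immediate, whereas a larger SP offset first has
// to be materialized into a scratch register, e.g.
//     add  x16, sp, #1, lsl #12
//     ldr  x0, [x16, #8]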
448
449/// Look at each instruction that references stack frames and return the stack
450/// size limit beyond which some of these instructions will require a scratch
451/// register during their expansion later.
453 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
454 // range. We'll end up allocating an unnecessary spill slot a lot, but
455 // realistically that's not a big deal at this stage of the game.
456 for (MachineBasicBlock &MBB : MF) {
457 for (MachineInstr &MI : MBB) {
458 if (MI.isDebugInstr() || MI.isPseudo() ||
459 MI.getOpcode() == AArch64::ADDXri ||
460 MI.getOpcode() == AArch64::ADDSXri)
461 continue;
462
463 for (const MachineOperand &MO : MI.operands()) {
464 if (!MO.isFI())
465 continue;
466
468 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
470 return 0;
471 }
472 }
473 }
475}
476
481
482unsigned
483AArch64FrameLowering::getFixedObjectSize(const MachineFunction &MF,
484 const AArch64FunctionInfo *AFI,
485 bool IsWin64, bool IsFunclet) const {
486 assert(AFI->getTailCallReservedStack() % 16 == 0 &&
487 "Tail call reserved stack must be aligned to 16 bytes");
488 if (!IsWin64 || IsFunclet) {
489 return AFI->getTailCallReservedStack();
490 } else {
491 if (AFI->getTailCallReservedStack() != 0 &&
492 !MF.getFunction().getAttributes().hasAttrSomewhere(
493 Attribute::SwiftAsync))
494 report_fatal_error("cannot generate ABI-changing tail call for Win64");
495 unsigned FixedObjectSize = AFI->getTailCallReservedStack();
496
497 // Var args are stored here in the primary function.
498 FixedObjectSize += AFI->getVarArgsGPRSize();
499
500 if (MF.hasEHFunclets()) {
501 // Catch objects are stored here in the primary function.
502 const MachineFrameInfo &MFI = MF.getFrameInfo();
503 const WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
504 SmallSetVector<int, 8> CatchObjFrameIndices;
505 for (const WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
506 for (const WinEHHandlerType &H : TBME.HandlerArray) {
507 int FrameIndex = H.CatchObj.FrameIndex;
508 if ((FrameIndex != INT_MAX) &&
509 CatchObjFrameIndices.insert(FrameIndex)) {
510 FixedObjectSize = alignTo(FixedObjectSize,
511 MFI.getObjectAlign(FrameIndex).value()) +
512 MFI.getObjectSize(FrameIndex);
513 }
514 }
515 }
516 // To support EH funclets we allocate an UnwindHelp object
517 FixedObjectSize += 8;
518 }
519 return alignTo(FixedObjectSize, 16);
520 }
521}
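
// A worked example for the Win64 path above (hypothetical sizes): a vararg
// function with a 32-byte tail-call reserved area, 64 bytes of GPR varargs and
// one EH funclet needs 32 + 64 + 8 (UnwindHelp) = 104 bytes, which
// alignTo(..., 16) rounds up to a 112-byte fixed-object area.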
522
524 if (!EnableRedZone)
525 return false;
526
527 // Don't use the red zone if the function explicitly asks us not to.
528 // This is typically used for kernel code.
529 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
530 const unsigned RedZoneSize =
532 if (!RedZoneSize)
533 return false;
534
535 const MachineFrameInfo &MFI = MF.getFrameInfo();
537 uint64_t NumBytes = AFI->getLocalStackSize();
538
539 // If neither NEON nor SVE is available, a COPY from one Q-reg to
540 // another requires a spill -> reload sequence. We can do that
541 // using a pre-decrementing store/post-decrementing load, but
542 // if we do so, we can't use the Red Zone.
543 bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
544 !Subtarget.isNeonAvailable() &&
545 !Subtarget.hasSVE();
546
547 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
548 AFI->hasSVEStackSize() || LowerQRegCopyThroughMem);
549}
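
// For example (illustrative): a leaf function whose locals fit within the red
// zone (commonly 128 bytes) can leave SP untouched and address those locals at
// negative offsets from SP, e.g.
//     stur w0, [sp, #-4]
// Calls, a frame pointer, SVE state, or an over-sized local area all disable
// this, as checked above.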
550
551/// hasFPImpl - Return true if the specified function should have a dedicated
552/// frame pointer register.
554 const MachineFrameInfo &MFI = MF.getFrameInfo();
555 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
557
558 // Win64 EH requires a frame pointer if funclets are present, as the locals
559 // are accessed off the frame pointer in both the parent function and the
560 // funclets.
561 if (MF.hasEHFunclets())
562 return true;
563 // Retain behavior of always omitting the FP for leaf functions when possible.
565 return true;
566 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
567 MFI.hasStackMap() || MFI.hasPatchPoint() ||
568 RegInfo->hasStackRealignment(MF))
569 return true;
570
571 // If we:
572 //
573 // 1. Have streaming mode changes
574 // OR:
575 // 2. Have a streaming body with SVE stack objects
576 //
577 // Then the value of VG restored when unwinding to this function may not match
578 // the value of VG used to set up the stack.
579 //
580 // This is a problem as the CFA can be described with an expression of the
581 // form: CFA = SP + NumBytes + VG * NumScalableBytes.
582 //
583 // If the value of VG used in that expression does not match the value used to
584 // set up the stack, an incorrect address for the CFA will be computed, and
585 // unwinding will fail.
586 //
587 // We work around this issue by ensuring the frame-pointer can describe the
588 // CFA in either of these cases.
589 if (AFI.needsDwarfUnwindInfo(MF) &&
592 return true;
593 // With large call frames around we may need to use the FP to access the
594 // scavenging emergency spill slot.
595 //
596 // Unfortunately some calls to hasFP() like machine verifier ->
597 // getReservedReg() -> hasFP in the middle of global isel are too early
598 // to know the max call frame size. Hopefully conservatively returning "true"
599 // in those cases is fine.
600 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
601 if (!MFI.isMaxCallFrameSizeComputed() ||
603 return true;
604
605 return false;
606}
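
// A worked example for the streaming-mode case above (hypothetical numbers):
// with 16 fixed bytes and NumScalableBytes == 16, the CFA can be described as
// SP + 16 + VG * 16. If VG was 16 when the frame was set up but is 8 when the
// unwinder evaluates the expression, the recovered CFA is off by 128 bytes,
// which is why a frame pointer is forced to describe the CFA instead.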
607
608/// Should the Frame Pointer be reserved for the current function?
610 const TargetMachine &TM = MF.getTarget();
611 const Triple &TT = TM.getTargetTriple();
612
613 // These OSes require that the frame chain be valid, even if the current frame
614 // does not use a frame pointer.
615 if (TT.isOSDarwin() || TT.isOSWindows())
616 return true;
617
618 // If the function has a frame pointer, it is reserved.
619 if (hasFP(MF))
620 return true;
621
622 // Frontend has requested to preserve the frame pointer.
624 return true;
625
626 return false;
627}
628
629/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
630/// not required, we reserve argument space for call sites in the function
631/// immediately on entry to the current function. This eliminates the need for
632/// add/sub sp brackets around call sites. Returns true if the call frame is
633/// included as part of the stack frame.
635 const MachineFunction &MF) const {
636 // The stack probing code for the dynamically allocated outgoing arguments
637 // area assumes that the stack is probed at the top - either by the prologue
638 // code, which issues a probe if `hasVarSizedObjects` returns true, or by the
639 // most recent variable-sized object allocation. Changing the condition here
640 // may need to be followed up by changes to the probe issuing logic.
641 return !MF.getFrameInfo().hasVarSizedObjects();
642}
643
647
648 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
649 const AArch64InstrInfo *TII = Subtarget.getInstrInfo();
650 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
651 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
652 DebugLoc DL = I->getDebugLoc();
653 unsigned Opc = I->getOpcode();
654 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
655 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
656
657 if (!hasReservedCallFrame(MF)) {
658 int64_t Amount = I->getOperand(0).getImm();
659 Amount = alignTo(Amount, getStackAlign());
660 if (!IsDestroy)
661 Amount = -Amount;
662
663 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
664 // doesn't have to pop anything), then the first operand will be zero too so
665 // this adjustment is a no-op.
666 if (CalleePopAmount == 0) {
667 // FIXME: in-function stack adjustment for calls is limited to 24-bits
668 // because there's no guaranteed temporary register available.
669 //
670 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
671 // 1) For offset <= 12-bit, we use LSL #0
672 // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
673 // LSL #0, and the other uses LSL #12.
674 //
675 // Most call frames will be allocated at the start of a function so
676 // this is OK, but it is a limitation that needs dealing with.
677 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
678
679 if (TLI->hasInlineStackProbe(MF) &&
681 // When stack probing is enabled, the decrement of SP may need to be
682 // probed. We only need to do this if the call site needs 1024 bytes of
683 // space or more, because a region smaller than that is allowed to be
684 // unprobed at an ABI boundary. We rely on the fact that SP has been
685 // probed exactly at this point, either by the prologue or most recent
686 // dynamic allocation.
688 "non-reserved call frame without var sized objects?");
689 Register ScratchReg =
690 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
691 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
692 } else {
693 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
694 StackOffset::getFixed(Amount), TII);
695 }
696 }
697 } else if (CalleePopAmount != 0) {
698 // If the calling convention demands that the callee pops arguments from the
699 // stack, we want to add it back if we have a reserved call frame.
700 assert(CalleePopAmount < 0xffffff && "call frame too large");
701 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
702 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
703 }
704 return MBB.erase(I);
705}
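
// For illustration (hypothetical adjustment): a 0x12345-byte SP decrement that
// exceeds the 12-bit immediate is emitted by emitFrameOffset as two
// instructions using the two available shifts:
//     sub sp, sp, #0x12, lsl #12   // 0x12000
//     sub sp, sp, #0x345           // plus 0x345 = 0x12345 in total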
706
708 MachineBasicBlock &MBB) const {
709
710 MachineFunction &MF = *MBB.getParent();
711 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
712 const auto &TRI = *Subtarget.getRegisterInfo();
713 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
714
715 CFIInstBuilder CFIBuilder(MBB, MBB.begin(), MachineInstr::NoFlags);
716
717 // Reset the CFA to `SP + 0`.
718 CFIBuilder.buildDefCFA(AArch64::SP, 0);
719
720 // Flip the RA sign state.
721 if (MFI.shouldSignReturnAddress(MF))
722 MFI.branchProtectionPAuthLR() ? CFIBuilder.buildNegateRAStateWithPC()
723 : CFIBuilder.buildNegateRAState();
724
725 // Shadow call stack uses X18, reset it.
726 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
727 CFIBuilder.buildSameValue(AArch64::X18);
728
729 // Emit .cfi_same_value for callee-saved registers.
730 const std::vector<CalleeSavedInfo> &CSI =
732 for (const auto &Info : CSI) {
733 MCRegister Reg = Info.getReg();
734 if (!TRI.regNeedsCFI(Reg, Reg))
735 continue;
736 CFIBuilder.buildSameValue(Reg);
737 }
738}
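
// For a typical signed-return-address function using the shadow call stack,
// the directives built above would print roughly as (illustrative):
//     .cfi_def_cfa sp, 0
//     .cfi_negate_ra_state
//     .cfi_same_value x18
//     .cfi_same_value x19          // ...and so on for each CSR needing CFI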
739
741 switch (Reg.id()) {
742 default:
743 // The called routine is expected to preserve x19-x28.
744 // x29 and x30 are used as the frame pointer and link register, respectively.
745 return 0;
746
747 // GPRs
748#define CASE(n) \
749 case AArch64::W##n: \
750 case AArch64::X##n: \
751 return AArch64::X##n
752 CASE(0);
753 CASE(1);
754 CASE(2);
755 CASE(3);
756 CASE(4);
757 CASE(5);
758 CASE(6);
759 CASE(7);
760 CASE(8);
761 CASE(9);
762 CASE(10);
763 CASE(11);
764 CASE(12);
765 CASE(13);
766 CASE(14);
767 CASE(15);
768 CASE(16);
769 CASE(17);
770 CASE(18);
771#undef CASE
772
773 // FPRs
774#define CASE(n) \
775 case AArch64::B##n: \
776 case AArch64::H##n: \
777 case AArch64::S##n: \
778 case AArch64::D##n: \
779 case AArch64::Q##n: \
780 return HasSVE ? AArch64::Z##n : AArch64::Q##n
781 CASE(0);
782 CASE(1);
783 CASE(2);
784 CASE(3);
785 CASE(4);
786 CASE(5);
787 CASE(6);
788 CASE(7);
789 CASE(8);
790 CASE(9);
791 CASE(10);
792 CASE(11);
793 CASE(12);
794 CASE(13);
795 CASE(14);
796 CASE(15);
797 CASE(16);
798 CASE(17);
799 CASE(18);
800 CASE(19);
801 CASE(20);
802 CASE(21);
803 CASE(22);
804 CASE(23);
805 CASE(24);
806 CASE(25);
807 CASE(26);
808 CASE(27);
809 CASE(28);
810 CASE(29);
811 CASE(30);
812 CASE(31);
813#undef CASE
814 }
815}
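
// For example, W3 and X3 both map to X3, while S7 maps to Q7 without SVE and
// to Z7 when SVE is available, so the widest aliasing register gets cleared.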
816
817void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
818 MachineBasicBlock &MBB) const {
819 // Insertion point.
821
822 // Fake a debug loc.
823 DebugLoc DL;
824 if (MBBI != MBB.end())
825 DL = MBBI->getDebugLoc();
826
827 const MachineFunction &MF = *MBB.getParent();
828 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
829 const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();
830
831 BitVector GPRsToZero(TRI.getNumRegs());
832 BitVector FPRsToZero(TRI.getNumRegs());
833 bool HasSVE = STI.isSVEorStreamingSVEAvailable();
834 for (MCRegister Reg : RegsToZero.set_bits()) {
835 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
836 // For GPRs, we only care to clear out the 64-bit register.
837 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
838 GPRsToZero.set(XReg);
839 } else if (AArch64InstrInfo::isFpOrNEON(Reg)) {
840 // For FPRs, clear the widest aliasing register (Q-reg, or Z-reg with SVE).
841 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
842 FPRsToZero.set(XReg);
843 }
844 }
845
846 const AArch64InstrInfo &TII = *STI.getInstrInfo();
847
848 // Zero out GPRs.
849 for (MCRegister Reg : GPRsToZero.set_bits())
850 TII.buildClearRegister(Reg, MBB, MBBI, DL);
851
852 // Zero out FP/vector registers.
853 for (MCRegister Reg : FPRsToZero.set_bits())
854 TII.buildClearRegister(Reg, MBB, MBBI, DL);
855
856 if (HasSVE) {
857 for (MCRegister PReg :
858 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
859 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
860 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
861 AArch64::P15}) {
862 if (RegsToZero[PReg])
863 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
864 }
865 }
866}
867
868bool AArch64FrameLowering::windowsRequiresStackProbe(
869 const MachineFunction &MF, uint64_t StackSizeInBytes) const {
870 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
871 const AArch64FunctionInfo &MFI = *MF.getInfo<AArch64FunctionInfo>();
872 // TODO: When implementing stack protectors, take that into account
873 // for the probe threshold.
874 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
875 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
876}
877
879 const MachineBasicBlock &MBB) {
880 const MachineFunction *MF = MBB.getParent();
881 LiveRegs.addLiveIns(MBB);
882 // Mark callee saved registers as used so we will not choose them.
883 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
884 for (unsigned i = 0; CSRegs[i]; ++i)
885 LiveRegs.addReg(CSRegs[i]);
886}
887
889AArch64FrameLowering::findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB,
890 bool HasCall) const {
891 MachineFunction *MF = MBB->getParent();
892
893 // If MBB is an entry block, use X9 as the scratch register;
894 // preserve_none functions may be using X9 to pass arguments,
895 // so prefer to pick an available register below.
896 if (&MF->front() == MBB &&
898 return AArch64::X9;
899
900 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
901 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
902 LivePhysRegs LiveRegs(TRI);
903 getLiveRegsForEntryMBB(LiveRegs, *MBB);
904 if (HasCall) {
905 LiveRegs.addReg(AArch64::X16);
906 LiveRegs.addReg(AArch64::X17);
907 LiveRegs.addReg(AArch64::X18);
908 }
909
910 // Prefer X9 since it was historically used for the prologue scratch reg.
911 const MachineRegisterInfo &MRI = MF->getRegInfo();
912 if (LiveRegs.available(MRI, AArch64::X9))
913 return AArch64::X9;
914
915 for (unsigned Reg : AArch64::GPR64RegClass) {
916 if (LiveRegs.available(MRI, Reg))
917 return Reg;
918 }
919 return AArch64::NoRegister;
920}
921
923 const MachineBasicBlock &MBB) const {
924 const MachineFunction *MF = MBB.getParent();
925 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
926 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
927 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
928 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
930
931 if (AFI->hasSwiftAsyncContext()) {
932 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
933 const MachineRegisterInfo &MRI = MF->getRegInfo();
936 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
937 // available.
938 if (!LiveRegs.available(MRI, AArch64::X16) ||
939 !LiveRegs.available(MRI, AArch64::X17))
940 return false;
941 }
942
943 // Certain stack probing sequences might clobber flags; in that case we can't
944 // use the block as a prologue if the flags register is a live-in.
946 MBB.isLiveIn(AArch64::NZCV))
947 return false;
948
949 if (RegInfo->hasStackRealignment(*MF) || TLI->hasInlineStackProbe(*MF))
950 if (findScratchNonCalleeSaveRegister(TmpMBB) == AArch64::NoRegister)
951 return false;
952
953 // We may need a scratch register (for the return value) if we require making
954 // a special call.
955 if (requiresSaveVG(*MF) ||
956 windowsRequiresStackProbe(*MF, std::numeric_limits<uint64_t>::max()))
957 if (findScratchNonCalleeSaveRegister(TmpMBB, true) == AArch64::NoRegister)
958 return false;
959
960 return true;
961}
962
964 const Function &F = MF.getFunction();
965 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
966 F.needsUnwindTableEntry();
967}
968
969bool AArch64FrameLowering::shouldSignReturnAddressEverywhere(
970 const MachineFunction &MF) const {
971 // FIXME: With WinCFI, extra care should be taken to place SEH_PACSignLR
972 // and SEH_EpilogEnd instructions in the correct order.
974 return false;
976 bool SignReturnAddressAll = AFI->shouldSignReturnAddress(/*SpillsLR=*/false);
977 return SignReturnAddressAll;
978}
979
980// Given a load or a store instruction, generate an appropriate unwinding SEH
981// code on Windows.
983AArch64FrameLowering::insertSEH(MachineBasicBlock::iterator MBBI,
984 const TargetInstrInfo &TII,
985 MachineInstr::MIFlag Flag) const {
986 unsigned Opc = MBBI->getOpcode();
987 MachineBasicBlock *MBB = MBBI->getParent();
988 MachineFunction &MF = *MBB->getParent();
989 DebugLoc DL = MBBI->getDebugLoc();
990 unsigned ImmIdx = MBBI->getNumOperands() - 1;
991 int Imm = MBBI->getOperand(ImmIdx).getImm();
993 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
994 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
995
996 switch (Opc) {
997 default:
998 report_fatal_error("No SEH Opcode for this instruction");
999 case AArch64::STR_ZXI:
1000 case AArch64::LDR_ZXI: {
1001 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1002 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveZReg))
1003 .addImm(Reg0)
1004 .addImm(Imm)
1005 .setMIFlag(Flag);
1006 break;
1007 }
1008 case AArch64::STR_PXI:
1009 case AArch64::LDR_PXI: {
1010 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1011 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SavePReg))
1012 .addImm(Reg0)
1013 .addImm(Imm)
1014 .setMIFlag(Flag);
1015 break;
1016 }
1017 case AArch64::LDPDpost:
1018 Imm = -Imm;
1019 [[fallthrough]];
1020 case AArch64::STPDpre: {
1021 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1022 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1023 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1024 .addImm(Reg0)
1025 .addImm(Reg1)
1026 .addImm(Imm * 8)
1027 .setMIFlag(Flag);
1028 break;
1029 }
1030 case AArch64::LDPXpost:
1031 Imm = -Imm;
1032 [[fallthrough]];
1033 case AArch64::STPXpre: {
1034 Register Reg0 = MBBI->getOperand(1).getReg();
1035 Register Reg1 = MBBI->getOperand(2).getReg();
1036 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1037 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1038 .addImm(Imm * 8)
1039 .setMIFlag(Flag);
1040 else
1041 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1042 .addImm(RegInfo->getSEHRegNum(Reg0))
1043 .addImm(RegInfo->getSEHRegNum(Reg1))
1044 .addImm(Imm * 8)
1045 .setMIFlag(Flag);
1046 break;
1047 }
1048 case AArch64::LDRDpost:
1049 Imm = -Imm;
1050 [[fallthrough]];
1051 case AArch64::STRDpre: {
1052 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1053 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1054 .addImm(Reg)
1055 .addImm(Imm)
1056 .setMIFlag(Flag);
1057 break;
1058 }
1059 case AArch64::LDRXpost:
1060 Imm = -Imm;
1061 [[fallthrough]];
1062 case AArch64::STRXpre: {
1063 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1064 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1065 .addImm(Reg)
1066 .addImm(Imm)
1067 .setMIFlag(Flag);
1068 break;
1069 }
1070 case AArch64::STPDi:
1071 case AArch64::LDPDi: {
1072 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1073 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1074 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1075 .addImm(Reg0)
1076 .addImm(Reg1)
1077 .addImm(Imm * 8)
1078 .setMIFlag(Flag);
1079 break;
1080 }
1081 case AArch64::STPXi:
1082 case AArch64::LDPXi: {
1083 Register Reg0 = MBBI->getOperand(0).getReg();
1084 Register Reg1 = MBBI->getOperand(1).getReg();
1085
1086 int SEHReg0 = RegInfo->getSEHRegNum(Reg0);
1087 int SEHReg1 = RegInfo->getSEHRegNum(Reg1);
1088
1089 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1090 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1091 .addImm(Imm * 8)
1092 .setMIFlag(Flag);
1093 else if (SEHReg0 >= 19 && SEHReg1 >= 19)
1094 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1095 .addImm(SEHReg0)
1096 .addImm(SEHReg1)
1097 .addImm(Imm * 8)
1098 .setMIFlag(Flag);
1099 else
1100 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegIP))
1101 .addImm(SEHReg0)
1102 .addImm(SEHReg1)
1103 .addImm(Imm * 8)
1104 .setMIFlag(Flag);
1105 break;
1106 }
1107 case AArch64::STRXui:
1108 case AArch64::LDRXui: {
1109 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1110 if (Reg >= 19)
1111 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1112 .addImm(Reg)
1113 .addImm(Imm * 8)
1114 .setMIFlag(Flag);
1115 else
1116 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegI))
1117 .addImm(Reg)
1118 .addImm(Imm * 8)
1119 .setMIFlag(Flag);
1120 break;
1121 }
1122 case AArch64::STRDui:
1123 case AArch64::LDRDui: {
1124 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1125 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1126 .addImm(Reg)
1127 .addImm(Imm * 8)
1128 .setMIFlag(Flag);
1129 break;
1130 }
1131 case AArch64::STPQi:
1132 case AArch64::LDPQi: {
1133 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1134 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1135 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1136 .addImm(Reg0)
1137 .addImm(Reg1)
1138 .addImm(Imm * 16)
1139 .setMIFlag(Flag);
1140 break;
1141 }
1142 case AArch64::LDPQpost:
1143 Imm = -Imm;
1144 [[fallthrough]];
1145 case AArch64::STPQpre: {
1146 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1147 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1148 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1149 .addImm(Reg0)
1150 .addImm(Reg1)
1151 .addImm(Imm * 16)
1152 .setMIFlag(Flag);
1153 break;
1154 }
1155 }
1156 auto I = MBB->insertAfter(MBBI, MIB);
1157 return I;
1158}
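
// For example (illustrative): for "stp x19, x20, [sp, #16]" (STPXi, Imm == 2)
// the code above builds SEH_SaveRegP with registers 19 and 20 and byte offset
// 2 * 8 == 16, which is later printed as ".seh_save_regp x19, 16".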
1159
1162 if (!AFI->needsDwarfUnwindInfo(MF) || !AFI->hasStreamingModeChanges())
1163 return false;
1164 // For Darwin platforms we don't save VG for non-SVE functions, even if SME
1165 // is enabled with streaming mode changes.
1166 auto &ST = MF.getSubtarget<AArch64Subtarget>();
1167 if (ST.isTargetDarwin())
1168 return ST.hasSVE();
1169 return true;
1170}
1171
1172static bool isTargetWindows(const MachineFunction &MF) {
1174}
1175
1177 MachineFunction &MF) const {
1178 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1179 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1180
1181 auto EmitSignRA = [&](MachineBasicBlock &MBB) {
1182 DebugLoc DL; // Set debug location to unknown.
1184
1185 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1187 };
1188
1189 auto EmitAuthRA = [&](MachineBasicBlock &MBB) {
1190 DebugLoc DL;
1191 MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
1192 if (MBBI != MBB.end())
1193 DL = MBBI->getDebugLoc();
1194
1195 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_EPILOGUE))
1197 };
1198
1199 // This should be in sync with PEIImpl::calculateSaveRestoreBlocks.
1200 EmitSignRA(MF.front());
1201 for (MachineBasicBlock &MBB : MF) {
1202 if (MBB.isEHFuncletEntry())
1203 EmitSignRA(MBB);
1204 if (MBB.isReturnBlock())
1205 EmitAuthRA(MBB);
1206 }
1207}
1208
1210 MachineBasicBlock &MBB) const {
1211 AArch64PrologueEmitter PrologueEmitter(MF, MBB, *this);
1212 PrologueEmitter.emitPrologue();
1213}
1214
1216 MachineBasicBlock &MBB) const {
1217 AArch64EpilogueEmitter EpilogueEmitter(MF, MBB, *this);
1218 EpilogueEmitter.emitEpilogue();
1219}
1220
1223 MF.getInfo<AArch64FunctionInfo>()->needsDwarfUnwindInfo(MF);
1224}
1225
1227 return enableCFIFixup(MF) &&
1228 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
1229}
1230
1231/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
1232/// debug info. It's the same as what we use for resolving the code-gen
1233/// references for now. FIXME: This can go wrong when references are
1234/// SP-relative and simple call frames aren't used.
1237 Register &FrameReg) const {
1239 MF, FI, FrameReg,
1240 /*PreferFP=*/
1241 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
1242 MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
1243 /*ForSimm=*/false);
1244}
1245
1248 int FI) const {
1249 // This function serves to provide a comparable offset from a single reference
1250 // point (the value of SP at function entry) that can be used for analysis,
1251 // e.g. the stack-frame-layout analysis pass. It is not guaranteed to be
1252 // correct for all objects in the presence of VLA-area objects or dynamic
1253 // stack re-alignment.
1254
1255 const auto &MFI = MF.getFrameInfo();
1256
1257 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1258 StackOffset ZPRStackSize = getZPRStackSize(MF);
1259 StackOffset PPRStackSize = getPPRStackSize(MF);
1260 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1261
1262 // For VLA-area objects, just emit an offset at the end of the stack frame.
1263 // Whilst not quite correct, these objects do live at the end of the frame, so
1264 // it is more useful for analysis if the offset reflects this.
1265 if (MFI.isVariableSizedObjectIndex(FI)) {
1266 return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
1267 }
1268
1269 // This is correct in the absence of any SVE stack objects.
1270 if (!SVEStackSize)
1271 return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
1272
1273 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1274 bool FPAfterSVECalleeSaves =
1276 if (MFI.hasScalableStackID(FI)) {
1277 if (FPAfterSVECalleeSaves &&
1278 -ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1279 assert(!AFI->hasSplitSVEObjects() &&
1280 "split-sve-objects not supported with FPAfterSVECalleeSaves");
1281 return StackOffset::getScalable(ObjectOffset);
1282 }
1283 StackOffset AccessOffset{};
1284 // The scalable vectors are below (lower address) the scalable predicates
1285 // with split SVE objects, so we must subtract the size of the predicates.
1286 if (AFI->hasSplitSVEObjects() &&
1287 MFI.getStackID(FI) == TargetStackID::ScalableVector)
1288 AccessOffset = -PPRStackSize;
1289 return AccessOffset +
1290 StackOffset::get(-((int64_t)AFI->getCalleeSavedStackSize()),
1291 ObjectOffset);
1292 }
1293
1294 bool IsFixed = MFI.isFixedObjectIndex(FI);
1295 bool IsCSR =
1296 !IsFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1297
1298 StackOffset ScalableOffset = {};
1299 if (!IsFixed && !IsCSR) {
1300 ScalableOffset = -SVEStackSize;
1301 } else if (FPAfterSVECalleeSaves && IsCSR) {
1302 ScalableOffset =
1304 }
1305
1306 return StackOffset::getFixed(ObjectOffset) + ScalableOffset;
1307}
1308
1314
1315StackOffset AArch64FrameLowering::getFPOffset(const MachineFunction &MF,
1316 int64_t ObjectOffset) const {
1317 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1318 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1319 const Function &F = MF.getFunction();
1320 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
1321 unsigned FixedObject =
1322 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
1323 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
1324 int64_t FPAdjust =
1325 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
1326 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
1327}
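
// A worked example of the computation above (hypothetical sizes): with no
// Win64 fixed objects, a 64-byte callee-save area and a
// CalleeSaveBaseToFrameRecordOffset of 16, an object at ObjectOffset -8
// resolves to -8 + 0 + (64 - 16) = 40 bytes above the frame pointer.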
1328
1329StackOffset AArch64FrameLowering::getStackOffset(const MachineFunction &MF,
1330 int64_t ObjectOffset) const {
1331 const auto &MFI = MF.getFrameInfo();
1332 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
1333}
1334
1335// TODO: This function currently does not work for scalable vectors.
1337 int FI) const {
1338 const AArch64RegisterInfo *RegInfo =
1339 MF.getSubtarget<AArch64Subtarget>().getRegisterInfo();
1340 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
1341 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
1342 ? getFPOffset(MF, ObjectOffset).getFixed()
1343 : getStackOffset(MF, ObjectOffset).getFixed();
1344}
1345
1347 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
1348 bool ForSimm) const {
1349 const auto &MFI = MF.getFrameInfo();
1350 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1351 bool isFixed = MFI.isFixedObjectIndex(FI);
1352 auto StackID = static_cast<TargetStackID::Value>(MFI.getStackID(FI));
1353 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, StackID,
1354 FrameReg, PreferFP, ForSimm);
1355}
1356
1358 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed,
1359 TargetStackID::Value StackID, Register &FrameReg, bool PreferFP,
1360 bool ForSimm) const {
1361 const auto &MFI = MF.getFrameInfo();
1362 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1363 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1364 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1365
1366 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
1367 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
1368 bool isCSR =
1369 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1370 bool isSVE = MFI.isScalableStackID(StackID);
1371
1372 StackOffset ZPRStackSize = getZPRStackSize(MF);
1373 StackOffset PPRStackSize = getPPRStackSize(MF);
1374 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1375
1376 // Use frame pointer to reference fixed objects. Use it for locals if
1377 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
1378 // reliable as a base). Make sure useFPForScavengingIndex() does the
1379 // right thing for the emergency spill slot.
1380 bool UseFP = false;
1381 if (AFI->hasStackFrame() && !isSVE) {
1382 // We shouldn't prefer using the FP to access fixed-sized stack objects when
1383 // there are scalable (SVE) objects in between the FP and the fixed-sized
1384 // objects.
1385 PreferFP &= !SVEStackSize;
1386
1387 // Note: Keeping the following as multiple 'if' statements rather than
1388 // merging to a single expression for readability.
1389 //
1390 // Argument access should always use the FP.
1391 if (isFixed) {
1392 UseFP = hasFP(MF);
1393 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
1394 // References to the CSR area must use FP if we're re-aligning the stack
1395 // since the dynamically-sized alignment padding is between the SP/BP and
1396 // the CSR area.
1397 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
1398 UseFP = true;
1399 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
1400 // If the FPOffset is negative and we're producing a signed immediate, we
1401 // have to keep in mind that the available offset range for negative
1402 // offsets is smaller than for positive ones. If an offset is available
1403 // via the FP and the SP, use whichever is closest.
1404 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
1405 PreferFP |= Offset > -FPOffset && !SVEStackSize;
1406
1407 if (FPOffset >= 0) {
1408 // If the FPOffset is positive, that'll always be best, as the SP/BP
1409 // will be even further away.
1410 UseFP = true;
1411 } else if (MFI.hasVarSizedObjects()) {
1412 // If we have variable sized objects, we can use either FP or BP, as the
1413 // SP offset is unknown. We can use the base pointer if we have one and
1414 // FP is not preferred. If not, we're stuck with using FP.
1415 bool CanUseBP = RegInfo->hasBasePointer(MF);
1416 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
1417 UseFP = PreferFP;
1418 else if (!CanUseBP) // Can't use BP. Forced to use FP.
1419 UseFP = true;
1420 // else we can use BP and FP, but the offset from FP won't fit.
1421 // That will make us scavenge registers which we can probably avoid by
1422 // using BP. If it won't fit for BP either, we'll scavenge anyway.
1423 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
1424 // Funclets access the locals contained in the parent's stack frame
1425 // via the frame pointer, so we have to use the FP in the parent
1426 // function.
1427 (void) Subtarget;
1428 assert(Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
1429 MF.getFunction().isVarArg()) &&
1430 "Funclets should only be present on Win64");
1431 UseFP = true;
1432 } else {
1433 // We have the choice between FP and (SP or BP).
1434 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
1435 UseFP = true;
1436 }
1437 }
1438 }
1439
1440 assert(
1441 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
1442 "In the presence of dynamic stack pointer realignment, "
1443 "non-argument/CSR objects cannot be accessed through the frame pointer");
1444
1445 bool FPAfterSVECalleeSaves =
1447
1448 if (isSVE) {
1449 StackOffset FPOffset = StackOffset::get(
1450 -AFI->getCalleeSaveBaseToFrameRecordOffset(), ObjectOffset);
1451 StackOffset SPOffset =
1452 SVEStackSize +
1453 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
1454 ObjectOffset);
1455
1456 // With split SVE objects the ObjectOffset is relative to the split area
1457 // (i.e. the PPR area or ZPR area respectively).
1458 if (AFI->hasSplitSVEObjects() && StackID == TargetStackID::ScalableVector) {
1459 // If we're accessing an SVE vector with split SVE objects...
1460 // - From the FP we need to move down past the PPR area:
1461 FPOffset -= PPRStackSize;
1462 // - From the SP we only need to move up to the ZPR area:
1463 SPOffset -= PPRStackSize;
1464 // Note: `SPOffset = SVEStackSize + ...`, so `-= PPRStackSize` results in
1465 // `SPOffset = ZPRStackSize + ...`.
1466 }
1467
1468 if (FPAfterSVECalleeSaves) {
1470 if (-ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1473 }
1474 }
1475
1476 // Always use the FP for SVE spills if available and beneficial.
1477 if (hasFP(MF) && (SPOffset.getFixed() ||
1478 FPOffset.getScalable() < SPOffset.getScalable() ||
1479 RegInfo->hasStackRealignment(MF))) {
1480 FrameReg = RegInfo->getFrameRegister(MF);
1481 return FPOffset;
1482 }
1483 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
1484 : MCRegister(AArch64::SP);
1485
1486 return SPOffset;
1487 }
1488
1489 StackOffset SVEAreaOffset = {};
1490 if (FPAfterSVECalleeSaves) {
1491 // In this stack layout, the FP is in between the callee saves and other
1492 // SVE allocations.
1493 StackOffset SVECalleeSavedStack =
1495 if (UseFP) {
1496 if (isFixed)
1497 SVEAreaOffset = SVECalleeSavedStack;
1498 else if (!isCSR)
1499 SVEAreaOffset = SVECalleeSavedStack - SVEStackSize;
1500 } else {
1501 if (isFixed)
1502 SVEAreaOffset = SVEStackSize;
1503 else if (isCSR)
1504 SVEAreaOffset = SVEStackSize - SVECalleeSavedStack;
1505 }
1506 } else {
1507 if (UseFP && !(isFixed || isCSR))
1508 SVEAreaOffset = -SVEStackSize;
1509 if (!UseFP && (isFixed || isCSR))
1510 SVEAreaOffset = SVEStackSize;
1511 }
1512
1513 if (UseFP) {
1514 FrameReg = RegInfo->getFrameRegister(MF);
1515 return StackOffset::getFixed(FPOffset) + SVEAreaOffset;
1516 }
1517
1518 // Use the base pointer if we have one.
1519 if (RegInfo->hasBasePointer(MF))
1520 FrameReg = RegInfo->getBaseRegister();
1521 else {
1522 assert(!MFI.hasVarSizedObjects() &&
1523 "Can't use SP when we have var sized objects.");
1524 FrameReg = AArch64::SP;
1525 // If we're using the red zone for this function, the SP won't actually
1526 // be adjusted, so the offsets will be negative. They're also all
1527 // within range of the signed 9-bit immediate instructions.
1528 if (canUseRedZone(MF))
1529 Offset -= AFI->getLocalStackSize();
1530 }
1531
1532 return StackOffset::getFixed(Offset) + SVEAreaOffset;
1533}
1534
1535static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
1536 // Do not set a kill flag on values that are also marked as live-in. This
1537 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
1538 // callee saved registers.
1539 // Omitting the kill flags is conservatively correct even if the live-in
1540 // is not used after all.
1541 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
1542 return getKillRegState(!IsLiveIn);
1543}
1544
1546 MachineFunction &MF) {
1547 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1548 AttributeList Attrs = MF.getFunction().getAttributes();
1550 return Subtarget.isTargetMachO() &&
1551 !(Subtarget.getTargetLowering()->supportSwiftError() &&
1552 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
1554 !AFL.requiresSaveVG(MF) && !AFI->isSVECC();
1555}
1556
1557static bool invalidateWindowsRegisterPairing(bool SpillExtendedVolatile,
1558 unsigned SpillCount, unsigned Reg1,
1559 unsigned Reg2, bool NeedsWinCFI,
1560 bool IsFirst,
1561 const TargetRegisterInfo *TRI) {
1562 // If we are generating register pairs for a Windows function that requires
1563 // EH support, then pair consecutive registers only. There are no unwind
1564 // opcodes for saves/restores of non-consecutive register pairs.
1565 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
1566 // save_lrpair.
1567 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
1568
1569 if (Reg2 == AArch64::FP)
1570 return true;
1571 if (!NeedsWinCFI)
1572 return false;
1573
1574 // ARM64EC introduced `save_any_regp`, which expects 16-byte alignment.
1575 // This is handled by only allowing paired spills for registers spilled at
1576 // even positions (which should be 16-byte aligned, as other GPRs/FPRs are
1577 // 8-bytes). We carve out an exception for {FP,LR}, which does not require
1578 // 16-byte alignment in the uop representation.
1579 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
1580 return SpillExtendedVolatile
1581 ? !((Reg1 == AArch64::FP && Reg2 == AArch64::LR) ||
1582 (SpillCount % 2) == 0)
1583 : false;
1584
1585 // If pairing a GPR with LR, the pair can be described by the save_lrpair
1586 // opcode. If this is the first register pair, it would end up with a
1587 // predecrement, but there's no save_lrpair_x opcode, so we can only do this
1588 // if LR is paired with something else than the first register.
1589 // The save_lrpair opcode requires the first register to be an odd one.
1590 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
1591 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
1592 return false;
1593 return true;
1594}
1595
1596/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
1597/// WindowsCFI requires that only consecutive registers can be paired.
1598/// LR and FP need to be allocated together when the frame needs to save
1599/// the frame-record. This means any other register pairing with LR is invalid.
1600static bool invalidateRegisterPairing(bool SpillExtendedVolatile,
1601 unsigned SpillCount, unsigned Reg1,
1602 unsigned Reg2, bool UsesWinAAPCS,
1603 bool NeedsWinCFI, bool NeedsFrameRecord,
1604 bool IsFirst,
1605 const TargetRegisterInfo *TRI) {
1606 if (UsesWinAAPCS)
1607 return invalidateWindowsRegisterPairing(SpillExtendedVolatile, SpillCount,
1608 Reg1, Reg2, NeedsWinCFI, IsFirst,
1609 TRI);
1610
1611 // If we need to store the frame record, don't pair any register
1612 // with LR other than FP.
1613 if (NeedsFrameRecord)
1614 return Reg2 == AArch64::LR;
1615
1616 return false;
1617}
1618
1619namespace {
1620
1621struct RegPairInfo {
1622 Register Reg1;
1623 Register Reg2;
1624 int FrameIdx;
1625 int Offset;
1626 enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
1627 const TargetRegisterClass *RC;
1628
1629 RegPairInfo() = default;
1630
1631 bool isPaired() const { return Reg2.isValid(); }
1632
1633 bool isScalable() const { return Type == PPR || Type == ZPR; }
1634};
1635
1636} // end anonymous namespace
1637
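// Returns the predicate-as-counter register (PN8-PN15) corresponding to the
// first callee-saved predicate in P8-P15, or an invalid MCRegister if none of
// them is saved.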
1638static MCRegister findFreePredicateReg(BitVector &SavedRegs) {
1639  for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
1640 if (SavedRegs.test(PReg)) {
1641 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
1642 return MCRegister(PNReg);
1643 }
1644 }
1645 return MCRegister();
1646}
1647
1648// Multi-vector LD/ST instructions are only available on SME or SVE2p1 targets.
1649static bool enableMultiVectorSpillFill(const AArch64Subtarget &Subtarget,
1650                                       MachineFunction &MF) {
1652 return false;
1653
1654 SMEAttrs FuncAttrs = MF.getInfo<AArch64FunctionInfo>()->getSMEFnAttrs();
1655 bool IsLocallyStreaming =
1656 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
1657
1658  // SME2 instructions can only be used safely while in streaming mode.
1659  // It is not safe to use them in streaming-compatible or locally streaming
1660  // functions.
1661 return Subtarget.hasSVE2p1() ||
1662 (Subtarget.hasSME2() &&
1663 (!IsLocallyStreaming && Subtarget.isStreaming()));
1664}
1665
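// Builds the list of callee-save register pairs (RegPairInfo) and assigns
// each save its byte offset within the callee-save area(s), accounting for
// WinCFI ordering, stack hazard padding, the Swift async context slot and
// scalable (ZPR/PPR) saves.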
1666static void computeCalleeSaveRegisterPairs(const AArch64FrameLowering &AFL,
1667                                           MachineFunction &MF,
1668                                           ArrayRef<CalleeSavedInfo> CSI,
1669                                           const TargetRegisterInfo *TRI,
1670                                           SmallVectorImpl<RegPairInfo> &RegPairs,
1671                                           bool NeedsFrameRecord) {
1672
1673 if (CSI.empty())
1674 return;
1675
1676 bool IsWindows = isTargetWindows(MF);
1677 bool NeedsWinCFI = AFL.needsWinCFI(MF);
1679 unsigned StackHazardSize = getStackHazardSize(MF);
1680 MachineFrameInfo &MFI = MF.getFrameInfo();
1681  CallingConv::ID CC = MF.getFunction().getCallingConv();
1682  unsigned Count = CSI.size();
1683 (void)CC;
1684 // MachO's compact unwind format relies on all registers being stored in
1685 // pairs.
1686 assert((!produceCompactUnwindFrame(AFL, MF) ||
1689 (Count & 1) == 0) &&
1690 "Odd number of callee-saved regs to spill!");
1691 int ByteOffset = AFI->getCalleeSavedStackSize();
1692 int StackFillDir = -1;
1693 int RegInc = 1;
1694 unsigned FirstReg = 0;
1695 if (NeedsWinCFI) {
1696 // For WinCFI, fill the stack from the bottom up.
1697 ByteOffset = 0;
1698 StackFillDir = 1;
1699 // As the CSI array is reversed to match PrologEpilogInserter, iterate
1700 // backwards, to pair up registers starting from lower numbered registers.
1701 RegInc = -1;
1702 FirstReg = Count - 1;
1703 }
1704
1705 bool FPAfterSVECalleeSaves = IsWindows && AFI->getSVECalleeSavedStackSize();
1706  // Windows AAPCS has x9-x15 as volatile registers, x16-x17 as intra-procedure-
1707  // call scratch registers, and x18 as platform reserved. However, clang has
1708  // extended calling conventions such as preserve_most and preserve_all which
1709  // treat these as CSRs. The standard ARM64 unwind uOPs bias registers by 19,
1710  // so we must use the ARM64EC uOPs, which have separate restrictions.
1711 //
1712 // NOTE: we currently do not account for the D registers as LLVM does not
1713 // support non-ABI compliant D register spills.
1714 bool SpillExtendedVolatile =
1715 IsWindows && llvm::any_of(CSI, [](const CalleeSavedInfo &CSI) {
1716 const auto &Reg = CSI.getReg();
1717 return Reg >= AArch64::X0 && Reg <= AArch64::X18;
1718 });
1719
1720 int ZPRByteOffset = 0;
1721 int PPRByteOffset = 0;
1722 bool SplitPPRs = AFI->hasSplitSVEObjects();
1723 if (SplitPPRs) {
1724 ZPRByteOffset = AFI->getZPRCalleeSavedStackSize();
1725 PPRByteOffset = AFI->getPPRCalleeSavedStackSize();
1726 } else if (!FPAfterSVECalleeSaves) {
1727    ZPRByteOffset =
1728        AFI->getZPRCalleeSavedStackSize() + AFI->getPPRCalleeSavedStackSize();
1729 // Unused: Everything goes in ZPR space.
1730 PPRByteOffset = 0;
1731 }
1732
1733 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
1734 Register LastReg = 0;
1735 bool HasCSHazardPadding = AFI->hasStackHazardSlotIndex() && !SplitPPRs;
1736
1737 // When iterating backwards, the loop condition relies on unsigned wraparound.
1738 for (unsigned i = FirstReg; i < Count; i += RegInc) {
1739 RegPairInfo RPI;
1740 RPI.Reg1 = CSI[i].getReg();
1741
1742 if (AArch64::GPR64RegClass.contains(RPI.Reg1)) {
1743 RPI.Type = RegPairInfo::GPR;
1744 RPI.RC = &AArch64::GPR64RegClass;
1745 } else if (AArch64::FPR64RegClass.contains(RPI.Reg1)) {
1746 RPI.Type = RegPairInfo::FPR64;
1747 RPI.RC = &AArch64::FPR64RegClass;
1748 } else if (AArch64::FPR128RegClass.contains(RPI.Reg1)) {
1749 RPI.Type = RegPairInfo::FPR128;
1750 RPI.RC = &AArch64::FPR128RegClass;
1751 } else if (AArch64::ZPRRegClass.contains(RPI.Reg1)) {
1752 RPI.Type = RegPairInfo::ZPR;
1753 RPI.RC = &AArch64::ZPRRegClass;
1754 } else if (AArch64::PPRRegClass.contains(RPI.Reg1)) {
1755 RPI.Type = RegPairInfo::PPR;
1756 RPI.RC = &AArch64::PPRRegClass;
1757 } else if (RPI.Reg1 == AArch64::VG) {
1758 RPI.Type = RegPairInfo::VG;
1759 RPI.RC = &AArch64::FIXED_REGSRegClass;
1760 } else {
1761 llvm_unreachable("Unsupported register class.");
1762 }
1763
1764 int &ScalableByteOffset = RPI.Type == RegPairInfo::PPR && SplitPPRs
1765 ? PPRByteOffset
1766 : ZPRByteOffset;
1767
1768 // Add the stack hazard size as we transition from GPR->FPR CSRs.
1769 if (HasCSHazardPadding &&
1770 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
1771        AArch64InstrInfo::isFpOrNEON(RPI.Reg1))
1772      ByteOffset += StackFillDir * StackHazardSize;
1773 LastReg = RPI.Reg1;
1774
1775 int Scale = TRI->getSpillSize(*RPI.RC);
1776 // Add the next reg to the pair if it is in the same register class.
1777 if (unsigned(i + RegInc) < Count && !HasCSHazardPadding) {
1778 MCRegister NextReg = CSI[i + RegInc].getReg();
1779 bool IsFirst = i == FirstReg;
1780 unsigned SpillCount = NeedsWinCFI ? FirstReg - i : i;
1781 switch (RPI.Type) {
1782 case RegPairInfo::GPR:
1783 if (AArch64::GPR64RegClass.contains(NextReg) &&
1784          !invalidateRegisterPairing(
1785              SpillExtendedVolatile, SpillCount, RPI.Reg1, NextReg, IsWindows,
1786 NeedsWinCFI, NeedsFrameRecord, IsFirst, TRI))
1787 RPI.Reg2 = NextReg;
1788 break;
1789 case RegPairInfo::FPR64:
1790 if (AArch64::FPR64RegClass.contains(NextReg) &&
1791 !invalidateWindowsRegisterPairing(SpillExtendedVolatile, SpillCount,
1792 RPI.Reg1, NextReg, NeedsWinCFI,
1793 IsFirst, TRI))
1794 RPI.Reg2 = NextReg;
1795 break;
1796 case RegPairInfo::FPR128:
1797 if (AArch64::FPR128RegClass.contains(NextReg))
1798 RPI.Reg2 = NextReg;
1799 break;
1800 case RegPairInfo::PPR:
1801 break;
1802 case RegPairInfo::ZPR:
1803 if (AFI->getPredicateRegForFillSpill() != 0 &&
1804 ((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1)) {
1805 // Calculate offset of register pair to see if pair instruction can be
1806 // used.
1807 int Offset = (ScalableByteOffset + StackFillDir * 2 * Scale) / Scale;
1808 if ((-16 <= Offset && Offset <= 14) && (Offset % 2 == 0))
1809 RPI.Reg2 = NextReg;
1810 }
1811 break;
1812 case RegPairInfo::VG:
1813 break;
1814 }
1815 }
1816
1817 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
1818 // list to come in sorted by frame index so that we can issue the store
1819 // pair instructions directly. Assert if we see anything otherwise.
1820 //
1821 // The order of the registers in the list is controlled by
1822 // getCalleeSavedRegs(), so they will always be in-order, as well.
1823 assert((!RPI.isPaired() ||
1824 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
1825 "Out of order callee saved regs!");
1826
1827 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
1828 RPI.Reg1 == AArch64::LR) &&
1829 "FrameRecord must be allocated together with LR");
1830
1831 // Windows AAPCS has FP and LR reversed.
1832 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
1833 RPI.Reg2 == AArch64::LR) &&
1834 "FrameRecord must be allocated together with LR");
1835
1836 // MachO's compact unwind format relies on all registers being stored in
1837 // adjacent register pairs.
1838 assert((!produceCompactUnwindFrame(AFL, MF) ||
1841 (RPI.isPaired() &&
1842 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
1843 RPI.Reg1 + 1 == RPI.Reg2))) &&
1844 "Callee-save registers not saved as adjacent register pair!");
1845
1846 RPI.FrameIdx = CSI[i].getFrameIdx();
1847 if (NeedsWinCFI &&
1848 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
1849 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
1850
1851 // Realign the scalable offset if necessary. This is relevant when
1852 // spilling predicates on Windows.
1853 if (RPI.isScalable() && ScalableByteOffset % Scale != 0) {
1854 ScalableByteOffset = alignTo(ScalableByteOffset, Scale);
1855 }
1856
1857 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1858 assert(OffsetPre % Scale == 0);
1859
1860 if (RPI.isScalable())
1861 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1862 else
1863 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1864
1865 // Swift's async context is directly before FP, so allocate an extra
1866 // 8 bytes for it.
1867 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1868 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1869 (IsWindows && RPI.Reg2 == AArch64::LR)))
1870 ByteOffset += StackFillDir * 8;
1871
1872 // Round up size of non-pair to pair size if we need to pad the
1873 // callee-save area to ensure 16-byte alignment.
1874 if (NeedGapToAlignStack && !NeedsWinCFI && !RPI.isScalable() &&
1875 RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
1876 ByteOffset % 16 != 0) {
1877 ByteOffset += 8 * StackFillDir;
1878 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
1879 // A stack frame with a gap looks like this, bottom up:
1880 // d9, d8. x21, gap, x20, x19.
1881 // Set extra alignment on the x21 object to create the gap above it.
1882 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
1883 NeedGapToAlignStack = false;
1884 }
1885
1886 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1887 assert(OffsetPost % Scale == 0);
1888 // If filling top down (default), we want the offset after incrementing it.
1889 // If filling bottom up (WinCFI) we need the original offset.
1890 int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
1891
1892 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
1893 // Swift context can directly precede FP.
1894 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1895 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1896 (IsWindows && RPI.Reg2 == AArch64::LR)))
1897 Offset += 8;
1898 RPI.Offset = Offset / Scale;
1899
1900 assert((!RPI.isPaired() ||
1901 (!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
1902 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
1903 "Offset out of bounds for LDP/STP immediate");
1904
1905 auto isFrameRecord = [&] {
1906 if (RPI.isPaired())
1907 return IsWindows ? RPI.Reg1 == AArch64::FP && RPI.Reg2 == AArch64::LR
1908 : RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP;
1909 // Otherwise, look for the frame record as two unpaired registers. This is
1910 // needed for -aarch64-stack-hazard-size=<val>, which disables register
1911 // pairing (as the padding may be too large for the LDP/STP offset). Note:
1912 // On Windows, this check works out as current reg == FP, next reg == LR,
1913 // and on other platforms current reg == FP, previous reg == LR. This
1914 // works out as the correct pre-increment or post-increment offsets
1915 // respectively.
1916 return i > 0 && RPI.Reg1 == AArch64::FP &&
1917 CSI[i - 1].getReg() == AArch64::LR;
1918 };
1919
1920 // Save the offset to frame record so that the FP register can point to the
1921 // innermost frame record (spilled FP and LR registers).
1922    if (NeedsFrameRecord && isFrameRecord())
1923      AFI->setCalleeSaveBaseToFrameRecordOffset(Offset);
1924
1925 RegPairs.push_back(RPI);
1926 if (RPI.isPaired())
1927 i += RegInc;
1928 }
1929 if (NeedsWinCFI) {
1930 // If we need an alignment gap in the stack, align the topmost stack
1931 // object. A stack frame with a gap looks like this, bottom up:
1932 // x19, d8. d9, gap.
1933 // Set extra alignment on the topmost stack object (the first element in
1934 // CSI, which goes top down), to create the gap above it.
1935 if (AFI->hasCalleeSaveStackFreeSpace())
1936 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
1937 // We iterated bottom up over the registers; flip RegPairs back to top
1938 // down order.
1939 std::reverse(RegPairs.begin(), RegPairs.end());
1940 }
1941}
1942
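// Emits the callee-save stores in the prologue; the first store may later be
// turned into a pre-decrement store by emitPrologue (see the spill sequence
// comment below).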
1943bool AArch64FrameLowering::spillCalleeSavedRegisters(
1944    MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
1945    ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
1946  MachineFunction &MF = *MBB.getParent();
1947 auto &TLI = *MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
1949 bool NeedsWinCFI = needsWinCFI(MF);
1950 DebugLoc DL;
1952
1953 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
1954
1956 // Refresh the reserved regs in case there are any potential changes since the
1957 // last freeze.
1958 MRI.freezeReservedRegs();
1959
1960 if (homogeneousPrologEpilog(MF)) {
1961 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
1963
1964 for (auto &RPI : RegPairs) {
1965 MIB.addReg(RPI.Reg1);
1966 MIB.addReg(RPI.Reg2);
1967
1968 // Update register live in.
1969 if (!MRI.isReserved(RPI.Reg1))
1970 MBB.addLiveIn(RPI.Reg1);
1971 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
1972 MBB.addLiveIn(RPI.Reg2);
1973 }
1974 return true;
1975 }
1976 bool PTrueCreated = false;
1977 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
1978 Register Reg1 = RPI.Reg1;
1979 Register Reg2 = RPI.Reg2;
1980 unsigned StrOpc;
1981
1982 // Issue sequence of spills for cs regs. The first spill may be converted
1983 // to a pre-decrement store later by emitPrologue if the callee-save stack
1984 // area allocation can't be combined with the local stack area allocation.
1985 // For example:
1986 // stp x22, x21, [sp, #0] // addImm(+0)
1987 // stp x20, x19, [sp, #16] // addImm(+2)
1988 // stp fp, lr, [sp, #32] // addImm(+4)
1989 // Rationale: This sequence saves uop updates compared to a sequence of
1990 // pre-increment spills like stp xi,xj,[sp,#-16]!
1991 // Note: Similar rationale and sequence for restores in epilog.
1992 unsigned Size = TRI->getSpillSize(*RPI.RC);
1993 Align Alignment = TRI->getSpillAlign(*RPI.RC);
1994 switch (RPI.Type) {
1995 case RegPairInfo::GPR:
1996 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
1997 break;
1998 case RegPairInfo::FPR64:
1999 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
2000 break;
2001 case RegPairInfo::FPR128:
2002 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
2003 break;
2004 case RegPairInfo::ZPR:
2005 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
2006 break;
2007 case RegPairInfo::PPR:
2008 StrOpc = AArch64::STR_PXI;
2009 break;
2010 case RegPairInfo::VG:
2011 StrOpc = AArch64::STRXui;
2012 break;
2013 }
2014
2015 Register X0Scratch;
2016 auto RestoreX0 = make_scope_exit([&] {
2017 if (X0Scratch != AArch64::NoRegister)
2018 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), AArch64::X0)
2019 .addReg(X0Scratch)
2021 });
2022
2023 if (Reg1 == AArch64::VG) {
2024      // Find an available register to store the value of VG to.
2025 Reg1 = findScratchNonCalleeSaveRegister(&MBB, true);
2026 assert(Reg1 != AArch64::NoRegister);
2027 if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
2028 BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
2029 .addImm(31)
2030 .addImm(1)
2032 } else {
2034 if (any_of(MBB.liveins(),
2035 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
2036 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
2037 AArch64::X0, LiveIn.PhysReg);
2038 })) {
2039 X0Scratch = Reg1;
2040 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), X0Scratch)
2041 .addReg(AArch64::X0)
2043 }
2044
2045 RTLIB::Libcall LC = RTLIB::SMEABI_GET_CURRENT_VG;
2046 const uint32_t *RegMask =
2047 TRI->getCallPreservedMask(MF, TLI.getLibcallCallingConv(LC));
2048 BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
2049 .addExternalSymbol(TLI.getLibcallName(LC))
2050 .addRegMask(RegMask)
2051 .addReg(AArch64::X0, RegState::ImplicitDefine)
2053 Reg1 = AArch64::X0;
2054 }
2055 }
2056
2057 LLVM_DEBUG({
2058 dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
2059 if (RPI.isPaired())
2060 dbgs() << ", " << printReg(Reg2, TRI);
2061 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2062 if (RPI.isPaired())
2063 dbgs() << ", " << RPI.FrameIdx + 1;
2064 dbgs() << ")\n";
2065 });
2066
2067 assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
2068           "Windows unwinding requires a consecutive (FP,LR) pair");
2069 // Windows unwind codes require consecutive registers if registers are
2070 // paired. Make the switch here, so that the code below will save (x,x+1)
2071 // and not (x+1,x).
2072 unsigned FrameIdxReg1 = RPI.FrameIdx;
2073 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2074 if (NeedsWinCFI && RPI.isPaired()) {
2075 std::swap(Reg1, Reg2);
2076 std::swap(FrameIdxReg1, FrameIdxReg2);
2077 }
2078
2079 if (RPI.isPaired() && RPI.isScalable()) {
2080 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2083 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2084 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2085 "Expects SVE2.1 or SME2 target and a predicate register");
2086#ifdef EXPENSIVE_CHECKS
2087 auto IsPPR = [](const RegPairInfo &c) {
2088 return c.Reg1 == RegPairInfo::PPR;
2089 };
2090 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
2091 auto IsZPR = [](const RegPairInfo &c) {
2092 return c.Type == RegPairInfo::ZPR;
2093 };
2094 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
2095 assert(!(PPRBegin < ZPRBegin) &&
2096 "Expected callee save predicate to be handled first");
2097#endif
2098 if (!PTrueCreated) {
2099 PTrueCreated = true;
2100 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2102 }
2103 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2104 if (!MRI.isReserved(Reg1))
2105 MBB.addLiveIn(Reg1);
2106 if (!MRI.isReserved(Reg2))
2107 MBB.addLiveIn(Reg2);
2108 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
2110 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2111 MachineMemOperand::MOStore, Size, Alignment));
2112 MIB.addReg(PnReg);
2113 MIB.addReg(AArch64::SP)
2114 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale],
2115 // where 2*vscale is implicit
2118 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2119 MachineMemOperand::MOStore, Size, Alignment));
2120 if (NeedsWinCFI)
2121 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2122    } else { // The case when a pair of ZPRs is not present
2123 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2124 if (!MRI.isReserved(Reg1))
2125 MBB.addLiveIn(Reg1);
2126 if (RPI.isPaired()) {
2127 if (!MRI.isReserved(Reg2))
2128 MBB.addLiveIn(Reg2);
2129 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
2131 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2132 MachineMemOperand::MOStore, Size, Alignment));
2133 }
2134 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
2135 .addReg(AArch64::SP)
2136 .addImm(RPI.Offset) // [sp, #offset*vscale],
2137 // where factor*vscale is implicit
2140 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2141 MachineMemOperand::MOStore, Size, Alignment));
2142 if (NeedsWinCFI)
2143 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2144 }
2145 // Update the StackIDs of the SVE stack slots.
2146 MachineFrameInfo &MFI = MF.getFrameInfo();
2147 if (RPI.Type == RegPairInfo::ZPR) {
2148 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
2149 if (RPI.isPaired())
2150 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
2151 } else if (RPI.Type == RegPairInfo::PPR) {
2153 if (RPI.isPaired())
2155 }
2156 }
2157 return true;
2158}
2159
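// Emits the callee-save reloads in the epilogue, mirroring
// spillCalleeSavedRegisters; the last reload may later be turned into a
// post-increment load by emitEpilogue.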
2160bool AArch64FrameLowering::restoreCalleeSavedRegisters(
2161    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
2162    MutableArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
2163  MachineFunction &MF = *MBB.getParent();
2165 DebugLoc DL;
2167 bool NeedsWinCFI = needsWinCFI(MF);
2168
2169 if (MBBI != MBB.end())
2170 DL = MBBI->getDebugLoc();
2171
2172 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
2173 if (homogeneousPrologEpilog(MF, &MBB)) {
2174 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
2176 for (auto &RPI : RegPairs) {
2177 MIB.addReg(RPI.Reg1, RegState::Define);
2178 MIB.addReg(RPI.Reg2, RegState::Define);
2179 }
2180 return true;
2181 }
2182
2183  // For performance reasons, restore SVE registers in increasing order.
2184 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
2185 auto PPRBegin = llvm::find_if(RegPairs, IsPPR);
2186 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
2187 std::reverse(PPRBegin, PPREnd);
2188 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
2189 auto ZPRBegin = llvm::find_if(RegPairs, IsZPR);
2190 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
2191 std::reverse(ZPRBegin, ZPREnd);
2192
2193 bool PTrueCreated = false;
2194 for (const RegPairInfo &RPI : RegPairs) {
2195 Register Reg1 = RPI.Reg1;
2196 Register Reg2 = RPI.Reg2;
2197
2198 // Issue sequence of restores for cs regs. The last restore may be converted
2199 // to a post-increment load later by emitEpilogue if the callee-save stack
2200 // area allocation can't be combined with the local stack area allocation.
2201 // For example:
2202 // ldp fp, lr, [sp, #32] // addImm(+4)
2203 // ldp x20, x19, [sp, #16] // addImm(+2)
2204 // ldp x22, x21, [sp, #0] // addImm(+0)
2205 // Note: see comment in spillCalleeSavedRegisters()
2206 unsigned LdrOpc;
2207 unsigned Size = TRI->getSpillSize(*RPI.RC);
2208 Align Alignment = TRI->getSpillAlign(*RPI.RC);
2209 switch (RPI.Type) {
2210 case RegPairInfo::GPR:
2211 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
2212 break;
2213 case RegPairInfo::FPR64:
2214 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
2215 break;
2216 case RegPairInfo::FPR128:
2217 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
2218 break;
2219 case RegPairInfo::ZPR:
2220 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
2221 break;
2222 case RegPairInfo::PPR:
2223 LdrOpc = AArch64::LDR_PXI;
2224 break;
2225 case RegPairInfo::VG:
2226 continue;
2227 }
2228 LLVM_DEBUG({
2229 dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
2230 if (RPI.isPaired())
2231 dbgs() << ", " << printReg(Reg2, TRI);
2232 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2233 if (RPI.isPaired())
2234 dbgs() << ", " << RPI.FrameIdx + 1;
2235 dbgs() << ")\n";
2236 });
2237
2238    // Windows unwind codes require consecutive registers if registers are
2239    // paired. Make the switch here, so that the code below will restore
2240    // (x,x+1) and not (x+1,x).
2241 unsigned FrameIdxReg1 = RPI.FrameIdx;
2242 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2243 if (NeedsWinCFI && RPI.isPaired()) {
2244 std::swap(Reg1, Reg2);
2245 std::swap(FrameIdxReg1, FrameIdxReg2);
2246 }
2247
2249 if (RPI.isPaired() && RPI.isScalable()) {
2250 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2252 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2253 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2254 "Expects SVE2.1 or SME2 target and a predicate register");
2255#ifdef EXPENSIVE_CHECKS
2256 assert(!(PPRBegin < ZPRBegin) &&
2257 "Expected callee save predicate to be handled first");
2258#endif
2259 if (!PTrueCreated) {
2260 PTrueCreated = true;
2261 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2263 }
2264 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2265 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
2266 getDefRegState(true));
2268 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2269 MachineMemOperand::MOLoad, Size, Alignment));
2270 MIB.addReg(PnReg);
2271 MIB.addReg(AArch64::SP)
2272 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale]
2273 // where 2*vscale is implicit
2276 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2277 MachineMemOperand::MOLoad, Size, Alignment));
2278 if (NeedsWinCFI)
2279 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2280 } else {
2281 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2282 if (RPI.isPaired()) {
2283 MIB.addReg(Reg2, getDefRegState(true));
2285 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2286 MachineMemOperand::MOLoad, Size, Alignment));
2287 }
2288 MIB.addReg(Reg1, getDefRegState(true));
2289 MIB.addReg(AArch64::SP)
2290 .addImm(RPI.Offset) // [sp, #offset*vscale]
2291 // where factor*vscale is implicit
2294 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2295 MachineMemOperand::MOLoad, Size, Alignment));
2296 if (NeedsWinCFI)
2297 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2298 }
2299 }
2300 return true;
2301}
2302
2303// Return the FrameID for an MMO.
2304static std::optional<int> getMMOFrameID(MachineMemOperand *MMO,
2305 const MachineFrameInfo &MFI) {
2306 auto *PSV =
2308 if (PSV)
2309 return std::optional<int>(PSV->getFrameIndex());
2310
2311 if (MMO->getValue()) {
2312 if (auto *Al = dyn_cast<AllocaInst>(getUnderlyingObject(MMO->getValue()))) {
2313 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd();
2314 FI++)
2315 if (MFI.getObjectAllocation(FI) == Al)
2316 return FI;
2317 }
2318 }
2319
2320 return std::nullopt;
2321}
2322
2323// Return the FrameID for a Load/Store instruction by looking at the first MMO.
2324static std::optional<int> getLdStFrameID(const MachineInstr &MI,
2325 const MachineFrameInfo &MFI) {
2326 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
2327 return std::nullopt;
2328
2329 return getMMOFrameID(*MI.memoperands_begin(), MFI);
2330}
2331
2332// Returns true if the LDST MachineInstr \p MI is a PPR access.
2333static bool isPPRAccess(const MachineInstr &MI) {
2334 return AArch64::PPRRegClass.contains(MI.getOperand(0).getReg());
2335}
2336
2337// Check if a Hazard slot is needed for the current function, and if so create
2338// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
2339// which can be used to determine if any hazard padding is needed.
2340void AArch64FrameLowering::determineStackHazardSlot(
2341 MachineFunction &MF, BitVector &SavedRegs) const {
2342 unsigned StackHazardSize = getStackHazardSize(MF);
2343 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2344 if (StackHazardSize == 0 || StackHazardSize % 16 != 0 ||
2346 return;
2347
2348 // Stack hazards are only needed in streaming functions.
2349 SMEAttrs Attrs = AFI->getSMEFnAttrs();
2350 if (!StackHazardInNonStreaming && Attrs.hasNonStreamingInterfaceAndBody())
2351 return;
2352
2353 MachineFrameInfo &MFI = MF.getFrameInfo();
2354
2355  // Add a hazard slot if there are any CSR FPR registers, or there are any
2356  // FP-only stack objects.
2357 bool HasFPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2358 return AArch64::FPR64RegClass.contains(Reg) ||
2359 AArch64::FPR128RegClass.contains(Reg) ||
2360 AArch64::ZPRRegClass.contains(Reg);
2361 });
2362 bool HasPPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2363 return AArch64::PPRRegClass.contains(Reg);
2364 });
2365 bool HasFPRStackObjects = false;
2366 bool HasPPRStackObjects = false;
2367 if (!HasFPRCSRs || SplitSVEObjects) {
2368 enum SlotType : uint8_t {
2369 Unknown = 0,
2370 ZPRorFPR = 1 << 0,
2371 PPR = 1 << 1,
2372 GPR = 1 << 2,
2374 };
2375
2376 // Find stack slots solely used for one kind of register (ZPR, PPR, etc.),
2377 // based on the kinds of accesses used in the function.
2378 SmallVector<SlotType> SlotTypes(MFI.getObjectIndexEnd(), SlotType::Unknown);
2379 for (auto &MBB : MF) {
2380 for (auto &MI : MBB) {
2381 std::optional<int> FI = getLdStFrameID(MI, MFI);
2382 if (!FI || FI < 0 || FI > int(SlotTypes.size()))
2383 continue;
2384 if (MFI.hasScalableStackID(*FI)) {
2385 SlotTypes[*FI] |=
2386 isPPRAccess(MI) ? SlotType::PPR : SlotType::ZPRorFPR;
2387 } else {
2388 SlotTypes[*FI] |= AArch64InstrInfo::isFpOrNEON(MI)
2389 ? SlotType::ZPRorFPR
2390 : SlotType::GPR;
2391 }
2392 }
2393 }
2394
2395 for (int FI = 0; FI < int(SlotTypes.size()); ++FI) {
2396 HasFPRStackObjects |= SlotTypes[FI] == SlotType::ZPRorFPR;
2397 // For SplitSVEObjects remember that this stack slot is a predicate, this
2398 // will be needed later when determining the frame layout.
2399 if (SlotTypes[FI] == SlotType::PPR) {
2401 HasPPRStackObjects = true;
2402 }
2403 }
2404 }
2405
2406 if (HasFPRCSRs || HasFPRStackObjects) {
2407 int ID = MFI.CreateStackObject(StackHazardSize, Align(16), false);
2408 LLVM_DEBUG(dbgs() << "Created Hazard slot at " << ID << " size "
2409 << StackHazardSize << "\n");
2411 }
2412
2413 if (!AFI->hasStackHazardSlotIndex())
2414 return;
2415
2416 if (SplitSVEObjects) {
2417 CallingConv::ID CC = MF.getFunction().getCallingConv();
2418 if (AFI->isSVECC() || CC == CallingConv::AArch64_SVE_VectorCall) {
2419 AFI->setSplitSVEObjects(true);
2420 LLVM_DEBUG(dbgs() << "Using SplitSVEObjects for SVE CC function\n");
2421 return;
2422 }
2423
2424 // We only use SplitSVEObjects in non-SVE CC functions if there's a
2425 // possibility of a stack hazard between PPRs and ZPRs/FPRs.
2426 LLVM_DEBUG(dbgs() << "Determining if SplitSVEObjects should be used in "
2427 "non-SVE CC function...\n");
2428
2429  // If another calling convention is explicitly set, FPRs can't be promoted
2430  // to ZPR callee-saves.
2432 LLVM_DEBUG(
2433 dbgs()
2434 << "Calling convention is not supported with SplitSVEObjects\n");
2435 return;
2436 }
2437
2438 if (!HasPPRCSRs && !HasPPRStackObjects) {
2439 LLVM_DEBUG(
2440 dbgs() << "Not using SplitSVEObjects as no PPRs are on the stack\n");
2441 return;
2442 }
2443
2444 if (!HasFPRCSRs && !HasFPRStackObjects) {
2445 LLVM_DEBUG(
2446 dbgs()
2447 << "Not using SplitSVEObjects as no FPRs or ZPRs are on the stack\n");
2448 return;
2449 }
2450
2451 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2452 MF.getSubtarget<AArch64Subtarget>();
2454 "Expected SVE to be available for PPRs");
2455
2456 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
2457  // With SplitSVEObjects the CS hazard padding is placed between the
2458  // PPRs and ZPRs. If there were any FPR CSRs there would be a hazard between
2459  // them and the GPR CSRs. Avoid this by promoting all FPR CSRs to ZPRs.
2460 BitVector FPRZRegs(SavedRegs.size());
2461 for (size_t Reg = 0, E = SavedRegs.size(); HasFPRCSRs && Reg < E; ++Reg) {
2462 BitVector::reference RegBit = SavedRegs[Reg];
2463 if (!RegBit)
2464 continue;
2465 unsigned SubRegIdx = 0;
2466 if (AArch64::FPR64RegClass.contains(Reg))
2467 SubRegIdx = AArch64::dsub;
2468 else if (AArch64::FPR128RegClass.contains(Reg))
2469 SubRegIdx = AArch64::zsub;
2470 else
2471 continue;
2472 // Clear the bit for the FPR save.
2473 RegBit = false;
2474 // Mark that we should save the corresponding ZPR.
2475 Register ZReg =
2476 TRI->getMatchingSuperReg(Reg, SubRegIdx, &AArch64::ZPRRegClass);
2477 FPRZRegs.set(ZReg);
2478 }
2479 SavedRegs |= FPRZRegs;
2480
2481 AFI->setSplitSVEObjects(true);
2482 LLVM_DEBUG(dbgs() << "SplitSVEObjects enabled!\n");
2483 }
2484}
2485
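// Decides which callee-saved registers need to be spilled, taking into
// account the frame record, register pairing constraints, VG saves, hazard
// padding and a possible emergency spill slot for register scavenging.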
2486void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
2487                                                BitVector &SavedRegs,
2488 RegScavenger *RS) const {
2489 // All calls are tail calls in GHC calling conv, and functions have no
2490 // prologue/epilogue.
2491  if (MF.getFunction().getCallingConv() == CallingConv::GHC)
2492    return;
2493
2494 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2495
2497 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
2499 unsigned UnspilledCSGPR = AArch64::NoRegister;
2500 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
2501
2502 MachineFrameInfo &MFI = MF.getFrameInfo();
2503 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
2504
2505 MCRegister BasePointerReg =
2506 RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister() : MCRegister();
2507
2508 unsigned ExtraCSSpill = 0;
2509 bool HasUnpairedGPR64 = false;
2510 bool HasPairZReg = false;
2511 BitVector UserReservedRegs = RegInfo->getUserReservedRegs(MF);
2512 BitVector ReservedRegs = RegInfo->getReservedRegs(MF);
2513
2514 // Figure out which callee-saved registers to save/restore.
2515 for (unsigned i = 0; CSRegs[i]; ++i) {
2516 const MCRegister Reg = CSRegs[i];
2517
2518 // Add the base pointer register to SavedRegs if it is callee-save.
2519 if (Reg == BasePointerReg)
2520 SavedRegs.set(Reg);
2521
2522 // Don't save manually reserved registers set through +reserve-x#i,
2523 // even for callee-saved registers, as per GCC's behavior.
2524 if (UserReservedRegs[Reg]) {
2525 SavedRegs.reset(Reg);
2526 continue;
2527 }
2528
2529 bool RegUsed = SavedRegs.test(Reg);
2530 MCRegister PairedReg;
2531 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
2532 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
2533 AArch64::FPR128RegClass.contains(Reg)) {
2534 // Compensate for odd numbers of GP CSRs.
2535      // For now, all the known cases of an odd number of CSRs involve GPRs.
2536 if (HasUnpairedGPR64)
2537 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
2538 else
2539 PairedReg = CSRegs[i ^ 1];
2540 }
2541
2542 // If the function requires all the GP registers to save (SavedRegs),
2543 // and there are an odd number of GP CSRs at the same time (CSRegs),
2544 // PairedReg could be in a different register class from Reg, which would
2545 // lead to a FPR (usually D8) accidentally being marked saved.
2546 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
2547 PairedReg = AArch64::NoRegister;
2548 HasUnpairedGPR64 = true;
2549 }
2550 assert(PairedReg == AArch64::NoRegister ||
2551 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
2552 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
2553 AArch64::FPR128RegClass.contains(Reg, PairedReg));
2554
2555 if (!RegUsed) {
2556 if (AArch64::GPR64RegClass.contains(Reg) && !ReservedRegs[Reg]) {
2557 UnspilledCSGPR = Reg;
2558 UnspilledCSGPRPaired = PairedReg;
2559 }
2560 continue;
2561 }
2562
2563 // MachO's compact unwind format relies on all registers being stored in
2564 // pairs.
2565 // FIXME: the usual format is actually better if unwinding isn't needed.
2566 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
2567 !SavedRegs.test(PairedReg)) {
2568 SavedRegs.set(PairedReg);
2569 if (AArch64::GPR64RegClass.contains(PairedReg) &&
2570 !ReservedRegs[PairedReg])
2571 ExtraCSSpill = PairedReg;
2572 }
2573 // Check if there is a pair of ZRegs, so it can select PReg for spill/fill
2574 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
2575 SavedRegs.test(CSRegs[i ^ 1]));
2576 }
2577
2578 if (HasPairZReg && enableMultiVectorSpillFill(Subtarget, MF)) {
2580 // Find a suitable predicate register for the multi-vector spill/fill
2581 // instructions.
2582 MCRegister PnReg = findFreePredicateReg(SavedRegs);
2583 if (PnReg.isValid())
2584 AFI->setPredicateRegForFillSpill(PnReg);
2585 // If no free callee-save has been found assign one.
2586 if (!AFI->getPredicateRegForFillSpill() &&
2587 MF.getFunction().getCallingConv() ==
2589 SavedRegs.set(AArch64::P8);
2590 AFI->setPredicateRegForFillSpill(AArch64::PN8);
2591 }
2592
2593 assert(!ReservedRegs[AFI->getPredicateRegForFillSpill()] &&
2594 "Predicate cannot be a reserved register");
2595 }
2596
2597  if (MF.getFunction().getCallingConv() == CallingConv::Win64 &&
2598      !Subtarget.isTargetWindows()) {
2599    // For Windows calling convention on a non-Windows OS, where X18 is treated
2600    // as reserved, back up X18 when entering non-Windows code (marked with the
2601 // Windows calling convention) and restore when returning regardless of
2602 // whether the individual function uses it - it might call other functions
2603 // that clobber it.
2604 SavedRegs.set(AArch64::X18);
2605 }
2606
2607 // Determine if a Hazard slot should be used and where it should go.
2608 // If SplitSVEObjects is used, the hazard padding is placed between the PPRs
2609 // and ZPRs. Otherwise, it goes in the callee save area.
2610 determineStackHazardSlot(MF, SavedRegs);
2611
2612 // Calculates the callee saved stack size.
2613 unsigned CSStackSize = 0;
2614 unsigned ZPRCSStackSize = 0;
2615 unsigned PPRCSStackSize = 0;
2617 for (unsigned Reg : SavedRegs.set_bits()) {
2618 auto *RC = TRI->getMinimalPhysRegClass(MCRegister(Reg));
2619 assert(RC && "expected register class!");
2620 auto SpillSize = TRI->getSpillSize(*RC);
2621 bool IsZPR = AArch64::ZPRRegClass.contains(Reg);
2622 bool IsPPR = !IsZPR && AArch64::PPRRegClass.contains(Reg);
2623 if (IsZPR)
2624 ZPRCSStackSize += SpillSize;
2625 else if (IsPPR)
2626 PPRCSStackSize += SpillSize;
2627 else
2628 CSStackSize += SpillSize;
2629 }
2630
2631 // Save number of saved regs, so we can easily update CSStackSize later to
2632 // account for any additional 64-bit GPR saves. Note: After this point
2633 // only 64-bit GPRs can be added to SavedRegs.
2634 unsigned NumSavedRegs = SavedRegs.count();
2635
2636 // If we have hazard padding in the CS area add that to the size.
2638 CSStackSize += getStackHazardSize(MF);
2639
2640 // Increase the callee-saved stack size if the function has streaming mode
2641 // changes, as we will need to spill the value of the VG register.
2642 if (requiresSaveVG(MF))
2643 CSStackSize += 8;
2644
2645 // If we must call __arm_get_current_vg in the prologue preserve the LR.
2646 if (requiresSaveVG(MF) && !Subtarget.hasSVE())
2647 SavedRegs.set(AArch64::LR);
2648
2649 // The frame record needs to be created by saving the appropriate registers
2650 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
2651 if (hasFP(MF) ||
2652 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
2653 SavedRegs.set(AArch64::FP);
2654 SavedRegs.set(AArch64::LR);
2655 }
2656
2657 LLVM_DEBUG({
2658 dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
2659 for (unsigned Reg : SavedRegs.set_bits())
2660 dbgs() << ' ' << printReg(MCRegister(Reg), RegInfo);
2661 dbgs() << "\n";
2662 });
2663
2664 // If any callee-saved registers are used, the frame cannot be eliminated.
2665 auto [ZPRLocalStackSize, PPRLocalStackSize] =
2667 uint64_t SVELocals = ZPRLocalStackSize + PPRLocalStackSize;
2668 uint64_t SVEStackSize =
2669 alignTo(ZPRCSStackSize + PPRCSStackSize + SVELocals, 16);
2670 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
2671
2672 // The CSR spill slots have not been allocated yet, so estimateStackSize
2673 // won't include them.
2674 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
2675
2676 // We may address some of the stack above the canonical frame address, either
2677 // for our own arguments or during a call. Include that in calculating whether
2678 // we have complicated addressing concerns.
2679 int64_t CalleeStackUsed = 0;
2680 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
2681 int64_t FixedOff = MFI.getObjectOffset(I);
2682 if (FixedOff > CalleeStackUsed)
2683 CalleeStackUsed = FixedOff;
2684 }
2685
2686 // Conservatively always assume BigStack when there are SVE spills.
2687 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
2688 CalleeStackUsed) > EstimatedStackSizeLimit;
2689 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
2690 AFI->setHasStackFrame(true);
2691
2692 // Estimate if we might need to scavenge a register at some point in order
2693 // to materialize a stack offset. If so, either spill one additional
2694 // callee-saved register or reserve a special spill slot to facilitate
2695 // register scavenging. If we already spilled an extra callee-saved register
2696 // above to keep the number of spills even, we don't need to do anything else
2697 // here.
2698 if (BigStack) {
2699 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
2700 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
2701 << " to get a scratch register.\n");
2702 SavedRegs.set(UnspilledCSGPR);
2703 ExtraCSSpill = UnspilledCSGPR;
2704
2705 // MachO's compact unwind format relies on all registers being stored in
2706 // pairs, so if we need to spill one extra for BigStack, then we need to
2707 // store the pair.
2708 if (producePairRegisters(MF)) {
2709 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
2710 // Failed to make a pair for compact unwind format, revert spilling.
2711 if (produceCompactUnwindFrame(*this, MF)) {
2712 SavedRegs.reset(UnspilledCSGPR);
2713 ExtraCSSpill = AArch64::NoRegister;
2714 }
2715 } else
2716 SavedRegs.set(UnspilledCSGPRPaired);
2717 }
2718 }
2719
2720 // If we didn't find an extra callee-saved register to spill, create
2721 // an emergency spill slot.
2722 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
2724 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
2725 unsigned Size = TRI->getSpillSize(RC);
2726 Align Alignment = TRI->getSpillAlign(RC);
2727 int FI = MFI.CreateSpillStackObject(Size, Alignment);
2728 RS->addScavengingFrameIndex(FI);
2729 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
2730 << " as the emergency spill slot.\n");
2731 }
2732 }
2733
2734  // Add the size of any additional 64-bit GPR saves.
2735 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
2736
2737 // A Swift asynchronous context extends the frame record with a pointer
2738 // directly before FP.
2739 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
2740 CSStackSize += 8;
2741
2742 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
2743 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
2744 << EstimatedStackSize + AlignedCSStackSize << " bytes.\n");
2745
2747 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
2748 "Should not invalidate callee saved info");
2749
2750 // Round up to register pair alignment to avoid additional SP adjustment
2751 // instructions.
2752 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
2753 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
2754 AFI->setSVECalleeSavedStackSize(ZPRCSStackSize, alignTo(PPRCSStackSize, 16));
2755}
2756
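// Creates the stack objects for the callee saves (including VG, the Swift
// async context and any CSR hazard slot) and records the minimum and maximum
// callee-save frame indices.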
2757bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
2758    MachineFunction &MF, const TargetRegisterInfo *RegInfo,
2759 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
2760 unsigned &MaxCSFrameIndex) const {
2761 bool NeedsWinCFI = needsWinCFI(MF);
2762 unsigned StackHazardSize = getStackHazardSize(MF);
2763 // To match the canonical windows frame layout, reverse the list of
2764 // callee saved registers to get them laid out by PrologEpilogInserter
2765 // in the right order. (PrologEpilogInserter allocates stack objects top
2766 // down. Windows canonical prologs store higher numbered registers at
2767 // the top, thus have the CSI array start from the highest registers.)
2768 if (NeedsWinCFI)
2769 std::reverse(CSI.begin(), CSI.end());
2770
2771 if (CSI.empty())
2772 return true; // Early exit if no callee saved registers are modified!
2773
2774 // Now that we know which registers need to be saved and restored, allocate
2775 // stack slots for them.
2776 MachineFrameInfo &MFI = MF.getFrameInfo();
2777 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2778
2779 bool UsesWinAAPCS = isTargetWindows(MF);
2780 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2781 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
2782 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2783 if ((unsigned)FrameIdx < MinCSFrameIndex)
2784 MinCSFrameIndex = FrameIdx;
2785 if ((unsigned)FrameIdx > MaxCSFrameIndex)
2786 MaxCSFrameIndex = FrameIdx;
2787 }
2788
2789 // Insert VG into the list of CSRs, immediately before LR if saved.
2790 if (requiresSaveVG(MF)) {
2791 CalleeSavedInfo VGInfo(AArch64::VG);
2792 auto It =
2793 find_if(CSI, [](auto &Info) { return Info.getReg() == AArch64::LR; });
2794 if (It != CSI.end())
2795 CSI.insert(It, VGInfo);
2796 else
2797 CSI.push_back(VGInfo);
2798 }
2799
2800 Register LastReg = 0;
2801 int HazardSlotIndex = std::numeric_limits<int>::max();
2802 for (auto &CS : CSI) {
2803 MCRegister Reg = CS.getReg();
2804 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
2805
2806 // Create a hazard slot as we switch between GPR and FPR CSRs.
2807    if (AFI->hasStackHazardSlotIndex() &&
2808        (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
2809        AArch64InstrInfo::isFpOrNEON(Reg)) {
2810 assert(HazardSlotIndex == std::numeric_limits<int>::max() &&
2811 "Unexpected register order for hazard slot");
2812 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2813 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2814 << "\n");
2815 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2816 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
2817 MinCSFrameIndex = HazardSlotIndex;
2818 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
2819 MaxCSFrameIndex = HazardSlotIndex;
2820 }
2821
2822 unsigned Size = RegInfo->getSpillSize(*RC);
2823 Align Alignment(RegInfo->getSpillAlign(*RC));
2824 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
2825 CS.setFrameIdx(FrameIdx);
2826
2827 if ((unsigned)FrameIdx < MinCSFrameIndex)
2828 MinCSFrameIndex = FrameIdx;
2829 if ((unsigned)FrameIdx > MaxCSFrameIndex)
2830 MaxCSFrameIndex = FrameIdx;
2831
2832 // Grab 8 bytes below FP for the extended asynchronous frame info.
2833 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
2834 Reg == AArch64::FP) {
2835 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
2836 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2837 if ((unsigned)FrameIdx < MinCSFrameIndex)
2838 MinCSFrameIndex = FrameIdx;
2839 if ((unsigned)FrameIdx > MaxCSFrameIndex)
2840 MaxCSFrameIndex = FrameIdx;
2841 }
2842 LastReg = Reg;
2843 }
2844
2845 // Add hazard slot in the case where no FPR CSRs are present.
2846  if (AFI->hasStackHazardSlotIndex() &&
2847      HazardSlotIndex == std::numeric_limits<int>::max()) {
2848 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2849 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2850 << "\n");
2851 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2852 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
2853 MinCSFrameIndex = HazardSlotIndex;
2854 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
2855 MaxCSFrameIndex = HazardSlotIndex;
2856 }
2857
2858 return true;
2859}
2860
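// Returns true if free space in the callee-save area may be used as an
// emergency spill slot by the register scavenger.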
2861bool AArch64FrameLowering::enableStackSlotScavenging(
2862    const MachineFunction &MF) const {
2863  const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2864  // If the function has streaming-mode changes, don't scavenge a
2865 // spillslot in the callee-save area, as that might require an
2866 // 'addvl' in the streaming-mode-changing call-sequence when the
2867 // function doesn't use a FP.
2868 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
2869 return false;
2870  // Don't allow register scavenging with hazard slots, in case it moves objects
2871 // into the wrong place.
2872 if (AFI->hasStackHazardSlotIndex())
2873 return false;
2874 return AFI->hasCalleeSaveStackFreeSpace();
2875}
2876
2877/// Returns true if there are any SVE callee saves.
2878static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI,
2879                                      int &Min, int &Max) {
2880 Min = std::numeric_limits<int>::max();
2881 Max = std::numeric_limits<int>::min();
2882
2883 if (!MFI.isCalleeSavedInfoValid())
2884 return false;
2885
2886 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
2887 for (auto &CS : CSI) {
2888 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
2889 AArch64::PPRRegClass.contains(CS.getReg())) {
2890 assert((Max == std::numeric_limits<int>::min() ||
2891 Max + 1 == CS.getFrameIdx()) &&
2892 "SVE CalleeSaves are not consecutive");
2893 Min = std::min(Min, CS.getFrameIdx());
2894 Max = std::max(Max, CS.getFrameIdx());
2895 }
2896 }
2897 return Min != std::numeric_limits<int>::max();
2898}
2899
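// Computes the sizes of the ZPR and PPR stack areas (callee saves first,
// followed by locals and spills) and, when requested, assigns each scalable
// object its offset. Without SplitSVEObjects everything is accounted to the
// ZPR area.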
2901 AssignObjectOffsets AssignOffsets) {
2902 MachineFrameInfo &MFI = MF.getFrameInfo();
2903 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2904
2905 SVEStackSizes SVEStack{};
2906
2907 // With SplitSVEObjects we maintain separate stack offsets for predicates
2908 // (PPRs) and SVE vectors (ZPRs). When SplitSVEObjects is disabled predicates
2909 // are included in the SVE vector area.
2910 uint64_t &ZPRStackTop = SVEStack.ZPRStackSize;
2911 uint64_t &PPRStackTop =
2912 AFI->hasSplitSVEObjects() ? SVEStack.PPRStackSize : SVEStack.ZPRStackSize;
2913
2914#ifndef NDEBUG
2915 // First process all fixed stack objects.
2916 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
2917 assert(!MFI.hasScalableStackID(I) &&
2918 "SVE vectors should never be passed on the stack by value, only by "
2919 "reference.");
2920#endif
2921
2922 auto AllocateObject = [&](int FI) {
2923    uint64_t &StackTop = MFI.getStackID(FI) == TargetStackID::ScalableVector
2924                             ? ZPRStackTop
2925 : PPRStackTop;
2926
2927 // FIXME: Given that the length of SVE vectors is not necessarily a power of
2928 // two, we'd need to align every object dynamically at runtime if the
2929 // alignment is larger than 16. This is not yet supported.
2930 Align Alignment = MFI.getObjectAlign(FI);
2931 if (Alignment > Align(16))
2932      report_fatal_error(
2933          "Alignment of scalable vectors > 16 bytes is not yet supported");
2934
2935 StackTop += MFI.getObjectSize(FI);
2936 StackTop = alignTo(StackTop, Alignment);
2937
2938 assert(StackTop < (uint64_t)std::numeric_limits<int64_t>::max() &&
2939 "SVE StackTop far too large?!");
2940
2941 int64_t Offset = -int64_t(StackTop);
2942 if (AssignOffsets == AssignObjectOffsets::Yes)
2943 MFI.setObjectOffset(FI, Offset);
2944
2945 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
2946 };
2947
2948 // Then process all callee saved slots.
2949 int MinCSFrameIndex, MaxCSFrameIndex;
2950 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
2951 for (int FI = MinCSFrameIndex; FI <= MaxCSFrameIndex; ++FI)
2952 AllocateObject(FI);
2953 }
2954
2955 // Ensure the CS area is 16-byte aligned.
2956 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2957 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2958
2959 // Create a buffer of SVE objects to allocate and sort it.
2960 SmallVector<int, 8> ObjectsToAllocate;
2961 // If we have a stack protector, and we've previously decided that we have SVE
2962 // objects on the stack and thus need it to go in the SVE stack area, then it
2963 // needs to go first.
2964 int StackProtectorFI = -1;
2965 if (MFI.hasStackProtectorIndex()) {
2966 StackProtectorFI = MFI.getStackProtectorIndex();
2967 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
2968 ObjectsToAllocate.push_back(StackProtectorFI);
2969 }
2970
2971 for (int FI = 0, E = MFI.getObjectIndexEnd(); FI != E; ++FI) {
2972 if (FI == StackProtectorFI || MFI.isDeadObjectIndex(FI))
2973 continue;
2974 if (MaxCSFrameIndex >= FI && FI >= MinCSFrameIndex)
2975 continue;
2976
2979 continue;
2980
2981 ObjectsToAllocate.push_back(FI);
2982 }
2983
2984 // Allocate all SVE locals and spills
2985 for (unsigned FI : ObjectsToAllocate)
2986 AllocateObject(FI);
2987
2988 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2989 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2990
2991 if (AssignOffsets == AssignObjectOffsets::Yes)
2992 AFI->setStackSizeSVE(SVEStack.ZPRStackSize, SVEStack.PPRStackSize);
2993
2994 return SVEStack;
2995}
2996
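// For functions using Win64-style C++ EH, lay out the catch objects and the
// UnwindHelp slot in the fixed object area and store -2 to UnwindHelp at
// function entry.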
2997void AArch64FrameLowering::processFunctionBeforeFrameFinalized(
2998    MachineFunction &MF, RegScavenger *RS) const {
2999  assert(getStackGrowthDirection() == TargetFrameLowering::StackGrowsDown &&
3000         "Upwards growing stack unsupported");
3001
3003
3004 // If this function isn't doing Win64-style C++ EH, we don't need to do
3005 // anything.
3006 if (!MF.hasEHFunclets())
3007 return;
3008
3009 MachineFrameInfo &MFI = MF.getFrameInfo();
3010 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3011
3012 // Win64 C++ EH needs to allocate space for the catch objects in the fixed
3013 // object area right next to the UnwindHelp object.
3014 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
3015 int64_t CurrentOffset =
3017 for (WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
3018 for (WinEHHandlerType &H : TBME.HandlerArray) {
3019 int FrameIndex = H.CatchObj.FrameIndex;
3020 if ((FrameIndex != INT_MAX) && MFI.getObjectOffset(FrameIndex) == 0) {
3021 CurrentOffset =
3022 alignTo(CurrentOffset, MFI.getObjectAlign(FrameIndex).value());
3023 CurrentOffset += MFI.getObjectSize(FrameIndex);
3024 MFI.setObjectOffset(FrameIndex, -CurrentOffset);
3025 }
3026 }
3027 }
3028
3029 // Create an UnwindHelp object.
3030 // The UnwindHelp object is allocated at the start of the fixed object area
3031 int64_t UnwindHelpOffset = alignTo(CurrentOffset + 8, Align(16));
3032 assert(UnwindHelpOffset == getFixedObjectSize(MF, AFI, /*IsWin64*/ true,
3033 /*IsFunclet*/ false) &&
3034 "UnwindHelpOffset must be at the start of the fixed object area");
3035 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8, -UnwindHelpOffset,
3036 /*IsImmutable=*/false);
3037 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
3038
3039 MachineBasicBlock &MBB = MF.front();
3040 auto MBBI = MBB.begin();
3041 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
3042 ++MBBI;
3043
3044 // We need to store -2 into the UnwindHelp object at the start of the
3045 // function.
3046 DebugLoc DL;
3047 RS->enterBasicBlockEnd(MBB);
3048 RS->backward(MBBI);
3049 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
3050 assert(DstReg && "There must be a free register after frame setup");
3052 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
3053 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
3054 .addReg(DstReg, getKillRegState(true))
3055 .addFrameIndex(UnwindHelpFI)
3056 .addImm(0);
3057}
3058
3059namespace {
3060struct TagStoreInstr {
3061  MachineInstr *MI;
3062  int64_t Offset, Size;
3063 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
3064 : MI(MI), Offset(Offset), Size(Size) {}
3065};
3066
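// Helper that rewrites a run of adjacent tag-store instructions into either
// an unrolled STG/ST2G sequence or an STGloop, optionally folding a following
// base-register update into the final instruction.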
3067class TagStoreEdit {
3068 MachineFunction *MF;
3069 MachineBasicBlock *MBB;
3070 MachineRegisterInfo *MRI;
3071 // Tag store instructions that are being replaced.
3073 // Combined memref arguments of the above instructions.
3075
3076 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
3077 // FrameRegOffset + Size) with the address tag of SP.
3078 Register FrameReg;
3079 StackOffset FrameRegOffset;
3080 int64_t Size;
3081 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
3082 // end.
3083 std::optional<int64_t> FrameRegUpdate;
3084 // MIFlags for any FrameReg updating instructions.
3085 unsigned FrameRegUpdateFlags;
3086
3087 // Use zeroing instruction variants.
3088 bool ZeroData;
3089 DebugLoc DL;
3090
3091 void emitUnrolled(MachineBasicBlock::iterator InsertI);
3092 void emitLoop(MachineBasicBlock::iterator InsertI);
3093
3094public:
3095 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
3096 : MBB(MBB), ZeroData(ZeroData) {
3097 MF = MBB->getParent();
3098 MRI = &MF->getRegInfo();
3099 }
3100  // Add an instruction to be replaced. Instructions must be added in
3101  // ascending order of Offset and must be adjacent.
3102 void addInstruction(TagStoreInstr I) {
3103 assert((TagStores.empty() ||
3104 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
3105 "Non-adjacent tag store instructions.");
3106 TagStores.push_back(I);
3107 }
3108 void clear() { TagStores.clear(); }
3109 // Emit equivalent code at the given location, and erase the current set of
3110 // instructions. May skip if the replacement is not profitable. May invalidate
3111 // the input iterator and replace it with a valid one.
3112 void emitCode(MachineBasicBlock::iterator &InsertI,
3113 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
3114};
3115
3116void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
3117 const AArch64InstrInfo *TII =
3118 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3119
3120 const int64_t kMinOffset = -256 * 16;
3121 const int64_t kMaxOffset = 255 * 16;
3122
3123 Register BaseReg = FrameReg;
3124 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
3125 if (BaseRegOffsetBytes < kMinOffset ||
3126 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
3127      // BaseReg can be FP, which is not necessarily aligned to 16 bytes. In
3128 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
3129 // is required for the offset of ST2G.
3130 BaseRegOffsetBytes % 16 != 0) {
3131 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3132 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
3133 StackOffset::getFixed(BaseRegOffsetBytes), TII);
3134 BaseReg = ScratchReg;
3135 BaseRegOffsetBytes = 0;
3136 }
3137
3138 MachineInstr *LastI = nullptr;
3139 while (Size) {
3140 int64_t InstrSize = (Size > 16) ? 32 : 16;
3141 unsigned Opcode =
3142 InstrSize == 16
3143 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
3144 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
3145 assert(BaseRegOffsetBytes % 16 == 0);
3146 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
3147 .addReg(AArch64::SP)
3148 .addReg(BaseReg)
3149 .addImm(BaseRegOffsetBytes / 16)
3150 .setMemRefs(CombinedMemRefs);
3151 // A store to [BaseReg, #0] should go last for an opportunity to fold the
3152 // final SP adjustment in the epilogue.
3153 if (BaseRegOffsetBytes == 0)
3154 LastI = I;
3155 BaseRegOffsetBytes += InstrSize;
3156 Size -= InstrSize;
3157 }
3158
3159 if (LastI)
3160 MBB->splice(InsertI, MBB, LastI);
3161}
3162
3163void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
3164 const AArch64InstrInfo *TII =
3165 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3166
3167 Register BaseReg = FrameRegUpdate
3168 ? FrameReg
3169 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3170 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3171
3172 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
3173
3174 int64_t LoopSize = Size;
3175 // If the loop size is not a multiple of 32, split off one 16-byte store at
3176 // the end to fold BaseReg update into.
3177 if (FrameRegUpdate && *FrameRegUpdate)
3178 LoopSize -= LoopSize % 32;
3179 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
3180 TII->get(ZeroData ? AArch64::STZGloop_wback
3181 : AArch64::STGloop_wback))
3182 .addDef(SizeReg)
3183 .addDef(BaseReg)
3184 .addImm(LoopSize)
3185 .addReg(BaseReg)
3186 .setMemRefs(CombinedMemRefs);
3187 if (FrameRegUpdate)
3188 LoopI->setFlags(FrameRegUpdateFlags);
3189
3190 int64_t ExtraBaseRegUpdate =
3191 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
3192 LLVM_DEBUG(dbgs() << "TagStoreEdit::emitLoop: LoopSize=" << LoopSize
3193 << ", Size=" << Size
3194 << ", ExtraBaseRegUpdate=" << ExtraBaseRegUpdate
3195 << ", FrameRegUpdate=" << FrameRegUpdate
3196 << ", FrameRegOffset.getFixed()="
3197 << FrameRegOffset.getFixed() << "\n");
3198 if (LoopSize < Size) {
3199 assert(FrameRegUpdate);
3200 assert(Size - LoopSize == 16);
3201 // Tag 16 more bytes at BaseReg and update BaseReg.
3202 int64_t STGOffset = ExtraBaseRegUpdate + 16;
3203 assert(STGOffset % 16 == 0 && STGOffset >= -4096 && STGOffset <= 4080 &&
3204 "STG immediate out of range");
3205 BuildMI(*MBB, InsertI, DL,
3206 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
3207 .addDef(BaseReg)
3208 .addReg(BaseReg)
3209 .addReg(BaseReg)
3210 .addImm(STGOffset / 16)
3211 .setMemRefs(CombinedMemRefs)
3212 .setMIFlags(FrameRegUpdateFlags);
3213 } else if (ExtraBaseRegUpdate) {
3214 // Update BaseReg.
3215 int64_t AddSubOffset = std::abs(ExtraBaseRegUpdate);
3216 assert(AddSubOffset <= 4095 && "ADD/SUB immediate out of range");
3217 BuildMI(
3218 *MBB, InsertI, DL,
3219 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
3220 .addDef(BaseReg)
3221 .addReg(BaseReg)
3222 .addImm(AddSubOffset)
3223 .addImm(0)
3224 .setMIFlags(FrameRegUpdateFlags);
3225 }
3226}
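// Worked example (illustrative): with Size == 112, FrameRegOffset == 0 and a
// folded base-register update of +128, ExtraBaseRegUpdate == 128 - 0 - 112
// == 16. Since 112 is not a multiple of 32, LoopSize becomes 96 and the code
// above emits
//
//   STGloop_wback SizeReg, BaseReg, #96      (tags 96 bytes, advances BaseReg)
//   STG           BaseReg, [BaseReg], #32    (post-index: tags the last 16
//                                             bytes, then adds STGOffset ==
//                                             16 + 16 == 32)
//
// leaving BaseReg advanced by 96 + 32 == 128, the requested *FrameRegUpdate.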
3227
3228 // Check if *II is a register update that can be merged into an STGloop that
3229 // ends at (Reg + Size). On success, *TotalOffset is set to the full update
3230 // amount; the adjustment still required after the loop is TotalOffset - Size.
3231bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
3232 int64_t Size, int64_t *TotalOffset) {
3233 MachineInstr &MI = *II;
3234 if ((MI.getOpcode() == AArch64::ADDXri ||
3235 MI.getOpcode() == AArch64::SUBXri) &&
3236 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
3237 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
3238 int64_t Offset = MI.getOperand(2).getImm() << Shift;
3239 if (MI.getOpcode() == AArch64::SUBXri)
3240 Offset = -Offset;
3241 int64_t PostOffset = Offset - Size;
3242 // TagStoreEdit::emitLoop might emit either an ADD/SUB after the loop, or
3243 // an STGPostIndex which does the last 16 bytes of tag write. Which one is
3244 // chosen depends on the alignment of the loop size, but the difference
3245 // between the valid ranges for the two instructions is small, so we
3246 // conservatively assume that it could be either case here.
3247 //
3248 // Max offset of STGPostIndex, minus the 16 byte tag write folded into that
3249 // instruction.
3250 const int64_t kMaxOffset = 4080 - 16;
3251 // Max offset of SUBXri.
3252 const int64_t kMinOffset = -4095;
3253 if (PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
3254 PostOffset % 16 == 0) {
3255 *TotalOffset = Offset;
3256 return true;
3257 }
3258 }
3259 return false;
3260}
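// Example (illustrative): if the loop tags Size == 96 bytes starting at Reg
// and is followed by "ADD Reg, Reg, #112" (no shift), then Offset == 112 and
// PostOffset == 112 - 96 == 16, which is 16-byte aligned and inside
// [kMinOffset, kMaxOffset] == [-4095, 4064], so the update can be folded and
// *TotalOffset is set to 112.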
3261
3262 void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
3263 SmallVectorImpl<MachineMemOperand *> &MemRefs) {
3264 MemRefs.clear();
3265 for (auto &TS : TSE) {
3266 MachineInstr *MI = TS.MI;
3267 // An instruction without memory operands may access anything. Be
3268 // conservative and return an empty list.
3269 if (MI->memoperands_empty()) {
3270 MemRefs.clear();
3271 return;
3272 }
3273 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
3274 }
3275}
3276
3277void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
3278 const AArch64FrameLowering *TFI,
3279 bool TryMergeSPUpdate) {
3280 if (TagStores.empty())
3281 return;
3282 TagStoreInstr &FirstTagStore = TagStores[0];
3283 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
3284 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
3285 DL = TagStores[0].MI->getDebugLoc();
3286
3287 Register Reg;
3288 FrameRegOffset = TFI->resolveFrameOffsetReference(
3289 *MF, FirstTagStore.Offset, false /*isFixed*/,
3290 TargetStackID::Default /*StackID*/, Reg,
3291 /*PreferFP=*/false, /*ForSimm=*/true);
3292 FrameReg = Reg;
3293 FrameRegUpdate = std::nullopt;
3294
3295 mergeMemRefs(TagStores, CombinedMemRefs);
3296
3297 LLVM_DEBUG({
3298 dbgs() << "Replacing adjacent STG instructions:\n";
3299 for (const auto &Instr : TagStores) {
3300 dbgs() << " " << *Instr.MI;
3301 }
3302 });
3303
3304 // Size threshold where a loop becomes shorter than a linear sequence of
3305 // tagging instructions.
3306 const int kSetTagLoopThreshold = 176;
3307 if (Size < kSetTagLoopThreshold) {
3308 if (TagStores.size() < 2)
3309 return;
3310 emitUnrolled(InsertI);
3311 } else {
3312 MachineInstr *UpdateInstr = nullptr;
3313 int64_t TotalOffset = 0;
3314 if (TryMergeSPUpdate) {
3315 // See if we can merge base register update into the STGloop.
3316 // This is done in AArch64LoadStoreOptimizer for "normal" stores,
3317 // but STGloop is way too unusual for that, and also it only
3318 // realistically happens in function epilogue. Also, STGloop is expanded
3319 // before that pass.
3320 if (InsertI != MBB->end() &&
3321 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
3322 &TotalOffset)) {
3323 UpdateInstr = &*InsertI++;
3324 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
3325 << *UpdateInstr);
3326 }
3327 }
3328
3329 if (!UpdateInstr && TagStores.size() < 2)
3330 return;
3331
3332 if (UpdateInstr) {
3333 FrameRegUpdate = TotalOffset;
3334 FrameRegUpdateFlags = UpdateInstr->getFlags();
3335 }
3336 emitLoop(InsertI);
3337 if (UpdateInstr)
3338 UpdateInstr->eraseFromParent();
3339 }
3340
3341 for (auto &TS : TagStores)
3342 TS.MI->eraseFromParent();
3343}
3344
3345bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
3346 int64_t &Size, bool &ZeroData) {
3347 MachineFunction &MF = *MI.getParent()->getParent();
3348 const MachineFrameInfo &MFI = MF.getFrameInfo();
3349
3350 unsigned Opcode = MI.getOpcode();
3351 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
3352 Opcode == AArch64::STZ2Gi);
3353
3354 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
3355 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
3356 return false;
3357 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
3358 return false;
3359 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
3360 Size = MI.getOperand(2).getImm();
3361 return true;
3362 }
3363
3364 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
3365 Size = 16;
3366 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
3367 Size = 32;
3368 else
3369 return false;
3370
3371 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
3372 return false;
3373
3374 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
3375 16 * MI.getOperand(2).getImm();
3376 return true;
3377}
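// Example (illustrative): an STGi whose first operand is SP, whose base is a
// frame index FI and whose immediate is 2 is reported as mergeable with
// Size == 16 and Offset == MFI.getObjectOffset(FI) + 32, while an
// STGloop/STZGloop whose size or address results are not dead is rejected.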
3378
3379// Detect a run of memory tagging instructions for adjacent stack frame slots,
3380// and replace them with a shorter instruction sequence:
3381// * replace STG + STG with ST2G
3382// * replace STGloop + STGloop with STGloop
3383 // This code needs to run when stack slot offsets are already known, but before
3384 // FrameIndex operands in STG instructions are eliminated.
3385 MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
3386 const AArch64FrameLowering *TFI,
3387 RegScavenger *RS) {
3388 bool FirstZeroData;
3389 int64_t Size, Offset;
3390 MachineInstr &MI = *II;
3391 MachineBasicBlock *MBB = MI.getParent();
3392 MachineBasicBlock::iterator NextI = ++II;
3393 if (&MI == &MBB->instr_back())
3394 return II;
3395 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
3396 return II;
3397
3398 SmallVector<TagStoreInstr, 8> Instrs;
3399 Instrs.emplace_back(&MI, Offset, Size);
3400
3401 constexpr int kScanLimit = 10;
3402 int Count = 0;
3403 for (MachineBasicBlock::iterator E = MBB->end();
3404 NextI != E && Count < kScanLimit; ++NextI) {
3405 MachineInstr &MI = *NextI;
3406 bool ZeroData;
3407 int64_t Size, Offset;
3408 // Collect instructions that update memory tags with a FrameIndex operand
3409 // and (when applicable) constant size, and whose output registers are dead
3410 // (the latter is almost always the case in practice). Since these
3411 // instructions effectively have no inputs or outputs, we are free to skip
3412 // any non-aliasing instructions in between without tracking used registers.
3413 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
3414 if (ZeroData != FirstZeroData)
3415 break;
3416 Instrs.emplace_back(&MI, Offset, Size);
3417 continue;
3418 }
3419
3420 // Only count non-transient, non-tagging instructions toward the scan
3421 // limit.
3422 if (!MI.isTransient())
3423 ++Count;
3424
3425 // Just in case, stop before the epilogue code starts.
3426 if (MI.getFlag(MachineInstr::FrameSetup) ||
3427 MI.getFlag(MachineInstr::FrameDestroy))
3428 break;
3429
3430 // Reject anything that may alias the collected instructions.
3431 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects() || MI.isCall())
3432 break;
3433 }
3434
3435 // New code will be inserted after the last tagging instruction we've found.
3436 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
3437
3438 // All the gathered stack tag instructions are merged and placed after the
3439 // last tag store in the list. Before inserting, check whether the NZCV flag
3440 // is live at the insertion point; otherwise it could be clobbered by any STG
3441 // loops in the merged sequence.
3442
3443 // FIXME: This bail-out is conservative: the liveness check is performed even
3444 // when the merged sequence contains no STG loops, in which case it is not
3445 // needed.
3446 LivePhysRegs LiveRegs(*(MBB->getParent()->getSubtarget().getRegisterInfo()));
3447 LiveRegs.addLiveOuts(*MBB);
3448 for (auto I = MBB->rbegin();; ++I) {
3449 MachineInstr &MI = *I;
3450 if (MI == InsertI)
3451 break;
3452 LiveRegs.stepBackward(*I);
3453 }
3454 InsertI++;
3455 if (LiveRegs.contains(AArch64::NZCV))
3456 return InsertI;
3457
3458 llvm::stable_sort(Instrs,
3459 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
3460 return Left.Offset < Right.Offset;
3461 });
3462
3463 // Make sure that we don't have any overlapping stores.
3464 int64_t CurOffset = Instrs[0].Offset;
3465 for (auto &Instr : Instrs) {
3466 if (CurOffset > Instr.Offset)
3467 return NextI;
3468 CurOffset = Instr.Offset + Instr.Size;
3469 }
3470
3471 // Find contiguous runs of tagged memory and emit shorter instruction
3472 // sequences for them when possible.
3473 TagStoreEdit TSE(MBB, FirstZeroData);
3474 std::optional<int64_t> EndOffset;
3475 for (auto &Instr : Instrs) {
3476 if (EndOffset && *EndOffset != Instr.Offset) {
3477 // Found a gap.
3478 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
3479 TSE.clear();
3480 }
3481
3482 TSE.addInstruction(Instr);
3483 EndOffset = Instr.Offset + Instr.Size;
3484 }
3485
3486 const MachineFunction *MF = MBB->getParent();
3487 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
3488 TSE.emitCode(
3489 InsertI, TFI, /*TryMergeSPUpdate = */
3490 !MF->getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(*MF));
3491
3492 return InsertI;
3493}
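// Example (illustrative): two mergeable 16-byte tag stores on adjacent slots,
//
//   STGi $sp, %stack.0, 0
//   STGi $sp, %stack.1, 0     ; %stack.1 lies 16 bytes above %stack.0
//
// are collected into Instrs, pass the overlap check, and (provided NZCV is
// not live at the insertion point) are re-emitted by TagStoreEdit as a single
// 32-byte ST2Gi.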
3494} // namespace
3495
3496 void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
3497 MachineFunction &MF, RegScavenger *RS = nullptr) const {
3498 for (auto &BB : MF)
3499 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
3501 II = tryMergeAdjacentSTG(II, this, RS);
3502 }
3503
3504 // By the time this method is called, most of the prologue/epilogue code is
3505 // already emitted, whether its location was affected by the shrink-wrapping
3506 // optimization or not.
3507 if (!MF.getFunction().hasFnAttribute(Attribute::Naked) &&
3508 shouldSignReturnAddressEverywhere(MF))
3509 emitPacRetPlusLeafHardening(MF);
3510}
3511
3512/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
3513/// before the update. This is easily retrieved as it is exactly the offset
3514 /// that is set in processFunctionBeforeFrameFinalized.
3515 StackOffset AArch64FrameLowering::getFrameIndexReferencePreferSP(
3516 const MachineFunction &MF, int FI, Register &FrameReg,
3517 bool IgnoreSPUpdates) const {
3518 const MachineFrameInfo &MFI = MF.getFrameInfo();
3519 if (IgnoreSPUpdates) {
3520 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
3521 << MFI.getObjectOffset(FI) << "\n");
3522 FrameReg = AArch64::SP;
3523 return StackOffset::getFixed(MFI.getObjectOffset(FI));
3524 }
3525
3526 // Go to common code if we cannot provide sp + offset.
3527 if (MFI.hasVarSizedObjects() ||
3530 return getFrameIndexReference(MF, FI, FrameReg);
3531
3532 FrameReg = AArch64::SP;
3533 return getStackOffset(MF, MFI.getObjectOffset(FI));
3534}
3535
3536/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
3537 /// the parent's frame pointer
3538 unsigned AArch64FrameLowering::getWinEHParentFrameOffset(
3539 const MachineFunction &MF) const {
3540 return 0;
3541}
3542
3543/// Funclets only need to account for space for the callee saved registers,
3544 /// as the locals are accounted for in the parent's stack frame.
3545 unsigned AArch64FrameLowering::getWinEHFuncletFrameSize(
3546 const MachineFunction &MF) const {
3547 // This is the size of the pushed CSRs.
3548 unsigned CSSize =
3549 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
3550 // This is the amount of stack a funclet needs to allocate.
3551 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
3552 getStackAlign());
3553}
3554
3555namespace {
3556struct FrameObject {
3557 bool IsValid = false;
3558 // Index of the object in MFI.
3559 int ObjectIndex = 0;
3560 // Group ID this object belongs to.
3561 int GroupIndex = -1;
3562 // This object should be placed first (closest to SP).
3563 bool ObjectFirst = false;
3564 // This object's group (which always contains the object with
3565 // ObjectFirst==true) should be placed first.
3566 bool GroupFirst = false;
3567
3568 // Used to distinguish between FP and GPR accesses. The values are decided so
3569 // that they sort FPR < Hazard < GPR and they can be or'd together.
3570 unsigned Accesses = 0;
3571 enum { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };
3572};
3573
3574class GroupBuilder {
3575 SmallVector<int, 8> CurrentMembers;
3576 int NextGroupIndex = 0;
3577 std::vector<FrameObject> &Objects;
3578
3579public:
3580 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
3581 void AddMember(int Index) { CurrentMembers.push_back(Index); }
3582 void EndCurrentGroup() {
3583 if (CurrentMembers.size() > 1) {
3584 // Create a new group with the current member list. This might remove them
3585 // from their pre-existing groups. That's OK, dealing with overlapping
3586 // groups is too hard and unlikely to make a difference.
3587 LLVM_DEBUG(dbgs() << "group:");
3588 for (int Index : CurrentMembers) {
3589 Objects[Index].GroupIndex = NextGroupIndex;
3590 LLVM_DEBUG(dbgs() << " " << Index);
3591 }
3592 LLVM_DEBUG(dbgs() << "\n");
3593 NextGroupIndex++;
3594 }
3595 CurrentMembers.clear();
3596 }
3597};
3598
3599bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
3600 // Objects at a lower index are closer to FP; objects at a higher index are
3601 // closer to SP.
3602 //
3603 // For consistency in our comparison, all invalid objects are placed
3604 // at the end. This also allows us to stop walking when we hit the
3605 // first invalid item after it's all sorted.
3606 //
3607 // If we want to include a stack hazard region, order FPR accesses < the
3608 // hazard object < GPRs accesses in order to create a separation between the
3609 // two. For the Accesses field 1 = FPR, 2 = Hazard Object, 4 = GPR.
3610 //
3611 // Otherwise the "first" object goes first (closest to SP), followed by the
3612 // members of the "first" group.
3613 //
3614 // The rest are sorted by the group index to keep the groups together.
3615 // Higher numbered groups are more likely to be around longer (i.e. untagged
3616 // in the function epilogue and not at some earlier point). Place them closer
3617 // to SP.
3618 //
3619 // If all else equal, sort by the object index to keep the objects in the
3620 // original order.
3621 return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
3622 A.GroupIndex, A.ObjectIndex) <
3623 std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
3624 B.GroupIndex, B.ObjectIndex);
3625}
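// Example (illustrative): with a hazard slot present, an FPR-accessed object
// (Accesses == 1), the hazard slot itself (Accesses == 2) and a GPR-accessed
// object (Accesses == 4) compare as FPR < Hazard < GPR, so the stable sort in
// orderFrameObjects() keeps all FPR accesses on one side of the hazard
// padding and all GPR accesses on the other.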
3626} // namespace
3627
3628 void AArch64FrameLowering::orderFrameObjects(
3629 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
3630 const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
3631
3632 if ((!OrderFrameObjects && !AFI.hasSplitSVEObjects()) ||
3633 ObjectsToAllocate.empty())
3634 return;
3635
3636 const MachineFrameInfo &MFI = MF.getFrameInfo();
3637 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
3638 for (auto &Obj : ObjectsToAllocate) {
3639 FrameObjects[Obj].IsValid = true;
3640 FrameObjects[Obj].ObjectIndex = Obj;
3641 }
3642
3643 // Identify FPR vs GPR slots for hazards, and stack slots that are tagged at
3644 // the same time.
3645 GroupBuilder GB(FrameObjects);
3646 for (auto &MBB : MF) {
3647 for (auto &MI : MBB) {
3648 if (MI.isDebugInstr())
3649 continue;
3650
3651 if (AFI.hasStackHazardSlotIndex()) {
3652 std::optional<int> FI = getLdStFrameID(MI, MFI);
3653 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
3654 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
3655 AArch64InstrInfo::isFpOrNEON(MI))
3656 FrameObjects[*FI].Accesses |= FrameObject::AccessFPR;
3657 else
3658 FrameObjects[*FI].Accesses |= FrameObject::AccessGPR;
3659 }
3660 }
3661
3662 int OpIndex;
3663 switch (MI.getOpcode()) {
3664 case AArch64::STGloop:
3665 case AArch64::STZGloop:
3666 OpIndex = 3;
3667 break;
3668 case AArch64::STGi:
3669 case AArch64::STZGi:
3670 case AArch64::ST2Gi:
3671 case AArch64::STZ2Gi:
3672 OpIndex = 1;
3673 break;
3674 default:
3675 OpIndex = -1;
3676 }
3677
3678 int TaggedFI = -1;
3679 if (OpIndex >= 0) {
3680 const MachineOperand &MO = MI.getOperand(OpIndex);
3681 if (MO.isFI()) {
3682 int FI = MO.getIndex();
3683 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
3684 FrameObjects[FI].IsValid)
3685 TaggedFI = FI;
3686 }
3687 }
3688
3689 // If this is a stack tagging instruction for a slot that is not part of a
3690 // group yet, either start a new group or add it to the current one.
3691 if (TaggedFI >= 0)
3692 GB.AddMember(TaggedFI);
3693 else
3694 GB.EndCurrentGroup();
3695 }
3696 // Groups should never span multiple basic blocks.
3697 GB.EndCurrentGroup();
3698 }
3699
3700 if (AFI.hasStackHazardSlotIndex()) {
3701 FrameObjects[AFI.getStackHazardSlotIndex()].Accesses =
3702 FrameObject::AccessHazard;
3703 // If a stack object is unknown or both GPR and FPR, sort it into GPR.
3704 for (auto &Obj : FrameObjects)
3705 if (!Obj.Accesses ||
3706 Obj.Accesses == (FrameObject::AccessGPR | FrameObject::AccessFPR))
3707 Obj.Accesses = FrameObject::AccessGPR;
3708 }
3709
3710 // If the function's tagged base pointer is pinned to a stack slot, we want to
3711 // put that slot first when possible. This will likely place it at SP + 0,
3712 // and save one instruction when generating the base pointer because IRG does
3713 // not allow an immediate offset.
3714 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
3715 if (TBPI) {
3716 FrameObjects[*TBPI].ObjectFirst = true;
3717 FrameObjects[*TBPI].GroupFirst = true;
3718 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
3719 if (FirstGroupIndex >= 0)
3720 for (FrameObject &Object : FrameObjects)
3721 if (Object.GroupIndex == FirstGroupIndex)
3722 Object.GroupFirst = true;
3723 }
3724
3725 llvm::stable_sort(FrameObjects, FrameObjectCompare);
3726
3727 int i = 0;
3728 for (auto &Obj : FrameObjects) {
3729 // All invalid items are sorted at the end, so it's safe to stop.
3730 if (!Obj.IsValid)
3731 break;
3732 ObjectsToAllocate[i++] = Obj.ObjectIndex;
3733 }
3734
3735 LLVM_DEBUG({
3736 dbgs() << "Final frame order:\n";
3737 for (auto &Obj : FrameObjects) {
3738 if (!Obj.IsValid)
3739 break;
3740 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
3741 if (Obj.ObjectFirst)
3742 dbgs() << ", first";
3743 if (Obj.GroupFirst)
3744 dbgs() << ", group-first";
3745 dbgs() << "\n";
3746 }
3747 });
3748}
3749
3750/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
3751/// least every ProbeSize bytes. Returns an iterator of the first instruction
3752/// after the loop. The difference between SP and TargetReg must be an exact
3753/// multiple of ProbeSize.
3755AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
3756 MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
3757 Register TargetReg) const {
3758 MachineBasicBlock &MBB = *MBBI->getParent();
3759 MachineFunction &MF = *MBB.getParent();
3760 const AArch64InstrInfo *TII =
3761 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3762 DebugLoc DL = MBB.findDebugLoc(MBBI);
3763
3764 MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
3765 MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
3766 MF.insert(MBBInsertPoint, LoopMBB);
3767 MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
3768 MF.insert(MBBInsertPoint, ExitMBB);
3769
3770 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
3771 // in SUB).
3772 emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
3773 StackOffset::getFixed(-ProbeSize), TII,
3774 MachineInstr::FrameSetup);
3775 // STR XZR, [SP]
3776 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::STRXui))
3777 .addReg(AArch64::XZR)
3778 .addReg(AArch64::SP)
3779 .addImm(0)
3780 .setMIFlags(MachineInstr::FrameSetup);
3781 // CMP SP, TargetReg
3782 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
3783 AArch64::XZR)
3784 .addReg(AArch64::SP)
3785 .addReg(TargetReg)
3786 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
3787 .setMIFlags(MachineInstr::FrameSetup);
3788 // B.CC Loop
3789 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
3790 .addImm(AArch64CC::NE)
3791 .addMBB(LoopMBB)
3792 .setMIFlags(MachineInstr::FrameSetup);
3793
3794 LoopMBB->addSuccessor(ExitMBB);
3795 LoopMBB->addSuccessor(LoopMBB);
3796 // Synthesize the exit MBB.
3797 ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
3798 ExitMBB->transferSuccessorsAndUpdatePHIs(&MBB);
3799 MBB.addSuccessor(LoopMBB);
3800 // Update liveins.
3801 fullyRecomputeLiveIns({ExitMBB, LoopMBB});
3802
3803 return ExitMBB->begin();
3804}
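// Emitted shape (illustrative), assuming ProbeSize == 4096:
//
//   LoopMBB:
//     SUB  sp, sp, #4096
//     STR  xzr, [sp]
//     SUBS xzr, sp, TargetReg        // CMP sp, TargetReg
//     B.NE LoopMBB
//   ExitMBB:
//     ...
//
// The loop is guaranteed to terminate because SP - TargetReg is an exact
// multiple of ProbeSize on entry (see the function comment above).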
3805
3806void AArch64FrameLowering::inlineStackProbeFixed(
3807 MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
3808 StackOffset CFAOffset) const {
3809 MachineBasicBlock *MBB = MBBI->getParent();
3810 MachineFunction &MF = *MBB->getParent();
3811 const AArch64InstrInfo *TII =
3812 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3813 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3814 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
3815 bool HasFP = hasFP(MF);
3816
3817 DebugLoc DL;
3818 int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
3819 int64_t NumBlocks = FrameSize / ProbeSize;
3820 int64_t ResidualSize = FrameSize % ProbeSize;
3821
3822 LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
3823 << NumBlocks << " blocks of " << ProbeSize
3824 << " bytes, plus " << ResidualSize << " bytes\n");
3825
3826 // Decrement SP by NumBlock * ProbeSize bytes, with either unrolled or
3827 // ordinary loop.
3828 if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
3829 for (int i = 0; i < NumBlocks; ++i) {
3830 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
3831 // encodable in a SUB).
3832 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
3833 StackOffset::getFixed(-ProbeSize), TII,
3834 MachineInstr::FrameSetup, false, false, nullptr,
3835 EmitAsyncCFI && !HasFP, CFAOffset);
3836 CFAOffset += StackOffset::getFixed(ProbeSize);
3837 // STR XZR, [SP]
3838 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
3839 .addReg(AArch64::XZR)
3840 .addReg(AArch64::SP)
3841 .addImm(0)
3842 .setMIFlags(MachineInstr::FrameSetup);
3843 }
3844 } else if (NumBlocks != 0) {
3845 // SUB ScratchReg, SP, #(ProbeSize * NumBlocks) (or equivalent if the offset
3846 // is not encodable in a SUB). ScratchReg may temporarily become the CFA register.
3847 emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
3848 StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
3849 MachineInstr::FrameSetup, false, false, nullptr,
3850 EmitAsyncCFI && !HasFP, CFAOffset);
3851 CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
3852 MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
3853 MBB = MBBI->getParent();
3854 if (EmitAsyncCFI && !HasFP) {
3855 // Set the CFA register back to SP.
3856 CFIInstBuilder(*MBB, MBBI, MachineInstr::FrameSetup)
3857 .buildDefCFARegister(AArch64::SP);
3858 }
3859 }
3860
3861 if (ResidualSize != 0) {
3862 // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
3863 // in SUB).
3864 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
3865 StackOffset::getFixed(-ResidualSize), TII,
3866 MachineInstr::FrameSetup, false, false, nullptr,
3867 EmitAsyncCFI && !HasFP, CFAOffset);
3868 if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
3869 // STR XZR, [SP]
3870 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
3871 .addReg(AArch64::XZR)
3872 .addReg(AArch64::SP)
3873 .addImm(0)
3874 .setMIFlags(MachineInstr::FrameSetup);
3875 }
3876 }
3877}
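// Worked example (illustrative), assuming ProbeSize == 4096: FrameSize == 9000
// gives NumBlocks == 2 and ResidualSize == 808. If NumBlocks is within
// StackProbeMaxLoopUnroll the two blocks are emitted as unrolled SUB/STR xzr
// pairs, otherwise as the probing loop above; SP is then dropped by the
// remaining 808 bytes, with a trailing probe only if the residual exceeds
// StackProbeMaxUnprobedStack.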
3878
3879void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
3880 MachineBasicBlock &MBB) const {
3881 // Get the instructions that need to be replaced. We emit at most two of
3882 // these. Remember them in order to avoid complications coming from the need
3883 // to traverse the block while potentially creating more blocks.
3884 SmallVector<MachineInstr *, 4> ToReplace;
3885 for (MachineInstr &MI : MBB)
3886 if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
3887 MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
3888 ToReplace.push_back(&MI);
3889
3890 for (MachineInstr *MI : ToReplace) {
3891 if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
3892 Register ScratchReg = MI->getOperand(0).getReg();
3893 int64_t FrameSize = MI->getOperand(1).getImm();
3894 StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
3895 MI->getOperand(3).getImm());
3896 inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
3897 CFAOffset);
3898 } else {
3899 assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
3900 "Stack probe pseudo-instruction expected");
3901 const AArch64InstrInfo *TII =
3902 MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
3903 Register TargetReg = MI->getOperand(0).getReg();
3904 (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
3905 }
3906 MI->eraseFromParent();
3907 }
3908}
3909
3910 struct StackAccess {
3911 enum AccessType {
3912 NotAccessed = 0, // Stack object not accessed by load/store instructions.
3913 GPR = 1 << 0, // A general purpose register.
3914 PPR = 1 << 1, // A predicate register.
3915 FPR = 1 << 2, // A floating point/Neon/SVE register.
3916 };
3917
3918 int Idx;
3919 StackOffset Offset;
3920 int64_t Size;
3921 unsigned AccessTypes;
3922
3923 StackAccess() : Idx(0), Size(0), AccessTypes(NotAccessed) {}
3924
3925 bool operator<(const StackAccess &Rhs) const {
3926 return std::make_tuple(start(), Idx) <
3927 std::make_tuple(Rhs.start(), Rhs.Idx);
3928 }
3929
3930 bool isCPU() const {
3931 // Predicate register load and store instructions execute on the CPU.
3932 return AccessTypes & (AccessType::GPR | AccessType::PPR);
3933 }
3934 bool isSME() const { return AccessTypes & AccessType::FPR; }
3935 bool isMixed() const { return isCPU() && isSME(); }
3936
3937 int64_t start() const { return Offset.getFixed() + Offset.getScalable(); }
3938 int64_t end() const { return start() + Size; }
3939
3940 std::string getTypeString() const {
3941 switch (AccessTypes) {
3942 case AccessType::FPR:
3943 return "FPR";
3944 case AccessType::PPR:
3945 return "PPR";
3946 case AccessType::GPR:
3947 return "GPR";
3948 case AccessType::NotAccessed:
3949 return "NA";
3950 default:
3951 return "Mixed";
3952 }
3953 }
3954
3955 void print(raw_ostream &OS) const {
3956 OS << getTypeString() << " stack object at [SP"
3957 << (Offset.getFixed() < 0 ? "" : "+") << Offset.getFixed();
3958 if (Offset.getScalable())
3959 OS << (Offset.getScalable() < 0 ? "" : "+") << Offset.getScalable()
3960 << " * vscale";
3961 OS << "]";
3962 }
3963};
3964
3965static inline raw_ostream &operator<<(raw_ostream &OS, const StackAccess &SA) {
3966 SA.print(OS);
3967 return OS;
3968}
3969
3970void AArch64FrameLowering::emitRemarks(
3971 const MachineFunction &MF, MachineOptimizationRemarkEmitter *ORE) const {
3972
3973 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3974 if (AFI->getSMEFnAttrs().hasNonStreamingInterfaceAndBody())
3975 return;
3976
3977 unsigned StackHazardSize = getStackHazardSize(MF);
3978 const uint64_t HazardSize =
3979 (StackHazardSize) ? StackHazardSize : StackHazardRemarkSize;
3980
3981 if (HazardSize == 0)
3982 return;
3983
3984 const MachineFrameInfo &MFI = MF.getFrameInfo();
3985 // Bail if function has no stack objects.
3986 if (!MFI.hasStackObjects())
3987 return;
3988
3989 std::vector<StackAccess> StackAccesses(MFI.getNumObjects());
3990
3991 size_t NumFPLdSt = 0;
3992 size_t NumNonFPLdSt = 0;
3993
3994 // Collect stack accesses via Load/Store instructions.
3995 for (const MachineBasicBlock &MBB : MF) {
3996 for (const MachineInstr &MI : MBB) {
3997 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
3998 continue;
3999 for (MachineMemOperand *MMO : MI.memoperands()) {
4000 std::optional<int> FI = getMMOFrameID(MMO, MFI);
4001 if (FI && !MFI.isDeadObjectIndex(*FI)) {
4002 int FrameIdx = *FI;
4003
4004 size_t ArrIdx = FrameIdx + MFI.getNumFixedObjects();
4005 if (StackAccesses[ArrIdx].AccessTypes == StackAccess::NotAccessed) {
4006 StackAccesses[ArrIdx].Idx = FrameIdx;
4007 StackAccesses[ArrIdx].Offset =
4008 getFrameIndexReferenceFromSP(MF, FrameIdx);
4009 StackAccesses[ArrIdx].Size = MFI.getObjectSize(FrameIdx);
4010 }
4011
4012 unsigned RegTy = StackAccess::AccessType::GPR;
4013 if (MFI.hasScalableStackID(FrameIdx))
4014 RegTy = isPPRAccess(MI) ? StackAccess::PPR : StackAccess::FPR;
4015 else if (AArch64InstrInfo::isFpOrNEON(MI))
4016 RegTy = StackAccess::FPR;
4017
4018 StackAccesses[ArrIdx].AccessTypes |= RegTy;
4019
4020 if (RegTy == StackAccess::FPR)
4021 ++NumFPLdSt;
4022 else
4023 ++NumNonFPLdSt;
4024 }
4025 }
4026 }
4027 }
4028
4029 if (NumFPLdSt == 0 || NumNonFPLdSt == 0)
4030 return;
4031
4032 llvm::sort(StackAccesses);
4033 llvm::erase_if(StackAccesses, [](const StackAccess &S) {
4034 return S.AccessTypes == StackAccess::NotAccessed;
4035 });
4036
4037 SmallVector<const StackAccess *> MixedObjects;
4038 SmallVector<std::pair<const StackAccess *, const StackAccess *>> HazardPairs;
4039
4040 if (StackAccesses.front().isMixed())
4041 MixedObjects.push_back(&StackAccesses.front());
4042
4043 for (auto It = StackAccesses.begin(), End = std::prev(StackAccesses.end());
4044 It != End; ++It) {
4045 const auto &First = *It;
4046 const auto &Second = *(It + 1);
4047
4048 if (Second.isMixed())
4049 MixedObjects.push_back(&Second);
4050
4051 if ((First.isSME() && Second.isCPU()) ||
4052 (First.isCPU() && Second.isSME())) {
4053 uint64_t Distance = static_cast<uint64_t>(Second.start() - First.end());
4054 if (Distance < HazardSize)
4055 HazardPairs.emplace_back(&First, &Second);
4056 }
4057 }
4058
4059 auto EmitRemark = [&](llvm::StringRef Str) {
4060 ORE->emit([&]() {
4061 auto R = MachineOptimizationRemarkAnalysis(
4062 "sme", "StackHazard", MF.getFunction().getSubprogram(), &MF.front());
4063 return R << formatv("stack hazard in '{0}': ", MF.getName()).str() << Str;
4064 });
4065 };
4066
4067 for (const auto &P : HazardPairs)
4068 EmitRemark(formatv("{0} is too close to {1}", *P.first, *P.second).str());
4069
4070 for (const auto *Obj : MixedObjects)
4071 EmitRemark(
4072 formatv("{0} accessed by both GP and FP instructions", *Obj).str());
4073}
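// Example remark (illustrative), as assembled from EmitRemark and
// StackAccess::print() for a function named "foo":
//
//   stack hazard in 'foo': GPR stack object at [SP+8] is too close to FPR
//   stack object at [SP+16]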
unsigned const MachineRegisterInfo * MRI
static void getLiveRegsForEntryMBB(LivePhysRegs &LiveRegs, const MachineBasicBlock &MBB)
static const unsigned DefaultSafeSPDisplacement
This is the biggest offset to the stack pointer we can encode in aarch64 instructions (without using ...
static bool produceCompactUnwindFrame(const AArch64FrameLowering &, MachineFunction &MF)
static cl::opt< bool > StackTaggingMergeSetTag("stack-tagging-merge-settag", cl::desc("merge settag instruction in function epilog"), cl::init(true), cl::Hidden)
bool enableMultiVectorSpillFill(const AArch64Subtarget &Subtarget, MachineFunction &MF)
static std::optional< int > getLdStFrameID(const MachineInstr &MI, const MachineFrameInfo &MFI)
static cl::opt< bool > SplitSVEObjects("aarch64-split-sve-objects", cl::desc("Split allocation of ZPR & PPR objects"), cl::init(true), cl::Hidden)
static cl::opt< bool > StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming", cl::init(false), cl::Hidden)
void computeCalleeSaveRegisterPairs(const AArch64FrameLowering &AFL, MachineFunction &MF, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI, SmallVectorImpl< RegPairInfo > &RegPairs, bool NeedsFrameRecord)
static cl::opt< bool > OrderFrameObjects("aarch64-order-frame-objects", cl::desc("sort stack allocations"), cl::init(true), cl::Hidden)
static bool invalidateRegisterPairing(bool SpillExtendedVolatile, unsigned SpillCount, unsigned Reg1, unsigned Reg2, bool UsesWinAAPCS, bool NeedsWinCFI, bool NeedsFrameRecord, bool IsFirst, const TargetRegisterInfo *TRI)
Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
static cl::opt< bool > DisableMultiVectorSpillFill("aarch64-disable-multivector-spill-fill", cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false), cl::Hidden)
static cl::opt< bool > EnableRedZone("aarch64-redzone", cl::desc("enable use of redzone on AArch64"), cl::init(false), cl::Hidden)
cl::opt< bool > EnableHomogeneousPrologEpilog("homogeneous-prolog-epilog", cl::Hidden, cl::desc("Emit homogeneous prologue and epilogue for the size " "optimization (default = off)"))
static bool isLikelyToHaveSVEStack(const AArch64FrameLowering &AFL, const MachineFunction &MF)
static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg)
static SVEStackSizes determineSVEStackSizes(MachineFunction &MF, AssignObjectOffsets AssignOffsets)
Process all the SVE stack objects and the SVE stack size and offsets for each object.
static bool isTargetWindows(const MachineFunction &MF)
static unsigned estimateRSStackSizeLimit(MachineFunction &MF)
Look at each instruction that references stack frames and return the stack size limit beyond which so...
static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI, int &Min, int &Max)
returns true if there are any SVE callee saves.
static cl::opt< unsigned > StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0), cl::Hidden)
static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE)
static unsigned getStackHazardSize(const MachineFunction &MF)
static bool invalidateWindowsRegisterPairing(bool SpillExtendedVolatile, unsigned SpillCount, unsigned Reg1, unsigned Reg2, bool NeedsWinCFI, bool IsFirst, const TargetRegisterInfo *TRI)
MCRegister findFreePredicateReg(BitVector &SavedRegs)
static bool isPPRAccess(const MachineInstr &MI)
static std::optional< int > getMMOFrameID(MachineMemOperand *MMO, const MachineFrameInfo &MFI)
assert(UImm &&(UImm !=~static_cast< T >(0)) &&"Invalid immediate!")
This file contains the declaration of the AArch64PrologueEmitter and AArch64EpilogueEmitter classes,...
static const int kSetTagLoopThreshold
MachineBasicBlock & MBB
MachineBasicBlock MachineBasicBlock::iterator DebugLoc DL
MachineBasicBlock MachineBasicBlock::iterator MBBI
This file contains the simple types necessary to represent the attributes associated with functions a...
#define CASE(ATTRNAME, AANAME,...)
static GCRegistry::Add< ErlangGC > A("erlang", "erlang-compatible garbage collector")
static GCRegistry::Add< CoreCLRGC > E("coreclr", "CoreCLR-compatible GC")
static GCRegistry::Add< OcamlGC > B("ocaml", "ocaml 3.10-compatible GC")
DXIL Forward Handle Accesses
const HexagonInstrInfo * TII
IRTranslator LLVM IR MI
static std::string getTypeString(Type *T)
Definition LLParser.cpp:67
This file implements the LivePhysRegs utility for tracking liveness of physical registers.
#define F(x, y, z)
Definition MD5.cpp:54
#define I(x, y, z)
Definition MD5.cpp:57
#define H(x, y, z)
Definition MD5.cpp:56
Register Reg
Register const TargetRegisterInfo * TRI
Promote Memory to Register
Definition Mem2Reg.cpp:110
uint64_t IntrinsicInst * II
#define P(N)
This file declares the machine register scavenger class.
unsigned OpIndex
static bool contains(SmallPtrSetImpl< ConstantExpr * > &Cache, ConstantExpr *Expr, Constant *C)
Definition Value.cpp:480
This file defines the make_scope_exit function, which executes user-defined cleanup logic at scope ex...
This file defines the SmallVector class.
#define LLVM_DEBUG(...)
Definition Debug.h:114
StackOffset getSVEStackSize(const MachineFunction &MF) const
Returns the size of the entire SVE stackframe (PPRs + ZPRs).
StackOffset getZPRStackSize(const MachineFunction &MF) const
Returns the size of the entire ZPR stackframe (calleesaves + spills).
void processFunctionBeforeFrameIndicesReplaced(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameIndicesReplaced - This method is called immediately before MO_FrameIndex op...
MachineBasicBlock::iterator eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const override
This method is called during prolog/epilog code insertion to eliminate call frame setup and destroy p...
bool canUseAsPrologue(const MachineBasicBlock &MBB) const override
Check whether or not the given MBB can be used as a prologue for the target.
bool enableStackSlotScavenging(const MachineFunction &MF) const override
Returns true if the stack slot holes in the fixed and callee-save stack area should be used when allo...
bool spillCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
spillCalleeSavedRegisters - Issues instruction(s) to spill all callee saved registers and returns tru...
bool restoreCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, MutableArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
restoreCalleeSavedRegisters - Issues instruction(s) to restore all callee saved registers and returns...
bool enableFullCFIFixup(const MachineFunction &MF) const override
enableFullCFIFixup - Returns true if we may need to fix the unwind information such that it is accura...
StackOffset getFrameIndexReferenceFromSP(const MachineFunction &MF, int FI) const override
getFrameIndexReferenceFromSP - This method returns the offset from the stack pointer to the slot of t...
bool enableCFIFixup(const MachineFunction &MF) const override
Returns true if we may need to fix the unwind information for the function.
StackOffset getNonLocalFrameIndexReference(const MachineFunction &MF, int FI) const override
getNonLocalFrameIndexReference - This method returns the offset used to reference a frame index locat...
TargetStackID::Value getStackIDForScalableVectors() const override
Returns the StackID that scalable vectors should be associated with.
bool hasFPImpl(const MachineFunction &MF) const override
hasFPImpl - Return true if the specified function should have a dedicated frame pointer register.
void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override
emitProlog/emitEpilog - These methods insert prolog and epilog code into the function.
void resetCFIToInitialState(MachineBasicBlock &MBB) const override
Emit CFI instructions that recreate the state of the unwind information upon function entry.
bool hasReservedCallFrame(const MachineFunction &MF) const override
hasReservedCallFrame - Under normal circumstances, when a frame pointer is not required,...
StackOffset resolveFrameOffsetReference(const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, TargetStackID::Value StackID, Register &FrameReg, bool PreferFP, bool ForSimm) const
bool canUseRedZone(const MachineFunction &MF) const
Can this function use the red zone for local allocations.
bool needsWinCFI(const MachineFunction &MF) const
bool isFPReserved(const MachineFunction &MF) const
Should the Frame Pointer be reserved for the current function?
void processFunctionBeforeFrameFinalized(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameFinalized - This method is called immediately before the specified function...
int getSEHFrameIndexOffset(const MachineFunction &MF, int FI) const
unsigned getWinEHFuncletFrameSize(const MachineFunction &MF) const
Funclets only need to account for space for the callee saved registers, as the locals are accounted f...
void orderFrameObjects(const MachineFunction &MF, SmallVectorImpl< int > &ObjectsToAllocate) const override
Order the symbols in the local stack frame.
void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override
StackOffset getPPRStackSize(const MachineFunction &MF) const
Returns the size of the entire PPR stackframe (calleesaves + spills + hazard padding).
void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS) const override
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
StackOffset getFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg) const override
getFrameIndexReference - Provide a base+offset reference to an FI slot for debug info.
bool assignCalleeSavedSpillSlots(MachineFunction &MF, const TargetRegisterInfo *TRI, std::vector< CalleeSavedInfo > &CSI, unsigned &MinCSFrameIndex, unsigned &MaxCSFrameIndex) const override
assignCalleeSavedSpillSlots - Allows target to override spill slot assignment logic.
StackOffset getFrameIndexReferencePreferSP(const MachineFunction &MF, int FI, Register &FrameReg, bool IgnoreSPUpdates) const override
For Win64 AArch64 EH, the offset to the Unwind object is from the SP before the update.
StackOffset resolveFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP, bool ForSimm) const
unsigned getWinEHParentFrameOffset(const MachineFunction &MF) const override
The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve the parent's frame pointer...
bool requiresSaveVG(const MachineFunction &MF) const
void emitPacRetPlusLeafHardening(MachineFunction &MF) const
Harden the entire function with pac-ret.
AArch64FunctionInfo - This class is derived from MachineFunctionInfo and contains private AArch64-spe...
unsigned getCalleeSavedStackSize(const MachineFrameInfo &MFI) const
void setCalleeSaveBaseToFrameRecordOffset(int Offset)
bool shouldSignReturnAddress(const MachineFunction &MF) const
void setStackSizeSVE(uint64_t ZPR, uint64_t PPR)
std::optional< int > getTaggedBasePointerIndex() const
bool needsDwarfUnwindInfo(const MachineFunction &MF) const
void setSVECalleeSavedStackSize(unsigned ZPR, unsigned PPR)
bool needsAsyncDwarfUnwindInfo(const MachineFunction &MF) const
static bool isTailCallReturnInst(const MachineInstr &MI)
Returns true if MI is one of the TCRETURN* instructions.
static bool isFpOrNEON(Register Reg)
Returns whether the physical register is FP or NEON.
const AArch64RegisterInfo * getRegisterInfo() const override
bool isNeonAvailable() const
Returns true if the target has NEON and the function at runtime is known to have NEON enabled (e....
const AArch64InstrInfo * getInstrInfo() const override
const AArch64TargetLowering * getTargetLowering() const override
bool isSVEorStreamingSVEAvailable() const
Returns true if the target has access to either the full range of SVE instructions,...
bool isStreaming() const
Returns true if the function has a streaming body.
bool hasInlineStackProbe(const MachineFunction &MF) const override
True if stack clash protection is enabled for this functions.
unsigned getRedZoneSize(const Function &F) const
ArrayRef - Represent a constant reference to an array (0 or more elements consecutively in memory),...
Definition ArrayRef.h:40
size_t size() const
size - Get the array size.
Definition ArrayRef.h:142
bool empty() const
empty - Check if the array is empty.
Definition ArrayRef.h:137
bool test(unsigned Idx) const
Definition BitVector.h:480
BitVector & reset()
Definition BitVector.h:411
size_type count() const
count - Returns the number of bits which are set.
Definition BitVector.h:181
BitVector & set()
Definition BitVector.h:370
iterator_range< const_set_bits_iterator > set_bits() const
Definition BitVector.h:159
size_type size() const
size - Returns the number of bits in this bitvector.
Definition BitVector.h:178
Helper class for creating CFI instructions and inserting them into MIR.
The CalleeSavedInfo class tracks the information need to locate where a callee saved register is in t...
A debug info location.
Definition DebugLoc.h:124
bool hasMinSize() const
Optimize this function for minimum size (-Oz).
Definition Function.h:703
CallingConv::ID getCallingConv() const
getCallingConv()/setCallingConv(CC) - These method get and set the calling convention of this functio...
Definition Function.h:270
AttributeList getAttributes() const
Return the attribute list for this Function.
Definition Function.h:352
bool isVarArg() const
isVarArg - Return true if this function takes a variable number of arguments.
Definition Function.h:227
bool hasFnAttribute(Attribute::AttrKind Kind) const
Return true if the function has the attribute.
Definition Function.cpp:730
A set of physical registers with utility functions to track liveness when walking backward/forward th...
bool usesWindowsCFI() const
Definition MCAsmInfo.h:652
Wrapper class representing physical registers. Should be passed by value.
Definition MCRegister.h:41
LLVM_ABI void transferSuccessorsAndUpdatePHIs(MachineBasicBlock *FromMBB)
Transfers all the successors, as in transferSuccessors, and update PHI operands in the successor bloc...
LLVM_ABI iterator getFirstTerminator()
Returns an iterator to the first terminator instruction of this basic block.
LLVM_ABI void addSuccessor(MachineBasicBlock *Succ, BranchProbability Prob=BranchProbability::getUnknown())
Add Succ as a successor of this MachineBasicBlock.
const MachineFunction * getParent() const
Return the MachineFunction containing this basic block.
reverse_iterator rbegin()
iterator insertAfter(iterator I, MachineInstr *MI)
Insert MI into the instruction list after I.
void splice(iterator Where, MachineBasicBlock *Other, iterator From)
Take an instruction from MBB 'Other' at the position From, and insert it into this MBB right before '...
MachineInstrBundleIterator< MachineInstr > iterator
The MachineFrameInfo class represents an abstract stack frame until prolog/epilog code is inserted.
LLVM_ABI int CreateFixedObject(uint64_t Size, int64_t SPOffset, bool IsImmutable, bool isAliased=false)
Create a new object at a fixed location on the stack.
bool hasVarSizedObjects() const
This method may be called any time after instruction selection is complete to determine if the stack ...
const AllocaInst * getObjectAllocation(int ObjectIdx) const
Return the underlying Alloca of the specified stack object if it exists.
LLVM_ABI int CreateStackObject(uint64_t Size, Align Alignment, bool isSpillSlot, const AllocaInst *Alloca=nullptr, uint8_t ID=0)
Create a new statically sized stack object, returning a nonnegative identifier to represent it.
bool hasCalls() const
Return true if the current function has any function calls.
bool isFrameAddressTaken() const
This method may be called any time after instruction selection is complete to determine if there is a...
void setObjectOffset(int ObjectIdx, int64_t SPOffset)
Set the stack frame offset of the specified object.
uint64_t getMaxCallFrameSize() const
Return the maximum size of a call frame that must be allocated for an outgoing function call.
bool hasPatchPoint() const
This method may be called any time after instruction selection is complete to determine if there is a...
bool hasScalableStackID(int ObjectIdx) const
int getStackProtectorIndex() const
Return the index for the stack protector object.
LLVM_ABI int CreateSpillStackObject(uint64_t Size, Align Alignment)
Create a new statically sized stack object that represents a spill slot, returning a nonnegative iden...
LLVM_ABI uint64_t estimateStackSize(const MachineFunction &MF) const
Estimate and return the size of the stack frame.
void setStackID(int ObjectIdx, uint8_t ID)
bool isCalleeSavedInfoValid() const
Has the callee saved info been calculated yet?
Align getObjectAlign(int ObjectIdx) const
Return the alignment of the specified stack object.
int64_t getObjectSize(int ObjectIdx) const
Return the size of the specified object.
bool isMaxCallFrameSizeComputed() const
bool hasStackMap() const
This method may be called any time after instruction selection is complete to determine if there is a...
const std::vector< CalleeSavedInfo > & getCalleeSavedInfo() const
Returns a reference to call saved info vector for the current function.
unsigned getNumObjects() const
Return the number of objects.
int getObjectIndexEnd() const
Return one past the maximum frame object index.
bool hasStackProtectorIndex() const
bool hasStackObjects() const
Return true if there are any stack objects in this function.
uint8_t getStackID(int ObjectIdx) const
unsigned getNumFixedObjects() const
Return the number of fixed objects.
int64_t getObjectOffset(int ObjectIdx) const
Return the assigned stack offset of the specified object from the incoming stack pointer.
int getObjectIndexBegin() const
Return the minimum frame object index.
void setObjectAlignment(int ObjectIdx, Align Alignment)
setObjectAlignment - Change the alignment of the specified stack object.
bool isDeadObjectIndex(int ObjectIdx) const
Returns true if the specified index corresponds to a dead object.
const WinEHFuncInfo * getWinEHFuncInfo() const
getWinEHFuncInfo - Return information about how the current function uses Windows exception handling.
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineFrameInfo & getFrameInfo()
getFrameInfo - Return the frame info object for the current function.
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
Function & getFunction()
Return the LLVM function that this machine code represents.
BasicBlockListType::iterator iterator
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
const MachineBasicBlock & front() const
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineInstr - Allocate a new MachineInstr.
void insert(iterator MBBI, MachineBasicBlock *MBB)
const TargetMachine & getTarget() const
getTarget - Return the target machine this machine code is compiled with
const MachineInstrBuilder & setMemRefs(ArrayRef< MachineMemOperand * > MMOs) const
const MachineInstrBuilder & addExternalSymbol(const char *FnName, unsigned TargetFlags=0) const
const MachineInstrBuilder & setMIFlag(MachineInstr::MIFlag Flag) const
const MachineInstrBuilder & addImm(int64_t Val) const
Add a new immediate operand.
const MachineInstrBuilder & addFrameIndex(int Idx) const
const MachineInstrBuilder & addRegMask(const uint32_t *Mask) const
const MachineInstrBuilder & addReg(Register RegNo, unsigned flags=0, unsigned SubReg=0) const
Add a new virtual register operand.
const MachineInstrBuilder & addMBB(MachineBasicBlock *MBB, unsigned TargetFlags=0) const
const MachineInstrBuilder & setMIFlags(unsigned Flags) const
const MachineInstrBuilder & addMemOperand(MachineMemOperand *MMO) const
const MachineInstrBuilder & addDef(Register RegNo, unsigned Flags=0, unsigned SubReg=0) const
Add a virtual register definition operand.
Representation of each machine instruction.
void setFlags(unsigned flags)
LLVM_ABI void eraseFromParent()
Unlink 'this' from the containing basic block and delete it.
uint32_t getFlags() const
Return the MI flags bitvector.
A description of a memory reference used in the backend.
const PseudoSourceValue * getPseudoValue() const
@ MOLoad
The memory access reads data.
@ MOStore
The memory access writes data.
const Value * getValue() const
Return the base address of the memory access.
MachineOperand class - Representation of each machine instruction operand.
int64_t getImm() const
bool isFI() const
isFI - Tests if this is a MO_FrameIndex operand.
LLVM_ABI void emit(DiagnosticInfoOptimizationBase &OptDiag)
Emit an optimization remark.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
LLVM_ABI Register createVirtualRegister(const TargetRegisterClass *RegClass, StringRef Name="")
createVirtualRegister - Create and return a new virtual register in the function with the specified r...
LLVM_ABI bool isLiveIn(Register Reg) const
LLVM_ABI const MCPhysReg * getCalleeSavedRegs() const
Returns list of callee saved registers.
LLVM_ABI bool isPhysRegUsed(MCRegister PhysReg, bool SkipRegMaskTest=false) const
Return true if the specified register is modified or read in this function.
MutableArrayRef - Represent a mutable reference to an array (0 or more elements consecutively in memo...
Definition ArrayRef.h:298
Wrapper class representing virtual and physical registers.
Definition Register.h:20
constexpr bool isValid() const
Definition Register.h:112
SMEAttrs is a utility class to parse the SME ACLE attributes on functions.
bool hasStreamingInterface() const
bool hasNonStreamingInterfaceAndBody() const
bool hasStreamingBody() const
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition SetVector.h:149
A SetVector that performs no allocations if smaller than a certain size.
Definition SetVector.h:337
This class consists of common code factored out of the SmallVector class to reduce code duplication b...
reference emplace_back(ArgTypes &&... Args)
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
void push_back(const T &Elt)
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
StackOffset holds a fixed and a scalable offset in bytes.
Definition TypeSize.h:30
int64_t getFixed() const
Returns the fixed component of the stack.
Definition TypeSize.h:46
int64_t getScalable() const
Returns the scalable component of the stack.
Definition TypeSize.h:49
static StackOffset get(int64_t Fixed, int64_t Scalable)
Definition TypeSize.h:41
static StackOffset getScalable(int64_t Scalable)
Definition TypeSize.h:40
static StackOffset getFixed(int64_t Fixed)
Definition TypeSize.h:39
bool hasFP(const MachineFunction &MF) const
hasFP - Return true if the specified function should have a dedicated frame pointer register.
virtual void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS=nullptr) const
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
int getOffsetOfLocalArea() const
getOffsetOfLocalArea - This method returns the offset of the local area from the stack pointer on ent...
Align getStackAlign() const
getStackAlignment - This method returns the number of bytes to which the stack pointer must be aligne...
StackDirection getStackGrowthDirection() const
getStackGrowthDirection - Return the direction the stack grows
virtual bool enableCFIFixup(const MachineFunction &MF) const
Returns true if we may need to fix the unwind information for the function.
TargetInstrInfo - Interface to description of machine instruction set.
Primary interface to the complete machine description for the target machine.
const Triple & getTargetTriple() const
TargetOptions Options
const MCAsmInfo * getMCAsmInfo() const
Return target specific asm information.
LLVM_ABI bool FramePointerIsReserved(const MachineFunction &MF) const
FramePointerIsReserved - This returns true if the frame pointer must always either point to a new fra...
LLVM_ABI bool DisableFramePointerElim(const MachineFunction &MF) const
DisableFramePointerElim - This returns true if frame pointer elimination optimization should be disab...
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
bool hasStackRealignment(const MachineFunction &MF) const
True if stack realignment is required and still possible.
virtual const TargetInstrInfo * getInstrInfo() const
virtual const TargetRegisterInfo * getRegisterInfo() const =0
Return the target's register information.
Triple - Helper class for working with autoconf configuration names.
Definition Triple.h:47
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition raw_ostream.h:53
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
static unsigned getShiftValue(unsigned Imm)
getShiftValue - Extract the shift value.
static unsigned getArithExtendImm(AArch64_AM::ShiftExtendType ET, unsigned Imm)
getArithExtendImm - Encode the extend type and shift amount for an arithmetic instruction: imm: 3-bit...
const unsigned StackProbeMaxLoopUnroll
Maximum number of iterations to unroll for a constant size probing loop.
const unsigned StackProbeMaxUnprobedStack
Maximum allowed number of unprobed bytes above SP at an ABI boundary.
constexpr char Align[]
Key for Kernel::Arg::Metadata::mAlign.
constexpr char Attrs[]
Key for Kernel::Metadata::mAttrs.
unsigned ID
LLVM IR allows arbitrary numbers to be used as calling convention identifiers.
Definition CallingConv.h:24
@ AArch64_SVE_VectorCall
Used between AArch64 SVE functions.
@ PreserveMost
Used for runtime calls that preserve most registers.
Definition CallingConv.h:63
@ CXX_FAST_TLS
Used for access functions.
Definition CallingConv.h:72
@ GHC
Used by the Glasgow Haskell Compiler (GHC).
Definition CallingConv.h:50
@ PreserveAll
Used for runtime calls that preserve (almost) all registers.
Definition CallingConv.h:66
@ Fast
Attempts to make calls as fast as possible (e.g. by passing things in registers).
Definition CallingConv.h:41
@ PreserveNone
Used for runtime calls that preserve no general registers.
Definition CallingConv.h:90
@ Win64
The C convention as implemented on Windows/x86-64 and AArch64.
@ SwiftTail
This follows the Swift calling convention in how arguments are passed but guarantees tail calls will ...
Definition CallingConv.h:87
@ C
The default llvm calling convention, compatible with C.
Definition CallingConv.h:34
@ Define
Register definition.
initializer< Ty > init(const Ty &Val)
NodeAddr< InstrNode * > Instr
Definition RDFGraph.h:389
BaseReg
Stack frame base register. Bit 0 of FREInfo.Info.
Definition SFrame.h:77
This is an optimization pass for GlobalISel generic memory operations.
@ Offset
Definition DWP.cpp:477
void stable_sort(R &&Range)
Definition STLExtras.h:2058
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
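A hedged sketch of the usual BuildMI pattern for emitting one machine instruction at an insertion point; the opcode, operands, and helper name (emitSmallStackAlloc) are illustrative and not lifted from this file.

#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
// Assumes the AArch64 backend headers are available for AArch64::SUBXri/SP.
static void emitSmallStackAlloc(llvm::MachineBasicBlock &MBB,
                                llvm::MachineBasicBlock::iterator MBBI,
                                const llvm::TargetInstrInfo &TII,
                                const llvm::DebugLoc &DL) {
  using namespace llvm;
  // SUB sp, sp, #16 : allocate 16 bytes, tagged as part of the prologue.
  BuildMI(MBB, MBBI, DL, TII.get(AArch64::SUBXri), AArch64::SP)
      .addReg(AArch64::SP)
      .addImm(16)
      .addImm(0) // no shift on the immediate
      .setMIFlag(MachineInstr::FrameSetup);
}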
int isAArch64FrameOffsetLegal(const MachineInstr &MI, StackOffset &Offset, bool *OutUseUnscaledOp=nullptr, unsigned *OutUnscaledOp=nullptr, int64_t *EmittableOffset=nullptr)
Check if the Offset is a valid frame offset for MI.
detail::scope_exit< std::decay_t< Callable > > make_scope_exit(Callable &&F)
Definition ScopeExit.h:59
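A small sketch of make_scope_exit: the callback runs when the enclosing scope ends, whichever return path is taken. The function and flag below are hypothetical.

#include "llvm/ADT/ScopeExit.h"

void scopeExitSketch(bool &InProgress) {
  InProgress = true;
  auto Done = llvm::make_scope_exit([&] { InProgress = false; });
  // ... work that may return early; the cleanup above still fires ...
}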
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:643
@ AArch64FrameOffsetCannotUpdate
Offset cannot apply.
auto dyn_cast_or_null(const Y &Val)
Definition Casting.h:753
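A brief sketch contrasting dyn_cast and dyn_cast_or_null; the IR instruction types are just examples.

#include "llvm/IR/Instructions.h"

void castSketch(llvm::Value *V) {
  // dyn_cast assumes V is non-null; it returns nullptr on a type mismatch.
  if (auto *LI = llvm::dyn_cast<llvm::LoadInst>(V))
    (void)LI;
  // dyn_cast_or_null additionally tolerates a null input.
  if (auto *SI = llvm::dyn_cast_or_null<llvm::StoreInst>(V))
    (void)SI;
}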
bool any_of(R &&range, UnaryPredicate P)
Provide wrappers to std::any_of which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1732
auto formatv(bool Validate, const char *Fmt, Ts &&...Vals)
auto reverse(ContainerTy &&C)
Definition STLExtras.h:406
void sort(IteratorTy Start, IteratorTy End)
Definition STLExtras.h:1622
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition Debug.cpp:207
void emitFrameOffset(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, unsigned DestReg, unsigned SrcReg, StackOffset Offset, const TargetInstrInfo *TII, MachineInstr::MIFlag=MachineInstr::NoFlags, bool SetNZCV=false, bool NeedsWinCFI=false, bool *HasWinCFI=nullptr, bool EmitCFAOffset=false, StackOffset InitialOffset={}, unsigned FrameReg=AArch64::SP)
emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg plus Offset.
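A hedged sketch of calling emitFrameOffset, with the signature documented above, to lower "sp = sp - 48" during frame setup; the wrapper name is hypothetical and the routine itself is declared in the AArch64 backend headers.

static void adjustSPBy48(llvm::MachineBasicBlock &MBB,
                         llvm::MachineBasicBlock::iterator MBBI,
                         const llvm::DebugLoc &DL,
                         const llvm::TargetInstrInfo *TII) {
  using namespace llvm;
  emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
                  StackOffset::getFixed(-48), TII,
                  MachineInstr::FrameSetup);
}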
LLVM_ABI void report_fatal_error(Error Err, bool gen_crash_diag=true)
Definition Error.cpp:167
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
@ LLVM_MARK_AS_BITMASK_ENUM
Definition ModRef.h:37
@ First
Helpers to iterate all locations in the MemoryEffectsBase class.
Definition ModRef.h:74
unsigned getDefRegState(bool B)
unsigned getKillRegState(bool B)
uint16_t MCPhysReg
An unsigned integer type large enough to represent all physical registers, but not necessarily virtua...
Definition MCRegister.h:21
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition Alignment.h:144
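A one-line sketch of alignTo using the Align type described further below; the numbers are arbitrary.

#include "llvm/Support/Alignment.h"

uint64_t Padded = llvm::alignTo(/*Size=*/20, llvm::Align(16)); // yields 32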
raw_ostream & operator<<(raw_ostream &OS, const APFixedPoint &FX)
auto find_if(R &&Range, UnaryPredicate P)
Provide wrappers to std::find_if which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1758
void erase_if(Container &C, UnaryPredicate P)
Provide a container algorithm similar to C++ Library Fundamentals v2's erase_if which is equivalent t...
Definition STLExtras.h:2120
bool is_contained(R &&Range, const E &Element)
Returns true if Element is found in Range.
Definition STLExtras.h:1897
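A short sketch exercising the range-based STLExtras helpers listed above (any_of, find_if, erase_if, is_contained); the data is illustrative.

#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"

void rangeHelpersSketch() {
  llvm::SmallVector<int, 8> Vals = {3, 1, 4, 1, 5};
  bool HasEven = llvm::any_of(Vals, [](int V) { return V % 2 == 0; });
  auto It = llvm::find_if(Vals, [](int V) { return V > 3; });
  llvm::erase_if(Vals, [](int V) { return V == 1; });  // drops both 1s
  bool HasFive = llvm::is_contained(Vals, 5);           // true
  (void)HasEven; (void)It; (void)HasFive;
}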
LLVM_ABI const Value * getUnderlyingObject(const Value *V, unsigned MaxLookup=MaxLookupSearchDepth)
This method strips off any GEP address adjustments, pointer casts or llvm.threadlocal....
void fullyRecomputeLiveIns(ArrayRef< MachineBasicBlock * > MBBs)
Convenience function for recomputing live-in's for a set of MBBs until the computation converges.
LLVM_ABI Printable printReg(Register Reg, const TargetRegisterInfo *TRI=nullptr, unsigned SubIdx=0, const MachineRegisterInfo *MRI=nullptr)
Prints virtual and physical registers with or without a TRI instance.
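A tiny sketch combining dbgs() and printReg for readable debug output; the message and helper name are hypothetical.

#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/Support/Debug.h"

static void debugPrintSpill(llvm::Register Reg,
                            const llvm::TargetRegisterInfo *TRI) {
  llvm::dbgs() << "spilling " << llvm::printReg(Reg, TRI) << '\n';
}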
void swap(llvm::BitVector &LHS, llvm::BitVector &RHS)
Implement std::swap in terms of BitVector swap.
Definition BitVector.h:869
bool operator<(const StackAccess &Rhs) const
void print(raw_ostream &OS) const
int64_t start() const
std::string getTypeString() const
int64_t end() const
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77
Pair of physical register and lane mask.
static LLVM_ABI MachinePointerInfo getFixedStack(MachineFunction &MF, int FI, int64_t Offset=0)
Return a MachinePointerInfo record that refers to the specified FrameIndex.
SmallVector< WinEHTryBlockMapEntry, 4 > TryBlockMap
SmallVector< WinEHHandlerType, 1 > HandlerArray