1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function so that the particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) isn't created until the main
33// function body, after the prologue has run. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | (Win64 only) callee-saved SVE reg |
48// | |
49// |-----------------------------------|
50// | |
51// | callee-saved gpr registers | <--.
52// | | | On Darwin platforms these
53// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
54// | prev_lr | | (frame record first)
55// | prev_fp | <--'
56// | async context if needed |
57// | (a.k.a. "frame record") |
58// |-----------------------------------| <- fp(=x29)
59// Default SVE stack layout Split SVE objects
60// (aarch64-split-sve-objects=false) (aarch64-split-sve-objects=true)
61// |-----------------------------------| |-----------------------------------|
62// | <hazard padding> | | callee-saved PPR registers |
63// |-----------------------------------| |-----------------------------------|
64// | | | PPR stack objects |
65// | callee-saved fp/simd/SVE regs | |-----------------------------------|
66// | | | <hazard padding> |
67// |-----------------------------------| |-----------------------------------|
68// | | | callee-saved ZPR/FPR registers |
69// | SVE stack objects | |-----------------------------------|
70// | | | ZPR stack objects |
71// |-----------------------------------| |-----------------------------------|
72// ^ NB: FPR CSRs are promoted to ZPRs
73// |-----------------------------------|
74// |.empty.space.to.make.part.below....|
75// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
76// |.the.standard.16-byte.alignment....| compile time; if present)
77// |-----------------------------------|
78// | local variables of fixed size |
79// | including spill slots |
80// | <FPR> |
81// | <hazard padding> |
82// | <GPR> |
83// |-----------------------------------| <- bp(not defined by ABI,
84// |.variable-sized.local.variables....| LLVM chooses X19)
85// |.(VLAs)............................| (size of this area is unknown at
86// |...................................| compile time)
87// |-----------------------------------| <- sp
88// | | Lower address
89//
90//
91// To access data in a frame, a constant offset from one of the pointers
92// (fp, bp, sp) to the data must be computable at compile time. The size of
93// the areas with a dotted background cannot be computed at compile time if
94// they are present, so all three of fp, bp and sp must be set up in order to
95// access all contents in the frame areas, assuming all of the frame areas
96// are non-empty.
97//
98// For most functions, some of the frame areas are empty. For those functions,
99// it may not be necessary to set up fp or bp:
100// * A base pointer is definitely needed when there are both VLAs and local
101// variables with more-than-default alignment requirements.
102// * A frame pointer is definitely needed when there are local variables with
103// more-than-default alignment requirements.
104//
105// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
106// callee-saved area, since the unwind encoding does not allow for encoding
107// this dynamically and existing tools depend on this layout. For other
108// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
109// area to allow SVE stack objects (allocated directly below the callee-saves,
110// if available) to be accessed directly from the framepointer.
111// The SVE spill/fill instructions have VL-scaled addressing modes such
112// as:
113// ldr z8, [fp, #-7 mul vl]
114// For SVE the size of the vector length (VL) is not known at compile-time, so
115// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
116// layout, we don't need to add an unscaled offset to the framepointer before
117// accessing the SVE object in the frame.
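// For example (illustrative): with 256-bit vectors (VL = 32 bytes), the
// '#-7 mul vl' offset above resolves to fp - 224 bytes at runtime; with
// 512-bit vectors (VL = 64 bytes) it resolves to fp - 448 bytes.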
118//
119// In some cases when a base pointer is not strictly needed, it is generated
120// anyway when offsets from the frame pointer to access local variables become
121// so large that the offset can't be encoded in the immediate fields of loads
122// or stores.
123//
124// Outgoing function arguments must be at the bottom of the stack frame when
125// calling another function. If we do not have variable-sized stack objects, we
126// can allocate a "reserved call frame" area at the bottom of the local
127// variable area, large enough for all outgoing calls. If we do have VLAs, then
128// the stack pointer must be decremented and incremented around each call to
129// make space for the arguments below the VLAs.
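// As an illustrative sketch (not literal generated code), a call made in the
// presence of VLAs with 32 bytes of outgoing stack arguments looks like:
//   sub sp, sp, #32     // make space for the outgoing arguments
//   ...                 // store the arguments at [sp, #0 .. #31]
//   bl  callee
//   add sp, sp, #32     // release the space again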
130//
131// FIXME: also explain the redzone concept.
132//
133// About stack hazards: Under some SME contexts, a coprocessor with its own
134// separate cache can be used for FP operations. This can create hazards if the CPU
135// and the SME unit try to access the same area of memory, including if the
136// access is to an area of the stack. To try to alleviate this we attempt to
137// introduce extra padding into the stack frame between FP and GPR accesses,
138// controlled by the aarch64-stack-hazard-size option. Without changing the
139// layout of the stack frame in the diagram above, a stack object of size
140// aarch64-stack-hazard-size is added between GPR and FPR CSRs. Another is added
141// to the stack objects section, and stack objects are sorted so that FPR >
142// Hazard padding slot > GPRs (where possible). Unfortunately some things are
143// not handled well (VLA area, arguments on the stack, objects with both GPR and
144// FPR accesses), but if those are controlled by the user then the entire stack
145// frame becomes GPR at the start/end with FPR in the middle, surrounded by
146// Hazard padding.
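// As an illustrative example, with -aarch64-stack-hazard-size=1024 the
// callee-save area becomes [GPR CSRs][1024 bytes of padding][FPR CSRs], and
// the local-variable area is sorted towards [FPR objects][1024 bytes of
// padding][GPR objects], so that, where possible, GPR (CPU) and FPR (SME
// unit) accesses land in regions separated by at least the hazard size.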
147//
148// An example of the prologue:
149//
150// .globl __foo
151// .align 2
152// __foo:
153// Ltmp0:
154// .cfi_startproc
155// .cfi_personality 155, ___gxx_personality_v0
156// Leh_func_begin:
157// .cfi_lsda 16, Lexception33
158//
159// stp xa,bx, [sp, -#offset]!
160// ...
161// stp x28, x27, [sp, #offset-32]
162// stp fp, lr, [sp, #offset-16]
163// add fp, sp, #offset - 16
164// sub sp, sp, #1360
165//
166// The Stack:
167// +-------------------------------------------+
168// 10000 | ........ | ........ | ........ | ........ |
169// 10004 | ........ | ........ | ........ | ........ |
170// +-------------------------------------------+
171// 10008 | ........ | ........ | ........ | ........ |
172// 1000c | ........ | ........ | ........ | ........ |
173// +===========================================+
174// 10010 | X28 Register |
175// 10014 | X28 Register |
176// +-------------------------------------------+
177// 10018 | X27 Register |
178// 1001c | X27 Register |
179// +===========================================+
180// 10020 | Frame Pointer |
181// 10024 | Frame Pointer |
182// +-------------------------------------------+
183// 10028 | Link Register |
184// 1002c | Link Register |
185// +===========================================+
186// 10030 | ........ | ........ | ........ | ........ |
187// 10034 | ........ | ........ | ........ | ........ |
188// +-------------------------------------------+
189// 10038 | ........ | ........ | ........ | ........ |
190// 1003c | ........ | ........ | ........ | ........ |
191// +-------------------------------------------+
192//
193// [sp] = 10030 :: >>initial value<<
194// sp = 10020 :: stp fp, lr, [sp, #-16]!
195// fp = sp == 10020 :: mov fp, sp
196// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
197// sp == 10010 :: >>final value<<
198//
199// The frame pointer (w29) points to address 10020. If we use an offset of
200// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
201// for w27, and -32 for w28:
202//
203// Ltmp1:
204// .cfi_def_cfa w29, 16
205// Ltmp2:
206// .cfi_offset w30, -8
207// Ltmp3:
208// .cfi_offset w29, -16
209// Ltmp4:
210// .cfi_offset w27, -24
211// Ltmp5:
212// .cfi_offset w28, -32
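// Working through the arithmetic: the CFA is w29 + 16 = 10030, so w30 (saved
// at 10028) is at CFA - 8, w29 (at 10020) is at CFA - 16, w27 (at 10018) is
// at CFA - 24, and w28 (at 10010) is at CFA - 32, matching the directives
// above.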
213//
214//===----------------------------------------------------------------------===//
215
216#include "AArch64FrameLowering.h"
217#include "AArch64InstrInfo.h"
220#include "AArch64RegisterInfo.h"
221#include "AArch64Subtarget.h"
225#include "llvm/ADT/ScopeExit.h"
226#include "llvm/ADT/SmallVector.h"
244#include "llvm/IR/Attributes.h"
245#include "llvm/IR/CallingConv.h"
246#include "llvm/IR/DataLayout.h"
247#include "llvm/IR/DebugLoc.h"
248#include "llvm/IR/Function.h"
249#include "llvm/MC/MCAsmInfo.h"
250#include "llvm/MC/MCDwarf.h"
252#include "llvm/Support/Debug.h"
259#include <cassert>
260#include <cstdint>
261#include <iterator>
262#include <optional>
263#include <vector>
264
265using namespace llvm;
266
267#define DEBUG_TYPE "frame-info"
268
269static cl::opt<bool> EnableRedZone("aarch64-redzone",
270 cl::desc("enable use of redzone on AArch64"),
271 cl::init(false), cl::Hidden);
272
274 "stack-tagging-merge-settag",
275 cl::desc("merge settag instruction in function epilog"), cl::init(true),
276 cl::Hidden);
277
278static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
279 cl::desc("sort stack allocations"),
280 cl::init(true), cl::Hidden);
281
282static cl::opt<bool>
283 SplitSVEObjects("aarch64-split-sve-objects",
284 cl::desc("Split allocation of ZPR & PPR objects"),
285 cl::init(true), cl::Hidden);
286
288 "homogeneous-prolog-epilog", cl::Hidden,
289 cl::desc("Emit homogeneous prologue and epilogue for the size "
290 "optimization (default = off)"));
291
292// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
293static cl::opt<unsigned>
294 StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
295 cl::Hidden);
296// Whether to insert padding into non-streaming functions (for testing).
297static cl::opt<bool>
298 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
299 cl::init(false), cl::Hidden);
300
302 "aarch64-disable-multivector-spill-fill",
303 cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false),
304 cl::Hidden);
305
306int64_t
307AArch64FrameLowering::getArgumentStackToRestore(MachineFunction &MF,
308 MachineBasicBlock &MBB) const {
309 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
310 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
311 bool IsTailCallReturn = (MBB.end() != MBBI)
312 ? AArch64InstrInfo::isTailCallReturnInst(*MBBI)
313 : false;
314
315 int64_t ArgumentPopSize = 0;
316 if (IsTailCallReturn) {
317 MachineOperand &StackAdjust = MBBI->getOperand(1);
318
319 // For a tail-call in a callee-pops-arguments environment, some or all of
320 // the stack may actually be in use for the call's arguments, this is
321 // calculated during LowerCall and consumed here...
322 ArgumentPopSize = StackAdjust.getImm();
323 } else {
324 // ... otherwise the amount to pop is *all* of the argument space,
325 // conveniently stored in the MachineFunctionInfo by
326 // LowerFormalArguments. This will, of course, be zero for the C calling
327 // convention.
328 ArgumentPopSize = AFI->getArgumentStackToRestore();
329 }
330
331 return ArgumentPopSize;
332}
333
334static bool produceCompactUnwindFrame(const AArch64FrameLowering &AFL,
335 MachineFunction &MF);
336
337enum class AssignObjectOffsets { No, Yes };
338/// Process all the SVE stack objects and determine the SVE stack size and offsets for
339/// each object. If AssignOffsets is "Yes", the offsets get assigned (and SVE
340/// stack sizes set). Returns the size of the SVE stack.
342 AssignObjectOffsets AssignOffsets);
343
344static unsigned getStackHazardSize(const MachineFunction &MF) {
345 return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
346}
347
353
356 // With split SVE objects, the hazard padding is added to the PPR region,
357 // which places it between the [GPR, PPR] area and the [ZPR, FPR] area. This
358 // avoids hazards between both GPRs and FPRs and ZPRs and PPRs.
361 : 0,
362 AFI->getStackSizePPR());
363}
364
365// Conservatively, returns true if the function is likely to have SVE vectors
366// on the stack. This function is safe to be called before callee-saves or
367// object offsets have been determined.
368static bool isLikelyToHaveSVEStack(const AArch64FrameLowering &AFL,
369 const MachineFunction &MF) {
370 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
371 if (AFI->isSVECC())
372 return true;
373
374 if (AFI->hasCalculatedStackSizeSVE())
375 return bool(AFL.getSVEStackSize(MF));
376
377 const MachineFrameInfo &MFI = MF.getFrameInfo();
378 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd(); FI++) {
379 if (MFI.hasScalableStackID(FI))
380 return true;
381 }
382
383 return false;
384}
385
386/// Returns true if homogeneous prolog or epilog code can be emitted
387/// for the size optimization. If possible, a frame helper call is injected.
388/// When an Exit block is given, this check is for the epilog.
389bool AArch64FrameLowering::homogeneousPrologEpilog(
390 MachineFunction &MF, MachineBasicBlock *Exit) const {
391 if (!MF.getFunction().hasMinSize())
392 return false;
393 if (!EnableHomogeneousPrologEpilog)
394 return false;
395 if (EnableRedZone)
396 return false;
397
398 // TODO: Windows is not supported yet.
399 if (needsWinCFI(MF))
400 return false;
401
402 // TODO: SVE is not supported yet.
403 if (isLikelyToHaveSVEStack(*this, MF))
404 return false;
405
406 // Bail on stack adjustment needed on return for simplicity.
407 const MachineFrameInfo &MFI = MF.getFrameInfo();
408 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
409 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
410 return false;
411 if (Exit && getArgumentStackToRestore(MF, *Exit))
412 return false;
413
414 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
416 return false;
417
418 // If there are an odd number of GPRs before LR and FP in the CSRs list,
419 // they will not be paired into one RegPairInfo, which is incompatible with
420 // the assumption made by the homogeneous prolog epilog pass.
421 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
422 unsigned NumGPRs = 0;
423 for (unsigned I = 0; CSRegs[I]; ++I) {
424 Register Reg = CSRegs[I];
425 if (Reg == AArch64::LR) {
426 assert(CSRegs[I + 1] == AArch64::FP);
427 if (NumGPRs % 2 != 0)
428 return false;
429 break;
430 }
431 if (AArch64::GPR64RegClass.contains(Reg))
432 ++NumGPRs;
433 }
434
435 return true;
436}
437
438/// Returns true if CSRs should be paired.
439bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
440 return produceCompactUnwindFrame(*this, MF) || homogeneousPrologEpilog(MF);
441}
442
443/// This is the biggest offset to the stack pointer we can encode in aarch64
444/// instructions (without using a separate calculation and a temp register).
445/// Note that the exceptions here are vector stores/loads, which cannot encode
446/// any displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
447static const unsigned DefaultSafeSPDisplacement = 255;
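// Note: 255 corresponds to the upper bound of the signed 9-bit offset range
// ([-256, 255]) of the unscaled LDUR/STUR forms, which is the conservative
// "unscaled indexing range" used by estimateRSStackSizeLimit() below.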
448
449/// Look at each instruction that references stack frames and return the stack
450/// size limit beyond which some of these instructions will require a scratch
451/// register during their expansion later.
453 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
454 // range. We'll end up allocating an unnecessary spill slot a lot, but
455 // realistically that's not a big deal at this stage of the game.
456 for (MachineBasicBlock &MBB : MF) {
457 for (MachineInstr &MI : MBB) {
458 if (MI.isDebugInstr() || MI.isPseudo() ||
459 MI.getOpcode() == AArch64::ADDXri ||
460 MI.getOpcode() == AArch64::ADDSXri)
461 continue;
462
463 for (const MachineOperand &MO : MI.operands()) {
464 if (!MO.isFI())
465 continue;
466
468 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
470 return 0;
471 }
472 }
473 }
474 return DefaultSafeSPDisplacement;
475}
476
481
482unsigned
483AArch64FrameLowering::getFixedObjectSize(const MachineFunction &MF,
484 const AArch64FunctionInfo *AFI,
485 bool IsWin64, bool IsFunclet) const {
486 assert(AFI->getTailCallReservedStack() % 16 == 0 &&
487 "Tail call reserved stack must be aligned to 16 bytes");
488 if (!IsWin64 || IsFunclet) {
489 return AFI->getTailCallReservedStack();
490 } else {
491 if (AFI->getTailCallReservedStack() != 0 &&
492 !MF.getFunction().getAttributes().hasAttrSomewhere(
493 Attribute::SwiftAsync))
494 report_fatal_error("cannot generate ABI-changing tail call for Win64");
495 unsigned FixedObjectSize = AFI->getTailCallReservedStack();
496
497 // Var args are stored here in the primary function.
498 FixedObjectSize += AFI->getVarArgsGPRSize();
499
500 if (MF.hasEHFunclets()) {
501 // Catch objects are stored here in the primary function.
502 const MachineFrameInfo &MFI = MF.getFrameInfo();
503 const WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
504 SmallSetVector<int, 8> CatchObjFrameIndices;
505 for (const WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
506 for (const WinEHHandlerType &H : TBME.HandlerArray) {
507 int FrameIndex = H.CatchObj.FrameIndex;
508 if ((FrameIndex != INT_MAX) &&
509 CatchObjFrameIndices.insert(FrameIndex)) {
510 FixedObjectSize = alignTo(FixedObjectSize,
511 MFI.getObjectAlign(FrameIndex).value()) +
512 MFI.getObjectSize(FrameIndex);
513 }
514 }
515 }
516 // To support EH funclets we allocate an UnwindHelp object
517 FixedObjectSize += 8;
518 }
519 return alignTo(FixedObjectSize, 16);
520 }
521}
522
523bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const {
524 if (!EnableRedZone)
525 return false;
526
527 // Don't use the red zone if the function explicitly asks us not to.
528 // This is typically used for kernel code.
529 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
530 const unsigned RedZoneSize =
532 if (!RedZoneSize)
533 return false;
534
535 const MachineFrameInfo &MFI = MF.getFrameInfo();
536 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
537 uint64_t NumBytes = AFI->getLocalStackSize();
538
539 // If neither NEON or SVE are available, a COPY from one Q-reg to
540 // another requires a spill -> reload sequence. We can do that
541 // using a pre-decrementing store/post-decrementing load, but
542 // if we do so, we can't use the Red Zone.
543 bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
544 !Subtarget.isNeonAvailable() &&
545 !Subtarget.hasSVE();
546
547 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
548 AFI->hasSVEStackSize() || LowerQRegCopyThroughMem);
549}
550
551/// hasFPImpl - Return true if the specified function should have a dedicated
552/// frame pointer register.
553bool AArch64FrameLowering::hasFPImpl(const MachineFunction &MF) const {
554 const MachineFrameInfo &MFI = MF.getFrameInfo();
555 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
556 const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
557
558 // Win64 EH requires a frame pointer if funclets are present, as the locals
559 // are accessed off the frame pointer in both the parent function and the
560 // funclets.
561 if (MF.hasEHFunclets())
562 return true;
563 // Retain behavior of always omitting the FP for leaf functions when possible.
565 return true;
566 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
567 MFI.hasStackMap() || MFI.hasPatchPoint() ||
568 RegInfo->hasStackRealignment(MF))
569 return true;
570
571 // If we:
572 //
573 // 1. Have streaming mode changes
574 // OR:
575 // 2. Have a streaming body with SVE stack objects
576 //
577 // Then the value of VG restored when unwinding to this function may not match
578 // the value of VG used to set up the stack.
579 //
580 // This is a problem as the CFA can be described with an expression of the
581 // form: CFA = SP + NumBytes + VG * NumScalableBytes.
582 //
583 // If the value of VG used in that expression does not match the value used to
584 // set up the stack, an incorrect address for the CFA will be computed, and
585 // unwinding will fail.
586 //
587 // We work around this issue by ensuring the frame-pointer can describe the
588 // CFA in either of these cases.
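 // Illustrative example: if VG was 4 (256-bit vectors) when the stack was set
 // up, but the unwinder restores VG = 2 (128-bit vectors) for this frame, an
 // expression such as CFA = SP + 64 + VG * 32 would compute an address that is
 // 64 bytes too low.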
589 if (AFI.needsDwarfUnwindInfo(MF) &&
592 return true;
593 // With large callframes around we may need to use FP to access the scavenging
594 // emergency spillslot.
595 //
596 // Unfortunately some calls to hasFP() like machine verifier ->
597 // getReservedReg() -> hasFP in the middle of global isel are too early
598 // to know the max call frame size. Hopefully conservatively returning "true"
599 // in those cases is fine.
600 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
601 if (!MFI.isMaxCallFrameSizeComputed() ||
602 MFI.getMaxCallFrameSize() > DefaultSafeSPDisplacement)
603 return true;
604
605 return false;
606}
607
608/// Should the Frame Pointer be reserved for the current function?
610 const TargetMachine &TM = MF.getTarget();
611 const Triple &TT = TM.getTargetTriple();
612
613 // These OSes require the frame chain is valid, even if the current frame does
614 // not use a frame pointer.
615 if (TT.isOSDarwin() || TT.isOSWindows())
616 return true;
617
618 // If the function has a frame pointer, it is reserved.
619 if (hasFP(MF))
620 return true;
621
622 // Frontend has requested to preserve the frame pointer.
623 if (TM.Options.FramePointerIsReserved(MF))
624 return true;
625
626 return false;
627}
628
629/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
630/// not required, we reserve argument space for call sites in the function
631/// immediately on entry to the current function. This eliminates the need for
632/// add/sub sp brackets around call sites. Returns true if the call frame is
633/// included as part of the stack frame.
634bool AArch64FrameLowering::hasReservedCallFrame(
635 const MachineFunction &MF) const {
636 // The stack probing code for the dynamically allocated outgoing arguments
637 // area assumes that the stack is probed at the top - either by the prologue
638 // code, which issues a probe if `hasVarSizedObjects` returns true, or by the
639 // most recent variable-sized object allocation. Changing the condition here
640 // may need to be followed up by changes to the probe issuing logic.
641 return !MF.getFrameInfo().hasVarSizedObjects();
642}
643
647
648 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
649 const AArch64InstrInfo *TII = Subtarget.getInstrInfo();
650 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
651 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
652 DebugLoc DL = I->getDebugLoc();
653 unsigned Opc = I->getOpcode();
654 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
655 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
656
657 if (!hasReservedCallFrame(MF)) {
658 int64_t Amount = I->getOperand(0).getImm();
659 Amount = alignTo(Amount, getStackAlign());
660 if (!IsDestroy)
661 Amount = -Amount;
662
663 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
664 // doesn't have to pop anything), then the first operand will be zero too so
665 // this adjustment is a no-op.
666 if (CalleePopAmount == 0) {
667 // FIXME: in-function stack adjustment for calls is limited to 24-bits
668 // because there's no guaranteed temporary register available.
669 //
670 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
671 // 1) For offset <= 12-bit, we use LSL #0
672 // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
673 // LSL #0, and the other uses LSL #12.
674 //
675 // Most call frames will be allocated at the start of a function so
676 // this is OK, but it is a limitation that needs dealing with.
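 // Illustrative example: an adjustment of 0x123456 bytes could be emitted as
 //   sub sp, sp, #0x123, lsl #12   // subtracts 0x123000
 //   sub sp, sp, #0x456            // subtracts the remaining 0x456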
677 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
678
679 if (TLI->hasInlineStackProbe(MF) &&
681 // When stack probing is enabled, the decrement of SP may need to be
682 // probed. We only need to do this if the call site needs 1024 bytes of
683 // space or more, because a region smaller than that is allowed to be
684 // unprobed at an ABI boundary. We rely on the fact that SP has been
685 // probed exactly at this point, either by the prologue or most recent
686 // dynamic allocation.
688 "non-reserved call frame without var sized objects?");
689 Register ScratchReg =
690 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
691 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
692 } else {
693 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
694 StackOffset::getFixed(Amount), TII);
695 }
696 }
697 } else if (CalleePopAmount != 0) {
698 // If the calling convention demands that the callee pops arguments from the
699 // stack, we want to add it back if we have a reserved call frame.
700 assert(CalleePopAmount < 0xffffff && "call frame too large");
701 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
702 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
703 }
704 return MBB.erase(I);
705}
706
707void AArch64FrameLowering::resetCFIToInitialState(
708 MachineBasicBlock &MBB) const {
709
710 MachineFunction &MF = *MBB.getParent();
711 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
712 const auto &TRI = *Subtarget.getRegisterInfo();
713 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
714
715 CFIInstBuilder CFIBuilder(MBB, MBB.begin(), MachineInstr::NoFlags);
716
717 // Reset the CFA to `SP + 0`.
718 CFIBuilder.buildDefCFA(AArch64::SP, 0);
719
720 // Flip the RA sign state.
721 if (MFI.shouldSignReturnAddress(MF))
722 MFI.branchProtectionPAuthLR() ? CFIBuilder.buildNegateRAStateWithPC()
723 : CFIBuilder.buildNegateRAState();
724
725 // Shadow call stack uses X18, reset it.
726 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
727 CFIBuilder.buildSameValue(AArch64::X18);
728
729 // Emit .cfi_same_value for callee-saved registers.
730 const std::vector<CalleeSavedInfo> &CSI =
731 MF.getFrameInfo().getCalleeSavedInfo();
732 for (const auto &Info : CSI) {
733 MCRegister Reg = Info.getReg();
734 if (!TRI.regNeedsCFI(Reg, Reg))
735 continue;
736 CFIBuilder.buildSameValue(Reg);
737 }
738}
739
740static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE) {
741 switch (Reg.id()) {
742 default:
743 // The called routine is expected to preserve r19-r28;
744 // r29 and r30 are used as the frame pointer and link register, respectively.
745 return 0;
746
747 // GPRs
748#define CASE(n) \
749 case AArch64::W##n: \
750 case AArch64::X##n: \
751 return AArch64::X##n
752 CASE(0);
753 CASE(1);
754 CASE(2);
755 CASE(3);
756 CASE(4);
757 CASE(5);
758 CASE(6);
759 CASE(7);
760 CASE(8);
761 CASE(9);
762 CASE(10);
763 CASE(11);
764 CASE(12);
765 CASE(13);
766 CASE(14);
767 CASE(15);
768 CASE(16);
769 CASE(17);
770 CASE(18);
771#undef CASE
772
773 // FPRs
774#define CASE(n) \
775 case AArch64::B##n: \
776 case AArch64::H##n: \
777 case AArch64::S##n: \
778 case AArch64::D##n: \
779 case AArch64::Q##n: \
780 return HasSVE ? AArch64::Z##n : AArch64::Q##n
781 CASE(0);
782 CASE(1);
783 CASE(2);
784 CASE(3);
785 CASE(4);
786 CASE(5);
787 CASE(6);
788 CASE(7);
789 CASE(8);
790 CASE(9);
791 CASE(10);
792 CASE(11);
793 CASE(12);
794 CASE(13);
795 CASE(14);
796 CASE(15);
797 CASE(16);
798 CASE(17);
799 CASE(18);
800 CASE(19);
801 CASE(20);
802 CASE(21);
803 CASE(22);
804 CASE(23);
805 CASE(24);
806 CASE(25);
807 CASE(26);
808 CASE(27);
809 CASE(28);
810 CASE(29);
811 CASE(30);
812 CASE(31);
813#undef CASE
814 }
815}
816
817void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
818 MachineBasicBlock &MBB) const {
819 // Insertion point.
820 MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
821
822 // Fake a debug loc.
823 DebugLoc DL;
824 if (MBBI != MBB.end())
825 DL = MBBI->getDebugLoc();
826
827 const MachineFunction &MF = *MBB.getParent();
828 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
829 const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();
830
831 BitVector GPRsToZero(TRI.getNumRegs());
832 BitVector FPRsToZero(TRI.getNumRegs());
833 bool HasSVE = STI.isSVEorStreamingSVEAvailable();
834 for (MCRegister Reg : RegsToZero.set_bits()) {
835 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
836 // For GPRs, we only care to clear out the 64-bit register.
837 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
838 GPRsToZero.set(XReg);
839 } else if (AArch64InstrInfo::isFpOrNEON(Reg)) {
840 // For FPRs,
841 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
842 FPRsToZero.set(XReg);
843 }
844 }
845
846 const AArch64InstrInfo &TII = *STI.getInstrInfo();
847
848 // Zero out GPRs.
849 for (MCRegister Reg : GPRsToZero.set_bits())
850 TII.buildClearRegister(Reg, MBB, MBBI, DL);
851
852 // Zero out FP/vector registers.
853 for (MCRegister Reg : FPRsToZero.set_bits())
854 TII.buildClearRegister(Reg, MBB, MBBI, DL);
855
856 if (HasSVE) {
857 for (MCRegister PReg :
858 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
859 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
860 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
861 AArch64::P15}) {
862 if (RegsToZero[PReg])
863 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
864 }
865 }
866}
867
868bool AArch64FrameLowering::windowsRequiresStackProbe(
869 const MachineFunction &MF, uint64_t StackSizeInBytes) const {
870 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
871 const AArch64FunctionInfo &MFI = *MF.getInfo<AArch64FunctionInfo>();
872 // TODO: When implementing stack protectors, take that into account
873 // for the probe threshold.
874 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
875 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
876}
877
879 const MachineBasicBlock &MBB) {
880 const MachineFunction *MF = MBB.getParent();
881 LiveRegs.addLiveIns(MBB);
882 // Mark callee saved registers as used so we will not choose them.
883 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
884 for (unsigned i = 0; CSRegs[i]; ++i)
885 LiveRegs.addReg(CSRegs[i]);
886}
887
889AArch64FrameLowering::findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB,
890 bool HasCall) const {
891 MachineFunction *MF = MBB->getParent();
892
893 // If MBB is an entry block, use X9 as the scratch register.
894 // However, preserve_none functions may be using X9 to pass arguments, so
895 // prefer to pick an available register below in that case.
896 if (&MF->front() == MBB &&
897 MF->getFunction().getCallingConv() != CallingConv::PreserveNone)
898 return AArch64::X9;
899
900 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
901 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
902 LivePhysRegs LiveRegs(TRI);
903 getLiveRegsForEntryMBB(LiveRegs, *MBB);
904 if (HasCall) {
905 LiveRegs.addReg(AArch64::X16);
906 LiveRegs.addReg(AArch64::X17);
907 LiveRegs.addReg(AArch64::X18);
908 }
909
910 // Prefer X9 since it was historically used for the prologue scratch reg.
911 const MachineRegisterInfo &MRI = MF->getRegInfo();
912 if (LiveRegs.available(MRI, AArch64::X9))
913 return AArch64::X9;
914
915 for (unsigned Reg : AArch64::GPR64RegClass) {
916 if (LiveRegs.available(MRI, Reg))
917 return Reg;
918 }
919 return AArch64::NoRegister;
920}
921
923 const MachineBasicBlock &MBB) const {
924 const MachineFunction *MF = MBB.getParent();
925 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
926 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
927 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
928 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
930
931 if (AFI->hasSwiftAsyncContext()) {
932 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
933 const MachineRegisterInfo &MRI = MF->getRegInfo();
936 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
937 // available.
938 if (!LiveRegs.available(MRI, AArch64::X16) ||
939 !LiveRegs.available(MRI, AArch64::X17))
940 return false;
941 }
942
943 // Certain stack probing sequences might clobber flags, then we can't use
944 // the block as a prologue if the flags register is a live-in.
946 MBB.isLiveIn(AArch64::NZCV))
947 return false;
948
949 if (RegInfo->hasStackRealignment(*MF) || TLI->hasInlineStackProbe(*MF))
950 if (findScratchNonCalleeSaveRegister(TmpMBB) == AArch64::NoRegister)
951 return false;
952
953 // May need a scratch register (for return value) if require making a special
954 // call
955 if (requiresSaveVG(*MF) ||
956 windowsRequiresStackProbe(*MF, std::numeric_limits<uint64_t>::max()))
957 if (findScratchNonCalleeSaveRegister(TmpMBB, true) == AArch64::NoRegister)
958 return false;
959
960 return true;
961}
962
964 const Function &F = MF.getFunction();
965 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
966 F.needsUnwindTableEntry();
967}
968
969bool AArch64FrameLowering::shouldSignReturnAddressEverywhere(
970 const MachineFunction &MF) const {
971 // FIXME: With WinCFI, extra care should be taken to place SEH_PACSignLR
972 // and SEH_EpilogEnd instructions in the correct order.
974 return false;
976 bool SignReturnAddressAll = AFI->shouldSignReturnAddress(/*SpillsLR=*/false);
977 return SignReturnAddressAll;
978}
979
980// Given a load or a store instruction, generate an appropriate unwinding SEH
981// code on Windows.
983AArch64FrameLowering::insertSEH(MachineBasicBlock::iterator MBBI,
984 const TargetInstrInfo &TII,
985 MachineInstr::MIFlag Flag) const {
986 unsigned Opc = MBBI->getOpcode();
987 MachineBasicBlock *MBB = MBBI->getParent();
988 MachineFunction &MF = *MBB->getParent();
989 DebugLoc DL = MBBI->getDebugLoc();
990 unsigned ImmIdx = MBBI->getNumOperands() - 1;
991 int Imm = MBBI->getOperand(ImmIdx).getImm();
993 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
994 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
995
996 switch (Opc) {
997 default:
998 report_fatal_error("No SEH Opcode for this instruction");
999 case AArch64::STR_ZXI:
1000 case AArch64::LDR_ZXI: {
1001 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1002 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveZReg))
1003 .addImm(Reg0)
1004 .addImm(Imm)
1005 .setMIFlag(Flag);
1006 break;
1007 }
1008 case AArch64::STR_PXI:
1009 case AArch64::LDR_PXI: {
1010 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1011 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SavePReg))
1012 .addImm(Reg0)
1013 .addImm(Imm)
1014 .setMIFlag(Flag);
1015 break;
1016 }
1017 case AArch64::LDPDpost:
1018 Imm = -Imm;
1019 [[fallthrough]];
1020 case AArch64::STPDpre: {
1021 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1022 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1023 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1024 .addImm(Reg0)
1025 .addImm(Reg1)
1026 .addImm(Imm * 8)
1027 .setMIFlag(Flag);
1028 break;
1029 }
1030 case AArch64::LDPXpost:
1031 Imm = -Imm;
1032 [[fallthrough]];
1033 case AArch64::STPXpre: {
1034 Register Reg0 = MBBI->getOperand(1).getReg();
1035 Register Reg1 = MBBI->getOperand(2).getReg();
1036 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1037 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1038 .addImm(Imm * 8)
1039 .setMIFlag(Flag);
1040 else
1041 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1042 .addImm(RegInfo->getSEHRegNum(Reg0))
1043 .addImm(RegInfo->getSEHRegNum(Reg1))
1044 .addImm(Imm * 8)
1045 .setMIFlag(Flag);
1046 break;
1047 }
1048 case AArch64::LDRDpost:
1049 Imm = -Imm;
1050 [[fallthrough]];
1051 case AArch64::STRDpre: {
1052 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1053 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1054 .addImm(Reg)
1055 .addImm(Imm)
1056 .setMIFlag(Flag);
1057 break;
1058 }
1059 case AArch64::LDRXpost:
1060 Imm = -Imm;
1061 [[fallthrough]];
1062 case AArch64::STRXpre: {
1063 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1064 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1065 .addImm(Reg)
1066 .addImm(Imm)
1067 .setMIFlag(Flag);
1068 break;
1069 }
1070 case AArch64::STPDi:
1071 case AArch64::LDPDi: {
1072 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1073 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1074 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1075 .addImm(Reg0)
1076 .addImm(Reg1)
1077 .addImm(Imm * 8)
1078 .setMIFlag(Flag);
1079 break;
1080 }
1081 case AArch64::STPXi:
1082 case AArch64::LDPXi: {
1083 Register Reg0 = MBBI->getOperand(0).getReg();
1084 Register Reg1 = MBBI->getOperand(1).getReg();
1085 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1086 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1087 .addImm(Imm * 8)
1088 .setMIFlag(Flag);
1089 else
1090 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1091 .addImm(RegInfo->getSEHRegNum(Reg0))
1092 .addImm(RegInfo->getSEHRegNum(Reg1))
1093 .addImm(Imm * 8)
1094 .setMIFlag(Flag);
1095 break;
1096 }
1097 case AArch64::STRXui:
1098 case AArch64::LDRXui: {
1099 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1100 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1101 .addImm(Reg)
1102 .addImm(Imm * 8)
1103 .setMIFlag(Flag);
1104 break;
1105 }
1106 case AArch64::STRDui:
1107 case AArch64::LDRDui: {
1108 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1109 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1110 .addImm(Reg)
1111 .addImm(Imm * 8)
1112 .setMIFlag(Flag);
1113 break;
1114 }
1115 case AArch64::STPQi:
1116 case AArch64::LDPQi: {
1117 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1118 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1119 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1120 .addImm(Reg0)
1121 .addImm(Reg1)
1122 .addImm(Imm * 16)
1123 .setMIFlag(Flag);
1124 break;
1125 }
1126 case AArch64::LDPQpost:
1127 Imm = -Imm;
1128 [[fallthrough]];
1129 case AArch64::STPQpre: {
1130 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1131 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1132 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1133 .addImm(Reg0)
1134 .addImm(Reg1)
1135 .addImm(Imm * 16)
1136 .setMIFlag(Flag);
1137 break;
1138 }
1139 }
1140 auto I = MBB->insertAfter(MBBI, MIB);
1141 return I;
1142}
1143
1146 if (!AFI->needsDwarfUnwindInfo(MF) || !AFI->hasStreamingModeChanges())
1147 return false;
1148 // For Darwin platforms we don't save VG for non-SVE functions, even if SME
1149 // is enabled with streaming mode changes.
1150 auto &ST = MF.getSubtarget<AArch64Subtarget>();
1151 if (ST.isTargetDarwin())
1152 return ST.hasSVE();
1153 return true;
1154}
1155
1156static bool isTargetWindows(const MachineFunction &MF) {
1157 return MF.getSubtarget<AArch64Subtarget>().isTargetWindows();
1158}
1159
1161 MachineFunction &MF) const {
1162 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1163 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1164
1165 auto EmitSignRA = [&](MachineBasicBlock &MBB) {
1166 DebugLoc DL; // Set debug location to unknown.
1168
1169 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1171 };
1172
1173 auto EmitAuthRA = [&](MachineBasicBlock &MBB) {
1174 DebugLoc DL;
1175 MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
1176 if (MBBI != MBB.end())
1177 DL = MBBI->getDebugLoc();
1178
1179 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_EPILOGUE))
1181 };
1182
1183 // This should be in sync with PEIImpl::calculateSaveRestoreBlocks.
1184 EmitSignRA(MF.front());
1185 for (MachineBasicBlock &MBB : MF) {
1186 if (MBB.isEHFuncletEntry())
1187 EmitSignRA(MBB);
1188 if (MBB.isReturnBlock())
1189 EmitAuthRA(MBB);
1190 }
1191}
1192
1194 MachineBasicBlock &MBB) const {
1195 AArch64PrologueEmitter PrologueEmitter(MF, MBB, *this);
1196 PrologueEmitter.emitPrologue();
1197}
1198
1200 MachineBasicBlock &MBB) const {
1201 AArch64EpilogueEmitter EpilogueEmitter(MF, MBB, *this);
1202 EpilogueEmitter.emitEpilogue();
1203}
1204
1207 MF.getInfo<AArch64FunctionInfo>()->needsDwarfUnwindInfo(MF);
1208}
1209
1211 return enableCFIFixup(MF) &&
1212 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
1213}
1214
1215/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
1216/// debug info. It's the same as what we use for resolving the code-gen
1217/// references for now. FIXME: This can go wrong when references are
1218/// SP-relative and simple call frames aren't used.
1221 Register &FrameReg) const {
1223 MF, FI, FrameReg,
1224 /*PreferFP=*/
1225 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
1226 MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
1227 /*ForSimm=*/false);
1228}
1229
1232 int FI) const {
1233 // This function serves to provide a comparable offset from a single reference
1234 // point (the value of SP at function entry) that can be used for analysis,
1235 // e.g. the stack-frame-layout analysis pass. It is not guaranteed to be
1236 // correct for all objects in the presence of VLA-area objects or dynamic
1237 // stack re-alignment.
1238
1239 const auto &MFI = MF.getFrameInfo();
1240
1241 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1242 StackOffset ZPRStackSize = getZPRStackSize(MF);
1243 StackOffset PPRStackSize = getPPRStackSize(MF);
1244 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1245
1246 // For VLA-area objects, just emit an offset at the end of the stack frame.
1247 // Whilst not quite correct, these objects do live at the end of the frame and
1248 // so it is more useful for analysis for the offset to reflect this.
1249 if (MFI.isVariableSizedObjectIndex(FI)) {
1250 return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
1251 }
1252
1253 // This is correct in the absence of any SVE stack objects.
1254 if (!SVEStackSize)
1255 return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
1256
1257 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1258 bool FPAfterSVECalleeSaves =
1260 if (MFI.hasScalableStackID(FI)) {
1261 if (FPAfterSVECalleeSaves &&
1262 -ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1263 assert(!AFI->hasSplitSVEObjects() &&
1264 "split-sve-objects not supported with FPAfterSVECalleeSaves");
1265 return StackOffset::getScalable(ObjectOffset);
1266 }
1267 StackOffset AccessOffset{};
1268 // The scalable vectors are below (lower address) the scalable predicates
1269 // with split SVE objects, so we must subtract the size of the predicates.
1270 if (AFI->hasSplitSVEObjects() &&
1271 MFI.getStackID(FI) == TargetStackID::ScalableVector)
1272 AccessOffset = -PPRStackSize;
1273 return AccessOffset +
1274 StackOffset::get(-((int64_t)AFI->getCalleeSavedStackSize()),
1275 ObjectOffset);
1276 }
1277
1278 bool IsFixed = MFI.isFixedObjectIndex(FI);
1279 bool IsCSR =
1280 !IsFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1281
1282 StackOffset ScalableOffset = {};
1283 if (!IsFixed && !IsCSR) {
1284 ScalableOffset = -SVEStackSize;
1285 } else if (FPAfterSVECalleeSaves && IsCSR) {
1286 ScalableOffset =
1288 }
1289
1290 return StackOffset::getFixed(ObjectOffset) + ScalableOffset;
1291}
1292
1298
1299StackOffset AArch64FrameLowering::getFPOffset(const MachineFunction &MF,
1300 int64_t ObjectOffset) const {
1301 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1302 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1303 const Function &F = MF.getFunction();
1304 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
1305 unsigned FixedObject =
1306 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
1307 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
1308 int64_t FPAdjust =
1309 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
1310 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
1311}
1312
1313StackOffset AArch64FrameLowering::getStackOffset(const MachineFunction &MF,
1314 int64_t ObjectOffset) const {
1315 const auto &MFI = MF.getFrameInfo();
1316 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
1317}
1318
1319// TODO: This function currently does not work for scalable vectors.
1321 int FI) const {
1322 const AArch64RegisterInfo *RegInfo =
1323 MF.getSubtarget<AArch64Subtarget>().getRegisterInfo();
1324 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
1325 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
1326 ? getFPOffset(MF, ObjectOffset).getFixed()
1327 : getStackOffset(MF, ObjectOffset).getFixed();
1328}
1329
1331 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
1332 bool ForSimm) const {
1333 const auto &MFI = MF.getFrameInfo();
1334 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1335 bool isFixed = MFI.isFixedObjectIndex(FI);
1336 auto StackID = static_cast<TargetStackID::Value>(MFI.getStackID(FI));
1337 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, StackID,
1338 FrameReg, PreferFP, ForSimm);
1339}
1340
1342 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed,
1343 TargetStackID::Value StackID, Register &FrameReg, bool PreferFP,
1344 bool ForSimm) const {
1345 const auto &MFI = MF.getFrameInfo();
1346 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1347 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1348 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1349
1350 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
1351 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
1352 bool isCSR =
1353 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1354 bool isSVE = MFI.isScalableStackID(StackID);
1355
1356 StackOffset ZPRStackSize = getZPRStackSize(MF);
1357 StackOffset PPRStackSize = getPPRStackSize(MF);
1358 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1359
1360 // Use frame pointer to reference fixed objects. Use it for locals if
1361 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
1362 // reliable as a base). Make sure useFPForScavengingIndex() does the
1363 // right thing for the emergency spill slot.
1364 bool UseFP = false;
1365 if (AFI->hasStackFrame() && !isSVE) {
1366 // We shouldn't prefer using the FP to access fixed-sized stack objects when
1367 // there are scalable (SVE) objects in between the FP and the fixed-sized
1368 // objects.
1369 PreferFP &= !SVEStackSize;
1370
1371 // Note: Keeping the following as multiple 'if' statements rather than
1372 // merging to a single expression for readability.
1373 //
1374 // Argument access should always use the FP.
1375 if (isFixed) {
1376 UseFP = hasFP(MF);
1377 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
1378 // References to the CSR area must use FP if we're re-aligning the stack
1379 // since the dynamically-sized alignment padding is between the SP/BP and
1380 // the CSR area.
1381 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
1382 UseFP = true;
1383 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
1384 // If the FPOffset is negative and we're producing a signed immediate, we
1385 // have to keep in mind that the available offset range for negative
1386 // offsets is smaller than for positive ones. If an offset is available
1387 // via the FP and the SP, use whichever is closest.
1388 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
1389 PreferFP |= Offset > -FPOffset && !SVEStackSize;
1390
1391 if (FPOffset >= 0) {
1392 // If the FPOffset is positive, that'll always be best, as the SP/BP
1393 // will be even further away.
1394 UseFP = true;
1395 } else if (MFI.hasVarSizedObjects()) {
1396 // If we have variable sized objects, we can use either FP or BP, as the
1397 // SP offset is unknown. We can use the base pointer if we have one and
1398 // FP is not preferred. If not, we're stuck with using FP.
1399 bool CanUseBP = RegInfo->hasBasePointer(MF);
1400 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
1401 UseFP = PreferFP;
1402 else if (!CanUseBP) // Can't use BP. Forced to use FP.
1403 UseFP = true;
1404 // else we can use BP and FP, but the offset from FP won't fit.
1405 // That will make us scavenge registers which we can probably avoid by
1406 // using BP. If it won't fit for BP either, we'll scavenge anyway.
1407 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
1408 // Funclets access the locals contained in the parent's stack frame
1409 // via the frame pointer, so we have to use the FP in the parent
1410 // function.
1411 (void) Subtarget;
1412 assert(Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
1413 MF.getFunction().isVarArg()) &&
1414 "Funclets should only be present on Win64");
1415 UseFP = true;
1416 } else {
1417 // We have the choice between FP and (SP or BP).
1418 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
1419 UseFP = true;
1420 }
1421 }
1422 }
1423
1424 assert(
1425 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
1426 "In the presence of dynamic stack pointer realignment, "
1427 "non-argument/CSR objects cannot be accessed through the frame pointer");
1428
1429 bool FPAfterSVECalleeSaves =
1431
1432 if (isSVE) {
1433 StackOffset FPOffset = StackOffset::get(
1434 -AFI->getCalleeSaveBaseToFrameRecordOffset(), ObjectOffset);
1435 StackOffset SPOffset =
1436 SVEStackSize +
1437 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
1438 ObjectOffset);
1439
1440 // With split SVE objects the ObjectOffset is relative to the split area
1441 // (i.e. the PPR area or ZPR area respectively).
1442 if (AFI->hasSplitSVEObjects() && StackID == TargetStackID::ScalableVector) {
1443 // If we're accessing an SVE vector with split SVE objects...
1444 // - From the FP we need to move down past the PPR area:
1445 FPOffset -= PPRStackSize;
1446 // - From the SP we only need to move up to the ZPR area:
1447 SPOffset -= PPRStackSize;
1448 // Note: `SPOffset = SVEStackSize + ...`, so `-= PPRStackSize` results in
1449 // `SPOffset = ZPRStackSize + ...`.
1450 }
1451
1452 if (FPAfterSVECalleeSaves) {
1454 if (-ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1457 }
1458 }
1459
1460 // Always use the FP for SVE spills if available and beneficial.
1461 if (hasFP(MF) && (SPOffset.getFixed() ||
1462 FPOffset.getScalable() < SPOffset.getScalable() ||
1463 RegInfo->hasStackRealignment(MF))) {
1464 FrameReg = RegInfo->getFrameRegister(MF);
1465 return FPOffset;
1466 }
1467 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
1468 : MCRegister(AArch64::SP);
1469
1470 return SPOffset;
1471 }
1472
1473 StackOffset SVEAreaOffset = {};
1474 if (FPAfterSVECalleeSaves) {
1475 // In this stack layout, the FP is in between the callee saves and other
1476 // SVE allocations.
1477 StackOffset SVECalleeSavedStack =
1479 if (UseFP) {
1480 if (isFixed)
1481 SVEAreaOffset = SVECalleeSavedStack;
1482 else if (!isCSR)
1483 SVEAreaOffset = SVECalleeSavedStack - SVEStackSize;
1484 } else {
1485 if (isFixed)
1486 SVEAreaOffset = SVEStackSize;
1487 else if (isCSR)
1488 SVEAreaOffset = SVEStackSize - SVECalleeSavedStack;
1489 }
1490 } else {
1491 if (UseFP && !(isFixed || isCSR))
1492 SVEAreaOffset = -SVEStackSize;
1493 if (!UseFP && (isFixed || isCSR))
1494 SVEAreaOffset = SVEStackSize;
1495 }
1496
1497 if (UseFP) {
1498 FrameReg = RegInfo->getFrameRegister(MF);
1499 return StackOffset::getFixed(FPOffset) + SVEAreaOffset;
1500 }
1501
1502 // Use the base pointer if we have one.
1503 if (RegInfo->hasBasePointer(MF))
1504 FrameReg = RegInfo->getBaseRegister();
1505 else {
1506 assert(!MFI.hasVarSizedObjects() &&
1507 "Can't use SP when we have var sized objects.");
1508 FrameReg = AArch64::SP;
1509 // If we're using the red zone for this function, the SP won't actually
1510 // be adjusted, so the offsets will be negative. They're also all
1511 // within range of the signed 9-bit immediate instructions.
1512 if (canUseRedZone(MF))
1513 Offset -= AFI->getLocalStackSize();
1514 }
1515
1516 return StackOffset::getFixed(Offset) + SVEAreaOffset;
1517}
1518
1519static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
1520 // Do not set a kill flag on values that are also marked as live-in. This
1521 // happens with the @llvm-returnaddress intrinsic and with arguments passed in
1522 // callee saved registers.
1523 // Omitting the kill flags is conservatively correct even if the live-in
1524 // is not used after all.
1525 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
1526 return getKillRegState(!IsLiveIn);
1527}
1528
1529static bool produceCompactUnwindFrame(const AArch64FrameLowering &AFL,
1530 MachineFunction &MF) {
1531 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1532 AttributeList Attrs = MF.getFunction().getAttributes();
1534 return Subtarget.isTargetMachO() &&
1535 !(Subtarget.getTargetLowering()->supportSwiftError() &&
1536 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
1538 !AFL.requiresSaveVG(MF) && !AFI->isSVECC();
1539}
1540
1541static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2,
1542 bool NeedsWinCFI, bool IsFirst,
1543 const TargetRegisterInfo *TRI) {
1544 // If we are generating register pairs for a Windows function that requires
1545 // EH support, then pair consecutive registers only. There are no unwind
1546 // opcodes for saves/restores of non-consecutive register pairs.
1547 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_frepg_x,
1548 // save_lrpair.
1549 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
1550
1551 if (Reg2 == AArch64::FP)
1552 return true;
1553 if (!NeedsWinCFI)
1554 return false;
1555 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
1556 return false;
1557 // If pairing a GPR with LR, the pair can be described by the save_lrpair
1558 // opcode. If this is the first register pair, it would end up with a
1559 // predecrement, but there's no save_lrpair_x opcode, so we can only do this
1560 // if LR is paired with something other than the first register.
1561 // The save_lrpair opcode requires the first register to be an odd one.
1562 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
1563 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
1564 return false;
1565 return true;
1566}
1567
1568/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
1569/// WindowsCFI requires that only consecutive registers can be paired.
1570/// LR and FP need to be allocated together when the frame needs to save
1571/// the frame-record. This means any other register pairing with LR is invalid.
1572static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2,
1573 bool UsesWinAAPCS, bool NeedsWinCFI,
1574 bool NeedsFrameRecord, bool IsFirst,
1575 const TargetRegisterInfo *TRI) {
1576 if (UsesWinAAPCS)
1577 return invalidateWindowsRegisterPairing(Reg1, Reg2, NeedsWinCFI, IsFirst,
1578 TRI);
1579
1580 // If we need to store the frame record, don't pair any register
1581 // with LR other than FP.
1582 if (NeedsFrameRecord)
1583 return Reg2 == AArch64::LR;
1584
1585 return false;
1586}
1587
1588namespace {
1589
1590struct RegPairInfo {
1591 Register Reg1;
1592 Register Reg2;
1593 int FrameIdx;
1594 int Offset;
1595 enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
1596 const TargetRegisterClass *RC;
1597
1598 RegPairInfo() = default;
1599
1600 bool isPaired() const { return Reg2.isValid(); }
1601
1602 bool isScalable() const { return Type == PPR || Type == ZPR; }
1603};
1604
1605} // end anonymous namespace
1606
1608 for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
1609 if (SavedRegs.test(PReg)) {
1610 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
1611 return MCRegister(PNReg);
1612 }
1613 }
1614 return MCRegister();
1615}
1616
1617// The multi-vector LD/ST instructions are available only for SME or SVE2p1 targets.
1618static bool enableMultiVectorSpillFill(const AArch64Subtarget &Subtarget,
1619 MachineFunction &MF) {
1620 if (DisableMultiVectorSpillFill)
1621 return false;
1622
1623 SMEAttrs FuncAttrs = MF.getInfo<AArch64FunctionInfo>()->getSMEFnAttrs();
1624 bool IsLocallyStreaming =
1625 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
1626
1627 // SME2 instructions can only be used safely when in streaming mode.
1628 // It is not safe to use SME2 instructions when in streaming-compatible or
1629 // locally-streaming mode.
1630 return Subtarget.hasSVE2p1() ||
1631 (Subtarget.hasSME2() &&
1632 (!IsLocallyStreaming && Subtarget.isStreaming()));
1633}
1634
1635 static void computeCalleeSaveRegisterPairs(const AArch64FrameLowering &AFL,
1636 MachineFunction &MF,
1637 ArrayRef<CalleeSavedInfo> CSI,
1638 const TargetRegisterInfo *TRI,
1639 SmallVectorImpl<RegPairInfo> &RegPairs,
1640 bool NeedsFrameRecord) {
1641
1642 if (CSI.empty())
1643 return;
1644
1645 bool IsWindows = isTargetWindows(MF);
1646 bool NeedsWinCFI = AFL.needsWinCFI(MF);
1647 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1648 unsigned StackHazardSize = getStackHazardSize(MF);
1649 MachineFrameInfo &MFI = MF.getFrameInfo();
1650 CallingConv::ID CC = MF.getFunction().getCallingConv();
1651 unsigned Count = CSI.size();
1652 (void)CC;
1653 // MachO's compact unwind format relies on all registers being stored in
1654 // pairs.
1655 assert((!produceCompactUnwindFrame(AFL, MF) ||
1658 (Count & 1) == 0) &&
1659 "Odd number of callee-saved regs to spill!");
1660 int ByteOffset = AFI->getCalleeSavedStackSize();
1661 int StackFillDir = -1;
1662 int RegInc = 1;
1663 unsigned FirstReg = 0;
1664 if (NeedsWinCFI) {
1665 // For WinCFI, fill the stack from the bottom up.
1666 ByteOffset = 0;
1667 StackFillDir = 1;
1668 // As the CSI array is reversed to match PrologEpilogInserter, iterate
1669 // backwards, to pair up registers starting from lower numbered registers.
1670 RegInc = -1;
1671 FirstReg = Count - 1;
1672 }
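// Illustrative example (not from the source): for a 32-byte GPR callee-save
// area, the default top-down walk starts at ByteOffset = 32 and decrements,
// while the WinCFI bottom-up walk starts at 0 and increments; both end up
// assigning the same slot offsets (0 and 16), just in opposite visit order.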
1673
1674 bool FPAfterSVECalleeSaves = IsWindows && AFI->getSVECalleeSavedStackSize();
1675
1676 int ZPRByteOffset = 0;
1677 int PPRByteOffset = 0;
1678 bool SplitPPRs = AFI->hasSplitSVEObjects();
1679 if (SplitPPRs) {
1680 ZPRByteOffset = AFI->getZPRCalleeSavedStackSize();
1681 PPRByteOffset = AFI->getPPRCalleeSavedStackSize();
1682 } else if (!FPAfterSVECalleeSaves) {
1683 ZPRByteOffset =
1684 AFI->getZPRCalleeSavedStackSize() + AFI->getPPRCalleeSavedStackSize();
1685 // Unused: Everything goes in ZPR space.
1686 PPRByteOffset = 0;
1687 }
1688
1689 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
1690 Register LastReg = 0;
1691 bool HasCSHazardPadding = AFI->hasStackHazardSlotIndex() && !SplitPPRs;
1692
1693 // When iterating backwards, the loop condition relies on unsigned wraparound.
1694 for (unsigned i = FirstReg; i < Count; i += RegInc) {
1695 RegPairInfo RPI;
1696 RPI.Reg1 = CSI[i].getReg();
1697
1698 if (AArch64::GPR64RegClass.contains(RPI.Reg1)) {
1699 RPI.Type = RegPairInfo::GPR;
1700 RPI.RC = &AArch64::GPR64RegClass;
1701 } else if (AArch64::FPR64RegClass.contains(RPI.Reg1)) {
1702 RPI.Type = RegPairInfo::FPR64;
1703 RPI.RC = &AArch64::FPR64RegClass;
1704 } else if (AArch64::FPR128RegClass.contains(RPI.Reg1)) {
1705 RPI.Type = RegPairInfo::FPR128;
1706 RPI.RC = &AArch64::FPR128RegClass;
1707 } else if (AArch64::ZPRRegClass.contains(RPI.Reg1)) {
1708 RPI.Type = RegPairInfo::ZPR;
1709 RPI.RC = &AArch64::ZPRRegClass;
1710 } else if (AArch64::PPRRegClass.contains(RPI.Reg1)) {
1711 RPI.Type = RegPairInfo::PPR;
1712 RPI.RC = &AArch64::PPRRegClass;
1713 } else if (RPI.Reg1 == AArch64::VG) {
1714 RPI.Type = RegPairInfo::VG;
1715 RPI.RC = &AArch64::FIXED_REGSRegClass;
1716 } else {
1717 llvm_unreachable("Unsupported register class.");
1718 }
1719
1720 int &ScalableByteOffset = RPI.Type == RegPairInfo::PPR && SplitPPRs
1721 ? PPRByteOffset
1722 : ZPRByteOffset;
1723
1724 // Add the stack hazard size as we transition from GPR->FPR CSRs.
1725 if (HasCSHazardPadding &&
1726 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
1727 AArch64InstrInfo::isFpOrNEON(RPI.Reg1))
1728 ByteOffset += StackFillDir * StackHazardSize;
1729 LastReg = RPI.Reg1;
1730
1731 int Scale = TRI->getSpillSize(*RPI.RC);
1732 // Add the next reg to the pair if it is in the same register class.
1733 if (unsigned(i + RegInc) < Count && !HasCSHazardPadding) {
1734 MCRegister NextReg = CSI[i + RegInc].getReg();
1735 bool IsFirst = i == FirstReg;
1736 switch (RPI.Type) {
1737 case RegPairInfo::GPR:
1738 if (AArch64::GPR64RegClass.contains(NextReg) &&
1739 !invalidateRegisterPairing(RPI.Reg1, NextReg, IsWindows,
1740 NeedsWinCFI, NeedsFrameRecord, IsFirst,
1741 TRI))
1742 RPI.Reg2 = NextReg;
1743 break;
1744 case RegPairInfo::FPR64:
1745 if (AArch64::FPR64RegClass.contains(NextReg) &&
1746 !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
1747 IsFirst, TRI))
1748 RPI.Reg2 = NextReg;
1749 break;
1750 case RegPairInfo::FPR128:
1751 if (AArch64::FPR128RegClass.contains(NextReg))
1752 RPI.Reg2 = NextReg;
1753 break;
1754 case RegPairInfo::PPR:
1755 break;
1756 case RegPairInfo::ZPR:
1757 if (AFI->getPredicateRegForFillSpill() != 0 &&
1758 ((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1)) {
1759 // Calculate offset of register pair to see if pair instruction can be
1760 // used.
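// Illustrative numbers only: with a 16-byte scale and ScalableByteOffset = 32,
// (32 - 2*16)/16 = 0, which is even and within [-16, 14], so the paired form
// can be used.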
1761 int Offset = (ScalableByteOffset + StackFillDir * 2 * Scale) / Scale;
1762 if ((-16 <= Offset && Offset <= 14) && (Offset % 2 == 0))
1763 RPI.Reg2 = NextReg;
1764 }
1765 break;
1766 case RegPairInfo::VG:
1767 break;
1768 }
1769 }
1770
1771 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
1772 // list to come in sorted by frame index so that we can issue the store
1773 // pair instructions directly. Assert if we see anything otherwise.
1774 //
1775 // The order of the registers in the list is controlled by
1776 // getCalleeSavedRegs(), so they will always be in-order, as well.
1777 assert((!RPI.isPaired() ||
1778 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
1779 "Out of order callee saved regs!");
1780
1781 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
1782 RPI.Reg1 == AArch64::LR) &&
1783 "FrameRecord must be allocated together with LR");
1784
1785 // Windows AAPCS has FP and LR reversed.
1786 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
1787 RPI.Reg2 == AArch64::LR) &&
1788 "FrameRecord must be allocated together with LR");
1789
1790 // MachO's compact unwind format relies on all registers being stored in
1791 // adjacent register pairs.
1792 assert((!produceCompactUnwindFrame(AFL, MF) ||
1795 (RPI.isPaired() &&
1796 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
1797 RPI.Reg1 + 1 == RPI.Reg2))) &&
1798 "Callee-save registers not saved as adjacent register pair!");
1799
1800 RPI.FrameIdx = CSI[i].getFrameIdx();
1801 if (NeedsWinCFI &&
1802 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
1803 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
1804
1805 // Realign the scalable offset if necessary. This is relevant when
1806 // spilling predicates on Windows.
1807 if (RPI.isScalable() && ScalableByteOffset % Scale != 0) {
1808 ScalableByteOffset = alignTo(ScalableByteOffset, Scale);
1809 }
1810
1811 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1812 assert(OffsetPre % Scale == 0);
1813
1814 if (RPI.isScalable())
1815 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1816 else
1817 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1818
1819 // Swift's async context is directly before FP, so allocate an extra
1820 // 8 bytes for it.
1821 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1822 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1823 (IsWindows && RPI.Reg2 == AArch64::LR)))
1824 ByteOffset += StackFillDir * 8;
1825
1826 // Round up size of non-pair to pair size if we need to pad the
1827 // callee-save area to ensure 16-byte alignment.
1828 if (NeedGapToAlignStack && !NeedsWinCFI && !RPI.isScalable() &&
1829 RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
1830 ByteOffset % 16 != 0) {
1831 ByteOffset += 8 * StackFillDir;
1832 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
1833 // A stack frame with a gap looks like this, bottom up:
1834 // d9, d8. x21, gap, x20, x19.
1835 // Set extra alignment on the x21 object to create the gap above it.
1836 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
1837 NeedGapToAlignStack = false;
1838 }
1839
1840 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1841 assert(OffsetPost % Scale == 0);
1842 // If filling top down (default), we want the offset after incrementing it.
1843 // If filling bottom up (WinCFI) we need the original offset.
1844 int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
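// E.g. in the default top-down case with a 48-byte GPR area, the first pair
// processed sees ByteOffset go from 48 to 32, so OffsetPost = 32 and
// RPI.Offset = 32/8 = 4 (cf. the "addImm(+4)" in the stp example below).
// Illustrative numbers only.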
1845
1846 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
1847 // Swift context can directly precede FP.
1848 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1849 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1850 (IsWindows && RPI.Reg2 == AArch64::LR)))
1851 Offset += 8;
1852 RPI.Offset = Offset / Scale;
1853
1854 assert((!RPI.isPaired() ||
1855 (!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
1856 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
1857 "Offset out of bounds for LDP/STP immediate");
1858
1859 auto isFrameRecord = [&] {
1860 if (RPI.isPaired())
1861 return IsWindows ? RPI.Reg1 == AArch64::FP && RPI.Reg2 == AArch64::LR
1862 : RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP;
1863 // Otherwise, look for the frame record as two unpaired registers. This is
1864 // needed for -aarch64-stack-hazard-size=<val>, which disables register
1865 // pairing (as the padding may be too large for the LDP/STP offset). Note:
1866 // On Windows, this check works out as current reg == FP, next reg == LR,
1867 // and on other platforms current reg == FP, previous reg == LR. This
1868 // works out as the correct pre-increment or post-increment offsets
1869 // respectively.
1870 return i > 0 && RPI.Reg1 == AArch64::FP &&
1871 CSI[i - 1].getReg() == AArch64::LR;
1872 };
1873
1874 // Save the offset to frame record so that the FP register can point to the
1875 // innermost frame record (spilled FP and LR registers).
1876 if (NeedsFrameRecord && isFrameRecord())
1877 AFI->setCalleeSaveBaseToFrameRecordOffset(Offset);
1878
1879 RegPairs.push_back(RPI);
1880 if (RPI.isPaired())
1881 i += RegInc;
1882 }
1883 if (NeedsWinCFI) {
1884 // If we need an alignment gap in the stack, align the topmost stack
1885 // object. A stack frame with a gap looks like this, bottom up:
1886 // x19, d8. d9, gap.
1887 // Set extra alignment on the topmost stack object (the first element in
1888 // CSI, which goes top down), to create the gap above it.
1889 if (AFI->hasCalleeSaveStackFreeSpace())
1890 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
1891 // We iterated bottom up over the registers; flip RegPairs back to top
1892 // down order.
1893 std::reverse(RegPairs.begin(), RegPairs.end());
1894 }
1895}
1896
1897 bool AArch64FrameLowering::spillCalleeSavedRegisters(
1898 MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
1899 ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
1900 MachineFunction &MF = *MBB.getParent();
1901 auto &TLI = *MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
1903 bool NeedsWinCFI = needsWinCFI(MF);
1904 DebugLoc DL;
1905 SmallVector<RegPairInfo, 8> RegPairs;
1906 
1907 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
1908 
1909 MachineRegisterInfo &MRI = MF.getRegInfo();
1910 // Refresh the reserved regs in case there are any potential changes since the
1911 // last freeze.
1912 MRI.freezeReservedRegs();
1913
1914 if (homogeneousPrologEpilog(MF)) {
1915 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
1917
1918 for (auto &RPI : RegPairs) {
1919 MIB.addReg(RPI.Reg1);
1920 MIB.addReg(RPI.Reg2);
1921
1922 // Update register live in.
1923 if (!MRI.isReserved(RPI.Reg1))
1924 MBB.addLiveIn(RPI.Reg1);
1925 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
1926 MBB.addLiveIn(RPI.Reg2);
1927 }
1928 return true;
1929 }
1930 bool PTrueCreated = false;
1931 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
1932 Register Reg1 = RPI.Reg1;
1933 Register Reg2 = RPI.Reg2;
1934 unsigned StrOpc;
1935
1936 // Issue sequence of spills for cs regs. The first spill may be converted
1937 // to a pre-decrement store later by emitPrologue if the callee-save stack
1938 // area allocation can't be combined with the local stack area allocation.
1939 // For example:
1940 // stp x22, x21, [sp, #0] // addImm(+0)
1941 // stp x20, x19, [sp, #16] // addImm(+2)
1942 // stp fp, lr, [sp, #32] // addImm(+4)
1943 // Rationale: This sequence saves uop updates compared to a sequence of
1944 // pre-increment spills like stp xi,xj,[sp,#-16]!
1945 // Note: Similar rationale and sequence for restores in epilog.
1946 unsigned Size = TRI->getSpillSize(*RPI.RC);
1947 Align Alignment = TRI->getSpillAlign(*RPI.RC);
1948 switch (RPI.Type) {
1949 case RegPairInfo::GPR:
1950 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
1951 break;
1952 case RegPairInfo::FPR64:
1953 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
1954 break;
1955 case RegPairInfo::FPR128:
1956 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
1957 break;
1958 case RegPairInfo::ZPR:
1959 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
1960 break;
1961 case RegPairInfo::PPR:
1962 StrOpc = AArch64::STR_PXI;
1963 break;
1964 case RegPairInfo::VG:
1965 StrOpc = AArch64::STRXui;
1966 break;
1967 }
1968
1969 Register X0Scratch;
1970 auto RestoreX0 = make_scope_exit([&] {
1971 if (X0Scratch != AArch64::NoRegister)
1972 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), AArch64::X0)
1973 .addReg(X0Scratch)
1975 });
1976
1977 if (Reg1 == AArch64::VG) {
1978 // Find an available register to store value of VG to.
1979 Reg1 = findScratchNonCalleeSaveRegister(&MBB, true);
1980 assert(Reg1 != AArch64::NoRegister);
1981 if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
1982 BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
1983 .addImm(31)
1984 .addImm(1)
1986 } else {
1988 if (any_of(MBB.liveins(),
1989 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
1990 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
1991 AArch64::X0, LiveIn.PhysReg);
1992 })) {
1993 X0Scratch = Reg1;
1994 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), X0Scratch)
1995 .addReg(AArch64::X0)
1997 }
1998
1999 RTLIB::Libcall LC = RTLIB::SMEABI_GET_CURRENT_VG;
2000 const uint32_t *RegMask =
2001 TRI->getCallPreservedMask(MF, TLI.getLibcallCallingConv(LC));
2002 BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
2003 .addExternalSymbol(TLI.getLibcallName(LC))
2004 .addRegMask(RegMask)
2005 .addReg(AArch64::X0, RegState::ImplicitDefine)
2007 Reg1 = AArch64::X0;
2008 }
2009 }
2010
2011 LLVM_DEBUG({
2012 dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
2013 if (RPI.isPaired())
2014 dbgs() << ", " << printReg(Reg2, TRI);
2015 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2016 if (RPI.isPaired())
2017 dbgs() << ", " << RPI.FrameIdx + 1;
2018 dbgs() << ")\n";
2019 });
2020
2021 assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
2022 "Windows unwdinding requires a consecutive (FP,LR) pair");
2023 // Windows unwind codes require consecutive registers if registers are
2024 // paired. Make the switch here, so that the code below will save (x,x+1)
2025 // and not (x+1,x).
2026 unsigned FrameIdxReg1 = RPI.FrameIdx;
2027 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2028 if (NeedsWinCFI && RPI.isPaired()) {
2029 std::swap(Reg1, Reg2);
2030 std::swap(FrameIdxReg1, FrameIdxReg2);
2031 }
2032
2033 if (RPI.isPaired() && RPI.isScalable()) {
2034 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2037 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2038 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2039 "Expects SVE2.1 or SME2 target and a predicate register");
2040#ifdef EXPENSIVE_CHECKS
2041 auto IsPPR = [](const RegPairInfo &c) {
2042 return c.Type == RegPairInfo::PPR;
2043 };
2044 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
2045 auto IsZPR = [](const RegPairInfo &c) {
2046 return c.Type == RegPairInfo::ZPR;
2047 };
2048 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
2049 assert(!(PPRBegin < ZPRBegin) &&
2050 "Expected callee save predicate to be handled first");
2051#endif
2052 if (!PTrueCreated) {
2053 PTrueCreated = true;
2054 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2056 }
2057 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2058 if (!MRI.isReserved(Reg1))
2059 MBB.addLiveIn(Reg1);
2060 if (!MRI.isReserved(Reg2))
2061 MBB.addLiveIn(Reg2);
2062 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
2064 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2065 MachineMemOperand::MOStore, Size, Alignment));
2066 MIB.addReg(PnReg);
2067 MIB.addReg(AArch64::SP)
2068 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale],
2069 // where 2*vscale is implicit
2072 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2073 MachineMemOperand::MOStore, Size, Alignment));
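// The emitted instruction is roughly "st1b { zN.b, zN+1.b }, pnM, [sp, #imm, mul vl]"
// (illustrative syntax only; the exact form is determined by ST1B_2Z_IMM).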
2074 if (NeedsWinCFI)
2075 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2076 } else { // The case where no ZPR pair is being stored.
2077 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2078 if (!MRI.isReserved(Reg1))
2079 MBB.addLiveIn(Reg1);
2080 if (RPI.isPaired()) {
2081 if (!MRI.isReserved(Reg2))
2082 MBB.addLiveIn(Reg2);
2083 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
2085 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2086 MachineMemOperand::MOStore, Size, Alignment));
2087 }
2088 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
2089 .addReg(AArch64::SP)
2090 .addImm(RPI.Offset) // [sp, #offset*vscale],
2091 // where factor*vscale is implicit
2094 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2095 MachineMemOperand::MOStore, Size, Alignment));
2096 if (NeedsWinCFI)
2097 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2098 }
2099 // Update the StackIDs of the SVE stack slots.
2100 MachineFrameInfo &MFI = MF.getFrameInfo();
2101 if (RPI.Type == RegPairInfo::ZPR) {
2102 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
2103 if (RPI.isPaired())
2104 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
2105 } else if (RPI.Type == RegPairInfo::PPR) {
2107 if (RPI.isPaired())
2109 }
2110 }
2111 return true;
2112}
2113
2114 bool AArch64FrameLowering::restoreCalleeSavedRegisters(
2115 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
2116 MutableArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
2117 MachineFunction &MF = *MBB.getParent();
2119 DebugLoc DL;
2120 SmallVector<RegPairInfo, 8> RegPairs;
2121 bool NeedsWinCFI = needsWinCFI(MF);
2122
2123 if (MBBI != MBB.end())
2124 DL = MBBI->getDebugLoc();
2125
2126 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
2127 if (homogeneousPrologEpilog(MF, &MBB)) {
2128 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
2130 for (auto &RPI : RegPairs) {
2131 MIB.addReg(RPI.Reg1, RegState::Define);
2132 MIB.addReg(RPI.Reg2, RegState::Define);
2133 }
2134 return true;
2135 }
2136
2137 // For performance reasons, restore the SVE registers in increasing order.
2138 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
2139 auto PPRBegin = llvm::find_if(RegPairs, IsPPR);
2140 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
2141 std::reverse(PPRBegin, PPREnd);
2142 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
2143 auto ZPRBegin = llvm::find_if(RegPairs, IsZPR);
2144 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
2145 std::reverse(ZPRBegin, ZPREnd);
2146
2147 bool PTrueCreated = false;
2148 for (const RegPairInfo &RPI : RegPairs) {
2149 Register Reg1 = RPI.Reg1;
2150 Register Reg2 = RPI.Reg2;
2151
2152 // Issue sequence of restores for cs regs. The last restore may be converted
2153 // to a post-increment load later by emitEpilogue if the callee-save stack
2154 // area allocation can't be combined with the local stack area allocation.
2155 // For example:
2156 // ldp fp, lr, [sp, #32] // addImm(+4)
2157 // ldp x20, x19, [sp, #16] // addImm(+2)
2158 // ldp x22, x21, [sp, #0] // addImm(+0)
2159 // Note: see comment in spillCalleeSavedRegisters()
2160 unsigned LdrOpc;
2161 unsigned Size = TRI->getSpillSize(*RPI.RC);
2162 Align Alignment = TRI->getSpillAlign(*RPI.RC);
2163 switch (RPI.Type) {
2164 case RegPairInfo::GPR:
2165 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
2166 break;
2167 case RegPairInfo::FPR64:
2168 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
2169 break;
2170 case RegPairInfo::FPR128:
2171 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
2172 break;
2173 case RegPairInfo::ZPR:
2174 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
2175 break;
2176 case RegPairInfo::PPR:
2177 LdrOpc = AArch64::LDR_PXI;
2178 break;
2179 case RegPairInfo::VG:
2180 continue;
2181 }
2182 LLVM_DEBUG({
2183 dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
2184 if (RPI.isPaired())
2185 dbgs() << ", " << printReg(Reg2, TRI);
2186 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2187 if (RPI.isPaired())
2188 dbgs() << ", " << RPI.FrameIdx + 1;
2189 dbgs() << ")\n";
2190 });
2191
2192 // Windows unwind codes require consecutive registers if registers are
2193 // paired. Make the switch here, so that the code below will save (x,x+1)
2194 // and not (x+1,x).
2195 unsigned FrameIdxReg1 = RPI.FrameIdx;
2196 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2197 if (NeedsWinCFI && RPI.isPaired()) {
2198 std::swap(Reg1, Reg2);
2199 std::swap(FrameIdxReg1, FrameIdxReg2);
2200 }
2201
2203 if (RPI.isPaired() && RPI.isScalable()) {
2204 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2206 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2207 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2208 "Expects SVE2.1 or SME2 target and a predicate register");
2209#ifdef EXPENSIVE_CHECKS
2210 assert(!(PPRBegin < ZPRBegin) &&
2211 "Expected callee save predicate to be handled first");
2212#endif
2213 if (!PTrueCreated) {
2214 PTrueCreated = true;
2215 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2217 }
2218 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2219 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
2220 getDefRegState(true));
2222 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2223 MachineMemOperand::MOLoad, Size, Alignment));
2224 MIB.addReg(PnReg);
2225 MIB.addReg(AArch64::SP)
2226 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale]
2227 // where 2*vscale is implicit
2230 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2231 MachineMemOperand::MOLoad, Size, Alignment));
2232 if (NeedsWinCFI)
2233 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2234 } else {
2235 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2236 if (RPI.isPaired()) {
2237 MIB.addReg(Reg2, getDefRegState(true));
2239 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2240 MachineMemOperand::MOLoad, Size, Alignment));
2241 }
2242 MIB.addReg(Reg1, getDefRegState(true));
2243 MIB.addReg(AArch64::SP)
2244 .addImm(RPI.Offset) // [sp, #offset*vscale]
2245 // where factor*vscale is implicit
2248 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2249 MachineMemOperand::MOLoad, Size, Alignment));
2250 if (NeedsWinCFI)
2251 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2252 }
2253 }
2254 return true;
2255}
2256
2257 // Return the FrameID for an MMO.
2258static std::optional<int> getMMOFrameID(MachineMemOperand *MMO,
2259 const MachineFrameInfo &MFI) {
2260 auto *PSV =
2262 if (PSV)
2263 return std::optional<int>(PSV->getFrameIndex());
2264
2265 if (MMO->getValue()) {
2266 if (auto *Al = dyn_cast<AllocaInst>(getUnderlyingObject(MMO->getValue()))) {
2267 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd();
2268 FI++)
2269 if (MFI.getObjectAllocation(FI) == Al)
2270 return FI;
2271 }
2272 }
2273
2274 return std::nullopt;
2275}
2276
2277// Return the FrameID for a Load/Store instruction by looking at the first MMO.
2278static std::optional<int> getLdStFrameID(const MachineInstr &MI,
2279 const MachineFrameInfo &MFI) {
2280 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
2281 return std::nullopt;
2282
2283 return getMMOFrameID(*MI.memoperands_begin(), MFI);
2284}
2285
2286// Returns true if the LDST MachineInstr \p MI is a PPR access.
2287static bool isPPRAccess(const MachineInstr &MI) {
2288 return AArch64::PPRRegClass.contains(MI.getOperand(0).getReg());
2289}
2290
2291// Check if a Hazard slot is needed for the current function, and if so create
2292// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
2293// which can be used to determine if any hazard padding is needed.
2294void AArch64FrameLowering::determineStackHazardSlot(
2295 MachineFunction &MF, BitVector &SavedRegs) const {
2296 unsigned StackHazardSize = getStackHazardSize(MF);
2297 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2298 if (StackHazardSize == 0 || StackHazardSize % 16 != 0 ||
2299 AFI->hasStackHazardSlotIndex())
2300 return;
2301
2302 // Stack hazards are only needed in streaming functions.
2303 SMEAttrs Attrs = AFI->getSMEFnAttrs();
2304 if (!StackHazardInNonStreaming && Attrs.hasNonStreamingInterfaceAndBody())
2305 return;
2306
2307 MachineFrameInfo &MFI = MF.getFrameInfo();
2308
2309 // Add a hazard slot if there are any CSR FPR registers, or if there are any
2310 // FP-only stack objects.
2311 bool HasFPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2312 return AArch64::FPR64RegClass.contains(Reg) ||
2313 AArch64::FPR128RegClass.contains(Reg) ||
2314 AArch64::ZPRRegClass.contains(Reg);
2315 });
2316 bool HasPPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2317 return AArch64::PPRRegClass.contains(Reg);
2318 });
2319 bool HasFPRStackObjects = false;
2320 bool HasPPRStackObjects = false;
2321 if (!HasFPRCSRs || SplitSVEObjects) {
2322 enum SlotType : uint8_t {
2323 Unknown = 0,
2324 ZPRorFPR = 1 << 0,
2325 PPR = 1 << 1,
2326 GPR = 1 << 2,
2328 };
2329
2330 // Find stack slots solely used for one kind of register (ZPR, PPR, etc.),
2331 // based on the kinds of accesses used in the function.
2332 SmallVector<SlotType> SlotTypes(MFI.getObjectIndexEnd(), SlotType::Unknown);
2333 for (auto &MBB : MF) {
2334 for (auto &MI : MBB) {
2335 std::optional<int> FI = getLdStFrameID(MI, MFI);
2336 if (!FI || FI < 0 || FI > int(SlotTypes.size()))
2337 continue;
2338 if (MFI.hasScalableStackID(*FI)) {
2339 SlotTypes[*FI] |=
2340 isPPRAccess(MI) ? SlotType::PPR : SlotType::ZPRorFPR;
2341 } else {
2342 SlotTypes[*FI] |= AArch64InstrInfo::isFpOrNEON(MI)
2343 ? SlotType::ZPRorFPR
2344 : SlotType::GPR;
2345 }
2346 }
2347 }
2348
2349 for (int FI = 0; FI < int(SlotTypes.size()); ++FI) {
2350 HasFPRStackObjects |= SlotTypes[FI] == SlotType::ZPRorFPR;
2351 // For SplitSVEObjects remember that this stack slot is a predicate, this
2352 // will be needed later when determining the frame layout.
2353 if (SlotTypes[FI] == SlotType::PPR) {
2355 HasPPRStackObjects = true;
2356 }
2357 }
2358 }
2359
2360 if (HasFPRCSRs || HasFPRStackObjects) {
2361 int ID = MFI.CreateStackObject(StackHazardSize, Align(16), false);
2362 LLVM_DEBUG(dbgs() << "Created Hazard slot at " << ID << " size "
2363 << StackHazardSize << "\n");
2364 AFI->setStackHazardSlotIndex(ID);
2365 }
2366
2367 // Determine if we should use SplitSVEObjects. This should only be used if
2368 // there's a possibility of a stack hazard between PPRs and ZPRs or FPRs.
2369 if (SplitSVEObjects) {
2370 if (!HasPPRCSRs && !HasPPRStackObjects) {
2371 LLVM_DEBUG(
2372 dbgs() << "Not using SplitSVEObjects as no PPRs are on the stack\n");
2373 return;
2374 }
2375
2376 if (!HasFPRCSRs && !HasFPRStackObjects) {
2377 LLVM_DEBUG(
2378 dbgs()
2379 << "Not using SplitSVEObjects as no FPRs or ZPRs are on the stack\n");
2380 return;
2381 }
2382
2383 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
2384 if (MFI.hasVarSizedObjects() || TRI->hasStackRealignment(MF)) {
2385 LLVM_DEBUG(dbgs() << "SplitSVEObjects is not supported with variable "
2386 "sized objects or realignment\n");
2387 return;
2388 }
2389
2390 // If another calling convention is explicitly set, FPRs can't be promoted to
2391 // ZPR callee-saves.
2394 MF.getFunction().getCallingConv())) {
2395 LLVM_DEBUG(
2396 dbgs() << "Calling convention is not supported with SplitSVEObjects");
2397 return;
2398 }
2399
2400 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2401 MF.getSubtarget<AArch64Subtarget>();
2403 "Expected SVE to be available for PPRs");
2404
2405 // With SplitSVEObjects the CS hazard padding is placed between the
2406 // PPRs and ZPRs. If there were any FPR CSRs, there would be a hazard between
2407 // them and the GPR CSRs. Avoid this by promoting all FPR CSRs to ZPRs.
2408 BitVector FPRZRegs(SavedRegs.size());
2409 for (size_t Reg = 0, E = SavedRegs.size(); HasFPRCSRs && Reg < E; ++Reg) {
2410 BitVector::reference RegBit = SavedRegs[Reg];
2411 if (!RegBit)
2412 continue;
2413 unsigned SubRegIdx = 0;
2414 if (AArch64::FPR64RegClass.contains(Reg))
2415 SubRegIdx = AArch64::dsub;
2416 else if (AArch64::FPR128RegClass.contains(Reg))
2417 SubRegIdx = AArch64::zsub;
2418 else
2419 continue;
2420 // Clear the bit for the FPR save.
2421 RegBit = false;
2422 // Mark that we should save the corresponding ZPR.
2423 Register ZReg =
2424 TRI->getMatchingSuperReg(Reg, SubRegIdx, &AArch64::ZPRRegClass);
2425 FPRZRegs.set(ZReg);
2426 }
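// E.g. a saved d8 is re-marked as z8 (via dsub) and a saved q9 as z9 (via
// zsub), so the FPR contents are preserved in full ZPR callee-save slots.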
2427 SavedRegs |= FPRZRegs;
2428
2429 AFI->setSplitSVEObjects(true);
2430 LLVM_DEBUG(dbgs() << "SplitSVEObjects enabled!\n");
2431 }
2432}
2433
2434 void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
2435 BitVector &SavedRegs,
2436 RegScavenger *RS) const {
2437 // All calls are tail calls in GHC calling conv, and functions have no
2438 // prologue/epilogue.
2439 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
2440 return;
2441
2442 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2443
2445 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
2447 unsigned UnspilledCSGPR = AArch64::NoRegister;
2448 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
2449
2450 MachineFrameInfo &MFI = MF.getFrameInfo();
2451 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
2452
2453 MCRegister BasePointerReg =
2454 RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister() : MCRegister();
2455
2456 unsigned ExtraCSSpill = 0;
2457 bool HasUnpairedGPR64 = false;
2458 bool HasPairZReg = false;
2459 BitVector UserReservedRegs = RegInfo->getUserReservedRegs(MF);
2460 BitVector ReservedRegs = RegInfo->getReservedRegs(MF);
2461
2462 // Figure out which callee-saved registers to save/restore.
2463 for (unsigned i = 0; CSRegs[i]; ++i) {
2464 const MCRegister Reg = CSRegs[i];
2465
2466 // Add the base pointer register to SavedRegs if it is callee-save.
2467 if (Reg == BasePointerReg)
2468 SavedRegs.set(Reg);
2469
2470 // Don't save manually reserved registers set through +reserve-x#i,
2471 // even for callee-saved registers, as per GCC's behavior.
2472 if (UserReservedRegs[Reg]) {
2473 SavedRegs.reset(Reg);
2474 continue;
2475 }
2476
2477 bool RegUsed = SavedRegs.test(Reg);
2478 MCRegister PairedReg;
2479 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
2480 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
2481 AArch64::FPR128RegClass.contains(Reg)) {
2482 // Compensate for an odd number of GP CSRs.
2483 // For now, all known cases of an odd number of CSRs involve only GPRs.
2484 if (HasUnpairedGPR64)
2485 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
2486 else
2487 PairedReg = CSRegs[i ^ 1];
2488 }
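// Note: CSRegs[i ^ 1] selects the other element of the natural pair: the
// next entry for an even index, the previous entry for an odd index.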
2489
2490 // If the function requires saving all the GP registers (SavedRegs),
2491 // and there is an odd number of GP CSRs at the same time (CSRegs),
2492 // PairedReg could be in a different register class from Reg, which would
2493 // lead to an FPR (usually D8) accidentally being marked as saved.
2494 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
2495 PairedReg = AArch64::NoRegister;
2496 HasUnpairedGPR64 = true;
2497 }
2498 assert(PairedReg == AArch64::NoRegister ||
2499 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
2500 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
2501 AArch64::FPR128RegClass.contains(Reg, PairedReg));
2502
2503 if (!RegUsed) {
2504 if (AArch64::GPR64RegClass.contains(Reg) && !ReservedRegs[Reg]) {
2505 UnspilledCSGPR = Reg;
2506 UnspilledCSGPRPaired = PairedReg;
2507 }
2508 continue;
2509 }
2510
2511 // MachO's compact unwind format relies on all registers being stored in
2512 // pairs.
2513 // FIXME: the usual format is actually better if unwinding isn't needed.
2514 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
2515 !SavedRegs.test(PairedReg)) {
2516 SavedRegs.set(PairedReg);
2517 if (AArch64::GPR64RegClass.contains(PairedReg) &&
2518 !ReservedRegs[PairedReg])
2519 ExtraCSSpill = PairedReg;
2520 }
2521 // Check if there is a pair of ZRegs, so a predicate register can be selected for spill/fill.
2522 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
2523 SavedRegs.test(CSRegs[i ^ 1]));
2524 }
2525
2526 if (HasPairZReg && enableMultiVectorSpillFill(Subtarget, MF)) {
2528 // Find a suitable predicate register for the multi-vector spill/fill
2529 // instructions.
2530 MCRegister PnReg = findFreePredicateReg(SavedRegs);
2531 if (PnReg.isValid())
2532 AFI->setPredicateRegForFillSpill(PnReg);
2533 // If no free callee-saved predicate register has been found, assign one.
2534 if (!AFI->getPredicateRegForFillSpill() &&
2535 MF.getFunction().getCallingConv() ==
2536 CallingConv::AArch64_SVE_VectorCall) {
2537 SavedRegs.set(AArch64::P8);
2538 AFI->setPredicateRegForFillSpill(AArch64::PN8);
2539 }
2540
2541 assert(!ReservedRegs[AFI->getPredicateRegForFillSpill()] &&
2542 "Predicate cannot be a reserved register");
2543 }
2544
2545 if (MF.getFunction().getCallingConv() == CallingConv::Win64 &&
2546 !Subtarget.isTargetWindows()) {
2547 // For the Windows calling convention on a non-Windows OS, where X18 is treated
2548 // as reserved, back up X18 when entering non-windows code (marked with the
2549 // Windows calling convention) and restore when returning regardless of
2550 // whether the individual function uses it - it might call other functions
2551 // that clobber it.
2552 SavedRegs.set(AArch64::X18);
2553 }
2554
2555 // Determine if a Hazard slot should be used and where it should go.
2556 // If SplitSVEObjects is used, the hazard padding is placed between the PPRs
2557 // and ZPRs. Otherwise, it goes in the callee save area.
2558 determineStackHazardSlot(MF, SavedRegs);
2559
2560 // Calculate the callee-saved stack size.
2561 unsigned CSStackSize = 0;
2562 unsigned ZPRCSStackSize = 0;
2563 unsigned PPRCSStackSize = 0;
2565 for (unsigned Reg : SavedRegs.set_bits()) {
2566 auto *RC = TRI->getMinimalPhysRegClass(MCRegister(Reg));
2567 assert(RC && "expected register class!");
2568 auto SpillSize = TRI->getSpillSize(*RC);
2569 bool IsZPR = AArch64::ZPRRegClass.contains(Reg);
2570 bool IsPPR = !IsZPR && AArch64::PPRRegClass.contains(Reg);
2571 if (IsZPR)
2572 ZPRCSStackSize += SpillSize;
2573 else if (IsPPR)
2574 PPRCSStackSize += SpillSize;
2575 else
2576 CSStackSize += SpillSize;
2577 }
2578
2579 // Save number of saved regs, so we can easily update CSStackSize later to
2580 // account for any additional 64-bit GPR saves. Note: After this point
2581 // only 64-bit GPRs can be added to SavedRegs.
2582 unsigned NumSavedRegs = SavedRegs.count();
2583
2584 // If we have hazard padding in the CS area add that to the size.
2586 CSStackSize += getStackHazardSize(MF);
2587
2588 // Increase the callee-saved stack size if the function has streaming mode
2589 // changes, as we will need to spill the value of the VG register.
2590 if (requiresSaveVG(MF))
2591 CSStackSize += 8;
2592
2593 // If we must call __arm_get_current_vg in the prologue preserve the LR.
2594 if (requiresSaveVG(MF) && !Subtarget.hasSVE())
2595 SavedRegs.set(AArch64::LR);
2596
2597 // The frame record needs to be created by saving the appropriate registers
2598 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
2599 if (hasFP(MF) ||
2600 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
2601 SavedRegs.set(AArch64::FP);
2602 SavedRegs.set(AArch64::LR);
2603 }
2604
2605 LLVM_DEBUG({
2606 dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
2607 for (unsigned Reg : SavedRegs.set_bits())
2608 dbgs() << ' ' << printReg(MCRegister(Reg), RegInfo);
2609 dbgs() << "\n";
2610 });
2611
2612 // If any callee-saved registers are used, the frame cannot be eliminated.
2613 auto [ZPRLocalStackSize, PPRLocalStackSize] =
2615 uint64_t SVELocals = ZPRLocalStackSize + PPRLocalStackSize;
2616 uint64_t SVEStackSize =
2617 alignTo(ZPRCSStackSize + PPRCSStackSize + SVELocals, 16);
2618 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
2619
2620 // The CSR spill slots have not been allocated yet, so estimateStackSize
2621 // won't include them.
2622 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
2623
2624 // We may address some of the stack above the canonical frame address, either
2625 // for our own arguments or during a call. Include that in calculating whether
2626 // we have complicated addressing concerns.
2627 int64_t CalleeStackUsed = 0;
2628 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
2629 int64_t FixedOff = MFI.getObjectOffset(I);
2630 if (FixedOff > CalleeStackUsed)
2631 CalleeStackUsed = FixedOff;
2632 }
2633
2634 // Conservatively always assume BigStack when there are SVE spills.
2635 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
2636 CalleeStackUsed) > EstimatedStackSizeLimit;
2637 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
2638 AFI->setHasStackFrame(true);
2639
2640 // Estimate if we might need to scavenge a register at some point in order
2641 // to materialize a stack offset. If so, either spill one additional
2642 // callee-saved register or reserve a special spill slot to facilitate
2643 // register scavenging. If we already spilled an extra callee-saved register
2644 // above to keep the number of spills even, we don't need to do anything else
2645 // here.
2646 if (BigStack) {
2647 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
2648 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
2649 << " to get a scratch register.\n");
2650 SavedRegs.set(UnspilledCSGPR);
2651 ExtraCSSpill = UnspilledCSGPR;
2652
2653 // MachO's compact unwind format relies on all registers being stored in
2654 // pairs, so if we need to spill one extra for BigStack, then we need to
2655 // store the pair.
2656 if (producePairRegisters(MF)) {
2657 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
2658 // Failed to make a pair for compact unwind format, revert spilling.
2659 if (produceCompactUnwindFrame(*this, MF)) {
2660 SavedRegs.reset(UnspilledCSGPR);
2661 ExtraCSSpill = AArch64::NoRegister;
2662 }
2663 } else
2664 SavedRegs.set(UnspilledCSGPRPaired);
2665 }
2666 }
2667
2668 // If we didn't find an extra callee-saved register to spill, create
2669 // an emergency spill slot.
2670 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
2672 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
2673 unsigned Size = TRI->getSpillSize(RC);
2674 Align Alignment = TRI->getSpillAlign(RC);
2675 int FI = MFI.CreateSpillStackObject(Size, Alignment);
2676 RS->addScavengingFrameIndex(FI);
2677 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
2678 << " as the emergency spill slot.\n");
2679 }
2680 }
2681
2682 // Add the size of any additional 64-bit GPR saves.
2683 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
2684
2685 // A Swift asynchronous context extends the frame record with a pointer
2686 // directly before FP.
2687 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
2688 CSStackSize += 8;
2689
2690 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
2691 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
2692 << EstimatedStackSize + AlignedCSStackSize << " bytes.\n");
2693
2695 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
2696 "Should not invalidate callee saved info");
2697
2698 // Round up to register pair alignment to avoid additional SP adjustment
2699 // instructions.
2700 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
2701 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
2702 AFI->setSVECalleeSavedStackSize(ZPRCSStackSize, alignTo(PPRCSStackSize, 16));
2703}
2704
2705 bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
2706 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
2707 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
2708 unsigned &MaxCSFrameIndex) const {
2709 bool NeedsWinCFI = needsWinCFI(MF);
2710 unsigned StackHazardSize = getStackHazardSize(MF);
2711 // To match the canonical windows frame layout, reverse the list of
2712 // callee saved registers to get them laid out by PrologEpilogInserter
2713 // in the right order. (PrologEpilogInserter allocates stack objects top
2714 // down. Windows canonical prologs store higher numbered registers at
2715 // the top, thus have the CSI array start from the highest registers.)
2716 if (NeedsWinCFI)
2717 std::reverse(CSI.begin(), CSI.end());
2718
2719 if (CSI.empty())
2720 return true; // Early exit if no callee saved registers are modified!
2721
2722 // Now that we know which registers need to be saved and restored, allocate
2723 // stack slots for them.
2724 MachineFrameInfo &MFI = MF.getFrameInfo();
2725 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2726
2727 bool UsesWinAAPCS = isTargetWindows(MF);
2728 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2729 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
2730 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2731 if ((unsigned)FrameIdx < MinCSFrameIndex)
2732 MinCSFrameIndex = FrameIdx;
2733 if ((unsigned)FrameIdx > MaxCSFrameIndex)
2734 MaxCSFrameIndex = FrameIdx;
2735 }
2736
2737 // Insert VG into the list of CSRs, immediately before LR if saved.
2738 if (requiresSaveVG(MF)) {
2739 CalleeSavedInfo VGInfo(AArch64::VG);
2740 auto It =
2741 find_if(CSI, [](auto &Info) { return Info.getReg() == AArch64::LR; });
2742 if (It != CSI.end())
2743 CSI.insert(It, VGInfo);
2744 else
2745 CSI.push_back(VGInfo);
2746 }
2747
2748 Register LastReg = 0;
2749 int HazardSlotIndex = std::numeric_limits<int>::max();
2750 for (auto &CS : CSI) {
2751 MCRegister Reg = CS.getReg();
2752 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
2753
2754 // Create a hazard slot as we switch between GPR and FPR CSRs.
2755 if (AFI->hasStackHazardSlotIndex() &&
2756 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
2757 AArch64InstrInfo::isFpOrNEON(Reg)) {
2758 assert(HazardSlotIndex == std::numeric_limits<int>::max() &&
2759 "Unexpected register order for hazard slot");
2760 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2761 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2762 << "\n");
2763 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2764 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
2765 MinCSFrameIndex = HazardSlotIndex;
2766 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
2767 MaxCSFrameIndex = HazardSlotIndex;
2768 }
2769
2770 unsigned Size = RegInfo->getSpillSize(*RC);
2771 Align Alignment(RegInfo->getSpillAlign(*RC));
2772 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
2773 CS.setFrameIdx(FrameIdx);
2774
2775 if ((unsigned)FrameIdx < MinCSFrameIndex)
2776 MinCSFrameIndex = FrameIdx;
2777 if ((unsigned)FrameIdx > MaxCSFrameIndex)
2778 MaxCSFrameIndex = FrameIdx;
2779
2780 // Grab 8 bytes below FP for the extended asynchronous frame info.
2781 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
2782 Reg == AArch64::FP) {
2783 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
2784 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2785 if ((unsigned)FrameIdx < MinCSFrameIndex)
2786 MinCSFrameIndex = FrameIdx;
2787 if ((unsigned)FrameIdx > MaxCSFrameIndex)
2788 MaxCSFrameIndex = FrameIdx;
2789 }
2790 LastReg = Reg;
2791 }
2792
2793 // Add hazard slot in the case where no FPR CSRs are present.
2795 HazardSlotIndex == std::numeric_limits<int>::max()) {
2796 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2797 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2798 << "\n");
2799 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2800 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
2801 MinCSFrameIndex = HazardSlotIndex;
2802 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
2803 MaxCSFrameIndex = HazardSlotIndex;
2804 }
2805
2806 return true;
2807}
2808
2810 const MachineFunction &MF) const {
2812 // If the function has streaming-mode changes, don't scavenge a
2813 // spill slot in the callee-save area, as that might require an
2814 // 'addvl' in the streaming-mode-changing call sequence when the
2815 // function doesn't use an FP.
2816 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
2817 return false;
2818 // Don't allow register scavenging with hazard slots, in case it moves objects
2819 // into the wrong place.
2820 if (AFI->hasStackHazardSlotIndex())
2821 return false;
2822 return AFI->hasCalleeSaveStackFreeSpace();
2823}
2824
2825 /// Returns true if there are any SVE callee saves.
2826 static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI,
2827 int &Min, int &Max) {
2828 Min = std::numeric_limits<int>::max();
2829 Max = std::numeric_limits<int>::min();
2830
2831 if (!MFI.isCalleeSavedInfoValid())
2832 return false;
2833
2834 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
2835 for (auto &CS : CSI) {
2836 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
2837 AArch64::PPRRegClass.contains(CS.getReg())) {
2838 assert((Max == std::numeric_limits<int>::min() ||
2839 Max + 1 == CS.getFrameIdx()) &&
2840 "SVE CalleeSaves are not consecutive");
2841 Min = std::min(Min, CS.getFrameIdx());
2842 Max = std::max(Max, CS.getFrameIdx());
2843 }
2844 }
2845 return Min != std::numeric_limits<int>::max();
2846}
2847
2849 AssignObjectOffsets AssignOffsets) {
2850 MachineFrameInfo &MFI = MF.getFrameInfo();
2851 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2852
2853 SVEStackSizes SVEStack{};
2854
2855 // With SplitSVEObjects we maintain separate stack offsets for predicates
2856 // (PPRs) and SVE vectors (ZPRs). When SplitSVEObjects is disabled predicates
2857 // are included in the SVE vector area.
2858 uint64_t &ZPRStackTop = SVEStack.ZPRStackSize;
2859 uint64_t &PPRStackTop =
2860 AFI->hasSplitSVEObjects() ? SVEStack.PPRStackSize : SVEStack.ZPRStackSize;
2861
2862#ifndef NDEBUG
2863 // First process all fixed stack objects.
2864 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
2865 assert(!MFI.hasScalableStackID(I) &&
2866 "SVE vectors should never be passed on the stack by value, only by "
2867 "reference.");
2868#endif
2869
2870 auto AllocateObject = [&](int FI) {
2872 ? ZPRStackTop
2873 : PPRStackTop;
2874
2875 // FIXME: Given that the length of SVE vectors is not necessarily a power of
2876 // two, we'd need to align every object dynamically at runtime if the
2877 // alignment is larger than 16. This is not yet supported.
2878 Align Alignment = MFI.getObjectAlign(FI);
2879 if (Alignment > Align(16))
2881 "Alignment of scalable vectors > 16 bytes is not yet supported");
2882
2883 StackTop += MFI.getObjectSize(FI);
2884 StackTop = alignTo(StackTop, Alignment);
2885
2886 assert(StackTop < (uint64_t)std::numeric_limits<int64_t>::max() &&
2887 "SVE StackTop far too large?!");
2888
2889 int64_t Offset = -int64_t(StackTop);
2890 if (AssignOffsets == AssignObjectOffsets::Yes)
2891 MFI.setObjectOffset(FI, Offset);
2892
2893 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
2894 };
2895
2896 // Then process all callee saved slots.
2897 int MinCSFrameIndex, MaxCSFrameIndex;
2898 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
2899 for (int FI = MinCSFrameIndex; FI <= MaxCSFrameIndex; ++FI)
2900 AllocateObject(FI);
2901 }
2902
2903 // Ensure the CS area is 16-byte aligned.
2904 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2905 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2906
2907 // Create a buffer of SVE objects to allocate and sort it.
2908 SmallVector<int, 8> ObjectsToAllocate;
2909 // If we have a stack protector, and we've previously decided that we have SVE
2910 // objects on the stack and thus need it to go in the SVE stack area, then it
2911 // needs to go first.
2912 int StackProtectorFI = -1;
2913 if (MFI.hasStackProtectorIndex()) {
2914 StackProtectorFI = MFI.getStackProtectorIndex();
2915 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
2916 ObjectsToAllocate.push_back(StackProtectorFI);
2917 }
2918
2919 for (int FI = 0, E = MFI.getObjectIndexEnd(); FI != E; ++FI) {
2920 if (FI == StackProtectorFI || MFI.isDeadObjectIndex(FI))
2921 continue;
2922 if (MaxCSFrameIndex >= FI && FI >= MinCSFrameIndex)
2923 continue;
2924
2927 continue;
2928
2929 ObjectsToAllocate.push_back(FI);
2930 }
2931
2932 // Allocate all SVE locals and spills
2933 for (unsigned FI : ObjectsToAllocate)
2934 AllocateObject(FI);
2935
2936 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2937 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2938
2939 if (AssignOffsets == AssignObjectOffsets::Yes)
2940 AFI->setStackSizeSVE(SVEStack.ZPRStackSize, SVEStack.PPRStackSize);
2941
2942 return SVEStack;
2943}
2944
2945 void AArch64FrameLowering::processFunctionBeforeFrameFinalized(
2946 MachineFunction &MF, RegScavenger *RS) const {
2947 assert(getStackGrowthDirection() == TargetFrameLowering::StackGrowsDown &&
2948 "Upwards growing stack unsupported");
2949
2951
2952 // If this function isn't doing Win64-style C++ EH, we don't need to do
2953 // anything.
2954 if (!MF.hasEHFunclets())
2955 return;
2956
2957 MachineFrameInfo &MFI = MF.getFrameInfo();
2958 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2959
2960 // Win64 C++ EH needs to allocate space for the catch objects in the fixed
2961 // object area right next to the UnwindHelp object.
2962 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
2963 int64_t CurrentOffset =
2965 for (WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
2966 for (WinEHHandlerType &H : TBME.HandlerArray) {
2967 int FrameIndex = H.CatchObj.FrameIndex;
2968 if ((FrameIndex != INT_MAX) && MFI.getObjectOffset(FrameIndex) == 0) {
2969 CurrentOffset =
2970 alignTo(CurrentOffset, MFI.getObjectAlign(FrameIndex).value());
2971 CurrentOffset += MFI.getObjectSize(FrameIndex);
2972 MFI.setObjectOffset(FrameIndex, -CurrentOffset);
2973 }
2974 }
2975 }
2976
2977 // Create an UnwindHelp object.
2978 // The UnwindHelp object is allocated at the start of the fixed object area
2979 int64_t UnwindHelpOffset = alignTo(CurrentOffset + 8, Align(16));
2980 assert(UnwindHelpOffset == getFixedObjectSize(MF, AFI, /*IsWin64*/ true,
2981 /*IsFunclet*/ false) &&
2982 "UnwindHelpOffset must be at the start of the fixed object area");
2983 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8, -UnwindHelpOffset,
2984 /*IsImmutable=*/false);
2985 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
2986
2987 MachineBasicBlock &MBB = MF.front();
2988 auto MBBI = MBB.begin();
2989 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
2990 ++MBBI;
2991
2992 // We need to store -2 into the UnwindHelp object at the start of the
2993 // function.
2994 DebugLoc DL;
2995 RS->enterBasicBlockEnd(MBB);
2996 RS->backward(MBBI);
2997 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
2998 assert(DstReg && "There must be a free register after frame setup");
3000 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
3001 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
3002 .addReg(DstReg, getKillRegState(true))
3003 .addFrameIndex(UnwindHelpFI)
3004 .addImm(0);
3005}
3006
3007namespace {
3008struct TagStoreInstr {
3009 MachineInstr *MI;
3010 int64_t Offset, Size;
3011 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
3012 : MI(MI), Offset(Offset), Size(Size) {}
3013};
3014
3015class TagStoreEdit {
3016 MachineFunction *MF;
3017 MachineBasicBlock *MBB;
3018 MachineRegisterInfo *MRI;
3019 // Tag store instructions that are being replaced.
3021 // Combined memref arguments of the above instructions.
3023
3024 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
3025 // FrameRegOffset + Size) with the address tag of SP.
3026 Register FrameReg;
3027 StackOffset FrameRegOffset;
3028 int64_t Size;
3029 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
3030 // end.
3031 std::optional<int64_t> FrameRegUpdate;
3032 // MIFlags for any FrameReg updating instructions.
3033 unsigned FrameRegUpdateFlags;
3034
3035 // Use zeroing instruction variants.
3036 bool ZeroData;
3037 DebugLoc DL;
3038
3039 void emitUnrolled(MachineBasicBlock::iterator InsertI);
3040 void emitLoop(MachineBasicBlock::iterator InsertI);
3041
3042public:
3043 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
3044 : MBB(MBB), ZeroData(ZeroData) {
3045 MF = MBB->getParent();
3046 MRI = &MF->getRegInfo();
3047 }
3048 // Add an instruction to be replaced. Instructions must be added in
3049 // ascending order of Offset and must be adjacent.
3050 void addInstruction(TagStoreInstr I) {
3051 assert((TagStores.empty() ||
3052 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
3053 "Non-adjacent tag store instructions.");
3054 TagStores.push_back(I);
3055 }
3056 void clear() { TagStores.clear(); }
3057 // Emit equivalent code at the given location, and erase the current set of
3058 // instructions. May skip if the replacement is not profitable. May invalidate
3059 // the input iterator and replace it with a valid one.
3060 void emitCode(MachineBasicBlock::iterator &InsertI,
3061 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
3062};
3063
3064void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
3065 const AArch64InstrInfo *TII =
3066 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3067
3068 const int64_t kMinOffset = -256 * 16;
3069 const int64_t kMaxOffset = 255 * 16;
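// These bounds come from the signed 9-bit immediate of STG/ST2G, which is
// scaled by 16 bytes, giving offsets in [-4096, 4080].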
3070
3071 Register BaseReg = FrameReg;
3072 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
3073 if (BaseRegOffsetBytes < kMinOffset ||
3074 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
3075 // BaseReg can be FP, which is not necessarily aligned to 16 bytes. In
3076 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
3077 // is required for the offset of ST2G.
3078 BaseRegOffsetBytes % 16 != 0) {
3079 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3080 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
3081 StackOffset::getFixed(BaseRegOffsetBytes), TII);
3082 BaseReg = ScratchReg;
3083 BaseRegOffsetBytes = 0;
3084 }
3085
3086 MachineInstr *LastI = nullptr;
3087 while (Size) {
3088 int64_t InstrSize = (Size > 16) ? 32 : 16;
3089 unsigned Opcode =
3090 InstrSize == 16
3091 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
3092 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
3093 assert(BaseRegOffsetBytes % 16 == 0);
3094 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
3095 .addReg(AArch64::SP)
3096 .addReg(BaseReg)
3097 .addImm(BaseRegOffsetBytes / 16)
3098 .setMemRefs(CombinedMemRefs);
3099 // A store to [BaseReg, #0] should go last for an opportunity to fold the
3100 // final SP adjustment in the epilogue.
3101 if (BaseRegOffsetBytes == 0)
3102 LastI = I;
3103 BaseRegOffsetBytes += InstrSize;
3104 Size -= InstrSize;
3105 }
3106
3107 if (LastI)
3108 MBB->splice(InsertI, MBB, LastI);
3109}
3110
3111void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
3112 const AArch64InstrInfo *TII =
3113 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3114
3115 Register BaseReg = FrameRegUpdate
3116 ? FrameReg
3117 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3118 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3119
3120 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
3121
3122 int64_t LoopSize = Size;
3123 // If the loop size is not a multiple of 32, split off one 16-byte store at
3124 // the end to fold the BaseReg update into.
3125 if (FrameRegUpdate && *FrameRegUpdate)
3126 LoopSize -= LoopSize % 32;
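// E.g. for Size = 112 with a pending frame-register update, LoopSize becomes
// 96 and the remaining 16 bytes are tagged by the post-indexed STG emitted
// below (illustrative numbers).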
3127 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
3128 TII->get(ZeroData ? AArch64::STZGloop_wback
3129 : AArch64::STGloop_wback))
3130 .addDef(SizeReg)
3131 .addDef(BaseReg)
3132 .addImm(LoopSize)
3133 .addReg(BaseReg)
3134 .setMemRefs(CombinedMemRefs);
3135 if (FrameRegUpdate)
3136 LoopI->setFlags(FrameRegUpdateFlags);
3137
3138 int64_t ExtraBaseRegUpdate =
3139 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
3140 LLVM_DEBUG(dbgs() << "TagStoreEdit::emitLoop: LoopSize=" << LoopSize
3141 << ", Size=" << Size
3142 << ", ExtraBaseRegUpdate=" << ExtraBaseRegUpdate
3143 << ", FrameRegUpdate=" << FrameRegUpdate
3144 << ", FrameRegOffset.getFixed()="
3145 << FrameRegOffset.getFixed() << "\n");
3146 if (LoopSize < Size) {
3147 assert(FrameRegUpdate);
3148 assert(Size - LoopSize == 16);
3149 // Tag 16 more bytes at BaseReg and update BaseReg.
3150 int64_t STGOffset = ExtraBaseRegUpdate + 16;
3151 assert(STGOffset % 16 == 0 && STGOffset >= -4096 && STGOffset <= 4080 &&
3152 "STG immediate out of range");
3153 BuildMI(*MBB, InsertI, DL,
3154 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
3155 .addDef(BaseReg)
3156 .addReg(BaseReg)
3157 .addReg(BaseReg)
3158 .addImm(STGOffset / 16)
3159 .setMemRefs(CombinedMemRefs)
3160 .setMIFlags(FrameRegUpdateFlags);
3161 } else if (ExtraBaseRegUpdate) {
3162 // Update BaseReg.
3163 int64_t AddSubOffset = std::abs(ExtraBaseRegUpdate);
3164 assert(AddSubOffset <= 4095 && "ADD/SUB immediate out of range");
3165 BuildMI(
3166 *MBB, InsertI, DL,
3167 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
3168 .addDef(BaseReg)
3169 .addReg(BaseReg)
3170 .addImm(AddSubOffset)
3171 .addImm(0)
3172 .setMIFlags(FrameRegUpdateFlags);
3173 }
3174}
3175
3176 // Check if *II is a register update that can be merged into the STGloop that
3177 // ends at (Reg + Size). *TotalOffset is set to the adjustment to Reg that the
3178 // merged update represents.
3179bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
3180 int64_t Size, int64_t *TotalOffset) {
3181 MachineInstr &MI = *II;
3182 if ((MI.getOpcode() == AArch64::ADDXri ||
3183 MI.getOpcode() == AArch64::SUBXri) &&
3184 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
3185 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
3186 int64_t Offset = MI.getOperand(2).getImm() << Shift;
3187 if (MI.getOpcode() == AArch64::SUBXri)
3188 Offset = -Offset;
3189 int64_t PostOffset = Offset - Size;
3190 // TagStoreEdit::emitLoop might emit either an ADD/SUB after the loop, or
3191 // an STGPostIndex which does the last 16 bytes of tag write. Which one is
3192 // chosen depends on the alignment of the loop size, but the difference
3193 // between the valid ranges for the two instructions is small, so we
3194 // conservatively assume that it could be either case here.
3195 //
3196 // Max offset of STGPostIndex, minus the 16 byte tag write folded into that
3197 // instruction.
3198 const int64_t kMaxOffset = 4080 - 16;
3199 // Max offset of SUBXri.
3200 const int64_t kMinOffset = -4095;
3201 if (PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
3202 PostOffset % 16 == 0) {
3203 *TotalOffset = Offset;
3204 return true;
3205 }
3206 }
3207 return false;
3208}
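// Worked example with illustrative numbers: if the tag loop ends at
// Reg + 256 (Size == 256) and is followed by "add Reg, Reg, #272", then
// Offset == 272 and PostOffset == 272 - 256 == 16, which is 16-byte aligned
// and inside [kMinOffset, kMaxOffset], so the ADD can be merged and
// *TotalOffset is set to 272.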
3209
3210void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
3211 SmallVectorImpl<MachineMemOperand *> &MemRefs) {
3212 MemRefs.clear();
3213 for (auto &TS : TSE) {
3214 MachineInstr *MI = TS.MI;
3215 // An instruction without memory operands may access anything. Be
3216 // conservative and return an empty list.
3217 if (MI->memoperands_empty()) {
3218 MemRefs.clear();
3219 return;
3220 }
3221 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
3222 }
3223}
3224
3225void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
3226 const AArch64FrameLowering *TFI,
3227 bool TryMergeSPUpdate) {
3228 if (TagStores.empty())
3229 return;
3230 TagStoreInstr &FirstTagStore = TagStores[0];
3231 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
3232 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
3233 DL = TagStores[0].MI->getDebugLoc();
3234
3235 Register Reg;
3236 FrameRegOffset = TFI->resolveFrameOffsetReference(
3237 *MF, FirstTagStore.Offset, false /*isFixed*/,
3238 TargetStackID::Default /*StackID*/, Reg,
3239 /*PreferFP=*/false, /*ForSimm=*/true);
3240 FrameReg = Reg;
3241 FrameRegUpdate = std::nullopt;
3242
3243 mergeMemRefs(TagStores, CombinedMemRefs);
3244
3245 LLVM_DEBUG({
3246 dbgs() << "Replacing adjacent STG instructions:\n";
3247 for (const auto &Instr : TagStores) {
3248 dbgs() << " " << *Instr.MI;
3249 }
3250 });
3251
3252 // Size threshold where a loop becomes shorter than a linear sequence of
3253 // tagging instructions.
3254 const int kSetTagLoopThreshold = 176;
3255 if (Size < kSetTagLoopThreshold) {
3256 if (TagStores.size() < 2)
3257 return;
3258 emitUnrolled(InsertI);
3259 } else {
3260 MachineInstr *UpdateInstr = nullptr;
3261 int64_t TotalOffset = 0;
3262 if (TryMergeSPUpdate) {
3263 // See if we can merge the base register update into the STGloop.
3264 // AArch64LoadStoreOptimizer does this for "normal" stores, but STGloop is
3265 // too unusual for that pass to handle; it also only realistically occurs
3266 // in the function epilogue, and STGloop is expanded before that pass runs
3267 // anyway.
3268 if (InsertI != MBB->end() &&
3269 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
3270 &TotalOffset)) {
3271 UpdateInstr = &*InsertI++;
3272 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
3273 << *UpdateInstr);
3274 }
3275 }
3276
3277 if (!UpdateInstr && TagStores.size() < 2)
3278 return;
3279
3280 if (UpdateInstr) {
3281 FrameRegUpdate = TotalOffset;
3282 FrameRegUpdateFlags = UpdateInstr->getFlags();
3283 }
3284 emitLoop(InsertI);
3285 if (UpdateInstr)
3286 UpdateInstr->eraseFromParent();
3287 }
3288
3289 for (auto &TS : TagStores)
3290 TS.MI->eraseFromParent();
3291}
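// Illustrative summary (not from the original source): two adjacent 16-byte
// tag stores (Size == 32 < kSetTagLoopThreshold) are rewritten by
// emitUnrolled() into a single ST2Gi, whereas a 256-byte run takes the
// emitLoop() path and, in an epilogue, may additionally swallow the following
// "add sp, sp, #N" via canMergeRegUpdate().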
3292
3293bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
3294 int64_t &Size, bool &ZeroData) {
3295 MachineFunction &MF = *MI.getParent()->getParent();
3296 const MachineFrameInfo &MFI = MF.getFrameInfo();
3297
3298 unsigned Opcode = MI.getOpcode();
3299 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
3300 Opcode == AArch64::STZ2Gi);
3301
3302 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
3303 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
3304 return false;
3305 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
3306 return false;
3307 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
3308 Size = MI.getOperand(2).getImm();
3309 return true;
3310 }
3311
3312 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
3313 Size = 16;
3314 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
3315 Size = 32;
3316 else
3317 return false;
3318
3319 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
3320 return false;
3321
3322 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
3323 16 * MI.getOperand(2).getImm();
3324 return true;
3325}
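// Worked example with an illustrative frame index and offset: for
//   STGi $sp, %stack.2, 1
// where the object behind %stack.2 sits at offset -64, this returns
// Offset == -64 + 16 * 1 == -48 and Size == 16. For STGloop/STZGloop the size
// is taken directly from the immediate operand and the offset from the frame
// index operand.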
3326
3327// Detect a run of memory tagging instructions for adjacent stack frame slots,
3328// and replace them with a shorter instruction sequence:
3329// * replace STG + STG with ST2G
3330// * replace STGloop + STGloop with STGloop
3331// This code needs to run when stack slot offsets are already known, but before
3332// FrameIndex operands in STG instructions are eliminated.
3333MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
3334 const AArch64FrameLowering *TFI,
3335 RegScavenger *RS) {
3336 bool FirstZeroData;
3337 int64_t Size, Offset;
3338 MachineInstr &MI = *II;
3339 MachineBasicBlock *MBB = MI.getParent();
3340 MachineBasicBlock::iterator NextI = ++II;
3341 if (&MI == &MBB->instr_back())
3342 return II;
3343 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
3344 return II;
3345
3346 SmallVector<TagStoreInstr, 8> Instrs;
3347 Instrs.emplace_back(&MI, Offset, Size);
3348
3349 constexpr int kScanLimit = 10;
3350 int Count = 0;
3351 for (MachineBasicBlock::iterator E = MBB->end();
3352 NextI != E && Count < kScanLimit; ++NextI) {
3353 MachineInstr &MI = *NextI;
3354 bool ZeroData;
3355 int64_t Size, Offset;
3356 // Collect instructions that update memory tags with a FrameIndex operand
3357 // and (when applicable) constant size, and whose output registers are dead
3358 // (the latter is almost always the case in practice). Since these
3359 // instructions effectively have no inputs or outputs, we are free to skip
3360 // any non-aliasing instructions in between without tracking used registers.
3361 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
3362 if (ZeroData != FirstZeroData)
3363 break;
3364 Instrs.emplace_back(&MI, Offset, Size);
3365 continue;
3366 }
3367
3368 // Only count non-transient, non-tagging instructions toward the scan
3369 // limit.
3370 if (!MI.isTransient())
3371 ++Count;
3372
3373 // Just in case, stop before the epilogue code starts.
3374 if (MI.getFlag(MachineInstr::FrameSetup) ||
3375 MI.getFlag(MachineInstr::FrameDestroy))
3376 break;
3377
3378 // Reject anything that may alias the collected instructions.
3379 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects() || MI.isCall())
3380 break;
3381 }
3382
3383 // New code will be inserted after the last tagging instruction we've found.
3384 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
3385
3386 // All the gathered stack tag instructions are merged and placed after the
3387 // last tag store in the list. Before inserting, check whether the NZCV
3388 // flag is live at that point; if it is, bail out, because an STG loop
3389 // emitted there could clobber NZCV.
3390
3391 // FIXME: This bail-out is conservative: the liveness check is performed
3392 // even when the merged sequence contains no STG loops, in which case it
3393 // is not needed.
3394 LivePhysRegs LiveRegs(*(MBB->getParent()->getSubtarget().getRegisterInfo()));
3395 LiveRegs.addLiveOuts(*MBB);
3396 for (auto I = MBB->rbegin();; ++I) {
3397 MachineInstr &MI = *I;
3398 if (MI == InsertI)
3399 break;
3400 LiveRegs.stepBackward(*I);
3401 }
3402 InsertI++;
3403 if (LiveRegs.contains(AArch64::NZCV))
3404 return InsertI;
3405
3406 llvm::stable_sort(Instrs,
3407 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
3408 return Left.Offset < Right.Offset;
3409 });
3410
3411 // Make sure that we don't have any overlapping stores.
3412 int64_t CurOffset = Instrs[0].Offset;
3413 for (auto &Instr : Instrs) {
3414 if (CurOffset > Instr.Offset)
3415 return NextI;
3416 CurOffset = Instr.Offset + Instr.Size;
3417 }
3418
3419 // Find contiguous runs of tagged memory and emit shorter instruction
3420 // sequences for them when possible.
3421 TagStoreEdit TSE(MBB, FirstZeroData);
3422 std::optional<int64_t> EndOffset;
3423 for (auto &Instr : Instrs) {
3424 if (EndOffset && *EndOffset != Instr.Offset) {
3425 // Found a gap.
3426 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
3427 TSE.clear();
3428 }
3429
3430 TSE.addInstruction(Instr);
3431 EndOffset = Instr.Offset + Instr.Size;
3432 }
3433
3434 const MachineFunction *MF = MBB->getParent();
3435 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
3436 TSE.emitCode(
3437 InsertI, TFI, /*TryMergeSPUpdate = */
3438 !MF->getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(*MF));
3439
3440 return InsertI;
3441}
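// Illustrative end-to-end example (not from the original source): a run of
// four adjacent 16-byte tag stores such as
//
//   stg sp, [sp, #0]
//   stg sp, [sp, #16]
//   stg sp, [sp, #32]
//   stg sp, [sp, #48]
//
// is collected by the scan above and re-emitted through TagStoreEdit as
//
//   st2g sp, [sp, #0]
//   st2g sp, [sp, #32]
//
// provided the slots do not overlap and NZCV is not live at the insertion
// point.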
3442} // namespace
3443
3444void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
3445 MachineFunction &MF, RegScavenger *RS = nullptr) const {
3446 for (auto &BB : MF)
3447 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
3448 if (StackTaggingMergeSetTag)
3449 II = tryMergeAdjacentSTG(II, this, RS);
3450 }
3451
3452 // By the time this method is called, most of the prologue/epilogue code is
3453 // already emitted, whether its location was affected by the shrink-wrapping
3454 // optimization or not.
3455 if (!MF.getFunction().hasFnAttribute(Attribute::Naked) &&
3456 shouldSignReturnAddressEverywhere(MF))
3457 emitPacRetPlusLeafHardening(MF);
3458}
3459
3460/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
3461/// before the update. This is easily retrieved as it is exactly the offset
3462/// that is set in processFunctionBeforeFrameFinalized.
3463StackOffset AArch64FrameLowering::getFrameIndexReferencePreferSP(
3464 const MachineFunction &MF, int FI, Register &FrameReg,
3465 bool IgnoreSPUpdates) const {
3466 const MachineFrameInfo &MFI = MF.getFrameInfo();
3467 if (IgnoreSPUpdates) {
3468 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
3469 << MFI.getObjectOffset(FI) << "\n");
3470 FrameReg = AArch64::SP;
3471 return StackOffset::getFixed(MFI.getObjectOffset(FI));
3472 }
3473
3474 // Go to common code if we cannot provide sp + offset.
3475 if (MFI.hasVarSizedObjects() ||
3476 MF.getInfo<AArch64FunctionInfo>()->hasSVEStackSize() ||
3477 MF.getSubtarget().getRegisterInfo()->hasStackRealignment(MF))
3478 return getFrameIndexReference(MF, FI, FrameReg);
3479
3480 FrameReg = AArch64::SP;
3481 return getStackOffset(MF, MFI.getObjectOffset(FI));
3482}
3483
3484/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
3485/// the parent's frame pointer
3486unsigned AArch64FrameLowering::getWinEHParentFrameOffset(
3487 const MachineFunction &MF) const {
3488 return 0;
3489}
3490
3491/// Funclets only need to account for space for the callee saved registers,
3492/// as the locals are accounted for in the parent's stack frame.
3493unsigned AArch64FrameLowering::getWinEHFuncletFrameSize(
3494 const MachineFunction &MF) const {
3495 // This is the size of the pushed CSRs.
3496 unsigned CSSize =
3497 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
3498 // This is the amount of stack a funclet needs to allocate.
3499 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
3500 getStackAlign());
3501}
3502
3503namespace {
3504struct FrameObject {
3505 bool IsValid = false;
3506 // Index of the object in MFI.
3507 int ObjectIndex = 0;
3508 // Group ID this object belongs to.
3509 int GroupIndex = -1;
3510 // This object should be placed first (closest to SP).
3511 bool ObjectFirst = false;
3512 // This object's group (which always contains the object with
3513 // ObjectFirst==true) should be placed first.
3514 bool GroupFirst = false;
3515
3516 // Used to distinguish between FP and GPR accesses. The values are decided so
3517 // that they sort FPR < Hazard < GPR and they can be or'd together.
3518 unsigned Accesses = 0;
3519 enum { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };
3520};
3521
3522class GroupBuilder {
3523 SmallVector<int, 8> CurrentMembers;
3524 int NextGroupIndex = 0;
3525 std::vector<FrameObject> &Objects;
3526
3527public:
3528 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
3529 void AddMember(int Index) { CurrentMembers.push_back(Index); }
3530 void EndCurrentGroup() {
3531 if (CurrentMembers.size() > 1) {
3532 // Create a new group with the current member list. This might remove them
3533 // from their pre-existing groups. That's OK, dealing with overlapping
3534 // groups is too hard and unlikely to make a difference.
3535 LLVM_DEBUG(dbgs() << "group:");
3536 for (int Index : CurrentMembers) {
3537 Objects[Index].GroupIndex = NextGroupIndex;
3538 LLVM_DEBUG(dbgs() << " " << Index);
3539 }
3540 LLVM_DEBUG(dbgs() << "\n");
3541 NextGroupIndex++;
3542 }
3543 CurrentMembers.clear();
3544 }
3545};
3546
3547bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
3548 // Objects at a lower index are closer to FP; objects at a higher index are
3549 // closer to SP.
3550 //
3551 // For consistency in our comparison, all invalid objects are placed
3552 // at the end. This also allows us to stop walking when we hit the
3553 // first invalid item after it's all sorted.
3554 //
3555 // If we want to include a stack hazard region, order FPR accesses < the
3556 // hazard object < GPRs accesses in order to create a separation between the
3557 // two. For the Accesses field 1 = FPR, 2 = Hazard Object, 4 = GPR.
3558 //
3559 // Otherwise the "first" object goes first (closest to SP), followed by the
3560 // members of the "first" group.
3561 //
3562 // The rest are sorted by the group index to keep the groups together.
3563 // Higher numbered groups are more likely to be around longer (i.e. untagged
3564 // in the function epilogue and not at some earlier point). Place them closer
3565 // to SP.
3566 //
3567 // If all else equal, sort by the object index to keep the objects in the
3568 // original order.
3569 return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
3570 A.GroupIndex, A.ObjectIndex) <
3571 std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
3572 B.GroupIndex, B.ObjectIndex);
3573}
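// Illustrative ordering (not from the original source): because lower-sorted
// objects end up closer to FP, a function with hazard padding lays out, going
// from FP towards SP: FPR-accessed objects (Accesses == 1), then the hazard
// slot (Accesses == 2), then GPR-accessed objects (Accesses == 4). Within
// those, the tagged-base-pointer slot and its group sort last, i.e. closest
// to SP, and the remaining groups follow in ascending GroupIndex order.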
3574} // namespace
3575
3576void AArch64FrameLowering::orderFrameObjects(
3577 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
3578 const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
3579
3580 if ((!OrderFrameObjects && !AFI.hasSplitSVEObjects()) ||
3581 ObjectsToAllocate.empty())
3582 return;
3583
3584 const MachineFrameInfo &MFI = MF.getFrameInfo();
3585 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
3586 for (auto &Obj : ObjectsToAllocate) {
3587 FrameObjects[Obj].IsValid = true;
3588 FrameObjects[Obj].ObjectIndex = Obj;
3589 }
3590
3591 // Identify FPR vs GPR slots for hazards, and stack slots that are tagged at
3592 // the same time.
3593 GroupBuilder GB(FrameObjects);
3594 for (auto &MBB : MF) {
3595 for (auto &MI : MBB) {
3596 if (MI.isDebugInstr())
3597 continue;
3598
3599 if (AFI.hasStackHazardSlotIndex()) {
3600 std::optional<int> FI = getLdStFrameID(MI, MFI);
3601 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
3602 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
3603 AArch64InstrInfo::isFpOrNEON(MI))
3604 FrameObjects[*FI].Accesses |= FrameObject::AccessFPR;
3605 else
3606 FrameObjects[*FI].Accesses |= FrameObject::AccessGPR;
3607 }
3608 }
3609
3610 int OpIndex;
3611 switch (MI.getOpcode()) {
3612 case AArch64::STGloop:
3613 case AArch64::STZGloop:
3614 OpIndex = 3;
3615 break;
3616 case AArch64::STGi:
3617 case AArch64::STZGi:
3618 case AArch64::ST2Gi:
3619 case AArch64::STZ2Gi:
3620 OpIndex = 1;
3621 break;
3622 default:
3623 OpIndex = -1;
3624 }
3625
3626 int TaggedFI = -1;
3627 if (OpIndex >= 0) {
3628 const MachineOperand &MO = MI.getOperand(OpIndex);
3629 if (MO.isFI()) {
3630 int FI = MO.getIndex();
3631 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
3632 FrameObjects[FI].IsValid)
3633 TaggedFI = FI;
3634 }
3635 }
3636
3637 // If this is a stack tagging instruction for a slot that is not part of a
3638 // group yet, either start a new group or add it to the current one.
3639 if (TaggedFI >= 0)
3640 GB.AddMember(TaggedFI);
3641 else
3642 GB.EndCurrentGroup();
3643 }
3644 // Groups should never span multiple basic blocks.
3645 GB.EndCurrentGroup();
3646 }
3647
3648 if (AFI.hasStackHazardSlotIndex()) {
3649 FrameObjects[AFI.getStackHazardSlotIndex()].Accesses =
3650 FrameObject::AccessHazard;
3651 // If a stack object is unknown or both GPR and FPR, sort it into GPR.
3652 for (auto &Obj : FrameObjects)
3653 if (!Obj.Accesses ||
3654 Obj.Accesses == (FrameObject::AccessGPR | FrameObject::AccessFPR))
3655 Obj.Accesses = FrameObject::AccessGPR;
3656 }
3657
3658 // If the function's tagged base pointer is pinned to a stack slot, we want to
3659 // put that slot first when possible. This will likely place it at SP + 0,
3660 // and save one instruction when generating the base pointer because IRG does
3661 // not allow an immediate offset.
3662 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
3663 if (TBPI) {
3664 FrameObjects[*TBPI].ObjectFirst = true;
3665 FrameObjects[*TBPI].GroupFirst = true;
3666 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
3667 if (FirstGroupIndex >= 0)
3668 for (FrameObject &Object : FrameObjects)
3669 if (Object.GroupIndex == FirstGroupIndex)
3670 Object.GroupFirst = true;
3671 }
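// Illustrative note (not from the original source): IRG takes only register
// operands, so if the tagged-base-pointer slot lands at SP + 0 the base can
// be formed directly as
//   irg x0, sp
// whereas any non-zero offset would first need something like
// "add x0, sp, #offset" before the irg, costing an extra instruction.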
3672
3673 llvm::stable_sort(FrameObjects, FrameObjectCompare);
3674
3675 int i = 0;
3676 for (auto &Obj : FrameObjects) {
3677 // All invalid items are sorted at the end, so it's safe to stop.
3678 if (!Obj.IsValid)
3679 break;
3680 ObjectsToAllocate[i++] = Obj.ObjectIndex;
3681 }
3682
3683 LLVM_DEBUG({
3684 dbgs() << "Final frame order:\n";
3685 for (auto &Obj : FrameObjects) {
3686 if (!Obj.IsValid)
3687 break;
3688 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
3689 if (Obj.ObjectFirst)
3690 dbgs() << ", first";
3691 if (Obj.GroupFirst)
3692 dbgs() << ", group-first";
3693 dbgs() << "\n";
3694 }
3695 });
3696}
3697
3698/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
3699/// least every ProbeSize bytes. Returns an iterator of the first instruction
3700/// after the loop. The difference between SP and TargetReg must be an exact
3701/// multiple of ProbeSize.
3702MachineBasicBlock::iterator
3703AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
3704 MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
3705 Register TargetReg) const {
3706 MachineBasicBlock &MBB = *MBBI->getParent();
3707 MachineFunction &MF = *MBB.getParent();
3708 const AArch64InstrInfo *TII =
3709 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3710 DebugLoc DL = MBB.findDebugLoc(MBBI);
3711
3712 MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
3713 MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
3714 MF.insert(MBBInsertPoint, LoopMBB);
3715 MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
3716 MF.insert(MBBInsertPoint, ExitMBB);
3717
3718 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
3719 // in SUB).
3720 emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
3721 StackOffset::getFixed(-ProbeSize), TII,
3722 MachineInstr::FrameSetup);
3723 // STR XZR, [SP]
3724 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::STRXui))
3725 .addReg(AArch64::XZR)
3726 .addReg(AArch64::SP)
3727 .addImm(0)
3728 .setMIFlags(MachineInstr::FrameSetup);
3729 // CMP SP, TargetReg
3730 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
3731 AArch64::XZR)
3732 .addReg(AArch64::SP)
3733 .addReg(TargetReg)
3734 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
3735 .setMIFlags(MachineInstr::FrameSetup);
3736 // B.CC Loop
3737 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
3738 .addImm(AArch64CC::NE)
3739 .addMBB(LoopMBB)
3740 .setMIFlags(MachineInstr::FrameSetup);
3741
3742 LoopMBB->addSuccessor(ExitMBB);
3743 LoopMBB->addSuccessor(LoopMBB);
3744 // Synthesize the exit MBB.
3745 ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
3746 ExitMBB->transferSuccessorsAndUpdatePHIs(&MBB);
3747 MBB.addSuccessor(LoopMBB);
3748 // Update liveins.
3749 fullyRecomputeLiveIns({ExitMBB, LoopMBB});
3750
3751 return ExitMBB->begin();
3752}
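// Illustrative expansion, derived from the builder calls above (probe size
// shown as 4096 for concreteness):
//
// LoopMBB:
//   sub  sp, sp, #4096
//   str  xzr, [sp]
//   cmp  sp, <TargetReg>
//   b.ne LoopMBB
// ExitMBB:
//   <rest of the original block>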
3753
3754void AArch64FrameLowering::inlineStackProbeFixed(
3755 MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
3756 StackOffset CFAOffset) const {
3757 MachineBasicBlock *MBB = MBBI->getParent();
3758 MachineFunction &MF = *MBB->getParent();
3759 const AArch64InstrInfo *TII =
3760 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3761 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3762 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
3763 bool HasFP = hasFP(MF);
3764
3765 DebugLoc DL;
3766 int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
3767 int64_t NumBlocks = FrameSize / ProbeSize;
3768 int64_t ResidualSize = FrameSize % ProbeSize;
3769
3770 LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
3771 << NumBlocks << " blocks of " << ProbeSize
3772 << " bytes, plus " << ResidualSize << " bytes\n");
3773
3774 // Decrement SP by NumBlocks * ProbeSize bytes, using either an unrolled
3775 // sequence or an ordinary loop.
3776 if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
3777 for (int i = 0; i < NumBlocks; ++i) {
3778 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
3779 // encodable in a SUB).
3780 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
3781 StackOffset::getFixed(-ProbeSize), TII,
3782 MachineInstr::FrameSetup, false, false, nullptr,
3783 EmitAsyncCFI && !HasFP, CFAOffset);
3784 CFAOffset += StackOffset::getFixed(ProbeSize);
3785 // STR XZR, [SP]
3786 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
3787 .addReg(AArch64::XZR)
3788 .addReg(AArch64::SP)
3789 .addImm(0)
3790 .setMIFlags(MachineInstr::FrameSetup);
3791 }
3792 } else if (NumBlocks != 0) {
3793 // SUB ScratchReg, SP, #FrameSize (or equivalent if FrameSize is not
3794 // encodable in ADD). ScratchReg may temporarily become the CFA register.
3795 emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
3796 StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
3797 MachineInstr::FrameSetup, false, false, nullptr,
3798 EmitAsyncCFI && !HasFP, CFAOffset);
3799 CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
3800 MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
3801 MBB = MBBI->getParent();
3802 if (EmitAsyncCFI && !HasFP) {
3803 // Set the CFA register back to SP.
3804 CFIInstBuilder(*MBB, MBBI, MachineInstr::FrameSetup)
3805 .buildDefCFARegister(AArch64::SP);
3806 }
3807 }
3808
3809 if (ResidualSize != 0) {
3810 // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
3811 // in SUB).
3812 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
3813 StackOffset::getFixed(-ResidualSize), TII,
3814 MachineInstr::FrameSetup, false, false, nullptr,
3815 EmitAsyncCFI && !HasFP, CFAOffset);
3816 if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
3817 // STR XZR, [SP]
3818 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
3819 .addReg(AArch64::XZR)
3820 .addReg(AArch64::SP)
3821 .addImm(0)
3822 .setMIFlags(MachineInstr::FrameSetup);
3823 }
3824 }
3825}
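// Worked example with illustrative numbers, assuming a 4096-byte probe size:
// FrameSize == 10000 gives NumBlocks == 2 and ResidualSize == 1808. If 2 does
// not exceed StackProbeMaxLoopUnroll, both blocks are allocated with the
// unrolled "sub sp, sp, #4096; str xzr, [sp]" pairs above; the residual 1808
// bytes are then allocated with a final SUB, followed by one more
// "str xzr, [sp]" probe whenever the residual exceeds
// AArch64::StackProbeMaxUnprobedStack.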
3826
3827void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
3828 MachineBasicBlock &MBB) const {
3829 // Get the instructions that need to be replaced. We emit at most two of
3830 // these. Remember them in order to avoid complications coming from the need
3831 // to traverse the block while potentially creating more blocks.
3832 SmallVector<MachineInstr *, 4> ToReplace;
3833 for (MachineInstr &MI : MBB)
3834 if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
3835 MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
3836 ToReplace.push_back(&MI);
3837
3838 for (MachineInstr *MI : ToReplace) {
3839 if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
3840 Register ScratchReg = MI->getOperand(0).getReg();
3841 int64_t FrameSize = MI->getOperand(1).getImm();
3842 StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
3843 MI->getOperand(3).getImm());
3844 inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
3845 CFAOffset);
3846 } else {
3847 assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
3848 "Stack probe pseudo-instruction expected");
3849 const AArch64InstrInfo *TII =
3850 MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
3851 Register TargetReg = MI->getOperand(0).getReg();
3852 (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
3853 }
3854 MI->eraseFromParent();
3855 }
3856}
3857
3858struct StackAccess {
3859 enum AccessType {
3860 NotAccessed = 0, // Stack object not accessed by load/store instructions.
3861 GPR = 1 << 0, // A general purpose register.
3862 PPR = 1 << 1, // A predicate register.
3863 FPR = 1 << 2, // A floating point/Neon/SVE register.
3864 };
3865
3866 int Idx;
3867 StackOffset Offset;
3868 int64_t Size;
3869 unsigned AccessTypes;
3870
3871 StackAccess() : Idx(0), Size(0), AccessTypes(NotAccessed) {}
3872
3873 bool operator<(const StackAccess &Rhs) const {
3874 return std::make_tuple(start(), Idx) <
3875 std::make_tuple(Rhs.start(), Rhs.Idx);
3876 }
3877
3878 bool isCPU() const {
3879 // Predicate register load and store instructions execute on the CPU.
3880 return AccessTypes & (AccessType::GPR | AccessType::PPR);
3881 }
3882 bool isSME() const { return AccessTypes & AccessType::FPR; }
3883 bool isMixed() const { return isCPU() && isSME(); }
3884
3885 int64_t start() const { return Offset.getFixed() + Offset.getScalable(); }
3886 int64_t end() const { return start() + Size; }
3887
3888 std::string getTypeString() const {
3889 switch (AccessTypes) {
3890 case AccessType::FPR:
3891 return "FPR";
3892 case AccessType::PPR:
3893 return "PPR";
3894 case AccessType::GPR:
3895 return "GPR";
3897 return "NA";
3898 default:
3899 return "Mixed";
3900 }
3901 }
3902
3903 void print(raw_ostream &OS) const {
3904 OS << getTypeString() << " stack object at [SP"
3905 << (Offset.getFixed() < 0 ? "" : "+") << Offset.getFixed();
3906 if (Offset.getScalable())
3907 OS << (Offset.getScalable() < 0 ? "" : "+") << Offset.getScalable()
3908 << " * vscale";
3909 OS << "]";
3910 }
3911};
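// Illustrative print() output (not from the original source): a predicate
// spill at fixed offset -16 and scalable offset -32 prints as
//   "PPR stack object at [SP-16-32 * vscale]"
// while a plain 8-byte spill at SP+8 prints as "GPR stack object at [SP+8]".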
3912
3913static inline raw_ostream &operator<<(raw_ostream &OS, const StackAccess &SA) {
3914 SA.print(OS);
3915 return OS;
3916}
3917
3918void AArch64FrameLowering::emitRemarks(
3919 const MachineFunction &MF, MachineOptimizationRemarkEmitter *ORE) const {
3920
3921 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3922 if (SMEAttrs(MF.getFunction()).hasNonStreamingInterfaceAndBody())
3923 return;
3924
3925 unsigned StackHazardSize = getStackHazardSize(MF);
3926 const uint64_t HazardSize =
3927 (StackHazardSize) ? StackHazardSize : StackHazardRemarkSize;
3928
3929 if (HazardSize == 0)
3930 return;
3931
3932 const MachineFrameInfo &MFI = MF.getFrameInfo();
3933 // Bail if function has no stack objects.
3934 if (!MFI.hasStackObjects())
3935 return;
3936
3937 std::vector<StackAccess> StackAccesses(MFI.getNumObjects());
3938
3939 size_t NumFPLdSt = 0;
3940 size_t NumNonFPLdSt = 0;
3941
3942 // Collect stack accesses via Load/Store instructions.
3943 for (const MachineBasicBlock &MBB : MF) {
3944 for (const MachineInstr &MI : MBB) {
3945 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
3946 continue;
3947 for (MachineMemOperand *MMO : MI.memoperands()) {
3948 std::optional<int> FI = getMMOFrameID(MMO, MFI);
3949 if (FI && !MFI.isDeadObjectIndex(*FI)) {
3950 int FrameIdx = *FI;
3951
3952 size_t ArrIdx = FrameIdx + MFI.getNumFixedObjects();
3953 if (StackAccesses[ArrIdx].AccessTypes == StackAccess::NotAccessed) {
3954 StackAccesses[ArrIdx].Idx = FrameIdx;
3955 StackAccesses[ArrIdx].Offset =
3956 getFrameIndexReferenceFromSP(MF, FrameIdx);
3957 StackAccesses[ArrIdx].Size = MFI.getObjectSize(FrameIdx);
3958 }
3959
3960 unsigned RegTy = StackAccess::AccessType::GPR;
3961 if (MFI.hasScalableStackID(FrameIdx))
3962 RegTy = isPPRAccess(MI) ? StackAccess::PPR : StackAccess::FPR;
3963 else if (AArch64InstrInfo::isFpOrNEON(MI))
3964 RegTy = StackAccess::FPR;
3965
3966 StackAccesses[ArrIdx].AccessTypes |= RegTy;
3967
3968 if (RegTy == StackAccess::FPR)
3969 ++NumFPLdSt;
3970 else
3971 ++NumNonFPLdSt;
3972 }
3973 }
3974 }
3975 }
3976
3977 if (NumFPLdSt == 0 || NumNonFPLdSt == 0)
3978 return;
3979
3980 llvm::sort(StackAccesses);
3981 llvm::erase_if(StackAccesses, [](const StackAccess &S) {
3982 return S.AccessTypes == StackAccess::NotAccessed;
3983 });
3984
3985 SmallVector<const StackAccess *> MixedObjects;
3986 SmallVector<std::pair<const StackAccess *, const StackAccess *>> HazardPairs;
3987
3988 if (StackAccesses.front().isMixed())
3989 MixedObjects.push_back(&StackAccesses.front());
3990
3991 for (auto It = StackAccesses.begin(), End = std::prev(StackAccesses.end());
3992 It != End; ++It) {
3993 const auto &First = *It;
3994 const auto &Second = *(It + 1);
3995
3996 if (Second.isMixed())
3997 MixedObjects.push_back(&Second);
3998
3999 if ((First.isSME() && Second.isCPU()) ||
4000 (First.isCPU() && Second.isSME())) {
4001 uint64_t Distance = static_cast<uint64_t>(Second.start() - First.end());
4002 if (Distance < HazardSize)
4003 HazardPairs.emplace_back(&First, &Second);
4004 }
4005 }
4006
4007 auto EmitRemark = [&](llvm::StringRef Str) {
4008 ORE->emit([&]() {
4009 auto R = MachineOptimizationRemarkAnalysis(
4010 "sme", "StackHazard", MF.getFunction().getSubprogram(), &MF.front());
4011 return R << formatv("stack hazard in '{0}': ", MF.getName()).str() << Str;
4012 });
4013 };
4014
4015 for (const auto &P : HazardPairs)
4016 EmitRemark(formatv("{0} is too close to {1}", *P.first, *P.second).str());
4017
4018 for (const auto *Obj : MixedObjects)
4019 EmitRemark(
4020 formatv("{0} accessed by both GP and FP instructions", *Obj).str());
4021}
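// Illustrative remark text, derived from the formatv strings above: for a
// function "foo" with an FPR spill 8 bytes away from a GPR spill, the emitted
// analysis remark reads roughly
//   stack hazard in 'foo': FPR stack object at [SP-16] is too close to
//   GPR stack object at [SP-8]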