AArch64FrameLowering.cpp
1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function so that the particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) doesn't get created until the
33// main function body runs, after the prologue. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | callee-saved gpr registers | <--.
48// | | | On Darwin platforms these
49// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
50// | prev_lr | | (frame record first)
51// | prev_fp | <--'
52// | async context if needed |
53// | (a.k.a. "frame record") |
54// |-----------------------------------| <- fp(=x29)
55// | <hazard padding> |
56// |-----------------------------------|
57// | |
58// | callee-saved fp/simd/SVE regs |
59// | |
60// |-----------------------------------|
61// | |
62// | SVE stack objects |
63// | |
64// |-----------------------------------|
65// |.empty.space.to.make.part.below....|
66// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
67// |.the.standard.16-byte.alignment....| compile time; if present)
68// |-----------------------------------|
69// | local variables of fixed size |
70// | including spill slots |
71// | <FPR> |
72// | <hazard padding> |
73// | <GPR> |
74// |-----------------------------------| <- bp(not defined by ABI,
75// |.variable-sized.local.variables....| LLVM chooses X19)
76// |.(VLAs)............................| (size of this area is unknown at
77// |...................................| compile time)
78// |-----------------------------------| <- sp
79// | | Lower address
80//
81//
82// To access data in a frame, a constant offset from one of the pointers
83// (fp, bp, sp) to that data must be computable at compile time. The sizes
84// of the areas with a dotted background cannot be computed at compile time
85// if they are present, so accessing all contents of the frame areas
86// (assuming all of the frame areas are non-empty) requires all three of
87// fp, bp and sp to be set up.
88//
89// For most functions, some of the frame areas are empty. For those functions,
90// it may not be necessary to set up fp or bp:
91// * A base pointer is definitely needed when there are both VLAs and local
92// variables with more-than-default alignment requirements.
93// * A frame pointer is definitely needed when there are local variables with
94// more-than-default alignment requirements (see the illustrative sketch below).
95//
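// For illustration only (hypothetical C code, not from the original comment;
// `use` is a placeholder), a function that ends up needing fp, bp and sp:
//
//   void f(int n) {
//     _Alignas(64) int big[16]; // over-aligned local -> frame pointer needed
//     int vla[n];               // VLA + over-aligned local -> base pointer needed
//     use(big, vla);            // outgoing call -> sp adjusted as usual
//   }
//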
96// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
97// callee-saved area, since the unwind encoding does not allow for encoding
98// this dynamically and existing tools depend on this layout. For other
99// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
100// area to allow SVE stack objects (allocated directly below the callee-saves,
101// if available) to be accessed directly from the framepointer.
102// The SVE spill/fill instructions have VL-scaled addressing modes such
103// as:
104// ldr z8, [fp, #-7 mul vl]
105// For SVE the size of the vector length (VL) is not known at compile-time, so
106// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
107// layout, we don't need to add an unscaled offset to the framepointer before
108// accessing the SVE object in the frame.
109//
110// In some cases when a base pointer is not strictly needed, it is generated
111// anyway when offsets from the frame pointer to access local variables become
112// so large that the offset can't be encoded in the immediate fields of loads
113// or stores.
114//
115// Outgoing function arguments must be at the bottom of the stack frame when
116// calling another function. If we do not have variable-sized stack objects, we
117// can allocate a "reserved call frame" area at the bottom of the local
118// variable area, large enough for all outgoing calls. If we do have VLAs, then
119// the stack pointer must be decremented and incremented around each call to
120// make space for the arguments below the VLAs.
121//
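// As an illustrative sketch (not actual compiler output): with a reserved call
// frame, the prologue folds the maximum outgoing-argument space into the one
// allocation,
//
//   sub sp, sp, #(locals + max_outgoing_args)
//   ...
//   bl  callee                  // arguments already fit below the locals
//
// whereas with VLAs each call site brackets its own adjustment:
//
//   sub sp, sp, #outgoing_args
//   bl  callee
//   add sp, sp, #outgoing_args
//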
122// FIXME: also explain the redzone concept.
123//
124// About stack hazards: Under some SME contexts, a coprocessor with its own
125// separate cache can be used for FP operations. This can create hazards if the CPU
126// and the SME unit try to access the same area of memory, including if the
127// access is to an area of the stack. To try to alleviate this we attempt to
128// introduce extra padding into the stack frame between FP and GPR accesses,
129// controlled by the StackHazardSize option. Without changing the layout of the
130// stack frame in the diagram above, a stack object of size StackHazardSize is
131// added between GPR and FPR CSRs. Another is added to the stack objects
132// section, and stack objects are sorted so that FPR > Hazard padding slot >
133// GPRs (where possible). Unfortunately some things are not handled well (VLA
134// area, arguments on the stack, objects with both GPR and FPR accesses), but if
135// those are controlled by the user then the entire stack frame becomes GPR at
136// the start/end with FPR in the middle, surrounded by Hazard padding.
137//
138// An example of the prologue:
139//
140// .globl __foo
141// .align 2
142// __foo:
143// Ltmp0:
144// .cfi_startproc
145// .cfi_personality 155, ___gxx_personality_v0
146// Leh_func_begin:
147// .cfi_lsda 16, Lexception33
148//
149// stp xA, xB, [sp, #-offset]!
150// ...
151// stp x28, x27, [sp, #offset-32]
152// stp fp, lr, [sp, #offset-16]
153// add fp, sp, #offset - 16
154// sub sp, sp, #1360
155//
156// The Stack:
157// +-------------------------------------------+
158// 10000 | ........ | ........ | ........ | ........ |
159// 10004 | ........ | ........ | ........ | ........ |
160// +-------------------------------------------+
161// 10008 | ........ | ........ | ........ | ........ |
162// 1000c | ........ | ........ | ........ | ........ |
163// +===========================================+
164// 10010 | X28 Register |
165// 10014 | X28 Register |
166// +-------------------------------------------+
167// 10018 | X27 Register |
168// 1001c | X27 Register |
169// +===========================================+
170// 10020 | Frame Pointer |
171// 10024 | Frame Pointer |
172// +-------------------------------------------+
173// 10028 | Link Register |
174// 1002c | Link Register |
175// +===========================================+
176// 10030 | ........ | ........ | ........ | ........ |
177// 10034 | ........ | ........ | ........ | ........ |
178// +-------------------------------------------+
179// 10038 | ........ | ........ | ........ | ........ |
180// 1003c | ........ | ........ | ........ | ........ |
181// +-------------------------------------------+
182//
183// [sp] = 10030 :: >>initial value<<
184// sp = 10020 :: stp fp, lr, [sp, #-16]!
185// fp = sp == 10020 :: mov fp, sp
186// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
187// sp == 10010 :: >>final value<<
188//
189// The frame pointer (w29) points to address 10020. If we use an offset of
190// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
191// for w27, and -32 for w28:
192//
193// Ltmp1:
194// .cfi_def_cfa w29, 16
195// Ltmp2:
196// .cfi_offset w30, -8
197// Ltmp3:
198// .cfi_offset w29, -16
199// Ltmp4:
200// .cfi_offset w27, -24
201// Ltmp5:
202// .cfi_offset w28, -32
203//
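// As a cross-check of the numbers above: the CFA is w29 + 16 = 10020 + 16 =
// 10030 (the value of sp on entry), so w30 lives at 10030 - 8 = 10028, w29 at
// 10030 - 16 = 10020, w27 at 10030 - 24 = 10018, and w28 at 10030 - 32 = 10010,
// matching the stack picture.
//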
204//===----------------------------------------------------------------------===//
205
206#include "AArch64FrameLowering.h"
207#include "AArch64InstrInfo.h"
209#include "AArch64RegisterInfo.h"
210#include "AArch64Subtarget.h"
211#include "AArch64TargetMachine.h"
214#include "llvm/ADT/ScopeExit.h"
215#include "llvm/ADT/SmallVector.h"
216#include "llvm/ADT/Statistic.h"
233#include "llvm/IR/Attributes.h"
234#include "llvm/IR/CallingConv.h"
235#include "llvm/IR/DataLayout.h"
236#include "llvm/IR/DebugLoc.h"
237#include "llvm/IR/Function.h"
238#include "llvm/MC/MCAsmInfo.h"
239#include "llvm/MC/MCDwarf.h"
241#include "llvm/Support/Debug.h"
247#include <cassert>
248#include <cstdint>
249#include <iterator>
250#include <optional>
251#include <vector>
252
253using namespace llvm;
254
255#define DEBUG_TYPE "frame-info"
256
257static cl::opt<bool> EnableRedZone("aarch64-redzone",
258 cl::desc("enable use of redzone on AArch64"),
259 cl::init(false), cl::Hidden);
260
262 "stack-tagging-merge-settag",
263 cl::desc("merge settag instruction in function epilog"), cl::init(true),
264 cl::Hidden);
265
266static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
267 cl::desc("sort stack allocations"),
268 cl::init(true), cl::Hidden);
269
271 "homogeneous-prolog-epilog", cl::Hidden,
272 cl::desc("Emit homogeneous prologue and epilogue for the size "
273 "optimization (default = off)"));
274
275// Stack hazard padding size. 0 = disabled.
276static cl::opt<unsigned> StackHazardSize("aarch64-stack-hazard-size",
277 cl::init(0), cl::Hidden);
278// Whether to insert padding into non-streaming functions (for testing).
279static cl::opt<bool>
280 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
281 cl::init(false), cl::Hidden);
282
283STATISTIC(NumRedZoneFunctions, "Number of functions using red zone");
284
285/// Returns how much of the incoming argument stack area (in bytes) we should
286/// clean up in an epilogue. For the C calling convention this will be 0, for
287/// guaranteed tail call conventions it can be positive (a normal return or a
288/// tail call to a function that uses less stack space for arguments) or
289/// negative (for a tail call to a function that needs more stack space than us
290/// for arguments).
295 bool IsTailCallReturn = (MBB.end() != MBBI)
297 : false;
298
299 int64_t ArgumentPopSize = 0;
300 if (IsTailCallReturn) {
301 MachineOperand &StackAdjust = MBBI->getOperand(1);
302
303 // For a tail-call in a callee-pops-arguments environment, some or all of
304 // the stack may actually be in use for the call's arguments, this is
305 // calculated during LowerCall and consumed here...
306 ArgumentPopSize = StackAdjust.getImm();
307 } else {
308 // ... otherwise the amount to pop is *all* of the argument space,
309 // conveniently stored in the MachineFunctionInfo by
310 // LowerFormalArguments. This will, of course, be zero for the C calling
311 // convention.
312 ArgumentPopSize = AFI->getArgumentStackToRestore();
313 }
314
315 return ArgumentPopSize;
316}
317
319static bool needsWinCFI(const MachineFunction &MF);
322
323/// Returns true if a homogeneous prolog or epilog code can be emitted
324/// for the size optimization. If possible, a frame helper call is injected.
325/// When an Exit block is given, this check is for the epilog.
326bool AArch64FrameLowering::homogeneousPrologEpilog(
327 MachineFunction &MF, MachineBasicBlock *Exit) const {
328 if (!MF.getFunction().hasMinSize())
329 return false;
331 return false;
332 if (EnableRedZone)
333 return false;
334
335 // TODO: Windows is not supported yet.
336 if (needsWinCFI(MF))
337 return false;
338 // TODO: SVE is not supported yet.
339 if (getSVEStackSize(MF))
340 return false;
341
342 // Bail on stack adjustment needed on return for simplicity.
343 const MachineFrameInfo &MFI = MF.getFrameInfo();
345 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
346 return false;
347 if (Exit && getArgumentStackToRestore(MF, *Exit))
348 return false;
349
350 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
351 if (AFI->hasSwiftAsyncContext() || AFI->hasStreamingModeChanges())
352 return false;
353
354 // If there are an odd number of GPRs before LR and FP in the CSRs list,
355 // they will not be paired into one RegPairInfo, which is incompatible with
356 // the assumption made by the homogeneous prolog epilog pass.
357 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
358 unsigned NumGPRs = 0;
359 for (unsigned I = 0; CSRegs[I]; ++I) {
360 Register Reg = CSRegs[I];
361 if (Reg == AArch64::LR) {
362 assert(CSRegs[I + 1] == AArch64::FP);
363 if (NumGPRs % 2 != 0)
364 return false;
365 break;
366 }
367 if (AArch64::GPR64RegClass.contains(Reg))
368 ++NumGPRs;
369 }
370
371 return true;
372}
373
374/// Returns true if CSRs should be paired.
375bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
376 return produceCompactUnwindFrame(MF) || homogeneousPrologEpilog(MF);
377}
378
379/// This is the biggest offset to the stack pointer we can encode in aarch64
380/// instructions (without using a separate calculation and a temp register).
381/// Note that the exceptions here are vector stores/loads, which cannot encode any
382/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
383static const unsigned DefaultSafeSPDisplacement = 255;
384
385/// Look at each instruction that references stack frames and return the stack
386/// size limit beyond which some of these instructions will require a scratch
387/// register during their expansion later.
389 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
390 // range. We'll end up allocating an unnecessary spill slot a lot, but
391 // realistically that's not a big deal at this stage of the game.
392 for (MachineBasicBlock &MBB : MF) {
393 for (MachineInstr &MI : MBB) {
394 if (MI.isDebugInstr() || MI.isPseudo() ||
395 MI.getOpcode() == AArch64::ADDXri ||
396 MI.getOpcode() == AArch64::ADDSXri)
397 continue;
398
399 for (const MachineOperand &MO : MI.operands()) {
400 if (!MO.isFI())
401 continue;
402
404 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
406 return 0;
407 }
408 }
409 }
411}
412
416}
417
418/// Returns the size of the fixed object area (allocated next to sp on entry)
419/// On Win64 this may include a var args area and an UnwindHelp object for EH.
420static unsigned getFixedObjectSize(const MachineFunction &MF,
421 const AArch64FunctionInfo *AFI, bool IsWin64,
422 bool IsFunclet) {
423 if (!IsWin64 || IsFunclet) {
424 return AFI->getTailCallReservedStack();
425 } else {
426 if (AFI->getTailCallReservedStack() != 0 &&
428 Attribute::SwiftAsync))
429 report_fatal_error("cannot generate ABI-changing tail call for Win64");
430 // Var args are stored here in the primary function.
431 const unsigned VarArgsArea = AFI->getVarArgsGPRSize();
432 // To support EH funclets we allocate an UnwindHelp object
433 const unsigned UnwindHelpObject = (MF.hasEHFunclets() ? 8 : 0);
434 return AFI->getTailCallReservedStack() +
435 alignTo(VarArgsArea + UnwindHelpObject, 16);
436 }
437}
438
439/// Returns the size of the entire SVE stackframe (calleesaves + spills).
442 return StackOffset::getScalable((int64_t)AFI->getStackSizeSVE());
443}
444
446 if (!EnableRedZone)
447 return false;
448
449 // Don't use the red zone if the function explicitly asks us not to.
450 // This is typically used for kernel code.
451 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
452 const unsigned RedZoneSize =
454 if (!RedZoneSize)
455 return false;
456
457 const MachineFrameInfo &MFI = MF.getFrameInfo();
459 uint64_t NumBytes = AFI->getLocalStackSize();
460
461 // If neither NEON nor SVE is available, a COPY from one Q-reg to
462 // another requires a spill -> reload sequence. We can do that
463 // using a pre-decrementing store/post-decrementing load, but
464 // if we do so, we can't use the Red Zone.
465 bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
466 !Subtarget.isNeonAvailable() &&
467 !Subtarget.hasSVE();
468
469 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
470 getSVEStackSize(MF) || LowerQRegCopyThroughMem);
471}
472
473/// hasFP - Return true if the specified function should have a dedicated frame
474/// pointer register.
476 const MachineFrameInfo &MFI = MF.getFrameInfo();
477 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
478
479 // Win64 EH requires a frame pointer if funclets are present, as the locals
480 // are accessed off the frame pointer in both the parent function and the
481 // funclets.
482 if (MF.hasEHFunclets())
483 return true;
484 // Retain behavior of always omitting the FP for leaf functions when possible.
486 return true;
487 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
488 MFI.hasStackMap() || MFI.hasPatchPoint() ||
489 RegInfo->hasStackRealignment(MF))
490 return true;
491 // With large callframes around we may need to use FP to access the scavenging
492 // emergency spillslot.
493 //
494 // Unfortunately some calls to hasFP() like machine verifier ->
495 // getReservedReg() -> hasFP in the middle of global isel are too early
496 // to know the max call frame size. Hopefully conservatively returning "true"
497 // in those cases is fine.
498 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
499 if (!MFI.isMaxCallFrameSizeComputed() ||
501 return true;
502
503 return false;
504}
505
506/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
507/// not required, we reserve argument space for call sites in the function
508/// immediately on entry to the current function. This eliminates the need for
509/// add/sub sp brackets around call sites. Returns true if the call frame is
510/// included as part of the stack frame.
512 const MachineFunction &MF) const {
513 // The stack probing code for the dynamically allocated outgoing arguments
514 // area assumes that the stack is probed at the top - either by the prologue
515 // code, which issues a probe if `hasVarSizedObjects` returns true, or by the
516 // most recent variable-sized object allocation. Changing the condition here
517 // may need to be followed up by changes to the probe issuing logic.
518 return !MF.getFrameInfo().hasVarSizedObjects();
519}
520
524 const AArch64InstrInfo *TII =
525 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
526 const AArch64TargetLowering *TLI =
527 MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
528 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
529 DebugLoc DL = I->getDebugLoc();
530 unsigned Opc = I->getOpcode();
531 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
532 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
533
534 if (!hasReservedCallFrame(MF)) {
535 int64_t Amount = I->getOperand(0).getImm();
536 Amount = alignTo(Amount, getStackAlign());
537 if (!IsDestroy)
538 Amount = -Amount;
539
540 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
541 // doesn't have to pop anything), then the first operand will be zero too so
542 // this adjustment is a no-op.
543 if (CalleePopAmount == 0) {
544 // FIXME: in-function stack adjustment for calls is limited to 24-bits
545 // because there's no guaranteed temporary register available.
546 //
547 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
548 // 1) For offset <= 12-bit, we use LSL #0
549 // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
550 // LSL #0, and the other uses LSL #12.
551 //
552 // Most call frames will be allocated at the start of a function so
553 // this is OK, but it is a limitation that needs dealing with.
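      // For example (illustrative only), the two-instruction split mentioned
      // above: adjusting SP by 0x45678 bytes could be emitted as
      //   sub sp, sp, #0x45, lsl #12   // 0x45000
      //   sub sp, sp, #0x678           // remaining 0x678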
554 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
555
556 if (TLI->hasInlineStackProbe(MF) &&
558 // When stack probing is enabled, the decrement of SP may need to be
559 // probed. We only need to do this if the call site needs 1024 bytes of
560 // space or more, because a region smaller than that is allowed to be
561 // unprobed at an ABI boundary. We rely on the fact that SP has been
562 // probed exactly at this point, either by the prologue or most recent
563 // dynamic allocation.
565 "non-reserved call frame without var sized objects?");
566 Register ScratchReg =
567 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
568 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
569 } else {
570 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
571 StackOffset::getFixed(Amount), TII);
572 }
573 }
574 } else if (CalleePopAmount != 0) {
575 // If the calling convention demands that the callee pops arguments from the
576 // stack, we want to add it back if we have a reserved call frame.
577 assert(CalleePopAmount < 0xffffff && "call frame too large");
578 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
579 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
580 }
581 return MBB.erase(I);
582}
583
584void AArch64FrameLowering::emitCalleeSavedGPRLocations(
587 MachineFrameInfo &MFI = MF.getFrameInfo();
589 SMEAttrs Attrs(MF.getFunction());
590 bool LocallyStreaming =
591 Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface();
592
593 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
594 if (CSI.empty())
595 return;
596
597 const TargetSubtargetInfo &STI = MF.getSubtarget();
598 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
599 const TargetInstrInfo &TII = *STI.getInstrInfo();
601
602 for (const auto &Info : CSI) {
603 unsigned FrameIdx = Info.getFrameIdx();
604 if (MFI.getStackID(FrameIdx) == TargetStackID::ScalableVector)
605 continue;
606
607 assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
608 int64_t DwarfReg = TRI.getDwarfRegNum(Info.getReg(), true);
609 int64_t Offset = MFI.getObjectOffset(FrameIdx) - getOffsetOfLocalArea();
610
611 // The location of VG will be emitted before each streaming-mode change in
612 // the function. Only locally-streaming functions require emitting the
613 // non-streaming VG location here.
614 if ((LocallyStreaming && FrameIdx == AFI->getStreamingVGIdx()) ||
615 (!LocallyStreaming &&
616 DwarfReg == TRI.getDwarfRegNum(AArch64::VG, true)))
617 continue;
618
619 unsigned CFIIndex = MF.addFrameInst(
620 MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset));
621 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
622 .addCFIIndex(CFIIndex)
624 }
625}
626
627void AArch64FrameLowering::emitCalleeSavedSVELocations(
630 MachineFrameInfo &MFI = MF.getFrameInfo();
631
632 // Add callee saved registers to move list.
633 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
634 if (CSI.empty())
635 return;
636
637 const TargetSubtargetInfo &STI = MF.getSubtarget();
638 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
639 const TargetInstrInfo &TII = *STI.getInstrInfo();
642
643 for (const auto &Info : CSI) {
644 if (!(MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
645 continue;
646
647 // Not all unwinders may know about SVE registers, so assume the lowest
648 // common denominator.
649 assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
650 unsigned Reg = Info.getReg();
651 if (!static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
652 continue;
653
655 StackOffset::getScalable(MFI.getObjectOffset(Info.getFrameIdx())) -
657
658 unsigned CFIIndex = MF.addFrameInst(createCFAOffset(TRI, Reg, Offset));
659 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
660 .addCFIIndex(CFIIndex)
662 }
663}
664
668 unsigned DwarfReg) {
669 unsigned CFIIndex =
670 MF.addFrameInst(MCCFIInstruction::createSameValue(nullptr, DwarfReg));
671 BuildMI(MBB, InsertPt, DebugLoc(), Desc).addCFIIndex(CFIIndex);
672}
673
675 MachineBasicBlock &MBB) const {
676
678 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
679 const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
680 const auto &TRI =
681 static_cast<const AArch64RegisterInfo &>(*Subtarget.getRegisterInfo());
682 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
683
684 const MCInstrDesc &CFIDesc = TII.get(TargetOpcode::CFI_INSTRUCTION);
685 DebugLoc DL;
686
687 // Reset the CFA to `SP + 0`.
689 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
690 nullptr, TRI.getDwarfRegNum(AArch64::SP, true), 0));
691 BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);
692
693 // Flip the RA sign state.
694 if (MFI.shouldSignReturnAddress(MF)) {
696 BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);
697 }
698
699 // Shadow call stack uses X18, reset it.
700 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
701 insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
702 TRI.getDwarfRegNum(AArch64::X18, true));
703
704 // Emit .cfi_same_value for callee-saved registers.
705 const std::vector<CalleeSavedInfo> &CSI =
707 for (const auto &Info : CSI) {
708 unsigned Reg = Info.getReg();
709 if (!TRI.regNeedsCFI(Reg, Reg))
710 continue;
711 insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
712 TRI.getDwarfRegNum(Reg, true));
713 }
714}
715
718 bool SVE) {
720 MachineFrameInfo &MFI = MF.getFrameInfo();
721
722 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
723 if (CSI.empty())
724 return;
725
726 const TargetSubtargetInfo &STI = MF.getSubtarget();
727 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
728 const TargetInstrInfo &TII = *STI.getInstrInfo();
730
731 for (const auto &Info : CSI) {
732 if (SVE !=
733 (MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
734 continue;
735
736 unsigned Reg = Info.getReg();
737 if (SVE &&
738 !static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
739 continue;
740
741 if (!Info.isRestored())
742 continue;
743
744 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createRestore(
745 nullptr, TRI.getDwarfRegNum(Info.getReg(), true)));
746 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
747 .addCFIIndex(CFIIndex)
749 }
750}
751
752void AArch64FrameLowering::emitCalleeSavedGPRRestores(
755}
756
757void AArch64FrameLowering::emitCalleeSavedSVERestores(
760}
761
762// Return the maximum possible number of bytes for `Size` due to the
763// architectural limit on the size of an SVE register.
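// For example (assuming the architectural maximum SVE vector length of 2048
// bits, i.e. 16x the 128-bit minimum): a StackOffset of 2 scalable bytes and
// 16 fixed bytes is bounded by 2 * 16 + 16 = 48 bytes.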
764static int64_t upperBound(StackOffset Size) {
765 static const int64_t MAX_BYTES_PER_SCALABLE_BYTE = 16;
766 return Size.getScalable() * MAX_BYTES_PER_SCALABLE_BYTE + Size.getFixed();
767}
768
769void AArch64FrameLowering::allocateStackSpace(
771 int64_t RealignmentPadding, StackOffset AllocSize, bool NeedsWinCFI,
772 bool *HasWinCFI, bool EmitCFI, StackOffset InitialOffset,
773 bool FollowupAllocs) const {
774
775 if (!AllocSize)
776 return;
777
778 DebugLoc DL;
780 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
781 const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
783 const MachineFrameInfo &MFI = MF.getFrameInfo();
784
785 const int64_t MaxAlign = MFI.getMaxAlign().value();
786 const uint64_t AndMask = ~(MaxAlign - 1);
787
788 if (!Subtarget.getTargetLowering()->hasInlineStackProbe(MF)) {
789 Register TargetReg = RealignmentPadding
791 : AArch64::SP;
792 // SUB Xd/SP, SP, AllocSize
793 emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
794 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
795 EmitCFI, InitialOffset);
796
797 if (RealignmentPadding) {
798 // AND SP, X9, 0b11111...0000
799 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
800 .addReg(TargetReg, RegState::Kill)
803 AFI.setStackRealigned(true);
804
805 // No need for SEH instructions here; if we're realigning the stack,
806 // we've set a frame pointer and already finished the SEH prologue.
807 assert(!NeedsWinCFI);
808 }
809 return;
810 }
811
812 //
813 // Stack probing allocation.
814 //
815
816 // Fixed length allocation. If we don't need to re-align the stack and don't
817 // have SVE objects, we can use a more efficient sequence for stack probing.
818 if (AllocSize.getScalable() == 0 && RealignmentPadding == 0) {
820 assert(ScratchReg != AArch64::NoRegister);
821 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC))
822 .addDef(ScratchReg)
823 .addImm(AllocSize.getFixed())
824 .addImm(InitialOffset.getFixed())
825 .addImm(InitialOffset.getScalable());
826 // The fixed allocation may leave unprobed bytes at the top of the
827 // stack. If we have subsequent allocations (e.g. if we have variable-sized
828 // objects), we need to issue an extra probe, so these allocations start in
829 // a known state.
830 if (FollowupAllocs) {
831 // STR XZR, [SP]
832 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
833 .addReg(AArch64::XZR)
834 .addReg(AArch64::SP)
835 .addImm(0)
837 }
838
839 return;
840 }
841
842 // Variable length allocation.
843
844 // If the (unknown) allocation size cannot exceed the probe size, decrement
845 // the stack pointer right away.
846 int64_t ProbeSize = AFI.getStackProbeSize();
847 if (upperBound(AllocSize) + RealignmentPadding <= ProbeSize) {
848 Register ScratchReg = RealignmentPadding
850 : AArch64::SP;
851 assert(ScratchReg != AArch64::NoRegister);
852 // SUB Xd, SP, AllocSize
853 emitFrameOffset(MBB, MBBI, DL, ScratchReg, AArch64::SP, -AllocSize, &TII,
854 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
855 EmitCFI, InitialOffset);
856 if (RealignmentPadding) {
857 // AND SP, Xn, 0b11111...0000
858 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
859 .addReg(ScratchReg, RegState::Kill)
862 AFI.setStackRealigned(true);
863 }
864 if (FollowupAllocs || upperBound(AllocSize) + RealignmentPadding >
866 // STR XZR, [SP]
867 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
868 .addReg(AArch64::XZR)
869 .addReg(AArch64::SP)
870 .addImm(0)
872 }
873 return;
874 }
875
876 // Emit a variable-length allocation probing loop.
877 // TODO: As an optimisation, the loop can be "unrolled" into a few parts,
878 // each of them guaranteed to adjust the stack by less than the probe size.
880 assert(TargetReg != AArch64::NoRegister);
881 // SUB Xd, SP, AllocSize
882 emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
883 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
884 EmitCFI, InitialOffset);
885 if (RealignmentPadding) {
886 // AND Xn, Xn, 0b11111...0000
887 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), TargetReg)
888 .addReg(TargetReg, RegState::Kill)
891 }
892
893 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC_VAR))
894 .addReg(TargetReg);
895 if (EmitCFI) {
896 // Set the CFA register back to SP.
897 unsigned Reg =
898 Subtarget.getRegisterInfo()->getDwarfRegNum(AArch64::SP, true);
899 unsigned CFIIndex =
901 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
902 .addCFIIndex(CFIIndex)
904 }
905 if (RealignmentPadding)
906 AFI.setStackRealigned(true);
907}
908
909static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE) {
910 switch (Reg.id()) {
911 default:
912 // The called routine is expected to preserve r19-r28
913 // r29 and r30 are used as frame pointer and link register resp.
914 return 0;
915
916 // GPRs
917#define CASE(n) \
918 case AArch64::W##n: \
919 case AArch64::X##n: \
920 return AArch64::X##n
921 CASE(0);
922 CASE(1);
923 CASE(2);
924 CASE(3);
925 CASE(4);
926 CASE(5);
927 CASE(6);
928 CASE(7);
929 CASE(8);
930 CASE(9);
931 CASE(10);
932 CASE(11);
933 CASE(12);
934 CASE(13);
935 CASE(14);
936 CASE(15);
937 CASE(16);
938 CASE(17);
939 CASE(18);
940#undef CASE
941
942 // FPRs
943#define CASE(n) \
944 case AArch64::B##n: \
945 case AArch64::H##n: \
946 case AArch64::S##n: \
947 case AArch64::D##n: \
948 case AArch64::Q##n: \
949 return HasSVE ? AArch64::Z##n : AArch64::Q##n
950 CASE(0);
951 CASE(1);
952 CASE(2);
953 CASE(3);
954 CASE(4);
955 CASE(5);
956 CASE(6);
957 CASE(7);
958 CASE(8);
959 CASE(9);
960 CASE(10);
961 CASE(11);
962 CASE(12);
963 CASE(13);
964 CASE(14);
965 CASE(15);
966 CASE(16);
967 CASE(17);
968 CASE(18);
969 CASE(19);
970 CASE(20);
971 CASE(21);
972 CASE(22);
973 CASE(23);
974 CASE(24);
975 CASE(25);
976 CASE(26);
977 CASE(27);
978 CASE(28);
979 CASE(29);
980 CASE(30);
981 CASE(31);
982#undef CASE
983 }
984}
985
986void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
987 MachineBasicBlock &MBB) const {
988 // Insertion point.
990
991 // Fake a debug loc.
992 DebugLoc DL;
993 if (MBBI != MBB.end())
994 DL = MBBI->getDebugLoc();
995
996 const MachineFunction &MF = *MBB.getParent();
999
1000 BitVector GPRsToZero(TRI.getNumRegs());
1001 BitVector FPRsToZero(TRI.getNumRegs());
1002 bool HasSVE = STI.hasSVE();
1003 for (MCRegister Reg : RegsToZero.set_bits()) {
1004 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
1005 // For GPRs, we only care to clear out the 64-bit register.
1006 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
1007 GPRsToZero.set(XReg);
1008 } else if (AArch64InstrInfo::isFpOrNEON(Reg)) {
1009 // For FPRs, clear out the widest aliasing register (Q, or Z if SVE is available).
1010 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
1011 FPRsToZero.set(XReg);
1012 }
1013 }
1014
1015 const AArch64InstrInfo &TII = *STI.getInstrInfo();
1016
1017 // Zero out GPRs.
1018 for (MCRegister Reg : GPRsToZero.set_bits())
1019 TII.buildClearRegister(Reg, MBB, MBBI, DL);
1020
1021 // Zero out FP/vector registers.
1022 for (MCRegister Reg : FPRsToZero.set_bits())
1023 TII.buildClearRegister(Reg, MBB, MBBI, DL);
1024
1025 if (HasSVE) {
1026 for (MCRegister PReg :
1027 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
1028 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
1029 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
1030 AArch64::P15}) {
1031 if (RegsToZero[PReg])
1032 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
1033 }
1034 }
1035}
1036
1038 const MachineBasicBlock &MBB) {
1039 const MachineFunction *MF = MBB.getParent();
1040 LiveRegs.addLiveIns(MBB);
1041 // Mark callee saved registers as used so we will not choose them.
1042 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
1043 for (unsigned i = 0; CSRegs[i]; ++i)
1044 LiveRegs.addReg(CSRegs[i]);
1045}
1046
1047// Find a scratch register that we can use at the start of the prologue to
1048// re-align the stack pointer. We avoid using callee-save registers since they
1049// may appear to be free when this is called from canUseAsPrologue (during
1050// shrink wrapping), but then no longer be free when this is called from
1051// emitPrologue.
1052//
1053// FIXME: This is a bit conservative, since in the above case we could use one
1054// of the callee-save registers as a scratch temp to re-align the stack pointer,
1055// but we would then have to make sure that we were in fact saving at least one
1056// callee-save register in the prologue, which is additional complexity that
1057// doesn't seem worth the benefit.
1059 MachineFunction *MF = MBB->getParent();
1060
1061 // If MBB is an entry block, use X9 as the scratch register.
1062 // preserve_none functions may be using X9 to pass arguments,
1063 // so prefer to pick an available register below.
1064 if (&MF->front() == MBB &&
1066 return AArch64::X9;
1067
1068 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1069 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1070 LivePhysRegs LiveRegs(TRI);
1071 getLiveRegsForEntryMBB(LiveRegs, *MBB);
1072
1073 // Prefer X9 since it was historically used for the prologue scratch reg.
1074 const MachineRegisterInfo &MRI = MF->getRegInfo();
1075 if (LiveRegs.available(MRI, AArch64::X9))
1076 return AArch64::X9;
1077
1078 for (unsigned Reg : AArch64::GPR64RegClass) {
1079 if (LiveRegs.available(MRI, Reg))
1080 return Reg;
1081 }
1082 return AArch64::NoRegister;
1083}
1084
1086 const MachineBasicBlock &MBB) const {
1087 const MachineFunction *MF = MBB.getParent();
1088 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
1089 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1090 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1091 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
1093
1094 if (AFI->hasSwiftAsyncContext()) {
1095 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1096 const MachineRegisterInfo &MRI = MF->getRegInfo();
1097 LivePhysRegs LiveRegs(TRI);
1098 getLiveRegsForEntryMBB(LiveRegs, MBB);
1099 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
1100 // available.
1101 if (!LiveRegs.available(MRI, AArch64::X16) ||
1102 !LiveRegs.available(MRI, AArch64::X17))
1103 return false;
1104 }
1105
1106 // Certain stack probing sequences might clobber flags, so we can't use
1107 // the block as a prologue if the flags register is a live-in.
1109 MBB.isLiveIn(AArch64::NZCV))
1110 return false;
1111
1112 // Don't need a scratch register if we're not going to re-align the stack or
1113 // emit stack probes.
1114 if (!RegInfo->hasStackRealignment(*MF) && !TLI->hasInlineStackProbe(*MF))
1115 return true;
1116 // Otherwise, we can use any block as long as it has a scratch register
1117 // available.
1118 return findScratchNonCalleeSaveRegister(TmpMBB) != AArch64::NoRegister;
1119}
1120
1122 uint64_t StackSizeInBytes) {
1123 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1125 // TODO: When implementing stack protectors, take that into account
1126 // for the probe threshold.
1127 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
1128 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
1129}
1130
1131static bool needsWinCFI(const MachineFunction &MF) {
1132 const Function &F = MF.getFunction();
1133 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
1134 F.needsUnwindTableEntry();
1135}
1136
1137bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
1138 MachineFunction &MF, uint64_t StackBumpBytes) const {
1140 const MachineFrameInfo &MFI = MF.getFrameInfo();
1141 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1142 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1143 if (homogeneousPrologEpilog(MF))
1144 return false;
1145
1146 if (AFI->getLocalStackSize() == 0)
1147 return false;
1148
1149 // For WinCFI, if optimizing for size, prefer to not combine the stack bump
1150 // (to force a stp with predecrement) to match the packed unwind format,
1151 // provided that there actually are any callee saved registers to merge the
1152 // decrement with.
1153 // This is potentially marginally slower, but allows using the packed
1154 // unwind format for functions that both have a local area and callee saved
1155 // registers. Using the packed unwind format notably reduces the size of
1156 // the unwind info.
1157 if (needsWinCFI(MF) && AFI->getCalleeSavedStackSize() > 0 &&
1158 MF.getFunction().hasOptSize())
1159 return false;
1160
1161 // 512 is the maximum immediate for stp/ldp that will be used for
1162 // callee-save save/restores
1163 if (StackBumpBytes >= 512 || windowsRequiresStackProbe(MF, StackBumpBytes))
1164 return false;
1165
1166 if (MFI.hasVarSizedObjects())
1167 return false;
1168
1169 if (RegInfo->hasStackRealignment(MF))
1170 return false;
1171
1172 // This isn't strictly necessary, but it simplifies things a bit since the
1173 // current RedZone handling code assumes the SP is adjusted by the
1174 // callee-save save/restore code.
1175 if (canUseRedZone(MF))
1176 return false;
1177
1178 // When there is an SVE area on the stack, always allocate the
1179 // callee-saves and spills/locals separately.
1180 if (getSVEStackSize(MF))
1181 return false;
1182
1183 return true;
1184}
1185
1186bool AArch64FrameLowering::shouldCombineCSRLocalStackBumpInEpilogue(
1187 MachineBasicBlock &MBB, unsigned StackBumpBytes) const {
1188 if (!shouldCombineCSRLocalStackBump(*MBB.getParent(), StackBumpBytes))
1189 return false;
1190
1191 if (MBB.empty())
1192 return true;
1193
1194 // Disable combined SP bump if the last instruction is an MTE tag store. It
1195 // is almost always better to merge SP adjustment into those instructions.
1198 while (LastI != Begin) {
1199 --LastI;
1200 if (LastI->isTransient())
1201 continue;
1202 if (!LastI->getFlag(MachineInstr::FrameDestroy))
1203 break;
1204 }
1205 switch (LastI->getOpcode()) {
1206 case AArch64::STGloop:
1207 case AArch64::STZGloop:
1208 case AArch64::STGi:
1209 case AArch64::STZGi:
1210 case AArch64::ST2Gi:
1211 case AArch64::STZ2Gi:
1212 return false;
1213 default:
1214 return true;
1215 }
1216 llvm_unreachable("unreachable");
1217}
1218
1219// Given a load or a store instruction, generate an appropriate unwinding SEH
1220// code on Windows.
1222 const TargetInstrInfo &TII,
1223 MachineInstr::MIFlag Flag) {
1224 unsigned Opc = MBBI->getOpcode();
1226 MachineFunction &MF = *MBB->getParent();
1227 DebugLoc DL = MBBI->getDebugLoc();
1228 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1229 int Imm = MBBI->getOperand(ImmIdx).getImm();
1231 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1232 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1233
1234 switch (Opc) {
1235 default:
1236 llvm_unreachable("No SEH Opcode for this instruction");
1237 case AArch64::LDPDpost:
1238 Imm = -Imm;
1239 [[fallthrough]];
1240 case AArch64::STPDpre: {
1241 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1242 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1243 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1244 .addImm(Reg0)
1245 .addImm(Reg1)
1246 .addImm(Imm * 8)
1247 .setMIFlag(Flag);
1248 break;
1249 }
1250 case AArch64::LDPXpost:
1251 Imm = -Imm;
1252 [[fallthrough]];
1253 case AArch64::STPXpre: {
1254 Register Reg0 = MBBI->getOperand(1).getReg();
1255 Register Reg1 = MBBI->getOperand(2).getReg();
1256 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1257 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1258 .addImm(Imm * 8)
1259 .setMIFlag(Flag);
1260 else
1261 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1262 .addImm(RegInfo->getSEHRegNum(Reg0))
1263 .addImm(RegInfo->getSEHRegNum(Reg1))
1264 .addImm(Imm * 8)
1265 .setMIFlag(Flag);
1266 break;
1267 }
1268 case AArch64::LDRDpost:
1269 Imm = -Imm;
1270 [[fallthrough]];
1271 case AArch64::STRDpre: {
1272 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1273 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1274 .addImm(Reg)
1275 .addImm(Imm)
1276 .setMIFlag(Flag);
1277 break;
1278 }
1279 case AArch64::LDRXpost:
1280 Imm = -Imm;
1281 [[fallthrough]];
1282 case AArch64::STRXpre: {
1283 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1284 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1285 .addImm(Reg)
1286 .addImm(Imm)
1287 .setMIFlag(Flag);
1288 break;
1289 }
1290 case AArch64::STPDi:
1291 case AArch64::LDPDi: {
1292 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1293 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1294 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1295 .addImm(Reg0)
1296 .addImm(Reg1)
1297 .addImm(Imm * 8)
1298 .setMIFlag(Flag);
1299 break;
1300 }
1301 case AArch64::STPXi:
1302 case AArch64::LDPXi: {
1303 Register Reg0 = MBBI->getOperand(0).getReg();
1304 Register Reg1 = MBBI->getOperand(1).getReg();
1305 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1306 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1307 .addImm(Imm * 8)
1308 .setMIFlag(Flag);
1309 else
1310 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1311 .addImm(RegInfo->getSEHRegNum(Reg0))
1312 .addImm(RegInfo->getSEHRegNum(Reg1))
1313 .addImm(Imm * 8)
1314 .setMIFlag(Flag);
1315 break;
1316 }
1317 case AArch64::STRXui:
1318 case AArch64::LDRXui: {
1319 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1320 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1321 .addImm(Reg)
1322 .addImm(Imm * 8)
1323 .setMIFlag(Flag);
1324 break;
1325 }
1326 case AArch64::STRDui:
1327 case AArch64::LDRDui: {
1328 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1329 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1330 .addImm(Reg)
1331 .addImm(Imm * 8)
1332 .setMIFlag(Flag);
1333 break;
1334 }
1335 case AArch64::STPQi:
1336 case AArch64::LDPQi: {
1337 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1338 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1339 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1340 .addImm(Reg0)
1341 .addImm(Reg1)
1342 .addImm(Imm * 16)
1343 .setMIFlag(Flag);
1344 break;
1345 }
1346 case AArch64::LDPQpost:
1347 Imm = -Imm;
1348 [[fallthrough]];
1349 case AArch64::STPQpre: {
1350 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1351 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1352 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1353 .addImm(Reg0)
1354 .addImm(Reg1)
1355 .addImm(Imm * 16)
1356 .setMIFlag(Flag);
1357 break;
1358 }
1359 }
1360 auto I = MBB->insertAfter(MBBI, MIB);
1361 return I;
1362}
1363
1364// Fix up the SEH opcode associated with the save/restore instruction.
1366 unsigned LocalStackSize) {
1367 MachineOperand *ImmOpnd = nullptr;
1368 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1369 switch (MBBI->getOpcode()) {
1370 default:
1371 llvm_unreachable("Fix the offset in the SEH instruction");
1372 case AArch64::SEH_SaveFPLR:
1373 case AArch64::SEH_SaveRegP:
1374 case AArch64::SEH_SaveReg:
1375 case AArch64::SEH_SaveFRegP:
1376 case AArch64::SEH_SaveFReg:
1377 case AArch64::SEH_SaveAnyRegQP:
1378 case AArch64::SEH_SaveAnyRegQPX:
1379 ImmOpnd = &MBBI->getOperand(ImmIdx);
1380 break;
1381 }
1382 if (ImmOpnd)
1383 ImmOpnd->setImm(ImmOpnd->getImm() + LocalStackSize);
1384}
1385
1388 return AFI->hasStreamingModeChanges() &&
1389 !MF.getSubtarget<AArch64Subtarget>().hasSVE();
1390}
1391
1393 unsigned Opc = MBBI->getOpcode();
1394 if (Opc == AArch64::CNTD_XPiI || Opc == AArch64::RDSVLI_XI ||
1395 Opc == AArch64::UBFMXri)
1396 return true;
1397
1398 if (requiresGetVGCall(*MBBI->getMF())) {
1399 if (Opc == AArch64::ORRXrr)
1400 return true;
1401
1402 if (Opc == AArch64::BL) {
1403 auto Op1 = MBBI->getOperand(0);
1404 return Op1.isSymbol() &&
1405 (StringRef(Op1.getSymbolName()) == "__arm_get_current_vg");
1406 }
1407 }
1408
1409 return false;
1410}
1411
1412// Convert a callee-save register save/restore instruction to also decrement/
1413// increment the stack pointer, allocating/deallocating the callee-save stack
1414// area, by converting the store/load to its pre/post-increment version.
1417 const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc,
1418 bool NeedsWinCFI, bool *HasWinCFI, bool EmitCFI,
1420 int CFAOffset = 0) {
1421 unsigned NewOpc;
1422
1423 // If the function contains streaming mode changes, we expect instructions
1424 // to calculate the value of VG before spilling. For locally-streaming
1425 // functions, we need to do this for both the streaming and non-streaming
1426 // vector length. Move past these instructions if necessary.
1427 MachineFunction &MF = *MBB.getParent();
1429 if (AFI->hasStreamingModeChanges())
1430 while (isVGInstruction(MBBI))
1431 ++MBBI;
1432
1433 switch (MBBI->getOpcode()) {
1434 default:
1435 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1436 case AArch64::STPXi:
1437 NewOpc = AArch64::STPXpre;
1438 break;
1439 case AArch64::STPDi:
1440 NewOpc = AArch64::STPDpre;
1441 break;
1442 case AArch64::STPQi:
1443 NewOpc = AArch64::STPQpre;
1444 break;
1445 case AArch64::STRXui:
1446 NewOpc = AArch64::STRXpre;
1447 break;
1448 case AArch64::STRDui:
1449 NewOpc = AArch64::STRDpre;
1450 break;
1451 case AArch64::STRQui:
1452 NewOpc = AArch64::STRQpre;
1453 break;
1454 case AArch64::LDPXi:
1455 NewOpc = AArch64::LDPXpost;
1456 break;
1457 case AArch64::LDPDi:
1458 NewOpc = AArch64::LDPDpost;
1459 break;
1460 case AArch64::LDPQi:
1461 NewOpc = AArch64::LDPQpost;
1462 break;
1463 case AArch64::LDRXui:
1464 NewOpc = AArch64::LDRXpost;
1465 break;
1466 case AArch64::LDRDui:
1467 NewOpc = AArch64::LDRDpost;
1468 break;
1469 case AArch64::LDRQui:
1470 NewOpc = AArch64::LDRQpost;
1471 break;
1472 }
1473 // Get rid of the SEH code associated with the old instruction.
1474 if (NeedsWinCFI) {
1475 auto SEH = std::next(MBBI);
1477 SEH->eraseFromParent();
1478 }
1479
1480 TypeSize Scale = TypeSize::getFixed(1), Width = TypeSize::getFixed(0);
1481 int64_t MinOffset, MaxOffset;
1482 bool Success = static_cast<const AArch64InstrInfo *>(TII)->getMemOpInfo(
1483 NewOpc, Scale, Width, MinOffset, MaxOffset);
1484 (void)Success;
1485 assert(Success && "unknown load/store opcode");
1486
1487 // If the first store isn't right where we want SP then we can't fold the
1488 // update in so create a normal arithmetic instruction instead.
1489 if (MBBI->getOperand(MBBI->getNumOperands() - 1).getImm() != 0 ||
1490 CSStackSizeInc < MinOffset || CSStackSizeInc > MaxOffset) {
1491 // If we are destroying the frame, make sure we add the increment after the
1492 // last frame operation.
1493 if (FrameFlag == MachineInstr::FrameDestroy)
1494 ++MBBI;
1495 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1496 StackOffset::getFixed(CSStackSizeInc), TII, FrameFlag,
1497 false, false, nullptr, EmitCFI,
1498 StackOffset::getFixed(CFAOffset));
1499
1500 return std::prev(MBBI);
1501 }
1502
1503 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(NewOpc));
1504 MIB.addReg(AArch64::SP, RegState::Define);
1505
1506 // Copy all operands other than the immediate offset.
1507 unsigned OpndIdx = 0;
1508 for (unsigned OpndEnd = MBBI->getNumOperands() - 1; OpndIdx < OpndEnd;
1509 ++OpndIdx)
1510 MIB.add(MBBI->getOperand(OpndIdx));
1511
1512 assert(MBBI->getOperand(OpndIdx).getImm() == 0 &&
1513 "Unexpected immediate offset in first/last callee-save save/restore "
1514 "instruction!");
1515 assert(MBBI->getOperand(OpndIdx - 1).getReg() == AArch64::SP &&
1516 "Unexpected base register in callee-save save/restore instruction!");
1517 assert(CSStackSizeInc % Scale == 0);
1518 MIB.addImm(CSStackSizeInc / (int)Scale);
1519
1520 MIB.setMIFlags(MBBI->getFlags());
1521 MIB.setMemRefs(MBBI->memoperands());
1522
1523 // Generate a new SEH code that corresponds to the new instruction.
1524 if (NeedsWinCFI) {
1525 *HasWinCFI = true;
1526 InsertSEH(*MIB, *TII, FrameFlag);
1527 }
1528
1529 if (EmitCFI) {
1530 unsigned CFIIndex = MF.addFrameInst(
1531 MCCFIInstruction::cfiDefCfaOffset(nullptr, CFAOffset - CSStackSizeInc));
1532 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1533 .addCFIIndex(CFIIndex)
1534 .setMIFlags(FrameFlag);
1535 }
1536
1537 return std::prev(MBB.erase(MBBI));
1538}
1539
1540// Fixup callee-save register save/restore instructions to take into account
1541// combined SP bump by adding the local stack size to the stack offsets.
1543 uint64_t LocalStackSize,
1544 bool NeedsWinCFI,
1545 bool *HasWinCFI) {
1547 return;
1548
1549 unsigned Opc = MI.getOpcode();
1550 unsigned Scale;
1551 switch (Opc) {
1552 case AArch64::STPXi:
1553 case AArch64::STRXui:
1554 case AArch64::STPDi:
1555 case AArch64::STRDui:
1556 case AArch64::LDPXi:
1557 case AArch64::LDRXui:
1558 case AArch64::LDPDi:
1559 case AArch64::LDRDui:
1560 Scale = 8;
1561 break;
1562 case AArch64::STPQi:
1563 case AArch64::STRQui:
1564 case AArch64::LDPQi:
1565 case AArch64::LDRQui:
1566 Scale = 16;
1567 break;
1568 default:
1569 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1570 }
1571
1572 unsigned OffsetIdx = MI.getNumExplicitOperands() - 1;
1573 assert(MI.getOperand(OffsetIdx - 1).getReg() == AArch64::SP &&
1574 "Unexpected base register in callee-save save/restore instruction!");
1575 // Last operand is immediate offset that needs fixing.
1576 MachineOperand &OffsetOpnd = MI.getOperand(OffsetIdx);
1577 // All generated opcodes have scaled offsets.
1578 assert(LocalStackSize % Scale == 0);
1579 OffsetOpnd.setImm(OffsetOpnd.getImm() + LocalStackSize / Scale);
1580
1581 if (NeedsWinCFI) {
1582 *HasWinCFI = true;
1583 auto MBBI = std::next(MachineBasicBlock::iterator(MI));
1584 assert(MBBI != MI.getParent()->end() && "Expecting a valid instruction");
1586 "Expecting a SEH instruction");
1587 fixupSEHOpcode(MBBI, LocalStackSize);
1588 }
1589}
1590
1591static bool isTargetWindows(const MachineFunction &MF) {
1593}
1594
1595// Convenience function to determine whether I is an SVE callee save.
1597 switch (I->getOpcode()) {
1598 default:
1599 return false;
1600 case AArch64::PTRUE_C_B:
1601 case AArch64::LD1B_2Z_IMM:
1602 case AArch64::ST1B_2Z_IMM:
1603 case AArch64::STR_ZXI:
1604 case AArch64::STR_PXI:
1605 case AArch64::LDR_ZXI:
1606 case AArch64::LDR_PXI:
1607 return I->getFlag(MachineInstr::FrameSetup) ||
1608 I->getFlag(MachineInstr::FrameDestroy);
1609 }
1610}
1611
1613 MachineFunction &MF,
1616 const DebugLoc &DL, bool NeedsWinCFI,
1617 bool NeedsUnwindInfo) {
1618 // Shadow call stack prolog: str x30, [x18], #8
1619 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXpost))
1620 .addReg(AArch64::X18, RegState::Define)
1621 .addReg(AArch64::LR)
1622 .addReg(AArch64::X18)
1623 .addImm(8)
1625
1626 // This instruction also makes x18 live-in to the entry block.
1627 MBB.addLiveIn(AArch64::X18);
1628
1629 if (NeedsWinCFI)
1630 BuildMI(MBB, MBBI, DL, TII.get(AArch64::SEH_Nop))
1632
1633 if (NeedsUnwindInfo) {
1634 // Emit a CFI instruction that causes 8 to be subtracted from the value of
1635 // x18 when unwinding past this frame.
1636 static const char CFIInst[] = {
1637 dwarf::DW_CFA_val_expression,
1638 18, // register
1639 2, // length
1640 static_cast<char>(unsigned(dwarf::DW_OP_breg18)),
1641 static_cast<char>(-8) & 0x7f, // addend (sleb128)
1642 };
1643 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createEscape(
1644 nullptr, StringRef(CFIInst, sizeof(CFIInst))));
1645 BuildMI(MBB, MBBI, DL, TII.get(AArch64::CFI_INSTRUCTION))
1646 .addCFIIndex(CFIIndex)
1648 }
1649}
1650
1652 MachineFunction &MF,
1655 const DebugLoc &DL) {
1656 // Shadow call stack epilog: ldr x30, [x18, #-8]!
1657 BuildMI(MBB, MBBI, DL, TII.get(AArch64::LDRXpre))
1658 .addReg(AArch64::X18, RegState::Define)
1659 .addReg(AArch64::LR, RegState::Define)
1660 .addReg(AArch64::X18)
1661 .addImm(-8)
1663
1665 unsigned CFIIndex =
1667 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
1668 .addCFIIndex(CFIIndex)
1670 }
1671}
1672
1673// Define the current CFA rule to use the provided FP.
1676 const DebugLoc &DL, unsigned FixedObject) {
1679 const TargetInstrInfo *TII = STI.getInstrInfo();
1681
1682 const int OffsetToFirstCalleeSaveFromFP =
1685 Register FramePtr = TRI->getFrameRegister(MF);
1686 unsigned Reg = TRI->getDwarfRegNum(FramePtr, true);
1687 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
1688 nullptr, Reg, FixedObject - OffsetToFirstCalleeSaveFromFP));
1689 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1690 .addCFIIndex(CFIIndex)
1692}
1693
1694#ifndef NDEBUG
1695/// Collect live registers from the end of \p MI's parent up to (including) \p
1696/// MI in \p LiveRegs.
1698 LivePhysRegs &LiveRegs) {
1699
1700 MachineBasicBlock &MBB = *MI.getParent();
1701 LiveRegs.addLiveOuts(MBB);
1702 for (const MachineInstr &MI :
1703 reverse(make_range(MI.getIterator(), MBB.instr_end())))
1704 LiveRegs.stepBackward(MI);
1705}
1706#endif
1707
1709 MachineBasicBlock &MBB) const {
1711 const MachineFrameInfo &MFI = MF.getFrameInfo();
1712 const Function &F = MF.getFunction();
1713 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1714 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1715 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1716
1717 MachineModuleInfo &MMI = MF.getMMI();
1719 bool EmitCFI = AFI->needsDwarfUnwindInfo(MF);
1720 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
1721 bool HasFP = hasFP(MF);
1722 bool NeedsWinCFI = needsWinCFI(MF);
1723 bool HasWinCFI = false;
1724 auto Cleanup = make_scope_exit([&]() { MF.setHasWinCFI(HasWinCFI); });
1725
1726  MachineBasicBlock::iterator End = MBB.end();
1727#ifndef NDEBUG
1728  const TargetRegisterInfo *TRI = Subtarget.getRegisterInfo();
1729  // Collect live registers from the end of MBB up to the start of the
1730  // existing frame setup instructions.
1731 MachineBasicBlock::iterator NonFrameStart = MBB.begin();
1732 while (NonFrameStart != End &&
1733 NonFrameStart->getFlag(MachineInstr::FrameSetup))
1734 ++NonFrameStart;
1735
1736 LivePhysRegs LiveRegs(*TRI);
1737 if (NonFrameStart != MBB.end()) {
1738 getLivePhysRegsUpTo(*NonFrameStart, *TRI, LiveRegs);
1739 // Ignore registers used for stack management for now.
1740 LiveRegs.removeReg(AArch64::SP);
1741 LiveRegs.removeReg(AArch64::X19);
1742 LiveRegs.removeReg(AArch64::FP);
1743 LiveRegs.removeReg(AArch64::LR);
1744
1745 // X0 will be clobbered by a call to __arm_get_current_vg in the prologue.
1746 // This is necessary to spill VG if required where SVE is unavailable, but
1747 // X0 is preserved around this call.
1748 if (requiresGetVGCall(MF))
1749 LiveRegs.removeReg(AArch64::X0);
1750 }
1751
1752 auto VerifyClobberOnExit = make_scope_exit([&]() {
1753 if (NonFrameStart == MBB.end())
1754 return;
1755    // Check if any newly inserted instructions clobber the live registers.
1756 for (MachineInstr &MI :
1757 make_range(MBB.instr_begin(), NonFrameStart->getIterator())) {
1758 for (auto &Op : MI.operands())
1759 if (Op.isReg() && Op.isDef())
1760 assert(!LiveRegs.contains(Op.getReg()) &&
1761 "live register clobbered by inserted prologue instructions");
1762 }
1763 });
1764#endif
1765
1766 bool IsFunclet = MBB.isEHFuncletEntry();
1767
1768 // At this point, we're going to decide whether or not the function uses a
1769 // redzone. In most cases, the function doesn't have a redzone so let's
1770 // assume that's false and set it to true in the case that there's a redzone.
1771 AFI->setHasRedZone(false);
1772
1773 // Debug location must be unknown since the first debug location is used
1774 // to determine the end of the prologue.
1775 DebugLoc DL;
1776
1777 const auto &MFnI = *MF.getInfo<AArch64FunctionInfo>();
1778 if (MFnI.needsShadowCallStackPrologueEpilogue(MF))
1779 emitShadowCallStackPrologue(*TII, MF, MBB, MBBI, DL, NeedsWinCFI,
1780 MFnI.needsDwarfUnwindInfo(MF));
1781
1782 if (MFnI.shouldSignReturnAddress(MF)) {
1783 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1784        .setMIFlag(MachineInstr::FrameSetup);
1785    if (NeedsWinCFI)
1786 HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
1787 }
1788
1789 if (EmitCFI && MFnI.isMTETagged()) {
1790 BuildMI(MBB, MBBI, DL, TII->get(AArch64::EMITMTETAGGED))
1791        .setMIFlag(MachineInstr::FrameSetup);
1792  }
1793
1794 // We signal the presence of a Swift extended frame to external tools by
1795 // storing FP with 0b0001 in bits 63:60. In normal userland operation a simple
1796  // ORR is sufficient; it is assumed a Swift kernel would initialize the TBI
1797 // bits so that is still true.
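  // Illustrative example: an FP value of 0x0000ffff'12345670 would be stored
  // in the frame record as 0x1000ffff'12345670, i.e. FP | (1ULL << 60).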
1798 if (HasFP && AFI->hasSwiftAsyncContext()) {
1799    switch (MF.getTarget().Options.SwiftAsyncFramePointer) {
1800    case SwiftAsyncFramePointerMode::DeploymentBased:
1801      if (Subtarget.swiftAsyncContextIsDynamicallySet()) {
1802 // The special symbol below is absolute and has a *value* that can be
1803 // combined with the frame pointer to signal an extended frame.
1804 BuildMI(MBB, MBBI, DL, TII->get(AArch64::LOADgot), AArch64::X16)
1805 .addExternalSymbol("swift_async_extendedFramePointerFlags",
1806                               AArch64II::MO_GOT);
1807        if (NeedsWinCFI) {
1808 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1809              .setMIFlag(MachineInstr::FrameSetup);
1810          HasWinCFI = true;
1811 }
1812 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXrs), AArch64::FP)
1813 .addUse(AArch64::FP)
1814 .addUse(AArch64::X16)
1815 .addImm(Subtarget.isTargetILP32() ? 32 : 0);
1816 if (NeedsWinCFI) {
1817 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1818              .setMIFlag(MachineInstr::FrameSetup);
1819          HasWinCFI = true;
1820 }
1821 break;
1822 }
1823 [[fallthrough]];
1824
1825    case SwiftAsyncFramePointerMode::Always:
1826      // ORR x29, x29, #0x1000_0000_0000_0000
1827 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXri), AArch64::FP)
1828 .addUse(AArch64::FP)
1829 .addImm(0x1100)
1830          .setMIFlag(MachineInstr::FrameSetup);
1831      if (NeedsWinCFI) {
1832        BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1833            .setMIFlag(MachineInstr::FrameSetup);
1834        HasWinCFI = true;
1835 }
1836 break;
1837
1838    case SwiftAsyncFramePointerMode::Never:
1839      break;
1840 }
1841 }
1842
1843 // All calls are tail calls in GHC calling conv, and functions have no
1844 // prologue/epilogue.
1845  if (MF.getFunction().getCallingConv() == CallingConv::GHC)
1846    return;
1847
1848 // Set tagged base pointer to the requested stack slot.
1849 // Ideally it should match SP value after prologue.
1850 std::optional<int> TBPI = AFI->getTaggedBasePointerIndex();
1851  if (TBPI)
1852    AFI->setTaggedBasePointerOffset(-MFI.getObjectOffset(*TBPI));
1853  else
1854    AFI->setTaggedBasePointerOffset(MFI.getStackSize());
1855
1856 const StackOffset &SVEStackSize = getSVEStackSize(MF);
1857
1858 // getStackSize() includes all the locals in its size calculation. We don't
1859 // include these locals when computing the stack size of a funclet, as they
1860 // are allocated in the parent's stack frame and accessed via the frame
1861 // pointer from the funclet. We only save the callee saved registers in the
1862 // funclet, which are really the callee saved registers of the parent
1863 // function, including the funclet.
1864 int64_t NumBytes =
1865 IsFunclet ? getWinEHFuncletFrameSize(MF) : MFI.getStackSize();
1866 if (!AFI->hasStackFrame() && !windowsRequiresStackProbe(MF, NumBytes)) {
1867 assert(!HasFP && "unexpected function without stack frame but with FP");
1868 assert(!SVEStackSize &&
1869 "unexpected function without stack frame but with SVE objects");
1870 // All of the stack allocation is for locals.
1871 AFI->setLocalStackSize(NumBytes);
1872 if (!NumBytes)
1873 return;
1874 // REDZONE: If the stack size is less than 128 bytes, we don't need
1875 // to actually allocate.
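    // For example (illustrative): a leaf function with 32 bytes of locals can
    // leave SP untouched and address them at negative offsets from SP, inside
    // the 128-byte red zone below the stack pointer.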
1876 if (canUseRedZone(MF)) {
1877 AFI->setHasRedZone(true);
1878 ++NumRedZoneFunctions;
1879 } else {
1880 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1881 StackOffset::getFixed(-NumBytes), TII,
1882 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
1883 if (EmitCFI) {
1884 // Label used to tie together the PROLOG_LABEL and the MachineMoves.
1885 MCSymbol *FrameLabel = MMI.getContext().createTempSymbol();
1886 // Encode the stack size of the leaf function.
1887 unsigned CFIIndex = MF.addFrameInst(
1888 MCCFIInstruction::cfiDefCfaOffset(FrameLabel, NumBytes));
1889 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1890 .addCFIIndex(CFIIndex)
1891            .setMIFlags(MachineInstr::FrameSetup);
1892      }
1893 }
1894
1895 if (NeedsWinCFI) {
1896 HasWinCFI = true;
1897 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1898          .setMIFlag(MachineInstr::FrameSetup);
1899    }
1900
1901 return;
1902 }
1903
1904 bool IsWin64 =
1905      Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv());
1906  unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
1907
1908 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
1909 // All of the remaining stack allocations are for locals.
1910 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
1911 bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);
1912 bool HomPrologEpilog = homogeneousPrologEpilog(MF);
1913 if (CombineSPBump) {
1914 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
1915 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1916 StackOffset::getFixed(-NumBytes), TII,
1917 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI,
1918 EmitAsyncCFI);
1919 NumBytes = 0;
1920 } else if (HomPrologEpilog) {
1921 // Stack has been already adjusted.
1922 NumBytes -= PrologueSaveSize;
1923 } else if (PrologueSaveSize != 0) {
1924    MBBI = convertCalleeSaveRestoreToSPPrePostIncDec(
1925        MBB, MBBI, DL, TII, -PrologueSaveSize, NeedsWinCFI, &HasWinCFI,
1926 EmitAsyncCFI);
1927 NumBytes -= PrologueSaveSize;
1928 }
1929 assert(NumBytes >= 0 && "Negative stack allocation size!?");
1930
1931 // Move past the saves of the callee-saved registers, fixing up the offsets
1932 // and pre-inc if we decided to combine the callee-save and local stack
1933 // pointer bump above.
1934 while (MBBI != End && MBBI->getFlag(MachineInstr::FrameSetup) &&
1935         !IsSVECalleeSave(MBBI)) {
1936    // Move past instructions generated to calculate VG
1937 if (AFI->hasStreamingModeChanges())
1938 while (isVGInstruction(MBBI))
1939 ++MBBI;
1940
1941 if (CombineSPBump)
1942      fixupCalleeSaveRestoreStackOffset(*MBBI, AFI->getLocalStackSize(),
1943                                        NeedsWinCFI, &HasWinCFI);
1944 ++MBBI;
1945 }
1946
1947 // For funclets the FP belongs to the containing function.
1948 if (!IsFunclet && HasFP) {
1949 // Only set up FP if we actually need to.
1950 int64_t FPOffset = AFI->getCalleeSaveBaseToFrameRecordOffset();
1951
1952 if (CombineSPBump)
1953 FPOffset += AFI->getLocalStackSize();
1954
1955 if (AFI->hasSwiftAsyncContext()) {
1956 // Before we update the live FP we have to ensure there's a valid (or
1957 // null) asynchronous context in its slot just before FP in the frame
1958 // record, so store it now.
1959 const auto &Attrs = MF.getFunction().getAttributes();
1960 bool HaveInitialContext = Attrs.hasAttrSomewhere(Attribute::SwiftAsync);
1961 if (HaveInitialContext)
1962 MBB.addLiveIn(AArch64::X22);
1963 Register Reg = HaveInitialContext ? AArch64::X22 : AArch64::XZR;
1964 BuildMI(MBB, MBBI, DL, TII->get(AArch64::StoreSwiftAsyncContext))
1965 .addUse(Reg)
1966 .addUse(AArch64::SP)
1967 .addImm(FPOffset - 8)
1968          .setMIFlag(MachineInstr::FrameSetup);
1969      if (NeedsWinCFI) {
1970 // WinCFI and arm64e, where StoreSwiftAsyncContext is expanded
1971 // to multiple instructions, should be mutually-exclusive.
1972 assert(Subtarget.getTargetTriple().getArchName() != "arm64e");
1973 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1974            .setMIFlag(MachineInstr::FrameSetup);
1975        HasWinCFI = true;
1976 }
1977 }
1978
1979 if (HomPrologEpilog) {
1980 auto Prolog = MBBI;
1981 --Prolog;
1982 assert(Prolog->getOpcode() == AArch64::HOM_Prolog);
1983 Prolog->addOperand(MachineOperand::CreateImm(FPOffset));
1984 } else {
1985 // Issue sub fp, sp, FPOffset or
1986 // mov fp,sp when FPOffset is zero.
1987 // Note: All stores of callee-saved registers are marked as "FrameSetup".
1988 // This code marks the instruction(s) that set the FP also.
1989 emitFrameOffset(MBB, MBBI, DL, AArch64::FP, AArch64::SP,
1990 StackOffset::getFixed(FPOffset), TII,
1991 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
1992 if (NeedsWinCFI && HasWinCFI) {
1993 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1994            .setMIFlag(MachineInstr::FrameSetup);
1995        // After setting up the FP, the rest of the prolog doesn't need to be
1996 // included in the SEH unwind info.
1997 NeedsWinCFI = false;
1998 }
1999 }
2000 if (EmitAsyncCFI)
2001 emitDefineCFAWithFP(MF, MBB, MBBI, DL, FixedObject);
2002 }
2003
2004 // Now emit the moves for whatever callee saved regs we have (including FP,
2005  // LR if those are saved). Frame instructions for SVE registers are emitted
2006  // later, after the instructions which actually save the SVE regs.
2007 if (EmitAsyncCFI)
2008 emitCalleeSavedGPRLocations(MBB, MBBI);
2009
2010 // Alignment is required for the parent frame, not the funclet
2011 const bool NeedsRealignment =
2012 NumBytes && !IsFunclet && RegInfo->hasStackRealignment(MF);
2013 const int64_t RealignmentPadding =
2014 (NeedsRealignment && MFI.getMaxAlign() > Align(16))
2015 ? MFI.getMaxAlign().value() - 16
2016 : 0;
2017
2018 if (windowsRequiresStackProbe(MF, NumBytes + RealignmentPadding)) {
2019 uint64_t NumWords = (NumBytes + RealignmentPadding) >> 4;
2020 if (NeedsWinCFI) {
2021 HasWinCFI = true;
2022 // alloc_l can hold at most 256MB, so assume that NumBytes doesn't
2023 // exceed this amount. We need to move at most 2^24 - 1 into x15.
2024      // This is at most two instructions, MOVZ followed by MOVK.
2025 // TODO: Fix to use multiple stack alloc unwind codes for stacks
2026 // exceeding 256MB in size.
2027 if (NumBytes >= (1 << 28))
2028 report_fatal_error("Stack size cannot exceed 256MB for stack "
2029 "unwinding purposes");
2030
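      // Worked example (illustrative): for a 1 MiB frame, NumWords is 0x10000,
      // so the code below emits "movz x15, #0" and "movk x15, #1, lsl #16",
      // calls __chkstk to probe the pages, and finally subtracts x15 scaled
      // by 16 from SP.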
2031 uint32_t LowNumWords = NumWords & 0xFFFF;
2032 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVZXi), AArch64::X15)
2033 .addImm(LowNumWords)
2034          .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 0))
2035          .setMIFlag(MachineInstr::FrameSetup);
2036      BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2037          .setMIFlag(MachineInstr::FrameSetup);
2038      if ((NumWords & 0xFFFF0000) != 0) {
2039 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVKXi), AArch64::X15)
2040 .addReg(AArch64::X15)
2041 .addImm((NumWords & 0xFFFF0000) >> 16) // High half
2042            .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 16))
2043            .setMIFlag(MachineInstr::FrameSetup);
2044        BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2045            .setMIFlag(MachineInstr::FrameSetup);
2046      }
2047 } else {
2048 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm), AArch64::X15)
2049 .addImm(NumWords)
2050          .setMIFlags(MachineInstr::FrameSetup);
2051    }
2052
2053 const char *ChkStk = Subtarget.getChkStkName();
2054 switch (MF.getTarget().getCodeModel()) {
2055 case CodeModel::Tiny:
2056 case CodeModel::Small:
2057 case CodeModel::Medium:
2058 case CodeModel::Kernel:
2059 BuildMI(MBB, MBBI, DL, TII->get(AArch64::BL))
2060 .addExternalSymbol(ChkStk)
2061 .addReg(AArch64::X15, RegState::Implicit)
2066 if (NeedsWinCFI) {
2067 HasWinCFI = true;
2068 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2069            .setMIFlag(MachineInstr::FrameSetup);
2070      }
2071 break;
2072 case CodeModel::Large:
2073 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVaddrEXT))
2074 .addReg(AArch64::X16, RegState::Define)
2075 .addExternalSymbol(ChkStk)
2076 .addExternalSymbol(ChkStk)
2078 if (NeedsWinCFI) {
2079 HasWinCFI = true;
2080 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2081            .setMIFlag(MachineInstr::FrameSetup);
2082      }
2083
2084 BuildMI(MBB, MBBI, DL, TII->get(getBLRCallOpcode(MF)))
2085 .addReg(AArch64::X16, RegState::Kill)
2091 if (NeedsWinCFI) {
2092 HasWinCFI = true;
2093 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2094            .setMIFlag(MachineInstr::FrameSetup);
2095      }
2096 break;
2097 }
2098
2099 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SUBXrx64), AArch64::SP)
2100 .addReg(AArch64::SP, RegState::Kill)
2101 .addReg(AArch64::X15, RegState::Kill)
2104 if (NeedsWinCFI) {
2105 HasWinCFI = true;
2106 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_StackAlloc))
2107 .addImm(NumBytes)
2108          .setMIFlag(MachineInstr::FrameSetup);
2109    }
2110 NumBytes = 0;
2111
2112 if (RealignmentPadding > 0) {
2113 if (RealignmentPadding >= 4096) {
2114 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm))
2115 .addReg(AArch64::X16, RegState::Define)
2116 .addImm(RealignmentPadding)
2117            .setMIFlags(MachineInstr::FrameSetup);
2118        BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXrx64), AArch64::X15)
2119 .addReg(AArch64::SP)
2120 .addReg(AArch64::X16, RegState::Kill)
2121            .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
2122            .setMIFlag(MachineInstr::FrameSetup);
2123      } else {
2124 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXri), AArch64::X15)
2125 .addReg(AArch64::SP)
2126 .addImm(RealignmentPadding)
2127 .addImm(0)
2128            .setMIFlag(MachineInstr::FrameSetup);
2129      }
2130
2131 uint64_t AndMask = ~(MFI.getMaxAlign().value() - 1);
2132 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ANDXri), AArch64::SP)
2133 .addReg(AArch64::X15, RegState::Kill)
2134          .addImm(AArch64_AM::encodeLogicalImmediate(AndMask, 64));
2135      AFI->setStackRealigned(true);
2136
2137 // No need for SEH instructions here; if we're realigning the stack,
2138 // we've set a frame pointer and already finished the SEH prologue.
2139 assert(!NeedsWinCFI);
2140 }
2141 }
2142
2143 StackOffset SVECalleeSavesSize = {}, SVELocalsSize = SVEStackSize;
2144 MachineBasicBlock::iterator CalleeSavesBegin = MBBI, CalleeSavesEnd = MBBI;
2145
2146 // Process the SVE callee-saves to determine what space needs to be
2147 // allocated.
2148 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2149 LLVM_DEBUG(dbgs() << "SVECalleeSavedStackSize = " << CalleeSavedSize
2150 << "\n");
2151 // Find callee save instructions in frame.
2152 CalleeSavesBegin = MBBI;
2153 assert(IsSVECalleeSave(CalleeSavesBegin) && "Unexpected instruction");
2154    while (IsSVECalleeSave(MBBI) && MBBI != MBB.getFirstTerminator())
2155      ++MBBI;
2156 CalleeSavesEnd = MBBI;
2157
2158 SVECalleeSavesSize = StackOffset::getScalable(CalleeSavedSize);
2159 SVELocalsSize = SVEStackSize - SVECalleeSavesSize;
2160 }
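  // Illustrative example: spilling z8 and z9 gives a CalleeSavedSize of 32
  // scalable bytes (two slots of 16 scalable bytes each), so SVECalleeSavesSize
  // covers those slots and the remainder of SVEStackSize is the SVE locals.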
2161
2162 // Allocate space for the callee saves (if any).
2163 StackOffset CFAOffset =
2164 StackOffset::getFixed((int64_t)MFI.getStackSize() - NumBytes);
2165 StackOffset LocalsSize = SVELocalsSize + StackOffset::getFixed(NumBytes);
2166 allocateStackSpace(MBB, CalleeSavesBegin, 0, SVECalleeSavesSize, false,
2167 nullptr, EmitAsyncCFI && !HasFP, CFAOffset,
2168 MFI.hasVarSizedObjects() || LocalsSize);
2169 CFAOffset += SVECalleeSavesSize;
2170
2171 if (EmitAsyncCFI)
2172 emitCalleeSavedSVELocations(MBB, CalleeSavesEnd);
2173
2174 // Allocate space for the rest of the frame including SVE locals. Align the
2175 // stack as necessary.
2176 assert(!(canUseRedZone(MF) && NeedsRealignment) &&
2177 "Cannot use redzone with stack realignment");
2178 if (!canUseRedZone(MF)) {
2179 // FIXME: in the case of dynamic re-alignment, NumBytes doesn't have
2180 // the correct value here, as NumBytes also includes padding bytes,
2181 // which shouldn't be counted here.
2182 allocateStackSpace(MBB, CalleeSavesEnd, RealignmentPadding,
2183 SVELocalsSize + StackOffset::getFixed(NumBytes),
2184 NeedsWinCFI, &HasWinCFI, EmitAsyncCFI && !HasFP,
2185 CFAOffset, MFI.hasVarSizedObjects());
2186 }
2187
2188 // If we need a base pointer, set it up here. It's whatever the value of the
2189 // stack pointer is at this point. Any variable size objects will be allocated
2190 // after this, so we can still use the base pointer to reference locals.
2191 //
2192 // FIXME: Clarify FrameSetup flags here.
2193 // Note: Use emitFrameOffset() like above for FP if the FrameSetup flag is
2194 // needed.
2195 // For funclets the BP belongs to the containing function.
2196 if (!IsFunclet && RegInfo->hasBasePointer(MF)) {
2197 TII->copyPhysReg(MBB, MBBI, DL, RegInfo->getBaseRegister(), AArch64::SP,
2198 false);
2199 if (NeedsWinCFI) {
2200 HasWinCFI = true;
2201 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2202          .setMIFlag(MachineInstr::FrameSetup);
2203    }
2204 }
2205
2206 // The very last FrameSetup instruction indicates the end of prologue. Emit a
2207 // SEH opcode indicating the prologue end.
2208 if (NeedsWinCFI && HasWinCFI) {
2209 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
2210        .setMIFlag(MachineInstr::FrameSetup);
2211  }
2212
2213 // SEH funclets are passed the frame pointer in X1. If the parent
2214 // function uses the base register, then the base register is used
2215 // directly, and is not retrieved from X1.
2216 if (IsFunclet && F.hasPersonalityFn()) {
2217 EHPersonality Per = classifyEHPersonality(F.getPersonalityFn());
2218 if (isAsynchronousEHPersonality(Per)) {
2219 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::COPY), AArch64::FP)
2220 .addReg(AArch64::X1)
2221          .setMIFlag(MachineInstr::FrameSetup);
2222      MBB.addLiveIn(AArch64::X1);
2223 }
2224 }
2225
2226 if (EmitCFI && !EmitAsyncCFI) {
2227 if (HasFP) {
2228 emitDefineCFAWithFP(MF, MBB, MBBI, DL, FixedObject);
2229 } else {
2230 StackOffset TotalSize =
2231 SVEStackSize + StackOffset::getFixed((int64_t)MFI.getStackSize());
2232 unsigned CFIIndex = MF.addFrameInst(createDefCFA(
2233 *RegInfo, /*FrameReg=*/AArch64::SP, /*Reg=*/AArch64::SP, TotalSize,
2234 /*LastAdjustmentWasScalable=*/false));
2235 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2236 .addCFIIndex(CFIIndex)
2237          .setMIFlags(MachineInstr::FrameSetup);
2238    }
2239 emitCalleeSavedGPRLocations(MBB, MBBI);
2240 emitCalleeSavedSVELocations(MBB, MBBI);
2241 }
2242}
2243
2244static bool isFuncletReturnInstr(const MachineInstr &MI) {
2245  switch (MI.getOpcode()) {
2246 default:
2247 return false;
2248 case AArch64::CATCHRET:
2249 case AArch64::CLEANUPRET:
2250 return true;
2251 }
2252}
2253
2254void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
2255                                        MachineBasicBlock &MBB) const {
2256  MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
2257  MachineFrameInfo &MFI = MF.getFrameInfo();
2258  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2259  const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2260 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
2261 DebugLoc DL;
2262 bool NeedsWinCFI = needsWinCFI(MF);
2263 bool EmitCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
2264 bool HasWinCFI = false;
2265 bool IsFunclet = false;
2266
2267 if (MBB.end() != MBBI) {
2268 DL = MBBI->getDebugLoc();
2269 IsFunclet = isFuncletReturnInstr(*MBBI);
2270 }
2271
2272 MachineBasicBlock::iterator EpilogStartI = MBB.end();
2273
2274 auto FinishingTouches = make_scope_exit([&]() {
2275 if (AFI->shouldSignReturnAddress(MF)) {
2276 BuildMI(MBB, MBB.getFirstTerminator(), DL,
2277 TII->get(AArch64::PAUTH_EPILOGUE))
2278 .setMIFlag(MachineInstr::FrameDestroy);
2279 if (NeedsWinCFI)
2280 HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
2281 }
2282    if (AFI->needsShadowCallStackPrologueEpilogue(MF))
2283      emitShadowCallStackEpilogue(*TII, MF, MBB, MBB.getFirstTerminator(), DL);
2284    if (EmitCFI)
2285 emitCalleeSavedGPRRestores(MBB, MBB.getFirstTerminator());
2286 if (HasWinCFI) {
2287      BuildMI(MBB, MBB.getFirstTerminator(), DL,
2288              TII->get(AArch64::SEH_EpilogEnd))
2289          .setMIFlag(MachineInstr::FrameDestroy);
2290 if (!MF.hasWinCFI())
2291 MF.setHasWinCFI(true);
2292 }
2293 if (NeedsWinCFI) {
2294 assert(EpilogStartI != MBB.end());
2295 if (!HasWinCFI)
2296 MBB.erase(EpilogStartI);
2297 }
2298 });
2299
2300 int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
2301 : MFI.getStackSize();
2302
2303 // All calls are tail calls in GHC calling conv, and functions have no
2304 // prologue/epilogue.
2305  if (MF.getFunction().getCallingConv() == CallingConv::GHC)
2306    return;
2307
2308 // How much of the stack used by incoming arguments this function is expected
2309 // to restore in this particular epilogue.
2310 int64_t ArgumentStackToRestore = getArgumentStackToRestore(MF, MBB);
2311 bool IsWin64 =
2312 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv());
2313 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
2314
2315 int64_t AfterCSRPopSize = ArgumentStackToRestore;
2316 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
2317 // We cannot rely on the local stack size set in emitPrologue if the function
2318 // has funclets, as funclets have different local stack size requirements, and
2319 // the current value set in emitPrologue may be that of the containing
2320 // function.
2321 if (MF.hasEHFunclets())
2322 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
2323 if (homogeneousPrologEpilog(MF, &MBB)) {
2324 assert(!NeedsWinCFI);
2325 auto LastPopI = MBB.getFirstTerminator();
2326 if (LastPopI != MBB.begin()) {
2327 auto HomogeneousEpilog = std::prev(LastPopI);
2328 if (HomogeneousEpilog->getOpcode() == AArch64::HOM_Epilog)
2329 LastPopI = HomogeneousEpilog;
2330 }
2331
2332 // Adjust local stack
2333 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2334                    StackOffset::getFixed(AFI->getLocalStackSize()), TII,
2335                    MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2336
2337    // SP has already been adjusted while restoring callee save regs.
2338    // We have already bailed out of the case that adjusts SP for arguments.
2339 assert(AfterCSRPopSize == 0);
2340 return;
2341 }
2342 bool CombineSPBump = shouldCombineCSRLocalStackBumpInEpilogue(MBB, NumBytes);
2343 // Assume we can't combine the last pop with the sp restore.
2344
2345 bool CombineAfterCSRBump = false;
2346 if (!CombineSPBump && PrologueSaveSize != 0) {
2347    MachineBasicBlock::iterator Pop = std::prev(MBB.getFirstTerminator());
2348    while (Pop->getOpcode() == TargetOpcode::CFI_INSTRUCTION ||
2349           AArch64InstrInfo::isSEHInstruction(*Pop))
2350      Pop = std::prev(Pop);
2351 // Converting the last ldp to a post-index ldp is valid only if the last
2352 // ldp's offset is 0.
2353 const MachineOperand &OffsetOp = Pop->getOperand(Pop->getNumOperands() - 1);
2354 // If the offset is 0 and the AfterCSR pop is not actually trying to
2355 // allocate more stack for arguments (in space that an untimely interrupt
2356 // may clobber), convert it to a post-index ldp.
2357 if (OffsetOp.getImm() == 0 && AfterCSRPopSize >= 0) {
2358      convertCalleeSaveRestoreToSPPrePostIncDec(
2359          MBB, Pop, DL, TII, PrologueSaveSize, NeedsWinCFI, &HasWinCFI, EmitCFI,
2360 MachineInstr::FrameDestroy, PrologueSaveSize);
2361 } else {
2362 // If not, make sure to emit an add after the last ldp.
2363      // We're doing this by transferring the size to be restored from the
2364 // adjustment *before* the CSR pops to the adjustment *after* the CSR
2365 // pops.
2366 AfterCSRPopSize += PrologueSaveSize;
2367 CombineAfterCSRBump = true;
2368 }
2369 }
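  // Illustrative example: "ldp x29, x30, [sp]" followed by "add sp, sp, #16"
  // becomes the single post-indexed "ldp x29, x30, [sp], #16" when the final
  // restore is at offset 0.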
2370
2371 // Move past the restores of the callee-saved registers.
2372 // If we plan on combining the sp bump of the local stack size and the callee
2373 // save stack size, we might need to adjust the CSR save and restore offsets.
2374  MachineBasicBlock::iterator LastPopI = MBB.getFirstTerminator();
2375  MachineBasicBlock::iterator Begin = MBB.begin();
2376  while (LastPopI != Begin) {
2377 --LastPopI;
2378 if (!LastPopI->getFlag(MachineInstr::FrameDestroy) ||
2379 IsSVECalleeSave(LastPopI)) {
2380 ++LastPopI;
2381 break;
2382 } else if (CombineSPBump)
2383      fixupCalleeSaveRestoreStackOffset(*LastPopI, AFI->getLocalStackSize(),
2384                                        NeedsWinCFI, &HasWinCFI);
2385 }
2386
2387 if (NeedsWinCFI) {
2388 // Note that there are cases where we insert SEH opcodes in the
2389 // epilogue when we had no SEH opcodes in the prologue. For
2390 // example, when there is no stack frame but there are stack
2391    // arguments. Insert the SEH_EpilogStart and remove it later if
2392    // we didn't emit any SEH opcodes, to avoid generating WinCFI for
2393 // functions that don't need it.
2394 BuildMI(MBB, LastPopI, DL, TII->get(AArch64::SEH_EpilogStart))
2395        .setMIFlag(MachineInstr::FrameDestroy);
2396    EpilogStartI = LastPopI;
2397 --EpilogStartI;
2398 }
2399
2400 if (hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2401    switch (MF.getTarget().Options.SwiftAsyncFramePointer) {
2402    case SwiftAsyncFramePointerMode::DeploymentBased:
2403      // Avoid the reload as it is GOT relative, and instead fall back to the
2404 // hardcoded value below. This allows a mismatch between the OS and
2405 // application without immediately terminating on the difference.
2406 [[fallthrough]];
2407    case SwiftAsyncFramePointerMode::Always:
2408      // We need to reset FP to its untagged state on return. Bit 60 is
2409 // currently used to show the presence of an extended frame.
2410
2411 // BIC x29, x29, #0x1000_0000_0000_0000
2412 BuildMI(MBB, MBB.getFirstTerminator(), DL, TII->get(AArch64::ANDXri),
2413 AArch64::FP)
2414 .addUse(AArch64::FP)
2415 .addImm(0x10fe)
2416          .setMIFlag(MachineInstr::FrameDestroy);
2417      if (NeedsWinCFI) {
2418 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2419            .setMIFlag(MachineInstr::FrameDestroy);
2420        HasWinCFI = true;
2421 }
2422 break;
2423
2424    case SwiftAsyncFramePointerMode::Never:
2425      break;
2426 }
2427 }
2428
2429 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2430
2431 // If there is a single SP update, insert it before the ret and we're done.
2432 if (CombineSPBump) {
2433 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
2434
2435 // When we are about to restore the CSRs, the CFA register is SP again.
2436 if (EmitCFI && hasFP(MF)) {
2437 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2438 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2439 unsigned CFIIndex =
2440 MF.addFrameInst(MCCFIInstruction::cfiDefCfa(nullptr, Reg, NumBytes));
2441 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2442 .addCFIIndex(CFIIndex)
2443          .setMIFlags(MachineInstr::FrameDestroy);
2444    }
2445
2446 emitFrameOffset(MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2447 StackOffset::getFixed(NumBytes + (int64_t)AfterCSRPopSize),
2448 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI,
2449 &HasWinCFI, EmitCFI, StackOffset::getFixed(NumBytes));
2450 return;
2451 }
2452
2453 NumBytes -= PrologueSaveSize;
2454 assert(NumBytes >= 0 && "Negative stack allocation size!?");
2455
2456 // Process the SVE callee-saves to determine what space needs to be
2457 // deallocated.
2458 StackOffset DeallocateBefore = {}, DeallocateAfter = SVEStackSize;
2459 MachineBasicBlock::iterator RestoreBegin = LastPopI, RestoreEnd = LastPopI;
2460 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2461 RestoreBegin = std::prev(RestoreEnd);
2462 while (RestoreBegin != MBB.begin() &&
2463 IsSVECalleeSave(std::prev(RestoreBegin)))
2464 --RestoreBegin;
2465
2466 assert(IsSVECalleeSave(RestoreBegin) &&
2467 IsSVECalleeSave(std::prev(RestoreEnd)) && "Unexpected instruction");
2468
2469 StackOffset CalleeSavedSizeAsOffset =
2470 StackOffset::getScalable(CalleeSavedSize);
2471 DeallocateBefore = SVEStackSize - CalleeSavedSizeAsOffset;
2472 DeallocateAfter = CalleeSavedSizeAsOffset;
2473 }
2474
2475 // Deallocate the SVE area.
2476 if (SVEStackSize) {
2477 // If we have stack realignment or variable sized objects on the stack,
2478 // restore the stack pointer from the frame pointer prior to SVE CSR
2479 // restoration.
2480 if (AFI->isStackRealigned() || MFI.hasVarSizedObjects()) {
2481 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2482 // Set SP to start of SVE callee-save area from which they can
2483        // be reloaded. The code below will deallocate the stack space
2484        // by moving FP -> SP.
2485 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::FP,
2486 StackOffset::getScalable(-CalleeSavedSize), TII,
2487                        MachineInstr::FrameDestroy);
2488      }
2489 } else {
2490 if (AFI->getSVECalleeSavedStackSize()) {
2491 // Deallocate the non-SVE locals first before we can deallocate (and
2492 // restore callee saves) from the SVE area.
2493        emitFrameOffset(
2494            MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2495            StackOffset::getFixed(NumBytes), TII, MachineInstr::FrameDestroy,
2496            false, false, nullptr, EmitCFI && !hasFP(MF),
2497 SVEStackSize + StackOffset::getFixed(NumBytes + PrologueSaveSize));
2498 NumBytes = 0;
2499 }
2500
2501 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2502 DeallocateBefore, TII, MachineInstr::FrameDestroy, false,
2503 false, nullptr, EmitCFI && !hasFP(MF),
2504 SVEStackSize +
2505 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2506
2507 emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
2508 DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
2509 false, nullptr, EmitCFI && !hasFP(MF),
2510 DeallocateAfter +
2511 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2512 }
2513 if (EmitCFI)
2514 emitCalleeSavedSVERestores(MBB, RestoreEnd);
2515 }
2516
2517 if (!hasFP(MF)) {
2518 bool RedZone = canUseRedZone(MF);
2519 // If this was a redzone leaf function, we don't need to restore the
2520 // stack pointer (but we may need to pop stack args for fastcc).
2521 if (RedZone && AfterCSRPopSize == 0)
2522 return;
2523
2524 // Pop the local variables off the stack. If there are no callee-saved
2525 // registers, it means we are actually positioned at the terminator and can
2526 // combine stack increment for the locals and the stack increment for
2527 // callee-popped arguments into (possibly) a single instruction and be done.
2528 bool NoCalleeSaveRestore = PrologueSaveSize == 0;
2529 int64_t StackRestoreBytes = RedZone ? 0 : NumBytes;
2530 if (NoCalleeSaveRestore)
2531 StackRestoreBytes += AfterCSRPopSize;
2532
2533    emitFrameOffset(
2534        MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2535 StackOffset::getFixed(StackRestoreBytes), TII,
2536 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2537 StackOffset::getFixed((RedZone ? 0 : NumBytes) + PrologueSaveSize));
2538
2539 // If we were able to combine the local stack pop with the argument pop,
2540 // then we're done.
2541 if (NoCalleeSaveRestore || AfterCSRPopSize == 0) {
2542 return;
2543 }
2544
2545 NumBytes = 0;
2546 }
2547
2548 // Restore the original stack pointer.
2549 // FIXME: Rather than doing the math here, we should instead just use
2550 // non-post-indexed loads for the restores if we aren't actually going to
2551 // be able to save any instructions.
2552 if (!IsFunclet && (MFI.hasVarSizedObjects() || AFI->isStackRealigned())) {
2553    emitFrameOffset(
2554        MBB, LastPopI, DL, AArch64::SP, AArch64::FP,
2555        StackOffset::getFixed(-AFI->getCalleeSaveBaseToFrameRecordOffset()),
2556        TII, MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2557 } else if (NumBytes)
2558 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2559 StackOffset::getFixed(NumBytes), TII,
2560 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2561
2562 // When we are about to restore the CSRs, the CFA register is SP again.
2563 if (EmitCFI && hasFP(MF)) {
2564 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2565 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2566 unsigned CFIIndex = MF.addFrameInst(
2567 MCCFIInstruction::cfiDefCfa(nullptr, Reg, PrologueSaveSize));
2568 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2569 .addCFIIndex(CFIIndex)
2570        .setMIFlags(MachineInstr::FrameDestroy);
2571  }
2572
2573 // This must be placed after the callee-save restore code because that code
2574 // assumes the SP is at the same location as it was after the callee-save save
2575 // code in the prologue.
2576 if (AfterCSRPopSize) {
2577 assert(AfterCSRPopSize > 0 && "attempting to reallocate arg stack that an "
2578 "interrupt may have clobbered");
2579
2580    emitFrameOffset(
2581        MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2582        StackOffset::getFixed(AfterCSRPopSize), TII, MachineInstr::FrameDestroy,
2583        false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2584 StackOffset::getFixed(CombineAfterCSRBump ? PrologueSaveSize : 0));
2585 }
2586}
2587
2588bool AArch64FrameLowering::enableCFIFixup(MachineFunction &MF) const {
2589  return TargetFrameLowering::enableCFIFixup(MF) &&
2590         MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
2591}
2592
2593/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
2594/// debug info. It's the same as what we use for resolving the code-gen
2595/// references for now. FIXME: This can go wrong when references are
2596/// SP-relative and simple call frames aren't used.
2597StackOffset
2598AArch64FrameLowering::getFrameIndexReference(const MachineFunction &MF, int FI,
2599                                             Register &FrameReg) const {
2600  return resolveFrameIndexReference(
2601      MF, FI, FrameReg,
2602 /*PreferFP=*/
2603 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
2604 MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
2605 /*ForSimm=*/false);
2606}
2607
2608StackOffset
2609AArch64FrameLowering::getNonLocalFrameIndexReference(const MachineFunction &MF,
2610                                                     int FI) const {
2611  return StackOffset::getFixed(getSEHFrameIndexOffset(MF, FI));
2612}
2613
2614static StackOffset getFPOffset(const MachineFunction &MF,
2615                               int64_t ObjectOffset) {
2616 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2617 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2618 bool IsWin64 =
2619 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv());
2620 unsigned FixedObject =
2621 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
2622 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
2623 int64_t FPAdjust =
2624 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
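  // Illustrative example: with a 32-byte GPR callee-save area, a frame-record
  // offset of 16 and no fixed object area, FPAdjust is 16, so an incoming
  // stack argument at ObjectOffset 0 resolves to [fp, #16].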
2625 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
2626}
2627
2628static StackOffset getStackOffset(const MachineFunction &MF,
2629                                  int64_t ObjectOffset) {
2630 const auto &MFI = MF.getFrameInfo();
2631 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
2632}
2633
2634// TODO: This function currently does not work for scalable vectors.
2635int AArch64FrameLowering::getSEHFrameIndexOffset(const MachineFunction &MF,
2636                                                 int FI) const {
2637  const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2638      MF.getSubtarget().getRegisterInfo());
2639  int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
2640 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
2641 ? getFPOffset(MF, ObjectOffset).getFixed()
2642 : getStackOffset(MF, ObjectOffset).getFixed();
2643}
2644
2645StackOffset AArch64FrameLowering::resolveFrameIndexReference(
2646    const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
2647 bool ForSimm) const {
2648 const auto &MFI = MF.getFrameInfo();
2649 int64_t ObjectOffset = MFI.getObjectOffset(FI);
2650 bool isFixed = MFI.isFixedObjectIndex(FI);
2651 bool isSVE = MFI.getStackID(FI) == TargetStackID::ScalableVector;
2652 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, isSVE, FrameReg,
2653 PreferFP, ForSimm);
2654}
2655
2656StackOffset AArch64FrameLowering::resolveFrameOffsetReference(
2657    const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE,
2658 Register &FrameReg, bool PreferFP, bool ForSimm) const {
2659 const auto &MFI = MF.getFrameInfo();
2660 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2661      MF.getSubtarget().getRegisterInfo());
2662  const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2663 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2664
2665 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
2666 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
2667 bool isCSR =
2668 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
2669
2670 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2671
2672 // Use frame pointer to reference fixed objects. Use it for locals if
2673 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
2674 // reliable as a base). Make sure useFPForScavengingIndex() does the
2675 // right thing for the emergency spill slot.
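  // Illustration: with a VLA in the function, SP-relative offsets to the
  // fixed-size locals are unknown at compile time, so such objects are
  // addressed from FP (or from the base pointer x19 when the FP-relative
  // offset would not fit the signed immediate range).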
2676 bool UseFP = false;
2677 if (AFI->hasStackFrame() && !isSVE) {
2678 // We shouldn't prefer using the FP to access fixed-sized stack objects when
2679 // there are scalable (SVE) objects in between the FP and the fixed-sized
2680 // objects.
2681 PreferFP &= !SVEStackSize;
2682
2683 // Note: Keeping the following as multiple 'if' statements rather than
2684 // merging to a single expression for readability.
2685 //
2686 // Argument access should always use the FP.
2687 if (isFixed) {
2688 UseFP = hasFP(MF);
2689 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
2690 // References to the CSR area must use FP if we're re-aligning the stack
2691 // since the dynamically-sized alignment padding is between the SP/BP and
2692 // the CSR area.
2693 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
2694 UseFP = true;
2695 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
2696 // If the FPOffset is negative and we're producing a signed immediate, we
2697 // have to keep in mind that the available offset range for negative
2698 // offsets is smaller than for positive ones. If an offset is available
2699 // via the FP and the SP, use whichever is closest.
2700 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
2701 PreferFP |= Offset > -FPOffset && !SVEStackSize;
2702
2703 if (MFI.hasVarSizedObjects()) {
2704 // If we have variable sized objects, we can use either FP or BP, as the
2705 // SP offset is unknown. We can use the base pointer if we have one and
2706 // FP is not preferred. If not, we're stuck with using FP.
2707 bool CanUseBP = RegInfo->hasBasePointer(MF);
2708 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
2709 UseFP = PreferFP;
2710 else if (!CanUseBP) // Can't use BP. Forced to use FP.
2711 UseFP = true;
2712 // else we can use BP and FP, but the offset from FP won't fit.
2713 // That will make us scavenge registers which we can probably avoid by
2714 // using BP. If it won't fit for BP either, we'll scavenge anyway.
2715 } else if (FPOffset >= 0) {
2716 // Use SP or FP, whichever gives us the best chance of the offset
2717 // being in range for direct access. If the FPOffset is positive,
2718 // that'll always be best, as the SP will be even further away.
2719 UseFP = true;
2720 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
2721 // Funclets access the locals contained in the parent's stack frame
2722 // via the frame pointer, so we have to use the FP in the parent
2723 // function.
2724 (void) Subtarget;
2725 assert(
2726 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv()) &&
2727 "Funclets should only be present on Win64");
2728 UseFP = true;
2729 } else {
2730 // We have the choice between FP and (SP or BP).
2731 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
2732 UseFP = true;
2733 }
2734 }
2735 }
2736
2737 assert(
2738 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
2739 "In the presence of dynamic stack pointer realignment, "
2740 "non-argument/CSR objects cannot be accessed through the frame pointer");
2741
2742 if (isSVE) {
2743 StackOffset FPOffset =
2745 StackOffset SPOffset =
2746 SVEStackSize +
2747 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
2748 ObjectOffset);
2749 // Always use the FP for SVE spills if available and beneficial.
2750 if (hasFP(MF) && (SPOffset.getFixed() ||
2751 FPOffset.getScalable() < SPOffset.getScalable() ||
2752 RegInfo->hasStackRealignment(MF))) {
2753 FrameReg = RegInfo->getFrameRegister(MF);
2754 return FPOffset;
2755 }
2756
2757 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
2758 : (unsigned)AArch64::SP;
2759 return SPOffset;
2760 }
2761
2762 StackOffset ScalableOffset = {};
2763 if (UseFP && !(isFixed || isCSR))
2764 ScalableOffset = -SVEStackSize;
2765 if (!UseFP && (isFixed || isCSR))
2766 ScalableOffset = SVEStackSize;
2767
2768 if (UseFP) {
2769 FrameReg = RegInfo->getFrameRegister(MF);
2770 return StackOffset::getFixed(FPOffset) + ScalableOffset;
2771 }
2772
2773 // Use the base pointer if we have one.
2774 if (RegInfo->hasBasePointer(MF))
2775 FrameReg = RegInfo->getBaseRegister();
2776 else {
2777 assert(!MFI.hasVarSizedObjects() &&
2778 "Can't use SP when we have var sized objects.");
2779 FrameReg = AArch64::SP;
2780 // If we're using the red zone for this function, the SP won't actually
2781 // be adjusted, so the offsets will be negative. They're also all
2782 // within range of the signed 9-bit immediate instructions.
2783 if (canUseRedZone(MF))
2784 Offset -= AFI->getLocalStackSize();
2785 }
2786
2787 return StackOffset::getFixed(Offset) + ScalableOffset;
2788}
2789
2790static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
2791 // Do not set a kill flag on values that are also marked as live-in. This
2792// happens with the @llvm.returnaddress intrinsic and with arguments passed in
2793 // callee saved registers.
2794 // Omitting the kill flags is conservatively correct even if the live-in
2795 // is not used after all.
2796 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
2797 return getKillRegState(!IsLiveIn);
2798}
2799
2800static bool produceCompactUnwindFrame(MachineFunction &MF) {
2801  const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2802  AttributeList Attrs = MF.getFunction().getAttributes();
2803  return Subtarget.isTargetMachO() &&
2804         !(Subtarget.getTargetLowering()->supportSwiftError() &&
2805           Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
2806         MF.getFunction().getCallingConv() != CallingConv::SwiftTail;
2807}
2808
2809static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2,
2810 bool NeedsWinCFI, bool IsFirst,
2811 const TargetRegisterInfo *TRI) {
2812 // If we are generating register pairs for a Windows function that requires
2813 // EH support, then pair consecutive registers only. There are no unwind
2814// opcodes for saves/restores of non-consecutive register pairs.
2815// The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
2816 // save_lrpair.
2817 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
2818
2819 if (Reg2 == AArch64::FP)
2820 return true;
2821 if (!NeedsWinCFI)
2822 return false;
2823 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
2824 return false;
2825 // If pairing a GPR with LR, the pair can be described by the save_lrpair
2826 // opcode. If this is the first register pair, it would end up with a
2827 // predecrement, but there's no save_lrpair_x opcode, so we can only do this
2828 // if LR is paired with something else than the first register.
2829 // The save_lrpair opcode requires the first register to be an odd one.
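  // For example (illustrative): (x21, lr) as a non-first pair can use
  // save_lrpair, whereas (x20, lr), or (x19, lr) as the first pair, cannot,
  // so those pairings are rejected.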
2830 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
2831 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
2832 return false;
2833 return true;
2834}
2835
2836/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
2837/// WindowsCFI requires that only consecutive registers can be paired.
2838/// LR and FP need to be allocated together when the frame needs to save
2839/// the frame-record. This means any other register pairing with LR is invalid.
2840static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2,
2841 bool UsesWinAAPCS, bool NeedsWinCFI,
2842 bool NeedsFrameRecord, bool IsFirst,
2843 const TargetRegisterInfo *TRI) {
2844 if (UsesWinAAPCS)
2845 return invalidateWindowsRegisterPairing(Reg1, Reg2, NeedsWinCFI, IsFirst,
2846 TRI);
2847
2848 // If we need to store the frame record, don't pair any register
2849 // with LR other than FP.
2850 if (NeedsFrameRecord)
2851 return Reg2 == AArch64::LR;
2852
2853 return false;
2854}
2855
2856namespace {
2857
2858struct RegPairInfo {
2859 unsigned Reg1 = AArch64::NoRegister;
2860 unsigned Reg2 = AArch64::NoRegister;
2861 int FrameIdx;
2862 int Offset;
2863 enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
2864
2865 RegPairInfo() = default;
2866
2867 bool isPaired() const { return Reg2 != AArch64::NoRegister; }
2868
2869 unsigned getScale() const {
2870 switch (Type) {
2871 case PPR:
2872 return 2;
2873 case GPR:
2874 case FPR64:
2875 case VG:
2876 return 8;
2877 case ZPR:
2878 case FPR128:
2879 return 16;
2880 }
2881 llvm_unreachable("Unsupported type");
2882 }
2883
2884 bool isScalable() const { return Type == PPR || Type == ZPR; }
2885};
2886
2887} // end anonymous namespace
2888
2889unsigned findFreePredicateReg(BitVector &SavedRegs) {
2890 for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
2891 if (SavedRegs.test(PReg)) {
2892 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
2893 return PNReg;
2894 }
2895 }
2896 return AArch64::NoRegister;
2897}
2898
2899static void computeCalleeSaveRegisterPairs(
2900    MachineFunction &MF, ArrayRef<CalleeSavedInfo> CSI,
2901    const TargetRegisterInfo *TRI, SmallVectorImpl<RegPairInfo> &RegPairs,
2902    bool NeedsFrameRecord) {
2903
2904 if (CSI.empty())
2905 return;
2906
2907 bool IsWindows = isTargetWindows(MF);
2908 bool NeedsWinCFI = needsWinCFI(MF);
2909  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2910  MachineFrameInfo &MFI = MF.getFrameInfo();
2911  CallingConv::ID CC = MF.getFunction().getCallingConv();
2912  unsigned Count = CSI.size();
2913 (void)CC;
2914 // MachO's compact unwind format relies on all registers being stored in
2915 // pairs.
2916  assert((!produceCompactUnwindFrame(MF) || CC == CallingConv::PreserveMost ||
2917          CC == CallingConv::PreserveAll || CC == CallingConv::CXX_FAST_TLS ||
2918          CC == CallingConv::Win64 || (Count & 1) == 0) &&
2919 "Odd number of callee-saved regs to spill!");
2920 int ByteOffset = AFI->getCalleeSavedStackSize();
2921 int StackFillDir = -1;
2922 int RegInc = 1;
2923 unsigned FirstReg = 0;
2924 if (NeedsWinCFI) {
2925 // For WinCFI, fill the stack from the bottom up.
2926 ByteOffset = 0;
2927 StackFillDir = 1;
2928 // As the CSI array is reversed to match PrologEpilogInserter, iterate
2929 // backwards, to pair up registers starting from lower numbered registers.
2930 RegInc = -1;
2931 FirstReg = Count - 1;
2932 }
2933 int ScalableByteOffset = AFI->getSVECalleeSavedStackSize();
2934 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
2935 Register LastReg = 0;
2936
2937 // When iterating backwards, the loop condition relies on unsigned wraparound.
2938 for (unsigned i = FirstReg; i < Count; i += RegInc) {
2939 RegPairInfo RPI;
2940 RPI.Reg1 = CSI[i].getReg();
2941
2942 if (AArch64::GPR64RegClass.contains(RPI.Reg1))
2943 RPI.Type = RegPairInfo::GPR;
2944 else if (AArch64::FPR64RegClass.contains(RPI.Reg1))
2945 RPI.Type = RegPairInfo::FPR64;
2946 else if (AArch64::FPR128RegClass.contains(RPI.Reg1))
2947 RPI.Type = RegPairInfo::FPR128;
2948 else if (AArch64::ZPRRegClass.contains(RPI.Reg1))
2949 RPI.Type = RegPairInfo::ZPR;
2950 else if (AArch64::PPRRegClass.contains(RPI.Reg1))
2951 RPI.Type = RegPairInfo::PPR;
2952 else if (RPI.Reg1 == AArch64::VG)
2953 RPI.Type = RegPairInfo::VG;
2954 else
2955 llvm_unreachable("Unsupported register class.");
2956
2957 // Add the stack hazard size as we transition from GPR->FPR CSRs.
2958 if (AFI->hasStackHazardSlotIndex() &&
2959 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
2960        AArch64InstrInfo::isFpOrNEON(RPI.Reg1))
2961      ByteOffset += StackFillDir * StackHazardSize;
2962 LastReg = RPI.Reg1;
2963
2964 // Add the next reg to the pair if it is in the same register class.
2965 if (unsigned(i + RegInc) < Count && !AFI->hasStackHazardSlotIndex()) {
2966 Register NextReg = CSI[i + RegInc].getReg();
2967 bool IsFirst = i == FirstReg;
2968 switch (RPI.Type) {
2969 case RegPairInfo::GPR:
2970 if (AArch64::GPR64RegClass.contains(NextReg) &&
2971 !invalidateRegisterPairing(RPI.Reg1, NextReg, IsWindows,
2972 NeedsWinCFI, NeedsFrameRecord, IsFirst,
2973 TRI))
2974 RPI.Reg2 = NextReg;
2975 break;
2976 case RegPairInfo::FPR64:
2977 if (AArch64::FPR64RegClass.contains(NextReg) &&
2978 !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
2979 IsFirst, TRI))
2980 RPI.Reg2 = NextReg;
2981 break;
2982 case RegPairInfo::FPR128:
2983 if (AArch64::FPR128RegClass.contains(NextReg))
2984 RPI.Reg2 = NextReg;
2985 break;
2986 case RegPairInfo::PPR:
2987 break;
2988 case RegPairInfo::ZPR:
2989 if (AFI->getPredicateRegForFillSpill() != 0)
2990 if (((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1))
2991 RPI.Reg2 = NextReg;
2992 break;
2993 case RegPairInfo::VG:
2994 break;
2995 }
2996 }
2997
2998 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
2999 // list to come in sorted by frame index so that we can issue the store
3000 // pair instructions directly. Assert if we see anything otherwise.
3001 //
3002 // The order of the registers in the list is controlled by
3003 // getCalleeSavedRegs(), so they will always be in-order, as well.
3004 assert((!RPI.isPaired() ||
3005 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
3006 "Out of order callee saved regs!");
3007
3008 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
3009 RPI.Reg1 == AArch64::LR) &&
3010 "FrameRecord must be allocated together with LR");
3011
3012 // Windows AAPCS has FP and LR reversed.
3013 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
3014 RPI.Reg2 == AArch64::LR) &&
3015 "FrameRecord must be allocated together with LR");
3016
3017 // MachO's compact unwind format relies on all registers being stored in
3018 // adjacent register pairs.
3019    assert((!produceCompactUnwindFrame(MF) || CC == CallingConv::PreserveMost ||
3020            CC == CallingConv::PreserveAll || CC == CallingConv::CXX_FAST_TLS ||
3021            CC == CallingConv::Win64 ||
3022            (RPI.isPaired() &&
3023 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
3024 RPI.Reg1 + 1 == RPI.Reg2))) &&
3025 "Callee-save registers not saved as adjacent register pair!");
3026
3027 RPI.FrameIdx = CSI[i].getFrameIdx();
3028 if (NeedsWinCFI &&
3029 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
3030 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
3031 int Scale = RPI.getScale();
3032
3033 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
3034 assert(OffsetPre % Scale == 0);
3035
3036 if (RPI.isScalable())
3037 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
3038 else
3039 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
3040
3041 // Swift's async context is directly before FP, so allocate an extra
3042 // 8 bytes for it.
3043 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
3044 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
3045 (IsWindows && RPI.Reg2 == AArch64::LR)))
3046 ByteOffset += StackFillDir * 8;
3047
3048 // Round up size of non-pair to pair size if we need to pad the
3049 // callee-save area to ensure 16-byte alignment.
3050 if (NeedGapToAlignStack && !NeedsWinCFI && !RPI.isScalable() &&
3051 RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
3052 ByteOffset % 16 != 0) {
3053 ByteOffset += 8 * StackFillDir;
3054 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
3055 // A stack frame with a gap looks like this, bottom up:
3056 // d9, d8. x21, gap, x20, x19.
3057 // Set extra alignment on the x21 object to create the gap above it.
3058 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
3059 NeedGapToAlignStack = false;
3060 }
3061
3062 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
3063 assert(OffsetPost % Scale == 0);
3064 // If filling top down (default), we want the offset after incrementing it.
3065 // If filling bottom up (WinCFI) we need the original offset.
3066 int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
3067
3068 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
3069 // Swift context can directly precede FP.
3070 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
3071 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
3072 (IsWindows && RPI.Reg2 == AArch64::LR)))
3073 Offset += 8;
3074 RPI.Offset = Offset / Scale;
3075
3076 assert((!RPI.isPaired() ||
3077 (!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
3078 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
3079 "Offset out of bounds for LDP/STP immediate");
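    // Illustrative example: a GPR pair at ByteOffset 16 has Scale 8, so
    // RPI.Offset becomes 2 and the spill code below emits
    // "stp x20, x19, [sp, #16]".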
3080
3081 // Save the offset to frame record so that the FP register can point to the
3082 // innermost frame record (spilled FP and LR registers).
3083 if (NeedsFrameRecord &&
3084 ((!IsWindows && RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
3085 (IsWindows && RPI.Reg1 == AArch64::FP && RPI.Reg2 == AArch64::LR)))
3086      AFI->setCalleeSaveBaseToFrameRecordOffset(Offset);
3087
3088 RegPairs.push_back(RPI);
3089 if (RPI.isPaired())
3090 i += RegInc;
3091 }
3092 if (NeedsWinCFI) {
3093 // If we need an alignment gap in the stack, align the topmost stack
3094 // object. A stack frame with a gap looks like this, bottom up:
3095 // x19, d8. d9, gap.
3096 // Set extra alignment on the topmost stack object (the first element in
3097 // CSI, which goes top down), to create the gap above it.
3098 if (AFI->hasCalleeSaveStackFreeSpace())
3099 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
3100 // We iterated bottom up over the registers; flip RegPairs back to top
3101 // down order.
3102 std::reverse(RegPairs.begin(), RegPairs.end());
3103 }
3104}
3105
3106bool AArch64FrameLowering::spillCalleeSavedRegisters(
3107    MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
3108    ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
3109  MachineFunction &MF = *MBB.getParent();
3110  const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
3111  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3112  bool NeedsWinCFI = needsWinCFI(MF);
3113  DebugLoc DL;
3114  SmallVector<RegPairInfo, 8> RegPairs;
3115
3116 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
3117
3118  MachineRegisterInfo &MRI = MF.getRegInfo();
3119  // Refresh the reserved regs in case there are any potential changes since the
3120 // last freeze.
3121 MRI.freezeReservedRegs();
3122
3123 if (homogeneousPrologEpilog(MF)) {
3124 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
3125                   .setMIFlag(MachineInstr::FrameSetup);
3126
3127 for (auto &RPI : RegPairs) {
3128 MIB.addReg(RPI.Reg1);
3129 MIB.addReg(RPI.Reg2);
3130
3131 // Update register live in.
3132 if (!MRI.isReserved(RPI.Reg1))
3133 MBB.addLiveIn(RPI.Reg1);
3134 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
3135 MBB.addLiveIn(RPI.Reg2);
3136 }
3137 return true;
3138 }
3139 bool PTrueCreated = false;
3140 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
3141 unsigned Reg1 = RPI.Reg1;
3142 unsigned Reg2 = RPI.Reg2;
3143 unsigned StrOpc;
3144
3145 // Issue sequence of spills for cs regs. The first spill may be converted
3146 // to a pre-decrement store later by emitPrologue if the callee-save stack
3147 // area allocation can't be combined with the local stack area allocation.
3148 // For example:
3149 // stp x22, x21, [sp, #0] // addImm(+0)
3150 // stp x20, x19, [sp, #16] // addImm(+2)
3151 // stp fp, lr, [sp, #32] // addImm(+4)
3152 // Rationale: This sequence saves uop updates compared to a sequence of
3153 // pre-increment spills like stp xi,xj,[sp,#-16]!
3154 // Note: Similar rationale and sequence for restores in epilog.
3155 unsigned Size;
3156 Align Alignment;
3157 switch (RPI.Type) {
3158 case RegPairInfo::GPR:
3159 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
3160 Size = 8;
3161 Alignment = Align(8);
3162 break;
3163 case RegPairInfo::FPR64:
3164 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
3165 Size = 8;
3166 Alignment = Align(8);
3167 break;
3168 case RegPairInfo::FPR128:
3169 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
3170 Size = 16;
3171 Alignment = Align(16);
3172 break;
3173 case RegPairInfo::ZPR:
3174 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
3175 Size = 16;
3176 Alignment = Align(16);
3177 break;
3178 case RegPairInfo::PPR:
3179 StrOpc = AArch64::STR_PXI;
3180 Size = 2;
3181 Alignment = Align(2);
3182 break;
3183 case RegPairInfo::VG:
3184 StrOpc = AArch64::STRXui;
3185 Size = 8;
3186 Alignment = Align(8);
3187 break;
3188 }
3189
3190 unsigned X0Scratch = AArch64::NoRegister;
3191 if (Reg1 == AArch64::VG) {
3192 // Find an available register to store value of VG to.
3193      Reg1 = findScratchNonCalleeSaveRegister(&MBB);
3194      assert(Reg1 != AArch64::NoRegister);
3195 SMEAttrs Attrs(MF.getFunction());
3196
3197 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface() &&
3198 AFI->getStreamingVGIdx() == std::numeric_limits<int>::max()) {
3199 // For locally-streaming functions, we need to store both the streaming
3200 // & non-streaming VG. Spill the streaming value first.
3201 BuildMI(MBB, MI, DL, TII.get(AArch64::RDSVLI_XI), Reg1)
3202 .addImm(1)
3203            .setMIFlag(MachineInstr::FrameSetup);
3204        BuildMI(MBB, MI, DL, TII.get(AArch64::UBFMXri), Reg1)
3205 .addReg(Reg1)
3206 .addImm(3)
3207 .addImm(63)
3208            .setMIFlag(MachineInstr::FrameSetup);
3209
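        // Illustrative note: RDSVL #1 reads the streaming vector length in
        // bytes; the UBFM above is an LSR #3, converting bytes to the number
        // of 64-bit granules, i.e. the streaming VG value.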
3210 AFI->setStreamingVGIdx(RPI.FrameIdx);
3211 } else if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
3212 BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
3213 .addImm(31)
3214 .addImm(1)
3215            .setMIFlag(MachineInstr::FrameSetup);
3216        AFI->setVGIdx(RPI.FrameIdx);
3217 } else {
3218        const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
3219        if (llvm::any_of(
3220 MBB.liveins(),
3221 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
3222 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
3223 AArch64::X0, LiveIn.PhysReg);
3224 }))
3225 X0Scratch = Reg1;
3226
3227 if (X0Scratch != AArch64::NoRegister)
3228 BuildMI(MBB, MI, DL, TII.get(AArch64::ORRXrr), Reg1)
3229 .addReg(AArch64::XZR)
3230 .addReg(AArch64::X0, RegState::Undef)
3231 .addReg(AArch64::X0, RegState::Implicit)
3232              .setMIFlag(MachineInstr::FrameSetup);
3233
3234 const uint32_t *RegMask = TRI->getCallPreservedMask(
3235 MF,
3236            CallingConv::AArch64_SME_ABI_Support_Routines_PreserveMost_From_X1);
3237        BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
3238 .addExternalSymbol("__arm_get_current_vg")
3239 .addRegMask(RegMask)
3240 .addReg(AArch64::X0, RegState::ImplicitDefine)
3241            .setMIFlag(MachineInstr::FrameSetup);
3242        Reg1 = AArch64::X0;
3243 AFI->setVGIdx(RPI.FrameIdx);
3244 }
3245 }
3246
3247 LLVM_DEBUG(dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
3248 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
3249 dbgs() << ") -> fi#(" << RPI.FrameIdx;
3250 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
3251 dbgs() << ")\n");
3252
3253 assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
3254           "Windows unwinding requires a consecutive (FP,LR) pair");
3255 // Windows unwind codes require consecutive registers if registers are
3256 // paired. Make the switch here, so that the code below will save (x,x+1)
3257 // and not (x+1,x).
3258 unsigned FrameIdxReg1 = RPI.FrameIdx;
3259 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
3260 if (NeedsWinCFI && RPI.isPaired()) {
3261 std::swap(Reg1, Reg2);
3262 std::swap(FrameIdxReg1, FrameIdxReg2);
3263 }
3264
3265 if (RPI.isPaired() && RPI.isScalable()) {
3266 [[maybe_unused]] const AArch64Subtarget &Subtarget =
3269 unsigned PnReg = AFI->getPredicateRegForFillSpill();
3270 assert(((Subtarget.hasSVE2p1() || Subtarget.hasSME2()) && PnReg != 0) &&
3271 "Expects SVE2.1 or SME2 target and a predicate register");
3272#ifdef EXPENSIVE_CHECKS
3273 auto IsPPR = [](const RegPairInfo &c) {
3274        return c.Type == RegPairInfo::PPR;
3275 };
3276 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
3277 auto IsZPR = [](const RegPairInfo &c) {
3278 return c.Type == RegPairInfo::ZPR;
3279 };
3280 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
3281 assert(!(PPRBegin < ZPRBegin) &&
3282 "Expected callee save predicate to be handled first");
3283#endif
3284 if (!PTrueCreated) {
3285 PTrueCreated = true;
3286 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
3287            .setMIFlags(MachineInstr::FrameSetup);
3288      }
3289 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
3290 if (!MRI.isReserved(Reg1))
3291 MBB.addLiveIn(Reg1);
3292 if (!MRI.isReserved(Reg2))
3293 MBB.addLiveIn(Reg2);
3294 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
3295      MIB.addMemOperand(MF.getMachineMemOperand(
3296          MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3297 MachineMemOperand::MOStore, Size, Alignment));
3298 MIB.addReg(PnReg);
3299 MIB.addReg(AArch64::SP)
3300 .addImm(RPI.Offset) // [sp, #offset*scale],
3301 // where factor*scale is implicit
3302          .setMIFlag(MachineInstr::FrameSetup);
3303      MIB.addMemOperand(MF.getMachineMemOperand(
3304          MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3305 MachineMemOperand::MOStore, Size, Alignment));
3306      if (NeedsWinCFI)
3307        InsertSEH(MIB, TII, MachineInstr::FrameSetup);
3308 } else { // The code when the pair of ZReg is not present
3309 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
3310 if (!MRI.isReserved(Reg1))
3311 MBB.addLiveIn(Reg1);
3312 if (RPI.isPaired()) {
3313 if (!MRI.isReserved(Reg2))
3314 MBB.addLiveIn(Reg2);
3315 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
3317 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3318 MachineMemOperand::MOStore, Size, Alignment));
3319 }
3320 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
3321 .addReg(AArch64::SP)
3322 .addImm(RPI.Offset) // [sp, #offset*scale],
3323 // where the scale factor is implicit
3326 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3327 MachineMemOperand::MOStore, Size, Alignment));
3328 if (NeedsWinCFI)
3330 }
3331 // Update the StackIDs of the SVE stack slots.
3332 MachineFrameInfo &MFI = MF.getFrameInfo();
3333 if (RPI.Type == RegPairInfo::ZPR || RPI.Type == RegPairInfo::PPR) {
3334 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
3335 if (RPI.isPaired())
3336 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
3337 }
3338
3339 if (X0Scratch != AArch64::NoRegister)
3340 BuildMI(MBB, MI, DL, TII.get(AArch64::ORRXrr), AArch64::X0)
3341 .addReg(AArch64::XZR)
3342 .addReg(X0Scratch, RegState::Undef)
3343 .addReg(X0Scratch, RegState::Implicit)
3345 }
3346 return true;
3347}
3348
3352 MachineFunction &MF = *MBB.getParent();
3354 DebugLoc DL;
3356 bool NeedsWinCFI = needsWinCFI(MF);
3357
3358 if (MBBI != MBB.end())
3359 DL = MBBI->getDebugLoc();
3360
3361 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
3362 if (homogeneousPrologEpilog(MF, &MBB)) {
3363 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
3365 for (auto &RPI : RegPairs) {
3366 MIB.addReg(RPI.Reg1, RegState::Define);
3367 MIB.addReg(RPI.Reg2, RegState::Define);
3368 }
3369 return true;
3370 }
3371
3372 // For performance reasons, restore the SVE registers in increasing order
3373 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
3374 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
3375 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
3376 std::reverse(PPRBegin, PPREnd);
3377 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
3378 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
3379 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
3380 std::reverse(ZPRBegin, ZPREnd);
3381
3382 bool PTrueCreated = false;
3383 for (const RegPairInfo &RPI : RegPairs) {
3384 unsigned Reg1 = RPI.Reg1;
3385 unsigned Reg2 = RPI.Reg2;
3386
3387 // Issue sequence of restores for cs regs. The last restore may be converted
3388 // to a post-increment load later by emitEpilogue if the callee-save stack
3389 // area allocation can't be combined with the local stack area allocation.
3390 // For example:
3391 // ldp fp, lr, [sp, #32] // addImm(+4)
3392 // ldp x20, x19, [sp, #16] // addImm(+2)
3393 // ldp x22, x21, [sp, #0] // addImm(+0)
3394 // Note: see comment in spillCalleeSavedRegisters()
3395 unsigned LdrOpc;
3396 unsigned Size;
3397 Align Alignment;
3398 switch (RPI.Type) {
3399 case RegPairInfo::GPR:
3400 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
3401 Size = 8;
3402 Alignment = Align(8);
3403 break;
3404 case RegPairInfo::FPR64:
3405 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
3406 Size = 8;
3407 Alignment = Align(8);
3408 break;
3409 case RegPairInfo::FPR128:
3410 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
3411 Size = 16;
3412 Alignment = Align(16);
3413 break;
3414 case RegPairInfo::ZPR:
3415 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
3416 Size = 16;
3417 Alignment = Align(16);
3418 break;
3419 case RegPairInfo::PPR:
3420 LdrOpc = AArch64::LDR_PXI;
3421 Size = 2;
3422 Alignment = Align(2);
3423 break;
3424 case RegPairInfo::VG:
3425 continue;
3426 }
3427 LLVM_DEBUG(dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
3428 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
3429 dbgs() << ") -> fi#(" << RPI.FrameIdx;
3430 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
3431 dbgs() << ")\n");
3432
3433 // Windows unwind codes require consecutive registers if registers are
3434 // paired. Make the switch here, so that the code below will restore (x,x+1)
3435 // and not (x+1,x).
3436 unsigned FrameIdxReg1 = RPI.FrameIdx;
3437 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
3438 if (NeedsWinCFI && RPI.isPaired()) {
3439 std::swap(Reg1, Reg2);
3440 std::swap(FrameIdxReg1, FrameIdxReg2);
3441 }
3442
3444 if (RPI.isPaired() && RPI.isScalable()) {
3445 [[maybe_unused]] const AArch64Subtarget &Subtarget =
3447 unsigned PnReg = AFI->getPredicateRegForFillSpill();
3448 assert(((Subtarget.hasSVE2p1() || Subtarget.hasSME2()) && PnReg != 0) &&
3449 "Expects SVE2.1 or SME2 target and a predicate register");
3450#ifdef EXPENSIVE_CHECKS
3451 assert(!(PPRBegin < ZPRBegin) &&
3452 "Expected callee save predicate to be handled first");
3453#endif
3454 if (!PTrueCreated) {
3455 PTrueCreated = true;
3456 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
3458 }
3459 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
3460 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
3461 getDefRegState(true));
3463 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3464 MachineMemOperand::MOLoad, Size, Alignment));
3465 MIB.addReg(PnReg);
3466 MIB.addReg(AArch64::SP)
3467 .addImm(RPI.Offset) // [sp, #offset*scale]
3468 // where the scale factor is implicit
3471 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3472 MachineMemOperand::MOLoad, Size, Alignment));
3473 if (NeedsWinCFI)
3475 } else {
3476 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
3477 if (RPI.isPaired()) {
3478 MIB.addReg(Reg2, getDefRegState(true));
3480 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3481 MachineMemOperand::MOLoad, Size, Alignment));
3482 }
3483 MIB.addReg(Reg1, getDefRegState(true));
3484 MIB.addReg(AArch64::SP)
3485 .addImm(RPI.Offset) // [sp, #offset*scale]
3486 // where the scale factor is implicit
3489 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3490 MachineMemOperand::MOLoad, Size, Alignment));
3491 if (NeedsWinCFI)
3493 }
3494 }
3495 return true;
3496}
3497
3498// Return the FrameID for a Load/Store instruction by looking at the MMO.
3499static std::optional<int> getLdStFrameID(const MachineInstr &MI,
3500 const MachineFrameInfo &MFI) {
3501 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
3502 return std::nullopt;
3503
3504 MachineMemOperand *MMO = *MI.memoperands_begin();
3505 auto *PSV =
3506 dyn_cast_or_null<FixedStackPseudoSourceValue>(MMO->getPseudoValue());
3507 if (PSV)
3508 return std::optional<int>(PSV->getFrameIndex());
3509
3510 if (MMO->getValue()) {
3511 if (auto *Al = dyn_cast<AllocaInst>(getUnderlyingObject(MMO->getValue()))) {
3512 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd();
3513 FI++)
3514 if (MFI.getObjectAllocation(FI) == Al)
3515 return FI;
3516 }
3517 }
3518
3519 return std::nullopt;
3520}
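// Illustrative example (hypothetical MIR, not taken from any real function):
// for a spill such as
//   STRXui killed $x19, %stack.2, 0 :: (store (s64) into %stack.2)
// the fixed-stack pseudo source value in the memory operand carries frame
// index 2, so getLdStFrameID returns 2; a load or store whose memory operand
// does not refer to a stack slot yields std::nullopt.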
3521
3522// Check if a Hazard slot is needed for the current function, and if so create
3523// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
3524// which can be used to determine if any hazard padding is needed.
3525void AArch64FrameLowering::determineStackHazardSlot(
3526 MachineFunction &MF, BitVector &SavedRegs) const {
3527 if (StackHazardSize == 0 || StackHazardSize % 16 != 0 ||
3529 return;
3530
3531 // Stack hazards are only needed in streaming functions.
3533 if (!StackHazardInNonStreaming && Attrs.hasNonStreamingInterfaceAndBody())
3534 return;
3535
3536 MachineFrameInfo &MFI = MF.getFrameInfo();
3537
3538 // Add a hazard slot if there are any CSR FPR registers, or if there are
3539 // any FPR-only stack objects.
3540 bool HasFPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
3541 return AArch64::FPR64RegClass.contains(Reg) ||
3542 AArch64::FPR128RegClass.contains(Reg) ||
3543 AArch64::ZPRRegClass.contains(Reg) ||
3544 AArch64::PPRRegClass.contains(Reg);
3545 });
3546 bool HasFPRStackObjects = false;
3547 if (!HasFPRCSRs) {
3548 std::vector<unsigned> FrameObjects(MFI.getObjectIndexEnd());
3549 for (auto &MBB : MF) {
3550 for (auto &MI : MBB) {
3551 std::optional<int> FI = getLdStFrameID(MI, MFI);
3552 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
3553 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
3555 FrameObjects[*FI] |= 2;
3556 else
3557 FrameObjects[*FI] |= 1;
3558 }
3559 }
3560 }
3561 HasFPRStackObjects =
3562 any_of(FrameObjects, [](unsigned B) { return (B & 3) == 2; });
3563 }
3564
3565 if (HasFPRCSRs || HasFPRStackObjects) {
3566 int ID = MFI.CreateStackObject(StackHazardSize, Align(16), false);
3567 LLVM_DEBUG(dbgs() << "Created Hazard slot at " << ID << " size "
3568 << StackHazardSize << "\n");
3569 MF.getInfo<AArch64FunctionInfo>()->setStackHazardSlotIndex(ID);
3570 }
3571}
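// Worked example (assumed values, for illustration only): with
// StackHazardSize = 1024 and a function whose callee saves include $d8 (an
// FPR64 CSR), HasFPRCSRs is true, so a 1024-byte, 16-byte-aligned stack
// object is created and recorded as the hazard slot, separating the GPR and
// FPR parts of the frame.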
3572
3574 BitVector &SavedRegs,
3575 RegScavenger *RS) const {
3576 // All calls are tail calls in GHC calling conv, and functions have no
3577 // prologue/epilogue.
3579 return;
3580
3582 const AArch64RegisterInfo *RegInfo = static_cast<const AArch64RegisterInfo *>(
3584 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
3586 unsigned UnspilledCSGPR = AArch64::NoRegister;
3587 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
3588
3589 MachineFrameInfo &MFI = MF.getFrameInfo();
3590 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
3591
3592 unsigned BasePointerReg = RegInfo->hasBasePointer(MF)
3593 ? RegInfo->getBaseRegister()
3594 : (unsigned)AArch64::NoRegister;
3595
3596 unsigned ExtraCSSpill = 0;
3597 bool HasUnpairedGPR64 = false;
3598 bool HasPairZReg = false;
3599 // Figure out which callee-saved registers to save/restore.
3600 for (unsigned i = 0; CSRegs[i]; ++i) {
3601 const unsigned Reg = CSRegs[i];
3602
3603 // Add the base pointer register to SavedRegs if it is callee-save.
3604 if (Reg == BasePointerReg)
3605 SavedRegs.set(Reg);
3606
3607 bool RegUsed = SavedRegs.test(Reg);
3608 unsigned PairedReg = AArch64::NoRegister;
3609 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
3610 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
3611 AArch64::FPR128RegClass.contains(Reg)) {
3612 // Compensate for an odd number of GP CSRs.
3613 // For now, all known cases of an odd number of CSRs involve GPRs.
3614 if (HasUnpairedGPR64)
3615 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
3616 else
3617 PairedReg = CSRegs[i ^ 1];
3618 }
3619
3620 // If the function requires saving all of the GP registers (SavedRegs),
3621 // and there is an odd number of GP CSRs at the same time (CSRegs),
3622 // PairedReg could be in a different register class from Reg, which would
3623 // lead to an FPR (usually D8) accidentally being marked as saved.
3624 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
3625 PairedReg = AArch64::NoRegister;
3626 HasUnpairedGPR64 = true;
3627 }
3628 assert(PairedReg == AArch64::NoRegister ||
3629 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
3630 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
3631 AArch64::FPR128RegClass.contains(Reg, PairedReg));
3632
3633 if (!RegUsed) {
3634 if (AArch64::GPR64RegClass.contains(Reg) &&
3635 !RegInfo->isReservedReg(MF, Reg)) {
3636 UnspilledCSGPR = Reg;
3637 UnspilledCSGPRPaired = PairedReg;
3638 }
3639 continue;
3640 }
3641
3642 // MachO's compact unwind format relies on all registers being stored in
3643 // pairs.
3644 // FIXME: the usual format is actually better if unwinding isn't needed.
3645 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
3646 !SavedRegs.test(PairedReg)) {
3647 SavedRegs.set(PairedReg);
3648 if (AArch64::GPR64RegClass.contains(PairedReg) &&
3649 !RegInfo->isReservedReg(MF, PairedReg))
3650 ExtraCSSpill = PairedReg;
3651 }
3652 // Check if there is a pair of ZRegs, so that a predicate register (PReg) can be selected for spill/fill
3653 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
3654 SavedRegs.test(CSRegs[i ^ 1]));
3655 }
3656
3657 if (HasPairZReg && (Subtarget.hasSVE2p1() || Subtarget.hasSME2())) {
3659 // Find a suitable predicate register for the multi-vector spill/fill
3660 // instructions.
3661 unsigned PnReg = findFreePredicateReg(SavedRegs);
3662 if (PnReg != AArch64::NoRegister)
3663 AFI->setPredicateRegForFillSpill(PnReg);
3664 // If no free callee-save register has been found, assign one.
3665 if (!AFI->getPredicateRegForFillSpill() &&
3666 MF.getFunction().getCallingConv() ==
3668 SavedRegs.set(AArch64::P8);
3669 AFI->setPredicateRegForFillSpill(AArch64::PN8);
3670 }
3671
3672 assert(!RegInfo->isReservedReg(MF, AFI->getPredicateRegForFillSpill()) &&
3673 "Predicate cannot be a reserved register");
3674 }
3675
3677 !Subtarget.isTargetWindows()) {
3678 // For the Windows calling convention on a non-Windows OS, where X18 is
3679 // treated as reserved, back up X18 when entering non-Windows code (marked
3680 // with the Windows calling convention) and restore it when returning,
3681 // regardless of whether the individual function uses it - it might call
3682 // other functions that clobber it.
3683 SavedRegs.set(AArch64::X18);
3684 }
3685
3686 // Calculate the callee-saved stack size.
3687 unsigned CSStackSize = 0;
3688 unsigned SVECSStackSize = 0;
3690 const MachineRegisterInfo &MRI = MF.getRegInfo();
3691 for (unsigned Reg : SavedRegs.set_bits()) {
3692 auto RegSize = TRI->getRegSizeInBits(Reg, MRI) / 8;
3693 if (AArch64::PPRRegClass.contains(Reg) ||
3694 AArch64::ZPRRegClass.contains(Reg))
3695 SVECSStackSize += RegSize;
3696 else
3697 CSStackSize += RegSize;
3698 }
3699
3700 // Increase the callee-saved stack size if the function has streaming mode
3701 // changes, as we will need to spill the value of the VG register.
3702 // For locally streaming functions, we spill both the streaming and
3703 // non-streaming VG value.
3704 const Function &F = MF.getFunction();
3705 SMEAttrs Attrs(F);
3706 if (AFI->hasStreamingModeChanges()) {
3707 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface())
3708 CSStackSize += 16;
3709 else
3710 CSStackSize += 8;
3711 }
3712
3713 // Determine if a Hazard slot should be used, and increase the CSStackSize by
3714 // StackHazardSize if so.
3715 determineStackHazardSlot(MF, SavedRegs);
3716 if (AFI->hasStackHazardSlotIndex())
3717 CSStackSize += StackHazardSize;
3718
3719 // Save number of saved regs, so we can easily update CSStackSize later.
3720 unsigned NumSavedRegs = SavedRegs.count();
3721
3722 // The frame record needs to be created by saving the appropriate registers
3723 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
3724 if (hasFP(MF) ||
3725 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
3726 SavedRegs.set(AArch64::FP);
3727 SavedRegs.set(AArch64::LR);
3728 }
3729
3730 LLVM_DEBUG({
3731 dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
3732 for (unsigned Reg : SavedRegs.set_bits())
3733 dbgs() << ' ' << printReg(Reg, RegInfo);
3734 dbgs() << "\n";
3735 });
3736
3737 // If any callee-saved registers are used, the frame cannot be eliminated.
3738 int64_t SVEStackSize =
3739 alignTo(SVECSStackSize + estimateSVEStackObjectOffsets(MFI), 16);
3740 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
3741
3742 // The CSR spill slots have not been allocated yet, so estimateStackSize
3743 // won't include them.
3744 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
3745
3746 // We may address some of the stack above the canonical frame address, either
3747 // for our own arguments or during a call. Include that in calculating whether
3748 // we have complicated addressing concerns.
3749 int64_t CalleeStackUsed = 0;
3750 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
3751 int64_t FixedOff = MFI.getObjectOffset(I);
3752 if (FixedOff > CalleeStackUsed)
3753 CalleeStackUsed = FixedOff;
3754 }
3755
3756 // Conservatively always assume BigStack when there are SVE spills.
3757 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
3758 CalleeStackUsed) > EstimatedStackSizeLimit;
3759 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
3760 AFI->setHasStackFrame(true);
3761
3762 // Estimate if we might need to scavenge a register at some point in order
3763 // to materialize a stack offset. If so, either spill one additional
3764 // callee-saved register or reserve a special spill slot to facilitate
3765 // register scavenging. If we already spilled an extra callee-saved register
3766 // above to keep the number of spills even, we don't need to do anything else
3767 // here.
3768 if (BigStack) {
3769 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
3770 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
3771 << " to get a scratch register.\n");
3772 SavedRegs.set(UnspilledCSGPR);
3773 ExtraCSSpill = UnspilledCSGPR;
3774
3775 // MachO's compact unwind format relies on all registers being stored in
3776 // pairs, so if we need to spill one extra for BigStack, then we need to
3777 // store the pair.
3778 if (producePairRegisters(MF)) {
3779 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
3780 // Failed to make a pair for compact unwind format, revert spilling.
3781 if (produceCompactUnwindFrame(MF)) {
3782 SavedRegs.reset(UnspilledCSGPR);
3783 ExtraCSSpill = AArch64::NoRegister;
3784 }
3785 } else
3786 SavedRegs.set(UnspilledCSGPRPaired);
3787 }
3788 }
3789
3790 // If we didn't find an extra callee-saved register to spill, create
3791 // an emergency spill slot.
3792 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
3794 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
3795 unsigned Size = TRI->getSpillSize(RC);
3796 Align Alignment = TRI->getSpillAlign(RC);
3797 int FI = MFI.CreateStackObject(Size, Alignment, false);
3799 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
3800 << " as the emergency spill slot.\n");
3801 }
3802 }
3803
3804 // Add the size of any additional 64-bit GPR saves.
3805 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
3806
3807 // A Swift asynchronous context extends the frame record with a pointer
3808 // directly before FP.
3809 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
3810 CSStackSize += 8;
3811
3812 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
3813 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
3814 << EstimatedStackSize + AlignedCSStackSize << " bytes.\n");
3815
3817 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
3818 "Should not invalidate callee saved info");
3819
3820 // Round up to register pair alignment to avoid additional SP adjustment
3821 // instructions.
3822 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
3823 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
3824 AFI->setSVECalleeSavedStackSize(alignTo(SVECSStackSize, 16));
3825}
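// Worked example (hypothetical register set): saving {x19, x20, x21, fp, lr}
// gives CSStackSize = 5 * 8 = 40 bytes, and alignTo(40, 16) rounds this up to
// AlignedCSStackSize = 48. Since 48 != 40, CalleeSaveStackHasFreeSpace is set
// and the spare 8 bytes may later be reused as a scavenging spill slot (see
// the stack-slot-scavenging hook further down in this file).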
3826
3828 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
3829 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
3830 unsigned &MaxCSFrameIndex) const {
3831 bool NeedsWinCFI = needsWinCFI(MF);
3832 // To match the canonical Windows frame layout, reverse the list of
3833 // callee saved registers to get them laid out by PrologEpilogInserter
3834 // in the right order. (PrologEpilogInserter allocates stack objects top
3835 // down. Windows canonical prologs store higher numbered registers at
3836 // the top, thus have the CSI array start from the highest registers.)
3837 if (NeedsWinCFI)
3838 std::reverse(CSI.begin(), CSI.end());
3839
3840 if (CSI.empty())
3841 return true; // Early exit if no callee saved registers are modified!
3842
3843 // Now that we know which registers need to be saved and restored, allocate
3844 // stack slots for them.
3845 MachineFrameInfo &MFI = MF.getFrameInfo();
3846 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3847
3848 bool UsesWinAAPCS = isTargetWindows(MF);
3849 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
3850 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
3851 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3852 if ((unsigned)FrameIdx < MinCSFrameIndex)
3853 MinCSFrameIndex = FrameIdx;
3854 if ((unsigned)FrameIdx > MaxCSFrameIndex)
3855 MaxCSFrameIndex = FrameIdx;
3856 }
3857
3858 // Insert VG into the list of CSRs, immediately before LR if saved.
3859 if (AFI->hasStreamingModeChanges()) {
3860 std::vector<CalleeSavedInfo> VGSaves;
3861 SMEAttrs Attrs(MF.getFunction());
3862
3863 auto VGInfo = CalleeSavedInfo(AArch64::VG);
3864 VGInfo.setRestored(false);
3865 VGSaves.push_back(VGInfo);
3866
3867 // Add VG again if the function is locally-streaming, as we will spill two
3868 // values.
3869 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface())
3870 VGSaves.push_back(VGInfo);
3871
3872 bool InsertBeforeLR = false;
3873
3874 for (unsigned I = 0; I < CSI.size(); I++)
3875 if (CSI[I].getReg() == AArch64::LR) {
3876 InsertBeforeLR = true;
3877 CSI.insert(CSI.begin() + I, VGSaves.begin(), VGSaves.end());
3878 break;
3879 }
3880
3881 if (!InsertBeforeLR)
3882 CSI.insert(CSI.end(), VGSaves.begin(), VGSaves.end());
3883 }
3884
3885 Register LastReg = 0;
3886 int HazardSlotIndex = std::numeric_limits<int>::max();
3887 for (auto &CS : CSI) {
3888 Register Reg = CS.getReg();
3889 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
3890
3891 // Create a hazard slot as we switch between GPR and FPR CSRs.
3892 if (AFI->hasStackHazardSlotIndex() &&
3893 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
3895 assert(HazardSlotIndex == std::numeric_limits<int>::max() &&
3896 "Unexpected register order for hazard slot");
3897 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
3898 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
3899 << "\n");
3900 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
3901 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
3902 MinCSFrameIndex = HazardSlotIndex;
3903 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
3904 MaxCSFrameIndex = HazardSlotIndex;
3905 }
3906
3907 unsigned Size = RegInfo->getSpillSize(*RC);
3908 Align Alignment(RegInfo->getSpillAlign(*RC));
3909 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
3910 CS.setFrameIdx(FrameIdx);
3911
3912 if ((unsigned)FrameIdx < MinCSFrameIndex)
3913 MinCSFrameIndex = FrameIdx;
3914 if ((unsigned)FrameIdx > MaxCSFrameIndex)
3915 MaxCSFrameIndex = FrameIdx;
3916
3917 // Grab 8 bytes below FP for the extended asynchronous frame info.
3918 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
3919 Reg == AArch64::FP) {
3920 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
3921 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3922 if ((unsigned)FrameIdx < MinCSFrameIndex)
3923 MinCSFrameIndex = FrameIdx;
3924 if ((unsigned)FrameIdx > MaxCSFrameIndex)
3925 MaxCSFrameIndex = FrameIdx;
3926 }
3927 LastReg = Reg;
3928 }
3929
3930 // Add hazard slot in the case where no FPR CSRs are present.
3931 if (AFI->hasStackHazardSlotIndex() &&
3932 HazardSlotIndex == std::numeric_limits<int>::max()) {
3933 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
3934 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
3935 << "\n");
3936 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
3937 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
3938 MinCSFrameIndex = HazardSlotIndex;
3939 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
3940 MaxCSFrameIndex = HazardSlotIndex;
3941 }
3942
3943 return true;
3944}
3945
3947 const MachineFunction &MF) const {
3949 // If the function has streaming-mode changes, don't scavenge a
3950 // spill slot in the callee-save area, as that might require an
3951 // 'addvl' in the streaming-mode-changing call sequence when the
3952 // function doesn't use an FP.
3953 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
3954 return false;
3955 // Don't allow register scavenging with hazard slots, in case it moves
3956 // objects into the wrong place.
3957 if (AFI->hasStackHazardSlotIndex())
3958 return false;
3959 return AFI->hasCalleeSaveStackFreeSpace();
3960}
3961
3962/// returns true if there are any SVE callee saves.
3964 int &Min, int &Max) {
3965 Min = std::numeric_limits<int>::max();
3966 Max = std::numeric_limits<int>::min();
3967
3968 if (!MFI.isCalleeSavedInfoValid())
3969 return false;
3970
3971 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
3972 for (auto &CS : CSI) {
3973 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
3974 AArch64::PPRRegClass.contains(CS.getReg())) {
3975 assert((Max == std::numeric_limits<int>::min() ||
3976 Max + 1 == CS.getFrameIdx()) &&
3977 "SVE CalleeSaves are not consecutive");
3978
3979 Min = std::min(Min, CS.getFrameIdx());
3980 Max = std::max(Max, CS.getFrameIdx());
3981 }
3982 }
3983 return Min != std::numeric_limits<int>::max();
3984}
3985
3986// Process all the SVE stack objects and determine offsets for each
3987// object. If AssignOffsets is true, the offsets get assigned.
3988// Fills in the first and last callee-saved frame indices into
3989// Min/MaxCSFrameIndex, respectively.
3990// Returns the size of the stack.
3992 int &MinCSFrameIndex,
3993 int &MaxCSFrameIndex,
3994 bool AssignOffsets) {
3995#ifndef NDEBUG
3996 // First process all fixed stack objects.
3997 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
3999 "SVE vectors should never be passed on the stack by value, only by "
4000 "reference.");
4001#endif
4002
4003 auto Assign = [&MFI](int FI, int64_t Offset) {
4004 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
4005 MFI.setObjectOffset(FI, Offset);
4006 };
4007
4008 int64_t Offset = 0;
4009
4010 // Then process all callee saved slots.
4011 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
4012 // Assign offsets to the callee save slots.
4013 for (int I = MinCSFrameIndex; I <= MaxCSFrameIndex; ++I) {
4014 Offset += MFI.getObjectSize(I);
4016 if (AssignOffsets)
4017 Assign(I, -Offset);
4018 }
4019 }
4020
4021 // Ensure that the callee-save area is aligned to 16 bytes.
4022 Offset = alignTo(Offset, Align(16U));
4023
4024 // Create a buffer of SVE objects to allocate and sort it.
4025 SmallVector<int, 8> ObjectsToAllocate;
4026 // If we have a stack protector, and we've previously decided that we have SVE
4027 // objects on the stack and thus need it to go in the SVE stack area, then it
4028 // needs to go first.
4029 int StackProtectorFI = -1;
4030 if (MFI.hasStackProtectorIndex()) {
4031 StackProtectorFI = MFI.getStackProtectorIndex();
4032 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
4033 ObjectsToAllocate.push_back(StackProtectorFI);
4034 }
4035 for (int I = 0, E = MFI.getObjectIndexEnd(); I != E; ++I) {
4036 unsigned StackID = MFI.getStackID(I);
4037 if (StackID != TargetStackID::ScalableVector)
4038 continue;
4039 if (I == StackProtectorFI)
4040 continue;
4041 if (MaxCSFrameIndex >= I && I >= MinCSFrameIndex)
4042 continue;
4043 if (MFI.isDeadObjectIndex(I))
4044 continue;
4045
4046 ObjectsToAllocate.push_back(I);
4047 }
4048
4049 // Allocate all SVE locals and spills
4050 for (unsigned FI : ObjectsToAllocate) {
4051 Align Alignment = MFI.getObjectAlign(FI);
4052 // FIXME: Given that the length of SVE vectors is not necessarily a power of
4053 // two, we'd need to align every object dynamically at runtime if the
4054 // alignment is larger than 16. This is not yet supported.
4055 if (Alignment > Align(16))
4057 "Alignment of scalable vectors > 16 bytes is not yet supported");
4058
4059 Offset = alignTo(Offset + MFI.getObjectSize(FI), Alignment);
4060 if (AssignOffsets)
4061 Assign(FI, -Offset);
4062 }
4063
4064 return Offset;
4065}
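// Worked example (hypothetical frame): a single Z-register callee save of 16
// scalable bytes is assigned offset -16; the callee-save area is then aligned
// to 16. A following 16-byte SVE local with 16-byte alignment is assigned
// offset -32, and the function returns 32 (scalable bytes) as the SVE stack
// size.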
4066
4067int64_t AArch64FrameLowering::estimateSVEStackObjectOffsets(
4068 MachineFrameInfo &MFI) const {
4069 int MinCSFrameIndex, MaxCSFrameIndex;
4070 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex, false);
4071}
4072
4073int64_t AArch64FrameLowering::assignSVEStackObjectOffsets(
4074 MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex) const {
4075 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex,
4076 true);
4077}
4078
4080 MachineFunction &MF, RegScavenger *RS) const {
4081 MachineFrameInfo &MFI = MF.getFrameInfo();
4082
4084 "Upwards growing stack unsupported");
4085
4086 int MinCSFrameIndex, MaxCSFrameIndex;
4087 int64_t SVEStackSize =
4088 assignSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex);
4089
4091 AFI->setStackSizeSVE(alignTo(SVEStackSize, 16U));
4092 AFI->setMinMaxSVECSFrameIndex(MinCSFrameIndex, MaxCSFrameIndex);
4093
4094 // If this function isn't doing Win64-style C++ EH, we don't need to do
4095 // anything.
4096 if (!MF.hasEHFunclets())
4097 return;
4099 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
4100
4101 MachineBasicBlock &MBB = MF.front();
4102 auto MBBI = MBB.begin();
4103 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
4104 ++MBBI;
4105
4106 // Create an UnwindHelp object.
4107 // The UnwindHelp object is allocated at the start of the fixed object area
4108 int64_t FixedObject =
4109 getFixedObjectSize(MF, AFI, /*IsWin64*/ true, /*IsFunclet*/ false);
4110 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8,
4111 /*SPOffset*/ -FixedObject,
4112 /*IsImmutable=*/false);
4113 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
4114
4115 // We need to store -2 into the UnwindHelp object at the start of the
4116 // function.
4117 DebugLoc DL;
4119 RS->backward(MBBI);
4120 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
4121 assert(DstReg && "There must be a free register after frame setup");
4122 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
4123 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
4124 .addReg(DstReg, getKillRegState(true))
4125 .addFrameIndex(UnwindHelpFI)
4126 .addImm(0);
4127}
4128
4129namespace {
4130struct TagStoreInstr {
4132 int64_t Offset, Size;
4133 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
4134 : MI(MI), Offset(Offset), Size(Size) {}
4135};
4136
4137class TagStoreEdit {
4138 MachineFunction *MF;
4141 // Tag store instructions that are being replaced.
4143 // Combined memref arguments of the above instructions.
4145
4146 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
4147 // FrameRegOffset + Size) with the address tag of SP.
4148 Register FrameReg;
4149 StackOffset FrameRegOffset;
4150 int64_t Size;
4151 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
4152 // end.
4153 std::optional<int64_t> FrameRegUpdate;
4154 // MIFlags for any FrameReg updating instructions.
4155 unsigned FrameRegUpdateFlags;
4156
4157 // Use zeroing instruction variants.
4158 bool ZeroData;
4159 DebugLoc DL;
4160
4161 void emitUnrolled(MachineBasicBlock::iterator InsertI);
4162 void emitLoop(MachineBasicBlock::iterator InsertI);
4163
4164public:
4165 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
4166 : MBB(MBB), ZeroData(ZeroData) {
4167 MF = MBB->getParent();
4168 MRI = &MF->getRegInfo();
4169 }
4170 // Add an instruction to be replaced. Instructions must be added in the
4171 // ascending order of Offset, and have to be adjacent.
4172 void addInstruction(TagStoreInstr I) {
4173 assert((TagStores.empty() ||
4174 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
4175 "Non-adjacent tag store instructions.");
4176 TagStores.push_back(I);
4177 }
4178 void clear() { TagStores.clear(); }
4179 // Emit equivalent code at the given location, and erase the current set of
4180 // instructions. May skip if the replacement is not profitable. May invalidate
4181 // the input iterator and replace it with a valid one.
4182 void emitCode(MachineBasicBlock::iterator &InsertI,
4183 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
4184};
4185
4186void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
4187 const AArch64InstrInfo *TII =
4188 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4189
4190 const int64_t kMinOffset = -256 * 16;
4191 const int64_t kMaxOffset = 255 * 16;
4192
4193 Register BaseReg = FrameReg;
4194 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
4195 if (BaseRegOffsetBytes < kMinOffset ||
4196 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
4197 // BaseReg can be FP, which is not necessarily aligned to 16 bytes. In
4198 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
4199 // is required for the offset of ST2G.
4200 BaseRegOffsetBytes % 16 != 0) {
4201 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4202 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
4203 StackOffset::getFixed(BaseRegOffsetBytes), TII);
4204 BaseReg = ScratchReg;
4205 BaseRegOffsetBytes = 0;
4206 }
4207
4208 MachineInstr *LastI = nullptr;
4209 while (Size) {
4210 int64_t InstrSize = (Size > 16) ? 32 : 16;
4211 unsigned Opcode =
4212 InstrSize == 16
4213 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
4214 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
4215 assert(BaseRegOffsetBytes % 16 == 0);
4216 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
4217 .addReg(AArch64::SP)
4218 .addReg(BaseReg)
4219 .addImm(BaseRegOffsetBytes / 16)
4220 .setMemRefs(CombinedMemRefs);
4221 // A store to [BaseReg, #0] should go last for an opportunity to fold the
4222 // final SP adjustment in the epilogue.
4223 if (BaseRegOffsetBytes == 0)
4224 LastI = I;
4225 BaseRegOffsetBytes += InstrSize;
4226 Size -= InstrSize;
4227 }
4228
4229 if (LastI)
4230 MBB->splice(InsertI, MBB, LastI);
4231}
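// Illustrative expansion (assumed values): for Size = 48 starting at
// [BaseReg, #0], the loop above emits an ST2G covering [#0, #32) followed by
// an STG covering [#32, #48); the store at offset #0 is then spliced to the
// end of the sequence so that a following SP adjustment in the epilogue can
// be folded into it.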
4232
4233void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
4234 const AArch64InstrInfo *TII =
4235 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4236
4237 Register BaseReg = FrameRegUpdate
4238 ? FrameReg
4239 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4240 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4241
4242 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
4243
4244 int64_t LoopSize = Size;
4245 // If the loop size is not a multiple of 32, split off one 16-byte store at
4246 // the end to fold the BaseReg update into.
4247 if (FrameRegUpdate && *FrameRegUpdate)
4248 LoopSize -= LoopSize % 32;
4249 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
4250 TII->get(ZeroData ? AArch64::STZGloop_wback
4251 : AArch64::STGloop_wback))
4252 .addDef(SizeReg)
4253 .addDef(BaseReg)
4254 .addImm(LoopSize)
4255 .addReg(BaseReg)
4256 .setMemRefs(CombinedMemRefs);
4257 if (FrameRegUpdate)
4258 LoopI->setFlags(FrameRegUpdateFlags);
4259
4260 int64_t ExtraBaseRegUpdate =
4261 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
4262 if (LoopSize < Size) {
4263 assert(FrameRegUpdate);
4264 assert(Size - LoopSize == 16);
4265 // Tag 16 more bytes at BaseReg and update BaseReg.
4266 BuildMI(*MBB, InsertI, DL,
4267 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
4268 .addDef(BaseReg)
4269 .addReg(BaseReg)
4270 .addReg(BaseReg)
4271 .addImm(1 + ExtraBaseRegUpdate / 16)
4272 .setMemRefs(CombinedMemRefs)
4273 .setMIFlags(FrameRegUpdateFlags);
4274 } else if (ExtraBaseRegUpdate) {
4275 // Update BaseReg.
4276 BuildMI(
4277 *MBB, InsertI, DL,
4278 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
4279 .addDef(BaseReg)
4280 .addReg(BaseReg)
4281 .addImm(std::abs(ExtraBaseRegUpdate))
4282 .addImm(0)
4283 .setMIFlags(FrameRegUpdateFlags);
4284 }
4285}
4286
4287 // Check if *II is a register update that can be merged into the STGloop that
4288 // ends at (Reg + Size). If so, *TotalOffset is set to the update amount of
4289 // *II; the remaining adjustment to Reg after the loop is TotalOffset - Size.
4290bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
4291 int64_t Size, int64_t *TotalOffset) {
4292 MachineInstr &MI = *II;
4293 if ((MI.getOpcode() == AArch64::ADDXri ||
4294 MI.getOpcode() == AArch64::SUBXri) &&
4295 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
4296 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
4297 int64_t Offset = MI.getOperand(2).getImm() << Shift;
4298 if (MI.getOpcode() == AArch64::SUBXri)
4299 Offset = -Offset;
4300 int64_t AbsPostOffset = std::abs(Offset - Size);
4301 const int64_t kMaxOffset =
4302 0xFFF; // Max encoding for unshifted ADDXri / SUBXri
4303 if (AbsPostOffset <= kMaxOffset && AbsPostOffset % 16 == 0) {
4304 *TotalOffset = Offset;
4305 return true;
4306 }
4307 }
4308 return false;
4309}
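// Illustrative merge (hypothetical offsets): with Reg = SP and Size = 272, a
// following "ADD SP, SP, #288" gives Offset = 288 and AbsPostOffset = 16,
// which is encodable and 16-byte aligned, so *TotalOffset is set to 288 and
// the update can be folded into the tag-store loop (see TagStoreEdit::emitCode
// below).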
4310
4311void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
4313 MemRefs.clear();
4314 for (auto &TS : TSE) {
4315 MachineInstr *MI = TS.MI;
4316 // An instruction without memory operands may access anything. Be
4317 // conservative and return an empty list.
4318 if (MI->memoperands_empty()) {
4319 MemRefs.clear();
4320 return;
4321 }
4322 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
4323 }
4324}
4325
4326void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
4327 const AArch64FrameLowering *TFI,
4328 bool TryMergeSPUpdate) {
4329 if (TagStores.empty())
4330 return;
4331 TagStoreInstr &FirstTagStore = TagStores[0];
4332 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
4333 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
4334 DL = TagStores[0].MI->getDebugLoc();
4335
4336 Register Reg;
4337 FrameRegOffset = TFI->resolveFrameOffsetReference(
4338 *MF, FirstTagStore.Offset, false /*isFixed*/, false /*isSVE*/, Reg,
4339 /*PreferFP=*/false, /*ForSimm=*/true);
4340 FrameReg = Reg;
4341 FrameRegUpdate = std::nullopt;
4342
4343 mergeMemRefs(TagStores, CombinedMemRefs);
4344
4345 LLVM_DEBUG({
4346 dbgs() << "Replacing adjacent STG instructions:\n";
4347 for (const auto &Instr : TagStores) {
4348 dbgs() << " " << *Instr.MI;
4349 }
4350 });
4351
4352 // Size threshold where a loop becomes shorter than a linear sequence of
4353 // tagging instructions.
4354 const int kSetTagLoopThreshold = 176;
4355 if (Size < kSetTagLoopThreshold) {
4356 if (TagStores.size() < 2)
4357 return;
4358 emitUnrolled(InsertI);
4359 } else {
4360 MachineInstr *UpdateInstr = nullptr;
4361 int64_t TotalOffset = 0;
4362 if (TryMergeSPUpdate) {
4363 // See if we can merge the base register update into the STGloop.
4364 // This is done in AArch64LoadStoreOptimizer for "normal" stores, but
4365 // STGloop is too unusual for that pass to handle, and in practice this
4366 // only happens in the function epilogue. Also, STGloop is expanded
4367 // before that pass runs.
4368 if (InsertI != MBB->end() &&
4369 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
4370 &TotalOffset)) {
4371 UpdateInstr = &*InsertI++;
4372 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
4373 << *UpdateInstr);
4374 }
4375 }
4376
4377 if (!UpdateInstr && TagStores.size() < 2)
4378 return;
4379
4380 if (UpdateInstr) {
4381 FrameRegUpdate = TotalOffset;
4382 FrameRegUpdateFlags = UpdateInstr->getFlags();
4383 }
4384 emitLoop(InsertI);
4385 if (UpdateInstr)
4386 UpdateInstr->eraseFromParent();
4387 }
4388
4389 for (auto &TS : TagStores)
4390 TS.MI->eraseFromParent();
4391}
4392
4393bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
4394 int64_t &Size, bool &ZeroData) {
4395 MachineFunction &MF = *MI.getParent()->getParent();
4396 const MachineFrameInfo &MFI = MF.getFrameInfo();
4397
4398 unsigned Opcode = MI.getOpcode();
4399 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
4400 Opcode == AArch64::STZ2Gi);
4401
4402 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
4403 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
4404 return false;
4405 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
4406 return false;
4407 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
4408 Size = MI.getOperand(2).getImm();
4409 return true;
4410 }
4411
4412 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
4413 Size = 16;
4414 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
4415 Size = 32;
4416 else
4417 return false;
4418
4419 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
4420 return false;
4421
4422 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
4423 16 * MI.getOperand(2).getImm();
4424 return true;
4425}
4426
4427// Detect a run of memory tagging instructions for adjacent stack frame slots,
4428// and replace them with a shorter instruction sequence:
4429// * replace STG + STG with ST2G
4430// * replace STGloop + STGloop with STGloop
4431// This code needs to run when stack slot offsets are already known, but before
4432// FrameIndex operands in STG instructions are eliminated.
4434 const AArch64FrameLowering *TFI,
4435 RegScavenger *RS) {
4436 bool FirstZeroData;
4437 int64_t Size, Offset;
4438 MachineInstr &MI = *II;
4439 MachineBasicBlock *MBB = MI.getParent();
4441 if (&MI == &MBB->instr_back())
4442 return II;
4443 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
4444 return II;
4445
4447 Instrs.emplace_back(&MI, Offset, Size);
4448
4449 constexpr int kScanLimit = 10;
4450 int Count = 0;
4452 NextI != E && Count < kScanLimit; ++NextI) {
4453 MachineInstr &MI = *NextI;
4454 bool ZeroData;
4455 int64_t Size, Offset;
4456 // Collect instructions that update memory tags with a FrameIndex operand
4457 // and (when applicable) constant size, and whose output registers are dead
4458 // (the latter is almost always the case in practice). Since these
4459 // instructions effectively have no inputs or outputs, we are free to skip
4460 // any non-aliasing instructions in between without tracking used registers.
4461 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
4462 if (ZeroData != FirstZeroData)
4463 break;
4464 Instrs.emplace_back(&MI, Offset, Size);
4465 continue;
4466 }
4467
4468 // Only count non-transient, non-tagging instructions toward the scan
4469 // limit.
4470 if (!MI.isTransient())
4471 ++Count;
4472
4473 // Just in case, stop before the epilogue code starts.
4474 if (MI.getFlag(MachineInstr::FrameSetup) ||
4476 break;
4477
4478 // Reject anything that may alias the collected instructions.
4479 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects())
4480 break;
4481 }
4482
4483 // New code will be inserted after the last tagging instruction we've found.
4484 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
4485
4486 // All the gathered stack tag instructions are merged and placed after the
4487 // last tag store in the list. We must check whether the NZCV flag is live
4488 // at the point where we are trying to insert; otherwise it might get
4489 // clobbered if any STG loops are present.
4490
4491 // FIXME: Bailing out of the merge here is conservative: the liveness check
4492 // is performed even when the merged sequence contains no STG loops, in
4493 // which case it is not needed.
4495 LiveRegs.addLiveOuts(*MBB);
4496 for (auto I = MBB->rbegin();; ++I) {
4497 MachineInstr &MI = *I;
4498 if (MI == InsertI)
4499 break;
4500 LiveRegs.stepBackward(*I);
4501 }
4502 InsertI++;
4503 if (LiveRegs.contains(AArch64::NZCV))
4504 return InsertI;
4505
4506 llvm::stable_sort(Instrs,
4507 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
4508 return Left.Offset < Right.Offset;
4509 });
4510
4511 // Make sure that we don't have any overlapping stores.
4512 int64_t CurOffset = Instrs[0].Offset;
4513 for (auto &Instr : Instrs) {
4514 if (CurOffset > Instr.Offset)
4515 return NextI;
4516 CurOffset = Instr.Offset + Instr.Size;
4517 }
4518
4519 // Find contiguous runs of tagged memory and emit shorter instruction
4520 // sequences for them when possible.
4521 TagStoreEdit TSE(MBB, FirstZeroData);
4522 std::optional<int64_t> EndOffset;
4523 for (auto &Instr : Instrs) {
4524 if (EndOffset && *EndOffset != Instr.Offset) {
4525 // Found a gap.
4526 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
4527 TSE.clear();
4528 }
4529
4530 TSE.addInstruction(Instr);
4531 EndOffset = Instr.Offset + Instr.Size;
4532 }
4533
4534 const MachineFunction *MF = MBB->getParent();
4535 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
4536 TSE.emitCode(
4537 InsertI, TFI, /*TryMergeSPUpdate = */
4539
4540 return InsertI;
4541}
4542} // namespace
4543
4545 const AArch64FrameLowering *TFI) {
4546 MachineInstr &MI = *II;
4547 MachineBasicBlock *MBB = MI.getParent();
4548 MachineFunction *MF = MBB->getParent();
4549
4550 if (MI.getOpcode() != AArch64::VGSavePseudo &&
4551 MI.getOpcode() != AArch64::VGRestorePseudo)
4552 return II;
4553
4554 SMEAttrs FuncAttrs(MF->getFunction());
4555 bool LocallyStreaming =
4556 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
4559 const AArch64InstrInfo *TII =
4560 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4561
4562 int64_t VGFrameIdx =
4563 LocallyStreaming ? AFI->getStreamingVGIdx() : AFI->getVGIdx();
4564 assert(VGFrameIdx != std::numeric_limits<int>::max() &&
4565 "Expected FrameIdx for VG");
4566
4567 unsigned CFIIndex;
4568 if (MI.getOpcode() == AArch64::VGSavePseudo) {
4569 const MachineFrameInfo &MFI = MF->getFrameInfo();
4570 int64_t Offset =
4571 MFI.getObjectOffset(VGFrameIdx) - TFI->getOffsetOfLocalArea();
4573 nullptr, TRI->getDwarfRegNum(AArch64::VG, true), Offset));
4574 } else
4576 nullptr, TRI->getDwarfRegNum(AArch64::VG, true)));
4577
4578 MachineInstr *UnwindInst = BuildMI(*MBB, II, II->getDebugLoc(),
4579 TII->get(TargetOpcode::CFI_INSTRUCTION))
4580 .addCFIIndex(CFIIndex);
4581
4582 MI.eraseFromParent();
4583 return UnwindInst->getIterator();
4584}
4585
4587 MachineFunction &MF, RegScavenger *RS = nullptr) const {
4589 for (auto &BB : MF)
4590 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
4591 if (AFI->hasStreamingModeChanges())
4592 II = emitVGSaveRestore(II, this);
4594 II = tryMergeAdjacentSTG(II, this, RS);
4595 }
4596}
4597
4598/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
4599/// before the update. This is easily retrieved as it is exactly the offset
4600/// that is set in processFunctionBeforeFrameFinalized.
4602 const MachineFunction &MF, int FI, Register &FrameReg,
4603 bool IgnoreSPUpdates) const {
4604 const MachineFrameInfo &MFI = MF.getFrameInfo();
4605 if (IgnoreSPUpdates) {
4606 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
4607 << MFI.getObjectOffset(FI) << "\n");
4608 FrameReg = AArch64::SP;
4609 return StackOffset::getFixed(MFI.getObjectOffset(FI));
4610 }
4611
4612 // Go to common code if we cannot provide sp + offset.
4613 if (MFI.hasVarSizedObjects() ||
4616 return getFrameIndexReference(MF, FI, FrameReg);
4617
4618 FrameReg = AArch64::SP;
4619 return getStackOffset(MF, MFI.getObjectOffset(FI));
4620}
4621
4622/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
4623/// the parent's frame pointer
4625 const MachineFunction &MF) const {
4626 return 0;
4627}
4628
4629/// Funclets only need to account for space for the callee saved registers,
4630/// as the locals are accounted for in the parent's stack frame.
4632 const MachineFunction &MF) const {
4633 // This is the size of the pushed CSRs.
4634 unsigned CSSize =
4635 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
4636 // This is the amount of stack a funclet needs to allocate.
4637 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
4638 getStackAlign());
4639}
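// Worked example (assumed sizes): with a pushed-CSR area of 64 bytes and a
// maximum call frame of 40 bytes, a funclet allocates alignTo(64 + 40, 16) =
// 112 bytes, i.e. 104 rounded up to the assumed 16-byte stack alignment.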
4640
4641namespace {
4642struct FrameObject {
4643 bool IsValid = false;
4644 // Index of the object in MFI.
4645 int ObjectIndex = 0;
4646 // Group ID this object belongs to.
4647 int GroupIndex = -1;
4648 // This object should be placed first (closest to SP).
4649 bool ObjectFirst = false;
4650 // This object's group (which always contains the object with
4651 // ObjectFirst==true) should be placed first.
4652 bool GroupFirst = false;
4653
4654 // Used to distinguish between FPR and GPR accesses. The values are chosen so
4655 // that they sort FPR < Hazard < GPR and can be or'd together.
4656 unsigned Accesses = 0;
4657 enum { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };
4658};
4659
4660class GroupBuilder {
4661 SmallVector<int, 8> CurrentMembers;
4662 int NextGroupIndex = 0;
4663 std::vector<FrameObject> &Objects;
4664
4665public:
4666 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
4667 void AddMember(int Index) { CurrentMembers.push_back(Index); }
4668 void EndCurrentGroup() {
4669 if (CurrentMembers.size() > 1) {
4670 // Create a new group with the current member list. This might remove them
4671 // from their pre-existing groups. That's OK, dealing with overlapping
4672 // groups is too hard and unlikely to make a difference.
4673 LLVM_DEBUG(dbgs() << "group:");
4674 for (int Index : CurrentMembers) {
4675 Objects[Index].GroupIndex = NextGroupIndex;
4676 LLVM_DEBUG(dbgs() << " " << Index);
4677 }
4678 LLVM_DEBUG(dbgs() << "\n");
4679 NextGroupIndex++;
4680 }
4681 CurrentMembers.clear();
4682 }
4683};
4684
4685bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
4686 // Objects at a lower index are closer to FP; objects at a higher index are
4687 // closer to SP.
4688 //
4689 // For consistency in our comparison, all invalid objects are placed
4690 // at the end. This also allows us to stop walking when we hit the
4691 // first invalid item after it's all sorted.
4692 //
4693 // If we want to include a stack hazard region, order FPR accesses < the
4694 // hazard object < GPR accesses in order to create a separation between the
4695 // two. For the Accesses field 1 = FPR, 2 = Hazard Object, 4 = GPR.
4696 //
4697 // Otherwise the "first" object goes first (closest to SP), followed by the
4698 // members of the "first" group.
4699 //
4700 // The rest are sorted by the group index to keep the groups together.
4701 // Higher numbered groups are more likely to be around longer (i.e. untagged
4702 // in the function epilogue and not at some earlier point). Place them closer
4703 // to SP.
4704 //
4705 // If all else equal, sort by the object index to keep the objects in the
4706 // original order.
4707 return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
4708 A.GroupIndex, A.ObjectIndex) <
4709 std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
4710 B.GroupIndex, B.ObjectIndex);
4711}
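// Illustrative ordering (hypothetical objects): with a hazard slot present,
// an FPR-accessed spill (Accesses = AccessFPR = 1), the hazard object
// (AccessHazard = 2) and a GPR-accessed local (AccessGPR = 4) sort in that
// order, so every FPR slot ends up on the opposite side of the padding from
// the GPR slots.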
4712} // namespace
4713
4715 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
4716 if (!OrderFrameObjects || ObjectsToAllocate.empty())
4717 return;
4718
4720 const MachineFrameInfo &MFI = MF.getFrameInfo();
4721 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
4722 for (auto &Obj : ObjectsToAllocate) {
4723 FrameObjects[Obj].IsValid = true;
4724 FrameObjects[Obj].ObjectIndex = Obj;
4725 }
4726
4727 // Identify FPR vs GPR slots for hazards, and stack slots that are tagged at
4728 // the same time.
4729 GroupBuilder GB(FrameObjects);
4730 for (auto &MBB : MF) {
4731 for (auto &MI : MBB) {
4732 if (MI.isDebugInstr())
4733 continue;
4734
4735 if (AFI.hasStackHazardSlotIndex()) {
4736 std::optional<int> FI = getLdStFrameID(MI, MFI);
4737 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
4738 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
4740 FrameObjects[*FI].Accesses |= FrameObject::AccessFPR;
4741 else
4742 FrameObjects[*FI].Accesses |= FrameObject::AccessGPR;
4743 }
4744 }
4745
4746 int OpIndex;
4747 switch (MI.getOpcode()) {
4748 case AArch64::STGloop:
4749 case AArch64::STZGloop:
4750 OpIndex = 3;
4751 break;
4752 case AArch64::STGi:
4753 case AArch64::STZGi:
4754 case AArch64::ST2Gi:
4755 case AArch64::STZ2Gi:
4756 OpIndex = 1;
4757 break;
4758 default:
4759 OpIndex = -1;
4760 }
4761
4762 int TaggedFI = -1;
4763 if (OpIndex >= 0) {
4764 const MachineOperand &MO = MI.getOperand(OpIndex);
4765 if (MO.isFI()) {
4766 int FI = MO.getIndex();
4767 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
4768 FrameObjects[FI].IsValid)
4769 TaggedFI = FI;
4770 }
4771 }
4772
4773 // If this is a stack tagging instruction for a slot that is not part of a
4774 // group yet, either start a new group or add it to the current one.
4775 if (TaggedFI >= 0)
4776 GB.AddMember(TaggedFI);
4777 else
4778 GB.EndCurrentGroup();
4779 }
4780 // Groups should never span multiple basic blocks.
4781 GB.EndCurrentGroup();
4782 }
4783
4784 if (AFI.hasStackHazardSlotIndex()) {
4785 FrameObjects[AFI.getStackHazardSlotIndex()].Accesses =
4786 FrameObject::AccessHazard;
4787 // If a stack object is unknown or both GPR and FPR, sort it into GPR.
4788 for (auto &Obj : FrameObjects)
4789 if (!Obj.Accesses ||
4790 Obj.Accesses == (FrameObject::AccessGPR | FrameObject::AccessFPR))
4791 Obj.Accesses = FrameObject::AccessGPR;
4792 }
4793
4794 // If the function's tagged base pointer is pinned to a stack slot, we want to
4795 // put that slot first when possible. This will likely place it at SP + 0,
4796 // and save one instruction when generating the base pointer because IRG does
4797 // not allow an immediate offset.
4798 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
4799 if (TBPI) {
4800 FrameObjects[*TBPI].ObjectFirst = true;
4801 FrameObjects[*TBPI].GroupFirst = true;
4802 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
4803 if (FirstGroupIndex >= 0)
4804 for (FrameObject &Object : FrameObjects)
4805 if (Object.GroupIndex == FirstGroupIndex)
4806 Object.GroupFirst = true;
4807 }
4808
4809 llvm::stable_sort(FrameObjects, FrameObjectCompare);
4810
4811 int i = 0;
4812 for (auto &Obj : FrameObjects) {
4813 // All invalid items are sorted at the end, so it's safe to stop.
4814 if (!Obj.IsValid)
4815 break;
4816 ObjectsToAllocate[i++] = Obj.ObjectIndex;
4817 }
4818
4819 LLVM_DEBUG({
4820 dbgs() << "Final frame order:\n";
4821 for (auto &Obj : FrameObjects) {
4822 if (!Obj.IsValid)
4823 break;
4824 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
4825 if (Obj.ObjectFirst)
4826 dbgs() << ", first";
4827 if (Obj.GroupFirst)
4828 dbgs() << ", group-first";
4829 dbgs() << "\n";
4830 }
4831 });
4832}
4833
4834/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
4835/// least every ProbeSize bytes. Returns an iterator of the first instruction
4836/// after the loop. The difference between SP and TargetReg must be an exact
4837/// multiple of ProbeSize.
4839AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
4840 MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
4841 Register TargetReg) const {
4843 MachineFunction &MF = *MBB.getParent();
4844 const AArch64InstrInfo *TII =
4845 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
4847
4848 MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
4850 MF.insert(MBBInsertPoint, LoopMBB);
4852 MF.insert(MBBInsertPoint, ExitMBB);
4853
4854 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
4855 // in SUB).
4856 emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
4857 StackOffset::getFixed(-ProbeSize), TII,
4859 // STR XZR, [SP]
4860 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::STRXui))
4861 .addReg(AArch64::XZR)
4862 .addReg(AArch64::SP)
4863 .addImm(0)
4865 // CMP SP, TargetReg
4866 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
4867 AArch64::XZR)
4868 .addReg(AArch64::SP)
4869 .addReg(TargetReg)
4872 // B.CC Loop
4873 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
4875 .addMBB(LoopMBB)
4877
4878 LoopMBB->addSuccessor(ExitMBB);
4879 LoopMBB->addSuccessor(LoopMBB);
4880 // Synthesize the exit MBB.
4881 ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
4883 MBB.addSuccessor(LoopMBB);
4884 // Update liveins.
4885 fullyRecomputeLiveIns({ExitMBB, LoopMBB});
4886
4887 return ExitMBB->begin();
4888}
4889
4890void AArch64FrameLowering::inlineStackProbeFixed(
4891 MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
4892 StackOffset CFAOffset) const {
4894 MachineFunction &MF = *MBB->getParent();
4895 const AArch64InstrInfo *TII =
4896 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
4898 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
4899 bool HasFP = hasFP(MF);
4900
4901 DebugLoc DL;
4902 int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
4903 int64_t NumBlocks = FrameSize / ProbeSize;
4904 int64_t ResidualSize = FrameSize % ProbeSize;
4905
4906 LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
4907 << NumBlocks << " blocks of " << ProbeSize
4908 << " bytes, plus " << ResidualSize << " bytes\n");
4909
4910 // Decrement SP by NumBlocks * ProbeSize bytes, using either an unrolled
4911 // sequence or an ordinary loop.
4912 if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
4913 for (int i = 0; i < NumBlocks; ++i) {
4914 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
4915 // encodable in a SUB).
4916 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
4917 StackOffset::getFixed(-ProbeSize), TII,
4918 MachineInstr::FrameSetup, false, false, nullptr,
4919 EmitAsyncCFI && !HasFP, CFAOffset);
4920 CFAOffset += StackOffset::getFixed(ProbeSize);
4921 // STR XZR, [SP]
4922 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
4923 .addReg(AArch64::XZR)
4924 .addReg(AArch64::SP)
4925 .addImm(0)
4926 .setMIFlags(MachineInstr::FrameSetup);
4927 }
4928 } else if (NumBlocks != 0) {
4929 // SUB ScratchReg, SP, #FrameSize (or equivalent if FrameSize is not
4930 // encodable in ADD). ScratchReg may temporarily become the CFA register.
4931 emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
4932 StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
4933 MachineInstr::FrameSetup, false, false, nullptr,
4934 EmitAsyncCFI && !HasFP, CFAOffset);
4935 CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
4936 MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
4937 MBB = MBBI->getParent();
4938 if (EmitAsyncCFI && !HasFP) {
4939 // Set the CFA register back to SP.
4940 const AArch64RegisterInfo &RegInfo =
4941 *MF.getSubtarget<AArch64Subtarget>().getRegisterInfo();
4942 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
4943 unsigned CFIIndex =
4944 MF.addFrameInst(MCCFIInstruction::createDefCfaRegister(nullptr, Reg));
4945 BuildMI(*MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
4946 .addCFIIndex(CFIIndex)
4947 .setMIFlags(MachineInstr::FrameSetup);
4948 }
4949 }
4950
4951 if (ResidualSize != 0) {
4952 // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
4953 // in SUB).
4954 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
4955 StackOffset::getFixed(-ResidualSize), TII,
4956 MachineInstr::FrameSetup, false, false, nullptr,
4957 EmitAsyncCFI && !HasFP, CFAOffset);
4958 if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
4959 // STR XZR, [SP]
4960 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
4961 .addReg(AArch64::XZR)
4962 .addReg(AArch64::SP)
4963 .addImm(0)
4964 .setMIFlags(MachineInstr::FrameSetup);
4965 }
4966 }
4967}
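// [Editorial note - illustration only.] Combining the cases above, a probed
// fixed-size allocation is lowered roughly as follows (sizes illustrative):
//
//   NumBlocks <= StackProbeMaxLoopUnroll (fully unrolled):
//     SUB SP, SP, #ProbeSize
//     STR XZR, [SP]
//     ... repeated NumBlocks times ...
//     SUB SP, SP, #ResidualSize    ; only if ResidualSize != 0
//     STR XZR, [SP]                ; only if ResidualSize > StackProbeMaxUnprobedStack
//
//   Larger frames replace the unrolled prefix with a single SUB into
//   ScratchReg followed by the loop emitted by
//   inlineStackProbeLoopExactMultiple, then handle the residual the same way.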
4968
4969void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
4970 MachineBasicBlock &MBB) const {
4971 // Get the instructions that need to be replaced. We emit at most two of
4972 // these. Remember them in order to avoid complications coming from the need
4973 // to traverse the block while potentially creating more blocks.
4974 SmallVector<MachineInstr *, 4> ToReplace;
4975 for (MachineInstr &MI : MBB)
4976 if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
4977 MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
4978 ToReplace.push_back(&MI);
4979
4980 for (MachineInstr *MI : ToReplace) {
4981 if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
4982 Register ScratchReg = MI->getOperand(0).getReg();
4983 int64_t FrameSize = MI->getOperand(1).getImm();
4984 StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
4985 MI->getOperand(3).getImm());
4986 inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
4987 CFAOffset);
4988 } else {
4989 assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
4990 "Stack probe pseudo-instruction expected");
4991 const AArch64InstrInfo *TII =
4992 MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
4993 Register TargetReg = MI->getOperand(0).getReg();
4994 (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
4995 }
4996 MI->eraseFromParent();
4997 }
4998}
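// [Editorial note] Operand layout of the pseudo-instructions expanded above,
// as read by inlineStackProbe: PROBED_STACKALLOC carries (scratch register,
// frame size, fixed CFA offset, scalable CFA offset) and is lowered through
// inlineStackProbeFixed; PROBED_STACKALLOC_VAR carries only the target
// register for the variable-sized case and is lowered through
// AArch64InstrInfo::probedStackAlloc.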