1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function such that a particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) doesn't get created until the
33// main function body, after the prologue has run. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | callee-saved gpr registers | <--.
48// | | | On Darwin platforms these
49// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
50// | prev_lr | | (frame record first)
51// | prev_fp | <--'
52// | async context if needed |
53// | (a.k.a. "frame record") |
54// |-----------------------------------| <- fp(=x29)
55// | |
56// | callee-saved fp/simd/SVE regs |
57// | |
58// |-----------------------------------|
59// | |
60// | SVE stack objects |
61// | |
62// |-----------------------------------|
63// |.empty.space.to.make.part.below....|
64// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
65// |.the.standard.16-byte.alignment....| compile time; if present)
66// |-----------------------------------|
67// | |
68// | local variables of fixed size |
69// | including spill slots |
70// |-----------------------------------| <- bp(not defined by ABI,
71// |.variable-sized.local.variables....| LLVM chooses X19)
72// |.(VLAs)............................| (size of this area is unknown at
73// |...................................| compile time)
74// |-----------------------------------| <- sp
75// | | Lower address
76//
77//
78// To access data in a frame, a constant offset must be computable at compile
79// time from one of the pointers (fp, bp, sp). The sizes of the areas with a
80// dotted background cannot be computed at compile time if they are present,
81// making it necessary to have all three of fp, bp and sp set up to be able
82// to access all contents in the frame areas,
83// assuming all of the frame areas are non-empty.
84//
85// For most functions, some of the frame areas are empty. For those functions,
86// it may not be necessary to set up fp or bp:
87// * A base pointer is definitely needed when there are both VLAs and local
88// variables with more-than-default alignment requirements.
89// * A frame pointer is definitely needed when there are local variables with
90// more-than-default alignment requirements.
91//
92// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
93// callee-saved area, since the unwind encoding does not allow for encoding
94// this dynamically and existing tools depend on this layout. For other
95// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
96// area to allow SVE stack objects (allocated directly below the callee-saves,
97// if available) to be accessed directly from the frame pointer.
98// The SVE spill/fill instructions have VL-scaled addressing modes such
99// as:
100// ldr z8, [fp, #-7, mul vl]
101// For SVE the vector length (VL) is not known at compile-time, so
102// '#-7, mul vl' is an offset that can only be evaluated at runtime. With this
103// layout, we don't need to add an unscaled offset to the frame pointer before
104// accessing the SVE object in the frame.
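//
// As an illustrative sketch (the fixed offset below is hypothetical), this
// layout lets an SVE object be reached in a single instruction,
//     ldr z8, [fp, #-7, mul vl]
// whereas with the frame record above the callee-saves, the fixed part of the
// offset would first have to be folded into a scratch register:
//     sub x8, fp, #16
//     ldr z8, [x8, #-7, mul vl]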
105//
106// In some cases when a base pointer is not strictly needed, it is generated
107// anyway when offsets from the frame pointer to access local variables become
108// so large that the offset can't be encoded in the immediate fields of loads
109// or stores.
110//
111// Outgoing function arguments must be at the bottom of the stack frame when
112// calling another function. If we do not have variable-sized stack objects, we
113// can allocate a "reserved call frame" area at the bottom of the local
114// variable area, large enough for all outgoing calls. If we do have VLAs, then
115// the stack pointer must be decremented and incremented around each call to
116// make space for the arguments below the VLAs.
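//
// As a sketch (registers and sizes hypothetical): with a reserved call frame,
// outgoing arguments are stored straight into the pre-allocated area,
//     str x0, [sp]
//     bl  callee
// whereas with VLAs present the SP must be bumped around each call:
//     sub sp, sp, #16
//     str x0, [sp]
//     bl  callee
//     add sp, sp, #16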
117//
118// FIXME: also explain the redzone concept.
119//
120// An example of the prologue:
121//
122// .globl __foo
123// .align 2
124// __foo:
125// Ltmp0:
126// .cfi_startproc
127// .cfi_personality 155, ___gxx_personality_v0
128// Leh_func_begin:
129// .cfi_lsda 16, Lexception33
130//
131// stp xA, xB, [sp, #-offset]!
132// ...
133// stp x28, x27, [sp, #offset-32]
134// stp fp, lr, [sp, #offset-16]
135// add fp, sp, #offset - 16
136// sub sp, sp, #1360
137//
138// The Stack:
139// +-------------------------------------------+
140// 10000 | ........ | ........ | ........ | ........ |
141// 10004 | ........ | ........ | ........ | ........ |
142// +-------------------------------------------+
143// 10008 | ........ | ........ | ........ | ........ |
144// 1000c | ........ | ........ | ........ | ........ |
145// +===========================================+
146// 10010 | X28 Register |
147// 10014 | X28 Register |
148// +-------------------------------------------+
149// 10018 | X27 Register |
150// 1001c | X27 Register |
151// +===========================================+
152// 10020 | Frame Pointer |
153// 10024 | Frame Pointer |
154// +-------------------------------------------+
155// 10028 | Link Register |
156// 1002c | Link Register |
157// +===========================================+
158// 10030 | ........ | ........ | ........ | ........ |
159// 10034 | ........ | ........ | ........ | ........ |
160// +-------------------------------------------+
161// 10038 | ........ | ........ | ........ | ........ |
162// 1003c | ........ | ........ | ........ | ........ |
163// +-------------------------------------------+
164//
165// sp == 10030 :: >>initial value<<
166// sp == 10020 :: stp fp, lr, [sp, #-16]!
167// fp == sp == 10020 :: mov fp, sp
168// sp == 10010 :: stp x28, x27, [sp, #-16]!
169// sp == 10010 :: >>final value<<
170//
171// The frame pointer (w29) points to address 10020. If we use an offset of
172// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
173// for w27, and -32 for w28:
174//
175// Ltmp1:
176// .cfi_def_cfa w29, 16
177// Ltmp2:
178// .cfi_offset w30, -8
179// Ltmp3:
180// .cfi_offset w29, -16
181// Ltmp4:
182// .cfi_offset w27, -24
183// Ltmp5:
184// .cfi_offset w28, -32
185//
186//===----------------------------------------------------------------------===//
187
188#include "AArch64FrameLowering.h"
189#include "AArch64InstrInfo.h"
190#include "AArch64MachineFunctionInfo.h"
191#include "AArch64RegisterInfo.h"
192#include "AArch64Subtarget.h"
193#include "AArch64TargetMachine.h"
196#include "llvm/ADT/ScopeExit.h"
197#include "llvm/ADT/SmallVector.h"
198#include "llvm/ADT/Statistic.h"
214#include "llvm/IR/Attributes.h"
215#include "llvm/IR/CallingConv.h"
216#include "llvm/IR/DataLayout.h"
217#include "llvm/IR/DebugLoc.h"
218#include "llvm/IR/Function.h"
219#include "llvm/MC/MCAsmInfo.h"
220#include "llvm/MC/MCDwarf.h"
222#include "llvm/Support/Debug.h"
228#include <cassert>
229#include <cstdint>
230#include <iterator>
231#include <optional>
232#include <vector>
233
234using namespace llvm;
235
236#define DEBUG_TYPE "frame-info"
237
238static cl::opt<bool> EnableRedZone("aarch64-redzone",
239 cl::desc("enable use of redzone on AArch64"),
240 cl::init(false), cl::Hidden);
241
242static cl::opt<bool> StackTaggingMergeSetTag(
243 "stack-tagging-merge-settag",
244 cl::desc("merge settag instruction in function epilog"), cl::init(true),
245 cl::Hidden);
246
247static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
248 cl::desc("sort stack allocations"),
249 cl::init(true), cl::Hidden);
250
251static cl::opt<bool> EnableHomogeneousPrologEpilog(
252 "homogeneous-prolog-epilog", cl::Hidden,
253 cl::desc("Emit homogeneous prologue and epilogue for the size "
254 "optimization (default = off)"));
255
256STATISTIC(NumRedZoneFunctions, "Number of functions using red zone");
257
258/// Returns how much of the incoming argument stack area (in bytes) we should
259/// clean up in an epilogue. For the C calling convention this will be 0, for
260/// guaranteed tail call conventions it can be positive (a normal return or a
261/// tail call to a function that uses less stack space for arguments) or
262/// negative (for a tail call to a function that needs more stack space than us
263/// for arguments).
264static int64_t getArgumentStackToRestore(MachineFunction &MF,
265 MachineBasicBlock &MBB) {
266 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
267 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
268 bool IsTailCallReturn = (MBB.end() != MBBI)
269 ? AArch64InstrInfo::isTailCallReturnInst(*MBBI)
270 : false;
271
272 int64_t ArgumentPopSize = 0;
273 if (IsTailCallReturn) {
274 MachineOperand &StackAdjust = MBBI->getOperand(1);
275
276 // For a tail-call in a callee-pops-arguments environment, some or all of
277 // the stack may actually be in use for the call's arguments; this is
278 // calculated during LowerCall and consumed here...
279 ArgumentPopSize = StackAdjust.getImm();
280 } else {
281 // ... otherwise the amount to pop is *all* of the argument space,
282 // conveniently stored in the MachineFunctionInfo by
283 // LowerFormalArguments. This will, of course, be zero for the C calling
284 // convention.
285 ArgumentPopSize = AFI->getArgumentStackToRestore();
286 }
287
288 return ArgumentPopSize;
289}
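// For instance (illustrative numbers, assuming a guaranteed-tail-call
// convention such as swifttailcc): if the current function was passed 32
// bytes of stack arguments and tail-calls a function that needs only 16, the
// epilogue pops 32 - 16 = 16 bytes; a normal return pops all 32, and for the
// C calling convention the result is always 0.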
290
292static bool needsWinCFI(const MachineFunction &MF);
295
296/// Returns true if homogeneous prolog or epilog code can be emitted
297/// for the size optimization. If possible, a frame helper call is injected.
298/// When an Exit block is given, this check is for the epilog.
299bool AArch64FrameLowering::homogeneousPrologEpilog(
300 MachineFunction &MF, MachineBasicBlock *Exit) const {
301 if (!MF.getFunction().hasMinSize())
302 return false;
304 return false;
305 if (EnableRedZone)
306 return false;
307
308 // TODO: Windows is not supported yet.
309 if (needsWinCFI(MF))
310 return false;
311 // TODO: SVE is not supported yet.
312 if (getSVEStackSize(MF))
313 return false;
314
315 // Bail on stack adjustment needed on return for simplicity.
316 const MachineFrameInfo &MFI = MF.getFrameInfo();
318 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
319 return false;
320 if (Exit && getArgumentStackToRestore(MF, *Exit))
321 return false;
322
323 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
324 if (AFI->hasSwiftAsyncContext() || AFI->hasStreamingModeChanges())
325 return false;
326
327 // If there is an odd number of GPRs before LR and FP in the CSRs list,
328 // they will not be paired into one RegPairInfo, which is incompatible with
329 // the assumption made by the homogeneous prolog epilog pass.
330 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
331 unsigned NumGPRs = 0;
332 for (unsigned I = 0; CSRegs[I]; ++I) {
333 Register Reg = CSRegs[I];
334 if (Reg == AArch64::LR) {
335 assert(CSRegs[I + 1] == AArch64::FP);
336 if (NumGPRs % 2 != 0)
337 return false;
338 break;
339 }
340 if (AArch64::GPR64RegClass.contains(Reg))
341 ++NumGPRs;
342 }
343
344 return true;
345}
346
347/// Returns true if CSRs should be paired.
348bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
349 return produceCompactUnwindFrame(MF) || homogeneousPrologEpilog(MF);
350}
351
352/// This is the biggest offset to the stack pointer we can encode in aarch64
353/// instructions (without using a separate calculation and a temp register).
354/// Note that the exceptions here are vector stores/loads, which cannot encode any
355/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
356static const unsigned DefaultSafeSPDisplacement = 255;
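// For example, an unscaled ldur/stur reaches byte offsets in [-256, 255], so
//     ldur x0, [sp, #255]
// is the furthest SP-relative access that is encodable directly; anything
// beyond that needs the address materialized in a scratch register first.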
357
358/// Look at each instruction that references stack frames and return the stack
359/// size limit beyond which some of these instructions will require a scratch
360/// register during their expansion later.
361static unsigned estimateRSStackSizeLimit(MachineFunction &MF) {
362 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
363 // range. We'll end up allocating an unnecessary spill slot a lot, but
364 // realistically that's not a big deal at this stage of the game.
365 for (MachineBasicBlock &MBB : MF) {
366 for (MachineInstr &MI : MBB) {
367 if (MI.isDebugInstr() || MI.isPseudo() ||
368 MI.getOpcode() == AArch64::ADDXri ||
369 MI.getOpcode() == AArch64::ADDSXri)
370 continue;
371
372 for (const MachineOperand &MO : MI.operands()) {
373 if (!MO.isFI())
374 continue;
375
376 StackOffset Offset;
377 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
378 AArch64FrameOffsetCannotUpdate)
379 return 0;
380 }
381 }
382 }
383 return DefaultSafeSPDisplacement;
384}
385
389}
390
391/// Returns the size of the fixed object area (allocated next to sp on entry)
392/// On Win64 this may include a var args area and an UnwindHelp object for EH.
393static unsigned getFixedObjectSize(const MachineFunction &MF,
394 const AArch64FunctionInfo *AFI, bool IsWin64,
395 bool IsFunclet) {
396 if (!IsWin64 || IsFunclet) {
397 return AFI->getTailCallReservedStack();
398 } else {
399 if (AFI->getTailCallReservedStack() != 0 &&
401 Attribute::SwiftAsync))
402 report_fatal_error("cannot generate ABI-changing tail call for Win64");
403 // Var args are stored here in the primary function.
404 const unsigned VarArgsArea = AFI->getVarArgsGPRSize();
405 // To support EH funclets we allocate an UnwindHelp object
406 const unsigned UnwindHelpObject = (MF.hasEHFunclets() ? 8 : 0);
407 return AFI->getTailCallReservedStack() +
408 alignTo(VarArgsArea + UnwindHelpObject, 16);
409 }
410}
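// As a worked example (hypothetical sizes): a Win64 function with 24 bytes of
// register-passed varargs, EH funclets and no tail-call reserved stack gets
// alignTo(24 + 8, 16) == 32 bytes of fixed objects.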
411
412/// Returns the size of the entire SVE stackframe (calleesaves + spills).
413static StackOffset getSVEStackSize(const MachineFunction &MF) {
414 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
415 return StackOffset::getScalable((int64_t)AFI->getStackSizeSVE());
416}
417
418bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const {
419 if (!EnableRedZone)
420 return false;
421
422 // Don't use the red zone if the function explicitly asks us not to.
423 // This is typically used for kernel code.
424 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
425 const unsigned RedZoneSize =
426 Subtarget.getTargetLowering()->getRedZoneSize(MF.getFunction());
427 if (!RedZoneSize)
428 return false;
429
430 const MachineFrameInfo &MFI = MF.getFrameInfo();
431 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
432 uint64_t NumBytes = AFI->getLocalStackSize();
433
434 // If neither NEON nor SVE is available, a COPY from one Q-reg to
435 // another requires a spill -> reload sequence. We can do that
436 // using a pre-decrementing store/post-decrementing load, but
437 // if we do so, we can't use the Red Zone.
438 bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
439 !Subtarget.isNeonAvailable() &&
440 !Subtarget.hasSVE();
441
442 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
443 getSVEStackSize(MF) || LowerQRegCopyThroughMem);
444}
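// As a sketch: a leaf function whose 32 bytes of locals fit in the red zone
// can skip the prologue entirely and address them below SP, e.g.
//     stur x0, [sp, #-32]
// relying on the ABI guarantee that this area is not clobbered asynchronously.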
445
446/// hasFP - Return true if the specified function should have a dedicated frame
447/// pointer register.
448bool AArch64FrameLowering::hasFP(const MachineFunction &MF) const {
449 const MachineFrameInfo &MFI = MF.getFrameInfo();
450 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
451
452 // Win64 EH requires a frame pointer if funclets are present, as the locals
453 // are accessed off the frame pointer in both the parent function and the
454 // funclets.
455 if (MF.hasEHFunclets())
456 return true;
457 // Retain behavior of always omitting the FP for leaf functions when possible.
458 if (MF.getTarget().Options.DisableFramePointerElim(MF))
459 return true;
460 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
461 MFI.hasStackMap() || MFI.hasPatchPoint() ||
462 RegInfo->hasStackRealignment(MF))
463 return true;
464 // With large call frames around we may need to use FP to access the scavenging
465 // emergency spill slot.
466 //
467 // Unfortunately some calls to hasFP() like machine verifier ->
468 // getReservedReg() -> hasFP in the middle of global isel are too early
469 // to know the max call frame size. Hopefully conservatively returning "true"
470 // in those cases is fine.
471 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
472 if (!MFI.isMaxCallFrameSizeComputed() ||
474 return true;
475
476 return false;
477}
478
479/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
480/// not required, we reserve argument space for call sites in the function
481/// immediately on entry to the current function. This eliminates the need for
482/// add/sub sp brackets around call sites. Returns true if the call frame is
483/// included as part of the stack frame.
484bool
485AArch64FrameLowering::hasReservedCallFrame(const MachineFunction &MF) const {
486 // The stack probing code for the dynamically allocated outgoing arguments
487 // area assumes that the stack is probed at the top - either by the prologue
488 // code, which issues a probe if `hasVarSizedObjects` return true, or by the
489 // most recent variable-sized object allocation. Changing the condition here
490 // may need to be followed up by changes to the probe issuing logic.
491 return !MF.getFrameInfo().hasVarSizedObjects();
492}
493
494MachineBasicBlock::iterator AArch64FrameLowering::eliminateCallFramePseudoInstr(
495 MachineFunction &MF, MachineBasicBlock &MBB,
496 MachineBasicBlock::iterator I) const {
497 const AArch64InstrInfo *TII =
498 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
499 const AArch64TargetLowering *TLI =
500 MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
501 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
502 DebugLoc DL = I->getDebugLoc();
503 unsigned Opc = I->getOpcode();
504 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
505 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
506
507 if (!hasReservedCallFrame(MF)) {
508 int64_t Amount = I->getOperand(0).getImm();
509 Amount = alignTo(Amount, getStackAlign());
510 if (!IsDestroy)
511 Amount = -Amount;
512
513 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
514 // doesn't have to pop anything), then the first operand will be zero too so
515 // this adjustment is a no-op.
516 if (CalleePopAmount == 0) {
517 // FIXME: in-function stack adjustment for calls is limited to 24-bits
518 // because there's no guaranteed temporary register available.
519 //
520 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
521 // 1) For an offset that fits in 12 bits, we use a single LSL #0.
522 // 2) For an offset of 12 to 24 bits, we use two instructions: one with
523 // LSL #0, and the other with LSL #12.
524 //
525 // Most call frames will be allocated at the start of a function so
526 // this is OK, but it is a limitation that needs dealing with.
527 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
528
529 if (TLI->hasInlineStackProbe(MF) &&
531 // When stack probing is enabled, the decrement of SP may need to be
532 // probed. We only need to do this if the call site needs 1024 bytes of
533 // space or more, because a region smaller than that is allowed to be
534 // unprobed at an ABI boundary. We rely on the fact that SP has been
535 // probed exactly at this point, either by the prologue or most recent
536 // dynamic allocation.
538 "non-reserved call frame without var sized objects?");
539 Register ScratchReg =
540 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
541 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
542 } else {
543 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
544 StackOffset::getFixed(Amount), TII);
545 }
546 }
547 } else if (CalleePopAmount != 0) {
548 // If the calling convention demands that the callee pops arguments from the
549 // stack, we want to add it back if we have a reserved call frame.
550 assert(CalleePopAmount < 0xffffff && "call frame too large");
551 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
552 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
553 }
554 return MBB.erase(I);
555}
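// For instance, a (hypothetical) 0x12345-byte adjustment stays inside the
// 24-bit limit described above and splits into two instructions:
//     sub sp, sp, #18, lsl #12    // 0x12000
//     sub sp, sp, #837            // 0x345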
556
557void AArch64FrameLowering::emitCalleeSavedGPRLocations(
558 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
559 MachineFunction &MF = *MBB.getParent();
560 MachineFrameInfo &MFI = MF.getFrameInfo();
562 SMEAttrs Attrs(MF.getFunction());
563 bool LocallyStreaming =
564 Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface();
565
566 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
567 if (CSI.empty())
568 return;
569
570 const TargetSubtargetInfo &STI = MF.getSubtarget();
571 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
572 const TargetInstrInfo &TII = *STI.getInstrInfo();
574
575 for (const auto &Info : CSI) {
576 unsigned FrameIdx = Info.getFrameIdx();
577 if (MFI.getStackID(FrameIdx) == TargetStackID::ScalableVector)
578 continue;
579
580 assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
581 int64_t DwarfReg = TRI.getDwarfRegNum(Info.getReg(), true);
582 int64_t Offset = MFI.getObjectOffset(FrameIdx) - getOffsetOfLocalArea();
583
584 // The location of VG will be emitted before each streaming-mode change in
585 // the function. Only locally-streaming functions require emitting the
586 // non-streaming VG location here.
587 if ((LocallyStreaming && FrameIdx == AFI->getStreamingVGIdx()) ||
588 (!LocallyStreaming &&
589 DwarfReg == TRI.getDwarfRegNum(AArch64::VG, true)))
590 continue;
591
592 unsigned CFIIndex = MF.addFrameInst(
593 MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset));
594 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
595 .addCFIIndex(CFIIndex)
597 }
598}
599
600void AArch64FrameLowering::emitCalleeSavedSVELocations(
603 MachineFrameInfo &MFI = MF.getFrameInfo();
604
605 // Add callee saved registers to move list.
606 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
607 if (CSI.empty())
608 return;
609
610 const TargetSubtargetInfo &STI = MF.getSubtarget();
611 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
612 const TargetInstrInfo &TII = *STI.getInstrInfo();
615
616 for (const auto &Info : CSI) {
617 if (!(MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
618 continue;
619
620 // Not all unwinders may know about SVE registers, so assume the lowest
621 // common denominator.
622 assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
623 unsigned Reg = Info.getReg();
624 if (!static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
625 continue;
626
628 StackOffset::getScalable(MFI.getObjectOffset(Info.getFrameIdx())) -
630
631 unsigned CFIIndex = MF.addFrameInst(createCFAOffset(TRI, Reg, Offset));
632 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
633 .addCFIIndex(CFIIndex)
635 }
636}
637
641 unsigned DwarfReg) {
642 unsigned CFIIndex =
643 MF.addFrameInst(MCCFIInstruction::createSameValue(nullptr, DwarfReg));
644 BuildMI(MBB, InsertPt, DebugLoc(), Desc).addCFIIndex(CFIIndex);
645}
646
648 MachineBasicBlock &MBB) const {
649
651 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
652 const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
653 const auto &TRI =
654 static_cast<const AArch64RegisterInfo &>(*Subtarget.getRegisterInfo());
655 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
656
657 const MCInstrDesc &CFIDesc = TII.get(TargetOpcode::CFI_INSTRUCTION);
658 DebugLoc DL;
659
660 // Reset the CFA to `SP + 0`.
662 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
663 nullptr, TRI.getDwarfRegNum(AArch64::SP, true), 0));
664 BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);
665
666 // Flip the RA sign state.
667 if (MFI.shouldSignReturnAddress(MF)) {
669 BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);
670 }
671
672 // Shadow call stack uses X18, reset it.
673 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
674 insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
675 TRI.getDwarfRegNum(AArch64::X18, true));
676
677 // Emit .cfi_same_value for callee-saved registers.
678 const std::vector<CalleeSavedInfo> &CSI =
680 for (const auto &Info : CSI) {
681 unsigned Reg = Info.getReg();
682 if (!TRI.regNeedsCFI(Reg, Reg))
683 continue;
684 insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
685 TRI.getDwarfRegNum(Reg, true));
686 }
687}
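// The net effect is equivalent to an asm-level sequence along these lines
// (a sketch; the exact register list depends on the callee-save set):
//     .cfi_def_cfa sp, 0
//     .cfi_same_value x19
//     .cfi_same_value x20
//     ...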
688
691 bool SVE) {
693 MachineFrameInfo &MFI = MF.getFrameInfo();
694
695 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
696 if (CSI.empty())
697 return;
698
699 const TargetSubtargetInfo &STI = MF.getSubtarget();
700 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
701 const TargetInstrInfo &TII = *STI.getInstrInfo();
703
704 for (const auto &Info : CSI) {
705 if (SVE !=
706 (MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
707 continue;
708
709 unsigned Reg = Info.getReg();
710 if (SVE &&
711 !static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
712 continue;
713
714 if (!Info.isRestored())
715 continue;
716
717 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createRestore(
718 nullptr, TRI.getDwarfRegNum(Info.getReg(), true)));
719 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
720 .addCFIIndex(CFIIndex)
722 }
723}
724
725void AArch64FrameLowering::emitCalleeSavedGPRRestores(
728}
729
730void AArch64FrameLowering::emitCalleeSavedSVERestores(
733}
734
735// Return the maximum possible number of bytes for `Size` due to the
736// architectural limit on the size of an SVE register.
737static int64_t upperBound(StackOffset Size) {
738 static const int64_t MAX_BYTES_PER_SCALABLE_BYTE = 16;
739 return Size.getScalable() * MAX_BYTES_PER_SCALABLE_BYTE + Size.getFixed();
740}
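// E.g. a StackOffset of 2 scalable + 64 fixed bytes occupies at most
// 2 * 16 + 64 == 96 bytes at runtime, since the architecture caps VL at 16
// times the 128-bit minimum vector length.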
741
742void AArch64FrameLowering::allocateStackSpace(
744 int64_t RealignmentPadding, StackOffset AllocSize, bool NeedsWinCFI,
745 bool *HasWinCFI, bool EmitCFI, StackOffset InitialOffset,
746 bool FollowupAllocs) const {
747
748 if (!AllocSize)
749 return;
750
751 DebugLoc DL;
753 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
754 const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
756 const MachineFrameInfo &MFI = MF.getFrameInfo();
757
758 const int64_t MaxAlign = MFI.getMaxAlign().value();
759 const uint64_t AndMask = ~(MaxAlign - 1);
760
761 if (!Subtarget.getTargetLowering()->hasInlineStackProbe(MF)) {
762 Register TargetReg = RealignmentPadding
764 : AArch64::SP;
765 // SUB Xd/SP, SP, AllocSize
766 emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
767 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
768 EmitCFI, InitialOffset);
769
770 if (RealignmentPadding) {
771 // AND SP, X9, 0b11111...0000
772 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
773 .addReg(TargetReg, RegState::Kill)
776 AFI.setStackRealigned(true);
777
778 // No need for SEH instructions here; if we're realigning the stack,
779 // we've set a frame pointer and already finished the SEH prologue.
780 assert(!NeedsWinCFI);
781 }
782 return;
783 }
784
785 //
786 // Stack probing allocation.
787 //
788
789 // Fixed length allocation. If we don't need to re-align the stack and don't
790 // have SVE objects, we can use a more efficient sequence for stack probing.
791 if (AllocSize.getScalable() == 0 && RealignmentPadding == 0) {
793 assert(ScratchReg != AArch64::NoRegister);
794 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC))
795 .addDef(ScratchReg)
796 .addImm(AllocSize.getFixed())
797 .addImm(InitialOffset.getFixed())
798 .addImm(InitialOffset.getScalable());
799 // The fixed allocation may leave unprobed bytes at the top of the
800 // stack. If we have subsequent allocations (e.g. if we have variable-sized
801 // objects), we need to issue an extra probe, so these allocations start in
802 // a known state.
803 if (FollowupAllocs) {
804 // STR XZR, [SP]
805 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
806 .addReg(AArch64::XZR)
807 .addReg(AArch64::SP)
808 .addImm(0)
810 }
811
812 return;
813 }
814
815 // Variable length allocation.
816
817 // If the (unknown) allocation size cannot exceed the probe size, decrement
818 // the stack pointer right away.
819 int64_t ProbeSize = AFI.getStackProbeSize();
820 if (upperBound(AllocSize) + RealignmentPadding <= ProbeSize) {
821 Register ScratchReg = RealignmentPadding
823 : AArch64::SP;
824 assert(ScratchReg != AArch64::NoRegister);
825 // SUB Xd, SP, AllocSize
826 emitFrameOffset(MBB, MBBI, DL, ScratchReg, AArch64::SP, -AllocSize, &TII,
827 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
828 EmitCFI, InitialOffset);
829 if (RealignmentPadding) {
830 // AND SP, Xn, 0b11111...0000
831 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
832 .addReg(ScratchReg, RegState::Kill)
835 AFI.setStackRealigned(true);
836 }
837 if (FollowupAllocs || upperBound(AllocSize) + RealignmentPadding >
839 // STR XZR, [SP]
840 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
841 .addReg(AArch64::XZR)
842 .addReg(AArch64::SP)
843 .addImm(0)
845 }
846 return;
847 }
848
849 // Emit a variable-length allocation probing loop.
850 // TODO: As an optimisation, the loop can be "unrolled" into a few parts,
851 // each of them guaranteed to adjust the stack by less than the probe size.
853 assert(TargetReg != AArch64::NoRegister);
854 // SUB Xd, SP, AllocSize
855 emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
856 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
857 EmitCFI, InitialOffset);
858 if (RealignmentPadding) {
859 // AND Xn, Xn, 0b11111...0000
860 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), TargetReg)
861 .addReg(TargetReg, RegState::Kill)
864 }
865
866 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC_VAR))
867 .addReg(TargetReg);
868 if (EmitCFI) {
869 // Set the CFA register back to SP.
870 unsigned Reg =
871 Subtarget.getRegisterInfo()->getDwarfRegNum(AArch64::SP, true);
872 unsigned CFIIndex =
874 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
875 .addCFIIndex(CFIIndex)
877 }
878 if (RealignmentPadding)
879 AFI.setStackRealigned(true);
880}
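// For the variable-length case above, PROBED_STACKALLOC_VAR later expands to
// roughly the following loop (a sketch only, assuming the default 4096-byte
// probe size; the real expansion is emitted when the pseudo is lowered):
//     sub  x9, sp, <AllocSize>    // target SP in a scratch register
// loop:
//     sub  sp, sp, #4096
//     cmp  sp, x9
//     b.le done
//     str  xzr, [sp]              // probe the freshly allocated page
//     b    loop
// done:
//     mov  sp, x9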
881
882static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE) {
883 switch (Reg.id()) {
884 default:
885 // The called routine is expected to preserve x19-x28;
886 // x29 and x30 are used as the frame pointer and link register, resp.
887 return 0;
888
889 // GPRs
890#define CASE(n) \
891 case AArch64::W##n: \
892 case AArch64::X##n: \
893 return AArch64::X##n
894 CASE(0);
895 CASE(1);
896 CASE(2);
897 CASE(3);
898 CASE(4);
899 CASE(5);
900 CASE(6);
901 CASE(7);
902 CASE(8);
903 CASE(9);
904 CASE(10);
905 CASE(11);
906 CASE(12);
907 CASE(13);
908 CASE(14);
909 CASE(15);
910 CASE(16);
911 CASE(17);
912 CASE(18);
913#undef CASE
914
915 // FPRs
916#define CASE(n) \
917 case AArch64::B##n: \
918 case AArch64::H##n: \
919 case AArch64::S##n: \
920 case AArch64::D##n: \
921 case AArch64::Q##n: \
922 return HasSVE ? AArch64::Z##n : AArch64::Q##n
923 CASE(0);
924 CASE(1);
925 CASE(2);
926 CASE(3);
927 CASE(4);
928 CASE(5);
929 CASE(6);
930 CASE(7);
931 CASE(8);
932 CASE(9);
933 CASE(10);
934 CASE(11);
935 CASE(12);
936 CASE(13);
937 CASE(14);
938 CASE(15);
939 CASE(16);
940 CASE(17);
941 CASE(18);
942 CASE(19);
943 CASE(20);
944 CASE(21);
945 CASE(22);
946 CASE(23);
947 CASE(24);
948 CASE(25);
949 CASE(26);
950 CASE(27);
951 CASE(28);
952 CASE(29);
953 CASE(30);
954 CASE(31);
955#undef CASE
956 }
957}
958
959void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
960 MachineBasicBlock &MBB) const {
961 // Insertion point.
963
964 // Fake a debug loc.
965 DebugLoc DL;
966 if (MBBI != MBB.end())
967 DL = MBBI->getDebugLoc();
968
969 const MachineFunction &MF = *MBB.getParent();
972
973 BitVector GPRsToZero(TRI.getNumRegs());
974 BitVector FPRsToZero(TRI.getNumRegs());
975 bool HasSVE = STI.hasSVE();
976 for (MCRegister Reg : RegsToZero.set_bits()) {
977 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
978 // For GPRs, we only care to clear out the 64-bit register.
979 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
980 GPRsToZero.set(XReg);
981 } else if (AArch64::FPR128RegClass.contains(Reg) ||
982 AArch64::FPR64RegClass.contains(Reg) ||
983 AArch64::FPR32RegClass.contains(Reg) ||
984 AArch64::FPR16RegClass.contains(Reg) ||
985 AArch64::FPR8RegClass.contains(Reg)) {
986 // For FPRs, clear the widest form of the register (Q, or Z with SVE).
987 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
988 FPRsToZero.set(XReg);
989 }
990 }
991
992 const AArch64InstrInfo &TII = *STI.getInstrInfo();
993
994 // Zero out GPRs.
995 for (MCRegister Reg : GPRsToZero.set_bits())
996 TII.buildClearRegister(Reg, MBB, MBBI, DL);
997
998 // Zero out FP/vector registers.
999 for (MCRegister Reg : FPRsToZero.set_bits())
1000 TII.buildClearRegister(Reg, MBB, MBBI, DL);
1001
1002 if (HasSVE) {
1003 for (MCRegister PReg :
1004 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
1005 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
1006 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
1007 AArch64::P15}) {
1008 if (RegsToZero[PReg])
1009 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
1010 }
1011 }
1012}
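// For example, under clang's -fzero-call-used-regs=used-gpr, a function that
// used x0 and x1 would get, just before returning (a sketch):
//     mov x0, xzr
//     mov x1, xzr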
1013
1015 const MachineBasicBlock &MBB) {
1016 const MachineFunction *MF = MBB.getParent();
1017 LiveRegs.addLiveIns(MBB);
1018 // Mark callee saved registers as used so we will not choose them.
1019 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
1020 for (unsigned i = 0; CSRegs[i]; ++i)
1021 LiveRegs.addReg(CSRegs[i]);
1022}
1023
1024// Find a scratch register that we can use at the start of the prologue to
1025// re-align the stack pointer. We avoid using callee-save registers since they
1026// may appear to be free when this is called from canUseAsPrologue (during
1027// shrink wrapping), but then no longer be free when this is called from
1028// emitPrologue.
1029//
1030// FIXME: This is a bit conservative, since in the above case we could use one
1031// of the callee-save registers as a scratch temp to re-align the stack pointer,
1032// but we would then have to make sure that we were in fact saving at least one
1033// callee-save register in the prologue, which is additional complexity that
1034// doesn't seem worth the benefit.
1036 MachineFunction *MF = MBB->getParent();
1037
1038 // If MBB is an entry block, use X9 as the scratch register.
1039 // However, preserve_none functions may pass arguments in X9, in which
1040 // case we prefer to pick an available register below.
1041 if (&MF->front() == MBB &&
1043 return AArch64::X9;
1044
1045 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1046 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1047 LivePhysRegs LiveRegs(TRI);
1048 getLiveRegsForEntryMBB(LiveRegs, *MBB);
1049
1050 // Prefer X9 since it was historically used for the prologue scratch reg.
1051 const MachineRegisterInfo &MRI = MF->getRegInfo();
1052 if (LiveRegs.available(MRI, AArch64::X9))
1053 return AArch64::X9;
1054
1055 for (unsigned Reg : AArch64::GPR64RegClass) {
1056 if (LiveRegs.available(MRI, Reg))
1057 return Reg;
1058 }
1059 return AArch64::NoRegister;
1060}
1061
1063 const MachineBasicBlock &MBB) const {
1064 const MachineFunction *MF = MBB.getParent();
1065 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
1066 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1067 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1068 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
1070
1071 if (AFI->hasSwiftAsyncContext()) {
1072 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1073 const MachineRegisterInfo &MRI = MF->getRegInfo();
1074 LivePhysRegs LiveRegs(TRI);
1075 getLiveRegsForEntryMBB(LiveRegs, MBB);
1076 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
1077 // available.
1078 if (!LiveRegs.available(MRI, AArch64::X16) ||
1079 !LiveRegs.available(MRI, AArch64::X17))
1080 return false;
1081 }
1082
1083 // Certain stack probing sequences might clobber flags, in which case we
1084 // can't use the block as a prologue if the flags register is a live-in.
1086 MBB.isLiveIn(AArch64::NZCV))
1087 return false;
1088
1089 // Don't need a scratch register if we're not going to re-align the stack or
1090 // emit stack probes.
1091 if (!RegInfo->hasStackRealignment(*MF) && !TLI->hasInlineStackProbe(*MF))
1092 return true;
1093 // Otherwise, we can use any block as long as it has a scratch register
1094 // available.
1095 return findScratchNonCalleeSaveRegister(TmpMBB) != AArch64::NoRegister;
1096}
1097
1099 uint64_t StackSizeInBytes) {
1100 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1102 // TODO: When implementing stack protectors, take that into account
1103 // for the probe threshold.
1104 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
1105 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
1106}
1107
1108static bool needsWinCFI(const MachineFunction &MF) {
1109 const Function &F = MF.getFunction();
1110 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
1111 F.needsUnwindTableEntry();
1112}
1113
1114bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
1115 MachineFunction &MF, uint64_t StackBumpBytes) const {
1117 const MachineFrameInfo &MFI = MF.getFrameInfo();
1118 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1119 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1120 if (homogeneousPrologEpilog(MF))
1121 return false;
1122
1123 if (AFI->getLocalStackSize() == 0)
1124 return false;
1125
1126 // For WinCFI, if optimizing for size, prefer to not combine the stack bump
1127 // (to force a stp with predecrement) to match the packed unwind format,
1128 // provided that there actually are any callee saved registers to merge the
1129 // decrement with.
1130 // This is potentially marginally slower, but allows using the packed
1131 // unwind format for functions that both have a local area and callee saved
1132 // registers. Using the packed unwind format notably reduces the size of
1133 // the unwind info.
1134 if (needsWinCFI(MF) && AFI->getCalleeSavedStackSize() > 0 &&
1135 MF.getFunction().hasOptSize())
1136 return false;
1137
1138 // 512 is the maximum immediate for stp/ldp that will be used for
1139 // callee-save save/restores
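1139 // (ldp/stp of X registers encode a signed 7-bit immediate scaled by 8,
1139 // i.e. byte offsets in [-512, 504], which is where the cutoff comes from.)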
1140 if (StackBumpBytes >= 512 || windowsRequiresStackProbe(MF, StackBumpBytes))
1141 return false;
1142
1143 if (MFI.hasVarSizedObjects())
1144 return false;
1145
1146 if (RegInfo->hasStackRealignment(MF))
1147 return false;
1148
1149 // This isn't strictly necessary, but it simplifies things a bit since the
1150 // current RedZone handling code assumes the SP is adjusted by the
1151 // callee-save save/restore code.
1152 if (canUseRedZone(MF))
1153 return false;
1154
1155 // When there is an SVE area on the stack, always allocate the
1156 // callee-saves and spills/locals separately.
1157 if (getSVEStackSize(MF))
1158 return false;
1159
1160 return true;
1161}
1162
1163bool AArch64FrameLowering::shouldCombineCSRLocalStackBumpInEpilogue(
1164 MachineBasicBlock &MBB, unsigned StackBumpBytes) const {
1165 if (!shouldCombineCSRLocalStackBump(*MBB.getParent(), StackBumpBytes))
1166 return false;
1167
1168 if (MBB.empty())
1169 return true;
1170
1171 // Disable combined SP bump if the last instruction is an MTE tag store. It
1172 // is almost always better to merge SP adjustment into those instructions.
1175 while (LastI != Begin) {
1176 --LastI;
1177 if (LastI->isTransient())
1178 continue;
1179 if (!LastI->getFlag(MachineInstr::FrameDestroy))
1180 break;
1181 }
1182 switch (LastI->getOpcode()) {
1183 case AArch64::STGloop:
1184 case AArch64::STZGloop:
1185 case AArch64::STGi:
1186 case AArch64::STZGi:
1187 case AArch64::ST2Gi:
1188 case AArch64::STZ2Gi:
1189 return false;
1190 default:
1191 return true;
1192 }
1193 llvm_unreachable("unreachable");
1194}
1195
1196// Given a load or a store instruction, generate an appropriate unwinding SEH
1197// code on Windows.
1199 const TargetInstrInfo &TII,
1200 MachineInstr::MIFlag Flag) {
1201 unsigned Opc = MBBI->getOpcode();
1203 MachineFunction &MF = *MBB->getParent();
1204 DebugLoc DL = MBBI->getDebugLoc();
1205 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1206 int Imm = MBBI->getOperand(ImmIdx).getImm();
1208 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1209 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1210
1211 switch (Opc) {
1212 default:
1213 llvm_unreachable("No SEH Opcode for this instruction");
1214 case AArch64::LDPDpost:
1215 Imm = -Imm;
1216 [[fallthrough]];
1217 case AArch64::STPDpre: {
1218 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1219 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1220 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1221 .addImm(Reg0)
1222 .addImm(Reg1)
1223 .addImm(Imm * 8)
1224 .setMIFlag(Flag);
1225 break;
1226 }
1227 case AArch64::LDPXpost:
1228 Imm = -Imm;
1229 [[fallthrough]];
1230 case AArch64::STPXpre: {
1231 Register Reg0 = MBBI->getOperand(1).getReg();
1232 Register Reg1 = MBBI->getOperand(2).getReg();
1233 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1234 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1235 .addImm(Imm * 8)
1236 .setMIFlag(Flag);
1237 else
1238 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1239 .addImm(RegInfo->getSEHRegNum(Reg0))
1240 .addImm(RegInfo->getSEHRegNum(Reg1))
1241 .addImm(Imm * 8)
1242 .setMIFlag(Flag);
1243 break;
1244 }
1245 case AArch64::LDRDpost:
1246 Imm = -Imm;
1247 [[fallthrough]];
1248 case AArch64::STRDpre: {
1249 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1250 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1251 .addImm(Reg)
1252 .addImm(Imm)
1253 .setMIFlag(Flag);
1254 break;
1255 }
1256 case AArch64::LDRXpost:
1257 Imm = -Imm;
1258 [[fallthrough]];
1259 case AArch64::STRXpre: {
1260 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1261 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1262 .addImm(Reg)
1263 .addImm(Imm)
1264 .setMIFlag(Flag);
1265 break;
1266 }
1267 case AArch64::STPDi:
1268 case AArch64::LDPDi: {
1269 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1270 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1271 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1272 .addImm(Reg0)
1273 .addImm(Reg1)
1274 .addImm(Imm * 8)
1275 .setMIFlag(Flag);
1276 break;
1277 }
1278 case AArch64::STPXi:
1279 case AArch64::LDPXi: {
1280 Register Reg0 = MBBI->getOperand(0).getReg();
1281 Register Reg1 = MBBI->getOperand(1).getReg();
1282 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1283 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1284 .addImm(Imm * 8)
1285 .setMIFlag(Flag);
1286 else
1287 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1288 .addImm(RegInfo->getSEHRegNum(Reg0))
1289 .addImm(RegInfo->getSEHRegNum(Reg1))
1290 .addImm(Imm * 8)
1291 .setMIFlag(Flag);
1292 break;
1293 }
1294 case AArch64::STRXui:
1295 case AArch64::LDRXui: {
1296 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1297 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1298 .addImm(Reg)
1299 .addImm(Imm * 8)
1300 .setMIFlag(Flag);
1301 break;
1302 }
1303 case AArch64::STRDui:
1304 case AArch64::LDRDui: {
1305 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1306 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1307 .addImm(Reg)
1308 .addImm(Imm * 8)
1309 .setMIFlag(Flag);
1310 break;
1311 }
1312 case AArch64::STPQi:
1313 case AArch64::LDPQi: {
1314 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1315 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1316 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1317 .addImm(Reg0)
1318 .addImm(Reg1)
1319 .addImm(Imm * 16)
1320 .setMIFlag(Flag);
1321 break;
1322 }
1323 case AArch64::LDPQpost:
1324 Imm = -Imm;
1325 [[fallthrough]];
1326 case AArch64::STPQpre: {
1327 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1328 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1329 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1330 .addImm(Reg0)
1331 .addImm(Reg1)
1332 .addImm(Imm * 16)
1333 .setMIFlag(Flag);
1334 break;
1335 }
1336 }
1337 auto I = MBB->insertAfter(MBBI, MIB);
1338 return I;
1339}
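// For example (a sketch), a prologue save such as
//     stp x19, x20, [sp, #16]
// is mirrored by a SEH_SaveRegP pseudo, later printed as a .seh_save_regp
// directive, so the Windows unwinder can replay the save.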
1340
1341// Fix up the SEH opcode associated with the save/restore instruction.
1343 unsigned LocalStackSize) {
1344 MachineOperand *ImmOpnd = nullptr;
1345 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1346 switch (MBBI->getOpcode()) {
1347 default:
1348 llvm_unreachable("Fix the offset in the SEH instruction");
1349 case AArch64::SEH_SaveFPLR:
1350 case AArch64::SEH_SaveRegP:
1351 case AArch64::SEH_SaveReg:
1352 case AArch64::SEH_SaveFRegP:
1353 case AArch64::SEH_SaveFReg:
1354 case AArch64::SEH_SaveAnyRegQP:
1355 case AArch64::SEH_SaveAnyRegQPX:
1356 ImmOpnd = &MBBI->getOperand(ImmIdx);
1357 break;
1358 }
1359 if (ImmOpnd)
1360 ImmOpnd->setImm(ImmOpnd->getImm() + LocalStackSize);
1361}
1362
1365 return AFI->hasStreamingModeChanges() &&
1366 !MF.getSubtarget<AArch64Subtarget>().hasSVE();
1367}
1368
1369static bool isVGInstruction(MachineBasicBlock::iterator MBBI) {
1370 unsigned Opc = MBBI->getOpcode();
1371 if (Opc == AArch64::CNTD_XPiI || Opc == AArch64::RDSVLI_XI ||
1372 Opc == AArch64::UBFMXri)
1373 return true;
1374
1375 if (requiresGetVGCall(*MBBI->getMF())) {
1376 if (Opc == AArch64::ORRXrr)
1377 return true;
1378
1379 if (Opc == AArch64::BL) {
1380 auto Op1 = MBBI->getOperand(0);
1381 return Op1.isSymbol() &&
1382 (StringRef(Op1.getSymbolName()) == "__arm_get_current_vg");
1383 }
1384 }
1385
1386 return false;
1387}
1388
1389// Convert callee-save register save/restore instruction to do stack pointer
1390// decrement/increment to allocate/deallocate the callee-save stack area by
1391// converting store/load to use pre/post increment version.
1394 const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc,
1395 bool NeedsWinCFI, bool *HasWinCFI, bool EmitCFI,
1397 int CFAOffset = 0) {
1398 unsigned NewOpc;
1399
1400 // If the function contains streaming mode changes, we expect instructions
1401 // to calculate the value of VG before spilling. For locally-streaming
1402 // functions, we need to do this for both the streaming and non-streaming
1403 // vector length. Move past these instructions if necessary.
1404 MachineFunction &MF = *MBB.getParent();
1406 if (AFI->hasStreamingModeChanges())
1407 while (isVGInstruction(MBBI))
1408 ++MBBI;
1409
1410 switch (MBBI->getOpcode()) {
1411 default:
1412 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1413 case AArch64::STPXi:
1414 NewOpc = AArch64::STPXpre;
1415 break;
1416 case AArch64::STPDi:
1417 NewOpc = AArch64::STPDpre;
1418 break;
1419 case AArch64::STPQi:
1420 NewOpc = AArch64::STPQpre;
1421 break;
1422 case AArch64::STRXui:
1423 NewOpc = AArch64::STRXpre;
1424 break;
1425 case AArch64::STRDui:
1426 NewOpc = AArch64::STRDpre;
1427 break;
1428 case AArch64::STRQui:
1429 NewOpc = AArch64::STRQpre;
1430 break;
1431 case AArch64::LDPXi:
1432 NewOpc = AArch64::LDPXpost;
1433 break;
1434 case AArch64::LDPDi:
1435 NewOpc = AArch64::LDPDpost;
1436 break;
1437 case AArch64::LDPQi:
1438 NewOpc = AArch64::LDPQpost;
1439 break;
1440 case AArch64::LDRXui:
1441 NewOpc = AArch64::LDRXpost;
1442 break;
1443 case AArch64::LDRDui:
1444 NewOpc = AArch64::LDRDpost;
1445 break;
1446 case AArch64::LDRQui:
1447 NewOpc = AArch64::LDRQpost;
1448 break;
1449 }
1450 // Get rid of the SEH code associated with the old instruction.
1451 if (NeedsWinCFI) {
1452 auto SEH = std::next(MBBI);
1454 SEH->eraseFromParent();
1455 }
1456
1457 TypeSize Scale = TypeSize::getFixed(1), Width = TypeSize::getFixed(0);
1458 int64_t MinOffset, MaxOffset;
1459 bool Success = static_cast<const AArch64InstrInfo *>(TII)->getMemOpInfo(
1460 NewOpc, Scale, Width, MinOffset, MaxOffset);
1461 (void)Success;
1462 assert(Success && "unknown load/store opcode");
1463
1464 // If the first store isn't right where we want SP then we can't fold the
1465 // update in, so create a normal arithmetic instruction instead.
1466 if (MBBI->getOperand(MBBI->getNumOperands() - 1).getImm() != 0 ||
1467 CSStackSizeInc < MinOffset || CSStackSizeInc > MaxOffset) {
1468 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1469 StackOffset::getFixed(CSStackSizeInc), TII, FrameFlag,
1470 false, false, nullptr, EmitCFI,
1471 StackOffset::getFixed(CFAOffset));
1472
1473 return std::prev(MBBI);
1474 }
1475
1476 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(NewOpc));
1477 MIB.addReg(AArch64::SP, RegState::Define);
1478
1479 // Copy all operands other than the immediate offset.
1480 unsigned OpndIdx = 0;
1481 for (unsigned OpndEnd = MBBI->getNumOperands() - 1; OpndIdx < OpndEnd;
1482 ++OpndIdx)
1483 MIB.add(MBBI->getOperand(OpndIdx));
1484
1485 assert(MBBI->getOperand(OpndIdx).getImm() == 0 &&
1486 "Unexpected immediate offset in first/last callee-save save/restore "
1487 "instruction!");
1488 assert(MBBI->getOperand(OpndIdx - 1).getReg() == AArch64::SP &&
1489 "Unexpected base register in callee-save save/restore instruction!");
1490 assert(CSStackSizeInc % Scale == 0);
1491 MIB.addImm(CSStackSizeInc / (int)Scale);
1492
1493 MIB.setMIFlags(MBBI->getFlags());
1494 MIB.setMemRefs(MBBI->memoperands());
1495
1496 // Generate a new SEH code that corresponds to the new instruction.
1497 if (NeedsWinCFI) {
1498 *HasWinCFI = true;
1499 InsertSEH(*MIB, *TII, FrameFlag);
1500 }
1501
1502 if (EmitCFI) {
1503 unsigned CFIIndex = MF.addFrameInst(
1504 MCCFIInstruction::cfiDefCfaOffset(nullptr, CFAOffset - CSStackSizeInc));
1505 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1506 .addCFIIndex(CFIIndex)
1507 .setMIFlags(FrameFlag);
1508 }
1509
1510 return std::prev(MBB.erase(MBBI));
1511}
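// For example (a sketch), when allocating a 16-byte callee-save area the save
//     stp x29, x30, [sp, #0]
// becomes the pre-incrementing form
//     stp x29, x30, [sp, #-16]!
// so no separate SP decrement is needed.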
1512
1513// Fixup callee-save register save/restore instructions to take into account
1514// combined SP bump by adding the local stack size to the stack offsets.
1516 uint64_t LocalStackSize,
1517 bool NeedsWinCFI,
1518 bool *HasWinCFI) {
1520 return;
1521
1522 unsigned Opc = MI.getOpcode();
1523 unsigned Scale;
1524 switch (Opc) {
1525 case AArch64::STPXi:
1526 case AArch64::STRXui:
1527 case AArch64::STPDi:
1528 case AArch64::STRDui:
1529 case AArch64::LDPXi:
1530 case AArch64::LDRXui:
1531 case AArch64::LDPDi:
1532 case AArch64::LDRDui:
1533 Scale = 8;
1534 break;
1535 case AArch64::STPQi:
1536 case AArch64::STRQui:
1537 case AArch64::LDPQi:
1538 case AArch64::LDRQui:
1539 Scale = 16;
1540 break;
1541 default:
1542 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1543 }
1544
1545 unsigned OffsetIdx = MI.getNumExplicitOperands() - 1;
1546 assert(MI.getOperand(OffsetIdx - 1).getReg() == AArch64::SP &&
1547 "Unexpected base register in callee-save save/restore instruction!");
1548 // Last operand is immediate offset that needs fixing.
1549 MachineOperand &OffsetOpnd = MI.getOperand(OffsetIdx);
1550 // All generated opcodes have scaled offsets.
1551 assert(LocalStackSize % Scale == 0);
1552 OffsetOpnd.setImm(OffsetOpnd.getImm() + LocalStackSize / Scale);
1553
1554 if (NeedsWinCFI) {
1555 *HasWinCFI = true;
1556 auto MBBI = std::next(MachineBasicBlock::iterator(MI));
1557 assert(MBBI != MI.getParent()->end() && "Expecting a valid instruction");
1559 "Expecting a SEH instruction");
1560 fixupSEHOpcode(MBBI, LocalStackSize);
1561 }
1562}
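// E.g. (a sketch) with a 32-byte local area folded into the initial SP bump,
// a save previously encoded as
//     stp x19, x20, [sp, #16]
// is rewritten to
//     stp x19, x20, [sp, #48]
// (scaled immediate 2 + 32 / 8 == 6).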
1563
1564static bool isTargetWindows(const MachineFunction &MF) {
1566}
1567
1568// Convenience function to determine whether I is an SVE callee save.
1570 switch (I->getOpcode()) {
1571 default:
1572 return false;
1573 case AArch64::PTRUE_C_B:
1574 case AArch64::LD1B_2Z_IMM:
1575 case AArch64::ST1B_2Z_IMM:
1576 case AArch64::STR_ZXI:
1577 case AArch64::STR_PXI:
1578 case AArch64::LDR_ZXI:
1579 case AArch64::LDR_PXI:
1580 return I->getFlag(MachineInstr::FrameSetup) ||
1581 I->getFlag(MachineInstr::FrameDestroy);
1582 }
1583}
1584
1586 MachineFunction &MF,
1589 const DebugLoc &DL, bool NeedsWinCFI,
1590 bool NeedsUnwindInfo) {
1591 // Shadow call stack prolog: str x30, [x18], #8
1592 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXpost))
1593 .addReg(AArch64::X18, RegState::Define)
1594 .addReg(AArch64::LR)
1595 .addReg(AArch64::X18)
1596 .addImm(8)
1598
1599 // This instruction also makes x18 live-in to the entry block.
1600 MBB.addLiveIn(AArch64::X18);
1601
1602 if (NeedsWinCFI)
1603 BuildMI(MBB, MBBI, DL, TII.get(AArch64::SEH_Nop))
1605
1606 if (NeedsUnwindInfo) {
1607 // Emit a CFI instruction that causes 8 to be subtracted from the value of
1608 // x18 when unwinding past this frame.
1609 static const char CFIInst[] = {
1610 dwarf::DW_CFA_val_expression,
1611 18, // register
1612 2, // length
1613 static_cast<char>(unsigned(dwarf::DW_OP_breg18)),
1614 static_cast<char>(-8) & 0x7f, // addend (sleb128)
1615 };
1616 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createEscape(
1617 nullptr, StringRef(CFIInst, sizeof(CFIInst))));
1618 BuildMI(MBB, MBBI, DL, TII.get(AArch64::CFI_INSTRUCTION))
1619 .addCFIIndex(CFIIndex)
1621 }
1622}
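// (The escape bytes above decode as DW_CFA_val_expression 18, length 2,
// DW_OP_breg18, sleb128(-8): the caller's x18 is the current x18 minus 8.)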
1623
1625 MachineFunction &MF,
1628 const DebugLoc &DL) {
1629 // Shadow call stack epilog: ldr x30, [x18, #-8]!
1630 BuildMI(MBB, MBBI, DL, TII.get(AArch64::LDRXpre))
1631 .addReg(AArch64::X18, RegState::Define)
1632 .addReg(AArch64::LR, RegState::Define)
1633 .addReg(AArch64::X18)
1634 .addImm(-8)
1636
1638 unsigned CFIIndex =
1640 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
1641 .addCFIIndex(CFIIndex)
1643 }
1644}
1645
1646// Define the current CFA rule to use the provided FP.
1649 const DebugLoc &DL, unsigned FixedObject) {
1652 const TargetInstrInfo *TII = STI.getInstrInfo();
1654
1655 const int OffsetToFirstCalleeSaveFromFP =
1658 Register FramePtr = TRI->getFrameRegister(MF);
1659 unsigned Reg = TRI->getDwarfRegNum(FramePtr, true);
1660 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
1661 nullptr, Reg, FixedObject - OffsetToFirstCalleeSaveFromFP));
1662 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1663 .addCFIIndex(CFIIndex)
1665}
1666
1667#ifndef NDEBUG
1668/// Collect live registers from the end of \p MI's parent up to (including) \p
1669/// MI in \p LiveRegs.
1671 LivePhysRegs &LiveRegs) {
1672
1673 MachineBasicBlock &MBB = *MI.getParent();
1674 LiveRegs.addLiveOuts(MBB);
1675 for (const MachineInstr &MI :
1676 reverse(make_range(MI.getIterator(), MBB.instr_end())))
1677 LiveRegs.stepBackward(MI);
1678}
1679#endif
1680
1682 MachineBasicBlock &MBB) const {
1684 const MachineFrameInfo &MFI = MF.getFrameInfo();
1685 const Function &F = MF.getFunction();
1686 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1687 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1688 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1689
1690 MachineModuleInfo &MMI = MF.getMMI();
1692 bool EmitCFI = AFI->needsDwarfUnwindInfo(MF);
1693 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
1694 bool HasFP = hasFP(MF);
1695 bool NeedsWinCFI = needsWinCFI(MF);
1696 bool HasWinCFI = false;
1697 auto Cleanup = make_scope_exit([&]() { MF.setHasWinCFI(HasWinCFI); });
1698
1700#ifndef NDEBUG
1702 // Collect live registers from the end of MBB up to the start of the existing
1703 // frame setup instructions.
1704 MachineBasicBlock::iterator NonFrameStart = MBB.begin();
1705 while (NonFrameStart != End &&
1706 NonFrameStart->getFlag(MachineInstr::FrameSetup))
1707 ++NonFrameStart;
1708
1709 LivePhysRegs LiveRegs(*TRI);
1710 if (NonFrameStart != MBB.end()) {
1711 getLivePhysRegsUpTo(*NonFrameStart, *TRI, LiveRegs);
1712 // Ignore registers used for stack management for now.
1713 LiveRegs.removeReg(AArch64::SP);
1714 LiveRegs.removeReg(AArch64::X19);
1715 LiveRegs.removeReg(AArch64::FP);
1716 LiveRegs.removeReg(AArch64::LR);
1717
1718 // X0 will be clobbered by a call to __arm_get_current_vg in the prologue.
1719 // This is necessary to spill VG if required where SVE is unavailable, but
1720 // X0 is preserved around this call.
1721 if (requiresGetVGCall(MF))
1722 LiveRegs.removeReg(AArch64::X0);
1723 }
1724
1725 auto VerifyClobberOnExit = make_scope_exit([&]() {
1726 if (NonFrameStart == MBB.end())
1727 return;
1728 // Check if any of the newly inserted instructions clobber any of the live registers.
1729 for (MachineInstr &MI :
1730 make_range(MBB.instr_begin(), NonFrameStart->getIterator())) {
1731 for (auto &Op : MI.operands())
1732 if (Op.isReg() && Op.isDef())
1733 assert(!LiveRegs.contains(Op.getReg()) &&
1734 "live register clobbered by inserted prologue instructions");
1735 }
1736 });
1737#endif
1738
1739 bool IsFunclet = MBB.isEHFuncletEntry();
1740
1741 // At this point, we're going to decide whether or not the function uses a
1742 // redzone. In most cases, the function doesn't have a redzone so let's
1743 // assume that's false and set it to true in the case that there's a redzone.
1744 AFI->setHasRedZone(false);
1745
1746 // Debug location must be unknown since the first debug location is used
1747 // to determine the end of the prologue.
1748 DebugLoc DL;
1749
1750 const auto &MFnI = *MF.getInfo<AArch64FunctionInfo>();
1751 if (MFnI.needsShadowCallStackPrologueEpilogue(MF))
1752 emitShadowCallStackPrologue(*TII, MF, MBB, MBBI, DL, NeedsWinCFI,
1753 MFnI.needsDwarfUnwindInfo(MF));
1754
1755 if (MFnI.shouldSignReturnAddress(MF)) {
1756 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1757 .setMIFlag(MachineInstr::FrameSetup);
1758 if (NeedsWinCFI)
1759 HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
1760 }
1761
1762 if (EmitCFI && MFnI.isMTETagged()) {
1763 BuildMI(MBB, MBBI, DL, TII->get(AArch64::EMITMTETAGGED))
1764 .setMIFlag(MachineInstr::FrameSetup);
1765 }
1766
1767 // We signal the presence of a Swift extended frame to external tools by
1768 // storing FP with 0b0001 in bits 63:60. In normal userland operation a simple
1769 // ORR is sufficient; it is assumed a Swift kernel would initialize the TBI
1770 // bits so that is still true.
1771 if (HasFP && AFI->hasSwiftAsyncContext()) {
1772 switch (MF.getTarget().Options.SwiftAsyncFramePointer) {
1773 case SwiftAsyncFramePointerMode::DeploymentBased:
1774 if (Subtarget.swiftAsyncContextIsDynamicallySet()) {
1775 // The special symbol below is absolute and has a *value* that can be
1776 // combined with the frame pointer to signal an extended frame.
1777 BuildMI(MBB, MBBI, DL, TII->get(AArch64::LOADgot), AArch64::X16)
1778 .addExternalSymbol("swift_async_extendedFramePointerFlags",
1780 if (NeedsWinCFI) {
1781 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1782 .setMIFlag(MachineInstr::FrameSetup);
1783 HasWinCFI = true;
1784 }
1785 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXrs), AArch64::FP)
1786 .addUse(AArch64::FP)
1787 .addUse(AArch64::X16)
1788 .addImm(Subtarget.isTargetILP32() ? 32 : 0);
1789 if (NeedsWinCFI) {
1790 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1791 .setMIFlag(MachineInstr::FrameSetup);
1792 HasWinCFI = true;
1793 }
1794 break;
1795 }
1796 [[fallthrough]];
1797
1798 case SwiftAsyncFramePointerMode::Always:
1799 // ORR x29, x29, #0x1000_0000_0000_0000
1800 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXri), AArch64::FP)
1801 .addUse(AArch64::FP)
1802 .addImm(0x1100)
1803 .setMIFlag(MachineInstr::FrameSetup);
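// (0x1100 is the encoded logical immediate N:immr:imms = 1:000100:000000,
// i.e. the 64-bit constant 0x1000000000000000 with only bit 60 set.)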
1804 if (NeedsWinCFI) {
1805 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1806 .setMIFlag(MachineInstr::FrameSetup);
1807 HasWinCFI = true;
1808 }
1809 break;
1810
1811 case SwiftAsyncFramePointerMode::Never:
1812 break;
1813 }
1814 }
1815
1816 // All calls are tail calls in GHC calling conv, and functions have no
1817 // prologue/epilogue.
1818 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
1819 return;
1820
1821 // Set tagged base pointer to the requested stack slot.
1822 // Ideally it should match SP value after prologue.
1823 std::optional<int> TBPI = AFI->getTaggedBasePointerIndex();
1824 if (TBPI)
1825 AFI->setTaggedBasePointerOffset(-MFI.getObjectOffset(*TBPI));
1826 else
1827 AFI->setTaggedBasePointerOffset(MFI.getStackSize());
1828
1829 const StackOffset &SVEStackSize = getSVEStackSize(MF);
1830
1831 // getStackSize() includes all the locals in its size calculation. We don't
1832 // include these locals when computing the stack size of a funclet, as they
1833 // are allocated in the parent's stack frame and accessed via the frame
1834 // pointer from the funclet. We only save the callee saved registers in the
1835 // funclet, which are really the callee saved registers of the parent
1836 // function, including the funclet.
1837 int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
1838 : MFI.getStackSize();
1839 if (!AFI->hasStackFrame() && !windowsRequiresStackProbe(MF, NumBytes)) {
1840 assert(!HasFP && "unexpected function without stack frame but with FP");
1841 assert(!SVEStackSize &&
1842 "unexpected function without stack frame but with SVE objects");
1843 // All of the stack allocation is for locals.
1844 AFI->setLocalStackSize(NumBytes);
1845 if (!NumBytes)
1846 return;
1847 // REDZONE: If the stack size is less than 128 bytes, we don't need
1848 // to actually allocate.
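// For example, a Darwin leaf function with 96 bytes of locals leaves SP
// untouched and addresses them at negative offsets below SP.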
1849 if (canUseRedZone(MF)) {
1850 AFI->setHasRedZone(true);
1851 ++NumRedZoneFunctions;
1852 } else {
1853 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1854 StackOffset::getFixed(-NumBytes), TII,
1855 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
1856 if (EmitCFI) {
1857 // Label used to tie together the PROLOG_LABEL and the MachineMoves.
1858 MCSymbol *FrameLabel = MMI.getContext().createTempSymbol();
1859 // Encode the stack size of the leaf function.
1860 unsigned CFIIndex = MF.addFrameInst(
1861 MCCFIInstruction::cfiDefCfaOffset(FrameLabel, NumBytes));
1862 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1863 .addCFIIndex(CFIIndex)
1864 .setMIFlags(MachineInstr::FrameSetup);
1865 }
1866 }
1867
1868 if (NeedsWinCFI) {
1869 HasWinCFI = true;
1870 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1871 .setMIFlag(MachineInstr::FrameSetup);
1872 }
1873
1874 return;
1875 }
1876
1877 bool IsWin64 =
1878 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv());
1879 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
1880
1881 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
1882 // All of the remaining stack allocations are for locals.
1883 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
1884 bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);
1885 bool HomPrologEpilog = homogeneousPrologEpilog(MF);
1886 if (CombineSPBump) {
1887 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
1888 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1889 StackOffset::getFixed(-NumBytes), TII,
1890 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI,
1891 EmitAsyncCFI);
1892 NumBytes = 0;
1893 } else if (HomPrologEpilog) {
1894 // Stack has already been adjusted.
1895 NumBytes -= PrologueSaveSize;
1896 } else if (PrologueSaveSize != 0) {
1897 MBBI = convertCalleeSaveRestoreToSPPrePostIncDec(
1898 MBB, MBBI, DL, TII, -PrologueSaveSize, NeedsWinCFI, &HasWinCFI,
1899 EmitAsyncCFI);
1900 NumBytes -= PrologueSaveSize;
1901 }
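// Sketch of the conversion requested above: the first callee-save store,
// e.g. "stp x29, x30, [sp, #0]", is rewritten to the pre-decrementing form
// "stp x29, x30, [sp, #-PrologueSaveSize]!", folding the SP bump into it.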
1902 assert(NumBytes >= 0 && "Negative stack allocation size!?");
1903
1904 // Move past the saves of the callee-saved registers, fixing up the offsets
1905 // and pre-inc if we decided to combine the callee-save and local stack
1906 // pointer bump above.
1907 while (MBBI != End && MBBI->getFlag(MachineInstr::FrameSetup) &&
1908 !IsSVECalleeSave(MBBI)) {
1909 // Move past instructions generated to calculate VG
1910 if (AFI->hasStreamingModeChanges())
1911 while (isVGInstruction(MBBI))
1912 ++MBBI;
1913
1914 if (CombineSPBump)
1915 fixupCalleeSaveRestoreStackOffset(*MBBI, AFI->getLocalStackSize(),
1916 NeedsWinCFI, &HasWinCFI);
1917 ++MBBI;
1918 }
1919
1920 // For funclets the FP belongs to the containing function.
1921 if (!IsFunclet && HasFP) {
1922 // Only set up FP if we actually need to.
1923 int64_t FPOffset = AFI->getCalleeSaveBaseToFrameRecordOffset();
1924
1925 if (CombineSPBump)
1926 FPOffset += AFI->getLocalStackSize();
1927
1928 if (AFI->hasSwiftAsyncContext()) {
1929 // Before we update the live FP we have to ensure there's a valid (or
1930 // null) asynchronous context in its slot just before FP in the frame
1931 // record, so store it now.
1932 const auto &Attrs = MF.getFunction().getAttributes();
1933 bool HaveInitialContext = Attrs.hasAttrSomewhere(Attribute::SwiftAsync);
1934 if (HaveInitialContext)
1935 MBB.addLiveIn(AArch64::X22);
1936 Register Reg = HaveInitialContext ? AArch64::X22 : AArch64::XZR;
1937 BuildMI(MBB, MBBI, DL, TII->get(AArch64::StoreSwiftAsyncContext))
1938 .addUse(Reg)
1939 .addUse(AArch64::SP)
1940 .addImm(FPOffset - 8)
1941 .setMIFlag(MachineInstr::FrameSetup);
1942 if (NeedsWinCFI) {
1943 // WinCFI and arm64e, where StoreSwiftAsyncContext is expanded
1944 // to multiple instructions, should be mutually-exclusive.
1945 assert(Subtarget.getTargetTriple().getArchName() != "arm64e");
1946 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1947 .setMIFlag(MachineInstr::FrameSetup);
1948 HasWinCFI = true;
1949 }
1950 }
1951
1952 if (HomPrologEpilog) {
1953 auto Prolog = MBBI;
1954 --Prolog;
1955 assert(Prolog->getOpcode() == AArch64::HOM_Prolog);
1956 Prolog->addOperand(MachineOperand::CreateImm(FPOffset));
1957 } else {
1958 // Issue sub fp, sp, FPOffset or
1959 // mov fp,sp when FPOffset is zero.
1960 // Note: All stores of callee-saved registers are marked as "FrameSetup".
1961 // This code marks the instruction(s) that set the FP also.
1962 emitFrameOffset(MBB, MBBI, DL, AArch64::FP, AArch64::SP,
1963 StackOffset::getFixed(FPOffset), TII,
1964 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
1965 if (NeedsWinCFI && HasWinCFI) {
1966 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1967 .setMIFlag(MachineInstr::FrameSetup);
1968 // After setting up the FP, the rest of the prolog doesn't need to be
1969 // included in the SEH unwind info.
1970 NeedsWinCFI = false;
1971 }
1972 }
1973 if (EmitAsyncCFI)
1974 emitDefineCFAWithFP(MF, MBB, MBBI, DL, FixedObject);
1975 }
1976
1977 // Now emit the moves for whatever callee saved regs we have (including FP,
1978 // LR if those are saved). Frame instructions for SVE registers are emitted
1979 // later, after the instructions which actually save the SVE regs.
1980 if (EmitAsyncCFI)
1981 emitCalleeSavedGPRLocations(MBB, MBBI);
1982
1983 // Alignment is required for the parent frame, not the funclet
1984 const bool NeedsRealignment =
1985 NumBytes && !IsFunclet && RegInfo->hasStackRealignment(MF);
1986 const int64_t RealignmentPadding =
1987 (NeedsRealignment && MFI.getMaxAlign() > Align(16))
1988 ? MFI.getMaxAlign().value() - 16
1989 : 0;
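// e.g. a function requesting 64-byte alignment gets RealignmentPadding == 48,
// enough slack for the realigning AND of SP below to round down safely.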
1990
1991 if (windowsRequiresStackProbe(MF, NumBytes + RealignmentPadding)) {
1992 uint64_t NumWords = (NumBytes + RealignmentPadding) >> 4;
1993 if (NeedsWinCFI) {
1994 HasWinCFI = true;
1995 // alloc_l can hold at most 256MB, so assume that NumBytes doesn't
1996 // exceed this amount. We need to move at most 2^24 - 1 into x15.
1997 // This is at most two instructions, MOVZ followed by MOVK.
1998 // TODO: Fix to use multiple stack alloc unwind codes for stacks
1999 // exceeding 256MB in size.
2000 if (NumBytes >= (1 << 28))
2001 report_fatal_error("Stack size cannot exceed 256MB for stack "
2002 "unwinding purposes");
2003
2004 uint32_t LowNumWords = NumWords & 0xFFFF;
2005 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVZXi), AArch64::X15)
2006 .addImm(LowNumWords)
2007 .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 0))
2008 .setMIFlag(MachineInstr::FrameSetup);
2009 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2010 .setMIFlag(MachineInstr::FrameSetup);
2011 if ((NumWords & 0xFFFF0000) != 0) {
2012 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVKXi), AArch64::X15)
2013 .addReg(AArch64::X15)
2014 .addImm((NumWords & 0xFFFF0000) >> 16) // High half
2015 .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 16))
2016 .setMIFlag(MachineInstr::FrameSetup);
2017 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2018 .setMIFlag(MachineInstr::FrameSetup);
2019 }
2020 } else {
2021 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm), AArch64::X15)
2022 .addImm(NumWords)
2023 .setMIFlags(MachineInstr::FrameSetup);
2024 }
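// Either way x15 now holds the allocation size in 16-byte units, the input
// convention of the Windows stack-probe helper called next.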
2025
2026 const char* ChkStk = Subtarget.getChkStkName();
2027 switch (MF.getTarget().getCodeModel()) {
2028 case CodeModel::Tiny:
2029 case CodeModel::Small:
2030 case CodeModel::Medium:
2031 case CodeModel::Kernel:
2032 BuildMI(MBB, MBBI, DL, TII->get(AArch64::BL))
2033 .addExternalSymbol(ChkStk)
2034 .addReg(AArch64::X15, RegState::Implicit)
2035 .addReg(AArch64::X16, RegState::Implicit | RegState::Define | RegState::Dead)
2036 .addReg(AArch64::X17, RegState::Implicit | RegState::Define | RegState::Dead)
2037 .addReg(AArch64::NZCV, RegState::Implicit | RegState::Define | RegState::Dead)
2038 .setMIFlags(MachineInstr::FrameSetup);
2039 if (NeedsWinCFI) {
2040 HasWinCFI = true;
2041 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2042 .setMIFlag(MachineInstr::FrameSetup);
2043 }
2044 break;
2045 case CodeModel::Large:
2046 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVaddrEXT))
2047 .addReg(AArch64::X16, RegState::Define)
2048 .addExternalSymbol(ChkStk)
2049 .addExternalSymbol(ChkStk)
2050 .setMIFlags(MachineInstr::FrameSetup);
2051 if (NeedsWinCFI) {
2052 HasWinCFI = true;
2053 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2054 .setMIFlag(MachineInstr::FrameSetup);
2055 }
2056
2057 BuildMI(MBB, MBBI, DL, TII->get(getBLRCallOpcode(MF)))
2058 .addReg(AArch64::X16, RegState::Kill)
2059 .addReg(AArch64::X15, RegState::Implicit | RegState::Define)
2060 .addReg(AArch64::X16, RegState::Implicit | RegState::Define | RegState::Dead)
2061 .addReg(AArch64::X17, RegState::Implicit | RegState::Define | RegState::Dead)
2062 .addReg(AArch64::NZCV, RegState::Implicit | RegState::Define | RegState::Dead)
2063 .setMIFlags(MachineInstr::FrameSetup);
2064 if (NeedsWinCFI) {
2065 HasWinCFI = true;
2066 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2067 .setMIFlag(MachineInstr::FrameSetup);
2068 }
2069 break;
2070 }
2071
2072 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SUBXrx64), AArch64::SP)
2073 .addReg(AArch64::SP, RegState::Kill)
2074 .addReg(AArch64::X15, RegState::Kill)
2075 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 4))
2076 .setMIFlags(MachineInstr::FrameSetup);
2077 if (NeedsWinCFI) {
2078 HasWinCFI = true;
2079 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_StackAlloc))
2080 .addImm(NumBytes)
2081 .setMIFlag(MachineInstr::FrameSetup);
2082 }
2083 NumBytes = 0;
2084
2085 if (RealignmentPadding > 0) {
2086 if (RealignmentPadding >= 4096) {
2087 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm))
2088 .addReg(AArch64::X16, RegState::Define)
2089 .addImm(RealignmentPadding)
2090 .setMIFlags(MachineInstr::FrameSetup);
2091 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXrx64), AArch64::X15)
2092 .addReg(AArch64::SP)
2093 .addReg(AArch64::X16, RegState::Kill)
2094 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
2095 .setMIFlag(MachineInstr::FrameSetup);
2096 } else {
2097 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXri), AArch64::X15)
2098 .addReg(AArch64::SP)
2099 .addImm(RealignmentPadding)
2100 .addImm(0)
2101 .setMIFlag(MachineInstr::FrameSetup);
2102 }
2103
2104 uint64_t AndMask = ~(MFI.getMaxAlign().value() - 1);
2105 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ANDXri), AArch64::SP)
2106 .addReg(AArch64::X15, RegState::Kill)
2107 .addImm(AArch64_AM::encodeLogicalImmediate(AndMask, 64));
2108 AFI->setStackRealigned(true);
2109
2110 // No need for SEH instructions here; if we're realigning the stack,
2111 // we've set a frame pointer and already finished the SEH prologue.
2112 assert(!NeedsWinCFI);
2113 }
2114 }
2115
2116 StackOffset SVECalleeSavesSize = {}, SVELocalsSize = SVEStackSize;
2117 MachineBasicBlock::iterator CalleeSavesBegin = MBBI, CalleeSavesEnd = MBBI;
2118
2119 // Process the SVE callee-saves to determine what space needs to be
2120 // allocated.
2121 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2122 LLVM_DEBUG(dbgs() << "SVECalleeSavedStackSize = " << CalleeSavedSize
2123 << "\n");
2124 // Find callee save instructions in frame.
2125 CalleeSavesBegin = MBBI;
2126 assert(IsSVECalleeSave(CalleeSavesBegin) && "Unexpected instruction");
2127 while (IsSVECalleeSave(MBBI) && MBBI != MBB.getFirstTerminator())
2128 ++MBBI;
2129 CalleeSavesEnd = MBBI;
2130
2131 SVECalleeSavesSize = StackOffset::getScalable(CalleeSavedSize);
2132 SVELocalsSize = SVEStackSize - SVECalleeSavesSize;
2133 }
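// e.g. two spilled Z registers make SVECalleeSavesSize 32 scalable bytes,
// which the allocation below materializes as "addvl sp, sp, #-2".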
2134
2135 // Allocate space for the callee saves (if any).
2136 StackOffset CFAOffset =
2137 StackOffset::getFixed((int64_t)MFI.getStackSize() - NumBytes);
2138 StackOffset LocalsSize = SVELocalsSize + StackOffset::getFixed(NumBytes);
2139 allocateStackSpace(MBB, CalleeSavesBegin, 0, SVECalleeSavesSize, false,
2140 nullptr, EmitAsyncCFI && !HasFP, CFAOffset,
2141 MFI.hasVarSizedObjects() || LocalsSize);
2142 CFAOffset += SVECalleeSavesSize;
2143
2144 if (EmitAsyncCFI)
2145 emitCalleeSavedSVELocations(MBB, CalleeSavesEnd);
2146
2147 // Allocate space for the rest of the frame including SVE locals. Align the
2148 // stack as necessary.
2149 assert(!(canUseRedZone(MF) && NeedsRealignment) &&
2150 "Cannot use redzone with stack realignment");
2151 if (!canUseRedZone(MF)) {
2152 // FIXME: in the case of dynamic re-alignment, NumBytes doesn't have
2153 // the correct value here, as NumBytes also includes padding bytes,
2154 // which shouldn't be counted here.
2155 allocateStackSpace(MBB, CalleeSavesEnd, RealignmentPadding,
2156 SVELocalsSize + StackOffset::getFixed(NumBytes),
2157 NeedsWinCFI, &HasWinCFI, EmitAsyncCFI && !HasFP,
2158 CFAOffset, MFI.hasVarSizedObjects());
2159 }
2160
2161 // If we need a base pointer, set it up here. It's whatever the value of the
2162 // stack pointer is at this point. Any variable size objects will be allocated
2163 // after this, so we can still use the base pointer to reference locals.
2164 //
2165 // FIXME: Clarify FrameSetup flags here.
2166 // Note: Use emitFrameOffset() like above for FP if the FrameSetup flag is
2167 // needed.
2168 // For funclets the BP belongs to the containing function.
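// (With LLVM's default base register x19 the copy below is a plain
// "mov x19, sp".)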
2169 if (!IsFunclet && RegInfo->hasBasePointer(MF)) {
2170 TII->copyPhysReg(MBB, MBBI, DL, RegInfo->getBaseRegister(), AArch64::SP,
2171 false);
2172 if (NeedsWinCFI) {
2173 HasWinCFI = true;
2174 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2175 .setMIFlag(MachineInstr::FrameSetup);
2176 }
2177 }
2178
2179 // The very last FrameSetup instruction indicates the end of prologue. Emit a
2180 // SEH opcode indicating the prologue end.
2181 if (NeedsWinCFI && HasWinCFI) {
2182 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
2183 .setMIFlag(MachineInstr::FrameSetup);
2184 }
2185
2186 // SEH funclets are passed the frame pointer in X1. If the parent
2187 // function uses the base register, then the base register is used
2188 // directly, and is not retrieved from X1.
2189 if (IsFunclet && F.hasPersonalityFn()) {
2190 EHPersonality Per = classifyEHPersonality(F.getPersonalityFn());
2191 if (isAsynchronousEHPersonality(Per)) {
2192 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::COPY), AArch64::FP)
2193 .addReg(AArch64::X1)
2194 .setMIFlag(MachineInstr::FrameSetup);
2195 MBB.addLiveIn(AArch64::X1);
2196 }
2197 }
2198
2199 if (EmitCFI && !EmitAsyncCFI) {
2200 if (HasFP) {
2201 emitDefineCFAWithFP(MF, MBB, MBBI, DL, FixedObject);
2202 } else {
2203 StackOffset TotalSize =
2204 SVEStackSize + StackOffset::getFixed((int64_t)MFI.getStackSize());
2205 unsigned CFIIndex = MF.addFrameInst(createDefCFA(
2206 *RegInfo, /*FrameReg=*/AArch64::SP, /*Reg=*/AArch64::SP, TotalSize,
2207 /*LastAdjustmentWasScalable=*/false));
2208 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2209 .addCFIIndex(CFIIndex)
2210 .setMIFlags(MachineInstr::FrameSetup);
2211 }
2212 emitCalleeSavedGPRLocations(MBB, MBBI);
2213 emitCalleeSavedSVELocations(MBB, MBBI);
2214 }
2215}
2216
2217static bool isFuncletReturnInstr(const MachineInstr &MI) {
2218 switch (MI.getOpcode()) {
2219 default:
2220 return false;
2221 case AArch64::CATCHRET:
2222 case AArch64::CLEANUPRET:
2223 return true;
2224 }
2225}
2226
2227void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
2228                                        MachineBasicBlock &MBB) const {
2229 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
2230 MachineFrameInfo &MFI = MF.getFrameInfo();
2231 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2232 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2233 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
2234 DebugLoc DL;
2235 bool NeedsWinCFI = needsWinCFI(MF);
2236 bool EmitCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
2237 bool HasWinCFI = false;
2238 bool IsFunclet = false;
2239
2240 if (MBB.end() != MBBI) {
2241 DL = MBBI->getDebugLoc();
2242 IsFunclet = isFuncletReturnInstr(*MBBI);
2243 }
2244
2245 MachineBasicBlock::iterator EpilogStartI = MBB.end();
2246
2247 auto FinishingTouches = make_scope_exit([&]() {
2248 if (AFI->shouldSignReturnAddress(MF)) {
2249 BuildMI(MBB, MBB.getFirstTerminator(), DL,
2250 TII->get(AArch64::PAUTH_EPILOGUE))
2251 .setMIFlag(MachineInstr::FrameDestroy);
2252 if (NeedsWinCFI)
2253 HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
2254 }
2255 if (AFI->needsShadowCallStackPrologueEpilogue(MF))
2256 emitShadowCallStackEpilogue(*TII, MF, MBB, MBB.getFirstTerminator(), DL);
2257 if (EmitCFI)
2258 emitCalleeSavedGPRRestores(MBB, MBB.getFirstTerminator());
2259 if (HasWinCFI) {
2260 BuildMI(MBB, MBB.getFirstTerminator(), DL,
2261 TII->get(AArch64::SEH_EpilogEnd))
2262 .setMIFlag(MachineInstr::FrameDestroy);
2263 if (!MF.hasWinCFI())
2264 MF.setHasWinCFI(true);
2265 }
2266 if (NeedsWinCFI) {
2267 assert(EpilogStartI != MBB.end());
2268 if (!HasWinCFI)
2269 MBB.erase(EpilogStartI);
2270 }
2271 });
2272
2273 int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
2274 : MFI.getStackSize();
2275
2276 // All calls are tail calls in GHC calling conv, and functions have no
2277 // prologue/epilogue.
2278 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
2279 return;
2280
2281 // How much of the stack used by incoming arguments this function is expected
2282 // to restore in this particular epilogue.
2283 int64_t ArgumentStackToRestore = getArgumentStackToRestore(MF, MBB);
2284 bool IsWin64 =
2285 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv());
2286 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
2287
2288 int64_t AfterCSRPopSize = ArgumentStackToRestore;
2289 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
2290 // We cannot rely on the local stack size set in emitPrologue if the function
2291 // has funclets, as funclets have different local stack size requirements, and
2292 // the current value set in emitPrologue may be that of the containing
2293 // function.
2294 if (MF.hasEHFunclets())
2295 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
2296 if (homogeneousPrologEpilog(MF, &MBB)) {
2297 assert(!NeedsWinCFI);
2298 auto LastPopI = MBB.getFirstTerminator();
2299 if (LastPopI != MBB.begin()) {
2300 auto HomogeneousEpilog = std::prev(LastPopI);
2301 if (HomogeneousEpilog->getOpcode() == AArch64::HOM_Epilog)
2302 LastPopI = HomogeneousEpilog;
2303 }
2304
2305 // Adjust local stack
2306 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2307 StackOffset::getFixed(AFI->getLocalStackSize()), TII,
2308 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2309
2310 // SP has already been adjusted while restoring callee save regs.
2311 // We've already bailed out on the case that adjusts SP for arguments.
2312 assert(AfterCSRPopSize == 0);
2313 return;
2314 }
2315 bool CombineSPBump = shouldCombineCSRLocalStackBumpInEpilogue(MBB, NumBytes);
2316 // Assume we can't combine the last pop with the sp restore.
2317
2318 bool CombineAfterCSRBump = false;
2319 if (!CombineSPBump && PrologueSaveSize != 0) {
2320 MachineBasicBlock::iterator Pop = std::prev(MBB.getFirstTerminator());
2321 while (Pop->getOpcode() == TargetOpcode::CFI_INSTRUCTION ||
2322 AArch64InstrInfo::isSEHInstruction(*Pop))
2323 Pop = std::prev(Pop);
2324 // Converting the last ldp to a post-index ldp is valid only if the last
2325 // ldp's offset is 0.
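// e.g. "ldp x29, x30, [sp]" can become "ldp x29, x30, [sp], #16", folding
// the final SP restore into the reload.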
2326 const MachineOperand &OffsetOp = Pop->getOperand(Pop->getNumOperands() - 1);
2327 // If the offset is 0 and the AfterCSR pop is not actually trying to
2328 // allocate more stack for arguments (in space that an untimely interrupt
2329 // may clobber), convert it to a post-index ldp.
2330 if (OffsetOp.getImm() == 0 && AfterCSRPopSize >= 0) {
2331 convertCalleeSaveRestoreToSPPrePostIncDec(
2332 MBB, Pop, DL, TII, PrologueSaveSize, NeedsWinCFI, &HasWinCFI, EmitCFI,
2333 MachineInstr::FrameDestroy, PrologueSaveSize);
2334 } else {
2335 // If not, make sure to emit an add after the last ldp.
2336 // We're doing this by transfering the size to be restored from the
2337 // adjustment *before* the CSR pops to the adjustment *after* the CSR
2338 // pops.
2339 AfterCSRPopSize += PrologueSaveSize;
2340 CombineAfterCSRBump = true;
2341 }
2342 }
2343
2344 // Move past the restores of the callee-saved registers.
2345 // If we plan on combining the sp bump of the local stack size and the callee
2346 // save stack size, we might need to adjust the CSR save and restore offsets.
2347 MachineBasicBlock::iterator LastPopI = MBB.getFirstTerminator();
2348 MachineBasicBlock::iterator Begin = MBB.begin();
2349 while (LastPopI != Begin) {
2350 --LastPopI;
2351 if (!LastPopI->getFlag(MachineInstr::FrameDestroy) ||
2352 IsSVECalleeSave(LastPopI)) {
2353 ++LastPopI;
2354 break;
2355 } else if (CombineSPBump)
2356 fixupCalleeSaveRestoreStackOffset(*LastPopI, AFI->getLocalStackSize(),
2357 NeedsWinCFI, &HasWinCFI);
2358 }
2359
2360 if (NeedsWinCFI) {
2361 // Note that there are cases where we insert SEH opcodes in the
2362 // epilogue when we had no SEH opcodes in the prologue. For
2363 // example, when there is no stack frame but there are stack
2364 // arguments. Insert the SEH_EpilogStart and remove it later if
2365 // we didn't emit any SEH opcodes to avoid generating WinCFI for
2366 // functions that don't need it.
2367 BuildMI(MBB, LastPopI, DL, TII->get(AArch64::SEH_EpilogStart))
2368 .setMIFlag(MachineInstr::FrameDestroy);
2369 EpilogStartI = LastPopI;
2370 --EpilogStartI;
2371 }
2372
2373 if (hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2374 switch (MF.getTarget().Options.SwiftAsyncFramePointer) {
2375 case SwiftAsyncFramePointerMode::DeploymentBased:
2376 // Avoid the reload as it is GOT relative, and instead fall back to the
2377 // hardcoded value below. This allows a mismatch between the OS and
2378 // application without immediately terminating on the difference.
2379 [[fallthrough]];
2380 case SwiftAsyncFramePointerMode::Always:
2381 // We need to reset FP to its untagged state on return. Bit 60 is
2382 // currently used to show the presence of an extended frame.
2383
2384 // BIC x29, x29, #0x1000_0000_0000_0000
2385 BuildMI(MBB, MBB.getFirstTerminator(), DL, TII->get(AArch64::ANDXri),
2386 AArch64::FP)
2387 .addUse(AArch64::FP)
2388 .addImm(0x10fe)
2389 .setMIFlag(MachineInstr::FrameDestroy);
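// (0x10fe encodes the logical immediate 0xefffffffffffffff -- every bit but
// bit 60 -- so the AND clears only the extended-frame marker.)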
2390 if (NeedsWinCFI) {
2391 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2392 .setMIFlag(MachineInstr::FrameDestroy);
2393 HasWinCFI = true;
2394 }
2395 break;
2396
2397 case SwiftAsyncFramePointerMode::Never:
2398 break;
2399 }
2400 }
2401
2402 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2403
2404 // If there is a single SP update, insert it before the ret and we're done.
2405 if (CombineSPBump) {
2406 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
2407
2408 // When we are about to restore the CSRs, the CFA register is SP again.
2409 if (EmitCFI && hasFP(MF)) {
2410 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2411 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2412 unsigned CFIIndex =
2413 MF.addFrameInst(MCCFIInstruction::cfiDefCfa(nullptr, Reg, NumBytes));
2414 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2415 .addCFIIndex(CFIIndex)
2416 .setMIFlags(MachineInstr::FrameDestroy);
2417 }
2418
2419 emitFrameOffset(MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2420 StackOffset::getFixed(NumBytes + (int64_t)AfterCSRPopSize),
2421 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI,
2422 &HasWinCFI, EmitCFI, StackOffset::getFixed(NumBytes));
2423 return;
2424 }
2425
2426 NumBytes -= PrologueSaveSize;
2427 assert(NumBytes >= 0 && "Negative stack allocation size!?");
2428
2429 // Process the SVE callee-saves to determine what space needs to be
2430 // deallocated.
2431 StackOffset DeallocateBefore = {}, DeallocateAfter = SVEStackSize;
2432 MachineBasicBlock::iterator RestoreBegin = LastPopI, RestoreEnd = LastPopI;
2433 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2434 RestoreBegin = std::prev(RestoreEnd);
2435 while (RestoreBegin != MBB.begin() &&
2436 IsSVECalleeSave(std::prev(RestoreBegin)))
2437 --RestoreBegin;
2438
2439 assert(IsSVECalleeSave(RestoreBegin) &&
2440 IsSVECalleeSave(std::prev(RestoreEnd)) && "Unexpected instruction");
2441
2442 StackOffset CalleeSavedSizeAsOffset =
2443 StackOffset::getScalable(CalleeSavedSize);
2444 DeallocateBefore = SVEStackSize - CalleeSavedSizeAsOffset;
2445 DeallocateAfter = CalleeSavedSizeAsOffset;
2446 }
2447
2448 // Deallocate the SVE area.
2449 if (SVEStackSize) {
2450 // If we have stack realignment or variable sized objects on the stack,
2451 // restore the stack pointer from the frame pointer prior to SVE CSR
2452 // restoration.
2453 if (AFI->isStackRealigned() || MFI.hasVarSizedObjects()) {
2454 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2455 // Set SP to start of SVE callee-save area from which they can
2456 // be reloaded. The code below will deallocate the stack space
2457 // by moving FP -> SP.
2458 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::FP,
2459 StackOffset::getScalable(-CalleeSavedSize), TII,
2460 MachineInstr::FrameDestroy);
2461 }
2462 } else {
2463 if (AFI->getSVECalleeSavedStackSize()) {
2464 // Deallocate the non-SVE locals first before we can deallocate (and
2465 // restore callee saves) from the SVE area.
2466 emitFrameOffset(
2467 MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2468 StackOffset::getFixed(NumBytes), TII, MachineInstr::FrameDestroy,
2469 false, false, nullptr, EmitCFI && !hasFP(MF),
2470 SVEStackSize + StackOffset::getFixed(NumBytes + PrologueSaveSize));
2471 NumBytes = 0;
2472 }
2473
2474 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2475 DeallocateBefore, TII, MachineInstr::FrameDestroy, false,
2476 false, nullptr, EmitCFI && !hasFP(MF),
2477 SVEStackSize +
2478 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2479
2480 emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
2481 DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
2482 false, nullptr, EmitCFI && !hasFP(MF),
2483 DeallocateAfter +
2484 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2485 }
2486 if (EmitCFI)
2487 emitCalleeSavedSVERestores(MBB, RestoreEnd);
2488 }
2489
2490 if (!hasFP(MF)) {
2491 bool RedZone = canUseRedZone(MF);
2492 // If this was a redzone leaf function, we don't need to restore the
2493 // stack pointer (but we may need to pop stack args for fastcc).
2494 if (RedZone && AfterCSRPopSize == 0)
2495 return;
2496
2497 // Pop the local variables off the stack. If there are no callee-saved
2498 // registers, it means we are actually positioned at the terminator and can
2499 // combine stack increment for the locals and the stack increment for
2500 // callee-popped arguments into (possibly) a single instruction and be done.
2501 bool NoCalleeSaveRestore = PrologueSaveSize == 0;
2502 int64_t StackRestoreBytes = RedZone ? 0 : NumBytes;
2503 if (NoCalleeSaveRestore)
2504 StackRestoreBytes += AfterCSRPopSize;
2505
2506 emitFrameOffset(
2507 MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2508 StackOffset::getFixed(StackRestoreBytes), TII,
2509 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2510 StackOffset::getFixed((RedZone ? 0 : NumBytes) + PrologueSaveSize));
2511
2512 // If we were able to combine the local stack pop with the argument pop,
2513 // then we're done.
2514 if (NoCalleeSaveRestore || AfterCSRPopSize == 0) {
2515 return;
2516 }
2517
2518 NumBytes = 0;
2519 }
2520
2521 // Restore the original stack pointer.
2522 // FIXME: Rather than doing the math here, we should instead just use
2523 // non-post-indexed loads for the restores if we aren't actually going to
2524 // be able to save any instructions.
2525 if (!IsFunclet && (MFI.hasVarSizedObjects() || AFI->isStackRealigned())) {
2526 emitFrameOffset(
2527 MBB, LastPopI, DL, AArch64::SP, AArch64::FP,
2528 StackOffset::getFixed(-AFI->getCalleeSaveBaseToFrameRecordOffset()),
2529 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2530 } else if (NumBytes)
2531 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2532 StackOffset::getFixed(NumBytes), TII,
2533 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2534
2535 // When we are about to restore the CSRs, the CFA register is SP again.
2536 if (EmitCFI && hasFP(MF)) {
2537 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2538 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2539 unsigned CFIIndex = MF.addFrameInst(
2540 MCCFIInstruction::cfiDefCfa(nullptr, Reg, PrologueSaveSize));
2541 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2542 .addCFIIndex(CFIIndex)
2543 .setMIFlags(MachineInstr::FrameDestroy);
2544 }
2545
2546 // This must be placed after the callee-save restore code because that code
2547 // assumes the SP is at the same location as it was after the callee-save save
2548 // code in the prologue.
2549 if (AfterCSRPopSize) {
2550 assert(AfterCSRPopSize > 0 && "attempting to reallocate arg stack that an "
2551 "interrupt may have clobbered");
2552
2553 emitFrameOffset(
2554 MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2555 StackOffset::getFixed(AfterCSRPopSize), TII, MachineInstr::FrameDestroy,
2556 false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2557 StackOffset::getFixed(CombineAfterCSRBump ? PrologueSaveSize : 0));
2558 }
2559}
2560
2561bool AArch64FrameLowering::enableCFIFixup(MachineFunction &MF) const {
2562 return TargetFrameLowering::enableCFIFixup(MF) &&
2563 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
2564}
2565
2566/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
2567/// debug info. It's the same as what we use for resolving the code-gen
2568/// references for now. FIXME: This can go wrong when references are
2569/// SP-relative and simple call frames aren't used.
2570StackOffset
2571AArch64FrameLowering::getFrameIndexReference(const MachineFunction &MF, int FI,
2572                                             Register &FrameReg) const {
2573 return resolveFrameIndexReference(
2574 MF, FI, FrameReg,
2575 /*PreferFP=*/
2576 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
2577 MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
2578 /*ForSimm=*/false);
2579}
2580
2581StackOffset
2582AArch64FrameLowering::getNonLocalFrameIndexReference(const MachineFunction &MF,
2583                                                     int FI) const {
2584 return StackOffset::getFixed(getSEHFrameIndexOffset(MF, FI));
2585}
2586
2587static StackOffset getFPOffset(const MachineFunction &MF,
2588                               int64_t ObjectOffset) {
2589 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2590 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2591 bool IsWin64 =
2592 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv());
2593 unsigned FixedObject =
2594 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
2595 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
2596 int64_t FPAdjust =
2597 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
2598 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
2599}
2600
2601static StackOffset getStackOffset(const MachineFunction &MF,
2602                                  int64_t ObjectOffset) {
2603 const auto &MFI = MF.getFrameInfo();
2604 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
2605}
2606
2607 // TODO: This function currently does not work for scalable vectors.
2608int AArch64FrameLowering::getSEHFrameIndexOffset(const MachineFunction &MF,
2609                                                 int FI) const {
2610 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2611 MF.getSubtarget().getRegisterInfo());
2612 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
2613 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
2614 ? getFPOffset(MF, ObjectOffset).getFixed()
2615 : getStackOffset(MF, ObjectOffset).getFixed();
2616}
2617
2618StackOffset AArch64FrameLowering::resolveFrameIndexReference(
2619 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
2620 bool ForSimm) const {
2621 const auto &MFI = MF.getFrameInfo();
2622 int64_t ObjectOffset = MFI.getObjectOffset(FI);
2623 bool isFixed = MFI.isFixedObjectIndex(FI);
2624 bool isSVE = MFI.getStackID(FI) == TargetStackID::ScalableVector;
2625 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, isSVE, FrameReg,
2626 PreferFP, ForSimm);
2627}
2628
2629StackOffset AArch64FrameLowering::resolveFrameOffsetReference(
2630 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE,
2631 Register &FrameReg, bool PreferFP, bool ForSimm) const {
2632 const auto &MFI = MF.getFrameInfo();
2633 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2634 MF.getSubtarget().getRegisterInfo());
2635 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2636 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2637
2638 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
2639 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
2640 bool isCSR =
2641 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
2642
2643 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2644
2645 // Use frame pointer to reference fixed objects. Use it for locals if
2646 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
2647 // reliable as a base). Make sure useFPForScavengingIndex() does the
2648 // right thing for the emergency spill slot.
2649 bool UseFP = false;
2650 if (AFI->hasStackFrame() && !isSVE) {
2651 // We shouldn't prefer using the FP to access fixed-sized stack objects when
2652 // there are scalable (SVE) objects in between the FP and the fixed-sized
2653 // objects.
2654 PreferFP &= !SVEStackSize;
2655
2656 // Note: Keeping the following as multiple 'if' statements rather than
2657 // merging to a single expression for readability.
2658 //
2659 // Argument access should always use the FP.
2660 if (isFixed) {
2661 UseFP = hasFP(MF);
2662 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
2663 // References to the CSR area must use FP if we're re-aligning the stack
2664 // since the dynamically-sized alignment padding is between the SP/BP and
2665 // the CSR area.
2666 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
2667 UseFP = true;
2668 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
2669 // If the FPOffset is negative and we're producing a signed immediate, we
2670 // have to keep in mind that the available offset range for negative
2671 // offsets is smaller than for positive ones. If an offset is available
2672 // via the FP and the SP, use whichever is closest.
2673 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
2674 PreferFP |= Offset > -FPOffset && !SVEStackSize;
2675
2676 if (MFI.hasVarSizedObjects()) {
2677 // If we have variable sized objects, we can use either FP or BP, as the
2678 // SP offset is unknown. We can use the base pointer if we have one and
2679 // FP is not preferred. If not, we're stuck with using FP.
2680 bool CanUseBP = RegInfo->hasBasePointer(MF);
2681 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
2682 UseFP = PreferFP;
2683 else if (!CanUseBP) // Can't use BP. Forced to use FP.
2684 UseFP = true;
2685 // else we can use BP and FP, but the offset from FP won't fit.
2686 // That will make us scavenge registers which we can probably avoid by
2687 // using BP. If it won't fit for BP either, we'll scavenge anyway.
2688 } else if (FPOffset >= 0) {
2689 // Use SP or FP, whichever gives us the best chance of the offset
2690 // being in range for direct access. If the FPOffset is positive,
2691 // that'll always be best, as the SP will be even further away.
2692 UseFP = true;
2693 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
2694 // Funclets access the locals contained in the parent's stack frame
2695 // via the frame pointer, so we have to use the FP in the parent
2696 // function.
2697 (void) Subtarget;
2698 assert(
2699 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv()) &&
2700 "Funclets should only be present on Win64");
2701 UseFP = true;
2702 } else {
2703 // We have the choice between FP and (SP or BP).
2704 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
2705 UseFP = true;
2706 }
2707 }
2708 }
2709
2710 assert(
2711 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
2712 "In the presence of dynamic stack pointer realignment, "
2713 "non-argument/CSR objects cannot be accessed through the frame pointer");
2714
2715 if (isSVE) {
2716 StackOffset FPOffset =
2717 StackOffset::get(-AFI->getCalleeSaveBaseToFrameRecordOffset(), ObjectOffset);
2718 StackOffset SPOffset =
2719 SVEStackSize +
2720 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
2721 ObjectOffset);
2722 // Always use the FP for SVE spills if available and beneficial.
2723 if (hasFP(MF) && (SPOffset.getFixed() ||
2724 FPOffset.getScalable() < SPOffset.getScalable() ||
2725 RegInfo->hasStackRealignment(MF))) {
2726 FrameReg = RegInfo->getFrameRegister(MF);
2727 return FPOffset;
2728 }
2729
2730 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
2731 : (unsigned)AArch64::SP;
2732 return SPOffset;
2733 }
2734
2735 StackOffset ScalableOffset = {};
2736 if (UseFP && !(isFixed || isCSR))
2737 ScalableOffset = -SVEStackSize;
2738 if (!UseFP && (isFixed || isCSR))
2739 ScalableOffset = SVEStackSize;
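// Net effect of these two corrections: SP-relative access to a fixed or CSR
// object steps forward over the whole SVE area, while FP-relative access to
// a plain local steps back across it.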
2740
2741 if (UseFP) {
2742 FrameReg = RegInfo->getFrameRegister(MF);
2743 return StackOffset::getFixed(FPOffset) + ScalableOffset;
2744 }
2745
2746 // Use the base pointer if we have one.
2747 if (RegInfo->hasBasePointer(MF))
2748 FrameReg = RegInfo->getBaseRegister();
2749 else {
2750 assert(!MFI.hasVarSizedObjects() &&
2751 "Can't use SP when we have var sized objects.");
2752 FrameReg = AArch64::SP;
2753 // If we're using the red zone for this function, the SP won't actually
2754 // be adjusted, so the offsets will be negative. They're also all
2755 // within range of the signed 9-bit immediate instructions.
2756 if (canUseRedZone(MF))
2757 Offset -= AFI->getLocalStackSize();
2758 }
2759
2760 return StackOffset::getFixed(Offset) + ScalableOffset;
2761}
2762
2763static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
2764 // Do not set a kill flag on values that are also marked as live-in. This
2765 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
2766 // callee saved registers.
2767 // Omitting the kill flags is conservatively correct even if the live-in
2768 // is not used after all.
2769 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
2770 return getKillRegState(!IsLiveIn);
2771}
2772
2773static bool produceCompactUnwindFrame(MachineFunction &MF) {
2774 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2775 AttributeList Attrs = MF.getFunction().getAttributes();
2776 return Subtarget.isTargetMachO() &&
2777 !(Subtarget.getTargetLowering()->supportSwiftError() &&
2778 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
2779 MF.getFunction().getCallingConv() != CallingConv::SwiftTail;
2780}
2781
2782static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2,
2783 bool NeedsWinCFI, bool IsFirst,
2784 const TargetRegisterInfo *TRI) {
2785 // If we are generating register pairs for a Windows function that requires
2786 // EH support, then pair consecutive registers only. There are no unwind
2787 // opcodes for saves/restores of non-consecutive register pairs.
2788 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
2789 // save_lrpair.
2790 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
2791
2792 if (Reg2 == AArch64::FP)
2793 return true;
2794 if (!NeedsWinCFI)
2795 return false;
2796 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
2797 return false;
2798 // If pairing a GPR with LR, the pair can be described by the save_lrpair
2799 // opcode. If this is the first register pair, it would end up with a
2800 // predecrement, but there's no save_lrpair_x opcode, so we can only do this
2801 // if LR is paired with something other than the first register.
2802 // The save_lrpair opcode requires the first register to be an odd one.
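// e.g. the pair (x21, lr) can use save_lrpair when it is not the first pair,
// while (x20, lr) or a non-consecutive pair like (x19, x23) cannot be paired.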
2803 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
2804 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
2805 return false;
2806 return true;
2807}
2808
2809/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
2810/// WindowsCFI requires that only consecutive registers can be paired.
2811/// LR and FP need to be allocated together when the frame needs to save
2812/// the frame-record. This means any other register pairing with LR is invalid.
2813static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2,
2814 bool UsesWinAAPCS, bool NeedsWinCFI,
2815 bool NeedsFrameRecord, bool IsFirst,
2816 const TargetRegisterInfo *TRI) {
2817 if (UsesWinAAPCS)
2818 return invalidateWindowsRegisterPairing(Reg1, Reg2, NeedsWinCFI, IsFirst,
2819 TRI);
2820
2821 // If we need to store the frame record, don't pair any register
2822 // with LR other than FP.
2823 if (NeedsFrameRecord)
2824 return Reg2 == AArch64::LR;
2825
2826 return false;
2827}
2828
2829namespace {
2830
2831struct RegPairInfo {
2832 unsigned Reg1 = AArch64::NoRegister;
2833 unsigned Reg2 = AArch64::NoRegister;
2834 int FrameIdx;
2835 int Offset;
2836 enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
2837
2838 RegPairInfo() = default;
2839
2840 bool isPaired() const { return Reg2 != AArch64::NoRegister; }
2841
2842 unsigned getScale() const {
2843 switch (Type) {
2844 case PPR:
2845 return 2;
2846 case GPR:
2847 case FPR64:
2848 case VG:
2849 return 8;
2850 case ZPR:
2851 case FPR128:
2852 return 16;
2853 }
2854 llvm_unreachable("Unsupported type");
2855 }
2856
2857 bool isScalable() const { return Type == PPR || Type == ZPR; }
2858};
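// getScale() above is the byte granularity the LDP/STP immediate is divided
// by: e.g. a paired ZPR spill advances the scalable offset by 2 * 16 bytes,
// a single PPR spill by 2.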
2859
2860} // end anonymous namespace
2861
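// Scan p8..p15 for a predicate register that is already being saved; if one
// exists, return it in its PN (predicate-as-counter) form, as needed by the
// SVE2.1/SME2 multi-vector spill/fill sequences, otherwise NoRegister.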
2862unsigned findFreePredicateReg(BitVector &SavedRegs) {
2863 for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
2864 if (SavedRegs.test(PReg)) {
2865 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
2866 return PNReg;
2867 }
2868 }
2869 return AArch64::NoRegister;
2870}
2871
2872static void computeCalleeSaveRegisterPairs(
2873    MachineFunction &MF, ArrayRef<CalleeSavedInfo> CSI,
2874    const TargetRegisterInfo *TRI, SmallVectorImpl<RegPairInfo> &RegPairs,
2875    bool NeedsFrameRecord) {
2876
2877 if (CSI.empty())
2878 return;
2879
2880 bool IsWindows = isTargetWindows(MF);
2881 bool NeedsWinCFI = needsWinCFI(MF);
2882 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2883 MachineFrameInfo &MFI = MF.getFrameInfo();
2884 CallingConv::ID CC = MF.getFunction().getCallingConv();
2885 unsigned Count = CSI.size();
2886 (void)CC;
2887 // MachO's compact unwind format relies on all registers being stored in
2888 // pairs.
2889 assert((!produceCompactUnwindFrame(MF) || CC == CallingConv::PreserveMost ||
2890 CC == CallingConv::PreserveAll || CC == CallingConv::CXX_FAST_TLS ||
2891 CC == CallingConv::Win64 || (Count & 1) == 0) &&
2892 "Odd number of callee-saved regs to spill!");
2893 int ByteOffset = AFI->getCalleeSavedStackSize();
2894 int StackFillDir = -1;
2895 int RegInc = 1;
2896 unsigned FirstReg = 0;
2897 if (NeedsWinCFI) {
2898 // For WinCFI, fill the stack from the bottom up.
2899 ByteOffset = 0;
2900 StackFillDir = 1;
2901 // As the CSI array is reversed to match PrologEpilogInserter, iterate
2902 // backwards, to pair up registers starting from lower numbered registers.
2903 RegInc = -1;
2904 FirstReg = Count - 1;
2905 }
2906 int ScalableByteOffset = AFI->getSVECalleeSavedStackSize();
2907 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
2908
2909 // When iterating backwards, the loop condition relies on unsigned wraparound.
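// (When RegInc == -1 the final "i += RegInc" wraps the unsigned index past
// zero to a huge value, so "i < Count" terminates the loop.)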
2910 for (unsigned i = FirstReg; i < Count; i += RegInc) {
2911 RegPairInfo RPI;
2912 RPI.Reg1 = CSI[i].getReg();
2913
2914 if (AArch64::GPR64RegClass.contains(RPI.Reg1))
2915 RPI.Type = RegPairInfo::GPR;
2916 else if (AArch64::FPR64RegClass.contains(RPI.Reg1))
2917 RPI.Type = RegPairInfo::FPR64;
2918 else if (AArch64::FPR128RegClass.contains(RPI.Reg1))
2919 RPI.Type = RegPairInfo::FPR128;
2920 else if (AArch64::ZPRRegClass.contains(RPI.Reg1))
2921 RPI.Type = RegPairInfo::ZPR;
2922 else if (AArch64::PPRRegClass.contains(RPI.Reg1))
2923 RPI.Type = RegPairInfo::PPR;
2924 else if (RPI.Reg1 == AArch64::VG)
2925 RPI.Type = RegPairInfo::VG;
2926 else
2927 llvm_unreachable("Unsupported register class.");
2928
2929 // Add the next reg to the pair if it is in the same register class.
2930 if (unsigned(i + RegInc) < Count) {
2931 Register NextReg = CSI[i + RegInc].getReg();
2932 bool IsFirst = i == FirstReg;
2933 switch (RPI.Type) {
2934 case RegPairInfo::GPR:
2935 if (AArch64::GPR64RegClass.contains(NextReg) &&
2936 !invalidateRegisterPairing(RPI.Reg1, NextReg, IsWindows,
2937 NeedsWinCFI, NeedsFrameRecord, IsFirst,
2938 TRI))
2939 RPI.Reg2 = NextReg;
2940 break;
2941 case RegPairInfo::FPR64:
2942 if (AArch64::FPR64RegClass.contains(NextReg) &&
2943 !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
2944 IsFirst, TRI))
2945 RPI.Reg2 = NextReg;
2946 break;
2947 case RegPairInfo::FPR128:
2948 if (AArch64::FPR128RegClass.contains(NextReg))
2949 RPI.Reg2 = NextReg;
2950 break;
2951 case RegPairInfo::PPR:
2952 break;
2953 case RegPairInfo::ZPR:
2954 if (AFI->getPredicateRegForFillSpill() != 0)
2955 if (((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1))
2956 RPI.Reg2 = NextReg;
2957 break;
2958 case RegPairInfo::VG:
2959 break;
2960 }
2961 }
2962
2963 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
2964 // list to come in sorted by frame index so that we can issue the store
2965 // pair instructions directly. Assert if we see anything otherwise.
2966 //
2967 // The order of the registers in the list is controlled by
2968 // getCalleeSavedRegs(), so they will always be in-order, as well.
2969 assert((!RPI.isPaired() ||
2970 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
2971 "Out of order callee saved regs!");
2972
2973 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
2974 RPI.Reg1 == AArch64::LR) &&
2975 "FrameRecord must be allocated together with LR");
2976
2977 // Windows AAPCS has FP and LR reversed.
2978 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
2979 RPI.Reg2 == AArch64::LR) &&
2980 "FrameRecord must be allocated together with LR");
2981
2982 // MachO's compact unwind format relies on all registers being stored in
2983 // adjacent register pairs.
2984 assert((!produceCompactUnwindFrame(MF) || CC == CallingConv::PreserveMost ||
2985 CC == CallingConv::PreserveAll || CC == CallingConv::CXX_FAST_TLS ||
2986 CC == CallingConv::Win64 || RPI.Reg2 == AArch64::NoRegister ||
2987 (RPI.isPaired() &&
2988 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
2989 RPI.Reg1 + 1 == RPI.Reg2))) &&
2990 "Callee-save registers not saved as adjacent register pair!");
2991
2992 RPI.FrameIdx = CSI[i].getFrameIdx();
2993 if (NeedsWinCFI &&
2994 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
2995 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
2996 int Scale = RPI.getScale();
2997
2998 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
2999 assert(OffsetPre % Scale == 0);
3000
3001 if (RPI.isScalable())
3002 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
3003 else
3004 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
3005
3006 // Swift's async context is directly before FP, so allocate an extra
3007 // 8 bytes for it.
3008 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
3009 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
3010 (IsWindows && RPI.Reg2 == AArch64::LR)))
3011 ByteOffset += StackFillDir * 8;
3012
3013 // Round up size of non-pair to pair size if we need to pad the
3014 // callee-save area to ensure 16-byte alignment.
3015 if (NeedGapToAlignStack && !NeedsWinCFI &&
3016 !RPI.isScalable() && RPI.Type != RegPairInfo::FPR128 &&
3017 !RPI.isPaired() && ByteOffset % 16 != 0) {
3018 ByteOffset += 8 * StackFillDir;
3019 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
3020 // A stack frame with a gap looks like this, bottom up:
3021 // d9, d8. x21, gap, x20, x19.
3022 // Set extra alignment on the x21 object to create the gap above it.
3023 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
3024 NeedGapToAlignStack = false;
3025 }
3026
3027 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
3028 assert(OffsetPost % Scale == 0);
3029 // If filling top down (default), we want the offset after incrementing it.
3030 // If filling bottom up (WinCFI) we need the original offset.
3031 int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
3032
3033 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
3034 // Swift context can directly precede FP.
3035 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
3036 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
3037 (IsWindows && RPI.Reg2 == AArch64::LR)))
3038 Offset += 8;
3039 RPI.Offset = Offset / Scale;
3040
3041 assert(((!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
3042 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
3043 "Offset out of bounds for LDP/STP immediate");
3044
3045 // Save the offset to frame record so that the FP register can point to the
3046 // innermost frame record (spilled FP and LR registers).
3047 if (NeedsFrameRecord && ((!IsWindows && RPI.Reg1 == AArch64::LR &&
3048 RPI.Reg2 == AArch64::FP) ||
3049 (IsWindows && RPI.Reg1 == AArch64::FP &&
3050 RPI.Reg2 == AArch64::LR)))
3051 AFI->setCalleeSaveBaseToFrameRecordOffset(Offset);
3052
3053 RegPairs.push_back(RPI);
3054 if (RPI.isPaired())
3055 i += RegInc;
3056 }
3057 if (NeedsWinCFI) {
3058 // If we need an alignment gap in the stack, align the topmost stack
3059 // object. A stack frame with a gap looks like this, bottom up:
3060 // x19, d8. d9, gap.
3061 // Set extra alignment on the topmost stack object (the first element in
3062 // CSI, which goes top down), to create the gap above it.
3063 if (AFI->hasCalleeSaveStackFreeSpace())
3064 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
3065 // We iterated bottom up over the registers; flip RegPairs back to top
3066 // down order.
3067 std::reverse(RegPairs.begin(), RegPairs.end());
3068 }
3069}
3070
3071bool AArch64FrameLowering::spillCalleeSavedRegisters(
3072    MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
3073    ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
3074 MachineFunction &MF = *MBB.getParent();
3075 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
3076 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3077 bool NeedsWinCFI = needsWinCFI(MF);
3078 DebugLoc DL;
3079 SmallVector<RegPairInfo, 8> RegPairs;
3080
3081 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
3082
3083 const MachineRegisterInfo &MRI = MF.getRegInfo();
3084 if (homogeneousPrologEpilog(MF)) {
3085 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
3086 .setMIFlag(MachineInstr::FrameSetup);
3087
3088 for (auto &RPI : RegPairs) {
3089 MIB.addReg(RPI.Reg1);
3090 MIB.addReg(RPI.Reg2);
3091
3092 // Update register live in.
3093 if (!MRI.isReserved(RPI.Reg1))
3094 MBB.addLiveIn(RPI.Reg1);
3095 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
3096 MBB.addLiveIn(RPI.Reg2);
3097 }
3098 return true;
3099 }
3100 bool PTrueCreated = false;
3101 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
3102 unsigned Reg1 = RPI.Reg1;
3103 unsigned Reg2 = RPI.Reg2;
3104 unsigned StrOpc;
3105
3106 // Issue sequence of spills for cs regs. The first spill may be converted
3107 // to a pre-decrement store later by emitPrologue if the callee-save stack
3108 // area allocation can't be combined with the local stack area allocation.
3109 // For example:
3110 // stp x22, x21, [sp, #0] // addImm(+0)
3111 // stp x20, x19, [sp, #16] // addImm(+2)
3112 // stp fp, lr, [sp, #32] // addImm(+4)
3113 // Rationale: This sequence saves uop updates compared to a sequence of
3114 // pre-increment spills like stp xi,xj,[sp,#-16]!
3115 // Note: Similar rationale and sequence for restores in epilog.
3116 unsigned Size;
3117 Align Alignment;
3118 switch (RPI.Type) {
3119 case RegPairInfo::GPR:
3120 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
3121 Size = 8;
3122 Alignment = Align(8);
3123 break;
3124 case RegPairInfo::FPR64:
3125 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
3126 Size = 8;
3127 Alignment = Align(8);
3128 break;
3129 case RegPairInfo::FPR128:
3130 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
3131 Size = 16;
3132 Alignment = Align(16);
3133 break;
3134 case RegPairInfo::ZPR:
3135 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
3136 Size = 16;
3137 Alignment = Align(16);
3138 break;
3139 case RegPairInfo::PPR:
3140 StrOpc = AArch64::STR_PXI;
3141 Size = 2;
3142 Alignment = Align(2);
3143 break;
3144 case RegPairInfo::VG:
3145 StrOpc = AArch64::STRXui;
3146 Size = 8;
3147 Alignment = Align(8);
3148 break;
3149 }
3150
3151 unsigned X0Scratch = AArch64::NoRegister;
3152 if (Reg1 == AArch64::VG) {
3153 // Find an available register to store value of VG to.
3154 Reg1 = findScratchNonCalleeSaveRegister(&MBB);
3155 assert(Reg1 != AArch64::NoRegister);
3156 SMEAttrs Attrs(MF.getFunction());
3157
3158 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface() &&
3159 AFI->getStreamingVGIdx() == std::numeric_limits<int>::max()) {
3160 // For locally-streaming functions, we need to store both the streaming
3161 // & non-streaming VG. Spill the streaming value first.
3162 BuildMI(MBB, MI, DL, TII.get(AArch64::RDSVLI_XI), Reg1)
3163 .addImm(1)
3164 .setMIFlag(MachineInstr::FrameSetup);
3165 BuildMI(MBB, MI, DL, TII.get(AArch64::UBFMXri), Reg1)
3166 .addReg(Reg1)
3167 .addImm(3)
3168 .addImm(63)
3169 .setMIFlag(MachineInstr::FrameSetup);
3170
3171 AFI->setStreamingVGIdx(RPI.FrameIdx);
3172 } else if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
3173 BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
3174 .addImm(31)
3175 .addImm(1)
3176 .setMIFlag(MachineInstr::FrameSetup);
3177 AFI->setVGIdx(RPI.FrameIdx);
3178 } else {
3179 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
3180 if (llvm::any_of(
3181 MBB.liveins(),
3182 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
3183 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
3184 AArch64::X0, LiveIn.PhysReg);
3185 }))
3186 X0Scratch = Reg1;
3187
3188 if (X0Scratch != AArch64::NoRegister)
3189 BuildMI(MBB, MI, DL, TII.get(AArch64::ORRXrr), Reg1)
3190 .addReg(AArch64::XZR)
3191 .addReg(AArch64::X0, RegState::Undef)
3192 .addReg(AArch64::X0, RegState::Implicit)
3193 .setMIFlag(MachineInstr::FrameSetup);
3194
3195 const uint32_t *RegMask = TRI->getCallPreservedMask(
3196 MF,
3197 CallingConv::AArch64_SME_ABI_Support_Routines_PreserveMost_From_X1);
3198 BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
3199 .addExternalSymbol("__arm_get_current_vg")
3200 .addRegMask(RegMask)
3201 .addReg(AArch64::X0, RegState::ImplicitDefine)
3202 .setMIFlag(MachineInstr::FrameSetup);
3203 Reg1 = AArch64::X0;
3204 AFI->setVGIdx(RPI.FrameIdx);
3205 }
3206 }
3207
3208 LLVM_DEBUG(dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
3209 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
3210 dbgs() << ") -> fi#(" << RPI.FrameIdx;
3211 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
3212 dbgs() << ")\n");
3213
3214 assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
3215 "Windows unwdinding requires a consecutive (FP,LR) pair");
3216 // Windows unwind codes require consecutive registers if registers are
3217 // paired. Make the switch here, so that the code below will save (x,x+1)
3218 // and not (x+1,x).
3219 unsigned FrameIdxReg1 = RPI.FrameIdx;
3220 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
3221 if (NeedsWinCFI && RPI.isPaired()) {
3222 std::swap(Reg1, Reg2);
3223 std::swap(FrameIdxReg1, FrameIdxReg2);
3224 }
3225
3226 if (RPI.isPaired() && RPI.isScalable()) {
3227 [[maybe_unused]] const AArch64Subtarget &Subtarget =
3228 MF.getSubtarget<AArch64Subtarget>();
3229 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3230 unsigned PnReg = AFI->getPredicateRegForFillSpill();
3231 assert(((Subtarget.hasSVE2p1() || Subtarget.hasSME2()) && PnReg != 0) &&
3232 "Expects SVE2.1 or SME2 target and a predicate register");
3233#ifdef EXPENSIVE_CHECKS
3234 auto IsPPR = [](const RegPairInfo &c) {
3235 return c.Reg1 == RegPairInfo::PPR;
3236 };
3237 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
3238 auto IsZPR = [](const RegPairInfo &c) {
3239 return c.Type == RegPairInfo::ZPR;
3240 };
3241 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
3242 assert(!(PPRBegin < ZPRBegin) &&
3243 "Expected callee save predicate to be handled first");
3244#endif
3245 if (!PTrueCreated) {
3246 PTrueCreated = true;
3247 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
3248 .setMIFlags(MachineInstr::FrameSetup);
3249 }
3250 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
3251 if (!MRI.isReserved(Reg1))
3252 MBB.addLiveIn(Reg1);
3253 if (!MRI.isReserved(Reg2))
3254 MBB.addLiveIn(Reg2);
3255 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
3256 MIB.addMemOperand(MF.getMachineMemOperand(
3257 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3258 MachineMemOperand::MOStore, Size, Alignment));
3259 MIB.addReg(PnReg);
3260 MIB.addReg(AArch64::SP)
3261 .addImm(RPI.Offset) // [sp, #offset*scale],
3262 // where factor*scale is implicit
3263 .setMIFlag(MachineInstr::FrameSetup);
3264 MIB.addMemOperand(MF.getMachineMemOperand(
3265 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3266 MachineMemOperand::MOStore, Size, Alignment));
3267 if (NeedsWinCFI)
3268 InsertSEH(MIB, TII, MachineInstr::FrameSetup);
3269 } else { // The code when the pair of ZReg is not present
3270 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
3271 if (!MRI.isReserved(Reg1))
3272 MBB.addLiveIn(Reg1);
3273 if (RPI.isPaired()) {
3274 if (!MRI.isReserved(Reg2))
3275 MBB.addLiveIn(Reg2);
3276 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
3277 MIB.addMemOperand(MF.getMachineMemOperand(
3278 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3279 MachineMemOperand::MOStore, Size, Alignment));
3280 }
3281 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
3282 .addReg(AArch64::SP)
3283 .addImm(RPI.Offset) // [sp, #offset*scale],
3284 // where factor*scale is implicit
3285 .setMIFlag(MachineInstr::FrameSetup);
3286 MIB.addMemOperand(MF.getMachineMemOperand(
3287 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3288 MachineMemOperand::MOStore, Size, Alignment));
3289 if (NeedsWinCFI)
3290 InsertSEH(MIB, TII, MachineInstr::FrameSetup);
3291 }
3292 // Update the StackIDs of the SVE stack slots.
3293 MachineFrameInfo &MFI = MF.getFrameInfo();
3294 if (RPI.Type == RegPairInfo::ZPR || RPI.Type == RegPairInfo::PPR) {
3295 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
3296 if (RPI.isPaired())
3297 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
3298 }
3299
3300 if (X0Scratch != AArch64::NoRegister)
3301 BuildMI(MBB, MI, DL, TII.get(AArch64::ORRXrr), AArch64::X0)
3302 .addReg(AArch64::XZR)
3303 .addReg(X0Scratch, RegState::Undef)
3304 .addReg(X0Scratch, RegState::Implicit)
3305 .setMIFlag(MachineInstr::FrameSetup);
3306 }
3307 return true;
3308}
3309
3310bool AArch64FrameLowering::restoreCalleeSavedRegisters(
3311    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
3312    MutableArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
3313 MachineFunction &MF = *MBB.getParent();
3314 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
3315 DebugLoc DL;
3316 SmallVector<RegPairInfo, 8> RegPairs;
3317 bool NeedsWinCFI = needsWinCFI(MF);
3318
3319 if (MBBI != MBB.end())
3320 DL = MBBI->getDebugLoc();
3321
3322 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
3323 if (homogeneousPrologEpilog(MF, &MBB)) {
3324 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
3325 .setMIFlag(MachineInstr::FrameDestroy);
3326 for (auto &RPI : RegPairs) {
3327 MIB.addReg(RPI.Reg1, RegState::Define);
3328 MIB.addReg(RPI.Reg2, RegState::Define);
3329 }
3330 return true;
3331 }
3332
3333 // For performance reasons, restore the SVE registers in increasing order.
3334 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
3335 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
3336 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
3337 std::reverse(PPRBegin, PPREnd);
3338 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
3339 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
3340 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
3341 std::reverse(ZPRBegin, ZPREnd);
3342
3343 bool PTrueCreated = false;
3344 for (const RegPairInfo &RPI : RegPairs) {
3345 unsigned Reg1 = RPI.Reg1;
3346 unsigned Reg2 = RPI.Reg2;
3347
3348 // Issue sequence of restores for cs regs. The last restore may be converted
3349 // to a post-increment load later by emitEpilogue if the callee-save stack
3350 // area allocation can't be combined with the local stack area allocation.
3351 // For example:
3352 // ldp fp, lr, [sp, #32] // addImm(+4)
3353 // ldp x20, x19, [sp, #16] // addImm(+2)
3354 // ldp x22, x21, [sp, #0] // addImm(+0)
3355 // Note: see comment in spillCalleeSavedRegisters()
3356 unsigned LdrOpc;
3357 unsigned Size;
3358 Align Alignment;
3359 switch (RPI.Type) {
3360 case RegPairInfo::GPR:
3361 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
3362 Size = 8;
3363 Alignment = Align(8);
3364 break;
3365 case RegPairInfo::FPR64:
3366 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
3367 Size = 8;
3368 Alignment = Align(8);
3369 break;
3370 case RegPairInfo::FPR128:
3371 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
3372 Size = 16;
3373 Alignment = Align(16);
3374 break;
3375 case RegPairInfo::ZPR:
3376 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
3377 Size = 16;
3378 Alignment = Align(16);
3379 break;
3380 case RegPairInfo::PPR:
3381 LdrOpc = AArch64::LDR_PXI;
3382 Size = 2;
3383 Alignment = Align(2);
3384 break;
3385 case RegPairInfo::VG:
3386 continue;
3387 }
3388 LLVM_DEBUG(dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
3389 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
3390 dbgs() << ") -> fi#(" << RPI.FrameIdx;
3391 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
3392 dbgs() << ")\n");
3393
3394 // Windows unwind codes require consecutive registers if registers are
3395 // paired. Make the switch here, so that the code below will restore (x,x+1)
3396 // and not (x+1,x).
3397 unsigned FrameIdxReg1 = RPI.FrameIdx;
3398 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
3399 if (NeedsWinCFI && RPI.isPaired()) {
3400 std::swap(Reg1, Reg2);
3401 std::swap(FrameIdxReg1, FrameIdxReg2);
3402 }
3403
3404 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3405 if (RPI.isPaired() && RPI.isScalable()) {
3406 [[maybe_unused]] const AArch64Subtarget &Subtarget =
3407 MF.getSubtarget<AArch64Subtarget>();
3408 unsigned PnReg = AFI->getPredicateRegForFillSpill();
3409 assert(((Subtarget.hasSVE2p1() || Subtarget.hasSME2()) && PnReg != 0) &&
3410 "Expects SVE2.1 or SME2 target and a predicate register");
3411#ifdef EXPENSIVE_CHECKS
3412 assert(!(PPRBegin < ZPRBegin) &&
3413 "Expected callee save predicate to be handled first");
3414#endif
3415 if (!PTrueCreated) {
3416 PTrueCreated = true;
3417 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
3418 .setMIFlags(MachineInstr::FrameDestroy);
3419 }
3420 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
3421 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
3422 getDefRegState(true));
3423 MIB.addMemOperand(MF.getMachineMemOperand(
3424 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3425 MachineMemOperand::MOLoad, Size, Alignment));
3426 MIB.addReg(PnReg);
3427 MIB.addReg(AArch64::SP)
3428 .addImm(RPI.Offset) // [sp, #offset*scale]
3429 // where the scale factor is implicit
3430 .setMIFlags(MachineInstr::FrameDestroy);
3431 MIB.addMemOperand(MF.getMachineMemOperand(
3432 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3433 MachineMemOperand::MOLoad, Size, Alignment));
3434 if (NeedsWinCFI)
3435 InsertSEH(MIB, TII, MachineInstr::FrameDestroy);
3436 } else {
3437 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
3438 if (RPI.isPaired()) {
3439 MIB.addReg(Reg2, getDefRegState(true));
3440 MIB.addMemOperand(MF.getMachineMemOperand(
3441 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3442 MachineMemOperand::MOLoad, Size, Alignment));
3443 }
3444 MIB.addReg(Reg1, getDefRegState(true));
3445 MIB.addReg(AArch64::SP)
3446 .addImm(RPI.Offset) // [sp, #offset*scale]
3447 // where the scale factor is implicit
3448 .setMIFlags(MachineInstr::FrameDestroy);
3449 MIB.addMemOperand(MF.getMachineMemOperand(
3450 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3451 MachineMemOperand::MOLoad, Size, Alignment));
3452 if (NeedsWinCFI)
3453 InsertSEH(MIB, TII, MachineInstr::FrameDestroy);
3454 }
3455 }
3456 return true;
3457}
3458
3459void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
3460 BitVector &SavedRegs,
3461 RegScavenger *RS) const {
3462 // All calls are tail calls in GHC calling conv, and functions have no
3463 // prologue/epilogue.
3464 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
3465 return;
3466
3467 TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);
3468 const AArch64RegisterInfo *RegInfo = static_cast<const AArch64RegisterInfo *>(
3469 MF.getSubtarget().getRegisterInfo());
3470 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
3471 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3472 unsigned UnspilledCSGPR = AArch64::NoRegister;
3473 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
3474
3475 MachineFrameInfo &MFI = MF.getFrameInfo();
3476 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
3477
3478 unsigned BasePointerReg = RegInfo->hasBasePointer(MF)
3479 ? RegInfo->getBaseRegister()
3480 : (unsigned)AArch64::NoRegister;
3481
3482 unsigned ExtraCSSpill = 0;
3483 bool HasUnpairedGPR64 = false;
3484 bool HasPairZReg = false;
3485 // Figure out which callee-saved registers to save/restore.
3486 for (unsigned i = 0; CSRegs[i]; ++i) {
3487 const unsigned Reg = CSRegs[i];
3488
3489 // Add the base pointer register to SavedRegs if it is callee-save.
3490 if (Reg == BasePointerReg)
3491 SavedRegs.set(Reg);
3492
3493 bool RegUsed = SavedRegs.test(Reg);
3494 unsigned PairedReg = AArch64::NoRegister;
3495 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
3496 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
3497 AArch64::FPR128RegClass.contains(Reg)) {
3498 // Compensate for odd numbers of GP CSRs.
3499 // For now, all the known cases of odd number of CSRs are of GPRs.
3500 if (HasUnpairedGPR64)
3501 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
3502 else
3503 PairedReg = CSRegs[i ^ 1];
3504 }
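// CSRegs[i ^ 1] flips the low bit of the index, so an even-indexed CSR
// pairs with its successor in the list and an odd-indexed one with its
// predecessor (e.g. i = 2 <-> i = 3).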
3505
3506 // If the function requires all the GP registers to save (SavedRegs),
3507 // and there are an odd number of GP CSRs at the same time (CSRegs),
3508 // PairedReg could be in a different register class from Reg, which would
3509 // lead to a FPR (usually D8) accidentally being marked saved.
3510 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
3511 PairedReg = AArch64::NoRegister;
3512 HasUnpairedGPR64 = true;
3513 }
3514 assert(PairedReg == AArch64::NoRegister ||
3515 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
3516 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
3517 AArch64::FPR128RegClass.contains(Reg, PairedReg));
3518
3519 if (!RegUsed) {
3520 if (AArch64::GPR64RegClass.contains(Reg) &&
3521 !RegInfo->isReservedReg(MF, Reg)) {
3522 UnspilledCSGPR = Reg;
3523 UnspilledCSGPRPaired = PairedReg;
3524 }
3525 continue;
3526 }
3527
3528 // MachO's compact unwind format relies on all registers being stored in
3529 // pairs.
3530 // FIXME: the usual format is actually better if unwinding isn't needed.
3531 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
3532 !SavedRegs.test(PairedReg)) {
3533 SavedRegs.set(PairedReg);
3534 if (AArch64::GPR64RegClass.contains(PairedReg) &&
3535 !RegInfo->isReservedReg(MF, PairedReg))
3536 ExtraCSSpill = PairedReg;
3537 }
3538 // Check if there is a pair of ZRegs, so that a PReg can be selected for spill/fill.
3539 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
3540 SavedRegs.test(CSRegs[i ^ 1]));
3541 }
3542
3543 if (HasPairZReg && (Subtarget.hasSVE2p1() || Subtarget.hasSME2())) {
3544 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3545 // Find a suitable predicate register for the multi-vector spill/fill
3546 // instructions.
3547 unsigned PnReg = findFreePredicateReg(SavedRegs);
3548 if (PnReg != AArch64::NoRegister)
3549 AFI->setPredicateRegForFillSpill(PnReg);
3550 // If no free callee-save has been found, assign one.
3551 if (!AFI->getPredicateRegForFillSpill() &&
3552 MF.getFunction().getCallingConv() ==
3553 CallingConv::AArch64_SVE_VectorCall) {
3554 SavedRegs.set(AArch64::P8);
3555 AFI->setPredicateRegForFillSpill(AArch64::PN8);
3556 }
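// (PN8 is the predicate-as-counter alias of P8, so marking P8 as saved
// preserves the register that the multi-vector fill/spill uses via PN8.)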
3557
3558 assert(!RegInfo->isReservedReg(MF, AFI->getPredicateRegForFillSpill()) &&
3559 "Predicate cannot be a reserved register");
3560 }
3561
3562 if (MF.getFunction().getCallingConv() == CallingConv::Win64 &&
3563 !Subtarget.isTargetWindows()) {
3564 // For Windows calling convention on a non-Windows OS, where X18 is treated
3565 // as reserved, back up X18 when entering non-Windows code (marked with the
3566 // Windows calling convention) and restore when returning regardless of
3567 // whether the individual function uses it - it might call other functions
3568 // that clobber it.
3569 SavedRegs.set(AArch64::X18);
3570 }
3571
3572 // Calculate the callee-saved stack size.
3573 unsigned CSStackSize = 0;
3574 unsigned SVECSStackSize = 0;
3575 const TargetRegisterInfo *TRI = Subtarget.getRegisterInfo();
3576 const MachineRegisterInfo &MRI = MF.getRegInfo();
3577 for (unsigned Reg : SavedRegs.set_bits()) {
3578 auto RegSize = TRI->getRegSizeInBits(Reg, MRI) / 8;
3579 if (AArch64::PPRRegClass.contains(Reg) ||
3580 AArch64::ZPRRegClass.contains(Reg))
3581 SVECSStackSize += RegSize;
3582 else
3583 CSStackSize += RegSize;
3584 }
3585
3586 // Increase the callee-saved stack size if the function has streaming mode
3587 // changes, as we will need to spill the value of the VG register.
3588 // For locally streaming functions, we spill both the streaming and
3589 // non-streaming VG value.
3590 const Function &F = MF.getFunction();
3591 SMEAttrs Attrs(F);
3592 if (AFI->hasStreamingModeChanges()) {
3593 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface())
3594 CSStackSize += 16;
3595 else
3596 CSStackSize += 8;
3597 }
3598
3599 // Save number of saved regs, so we can easily update CSStackSize later.
3600 unsigned NumSavedRegs = SavedRegs.count();
3601
3602 // The frame record needs to be created by saving the appropriate registers
3603 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
3604 if (hasFP(MF) ||
3605 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
3606 SavedRegs.set(AArch64::FP);
3607 SavedRegs.set(AArch64::LR);
3608 }
3609
3610 LLVM_DEBUG(dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
3611 for (unsigned Reg
3612 : SavedRegs.set_bits()) dbgs()
3613 << ' ' << printReg(Reg, RegInfo);
3614 dbgs() << "\n";);
3615
3616 // If any callee-saved registers are used, the frame cannot be eliminated.
3617 int64_t SVEStackSize =
3618 alignTo(SVECSStackSize + estimateSVEStackObjectOffsets(MFI), 16);
3619 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
3620
3621 // The CSR spill slots have not been allocated yet, so estimateStackSize
3622 // won't include them.
3623 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
3624
3625 // We may address some of the stack above the canonical frame address, either
3626 // for our own arguments or during a call. Include that in calculating whether
3627 // we have complicated addressing concerns.
3628 int64_t CalleeStackUsed = 0;
3629 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
3630 int64_t FixedOff = MFI.getObjectOffset(I);
3631 if (FixedOff > CalleeStackUsed) CalleeStackUsed = FixedOff;
3632 }
3633
3634 // Conservatively always assume BigStack when there are SVE spills.
3635 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
3636 CalleeStackUsed) > EstimatedStackSizeLimit;
3637 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
3638 AFI->setHasStackFrame(true);
3639
3640 // Estimate if we might need to scavenge a register at some point in order
3641 // to materialize a stack offset. If so, either spill one additional
3642 // callee-saved register or reserve a special spill slot to facilitate
3643 // register scavenging. If we already spilled an extra callee-saved register
3644 // above to keep the number of spills even, we don't need to do anything else
3645 // here.
3646 if (BigStack) {
3647 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
3648 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
3649 << " to get a scratch register.\n");
3650 SavedRegs.set(UnspilledCSGPR);
3651 ExtraCSSpill = UnspilledCSGPR;
3652
3653 // MachO's compact unwind format relies on all registers being stored in
3654 // pairs, so if we need to spill one extra for BigStack, then we need to
3655 // store the pair.
3656 if (producePairRegisters(MF)) {
3657 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
3658 // Failed to make a pair for compact unwind format, revert spilling.
3659 if (produceCompactUnwindFrame(MF)) {
3660 SavedRegs.reset(UnspilledCSGPR);
3661 ExtraCSSpill = AArch64::NoRegister;
3662 }
3663 } else
3664 SavedRegs.set(UnspilledCSGPRPaired);
3665 }
3666 }
3667
3668 // If we didn't find an extra callee-saved register to spill, create
3669 // an emergency spill slot.
3670 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
3671 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
3672 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
3673 unsigned Size = TRI->getSpillSize(RC);
3674 Align Alignment = TRI->getSpillAlign(RC);
3675 int FI = MFI.CreateStackObject(Size, Alignment, false);
3676 RS->addScavengingFrameIndex(FI);
3677 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
3678 << " as the emergency spill slot.\n");
3679 }
3680 }
3681
3682 // Add the size of any additional 64-bit GPR saves.
3683 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
3684
3685 // A Swift asynchronous context extends the frame record with a pointer
3686 // directly before FP.
3687 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
3688 CSStackSize += 8;
3689
3690 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
3691 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
3692 << EstimatedStackSize + AlignedCSStackSize
3693 << " bytes.\n");
3694
3695 assert((!MFI.isCalleeSavedInfoValid() ||
3696 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
3697 "Should not invalidate callee saved info");
3698
3699 // Round up to register pair alignment to avoid additional SP adjustment
3700 // instructions.
3701 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
3702 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
3703 AFI->setSVECalleeSavedStackSize(alignTo(SVECSStackSize, 16));
3704}
3705
3706bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
3707 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
3708 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
3709 unsigned &MaxCSFrameIndex) const {
3710 bool NeedsWinCFI = needsWinCFI(MF);
3711 // To match the canonical windows frame layout, reverse the list of
3712 // callee saved registers to get them laid out by PrologEpilogInserter
3713 // in the right order. (PrologEpilogInserter allocates stack objects top
3714 // down. Windows canonical prologs store higher numbered registers at
3715 // the top, thus have the CSI array start from the highest registers.)
3716 if (NeedsWinCFI)
3717 std::reverse(CSI.begin(), CSI.end());
3718
3719 if (CSI.empty())
3720 return true; // Early exit if no callee saved registers are modified!
3721
3722 // Now that we know which registers need to be saved and restored, allocate
3723 // stack slots for them.
3724 MachineFrameInfo &MFI = MF.getFrameInfo();
3725 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3726
3727 bool UsesWinAAPCS = isTargetWindows(MF);
3728 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
3729 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
3730 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3731 if ((unsigned)FrameIdx < MinCSFrameIndex) MinCSFrameIndex = FrameIdx;
3732 if ((unsigned)FrameIdx > MaxCSFrameIndex) MaxCSFrameIndex = FrameIdx;
3733 }
3734
3735 // Insert VG into the list of CSRs, immediately before LR if saved.
3736 if (AFI->hasStreamingModeChanges()) {
3737 std::vector<CalleeSavedInfo> VGSaves;
3738 SMEAttrs Attrs(MF.getFunction());
3739
3740 auto VGInfo = CalleeSavedInfo(AArch64::VG);
3741 VGInfo.setRestored(false);
3742 VGSaves.push_back(VGInfo);
3743
3744 // Add VG again if the function is locally-streaming, as we will spill two
3745 // values.
3746 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface())
3747 VGSaves.push_back(VGInfo);
3748
3749 bool InsertBeforeLR = false;
3750
3751 for (unsigned I = 0; I < CSI.size(); I++)
3752 if (CSI[I].getReg() == AArch64::LR) {
3753 InsertBeforeLR = true;
3754 CSI.insert(CSI.begin() + I, VGSaves.begin(), VGSaves.end());
3755 break;
3756 }
3757
3758 if (!InsertBeforeLR)
3759 CSI.insert(CSI.end(), VGSaves.begin(), VGSaves.end());
3760 }
3761
3762 for (auto &CS : CSI) {
3763 Register Reg = CS.getReg();
3764 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
3765
3766 unsigned Size = RegInfo->getSpillSize(*RC);
3767 Align Alignment(RegInfo->getSpillAlign(*RC));
3768 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
3769 CS.setFrameIdx(FrameIdx);
3770
3771 if ((unsigned)FrameIdx < MinCSFrameIndex) MinCSFrameIndex = FrameIdx;
3772 if ((unsigned)FrameIdx > MaxCSFrameIndex) MaxCSFrameIndex = FrameIdx;
3773
3774 // Grab 8 bytes below FP for the extended asynchronous frame info.
3775 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
3776 Reg == AArch64::FP) {
3777 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
3778 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3779 if ((unsigned)FrameIdx < MinCSFrameIndex) MinCSFrameIndex = FrameIdx;
3780 if ((unsigned)FrameIdx > MaxCSFrameIndex) MaxCSFrameIndex = FrameIdx;
3781 }
3782 }
3783 return true;
3784}
3785
3786bool AArch64FrameLowering::enableStackSlotScavenging(
3787 const MachineFunction &MF) const {
3788 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3789 // If the function has streaming-mode changes, don't scavenge a
3790 // spillslot in the callee-save area, as that might require an
3791 // 'addvl' in the streaming-mode-changing call-sequence when the
3792 // function doesn't use a FP.
3793 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
3794 return false;
3795 return AFI->hasCalleeSaveStackFreeSpace();
3796}
3797
3798/// Returns true if there are any SVE callee saves.
3799static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI,
3800 int &Min, int &Max) {
3801 Min = std::numeric_limits<int>::max();
3802 Max = std::numeric_limits<int>::min();
3803
3804 if (!MFI.isCalleeSavedInfoValid())
3805 return false;
3806
3807 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
3808 for (auto &CS : CSI) {
3809 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
3810 AArch64::PPRRegClass.contains(CS.getReg())) {
3811 assert((Max == std::numeric_limits<int>::min() ||
3812 Max + 1 == CS.getFrameIdx()) &&
3813 "SVE CalleeSaves are not consecutive");
3814
3815 Min = std::min(Min, CS.getFrameIdx());
3816 Max = std::max(Max, CS.getFrameIdx());
3817 }
3818 }
3819 return Min != std::numeric_limits<int>::max();
3820}
3821
3822// Process all the SVE stack objects and determine offsets for each
3823// object. If AssignOffsets is true, the offsets get assigned.
3824// Fills in the first and last callee-saved frame indices into
3825// Min/MaxCSFrameIndex, respectively.
3826// Returns the size of the stack.
3827static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI,
3828 int &MinCSFrameIndex,
3829 int &MaxCSFrameIndex,
3830 bool AssignOffsets) {
3831#ifndef NDEBUG
3832 // First process all fixed stack objects.
3833 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
3834 assert(MFI.getStackID(I) != TargetStackID::ScalableVector &&
3835 "SVE vectors should never be passed on the stack by value, only by "
3836 "reference.");
3837#endif
3838
3839 auto Assign = [&MFI](int FI, int64_t Offset) {
3840 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
3841 MFI.setObjectOffset(FI, Offset);
3842 };
3843
3844 int64_t Offset = 0;
3845
3846 // Then process all callee saved slots.
3847 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
3848 // Assign offsets to the callee save slots.
3849 for (int I = MinCSFrameIndex; I <= MaxCSFrameIndex; ++I) {
3850 Offset += MFI.getObjectSize(I);
3851 Offset = alignTo(Offset, MFI.getObjectAlign(I));
3852 if (AssignOffsets)
3853 Assign(I, -Offset);
3854 }
3855 }
3856
3857 // Ensure that the callee-save area is aligned to 16 bytes.
3858 Offset = alignTo(Offset, Align(16U));
3859
3860 // Create a buffer of SVE objects to allocate and sort it.
3861 SmallVector<int, 8> ObjectsToAllocate;
3862 // If we have a stack protector, and we've previously decided that we have SVE
3863 // objects on the stack and thus need it to go in the SVE stack area, then it
3864 // needs to go first.
3865 int StackProtectorFI = -1;
3866 if (MFI.hasStackProtectorIndex()) {
3867 StackProtectorFI = MFI.getStackProtectorIndex();
3868 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
3869 ObjectsToAllocate.push_back(StackProtectorFI);
3870 }
3871 for (int I = 0, E = MFI.getObjectIndexEnd(); I != E; ++I) {
3872 unsigned StackID = MFI.getStackID(I);
3873 if (StackID != TargetStackID::ScalableVector)
3874 continue;
3875 if (I == StackProtectorFI)
3876 continue;
3877 if (MaxCSFrameIndex >= I && I >= MinCSFrameIndex)
3878 continue;
3879 if (MFI.isDeadObjectIndex(I))
3880 continue;
3881
3882 ObjectsToAllocate.push_back(I);
3883 }
3884
3885 // Allocate all SVE locals and spills
3886 for (unsigned FI : ObjectsToAllocate) {
3887 Align Alignment = MFI.getObjectAlign(FI);
3888 // FIXME: Given that the length of SVE vectors is not necessarily a power of
3889 // two, we'd need to align every object dynamically at runtime if the
3890 // alignment is larger than 16. This is not yet supported.
3891 if (Alignment > Align(16))
3892 report_fatal_error(
3893 "Alignment of scalable vectors > 16 bytes is not yet supported");
3894
3895 Offset = alignTo(Offset + MFI.getObjectSize(FI), Alignment);
3896 if (AssignOffsets)
3897 Assign(FI, -Offset);
3898 }
3899
3900 return Offset;
3901}
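// A hypothetical example: two 16-byte ZPR callee saves give Offset = 32
// after the callee-save loop; a following 16-byte SVE local is then
// assigned SP[-48]. All these offsets are in scalable (vscale-scaled) bytes.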
3902
3903int64_t AArch64FrameLowering::estimateSVEStackObjectOffsets(
3904 MachineFrameInfo &MFI) const {
3905 int MinCSFrameIndex, MaxCSFrameIndex;
3906 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex, false);
3907}
3908
3909int64_t AArch64FrameLowering::assignSVEStackObjectOffsets(
3910 MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex) const {
3911 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex,
3912 true);
3913}
3914
3915void AArch64FrameLowering::processFunctionBeforeFrameFinalized(
3916 MachineFunction &MF, RegScavenger *RS) const {
3917 MachineFrameInfo &MFI = MF.getFrameInfo();
3918
3920 "Upwards growing stack unsupported");
3921
3922 int MinCSFrameIndex, MaxCSFrameIndex;
3923 int64_t SVEStackSize =
3924 assignSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex);
3925
3926 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3927 AFI->setStackSizeSVE(alignTo(SVEStackSize, 16U));
3928 AFI->setMinMaxSVECSFrameIndex(MinCSFrameIndex, MaxCSFrameIndex);
3929
3930 // If this function isn't doing Win64-style C++ EH, we don't need to do
3931 // anything.
3932 if (!MF.hasEHFunclets())
3933 return;
3934 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
3935 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
3936
3937 MachineBasicBlock &MBB = MF.front();
3938 auto MBBI = MBB.begin();
3939 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
3940 ++MBBI;
3941
3942 // Create an UnwindHelp object.
3943 // The UnwindHelp object is allocated at the start of the fixed object area
3944 int64_t FixedObject =
3945 getFixedObjectSize(MF, AFI, /*IsWin64*/ true, /*IsFunclet*/ false);
3946 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8,
3947 /*SPOffset*/ -FixedObject,
3948 /*IsImmutable=*/false);
3949 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
3950
3951 // We need to store -2 into the UnwindHelp object at the start of the
3952 // function.
3953 DebugLoc DL;
3954 RS->enterBasicBlockEnd(MBB);
3955 RS->backward(MBBI);
3956 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
3957 assert(DstReg && "There must be a free register after frame setup");
3958 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
3959 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
3960 .addReg(DstReg, getKillRegState(true))
3961 .addFrameIndex(UnwindHelpFI)
3962 .addImm(0);
3963}
3964
3965namespace {
3966struct TagStoreInstr {
3967 MachineInstr *MI;
3968 int64_t Offset, Size;
3969 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
3970 : MI(MI), Offset(Offset), Size(Size) {}
3971};
3972
3973class TagStoreEdit {
3974 MachineFunction *MF;
3975 MachineBasicBlock *MBB;
3976 MachineRegisterInfo *MRI;
3977 // Tag store instructions that are being replaced.
3978 SmallVector<TagStoreInstr, 8> TagStores;
3979 // Combined memref arguments of the above instructions.
3980 SmallVector<MachineMemOperand *, 8> CombinedMemRefs;
3981
3982 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
3983 // FrameRegOffset + Size) with the address tag of SP.
3984 Register FrameReg;
3985 StackOffset FrameRegOffset;
3986 int64_t Size;
3987 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
3988 // end.
3989 std::optional<int64_t> FrameRegUpdate;
3990 // MIFlags for any FrameReg updating instructions.
3991 unsigned FrameRegUpdateFlags;
3992
3993 // Use zeroing instruction variants.
3994 bool ZeroData;
3995 DebugLoc DL;
3996
3997 void emitUnrolled(MachineBasicBlock::iterator InsertI);
3998 void emitLoop(MachineBasicBlock::iterator InsertI);
3999
4000public:
4001 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
4002 : MBB(MBB), ZeroData(ZeroData) {
4003 MF = MBB->getParent();
4004 MRI = &MF->getRegInfo();
4005 }
4006 // Add an instruction to be replaced. Instructions must be added in the
4007 // ascending order of Offset, and have to be adjacent.
4008 void addInstruction(TagStoreInstr I) {
4009 assert((TagStores.empty() ||
4010 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
4011 "Non-adjacent tag store instructions.");
4012 TagStores.push_back(I);
4013 }
4014 void clear() { TagStores.clear(); }
4015 // Emit equivalent code at the given location, and erase the current set of
4016 // instructions. May skip if the replacement is not profitable. May invalidate
4017 // the input iterator and replace it with a valid one.
4018 void emitCode(MachineBasicBlock::iterator &InsertI,
4019 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
4020};
4021
4022void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
4023 const AArch64InstrInfo *TII =
4024 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4025
4026 const int64_t kMinOffset = -256 * 16;
4027 const int64_t kMaxOffset = 255 * 16;
4028
4029 Register BaseReg = FrameReg;
4030 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
4031 if (BaseRegOffsetBytes < kMinOffset ||
4032 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
4033 // BaseReg can be FP, which is not necessarily aligned to 16-bytes. In
4034 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
4035 // is required for the offset of ST2G.
4036 BaseRegOffsetBytes % 16 != 0) {
4037 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4038 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
4039 StackOffset::getFixed(BaseRegOffsetBytes), TII);
4040 BaseReg = ScratchReg;
4041 BaseRegOffsetBytes = 0;
4042 }
4043
4044 MachineInstr *LastI = nullptr;
4045 while (Size) {
4046 int64_t InstrSize = (Size > 16) ? 32 : 16;
4047 unsigned Opcode =
4048 InstrSize == 16
4049 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
4050 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
4051 assert(BaseRegOffsetBytes % 16 == 0);
4052 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
4053 .addReg(AArch64::SP)
4054 .addReg(BaseReg)
4055 .addImm(BaseRegOffsetBytes / 16)
4056 .setMemRefs(CombinedMemRefs);
4057 // A store to [BaseReg, #0] should go last for an opportunity to fold the
4058 // final SP adjustment in the epilogue.
4059 if (BaseRegOffsetBytes == 0)
4060 LastI = I;
4061 BaseRegOffsetBytes += InstrSize;
4062 Size -= InstrSize;
4063 }
4064
4065 if (LastI)
4066 MBB->splice(InsertI, MBB, LastI);
4067}
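// For example, Size = 48 emits one ST2G (32 bytes) followed by one STG
// (16 bytes); whichever store hits [BaseReg, #0] is spliced to the end so
// the epilogue's final SP adjustment can be folded into it.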
4068
4069void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
4070 const AArch64InstrInfo *TII =
4071 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4072
4073 Register BaseReg = FrameRegUpdate
4074 ? FrameReg
4075 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4076 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4077
4078 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
4079
4080 int64_t LoopSize = Size;
4081 // If the loop size is not a multiple of 32, split off one 16-byte store at
4082 // the end to fold BaseReg update into.
4083 if (FrameRegUpdate && *FrameRegUpdate)
4084 LoopSize -= LoopSize % 32;
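// E.g. Size = 112 with a pending BaseReg update gives LoopSize = 96; the
// remaining 16 bytes are tagged by the post-indexed ST(Z)G below, which
// also absorbs the BaseReg update.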
4085 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
4086 TII->get(ZeroData ? AArch64::STZGloop_wback
4087 : AArch64::STGloop_wback))
4088 .addDef(SizeReg)
4089 .addDef(BaseReg)
4090 .addImm(LoopSize)
4091 .addReg(BaseReg)
4092 .setMemRefs(CombinedMemRefs);
4093 if (FrameRegUpdate)
4094 LoopI->setFlags(FrameRegUpdateFlags);
4095
4096 int64_t ExtraBaseRegUpdate =
4097 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
4098 if (LoopSize < Size) {
4099 assert(FrameRegUpdate);
4100 assert(Size - LoopSize == 16);
4101 // Tag 16 more bytes at BaseReg and update BaseReg.
4102 BuildMI(*MBB, InsertI, DL,
4103 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
4104 .addDef(BaseReg)
4105 .addReg(BaseReg)
4106 .addReg(BaseReg)
4107 .addImm(1 + ExtraBaseRegUpdate / 16)
4108 .setMemRefs(CombinedMemRefs)
4109 .setMIFlags(FrameRegUpdateFlags);
4110 } else if (ExtraBaseRegUpdate) {
4111 // Update BaseReg.
4112 BuildMI(
4113 *MBB, InsertI, DL,
4114 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
4115 .addDef(BaseReg)
4116 .addReg(BaseReg)
4117 .addImm(std::abs(ExtraBaseRegUpdate))
4118 .addImm(0)
4119 .setMIFlags(FrameRegUpdateFlags);
4120 }
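// The post-indexed ST(Z)G immediate above is scaled by 16, so
// addImm(1 + ExtraBaseRegUpdate / 16) advances BaseReg by the 16 bytes just
// tagged plus the extra adjustment; the ADD/SUB path applies the remainder
// directly when no trailing store is needed.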
4121}
4122
4123// Check if *II is a register update that can be merged into the STGloop that
4124// ends at (Reg + Size). *TotalOffset receives the required adjustment to Reg
4125// after the end of the loop.
4126bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
4127 int64_t Size, int64_t *TotalOffset) {
4128 MachineInstr &MI = *II;
4129 if ((MI.getOpcode() == AArch64::ADDXri ||
4130 MI.getOpcode() == AArch64::SUBXri) &&
4131 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
4132 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
4133 int64_t Offset = MI.getOperand(2).getImm() << Shift;
4134 if (MI.getOpcode() == AArch64::SUBXri)
4135 Offset = -Offset;
4136 int64_t AbsPostOffset = std::abs(Offset - Size);
4137 const int64_t kMaxOffset =
4138 0xFFF; // Max encoding for unshifted ADDXri / SUBXri
4139 if (AbsPostOffset <= kMaxOffset && AbsPostOffset % 16 == 0) {
4140 *TotalOffset = Offset;
4141 return true;
4142 }
4143 }
4144 return false;
4145}
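// For example, for a loop ending at Reg + 512, a following
// "ADD Reg, Reg, #512" yields AbsPostOffset = 0 and merges cleanly, while
// "ADD Reg, Reg, #528" merges with a 16-byte post-update left over.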
4146
4147void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
4148 SmallVectorImpl<MachineMemOperand *> &MemRefs) {
4149 MemRefs.clear();
4150 for (auto &TS : TSE) {
4151 MachineInstr *MI = TS.MI;
4152 // An instruction without memory operands may access anything. Be
4153 // conservative and return an empty list.
4154 if (MI->memoperands_empty()) {
4155 MemRefs.clear();
4156 return;
4157 }
4158 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
4159 }
4160}
4161
4162void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
4163 const AArch64FrameLowering *TFI,
4164 bool TryMergeSPUpdate) {
4165 if (TagStores.empty())
4166 return;
4167 TagStoreInstr &FirstTagStore = TagStores[0];
4168 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
4169 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
4170 DL = TagStores[0].MI->getDebugLoc();
4171
4172 Register Reg;
4173 FrameRegOffset = TFI->resolveFrameOffsetReference(
4174 *MF, FirstTagStore.Offset, false /*isFixed*/, false /*isSVE*/, Reg,
4175 /*PreferFP=*/false, /*ForSimm=*/true);
4176 FrameReg = Reg;
4177 FrameRegUpdate = std::nullopt;
4178
4179 mergeMemRefs(TagStores, CombinedMemRefs);
4180
4181 LLVM_DEBUG(dbgs() << "Replacing adjacent STG instructions:\n";
4182 for (const auto &Instr
4183 : TagStores) { dbgs() << " " << *Instr.MI; });
4184
4185 // Size threshold where a loop becomes shorter than a linear sequence of
4186 // tagging instructions.
4187 const int kSetTagLoopThreshold = 176;
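// (176 bytes is 11 tag granules, i.e. five ST2G plus one STG when unrolled;
// beyond that the loop form is expected to be no larger.)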
4188 if (Size < kSetTagLoopThreshold) {
4189 if (TagStores.size() < 2)
4190 return;
4191 emitUnrolled(InsertI);
4192 } else {
4193 MachineInstr *UpdateInstr = nullptr;
4194 int64_t TotalOffset = 0;
4195 if (TryMergeSPUpdate) {
4196 // See if we can merge base register update into the STGloop.
4197 // This is done in AArch64LoadStoreOptimizer for "normal" stores,
4198 // but STGloop is way too unusual for that, and also it only
4199 // realistically happens in function epilogue. Also, STGloop is expanded
4200 // before that pass.
4201 if (InsertI != MBB->end() &&
4202 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
4203 &TotalOffset)) {
4204 UpdateInstr = &*InsertI++;
4205 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
4206 << *UpdateInstr);
4207 }
4208 }
4209
4210 if (!UpdateInstr && TagStores.size() < 2)
4211 return;
4212
4213 if (UpdateInstr) {
4214 FrameRegUpdate = TotalOffset;
4215 FrameRegUpdateFlags = UpdateInstr->getFlags();
4216 }
4217 emitLoop(InsertI);
4218 if (UpdateInstr)
4219 UpdateInstr->eraseFromParent();
4220 }
4221
4222 for (auto &TS : TagStores)
4223 TS.MI->eraseFromParent();
4224}
4225
4226bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
4227 int64_t &Size, bool &ZeroData) {
4228 MachineFunction &MF = *MI.getParent()->getParent();
4229 const MachineFrameInfo &MFI = MF.getFrameInfo();
4230
4231 unsigned Opcode = MI.getOpcode();
4232 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
4233 Opcode == AArch64::STZ2Gi);
4234
4235 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
4236 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
4237 return false;
4238 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
4239 return false;
4240 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
4241 Size = MI.getOperand(2).getImm();
4242 return true;
4243 }
4244
4245 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
4246 Size = 16;
4247 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
4248 Size = 32;
4249 else
4250 return false;
4251
4252 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
4253 return false;
4254
4255 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
4256 16 * MI.getOperand(2).getImm();
4257 return true;
4258}
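// (The STG/ST2G immediate operand is in 16-byte granules, hence the
// frame-object offset plus 16 * imm above.)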
4259
4260// Detect a run of memory tagging instructions for adjacent stack frame slots,
4261// and replace them with a shorter instruction sequence:
4262// * replace STG + STG with ST2G
4263// * replace STGloop + STGloop with STGloop
4264// This code needs to run when stack slot offsets are already known, but before
4265// FrameIndex operands in STG instructions are eliminated.
4266MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
4267 const AArch64FrameLowering *TFI,
4268 RegScavenger *RS) {
4269 bool FirstZeroData;
4270 int64_t Size, Offset;
4271 MachineInstr &MI = *II;
4272 MachineBasicBlock *MBB = MI.getParent();
4273 MachineBasicBlock::iterator NextI = ++II;
4274 if (&MI == &MBB->instr_back())
4275 return II;
4276 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
4277 return II;
4278
4279 SmallVector<TagStoreInstr, 8> Instrs;
4280 Instrs.emplace_back(&MI, Offset, Size);
4281
4282 constexpr int kScanLimit = 10;
4283 int Count = 0;
4284 for (MachineBasicBlock::iterator E = MBB->end();
4285 NextI != E && Count < kScanLimit; ++NextI) {
4286 MachineInstr &MI = *NextI;
4287 bool ZeroData;
4288 int64_t Size, Offset;
4289 // Collect instructions that update memory tags with a FrameIndex operand
4290 // and (when applicable) constant size, and whose output registers are dead
4291 // (the latter is almost always the case in practice). Since these
4292 // instructions effectively have no inputs or outputs, we are free to skip
4293 // any non-aliasing instructions in between without tracking used registers.
4294 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
4295 if (ZeroData != FirstZeroData)
4296 break;
4297 Instrs.emplace_back(&MI, Offset, Size);
4298 continue;
4299 }
4300
4301 // Only count non-transient, non-tagging instructions toward the scan
4302 // limit.
4303 if (!MI.isTransient())
4304 ++Count;
4305
4306 // Just in case, stop before the epilogue code starts.
4307 if (MI.getFlag(MachineInstr::FrameSetup) ||
4308 MI.getFlag(MachineInstr::FrameDestroy))
4309 break;
4310
4311 // Reject anything that may alias the collected instructions.
4312 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects())
4313 break;
4314 }
4315
4316 // New code will be inserted after the last tagging instruction we've found.
4317 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
4318
4319 // All the gathered stack tag instructions are merged and placed after the
4320 // last tag store in the list. Before merging, check whether the nzcv flag
4321 // is live at the insertion point; otherwise it might get clobbered if any
4322 // stg loops are present.
4323
4324 // FIXME: This approach of bailing out of the merge is conservative in some
4325 // ways: the liveness check is done even when no stg loops remain after
4326 // merging the insert list (where it is not needed).
4327 LivePhysRegs LiveRegs(*(MBB->getParent()->getSubtarget().getRegisterInfo()));
4328 LiveRegs.addLiveOuts(*MBB);
4329 for (auto I = MBB->rbegin();; ++I) {
4330 MachineInstr &MI = *I;
4331 if (MI == InsertI)
4332 break;
4333 LiveRegs.stepBackward(*I);
4334 }
4335 InsertI++;
4336 if (LiveRegs.contains(AArch64::NZCV))
4337 return InsertI;
4338
4339 llvm::stable_sort(Instrs,
4340 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
4341 return Left.Offset < Right.Offset;
4342 });
4343
4344 // Make sure that we don't have any overlapping stores.
4345 int64_t CurOffset = Instrs[0].Offset;
4346 for (auto &Instr : Instrs) {
4347 if (CurOffset > Instr.Offset)
4348 return NextI;
4349 CurOffset = Instr.Offset + Instr.Size;
4350 }
4351
4352 // Find contiguous runs of tagged memory and emit shorter instruction
4353 // sequences for them when possible.
4354 TagStoreEdit TSE(MBB, FirstZeroData);
4355 std::optional<int64_t> EndOffset;
4356 for (auto &Instr : Instrs) {
4357 if (EndOffset && *EndOffset != Instr.Offset) {
4358 // Found a gap.
4359 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
4360 TSE.clear();
4361 }
4362
4363 TSE.addInstruction(Instr);
4364 EndOffset = Instr.Offset + Instr.Size;
4365 }
4366
4367 const MachineFunction *MF = MBB->getParent();
4368 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
4369 TSE.emitCode(
4370 InsertI, TFI, /*TryMergeSPUpdate = */
4371 !MF->getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(*MF));
4372
4373 return InsertI;
4374}
4375} // namespace
4376
4377MachineBasicBlock::iterator emitVGSaveRestore(MachineBasicBlock::iterator II,
4378 const AArch64FrameLowering *TFI) {
4379 MachineInstr &MI = *II;
4380 MachineBasicBlock *MBB = MI.getParent();
4381 MachineFunction *MF = MBB->getParent();
4382
4383 if (MI.getOpcode() != AArch64::VGSavePseudo &&
4384 MI.getOpcode() != AArch64::VGRestorePseudo)
4385 return II;
4386
4387 SMEAttrs FuncAttrs(MF->getFunction());
4388 bool LocallyStreaming =
4389 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
4390 const AArch64FunctionInfo *AFI = MF->getInfo<AArch64FunctionInfo>();
4391 const TargetRegisterInfo *TRI = MF->getSubtarget().getRegisterInfo();
4392 const AArch64InstrInfo *TII =
4393 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4394
4395 int64_t VGFrameIdx =
4396 LocallyStreaming ? AFI->getStreamingVGIdx() : AFI->getVGIdx();
4397 assert(VGFrameIdx != std::numeric_limits<int>::max() &&
4398 "Expected FrameIdx for VG");
4399
4400 unsigned CFIIndex;
4401 if (MI.getOpcode() == AArch64::VGSavePseudo) {
4402 const MachineFrameInfo &MFI = MF->getFrameInfo();
4403 int64_t Offset =
4404 MFI.getObjectOffset(VGFrameIdx) - TFI->getOffsetOfLocalArea();
4405 CFIIndex = MF->addFrameInst(MCCFIInstruction::createOffset(
4406 nullptr, TRI->getDwarfRegNum(AArch64::VG, true), Offset));
4407 } else
4408 CFIIndex = MF->addFrameInst(MCCFIInstruction::createRestore(
4409 nullptr, TRI->getDwarfRegNum(AArch64::VG, true)));
4410
4411 MachineInstr *UnwindInst = BuildMI(*MBB, II, II->getDebugLoc(),
4412 TII->get(TargetOpcode::CFI_INSTRUCTION))
4413 .addCFIIndex(CFIIndex);
4414
4415 MI.eraseFromParent();
4416 return UnwindInst->getIterator();
4417}
4418
4419void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
4420 MachineFunction &MF, RegScavenger *RS = nullptr) const {
4421 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
4422 for (auto &BB : MF)
4423 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
4424 if (AFI->hasStreamingModeChanges())
4425 II = emitVGSaveRestore(II, this);
4426 if (StackTaggingMergeSetTag)
4427 II = tryMergeAdjacentSTG(II, this, RS);
4428 }
4429}
4430
4431/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
4432/// before the update. This is easily retrieved as it is exactly the offset
4433/// that is set in processFunctionBeforeFrameFinalized.
4434StackOffset AArch64FrameLowering::getFrameIndexReferencePreferSP(
4435 const MachineFunction &MF, int FI, Register &FrameReg,
4436 bool IgnoreSPUpdates) const {
4437 const MachineFrameInfo &MFI = MF.getFrameInfo();
4438 if (IgnoreSPUpdates) {
4439 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
4440 << MFI.getObjectOffset(FI) << "\n");
4441 FrameReg = AArch64::SP;
4442 return StackOffset::getFixed(MFI.getObjectOffset(FI));
4443 }
4444
4445 // Go to common code if we cannot provide sp + offset.
4446 if (MFI.hasVarSizedObjects() ||
4447 MF.getInfo<AArch64FunctionInfo>()->getStackSizeSVE() ||
4448 MF.getSubtarget().getRegisterInfo()->hasStackRealignment(MF))
4449 return getFrameIndexReference(MF, FI, FrameReg);
4450
4450
4451 FrameReg = AArch64::SP;
4452 return getStackOffset(MF, MFI.getObjectOffset(FI));
4453}
4454
4455/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
4456/// the parent's frame pointer.
4457unsigned AArch64FrameLowering::getWinEHParentFrameOffset(
4458 const MachineFunction &MF) const {
4459 return 0;
4460}
4461
4462/// Funclets only need to account for space for the callee saved registers,
4463/// as the locals are accounted for in the parent's stack frame.
4464unsigned AArch64FrameLowering::getWinEHFuncletFrameSize(
4465 const MachineFunction &MF) const {
4466 // This is the size of the pushed CSRs.
4467 unsigned CSSize =
4468 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
4469 // This is the amount of stack a funclet needs to allocate.
4470 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
4471 getStackAlign());
4472}
4473
4474namespace {
4475struct FrameObject {
4476 bool IsValid = false;
4477 // Index of the object in MFI.
4478 int ObjectIndex = 0;
4479 // Group ID this object belongs to.
4480 int GroupIndex = -1;
4481 // This object should be placed first (closest to SP).
4482 bool ObjectFirst = false;
4483 // This object's group (which always contains the object with
4484 // ObjectFirst==true) should be placed first.
4485 bool GroupFirst = false;
4486};
4487
4488class GroupBuilder {
4489 SmallVector<int, 8> CurrentMembers;
4490 int NextGroupIndex = 0;
4491 std::vector<FrameObject> &Objects;
4492
4493public:
4494 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
4495 void AddMember(int Index) { CurrentMembers.push_back(Index); }
4496 void EndCurrentGroup() {
4497 if (CurrentMembers.size() > 1) {
4498 // Create a new group with the current member list. This might remove them
4499 // from their pre-existing groups. That's OK, dealing with overlapping
4500 // groups is too hard and unlikely to make a difference.
4501 LLVM_DEBUG(dbgs() << "group:");
4502 for (int Index : CurrentMembers) {
4503 Objects[Index].GroupIndex = NextGroupIndex;
4504 LLVM_DEBUG(dbgs() << " " << Index);
4505 }
4506 LLVM_DEBUG(dbgs() << "\n");
4507 NextGroupIndex++;
4508 }
4509 CurrentMembers.clear();
4510 }
4511};
4512
4513bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
4514 // Objects at a lower index are closer to FP; objects at a higher index are
4515 // closer to SP.
4516 //
4517 // For consistency in our comparison, all invalid objects are placed
4518 // at the end. This also allows us to stop walking when we hit the
4519 // first invalid item after it's all sorted.
4520 //
4521 // The "first" object goes first (closest to SP), followed by the members of
4522 // the "first" group.
4523 //
4524 // The rest are sorted by the group index to keep the groups together.
4525 // Higher numbered groups are more likely to be around longer (i.e. untagged
4526 // in the function epilogue and not at some earlier point). Place them closer
4527 // to SP.
4528 //
4529 // If all else equal, sort by the object index to keep the objects in the
4530 // original order.
4531 return std::make_tuple(!A.IsValid, A.ObjectFirst, A.GroupFirst, A.GroupIndex,
4532 A.ObjectIndex) <
4533 std::make_tuple(!B.IsValid, B.ObjectFirst, B.GroupFirst, B.GroupIndex,
4534 B.ObjectIndex);
4535}
4536} // namespace
4537
4538void AArch64FrameLowering::orderFrameObjects(
4539 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
4540 if (!OrderFrameObjects || ObjectsToAllocate.empty())
4541 return;
4542
4543 const MachineFrameInfo &MFI = MF.getFrameInfo();
4544 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
4545 for (auto &Obj : ObjectsToAllocate) {
4546 FrameObjects[Obj].IsValid = true;
4547 FrameObjects[Obj].ObjectIndex = Obj;
4548 }
4549
4550 // Identify stack slots that are tagged at the same time.
4551 GroupBuilder GB(FrameObjects);
4552 for (auto &MBB : MF) {
4553 for (auto &MI : MBB) {
4554 if (MI.isDebugInstr())
4555 continue;
4556 int OpIndex;
4557 switch (MI.getOpcode()) {
4558 case AArch64::STGloop:
4559 case AArch64::STZGloop:
4560 OpIndex = 3;
4561 break;
4562 case AArch64::STGi:
4563 case AArch64::STZGi:
4564 case AArch64::ST2Gi:
4565 case AArch64::STZ2Gi:
4566 OpIndex = 1;
4567 break;
4568 default:
4569 OpIndex = -1;
4570 }
4571
4572 int TaggedFI = -1;
4573 if (OpIndex >= 0) {
4574 const MachineOperand &MO = MI.getOperand(OpIndex);
4575 if (MO.isFI()) {
4576 int FI = MO.getIndex();
4577 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
4578 FrameObjects[FI].IsValid)
4579 TaggedFI = FI;
4580 }
4581 }
4582
4583 // If this is a stack tagging instruction for a slot that is not part of a
4584 // group yet, either start a new group or add it to the current one.
4585 if (TaggedFI >= 0)
4586 GB.AddMember(TaggedFI);
4587 else
4588 GB.EndCurrentGroup();
4589 }
4590 // Groups should never span multiple basic blocks.
4591 GB.EndCurrentGroup();
4592 }
4593
4594 // If the function's tagged base pointer is pinned to a stack slot, we want to
4595 // put that slot first when possible. This will likely place it at SP + 0,
4596 // and save one instruction when generating the base pointer because IRG does
4597 // not allow an immediate offset.
4598 const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
4599 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
4600 if (TBPI) {
4601 FrameObjects[*TBPI].ObjectFirst = true;
4602 FrameObjects[*TBPI].GroupFirst = true;
4603 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
4604 if (FirstGroupIndex >= 0)
4605 for (FrameObject &Object : FrameObjects)
4606 if (Object.GroupIndex == FirstGroupIndex)
4607 Object.GroupFirst = true;
4608 }
4609
4610 llvm::stable_sort(FrameObjects, FrameObjectCompare);
4611
4612 int i = 0;
4613 for (auto &Obj : FrameObjects) {
4614 // All invalid items are sorted at the end, so it's safe to stop.
4615 if (!Obj.IsValid)
4616 break;
4617 ObjectsToAllocate[i++] = Obj.ObjectIndex;
4618 }
4619
4620 LLVM_DEBUG(dbgs() << "Final frame order:\n"; for (auto &Obj
4621 : FrameObjects) {
4622 if (!Obj.IsValid)
4623 break;
4624 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
4625 if (Obj.ObjectFirst)
4626 dbgs() << ", first";
4627 if (Obj.GroupFirst)
4628 dbgs() << ", group-first";
4629 dbgs() << "\n";
4630 });
4631}
4632
4633/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
4634/// least every ProbeSize bytes. Returns an iterator of the first instruction
4635/// after the loop. The difference between SP and TargetReg must be an exact
4636/// multiple of ProbeSize.
4637MachineBasicBlock::iterator
4638AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
4639 MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
4640 Register TargetReg) const {
4641 MachineBasicBlock &MBB = *MBBI->getParent();
4642 MachineFunction &MF = *MBB.getParent();
4643 const AArch64InstrInfo *TII =
4644 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
4645 DebugLoc DL = MBB.findDebugLoc(MBBI);
4646
4647 MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
4648 MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
4649 MF.insert(MBBInsertPoint, LoopMBB);
4650 MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
4651 MF.insert(MBBInsertPoint, ExitMBB);
4652
4653 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
4654 // in SUB).
4655 emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
4656 StackOffset::getFixed(-ProbeSize), TII,
4657 MachineInstr::FrameSetup);
4658 // STR XZR, [SP]
4659 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::STRXui))
4660 .addReg(AArch64::XZR)
4661 .addReg(AArch64::SP)
4662 .addImm(0)
4663 .setMIFlags(MachineInstr::FrameSetup);
4664 // CMP SP, TargetReg
4665 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
4666 AArch64::XZR)
4667 .addReg(AArch64::SP)
4668 .addReg(TargetReg)
4669 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
4670 .setMIFlags(MachineInstr::FrameSetup);
4671 // B.CC Loop
4672 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
4673 .addImm(AArch64CC::NE)
4674 .addMBB(LoopMBB)
4675 .setMIFlags(MachineInstr::FrameSetup);
4676
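// At this point LoopMBB contains, in effect:
// sub sp, sp, #ProbeSize
// str xzr, [sp]
// cmp sp, TargetReg
// b.ne LoopMBB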
4677 LoopMBB->addSuccessor(ExitMBB);
4678 LoopMBB->addSuccessor(LoopMBB);
4679 // Synthesize the exit MBB.
4680 ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
4681 ExitMBB->transferSuccessorsAndUpdatePHIs(&MBB);
4682 MBB.addSuccessor(LoopMBB);
4683 // Update liveins.
4684 fullyRecomputeLiveIns({ExitMBB, LoopMBB});
4685
4686 return ExitMBB->begin();
4687}
4688
4689void AArch64FrameLowering::inlineStackProbeFixed(
4690 MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
4691 StackOffset CFAOffset) const {
4692 MachineBasicBlock *MBB = MBBI->getParent();
4693 MachineFunction &MF = *MBB->getParent();
4694 const AArch64InstrInfo *TII =
4695 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
4696 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
4697 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
4698 bool HasFP = hasFP(MF);
4699
4700 DebugLoc DL;
4701 int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
4702 int64_t NumBlocks = FrameSize / ProbeSize;
4703 int64_t ResidualSize = FrameSize % ProbeSize;
4704
4705 LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
4706 << NumBlocks << " blocks of " << ProbeSize
4707 << " bytes, plus " << ResidualSize << " bytes\n");
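// For example, FrameSize = 9000 with a 4096-byte probe size gives
// NumBlocks = 2 and ResidualSize = 808: two probed blocks, then an
// 808-byte remainder handled below.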
4708
4709 // Decrement SP by NumBlock * ProbeSize bytes, with either unrolled or
4710 // ordinary loop.
4711 if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
4712 for (int i = 0; i < NumBlocks; ++i) {
4713 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
4714 // encodable in a SUB).
4715 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
4716 StackOffset::getFixed(-ProbeSize), TII,
4717 MachineInstr::FrameSetup, false, false, nullptr,
4718 EmitAsyncCFI && !HasFP, CFAOffset);
4719 CFAOffset += StackOffset::getFixed(ProbeSize);
4720 // STR XZR, [SP]
4721 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
4722 .addReg(AArch64::XZR)
4723 .addReg(AArch64::SP)
4724 .addImm(0)
4725 .setMIFlags(MachineInstr::FrameSetup);
4726 }
4727 } else if (NumBlocks != 0) {
4728 // SUB ScratchReg, SP, #FrameSize (or equivalent if FrameSize is not
4729 // encodable in ADD). ScratchReg may temporarily become the CFA register.
4730 emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
4731 StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
4732 MachineInstr::FrameSetup, false, false, nullptr,
4733 EmitAsyncCFI && !HasFP, CFAOffset);
4734 CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
4735 MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
4736 MBB = MBBI->getParent();
4737 if (EmitAsyncCFI && !HasFP) {
4738 // Set the CFA register back to SP.
4739 const AArch64RegisterInfo &RegInfo =
4740 *MF.getSubtarget<AArch64Subtarget>().getRegisterInfo();
4741 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
4742 unsigned CFIIndex =
4743 MF.addFrameInst(MCCFIInstruction::createDefCfaRegister(nullptr, Reg));
4744 BuildMI(*MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
4745 .addCFIIndex(CFIIndex)
4746 .setMIFlags(MachineInstr::FrameSetup);
4747 }
4748 }
4749
4750 if (ResidualSize != 0) {
4751 // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
4752 // in SUB).
4753 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
4754 StackOffset::getFixed(-ResidualSize), TII,
4755 MachineInstr::FrameSetup, false, false, nullptr,
4756 EmitAsyncCFI && !HasFP, CFAOffset);
4757 if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
4758 // STR XZR, [SP]
4759 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
4760 .addReg(AArch64::XZR)
4761 .addReg(AArch64::SP)
4762 .addImm(0)
4763 .setMIFlags(MachineInstr::FrameSetup);
4764 }
4765 }
4766}
4767
4768void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
4769 MachineBasicBlock &MBB) const {
4770 // Get the instructions that need to be replaced. We emit at most two of
4771 // these. Remember them in order to avoid complications coming from the need
4772 // to traverse the block while potentially creating more blocks.
4773 SmallVector<MachineInstr *, 4> ToReplace;
4774 for (MachineInstr &MI : MBB)
4775 if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
4776 MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
4777 ToReplace.push_back(&MI);
4778
4779 for (MachineInstr *MI : ToReplace) {
4780 if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
4781 Register ScratchReg = MI->getOperand(0).getReg();
4782 int64_t FrameSize = MI->getOperand(1).getImm();
4783 StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
4784 MI->getOperand(3).getImm());
4785 inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
4786 CFAOffset);
4787 } else {
4788 assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
4789 "Stack probe pseudo-instruction expected");
4790 const AArch64InstrInfo *TII =
4791 MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
4792 Register TargetReg = MI->getOperand(0).getReg();
4793 (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
4794 }
4795 MI->eraseFromParent();
4796 }
4797}
unsigned const MachineRegisterInfo * MRI
#define Success
for(const MachineOperand &MO :llvm::drop_begin(OldMI.operands(), Desc.getNumOperands()))
static int64_t getArgumentStackToRestore(MachineFunction &MF, MachineBasicBlock &MBB)
Returns how much of the incoming argument stack area (in bytes) we should clean up in an epilogue.
static void emitShadowCallStackEpilogue(const TargetInstrInfo &TII, MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL)
static void getLiveRegsForEntryMBB(LivePhysRegs &LiveRegs, const MachineBasicBlock &MBB)
static void emitCalleeSavedRestores(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, bool SVE)
static void computeCalleeSaveRegisterPairs(MachineFunction &MF, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI, SmallVectorImpl< RegPairInfo > &RegPairs, bool NeedsFrameRecord)
static const unsigned DefaultSafeSPDisplacement
This is the biggest offset to the stack pointer we can encode in aarch64 instructions (without using ...
static void emitDefineCFAWithFP(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, unsigned FixedObject)
static bool needsWinCFI(const MachineFunction &MF)
static void insertCFISameValue(const MCInstrDesc &Desc, MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator InsertPt, unsigned DwarfReg)
static cl::opt< bool > StackTaggingMergeSetTag("stack-tagging-merge-settag", cl::desc("merge settag instruction in function epilog"), cl::init(true), cl::Hidden)
bool requiresGetVGCall(MachineFunction &MF)
bool isVGInstruction(MachineBasicBlock::iterator MBBI)
static bool produceCompactUnwindFrame(MachineFunction &MF)
static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex, bool AssignOffsets)
static cl::opt< bool > OrderFrameObjects("aarch64-order-frame-objects", cl::desc("sort stack allocations"), cl::init(true), cl::Hidden)
static bool windowsRequiresStackProbe(MachineFunction &MF, uint64_t StackSizeInBytes)
static void fixupCalleeSaveRestoreStackOffset(MachineInstr &MI, uint64_t LocalStackSize, bool NeedsWinCFI, bool *HasWinCFI)
static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2, bool NeedsWinCFI, bool IsFirst, const TargetRegisterInfo *TRI)
static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc, bool NeedsWinCFI, bool *HasWinCFI, bool EmitCFI, MachineInstr::MIFlag FrameFlag=MachineInstr::FrameSetup, int CFAOffset=0)
static void fixupSEHOpcode(MachineBasicBlock::iterator MBBI, unsigned LocalStackSize)
static StackOffset getSVEStackSize(const MachineFunction &MF)
Returns the size of the entire SVE stackframe (calleesaves + spills).
static cl::opt< bool > EnableRedZone("aarch64-redzone", cl::desc("enable use of redzone on AArch64"), cl::init(false), cl::Hidden)
static MachineBasicBlock::iterator InsertSEH(MachineBasicBlock::iterator MBBI, const TargetInstrInfo &TII, MachineInstr::MIFlag Flag)
static Register findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB)
static void getLivePhysRegsUpTo(MachineInstr &MI, const TargetRegisterInfo &TRI, LivePhysRegs &LiveRegs)
Collect live registers from the end of MI's parent up to (including) MI in LiveRegs.
cl::opt< bool > EnableHomogeneousPrologEpilog("homogeneous-prolog-epilog", cl::Hidden, cl::desc("Emit homogeneous prologue and epilogue for the size " "optimization (default = off)"))
MachineBasicBlock::iterator emitVGSaveRestore(MachineBasicBlock::iterator II, const AArch64FrameLowering *TFI)
static bool IsSVECalleeSave(MachineBasicBlock::iterator I)
static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2, bool UsesWinAAPCS, bool NeedsWinCFI, bool NeedsFrameRecord, bool IsFirst, const TargetRegisterInfo *TRI)
Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
unsigned findFreePredicateReg(BitVector &SavedRegs)
static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg)
static StackOffset getFPOffset(const MachineFunction &MF, int64_t ObjectOffset)
static bool isTargetWindows(const MachineFunction &MF)
static StackOffset getStackOffset(const MachineFunction &MF, int64_t ObjectOffset)
static int64_t upperBound(StackOffset Size)
static unsigned estimateRSStackSizeLimit(MachineFunction &MF)
Look at each instruction that references stack frames and return the stack size limit beyond which so...
static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI, int &Min, int &Max)
returns true if there are any SVE callee saves.
static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE)
static bool isFuncletReturnInstr(const MachineInstr &MI)
static void emitShadowCallStackPrologue(const TargetInstrInfo &TII, MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, bool NeedsWinCFI, bool NeedsUnwindInfo)
static unsigned getFixedObjectSize(const MachineFunction &MF, const AArch64FunctionInfo *AFI, bool IsWin64, bool IsFunclet)
Returns the size of the fixed object area (allocated next to sp on entry). On Win64 this may include a...
unsigned RegSize
MachineBasicBlock & MBB
DebugLoc DL
MachineBasicBlock::iterator MBBI
static const int kSetTagLoopThreshold
This file contains the simple types necessary to represent the attributes associated with functions a...
#define LLVM_DEBUG(X)
Definition: Debug.h:101
uint64_t Size
This file implements the LivePhysRegs utility for tracking liveness of physical registers.
const TargetRegisterInfo * TRI
This file declares the machine register scavenger class.
unsigned OpIndex
This file defines the make_scope_exit function, which executes user-defined cleanup logic at scope ex...
This file defines the SmallVector class.
This file defines the 'Statistic' class, which is designed to be an easy way to expose various metric...
#define STATISTIC(VARNAME, DESC)
Definition: Statistic.h:167
static const unsigned FramePtr
void processFunctionBeforeFrameIndicesReplaced(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameIndicesReplaced - This method is called immediately before MO_FrameIndex op...
MachineBasicBlock::iterator eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const override
This method is called during prolog/epilog code insertion to eliminate call frame setup and destroy p...
bool canUseAsPrologue(const MachineBasicBlock &MBB) const override
Check whether or not the given MBB can be used as a prologue for the target.
bool enableStackSlotScavenging(const MachineFunction &MF) const override
Returns true if the stack slot holes in the fixed and callee-save stack area should be used when allo...
bool spillCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
spillCalleeSavedRegisters - Issues instruction(s) to spill all callee saved registers and returns tru...
bool restoreCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, MutableArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
restoreCalleeSavedRegisters - Issues instruction(s) to restore all callee saved registers and returns...
StackOffset getNonLocalFrameIndexReference(const MachineFunction &MF, int FI) const override
getNonLocalFrameIndexReference - This method returns the offset used to reference a frame index locat...
TargetStackID::Value getStackIDForScalableVectors() const override
Returns the StackID that scalable vectors should be associated with.
bool hasFP(const MachineFunction &MF) const override
hasFP - Return true if the specified function should have a dedicated frame pointer register.
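Clients query this through the frame lowering interface rather than re-deriving the conditions. An illustrative fragment (the register choice reflects the AArch64 convention, fp = x29):

  const TargetFrameLowering &TFL = *MF.getSubtarget().getFrameLowering();
  // When a frame pointer is mandated (variable-sized objects, stack
  // realignment, disabled FP elimination, ...), address locals via x29.
  Register Base = TFL.hasFP(MF) ? Register(AArch64::FP) : Register(AArch64::SP);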
void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override
emitProlog/emitEpilog - These methods insert prolog and epilog code into the function.
bool enableCFIFixup(MachineFunction &MF) const override
Returns true if we may need to fix the unwind information for the function.
void resetCFIToInitialState(MachineBasicBlock &MBB) const override
Emit CFI instructions that recreate the state of the unwind information upon function entry.
bool hasReservedCallFrame(const MachineFunction &MF) const override
hasReservedCallFrame - Under normal circumstances, when a frame pointer is not required,...
bool canUseRedZone(const MachineFunction &MF) const
Can this function use the red zone for local allocations.
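The red zone is only usable when nothing below sp can be clobbered before the locals are read back. A simplified sketch of the kind of test involved (MightUseRedZone is a hypothetical name; the real implementation checks further conditions such as SVE usage and the -aarch64-redzone flag):

  static bool MightUseRedZone(const MachineFunction &MF, uint64_t RedZoneSize) {
    const MachineFrameInfo &MFI = MF.getFrameInfo();
    return !MF.getFunction().hasFnAttribute(Attribute::NoRedZone) &&
           !MFI.hasCalls() &&                      // a call would clobber it
           MFI.estimateStackSize(MF) <= RedZoneSize;
  }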
void processFunctionBeforeFrameFinalized(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameFinalized - This method is called immediately before the specified function...
int getSEHFrameIndexOffset(const MachineFunction &MF, int FI) const
unsigned getWinEHFuncletFrameSize(const MachineFunction &MF) const
Funclets only need to account for space for the callee saved registers, as the locals are accounted f...
void orderFrameObjects(const MachineFunction &MF, SmallVectorImpl< int > &ObjectsToAllocate) const override
Order the symbols in the local stack frame.
void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override
void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS) const override
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
StackOffset getFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg) const override
getFrameIndexReference - Provide a base+offset reference to an FI slot for debug info.
StackOffset resolveFrameOffsetReference(const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE, Register &FrameReg, bool PreferFP, bool ForSimm) const
bool assignCalleeSavedSpillSlots(MachineFunction &MF, const TargetRegisterInfo *TRI, std::vector< CalleeSavedInfo > &CSI, unsigned &MinCSFrameIndex, unsigned &MaxCSFrameIndex) const override
assignCalleeSavedSpillSlots - Allows target to override spill slot assignment logic.
StackOffset getFrameIndexReferencePreferSP(const MachineFunction &MF, int FI, Register &FrameReg, bool IgnoreSPUpdates) const override
For Win64 AArch64 EH, the offset to the Unwind object is from the SP before the update.
StackOffset resolveFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP, bool ForSimm) const
unsigned getWinEHParentFrameOffset(const MachineFunction &MF) const override
The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve the parent's frame pointer...
AArch64FunctionInfo - This class is derived from MachineFunctionInfo and contains private AArch64-spe...
bool needsShadowCallStackPrologueEpilogue(MachineFunction &MF) const
unsigned getCalleeSavedStackSize(const MachineFrameInfo &MFI) const
void setCalleeSaveBaseToFrameRecordOffset(int Offset)
bool shouldSignReturnAddress(const MachineFunction &MF) const
void setPredicateRegForFillSpill(unsigned Reg)
void setStreamingVGIdx(unsigned FrameIdx)
std::optional< int > getTaggedBasePointerIndex() const
bool needsDwarfUnwindInfo(const MachineFunction &MF) const
void setTaggedBasePointerOffset(unsigned Offset)
bool needsAsyncDwarfUnwindInfo(const MachineFunction &MF) const
void setMinMaxSVECSFrameIndex(int Min, int Max)
static bool isTailCallReturnInst(const MachineInstr &MI)
Returns true if MI is one of the TCRETURN* instructions.
static bool isSEHInstruction(const MachineInstr &MI)
Return true if the instruction is an SEH instruction used for unwinding on Windows.
bool isReservedReg(const MachineFunction &MF, MCRegister Reg) const
bool hasBasePointer(const MachineFunction &MF) const
bool cannotEliminateFrame(const MachineFunction &MF) const
const AArch64RegisterInfo * getRegisterInfo() const override
bool isNeonAvailable() const
Returns true if the target has NEON and the function at runtime is known to have NEON enabled (e....
const AArch64InstrInfo * getInstrInfo() const override
const AArch64TargetLowering * getTargetLowering() const override
const Triple & getTargetTriple() const
bool isCallingConvWin64(CallingConv::ID CC) const
const char * getChkStkName() const
bool swiftAsyncContextIsDynamicallySet() const
Return whether FrameLowering should always set the "extended frame present" bit in FP,...
bool hasInlineStackProbe(const MachineFunction &MF) const override
True if stack clash protection is enabled for this function.
unsigned getRedZoneSize(const Function &F) const
bool supportSwiftError() const override
Return true if the target supports swifterror attribute.
ArrayRef - Represent a constant reference to an array (0 or more elements consecutively in memory),...
Definition: ArrayRef.h:41
size_t size() const
size - Get the array size.
Definition: ArrayRef.h:165
bool empty() const
empty - Check if the array is empty.
Definition: ArrayRef.h:160
bool hasAttrSomewhere(Attribute::AttrKind Kind, unsigned *Index=nullptr) const
Return true if the specified attribute is set for at least one parameter or for the return value.
bool test(unsigned Idx) const
Definition: BitVector.h:461
BitVector & reset()
Definition: BitVector.h:392
size_type count() const
count - Returns the number of bits which are set.
Definition: BitVector.h:162
BitVector & set()
Definition: BitVector.h:351
iterator_range< const_set_bits_iterator > set_bits() const
Definition: BitVector.h:140
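determineCalleeSaves communicates its result through a BitVector indexed by physical register number, so the handful of methods above cover the typical produce/consume pattern (allocateSpillSlot is a hypothetical helper):

  BitVector SavedRegs(TRI->getNumRegs());    // one bit per physical register
  SavedRegs.set(AArch64::X19);               // x19 needs a save slot
  SavedRegs.set(AArch64::LR);
  for (unsigned Reg : SavedRegs.set_bits())  // visit only the set bits
    allocateSpillSlot(Reg);
  unsigned NumSaves = SavedRegs.count();     // how many registers to save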
The CalleeSavedInfo class tracks the information needed to locate where a callee saved register is in t...
A debug info location.
Definition: DebugLoc.h:33
bool hasOptSize() const
Optimize this function for size (-Os) or minimum size (-Oz).
Definition: Function.h:698
bool hasMinSize() const
Optimize this function for minimum size (-Oz).
Definition: Function.h:695
CallingConv::ID getCallingConv() const
getCallingConv()/setCallingConv(CC) - These method get and set the calling convention of this functio...
Definition: Function.h:274
AttributeList getAttributes() const
Return the attribute list for this Function.
Definition: Function.h:350
bool hasFnAttribute(Attribute::AttrKind Kind) const
Return true if the function has the attribute.
Definition: Function.cpp:690
void copyPhysReg(MachineBasicBlock &MBB, MachineBasicBlock::iterator I, const DebugLoc &DL, MCRegister DestReg, MCRegister SrcReg, bool KillSrc) const override
Emit instructions to copy a pair of physical registers.
A set of physical registers with utility functions to track liveness when walking backward/forward th...
Definition: LivePhysRegs.h:52
bool available(const MachineRegisterInfo &MRI, MCPhysReg Reg) const
Returns true if neither register Reg nor any aliasing register is in the set, i.e. Reg is available.
void stepBackward(const MachineInstr &MI)
Simulates liveness when stepping backwards over an instruction (bundle).
void removeReg(MCPhysReg Reg)
Removes a physical register, all its sub-registers, and all its super-registers from the set.
Definition: LivePhysRegs.h:92
void addLiveIns(const MachineBasicBlock &MBB)
Adds all live-in registers of basic block MBB.
void addLiveOuts(const MachineBasicBlock &MBB)
Adds all live-out registers of basic block MBB.
void addReg(MCPhysReg Reg)
Adds a physical register and all its sub-registers to the set.
Definition: LivePhysRegs.h:83
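This API is what makes a helper like findScratchNonCalleeSaveRegister above possible: compute liveness at the point of interest, then probe candidates with available(). An illustrative sketch with a hand-picked candidate list:

  static unsigned findScratchGPRSketch(const MachineBasicBlock &MBB,
                                       const TargetRegisterInfo &TRI) {
    LivePhysRegs LiveRegs(TRI);
    LiveRegs.addLiveIns(MBB);                 // registers live on block entry
    const MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
    for (unsigned Reg : {AArch64::X9, AArch64::X10, AArch64::X11})
      if (LiveRegs.available(MRI, Reg))       // neither Reg nor an alias live
        return Reg;
    return AArch64::NoRegister;               // no scratch register is free
  }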
bool usesWindowsCFI() const
Definition: MCAsmInfo.h:799
static MCCFIInstruction createDefCfaRegister(MCSymbol *L, unsigned Register, SMLoc Loc={})
.cfi_def_cfa_register modifies a rule for computing CFA.
Definition: MCDwarf.h:548
static MCCFIInstruction createOffset(MCSymbol *L, unsigned Register, int Offset, SMLoc Loc={})
.cfi_offset Previous value of Register is saved at offset Offset from CFA.
Definition: MCDwarf.h:583
static MCCFIInstruction cfiDefCfaOffset(MCSymbol *L, int Offset, SMLoc Loc={})
.cfi_def_cfa_offset modifies a rule for computing CFA.
Definition: MCDwarf.h:556
static MCCFIInstruction createRestore(MCSymbol *L, unsigned Register, SMLoc Loc={})
.cfi_restore says that the rule for Register is now the same as it was at the beginning of the functi...
Definition: MCDwarf.h:616
static MCCFIInstruction createNegateRAState(MCSymbol *L, SMLoc Loc={})
.cfi_negate_ra_state AArch64 negate RA state.
Definition: MCDwarf.h:609
static MCCFIInstruction cfiDefCfa(MCSymbol *L, unsigned Register, int Offset, SMLoc Loc={})
.cfi_def_cfa defines a rule for computing CFA as: take address from Register and add Offset to it.
Definition: MCDwarf.h:541
static MCCFIInstruction createEscape(MCSymbol *L, StringRef Vals, SMLoc Loc={}, StringRef Comment="")
.cfi_escape Allows the user to add arbitrary bytes to the unwind info.
Definition: MCDwarf.h:647
static MCCFIInstruction createSameValue(MCSymbol *L, unsigned Register, SMLoc Loc={})
.cfi_same_value Current value of Register is the same as in the previous frame.
Definition: MCDwarf.h:630
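These factories only construct the directive; to take effect, it must be registered with the MachineFunction and anchored by a CFI_INSTRUCTION pseudo at the right point. A sketch of the usual pattern during frame setup (MF, MBB, MBBI, DL, TII as in the surrounding helpers):

  // ".cfi_def_cfa_offset 16": the CFA is now at sp + 16.
  unsigned CFIIndex =
      MF.addFrameInst(MCCFIInstruction::cfiDefCfaOffset(nullptr, 16));
  BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
      .addCFIIndex(CFIIndex)
      .setMIFlags(MachineInstr::FrameSetup);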
MCSymbol * createTempSymbol()
Create a temporary symbol with a unique name.
Definition: MCContext.cpp:345
Describe properties that are true of each instruction in the target description file.
Definition: MCInstrDesc.h:198
Wrapper class representing physical registers. Should be passed by value.
Definition: MCRegister.h:33
MCSymbol - Instances of this class represent a symbol name in the MC file, and MCSymbols are created ...
Definition: MCSymbol.h:41
void transferSuccessorsAndUpdatePHIs(MachineBasicBlock *FromMBB)
Transfers all the successors, as in transferSuccessors, and update PHI operands in the successor bloc...
instr_iterator instr_begin()
iterator_range< livein_iterator > liveins() const
const BasicBlock * getBasicBlock() const
Return the LLVM basic block that this instance corresponded to originally.
bool isLiveIn(MCPhysReg Reg, LaneBitmask LaneMask=LaneBitmask::getAll()) const
Return true if the specified register is in the live in set.
bool isEHFuncletEntry() const
Returns true if this is the entry block of an EH funclet.
iterator getFirstTerminator()
Returns an iterator to the first terminator instruction of this basic block.
MachineInstr & instr_back()
void addSuccessor(MachineBasicBlock *Succ, BranchProbability Prob=BranchProbability::getUnknown())
Add Succ as a successor of this MachineBasicBlock.
DebugLoc findDebugLoc(instr_iterator MBBI)
Find the next valid DebugLoc starting at MBBI, skipping any debug instructions.
iterator getLastNonDebugInstr(bool SkipPseudoOp=true)
Returns an iterator to the last non-debug instruction in the basic block, or end().
instr_iterator instr_end()
void addLiveIn(MCRegister PhysReg, LaneBitmask LaneMask=LaneBitmask::getAll())
Adds the specified register as a live in.
const MachineFunction * getParent() const
Return the MachineFunction containing this basic block.
instr_iterator erase(instr_iterator I)
Remove an instruction from the instruction list and delete it.
reverse_iterator rbegin()
iterator insertAfter(iterator I, MachineInstr *MI)
Insert MI into the instruction list after I.
void splice(iterator Where, MachineBasicBlock *Other, iterator From)
Take an instruction from MBB 'Other' at the position From, and insert it into this MBB right before '...
The MachineFrameInfo class represents an abstract stack frame until prolog/epilog code is inserted.
int CreateFixedObject(uint64_t Size, int64_t SPOffset, bool IsImmutable, bool isAliased=false)
Create a new object at a fixed location on the stack.
bool hasVarSizedObjects() const
This method may be called any time after instruction selection is complete to determine if the stack ...
uint64_t getStackSize() const
Return the number of bytes that must be allocated to hold all of the fixed size frame objects.
int CreateStackObject(uint64_t Size, Align Alignment, bool isSpillSlot, const AllocaInst *Alloca=nullptr, uint8_t ID=0)
Create a new statically sized stack object, returning a nonnegative identifier to represent it.
bool hasCalls() const
Return true if the current function has any function calls.
bool isFrameAddressTaken() const
This method may be called any time after instruction selection is complete to determine if there is a...
Align getMaxAlign() const
Return the alignment in bytes that this function must be aligned to, which is greater than the defaul...
void setObjectOffset(int ObjectIdx, int64_t SPOffset)
Set the stack frame offset of the specified object.
bool hasPatchPoint() const
This method may be called any time after instruction selection is complete to determine if there is a...
int getStackProtectorIndex() const
Return the index for the stack protector object.
uint64_t estimateStackSize(const MachineFunction &MF) const
Estimate and return the size of the stack frame.
void setStackID(int ObjectIdx, uint8_t ID)
bool isCalleeSavedInfoValid() const
Has the callee saved info been calculated yet?
Align getObjectAlign(int ObjectIdx) const
Return the alignment of the specified stack object.
int64_t getObjectSize(int ObjectIdx) const
Return the size of the specified object.
bool isMaxCallFrameSizeComputed() const
bool hasStackMap() const
This method may be called any time after instruction selection is complete to determine if there is a...
const std::vector< CalleeSavedInfo > & getCalleeSavedInfo() const
Returns a reference to the callee saved info vector for the current function.
unsigned getMaxCallFrameSize() const
Return the maximum size of a call frame that must be allocated for an outgoing function call.
int getObjectIndexEnd() const
Return one past the maximum frame object index.
bool hasStackProtectorIndex() const
uint8_t getStackID(int ObjectIdx) const
int64_t getObjectOffset(int ObjectIdx) const
Return the assigned stack offset of the specified object from the incoming stack pointer.
int getObjectIndexBegin() const
Return the minimum frame object index.
bool isDeadObjectIndex(int ObjectIdx) const
Returns true if the specified index corresponds to a dead object.
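Frame indices run from getObjectIndexBegin() (negative for fixed objects) up to getObjectIndexEnd(), so a pass that inspects every live stack object follows this pattern:

  const MachineFrameInfo &MFI = MF.getFrameInfo();
  int64_t TotalBytes = 0;
  for (int FI = MFI.getObjectIndexBegin(), E = MFI.getObjectIndexEnd();
       FI != E; ++FI) {
    if (MFI.isDeadObjectIndex(FI))
      continue;                       // skip objects optimized away
    TotalBytes += MFI.getObjectSize(FI);
  }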
const WinEHFuncInfo * getWinEHFuncInfo() const
getWinEHFuncInfo - Return information about how the current function uses Windows exception handling.
unsigned addFrameInst(const MCCFIInstruction &Inst)
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineFrameInfo & getFrameInfo()
getFrameInfo - Return the frame info object for the current function.
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
Function & getFunction()
Return the LLVM function that this machine code represents.
const LLVMTargetMachine & getTarget() const
getTarget - Return the target machine this machine code is compiled with.
MachineModuleInfo & getMMI() const
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
const MachineBasicBlock & front() const
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineBasicBlock - Allocate a new MachineBasicBlock.
void insert(iterator MBBI, MachineBasicBlock *MBB)
const MachineInstrBuilder & setMemRefs(ArrayRef< MachineMemOperand * > MMOs) const
const MachineInstrBuilder & addExternalSymbol(const char *FnName, unsigned TargetFlags=0) const
const MachineInstrBuilder & addCFIIndex(unsigned CFIIndex) const
const MachineInstrBuilder & setMIFlag(MachineInstr::MIFlag Flag) const
const MachineInstrBuilder & addImm(int64_t Val) const
Add a new immediate operand.
const MachineInstrBuilder & add(const MachineOperand &MO) const
const MachineInstrBuilder & addFrameIndex(int Idx) const
const MachineInstrBuilder & addRegMask(const uint32_t *Mask) const
const MachineInstrBuilder & addReg(Register RegNo, unsigned flags=0, unsigned SubReg=0) const
Add a new virtual register operand.
const MachineInstrBuilder & addMBB(MachineBasicBlock *MBB, unsigned TargetFlags=0) const
const MachineInstrBuilder & addUse(Register RegNo, unsigned Flags=0, unsigned SubReg=0) const
Add a virtual register use operand.
const MachineInstrBuilder & setMIFlags(unsigned Flags) const
const MachineInstrBuilder & addMemOperand(MachineMemOperand *MMO) const
const MachineInstrBuilder & addDef(Register RegNo, unsigned Flags=0, unsigned SubReg=0) const
Add a virtual register definition operand.
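The builder methods are designed to chain. For example, spilling x19 to a frame-index slot follows the standard storeRegToStackSlot shape (a real spill would also attach a MachineMemOperand via addMemOperand):

  BuildMI(MBB, MBBI, DL, TII->get(AArch64::STRXui))
      .addReg(AArch64::X19, getKillRegState(true)) // value being spilled
      .addFrameIndex(FI)        // slot; rewritten during FI elimination
      .addImm(0)                // scaled immediate offset
      .setMIFlag(MachineInstr::FrameSetup);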
Representation of each machine instruction.
Definition: MachineInstr.h:69
void setFlags(unsigned flags)
Definition: MachineInstr.h:409
void eraseFromParent()
Unlink 'this' from the containing basic block and delete it.
uint32_t getFlags() const
Return the MI flags bitvector.
Definition: MachineInstr.h:391
@ MOLoad
The memory access reads data.
@ MOStore
The memory access writes data.
This class contains meta information specific to a module.
const MCContext & getContext() const
MachineOperand class - Representation of each machine instruction operand.
void setImm(int64_t immVal)
int64_t getImm() const
static MachineOperand CreateImm(int64_t Val)
bool isFI() const
isFI - Tests if this is a MO_FrameIndex operand.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
Register createVirtualRegister(const TargetRegisterClass *RegClass, StringRef Name="")
createVirtualRegister - Create and return a new virtual register in the function with the specified r...
bool isLiveIn(Register Reg) const
const MCPhysReg * getCalleeSavedRegs() const
Returns list of callee saved registers.
bool isPhysRegUsed(MCRegister PhysReg, bool SkipRegMaskTest=false) const
Return true if the specified register is modified or read in this function.
MutableArrayRef - Represent a mutable reference to an array (0 or more elements consecutively in memo...
Definition: ArrayRef.h:307
void enterBasicBlockEnd(MachineBasicBlock &MBB)
Start tracking liveness from the end of basic block MBB.
Register FindUnusedReg(const TargetRegisterClass *RC) const
Find an unused register of the specified register class.
void backward()
Update internal register state and move MBB iterator backwards.
void addScavengingFrameIndex(int FI)
Add a scavenging frame index.
Wrapper class representing virtual and physical registers.
Definition: Register.h:19
SMEAttrs is a utility class to parse the SME ACLE attributes on functions.
bool hasStreamingInterface() const
bool hasStreamingBody() const
bool empty() const
Definition: SmallVector.h:94
size_t size() const
Definition: SmallVector.h:91
This class consists of common code factored out of the SmallVector class to reduce code duplication b...
Definition: SmallVector.h:586
reference emplace_back(ArgTypes &&... Args)
Definition: SmallVector.h:950
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
Definition: SmallVector.h:696
void push_back(const T &Elt)
Definition: SmallVector.h:426
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
Definition: SmallVector.h:1209
StackOffset holds a fixed and a scalable offset in bytes.
Definition: TypeSize.h:33
int64_t getFixed() const
Returns the fixed component of the stack.
Definition: TypeSize.h:49
int64_t getScalable() const
Returns the scalable component of the stack.
Definition: TypeSize.h:52
static StackOffset get(int64_t Fixed, int64_t Scalable)
Definition: TypeSize.h:44
static StackOffset getScalable(int64_t Scalable)
Definition: TypeSize.h:43
static StackOffset getFixed(int64_t Fixed)
Definition: TypeSize.h:42
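StackOffset is the currency of frame-offset computation in this file because parts of an AArch64 frame scale with the SVE vector length; the fixed and scalable components combine independently:

  StackOffset Locals = StackOffset::getFixed(48);     // plain locals
  StackOffset SVEArea = StackOffset::getScalable(32); // two 16-byte SVE slots
  StackOffset Total = Locals + SVEArea;
  // Total.getFixed() == 48 and Total.getScalable() == 32; the scalable part
  // is multiplied by the runtime vector-length factor when materialized.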
StringRef - Represent a constant reference to a string, i.e.
Definition: StringRef.h:50
virtual void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS=nullptr) const
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
int getOffsetOfLocalArea() const
getOffsetOfLocalArea - This method returns the offset of the local area from the stack pointer on ent...
Align getStackAlign() const
getStackAlignment - This method returns the number of bytes to which the stack pointer must be aligne...
StackDirection getStackGrowthDirection() const
getStackGrowthDirection - Return the direction the stack grows.
virtual bool enableCFIFixup(MachineFunction &MF) const
Returns true if we may need to fix the unwind information for the function.
TargetInstrInfo - Interface to description of machine instruction set.
TargetOptions Options
CodeModel::Model getCodeModel() const
Returns the code model.
const MCAsmInfo * getMCAsmInfo() const
Return target specific asm information.
SwiftAsyncFramePointerMode SwiftAsyncFramePointer
Control when and how the Swift async frame pointer bit should be set.
bool DisableFramePointerElim(const MachineFunction &MF) const
DisableFramePointerElim - This returns true if frame pointer elimination optimization should be disab...
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
const TargetRegisterClass * getMinimalPhysRegClass(MCRegister Reg, MVT VT=MVT::Other) const
Returns the Register Class of a physical register of the given type, picking the most sub register cl...
Align getSpillAlign(const TargetRegisterClass &RC) const
Return the minimum required alignment in bytes for a spill slot for a register of this class.
bool hasStackRealignment(const MachineFunction &MF) const
True if stack realignment is required and still possible.
unsigned getSpillSize(const TargetRegisterClass &RC) const
Return the size in bytes of the stack slot allocated to hold a spilled copy of a register from class ...
TargetSubtargetInfo - Generic base class for all target subtargets.
virtual const TargetRegisterInfo * getRegisterInfo() const
getRegisterInfo - If register information is available, return it.
virtual const TargetInstrInfo * getInstrInfo() const
StringRef getArchName() const
Get the architecture (first) component of the triple.
Definition: Triple.cpp:1299
static constexpr TypeSize getFixed(ScalarTy ExactSize)
Definition: TypeSize.h:342
The instances of the Type class are immutable: once they are created, they are never changed.
Definition: Type.h:45
self_iterator getIterator()
Definition: ilist_node.h:132
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
@ MO_GOT
MO_GOT - This flag indicates that a symbol operand represents the address of the GOT entry for the sy...
static unsigned getShiftValue(unsigned Imm)
getShiftValue - Extract the shift value.
static unsigned getArithExtendImm(AArch64_AM::ShiftExtendType ET, unsigned Imm)
getArithExtendImm - Encode the extend type and shift amount for an arithmetic instruction: imm: 3-bit...
static uint64_t encodeLogicalImmediate(uint64_t imm, unsigned regSize)
encodeLogicalImmediate - Return the encoded immediate value for a logical immediate instruction of th...
static unsigned getShifterImm(AArch64_AM::ShiftExtendType ST, unsigned Imm)
getShifterImm - Encode the shift type and amount: imm: 6-bit shift amount shifter: 000 ==> lsl 001 ==...
const unsigned StackProbeMaxLoopUnroll
Maximum number of iterations to unroll for a constant size probing loop.
const unsigned StackProbeMaxUnprobedStack
Maximum allowed number of unprobed bytes above SP at an ABI boundary.
@ AArch64_SVE_VectorCall
Used between AArch64 SVE functions.
Definition: CallingConv.h:224
@ PreserveMost
Used for runtime calls that preserve most registers.
Definition: CallingConv.h:63
@ CXX_FAST_TLS
Used for access functions.
Definition: CallingConv.h:72
@ GHC
Used by the Glasgow Haskell Compiler (GHC).
Definition: CallingConv.h:50
@ AArch64_SME_ABI_Support_Routines_PreserveMost_From_X1
Preserve X1-X15, X19-X29, SP, Z0-Z31, P0-P15.
Definition: CallingConv.h:271
@ PreserveAll
Used for runtime calls that preserve (almost) all registers.
Definition: CallingConv.h:66
@ PreserveNone
Used for runtime calls that preserve none of the general registers.
Definition: CallingConv.h:90
@ Win64
The C convention as implemented on Windows/x86-64 and AArch64.
Definition: CallingConv.h:159
@ SwiftTail
This follows the Swift calling convention in how arguments are passed but guarantees tail calls will ...
Definition: CallingConv.h:87
@ Implicit
Not emitted register (e.g. carry, or temporary result).
@ Dead
Unused definition.
@ Define
Register definition.
@ Kill
The last use of a register.
@ Undef
Value of the register doesn't matter.
initializer< Ty > init(const Ty &Val)
Definition: CommandLine.h:443
void stable_sort(R &&Range)
Definition: STLExtras.h:1995
MCCFIInstruction createDefCFA(const TargetRegisterInfo &TRI, unsigned FrameReg, unsigned Reg, const StackOffset &Offset, bool LastAdjustmentWasScalable=true)
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
int isAArch64FrameOffsetLegal(const MachineInstr &MI, StackOffset &Offset, bool *OutUseUnscaledOp=nullptr, unsigned *OutUnscaledOp=nullptr, int64_t *EmittableOffset=nullptr)
Check if the Offset is a valid frame offset for MI.
detail::scope_exit< std::decay_t< Callable > > make_scope_exit(Callable &&F)
Definition: ScopeExit.h:59
MCCFIInstruction createCFAOffset(const TargetRegisterInfo &MRI, unsigned Reg, const StackOffset &OffsetFromDefCFA)
iterator_range< T > make_range(T x, T y)
Convenience function for iterating over sub-ranges.
unsigned getBLRCallOpcode(const MachineFunction &MF)
Return opcode to be used for indirect calls.
@ AArch64FrameOffsetCannotUpdate
Offset cannot apply.
bool any_of(R &&range, UnaryPredicate P)
Provide wrappers to std::any_of which take ranges instead of having to pass begin/end explicitly.
Definition: STLExtras.h:1729
auto reverse(ContainerTy &&C)
Definition: STLExtras.h:419
@ Always
Always set the bit.
@ DeploymentBased
Determine whether to set the bit statically or dynamically based on the deployment target.
raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition: Debug.cpp:163
void emitFrameOffset(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, unsigned DestReg, unsigned SrcReg, StackOffset Offset, const TargetInstrInfo *TII, MachineInstr::MIFlag=MachineInstr::NoFlags, bool SetNZCV=false, bool NeedsWinCFI=false, bool *HasWinCFI=nullptr, bool EmitCFAOffset=false, StackOffset InitialOffset={}, unsigned FrameReg=AArch64::SP)
emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg plus Offset.
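A single call expands into however many ADD/SUB (and ADDVL/ADDPL for the scalable component) instructions the offset requires. An illustrative prologue-style allocation of 48 fixed bytes plus one 16-byte SVE granule:

  emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
                  StackOffset::get(-48, -16), TII,
                  MachineInstr::FrameSetup);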
void report_fatal_error(Error Err, bool gen_crash_diag=true)
Report a serious error, calling any installed error handler.
Definition: Error.cpp:167
EHPersonality classifyEHPersonality(const Value *Pers)
See if the given exception handling personality function is one that we understand.
unsigned getDefRegState(bool B)
unsigned getKillRegState(bool B)
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition: Alignment.h:155
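Frame layout code leans on this to round sizes and offsets up to the required alignment, e.g.:

  uint64_t Padded = alignTo(20, Align(16)); // rounds 20 up to 32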
bool isAsynchronousEHPersonality(EHPersonality Pers)
Returns true if this personality function catches asynchronous exceptions.
void fullyRecomputeLiveIns(ArrayRef< MachineBasicBlock * > MBBs)
Convenience function for recomputing live-ins for a set of MBBs until the computation converges.
Definition: LivePhysRegs.h:215
Printable printReg(Register Reg, const TargetRegisterInfo *TRI=nullptr, unsigned SubIdx=0, const MachineRegisterInfo *MRI=nullptr)
Prints virtual and physical registers with or without a TRI instance.
void swap(llvm::BitVector &LHS, llvm::BitVector &RHS)
Implement std::swap in terms of BitVector swap.
Definition: BitVector.h:860
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition: Alignment.h:39
uint64_t value() const
This is a hole in the type system and should not be abused.
Definition: Alignment.h:85
Description of the encoding of one expression Op.
Pair of physical register and lane mask.
static MachinePointerInfo getFixedStack(MachineFunction &MF, int FI, int64_t Offset=0)
Return a MachinePointerInfo record that refers to the specified FrameIndex.