1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function so that the particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) doesn't get created until in the
33// main function body, after the prologue is run. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | callee-saved gpr registers | <--.
48// | | | On Darwin platforms these
49// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
50// | prev_lr | | (frame record first)
51// | prev_fp | <--'
52// | async context if needed |
53// | (a.k.a. "frame record") |
54// |-----------------------------------| <- fp(=x29)
55// | <hazard padding> |
56// |-----------------------------------|
57// | |
58// | callee-saved fp/simd/SVE regs |
59// | |
60// |-----------------------------------|
61// | |
62// | SVE stack objects |
63// | |
64// |-----------------------------------|
65// |.empty.space.to.make.part.below....|
66// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
67// |.the.standard.16-byte.alignment....| compile time; if present)
68// |-----------------------------------|
69// | local variables of fixed size |
70// | including spill slots |
71// | <FPR> |
72// | <hazard padding> |
73// | <GPR> |
74// |-----------------------------------| <- bp(not defined by ABI,
75// |.variable-sized.local.variables....| LLVM chooses X19)
76// |.(VLAs)............................| (size of this area is unknown at
77// |...................................| compile time)
78// |-----------------------------------| <- sp
79// | | Lower address
80//
81//
82// To access data in a frame, a constant offset from one of the pointers
83// (fp, bp, sp) must be computable at compile time. The size of the areas
84// with a dotted background cannot be computed at compile time if they are
85// present, so all three of fp, bp and sp must be set up in order to be
86// able to access all contents in the frame areas, assuming all of the
87// frame areas are non-empty.
88//
89// For most functions, some of the frame areas are empty. For those functions,
90// it may not be necessary to set up fp or bp:
91// * A base pointer is definitely needed when there are both VLAs and local
92// variables with more-than-default alignment requirements.
93// * A frame pointer is definitely needed when there are local variables with
94// more-than-default alignment requirements.
95//
96// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
97// callee-saved area, since the unwind encoding does not allow for encoding
98// this dynamically and existing tools depend on this layout. For other
99// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
100// area to allow SVE stack objects (allocated directly below the callee-saves,
101// if available) to be accessed directly from the framepointer.
102// The SVE spill/fill instructions have VL-scaled addressing modes such
103// as:
104// ldr z8, [fp, #-7 mul vl]
105// For SVE the size of the vector length (VL) is not known at compile-time, so
106// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
107// layout, we don't need to add an unscaled offset to the framepointer before
108// accessing the SVE object in the frame.
109//
110// In some cases when a base pointer is not strictly needed, it is generated
111// anyway when offsets from the frame pointer to access local variables become
112// so large that the offset can't be encoded in the immediate fields of loads
113// or stores.
114//
115// Outgoing function arguments must be at the bottom of the stack frame when
116// calling another function. If we do not have variable-sized stack objects, we
117// can allocate a "reserved call frame" area at the bottom of the local
118// variable area, large enough for all outgoing calls. If we do have VLAs, then
119// the stack pointer must be decremented and incremented around each call to
120// make space for the arguments below the VLAs.
121//
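// For example (an illustrative sketch only, not literal output of this code):
// with VLAs present, a call needing 32 bytes of outgoing stack arguments might
// be bracketed roughly as
//
//     sub  sp, sp, #32      // make space for the outgoing arguments
//     bl   callee
//     add  sp, sp, #32      // reclaim the outgoing-argument area
//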
122// FIXME: also explain the redzone concept.
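// (Briefly, as a placeholder for that explanation: on targets that support it,
// a leaf function may use a small region below sp, for example 128 bytes on
// Darwin, without adjusting sp at all; see canUseRedZone() and the
// "aarch64-redzone" option below.)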
123//
124// About stack hazards: Under some SME contexts, a coprocessor with its own
125// separate cache can be used for FP operations. This can create hazards if the CPU
126// and the SME unit try to access the same area of memory, including if the
127// access is to an area of the stack. To try to alleviate this we attempt to
128// introduce extra padding into the stack frame between FP and GPR accesses,
129// controlled by the StackHazardSize option. Without changing the layout of the
130// stack frame in the diagram above, a stack object of size StackHazardSize is
131// added between GPR and FPR CSRs. Another is added to the stack objects
132// section, and stack objects are sorted so that FPR > Hazard padding slot >
133// GPRs (where possible). Unfortunately, some things are not handled well (VLA
134// area, arguments on the stack, objects with both GPR and FPR accesses), but if
135// those are controlled by the user then the entire stack frame becomes GPR at
136// the start/end with FPR in the middle, surrounded by Hazard padding.
137//
138// An example of the prologue:
139//
140// .globl __foo
141// .align 2
142// __foo:
143// Ltmp0:
144// .cfi_startproc
145// .cfi_personality 155, ___gxx_personality_v0
146// Leh_func_begin:
147// .cfi_lsda 16, Lexception33
148//
149// stp xa, xb, [sp, #-offset]!
150// ...
151// stp x28, x27, [sp, #offset-32]
152// stp fp, lr, [sp, #offset-16]
153// add fp, sp, #offset - 16
154// sub sp, sp, #1360
155//
156// The Stack:
157// +-------------------------------------------+
158// 10000 | ........ | ........ | ........ | ........ |
159// 10004 | ........ | ........ | ........ | ........ |
160// +-------------------------------------------+
161// 10008 | ........ | ........ | ........ | ........ |
162// 1000c | ........ | ........ | ........ | ........ |
163// +===========================================+
164// 10010 | X28 Register |
165// 10014 | X28 Register |
166// +-------------------------------------------+
167// 10018 | X27 Register |
168// 1001c | X27 Register |
169// +===========================================+
170// 10020 | Frame Pointer |
171// 10024 | Frame Pointer |
172// +-------------------------------------------+
173// 10028 | Link Register |
174// 1002c | Link Register |
175// +===========================================+
176// 10030 | ........ | ........ | ........ | ........ |
177// 10034 | ........ | ........ | ........ | ........ |
178// +-------------------------------------------+
179// 10038 | ........ | ........ | ........ | ........ |
180// 1003c | ........ | ........ | ........ | ........ |
181// +-------------------------------------------+
182//
183// [sp] = 10030 :: >>initial value<<
184// sp = 10020 :: stp fp, lr, [sp, #-16]!
185// fp = sp == 10020 :: mov fp, sp
186// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
187// sp == 10010 :: >>final value<<
188//
189// The frame pointer (w29) points to address 10020. If we use an offset of
190// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
191// for w27, and -32 for w28:
192//
193// Ltmp1:
194// .cfi_def_cfa w29, 16
195// Ltmp2:
196// .cfi_offset w30, -8
197// Ltmp3:
198// .cfi_offset w29, -16
199// Ltmp4:
200// .cfi_offset w27, -24
201// Ltmp5:
202// .cfi_offset w28, -32
203//
204//===----------------------------------------------------------------------===//
205
206#include "AArch64FrameLowering.h"
207#include "AArch64InstrInfo.h"
209#include "AArch64RegisterInfo.h"
210#include "AArch64Subtarget.h"
211#include "AArch64TargetMachine.h"
214#include "llvm/ADT/ScopeExit.h"
215#include "llvm/ADT/SmallVector.h"
216#include "llvm/ADT/Statistic.h"
233#include "llvm/IR/Attributes.h"
234#include "llvm/IR/CallingConv.h"
235#include "llvm/IR/DataLayout.h"
236#include "llvm/IR/DebugLoc.h"
237#include "llvm/IR/Function.h"
238#include "llvm/MC/MCAsmInfo.h"
239#include "llvm/MC/MCDwarf.h"
241#include "llvm/Support/Debug.h"
248#include <cassert>
249#include <cstdint>
250#include <iterator>
251#include <optional>
252#include <vector>
253
254using namespace llvm;
255
256#define DEBUG_TYPE "frame-info"
257
258static cl::opt<bool> EnableRedZone("aarch64-redzone",
259 cl::desc("enable use of redzone on AArch64"),
260 cl::init(false), cl::Hidden);
261
263 "stack-tagging-merge-settag",
264 cl::desc("merge settag instruction in function epilog"), cl::init(true),
265 cl::Hidden);
266
267static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
268 cl::desc("sort stack allocations"),
269 cl::init(true), cl::Hidden);
270
272 "homogeneous-prolog-epilog", cl::Hidden,
273 cl::desc("Emit homogeneous prologue and epilogue for the size "
274 "optimization (default = off)"));
275
276// Stack hazard padding size. 0 = disabled.
277static cl::opt<unsigned> StackHazardSize("aarch64-stack-hazard-size",
278 cl::init(0), cl::Hidden);
279// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
281 StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
282 cl::Hidden);
283// Whether to insert padding into non-streaming functions (for testing).
284static cl::opt<bool>
285 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
286 cl::init(false), cl::Hidden);
287
288STATISTIC(NumRedZoneFunctions, "Number of functions using red zone");
289
290/// Returns how much of the incoming argument stack area (in bytes) we should
291/// clean up in an epilogue. For the C calling convention this will be 0; for
292/// guaranteed tail call conventions it can be positive (a normal return or a
293/// tail call to a function that uses less stack space for arguments) or
294/// negative (for a tail call to a function that needs more stack space than we
295/// do for arguments).
300 bool IsTailCallReturn = (MBB.end() != MBBI)
302 : false;
303
304 int64_t ArgumentPopSize = 0;
305 if (IsTailCallReturn) {
306 MachineOperand &StackAdjust = MBBI->getOperand(1);
307
308 // For a tail-call in a callee-pops-arguments environment, some or all of
309 // the stack may actually be in use for the call's arguments, this is
310 // calculated during LowerCall and consumed here...
311 ArgumentPopSize = StackAdjust.getImm();
312 } else {
313 // ... otherwise the amount to pop is *all* of the argument space,
314 // conveniently stored in the MachineFunctionInfo by
315 // LowerFormalArguments. This will, of course, be zero for the C calling
316 // convention.
317 ArgumentPopSize = AFI->getArgumentStackToRestore();
318 }
319
320 return ArgumentPopSize;
321}
322
324static bool needsWinCFI(const MachineFunction &MF);
327
328/// Returns true if homogeneous prolog or epilog code can be emitted
329/// for the size optimization. If possible, a frame helper call is injected.
330/// When an Exit block is given, this check is for the epilog.
331bool AArch64FrameLowering::homogeneousPrologEpilog(
332 MachineFunction &MF, MachineBasicBlock *Exit) const {
333 if (!MF.getFunction().hasMinSize())
334 return false;
336 return false;
337 if (EnableRedZone)
338 return false;
339
340 // TODO: Windows is not supported yet.
341 if (needsWinCFI(MF))
342 return false;
343 // TODO: SVE is not supported yet.
344 if (getSVEStackSize(MF))
345 return false;
346
347 // Bail on stack adjustment needed on return for simplicity.
348 const MachineFrameInfo &MFI = MF.getFrameInfo();
350 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
351 return false;
352 if (Exit && getArgumentStackToRestore(MF, *Exit))
353 return false;
354
355 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
356 if (AFI->hasSwiftAsyncContext() || AFI->hasStreamingModeChanges())
357 return false;
358
359 // If there is an odd number of GPRs before LR and FP in the CSRs list,
360 // they will not be paired into one RegPairInfo, which is incompatible with
361 // the assumption made by the homogeneous prolog epilog pass.
362 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
363 unsigned NumGPRs = 0;
364 for (unsigned I = 0; CSRegs[I]; ++I) {
365 Register Reg = CSRegs[I];
366 if (Reg == AArch64::LR) {
367 assert(CSRegs[I + 1] == AArch64::FP);
368 if (NumGPRs % 2 != 0)
369 return false;
370 break;
371 }
372 if (AArch64::GPR64RegClass.contains(Reg))
373 ++NumGPRs;
374 }
375
376 return true;
377}
378
379/// Returns true if CSRs should be paired.
380bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
381 return produceCompactUnwindFrame(MF) || homogeneousPrologEpilog(MF);
382}
383
384/// This is the biggest offset to the stack pointer we can encode in aarch64
385/// instructions (without using a separate calculation and a temp register).
386/// Note that the exceptions here are vector stores/loads, which cannot encode any
387/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
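/// For example, an unscaled ldur/stur can address [sp, #-256] through
/// [sp, #255]; offsets beyond that range need either a scaled addressing form
/// or a scratch register.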
388static const unsigned DefaultSafeSPDisplacement = 255;
389
390/// Look at each instruction that references stack frames and return the stack
391/// size limit beyond which some of these instructions will require a scratch
392/// register during their expansion later.
394 // FIXME: For now, just conservatively guestimate based on unscaled indexing
395 // range. We'll end up allocating an unnecessary spill slot a lot, but
396 // realistically that's not a big deal at this stage of the game.
397 for (MachineBasicBlock &MBB : MF) {
398 for (MachineInstr &MI : MBB) {
399 if (MI.isDebugInstr() || MI.isPseudo() ||
400 MI.getOpcode() == AArch64::ADDXri ||
401 MI.getOpcode() == AArch64::ADDSXri)
402 continue;
403
404 for (const MachineOperand &MO : MI.operands()) {
405 if (!MO.isFI())
406 continue;
407
409 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
411 return 0;
412 }
413 }
414 }
416}
417
421}
422
423/// Returns the size of the fixed object area (allocated next to sp on entry)
424/// On Win64 this may include a var args area and an UnwindHelp object for EH.
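/// For example (illustrative figures): a Win64 vararg function with 24 bytes of
/// register varargs and EH funclets present gets alignTo(24 + 8, 16) = 32 bytes
/// here, plus any tail-call reserved stack.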
425static unsigned getFixedObjectSize(const MachineFunction &MF,
426 const AArch64FunctionInfo *AFI, bool IsWin64,
427 bool IsFunclet) {
428 if (!IsWin64 || IsFunclet) {
429 return AFI->getTailCallReservedStack();
430 } else {
431 if (AFI->getTailCallReservedStack() != 0 &&
433 Attribute::SwiftAsync))
434 report_fatal_error("cannot generate ABI-changing tail call for Win64");
435 // Var args are stored here in the primary function.
436 const unsigned VarArgsArea = AFI->getVarArgsGPRSize();
437 // To support EH funclets we allocate an UnwindHelp object
438 const unsigned UnwindHelpObject = (MF.hasEHFunclets() ? 8 : 0);
439 return AFI->getTailCallReservedStack() +
440 alignTo(VarArgsArea + UnwindHelpObject, 16);
441 }
442}
443
444/// Returns the size of the entire SVE stackframe (calleesaves + spills).
447 return StackOffset::getScalable((int64_t)AFI->getStackSizeSVE());
448}
449
451 if (!EnableRedZone)
452 return false;
453
454 // Don't use the red zone if the function explicitly asks us not to.
455 // This is typically used for kernel code.
456 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
457 const unsigned RedZoneSize =
459 if (!RedZoneSize)
460 return false;
461
462 const MachineFrameInfo &MFI = MF.getFrameInfo();
464 uint64_t NumBytes = AFI->getLocalStackSize();
465
466 // If neither NEON nor SVE is available, a COPY from one Q-reg to
467 // another requires a spill -> reload sequence. We can do that
468 // using a pre-decrementing store/post-decrementing load, but
469 // if we do so, we can't use the Red Zone.
470 bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
471 !Subtarget.isNeonAvailable() &&
472 !Subtarget.hasSVE();
473
474 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
475 getSVEStackSize(MF) || LowerQRegCopyThroughMem);
476}
477
478/// hasFP - Return true if the specified function should have a dedicated frame
479/// pointer register.
481 const MachineFrameInfo &MFI = MF.getFrameInfo();
482 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
483
484 // Win64 EH requires a frame pointer if funclets are present, as the locals
485 // are accessed off the frame pointer in both the parent function and the
486 // funclets.
487 if (MF.hasEHFunclets())
488 return true;
489 // Retain behavior of always omitting the FP for leaf functions when possible.
491 return true;
492 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
493 MFI.hasStackMap() || MFI.hasPatchPoint() ||
494 RegInfo->hasStackRealignment(MF))
495 return true;
496 // With large callframes around we may need to use FP to access the scavenging
497 // emergency spillslot.
498 //
499 // Unfortunately some calls to hasFP() like machine verifier ->
500 // getReservedReg() -> hasFP in the middle of global isel are too early
501 // to know the max call frame size. Hopefully conservatively returning "true"
502 // in those cases is fine.
503 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
504 if (!MFI.isMaxCallFrameSizeComputed() ||
506 return true;
507
508 return false;
509}
510
511/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
512/// not required, we reserve argument space for call sites in the function
513/// immediately on entry to the current function. This eliminates the need for
514/// add/sub sp brackets around call sites. Returns true if the call frame is
515/// included as part of the stack frame.
517 const MachineFunction &MF) const {
518 // The stack probing code for the dynamically allocated outgoing arguments
519 // area assumes that the stack is probed at the top - either by the prologue
520 // code, which issues a probe if `hasVarSizedObjects` returns true, or by the
521 // most recent variable-sized object allocation. Changing the condition here
522 // may need to be followed up by changes to the probe issuing logic.
523 return !MF.getFrameInfo().hasVarSizedObjects();
524}
525
529 const AArch64InstrInfo *TII =
530 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
531 const AArch64TargetLowering *TLI =
532 MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
533 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
534 DebugLoc DL = I->getDebugLoc();
535 unsigned Opc = I->getOpcode();
536 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
537 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
538
539 if (!hasReservedCallFrame(MF)) {
540 int64_t Amount = I->getOperand(0).getImm();
541 Amount = alignTo(Amount, getStackAlign());
542 if (!IsDestroy)
543 Amount = -Amount;
544
545 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
546 // doesn't have to pop anything), then the first operand will be zero too so
547 // this adjustment is a no-op.
548 if (CalleePopAmount == 0) {
549 // FIXME: in-function stack adjustment for calls is limited to 24-bits
550 // because there's no guaranteed temporary register available.
551 //
552 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
553 // 1) For offset <= 12-bit, we use LSL #0
554 // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
555 // LSL #0, and the other uses LSL #12.
556 //
557 // Most call frames will be allocated at the start of a function so
558 // this is OK, but it is a limitation that needs dealing with.
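      // For example (an illustrative encoding with a hypothetical size), a
      // 0x12340-byte allocation could be emitted as:
      //   sub sp, sp, #0x12, lsl #12
      //   sub sp, sp, #0x340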
559 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
560
561 if (TLI->hasInlineStackProbe(MF) &&
563 // When stack probing is enabled, the decrement of SP may need to be
564 // probed. We only need to do this if the call site needs 1024 bytes of
565 // space or more, because a region smaller than that is allowed to be
566 // unprobed at an ABI boundary. We rely on the fact that SP has been
567 // probed exactly at this point, either by the prologue or most recent
568 // dynamic allocation.
570 "non-reserved call frame without var sized objects?");
571 Register ScratchReg =
572 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
573 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
574 } else {
575 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
576 StackOffset::getFixed(Amount), TII);
577 }
578 }
579 } else if (CalleePopAmount != 0) {
580 // If the calling convention demands that the callee pops arguments from the
581 // stack, we want to add it back if we have a reserved call frame.
582 assert(CalleePopAmount < 0xffffff && "call frame too large");
583 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
584 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
585 }
586 return MBB.erase(I);
587}
588
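// Emit .cfi_offset directives describing where each callee-saved GPR was
// spilled, e.g. ".cfi_offset w19, -32" (illustrative; offsets are relative to
// the CFA).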
589void AArch64FrameLowering::emitCalleeSavedGPRLocations(
592 MachineFrameInfo &MFI = MF.getFrameInfo();
594 SMEAttrs Attrs(MF.getFunction());
595 bool LocallyStreaming =
596 Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface();
597
598 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
599 if (CSI.empty())
600 return;
601
602 const TargetSubtargetInfo &STI = MF.getSubtarget();
603 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
604 const TargetInstrInfo &TII = *STI.getInstrInfo();
606
607 for (const auto &Info : CSI) {
608 unsigned FrameIdx = Info.getFrameIdx();
609 if (MFI.getStackID(FrameIdx) == TargetStackID::ScalableVector)
610 continue;
611
612 assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
613 int64_t DwarfReg = TRI.getDwarfRegNum(Info.getReg(), true);
614 int64_t Offset = MFI.getObjectOffset(FrameIdx) - getOffsetOfLocalArea();
615
616 // The location of VG will be emitted before each streaming-mode change in
617 // the function. Only locally-streaming functions require emitting the
618 // non-streaming VG location here.
619 if ((LocallyStreaming && FrameIdx == AFI->getStreamingVGIdx()) ||
620 (!LocallyStreaming &&
621 DwarfReg == TRI.getDwarfRegNum(AArch64::VG, true)))
622 continue;
623
624 unsigned CFIIndex = MF.addFrameInst(
625 MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset));
626 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
627 .addCFIIndex(CFIIndex)
629 }
630}
631
632void AArch64FrameLowering::emitCalleeSavedSVELocations(
635 MachineFrameInfo &MFI = MF.getFrameInfo();
636
637 // Add callee saved registers to move list.
638 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
639 if (CSI.empty())
640 return;
641
642 const TargetSubtargetInfo &STI = MF.getSubtarget();
643 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
644 const TargetInstrInfo &TII = *STI.getInstrInfo();
647
648 for (const auto &Info : CSI) {
649 if (!(MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
650 continue;
651
652 // Not all unwinders may know about SVE registers, so assume the lowest
653 // common denominator.
654 assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
655 unsigned Reg = Info.getReg();
656 if (!static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
657 continue;
658
660 StackOffset::getScalable(MFI.getObjectOffset(Info.getFrameIdx())) -
662
663 unsigned CFIIndex = MF.addFrameInst(createCFAOffset(TRI, Reg, Offset));
664 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
665 .addCFIIndex(CFIIndex)
667 }
668}
669
673 unsigned DwarfReg) {
674 unsigned CFIIndex =
675 MF.addFrameInst(MCCFIInstruction::createSameValue(nullptr, DwarfReg));
676 BuildMI(MBB, InsertPt, DebugLoc(), Desc).addCFIIndex(CFIIndex);
677}
678
680 MachineBasicBlock &MBB) const {
681
683 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
684 const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
685 const auto &TRI =
686 static_cast<const AArch64RegisterInfo &>(*Subtarget.getRegisterInfo());
687 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
688
689 const MCInstrDesc &CFIDesc = TII.get(TargetOpcode::CFI_INSTRUCTION);
690 DebugLoc DL;
691
692 // Reset the CFA to `SP + 0`.
694 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
695 nullptr, TRI.getDwarfRegNum(AArch64::SP, true), 0));
696 BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);
697
698 // Flip the RA sign state.
699 if (MFI.shouldSignReturnAddress(MF)) {
701 BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);
702 }
703
704 // Shadow call stack uses X18, reset it.
705 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
706 insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
707 TRI.getDwarfRegNum(AArch64::X18, true));
708
709 // Emit .cfi_same_value for callee-saved registers.
710 const std::vector<CalleeSavedInfo> &CSI =
712 for (const auto &Info : CSI) {
713 unsigned Reg = Info.getReg();
714 if (!TRI.regNeedsCFI(Reg, Reg))
715 continue;
716 insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
717 TRI.getDwarfRegNum(Reg, true));
718 }
719}
720
723 bool SVE) {
725 MachineFrameInfo &MFI = MF.getFrameInfo();
726
727 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
728 if (CSI.empty())
729 return;
730
731 const TargetSubtargetInfo &STI = MF.getSubtarget();
732 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
733 const TargetInstrInfo &TII = *STI.getInstrInfo();
735
736 for (const auto &Info : CSI) {
737 if (SVE !=
738 (MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
739 continue;
740
741 unsigned Reg = Info.getReg();
742 if (SVE &&
743 !static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
744 continue;
745
746 if (!Info.isRestored())
747 continue;
748
749 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createRestore(
750 nullptr, TRI.getDwarfRegNum(Info.getReg(), true)));
751 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
752 .addCFIIndex(CFIIndex)
754 }
755}
756
757void AArch64FrameLowering::emitCalleeSavedGPRRestores(
760}
761
762void AArch64FrameLowering::emitCalleeSavedSVERestores(
765}
766
767// Return the maximum possible number of bytes for `Size` due to the
768// architectural limit on the size of an SVE register.
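// For example, a StackOffset of 32 scalable + 16 fixed bytes has an upper
// bound of 32 * 16 + 16 = 528 bytes, since the architectural maximum vector
// length (2048 bits) is 16 times the 128-bit granule.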
769static int64_t upperBound(StackOffset Size) {
770 static const int64_t MAX_BYTES_PER_SCALABLE_BYTE = 16;
771 return Size.getScalable() * MAX_BYTES_PER_SCALABLE_BYTE + Size.getFixed();
772}
773
774void AArch64FrameLowering::allocateStackSpace(
776 int64_t RealignmentPadding, StackOffset AllocSize, bool NeedsWinCFI,
777 bool *HasWinCFI, bool EmitCFI, StackOffset InitialOffset,
778 bool FollowupAllocs) const {
779
780 if (!AllocSize)
781 return;
782
783 DebugLoc DL;
785 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
786 const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
788 const MachineFrameInfo &MFI = MF.getFrameInfo();
789
790 const int64_t MaxAlign = MFI.getMaxAlign().value();
791 const uint64_t AndMask = ~(MaxAlign - 1);
792
793 if (!Subtarget.getTargetLowering()->hasInlineStackProbe(MF)) {
794 Register TargetReg = RealignmentPadding
796 : AArch64::SP;
797 // SUB Xd/SP, SP, AllocSize
798 emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
799 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
800 EmitCFI, InitialOffset);
801
802 if (RealignmentPadding) {
803 // AND SP, X9, 0b11111...0000
804 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
805 .addReg(TargetReg, RegState::Kill)
808 AFI.setStackRealigned(true);
809
810 // No need for SEH instructions here; if we're realigning the stack,
811 // we've set a frame pointer and already finished the SEH prologue.
812 assert(!NeedsWinCFI);
813 }
814 return;
815 }
816
817 //
818 // Stack probing allocation.
819 //
820
821 // Fixed length allocation. If we don't need to re-align the stack and don't
822 // have SVE objects, we can use a more efficient sequence for stack probing.
823 if (AllocSize.getScalable() == 0 && RealignmentPadding == 0) {
825 assert(ScratchReg != AArch64::NoRegister);
826 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC))
827 .addDef(ScratchReg)
828 .addImm(AllocSize.getFixed())
829 .addImm(InitialOffset.getFixed())
830 .addImm(InitialOffset.getScalable());
831 // The fixed allocation may leave unprobed bytes at the top of the
832 // stack. If we have a subsequent allocation (e.g. if we have variable-sized
833 // objects), we need to issue an extra probe, so these allocations start in
834 // a known state.
835 if (FollowupAllocs) {
836 // STR XZR, [SP]
837 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
838 .addReg(AArch64::XZR)
839 .addReg(AArch64::SP)
840 .addImm(0)
842 }
843
844 return;
845 }
846
847 // Variable length allocation.
848
849 // If the (unknown) allocation size cannot exceed the probe size, decrement
850 // the stack pointer right away.
851 int64_t ProbeSize = AFI.getStackProbeSize();
852 if (upperBound(AllocSize) + RealignmentPadding <= ProbeSize) {
853 Register ScratchReg = RealignmentPadding
855 : AArch64::SP;
856 assert(ScratchReg != AArch64::NoRegister);
857 // SUB Xd, SP, AllocSize
858 emitFrameOffset(MBB, MBBI, DL, ScratchReg, AArch64::SP, -AllocSize, &TII,
859 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
860 EmitCFI, InitialOffset);
861 if (RealignmentPadding) {
862 // AND SP, Xn, 0b11111...0000
863 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
864 .addReg(ScratchReg, RegState::Kill)
867 AFI.setStackRealigned(true);
868 }
869 if (FollowupAllocs || upperBound(AllocSize) + RealignmentPadding >
871 // STR XZR, [SP]
872 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
873 .addReg(AArch64::XZR)
874 .addReg(AArch64::SP)
875 .addImm(0)
877 }
878 return;
879 }
880
881 // Emit a variable-length allocation probing loop.
882 // TODO: As an optimisation, the loop can be "unrolled" into a few parts,
883 // each of them guaranteed to adjust the stack by less than the probe size.
885 assert(TargetReg != AArch64::NoRegister);
886 // SUB Xd, SP, AllocSize
887 emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
888 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
889 EmitCFI, InitialOffset);
890 if (RealignmentPadding) {
891 // AND Xn, Xn, 0b11111...0000
892 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), TargetReg)
893 .addReg(TargetReg, RegState::Kill)
896 }
897
898 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC_VAR))
899 .addReg(TargetReg);
900 if (EmitCFI) {
901 // Set the CFA register back to SP.
902 unsigned Reg =
903 Subtarget.getRegisterInfo()->getDwarfRegNum(AArch64::SP, true);
904 unsigned CFIIndex =
906 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
907 .addCFIIndex(CFIIndex)
909 }
910 if (RealignmentPadding)
911 AFI.setStackRealigned(true);
912}
913
914static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE) {
915 switch (Reg.id()) {
916 default:
917 // The called routine is expected to preserve x19-x28;
918 // x29 and x30 are used as the frame pointer and link register, respectively.
919 return 0;
920
921 // GPRs
922#define CASE(n) \
923 case AArch64::W##n: \
924 case AArch64::X##n: \
925 return AArch64::X##n
926 CASE(0);
927 CASE(1);
928 CASE(2);
929 CASE(3);
930 CASE(4);
931 CASE(5);
932 CASE(6);
933 CASE(7);
934 CASE(8);
935 CASE(9);
936 CASE(10);
937 CASE(11);
938 CASE(12);
939 CASE(13);
940 CASE(14);
941 CASE(15);
942 CASE(16);
943 CASE(17);
944 CASE(18);
945#undef CASE
946
947 // FPRs
948#define CASE(n) \
949 case AArch64::B##n: \
950 case AArch64::H##n: \
951 case AArch64::S##n: \
952 case AArch64::D##n: \
953 case AArch64::Q##n: \
954 return HasSVE ? AArch64::Z##n : AArch64::Q##n
955 CASE(0);
956 CASE(1);
957 CASE(2);
958 CASE(3);
959 CASE(4);
960 CASE(5);
961 CASE(6);
962 CASE(7);
963 CASE(8);
964 CASE(9);
965 CASE(10);
966 CASE(11);
967 CASE(12);
968 CASE(13);
969 CASE(14);
970 CASE(15);
971 CASE(16);
972 CASE(17);
973 CASE(18);
974 CASE(19);
975 CASE(20);
976 CASE(21);
977 CASE(22);
978 CASE(23);
979 CASE(24);
980 CASE(25);
981 CASE(26);
982 CASE(27);
983 CASE(28);
984 CASE(29);
985 CASE(30);
986 CASE(31);
987#undef CASE
988 }
989}
990
991void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
992 MachineBasicBlock &MBB) const {
993 // Insertion point.
995
996 // Fake a debug loc.
997 DebugLoc DL;
998 if (MBBI != MBB.end())
999 DL = MBBI->getDebugLoc();
1000
1001 const MachineFunction &MF = *MBB.getParent();
1003 const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();
1004
1005 BitVector GPRsToZero(TRI.getNumRegs());
1006 BitVector FPRsToZero(TRI.getNumRegs());
1007 bool HasSVE = STI.hasSVE();
1008 for (MCRegister Reg : RegsToZero.set_bits()) {
1009 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
1010 // For GPRs, we only care to clear out the 64-bit register.
1011 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
1012 GPRsToZero.set(XReg);
1013 } else if (AArch64InstrInfo::isFpOrNEON(Reg)) {
1014 // For FPRs,
1015 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
1016 FPRsToZero.set(XReg);
1017 }
1018 }
1019
1020 const AArch64InstrInfo &TII = *STI.getInstrInfo();
1021
1022 // Zero out GPRs.
1023 for (MCRegister Reg : GPRsToZero.set_bits())
1024 TII.buildClearRegister(Reg, MBB, MBBI, DL);
1025
1026 // Zero out FP/vector registers.
1027 for (MCRegister Reg : FPRsToZero.set_bits())
1028 TII.buildClearRegister(Reg, MBB, MBBI, DL);
1029
1030 if (HasSVE) {
1031 for (MCRegister PReg :
1032 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
1033 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
1034 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
1035 AArch64::P15}) {
1036 if (RegsToZero[PReg])
1037 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
1038 }
1039 }
1040}
1041
1043 const MachineBasicBlock &MBB) {
1044 const MachineFunction *MF = MBB.getParent();
1045 LiveRegs.addLiveIns(MBB);
1046 // Mark callee saved registers as used so we will not choose them.
1047 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
1048 for (unsigned i = 0; CSRegs[i]; ++i)
1049 LiveRegs.addReg(CSRegs[i]);
1050}
1051
1052// Find a scratch register that we can use at the start of the prologue to
1053// re-align the stack pointer. We avoid using callee-save registers since they
1054// may appear to be free when this is called from canUseAsPrologue (during
1055// shrink wrapping), but then no longer be free when this is called from
1056// emitPrologue.
1057//
1058// FIXME: This is a bit conservative, since in the above case we could use one
1059// of the callee-save registers as a scratch temp to re-align the stack pointer,
1060// but we would then have to make sure that we were in fact saving at least one
1061// callee-save register in the prologue, which is additional complexity that
1062// doesn't seem worth the benefit.
1064 MachineFunction *MF = MBB->getParent();
1065
1066 // If MBB is an entry block, use X9 as the scratch register.
1067 // preserve_none functions may be using X9 to pass arguments,
1068 // so for them prefer to pick an available register below.
1069 if (&MF->front() == MBB &&
1071 return AArch64::X9;
1072
1073 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1074 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1075 LivePhysRegs LiveRegs(TRI);
1076 getLiveRegsForEntryMBB(LiveRegs, *MBB);
1077
1078 // Prefer X9 since it was historically used for the prologue scratch reg.
1079 const MachineRegisterInfo &MRI = MF->getRegInfo();
1080 if (LiveRegs.available(MRI, AArch64::X9))
1081 return AArch64::X9;
1082
1083 for (unsigned Reg : AArch64::GPR64RegClass) {
1084 if (LiveRegs.available(MRI, Reg))
1085 return Reg;
1086 }
1087 return AArch64::NoRegister;
1088}
1089
1091 const MachineBasicBlock &MBB) const {
1092 const MachineFunction *MF = MBB.getParent();
1093 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
1094 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1095 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1096 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
1098
1099 if (AFI->hasSwiftAsyncContext()) {
1100 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1101 const MachineRegisterInfo &MRI = MF->getRegInfo();
1102 LivePhysRegs LiveRegs(TRI);
1103 getLiveRegsForEntryMBB(LiveRegs, MBB);
1104 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
1105 // available.
1106 if (!LiveRegs.available(MRI, AArch64::X16) ||
1107 !LiveRegs.available(MRI, AArch64::X17))
1108 return false;
1109 }
1110
1111 // Certain stack probing sequences might clobber flags, so we can't use
1112 // the block as a prologue if the flags register is a live-in.
1114 MBB.isLiveIn(AArch64::NZCV))
1115 return false;
1116
1117 // Don't need a scratch register if we're not going to re-align the stack or
1118 // emit stack probes.
1119 if (!RegInfo->hasStackRealignment(*MF) && !TLI->hasInlineStackProbe(*MF))
1120 return true;
1121 // Otherwise, we can use any block as long as it has a scratch register
1122 // available.
1123 return findScratchNonCalleeSaveRegister(TmpMBB) != AArch64::NoRegister;
1124}
1125
1127 uint64_t StackSizeInBytes) {
1128 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1130 // TODO: When implementing stack protectors, take that into account
1131 // for the probe threshold.
1132 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
1133 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
1134}
1135
1136static bool needsWinCFI(const MachineFunction &MF) {
1137 const Function &F = MF.getFunction();
1138 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
1139 F.needsUnwindTableEntry();
1140}
1141
1142bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
1143 MachineFunction &MF, uint64_t StackBumpBytes) const {
1145 const MachineFrameInfo &MFI = MF.getFrameInfo();
1146 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1147 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1148 if (homogeneousPrologEpilog(MF))
1149 return false;
1150
1151 if (AFI->getLocalStackSize() == 0)
1152 return false;
1153
1154 // For WinCFI, if optimizing for size, prefer to not combine the stack bump
1155 // (to force a stp with predecrement) to match the packed unwind format,
1156 // provided that there actually are any callee saved registers to merge the
1157 // decrement with.
1158 // This is potentially marginally slower, but allows using the packed
1159 // unwind format for functions that both have a local area and callee saved
1160 // registers. Using the packed unwind format notably reduces the size of
1161 // the unwind info.
1162 if (needsWinCFI(MF) && AFI->getCalleeSavedStackSize() > 0 &&
1163 MF.getFunction().hasOptSize())
1164 return false;
1165
1166 // 512 is the maximum immediate for stp/ldp that will be used for
1167 // callee-save save/restores
1168 if (StackBumpBytes >= 512 || windowsRequiresStackProbe(MF, StackBumpBytes))
1169 return false;
1170
1171 if (MFI.hasVarSizedObjects())
1172 return false;
1173
1174 if (RegInfo->hasStackRealignment(MF))
1175 return false;
1176
1177 // This isn't strictly necessary, but it simplifies things a bit since the
1178 // current RedZone handling code assumes the SP is adjusted by the
1179 // callee-save save/restore code.
1180 if (canUseRedZone(MF))
1181 return false;
1182
1183 // When there is an SVE area on the stack, always allocate the
1184 // callee-saves and spills/locals separately.
1185 if (getSVEStackSize(MF))
1186 return false;
1187
1188 return true;
1189}
1190
1191bool AArch64FrameLowering::shouldCombineCSRLocalStackBumpInEpilogue(
1192 MachineBasicBlock &MBB, unsigned StackBumpBytes) const {
1193 if (!shouldCombineCSRLocalStackBump(*MBB.getParent(), StackBumpBytes))
1194 return false;
1195
1196 if (MBB.empty())
1197 return true;
1198
1199 // Disable combined SP bump if the last instruction is an MTE tag store. It
1200 // is almost always better to merge SP adjustment into those instructions.
1203 while (LastI != Begin) {
1204 --LastI;
1205 if (LastI->isTransient())
1206 continue;
1207 if (!LastI->getFlag(MachineInstr::FrameDestroy))
1208 break;
1209 }
1210 switch (LastI->getOpcode()) {
1211 case AArch64::STGloop:
1212 case AArch64::STZGloop:
1213 case AArch64::STGi:
1214 case AArch64::STZGi:
1215 case AArch64::ST2Gi:
1216 case AArch64::STZ2Gi:
1217 return false;
1218 default:
1219 return true;
1220 }
1221 llvm_unreachable("unreachable");
1222}
1223
1224// Given a load or a store instruction, generate an appropriate unwinding SEH
1225// code on Windows.
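// For example (illustrative), a prologue save "stp x19, x20, [sp, #-32]!" is
// paired with a SEH_SaveRegP_X pseudo, which is printed as a .seh_save_regp_x
// directive in the Windows unwind information.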
1227 const TargetInstrInfo &TII,
1228 MachineInstr::MIFlag Flag) {
1229 unsigned Opc = MBBI->getOpcode();
1231 MachineFunction &MF = *MBB->getParent();
1232 DebugLoc DL = MBBI->getDebugLoc();
1233 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1234 int Imm = MBBI->getOperand(ImmIdx).getImm();
1236 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1237 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1238
1239 switch (Opc) {
1240 default:
1241 llvm_unreachable("No SEH Opcode for this instruction");
1242 case AArch64::LDPDpost:
1243 Imm = -Imm;
1244 [[fallthrough]];
1245 case AArch64::STPDpre: {
1246 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1247 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1248 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1249 .addImm(Reg0)
1250 .addImm(Reg1)
1251 .addImm(Imm * 8)
1252 .setMIFlag(Flag);
1253 break;
1254 }
1255 case AArch64::LDPXpost:
1256 Imm = -Imm;
1257 [[fallthrough]];
1258 case AArch64::STPXpre: {
1259 Register Reg0 = MBBI->getOperand(1).getReg();
1260 Register Reg1 = MBBI->getOperand(2).getReg();
1261 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1262 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1263 .addImm(Imm * 8)
1264 .setMIFlag(Flag);
1265 else
1266 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1267 .addImm(RegInfo->getSEHRegNum(Reg0))
1268 .addImm(RegInfo->getSEHRegNum(Reg1))
1269 .addImm(Imm * 8)
1270 .setMIFlag(Flag);
1271 break;
1272 }
1273 case AArch64::LDRDpost:
1274 Imm = -Imm;
1275 [[fallthrough]];
1276 case AArch64::STRDpre: {
1277 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1278 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1279 .addImm(Reg)
1280 .addImm(Imm)
1281 .setMIFlag(Flag);
1282 break;
1283 }
1284 case AArch64::LDRXpost:
1285 Imm = -Imm;
1286 [[fallthrough]];
1287 case AArch64::STRXpre: {
1288 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1289 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1290 .addImm(Reg)
1291 .addImm(Imm)
1292 .setMIFlag(Flag);
1293 break;
1294 }
1295 case AArch64::STPDi:
1296 case AArch64::LDPDi: {
1297 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1298 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1299 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1300 .addImm(Reg0)
1301 .addImm(Reg1)
1302 .addImm(Imm * 8)
1303 .setMIFlag(Flag);
1304 break;
1305 }
1306 case AArch64::STPXi:
1307 case AArch64::LDPXi: {
1308 Register Reg0 = MBBI->getOperand(0).getReg();
1309 Register Reg1 = MBBI->getOperand(1).getReg();
1310 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1311 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1312 .addImm(Imm * 8)
1313 .setMIFlag(Flag);
1314 else
1315 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1316 .addImm(RegInfo->getSEHRegNum(Reg0))
1317 .addImm(RegInfo->getSEHRegNum(Reg1))
1318 .addImm(Imm * 8)
1319 .setMIFlag(Flag);
1320 break;
1321 }
1322 case AArch64::STRXui:
1323 case AArch64::LDRXui: {
1324 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1325 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1326 .addImm(Reg)
1327 .addImm(Imm * 8)
1328 .setMIFlag(Flag);
1329 break;
1330 }
1331 case AArch64::STRDui:
1332 case AArch64::LDRDui: {
1333 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1334 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1335 .addImm(Reg)
1336 .addImm(Imm * 8)
1337 .setMIFlag(Flag);
1338 break;
1339 }
1340 case AArch64::STPQi:
1341 case AArch64::LDPQi: {
1342 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1343 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1344 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1345 .addImm(Reg0)
1346 .addImm(Reg1)
1347 .addImm(Imm * 16)
1348 .setMIFlag(Flag);
1349 break;
1350 }
1351 case AArch64::LDPQpost:
1352 Imm = -Imm;
1353 [[fallthrough]];
1354 case AArch64::STPQpre: {
1355 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1356 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1357 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1358 .addImm(Reg0)
1359 .addImm(Reg1)
1360 .addImm(Imm * 16)
1361 .setMIFlag(Flag);
1362 break;
1363 }
1364 }
1365 auto I = MBB->insertAfter(MBBI, MIB);
1366 return I;
1367}
1368
1369// Fix up the SEH opcode associated with the save/restore instruction.
1371 unsigned LocalStackSize) {
1372 MachineOperand *ImmOpnd = nullptr;
1373 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1374 switch (MBBI->getOpcode()) {
1375 default:
1376 llvm_unreachable("Fix the offset in the SEH instruction");
1377 case AArch64::SEH_SaveFPLR:
1378 case AArch64::SEH_SaveRegP:
1379 case AArch64::SEH_SaveReg:
1380 case AArch64::SEH_SaveFRegP:
1381 case AArch64::SEH_SaveFReg:
1382 case AArch64::SEH_SaveAnyRegQP:
1383 case AArch64::SEH_SaveAnyRegQPX:
1384 ImmOpnd = &MBBI->getOperand(ImmIdx);
1385 break;
1386 }
1387 if (ImmOpnd)
1388 ImmOpnd->setImm(ImmOpnd->getImm() + LocalStackSize);
1389}
1390
1393 return AFI->hasStreamingModeChanges() &&
1394 !MF.getSubtarget<AArch64Subtarget>().hasSVE();
1395}
1396
1399 // For Darwin platforms we don't save VG for non-SVE functions, even if SME
1400 // is enabled with streaming mode changes.
1401 if (!AFI->hasStreamingModeChanges())
1402 return false;
1403 auto &ST = MF.getSubtarget<AArch64Subtarget>();
1404 if (ST.isTargetDarwin())
1405 return ST.hasSVE();
1406 return true;
1407}
1408
1410 unsigned Opc = MBBI->getOpcode();
1411 if (Opc == AArch64::CNTD_XPiI || Opc == AArch64::RDSVLI_XI ||
1412 Opc == AArch64::UBFMXri)
1413 return true;
1414
1415 if (requiresGetVGCall(*MBBI->getMF())) {
1416 if (Opc == AArch64::ORRXrr)
1417 return true;
1418
1419 if (Opc == AArch64::BL) {
1420 auto Op1 = MBBI->getOperand(0);
1421 return Op1.isSymbol() &&
1422 (StringRef(Op1.getSymbolName()) == "__arm_get_current_vg");
1423 }
1424 }
1425
1426 return false;
1427}
1428
1429// Convert callee-save register save/restore instruction to do stack pointer
1430// decrement/increment to allocate/deallocate the callee-save stack area by
1431// converting store/load to use pre/post increment version.
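// For example (illustrative), with a 64-byte callee-save area the first save
//   stp x29, x30, [sp]
// becomes
//   stp x29, x30, [sp, #-64]!
// so that the store itself performs the stack allocation.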
1434 const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc,
1435 bool NeedsWinCFI, bool *HasWinCFI, bool EmitCFI,
1437 int CFAOffset = 0) {
1438 unsigned NewOpc;
1439
1440 // If the function contains streaming mode changes, we expect instructions
1441 // to calculate the value of VG before spilling. For locally-streaming
1442 // functions, we need to do this for both the streaming and non-streaming
1443 // vector length. Move past these instructions if necessary.
1444 MachineFunction &MF = *MBB.getParent();
1445 if (requiresSaveVG(MF))
1446 while (isVGInstruction(MBBI))
1447 ++MBBI;
1448
1449 switch (MBBI->getOpcode()) {
1450 default:
1451 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1452 case AArch64::STPXi:
1453 NewOpc = AArch64::STPXpre;
1454 break;
1455 case AArch64::STPDi:
1456 NewOpc = AArch64::STPDpre;
1457 break;
1458 case AArch64::STPQi:
1459 NewOpc = AArch64::STPQpre;
1460 break;
1461 case AArch64::STRXui:
1462 NewOpc = AArch64::STRXpre;
1463 break;
1464 case AArch64::STRDui:
1465 NewOpc = AArch64::STRDpre;
1466 break;
1467 case AArch64::STRQui:
1468 NewOpc = AArch64::STRQpre;
1469 break;
1470 case AArch64::LDPXi:
1471 NewOpc = AArch64::LDPXpost;
1472 break;
1473 case AArch64::LDPDi:
1474 NewOpc = AArch64::LDPDpost;
1475 break;
1476 case AArch64::LDPQi:
1477 NewOpc = AArch64::LDPQpost;
1478 break;
1479 case AArch64::LDRXui:
1480 NewOpc = AArch64::LDRXpost;
1481 break;
1482 case AArch64::LDRDui:
1483 NewOpc = AArch64::LDRDpost;
1484 break;
1485 case AArch64::LDRQui:
1486 NewOpc = AArch64::LDRQpost;
1487 break;
1488 }
1489 // Get rid of the SEH code associated with the old instruction.
1490 if (NeedsWinCFI) {
1491 auto SEH = std::next(MBBI);
1493 SEH->eraseFromParent();
1494 }
1495
1496 TypeSize Scale = TypeSize::getFixed(1), Width = TypeSize::getFixed(0);
1497 int64_t MinOffset, MaxOffset;
1498 bool Success = static_cast<const AArch64InstrInfo *>(TII)->getMemOpInfo(
1499 NewOpc, Scale, Width, MinOffset, MaxOffset);
1500 (void)Success;
1501 assert(Success && "unknown load/store opcode");
1502
1503 // If the first store isn't right where we want SP, then we can't fold the
1504 // update in, so create a normal arithmetic instruction instead.
1505 if (MBBI->getOperand(MBBI->getNumOperands() - 1).getImm() != 0 ||
1506 CSStackSizeInc < MinOffset * (int64_t)Scale.getFixedValue() ||
1507 CSStackSizeInc > MaxOffset * (int64_t)Scale.getFixedValue()) {
1508 // If we are destroying the frame, make sure we add the increment after the
1509 // last frame operation.
1510 if (FrameFlag == MachineInstr::FrameDestroy)
1511 ++MBBI;
1512 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1513 StackOffset::getFixed(CSStackSizeInc), TII, FrameFlag,
1514 false, false, nullptr, EmitCFI,
1515 StackOffset::getFixed(CFAOffset));
1516
1517 return std::prev(MBBI);
1518 }
1519
1520 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(NewOpc));
1521 MIB.addReg(AArch64::SP, RegState::Define);
1522
1523 // Copy all operands other than the immediate offset.
1524 unsigned OpndIdx = 0;
1525 for (unsigned OpndEnd = MBBI->getNumOperands() - 1; OpndIdx < OpndEnd;
1526 ++OpndIdx)
1527 MIB.add(MBBI->getOperand(OpndIdx));
1528
1529 assert(MBBI->getOperand(OpndIdx).getImm() == 0 &&
1530 "Unexpected immediate offset in first/last callee-save save/restore "
1531 "instruction!");
1532 assert(MBBI->getOperand(OpndIdx - 1).getReg() == AArch64::SP &&
1533 "Unexpected base register in callee-save save/restore instruction!");
1534 assert(CSStackSizeInc % Scale == 0);
1535 MIB.addImm(CSStackSizeInc / (int)Scale);
1536
1537 MIB.setMIFlags(MBBI->getFlags());
1538 MIB.setMemRefs(MBBI->memoperands());
1539
1540 // Generate a new SEH code that corresponds to the new instruction.
1541 if (NeedsWinCFI) {
1542 *HasWinCFI = true;
1543 InsertSEH(*MIB, *TII, FrameFlag);
1544 }
1545
1546 if (EmitCFI) {
1547 unsigned CFIIndex = MF.addFrameInst(
1548 MCCFIInstruction::cfiDefCfaOffset(nullptr, CFAOffset - CSStackSizeInc));
1549 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1550 .addCFIIndex(CFIIndex)
1551 .setMIFlags(FrameFlag);
1552 }
1553
1554 return std::prev(MBB.erase(MBBI));
1555}
1556
1557// Fixup callee-save register save/restore instructions to take into account
1558// combined SP bump by adding the local stack size to the stack offsets.
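// For example (illustrative), when a 48-byte local area is folded into the SP
// bump, a save "stp x19, x20, [sp, #16]" is rewritten to
// "stp x19, x20, [sp, #64]" (the scaled immediate is increased by 48 / 8).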
1560 uint64_t LocalStackSize,
1561 bool NeedsWinCFI,
1562 bool *HasWinCFI) {
1564 return;
1565
1566 unsigned Opc = MI.getOpcode();
1567 unsigned Scale;
1568 switch (Opc) {
1569 case AArch64::STPXi:
1570 case AArch64::STRXui:
1571 case AArch64::STPDi:
1572 case AArch64::STRDui:
1573 case AArch64::LDPXi:
1574 case AArch64::LDRXui:
1575 case AArch64::LDPDi:
1576 case AArch64::LDRDui:
1577 Scale = 8;
1578 break;
1579 case AArch64::STPQi:
1580 case AArch64::STRQui:
1581 case AArch64::LDPQi:
1582 case AArch64::LDRQui:
1583 Scale = 16;
1584 break;
1585 default:
1586 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1587 }
1588
1589 unsigned OffsetIdx = MI.getNumExplicitOperands() - 1;
1590 assert(MI.getOperand(OffsetIdx - 1).getReg() == AArch64::SP &&
1591 "Unexpected base register in callee-save save/restore instruction!");
1592 // Last operand is immediate offset that needs fixing.
1593 MachineOperand &OffsetOpnd = MI.getOperand(OffsetIdx);
1594 // All generated opcodes have scaled offsets.
1595 assert(LocalStackSize % Scale == 0);
1596 OffsetOpnd.setImm(OffsetOpnd.getImm() + LocalStackSize / Scale);
1597
1598 if (NeedsWinCFI) {
1599 *HasWinCFI = true;
1600 auto MBBI = std::next(MachineBasicBlock::iterator(MI));
1601 assert(MBBI != MI.getParent()->end() && "Expecting a valid instruction");
1603 "Expecting a SEH instruction");
1604 fixupSEHOpcode(MBBI, LocalStackSize);
1605 }
1606}
1607
1608static bool isTargetWindows(const MachineFunction &MF) {
1610}
1611
1612// Convenience function to determine whether I is an SVE callee save.
1614 switch (I->getOpcode()) {
1615 default:
1616 return false;
1617 case AArch64::PTRUE_C_B:
1618 case AArch64::LD1B_2Z_IMM:
1619 case AArch64::ST1B_2Z_IMM:
1620 case AArch64::STR_ZXI:
1621 case AArch64::STR_PXI:
1622 case AArch64::LDR_ZXI:
1623 case AArch64::LDR_PXI:
1624 return I->getFlag(MachineInstr::FrameSetup) ||
1625 I->getFlag(MachineInstr::FrameDestroy);
1626 }
1627}
1628
1630 MachineFunction &MF,
1633 const DebugLoc &DL, bool NeedsWinCFI,
1634 bool NeedsUnwindInfo) {
1635 // Shadow call stack prolog: str x30, [x18], #8
1636 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXpost))
1637 .addReg(AArch64::X18, RegState::Define)
1638 .addReg(AArch64::LR)
1639 .addReg(AArch64::X18)
1640 .addImm(8)
1642
1643 // This instruction also makes x18 live-in to the entry block.
1644 MBB.addLiveIn(AArch64::X18);
1645
1646 if (NeedsWinCFI)
1647 BuildMI(MBB, MBBI, DL, TII.get(AArch64::SEH_Nop))
1649
1650 if (NeedsUnwindInfo) {
1651 // Emit a CFI instruction that causes 8 to be subtracted from the value of
1652 // x18 when unwinding past this frame.
1653 static const char CFIInst[] = {
1654 dwarf::DW_CFA_val_expression,
1655 18, // register
1656 2, // length
1657 static_cast<char>(unsigned(dwarf::DW_OP_breg18)),
1658 static_cast<char>(-8) & 0x7f, // addend (sleb128)
1659 };
1660 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createEscape(
1661 nullptr, StringRef(CFIInst, sizeof(CFIInst))));
1662 BuildMI(MBB, MBBI, DL, TII.get(AArch64::CFI_INSTRUCTION))
1663 .addCFIIndex(CFIIndex)
1665 }
1666}
1667
1669 MachineFunction &MF,
1672 const DebugLoc &DL) {
1673 // Shadow call stack epilog: ldr x30, [x18, #-8]!
1674 BuildMI(MBB, MBBI, DL, TII.get(AArch64::LDRXpre))
1675 .addReg(AArch64::X18, RegState::Define)
1676 .addReg(AArch64::LR, RegState::Define)
1677 .addReg(AArch64::X18)
1678 .addImm(-8)
1680
1682 unsigned CFIIndex =
1684 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
1685 .addCFIIndex(CFIIndex)
1687 }
1688}
1689
1690// Define the current CFA rule to use the provided FP.
1693 const DebugLoc &DL, unsigned FixedObject) {
1696 const TargetInstrInfo *TII = STI.getInstrInfo();
1698
1699 const int OffsetToFirstCalleeSaveFromFP =
1702 Register FramePtr = TRI->getFrameRegister(MF);
1703 unsigned Reg = TRI->getDwarfRegNum(FramePtr, true);
1704 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
1705 nullptr, Reg, FixedObject - OffsetToFirstCalleeSaveFromFP));
1706 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1707 .addCFIIndex(CFIIndex)
1709}
1710
1711#ifndef NDEBUG
1712/// Collect live registers from the end of \p MI's parent up to (including) \p
1713/// MI in \p LiveRegs.
1715 LivePhysRegs &LiveRegs) {
1716
1717 MachineBasicBlock &MBB = *MI.getParent();
1718 LiveRegs.addLiveOuts(MBB);
1719 for (const MachineInstr &MI :
1720 reverse(make_range(MI.getIterator(), MBB.instr_end())))
1721 LiveRegs.stepBackward(MI);
1722}
1723#endif
1724
1726 MachineBasicBlock &MBB) const {
1728 const MachineFrameInfo &MFI = MF.getFrameInfo();
1729 const Function &F = MF.getFunction();
1730 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1731 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1732 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1733
1735 bool EmitCFI = AFI->needsDwarfUnwindInfo(MF);
1736 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
1737 bool HasFP = hasFP(MF);
1738 bool NeedsWinCFI = needsWinCFI(MF);
1739 bool HasWinCFI = false;
1740 auto Cleanup = make_scope_exit([&]() { MF.setHasWinCFI(HasWinCFI); });
1741
1743#ifndef NDEBUG
1745 // Collect live registers from the end of MBB up to the start of the existing
1746 // frame setup instructions.
1747 MachineBasicBlock::iterator NonFrameStart = MBB.begin();
1748 while (NonFrameStart != End &&
1749 NonFrameStart->getFlag(MachineInstr::FrameSetup))
1750 ++NonFrameStart;
1751
1752 LivePhysRegs LiveRegs(*TRI);
1753 if (NonFrameStart != MBB.end()) {
1754 getLivePhysRegsUpTo(*NonFrameStart, *TRI, LiveRegs);
1755 // Ignore registers used for stack management for now.
1756 LiveRegs.removeReg(AArch64::SP);
1757 LiveRegs.removeReg(AArch64::X19);
1758 LiveRegs.removeReg(AArch64::FP);
1759 LiveRegs.removeReg(AArch64::LR);
1760
1761 // X0 will be clobbered by a call to __arm_get_current_vg in the prologue.
1762 // This is necessary to spill VG if required where SVE is unavailable, but
1763 // X0 is preserved around this call.
1764 if (requiresGetVGCall(MF))
1765 LiveRegs.removeReg(AArch64::X0);
1766 }
1767
1768 auto VerifyClobberOnExit = make_scope_exit([&]() {
1769 if (NonFrameStart == MBB.end())
1770 return;
1771 // Check if any of the newly inserted instructions clobber any of the live registers.
1772 for (MachineInstr &MI :
1773 make_range(MBB.instr_begin(), NonFrameStart->getIterator())) {
1774 for (auto &Op : MI.operands())
1775 if (Op.isReg() && Op.isDef())
1776 assert(!LiveRegs.contains(Op.getReg()) &&
1777 "live register clobbered by inserted prologue instructions");
1778 }
1779 });
1780#endif
1781
1782 bool IsFunclet = MBB.isEHFuncletEntry();
1783
1784 // At this point, we're going to decide whether or not the function uses a
1785 // redzone. In most cases, the function doesn't have a redzone so let's
1786 // assume that's false and set it to true in the case that there's a redzone.
1787 AFI->setHasRedZone(false);
1788
1789 // Debug location must be unknown since the first debug location is used
1790 // to determine the end of the prologue.
1791 DebugLoc DL;
1792
1793 const auto &MFnI = *MF.getInfo<AArch64FunctionInfo>();
1794 if (MFnI.needsShadowCallStackPrologueEpilogue(MF))
1795 emitShadowCallStackPrologue(*TII, MF, MBB, MBBI, DL, NeedsWinCFI,
1796 MFnI.needsDwarfUnwindInfo(MF));
1797
1798 if (MFnI.shouldSignReturnAddress(MF)) {
1799 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1801 if (NeedsWinCFI)
1802 HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
1803 }
1804
1805 if (EmitCFI && MFnI.isMTETagged()) {
1806 BuildMI(MBB, MBBI, DL, TII->get(AArch64::EMITMTETAGGED))
1808 }
1809
1810 // We signal the presence of a Swift extended frame to external tools by
1811 // storing FP with 0b0001 in bits 63:60. In normal userland operation a simple
1812 // ORR is sufficient; it is assumed a Swift kernel would initialize the TBI
1813 // bits so that this is still true.
1814 if (HasFP && AFI->hasSwiftAsyncContext()) {
1817 if (Subtarget.swiftAsyncContextIsDynamicallySet()) {
1818 // The special symbol below is absolute and has a *value* that can be
1819 // combined with the frame pointer to signal an extended frame.
1820 BuildMI(MBB, MBBI, DL, TII->get(AArch64::LOADgot), AArch64::X16)
1821 .addExternalSymbol("swift_async_extendedFramePointerFlags",
1823 if (NeedsWinCFI) {
1824 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1826 HasWinCFI = true;
1827 }
1828 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXrs), AArch64::FP)
1829 .addUse(AArch64::FP)
1830 .addUse(AArch64::X16)
1831 .addImm(Subtarget.isTargetILP32() ? 32 : 0);
1832 if (NeedsWinCFI) {
1833 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1835 HasWinCFI = true;
1836 }
1837 break;
1838 }
1839 [[fallthrough]];
1840
1842 // ORR x29, x29, #0x1000_0000_0000_0000
1843 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXri), AArch64::FP)
1844 .addUse(AArch64::FP)
1845 .addImm(0x1100)
1847 if (NeedsWinCFI) {
1848 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1850 HasWinCFI = true;
1851 }
1852 break;
1853
1855 break;
1856 }
1857 }
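  // Illustrative note: in the static-flag case above this tags the frame
  // pointer in place, roughly
  //   orr x29, x29, #0x1000000000000000   // set bit 60
  // and the matching epilogue clears the same bit again with a BIC.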
1858
1859 // All calls are tail calls in GHC calling conv, and functions have no
1860 // prologue/epilogue.
1862 return;
1863
1864 // Set tagged base pointer to the requested stack slot.
1865 // Ideally it should match SP value after prologue.
1866 std::optional<int> TBPI = AFI->getTaggedBasePointerIndex();
1867 if (TBPI)
1869 else
1871
1872 const StackOffset &SVEStackSize = getSVEStackSize(MF);
1873
1874 // getStackSize() includes all the locals in its size calculation. We don't
1875 // include these locals when computing the stack size of a funclet, as they
1876 // are allocated in the parent's stack frame and accessed via the frame
1877 // pointer from the funclet. We only save the callee saved registers in the
1878 // funclet, which are really the callee saved registers of the parent
1879 // function, including the funclet.
1880 int64_t NumBytes =
1881 IsFunclet ? getWinEHFuncletFrameSize(MF) : MFI.getStackSize();
1882 if (!AFI->hasStackFrame() && !windowsRequiresStackProbe(MF, NumBytes)) {
1883 assert(!HasFP && "unexpected function without stack frame but with FP");
1884 assert(!SVEStackSize &&
1885 "unexpected function without stack frame but with SVE objects");
1886 // All of the stack allocation is for locals.
1887 AFI->setLocalStackSize(NumBytes);
1888 if (!NumBytes)
1889 return;
1890 // REDZONE: If the stack size is less than 128 bytes, we don't need
1891 // to actually allocate.
1892 if (canUseRedZone(MF)) {
1893 AFI->setHasRedZone(true);
1894 ++NumRedZoneFunctions;
1895 } else {
1896 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1897 StackOffset::getFixed(-NumBytes), TII,
1898 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
1899 if (EmitCFI) {
1900 // Label used to tie together the PROLOG_LABEL and the MachineMoves.
1901 MCSymbol *FrameLabel = MF.getContext().createTempSymbol();
1902 // Encode the stack size of the leaf function.
1903 unsigned CFIIndex = MF.addFrameInst(
1904 MCCFIInstruction::cfiDefCfaOffset(FrameLabel, NumBytes));
1905 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1906 .addCFIIndex(CFIIndex)
1908 }
1909 }
1910
1911 if (NeedsWinCFI) {
1912 HasWinCFI = true;
1913 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1915 }
1916
1917 return;
1918 }
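  // Illustrative note (hypothetical size): for a leaf function with 48 bytes of
  // locals, the path above either emits nothing at all (red zone, locals live
  // below SP) or roughly
  //   sub sp, sp, #48
  //   .cfi_def_cfa_offset 48
  // when the red zone cannot be used.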
1919
1920 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
1921 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
1922
1923 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
1924 // All of the remaining stack allocations are for locals.
1925 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
1926 bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);
1927 bool HomPrologEpilog = homogeneousPrologEpilog(MF);
1928 if (CombineSPBump) {
1929 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
1930 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1931 StackOffset::getFixed(-NumBytes), TII,
1932 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI,
1933 EmitAsyncCFI);
1934 NumBytes = 0;
1935 } else if (HomPrologEpilog) {
1936 // Stack has been already adjusted.
1937 NumBytes -= PrologueSaveSize;
1938 } else if (PrologueSaveSize != 0) {
1940 MBB, MBBI, DL, TII, -PrologueSaveSize, NeedsWinCFI, &HasWinCFI,
1941 EmitAsyncCFI);
1942 NumBytes -= PrologueSaveSize;
1943 }
1944 assert(NumBytes >= 0 && "Negative stack allocation size!?");
1945
1946 // Move past the saves of the callee-saved registers, fixing up the offsets
1947 // and pre-inc if we decided to combine the callee-save and local stack
1948 // pointer bump above.
1949 while (MBBI != End && MBBI->getFlag(MachineInstr::FrameSetup) &&
1951 // Move past instructions generated to calculate VG
1952 if (requiresSaveVG(MF))
1953 while (isVGInstruction(MBBI))
1954 ++MBBI;
1955
1956 if (CombineSPBump)
1958 NeedsWinCFI, &HasWinCFI);
1959 ++MBBI;
1960 }
1961
1962 // For funclets the FP belongs to the containing function.
1963 if (!IsFunclet && HasFP) {
1964 // Only set up FP if we actually need to.
1965 int64_t FPOffset = AFI->getCalleeSaveBaseToFrameRecordOffset();
1966
1967 if (CombineSPBump)
1968 FPOffset += AFI->getLocalStackSize();
1969
1970 if (AFI->hasSwiftAsyncContext()) {
1971 // Before we update the live FP we have to ensure there's a valid (or
1972 // null) asynchronous context in its slot just before FP in the frame
1973 // record, so store it now.
1974 const auto &Attrs = MF.getFunction().getAttributes();
1975 bool HaveInitialContext = Attrs.hasAttrSomewhere(Attribute::SwiftAsync);
1976 if (HaveInitialContext)
1977 MBB.addLiveIn(AArch64::X22);
1978 Register Reg = HaveInitialContext ? AArch64::X22 : AArch64::XZR;
1979 BuildMI(MBB, MBBI, DL, TII->get(AArch64::StoreSwiftAsyncContext))
1980 .addUse(Reg)
1981 .addUse(AArch64::SP)
1982 .addImm(FPOffset - 8)
1984 if (NeedsWinCFI) {
1985 // WinCFI and arm64e, where StoreSwiftAsyncContext is expanded
1986 // to multiple instructions, should be mutually-exclusive.
1987 assert(Subtarget.getTargetTriple().getArchName() != "arm64e");
1988 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1990 HasWinCFI = true;
1991 }
1992 }
1993
1994 if (HomPrologEpilog) {
1995 auto Prolog = MBBI;
1996 --Prolog;
1997 assert(Prolog->getOpcode() == AArch64::HOM_Prolog);
1998 Prolog->addOperand(MachineOperand::CreateImm(FPOffset));
1999 } else {
2000 // Issue sub fp, sp, FPOffset or
2001 // mov fp, sp when FPOffset is zero.
2002 // Note: All stores of callee-saved registers are marked as "FrameSetup".
2003 // This code marks the instruction(s) that set the FP also.
2004 emitFrameOffset(MBB, MBBI, DL, AArch64::FP, AArch64::SP,
2005 StackOffset::getFixed(FPOffset), TII,
2006 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
2007 if (NeedsWinCFI && HasWinCFI) {
2008 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
2010 // After setting up the FP, the rest of the prolog doesn't need to be
2011 // included in the SEH unwind info.
2012 NeedsWinCFI = false;
2013 }
2014 }
2015 if (EmitAsyncCFI)
2016 emitDefineCFAWithFP(MF, MBB, MBBI, DL, FixedObject);
2017 }
2018
2019 // Now emit the moves for whatever callee saved regs we have (including FP,
2020 // LR if those are saved). Frame instructions for SVE registers are emitted
2021 // later, after the instructions which actually save the SVE regs.
2022 if (EmitAsyncCFI)
2023 emitCalleeSavedGPRLocations(MBB, MBBI);
2024
2025 // Alignment is required for the parent frame, not the funclet
2026 const bool NeedsRealignment =
2027 NumBytes && !IsFunclet && RegInfo->hasStackRealignment(MF);
2028 const int64_t RealignmentPadding =
2029 (NeedsRealignment && MFI.getMaxAlign() > Align(16))
2030 ? MFI.getMaxAlign().value() - 16
2031 : 0;
2032
2033 if (windowsRequiresStackProbe(MF, NumBytes + RealignmentPadding)) {
2034 uint64_t NumWords = (NumBytes + RealignmentPadding) >> 4;
2035 if (NeedsWinCFI) {
2036 HasWinCFI = true;
2037 // alloc_l can hold at most 256MB, so assume that NumBytes doesn't
2038 // exceed this amount. We need to move at most 2^24 - 1 into x15.
2039 // This is at most two instructions, MOVZ followed by MOVK.
2040 // TODO: Fix to use multiple stack alloc unwind codes for stacks
2041 // exceeding 256MB in size.
2042 if (NumBytes >= (1 << 28))
2043 report_fatal_error("Stack size cannot exceed 256MB for stack "
2044 "unwinding purposes");
2045
2046 uint32_t LowNumWords = NumWords & 0xFFFF;
2047 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVZXi), AArch64::X15)
2048 .addImm(LowNumWords)
2051 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2053 if ((NumWords & 0xFFFF0000) != 0) {
2054 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVKXi), AArch64::X15)
2055 .addReg(AArch64::X15)
2056 .addImm((NumWords & 0xFFFF0000) >> 16) // High half
2059 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2061 }
2062 } else {
2063 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm), AArch64::X15)
2064 .addImm(NumWords)
2066 }
2067
2068 const char *ChkStk = Subtarget.getChkStkName();
2069 switch (MF.getTarget().getCodeModel()) {
2070 case CodeModel::Tiny:
2071 case CodeModel::Small:
2072 case CodeModel::Medium:
2073 case CodeModel::Kernel:
2074 BuildMI(MBB, MBBI, DL, TII->get(AArch64::BL))
2075 .addExternalSymbol(ChkStk)
2076 .addReg(AArch64::X15, RegState::Implicit)
2081 if (NeedsWinCFI) {
2082 HasWinCFI = true;
2083 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2085 }
2086 break;
2087 case CodeModel::Large:
2088 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVaddrEXT))
2089 .addReg(AArch64::X16, RegState::Define)
2090 .addExternalSymbol(ChkStk)
2091 .addExternalSymbol(ChkStk)
2093 if (NeedsWinCFI) {
2094 HasWinCFI = true;
2095 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2097 }
2098
2099 BuildMI(MBB, MBBI, DL, TII->get(getBLRCallOpcode(MF)))
2100 .addReg(AArch64::X16, RegState::Kill)
2106 if (NeedsWinCFI) {
2107 HasWinCFI = true;
2108 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2110 }
2111 break;
2112 }
2113
2114 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SUBXrx64), AArch64::SP)
2115 .addReg(AArch64::SP, RegState::Kill)
2116 .addReg(AArch64::X15, RegState::Kill)
2119 if (NeedsWinCFI) {
2120 HasWinCFI = true;
2121 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_StackAlloc))
2122 .addImm(NumBytes)
2124 }
2125 NumBytes = 0;
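  // Illustrative note: the probing sequence built above is expected to look
  // roughly like
  //   mov x15, #NumWords          // possibly movz + movk
  //   bl  __chkstk                // probes the pages; x15 is preserved
  //   sub sp, sp, x15, uxtx #4    // allocate NumWords * 16 bytes
  // where NumWords and the exact mov form depend on the allocation size.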
2126
2127 if (RealignmentPadding > 0) {
2128 if (RealignmentPadding >= 4096) {
2129 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm))
2130 .addReg(AArch64::X16, RegState::Define)
2131 .addImm(RealignmentPadding)
2133 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXrx64), AArch64::X15)
2134 .addReg(AArch64::SP)
2135 .addReg(AArch64::X16, RegState::Kill)
2138 } else {
2139 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXri), AArch64::X15)
2140 .addReg(AArch64::SP)
2141 .addImm(RealignmentPadding)
2142 .addImm(0)
2144 }
2145
2146 uint64_t AndMask = ~(MFI.getMaxAlign().value() - 1);
2147 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ANDXri), AArch64::SP)
2148 .addReg(AArch64::X15, RegState::Kill)
2150 AFI->setStackRealigned(true);
2151
2152 // No need for SEH instructions here; if we're realigning the stack,
2153 // we've set a frame pointer and already finished the SEH prologue.
2154 assert(!NeedsWinCFI);
2155 }
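    // Illustrative note: the realignment above roughly lowers to
    //   add x15, sp, #RealignmentPadding   // via x16 if padding >= 4096
    //   and sp, x15, #~(MaxAlign - 1)
    // leaving SP aligned to MFI.getMaxAlign().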
2156 }
2157
2158 StackOffset SVECalleeSavesSize = {}, SVELocalsSize = SVEStackSize;
2159 MachineBasicBlock::iterator CalleeSavesBegin = MBBI, CalleeSavesEnd = MBBI;
2160
2161 // Process the SVE callee-saves to determine what space needs to be
2162 // allocated.
2163 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2164 LLVM_DEBUG(dbgs() << "SVECalleeSavedStackSize = " << CalleeSavedSize
2165 << "\n");
2166 // Find callee save instructions in frame.
2167 CalleeSavesBegin = MBBI;
2168 assert(IsSVECalleeSave(CalleeSavesBegin) && "Unexpected instruction");
2170 ++MBBI;
2171 CalleeSavesEnd = MBBI;
2172
2173 SVECalleeSavesSize = StackOffset::getScalable(CalleeSavedSize);
2174 SVELocalsSize = SVEStackSize - SVECalleeSavesSize;
2175 }
2176
2177 // Allocate space for the callee saves (if any).
2178 StackOffset CFAOffset =
2179 StackOffset::getFixed((int64_t)MFI.getStackSize() - NumBytes);
2180 StackOffset LocalsSize = SVELocalsSize + StackOffset::getFixed(NumBytes);
2181 allocateStackSpace(MBB, CalleeSavesBegin, 0, SVECalleeSavesSize, false,
2182 nullptr, EmitAsyncCFI && !HasFP, CFAOffset,
2183 MFI.hasVarSizedObjects() || LocalsSize);
2184 CFAOffset += SVECalleeSavesSize;
2185
2186 if (EmitAsyncCFI)
2187 emitCalleeSavedSVELocations(MBB, CalleeSavesEnd);
2188
2189 // Allocate space for the rest of the frame including SVE locals. Align the
2190 // stack as necessary.
2191 assert(!(canUseRedZone(MF) && NeedsRealignment) &&
2192 "Cannot use redzone with stack realignment");
2193 if (!canUseRedZone(MF)) {
2194 // FIXME: in the case of dynamic re-alignment, NumBytes doesn't have
2195 // the correct value here, as NumBytes also includes padding bytes,
2196 // which shouldn't be counted here.
2197 allocateStackSpace(MBB, CalleeSavesEnd, RealignmentPadding,
2198 SVELocalsSize + StackOffset::getFixed(NumBytes),
2199 NeedsWinCFI, &HasWinCFI, EmitAsyncCFI && !HasFP,
2200 CFAOffset, MFI.hasVarSizedObjects());
2201 }
2202
2203 // If we need a base pointer, set it up here. It's whatever the value of the
2204 // stack pointer is at this point. Any variable size objects will be allocated
2205 // after this, so we can still use the base pointer to reference locals.
2206 //
2207 // FIXME: Clarify FrameSetup flags here.
2208 // Note: Use emitFrameOffset() like above for FP if the FrameSetup flag is
2209 // needed.
2210 // For funclets the BP belongs to the containing function.
2211 if (!IsFunclet && RegInfo->hasBasePointer(MF)) {
2212 TII->copyPhysReg(MBB, MBBI, DL, RegInfo->getBaseRegister(), AArch64::SP,
2213 false);
2214 if (NeedsWinCFI) {
2215 HasWinCFI = true;
2216 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2218 }
2219 }
2220
2221 // The very last FrameSetup instruction indicates the end of prologue. Emit a
2222 // SEH opcode indicating the prologue end.
2223 if (NeedsWinCFI && HasWinCFI) {
2224 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
2226 }
2227
2228 // SEH funclets are passed the frame pointer in X1. If the parent
2229 // function uses the base register, then the base register is used
2230 // directly, and is not retrieved from X1.
2231 if (IsFunclet && F.hasPersonalityFn()) {
2232 EHPersonality Per = classifyEHPersonality(F.getPersonalityFn());
2233 if (isAsynchronousEHPersonality(Per)) {
2234 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::COPY), AArch64::FP)
2235 .addReg(AArch64::X1)
2237 MBB.addLiveIn(AArch64::X1);
2238 }
2239 }
2240
2241 if (EmitCFI && !EmitAsyncCFI) {
2242 if (HasFP) {
2243 emitDefineCFAWithFP(MF, MBB, MBBI, DL, FixedObject);
2244 } else {
2245 StackOffset TotalSize =
2246 SVEStackSize + StackOffset::getFixed((int64_t)MFI.getStackSize());
2247 unsigned CFIIndex = MF.addFrameInst(createDefCFA(
2248 *RegInfo, /*FrameReg=*/AArch64::SP, /*Reg=*/AArch64::SP, TotalSize,
2249 /*LastAdjustmentWasScalable=*/false));
2250 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2251 .addCFIIndex(CFIIndex)
2253 }
2254 emitCalleeSavedGPRLocations(MBB, MBBI);
2255 emitCalleeSavedSVELocations(MBB, MBBI);
2256 }
2257}
2258
2259 static bool isFuncletReturnInstr(const MachineInstr &MI) {
2260 switch (MI.getOpcode()) {
2261 default:
2262 return false;
2263 case AArch64::CATCHRET:
2264 case AArch64::CLEANUPRET:
2265 return true;
2266 }
2267}
2268
2270 MachineBasicBlock &MBB) const {
2272 MachineFrameInfo &MFI = MF.getFrameInfo();
2274 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2275 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
2276 DebugLoc DL;
2277 bool NeedsWinCFI = needsWinCFI(MF);
2278 bool EmitCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
2279 bool HasWinCFI = false;
2280 bool IsFunclet = false;
2281
2282 if (MBB.end() != MBBI) {
2283 DL = MBBI->getDebugLoc();
2284 IsFunclet = isFuncletReturnInstr(*MBBI);
2285 }
2286
2287 MachineBasicBlock::iterator EpilogStartI = MBB.end();
2288
2289 auto FinishingTouches = make_scope_exit([&]() {
2290 if (AFI->shouldSignReturnAddress(MF)) {
2291 BuildMI(MBB, MBB.getFirstTerminator(), DL,
2292 TII->get(AArch64::PAUTH_EPILOGUE))
2293 .setMIFlag(MachineInstr::FrameDestroy);
2294 if (NeedsWinCFI)
2295 HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
2296 }
2299 if (EmitCFI)
2300 emitCalleeSavedGPRRestores(MBB, MBB.getFirstTerminator());
2301 if (HasWinCFI) {
2303 TII->get(AArch64::SEH_EpilogEnd))
2305 if (!MF.hasWinCFI())
2306 MF.setHasWinCFI(true);
2307 }
2308 if (NeedsWinCFI) {
2309 assert(EpilogStartI != MBB.end());
2310 if (!HasWinCFI)
2311 MBB.erase(EpilogStartI);
2312 }
2313 });
2314
2315 int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
2316 : MFI.getStackSize();
2317
2318 // All calls are tail calls in GHC calling conv, and functions have no
2319 // prologue/epilogue.
2321 return;
2322
2323 // How much of the stack used by incoming arguments this function is expected
2324 // to restore in this particular epilogue.
2325 int64_t ArgumentStackToRestore = getArgumentStackToRestore(MF, MBB);
2326 bool IsWin64 = Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
2327 MF.getFunction().isVarArg());
2328 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
2329
2330 int64_t AfterCSRPopSize = ArgumentStackToRestore;
2331 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
2332 // We cannot rely on the local stack size set in emitPrologue if the function
2333 // has funclets, as funclets have different local stack size requirements, and
2334 // the current value set in emitPrologue may be that of the containing
2335 // function.
2336 if (MF.hasEHFunclets())
2337 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
2338 if (homogeneousPrologEpilog(MF, &MBB)) {
2339 assert(!NeedsWinCFI);
2340 auto LastPopI = MBB.getFirstTerminator();
2341 if (LastPopI != MBB.begin()) {
2342 auto HomogeneousEpilog = std::prev(LastPopI);
2343 if (HomogeneousEpilog->getOpcode() == AArch64::HOM_Epilog)
2344 LastPopI = HomogeneousEpilog;
2345 }
2346
2347 // Adjust local stack
2348 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2350 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2351
2352 // SP has already been adjusted while restoring the callee-saved regs.
2353 // We have already bailed out of the case that adjusts SP for arguments.
2354 assert(AfterCSRPopSize == 0);
2355 return;
2356 }
2357 bool CombineSPBump = shouldCombineCSRLocalStackBumpInEpilogue(MBB, NumBytes);
2358 // Assume we can't combine the last pop with the sp restore.
2359
2360 bool CombineAfterCSRBump = false;
2361 if (!CombineSPBump && PrologueSaveSize != 0) {
2363 while (Pop->getOpcode() == TargetOpcode::CFI_INSTRUCTION ||
2365 Pop = std::prev(Pop);
2366 // Converting the last ldp to a post-index ldp is valid only if the last
2367 // ldp's offset is 0.
2368 const MachineOperand &OffsetOp = Pop->getOperand(Pop->getNumOperands() - 1);
2369 // If the offset is 0 and the AfterCSR pop is not actually trying to
2370 // allocate more stack for arguments (in space that an untimely interrupt
2371 // may clobber), convert it to a post-index ldp.
2372 if (OffsetOp.getImm() == 0 && AfterCSRPopSize >= 0) {
2374 MBB, Pop, DL, TII, PrologueSaveSize, NeedsWinCFI, &HasWinCFI, EmitCFI,
2375 MachineInstr::FrameDestroy, PrologueSaveSize);
2376 } else {
2377 // If not, make sure to emit an add after the last ldp.
2378 // We're doing this by transferring the size to be restored from the
2379 // adjustment *before* the CSR pops to the adjustment *after* the CSR
2380 // pops.
2381 AfterCSRPopSize += PrologueSaveSize;
2382 CombineAfterCSRBump = true;
2383 }
2384 }
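  // Illustrative note: when the conversion applies, a final restore such as
  //   ldp x29, x30, [sp]
  //   add sp, sp, #PrologueSaveSize
  // is folded into the single post-indexed form
  //   ldp x29, x30, [sp], #PrologueSaveSize
  // which is only valid because the ldp's immediate offset is zero.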
2385
2386 // Move past the restores of the callee-saved registers.
2387 // If we plan on combining the sp bump of the local stack size and the callee
2388 // save stack size, we might need to adjust the CSR save and restore offsets.
2391 while (LastPopI != Begin) {
2392 --LastPopI;
2393 if (!LastPopI->getFlag(MachineInstr::FrameDestroy) ||
2394 IsSVECalleeSave(LastPopI)) {
2395 ++LastPopI;
2396 break;
2397 } else if (CombineSPBump)
2399 NeedsWinCFI, &HasWinCFI);
2400 }
2401
2402 if (NeedsWinCFI) {
2403 // Note that there are cases where we insert SEH opcodes in the
2404 // epilogue when we had no SEH opcodes in the prologue. For
2405 // example, when there is no stack frame but there are stack
2406 // arguments. Insert the SEH_EpilogStart and remove it later if we
2407 // didn't emit any SEH opcodes, to avoid generating WinCFI for
2408 // functions that don't need it.
2409 BuildMI(MBB, LastPopI, DL, TII->get(AArch64::SEH_EpilogStart))
2411 EpilogStartI = LastPopI;
2412 --EpilogStartI;
2413 }
2414
2415 if (hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2418 // Avoid the reload as it is GOT relative, and instead fall back to the
2419 // hardcoded value below. This allows a mismatch between the OS and
2420 // application without immediately terminating on the difference.
2421 [[fallthrough]];
2423 // We need to reset FP to its untagged state on return. Bit 60 is
2424 // currently used to show the presence of an extended frame.
2425
2426 // BIC x29, x29, #0x1000_0000_0000_0000
2427 BuildMI(MBB, MBB.getFirstTerminator(), DL, TII->get(AArch64::ANDXri),
2428 AArch64::FP)
2429 .addUse(AArch64::FP)
2430 .addImm(0x10fe)
2432 if (NeedsWinCFI) {
2433 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2435 HasWinCFI = true;
2436 }
2437 break;
2438
2440 break;
2441 }
2442 }
2443
2444 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2445
2446 // If there is a single SP update, insert it before the ret and we're done.
2447 if (CombineSPBump) {
2448 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
2449
2450 // When we are about to restore the CSRs, the CFA register is SP again.
2451 if (EmitCFI && hasFP(MF)) {
2452 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2453 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2454 unsigned CFIIndex =
2455 MF.addFrameInst(MCCFIInstruction::cfiDefCfa(nullptr, Reg, NumBytes));
2456 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2457 .addCFIIndex(CFIIndex)
2459 }
2460
2461 emitFrameOffset(MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2462 StackOffset::getFixed(NumBytes + (int64_t)AfterCSRPopSize),
2463 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI,
2464 &HasWinCFI, EmitCFI, StackOffset::getFixed(NumBytes));
2465 return;
2466 }
2467
2468 NumBytes -= PrologueSaveSize;
2469 assert(NumBytes >= 0 && "Negative stack allocation size!?");
2470
2471 // Process the SVE callee-saves to determine what space needs to be
2472 // deallocated.
2473 StackOffset DeallocateBefore = {}, DeallocateAfter = SVEStackSize;
2474 MachineBasicBlock::iterator RestoreBegin = LastPopI, RestoreEnd = LastPopI;
2475 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2476 RestoreBegin = std::prev(RestoreEnd);
2477 while (RestoreBegin != MBB.begin() &&
2478 IsSVECalleeSave(std::prev(RestoreBegin)))
2479 --RestoreBegin;
2480
2481 assert(IsSVECalleeSave(RestoreBegin) &&
2482 IsSVECalleeSave(std::prev(RestoreEnd)) && "Unexpected instruction");
2483
2484 StackOffset CalleeSavedSizeAsOffset =
2485 StackOffset::getScalable(CalleeSavedSize);
2486 DeallocateBefore = SVEStackSize - CalleeSavedSizeAsOffset;
2487 DeallocateAfter = CalleeSavedSizeAsOffset;
2488 }
2489
2490 // Deallocate the SVE area.
2491 if (SVEStackSize) {
2492 // If we have stack realignment or variable sized objects on the stack,
2493 // restore the stack pointer from the frame pointer prior to SVE CSR
2494 // restoration.
2495 if (AFI->isStackRealigned() || MFI.hasVarSizedObjects()) {
2496 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2497 // Set SP to start of SVE callee-save area from which they can
2498 // be reloaded. The code below will deallocate the stack
2499 // space by moving FP -> SP.
2500 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::FP,
2501 StackOffset::getScalable(-CalleeSavedSize), TII,
2503 }
2504 } else {
2505 if (AFI->getSVECalleeSavedStackSize()) {
2506 // Deallocate the non-SVE locals first before we can deallocate (and
2507 // restore callee saves) from the SVE area.
2509 MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2511 false, false, nullptr, EmitCFI && !hasFP(MF),
2512 SVEStackSize + StackOffset::getFixed(NumBytes + PrologueSaveSize));
2513 NumBytes = 0;
2514 }
2515
2516 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2517 DeallocateBefore, TII, MachineInstr::FrameDestroy, false,
2518 false, nullptr, EmitCFI && !hasFP(MF),
2519 SVEStackSize +
2520 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2521
2522 emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
2523 DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
2524 false, nullptr, EmitCFI && !hasFP(MF),
2525 DeallocateAfter +
2526 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2527 }
2528 if (EmitCFI)
2529 emitCalleeSavedSVERestores(MBB, RestoreEnd);
2530 }
2531
2532 if (!hasFP(MF)) {
2533 bool RedZone = canUseRedZone(MF);
2534 // If this was a redzone leaf function, we don't need to restore the
2535 // stack pointer (but we may need to pop stack args for fastcc).
2536 if (RedZone && AfterCSRPopSize == 0)
2537 return;
2538
2539 // Pop the local variables off the stack. If there are no callee-saved
2540 // registers, it means we are actually positioned at the terminator and can
2541 // combine stack increment for the locals and the stack increment for
2542 // callee-popped arguments into (possibly) a single instruction and be done.
2543 bool NoCalleeSaveRestore = PrologueSaveSize == 0;
2544 int64_t StackRestoreBytes = RedZone ? 0 : NumBytes;
2545 if (NoCalleeSaveRestore)
2546 StackRestoreBytes += AfterCSRPopSize;
2547
2549 MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2550 StackOffset::getFixed(StackRestoreBytes), TII,
2551 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2552 StackOffset::getFixed((RedZone ? 0 : NumBytes) + PrologueSaveSize));
2553
2554 // If we were able to combine the local stack pop with the argument pop,
2555 // then we're done.
2556 if (NoCalleeSaveRestore || AfterCSRPopSize == 0) {
2557 return;
2558 }
2559
2560 NumBytes = 0;
2561 }
2562
2563 // Restore the original stack pointer.
2564 // FIXME: Rather than doing the math here, we should instead just use
2565 // non-post-indexed loads for the restores if we aren't actually going to
2566 // be able to save any instructions.
2567 if (!IsFunclet && (MFI.hasVarSizedObjects() || AFI->isStackRealigned())) {
2569 MBB, LastPopI, DL, AArch64::SP, AArch64::FP,
2571 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2572 } else if (NumBytes)
2573 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2574 StackOffset::getFixed(NumBytes), TII,
2575 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
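  // Illustrative note: with VLAs or a realigned stack, SP cannot be recovered
  // by arithmetic on itself, so it is re-derived from FP above; in the common
  // case the restore is simply `add sp, sp, #NumBytes`.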
2576
2577 // When we are about to restore the CSRs, the CFA register is SP again.
2578 if (EmitCFI && hasFP(MF)) {
2579 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2580 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2581 unsigned CFIIndex = MF.addFrameInst(
2582 MCCFIInstruction::cfiDefCfa(nullptr, Reg, PrologueSaveSize));
2583 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2584 .addCFIIndex(CFIIndex)
2586 }
2587
2588 // This must be placed after the callee-save restore code because that code
2589 // assumes the SP is at the same location as it was after the callee-save spill
2590 // code in the prologue.
2591 if (AfterCSRPopSize) {
2592 assert(AfterCSRPopSize > 0 && "attempting to reallocate arg stack that an "
2593 "interrupt may have clobbered");
2594
2596 MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2598 false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2599 StackOffset::getFixed(CombineAfterCSRBump ? PrologueSaveSize : 0));
2600 }
2601}
2602
2605 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
2606}
2607
2608/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
2609/// debug info. It's the same as what we use for resolving the code-gen
2610/// references for now. FIXME: This can go wrong when references are
2611/// SP-relative and simple call frames aren't used.
2614 Register &FrameReg) const {
2616 MF, FI, FrameReg,
2617 /*PreferFP=*/
2618 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
2619 MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
2620 /*ForSimm=*/false);
2621}
2622
2625 int FI) const {
2626 // This function serves to provide a comparable offset from a single reference
2627 // point (the value of SP at function entry) that can be used for analysis,
2628 // e.g. the stack-frame-layout analysis pass. It is not guaranteed to be
2629 // correct for all objects in the presence of VLA-area objects or dynamic
2630 // stack re-alignment.
2631
2632 const auto &MFI = MF.getFrameInfo();
2633
2634 int64_t ObjectOffset = MFI.getObjectOffset(FI);
2635 StackOffset SVEStackSize = getSVEStackSize(MF);
2636
2637 // For VLA-area objects, just emit an offset at the end of the stack frame.
2638 // Whilst not quite correct, these objects do live at the end of the frame and
2639 // so it is more useful for analysis if the offset reflects this.
2640 if (MFI.isVariableSizedObjectIndex(FI)) {
2641 return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
2642 }
2643
2644 // This is correct in the absence of any SVE stack objects.
2645 if (!SVEStackSize)
2646 return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
2647
2648 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2649 if (MFI.getStackID(FI) == TargetStackID::ScalableVector) {
2650 return StackOffset::get(-((int64_t)AFI->getCalleeSavedStackSize()),
2651 ObjectOffset);
2652 }
2653
2654 bool IsFixed = MFI.isFixedObjectIndex(FI);
2655 bool IsCSR =
2656 !IsFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
2657
2658 StackOffset ScalableOffset = {};
2659 if (!IsFixed && !IsCSR)
2660 ScalableOffset = -SVEStackSize;
2661
2662 return StackOffset::getFixed(ObjectOffset) + ScalableOffset;
2663}
2664
2667 int FI) const {
2669}
2670
2672 int64_t ObjectOffset) {
2673 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2674 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2675 const Function &F = MF.getFunction();
2676 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
2677 unsigned FixedObject =
2678 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
2679 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
2680 int64_t FPAdjust =
2681 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
2682 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
2683}
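// Illustrative note (hypothetical numbers): with FixedObject = 16,
// CalleeSaveSize = 32 and a frame-record offset of 16, FPAdjust = 32 - 16 = 16,
// so an object at ObjectOffset = -8 resolves to -8 + 16 + 16 = 24 bytes
// above FP.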
2684
2686 int64_t ObjectOffset) {
2687 const auto &MFI = MF.getFrameInfo();
2688 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
2689}
2690
2691// TODO: This function currently does not work for scalable vectors.
2693 int FI) const {
2694 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2696 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
2697 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
2698 ? getFPOffset(MF, ObjectOffset).getFixed()
2699 : getStackOffset(MF, ObjectOffset).getFixed();
2700}
2701
2703 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
2704 bool ForSimm) const {
2705 const auto &MFI = MF.getFrameInfo();
2706 int64_t ObjectOffset = MFI.getObjectOffset(FI);
2707 bool isFixed = MFI.isFixedObjectIndex(FI);
2708 bool isSVE = MFI.getStackID(FI) == TargetStackID::ScalableVector;
2709 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, isSVE, FrameReg,
2710 PreferFP, ForSimm);
2711}
2712
2714 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE,
2715 Register &FrameReg, bool PreferFP, bool ForSimm) const {
2716 const auto &MFI = MF.getFrameInfo();
2717 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2719 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2720 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2721
2722 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
2723 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
2724 bool isCSR =
2725 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
2726
2727 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2728
2729 // Use frame pointer to reference fixed objects. Use it for locals if
2730 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
2731 // reliable as a base). Make sure useFPForScavengingIndex() does the
2732 // right thing for the emergency spill slot.
2733 bool UseFP = false;
2734 if (AFI->hasStackFrame() && !isSVE) {
2735 // We shouldn't prefer using the FP to access fixed-sized stack objects when
2736 // there are scalable (SVE) objects in between the FP and the fixed-sized
2737 // objects.
2738 PreferFP &= !SVEStackSize;
2739
2740 // Note: Keeping the following as multiple 'if' statements rather than
2741 // merging to a single expression for readability.
2742 //
2743 // Argument access should always use the FP.
2744 if (isFixed) {
2745 UseFP = hasFP(MF);
2746 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
2747 // References to the CSR area must use FP if we're re-aligning the stack
2748 // since the dynamically-sized alignment padding is between the SP/BP and
2749 // the CSR area.
2750 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
2751 UseFP = true;
2752 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
2753 // If the FPOffset is negative and we're producing a signed immediate, we
2754 // have to keep in mind that the available offset range for negative
2755 // offsets is smaller than for positive ones. If an offset is available
2756 // via the FP and the SP, use whichever is closest.
2757 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
2758 PreferFP |= Offset > -FPOffset && !SVEStackSize;
2759
2760 if (MFI.hasVarSizedObjects()) {
2761 // If we have variable sized objects, we can use either FP or BP, as the
2762 // SP offset is unknown. We can use the base pointer if we have one and
2763 // FP is not preferred. If not, we're stuck with using FP.
2764 bool CanUseBP = RegInfo->hasBasePointer(MF);
2765 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
2766 UseFP = PreferFP;
2767 else if (!CanUseBP) // Can't use BP. Forced to use FP.
2768 UseFP = true;
2769 // else we can use BP and FP, but the offset from FP won't fit.
2770 // That will make us scavenge registers which we can probably avoid by
2771 // using BP. If it won't fit for BP either, we'll scavenge anyway.
2772 } else if (FPOffset >= 0) {
2773 // Use SP or FP, whichever gives us the best chance of the offset
2774 // being in range for direct access. If the FPOffset is positive,
2775 // that'll always be best, as the SP will be even further away.
2776 UseFP = true;
2777 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
2778 // Funclets access the locals contained in the parent's stack frame
2779 // via the frame pointer, so we have to use the FP in the parent
2780 // function.
2781 (void) Subtarget;
2782 assert(Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
2783 MF.getFunction().isVarArg()) &&
2784 "Funclets should only be present on Win64");
2785 UseFP = true;
2786 } else {
2787 // We have the choice between FP and (SP or BP).
2788 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
2789 UseFP = true;
2790 }
2791 }
2792 }
2793
2794 assert(
2795 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
2796 "In the presence of dynamic stack pointer realignment, "
2797 "non-argument/CSR objects cannot be accessed through the frame pointer");
2798
2799 if (isSVE) {
2800 StackOffset FPOffset =
2802 StackOffset SPOffset =
2803 SVEStackSize +
2804 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
2805 ObjectOffset);
2806 // Always use the FP for SVE spills if available and beneficial.
2807 if (hasFP(MF) && (SPOffset.getFixed() ||
2808 FPOffset.getScalable() < SPOffset.getScalable() ||
2809 RegInfo->hasStackRealignment(MF))) {
2810 FrameReg = RegInfo->getFrameRegister(MF);
2811 return FPOffset;
2812 }
2813
2814 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
2815 : (unsigned)AArch64::SP;
2816 return SPOffset;
2817 }
2818
2819 StackOffset ScalableOffset = {};
2820 if (UseFP && !(isFixed || isCSR))
2821 ScalableOffset = -SVEStackSize;
2822 if (!UseFP && (isFixed || isCSR))
2823 ScalableOffset = SVEStackSize;
2824
2825 if (UseFP) {
2826 FrameReg = RegInfo->getFrameRegister(MF);
2827 return StackOffset::getFixed(FPOffset) + ScalableOffset;
2828 }
2829
2830 // Use the base pointer if we have one.
2831 if (RegInfo->hasBasePointer(MF))
2832 FrameReg = RegInfo->getBaseRegister();
2833 else {
2834 assert(!MFI.hasVarSizedObjects() &&
2835 "Can't use SP when we have var sized objects.");
2836 FrameReg = AArch64::SP;
2837 // If we're using the red zone for this function, the SP won't actually
2838 // be adjusted, so the offsets will be negative. They're also all
2839 // within range of the signed 9-bit immediate instructions.
2840 if (canUseRedZone(MF))
2841 Offset -= AFI->getLocalStackSize();
2842 }
2843
2844 return StackOffset::getFixed(Offset) + ScalableOffset;
2845}
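// Illustrative note (hypothetical numbers): in a red-zone leaf function with a
// 32-byte local area the prologue never adjusts SP, so a local whose offset
// would otherwise be +8 from the adjusted SP is addressed at 8 - 32 = -24,
// i.e. below the unadjusted SP, inside the red zone.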
2846
2847static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
2848 // Do not set a kill flag on values that are also marked as live-in. This
2849 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
2850 // callee saved registers.
2851 // Omitting the kill flags is conservatively correct even if the live-in
2852 // is not used after all.
2853 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
2854 return getKillRegState(!IsLiveIn);
2855}
2856
2858 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2861 return Subtarget.isTargetMachO() &&
2862 !(Subtarget.getTargetLowering()->supportSwiftError() &&
2863 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
2865 !requiresSaveVG(MF) && AFI->getSVECalleeSavedStackSize() == 0;
2866}
2867
2868static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2,
2869 bool NeedsWinCFI, bool IsFirst,
2870 const TargetRegisterInfo *TRI) {
2871 // If we are generating register pairs for a Windows function that requires
2872 // EH support, then pair consecutive registers only. There are no unwind
2873 // opcodes for saves/restores of non-consecutive register pairs.
2874 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
2875 // save_lrpair.
2876 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
2877
2878 if (Reg2 == AArch64::FP)
2879 return true;
2880 if (!NeedsWinCFI)
2881 return false;
2882 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
2883 return false;
2884 // If pairing a GPR with LR, the pair can be described by the save_lrpair
2885 // opcode. If this is the first register pair, it would end up with a
2886 // predecrement, but there's no save_lrpair_x opcode, so we can only do this
2887 // if LR is paired with something other than the first register.
2888 // The save_lrpair opcode requires the first register to be an odd one.
2889 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
2890 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
2891 return false;
2892 return true;
2893}
2894
2895/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
2896/// WindowsCFI requires that only consecutive registers can be paired.
2897/// LR and FP need to be allocated together when the frame needs to save
2898/// the frame-record. This means any other register pairing with LR is invalid.
2899static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2,
2900 bool UsesWinAAPCS, bool NeedsWinCFI,
2901 bool NeedsFrameRecord, bool IsFirst,
2902 const TargetRegisterInfo *TRI) {
2903 if (UsesWinAAPCS)
2904 return invalidateWindowsRegisterPairing(Reg1, Reg2, NeedsWinCFI, IsFirst,
2905 TRI);
2906
2907 // If we need to store the frame record, don't pair any register
2908 // with LR other than FP.
2909 if (NeedsFrameRecord)
2910 return Reg2 == AArch64::LR;
2911
2912 return false;
2913}
2914
2915namespace {
2916
2917struct RegPairInfo {
2918 unsigned Reg1 = AArch64::NoRegister;
2919 unsigned Reg2 = AArch64::NoRegister;
2920 int FrameIdx;
2921 int Offset;
2922 enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
2923
2924 RegPairInfo() = default;
2925
2926 bool isPaired() const { return Reg2 != AArch64::NoRegister; }
2927
2928 unsigned getScale() const {
2929 switch (Type) {
2930 case PPR:
2931 return 2;
2932 case GPR:
2933 case FPR64:
2934 case VG:
2935 return 8;
2936 case ZPR:
2937 case FPR128:
2938 return 16;
2939 }
2940 llvm_unreachable("Unsupported type");
2941 }
2942
2943 bool isScalable() const { return Type == PPR || Type == ZPR; }
2944};
2945
2946} // end anonymous namespace
2947
2948unsigned findFreePredicateReg(BitVector &SavedRegs) {
2949 for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
2950 if (SavedRegs.test(PReg)) {
2951 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
2952 return PNReg;
2953 }
2954 }
2955 return AArch64::NoRegister;
2956}
2957
2961 bool NeedsFrameRecord) {
2962
2963 if (CSI.empty())
2964 return;
2965
2966 bool IsWindows = isTargetWindows(MF);
2967 bool NeedsWinCFI = needsWinCFI(MF);
2969 MachineFrameInfo &MFI = MF.getFrameInfo();
2971 unsigned Count = CSI.size();
2972 (void)CC;
2973 // MachO's compact unwind format relies on all registers being stored in
2974 // pairs.
2977 CC == CallingConv::Win64 || (Count & 1) == 0) &&
2978 "Odd number of callee-saved regs to spill!");
2979 int ByteOffset = AFI->getCalleeSavedStackSize();
2980 int StackFillDir = -1;
2981 int RegInc = 1;
2982 unsigned FirstReg = 0;
2983 if (NeedsWinCFI) {
2984 // For WinCFI, fill the stack from the bottom up.
2985 ByteOffset = 0;
2986 StackFillDir = 1;
2987 // As the CSI array is reversed to match PrologEpilogInserter, iterate
2988 // backwards, to pair up registers starting from lower numbered registers.
2989 RegInc = -1;
2990 FirstReg = Count - 1;
2991 }
2992 int ScalableByteOffset = AFI->getSVECalleeSavedStackSize();
2993 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
2994 Register LastReg = 0;
2995
2996 // When iterating backwards, the loop condition relies on unsigned wraparound.
2997 for (unsigned i = FirstReg; i < Count; i += RegInc) {
2998 RegPairInfo RPI;
2999 RPI.Reg1 = CSI[i].getReg();
3000
3001 if (AArch64::GPR64RegClass.contains(RPI.Reg1))
3002 RPI.Type = RegPairInfo::GPR;
3003 else if (AArch64::FPR64RegClass.contains(RPI.Reg1))
3004 RPI.Type = RegPairInfo::FPR64;
3005 else if (AArch64::FPR128RegClass.contains(RPI.Reg1))
3006 RPI.Type = RegPairInfo::FPR128;
3007 else if (AArch64::ZPRRegClass.contains(RPI.Reg1))
3008 RPI.Type = RegPairInfo::ZPR;
3009 else if (AArch64::PPRRegClass.contains(RPI.Reg1))
3010 RPI.Type = RegPairInfo::PPR;
3011 else if (RPI.Reg1 == AArch64::VG)
3012 RPI.Type = RegPairInfo::VG;
3013 else
3014 llvm_unreachable("Unsupported register class.");
3015
3016 // Add the stack hazard size as we transition from GPR->FPR CSRs.
3017 if (AFI->hasStackHazardSlotIndex() &&
3018 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
3020 ByteOffset += StackFillDir * StackHazardSize;
3021 LastReg = RPI.Reg1;
3022
3023 // Add the next reg to the pair if it is in the same register class.
3024 if (unsigned(i + RegInc) < Count && !AFI->hasStackHazardSlotIndex()) {
3025 Register NextReg = CSI[i + RegInc].getReg();
3026 bool IsFirst = i == FirstReg;
3027 switch (RPI.Type) {
3028 case RegPairInfo::GPR:
3029 if (AArch64::GPR64RegClass.contains(NextReg) &&
3030 !invalidateRegisterPairing(RPI.Reg1, NextReg, IsWindows,
3031 NeedsWinCFI, NeedsFrameRecord, IsFirst,
3032 TRI))
3033 RPI.Reg2 = NextReg;
3034 break;
3035 case RegPairInfo::FPR64:
3036 if (AArch64::FPR64RegClass.contains(NextReg) &&
3037 !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
3038 IsFirst, TRI))
3039 RPI.Reg2 = NextReg;
3040 break;
3041 case RegPairInfo::FPR128:
3042 if (AArch64::FPR128RegClass.contains(NextReg))
3043 RPI.Reg2 = NextReg;
3044 break;
3045 case RegPairInfo::PPR:
3046 break;
3047 case RegPairInfo::ZPR:
3048 if (AFI->getPredicateRegForFillSpill() != 0)
3049 if (((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1))
3050 RPI.Reg2 = NextReg;
3051 break;
3052 case RegPairInfo::VG:
3053 break;
3054 }
3055 }
3056
3057 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
3058 // list to come in sorted by frame index so that we can issue the store
3059 // pair instructions directly. Assert if we see anything otherwise.
3060 //
3061 // The order of the registers in the list is controlled by
3062 // getCalleeSavedRegs(), so they will always be in-order, as well.
3063 assert((!RPI.isPaired() ||
3064 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
3065 "Out of order callee saved regs!");
3066
3067 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
3068 RPI.Reg1 == AArch64::LR) &&
3069 "FrameRecord must be allocated together with LR");
3070
3071 // Windows AAPCS has FP and LR reversed.
3072 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
3073 RPI.Reg2 == AArch64::LR) &&
3074 "FrameRecord must be allocated together with LR");
3075
3076 // MachO's compact unwind format relies on all registers being stored in
3077 // adjacent register pairs.
3081 (RPI.isPaired() &&
3082 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
3083 RPI.Reg1 + 1 == RPI.Reg2))) &&
3084 "Callee-save registers not saved as adjacent register pair!");
3085
3086 RPI.FrameIdx = CSI[i].getFrameIdx();
3087 if (NeedsWinCFI &&
3088 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
3089 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
3090 int Scale = RPI.getScale();
3091
3092 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
3093 assert(OffsetPre % Scale == 0);
3094
3095 if (RPI.isScalable())
3096 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
3097 else
3098 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
3099
3100 // Swift's async context is directly before FP, so allocate an extra
3101 // 8 bytes for it.
3102 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
3103 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
3104 (IsWindows && RPI.Reg2 == AArch64::LR)))
3105 ByteOffset += StackFillDir * 8;
3106
3107 // Round up size of non-pair to pair size if we need to pad the
3108 // callee-save area to ensure 16-byte alignment.
3109 if (NeedGapToAlignStack && !NeedsWinCFI && !RPI.isScalable() &&
3110 RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
3111 ByteOffset % 16 != 0) {
3112 ByteOffset += 8 * StackFillDir;
3113 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
3114 // A stack frame with a gap looks like this, bottom up:
3115 // d9, d8. x21, gap, x20, x19.
3116 // Set extra alignment on the x21 object to create the gap above it.
3117 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
3118 NeedGapToAlignStack = false;
3119 }
3120
3121 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
3122 assert(OffsetPost % Scale == 0);
3123 // If filling top down (default), we want the offset after incrementing it.
3124 // If filling bottom up (WinCFI) we need the original offset.
3125 int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
3126
3127 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
3128 // Swift context can directly precede FP.
3129 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
3130 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
3131 (IsWindows && RPI.Reg2 == AArch64::LR)))
3132 Offset += 8;
3133 RPI.Offset = Offset / Scale;
3134
3135 assert((!RPI.isPaired() ||
3136 (!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
3137 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
3138 "Offset out of bounds for LDP/STP immediate");
3139
3140 // Save the offset to frame record so that the FP register can point to the
3141 // innermost frame record (spilled FP and LR registers).
3142 if (NeedsFrameRecord &&
3143 ((!IsWindows && RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
3144 (IsWindows && RPI.Reg1 == AArch64::FP && RPI.Reg2 == AArch64::LR)))
3146
3147 RegPairs.push_back(RPI);
3148 if (RPI.isPaired())
3149 i += RegInc;
3150 }
3151 if (NeedsWinCFI) {
3152 // If we need an alignment gap in the stack, align the topmost stack
3153 // object. A stack frame with a gap looks like this, bottom up:
3154 // x19, d8. d9, gap.
3155 // Set extra alignment on the topmost stack object (the first element in
3156 // CSI, which goes top down), to create the gap above it.
3157 if (AFI->hasCalleeSaveStackFreeSpace())
3158 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
3159 // We iterated bottom up over the registers; flip RegPairs back to top
3160 // down order.
3161 std::reverse(RegPairs.begin(), RegPairs.end());
3162 }
3163}
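// Illustrative note: with a frame record, LR and FP end up as a single pair
// (LR as Reg1 and FP as Reg2, or the reverse on Windows, per the asserts
// above), other GPRs and FPR64s pair up two at a time, and RPI.Offset is
// expressed in multiples of the pair's scale (8 bytes for GPR/FPR64/VG,
// 16 for FPR128/ZPR, 2 for PPR).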
3164
3168 MachineFunction &MF = *MBB.getParent();
3171 bool NeedsWinCFI = needsWinCFI(MF);
3172 DebugLoc DL;
3174
3175 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
3176
3178 // Refresh the reserved regs in case there are any potential changes since the
3179 // last freeze.
3180 MRI.freezeReservedRegs();
3181
3182 if (homogeneousPrologEpilog(MF)) {
3183 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
3185
3186 for (auto &RPI : RegPairs) {
3187 MIB.addReg(RPI.Reg1);
3188 MIB.addReg(RPI.Reg2);
3189
3190 // Update register live in.
3191 if (!MRI.isReserved(RPI.Reg1))
3192 MBB.addLiveIn(RPI.Reg1);
3193 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
3194 MBB.addLiveIn(RPI.Reg2);
3195 }
3196 return true;
3197 }
3198 bool PTrueCreated = false;
3199 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
3200 unsigned Reg1 = RPI.Reg1;
3201 unsigned Reg2 = RPI.Reg2;
3202 unsigned StrOpc;
3203
3204 // Issue sequence of spills for cs regs. The first spill may be converted
3205 // to a pre-decrement store later by emitPrologue if the callee-save stack
3206 // area allocation can't be combined with the local stack area allocation.
3207 // For example:
3208 // stp x22, x21, [sp, #0] // addImm(+0)
3209 // stp x20, x19, [sp, #16] // addImm(+2)
3210 // stp fp, lr, [sp, #32] // addImm(+4)
3211 // Rationale: This sequence saves uop updates compared to a sequence of
3212 // pre-increment spills like stp xi,xj,[sp,#-16]!
3213 // Note: Similar rationale and sequence for restores in epilog.
3214 unsigned Size;
3215 Align Alignment;
3216 switch (RPI.Type) {
3217 case RegPairInfo::GPR:
3218 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
3219 Size = 8;
3220 Alignment = Align(8);
3221 break;
3222 case RegPairInfo::FPR64:
3223 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
3224 Size = 8;
3225 Alignment = Align(8);
3226 break;
3227 case RegPairInfo::FPR128:
3228 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
3229 Size = 16;
3230 Alignment = Align(16);
3231 break;
3232 case RegPairInfo::ZPR:
3233 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
3234 Size = 16;
3235 Alignment = Align(16);
3236 break;
3237 case RegPairInfo::PPR:
3238 StrOpc = AArch64::STR_PXI;
3239 Size = 2;
3240 Alignment = Align(2);
3241 break;
3242 case RegPairInfo::VG:
3243 StrOpc = AArch64::STRXui;
3244 Size = 8;
3245 Alignment = Align(8);
3246 break;
3247 }
3248
3249 unsigned X0Scratch = AArch64::NoRegister;
3250 if (Reg1 == AArch64::VG) {
3251 // Find an available register to store the value of VG to.
3253 assert(Reg1 != AArch64::NoRegister);
3254 SMEAttrs Attrs(MF.getFunction());
3255
3256 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface() &&
3257 AFI->getStreamingVGIdx() == std::numeric_limits<int>::max()) {
3258 // For locally-streaming functions, we need to store both the streaming
3259 // & non-streaming VG. Spill the streaming value first.
3260 BuildMI(MBB, MI, DL, TII.get(AArch64::RDSVLI_XI), Reg1)
3261 .addImm(1)
3263 BuildMI(MBB, MI, DL, TII.get(AArch64::UBFMXri), Reg1)
3264 .addReg(Reg1)
3265 .addImm(3)
3266 .addImm(63)
3268
3269 AFI->setStreamingVGIdx(RPI.FrameIdx);
3270 } else if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
3271 BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
3272 .addImm(31)
3273 .addImm(1)
3275 AFI->setVGIdx(RPI.FrameIdx);
3276 } else {
3278 if (llvm::any_of(
3279 MBB.liveins(),
3280 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
3281 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
3282 AArch64::X0, LiveIn.PhysReg);
3283 }))
3284 X0Scratch = Reg1;
3285
3286 if (X0Scratch != AArch64::NoRegister)
3287 BuildMI(MBB, MI, DL, TII.get(AArch64::ORRXrr), Reg1)
3288 .addReg(AArch64::XZR)
3289 .addReg(AArch64::X0, RegState::Undef)
3290 .addReg(AArch64::X0, RegState::Implicit)
3292
3293 const uint32_t *RegMask = TRI->getCallPreservedMask(
3294 MF,
3296 BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
3297 .addExternalSymbol("__arm_get_current_vg")
3298 .addRegMask(RegMask)
3299 .addReg(AArch64::X0, RegState::ImplicitDefine)
3301 Reg1 = AArch64::X0;
3302 AFI->setVGIdx(RPI.FrameIdx);
3303 }
3304 }
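    // Illustrative note: with SVE available the vector granule is read
    // directly (roughly `cntd xN`); otherwise it comes from a call to
    // __arm_get_current_vg, with any live x0 preserved via the scratch
    // register handling above. Locally-streaming functions additionally
    // spill the streaming VG computed from RDSVL.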
3305
3306 LLVM_DEBUG(dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
3307 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
3308 dbgs() << ") -> fi#(" << RPI.FrameIdx;
3309 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
3310 dbgs() << ")\n");
3311
3312 assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
3313 "Windows unwdinding requires a consecutive (FP,LR) pair");
3314 // Windows unwind codes require consecutive registers if registers are
3315 // paired. Make the switch here, so that the code below will save (x,x+1)
3316 // and not (x+1,x).
3317 unsigned FrameIdxReg1 = RPI.FrameIdx;
3318 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
3319 if (NeedsWinCFI && RPI.isPaired()) {
3320 std::swap(Reg1, Reg2);
3321 std::swap(FrameIdxReg1, FrameIdxReg2);
3322 }
3323
3324 if (RPI.isPaired() && RPI.isScalable()) {
3325 [[maybe_unused]] const AArch64Subtarget &Subtarget =
3328 unsigned PnReg = AFI->getPredicateRegForFillSpill();
3329 assert(((Subtarget.hasSVE2p1() || Subtarget.hasSME2()) && PnReg != 0) &&
3330 "Expects SVE2.1 or SME2 target and a predicate register");
3331#ifdef EXPENSIVE_CHECKS
3332 auto IsPPR = [](const RegPairInfo &c) {
3333 return c.Type == RegPairInfo::PPR;
3334 };
3335 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
3336 auto IsZPR = [](const RegPairInfo &c) {
3337 return c.Type == RegPairInfo::ZPR;
3338 };
3339 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
3340 assert(!(PPRBegin < ZPRBegin) &&
3341 "Expected callee save predicate to be handled first");
3342#endif
3343 if (!PTrueCreated) {
3344 PTrueCreated = true;
3345 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
3347 }
3348 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
3349 if (!MRI.isReserved(Reg1))
3350 MBB.addLiveIn(Reg1);
3351 if (!MRI.isReserved(Reg2))
3352 MBB.addLiveIn(Reg2);
3353 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
3355 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3356 MachineMemOperand::MOStore, Size, Alignment));
3357 MIB.addReg(PnReg);
3358 MIB.addReg(AArch64::SP)
3359 .addImm(RPI.Offset) // [sp, #offset*scale],
3360 // where factor*scale is implicit
3363 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3364 MachineMemOperand::MOStore, Size, Alignment));
3365 if (NeedsWinCFI)
3367 } else { // The case where no ZReg pair is present
3368 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
3369 if (!MRI.isReserved(Reg1))
3370 MBB.addLiveIn(Reg1);
3371 if (RPI.isPaired()) {
3372 if (!MRI.isReserved(Reg2))
3373 MBB.addLiveIn(Reg2);
3374 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
3376 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3377 MachineMemOperand::MOStore, Size, Alignment));
3378 }
3379 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
3380 .addReg(AArch64::SP)
3381 .addImm(RPI.Offset) // [sp, #offset*scale],
3382 // where factor*scale is implicit
3385 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3386 MachineMemOperand::MOStore, Size, Alignment));
3387 if (NeedsWinCFI)
3389 }
3390 // Update the StackIDs of the SVE stack slots.
3391 MachineFrameInfo &MFI = MF.getFrameInfo();
3392 if (RPI.Type == RegPairInfo::ZPR || RPI.Type == RegPairInfo::PPR) {
3393 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
3394 if (RPI.isPaired())
3395 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
3396 }
3397
3398 if (X0Scratch != AArch64::NoRegister)
3399 BuildMI(MBB, MI, DL, TII.get(AArch64::ORRXrr), AArch64::X0)
3400 .addReg(AArch64::XZR)
3401 .addReg(X0Scratch, RegState::Undef)
3402 .addReg(X0Scratch, RegState::Implicit)
3404 }
3405 return true;
3406}
3407
3411 MachineFunction &MF = *MBB.getParent();
3413 DebugLoc DL;
3415 bool NeedsWinCFI = needsWinCFI(MF);
3416
3417 if (MBBI != MBB.end())
3418 DL = MBBI->getDebugLoc();
3419
3420 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
3421 if (homogeneousPrologEpilog(MF, &MBB)) {
3422 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
3424 for (auto &RPI : RegPairs) {
3425 MIB.addReg(RPI.Reg1, RegState::Define);
3426 MIB.addReg(RPI.Reg2, RegState::Define);
3427 }
3428 return true;
3429 }
3430
3431 // For performance reasons, restore the SVE registers in increasing order.
3432 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
3433 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
3434 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
3435 std::reverse(PPRBegin, PPREnd);
3436 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
3437 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
3438 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
3439 std::reverse(ZPRBegin, ZPREnd);
3440
3441 bool PTrueCreated = false;
3442 for (const RegPairInfo &RPI : RegPairs) {
3443 unsigned Reg1 = RPI.Reg1;
3444 unsigned Reg2 = RPI.Reg2;
3445
3446 // Issue sequence of restores for cs regs. The last restore may be converted
3447 // to a post-increment load later by emitEpilogue if the callee-save stack
3448 // area allocation can't be combined with the local stack area allocation.
3449 // For example:
3450 // ldp fp, lr, [sp, #32] // addImm(+4)
3451 // ldp x20, x19, [sp, #16] // addImm(+2)
3452 // ldp x22, x21, [sp, #0] // addImm(+0)
3453 // Note: see comment in spillCalleeSavedRegisters()
3454 unsigned LdrOpc;
3455 unsigned Size;
3456 Align Alignment;
3457 switch (RPI.Type) {
3458 case RegPairInfo::GPR:
3459 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
3460 Size = 8;
3461 Alignment = Align(8);
3462 break;
3463 case RegPairInfo::FPR64:
3464 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
3465 Size = 8;
3466 Alignment = Align(8);
3467 break;
3468 case RegPairInfo::FPR128:
3469 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
3470 Size = 16;
3471 Alignment = Align(16);
3472 break;
3473 case RegPairInfo::ZPR:
3474 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
3475 Size = 16;
3476 Alignment = Align(16);
3477 break;
3478 case RegPairInfo::PPR:
3479 LdrOpc = AArch64::LDR_PXI;
3480 Size = 2;
3481 Alignment = Align(2);
3482 break;
3483 case RegPairInfo::VG:
3484 continue;
3485 }
3486 LLVM_DEBUG(dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
3487 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
3488 dbgs() << ") -> fi#(" << RPI.FrameIdx;
3489 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
3490 dbgs() << ")\n");
3491
3492 // Windows unwind codes require consecutive registers if registers are
3493 // paired. Make the switch here, so that the code below will restore
3494 // (x,x+1) and not (x+1,x).
3495 unsigned FrameIdxReg1 = RPI.FrameIdx;
3496 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
3497 if (NeedsWinCFI && RPI.isPaired()) {
3498 std::swap(Reg1, Reg2);
3499 std::swap(FrameIdxReg1, FrameIdxReg2);
3500 }
3501
3503 if (RPI.isPaired() && RPI.isScalable()) {
3504 [[maybe_unused]] const AArch64Subtarget &Subtarget =
3506 unsigned PnReg = AFI->getPredicateRegForFillSpill();
3507 assert(((Subtarget.hasSVE2p1() || Subtarget.hasSME2()) && PnReg != 0) &&
3508 "Expects SVE2.1 or SME2 target and a predicate register");
3509#ifdef EXPENSIVE_CHECKS
3510 assert(!(PPRBegin < ZPRBegin) &&
3511 "Expected callee save predicate to be handled first");
3512#endif
3513 if (!PTrueCreated) {
3514 PTrueCreated = true;
3515 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
3517 }
3518 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
3519 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
3520 getDefRegState(true));
3522 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3523 MachineMemOperand::MOLoad, Size, Alignment));
3524 MIB.addReg(PnReg);
3525 MIB.addReg(AArch64::SP)
3526 .addImm(RPI.Offset) // [sp, #offset*scale]
3527 // where factor*scale is implicit
3530 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3531 MachineMemOperand::MOLoad, Size, Alignment));
3532 if (NeedsWinCFI)
3534 } else {
3535 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
3536 if (RPI.isPaired()) {
3537 MIB.addReg(Reg2, getDefRegState(true));
3539 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3540 MachineMemOperand::MOLoad, Size, Alignment));
3541 }
3542 MIB.addReg(Reg1, getDefRegState(true));
3543 MIB.addReg(AArch64::SP)
3544 .addImm(RPI.Offset) // [sp, #offset*scale]
3545 // where factor*scale is implicit
3548 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3549 MachineMemOperand::MOLoad, Size, Alignment));
3550 if (NeedsWinCFI)
3552 }
3553 }
3554 return true;
3555}
3556
3557// Return the FrameID for an MMO.
3558static std::optional<int> getMMOFrameID(MachineMemOperand *MMO,
3559 const MachineFrameInfo &MFI) {
3560 auto *PSV =
3561 dyn_cast_or_null<FixedStackPseudoSourceValue>(MMO->getPseudoValue());
3562 if (PSV)
3563 return std::optional<int>(PSV->getFrameIndex());
3564
3565 if (MMO->getValue()) {
3566 if (auto *Al = dyn_cast<AllocaInst>(getUnderlyingObject(MMO->getValue()))) {
3567 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd();
3568 FI++)
3569 if (MFI.getObjectAllocation(FI) == Al)
3570 return FI;
3571 }
3572 }
3573
3574 return std::nullopt;
3575}
3576
3577// Return the FrameID for a Load/Store instruction by looking at the first MMO.
3578static std::optional<int> getLdStFrameID(const MachineInstr &MI,
3579 const MachineFrameInfo &MFI) {
3580 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
3581 return std::nullopt;
3582
3583 return getMMOFrameID(*MI.memoperands_begin(), MFI);
3584}
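// For example (illustrative), a spill or reload whose single memory operand is
// a FixedStack pseudo-source value for fi#2 yields 2 here; instructions
// without memory operands, or that neither load nor store, yield std::nullopt.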
3585
3586// Check if a Hazard slot is needed for the current function, and if so create
3587// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
3588// which can be used to determine if any hazard padding is needed.
3589void AArch64FrameLowering::determineStackHazardSlot(
3590 MachineFunction &MF, BitVector &SavedRegs) const {
3591 if (StackHazardSize == 0 || StackHazardSize % 16 != 0 ||
3593 return;
3594
3595 // Stack hazards are only needed in streaming functions.
3597 if (!StackHazardInNonStreaming && Attrs.hasNonStreamingInterfaceAndBody())
3598 return;
3599
3600 MachineFrameInfo &MFI = MF.getFrameInfo();
3601
3602 // Add a hazard slot if there are any CSR FPR registers, or any FP-only
3603 // stack objects.
3604 bool HasFPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
3605 return AArch64::FPR64RegClass.contains(Reg) ||
3606 AArch64::FPR128RegClass.contains(Reg) ||
3607 AArch64::ZPRRegClass.contains(Reg) ||
3608 AArch64::PPRRegClass.contains(Reg);
3609 });
3610 bool HasFPRStackObjects = false;
3611 if (!HasFPRCSRs) {
3612 std::vector<unsigned> FrameObjects(MFI.getObjectIndexEnd());
3613 for (auto &MBB : MF) {
3614 for (auto &MI : MBB) {
3615 std::optional<int> FI = getLdStFrameID(MI, MFI);
3616 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
3617 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
3619 FrameObjects[*FI] |= 2;
3620 else
3621 FrameObjects[*FI] |= 1;
3622 }
3623 }
3624 }
3625 HasFPRStackObjects =
3626 any_of(FrameObjects, [](unsigned B) { return (B & 3) == 2; });
3627 }
3628
3629 if (HasFPRCSRs || HasFPRStackObjects) {
3630 int ID = MFI.CreateStackObject(StackHazardSize, Align(16), false);
3631 LLVM_DEBUG(dbgs() << "Created Hazard slot at " << ID << " size "
3632 << StackHazardSize << "\n");
3633 MF.getInfo<AArch64FunctionInfo>()->setStackHazardSlotIndex(ID);
3634 }
3635}
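// Illustrative effect: with a non-zero StackHazardSize (e.g. 1024), a function
// that saves d8 or accesses an FP/SIMD stack slot gets a 1024-byte,
// 16-byte-aligned hazard object created by the code above; later layout code
// uses it to separate GPR-accessed slots from FPR-accessed ones.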
3636
3638 BitVector &SavedRegs,
3639 RegScavenger *RS) const {
3640 // All calls are tail calls in GHC calling conv, and functions have no
3641 // prologue/epilogue.
3643 return;
3644
3646 const AArch64RegisterInfo *RegInfo = static_cast<const AArch64RegisterInfo *>(
3648 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
3650 unsigned UnspilledCSGPR = AArch64::NoRegister;
3651 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
3652
3653 MachineFrameInfo &MFI = MF.getFrameInfo();
3654 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
3655
3656 unsigned BasePointerReg = RegInfo->hasBasePointer(MF)
3657 ? RegInfo->getBaseRegister()
3658 : (unsigned)AArch64::NoRegister;
3659
3660 unsigned ExtraCSSpill = 0;
3661 bool HasUnpairedGPR64 = false;
3662 bool HasPairZReg = false;
3663 // Figure out which callee-saved registers to save/restore.
3664 for (unsigned i = 0; CSRegs[i]; ++i) {
3665 const unsigned Reg = CSRegs[i];
3666
3667 // Add the base pointer register to SavedRegs if it is callee-save.
3668 if (Reg == BasePointerReg)
3669 SavedRegs.set(Reg);
3670
3671 bool RegUsed = SavedRegs.test(Reg);
3672 unsigned PairedReg = AArch64::NoRegister;
3673 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
3674 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
3675 AArch64::FPR128RegClass.contains(Reg)) {
3676 // Compensate for odd numbers of GP CSRs.
3677 // For now, all the known cases of odd number of CSRs are of GPRs.
3678 if (HasUnpairedGPR64)
3679 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
3680 else
3681 PairedReg = CSRegs[i ^ 1];
3682 }
3683
3684 // If the function requires saving all the GP registers (SavedRegs),
3685 // and there is an odd number of GP CSRs at the same time (CSRegs),
3686 // PairedReg could be in a different register class from Reg, which would
3687 // lead to an FPR (usually D8) accidentally being marked saved.
3688 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
3689 PairedReg = AArch64::NoRegister;
3690 HasUnpairedGPR64 = true;
3691 }
3692 assert(PairedReg == AArch64::NoRegister ||
3693 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
3694 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
3695 AArch64::FPR128RegClass.contains(Reg, PairedReg));
3696
3697 if (!RegUsed) {
3698 if (AArch64::GPR64RegClass.contains(Reg) &&
3699 !RegInfo->isReservedReg(MF, Reg)) {
3700 UnspilledCSGPR = Reg;
3701 UnspilledCSGPRPaired = PairedReg;
3702 }
3703 continue;
3704 }
3705
3706 // MachO's compact unwind format relies on all registers being stored in
3707 // pairs.
3708 // FIXME: the usual format is actually better if unwinding isn't needed.
3709 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
3710 !SavedRegs.test(PairedReg)) {
3711 SavedRegs.set(PairedReg);
3712 if (AArch64::GPR64RegClass.contains(PairedReg) &&
3713 !RegInfo->isReservedReg(MF, PairedReg))
3714 ExtraCSSpill = PairedReg;
3715 }
3716 // Check if there is a pair of ZRegs, so a predicate register can be selected for spill/fill.
3717 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
3718 SavedRegs.test(CSRegs[i ^ 1]));
3719 }
3720
3721 if (HasPairZReg && (Subtarget.hasSVE2p1() || Subtarget.hasSME2())) {
3723 // Find a suitable predicate register for the multi-vector spill/fill
3724 // instructions.
3725 unsigned PnReg = findFreePredicateReg(SavedRegs);
3726 if (PnReg != AArch64::NoRegister)
3727 AFI->setPredicateRegForFillSpill(PnReg);
3728 // If no free callee-save register has been found, assign one.
3729 if (!AFI->getPredicateRegForFillSpill() &&
3730 MF.getFunction().getCallingConv() ==
3732 SavedRegs.set(AArch64::P8);
3733 AFI->setPredicateRegForFillSpill(AArch64::PN8);
3734 }
3735
3736 assert(!RegInfo->isReservedReg(MF, AFI->getPredicateRegForFillSpill()) &&
3737 "Predicate cannot be a reserved register");
3738 }
3739
3741 !Subtarget.isTargetWindows()) {
3742 // For the Windows calling convention on a non-Windows OS, where X18 is
3743 // treated as reserved, back up X18 when entering non-Windows code (marked
3744 // with the Windows calling convention) and restore it when returning,
3745 // regardless of whether the individual function uses it - it might call
3746 // other functions that clobber it.
3747 SavedRegs.set(AArch64::X18);
3748 }
3749
3750 // Calculate the callee-saved stack size.
3751 unsigned CSStackSize = 0;
3752 unsigned SVECSStackSize = 0;
3754 const MachineRegisterInfo &MRI = MF.getRegInfo();
3755 for (unsigned Reg : SavedRegs.set_bits()) {
3756 auto RegSize = TRI->getRegSizeInBits(Reg, MRI) / 8;
3757 if (AArch64::PPRRegClass.contains(Reg) ||
3758 AArch64::ZPRRegClass.contains(Reg))
3759 SVECSStackSize += RegSize;
3760 else
3761 CSStackSize += RegSize;
3762 }
3763
3764 // Increase the callee-saved stack size if the function has streaming mode
3765 // changes, as we will need to spill the value of the VG register.
3766 // For locally streaming functions, we spill both the streaming and
3767 // non-streaming VG value.
3768 const Function &F = MF.getFunction();
3769 SMEAttrs Attrs(F);
3770 if (requiresSaveVG(MF)) {
3771 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface())
3772 CSStackSize += 16;
3773 else
3774 CSStackSize += 8;
3775 }
3776
3777 // Determine if a Hazard slot should be used, and increase the CSStackSize by
3778 // StackHazardSize if so.
3779 determineStackHazardSlot(MF, SavedRegs);
3780 if (AFI->hasStackHazardSlotIndex())
3781 CSStackSize += StackHazardSize;
3782
3783 // Save number of saved regs, so we can easily update CSStackSize later.
3784 unsigned NumSavedRegs = SavedRegs.count();
3785
3786 // The frame record needs to be created by saving the appropriate registers
3787 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
3788 if (hasFP(MF) ||
3789 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
3790 SavedRegs.set(AArch64::FP);
3791 SavedRegs.set(AArch64::LR);
3792 }
3793
3794 LLVM_DEBUG({
3795 dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
3796 for (unsigned Reg : SavedRegs.set_bits())
3797 dbgs() << ' ' << printReg(Reg, RegInfo);
3798 dbgs() << "\n";
3799 });
3800
3801 // If any callee-saved registers are used, the frame cannot be eliminated.
3802 int64_t SVEStackSize =
3803 alignTo(SVECSStackSize + estimateSVEStackObjectOffsets(MFI), 16);
3804 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
3805
3806 // The CSR spill slots have not been allocated yet, so estimateStackSize
3807 // won't include them.
3808 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
3809
3810 // We may address some of the stack above the canonical frame address, either
3811 // for our own arguments or during a call. Include that in calculating whether
3812 // we have complicated addressing concerns.
3813 int64_t CalleeStackUsed = 0;
3814 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
3815 int64_t FixedOff = MFI.getObjectOffset(I);
3816 if (FixedOff > CalleeStackUsed)
3817 CalleeStackUsed = FixedOff;
3818 }
3819
3820 // Conservatively always assume BigStack when there are SVE spills.
3821 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
3822 CalleeStackUsed) > EstimatedStackSizeLimit;
3823 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
3824 AFI->setHasStackFrame(true);
3825
3826 // Estimate if we might need to scavenge a register at some point in order
3827 // to materialize a stack offset. If so, either spill one additional
3828 // callee-saved register or reserve a special spill slot to facilitate
3829 // register scavenging. If we already spilled an extra callee-saved register
3830 // above to keep the number of spills even, we don't need to do anything else
3831 // here.
3832 if (BigStack) {
3833 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
3834 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
3835 << " to get a scratch register.\n");
3836 SavedRegs.set(UnspilledCSGPR);
3837 ExtraCSSpill = UnspilledCSGPR;
3838
3839 // MachO's compact unwind format relies on all registers being stored in
3840 // pairs, so if we need to spill one extra for BigStack, then we need to
3841 // store the pair.
3842 if (producePairRegisters(MF)) {
3843 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
3844 // Failed to make a pair for compact unwind format, revert spilling.
3845 if (produceCompactUnwindFrame(MF)) {
3846 SavedRegs.reset(UnspilledCSGPR);
3847 ExtraCSSpill = AArch64::NoRegister;
3848 }
3849 } else
3850 SavedRegs.set(UnspilledCSGPRPaired);
3851 }
3852 }
3853
3854 // If we didn't find an extra callee-saved register to spill, create
3855 // an emergency spill slot.
3856 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
3858 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
3859 unsigned Size = TRI->getSpillSize(RC);
3860 Align Alignment = TRI->getSpillAlign(RC);
3861 int FI = MFI.CreateStackObject(Size, Alignment, false);
3863 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
3864 << " as the emergency spill slot.\n");
3865 }
3866 }
3867
3868 // Add the size of any additional 64-bit GPR saves.
3869 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
3870
3871 // A Swift asynchronous context extends the frame record with a pointer
3872 // directly before FP.
3873 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
3874 CSStackSize += 8;
3875
3876 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
3877 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
3878 << EstimatedStackSize + AlignedCSStackSize << " bytes.\n");
3879
3881 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
3882 "Should not invalidate callee saved info");
3883
3884 // Round up to register pair alignment to avoid additional SP adjustment
3885 // instructions.
3886 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
3887 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
3888 AFI->setSVECalleeSavedStackSize(alignTo(SVECSStackSize, 16));
3889}
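// Worked example (illustrative): a function that saves x19, x20, fp and lr has
// CSStackSize = 4 * 8 = 32 bytes, which is already 16-byte aligned. With a
// Swift async context (and a frame pointer) it becomes 40 bytes and is rounded
// up to AlignedCSStackSize = 48, leaving free space in the callee-save area
// (CalleeSaveStackHasFreeSpace == true).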
3890
3892 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
3893 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
3894 unsigned &MaxCSFrameIndex) const {
3895 bool NeedsWinCFI = needsWinCFI(MF);
3896 // To match the canonical windows frame layout, reverse the list of
3897 // callee saved registers to get them laid out by PrologEpilogInserter
3898 // in the right order. (PrologEpilogInserter allocates stack objects top
3899 // down. Windows canonical prologs store higher numbered registers at
3900 // the top, thus have the CSI array start from the highest registers.)
3901 if (NeedsWinCFI)
3902 std::reverse(CSI.begin(), CSI.end());
3903
3904 if (CSI.empty())
3905 return true; // Early exit if no callee saved registers are modified!
3906
3907 // Now that we know which registers need to be saved and restored, allocate
3908 // stack slots for them.
3909 MachineFrameInfo &MFI = MF.getFrameInfo();
3910 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3911
3912 bool UsesWinAAPCS = isTargetWindows(MF);
3913 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
3914 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
3915 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3916 if ((unsigned)FrameIdx < MinCSFrameIndex)
3917 MinCSFrameIndex = FrameIdx;
3918 if ((unsigned)FrameIdx > MaxCSFrameIndex)
3919 MaxCSFrameIndex = FrameIdx;
3920 }
3921
3922 // Insert VG into the list of CSRs, immediately before LR if saved.
3923 if (requiresSaveVG(MF)) {
3924 std::vector<CalleeSavedInfo> VGSaves;
3925 SMEAttrs Attrs(MF.getFunction());
3926
3927 auto VGInfo = CalleeSavedInfo(AArch64::VG);
3928 VGInfo.setRestored(false);
3929 VGSaves.push_back(VGInfo);
3930
3931 // Add VG again if the function is locally-streaming, as we will spill two
3932 // values.
3933 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface())
3934 VGSaves.push_back(VGInfo);
3935
3936 bool InsertBeforeLR = false;
3937
3938 for (unsigned I = 0; I < CSI.size(); I++)
3939 if (CSI[I].getReg() == AArch64::LR) {
3940 InsertBeforeLR = true;
3941 CSI.insert(CSI.begin() + I, VGSaves.begin(), VGSaves.end());
3942 break;
3943 }
3944
3945 if (!InsertBeforeLR)
3946 CSI.insert(CSI.end(), VGSaves.begin(), VGSaves.end());
3947 }
3948
3949 Register LastReg = 0;
3950 int HazardSlotIndex = std::numeric_limits<int>::max();
3951 for (auto &CS : CSI) {
3952 Register Reg = CS.getReg();
3953 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
3954
3955 // Create a hazard slot as we switch between GPR and FPR CSRs.
3956 if (AFI->hasStackHazardSlotIndex() &&
3957 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
3959 assert(HazardSlotIndex == std::numeric_limits<int>::max() &&
3960 "Unexpected register order for hazard slot");
3961 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
3962 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
3963 << "\n");
3964 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
3965 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
3966 MinCSFrameIndex = HazardSlotIndex;
3967 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
3968 MaxCSFrameIndex = HazardSlotIndex;
3969 }
3970
3971 unsigned Size = RegInfo->getSpillSize(*RC);
3972 Align Alignment(RegInfo->getSpillAlign(*RC));
3973 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
3974 CS.setFrameIdx(FrameIdx);
3975
3976 if ((unsigned)FrameIdx < MinCSFrameIndex)
3977 MinCSFrameIndex = FrameIdx;
3978 if ((unsigned)FrameIdx > MaxCSFrameIndex)
3979 MaxCSFrameIndex = FrameIdx;
3980
3981 // Grab 8 bytes below FP for the extended asynchronous frame info.
3982 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
3983 Reg == AArch64::FP) {
3984 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
3985 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3986 if ((unsigned)FrameIdx < MinCSFrameIndex)
3987 MinCSFrameIndex = FrameIdx;
3988 if ((unsigned)FrameIdx > MaxCSFrameIndex)
3989 MaxCSFrameIndex = FrameIdx;
3990 }
3991 LastReg = Reg;
3992 }
3993
3994 // Add hazard slot in the case where no FPR CSRs are present.
3995 if (AFI->hasStackHazardSlotIndex() &&
3996 HazardSlotIndex == std::numeric_limits<int>::max()) {
3997 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
3998 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
3999 << "\n");
4000 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
4001 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
4002 MinCSFrameIndex = HazardSlotIndex;
4003 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
4004 MaxCSFrameIndex = HazardSlotIndex;
4005 }
4006
4007 return true;
4008}
4009
4011 const MachineFunction &MF) const {
4013 // If the function has streaming-mode changes, don't scavenge a
4014 // spill slot in the callee-save area, as that might require an
4015 // 'addvl' in the streaming-mode-changing call-sequence when the
4016 // function doesn't use a FP.
4017 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
4018 return false;
4019 // Don't allow register scavenging with hazard slots, in case it moves objects
4020 // into the wrong place.
4021 if (AFI->hasStackHazardSlotIndex())
4022 return false;
4023 return AFI->hasCalleeSaveStackFreeSpace();
4024}
4025
4026/// Returns true if there are any SVE callee saves.
4028 int &Min, int &Max) {
4029 Min = std::numeric_limits<int>::max();
4030 Max = std::numeric_limits<int>::min();
4031
4032 if (!MFI.isCalleeSavedInfoValid())
4033 return false;
4034
4035 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
4036 for (auto &CS : CSI) {
4037 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
4038 AArch64::PPRRegClass.contains(CS.getReg())) {
4039 assert((Max == std::numeric_limits<int>::min() ||
4040 Max + 1 == CS.getFrameIdx()) &&
4041 "SVE CalleeSaves are not consecutive");
4042
4043 Min = std::min(Min, CS.getFrameIdx());
4044 Max = std::max(Max, CS.getFrameIdx());
4045 }
4046 }
4047 return Min != std::numeric_limits<int>::max();
4048}
4049
4050// Process all the SVE stack objects and determine offsets for each
4051// object. If AssignOffsets is true, the offsets get assigned.
4052// Fills in the first and last callee-saved frame indices into
4053// Min/MaxCSFrameIndex, respectively.
4054// Returns the size of the stack.
4056 int &MinCSFrameIndex,
4057 int &MaxCSFrameIndex,
4058 bool AssignOffsets) {
4059#ifndef NDEBUG
4060 // First process all fixed stack objects.
4061 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
4063 "SVE vectors should never be passed on the stack by value, only by "
4064 "reference.");
4065#endif
4066
4067 auto Assign = [&MFI](int FI, int64_t Offset) {
4068 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
4069 MFI.setObjectOffset(FI, Offset);
4070 };
4071
4072 int64_t Offset = 0;
4073
4074 // Then process all callee saved slots.
4075 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
4076 // Assign offsets to the callee save slots.
4077 for (int I = MinCSFrameIndex; I <= MaxCSFrameIndex; ++I) {
4078 Offset += MFI.getObjectSize(I);
4080 if (AssignOffsets)
4081 Assign(I, -Offset);
4082 }
4083 }
4084
4085 // Ensure that the callee-save area is aligned to 16 bytes.
4086 Offset = alignTo(Offset, Align(16U));
4087
4088 // Create a buffer of SVE objects to allocate and sort it.
4089 SmallVector<int, 8> ObjectsToAllocate;
4090 // If we have a stack protector, and we've previously decided that we have SVE
4091 // objects on the stack and thus need it to go in the SVE stack area, then it
4092 // needs to go first.
4093 int StackProtectorFI = -1;
4094 if (MFI.hasStackProtectorIndex()) {
4095 StackProtectorFI = MFI.getStackProtectorIndex();
4096 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
4097 ObjectsToAllocate.push_back(StackProtectorFI);
4098 }
4099 for (int I = 0, E = MFI.getObjectIndexEnd(); I != E; ++I) {
4100 unsigned StackID = MFI.getStackID(I);
4101 if (StackID != TargetStackID::ScalableVector)
4102 continue;
4103 if (I == StackProtectorFI)
4104 continue;
4105 if (MaxCSFrameIndex >= I && I >= MinCSFrameIndex)
4106 continue;
4107 if (MFI.isDeadObjectIndex(I))
4108 continue;
4109
4110 ObjectsToAllocate.push_back(I);
4111 }
4112
4113 // Allocate all SVE locals and spills
4114 for (unsigned FI : ObjectsToAllocate) {
4115 Align Alignment = MFI.getObjectAlign(FI);
4116 // FIXME: Given that the length of SVE vectors is not necessarily a power of
4117 // two, we'd need to align every object dynamically at runtime if the
4118 // alignment is larger than 16. This is not yet supported.
4119 if (Alignment > Align(16))
4121 "Alignment of scalable vectors > 16 bytes is not yet supported");
4122
4123 Offset = alignTo(Offset + MFI.getObjectSize(FI), Alignment);
4124 if (AssignOffsets)
4125 Assign(FI, -Offset);
4126 }
4127
4128 return Offset;
4129}
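// Worked example (illustrative): with two 16-byte ZPR callee saves and one
// 16-byte SVE local, the callee saves are assigned SP[-16] and SP[-32], the
// callee-save area is padded to a 16-byte multiple, and the local is assigned
// SP[-48], for a total SVE stack size of 48 (scaled by the vector length at
// runtime).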
4130
4131int64_t AArch64FrameLowering::estimateSVEStackObjectOffsets(
4132 MachineFrameInfo &MFI) const {
4133 int MinCSFrameIndex, MaxCSFrameIndex;
4134 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex, false);
4135}
4136
4137int64_t AArch64FrameLowering::assignSVEStackObjectOffsets(
4138 MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex) const {
4139 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex,
4140 true);
4141}
4142
4144 MachineFunction &MF, RegScavenger *RS) const {
4145 MachineFrameInfo &MFI = MF.getFrameInfo();
4146
4148 "Upwards growing stack unsupported");
4149
4150 int MinCSFrameIndex, MaxCSFrameIndex;
4151 int64_t SVEStackSize =
4152 assignSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex);
4153
4155 AFI->setStackSizeSVE(alignTo(SVEStackSize, 16U));
4156 AFI->setMinMaxSVECSFrameIndex(MinCSFrameIndex, MaxCSFrameIndex);
4157
4158 // If this function isn't doing Win64-style C++ EH, we don't need to do
4159 // anything.
4160 if (!MF.hasEHFunclets())
4161 return;
4163 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
4164
4165 MachineBasicBlock &MBB = MF.front();
4166 auto MBBI = MBB.begin();
4167 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
4168 ++MBBI;
4169
4170 // Create an UnwindHelp object.
4171 // The UnwindHelp object is allocated at the start of the fixed object area
4172 int64_t FixedObject =
4173 getFixedObjectSize(MF, AFI, /*IsWin64*/ true, /*IsFunclet*/ false);
4174 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8,
4175 /*SPOffset*/ -FixedObject,
4176 /*IsImmutable=*/false);
4177 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
4178
4179 // We need to store -2 into the UnwindHelp object at the start of the
4180 // function.
4181 DebugLoc DL;
4183 RS->backward(MBBI);
4184 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
4185 assert(DstReg && "There must be a free register after frame setup");
4186 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
4187 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
4188 .addReg(DstReg, getKillRegState(true))
4189 .addFrameIndex(UnwindHelpFI)
4190 .addImm(0);
4191}
4192
4193namespace {
4194struct TagStoreInstr {
4196 int64_t Offset, Size;
4197 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
4198 : MI(MI), Offset(Offset), Size(Size) {}
4199};
4200
4201class TagStoreEdit {
4202 MachineFunction *MF;
4205 // Tag store instructions that are being replaced.
4207 // Combined memref arguments of the above instructions.
4209
4210 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
4211 // FrameRegOffset + Size) with the address tag of SP.
4212 Register FrameReg;
4213 StackOffset FrameRegOffset;
4214 int64_t Size;
4215 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
4216 // end.
4217 std::optional<int64_t> FrameRegUpdate;
4218 // MIFlags for any FrameReg updating instructions.
4219 unsigned FrameRegUpdateFlags;
4220
4221 // Use zeroing instruction variants.
4222 bool ZeroData;
4223 DebugLoc DL;
4224
4225 void emitUnrolled(MachineBasicBlock::iterator InsertI);
4226 void emitLoop(MachineBasicBlock::iterator InsertI);
4227
4228public:
4229 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
4230 : MBB(MBB), ZeroData(ZeroData) {
4231 MF = MBB->getParent();
4232 MRI = &MF->getRegInfo();
4233 }
4234 // Add an instruction to be replaced. Instructions must be added in
4235 // ascending order of Offset, and must be adjacent.
4236 void addInstruction(TagStoreInstr I) {
4237 assert((TagStores.empty() ||
4238 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
4239 "Non-adjacent tag store instructions.");
4240 TagStores.push_back(I);
4241 }
4242 void clear() { TagStores.clear(); }
4243 // Emit equivalent code at the given location, and erase the current set of
4244 // instructions. May skip if the replacement is not profitable. May invalidate
4245 // the input iterator and replace it with a valid one.
4246 void emitCode(MachineBasicBlock::iterator &InsertI,
4247 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
4248};
4249
4250void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
4251 const AArch64InstrInfo *TII =
4252 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4253
4254 const int64_t kMinOffset = -256 * 16;
4255 const int64_t kMaxOffset = 255 * 16;
4256
4257 Register BaseReg = FrameReg;
4258 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
4259 if (BaseRegOffsetBytes < kMinOffset ||
4260 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
4261 // BaseReg can be FP, which is not necessarily aligned to 16 bytes. In
4262 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
4263 // is required for the offset of ST2G.
4264 BaseRegOffsetBytes % 16 != 0) {
4265 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4266 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
4267 StackOffset::getFixed(BaseRegOffsetBytes), TII);
4268 BaseReg = ScratchReg;
4269 BaseRegOffsetBytes = 0;
4270 }
4271
4272 MachineInstr *LastI = nullptr;
4273 while (Size) {
4274 int64_t InstrSize = (Size > 16) ? 32 : 16;
4275 unsigned Opcode =
4276 InstrSize == 16
4277 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
4278 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
4279 assert(BaseRegOffsetBytes % 16 == 0);
4280 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
4281 .addReg(AArch64::SP)
4282 .addReg(BaseReg)
4283 .addImm(BaseRegOffsetBytes / 16)
4284 .setMemRefs(CombinedMemRefs);
4285 // A store to [BaseReg, #0] should go last for an opportunity to fold the
4286 // final SP adjustment in the epilogue.
4287 if (BaseRegOffsetBytes == 0)
4288 LastI = I;
4289 BaseRegOffsetBytes += InstrSize;
4290 Size -= InstrSize;
4291 }
4292
4293 if (LastI)
4294 MBB->splice(InsertI, MBB, LastI);
4295}
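// Illustrative expansion for Size == 48 at BaseRegOffsetBytes == 0 with
// ZeroData == false: an ST2Gi covering bytes [0, 32) and an STGi covering
// bytes [32, 48) are emitted (the immediates are scaled by 16), and the store
// at offset 0 is then spliced to the end so a following SP adjustment can be
// folded into it.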
4296
4297void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
4298 const AArch64InstrInfo *TII =
4299 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4300
4301 Register BaseReg = FrameRegUpdate
4302 ? FrameReg
4303 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4304 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4305
4306 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
4307
4308 int64_t LoopSize = Size;
4309 // If the loop size is not a multiple of 32, split off one 16-byte store at
4310 // the end to fold the BaseReg update into.
4311 if (FrameRegUpdate && *FrameRegUpdate)
4312 LoopSize -= LoopSize % 32;
4313 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
4314 TII->get(ZeroData ? AArch64::STZGloop_wback
4315 : AArch64::STGloop_wback))
4316 .addDef(SizeReg)
4317 .addDef(BaseReg)
4318 .addImm(LoopSize)
4319 .addReg(BaseReg)
4320 .setMemRefs(CombinedMemRefs);
4321 if (FrameRegUpdate)
4322 LoopI->setFlags(FrameRegUpdateFlags);
4323
4324 int64_t ExtraBaseRegUpdate =
4325 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
4326 if (LoopSize < Size) {
4327 assert(FrameRegUpdate);
4328 assert(Size - LoopSize == 16);
4329 // Tag 16 more bytes at BaseReg and update BaseReg.
4330 BuildMI(*MBB, InsertI, DL,
4331 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
4332 .addDef(BaseReg)
4333 .addReg(BaseReg)
4334 .addReg(BaseReg)
4335 .addImm(1 + ExtraBaseRegUpdate / 16)
4336 .setMemRefs(CombinedMemRefs)
4337 .setMIFlags(FrameRegUpdateFlags);
4338 } else if (ExtraBaseRegUpdate) {
4339 // Update BaseReg.
4340 BuildMI(
4341 *MBB, InsertI, DL,
4342 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
4343 .addDef(BaseReg)
4344 .addReg(BaseReg)
4345 .addImm(std::abs(ExtraBaseRegUpdate))
4346 .addImm(0)
4347 .setMIFlags(FrameRegUpdateFlags);
4348 }
4349}
4350
4351// Check if *II is a register update that can be merged into the STGloop that
4352// ends at (Reg + Size). On success, *TotalOffset is set to the update's
4353// signed offset relative to Reg, to be applied after the end of the loop.
4354bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
4355 int64_t Size, int64_t *TotalOffset) {
4356 MachineInstr &MI = *II;
4357 if ((MI.getOpcode() == AArch64::ADDXri ||
4358 MI.getOpcode() == AArch64::SUBXri) &&
4359 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
4360 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
4361 int64_t Offset = MI.getOperand(2).getImm() << Shift;
4362 if (MI.getOpcode() == AArch64::SUBXri)
4363 Offset = -Offset;
4364 int64_t AbsPostOffset = std::abs(Offset - Size);
4365 const int64_t kMaxOffset =
4366 0xFFF; // Max encoding for unshifted ADDXri / SUBXri
4367 if (AbsPostOffset <= kMaxOffset && AbsPostOffset % 16 == 0) {
4368 *TotalOffset = Offset;
4369 return true;
4370 }
4371 }
4372 return false;
4373}
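// Illustrative use: if an STGloop ends at (sp + 256) and is followed by
// "add sp, sp, #272", the required post-loop adjustment is |272 - 256| = 16,
// which is a multiple of 16 and fits the unshifted immediate range, so the
// ADD can be folded into the loop's write-back (TotalOffset = 272).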
4374
4375void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
4377 MemRefs.clear();
4378 for (auto &TS : TSE) {
4379 MachineInstr *MI = TS.MI;
4380 // An instruction without memory operands may access anything. Be
4381 // conservative and return an empty list.
4382 if (MI->memoperands_empty()) {
4383 MemRefs.clear();
4384 return;
4385 }
4386 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
4387 }
4388}
4389
4390void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
4391 const AArch64FrameLowering *TFI,
4392 bool TryMergeSPUpdate) {
4393 if (TagStores.empty())
4394 return;
4395 TagStoreInstr &FirstTagStore = TagStores[0];
4396 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
4397 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
4398 DL = TagStores[0].MI->getDebugLoc();
4399
4400 Register Reg;
4401 FrameRegOffset = TFI->resolveFrameOffsetReference(
4402 *MF, FirstTagStore.Offset, false /*isFixed*/, false /*isSVE*/, Reg,
4403 /*PreferFP=*/false, /*ForSimm=*/true);
4404 FrameReg = Reg;
4405 FrameRegUpdate = std::nullopt;
4406
4407 mergeMemRefs(TagStores, CombinedMemRefs);
4408
4409 LLVM_DEBUG({
4410 dbgs() << "Replacing adjacent STG instructions:\n";
4411 for (const auto &Instr : TagStores) {
4412 dbgs() << " " << *Instr.MI;
4413 }
4414 });
4415
4416 // Size threshold where a loop becomes shorter than a linear sequence of
4417 // tagging instructions.
4418 const int kSetTagLoopThreshold = 176;
4419 if (Size < kSetTagLoopThreshold) {
4420 if (TagStores.size() < 2)
4421 return;
4422 emitUnrolled(InsertI);
4423 } else {
4424 MachineInstr *UpdateInstr = nullptr;
4425 int64_t TotalOffset = 0;
4426 if (TryMergeSPUpdate) {
4427 // See if we can merge the base register update into the STGloop.
4428 // This is done in AArch64LoadStoreOptimizer for "normal" stores,
4429 // but STGloop is way too unusual for that, and it also only
4430 // realistically happens in the function epilogue. Also, STGloop is
4431 // expanded before that pass runs.
4432 if (InsertI != MBB->end() &&
4433 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
4434 &TotalOffset)) {
4435 UpdateInstr = &*InsertI++;
4436 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
4437 << *UpdateInstr);
4438 }
4439 }
4440
4441 if (!UpdateInstr && TagStores.size() < 2)
4442 return;
4443
4444 if (UpdateInstr) {
4445 FrameRegUpdate = TotalOffset;
4446 FrameRegUpdateFlags = UpdateInstr->getFlags();
4447 }
4448 emitLoop(InsertI);
4449 if (UpdateInstr)
4450 UpdateInstr->eraseFromParent();
4451 }
4452
4453 for (auto &TS : TagStores)
4454 TS.MI->eraseFromParent();
4455}
4456
4457bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
4458 int64_t &Size, bool &ZeroData) {
4459 MachineFunction &MF = *MI.getParent()->getParent();
4460 const MachineFrameInfo &MFI = MF.getFrameInfo();
4461
4462 unsigned Opcode = MI.getOpcode();
4463 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
4464 Opcode == AArch64::STZ2Gi);
4465
4466 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
4467 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
4468 return false;
4469 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
4470 return false;
4471 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
4472 Size = MI.getOperand(2).getImm();
4473 return true;
4474 }
4475
4476 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
4477 Size = 16;
4478 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
4479 Size = 32;
4480 else
4481 return false;
4482
4483 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
4484 return false;
4485
4486 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
4487 16 * MI.getOperand(2).getImm();
4488 return true;
4489}
4490
4491// Detect a run of memory tagging instructions for adjacent stack frame slots,
4492// and replace them with a shorter instruction sequence:
4493// * replace STG + STG with ST2G
4494// * replace STGloop + STGloop with STGloop
4495// This code needs to run when stack slot offsets are already known, but before
4496// FrameIndex operands in STG instructions are eliminated.
4498 const AArch64FrameLowering *TFI,
4499 RegScavenger *RS) {
4500 bool FirstZeroData;
4501 int64_t Size, Offset;
4502 MachineInstr &MI = *II;
4503 MachineBasicBlock *MBB = MI.getParent();
4505 if (&MI == &MBB->instr_back())
4506 return II;
4507 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
4508 return II;
4509
4511 Instrs.emplace_back(&MI, Offset, Size);
4512
4513 constexpr int kScanLimit = 10;
4514 int Count = 0;
4516 NextI != E && Count < kScanLimit; ++NextI) {
4517 MachineInstr &MI = *NextI;
4518 bool ZeroData;
4519 int64_t Size, Offset;
4520 // Collect instructions that update memory tags with a FrameIndex operand
4521 // and (when applicable) constant size, and whose output registers are dead
4522 // (the latter is almost always the case in practice). Since these
4523 // instructions effectively have no inputs or outputs, we are free to skip
4524 // any non-aliasing instructions in between without tracking used registers.
4525 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
4526 if (ZeroData != FirstZeroData)
4527 break;
4528 Instrs.emplace_back(&MI, Offset, Size);
4529 continue;
4530 }
4531
4532 // Only count non-transient, non-tagging instructions toward the scan
4533 // limit.
4534 if (!MI.isTransient())
4535 ++Count;
4536
4537 // Just in case, stop before the epilogue code starts.
4538 if (MI.getFlag(MachineInstr::FrameSetup) ||
4540 break;
4541
4542 // Reject anything that may alias the collected instructions.
4543 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects())
4544 break;
4545 }
4546
4547 // New code will be inserted after the last tagging instruction we've found.
4548 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
4549
4550 // All the gathered stack tag instructions are merged and placed after the
4551 // last tag store in the list. We must check whether the NZCV flag is live
4552 // at the point where we are trying to insert; otherwise the NZCV flag might
4553 // get clobbered if any STG loops are emitted.
4554
4555 // FIXME: This way of bailing out of the merge is conservative: the liveness
4556 // check is performed even when no STG loops would be present after merging
4557 // the insert list, in which case it is not needed.
4559 LiveRegs.addLiveOuts(*MBB);
4560 for (auto I = MBB->rbegin();; ++I) {
4561 MachineInstr &MI = *I;
4562 if (MI == InsertI)
4563 break;
4564 LiveRegs.stepBackward(*I);
4565 }
4566 InsertI++;
4567 if (LiveRegs.contains(AArch64::NZCV))
4568 return InsertI;
4569
4570 llvm::stable_sort(Instrs,
4571 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
4572 return Left.Offset < Right.Offset;
4573 });
4574
4575 // Make sure that we don't have any overlapping stores.
4576 int64_t CurOffset = Instrs[0].Offset;
4577 for (auto &Instr : Instrs) {
4578 if (CurOffset > Instr.Offset)
4579 return NextI;
4580 CurOffset = Instr.Offset + Instr.Size;
4581 }
4582
4583 // Find contiguous runs of tagged memory and emit shorter instruction
4584 // sequences for them when possible.
4585 TagStoreEdit TSE(MBB, FirstZeroData);
4586 std::optional<int64_t> EndOffset;
4587 for (auto &Instr : Instrs) {
4588 if (EndOffset && *EndOffset != Instr.Offset) {
4589 // Found a gap.
4590 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
4591 TSE.clear();
4592 }
4593
4594 TSE.addInstruction(Instr);
4595 EndOffset = Instr.Offset + Instr.Size;
4596 }
4597
4598 const MachineFunction *MF = MBB->getParent();
4599 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
4600 TSE.emitCode(
4601 InsertI, TFI, /*TryMergeSPUpdate = */
4603
4604 return InsertI;
4605}
4606} // namespace
4607
4609 const AArch64FrameLowering *TFI) {
4610 MachineInstr &MI = *II;
4611 MachineBasicBlock *MBB = MI.getParent();
4612 MachineFunction *MF = MBB->getParent();
4613
4614 if (MI.getOpcode() != AArch64::VGSavePseudo &&
4615 MI.getOpcode() != AArch64::VGRestorePseudo)
4616 return II;
4617
4618 SMEAttrs FuncAttrs(MF->getFunction());
4619 bool LocallyStreaming =
4620 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
4623 const AArch64InstrInfo *TII =
4624 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4625
4626 int64_t VGFrameIdx =
4627 LocallyStreaming ? AFI->getStreamingVGIdx() : AFI->getVGIdx();
4628 assert(VGFrameIdx != std::numeric_limits<int>::max() &&
4629 "Expected FrameIdx for VG");
4630
4631 unsigned CFIIndex;
4632 if (MI.getOpcode() == AArch64::VGSavePseudo) {
4633 const MachineFrameInfo &MFI = MF->getFrameInfo();
4634 int64_t Offset =
4635 MFI.getObjectOffset(VGFrameIdx) - TFI->getOffsetOfLocalArea();
4637 nullptr, TRI->getDwarfRegNum(AArch64::VG, true), Offset));
4638 } else
4640 nullptr, TRI->getDwarfRegNum(AArch64::VG, true)));
4641
4642 MachineInstr *UnwindInst = BuildMI(*MBB, II, II->getDebugLoc(),
4643 TII->get(TargetOpcode::CFI_INSTRUCTION))
4644 .addCFIIndex(CFIIndex);
4645
4646 MI.eraseFromParent();
4647 return UnwindInst->getIterator();
4648}
4649
4651 MachineFunction &MF, RegScavenger *RS = nullptr) const {
4652 for (auto &BB : MF)
4653 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
4654 if (requiresSaveVG(MF))
4655 II = emitVGSaveRestore(II, this);
4657 II = tryMergeAdjacentSTG(II, this, RS);
4658 }
4659}
4660
4661/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
4662/// before the update. This is easily retrieved as it is exactly the offset
4663/// that is set in processFunctionBeforeFrameFinalized.
4665 const MachineFunction &MF, int FI, Register &FrameReg,
4666 bool IgnoreSPUpdates) const {
4667 const MachineFrameInfo &MFI = MF.getFrameInfo();
4668 if (IgnoreSPUpdates) {
4669 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
4670 << MFI.getObjectOffset(FI) << "\n");
4671 FrameReg = AArch64::SP;
4672 return StackOffset::getFixed(MFI.getObjectOffset(FI));
4673 }
4674
4675 // Go to common code if we cannot provide sp + offset.
4676 if (MFI.hasVarSizedObjects() ||
4679 return getFrameIndexReference(MF, FI, FrameReg);
4680
4681 FrameReg = AArch64::SP;
4682 return getStackOffset(MF, MFI.getObjectOffset(FI));
4683}
4684
4685/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
4686/// the parent's frame pointer
4688 const MachineFunction &MF) const {
4689 return 0;
4690}
4691
4692/// Funclets only need to account for space for the callee saved registers,
4693/// as the locals are accounted for in the parent's stack frame.
4695 const MachineFunction &MF) const {
4696 // This is the size of the pushed CSRs.
4697 unsigned CSSize =
4698 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
4699 // This is the amount of stack a funclet needs to allocate.
4700 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
4701 getStackAlign());
4702}
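// Illustrative computation: with 64 bytes of pushed CSRs and a maximum call
// frame of 40 bytes, the funclet allocates alignTo(64 + 40, 16) = 112 bytes.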
4703
4704namespace {
4705struct FrameObject {
4706 bool IsValid = false;
4707 // Index of the object in MFI.
4708 int ObjectIndex = 0;
4709 // Group ID this object belongs to.
4710 int GroupIndex = -1;
4711 // This object should be placed first (closest to SP).
4712 bool ObjectFirst = false;
4713 // This object's group (which always contains the object with
4714 // ObjectFirst==true) should be placed first.
4715 bool GroupFirst = false;
4716
4717 // Used to distinguish between FP and GPR accesses. The values are decided so
4718 // that they sort FPR < Hazard < GPR and they can be or'd together.
4719 unsigned Accesses = 0;
4720 enum { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };
4721};
4722
4723class GroupBuilder {
4724 SmallVector<int, 8> CurrentMembers;
4725 int NextGroupIndex = 0;
4726 std::vector<FrameObject> &Objects;
4727
4728public:
4729 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
4730 void AddMember(int Index) { CurrentMembers.push_back(Index); }
4731 void EndCurrentGroup() {
4732 if (CurrentMembers.size() > 1) {
4733 // Create a new group with the current member list. This might remove them
4734 // from their pre-existing groups. That's OK, dealing with overlapping
4735 // groups is too hard and unlikely to make a difference.
4736 LLVM_DEBUG(dbgs() << "group:");
4737 for (int Index : CurrentMembers) {
4738 Objects[Index].GroupIndex = NextGroupIndex;
4739 LLVM_DEBUG(dbgs() << " " << Index);
4740 }
4741 LLVM_DEBUG(dbgs() << "\n");
4742 NextGroupIndex++;
4743 }
4744 CurrentMembers.clear();
4745 }
4746};
4747
4748bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
4749 // Objects at a lower index are closer to FP; objects at a higher index are
4750 // closer to SP.
4751 //
4752 // For consistency in our comparison, all invalid objects are placed
4753 // at the end. This also allows us to stop walking when we hit the
4754 // first invalid item after it's all sorted.
4755 //
4756 // If we want to include a stack hazard region, order FPR accesses < the
4757 // hazard object < GPRs accesses in order to create a separation between the
4758 // two. For the Accesses field 1 = FPR, 2 = Hazard Object, 4 = GPR.
4759 //
4760 // Otherwise the "first" object goes first (closest to SP), followed by the
4761 // members of the "first" group.
4762 //
4763 // The rest are sorted by the group index to keep the groups together.
4764 // Higher numbered groups are more likely to be around longer (i.e. untagged
4765 // in the function epilogue and not at some earlier point). Place them closer
4766 // to SP.
4767 //
4768 // If all else equal, sort by the object index to keep the objects in the
4769 // original order.
4770 return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
4771 A.GroupIndex, A.ObjectIndex) <
4772 std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
4773 B.GroupIndex, B.ObjectIndex);
4774}
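// Illustrative ordering: with a hazard slot present, objects accessed only by
// FP/SIMD instructions (Accesses == 1) are laid out before the hazard object
// (Accesses == 2), which in turn precedes GPR-accessed objects (Accesses == 4),
// creating the desired separation between FPR and GPR slots.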
4775} // namespace
4776
4778 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
4779 if (!OrderFrameObjects || ObjectsToAllocate.empty())
4780 return;
4781
4783 const MachineFrameInfo &MFI = MF.getFrameInfo();
4784 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
4785 for (auto &Obj : ObjectsToAllocate) {
4786 FrameObjects[Obj].IsValid = true;
4787 FrameObjects[Obj].ObjectIndex = Obj;
4788 }
4789
4790 // For hazard padding, identify FPR vs GPR accesses to stack slots, and find
4791 // stack slots that are tagged at the same time (so they can be grouped).
4792 GroupBuilder GB(FrameObjects);
4793 for (auto &MBB : MF) {
4794 for (auto &MI : MBB) {
4795 if (MI.isDebugInstr())
4796 continue;
4797
4798 if (AFI.hasStackHazardSlotIndex()) {
4799 std::optional<int> FI = getLdStFrameID(MI, MFI);
4800 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
4801 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
4803 FrameObjects[*FI].Accesses |= FrameObject::AccessFPR;
4804 else
4805 FrameObjects[*FI].Accesses |= FrameObject::AccessGPR;
4806 }
4807 }
4808
4809 int OpIndex;
4810 switch (MI.getOpcode()) {
4811 case AArch64::STGloop:
4812 case AArch64::STZGloop:
4813 OpIndex = 3;
4814 break;
4815 case AArch64::STGi:
4816 case AArch64::STZGi:
4817 case AArch64::ST2Gi:
4818 case AArch64::STZ2Gi:
4819 OpIndex = 1;
4820 break;
4821 default:
4822 OpIndex = -1;
4823 }
4824
4825 int TaggedFI = -1;
4826 if (OpIndex >= 0) {
4827 const MachineOperand &MO = MI.getOperand(OpIndex);
4828 if (MO.isFI()) {
4829 int FI = MO.getIndex();
4830 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
4831 FrameObjects[FI].IsValid)
4832 TaggedFI = FI;
4833 }
4834 }
4835
4836 // If this is a stack tagging instruction for a slot that is not part of a
4837 // group yet, either start a new group or add it to the current one.
4838 if (TaggedFI >= 0)
4839 GB.AddMember(TaggedFI);
4840 else
4841 GB.EndCurrentGroup();
4842 }
4843 // Groups should never span multiple basic blocks.
4844 GB.EndCurrentGroup();
4845 }
4846
4847 if (AFI.hasStackHazardSlotIndex()) {
4848 FrameObjects[AFI.getStackHazardSlotIndex()].Accesses =
4849 FrameObject::AccessHazard;
4850 // If a stack object is unknown or both GPR and FPR, sort it into GPR.
4851 for (auto &Obj : FrameObjects)
4852 if (!Obj.Accesses ||
4853 Obj.Accesses == (FrameObject::AccessGPR | FrameObject::AccessFPR))
4854 Obj.Accesses = FrameObject::AccessGPR;
4855 }
4856
4857 // If the function's tagged base pointer is pinned to a stack slot, we want to
4858 // put that slot first when possible. This will likely place it at SP + 0,
4859 // and save one instruction when generating the base pointer because IRG does
4860 // not allow an immediate offset.
4861 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
4862 if (TBPI) {
4863 FrameObjects[*TBPI].ObjectFirst = true;
4864 FrameObjects[*TBPI].GroupFirst = true;
4865 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
4866 if (FirstGroupIndex >= 0)
4867 for (FrameObject &Object : FrameObjects)
4868 if (Object.GroupIndex == FirstGroupIndex)
4869 Object.GroupFirst = true;
4870 }
4871
4872 llvm::stable_sort(FrameObjects, FrameObjectCompare);
4873
4874 int i = 0;
4875 for (auto &Obj : FrameObjects) {
4876 // All invalid items are sorted at the end, so it's safe to stop.
4877 if (!Obj.IsValid)
4878 break;
4879 ObjectsToAllocate[i++] = Obj.ObjectIndex;
4880 }
4881
4882 LLVM_DEBUG({
4883 dbgs() << "Final frame order:\n";
4884 for (auto &Obj : FrameObjects) {
4885 if (!Obj.IsValid)
4886 break;
4887 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
4888 if (Obj.ObjectFirst)
4889 dbgs() << ", first";
4890 if (Obj.GroupFirst)
4891 dbgs() << ", group-first";
4892 dbgs() << "\n";
4893 }
4894 });
4895}
4896
4897/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
4898/// least every ProbeSize bytes. Returns an iterator of the first instruction
4899/// after the loop. The difference between SP and TargetReg must be an exact
4900/// multiple of ProbeSize.
4901MachineBasicBlock::iterator
4902AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
4903 MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
4904 Register TargetReg) const {
4905 MachineBasicBlock &MBB = *MBBI->getParent();
4906 MachineFunction &MF = *MBB.getParent();
4907 const AArch64InstrInfo *TII =
4908 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
4909 DebugLoc DL = MBB.findDebugLoc(MBBI);
4910
4911 MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
4912 MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
4913 MF.insert(MBBInsertPoint, LoopMBB);
4914 MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
4915 MF.insert(MBBInsertPoint, ExitMBB);
4916
4917 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
4918 // in SUB).
4919 emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
4920 StackOffset::getFixed(-ProbeSize), TII,
4921 MachineInstr::FrameSetup);
4922 // STR XZR, [SP]
4923 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::STRXui))
4924 .addReg(AArch64::XZR)
4925 .addReg(AArch64::SP)
4926 .addImm(0)
4927 .setMIFlags(MachineInstr::FrameSetup);
4928 // CMP SP, TargetReg
4929 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
4930 AArch64::XZR)
4931 .addReg(AArch64::SP)
4932 .addReg(TargetReg)
4933 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
4934 .setMIFlags(MachineInstr::FrameSetup);
4935 // B.CC Loop
4936 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
4937 .addImm(AArch64CC::NE)
4938 .addMBB(LoopMBB)
4939 .setMIFlags(MachineInstr::FrameSetup);
4940
4941 LoopMBB->addSuccessor(ExitMBB);
4942 LoopMBB->addSuccessor(LoopMBB);
4943 // Synthesize the exit MBB.
4944 ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
4945 ExitMBB->transferSuccessorsAndUpdatePHIs(&MBB);
4946 MBB.addSuccessor(LoopMBB);
4947 // Update liveins.
4948 fullyRecomputeLiveIns({ExitMBB, LoopMBB});
4949
4950 return ExitMBB->begin();
4951}
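Read together with the instruction comments above, the loop block built by this helper corresponds roughly to the following AArch64 sequence (a sketch; the exact SUB form depends on whether ProbeSize is encodable, and the branch condition is "not equal" because the loop runs until SP reaches TargetReg):

// LoopMBB:
//   SUB  SP, SP, #ProbeSize
//   STR  XZR, [SP]
//   CMP  SP, TargetReg        (SUBS XZR, SP, TargetReg)
//   B.NE LoopMBB
// ExitMBB:
//   <rest of the original block>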
4952
4953void AArch64FrameLowering::inlineStackProbeFixed(
4954 MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
4955 StackOffset CFAOffset) const {
4956 MachineBasicBlock *MBB = MBBI->getParent();
4957 MachineFunction &MF = *MBB->getParent();
4958 const AArch64InstrInfo *TII =
4959 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
4960 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
4961 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
4962 bool HasFP = hasFP(MF);
4963
4964 DebugLoc DL;
4965 int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
4966 int64_t NumBlocks = FrameSize / ProbeSize;
4967 int64_t ResidualSize = FrameSize % ProbeSize;
4968
4969 LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
4970 << NumBlocks << " blocks of " << ProbeSize
4971 << " bytes, plus " << ResidualSize << " bytes\n");
4972
4973 // Decrement SP by NumBlocks * ProbeSize bytes, with either an unrolled
4974 // sequence or an ordinary loop.
4975 if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
4976 for (int i = 0; i < NumBlocks; ++i) {
4977 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
4978 // encodable in a SUB).
4979 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
4980 StackOffset::getFixed(-ProbeSize), TII,
4981 MachineInstr::FrameSetup, false, false, nullptr,
4982 EmitAsyncCFI && !HasFP, CFAOffset);
4983 CFAOffset += StackOffset::getFixed(ProbeSize);
4984 // STR XZR, [SP]
4985 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
4986 .addReg(AArch64::XZR)
4987 .addReg(AArch64::SP)
4988 .addImm(0)
4989 .setMIFlags(MachineInstr::FrameSetup);
4990 }
4991 } else if (NumBlocks != 0) {
4992 // SUB ScratchReg, SP, #FrameSize (or equivalent if FrameSize is not
4993 // encodable in ADD). ScratchReg may temporarily become the CFA register.
4994 emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
4995 StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
4996 MachineInstr::FrameSetup, false, false, nullptr,
4997 EmitAsyncCFI && !HasFP, CFAOffset);
4998 CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
4999 MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
5000 MBB = MBBI->getParent();
5001 if (EmitAsyncCFI && !HasFP) {
5002 // Set the CFA register back to SP.
5003 const AArch64RegisterInfo &RegInfo =
5004 *MF.getSubtarget<AArch64Subtarget>().getRegisterInfo();
5005 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
5006 unsigned CFIIndex =
5007 MF.addFrameInst(MCCFIInstruction::createDefCfaRegister(nullptr, Reg));
5008 BuildMI(*MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
5009 .addCFIIndex(CFIIndex)
5010 .setMIFlags(MachineInstr::FrameSetup);
5011 }
5012 }
5013
5014 if (ResidualSize != 0) {
5015 // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
5016 // in SUB).
5017 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
5018 StackOffset::getFixed(-ResidualSize), TII,
5019 MachineInstr::FrameSetup, false, false, nullptr,
5020 EmitAsyncCFI && !HasFP, CFAOffset);
5021 if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
5022 // STR XZR, [SP]
5023 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
5024 .addReg(AArch64::XZR)
5025 .addReg(AArch64::SP)
5026 .addImm(0)
5027 .setMIFlags(MachineInstr::FrameSetup);
5028 }
5029 }
5030}
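A worked example of the block/residual split computed above, as a small self-contained sketch using hypothetical values (20000-byte frame, 4096-byte probe size):

#include <cstdint>
#include <cstdio>

int main() {
  // Same arithmetic as NumBlocks/ResidualSize above, for example values only.
  int64_t FrameSize = 20000, ProbeSize = 4096;
  int64_t NumBlocks = FrameSize / ProbeSize;    // 4 probed blocks
  int64_t ResidualSize = FrameSize % ProbeSize; // 3616 residual bytes
  std::printf("%lld blocks of %lld bytes, plus %lld bytes\n",
              (long long)NumBlocks, (long long)ProbeSize,
              (long long)ResidualSize);
  // Whether the blocks are emitted unrolled or as a loop depends on
  // AArch64::StackProbeMaxLoopUnroll; whether the residual gets its own
  // trailing probe depends on AArch64::StackProbeMaxUnprobedStack.
  return 0;
}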
5031
5032void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
5033 MachineBasicBlock &MBB) const {
5034 // Get the instructions that need to be replaced. We emit at most two of
5035 // these. Remember them up front to avoid the complication of traversing
5036 // the block while potentially creating more blocks.
5037 SmallVector<MachineInstr *, 4> ToReplace;
5038 for (MachineInstr &MI : MBB)
5039 if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
5040 MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
5041 ToReplace.push_back(&MI);
5042
5043 for (MachineInstr *MI : ToReplace) {
5044 if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
5045 Register ScratchReg = MI->getOperand(0).getReg();
5046 int64_t FrameSize = MI->getOperand(1).getImm();
5047 StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
5048 MI->getOperand(3).getImm());
5049 inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
5050 CFAOffset);
5051 } else {
5052 assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
5053 "Stack probe pseudo-instruction expected");
5054 const AArch64InstrInfo *TII =
5055 MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
5056 Register TargetReg = MI->getOperand(0).getReg();
5057 (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
5058 }
5059 MI->eraseFromParent();
5060 }
5061}
5062
5063struct StackAccess {
5064 enum AccessType {
5065 NotAccessed = 0, // Stack object not accessed by load/store instructions.
5066 GPR = 1 << 0, // A general purpose register.
5067 PPR = 1 << 1, // A predicate register.
5068 FPR = 1 << 2, // A floating point/Neon/SVE register.
5069 };
5070
5071 int Idx;
5072 StackOffset Offset;
5073 int64_t Size;
5074 unsigned AccessTypes;
5075
5076 StackAccess() : Idx(0), Offset(), Size(0), AccessTypes(NotAccessed) {}
5077
5078 bool operator<(const StackAccess &Rhs) const {
5079 return std::make_tuple(start(), Idx) <
5080 std::make_tuple(Rhs.start(), Rhs.Idx);
5081 }
5082
5083 bool isCPU() const {
5084 // Predicate register load and store instructions execute on the CPU.
5085 return AccessTypes & (AccessType::GPR | AccessType::PPR);
5086 }
5087 bool isSME() const { return AccessTypes & AccessType::FPR; }
5088 bool isMixed() const { return isCPU() && isSME(); }
5089
5090 int64_t start() const { return Offset.getFixed() + Offset.getScalable(); }
5091 int64_t end() const { return start() + Size; }
5092
5093 std::string getTypeString() const {
5094 switch (AccessTypes) {
5095 case AccessType::FPR:
5096 return "FPR";
5097 case AccessType::PPR:
5098 return "PPR";
5099 case AccessType::GPR:
5100 return "GPR";
5101 case AccessType::NotAccessed:
5102 return "NA";
5103 default:
5104 return "Mixed";
5105 }
5106 }
5107
5108 void print(raw_ostream &OS) const {
5109 OS << getTypeString() << " stack object at [SP"
5110 << (Offset.getFixed() < 0 ? "" : "+") << Offset.getFixed();
5111 if (Offset.getScalable())
5112 OS << (Offset.getScalable() < 0 ? "" : "+") << Offset.getScalable()
5113 << " * vscale";
5114 OS << "]";
5115 }
5116};
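A small illustration of how the AccessTypes bits above combine, with values chosen for the example only: a slot that sees both a GPR access and an FP/NEON/SVE access reports isCPU(), isSME() and isMixed() as true, and getTypeString() falls through to "Mixed".

#include <cassert>

int main() {
  // Illustrative values only, using the same bit assignments as the enum above.
  unsigned AccessTypes = 0;
  AccessTypes |= 1 << 0; // GPR: saw a general-purpose load/store
  AccessTypes |= 1 << 2; // FPR: saw an FP/NEON/SVE load/store
  bool IsCPU = (AccessTypes & ((1 << 0) | (1 << 1))) != 0;
  bool IsSME = (AccessTypes & (1 << 2)) != 0;
  assert(IsCPU && IsSME && "mixed access -> reported as \"Mixed\"");
  return 0;
}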
5117
5118static inline raw_ostream &operator<<(raw_ostream &OS, const StackAccess &SA) {
5119 SA.print(OS);
5120 return OS;
5121}
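For reference, with the print format above, a hypothetical SVE spill slot at fixed offset -16 and scalable offset -32 would render as the first line below, and a plain 8-byte GPR spill above SP as the second:

// FPR stack object at [SP-16-32 * vscale]
// GPR stack object at [SP+8]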
5122
5123void AArch64FrameLowering::emitRemarks(
5124 const MachineFunction &MF, MachineOptimizationRemarkEmitter *ORE) const {
5125
5126 SMEAttrs Attrs(MF.getFunction());
5127 if (Attrs.hasNonStreamingInterfaceAndBody())
5128 return;
5129
5130 const uint64_t HazardSize =
5131 (StackHazardSize) ? StackHazardSize : StackHazardRemarkSize;
5132
5133 if (HazardSize == 0)
5134 return;
5135
5136 const MachineFrameInfo &MFI = MF.getFrameInfo();
5137 // Bail if function has no stack objects.
5138 if (!MFI.hasStackObjects())
5139 return;
5140
5141 std::vector<StackAccess> StackAccesses(MFI.getNumObjects());
5142
5143 size_t NumFPLdSt = 0;
5144 size_t NumNonFPLdSt = 0;
5145
5146 // Collect stack accesses via Load/Store instructions.
5147 for (const MachineBasicBlock &MBB : MF) {
5148 for (const MachineInstr &MI : MBB) {
5149 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
5150 continue;
5151 for (MachineMemOperand *MMO : MI.memoperands()) {
5152 std::optional<int> FI = getMMOFrameID(MMO, MFI);
5153 if (FI && !MFI.isDeadObjectIndex(*FI)) {
5154 int FrameIdx = *FI;
5155
5156 size_t ArrIdx = FrameIdx + MFI.getNumFixedObjects();
5157 if (StackAccesses[ArrIdx].AccessTypes == StackAccess::NotAccessed) {
5158 StackAccesses[ArrIdx].Idx = FrameIdx;
5159 StackAccesses[ArrIdx].Offset =
5160 getFrameIndexReferenceFromSP(MF, FrameIdx);
5161 StackAccesses[ArrIdx].Size = MFI.getObjectSize(FrameIdx);
5162 }
5163
5164 unsigned RegTy = StackAccess::AccessType::GPR;
5165 if (MFI.getStackID(FrameIdx) == TargetStackID::ScalableVector) {
5166 if (AArch64::PPRRegClass.contains(MI.getOperand(0).getReg()))
5167 RegTy = StackAccess::PPR;
5168 else
5169 RegTy = StackAccess::FPR;
5170 } else if (AArch64InstrInfo::isFpOrNEON(MI)) {
5171 RegTy = StackAccess::FPR;
5172 }
5173
5174 StackAccesses[ArrIdx].AccessTypes |= RegTy;
5175
5176 if (RegTy == StackAccess::FPR)
5177 ++NumFPLdSt;
5178 else
5179 ++NumNonFPLdSt;
5180 }
5181 }
5182 }
5183 }
5184
5185 if (NumFPLdSt == 0 || NumNonFPLdSt == 0)
5186 return;
5187
5188 llvm::sort(StackAccesses);
5189 StackAccesses.erase(llvm::remove_if(StackAccesses,
5190 [](const StackAccess &S) {
5191 return S.AccessTypes ==
5192 StackAccess::AccessType::NotAccessed;
5193 }),
5194 StackAccesses.end());
5195
5196 SmallVector<const StackAccess *> MixedObjects;
5197 SmallVector<std::pair<const StackAccess *, const StackAccess *>> HazardPairs;
5198
5199 if (StackAccesses.front().isMixed())
5200 MixedObjects.push_back(&StackAccesses.front());
5201
5202 for (auto It = StackAccesses.begin(), End = std::prev(StackAccesses.end());
5203 It != End; ++It) {
5204 const auto &First = *It;
5205 const auto &Second = *(It + 1);
5206
5207 if (Second.isMixed())
5208 MixedObjects.push_back(&Second);
5209
5210 if ((First.isSME() && Second.isCPU()) ||
5211 (First.isCPU() && Second.isSME())) {
5212 uint64_t Distance = static_cast<uint64_t>(Second.start() - First.end());
5213 if (Distance < HazardSize)
5214 HazardPairs.emplace_back(&First, &Second);
5215 }
5216 }
5217
5218 auto EmitRemark = [&](llvm::StringRef Str) {
5219 ORE->emit([&]() {
5220 auto R = MachineOptimizationRemarkAnalysis(
5221 "sme", "StackHazard", MF.getFunction().getSubprogram(), &MF.front());
5222 return R << formatv("stack hazard in '{0}': ", MF.getName()).str() << Str;
5223 });
5224 };
5225
5226 for (const auto &P : HazardPairs)
5227 EmitRemark(formatv("{0} is too close to {1}", *P.first, *P.second).str());
5228
5229 for (const auto *Obj : MixedObjects)
5230 EmitRemark(
5231 formatv("{0} accessed by both GP and FP instructions", *Obj).str());
5232}
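Putting the format strings above together, a hypothetical function foo whose GPR spill at SP+8 sits within the hazard distance of an FPR spill at SP+16 would produce a remark whose message text reads roughly:

// stack hazard in 'foo': GPR stack object at [SP+8] is too close to FPR stack object at [SP+16]

and a slot accessed by both kinds of instruction would additionally report:

// stack hazard in 'foo': Mixed stack object at [SP+32] accessed by both GP and FP instructions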