1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function so that the particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) doesn't get created until the
33// main function body, after the prologue has run. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | callee-saved gpr registers | <--.
48// | | | On Darwin platforms these
49// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
50// | prev_lr | | (frame record first)
51// | prev_fp | <--'
52// | async context if needed |
53// | (a.k.a. "frame record") |
54// |-----------------------------------| <- fp(=x29)
55// | |
56// | callee-saved fp/simd/SVE regs |
57// | |
58// |-----------------------------------|
59// | |
60// | SVE stack objects |
61// | |
62// |-----------------------------------|
63// |.empty.space.to.make.part.below....|
64// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
65// |.the.standard.16-byte.alignment....| compile time; if present)
66// |-----------------------------------|
67// | |
68// | local variables of fixed size |
69// | including spill slots |
70// |-----------------------------------| <- bp(not defined by ABI,
71// |.variable-sized.local.variables....| LLVM chooses X19)
72// |.(VLAs)............................| (size of this area is unknown at
73// |...................................| compile time)
74// |-----------------------------------| <- sp
75// | | Lower address
76//
77//
78// To access the data in a frame, a constant offset from one of the pointers
79// (fp, bp, sp) must be computable at compile time. The sizes of the areas
80// with a dotted background cannot be computed at compile time if they are
81// present, so all three of fp, bp and sp must be set up in order to access
82// all contents in the frame areas, assuming all of the frame areas are
83// non-empty.
84//
85// For most functions, some of the frame areas are empty. For those functions,
86// it may not be necessary to set up fp or bp:
87// * A base pointer is definitely needed when there are both VLAs and local
88// variables with more-than-default alignment requirements.
89// * A frame pointer is definitely needed when there are local variables with
90// more-than-default alignment requirements.
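//
// As an illustrative sketch (not from this file), a function that ends up
// needing both pointers:
//
//   void f(int n) {
//     alignas(32) char buf[64]; // over-aligned local   -> frame pointer
//     char vla[n];              // variable-sized local -> base pointer too
//     g(buf, vla);              // sp also moves for outgoing calls
//   }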
91//
92// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
93// callee-saved area, since the unwind encoding does not allow for encoding
94// this dynamically and existing tools depend on this layout. For other
95// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
96// area to allow SVE stack objects (allocated directly below the callee-saves,
97// if available) to be accessed directly from the frame pointer.
98// The SVE spill/fill instructions have VL-scaled addressing modes such
99// as:
100// ldr z8, [fp, #-7 mul vl]
101// For SVE the size of the vector length (VL) is not known at compile-time, so
102// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
103// layout, we don't need to add an unscaled offset to the frame pointer before
104// accessing the SVE object in the frame.
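//
// Worked example with assumed vector lengths (illustrative): for VL = 256
// bits, one "vl" is 32 bytes, so '#-7 mul vl' resolves to -224 bytes at
// runtime; for VL = 512 bits it resolves to -448 bytes. The same encoding
// covers both, which is why keeping the SVE area adjacent to fp avoids any
// extra unscaled offset arithmetic.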
105//
106// In some cases when a base pointer is not strictly needed, it is generated
107// anyway when offsets from the frame pointer to access local variables become
108// so large that the offset can't be encoded in the immediate fields of loads
109// or stores.
110//
111// Outgoing function arguments must be at the bottom of the stack frame when
112// calling another function. If we do not have variable-sized stack objects, we
113// can allocate a "reserved call frame" area at the bottom of the local
114// variable area, large enough for all outgoing calls. If we do have VLAs, then
115// the stack pointer must be decremented and incremented around each call to
116// make space for the arguments below the VLAs.
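//
// For illustration (values made up): if a function's calls need at most 32
// bytes of outgoing arguments and there are no VLAs, those 32 bytes are
// carved out once as part of the local area in the prologue. With VLAs,
// each call site instead brackets its own adjustment:
//   sub sp, sp, #32
//   bl  callee
//   add sp, sp, #32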
117//
118// FIXME: also explain the redzone concept.
119//
120// An example of the prologue:
121//
122// .globl __foo
123// .align 2
124// __foo:
125// Ltmp0:
126// .cfi_startproc
127// .cfi_personality 155, ___gxx_personality_v0
128// Leh_func_begin:
129// .cfi_lsda 16, Lexception33
130//
131// stp xa,bx, [sp, #-offset]!
132// ...
133// stp x28, x27, [sp, #offset-32]
134// stp fp, lr, [sp, #offset-16]
135// add fp, sp, #offset - 16
136// sub sp, sp, #1360
137//
138// The Stack:
139// +-------------------------------------------+
140// 10000 | ........ | ........ | ........ | ........ |
141// 10004 | ........ | ........ | ........ | ........ |
142// +-------------------------------------------+
143// 10008 | ........ | ........ | ........ | ........ |
144// 1000c | ........ | ........ | ........ | ........ |
145// +===========================================+
146// 10010 | X28 Register |
147// 10014 | X28 Register |
148// +-------------------------------------------+
149// 10018 | X27 Register |
150// 1001c | X27 Register |
151// +===========================================+
152// 10020 | Frame Pointer |
153// 10024 | Frame Pointer |
154// +-------------------------------------------+
155// 10028 | Link Register |
156// 1002c | Link Register |
157// +===========================================+
158// 10030 | ........ | ........ | ........ | ........ |
159// 10034 | ........ | ........ | ........ | ........ |
160// +-------------------------------------------+
161// 10038 | ........ | ........ | ........ | ........ |
162// 1003c | ........ | ........ | ........ | ........ |
163// +-------------------------------------------+
164//
165// [sp] = 10030 :: >>initial value<<
166// sp = 10020 :: stp fp, lr, [sp, #-16]!
167// fp = sp == 10020 :: mov fp, sp
168// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
169// sp == 10010 :: >>final value<<
170//
171// The frame pointer (w29) points to address 10020. If we use an offset of
172// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
173// for w27, and -32 for w28:
174//
175// Ltmp1:
176// .cfi_def_cfa w29, 16
177// Ltmp2:
178// .cfi_offset w30, -8
179// Ltmp3:
180// .cfi_offset w29, -16
181// Ltmp4:
182// .cfi_offset w27, -24
183// Ltmp5:
184// .cfi_offset w28, -32
185//
186//===----------------------------------------------------------------------===//
187
188#include "AArch64FrameLowering.h"
189#include "AArch64InstrInfo.h"
190#include "AArch64MachineFunctionInfo.h"
191#include "AArch64RegisterInfo.h"
192#include "AArch64Subtarget.h"
193#include "AArch64TargetMachine.h"
194#include "MCTargetDesc/AArch64AddressingModes.h"
196#include "llvm/ADT/ScopeExit.h"
197#include "llvm/ADT/SmallVector.h"
198#include "llvm/ADT/Statistic.h"
214#include "llvm/IR/Attributes.h"
215#include "llvm/IR/CallingConv.h"
216#include "llvm/IR/DataLayout.h"
217#include "llvm/IR/DebugLoc.h"
218#include "llvm/IR/Function.h"
219#include "llvm/MC/MCAsmInfo.h"
220#include "llvm/MC/MCDwarf.h"
222#include "llvm/Support/Debug.h"
228#include <cassert>
229#include <cstdint>
230#include <iterator>
231#include <optional>
232#include <vector>
233
234using namespace llvm;
235
236#define DEBUG_TYPE "frame-info"
237
238static cl::opt<bool> EnableRedZone("aarch64-redzone",
239 cl::desc("enable use of redzone on AArch64"),
240 cl::init(false), cl::Hidden);
241
242static cl::opt<bool>
243 ReverseCSRRestoreSeq("reverse-csr-restore-seq",
244 cl::desc("reverse the CSR restore sequence"),
245 cl::init(false), cl::Hidden);
246
248 "stack-tagging-merge-settag",
249 cl::desc("merge settag instruction in function epilog"), cl::init(true),
250 cl::Hidden);
251
252static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
253 cl::desc("sort stack allocations"),
254 cl::init(true), cl::Hidden);
255
257 "homogeneous-prolog-epilog", cl::Hidden,
258 cl::desc("Emit homogeneous prologue and epilogue for the size "
259 "optimization (default = off)"));
260
261STATISTIC(NumRedZoneFunctions, "Number of functions using red zone");
262
263/// Returns how much of the incoming argument stack area (in bytes) we should
264/// clean up in an epilogue. For the C calling convention this will be 0, for
265/// guaranteed tail call conventions it can be positive (a normal return or a
266/// tail call to a function that uses less stack space for arguments) or
267/// negative (for a tail call to a function that needs more stack space than us
268/// for arguments).
269static int64_t getArgumentStackToRestore(MachineFunction &MF,
270                                         MachineBasicBlock &MBB) {
271 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
272 bool IsTailCallReturn = false;
273 if (MBB.end() != MBBI) {
274 unsigned RetOpcode = MBBI->getOpcode();
275 IsTailCallReturn = RetOpcode == AArch64::TCRETURNdi ||
276 RetOpcode == AArch64::TCRETURNri ||
277 RetOpcode == AArch64::TCRETURNriBTI;
278 }
279 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
280
281 int64_t ArgumentPopSize = 0;
282 if (IsTailCallReturn) {
283 MachineOperand &StackAdjust = MBBI->getOperand(1);
284
285 // For a tail-call in a callee-pops-arguments environment, some or all of
286 // the stack may actually be in use for the call's arguments, this is
287 // calculated during LowerCall and consumed here...
288 ArgumentPopSize = StackAdjust.getImm();
289 } else {
290 // ... otherwise the amount to pop is *all* of the argument space,
291 // conveniently stored in the MachineFunctionInfo by
292 // LowerFormalArguments. This will, of course, be zero for the C calling
293 // convention.
294 ArgumentPopSize = AFI->getArgumentStackToRestore();
295 }
296
297 return ArgumentPopSize;
298}
299
300static bool produceCompactUnwindFrame(MachineFunction &MF);
301static bool needsWinCFI(const MachineFunction &MF);
302static StackOffset getSVEStackSize(const MachineFunction &MF);
303
304
305/// Returns true if a homogeneous prolog or epilog code can be emitted
306/// for the size optimization. If possible, a frame helper call is injected.
307/// When Exit block is given, this check is for epilog.
308bool AArch64FrameLowering::homogeneousPrologEpilog(
309 MachineFunction &MF, MachineBasicBlock *Exit) const {
310 if (!MF.getFunction().hasMinSize())
311 return false;
312 if (!EnableHomogeneousPrologEpilog)
313 return false;
314 if (ReverseCSRRestoreSeq)
315 return false;
316 if (EnableRedZone)
317 return false;
318
319 // TODO: Windows is not supported yet.
320 if (needsWinCFI(MF))
321 return false;
322 // TODO: SVE is not supported yet.
323 if (getSVEStackSize(MF))
324 return false;
325
326 // Bail on stack adjustment needed on return for simplicity.
327 const MachineFrameInfo &MFI = MF.getFrameInfo();
328 const AArch64RegisterInfo *RegInfo = MF.getSubtarget<AArch64Subtarget>().getRegisterInfo();
329 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
330 return false;
331 if (Exit && getArgumentStackToRestore(MF, *Exit))
332 return false;
333
334 return true;
335}
336
337/// Returns true if CSRs should be paired.
338bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
339 return produceCompactUnwindFrame(MF) || homogeneousPrologEpilog(MF);
340}
341
342/// This is the biggest offset to the stack pointer we can encode in aarch64
343/// instructions (without using a separate calculation and a temp register).
344/// Note that the exception here are vector stores/loads which cannot encode any
345/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
346static const unsigned DefaultSafeSPDisplacement = 255;
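// Illustrative consequence: 255 is the largest positive offset of the
// unscaled 9-bit signed addressing mode, so 'ldur x0, [sp, #255]' encodes
// directly, while an offset of 256 would force instructions limited to that
// range to compute the address in a scratch register first.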
347
348/// Look at each instruction that references stack frames and return the stack
349/// size limit beyond which some of these instructions will require a scratch
350/// register during their expansion later.
351static unsigned estimateRSStackSizeLimit(MachineFunction &MF) {
352 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
353 // range. We'll end up allocating an unnecessary spill slot a lot, but
354 // realistically that's not a big deal at this stage of the game.
355 for (MachineBasicBlock &MBB : MF) {
356 for (MachineInstr &MI : MBB) {
357 if (MI.isDebugInstr() || MI.isPseudo() ||
358 MI.getOpcode() == AArch64::ADDXri ||
359 MI.getOpcode() == AArch64::ADDSXri)
360 continue;
361
362 for (const MachineOperand &MO : MI.operands()) {
363 if (!MO.isFI())
364 continue;
365
366 StackOffset Offset;
367 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
368 AArch64FrameOffsetCannotUpdate)
369 return 0;
370 }
371 }
372 }
373 return DefaultSafeSPDisplacement;
374}
375
376TargetStackID::Value
377AArch64FrameLowering::getStackIDForScalableVectors() const {
378 return TargetStackID::ScalableVector;
379}
380
381/// Returns the size of the fixed object area (allocated next to sp on entry)
382/// On Win64 this may include a var args area and an UnwindHelp object for EH.
383static unsigned getFixedObjectSize(const MachineFunction &MF,
384 const AArch64FunctionInfo *AFI, bool IsWin64,
385 bool IsFunclet) {
386 if (!IsWin64 || IsFunclet) {
387 return AFI->getTailCallReservedStack();
388 } else {
389 if (AFI->getTailCallReservedStack() != 0)
390 report_fatal_error("cannot generate ABI-changing tail call for Win64");
391 // Var args are stored here in the primary function.
392 const unsigned VarArgsArea = AFI->getVarArgsGPRSize();
393 // To support EH funclets we allocate an UnwindHelp object
394 const unsigned UnwindHelpObject = (MF.hasEHFunclets() ? 8 : 0);
395 return alignTo(VarArgsArea + UnwindHelpObject, 16);
396 }
397}
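// Worked example (hypothetical numbers): a Win64 non-funclet entry with 40
// bytes of register-passed varargs spilled and EH funclets present reserves
// alignTo(40 + 8, 16) == 48 bytes of fixed objects; a funclet of the same
// function reserves only the tail-call stack, which is normally 0.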
398
399/// Returns the size of the entire SVE stackframe (calleesaves + spills).
400static StackOffset getSVEStackSize(const MachineFunction &MF) {
401 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
402 return StackOffset::getScalable((int64_t)AFI->getStackSizeSVE());
403}
404
405bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const {
406 if (!EnableRedZone)
407 return false;
408
409 // Don't use the red zone if the function explicitly asks us not to.
410 // This is typically used for kernel code.
411 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
412 const unsigned RedZoneSize =
413 Subtarget.getTargetLowering()->getRedZoneSize(MF.getFunction());
414 if (!RedZoneSize)
415 return false;
416
417 const MachineFrameInfo &MFI = MF.getFrameInfo();
418 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
419 uint64_t NumBytes = AFI->getLocalStackSize();
420
421 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
422 getSVEStackSize(MF));
423}
424
425/// hasFP - Return true if the specified function should have a dedicated frame
426/// pointer register.
427bool AArch64FrameLowering::hasFP(const MachineFunction &MF) const {
428 const MachineFrameInfo &MFI = MF.getFrameInfo();
429 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
430
431 // Win64 EH requires a frame pointer if funclets are present, as the locals
432 // are accessed off the frame pointer in both the parent function and the
433 // funclets.
434 if (MF.hasEHFunclets())
435 return true;
436 // Retain behavior of always omitting the FP for leaf functions when possible.
437 if (MF.getTarget().Options.DisableFramePointerElim(MF))
438 return true;
439 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
440 MFI.hasStackMap() || MFI.hasPatchPoint() ||
441 RegInfo->hasStackRealignment(MF))
442 return true;
443 // With large callframes around we may need to use FP to access the scavenging
444 // emergency spillslot.
445 //
446 // Unfortunately some calls to hasFP() like machine verifier ->
447 // getReservedReg() -> hasFP in the middle of global isel are too early
448 // to know the max call frame size. Hopefully conservatively returning "true"
449 // in those cases is fine.
450 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
451 if (!MFI.isMaxCallFrameSizeComputed() ||
452 MFI.getMaxCallFrameSize() > DefaultSafeSPDisplacement)
453 return true;
454
455 return false;
456}
457
458/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
459/// not required, we reserve argument space for call sites in the function
460/// immediately on entry to the current function. This eliminates the need for
461/// add/sub sp brackets around call sites. Returns true if the call frame is
462/// included as part of the stack frame.
463bool
464AArch64FrameLowering::hasReservedCallFrame(const MachineFunction &MF) const {
465 return !MF.getFrameInfo().hasVarSizedObjects();
466}
467
468MachineBasicBlock::iterator AArch64FrameLowering::eliminateCallFramePseudoInstr(
469 MachineFunction &MF, MachineBasicBlock &MBB,
470 MachineBasicBlock::iterator I) const {
471 const AArch64InstrInfo *TII =
472 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
473 DebugLoc DL = I->getDebugLoc();
474 unsigned Opc = I->getOpcode();
475 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
476 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
477
478 if (!hasReservedCallFrame(MF)) {
479 int64_t Amount = I->getOperand(0).getImm();
480 Amount = alignTo(Amount, getStackAlign());
481 if (!IsDestroy)
482 Amount = -Amount;
483
484 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
485 // doesn't have to pop anything), then the first operand will be zero too so
486 // this adjustment is a no-op.
487 if (CalleePopAmount == 0) {
488 // FIXME: in-function stack adjustment for calls is limited to 24-bits
489 // because there's no guaranteed temporary register available.
490 //
491 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
492 // 1) For offset <= 12-bit, we use LSL #0
493 // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
494 // LSL #0, and the other uses LSL #12.
495 //
496 // Most call frames will be allocated at the start of a function so
497 // this is OK, but it is a limitation that needs dealing with.
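      // Illustrative split of a 24-bit adjustment (values made up):
      //   sub sp, sp, #0x123, lsl #12   ; subtracts 0x123000
      //   sub sp, sp, #0x456            ; subtracts the remaining 0x456
      // which together apply 0x123456 without needing a scratch register.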
498 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
499 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
500 StackOffset::getFixed(Amount), TII);
501 }
502 } else if (CalleePopAmount != 0) {
503 // If the calling convention demands that the callee pops arguments from the
504 // stack, we want to add it back if we have a reserved call frame.
505 assert(CalleePopAmount < 0xffffff && "call frame too large");
506 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
507 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
508 }
509 return MBB.erase(I);
510}
511
512void AArch64FrameLowering::emitCalleeSavedGPRLocations(
513 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
514 MachineFunction &MF = *MBB.getParent();
515 MachineFrameInfo &MFI = MF.getFrameInfo();
516
517 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
518 if (CSI.empty())
519 return;
520
521 const TargetSubtargetInfo &STI = MF.getSubtarget();
522 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
523 const TargetInstrInfo &TII = *STI.getInstrInfo();
524 DebugLoc DL = MBB.findDebugLoc(MBBI);
525
526 for (const auto &Info : CSI) {
527 if (MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector)
528 continue;
529
530 assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
531 unsigned DwarfReg = TRI.getDwarfRegNum(Info.getReg(), true);
532
533 int64_t Offset =
534 MFI.getObjectOffset(Info.getFrameIdx()) - getOffsetOfLocalArea();
535 unsigned CFIIndex = MF.addFrameInst(
536 MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset));
537 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
538 .addCFIIndex(CFIIndex)
539 .setMIFlags(MachineInstr::FrameSetup);
540 }
541}
542
543void AArch64FrameLowering::emitCalleeSavedSVELocations(
544 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
545 MachineFunction &MF = *MBB.getParent();
546 MachineFrameInfo &MFI = MF.getFrameInfo();
547
548 // Add callee saved registers to move list.
549 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
550 if (CSI.empty())
551 return;
552
553 const TargetSubtargetInfo &STI = MF.getSubtarget();
554 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
555 const TargetInstrInfo &TII = *STI.getInstrInfo();
556 AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
557 DebugLoc DL = MBB.findDebugLoc(MBBI);
558
559 for (const auto &Info : CSI) {
560 if (!(MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
561 continue;
562
563 // Not all unwinders may know about SVE registers, so assume the lowest
564 // common denominator.
565 assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
566 unsigned Reg = Info.getReg();
567 if (!static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
568 continue;
569
570 StackOffset Offset =
571 StackOffset::getScalable(MFI.getObjectOffset(Info.getFrameIdx())) -
572 StackOffset::getFixed(AFI.getCalleeSavedStackSize(MFI));
573
574 unsigned CFIIndex = MF.addFrameInst(createCFAOffset(TRI, Reg, Offset));
575 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
576 .addCFIIndex(CFIIndex)
577 .setMIFlags(MachineInstr::FrameSetup);
578 }
579}
580
581static void insertCFISameValue(const MCInstrDesc &Desc, MachineFunction &MF,
582 MachineBasicBlock &MBB,
583 MachineBasicBlock::iterator InsertPt,
584 unsigned DwarfReg) {
585 unsigned CFIIndex =
586 MF.addFrameInst(MCCFIInstruction::createSameValue(nullptr, DwarfReg));
587 BuildMI(MBB, InsertPt, DebugLoc(), Desc).addCFIIndex(CFIIndex);
588}
589
590void AArch64FrameLowering::resetCFIToInitialState(
591 MachineBasicBlock &MBB) const {
592
593 MachineFunction &MF = *MBB.getParent();
594 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
595 const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
596 const auto &TRI =
597 static_cast<const AArch64RegisterInfo &>(*Subtarget.getRegisterInfo());
598 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
599
600 const MCInstrDesc &CFIDesc = TII.get(TargetOpcode::CFI_INSTRUCTION);
601 DebugLoc DL;
602
603 // Reset the CFA to `SP + 0`.
604 MachineBasicBlock::iterator InsertPt = MBB.begin();
605 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
606 nullptr, TRI.getDwarfRegNum(AArch64::SP, true), 0));
607 BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);
608
609 // Flip the RA sign state.
610 if (MFI.shouldSignReturnAddress(MF)) {
611 CFIIndex = MF.addFrameInst(MCCFIInstruction::createNegateRAState(nullptr));
612 BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);
613 }
614
615 // Shadow call stack uses X18, reset it.
616 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
617 insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
618 TRI.getDwarfRegNum(AArch64::X18, true));
619
620 // Emit .cfi_same_value for callee-saved registers.
621 const std::vector<CalleeSavedInfo> &CSI =
622 MF.getFrameInfo().getCalleeSavedInfo();
623 for (const auto &Info : CSI) {
624 unsigned Reg = Info.getReg();
625 if (!TRI.regNeedsCFI(Reg, Reg))
626 continue;
627 insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
628 TRI.getDwarfRegNum(Reg, true));
629 }
630}
631
632static void emitCalleeSavedRestores(MachineBasicBlock &MBB,
633 MachineBasicBlock::iterator MBBI,
634 bool SVE) {
635 MachineFunction &MF = *MBB.getParent();
636 MachineFrameInfo &MFI = MF.getFrameInfo();
637
638 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
639 if (CSI.empty())
640 return;
641
642 const TargetSubtargetInfo &STI = MF.getSubtarget();
643 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
644 const TargetInstrInfo &TII = *STI.getInstrInfo();
645 DebugLoc DL = MBB.findDebugLoc(MBBI);
646
647 for (const auto &Info : CSI) {
648 if (SVE !=
649 (MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
650 continue;
651
652 unsigned Reg = Info.getReg();
653 if (SVE &&
654 !static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
655 continue;
656
657 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createRestore(
658 nullptr, TRI.getDwarfRegNum(Info.getReg(), true)));
659 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
660 .addCFIIndex(CFIIndex)
661 .setMIFlags(MachineInstr::FrameDestroy);
662 }
663}
664
665void AArch64FrameLowering::emitCalleeSavedGPRRestores(
666 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
667 emitCalleeSavedRestores(MBB, MBBI, false);
668}
669
670void AArch64FrameLowering::emitCalleeSavedSVERestores(
671 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
672 emitCalleeSavedRestores(MBB, MBBI, true);
673}
674
675static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE) {
676 switch (Reg.id()) {
677 default:
678 // The called routine is expected to preserve r19-r28
679 // r29 and r30 are used as frame pointer and link register resp.
680 return 0;
681
682 // GPRs
683#define CASE(n) \
684 case AArch64::W##n: \
685 case AArch64::X##n: \
686 return AArch64::X##n
687 CASE(0);
688 CASE(1);
689 CASE(2);
690 CASE(3);
691 CASE(4);
692 CASE(5);
693 CASE(6);
694 CASE(7);
695 CASE(8);
696 CASE(9);
697 CASE(10);
698 CASE(11);
699 CASE(12);
700 CASE(13);
701 CASE(14);
702 CASE(15);
703 CASE(16);
704 CASE(17);
705 CASE(18);
706#undef CASE
707
708 // FPRs
709#define CASE(n) \
710 case AArch64::B##n: \
711 case AArch64::H##n: \
712 case AArch64::S##n: \
713 case AArch64::D##n: \
714 case AArch64::Q##n: \
715 return HasSVE ? AArch64::Z##n : AArch64::Q##n
716 CASE(0);
717 CASE(1);
718 CASE(2);
719 CASE(3);
720 CASE(4);
721 CASE(5);
722 CASE(6);
723 CASE(7);
724 CASE(8);
725 CASE(9);
726 CASE(10);
727 CASE(11);
728 CASE(12);
729 CASE(13);
730 CASE(14);
731 CASE(15);
732 CASE(16);
733 CASE(17);
734 CASE(18);
735 CASE(19);
736 CASE(20);
737 CASE(21);
738 CASE(22);
739 CASE(23);
740 CASE(24);
741 CASE(25);
742 CASE(26);
743 CASE(27);
744 CASE(28);
745 CASE(29);
746 CASE(30);
747 CASE(31);
748#undef CASE
749 }
750}
751
752void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
753 MachineBasicBlock &MBB) const {
754 // Insertion point.
755 MachineBasicBlock::iterator MBBI = MBB.begin();
756
757 // Fake a debug loc.
758 DebugLoc DL;
759 if (MBBI != MBB.end())
760 DL = MBBI->getDebugLoc();
761
762 const MachineFunction &MF = *MBB.getParent();
763 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
764 const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();
765
766 BitVector GPRsToZero(TRI.getNumRegs());
767 BitVector FPRsToZero(TRI.getNumRegs());
768 bool HasSVE = STI.hasSVE();
769 for (MCRegister Reg : RegsToZero.set_bits()) {
770 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
771 // For GPRs, we only care to clear out the 64-bit register.
772 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
773 GPRsToZero.set(XReg);
774 } else if (AArch64::FPR128RegClass.contains(Reg) ||
775 AArch64::FPR64RegClass.contains(Reg) ||
776 AArch64::FPR32RegClass.contains(Reg) ||
777 AArch64::FPR16RegClass.contains(Reg) ||
778 AArch64::FPR8RegClass.contains(Reg)) {
779 // For FPRs, zero the widest form of the register (Z if SVE, else Q).
780 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
781 FPRsToZero.set(XReg);
782 }
783 }
784
785 const AArch64InstrInfo &TII = *STI.getInstrInfo();
786
787 // Zero out GPRs.
788 for (MCRegister Reg : GPRsToZero.set_bits())
789 TII.buildClearRegister(Reg, MBB, MBBI, DL);
790
791 // Zero out FP/vector registers.
792 for (MCRegister Reg : FPRsToZero.set_bits())
793 TII.buildClearRegister(Reg, MBB, MBBI, DL);
794
795 if (HasSVE) {
796 for (MCRegister PReg :
797 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
798 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
799 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
800 AArch64::P15}) {
801 if (RegsToZero[PReg])
802 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
803 }
804 }
805}
806
807// Find a scratch register that we can use at the start of the prologue to
808// re-align the stack pointer. We avoid using callee-save registers since they
809// may appear to be free when this is called from canUseAsPrologue (during
810// shrink wrapping), but then no longer be free when this is called from
811// emitPrologue.
812//
813// FIXME: This is a bit conservative, since in the above case we could use one
814// of the callee-save registers as a scratch temp to re-align the stack pointer,
815// but we would then have to make sure that we were in fact saving at least one
816// callee-save register in the prologue, which is additional complexity that
817// doesn't seem worth the benefit.
818static Register findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB) {
819 MachineFunction *MF = MBB->getParent();
820
821 // If MBB is an entry block, use X9 as the scratch register
822 if (&MF->front() == MBB)
823 return AArch64::X9;
824
825 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
826 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
827 LivePhysRegs LiveRegs(TRI);
828 LiveRegs.addLiveIns(*MBB);
829
830 // Mark callee saved registers as used so we will not choose them.
831 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
832 for (unsigned i = 0; CSRegs[i]; ++i)
833 LiveRegs.addReg(CSRegs[i]);
834
835 // Prefer X9 since it was historically used for the prologue scratch reg.
836 const MachineRegisterInfo &MRI = MF->getRegInfo();
837 if (LiveRegs.available(MRI, AArch64::X9))
838 return AArch64::X9;
839
840 for (unsigned Reg : AArch64::GPR64RegClass) {
841 if (LiveRegs.available(MRI, Reg))
842 return Reg;
843 }
844 return AArch64::NoRegister;
845}
846
847bool AArch64FrameLowering::canUseAsPrologue(
848 const MachineBasicBlock &MBB) const {
849 const MachineFunction *MF = MBB.getParent();
850 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
851 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
852 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
853
854 // Don't need a scratch register if we're not going to re-align the stack.
855 if (!RegInfo->hasStackRealignment(*MF))
856 return true;
857 // Otherwise, we can use any block as long as it has a scratch register
858 // available.
859 return findScratchNonCalleeSaveRegister(TmpMBB) != AArch64::NoRegister;
860}
861
862static bool windowsRequiresStackProbe(MachineFunction &MF,
863 uint64_t StackSizeInBytes) {
864 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
865 if (!Subtarget.isTargetWindows())
866 return false;
867 const Function &F = MF.getFunction();
868 // TODO: When implementing stack protectors, take that into account
869 // for the probe threshold.
870 unsigned StackProbeSize =
871 F.getFnAttributeAsParsedInteger("stack-probe-size", 4096);
872 return (StackSizeInBytes >= StackProbeSize) &&
873 !F.hasFnAttribute("no-stack-arg-probe");
874}
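// The 4096-byte default above can be tuned per function from IR; for
// example (illustrative IR, not from this file):
//   define void @f() "stack-probe-size"="8192" { ... }
// while the "no-stack-arg-probe" attribute suppresses the probe entirely.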
875
876static bool needsWinCFI(const MachineFunction &MF) {
877 const Function &F = MF.getFunction();
878 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
879 F.needsUnwindTableEntry();
880}
881
882bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
883 MachineFunction &MF, uint64_t StackBumpBytes) const {
884 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
885 const MachineFrameInfo &MFI = MF.getFrameInfo();
886 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
887 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
888 if (homogeneousPrologEpilog(MF))
889 return false;
890
891 if (AFI->getLocalStackSize() == 0)
892 return false;
893
894 // For WinCFI, if optimizing for size, prefer to not combine the stack bump
895 // (to force a stp with predecrement) to match the packed unwind format,
896 // provided that there actually are any callee saved registers to merge the
897 // decrement with.
898 // This is potentially marginally slower, but allows using the packed
899 // unwind format for functions that both have a local area and callee saved
900 // registers. Using the packed unwind format notably reduces the size of
901 // the unwind info.
902 if (needsWinCFI(MF) && AFI->getCalleeSavedStackSize() > 0 &&
903 MF.getFunction().hasOptSize())
904 return false;
905
906 // 512 is the maximum immediate for stp/ldp that will be used for
907 // callee-save save/restores
908 if (StackBumpBytes >= 512 || windowsRequiresStackProbe(MF, StackBumpBytes))
909 return false;
910
911 if (MFI.hasVarSizedObjects())
912 return false;
913
914 if (RegInfo->hasStackRealignment(MF))
915 return false;
916
917 // This isn't strictly necessary, but it simplifies things a bit since the
918 // current RedZone handling code assumes the SP is adjusted by the
919 // callee-save save/restore code.
920 if (canUseRedZone(MF))
921 return false;
922
923 // When there is an SVE area on the stack, always allocate the
924 // callee-saves and spills/locals separately.
925 if (getSVEStackSize(MF))
926 return false;
927
928 return true;
929}
930
931bool AArch64FrameLowering::shouldCombineCSRLocalStackBumpInEpilogue(
932 MachineBasicBlock &MBB, unsigned StackBumpBytes) const {
933 if (!shouldCombineCSRLocalStackBump(*MBB.getParent(), StackBumpBytes))
934 return false;
935
936 if (MBB.empty())
937 return true;
938
939 // Disable combined SP bump if the last instruction is an MTE tag store. It
940 // is almost always better to merge SP adjustment into those instructions.
941 MachineBasicBlock::iterator LastI = MBB.getFirstTerminator();
942 MachineBasicBlock::iterator Begin = MBB.begin();
943 while (LastI != Begin) {
944 --LastI;
945 if (LastI->isTransient())
946 continue;
947 if (!LastI->getFlag(MachineInstr::FrameDestroy))
948 break;
949 }
950 switch (LastI->getOpcode()) {
951 case AArch64::STGloop:
952 case AArch64::STZGloop:
953 case AArch64::STGi:
954 case AArch64::STZGi:
955 case AArch64::ST2Gi:
956 case AArch64::STZ2Gi:
957 return false;
958 default:
959 return true;
960 }
961 llvm_unreachable("unreachable");
962}
963
964// Given a load or a store instruction, generate an appropriate unwinding SEH
965// code on Windows.
966static MachineBasicBlock::iterator InsertSEH(MachineBasicBlock::iterator MBBI,
967 const TargetInstrInfo &TII,
968 MachineInstr::MIFlag Flag) {
969 unsigned Opc = MBBI->getOpcode();
970 MachineBasicBlock *MBB = MBBI->getParent();
971 MachineFunction &MF = *MBB->getParent();
972 DebugLoc DL = MBBI->getDebugLoc();
973 unsigned ImmIdx = MBBI->getNumOperands() - 1;
974 int Imm = MBBI->getOperand(ImmIdx).getImm();
975 MachineInstrBuilder MIB;
976 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
977 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
978
979 switch (Opc) {
980 default:
981 llvm_unreachable("No SEH Opcode for this instruction");
982 case AArch64::LDPDpost:
983 Imm = -Imm;
984 [[fallthrough]];
985 case AArch64::STPDpre: {
986 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
987 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
988 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
989 .addImm(Reg0)
990 .addImm(Reg1)
991 .addImm(Imm * 8)
992 .setMIFlag(Flag);
993 break;
994 }
995 case AArch64::LDPXpost:
996 Imm = -Imm;
997 [[fallthrough]];
998 case AArch64::STPXpre: {
999 Register Reg0 = MBBI->getOperand(1).getReg();
1000 Register Reg1 = MBBI->getOperand(2).getReg();
1001 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1002 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1003 .addImm(Imm * 8)
1004 .setMIFlag(Flag);
1005 else
1006 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1007 .addImm(RegInfo->getSEHRegNum(Reg0))
1008 .addImm(RegInfo->getSEHRegNum(Reg1))
1009 .addImm(Imm * 8)
1010 .setMIFlag(Flag);
1011 break;
1012 }
1013 case AArch64::LDRDpost:
1014 Imm = -Imm;
1015 [[fallthrough]];
1016 case AArch64::STRDpre: {
1017 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1018 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1019 .addImm(Reg)
1020 .addImm(Imm)
1021 .setMIFlag(Flag);
1022 break;
1023 }
1024 case AArch64::LDRXpost:
1025 Imm = -Imm;
1026 [[fallthrough]];
1027 case AArch64::STRXpre: {
1028 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1029 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1030 .addImm(Reg)
1031 .addImm(Imm)
1032 .setMIFlag(Flag);
1033 break;
1034 }
1035 case AArch64::STPDi:
1036 case AArch64::LDPDi: {
1037 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1038 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1039 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1040 .addImm(Reg0)
1041 .addImm(Reg1)
1042 .addImm(Imm * 8)
1043 .setMIFlag(Flag);
1044 break;
1045 }
1046 case AArch64::STPXi:
1047 case AArch64::LDPXi: {
1048 Register Reg0 = MBBI->getOperand(0).getReg();
1049 Register Reg1 = MBBI->getOperand(1).getReg();
1050 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1051 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1052 .addImm(Imm * 8)
1053 .setMIFlag(Flag);
1054 else
1055 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1056 .addImm(RegInfo->getSEHRegNum(Reg0))
1057 .addImm(RegInfo->getSEHRegNum(Reg1))
1058 .addImm(Imm * 8)
1059 .setMIFlag(Flag);
1060 break;
1061 }
1062 case AArch64::STRXui:
1063 case AArch64::LDRXui: {
1064 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1065 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1066 .addImm(Reg)
1067 .addImm(Imm * 8)
1068 .setMIFlag(Flag);
1069 break;
1070 }
1071 case AArch64::STRDui:
1072 case AArch64::LDRDui: {
1073 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1074 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1075 .addImm(Reg)
1076 .addImm(Imm * 8)
1077 .setMIFlag(Flag);
1078 break;
1079 }
1080 }
1081 auto I = MBB->insertAfter(MBBI, MIB);
1082 return I;
1083}
1084
1085// Fix up the SEH opcode associated with the save/restore instruction.
1086static void fixupSEHOpcode(MachineBasicBlock::iterator MBBI,
1087 unsigned LocalStackSize) {
1088 MachineOperand *ImmOpnd = nullptr;
1089 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1090 switch (MBBI->getOpcode()) {
1091 default:
1092 llvm_unreachable("Fix the offset in the SEH instruction");
1093 case AArch64::SEH_SaveFPLR:
1094 case AArch64::SEH_SaveRegP:
1095 case AArch64::SEH_SaveReg:
1096 case AArch64::SEH_SaveFRegP:
1097 case AArch64::SEH_SaveFReg:
1098 ImmOpnd = &MBBI->getOperand(ImmIdx);
1099 break;
1100 }
1101 if (ImmOpnd)
1102 ImmOpnd->setImm(ImmOpnd->getImm() + LocalStackSize);
1103}
1104
1105// Convert callee-save register save/restore instruction to do stack pointer
1106// decrement/increment to allocate/deallocate the callee-save stack area by
1107// converting store/load to use pre/post increment version.
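// For example (sketch): with CSStackSizeInc == -16, the first callee-save
// store
//   stp x29, x30, [sp, #0]
// is rewritten to the allocating form
//   stp x29, x30, [sp, #-16]!
// folding the 'sub sp, sp, #16' into the store's pre-decrement.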
1108static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec(
1109 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
1110 const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc,
1111 bool NeedsWinCFI, bool *HasWinCFI, bool EmitCFI,
1112 MachineInstr::MIFlag FrameFlag = MachineInstr::FrameSetup,
1113 int CFAOffset = 0) {
1114 unsigned NewOpc;
1115 switch (MBBI->getOpcode()) {
1116 default:
1117 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1118 case AArch64::STPXi:
1119 NewOpc = AArch64::STPXpre;
1120 break;
1121 case AArch64::STPDi:
1122 NewOpc = AArch64::STPDpre;
1123 break;
1124 case AArch64::STPQi:
1125 NewOpc = AArch64::STPQpre;
1126 break;
1127 case AArch64::STRXui:
1128 NewOpc = AArch64::STRXpre;
1129 break;
1130 case AArch64::STRDui:
1131 NewOpc = AArch64::STRDpre;
1132 break;
1133 case AArch64::STRQui:
1134 NewOpc = AArch64::STRQpre;
1135 break;
1136 case AArch64::LDPXi:
1137 NewOpc = AArch64::LDPXpost;
1138 break;
1139 case AArch64::LDPDi:
1140 NewOpc = AArch64::LDPDpost;
1141 break;
1142 case AArch64::LDPQi:
1143 NewOpc = AArch64::LDPQpost;
1144 break;
1145 case AArch64::LDRXui:
1146 NewOpc = AArch64::LDRXpost;
1147 break;
1148 case AArch64::LDRDui:
1149 NewOpc = AArch64::LDRDpost;
1150 break;
1151 case AArch64::LDRQui:
1152 NewOpc = AArch64::LDRQpost;
1153 break;
1154 }
1155 // Get rid of the SEH code associated with the old instruction.
1156 if (NeedsWinCFI) {
1157 auto SEH = std::next(MBBI);
1158 if (AArch64InstrInfo::isSEHInstruction(*SEH))
1159 SEH->eraseFromParent();
1160 }
1161
1162 TypeSize Scale = TypeSize::Fixed(1);
1163 unsigned Width;
1164 int64_t MinOffset, MaxOffset;
1165 bool Success = static_cast<const AArch64InstrInfo *>(TII)->getMemOpInfo(
1166 NewOpc, Scale, Width, MinOffset, MaxOffset);
1167 (void)Success;
1168 assert(Success && "unknown load/store opcode");
1169
1170 // If the first store isn't right where we want SP then we can't fold the
1171 // update in so create a normal arithmetic instruction instead.
1172 MachineFunction &MF = *MBB.getParent();
1173 if (MBBI->getOperand(MBBI->getNumOperands() - 1).getImm() != 0 ||
1174 CSStackSizeInc < MinOffset || CSStackSizeInc > MaxOffset) {
1175 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1176 StackOffset::getFixed(CSStackSizeInc), TII, FrameFlag,
1177 false, false, nullptr, EmitCFI,
1178 StackOffset::getFixed(CFAOffset));
1179
1180 return std::prev(MBBI);
1181 }
1182
1183 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(NewOpc));
1184 MIB.addReg(AArch64::SP, RegState::Define);
1185
1186 // Copy all operands other than the immediate offset.
1187 unsigned OpndIdx = 0;
1188 for (unsigned OpndEnd = MBBI->getNumOperands() - 1; OpndIdx < OpndEnd;
1189 ++OpndIdx)
1190 MIB.add(MBBI->getOperand(OpndIdx));
1191
1192 assert(MBBI->getOperand(OpndIdx).getImm() == 0 &&
1193 "Unexpected immediate offset in first/last callee-save save/restore "
1194 "instruction!");
1195 assert(MBBI->getOperand(OpndIdx - 1).getReg() == AArch64::SP &&
1196 "Unexpected base register in callee-save save/restore instruction!");
1197 assert(CSStackSizeInc % Scale == 0);
1198 MIB.addImm(CSStackSizeInc / (int)Scale);
1199
1200 MIB.setMIFlags(MBBI->getFlags());
1201 MIB.setMemRefs(MBBI->memoperands());
1202
1203 // Generate a new SEH code that corresponds to the new instruction.
1204 if (NeedsWinCFI) {
1205 *HasWinCFI = true;
1206 InsertSEH(*MIB, *TII, FrameFlag);
1207 }
1208
1209 if (EmitCFI) {
1210 unsigned CFIIndex = MF.addFrameInst(
1211 MCCFIInstruction::cfiDefCfaOffset(nullptr, CFAOffset - CSStackSizeInc));
1212 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1213 .addCFIIndex(CFIIndex)
1214 .setMIFlags(FrameFlag);
1215 }
1216
1217 return std::prev(MBB.erase(MBBI));
1218}
1219
1220// Fixup callee-save register save/restore instructions to take into account
1221// combined SP bump by adding the local stack size to the stack offsets.
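// For example (sketch): with a combined SP bump and LocalStackSize == 48,
//   stp x28, x27, [sp, #16]   ; scaled offset 2 (8-byte units)
// becomes
//   stp x28, x27, [sp, #64]   ; scaled offset 2 + 48/8 == 8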
1222static void fixupCalleeSaveRestoreStackOffset(MachineInstr &MI,
1223 uint64_t LocalStackSize,
1224 bool NeedsWinCFI,
1225 bool *HasWinCFI) {
1226 if (AArch64InstrInfo::isSEHInstruction(MI))
1227 return;
1228
1229 unsigned Opc = MI.getOpcode();
1230 unsigned Scale;
1231 switch (Opc) {
1232 case AArch64::STPXi:
1233 case AArch64::STRXui:
1234 case AArch64::STPDi:
1235 case AArch64::STRDui:
1236 case AArch64::LDPXi:
1237 case AArch64::LDRXui:
1238 case AArch64::LDPDi:
1239 case AArch64::LDRDui:
1240 Scale = 8;
1241 break;
1242 case AArch64::STPQi:
1243 case AArch64::STRQui:
1244 case AArch64::LDPQi:
1245 case AArch64::LDRQui:
1246 Scale = 16;
1247 break;
1248 default:
1249 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1250 }
1251
1252 unsigned OffsetIdx = MI.getNumExplicitOperands() - 1;
1253 assert(MI.getOperand(OffsetIdx - 1).getReg() == AArch64::SP &&
1254 "Unexpected base register in callee-save save/restore instruction!");
1255 // Last operand is immediate offset that needs fixing.
1256 MachineOperand &OffsetOpnd = MI.getOperand(OffsetIdx);
1257 // All generated opcodes have scaled offsets.
1258 assert(LocalStackSize % Scale == 0);
1259 OffsetOpnd.setImm(OffsetOpnd.getImm() + LocalStackSize / Scale);
1260
1261 if (NeedsWinCFI) {
1262 *HasWinCFI = true;
1263 auto MBBI = std::next(MachineBasicBlock::iterator(MI));
1264 assert(MBBI != MI.getParent()->end() && "Expecting a valid instruction");
1266 "Expecting a SEH instruction");
1267 fixupSEHOpcode(MBBI, LocalStackSize);
1268 }
1269}
1270
1271static bool isTargetWindows(const MachineFunction &MF) {
1272 return MF.getSubtarget<AArch64Subtarget>().isTargetWindows();
1273}
1274
1275// Convenience function to determine whether I is an SVE callee save.
1276static bool IsSVECalleeSave(MachineBasicBlock::iterator I) {
1277 switch (I->getOpcode()) {
1278 default:
1279 return false;
1280 case AArch64::STR_ZXI:
1281 case AArch64::STR_PXI:
1282 case AArch64::LDR_ZXI:
1283 case AArch64::LDR_PXI:
1284 return I->getFlag(MachineInstr::FrameSetup) ||
1285 I->getFlag(MachineInstr::FrameDestroy);
1286 }
1287}
1288
1289static bool needsShadowCallStackPrologueEpilogue(MachineFunction &MF) {
1290 if (!(llvm::any_of(
1291 MF.getFrameInfo().getCalleeSavedInfo(),
1292 [](const auto &Info) { return Info.getReg() == AArch64::LR; }) &&
1293 MF.getFunction().hasFnAttribute(Attribute::ShadowCallStack)))
1294 return false;
1295
1296 if (!MF.getSubtarget<AArch64Subtarget>().isXRegisterReserved(18))
1297 report_fatal_error("Must reserve x18 to use shadow call stack");
1298
1299 return true;
1300}
1301
1302static void emitShadowCallStackPrologue(const TargetInstrInfo &TII,
1303 MachineFunction &MF,
1304 MachineBasicBlock &MBB,
1305 MachineBasicBlock::iterator MBBI,
1306 const DebugLoc &DL, bool NeedsWinCFI,
1307 bool NeedsUnwindInfo) {
1308 // Shadow call stack prolog: str x30, [x18], #8
1309 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXpost))
1310 .addReg(AArch64::X18, RegState::Define)
1311 .addReg(AArch64::LR)
1312 .addReg(AArch64::X18)
1313 .addImm(8)
1314 .setMIFlag(MachineInstr::FrameSetup);
1315
1316 // This instruction also makes x18 live-in to the entry block.
1317 MBB.addLiveIn(AArch64::X18);
1318
1319 if (NeedsWinCFI)
1320 BuildMI(MBB, MBBI, DL, TII.get(AArch64::SEH_Nop))
1321 .setMIFlag(MachineInstr::FrameSetup);
1322
1323 if (NeedsUnwindInfo) {
1324 // Emit a CFI instruction that causes 8 to be subtracted from the value of
1325 // x18 when unwinding past this frame.
1326 static const char CFIInst[] = {
1327 dwarf::DW_CFA_val_expression,
1328 18, // register
1329 2, // length
1330 static_cast<char>(unsigned(dwarf::DW_OP_breg18)),
1331 static_cast<char>(-8) & 0x7f, // addend (sleb128)
1332 };
1333 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createEscape(
1334 nullptr, StringRef(CFIInst, sizeof(CFIInst))));
1335 BuildMI(MBB, MBBI, DL, TII.get(AArch64::CFI_INSTRUCTION))
1336 .addCFIIndex(CFIIndex)
1337 .setMIFlag(MachineInstr::FrameSetup);
1338 }
1339}
1340
1341static void emitShadowCallStackEpilogue(const TargetInstrInfo &TII,
1342 MachineFunction &MF,
1343 MachineBasicBlock &MBB,
1344 MachineBasicBlock::iterator MBBI,
1345 const DebugLoc &DL) {
1346 // Shadow call stack epilog: ldr x30, [x18, #-8]!
1347 BuildMI(MBB, MBBI, DL, TII.get(AArch64::LDRXpre))
1348 .addReg(AArch64::X18, RegState::Define)
1349 .addReg(AArch64::LR, RegState::Define)
1350 .addReg(AArch64::X18)
1351 .addImm(-8)
1352 .setMIFlag(MachineInstr::FrameDestroy);
1353
1354 if (MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF)) {
1355 unsigned CFIIndex =
1356 MF.addFrameInst(MCCFIInstruction::createRestore(nullptr, 18));
1357 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
1358 .addCFIIndex(CFIIndex)
1359 .setMIFlags(MachineInstr::FrameDestroy);
1360 }
1361}
1362
1363// Define the current CFA rule to use the provided FP.
1364static void emitDefineCFAWithFP(MachineFunction &MF, MachineBasicBlock &MBB,
1365 MachineBasicBlock::iterator MBBI,
1366 const DebugLoc &DL, unsigned FixedObject) {
1367 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
1368 const TargetRegisterInfo *TRI = STI.getRegisterInfo();
1369 const TargetInstrInfo *TII = STI.getInstrInfo();
1370 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1371
1372 const int OffsetToFirstCalleeSaveFromFP =
1373 AFI->getCalleeSaveBaseToFrameRecordOffset() -
1374 AFI->getCalleeSavedStackSize();
1375 Register FramePtr = TRI->getFrameRegister(MF);
1376 unsigned Reg = TRI->getDwarfRegNum(FramePtr, true);
1377 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
1378 nullptr, Reg, FixedObject - OffsetToFirstCalleeSaveFromFP));
1379 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1380 .addCFIIndex(CFIIndex)
1381 .setMIFlags(MachineInstr::FrameSetup);
1382}
1383
1384void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
1385 MachineBasicBlock &MBB) const {
1386 MachineBasicBlock::iterator MBBI = MBB.begin();
1387 const MachineFrameInfo &MFI = MF.getFrameInfo();
1388 const Function &F = MF.getFunction();
1389 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1390 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1391 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1392 MachineModuleInfo &MMI = MF.getMMI();
1393 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1394 bool EmitCFI = AFI->needsDwarfUnwindInfo(MF);
1395 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
1396 bool HasFP = hasFP(MF);
1397 bool NeedsWinCFI = needsWinCFI(MF);
1398 bool HasWinCFI = false;
1399 auto Cleanup = make_scope_exit([&]() { MF.setHasWinCFI(HasWinCFI); });
1400
1401 bool IsFunclet = MBB.isEHFuncletEntry();
1402
1403 // At this point, we're going to decide whether or not the function uses a
1404 // redzone. In most cases, the function doesn't have a redzone so let's
1405 // assume that's false and set it to true in the case that there's a redzone.
1406 AFI->setHasRedZone(false);
1407
1408 // Debug location must be unknown since the first debug location is used
1409 // to determine the end of the prologue.
1410 DebugLoc DL;
1411
1412 const auto &MFnI = *MF.getInfo<AArch64FunctionInfo>();
1413 if (MFnI.needsShadowCallStackPrologueEpilogue(MF))
1414 emitShadowCallStackPrologue(*TII, MF, MBB, MBBI, DL, NeedsWinCFI,
1415 MFnI.needsDwarfUnwindInfo(MF));
1416
1417 if (MFnI.shouldSignReturnAddress(MF)) {
1418 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1419 .setMIFlag(MachineInstr::FrameSetup);
1420 if (NeedsWinCFI)
1421 HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
1422 }
1423
1424 if (EmitCFI && MFnI.isMTETagged()) {
1425 BuildMI(MBB, MBBI, DL, TII->get(AArch64::EMITMTETAGGED))
1426 .setMIFlag(MachineInstr::FrameSetup);
1427 }
1428
1429 // We signal the presence of a Swift extended frame to external tools by
1430 // storing FP with 0b0001 in bits 63:60. In normal userland operation a simple
1431 // ORR is sufficient, it is assumed a Swift kernel would initialize the TBI
1432 // bits so that is still true.
1433 if (HasFP && AFI->hasSwiftAsyncContext()) {
1434 switch (MF.getTarget().Options.SwiftAsyncFramePointer) {
1435 case SwiftAsyncFramePointerMode::DeploymentBased:
1436 if (Subtarget.swiftAsyncContextIsDynamicallySet()) {
1437 // The special symbol below is absolute and has a *value* that can be
1438 // combined with the frame pointer to signal an extended frame.
1439 BuildMI(MBB, MBBI, DL, TII->get(AArch64::LOADgot), AArch64::X16)
1440 .addExternalSymbol("swift_async_extendedFramePointerFlags",
1441 AArch64II::MO_GOT);
1442 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXrs), AArch64::FP)
1443 .addUse(AArch64::FP)
1444 .addUse(AArch64::X16)
1445 .addImm(Subtarget.isTargetILP32() ? 32 : 0);
1446 break;
1447 }
1448 [[fallthrough]];
1449
1451 // ORR x29, x29, #0x1000_0000_0000_0000
1452 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXri), AArch64::FP)
1453 .addUse(AArch64::FP)
1454 .addImm(0x1100)
1455 .setMIFlag(MachineInstr::FrameSetup);
1456 break;
1457
1458 case SwiftAsyncFramePointerMode::Never:
1459 break;
1460 }
1461 }
1462
1463 // All calls are tail calls in GHC calling conv, and functions have no
1464 // prologue/epilogue.
1465 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
1466 return;
1467
1468 // Set tagged base pointer to the requested stack slot.
1469 // Ideally it should match SP value after prologue.
1470 std::optional<int> TBPI = AFI->getTaggedBasePointerIndex();
1471 if (TBPI)
1472 AFI->setTaggedBasePointerOffset(-MFI.getObjectOffset(*TBPI));
1473 else
1474 AFI->setTaggedBasePointerOffset(MFI.getStackSize());
1475
1476 const StackOffset &SVEStackSize = getSVEStackSize(MF);
1477
1478 // getStackSize() includes all the locals in its size calculation. We don't
1479 // include these locals when computing the stack size of a funclet, as they
1480 // are allocated in the parent's stack frame and accessed via the frame
1481 // pointer from the funclet. We only save the callee saved registers in the
1482 // funclet, which are really the callee saved registers of the parent
1483 // function, including the funclet.
1484 int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
1485 : MFI.getStackSize();
1486 if (!AFI->hasStackFrame() && !windowsRequiresStackProbe(MF, NumBytes)) {
1487 assert(!HasFP && "unexpected function without stack frame but with FP");
1488 assert(!SVEStackSize &&
1489 "unexpected function without stack frame but with SVE objects");
1490 // All of the stack allocation is for locals.
1491 AFI->setLocalStackSize(NumBytes);
1492 if (!NumBytes)
1493 return;
1494 // REDZONE: If the stack size is less than 128 bytes, we don't need
1495 // to actually allocate.
1496 if (canUseRedZone(MF)) {
1497 AFI->setHasRedZone(true);
1498 ++NumRedZoneFunctions;
1499 } else {
1500 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1501 StackOffset::getFixed(-NumBytes), TII,
1502 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
1503 if (EmitCFI) {
1504 // Label used to tie together the PROLOG_LABEL and the MachineMoves.
1505 MCSymbol *FrameLabel = MMI.getContext().createTempSymbol();
1506 // Encode the stack size of the leaf function.
1507 unsigned CFIIndex = MF.addFrameInst(
1508 MCCFIInstruction::cfiDefCfaOffset(FrameLabel, NumBytes));
1509 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1510 .addCFIIndex(CFIIndex)
1511 .setMIFlags(MachineInstr::FrameSetup);
1512 }
1513 }
1514
1515 if (NeedsWinCFI) {
1516 HasWinCFI = true;
1517 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1518 .setMIFlag(MachineInstr::FrameSetup);
1519 }
1520
1521 return;
1522 }
1523
1524 bool IsWin64 =
1525 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv());
1526 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
1527
1528 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
1529 // All of the remaining stack allocations are for locals.
1530 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
1531 bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);
1532 bool HomPrologEpilog = homogeneousPrologEpilog(MF);
1533 if (CombineSPBump) {
1534 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
1535 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1536 StackOffset::getFixed(-NumBytes), TII,
1537 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI,
1538 EmitAsyncCFI);
1539 NumBytes = 0;
1540 } else if (HomPrologEpilog) {
1541 // Stack has been already adjusted.
1542 NumBytes -= PrologueSaveSize;
1543 } else if (PrologueSaveSize != 0) {
1544 MBBI = convertCalleeSaveRestoreToSPPrePostIncDec(
1545 MBB, MBBI, DL, TII, -PrologueSaveSize, NeedsWinCFI, &HasWinCFI,
1546 EmitAsyncCFI);
1547 NumBytes -= PrologueSaveSize;
1548 }
1549 assert(NumBytes >= 0 && "Negative stack allocation size!?");
1550
1551 // Move past the saves of the callee-saved registers, fixing up the offsets
1552 // and pre-inc if we decided to combine the callee-save and local stack
1553 // pointer bump above.
1554 MachineBasicBlock::iterator End = MBB.end();
1555 while (MBBI != End && MBBI->getFlag(MachineInstr::FrameSetup) &&
1556 !IsSVECalleeSave(MBBI)) {
1557 if (CombineSPBump)
1558 fixupCalleeSaveRestoreStackOffset(*MBBI, AFI->getLocalStackSize(),
1559 NeedsWinCFI, &HasWinCFI);
1560 ++MBBI;
1561 }
1562
1563 // For funclets the FP belongs to the containing function.
1564 if (!IsFunclet && HasFP) {
1565 // Only set up FP if we actually need to.
1566 int64_t FPOffset = AFI->getCalleeSaveBaseToFrameRecordOffset();
1567
1568 if (CombineSPBump)
1569 FPOffset += AFI->getLocalStackSize();
1570
1571 if (AFI->hasSwiftAsyncContext()) {
1572 // Before we update the live FP we have to ensure there's a valid (or
1573 // null) asynchronous context in its slot just before FP in the frame
1574 // record, so store it now.
1575 const auto &Attrs = MF.getFunction().getAttributes();
1576 bool HaveInitialContext = Attrs.hasAttrSomewhere(Attribute::SwiftAsync);
1577 if (HaveInitialContext)
1578 MBB.addLiveIn(AArch64::X22);
1579 BuildMI(MBB, MBBI, DL, TII->get(AArch64::StoreSwiftAsyncContext))
1580 .addUse(HaveInitialContext ? AArch64::X22 : AArch64::XZR)
1581 .addUse(AArch64::SP)
1582 .addImm(FPOffset - 8)
1583 .setMIFlag(MachineInstr::FrameSetup);
1584 }
1585
1586 if (HomPrologEpilog) {
1587 auto Prolog = MBBI;
1588 --Prolog;
1589 assert(Prolog->getOpcode() == AArch64::HOM_Prolog);
1590 Prolog->addOperand(MachineOperand::CreateImm(FPOffset));
1591 } else {
1592 // Issue sub fp, sp, FPOffset or
1593 // mov fp,sp when FPOffset is zero.
1594 // Note: All stores of callee-saved registers are marked as "FrameSetup".
1595 // This code marks the instruction(s) that set the FP also.
1596 emitFrameOffset(MBB, MBBI, DL, AArch64::FP, AArch64::SP,
1597 StackOffset::getFixed(FPOffset), TII,
1598 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
1599 if (NeedsWinCFI && HasWinCFI) {
1600 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1601 .setMIFlag(MachineInstr::FrameSetup);
1602 // After setting up the FP, the rest of the prolog doesn't need to be
1603 // included in the SEH unwind info.
1604 NeedsWinCFI = false;
1605 }
1606 }
1607 if (EmitAsyncCFI)
1608 emitDefineCFAWithFP(MF, MBB, MBBI, DL, FixedObject);
1609 }
1610
1611 // Now emit the moves for whatever callee saved regs we have (including FP,
1612 // LR if those are saved). Frame instructions for SVE register are emitted
1613 // later, after the instruction which actually save SVE regs.
1614 if (EmitAsyncCFI)
1615 emitCalleeSavedGPRLocations(MBB, MBBI);
1616
1617 // Alignment is required for the parent frame, not the funclet
1618 const bool NeedsRealignment =
1619 NumBytes && !IsFunclet && RegInfo->hasStackRealignment(MF);
1620 int64_t RealignmentPadding =
1621 (NeedsRealignment && MFI.getMaxAlign() > Align(16))
1622 ? MFI.getMaxAlign().value() - 16
1623 : 0;
1624
1625 if (windowsRequiresStackProbe(MF, NumBytes + RealignmentPadding)) {
1626 uint64_t NumWords = (NumBytes + RealignmentPadding) >> 4;
1627 if (NeedsWinCFI) {
1628 HasWinCFI = true;
1629 // alloc_l can hold at most 256MB, so assume that NumBytes doesn't
1630 // exceed this amount. We need to move at most 2^24 - 1 into x15.
1631 // This is at most two instructions, MOVZ followed by MOVK.
1632 // TODO: Fix to use multiple stack alloc unwind codes for stacks
1633 // exceeding 256MB in size.
1634 if (NumBytes >= (1 << 28))
1635 report_fatal_error("Stack size cannot exceed 256MB for stack "
1636 "unwinding purposes");
1637
1638 uint32_t LowNumWords = NumWords & 0xFFFF;
1639 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVZXi), AArch64::X15)
1640 .addImm(LowNumWords)
1641 .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 0))
1642 .setMIFlag(MachineInstr::FrameSetup);
1643 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1644 .setMIFlag(MachineInstr::FrameSetup);
1645 if ((NumWords & 0xFFFF0000) != 0) {
1646 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVKXi), AArch64::X15)
1647 .addReg(AArch64::X15)
1648 .addImm((NumWords & 0xFFFF0000) >> 16) // High half
1649 .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 16))
1650 .setMIFlag(MachineInstr::FrameSetup);
1651 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1652 .setMIFlag(MachineInstr::FrameSetup);
1653 }
1654 } else {
1655 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm), AArch64::X15)
1656 .addImm(NumWords)
1657 .setMIFlag(MachineInstr::FrameSetup);
1658 }
1659
1660 const char* ChkStk = Subtarget.getChkStkName();
1661 switch (MF.getTarget().getCodeModel()) {
1662 case CodeModel::Tiny:
1663 case CodeModel::Small:
1664 case CodeModel::Medium:
1665 case CodeModel::Kernel:
1666 BuildMI(MBB, MBBI, DL, TII->get(AArch64::BL))
1667 .addExternalSymbol(ChkStk)
1668 .addReg(AArch64::X15, RegState::Implicit)
1669 .addReg(AArch64::X16, RegState::Implicit | RegState::Define | RegState::Dead)
1670 .addReg(AArch64::X17, RegState::Implicit | RegState::Define | RegState::Dead)
1671 .addReg(AArch64::NZCV, RegState::Implicit | RegState::Define | RegState::Dead)
1672 .setMIFlags(MachineInstr::FrameSetup);
1673 if (NeedsWinCFI) {
1674 HasWinCFI = true;
1675 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1676 .setMIFlag(MachineInstr::FrameSetup);
1677 }
1678 break;
1679 case CodeModel::Large:
1680 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVaddrEXT))
1681 .addReg(AArch64::X16, RegState::Define)
1682 .addExternalSymbol(ChkStk)
1683 .addExternalSymbol(ChkStk)
1684 .setMIFlags(MachineInstr::FrameSetup);
1685 if (NeedsWinCFI) {
1686 HasWinCFI = true;
1687 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1688 .setMIFlag(MachineInstr::FrameSetup);
1689 }
1690
1691 BuildMI(MBB, MBBI, DL, TII->get(getBLRCallOpcode(MF)))
1692 .addReg(AArch64::X16, RegState::Kill)
1693 .addReg(AArch64::X15, RegState::Implicit | RegState::Define)
1694 .addReg(AArch64::X16, RegState::Implicit | RegState::Define | RegState::Dead)
1695 .addReg(AArch64::X17, RegState::Implicit | RegState::Define | RegState::Dead)
1696 .addReg(AArch64::NZCV, RegState::Implicit | RegState::Define | RegState::Dead)
1697 .setMIFlags(MachineInstr::FrameSetup);
1698 if (NeedsWinCFI) {
1699 HasWinCFI = true;
1700 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1701 .setMIFlag(MachineInstr::FrameSetup);
1702 }
1703 break;
1704 }
1705
1706 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SUBXrx64), AArch64::SP)
1707 .addReg(AArch64::SP, RegState::Kill)
1708 .addReg(AArch64::X15, RegState::Kill)
1709 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 4))
1710 .setMIFlag(MachineInstr::FrameSetup);
1711 if (NeedsWinCFI) {
1712 HasWinCFI = true;
1713 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_StackAlloc))
1714 .addImm(NumBytes)
1715 .setMIFlag(MachineInstr::FrameSetup);
1716 }
1717 NumBytes = 0;
1718
1719 if (RealignmentPadding > 0) {
1720 if (RealignmentPadding >= 4096) {
1721 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm))
1722 .addReg(AArch64::X16, RegState::Define)
1723 .addImm(RealignmentPadding)
1724 .setMIFlag(MachineInstr::FrameSetup);
1725 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXrx64), AArch64::X15)
1726 .addReg(AArch64::SP)
1727 .addReg(AArch64::X16, RegState::Kill)
1728 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
1729 .setMIFlag(MachineInstr::FrameSetup);
1730 } else {
1731 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXri), AArch64::X15)
1732 .addReg(AArch64::SP)
1733 .addImm(RealignmentPadding)
1734 .addImm(0)
1735 .setMIFlag(MachineInstr::FrameSetup);
1736 }
1737
1738 uint64_t AndMask = ~(MFI.getMaxAlign().value() - 1);
1739 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ANDXri), AArch64::SP)
1740 .addReg(AArch64::X15, RegState::Kill)
1741 .addImm(AArch64_AM::encodeLogicalImmediate(AndMask, 64));
1742 AFI->setStackRealigned(true);
1743
1744 // No need for SEH instructions here; if we're realigning the stack,
1745 // we've set a frame pointer and already finished the SEH prologue.
1746 assert(!NeedsWinCFI);
1747 }
1748 }
1749
1750 StackOffset AllocateBefore = SVEStackSize, AllocateAfter = {};
1751 MachineBasicBlock::iterator CalleeSavesBegin = MBBI, CalleeSavesEnd = MBBI;
1752
1753 // Process the SVE callee-saves to determine what space needs to be
1754 // allocated.
1755 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
1756 // Find callee save instructions in frame.
1757 CalleeSavesBegin = MBBI;
1758 assert(IsSVECalleeSave(CalleeSavesBegin) && "Unexpected instruction");
1759 while (IsSVECalleeSave(MBBI) && MBBI != MBB.getFirstTerminator())
1760 ++MBBI;
1761 CalleeSavesEnd = MBBI;
1762
1763 AllocateBefore = StackOffset::getScalable(CalleeSavedSize);
1764 AllocateAfter = SVEStackSize - AllocateBefore;
1765 }
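// Illustrative split, assuming an SVE area of four Z-register slots of
// which two are callee saves: AllocateBefore covers the two scalable
// callee-save slots (so the stores below land in allocated stack), and
// AllocateAfter covers the remaining SVE locals.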
1766
1767 // Allocate space for the callee saves (if any).
1768 emitFrameOffset(
1769 MBB, CalleeSavesBegin, DL, AArch64::SP, AArch64::SP, -AllocateBefore, TII,
1770 MachineInstr::FrameSetup, false, false, nullptr,
1771 EmitAsyncCFI && !HasFP && AllocateBefore,
1772 StackOffset::getFixed((int64_t)MFI.getStackSize() - NumBytes));
1773
1774 if (EmitAsyncCFI)
1775 emitCalleeSavedSVELocations(MBB, CalleeSavesEnd);
1776
1777 // Finally allocate remaining SVE stack space.
1778 emitFrameOffset(MBB, CalleeSavesEnd, DL, AArch64::SP, AArch64::SP,
1779 -AllocateAfter, TII, MachineInstr::FrameSetup, false, false,
1780 nullptr, EmitAsyncCFI && !HasFP && AllocateAfter,
1781 AllocateBefore + StackOffset::getFixed(
1782 (int64_t)MFI.getStackSize() - NumBytes));
1783
1784 // Allocate space for the rest of the frame.
1785 if (NumBytes) {
1786 unsigned scratchSPReg = AArch64::SP;
1787
1788 if (NeedsRealignment) {
1789 scratchSPReg = findScratchNonCalleeSaveRegister(&MBB);
1790 assert(scratchSPReg != AArch64::NoRegister);
1791 }
1792
1793 // If we're a leaf function, try using the red zone.
1794 if (!canUseRedZone(MF)) {
1795 // FIXME: in the case of dynamic re-alignment, NumBytes doesn't have
1796 // the correct value here, as NumBytes also includes padding bytes,
1797 // which shouldn't be counted here.
1798 emitFrameOffset(
1799 MBB, MBBI, DL, scratchSPReg, AArch64::SP,
1800 StackOffset::getFixed(-NumBytes), TII, MachineInstr::FrameSetup,
1801 false, NeedsWinCFI, &HasWinCFI, EmitAsyncCFI && !HasFP,
1802 SVEStackSize +
1803 StackOffset::getFixed((int64_t)MFI.getStackSize() - NumBytes));
1804 }
1805 if (NeedsRealignment) {
1806 assert(MFI.getMaxAlign() > Align(1));
1807 assert(scratchSPReg != AArch64::SP);
1808
1809 // SUB X9, SP, NumBytes
1810 // -- X9 is temporary register, so shouldn't contain any live data here,
1811 // -- free to use. This is already produced by emitFrameOffset above.
1812 // AND SP, X9, 0b11111...0000
1813 uint64_t AndMask = ~(MFI.getMaxAlign().value() - 1);
1814
1815 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ANDXri), AArch64::SP)
1816 .addReg(scratchSPReg, RegState::Kill)
1817 .addImm(AArch64_AM::encodeLogicalImmediate(AndMask, 64));
1818 AFI->setStackRealigned(true);
1819
1820 // No need for SEH instructions here; if we're realigning the stack,
1821 // we've set a frame pointer and already finished the SEH prologue.
1822 assert(!NeedsWinCFI);
1823 }
1824 }
1825
1826 // If we need a base pointer, set it up here. It's whatever the value of the
1827 // stack pointer is at this point. Any variable size objects will be allocated
1828 // after this, so we can still use the base pointer to reference locals.
1829 //
1830 // FIXME: Clarify FrameSetup flags here.
1831 // Note: Use emitFrameOffset() like above for FP if the FrameSetup flag is
1832 // needed.
1833 // For funclets the BP belongs to the containing function.
1834 if (!IsFunclet && RegInfo->hasBasePointer(MF)) {
1835 TII->copyPhysReg(MBB, MBBI, DL, RegInfo->getBaseRegister(), AArch64::SP,
1836 false);
1837 if (NeedsWinCFI) {
1838 HasWinCFI = true;
1839 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1840 .setMIFlag(MachineInstr::FrameSetup);
1841 }
1842 }
1843
1844 // The very last FrameSetup instruction indicates the end of prologue. Emit a
1845 // SEH opcode indicating the prologue end.
1846 if (NeedsWinCFI && HasWinCFI) {
1847 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1848 .setMIFlag(MachineInstr::FrameSetup);
1849 }
1850
1851 // SEH funclets are passed the frame pointer in X1. If the parent
1852 // function uses the base register, then the base register is used
1853 // directly, and is not retrieved from X1.
1854 if (IsFunclet && F.hasPersonalityFn()) {
1855 EHPersonality Per = classifyEHPersonality(F.getPersonalityFn());
1856 if (isAsynchronousEHPersonality(Per)) {
1857 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::COPY), AArch64::FP)
1858 .addReg(AArch64::X1)
1859 .setMIFlag(MachineInstr::FrameSetup);
1860 MBB.addLiveIn(AArch64::X1);
1861 }
1862 }
1863
1864 if (EmitCFI && !EmitAsyncCFI) {
1865 if (HasFP) {
1866 emitDefineCFAWithFP(MF, MBB, MBBI, DL, FixedObject);
1867 } else {
1868 StackOffset TotalSize =
1869 SVEStackSize + StackOffset::getFixed((int64_t)MFI.getStackSize());
1870 unsigned CFIIndex = MF.addFrameInst(createDefCFA(
1871 *RegInfo, /*FrameReg=*/AArch64::SP, /*Reg=*/AArch64::SP, TotalSize,
1872 /*LastAdjustmentWasScalable=*/false));
1873 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1874 .addCFIIndex(CFIIndex)
1875 .setMIFlags(MachineInstr::FrameSetup);
1876 }
1877 emitCalleeSavedGPRLocations(MBB, MBBI);
1878 emitCalleeSavedSVELocations(MBB, MBBI);
1879 }
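// A sketch of the directives this path tends to produce (the exact output
// depends on createDefCFA/emitDefineCFAWithFP): with a frame pointer,
// ".cfi_def_cfa w29, <off>"; without one, ".cfi_def_cfa_offset <framesize>",
// followed by ".cfi_offset" records for the saved GPRs and SVE registers.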
1880}
1881
1882 static bool isFuncletReturnInstr(const MachineInstr &MI) {
1883 switch (MI.getOpcode()) {
1884 default:
1885 return false;
1886 case AArch64::CATCHRET:
1887 case AArch64::CLEANUPRET:
1888 return true;
1889 }
1890}
1891
1892 void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
1893 MachineBasicBlock &MBB) const {
1894 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
1895 MachineFrameInfo &MFI = MF.getFrameInfo();
1896 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1897 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1898 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1899 DebugLoc DL;
1900 bool NeedsWinCFI = needsWinCFI(MF);
1901 bool EmitCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
1902 bool HasWinCFI = false;
1903 bool IsFunclet = false;
1904
1905 if (MBB.end() != MBBI) {
1906 DL = MBBI->getDebugLoc();
1907 IsFunclet = isFuncletReturnInstr(*MBBI);
1908 }
1909
1910 MachineBasicBlock::iterator EpilogStartI = MBB.end();
1911
1912 auto FinishingTouches = make_scope_exit([&]() {
1913 if (AFI->shouldSignReturnAddress(MF)) {
1914 BuildMI(MBB, MBB.getFirstTerminator(), DL,
1915 TII->get(AArch64::PAUTH_EPILOGUE))
1916 .setMIFlag(MachineInstr::FrameDestroy);
1917 if (NeedsWinCFI)
1918 HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
1919 }
1920 if (AFI->needsShadowCallStackPrologueEpilogue(MF))
1921 emitShadowCallStackEpilogue(*TII, MF, MBB, MBB.getFirstTerminator(), DL);
1922 if (EmitCFI)
1923 emitCalleeSavedGPRRestores(MBB, MBB.getFirstTerminator());
1924 if (HasWinCFI) {
1925 BuildMI(MBB, MBB.getFirstTerminator(), DL,
1926 TII->get(AArch64::SEH_EpilogEnd))
1927 .setMIFlag(MachineInstr::FrameDestroy);
1928 if (!MF.hasWinCFI())
1929 MF.setHasWinCFI(true);
1930 }
1931 if (NeedsWinCFI) {
1932 assert(EpilogStartI != MBB.end());
1933 if (!HasWinCFI)
1934 MBB.erase(EpilogStartI);
1935 }
1936 });
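// Note: make_scope_exit runs FinishingTouches when emitEpilogue returns,
// including through the early returns below, so the PAUTH_EPILOGUE, the
// CSR-restore CFI, and SEH_EpilogEnd are appended after whatever epilogue
// code each path emitted.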
1937
1938 int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
1939 : MFI.getStackSize();
1940
1941 // All calls are tail calls in GHC calling conv, and functions have no
1942 // prologue/epilogue.
1943 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
1944 return;
1945
1946 // How much of the stack used by incoming arguments this function is expected
1947 // to restore in this particular epilogue.
1948 int64_t ArgumentStackToRestore = getArgumentStackToRestore(MF, MBB);
1949 bool IsWin64 =
1950 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv());
1951 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
1952
1953 int64_t AfterCSRPopSize = ArgumentStackToRestore;
1954 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
1955 // We cannot rely on the local stack size set in emitPrologue if the function
1956 // has funclets, as funclets have different local stack size requirements, and
1957 // the current value set in emitPrologue may be that of the containing
1958 // function.
1959 if (MF.hasEHFunclets())
1960 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
1961 if (homogeneousPrologEpilog(MF, &MBB)) {
1962 assert(!NeedsWinCFI);
1963 auto LastPopI = MBB.getFirstTerminator();
1964 if (LastPopI != MBB.begin()) {
1965 auto HomogeneousEpilog = std::prev(LastPopI);
1966 if (HomogeneousEpilog->getOpcode() == AArch64::HOM_Epilog)
1967 LastPopI = HomogeneousEpilog;
1968 }
1969
1970 // Adjust local stack
1971 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
1972 StackOffset::getFixed(AFI->getLocalStackSize()), TII,
1973 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
1974
1975 // SP has been already adjusted while restoring callee save regs.
1976 // We've bailed-out the case with adjusting SP for arguments.
1977 assert(AfterCSRPopSize == 0);
1978 return;
1979 }
1980 bool CombineSPBump = shouldCombineCSRLocalStackBumpInEpilogue(MBB, NumBytes);
1981 // Assume we can't combine the last pop with the sp restore.
1982
1983 bool CombineAfterCSRBump = false;
1984 if (!CombineSPBump && PrologueSaveSize != 0) {
1985 MachineBasicBlock::iterator Pop = std::prev(MBB.getFirstTerminator());
1986 while (Pop->getOpcode() == TargetOpcode::CFI_INSTRUCTION ||
1987 AArch64InstrInfo::isSEHInstruction(*Pop))
1988 Pop = std::prev(Pop);
1989 // Converting the last ldp to a post-index ldp is valid only if the last
1990 // ldp's offset is 0.
1991 const MachineOperand &OffsetOp = Pop->getOperand(Pop->getNumOperands() - 1);
1992 // If the offset is 0 and the AfterCSR pop is not actually trying to
1993 // allocate more stack for arguments (in space that an untimely interrupt
1994 // may clobber), convert it to a post-index ldp.
1995 if (OffsetOp.getImm() == 0 && AfterCSRPopSize >= 0) {
1996 convertCalleeSaveRestoreToSPPrePostIncDec(
1997 MBB, Pop, DL, TII, PrologueSaveSize, NeedsWinCFI, &HasWinCFI, EmitCFI,
1998 MachineInstr::FrameDestroy, PrologueSaveSize);
1999 } else {
2000 // If not, make sure to emit an add after the last ldp.
2001 // We're doing this by transfering the size to be restored from the
2002 // adjustment *before* the CSR pops to the adjustment *after* the CSR
2003 // pops.
2004 AfterCSRPopSize += PrologueSaveSize;
2005 CombineAfterCSRBump = true;
2006 }
2007 }
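// For example (illustrative): with PrologueSaveSize == 16 and a final
//   ldp x29, x30, [sp]
// the conversion produces the post-indexed form
//   ldp x29, x30, [sp], #16
// restoring the registers and freeing the callee-save area in one instruction.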
2008
2009 // Move past the restores of the callee-saved registers.
2010 // If we plan on combining the sp bump of the local stack size and the callee
2011 // save stack size, we might need to adjust the CSR save and restore offsets.
2012 MachineBasicBlock::iterator LastPopI = MBB.getFirstTerminator();
2013 MachineBasicBlock::iterator Begin = MBB.begin();
2014 while (LastPopI != Begin) {
2015 --LastPopI;
2016 if (!LastPopI->getFlag(MachineInstr::FrameDestroy) ||
2017 IsSVECalleeSave(LastPopI)) {
2018 ++LastPopI;
2019 break;
2020 } else if (CombineSPBump)
2021 fixupCalleeSaveRestoreStackOffset(*LastPopI, AFI->getLocalStackSize(),
2022 NeedsWinCFI, &HasWinCFI);
2023 }
2024
2025 if (NeedsWinCFI) {
2026 // Note that there are cases where we insert SEH opcodes in the
2027 // epilogue when we had no SEH opcodes in the prologue. For
2028 // example, when there is no stack frame but there are stack
2029 // arguments. Insert the SEH_EpilogStart and remove it later if we
2030 // didn't emit any SEH opcodes, to avoid generating WinCFI for
2031 // functions that don't need it.
2032 BuildMI(MBB, LastPopI, DL, TII->get(AArch64::SEH_EpilogStart))
2033 .setMIFlag(MachineInstr::FrameDestroy);
2034 EpilogStartI = LastPopI;
2035 --EpilogStartI;
2036 }
2037
2038 if (hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2039 switch (MF.getTarget().Options.SwiftAsyncFramePointer) {
2040 case SwiftAsyncFramePointerMode::DeploymentBased:
2041 // Avoid the reload as it is GOT relative, and instead fall back to the
2042 // hardcoded value below. This allows a mismatch between the OS and
2043 // application without immediately terminating on the difference.
2044 [[fallthrough]];
2045 case SwiftAsyncFramePointerMode::Always:
2046 // We need to reset FP to its untagged state on return. Bit 60 is
2047 // currently used to show the presence of an extended frame.
2048
2049 // BIC x29, x29, #0x1000_0000_0000_0000
2050 BuildMI(MBB, MBB.getFirstTerminator(), DL, TII->get(AArch64::ANDXri),
2051 AArch64::FP)
2052 .addUse(AArch64::FP)
2053 .addImm(0x10fe)
2054 .setMIFlag(MachineInstr::FrameDestroy);
2055 break;
2056
2057 case SwiftAsyncFramePointerMode::Never:
2058 break;
2059 }
2060 }
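// Note: 0x10fe is the N:immr:imms logical-immediate encoding of
// 0xEFFFFFFFFFFFFFFF (all bits set except bit 60), so the ANDXri above is
// exactly the "BIC x29, x29, #(1 << 60)" described in the comment.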
2061
2062 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2063
2064 // If there is a single SP update, insert it before the ret and we're done.
2065 if (CombineSPBump) {
2066 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
2067
2068 // When we are about to restore the CSRs, the CFA register is SP again.
2069 if (EmitCFI && hasFP(MF)) {
2070 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2071 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2072 unsigned CFIIndex =
2073 MF.addFrameInst(MCCFIInstruction::cfiDefCfa(nullptr, Reg, NumBytes));
2074 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2075 .addCFIIndex(CFIIndex)
2076 .setMIFlags(MachineInstr::FrameDestroy);
2077 }
2078
2079 emitFrameOffset(MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2080 StackOffset::getFixed(NumBytes + (int64_t)AfterCSRPopSize),
2081 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI,
2082 &HasWinCFI, EmitCFI, StackOffset::getFixed(NumBytes));
2083 return;
2084 }
2085
2086 NumBytes -= PrologueSaveSize;
2087 assert(NumBytes >= 0 && "Negative stack allocation size!?");
2088
2089 // Process the SVE callee-saves to determine what space needs to be
2090 // deallocated.
2091 StackOffset DeallocateBefore = {}, DeallocateAfter = SVEStackSize;
2092 MachineBasicBlock::iterator RestoreBegin = LastPopI, RestoreEnd = LastPopI;
2093 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2094 RestoreBegin = std::prev(RestoreEnd);
2095 while (RestoreBegin != MBB.begin() &&
2096 IsSVECalleeSave(std::prev(RestoreBegin)))
2097 --RestoreBegin;
2098
2099 assert(IsSVECalleeSave(RestoreBegin) &&
2100 IsSVECalleeSave(std::prev(RestoreEnd)) && "Unexpected instruction");
2101
2102 StackOffset CalleeSavedSizeAsOffset =
2103 StackOffset::getScalable(CalleeSavedSize);
2104 DeallocateBefore = SVEStackSize - CalleeSavedSizeAsOffset;
2105 DeallocateAfter = CalleeSavedSizeAsOffset;
2106 }
2107
2108 // Deallocate the SVE area.
2109 if (SVEStackSize) {
2110 // If we have stack realignment or variable sized objects on the stack,
2111 // restore the stack pointer from the frame pointer prior to SVE CSR
2112 // restoration.
2113 if (AFI->isStackRealigned() || MFI.hasVarSizedObjects()) {
2114 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2115 // Set SP to start of SVE callee-save area from which they can
2116 // be reloaded. The code below will deallocate the stack
2117 // space by moving FP -> SP.
2118 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::FP,
2119 StackOffset::getScalable(-CalleeSavedSize), TII,
2120 MachineInstr::FrameDestroy);
2121 }
2122 } else {
2123 if (AFI->getSVECalleeSavedStackSize()) {
2124 // Deallocate the non-SVE locals first before we can deallocate (and
2125 // restore callee saves) from the SVE area.
2126 emitFrameOffset(
2127 MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2128 StackOffset::getFixed(NumBytes), TII, MachineInstr::FrameDestroy,
2129 false, false, nullptr, EmitCFI && !hasFP(MF),
2130 SVEStackSize + StackOffset::getFixed(NumBytes + PrologueSaveSize));
2131 NumBytes = 0;
2132 }
2133
2134 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2135 DeallocateBefore, TII, MachineInstr::FrameDestroy, false,
2136 false, nullptr, EmitCFI && !hasFP(MF),
2137 SVEStackSize +
2138 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2139
2140 emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
2141 DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
2142 false, nullptr, EmitCFI && !hasFP(MF),
2143 DeallocateAfter +
2144 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2145 }
2146 if (EmitCFI)
2147 emitCalleeSavedSVERestores(MBB, RestoreEnd);
2148 }
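// A sketch of the non-realigned path above, assuming fixed locals plus SVE
// callee saves: "add sp, sp, #NumBytes" frees the non-SVE locals, an
// "addvl sp, sp, #<locals>" covers DeallocateBefore, the SVE registers are
// reloaded, and a final "addvl sp, sp, #<csr>" covers DeallocateAfter.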
2149
2150 if (!hasFP(MF)) {
2151 bool RedZone = canUseRedZone(MF);
2152 // If this was a redzone leaf function, we don't need to restore the
2153 // stack pointer (but we may need to pop stack args for fastcc).
2154 if (RedZone && AfterCSRPopSize == 0)
2155 return;
2156
2157 // Pop the local variables off the stack. If there are no callee-saved
2158 // registers, it means we are actually positioned at the terminator and can
2159 // combine stack increment for the locals and the stack increment for
2160 // callee-popped arguments into (possibly) a single instruction and be done.
2161 bool NoCalleeSaveRestore = PrologueSaveSize == 0;
2162 int64_t StackRestoreBytes = RedZone ? 0 : NumBytes;
2163 if (NoCalleeSaveRestore)
2164 StackRestoreBytes += AfterCSRPopSize;
2165
2166 emitFrameOffset(
2167 MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2168 StackOffset::getFixed(StackRestoreBytes), TII,
2169 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2170 StackOffset::getFixed((RedZone ? 0 : NumBytes) + PrologueSaveSize));
2171
2172 // If we were able to combine the local stack pop with the argument pop,
2173 // then we're done.
2174 if (NoCalleeSaveRestore || AfterCSRPopSize == 0) {
2175 return;
2176 }
2177
2178 NumBytes = 0;
2179 }
2180
2181 // Restore the original stack pointer.
2182 // FIXME: Rather than doing the math here, we should instead just use
2183 // non-post-indexed loads for the restores if we aren't actually going to
2184 // be able to save any instructions.
2185 if (!IsFunclet && (MFI.hasVarSizedObjects() || AFI->isStackRealigned())) {
2186 emitFrameOffset(
2187 MBB, LastPopI, DL, AArch64::SP, AArch64::FP,
2188 StackOffset::getFixed(-AFI->getCalleeSaveBaseToFrameRecordOffset()),
2189 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2190 } else if (NumBytes)
2191 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2192 StackOffset::getFixed(NumBytes), TII,
2193 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2194
2195 // When we are about to restore the CSRs, the CFA register is SP again.
2196 if (EmitCFI && hasFP(MF)) {
2197 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2198 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2199 unsigned CFIIndex = MF.addFrameInst(
2200 MCCFIInstruction::cfiDefCfa(nullptr, Reg, PrologueSaveSize));
2201 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2202 .addCFIIndex(CFIIndex)
2203 .setMIFlags(MachineInstr::FrameDestroy);
2204 }
2205
2206 // This must be placed after the callee-save restore code because that code
2207 // assumes the SP is at the same location as it was after the callee-save save
2208 // code in the prologue.
2209 if (AfterCSRPopSize) {
2210 assert(AfterCSRPopSize > 0 && "attempting to reallocate arg stack that an "
2211 "interrupt may have clobbered");
2212
2213 emitFrameOffset(
2214 MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2215 StackOffset::getFixed(AfterCSRPopSize), TII, MachineInstr::FrameDestroy,
2216 false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2217 StackOffset::getFixed(CombineAfterCSRBump ? PrologueSaveSize : 0));
2218 }
2219}
2220
2221 bool AArch64FrameLowering::enableCFIFixup(MachineFunction &MF) const {
2222 return TargetFrameLowering::enableCFIFixup(MF) &&
2223 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
2224}
2225
2226/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
2227/// debug info. It's the same as what we use for resolving the code-gen
2228/// references for now. FIXME: This can go wrong when references are
2229/// SP-relative and simple call frames aren't used.
2230 StackOffset
2231 AArch64FrameLowering::getFrameIndexReference(const MachineFunction &MF, int FI,
2232 Register &FrameReg) const {
2233 return resolveFrameIndexReference(
2234 MF, FI, FrameReg,
2235 /*PreferFP=*/
2236 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress),
2237 /*ForSimm=*/false);
2238}
2239
2240 StackOffset
2241 AArch64FrameLowering::getNonLocalFrameIndexReference(const MachineFunction &MF,
2242 int FI) const {
2243 return StackOffset::getFixed(getSEHFrameIndexOffset(MF, FI));
2244 }
2245
2246 static StackOffset getFPOffset(const MachineFunction &MF,
2247 int64_t ObjectOffset) {
2248 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2249 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2250 bool IsWin64 =
2251 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv());
2252 unsigned FixedObject =
2253 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
2254 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
2255 int64_t FPAdjust =
2256 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
2257 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
2258}
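// Worked example (illustrative numbers): with a Win64 varargs save area of
// FixedObject == 64 bytes, CalleeSaveSize == 32, and the frame record at
// the base of the callee-save area (offset 0), FPAdjust == 32, so an
// incoming argument at ObjectOffset == 8 resolves to fp + 104.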
2259
2260 static StackOffset getStackOffset(const MachineFunction &MF,
2261 int64_t ObjectOffset) {
2262 const auto &MFI = MF.getFrameInfo();
2263 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
2264}
2265
2266 // TODO: This function currently does not work for scalable vectors.
2267 int AArch64FrameLowering::getSEHFrameIndexOffset(const MachineFunction &MF,
2268 int FI) const {
2269 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2270 MF.getSubtarget().getRegisterInfo());
2271 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
2272 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
2273 ? getFPOffset(MF, ObjectOffset).getFixed()
2274 : getStackOffset(MF, ObjectOffset).getFixed();
2275}
2276
2277 StackOffset AArch64FrameLowering::resolveFrameIndexReference(
2278 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
2279 bool ForSimm) const {
2280 const auto &MFI = MF.getFrameInfo();
2281 int64_t ObjectOffset = MFI.getObjectOffset(FI);
2282 bool isFixed = MFI.isFixedObjectIndex(FI);
2283 bool isSVE = MFI.getStackID(FI) == TargetStackID::ScalableVector;
2284 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, isSVE, FrameReg,
2285 PreferFP, ForSimm);
2286}
2287
2288 StackOffset AArch64FrameLowering::resolveFrameOffsetReference(
2289 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE,
2290 Register &FrameReg, bool PreferFP, bool ForSimm) const {
2291 const auto &MFI = MF.getFrameInfo();
2292 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2293 MF.getSubtarget().getRegisterInfo());
2294 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2295 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2296
2297 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
2298 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
2299 bool isCSR =
2300 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
2301
2302 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2303
2304 // Use frame pointer to reference fixed objects. Use it for locals if
2305 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
2306 // reliable as a base). Make sure useFPForScavengingIndex() does the
2307 // right thing for the emergency spill slot.
2308 bool UseFP = false;
2309 if (AFI->hasStackFrame() && !isSVE) {
2310 // We shouldn't prefer using the FP to access fixed-sized stack objects when
2311 // there are scalable (SVE) objects in between the FP and the fixed-sized
2312 // objects.
2313 PreferFP &= !SVEStackSize;
2314
2315 // Note: Keeping the following as multiple 'if' statements rather than
2316 // merging to a single expression for readability.
2317 //
2318 // Argument access should always use the FP.
2319 if (isFixed) {
2320 UseFP = hasFP(MF);
2321 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
2322 // References to the CSR area must use FP if we're re-aligning the stack
2323 // since the dynamically-sized alignment padding is between the SP/BP and
2324 // the CSR area.
2325 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
2326 UseFP = true;
2327 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
2328 // If the FPOffset is negative and we're producing a signed immediate, we
2329 // have to keep in mind that the available offset range for negative
2330 // offsets is smaller than for positive ones. If an offset is available
2331 // via the FP and the SP, use whichever is closest.
2332 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
2333 PreferFP |= Offset > -FPOffset && !SVEStackSize;
2334
2335 if (MFI.hasVarSizedObjects()) {
2336 // If we have variable sized objects, we can use either FP or BP, as the
2337 // SP offset is unknown. We can use the base pointer if we have one and
2338 // FP is not preferred. If not, we're stuck with using FP.
2339 bool CanUseBP = RegInfo->hasBasePointer(MF);
2340 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
2341 UseFP = PreferFP;
2342 else if (!CanUseBP) // Can't use BP. Forced to use FP.
2343 UseFP = true;
2344 // else we can use BP and FP, but the offset from FP won't fit.
2345 // That will make us scavenge registers which we can probably avoid by
2346 // using BP. If it won't fit for BP either, we'll scavenge anyway.
2347 } else if (FPOffset >= 0) {
2348 // Use SP or FP, whichever gives us the best chance of the offset
2349 // being in range for direct access. If the FPOffset is positive,
2350 // that'll always be best, as the SP will be even further away.
2351 UseFP = true;
2352 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
2353 // Funclets access the locals contained in the parent's stack frame
2354 // via the frame pointer, so we have to use the FP in the parent
2355 // function.
2356 (void) Subtarget;
2357 assert(
2358 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv()) &&
2359 "Funclets should only be present on Win64");
2360 UseFP = true;
2361 } else {
2362 // We have the choice between FP and (SP or BP).
2363 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
2364 UseFP = true;
2365 }
2366 }
2367 }
2368
2369 assert(
2370 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
2371 "In the presence of dynamic stack pointer realignment, "
2372 "non-argument/CSR objects cannot be accessed through the frame pointer");
2373
2374 if (isSVE) {
2375 StackOffset FPOffset =
2376 StackOffset::get(-AFI->getCalleeSaveBaseToFrameRecordOffset(), ObjectOffset);
2377 StackOffset SPOffset =
2378 SVEStackSize +
2379 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
2380 ObjectOffset);
2381 // Always use the FP for SVE spills if available and beneficial.
2382 if (hasFP(MF) && (SPOffset.getFixed() ||
2383 FPOffset.getScalable() < SPOffset.getScalable() ||
2384 RegInfo->hasStackRealignment(MF))) {
2385 FrameReg = RegInfo->getFrameRegister(MF);
2386 return FPOffset;
2387 }
2388
2389 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
2390 : (unsigned)AArch64::SP;
2391 return SPOffset;
2392 }
2393
2394 StackOffset ScalableOffset = {};
2395 if (UseFP && !(isFixed || isCSR))
2396 ScalableOffset = -SVEStackSize;
2397 if (!UseFP && (isFixed || isCSR))
2398 ScalableOffset = SVEStackSize;
2399
2400 if (UseFP) {
2401 FrameReg = RegInfo->getFrameRegister(MF);
2402 return StackOffset::getFixed(FPOffset) + ScalableOffset;
2403 }
2404
2405 // Use the base pointer if we have one.
2406 if (RegInfo->hasBasePointer(MF))
2407 FrameReg = RegInfo->getBaseRegister();
2408 else {
2409 assert(!MFI.hasVarSizedObjects() &&
2410 "Can't use SP when we have var sized objects.");
2411 FrameReg = AArch64::SP;
2412 // If we're using the red zone for this function, the SP won't actually
2413 // be adjusted, so the offsets will be negative. They're also all
2414 // within range of the signed 9-bit immediate instructions.
2415 if (canUseRedZone(MF))
2416 Offset -= AFI->getLocalStackSize();
2417 }
2418
2419 return StackOffset::getFixed(Offset) + ScalableOffset;
2420}
2421
2422static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
2423 // Do not set a kill flag on values that are also marked as live-in. This
2424 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
2425 // callee saved registers.
2426 // Omitting the kill flags is conservatively correct even if the live-in
2427 // is not used after all.
2428 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
2429 return getKillRegState(!IsLiveIn);
2430}
2431
2432 static bool produceCompactUnwindFrame(MachineFunction &MF) {
2433 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2434 AttributeList Attrs = MF.getFunction().getAttributes();
2435 return Subtarget.isTargetMachO() &&
2436 !(Subtarget.getTargetLowering()->supportSwiftError() &&
2437 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
2438 MF.getFunction().getCallingConv() != CallingConv::SwiftTail;
2439 }
2440
2441static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2,
2442 bool NeedsWinCFI, bool IsFirst,
2443 const TargetRegisterInfo *TRI) {
2444 // If we are generating register pairs for a Windows function that requires
2445 // EH support, then pair consecutive registers only. There are no unwind
2446 // opcodes for saves/restores of non-consecutive register pairs.
2447 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
2448 // save_lrpair.
2449 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
2450
2451 if (Reg2 == AArch64::FP)
2452 return true;
2453 if (!NeedsWinCFI)
2454 return false;
2455 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
2456 return false;
2457 // If pairing a GPR with LR, the pair can be described by the save_lrpair
2458 // opcode. If this is the first register pair, it would end up with a
2459 // predecrement, but there's no save_lrpair_x opcode, so we can only do this
2460 // if LR is paired with something other than the first register.
2461 // The save_lrpair opcode requires the first register to be an odd one.
2462 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
2463 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
2464 return false;
2465 return true;
2466}
2467
2468/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
2469/// WindowsCFI requires that only consecutive registers can be paired.
2470/// LR and FP need to be allocated together when the frame needs to save
2471/// the frame-record. This means any other register pairing with LR is invalid.
2472static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2,
2473 bool UsesWinAAPCS, bool NeedsWinCFI,
2474 bool NeedsFrameRecord, bool IsFirst,
2475 const TargetRegisterInfo *TRI) {
2476 if (UsesWinAAPCS)
2477 return invalidateWindowsRegisterPairing(Reg1, Reg2, NeedsWinCFI, IsFirst,
2478 TRI);
2479
2480 // If we need to store the frame record, don't pair any register
2481 // with LR other than FP.
2482 if (NeedsFrameRecord)
2483 return Reg2 == AArch64::LR;
2484
2485 return false;
2486}
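// Example: when a frame record is needed, a candidate pair such as
// (x28, lr) is rejected here, since only fp may pair with lr; this keeps
// the <fp, lr> frame record contiguous in the callee-save area.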
2487
2488namespace {
2489
2490struct RegPairInfo {
2491 unsigned Reg1 = AArch64::NoRegister;
2492 unsigned Reg2 = AArch64::NoRegister;
2493 int FrameIdx;
2494 int Offset;
2495 enum RegType { GPR, FPR64, FPR128, PPR, ZPR } Type;
2496
2497 RegPairInfo() = default;
2498
2499 bool isPaired() const { return Reg2 != AArch64::NoRegister; }
2500
2501 unsigned getScale() const {
2502 switch (Type) {
2503 case PPR:
2504 return 2;
2505 case GPR:
2506 case FPR64:
2507 return 8;
2508 case ZPR:
2509 case FPR128:
2510 return 16;
2511 }
2512 llvm_unreachable("Unsupported type");
2513 }
2514
2515 bool isScalable() const { return Type == PPR || Type == ZPR; }
2516};
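// The scale above is the slot granularity that the STP/LDP immediate is
// measured in: 8-byte slots for GPR/FPR64 saves, 16 for Q registers, and
// for scalable types 16 (ZPR) or 2 (PPR) bytes per vector-length granule.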
2517
2518} // end anonymous namespace
2519
2520 static void computeCalleeSaveRegisterPairs(
2521 MachineFunction &MF, ArrayRef<CalleeSavedInfo> CSI,
2522 const TargetRegisterInfo *TRI, SmallVectorImpl<RegPairInfo> &RegPairs,
2523 bool NeedsFrameRecord) {
2524
2525 if (CSI.empty())
2526 return;
2527
2528 bool IsWindows = isTargetWindows(MF);
2529 bool NeedsWinCFI = needsWinCFI(MF);
2530 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2531 MachineFrameInfo &MFI = MF.getFrameInfo();
2532 CallingConv::ID CC = MF.getFunction().getCallingConv();
2533 unsigned Count = CSI.size();
2534 (void)CC;
2535 // MachO's compact unwind format relies on all registers being stored in
2536 // pairs.
2537 assert((!produceCompactUnwindFrame(MF) || CC == CallingConv::PreserveMost ||
2538 CC == CallingConv::PreserveAll || CC == CallingConv::CXX_FAST_TLS ||
2539 CC == CallingConv::Win64 || (Count & 1) == 0) &&
2540 "Odd number of callee-saved regs to spill!");
2541 int ByteOffset = AFI->getCalleeSavedStackSize();
2542 int StackFillDir = -1;
2543 int RegInc = 1;
2544 unsigned FirstReg = 0;
2545 if (NeedsWinCFI) {
2546 // For WinCFI, fill the stack from the bottom up.
2547 ByteOffset = 0;
2548 StackFillDir = 1;
2549 // As the CSI array is reversed to match PrologEpilogInserter, iterate
2550 // backwards, to pair up registers starting from lower numbered registers.
2551 RegInc = -1;
2552 FirstReg = Count - 1;
2553 }
2554 int ScalableByteOffset = AFI->getSVECalleeSavedStackSize();
2555 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
2556
2557 // When iterating backwards, the loop condition relies on unsigned wraparound.
2558 for (unsigned i = FirstReg; i < Count; i += RegInc) {
2559 RegPairInfo RPI;
2560 RPI.Reg1 = CSI[i].getReg();
2561
2562 if (AArch64::GPR64RegClass.contains(RPI.Reg1))
2563 RPI.Type = RegPairInfo::GPR;
2564 else if (AArch64::FPR64RegClass.contains(RPI.Reg1))
2565 RPI.Type = RegPairInfo::FPR64;
2566 else if (AArch64::FPR128RegClass.contains(RPI.Reg1))
2567 RPI.Type = RegPairInfo::FPR128;
2568 else if (AArch64::ZPRRegClass.contains(RPI.Reg1))
2569 RPI.Type = RegPairInfo::ZPR;
2570 else if (AArch64::PPRRegClass.contains(RPI.Reg1))
2571 RPI.Type = RegPairInfo::PPR;
2572 else
2573 llvm_unreachable("Unsupported register class.");
2574
2575 // Add the next reg to the pair if it is in the same register class.
2576 if (unsigned(i + RegInc) < Count) {
2577 Register NextReg = CSI[i + RegInc].getReg();
2578 bool IsFirst = i == FirstReg;
2579 switch (RPI.Type) {
2580 case RegPairInfo::GPR:
2581 if (AArch64::GPR64RegClass.contains(NextReg) &&
2582 !invalidateRegisterPairing(RPI.Reg1, NextReg, IsWindows,
2583 NeedsWinCFI, NeedsFrameRecord, IsFirst,
2584 TRI))
2585 RPI.Reg2 = NextReg;
2586 break;
2587 case RegPairInfo::FPR64:
2588 if (AArch64::FPR64RegClass.contains(NextReg) &&
2589 !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
2590 IsFirst, TRI))
2591 RPI.Reg2 = NextReg;
2592 break;
2593 case RegPairInfo::FPR128:
2594 if (AArch64::FPR128RegClass.contains(NextReg))
2595 RPI.Reg2 = NextReg;
2596 break;
2597 case RegPairInfo::PPR:
2598 case RegPairInfo::ZPR:
2599 break;
2600 }
2601 }
2602
2603 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
2604 // list to come in sorted by frame index so that we can issue the store
2605 // pair instructions directly. Assert if we see anything otherwise.
2606 //
2607 // The order of the registers in the list is controlled by
2608 // getCalleeSavedRegs(), so they will always be in-order, as well.
2609 assert((!RPI.isPaired() ||
2610 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
2611 "Out of order callee saved regs!");
2612
2613 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
2614 RPI.Reg1 == AArch64::LR) &&
2615 "FrameRecord must be allocated together with LR");
2616
2617 // Windows AAPCS has FP and LR reversed.
2618 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
2619 RPI.Reg2 == AArch64::LR) &&
2620 "FrameRecord must be allocated together with LR");
2621
2622 // MachO's compact unwind format relies on all registers being stored in
2623 // adjacent register pairs.
2624 assert((!produceCompactUnwindFrame(MF) || CC == CallingConv::PreserveMost ||
2625 CC == CallingConv::PreserveAll || CC == CallingConv::CXX_FAST_TLS ||
2626 CC == CallingConv::Win64 ||
2627 (RPI.isPaired() &&
2628 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
2629 RPI.Reg1 + 1 == RPI.Reg2))) &&
2630 "Callee-save registers not saved as adjacent register pair!");
2631
2632 RPI.FrameIdx = CSI[i].getFrameIdx();
2633 if (NeedsWinCFI &&
2634 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
2635 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
2636
2637 int Scale = RPI.getScale();
2638
2639 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
2640 assert(OffsetPre % Scale == 0);
2641
2642 if (RPI.isScalable())
2643 ScalableByteOffset += StackFillDir * Scale;
2644 else
2645 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
2646
2647 // Swift's async context is directly before FP, so allocate an extra
2648 // 8 bytes for it.
2649 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
2650 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
2651 (IsWindows && RPI.Reg2 == AArch64::LR)))
2652 ByteOffset += StackFillDir * 8;
2653
2654 assert(!(RPI.isScalable() && RPI.isPaired()) &&
2655 "Paired spill/fill instructions don't exist for SVE vectors");
2656
2657 // Round up size of non-pair to pair size if we need to pad the
2658 // callee-save area to ensure 16-byte alignment.
2659 if (NeedGapToAlignStack && !NeedsWinCFI &&
2660 !RPI.isScalable() && RPI.Type != RegPairInfo::FPR128 &&
2661 !RPI.isPaired() && ByteOffset % 16 != 0) {
2662 ByteOffset += 8 * StackFillDir;
2663 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
2664 // A stack frame with a gap looks like this, bottom up:
2665 // d9, d8. x21, gap, x20, x19.
2666 // Set extra alignment on the x21 object to create the gap above it.
2667 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
2668 NeedGapToAlignStack = false;
2669 }
2670
2671 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
2672 assert(OffsetPost % Scale == 0);
2673 // If filling top down (default), we want the offset after incrementing it.
2674 // If filling bottom up (WinCFI) we need the original offset.
2675 int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
2676
2677 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
2678 // Swift context can directly precede FP.
2679 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
2680 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
2681 (IsWindows && RPI.Reg2 == AArch64::LR)))
2682 Offset += 8;
2683 RPI.Offset = Offset / Scale;
2684
2685 assert(((!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
2686 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
2687 "Offset out of bounds for LDP/STP immediate");
2688
2689 // Save the offset to frame record so that the FP register can point to the
2690 // innermost frame record (spilled FP and LR registers).
2691 if (NeedsFrameRecord && ((!IsWindows && RPI.Reg1 == AArch64::LR &&
2692 RPI.Reg2 == AArch64::FP) ||
2693 (IsWindows && RPI.Reg1 == AArch64::FP &&
2694 RPI.Reg2 == AArch64::LR)))
2696
2697 RegPairs.push_back(RPI);
2698 if (RPI.isPaired())
2699 i += RegInc;
2700 }
2701 if (NeedsWinCFI) {
2702 // If we need an alignment gap in the stack, align the topmost stack
2703 // object. A stack frame with a gap looks like this, bottom up:
2704 // x19, d8. d9, gap.
2705 // Set extra alignment on the topmost stack object (the first element in
2706 // CSI, which goes top down), to create the gap above it.
2707 if (AFI->hasCalleeSaveStackFreeSpace())
2708 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
2709 // We iterated bottom up over the registers; flip RegPairs back to top
2710 // down order.
2711 std::reverse(RegPairs.begin(), RegPairs.end());
2712 }
2713}
2714
2715 bool AArch64FrameLowering::spillCalleeSavedRegisters(
2716 MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
2717 ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
2718 MachineFunction &MF = *MBB.getParent();
2719 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
2720 bool NeedsWinCFI = needsWinCFI(MF);
2721 DebugLoc DL;
2722 SmallVector<RegPairInfo, 8> RegPairs;
2723
2724 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
2725
2726 const MachineRegisterInfo &MRI = MF.getRegInfo();
2727 if (homogeneousPrologEpilog(MF)) {
2728 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
2729 .setMIFlag(MachineInstr::FrameSetup);
2730
2731 for (auto &RPI : RegPairs) {
2732 MIB.addReg(RPI.Reg1);
2733 MIB.addReg(RPI.Reg2);
2734
2735 // Update register live in.
2736 if (!MRI.isReserved(RPI.Reg1))
2737 MBB.addLiveIn(RPI.Reg1);
2738 if (!MRI.isReserved(RPI.Reg2))
2739 MBB.addLiveIn(RPI.Reg2);
2740 }
2741 return true;
2742 }
2743 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
2744 unsigned Reg1 = RPI.Reg1;
2745 unsigned Reg2 = RPI.Reg2;
2746 unsigned StrOpc;
2747
2748 // Issue sequence of spills for cs regs. The first spill may be converted
2749 // to a pre-decrement store later by emitPrologue if the callee-save stack
2750 // area allocation can't be combined with the local stack area allocation.
2751 // For example:
2752 // stp x22, x21, [sp, #0] // addImm(+0)
2753 // stp x20, x19, [sp, #16] // addImm(+2)
2754 // stp fp, lr, [sp, #32] // addImm(+4)
2755 // Rationale: This sequence saves uop updates compared to a sequence of
2756 // pre-increment spills like stp xi,xj,[sp,#-16]!
2757 // Note: Similar rationale and sequence for restores in epilog.
2758 unsigned Size;
2759 Align Alignment;
2760 switch (RPI.Type) {
2761 case RegPairInfo::GPR:
2762 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
2763 Size = 8;
2764 Alignment = Align(8);
2765 break;
2766 case RegPairInfo::FPR64:
2767 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
2768 Size = 8;
2769 Alignment = Align(8);
2770 break;
2771 case RegPairInfo::FPR128:
2772 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
2773 Size = 16;
2774 Alignment = Align(16);
2775 break;
2776 case RegPairInfo::ZPR:
2777 StrOpc = AArch64::STR_ZXI;
2778 Size = 16;
2779 Alignment = Align(16);
2780 break;
2781 case RegPairInfo::PPR:
2782 StrOpc = AArch64::STR_PXI;
2783 Size = 2;
2784 Alignment = Align(2);
2785 break;
2786 }
2787 LLVM_DEBUG(dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
2788 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
2789 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2790 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
2791 dbgs() << ")\n");
2792
2793 assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
2794 "Windows unwdinding requires a consecutive (FP,LR) pair");
2795 // Windows unwind codes require consecutive registers if registers are
2796 // paired. Make the switch here, so that the code below will save (x,x+1)
2797 // and not (x+1,x).
2798 unsigned FrameIdxReg1 = RPI.FrameIdx;
2799 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2800 if (NeedsWinCFI && RPI.isPaired()) {
2801 std::swap(Reg1, Reg2);
2802 std::swap(FrameIdxReg1, FrameIdxReg2);
2803 }
2804 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2805 if (!MRI.isReserved(Reg1))
2806 MBB.addLiveIn(Reg1);
2807 if (RPI.isPaired()) {
2808 if (!MRI.isReserved(Reg2))
2809 MBB.addLiveIn(Reg2);
2810 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
2811 MIB.addMemOperand(MF.getMachineMemOperand(
2812 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2813 MachineMemOperand::MOStore, Size, Alignment));
2814 }
2815 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
2816 .addReg(AArch64::SP)
2817 .addImm(RPI.Offset) // [sp, #offset*scale],
2818 // where factor*scale is implicit
2819 .setMIFlag(MachineInstr::FrameSetup);
2820 MIB.addMemOperand(MF.getMachineMemOperand(
2821 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2822 MachineMemOperand::MOStore, Size, Alignment));
2823 if (NeedsWinCFI)
2824 InsertSEH(MIB, TII, MachineInstr::FrameSetup);
2825
2826 // Update the StackIDs of the SVE stack slots.
2827 MachineFrameInfo &MFI = MF.getFrameInfo();
2828 if (RPI.Type == RegPairInfo::ZPR || RPI.Type == RegPairInfo::PPR)
2829 MFI.setStackID(RPI.FrameIdx, TargetStackID::ScalableVector);
2830
2831 }
2832 return true;
2833}
2834
2835 bool AArch64FrameLowering::restoreCalleeSavedRegisters(
2836 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
2837 MutableArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
2838 MachineFunction &MF = *MBB.getParent();
2839 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
2840 DebugLoc DL;
2841 SmallVector<RegPairInfo, 8> RegPairs;
2842 bool NeedsWinCFI = needsWinCFI(MF);
2843
2844 if (MBBI != MBB.end())
2845 DL = MBBI->getDebugLoc();
2846
2847 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
2848
2849 auto EmitMI = [&](const RegPairInfo &RPI) -> MachineBasicBlock::iterator {
2850 unsigned Reg1 = RPI.Reg1;
2851 unsigned Reg2 = RPI.Reg2;
2852
2853 // Issue sequence of restores for cs regs. The last restore may be converted
2854 // to a post-increment load later by emitEpilogue if the callee-save stack
2855 // area allocation can't be combined with the local stack area allocation.
2856 // For example:
2857 // ldp fp, lr, [sp, #32] // addImm(+4)
2858 // ldp x20, x19, [sp, #16] // addImm(+2)
2859 // ldp x22, x21, [sp, #0] // addImm(+0)
2860 // Note: see comment in spillCalleeSavedRegisters()
2861 unsigned LdrOpc;
2862 unsigned Size;
2863 Align Alignment;
2864 switch (RPI.Type) {
2865 case RegPairInfo::GPR:
2866 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
2867 Size = 8;
2868 Alignment = Align(8);
2869 break;
2870 case RegPairInfo::FPR64:
2871 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
2872 Size = 8;
2873 Alignment = Align(8);
2874 break;
2875 case RegPairInfo::FPR128:
2876 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
2877 Size = 16;
2878 Alignment = Align(16);
2879 break;
2880 case RegPairInfo::ZPR:
2881 LdrOpc = AArch64::LDR_ZXI;
2882 Size = 16;
2883 Alignment = Align(16);
2884 break;
2885 case RegPairInfo::PPR:
2886 LdrOpc = AArch64::LDR_PXI;
2887 Size = 2;
2888 Alignment = Align(2);
2889 break;
2890 }
2891 LLVM_DEBUG(dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
2892 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
2893 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2894 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
2895 dbgs() << ")\n");
2896
2897 // Windows unwind codes require consecutive registers if registers are
2898 // paired. Make the switch here, so that the code below will save (x,x+1)
2899 // and not (x+1,x).
2900 unsigned FrameIdxReg1 = RPI.FrameIdx;
2901 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2902 if (NeedsWinCFI && RPI.isPaired()) {
2903 std::swap(Reg1, Reg2);
2904 std::swap(FrameIdxReg1, FrameIdxReg2);
2905 }
2906 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2907 if (RPI.isPaired()) {
2908 MIB.addReg(Reg2, getDefRegState(true));
2909 MIB.addMemOperand(MF.getMachineMemOperand(
2910 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2911 MachineMemOperand::MOLoad, Size, Alignment));
2912 }
2913 MIB.addReg(Reg1, getDefRegState(true))
2914 .addReg(AArch64::SP)
2915 .addImm(RPI.Offset) // [sp, #offset*scale]
2916 // where factor*scale is implicit
2917 .setMIFlag(MachineInstr::FrameDestroy);
2918 MIB.addMemOperand(MF.getMachineMemOperand(
2919 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2920 MachineMemOperand::MOLoad, Size, Alignment));
2921 if (NeedsWinCFI)
2922 InsertSEH(MIB, TII, MachineInstr::FrameDestroy);
2923
2924 return MIB->getIterator();
2925 };
2926
2927 // SVE objects are always restored in reverse order.
2928 for (const RegPairInfo &RPI : reverse(RegPairs))
2929 if (RPI.isScalable())
2930 EmitMI(RPI);
2931
2932 if (homogeneousPrologEpilog(MF, &MBB)) {
2933 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
2934 .setMIFlag(MachineInstr::FrameDestroy);
2935 for (auto &RPI : RegPairs) {
2936 MIB.addReg(RPI.Reg1, RegState::Define);
2937 MIB.addReg(RPI.Reg2, RegState::Define);
2938 }
2939 return true;
2940 }
2941
2942 if (ReverseCSRRestoreOrder) {
2943 MachineBasicBlock::iterator First = MBB.end();
2944 for (const RegPairInfo &RPI : reverse(RegPairs)) {
2945 if (RPI.isScalable())
2946 continue;
2947 MachineBasicBlock::iterator It = EmitMI(RPI);
2948 if (First == MBB.end())
2949 First = It;
2950 }
2951 if (First != MBB.end())
2952 MBB.splice(MBBI, &MBB, First);
2953 } else {
2954 for (const RegPairInfo &RPI : RegPairs) {
2955 if (RPI.isScalable())
2956 continue;
2957 (void)EmitMI(RPI);
2958 }
2959 }
2960
2961 return true;
2962}
2963
2964 void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
2965 BitVector &SavedRegs,
2966 RegScavenger *RS) const {
2967 // All calls are tail calls in GHC calling conv, and functions have no
2968 // prologue/epilogue.
2969 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
2970 return;
2971
2972 TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);
2973 const AArch64RegisterInfo *RegInfo = static_cast<const AArch64RegisterInfo *>(
2974 MF.getSubtarget().getRegisterInfo());
2975 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2976 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2977 unsigned UnspilledCSGPR = AArch64::NoRegister;
2978 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
2979
2980 MachineFrameInfo &MFI = MF.getFrameInfo();
2981 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
2982
2983 unsigned BasePointerReg = RegInfo->hasBasePointer(MF)
2984 ? RegInfo->getBaseRegister()
2985 : (unsigned)AArch64::NoRegister;
2986
2987 unsigned ExtraCSSpill = 0;
2988 // Figure out which callee-saved registers to save/restore.
2989 for (unsigned i = 0; CSRegs[i]; ++i) {
2990 const unsigned Reg = CSRegs[i];
2991
2992 // Add the base pointer register to SavedRegs if it is callee-save.
2993 if (Reg == BasePointerReg)
2994 SavedRegs.set(Reg);
2995
2996 bool RegUsed = SavedRegs.test(Reg);
2997 unsigned PairedReg = AArch64::NoRegister;
2998 if (AArch64::GPR64RegClass.contains(Reg) ||
2999 AArch64::FPR64RegClass.contains(Reg) ||
3000 AArch64::FPR128RegClass.contains(Reg))
3001 PairedReg = CSRegs[i ^ 1];
3002
3003 if (!RegUsed) {
3004 if (AArch64::GPR64RegClass.contains(Reg) &&
3005 !RegInfo->isReservedReg(MF, Reg)) {
3006 UnspilledCSGPR = Reg;
3007 UnspilledCSGPRPaired = PairedReg;
3008 }
3009 continue;
3010 }
3011
3012 // MachO's compact unwind format relies on all registers being stored in
3013 // pairs.
3014 // FIXME: the usual format is actually better if unwinding isn't needed.
3015 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
3016 !SavedRegs.test(PairedReg)) {
3017 SavedRegs.set(PairedReg);
3018 if (AArch64::GPR64RegClass.contains(PairedReg) &&
3019 !RegInfo->isReservedReg(MF, PairedReg))
3020 ExtraCSSpill = PairedReg;
3021 }
3022 }
3023
3024 if (MF.getFunction().getCallingConv() == CallingConv::Win64 &&
3025 !Subtarget.isTargetWindows()) {
3026 // For Windows calling convention on a non-windows OS, where X18 is treated
3027 // as reserved, back up X18 when entering non-windows code (marked with the
3028 // Windows calling convention) and restore when returning regardless of
3029 // whether the individual function uses it - it might call other functions
3030 // that clobber it.
3031 SavedRegs.set(AArch64::X18);
3032 }
3033
3034 // Calculates the callee saved stack size.
3035 unsigned CSStackSize = 0;
3036 unsigned SVECSStackSize = 0;
3037 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
3038 const MachineRegisterInfo &MRI = MF.getRegInfo();
3039 for (unsigned Reg : SavedRegs.set_bits()) {
3040 auto RegSize = TRI->getRegSizeInBits(Reg, MRI) / 8;
3041 if (AArch64::PPRRegClass.contains(Reg) ||
3042 AArch64::ZPRRegClass.contains(Reg))
3043 SVECSStackSize += RegSize;
3044 else
3045 CSStackSize += RegSize;
3046 }
3047
3048 // Save number of saved regs, so we can easily update CSStackSize later.
3049 unsigned NumSavedRegs = SavedRegs.count();
3050
3051 // The frame record needs to be created by saving the appropriate registers
3052 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
3053 if (hasFP(MF) ||
3054 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
3055 SavedRegs.set(AArch64::FP);
3056 SavedRegs.set(AArch64::LR);
3057 }
3058
3059 LLVM_DEBUG(dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
3060 for (unsigned Reg
3061 : SavedRegs.set_bits()) dbgs()
3062 << ' ' << printReg(Reg, RegInfo);
3063 dbgs() << "\n";);
3064
3065 // If any callee-saved registers are used, the frame cannot be eliminated.
3066 int64_t SVEStackSize =
3067 alignTo(SVECSStackSize + estimateSVEStackObjectOffsets(MFI), 16);
3068 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
3069
3070 // The CSR spill slots have not been allocated yet, so estimateStackSize
3071 // won't include them.
3072 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
3073
3074 // We may address some of the stack above the canonical frame address, either
3075 // for our own arguments or during a call. Include that in calculating whether
3076 // we have complicated addressing concerns.
3077 int64_t CalleeStackUsed = 0;
3078 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
3079 int64_t FixedOff = MFI.getObjectOffset(I);
3080 if (FixedOff > CalleeStackUsed) CalleeStackUsed = FixedOff;
3081 }
3082
3083 // Conservatively always assume BigStack when there are SVE spills.
3084 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
3085 CalleeStackUsed) > EstimatedStackSizeLimit;
3086 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
3087 AFI->setHasStackFrame(true);
3088
3089 // Estimate if we might need to scavenge a register at some point in order
3090 // to materialize a stack offset. If so, either spill one additional
3091 // callee-saved register or reserve a special spill slot to facilitate
3092 // register scavenging. If we already spilled an extra callee-saved register
3093 // above to keep the number of spills even, we don't need to do anything else
3094 // here.
3095 if (BigStack) {
3096 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
3097 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
3098 << " to get a scratch register.\n");
3099 SavedRegs.set(UnspilledCSGPR);
3100 // MachO's compact unwind format relies on all registers being stored in
3101 // pairs, so if we need to spill one extra for BigStack, then we need to
3102 // store the pair.
3103 if (producePairRegisters(MF))
3104 SavedRegs.set(UnspilledCSGPRPaired);
3105 ExtraCSSpill = UnspilledCSGPR;
3106 }
3107
3108 // If we didn't find an extra callee-saved register to spill, create
3109 // an emergency spill slot.
3110 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
3111 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
3112 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
3113 unsigned Size = TRI->getSpillSize(RC);
3114 Align Alignment = TRI->getSpillAlign(RC);
3115 int FI = MFI.CreateStackObject(Size, Alignment, false);
3116 RS->addScavengingFrameIndex(FI);
3117 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
3118 << " as the emergency spill slot.\n");
3119 }
3120 }
3121
3122 // Adding the size of additional 64bit GPR saves.
3123 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
3124
3125 // A Swift asynchronous context extends the frame record with a pointer
3126 // directly before FP.
3127 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
3128 CSStackSize += 8;
3129
3130 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
3131 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
3132 << EstimatedStackSize + AlignedCSStackSize
3133 << " bytes.\n");
3134
3135 assert((!MFI.isCalleeSavedInfoValid() ||
3136 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
3137 "Should not invalidate callee saved info");
3138
3139 // Round up to register pair alignment to avoid additional SP adjustment
3140 // instructions.
3141 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
3142 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
3143 AFI->setSVECalleeSavedStackSize(alignTo(SVECSStackSize, 16));
3144}
3145
3146 bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
3147 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
3148 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
3149 unsigned &MaxCSFrameIndex) const {
3150 bool NeedsWinCFI = needsWinCFI(MF);
3151 // To match the canonical windows frame layout, reverse the list of
3152 // callee saved registers to get them laid out by PrologEpilogInserter
3153 // in the right order. (PrologEpilogInserter allocates stack objects top
3154 // down. Windows canonical prologs store higher numbered registers at
3155 // the top, thus have the CSI array start from the highest registers.)
3156 if (NeedsWinCFI)
3157 std::reverse(CSI.begin(), CSI.end());
3158
3159 if (CSI.empty())
3160 return true; // Early exit if no callee saved registers are modified!
3161
3162 // Now that we know which registers need to be saved and restored, allocate
3163 // stack slots for them.
3164 MachineFrameInfo &MFI = MF.getFrameInfo();
3165 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3166
3167 bool UsesWinAAPCS = isTargetWindows(MF);
3168 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
3169 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
3170 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3171 if ((unsigned)FrameIdx < MinCSFrameIndex) MinCSFrameIndex = FrameIdx;
3172 if ((unsigned)FrameIdx > MaxCSFrameIndex) MaxCSFrameIndex = FrameIdx;
3173 }
3174
3175 for (auto &CS : CSI) {
3176 Register Reg = CS.getReg();
3177 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
3178
3179 unsigned Size = RegInfo->getSpillSize(*RC);
3180 Align Alignment(RegInfo->getSpillAlign(*RC));
3181 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
3182 CS.setFrameIdx(FrameIdx);
3183
3184 if ((unsigned)FrameIdx < MinCSFrameIndex) MinCSFrameIndex = FrameIdx;
3185 if ((unsigned)FrameIdx > MaxCSFrameIndex) MaxCSFrameIndex = FrameIdx;
3186
3187 // Grab 8 bytes below FP for the extended asynchronous frame info.
3188 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
3189 Reg == AArch64::FP) {
3190 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
3191 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3192 if ((unsigned)FrameIdx < MinCSFrameIndex) MinCSFrameIndex = FrameIdx;
3193 if ((unsigned)FrameIdx > MaxCSFrameIndex) MaxCSFrameIndex = FrameIdx;
3194 }
3195 }
3196 return true;
3197}
3198
3199 bool AArch64FrameLowering::enableStackSlotScavenging(
3200 const MachineFunction &MF) const {
3201 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3202 // If the function has streaming-mode changes, don't scavenge a
3203 // spillslot in the callee-save area, as that might require an
3204 // 'addvl' in the streaming-mode-changing call-sequence when the
3205 // function doesn't use a FP.
3206 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
3207 return false;
3208 return AFI->hasCalleeSaveStackFreeSpace();
3209}
3210
3211 /// Returns true if there are any SVE callee saves.
3212 static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI,
3213 int &Min, int &Max) {
3214 Min = std::numeric_limits<int>::max();
3215 Max = std::numeric_limits<int>::min();
3216
3217 if (!MFI.isCalleeSavedInfoValid())
3218 return false;
3219
3220 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
3221 for (auto &CS : CSI) {
3222 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
3223 AArch64::PPRRegClass.contains(CS.getReg())) {
3224 assert((Max == std::numeric_limits<int>::min() ||
3225 Max + 1 == CS.getFrameIdx()) &&
3226 "SVE CalleeSaves are not consecutive");
3227
3228 Min = std::min(Min, CS.getFrameIdx());
3229 Max = std::max(Max, CS.getFrameIdx());
3230 }
3231 }
3232 return Min != std::numeric_limits<int>::max();
3233}
3234
3235// Process all the SVE stack objects and determine offsets for each
3236// object. If AssignOffsets is true, the offsets get assigned.
3237// Fills in the first and last callee-saved frame indices into
3238// Min/MaxCSFrameIndex, respectively.
3239// Returns the size of the stack.
3240 static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI,
3241 int &MinCSFrameIndex,
3242 int &MaxCSFrameIndex,
3243 bool AssignOffsets) {
3244#ifndef NDEBUG
3245 // First process all fixed stack objects.
3246 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
3247 assert(MFI.getStackID(I) != TargetStackID::ScalableVector &&
3248 "SVE vectors should never be passed on the stack by value, only by "
3249 "reference.");
3250#endif
3251
3252 auto Assign = [&MFI](int FI, int64_t Offset) {
3253 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
3254 MFI.setObjectOffset(FI, Offset);
3255 };
3256
3257 int64_t Offset = 0;
3258
3259 // Then process all callee saved slots.
3260 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
3261 // Assign offsets to the callee save slots.
3262 for (int I = MinCSFrameIndex; I <= MaxCSFrameIndex; ++I) {
3263 Offset += MFI.getObjectSize(I);
3264 Offset = alignTo(Offset, MFI.getObjectAlign(I));
3265 if (AssignOffsets)
3266 Assign(I, -Offset);
3267 }
3268 }
3269
3270 // Ensure that the callee-save area is aligned to 16 bytes.
3271 Offset = alignTo(Offset, Align(16U));
3272
3273 // Create a buffer of SVE objects to allocate and sort it.
3274 SmallVector<int, 8> ObjectsToAllocate;
3275 // If we have a stack protector, and we've previously decided that we have SVE
3276 // objects on the stack and thus need it to go in the SVE stack area, then it
3277 // needs to go first.
3278 int StackProtectorFI = -1;
3279 if (MFI.hasStackProtectorIndex()) {
3280 StackProtectorFI = MFI.getStackProtectorIndex();
3281 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
3282 ObjectsToAllocate.push_back(StackProtectorFI);
3283 }
3284 for (int I = 0, E = MFI.getObjectIndexEnd(); I != E; ++I) {
3285 unsigned StackID = MFI.getStackID(I);
3286 if (StackID != TargetStackID::ScalableVector)
3287 continue;
3288 if (I == StackProtectorFI)
3289 continue;
3290 if (MaxCSFrameIndex >= I && I >= MinCSFrameIndex)
3291 continue;
3292 if (MFI.isDeadObjectIndex(I))
3293 continue;
3294
3295 ObjectsToAllocate.push_back(I);
3296 }
3297
3298 // Allocate all SVE locals and spills
3299 for (unsigned FI : ObjectsToAllocate) {
3300 Align Alignment = MFI.getObjectAlign(FI);
3301 // FIXME: Given that the length of SVE vectors is not necessarily a power of
3302 // two, we'd need to align every object dynamically at runtime if the
3303 // alignment is larger than 16. This is not yet supported.
3304 if (Alignment > Align(16))
3305 report_fatal_error(
3306 "Alignment of scalable vectors > 16 bytes is not yet supported");
3307
3308 Offset = alignTo(Offset + MFI.getObjectSize(FI), Alignment);
3309 if (AssignOffsets)
3310 Assign(FI, -Offset);
3311 }
3312
3313 return Offset;
3314}
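// Illustrative run, assuming two 16-byte ZPR callee-save slots and one
// 16-byte SVE local: the callee saves get offsets -16 and -32, the CS area
// is already 16-byte aligned, and the local lands at -48 (all scalable
// bytes, scaled by the runtime vector length).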
3315
3316int64_t AArch64FrameLowering::estimateSVEStackObjectOffsets(
3317 MachineFrameInfo &MFI) const {
3318 int MinCSFrameIndex, MaxCSFrameIndex;
3319 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex, false);
3320}
3321
3322int64_t AArch64FrameLowering::assignSVEStackObjectOffsets(
3323 MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex) const {
3324 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex,
3325 true);
3326}
3327
3328void AArch64FrameLowering::processFunctionBeforeFrameFinalized(
3329 MachineFunction &MF, RegScavenger *RS) const {
3330 MachineFrameInfo &MFI = MF.getFrameInfo();
3331
3333 "Upwards growing stack unsupported");
3334
3335 int MinCSFrameIndex, MaxCSFrameIndex;
3336 int64_t SVEStackSize =
3337 assignSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex);
3338
3339 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3340 AFI->setStackSizeSVE(alignTo(SVEStackSize, 16U));
3341 AFI->setMinMaxSVECSFrameIndex(MinCSFrameIndex, MaxCSFrameIndex);
3342
3343 // If this function isn't doing Win64-style C++ EH, we don't need to do
3344 // anything.
3345 if (!MF.hasEHFunclets())
3346 return;
3347 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
3348 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
3349
3350 MachineBasicBlock &MBB = MF.front();
3351 auto MBBI = MBB.begin();
3352 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
3353 ++MBBI;
3354
3355 // Create an UnwindHelp object.
3356 // The UnwindHelp object is allocated at the start of the fixed object area
3357 int64_t FixedObject =
3358 getFixedObjectSize(MF, AFI, /*IsWin64*/ true, /*IsFunclet*/ false);
3359 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8,
3360 /*SPOffset*/ -FixedObject,
3361 /*IsImmutable=*/false);
3362 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
3363
3364 // We need to store -2 into the UnwindHelp object at the start of the
3365 // function.
3366 DebugLoc DL;
3367 RS->enterBasicBlockEnd(MBB);
3368 RS->backward(std::prev(MBBI));
3369 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
3370 assert(DstReg && "There must be a free register after frame setup");
3371 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
3372 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
3373 .addReg(DstReg, getKillRegState(true))
3374 .addFrameIndex(UnwindHelpFI)
3375 .addImm(0);
3376}
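// For illustration (a sketch, not exact compiler output): on a Win64 C++ EH
// function the two BuildMI calls above typically expand to something like
//   mov  x8, #-2
//   stur x8, [sp/fp, #<UnwindHelp offset>]
// placed right after the last FrameSetup instruction.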
3377
3378namespace {
3379struct TagStoreInstr {
3380 MachineInstr *MI;
3381 int64_t Offset, Size;
3382 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
3383 : MI(MI), Offset(Offset), Size(Size) {}
3384};
3385
3386class TagStoreEdit {
3387 MachineFunction *MF;
3388 MachineBasicBlock *MBB;
3389 MachineRegisterInfo *MRI;
3390 // Tag store instructions that are being replaced.
3391 SmallVector<TagStoreInstr, 8> TagStores;
3392 // Combined memref arguments of the above instructions.
3393 SmallVector<MachineMemOperand *, 8> CombinedMemRefs;
3394
3395 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
3396 // FrameRegOffset + Size) with the address tag of SP.
3397 Register FrameReg;
3398 StackOffset FrameRegOffset;
3399 int64_t Size;
3400 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
3401 // end.
3402 std::optional<int64_t> FrameRegUpdate;
3403 // MIFlags for any FrameReg updating instructions.
3404 unsigned FrameRegUpdateFlags;
3405
3406 // Use zeroing instruction variants.
3407 bool ZeroData;
3408 DebugLoc DL;
3409
3410 void emitUnrolled(MachineBasicBlock::iterator InsertI);
3411 void emitLoop(MachineBasicBlock::iterator InsertI);
3412
3413public:
3414 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
3415 : MBB(MBB), ZeroData(ZeroData) {
3416 MF = MBB->getParent();
3417 MRI = &MF->getRegInfo();
3418 }
3419 // Add an instruction to be replaced. Instructions must be added in
3420 // ascending order of Offset and must be adjacent.
3421 void addInstruction(TagStoreInstr I) {
3422 assert((TagStores.empty() ||
3423 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
3424 "Non-adjacent tag store instructions.");
3425 TagStores.push_back(I);
3426 }
3427 void clear() { TagStores.clear(); }
3428 // Emit equivalent code at the given location, and erase the current set of
3429 // instructions. May skip if the replacement is not profitable. May invalidate
3430 // the input iterator and replace it with a valid one.
3431 void emitCode(MachineBasicBlock::iterator &InsertI,
3432 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
3433};
3434
3435void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
3436 const AArch64InstrInfo *TII =
3437 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3438
3439 const int64_t kMinOffset = -256 * 16;
3440 const int64_t kMaxOffset = 255 * 16;
3441
3442 Register BaseReg = FrameReg;
3443 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
3444 if (BaseRegOffsetBytes < kMinOffset ||
3445 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
3446 // BaseReg can be FP, which is not necessarily aligned to 16 bytes. In
3447 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
3448 // is required for the offset of ST2G.
3449 BaseRegOffsetBytes % 16 != 0) {
3450 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3451 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
3452 StackOffset::getFixed(BaseRegOffsetBytes), TII);
3453 BaseReg = ScratchReg;
3454 BaseRegOffsetBytes = 0;
3455 }
3456
3457 MachineInstr *LastI = nullptr;
3458 while (Size) {
3459 int64_t InstrSize = (Size > 16) ? 32 : 16;
3460 unsigned Opcode =
3461 InstrSize == 16
3462 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
3463 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
3464 assert(BaseRegOffsetBytes % 16 == 0);
3465 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
3466 .addReg(AArch64::SP)
3467 .addReg(BaseReg)
3468 .addImm(BaseRegOffsetBytes / 16)
3469 .setMemRefs(CombinedMemRefs);
3470 // A store to [BaseReg, #0] should go last for an opportunity to fold the
3471 // final SP adjustment in the epilogue.
3472 if (BaseRegOffsetBytes == 0)
3473 LastI = I;
3474 BaseRegOffsetBytes += InstrSize;
3475 Size -= InstrSize;
3476 }
3477
3478 if (LastI)
3479 MBB->splice(InsertI, MBB, LastI);
3480}
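// A minimal standalone sketch of the STG/ST2G selection loop above, under
// the assumption of a 16-byte-aligned base and no zeroing; illustrative
// only, not part of this file.
#if 0
#include <cstdint>
#include <cstdio>

int main() {
  int64_t Size = 80, OffsetBytes = 0; // Hypothetical 80-byte tagged region.
  while (Size) {
    // Prefer the 32-byte ST2G while more than one 16-byte granule remains.
    int64_t InstrSize = (Size > 16) ? 32 : 16;
    std::printf("%s [base, #%lld]\n", InstrSize == 16 ? "STG" : "ST2G",
                (long long)OffsetBytes);
    OffsetBytes += InstrSize;
    Size -= InstrSize;
  }
  // Prints ST2G #0, ST2G #32, STG #64; the hardware immediate is the byte
  // offset divided by 16, exactly as in the BuildMI call above.
  return 0;
}
#endif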
3481
3482void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
3483 const AArch64InstrInfo *TII =
3484 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3485
3486 Register BaseReg = FrameRegUpdate
3487 ? FrameReg
3488 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3489 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3490
3491 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
3492
3493 int64_t LoopSize = Size;
3494 // If the loop size is not a multiple of 32, split off one 16-byte store at
3495 // the end, into which the BaseReg update can be folded.
3496 if (FrameRegUpdate && *FrameRegUpdate)
3497 LoopSize -= LoopSize % 32;
3498 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
3499 TII->get(ZeroData ? AArch64::STZGloop_wback
3500 : AArch64::STGloop_wback))
3501 .addDef(SizeReg)
3502 .addDef(BaseReg)
3503 .addImm(LoopSize)
3504 .addReg(BaseReg)
3505 .setMemRefs(CombinedMemRefs);
3506 if (FrameRegUpdate)
3507 LoopI->setFlags(FrameRegUpdateFlags);
3508
3509 int64_t ExtraBaseRegUpdate =
3510 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
3511 if (LoopSize < Size) {
3512 assert(FrameRegUpdate);
3513 assert(Size - LoopSize == 16);
3514 // Tag 16 more bytes at BaseReg and update BaseReg.
3515 BuildMI(*MBB, InsertI, DL,
3516 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
3517 .addDef(BaseReg)
3518 .addReg(BaseReg)
3519 .addReg(BaseReg)
3520 .addImm(1 + ExtraBaseRegUpdate / 16)
3521 .setMemRefs(CombinedMemRefs)
3522 .setMIFlags(FrameRegUpdateFlags);
3523 } else if (ExtraBaseRegUpdate) {
3524 // Update BaseReg.
3525 BuildMI(
3526 *MBB, InsertI, DL,
3527 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
3528 .addDef(BaseReg)
3529 .addReg(BaseReg)
3530 .addImm(std::abs(ExtraBaseRegUpdate))
3531 .addImm(0)
3532 .setMIFlags(FrameRegUpdateFlags);
3533 }
3534}
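// A minimal standalone check of the post-index arithmetic above, with
// hypothetical values for Size and FrameRegUpdate; illustrative only,
// not part of this file.
#if 0
#include <cassert>
#include <cstdint>

int main() {
  int64_t Size = 48;                   // Bytes to tag.
  int64_t LoopSize = Size - Size % 32; // 32 bytes covered by the loop.
  int64_t FrameRegOffsetFixed = 0;     // FrameReg already points at the region.
  int64_t FrameRegUpdate = 64;         // Desired total FrameReg adjustment.
  int64_t ExtraBaseRegUpdate =
      FrameRegUpdate - FrameRegOffsetFixed - Size; // 16
  // The trailing STGPostIndex tags the last 16 bytes and bumps BaseReg by
  // (1 + ExtraBaseRegUpdate / 16) * 16 bytes, folding in the leftover update.
  int64_t PostIndexImm = 1 + ExtraBaseRegUpdate / 16; // 2, i.e. +32 bytes
  assert(Size - LoopSize == 16);
  assert(FrameRegOffsetFixed + LoopSize + PostIndexImm * 16 == FrameRegUpdate);
  return 0;
}
#endif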
3535
3536// Check if *II is a register update that can be merged into STGloop that ends
3537// at (Reg + Size). If merging is possible, *TotalOffset is set to the full
3538// adjustment of the merged ADD/SUB instruction.
3539bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
3540 int64_t Size, int64_t *TotalOffset) {
3541 MachineInstr &MI = *II;
3542 if ((MI.getOpcode() == AArch64::ADDXri ||
3543 MI.getOpcode() == AArch64::SUBXri) &&
3544 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
3545 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
3546 int64_t Offset = MI.getOperand(2).getImm() << Shift;
3547 if (MI.getOpcode() == AArch64::SUBXri)
3548 Offset = -Offset;
3549 int64_t AbsPostOffset = std::abs(Offset - Size);
3550 const int64_t kMaxOffset =
3551 0xFFF; // Max encoding for unshifted ADDXri / SUBXri
3552 if (AbsPostOffset <= kMaxOffset && AbsPostOffset % 16 == 0) {
3553 *TotalOffset = Offset;
3554 return true;
3555 }
3556 }
3557 return false;
3558}
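// A minimal standalone sketch of the profitability test above (canFold is a
// hypothetical stand-in): a following ADD/SUB of Reg can be folded iff the
// leftover adjustment after tagging Size bytes is 16-byte aligned and
// encodable as an unshifted 12-bit immediate. Illustrative only.
#if 0
#include <cstdint>
#include <cstdio>

static bool canFold(int64_t AddImm, int64_t Size, int64_t *TotalOffset) {
  const int64_t kMaxOffset = 0xFFF; // Max unshifted ADDXri/SUBXri immediate.
  int64_t AbsPost = AddImm - Size;
  if (AbsPost < 0)
    AbsPost = -AbsPost;
  if (AbsPost <= kMaxOffset && AbsPost % 16 == 0) {
    *TotalOffset = AddImm;
    return true;
  }
  return false;
}

int main() {
  int64_t Total;
  std::printf("%d\n", canFold(/*AddImm=*/96, /*Size=*/64, &Total)); // 1
  std::printf("%d\n", canFold(/*AddImm=*/72, /*Size=*/64, &Total)); // 0: 8 % 16 != 0
  return 0;
}
#endif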
3559
3560void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
3561 SmallVectorImpl<MachineMemOperand *> &MemRefs) {
3562 MemRefs.clear();
3563 for (auto &TS : TSE) {
3564 MachineInstr *MI = TS.MI;
3565 // An instruction without memory operands may access anything. Be
3566 // conservative and return an empty list.
3567 if (MI->memoperands_empty()) {
3568 MemRefs.clear();
3569 return;
3570 }
3571 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
3572 }
3573}
3574
3575void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
3576 const AArch64FrameLowering *TFI,
3577 bool TryMergeSPUpdate) {
3578 if (TagStores.empty())
3579 return;
3580 TagStoreInstr &FirstTagStore = TagStores[0];
3581 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
3582 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
3583 DL = TagStores[0].MI->getDebugLoc();
3584
3585 Register Reg;
3586 FrameRegOffset = TFI->resolveFrameOffsetReference(
3587 *MF, FirstTagStore.Offset, false /*isFixed*/, false /*isSVE*/, Reg,
3588 /*PreferFP=*/false, /*ForSimm=*/true);
3589 FrameReg = Reg;
3590 FrameRegUpdate = std::nullopt;
3591
3592 mergeMemRefs(TagStores, CombinedMemRefs);
3593
3594 LLVM_DEBUG(dbgs() << "Replacing adjacent STG instructions:\n";
3595 for (const auto &Instr
3596 : TagStores) { dbgs() << " " << *Instr.MI; });
3597
3598 // Size threshold where a loop becomes shorter than a linear sequence of
3599 // tagging instructions.
3600 const int kSetTagLoopThreshold = 176;
3601 if (Size < kSetTagLoopThreshold) {
3602 if (TagStores.size() < 2)
3603 return;
3604 emitUnrolled(InsertI);
3605 } else {
3606 MachineInstr *UpdateInstr = nullptr;
3607 int64_t TotalOffset = 0;
3608 if (TryMergeSPUpdate) {
3609 // See if we can merge base register update into the STGloop.
3610 // This is done in AArch64LoadStoreOptimizer for "normal" stores,
3611 // but STGloop is way too unusual for that, and also it only
3612 // realistically happens in function epilogue. Also, STGloop is expanded
3613 // before that pass.
3614 if (InsertI != MBB->end() &&
3615 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
3616 &TotalOffset)) {
3617 UpdateInstr = &*InsertI++;
3618 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
3619 << *UpdateInstr);
3620 }
3621 }
3622
3623 if (!UpdateInstr && TagStores.size() < 2)
3624 return;
3625
3626 if (UpdateInstr) {
3627 FrameRegUpdate = TotalOffset;
3628 FrameRegUpdateFlags = UpdateInstr->getFlags();
3629 }
3630 emitLoop(InsertI);
3631 if (UpdateInstr)
3632 UpdateInstr->eraseFromParent();
3633 }
3634
3635 for (auto &TS : TagStores)
3636 TS.MI->eraseFromParent();
3637}
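// For illustration (hypothetical sizes): with Size = 96 and at least two
// collected stores, the unrolled path above emits three ST2G instructions;
// with Size = 256 the loop form is chosen instead, since only regions below
// the 176-byte threshold are unrolled.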
3638
3639bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
3640 int64_t &Size, bool &ZeroData) {
3641 MachineFunction &MF = *MI.getParent()->getParent();
3642 const MachineFrameInfo &MFI = MF.getFrameInfo();
3643
3644 unsigned Opcode = MI.getOpcode();
3645 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
3646 Opcode == AArch64::STZ2Gi);
3647
3648 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
3649 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
3650 return false;
3651 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
3652 return false;
3653 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
3654 Size = MI.getOperand(2).getImm();
3655 return true;
3656 }
3657
3658 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
3659 Size = 16;
3660 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
3661 Size = 32;
3662 else
3663 return false;
3664
3665 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
3666 return false;
3667
3668 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
3669 16 * MI.getOperand(2).getImm();
3670 return true;
3671}
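// For illustration (hypothetical values): an STGi whose frame index resolves
// to object offset -32 and whose immediate is 1 describes the 16 bytes at
// -32 + 16 * 1 = -16, which is the Offset/Size pair returned above.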
3672
3673// Detect a run of memory tagging instructions for adjacent stack frame slots,
3674// and replace them with a shorter instruction sequence:
3675// * replace STG + STG with ST2G
3676// * replace STGloop + STGloop with STGloop
3677// This code needs to run when stack slot offsets are already known, but before
3678// FrameIndex operands in STG instructions are eliminated.
3679MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
3680 const AArch64FrameLowering *TFI,
3681 RegScavenger *RS) {
3682 bool FirstZeroData;
3683 int64_t Size, Offset;
3684 MachineInstr &MI = *II;
3685 MachineBasicBlock *MBB = MI.getParent();
3686 MachineBasicBlock::iterator NextI = ++II;
3687 if (&MI == &MBB->instr_back())
3688 return II;
3689 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
3690 return II;
3691
3692 SmallVector<TagStoreInstr, 8> Instrs;
3693 Instrs.emplace_back(&MI, Offset, Size);
3694
3695 constexpr int kScanLimit = 10;
3696 int Count = 0;
3697 for (MachineBasicBlock::iterator E = MBB->end();
3698 NextI != E && Count < kScanLimit; ++NextI) {
3699 MachineInstr &MI = *NextI;
3700 bool ZeroData;
3701 int64_t Size, Offset;
3702 // Collect instructions that update memory tags with a FrameIndex operand
3703 // and (when applicable) constant size, and whose output registers are dead
3704 // (the latter is almost always the case in practice). Since these
3705 // instructions effectively have no inputs or outputs, we are free to skip
3706 // any non-aliasing instructions in between without tracking used registers.
3707 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
3708 if (ZeroData != FirstZeroData)
3709 break;
3710 Instrs.emplace_back(&MI, Offset, Size);
3711 continue;
3712 }
3713
3714 // Only count non-transient, non-tagging instructions toward the scan
3715 // limit.
3716 if (!MI.isTransient())
3717 ++Count;
3718
3719 // Just in case, stop before the epilogue code starts.
3720 if (MI.getFlag(MachineInstr::FrameSetup) ||
3721 MI.getFlag(MachineInstr::FrameDestroy))
3722 break;
3723
3724 // Reject anything that may alias the collected instructions.
3725 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects())
3726 break;
3727 }
3728
3729 // New code will be inserted after the last tagging instruction we've found.
3730 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
3731 InsertI++;
3732
3733 llvm::stable_sort(Instrs,
3734 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
3735 return Left.Offset < Right.Offset;
3736 });
3737
3738 // Make sure that we don't have any overlapping stores.
3739 int64_t CurOffset = Instrs[0].Offset;
3740 for (auto &Instr : Instrs) {
3741 if (CurOffset > Instr.Offset)
3742 return NextI;
3743 CurOffset = Instr.Offset + Instr.Size;
3744 }
3745
3746 // Find contiguous runs of tagged memory and emit shorter instruction
3747 // sequences for them when possible.
3748 TagStoreEdit TSE(MBB, FirstZeroData);
3749 std::optional<int64_t> EndOffset;
3750 for (auto &Instr : Instrs) {
3751 if (EndOffset && *EndOffset != Instr.Offset) {
3752 // Found a gap.
3753 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
3754 TSE.clear();
3755 }
3756
3757 TSE.addInstruction(Instr);
3758 EndOffset = Instr.Offset + Instr.Size;
3759 }
3760
3761 const MachineFunction *MF = MBB->getParent();
3762 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
3763 TSE.emitCode(
3764 InsertI, TFI, /*TryMergeSPUpdate = */
3765 !MF->getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(*MF));
3766
3767 return InsertI;
3768}
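// For illustration (hypothetical, simplified MIR): two mergeable 16-byte tag
// stores on adjacent slots,
//   STGi $sp, %stack.0, 0
//   STGi $sp, %stack.1, 0
// are collected above and re-emitted by TagStoreEdit as a single
//   ST2Gi $sp, %stack.0, 0
// once their resolved offsets are known to be contiguous.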
3769} // namespace
3770
3771void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
3772 MachineFunction &MF, RegScavenger *RS = nullptr) const {
3773 if (StackTaggingMergeSetTag)
3774 for (auto &BB : MF)
3775 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();)
3776 II = tryMergeAdjacentSTG(II, this, RS);
3777}
3778
3779/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
3780/// before the update. This is easily retrieved as it is exactly the offset
3781/// that is set in processFunctionBeforeFrameFinalized.
3782StackOffset AArch64FrameLowering::getFrameIndexReferencePreferSP(
3783 const MachineFunction &MF, int FI, Register &FrameReg,
3784 bool IgnoreSPUpdates) const {
3785 const MachineFrameInfo &MFI = MF.getFrameInfo();
3786 if (IgnoreSPUpdates) {
3787 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
3788 << MFI.getObjectOffset(FI) << "\n");
3789 FrameReg = AArch64::SP;
3790 return StackOffset::getFixed(MFI.getObjectOffset(FI));
3791 }
3792
3793 // Go to common code if we cannot provide sp + offset.
3794 if (MFI.hasVarSizedObjects() ||
3795 MF.getInfo<AArch64FunctionInfo>()->getStackSizeSVE() ||
3796 MF.getSubtarget().getRegisterInfo()->hasStackRealignment(MF))
3797 return getFrameIndexReference(MF, FI, FrameReg);
3798
3799 FrameReg = AArch64::SP;
3800 return getStackOffset(MF, MFI.getObjectOffset(FI));
3801}
3802
3803/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
3804/// the parent's frame pointer.
3805unsigned AArch64FrameLowering::getWinEHParentFrameOffset(
3806 const MachineFunction &MF) const {
3807 return 0;
3808}
3809
3810/// Funclets only need to account for space for the callee saved registers,
3811/// as the locals are accounted for in the parent's stack frame.
3812unsigned AArch64FrameLowering::getWinEHFuncletFrameSize(
3813 const MachineFunction &MF) const {
3814 // This is the size of the pushed CSRs.
3815 unsigned CSSize =
3816 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
3817 // This is the amount of stack a funclet needs to allocate.
3818 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
3819 getStackAlign());
3820}
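// For illustration (hypothetical numbers): with a 96-byte callee-save area
// and a 24-byte maximum call frame, the funclet frame size above is
// alignTo(96 + 24, 16) = 128 bytes.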
3821
3822namespace {
3823struct FrameObject {
3824 bool IsValid = false;
3825 // Index of the object in MFI.
3826 int ObjectIndex = 0;
3827 // Group ID this object belongs to.
3828 int GroupIndex = -1;
3829 // This object should be placed first (closest to SP).
3830 bool ObjectFirst = false;
3831 // This object's group (which always contains the object with
3832 // ObjectFirst==true) should be placed first.
3833 bool GroupFirst = false;
3834};
3835
3836class GroupBuilder {
3837 SmallVector<int, 8> CurrentMembers;
3838 int NextGroupIndex = 0;
3839 std::vector<FrameObject> &Objects;
3840
3841public:
3842 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
3843 void AddMember(int Index) { CurrentMembers.push_back(Index); }
3844 void EndCurrentGroup() {
3845 if (CurrentMembers.size() > 1) {
3846 // Create a new group with the current member list. This might remove them
3847 // from their pre-existing groups. That's OK, dealing with overlapping
3848 // groups is too hard and unlikely to make a difference.
3849 LLVM_DEBUG(dbgs() << "group:");
3850 for (int Index : CurrentMembers) {
3851 Objects[Index].GroupIndex = NextGroupIndex;
3852 LLVM_DEBUG(dbgs() << " " << Index);
3853 }
3854 LLVM_DEBUG(dbgs() << "\n");
3855 NextGroupIndex++;
3856 }
3857 CurrentMembers.clear();
3858 }
3859};
3860
3861bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
3862 // Objects at a lower index are closer to FP; objects at a higher index are
3863 // closer to SP.
3864 //
3865 // For consistency in our comparison, all invalid objects are placed
3866 // at the end. This also allows us to stop walking when we hit the
3867 // first invalid item after it's all sorted.
3868 //
3869 // The "first" object goes first (closest to SP), followed by the members of
3870 // the "first" group.
3871 //
3872 // The rest are sorted by the group index to keep the groups together.
3873 // Higher numbered groups are more likely to be around longer (i.e. untagged
3874 // in the function epilogue and not at some earlier point). Place them closer
3875 // to SP.
3876 //
3877 // If all else equal, sort by the object index to keep the objects in the
3878 // original order.
3879 return std::make_tuple(!A.IsValid, A.ObjectFirst, A.GroupFirst, A.GroupIndex,
3880 A.ObjectIndex) <
3881 std::make_tuple(!B.IsValid, B.ObjectFirst, B.GroupFirst, B.GroupIndex,
3882 B.ObjectIndex);
3883}
3884} // namespace
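// A minimal standalone sketch of the comparator above on hypothetical
// objects: invalid entries sink to the end, the pinned "first" object sorts
// toward the SP end of the valid range, and groups stay together in group
// order. Illustrative only, not part of this file.
#if 0
#include <algorithm>
#include <cstdio>
#include <tuple>
#include <vector>

struct Obj {
  bool IsValid;
  int ObjectIndex;
  int GroupIndex;
  bool ObjectFirst;
  bool GroupFirst;
};

int main() {
  std::vector<Obj> Objs = {
      {true, 0, 1, false, false},  // grouped slot
      {true, 1, -1, false, false}, // ungrouped slot
      {true, 2, 0, true, true},    // pinned tagged-base-pointer slot
      {false, 3, -1, false, false} // not being allocated
  };
  std::stable_sort(Objs.begin(), Objs.end(), [](const Obj &A, const Obj &B) {
    return std::make_tuple(!A.IsValid, A.ObjectFirst, A.GroupFirst,
                           A.GroupIndex, A.ObjectIndex) <
           std::make_tuple(!B.IsValid, B.ObjectFirst, B.GroupFirst,
                           B.GroupIndex, B.ObjectIndex);
  });
  for (const Obj &O : Objs)
    std::printf("%d ", O.ObjectIndex); // Prints: 1 0 2 3
  std::printf("\n");
  return 0;
}
#endif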
3885
3886void AArch64FrameLowering::orderFrameObjects(
3887 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
3888 if (!OrderFrameObjects || ObjectsToAllocate.empty())
3889 return;
3890
3891 const MachineFrameInfo &MFI = MF.getFrameInfo();
3892 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
3893 for (auto &Obj : ObjectsToAllocate) {
3894 FrameObjects[Obj].IsValid = true;
3895 FrameObjects[Obj].ObjectIndex = Obj;
3896 }
3897
3898 // Identify stack slots that are tagged at the same time.
3899 GroupBuilder GB(FrameObjects);
3900 for (auto &MBB : MF) {
3901 for (auto &MI : MBB) {
3902 if (MI.isDebugInstr())
3903 continue;
3904 int OpIndex;
3905 switch (MI.getOpcode()) {
3906 case AArch64::STGloop:
3907 case AArch64::STZGloop:
3908 OpIndex = 3;
3909 break;
3910 case AArch64::STGi:
3911 case AArch64::STZGi:
3912 case AArch64::ST2Gi:
3913 case AArch64::STZ2Gi:
3914 OpIndex = 1;
3915 break;
3916 default:
3917 OpIndex = -1;
3918 }
3919
3920 int TaggedFI = -1;
3921 if (OpIndex >= 0) {
3922 const MachineOperand &MO = MI.getOperand(OpIndex);
3923 if (MO.isFI()) {
3924 int FI = MO.getIndex();
3925 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
3926 FrameObjects[FI].IsValid)
3927 TaggedFI = FI;
3928 }
3929 }
3930
3931 // If this is a stack tagging instruction for a slot that is not part of a
3932 // group yet, either start a new group or add it to the current one.
3933 if (TaggedFI >= 0)
3934 GB.AddMember(TaggedFI);
3935 else
3936 GB.EndCurrentGroup();
3937 }
3938 // Groups should never span multiple basic blocks.
3939 GB.EndCurrentGroup();
3940 }
3941
3942 // If the function's tagged base pointer is pinned to a stack slot, we want to
3943 // put that slot first when possible. This will likely place it at SP + 0,
3944 // and save one instruction when generating the base pointer because IRG does
3945 // not allow an immediate offset.
3946 const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
3947 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
3948 if (TBPI) {
3949 FrameObjects[*TBPI].ObjectFirst = true;
3950 FrameObjects[*TBPI].GroupFirst = true;
3951 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
3952 if (FirstGroupIndex >= 0)
3953 for (FrameObject &Object : FrameObjects)
3954 if (Object.GroupIndex == FirstGroupIndex)
3955 Object.GroupFirst = true;
3956 }
3957
3958 llvm::stable_sort(FrameObjects, FrameObjectCompare);
3959
3960 int i = 0;
3961 for (auto &Obj : FrameObjects) {
3962 // All invalid items are sorted at the end, so it's safe to stop.
3963 if (!Obj.IsValid)
3964 break;
3965 ObjectsToAllocate[i++] = Obj.ObjectIndex;
3966 }
3967
3968 LLVM_DEBUG(dbgs() << "Final frame order:\n"; for (auto &Obj
3969 : FrameObjects) {
3970 if (!Obj.IsValid)
3971 break;
3972 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
3973 if (Obj.ObjectFirst)
3974 dbgs() << ", first";
3975 if (Obj.GroupFirst)
3976 dbgs() << ", group-first";
3977 dbgs() << "\n";
3978 });
3979}