//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file contains the AArch64 implementation of TargetFrameLowering class.
//
// On AArch64, stack frames are structured as follows:
//
// The stack grows downward.
//
// All of the individual frame areas on the frame below are optional, i.e. it's
// possible to create a function so that the particular area isn't present
// in the frame.
//
// At function entry, the "frame" looks as follows:
//
// |                                   | Higher address
// |-----------------------------------|
// |                                   |
// | arguments passed on the stack     |
// |                                   |
// |-----------------------------------| <- sp
// |                                   | Lower address
//
//
// After the prologue has run, the frame has the following general structure.
// Note that this doesn't depict the case where a red-zone is used. Also,
// technically the last frame area (VLAs) doesn't get created until the main
// function body runs, after the prologue. However, it's depicted here for
// completeness.
//
// |                                   | Higher address
// |-----------------------------------|
// |                                   |
// | arguments passed on the stack     |
// |                                   |
// |-----------------------------------|
// |                                   |
// | (Win64 only) varargs from reg     |
// |                                   |
// |-----------------------------------|
// |                                   |
// | callee-saved gpr registers        | <--.
// |                                   |    | On Darwin platforms these
// |- - - - - - - - - - - - - - - - - -|    | callee saves are swapped,
// | prev_lr                           |    | (frame record first)
// | prev_fp                           | <--'
// | async context if needed           |
// | (a.k.a. "frame record")           |
// |-----------------------------------| <- fp(=x29)
// |                                   |
// | callee-saved fp/simd/SVE regs     |
// |                                   |
// |-----------------------------------|
// |                                   |
// |        SVE stack objects          |
// |                                   |
// |-----------------------------------|
// |.empty.space.to.make.part.below....|
// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
// |.the.standard.16-byte.alignment....|  compile time; if present)
// |-----------------------------------|
// |                                   |
// | local variables of fixed size     |
// | including spill slots             |
// |-----------------------------------| <- bp(not defined by ABI,
// |.variable-sized.local.variables....|       LLVM chooses X19)
// |.(VLAs)............................| (size of this area is unknown at
// |...................................|  compile time)
// |-----------------------------------| <- sp
// |                                   | Lower address
//
//
// To access data in a frame, a constant offset from one of the pointers (fp,
// bp, sp) must be computable at compile time. The size of the areas with a
// dotted background cannot be computed at compile time if they are present,
// so all three of fp, bp and sp must be set up to be able to access all
// contents in the frame areas, assuming all of the frame areas are non-empty.
//
// For most functions, some of the frame areas are empty. For those functions,
// it may not be necessary to set up fp or bp:
// * A base pointer is definitely needed when there are both VLAs and local
//   variables with more-than-default alignment requirements.
// * A frame pointer is definitely needed when there are local variables with
//   more-than-default alignment requirements.
//
// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
// callee-saved area, since the unwind encoding does not allow for encoding
// this dynamically and existing tools depend on this layout. For other
// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
// area to allow SVE stack objects (allocated directly below the callee-saves,
// if available) to be accessed directly from the frame pointer.
// The SVE spill/fill instructions have VL-scaled addressing modes such
// as:
//    ldr z8, [fp, #-7 mul vl]
// For SVE the size of the vector length (VL) is not known at compile-time, so
// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
// layout, we don't need to add an unscaled offset to the frame pointer before
// accessing the SVE object in the frame.
//
// In some cases when a base pointer is not strictly needed, it is generated
// anyway when offsets from the frame pointer to access local variables become
// so large that the offset can't be encoded in the immediate fields of loads
// or stores.
//
// Outgoing function arguments must be at the bottom of the stack frame when
// calling another function. If we do not have variable-sized stack objects, we
// can allocate a "reserved call frame" area at the bottom of the local
// variable area, large enough for all outgoing calls. If we do have VLAs, then
// the stack pointer must be decremented and incremented around each call to
// make space for the arguments below the VLAs.
//
// FIXME: also explain the redzone concept.
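// A brief sketch of the red zone, pending that FIXME (illustrative, not
// normative): when enabled via -aarch64-redzone, a leaf function whose
// locals fit in the 128 bytes below sp may use that area without moving sp
// at all. For example, a hypothetical leaf function could compile to:
//
//     str w0, [sp, #-4]   // store a local below sp; no sub sp needed
//     ldr w0, [sp, #-4]
//     ret
//
// canUseRedZone() below gives the exact conditions (no calls, no frame
// pointer, locals no larger than the red zone, no SVE area).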
//
// An example of the prologue:
//
//     .globl __foo
//     .align 2
//  __foo:
// Ltmp0:
//     .cfi_startproc
//     .cfi_personality 155, ___gxx_personality_v0
// Leh_func_begin:
//     .cfi_lsda 16, Lexception33
//
//     stp  xa,bx, [sp, -#offset]!
//     ...
//     stp  x28, x27, [sp, #offset-32]
//     stp  fp, lr, [sp, #offset-16]
//     add  fp, sp, #offset - 16
//     sub  sp, sp, #1360
//
// The Stack:
//       +-------------------------------------------+
// 10000 | ........ | ........ | ........ | ........ |
// 10004 | ........ | ........ | ........ | ........ |
//       +-------------------------------------------+
// 10008 | ........ | ........ | ........ | ........ |
// 1000c | ........ | ........ | ........ | ........ |
//       +===========================================+
// 10010 |                X28 Register               |
// 10014 |                X28 Register               |
//       +-------------------------------------------+
// 10018 |                X27 Register               |
// 1001c |                X27 Register               |
//       +===========================================+
// 10020 |                Frame Pointer              |
// 10024 |                Frame Pointer              |
//       +-------------------------------------------+
// 10028 |                Link Register              |
// 1002c |                Link Register              |
//       +===========================================+
// 10030 | ........ | ........ | ........ | ........ |
// 10034 | ........ | ........ | ........ | ........ |
//       +-------------------------------------------+
// 10038 | ........ | ........ | ........ | ........ |
// 1003c | ........ | ........ | ........ | ........ |
//       +-------------------------------------------+
//
//     [sp] = 10030        ::    >>initial value<<
//     sp = 10020          ::  stp fp, lr, [sp, #-16]!
//     fp = sp == 10020    ::  mov fp, sp
//     [sp] == 10020       ::  stp x28, x27, [sp, #-16]!
//     sp == 10010         ::  >>final value<<
//
// The frame pointer (w29) points to address 10020. If we use an offset of
// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
// for w27, and -32 for w28:
//
// Ltmp1:
//     .cfi_def_cfa w29, 16
// Ltmp2:
//     .cfi_offset w30, -8
// Ltmp3:
//     .cfi_offset w29, -16
// Ltmp4:
//     .cfi_offset w27, -24
// Ltmp5:
//     .cfi_offset w28, -32
//
//===----------------------------------------------------------------------===//

#include "AArch64FrameLowering.h"
#include "AArch64InstrInfo.h"
#include "AArch64MachineFunctionInfo.h"
#include "AArch64RegisterInfo.h"
#include "AArch64Subtarget.h"
#include "AArch64TargetMachine.h"
#include "MCTargetDesc/AArch64AddressingModes.h"
#include "MCTargetDesc/AArch64MCTargetDesc.h"
#include "llvm/ADT/ScopeExit.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/CodeGen/LivePhysRegs.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineMemOperand.h"
#include "llvm/CodeGen/MachineModuleInfo.h"
#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/RegisterScavenging.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/CodeGen/WinEHFuncInfo.h"
#include "llvm/IR/Attributes.h"
#include "llvm/IR/CallingConv.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DebugLoc.h"
#include "llvm/IR/Function.h"
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCDwarf.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/MathExtras.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetOptions.h"
#include <cassert>
#include <cstdint>
#include <iterator>
#include <optional>
#include <vector>

using namespace llvm;

#define DEBUG_TYPE "frame-info"

static cl::opt<bool> EnableRedZone("aarch64-redzone",
                                   cl::desc("enable use of redzone on AArch64"),
                                   cl::init(false), cl::Hidden);

static cl::opt<bool>
    ReverseCSRRestoreSeq("reverse-csr-restore-seq",
                         cl::desc("reverse the CSR restore sequence"),
                         cl::init(false), cl::Hidden);

static cl::opt<bool> StackTaggingMergeSetTag(
    "stack-tagging-merge-settag",
    cl::desc("merge settag instruction in function epilog"), cl::init(true),
    cl::Hidden);

static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
                                       cl::desc("sort stack allocations"),
                                       cl::init(true), cl::Hidden);

cl::opt<bool> EnableHomogeneousPrologEpilog(
    "homogeneous-prolog-epilog", cl::Hidden,
    cl::desc("Emit homogeneous prologue and epilogue for the size "
             "optimization (default = off)"));

STATISTIC(NumRedZoneFunctions, "Number of functions using red zone");

/// Returns how much of the incoming argument stack area (in bytes) we should
/// clean up in an epilogue. For the C calling convention this will be 0, for
/// guaranteed tail call conventions it can be positive (a normal return or a
/// tail call to a function that uses less stack space for arguments) or
/// negative (for a tail call to a function that needs more stack space than us
/// for arguments).
static int64_t getArgumentStackToRestore(MachineFunction &MF,
                                         MachineBasicBlock &MBB) {
  MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
  bool IsTailCallReturn = false;
  if (MBB.end() != MBBI) {
    unsigned RetOpcode = MBBI->getOpcode();
    IsTailCallReturn = RetOpcode == AArch64::TCRETURNdi ||
                       RetOpcode == AArch64::TCRETURNri ||
                       RetOpcode == AArch64::TCRETURNriBTI;
  }
  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();

  int64_t ArgumentPopSize = 0;
  if (IsTailCallReturn) {
    MachineOperand &StackAdjust = MBBI->getOperand(1);

    // For a tail-call in a callee-pops-arguments environment, some or all of
    // the stack may actually be in use for the call's arguments; this is
    // calculated during LowerCall and consumed here...
    ArgumentPopSize = StackAdjust.getImm();
  } else {
    // ... otherwise the amount to pop is *all* of the argument space,
    // conveniently stored in the MachineFunctionInfo by
    // LowerFormalArguments. This will, of course, be zero for the C calling
    // convention.
    ArgumentPopSize = AFI->getArgumentStackToRestore();
  }

  return ArgumentPopSize;
}
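
// (Worked example for the rule above, with hypothetical sizes: a function
// using a guaranteed-tail-call convention that was entered with 32 bytes of
// stack arguments and tail-calls a function needing only 16 must pop
// 32 - 16 = 16 bytes; if the callee instead needed 48, the result would be
// 32 - 48 = -16, i.e. the epilogue grows the argument area. A C-convention
// function always yields 0 here.)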

static bool produceCompactUnwindFrame(MachineFunction &MF);
static bool needsWinCFI(const MachineFunction &MF);
static StackOffset getSVEStackSize(const MachineFunction &MF);
static bool needsShadowCallStackPrologueEpilogue(MachineFunction &MF);

/// Returns true if a homogeneous prolog or epilog code can be emitted
/// for the size optimization. If possible, a frame helper call is injected.
/// When Exit block is given, this check is for epilog.
bool AArch64FrameLowering::homogeneousPrologEpilog(
    MachineFunction &MF, MachineBasicBlock *Exit) const {
  if (!MF.getFunction().hasMinSize())
    return false;
  if (!EnableHomogeneousPrologEpilog)
    return false;
  if (ReverseCSRRestoreSeq)
    return false;
  if (EnableRedZone)
    return false;

  // TODO: Windows is not supported yet.
  if (needsWinCFI(MF))
    return false;
  // TODO: SVE is not supported yet.
  if (getSVEStackSize(MF))
    return false;

  // Bail on stack adjustment needed on return for simplicity.
  const MachineFrameInfo &MFI = MF.getFrameInfo();
  const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
  if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
    return false;
  if (Exit && getArgumentStackToRestore(MF, *Exit))
    return false;

  return true;
}

/// Returns true if CSRs should be paired.
bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
  return produceCompactUnwindFrame(MF) || homogeneousPrologEpilog(MF);
}

/// This is the biggest offset to the stack pointer we can encode in aarch64
/// instructions (without using a separate calculation and a temp register).
/// Note that the exception here are vector stores/loads which cannot encode any
/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
static const unsigned DefaultSafeSPDisplacement = 255;
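
// (Context for the value 255: the unscaled addressing forms, e.g.
//     ldur x0, [sp, #255]
// take a signed 9-bit immediate covering [-256, 255], so 255 is the largest
// SP displacement that every integer load/store can still encode directly.)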

/// Look at each instruction that references stack frames and return the stack
/// size limit beyond which some of these instructions will require a scratch
/// register during their expansion later.
static unsigned estimateRSStackSizeLimit(MachineFunction &MF) {
  // FIXME: For now, just conservatively guesstimate based on unscaled indexing
  // range. We'll end up allocating an unnecessary spill slot a lot, but
  // realistically that's not a big deal at this stage of the game.
  for (MachineBasicBlock &MBB : MF) {
    for (MachineInstr &MI : MBB) {
      if (MI.isDebugInstr() || MI.isPseudo() ||
          MI.getOpcode() == AArch64::ADDXri ||
          MI.getOpcode() == AArch64::ADDSXri)
        continue;

      for (const MachineOperand &MO : MI.operands()) {
        if (!MO.isFI())
          continue;

        StackOffset Offset;
        if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
            AArch64FrameOffsetCannotUpdate)
          return 0;
      }
    }
  }
  return DefaultSafeSPDisplacement;
}

TargetStackID::Value
AArch64FrameLowering::getStackIDForScalableVectors() const {
  return TargetStackID::ScalableVector;
}

/// Returns the size of the fixed object area (allocated next to sp on entry)
/// On Win64 this may include a var args area and an UnwindHelp object for EH.
static unsigned getFixedObjectSize(const MachineFunction &MF,
                                   const AArch64FunctionInfo *AFI, bool IsWin64,
                                   bool IsFunclet) {
  if (!IsWin64 || IsFunclet) {
    return AFI->getTailCallReservedStack();
  } else {
    if (AFI->getTailCallReservedStack() != 0)
      report_fatal_error("cannot generate ABI-changing tail call for Win64");
    // Var args are stored here in the primary function.
    const unsigned VarArgsArea = AFI->getVarArgsGPRSize();
    // To support EH funclets we allocate an UnwindHelp object
    const unsigned UnwindHelpObject = (MF.hasEHFunclets() ? 8 : 0);
    return alignTo(VarArgsArea + UnwindHelpObject, 16);
  }
}

/// Returns the size of the entire SVE stackframe (calleesaves + spills).
static StackOffset getSVEStackSize(const MachineFunction &MF) {
  const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
  return StackOffset::getScalable((int64_t)AFI->getStackSizeSVE());
}

bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const {
  if (!EnableRedZone)
    return false;

  // Don't use the red zone if the function explicitly asks us not to.
  // This is typically used for kernel code.
  const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
  const unsigned RedZoneSize =
      Subtarget.getTargetLowering()->getRedZoneSize(MF.getFunction());
  if (!RedZoneSize)
    return false;

  const MachineFrameInfo &MFI = MF.getFrameInfo();
  const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
  uint64_t NumBytes = AFI->getLocalStackSize();

  return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
           getSVEStackSize(MF));
}
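
// (Sketch of the effect when canUseRedZone() returns true, with made-up
// code: the prologue and epilogue sp adjustments disappear entirely,
//
//     sub sp, sp, #16              str w0, [sp, #-4]
//     str w0, [sp, #12]     vs.    ...
//     ...                          ret
//     add sp, sp, #16
//     ret
//
// and emitPrologue() records the fact via AFI->setHasRedZone(true).)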

/// hasFP - Return true if the specified function should have a dedicated frame
/// pointer register.
bool AArch64FrameLowering::hasFP(const MachineFunction &MF) const {
  const MachineFrameInfo &MFI = MF.getFrameInfo();
  const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
  // Win64 EH requires a frame pointer if funclets are present, as the locals
  // are accessed off the frame pointer in both the parent function and the
  // funclets.
  if (MF.hasEHFunclets())
    return true;
  // Retain behavior of always omitting the FP for leaf functions when possible.
  if (MF.getTarget().Options.DisableFramePointerElim(MF))
    return true;
  if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
      MFI.hasStackMap() || MFI.hasPatchPoint() ||
      RegInfo->hasStackRealignment(MF))
    return true;
  // With large callframes around we may need to use FP to access the scavenging
  // emergency spillslot.
  //
  // Unfortunately some calls to hasFP() like machine verifier ->
  // getReservedReg() -> hasFP in the middle of global isel are too early
  // to know the max call frame size. Hopefully conservatively returning "true"
  // in those cases is fine.
  // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
  if (!MFI.isMaxCallFrameSizeComputed() ||
      MFI.getMaxCallFrameSize() > DefaultSafeSPDisplacement)
    return true;

  return false;
}

/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
/// not required, we reserve argument space for call sites in the function
/// immediately on entry to the current function. This eliminates the need for
/// add/sub sp brackets around call sites. Returns true if the call frame is
/// included as part of the stack frame.
bool
AArch64FrameLowering::hasReservedCallFrame(const MachineFunction &MF) const {
  return !MF.getFrameInfo().hasVarSizedObjects();
}

MachineBasicBlock::iterator AArch64FrameLowering::eliminateCallFramePseudoInstr(
    MachineFunction &MF, MachineBasicBlock &MBB,
    MachineBasicBlock::iterator I) const {
  const AArch64InstrInfo *TII =
      static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
  DebugLoc DL = I->getDebugLoc();
  unsigned Opc = I->getOpcode();
  bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
  uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;

  if (!hasReservedCallFrame(MF)) {
    int64_t Amount = I->getOperand(0).getImm();
    Amount = alignTo(Amount, getStackAlign());
    if (!IsDestroy)
      Amount = -Amount;

    // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
    // doesn't have to pop anything), then the first operand will be zero too so
    // this adjustment is a no-op.
    if (CalleePopAmount == 0) {
      // FIXME: in-function stack adjustment for calls is limited to 24-bits
      // because there's no guaranteed temporary register available.
      //
      // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
      // 1) For offset <= 12-bit, we use LSL #0
      // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
      //    LSL #0, and the other uses LSL #12.
      //
      // Most call frames will be allocated at the start of a function so
      // this is OK, but it is a limitation that needs dealing with.
      assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
      emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
                      StackOffset::getFixed(Amount), TII);
    }
  } else if (CalleePopAmount != 0) {
    // If the calling convention demands that the callee pops arguments from the
    // stack, we want to add it back if we have a reserved call frame.
    assert(CalleePopAmount < 0xffffff && "call frame too large");
    emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
                    StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
  }
  return MBB.erase(I);
}
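
// (Sketch of the two cases above, with a hypothetical 32-byte call frame:
// without a reserved call frame, the call-frame pseudos become real sp moves
// bracketing each call,
//
//     sub sp, sp, #32        // ADJCALLSTACKDOWN 32
//     bl  callee
//     add sp, sp, #32        // ADJCALLSTACKUP 32
//
// whereas with a reserved call frame both pseudos are simply erased and only
// a nonzero CalleePopAmount still needs an sp adjustment here.)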

void AArch64FrameLowering::emitCalleeSavedGPRLocations(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
  MachineFunction &MF = *MBB.getParent();
  MachineFrameInfo &MFI = MF.getFrameInfo();

  const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
  if (CSI.empty())
    return;

  const TargetSubtargetInfo &STI = MF.getSubtarget();
  const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
  const TargetInstrInfo &TII = *STI.getInstrInfo();
  DebugLoc DL = MBB.findDebugLoc(MBBI);

  for (const auto &Info : CSI) {
    if (MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector)
      continue;

    assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
    unsigned DwarfReg = TRI.getDwarfRegNum(Info.getReg(), true);

    int64_t Offset =
        MFI.getObjectOffset(Info.getFrameIdx()) - getOffsetOfLocalArea();
    unsigned CFIIndex = MF.addFrameInst(
        MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset));
    BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
        .addCFIIndex(CFIIndex)
        .setMIFlags(MachineInstr::FrameSetup);
  }
}

void AArch64FrameLowering::emitCalleeSavedSVELocations(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
  MachineFunction &MF = *MBB.getParent();
  MachineFrameInfo &MFI = MF.getFrameInfo();

  // Add callee saved registers to move list.
  const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
  if (CSI.empty())
    return;

  const TargetSubtargetInfo &STI = MF.getSubtarget();
  const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
  const TargetInstrInfo &TII = *STI.getInstrInfo();
  DebugLoc DL = MBB.findDebugLoc(MBBI);
  AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();

  for (const auto &Info : CSI) {
    if (!(MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
      continue;

    // Not all unwinders may know about SVE registers, so assume the lowest
    // common denominator.
    assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
    unsigned Reg = Info.getReg();
    if (!static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
      continue;

    StackOffset Offset =
        StackOffset::getScalable(MFI.getObjectOffset(Info.getFrameIdx())) -
        StackOffset::getFixed(AFI.getCalleeSavedStackSize(MFI));

    unsigned CFIIndex = MF.addFrameInst(createCFAOffset(TRI, Reg, Offset));
    BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
        .addCFIIndex(CFIIndex)
        .setMIFlags(MachineInstr::FrameSetup);
  }
}

static void insertCFISameValue(const MCInstrDesc &Desc, MachineFunction &MF,
                               MachineBasicBlock &MBB,
                               MachineBasicBlock::iterator InsertPt,
                               unsigned DwarfReg) {
  unsigned CFIIndex =
      MF.addFrameInst(MCCFIInstruction::createSameValue(nullptr, DwarfReg));
  BuildMI(MBB, InsertPt, DebugLoc(), Desc).addCFIIndex(CFIIndex);
}

void AArch64FrameLowering::resetCFIToInitialState(
    MachineBasicBlock &MBB) const {

  MachineFunction &MF = *MBB.getParent();
  const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
  const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
  const auto &TRI =
      static_cast<const AArch64RegisterInfo &>(*Subtarget.getRegisterInfo());
  const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();

  const MCInstrDesc &CFIDesc = TII.get(TargetOpcode::CFI_INSTRUCTION);
  DebugLoc DL;

  // Reset the CFA to `SP + 0`.
  MachineBasicBlock::iterator InsertPt = MBB.begin();
  unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
      nullptr, TRI.getDwarfRegNum(AArch64::SP, true), 0));
  BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);

  // Flip the RA sign state.
  if (MFI.shouldSignReturnAddress(MF)) {
    CFIIndex = MF.addFrameInst(MCCFIInstruction::createNegateRAState(nullptr));
    BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);
  }

  // Shadow call stack uses X18, reset it.
  if (needsShadowCallStackPrologueEpilogue(MF))
    insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
                       TRI.getDwarfRegNum(AArch64::X18, true));

  // Emit .cfi_same_value for callee-saved registers.
  const std::vector<CalleeSavedInfo> &CSI =
      MF.getFrameInfo().getCalleeSavedInfo();
  for (const auto &Info : CSI) {
    unsigned Reg = Info.getReg();
    if (!TRI.regNeedsCFI(Reg, Reg))
      continue;
    insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
                       TRI.getDwarfRegNum(Reg, true));
  }
}

static void emitCalleeSavedRestores(MachineBasicBlock &MBB,
                                    MachineBasicBlock::iterator MBBI,
                                    bool SVE) {
  MachineFunction &MF = *MBB.getParent();
  MachineFrameInfo &MFI = MF.getFrameInfo();

  const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
  if (CSI.empty())
    return;

  const TargetSubtargetInfo &STI = MF.getSubtarget();
  const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
  const TargetInstrInfo &TII = *STI.getInstrInfo();
  DebugLoc DL = MBB.findDebugLoc(MBBI);

  for (const auto &Info : CSI) {
    if (SVE !=
        (MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
      continue;

    unsigned Reg = Info.getReg();
    if (SVE &&
        !static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
      continue;

    unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createRestore(
        nullptr, TRI.getDwarfRegNum(Info.getReg(), true)));
    BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
        .addCFIIndex(CFIIndex)
        .setMIFlags(MachineInstr::FrameDestroy);
  }
}

void AArch64FrameLowering::emitCalleeSavedGPRRestores(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
  emitCalleeSavedRestores(MBB, MBBI, false);
}

void AArch64FrameLowering::emitCalleeSavedSVERestores(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
  emitCalleeSavedRestores(MBB, MBBI, true);
}

static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE) {
  switch (Reg.id()) {
  default:
    // The called routine is expected to preserve r19-r28
    // r29 and r30 are used as frame pointer and link register resp.
    return 0;

    // GPRs
#define CASE(n)                                                                \
  case AArch64::W##n:                                                          \
  case AArch64::X##n:                                                          \
    return AArch64::X##n
  CASE(0);
  CASE(1);
  CASE(2);
  CASE(3);
  CASE(4);
  CASE(5);
  CASE(6);
  CASE(7);
  CASE(8);
  CASE(9);
  CASE(10);
  CASE(11);
  CASE(12);
  CASE(13);
  CASE(14);
  CASE(15);
  CASE(16);
  CASE(17);
  CASE(18);
#undef CASE

    // FPRs
#define CASE(n)                                                                \
  case AArch64::B##n:                                                          \
  case AArch64::H##n:                                                          \
  case AArch64::S##n:                                                          \
  case AArch64::D##n:                                                          \
  case AArch64::Q##n:                                                          \
    return HasSVE ? AArch64::Z##n : AArch64::Q##n
  CASE(0);
  CASE(1);
  CASE(2);
  CASE(3);
  CASE(4);
  CASE(5);
  CASE(6);
  CASE(7);
  CASE(8);
  CASE(9);
  CASE(10);
  CASE(11);
  CASE(12);
  CASE(13);
  CASE(14);
  CASE(15);
  CASE(16);
  CASE(17);
  CASE(18);
  CASE(19);
  CASE(20);
  CASE(21);
  CASE(22);
  CASE(23);
  CASE(24);
  CASE(25);
  CASE(26);
  CASE(27);
  CASE(28);
  CASE(29);
  CASE(30);
  CASE(31);
#undef CASE
  }
}

void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
                                                MachineBasicBlock &MBB) const {
  // Insertion point.
  MachineBasicBlock::iterator MBBI = MBB.begin();

  // Fake a debug loc.
  DebugLoc DL;
  if (MBBI != MBB.end())
    DL = MBBI->getDebugLoc();

  const MachineFunction &MF = *MBB.getParent();
  const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
  const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();

  BitVector GPRsToZero(TRI.getNumRegs());
  BitVector FPRsToZero(TRI.getNumRegs());
  bool HasSVE = STI.hasSVE();
  for (MCRegister Reg : RegsToZero.set_bits()) {
    if (TRI.isGeneralPurposeRegister(MF, Reg)) {
      // For GPRs, we only care to clear out the 64-bit register.
      if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
        GPRsToZero.set(XReg);
    } else if (AArch64::FPR128RegClass.contains(Reg) ||
               AArch64::FPR64RegClass.contains(Reg) ||
               AArch64::FPR32RegClass.contains(Reg) ||
               AArch64::FPR16RegClass.contains(Reg) ||
               AArch64::FPR8RegClass.contains(Reg)) {
      // For FPRs, clear out the widest register (Q, or Z with SVE).
      if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
        FPRsToZero.set(XReg);
    }
  }

  const AArch64InstrInfo &TII = *STI.getInstrInfo();

  // Zero out GPRs.
  for (MCRegister Reg : GPRsToZero.set_bits())
    BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), Reg).addImm(0);

  // Zero out FP/vector registers.
  for (MCRegister Reg : FPRsToZero.set_bits())
    if (HasSVE)
      BuildMI(MBB, MBBI, DL, TII.get(AArch64::DUP_ZI_D), Reg)
          .addImm(0)
          .addImm(0);
    else
      BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVIv2d_ns), Reg).addImm(0);

  if (HasSVE) {
    for (MCRegister PReg :
         {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
          AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
          AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
          AArch64::P15}) {
      if (RegsToZero[PReg])
        BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
    }
  }
}
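
// (Typical trigger for this hook, for reference: Clang's zero_call_used_regs
// attribute, e.g.
//
//     __attribute__((zero_call_used_regs("used-gpr"))) int f(int x);
//
// which asks the backend to clear the selected registers on return so they
// cannot leak data; the MOVi64imm/MOVIv2d_ns/DUP_ZI_D/PFALSE instructions
// above are the AArch64 ways of writing zero to each register class.)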

// Find a scratch register that we can use at the start of the prologue to
// re-align the stack pointer. We avoid using callee-save registers since they
// may appear to be free when this is called from canUseAsPrologue (during
// shrink wrapping), but then no longer be free when this is called from
// emitPrologue.
//
// FIXME: This is a bit conservative, since in the above case we could use one
// of the callee-save registers as a scratch temp to re-align the stack pointer,
// but we would then have to make sure that we were in fact saving at least one
// callee-save register in the prologue, which is additional complexity that
// doesn't seem worth the benefit.
static unsigned findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB) {
  MachineFunction *MF = MBB->getParent();

  // If MBB is an entry block, use X9 as the scratch register
  if (&MF->front() == MBB)
    return AArch64::X9;

  const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
  const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
  LivePhysRegs LiveRegs(TRI);
  LiveRegs.addLiveIns(*MBB);

  // Mark callee saved registers as used so we will not choose them.
  const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
  for (unsigned i = 0; CSRegs[i]; ++i)
    LiveRegs.addReg(CSRegs[i]);

  // Prefer X9 since it was historically used for the prologue scratch reg.
  const MachineRegisterInfo &MRI = MF->getRegInfo();
  if (LiveRegs.available(MRI, AArch64::X9))
    return AArch64::X9;

  for (unsigned Reg : AArch64::GPR64RegClass) {
    if (LiveRegs.available(MRI, Reg))
      return Reg;
  }
  return AArch64::NoRegister;
}

bool AArch64FrameLowering::canUseAsPrologue(
    const MachineBasicBlock &MBB) const {
  const MachineFunction *MF = MBB.getParent();
  MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
  const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
  const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();

  // Don't need a scratch register if we're not going to re-align the stack.
  if (!RegInfo->hasStackRealignment(*MF))
    return true;
  // Otherwise, we can use any block as long as it has a scratch register
  // available.
  return findScratchNonCalleeSaveRegister(TmpMBB) != AArch64::NoRegister;
}

static bool windowsRequiresStackProbe(MachineFunction &MF,
                                      uint64_t StackSizeInBytes) {
  const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
  if (!Subtarget.isTargetWindows())
    return false;
  const Function &F = MF.getFunction();
  // TODO: When implementing stack protectors, take that into account
  // for the probe threshold.
  unsigned StackProbeSize =
      F.getFnAttributeAsParsedInteger("stack-probe-size", 4096);
  return (StackSizeInBytes >= StackProbeSize) &&
         !F.hasFnAttribute("no-stack-arg-probe");
}
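
// (Example with the default threshold, hypothetical numbers: on Windows a
// function allocating 8192 bytes gets a stack-probe (__chkstk) call in its
// prologue because 8192 >= 4096; adding the IR function attribute
// "stack-probe-size"="16384", or "no-stack-arg-probe", would suppress it.)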

static bool needsWinCFI(const MachineFunction &MF) {
  const Function &F = MF.getFunction();
  return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
         F.needsUnwindTableEntry();
}

bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
    MachineFunction &MF, uint64_t StackBumpBytes) const {
  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
  const MachineFrameInfo &MFI = MF.getFrameInfo();
  const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
  const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
  if (homogeneousPrologEpilog(MF))
    return false;

  if (AFI->getLocalStackSize() == 0)
    return false;

  // For WinCFI, if optimizing for size, prefer to not combine the stack bump
  // (to force a stp with predecrement) to match the packed unwind format,
  // provided that there actually are any callee saved registers to merge the
  // decrement with.
  // This is potentially marginally slower, but allows using the packed
  // unwind format for functions that both have a local area and callee saved
  // registers. Using the packed unwind format notably reduces the size of
  // the unwind info.
  if (needsWinCFI(MF) && AFI->getCalleeSavedStackSize() > 0 &&
      MF.getFunction().hasOptSize())
    return false;

  // 512 is the maximum immediate for stp/ldp that will be used for
  // callee-save save/restores
  if (StackBumpBytes >= 512 || windowsRequiresStackProbe(MF, StackBumpBytes))
    return false;

  if (MFI.hasVarSizedObjects())
    return false;

  if (RegInfo->hasStackRealignment(MF))
    return false;

  // This isn't strictly necessary, but it simplifies things a bit since the
  // current RedZone handling code assumes the SP is adjusted by the
  // callee-save save/restore code.
  if (canUseRedZone(MF))
    return false;

  // When there is an SVE area on the stack, always allocate the
  // callee-saves and spills/locals separately.
  if (getSVEStackSize(MF))
    return false;

  return true;
}

bool AArch64FrameLowering::shouldCombineCSRLocalStackBumpInEpilogue(
    MachineBasicBlock &MBB, unsigned StackBumpBytes) const {
  if (!shouldCombineCSRLocalStackBump(*MBB.getParent(), StackBumpBytes))
    return false;

  if (MBB.empty())
    return true;

  // Disable combined SP bump if the last instruction is an MTE tag store. It
  // is almost always better to merge SP adjustment into those instructions.
  MachineBasicBlock::iterator LastI = MBB.getFirstTerminator();
  MachineBasicBlock::iterator Begin = MBB.begin();
  while (LastI != Begin) {
    --LastI;
    if (LastI->isTransient())
      continue;
    if (!LastI->getFlag(MachineInstr::FrameDestroy))
      break;
  }
  switch (LastI->getOpcode()) {
  case AArch64::STGloop:
  case AArch64::STZGloop:
  case AArch64::STGi:
  case AArch64::STZGi:
  case AArch64::ST2Gi:
  case AArch64::STZ2Gi:
    return false;
  default:
    return true;
  }
  llvm_unreachable("unreachable");
}
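
// (What "combining" means concretely, sketched for 16 bytes of callee-saves
// plus 16 bytes of locals:
//
//     combined:                        separate:
//       stp x29, x30, [sp, #-32]!        stp x29, x30, [sp, #-16]!
//       ...                              sub sp, sp, #16
//       ldp x29, x30, [sp], #32          ...
//                                        add sp, sp, #16
//                                        ldp x29, x30, [sp], #16
//
// The checks above reject the cases (probes, realignment, VLAs, SVE, red
// zone, MTE tag stores) where the single adjustment is illegal or
// unprofitable.)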

// Given a load or a store instruction, generate an appropriate unwinding SEH
// code on Windows.
static MachineBasicBlock::iterator InsertSEH(MachineBasicBlock::iterator MBBI,
                                             const TargetInstrInfo &TII,
                                             MachineInstr::MIFlag Flag) {
  unsigned Opc = MBBI->getOpcode();
  MachineBasicBlock *MBB = MBBI->getParent();
  MachineFunction &MF = *MBB->getParent();
  DebugLoc DL = MBBI->getDebugLoc();
  unsigned ImmIdx = MBBI->getNumOperands() - 1;
  int Imm = MBBI->getOperand(ImmIdx).getImm();
  MachineInstrBuilder MIB;
  const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
  const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();

  switch (Opc) {
  default:
    llvm_unreachable("No SEH Opcode for this instruction");
  case AArch64::LDPDpost:
    Imm = -Imm;
    [[fallthrough]];
  case AArch64::STPDpre: {
    unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
    unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
    MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
              .addImm(Reg0)
              .addImm(Reg1)
              .addImm(Imm * 8)
              .setMIFlag(Flag);
    break;
  }
  case AArch64::LDPXpost:
    Imm = -Imm;
    [[fallthrough]];
  case AArch64::STPXpre: {
    Register Reg0 = MBBI->getOperand(1).getReg();
    Register Reg1 = MBBI->getOperand(2).getReg();
    if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
      MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
                .addImm(Imm * 8)
                .setMIFlag(Flag);
    else
      MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
                .addImm(RegInfo->getSEHRegNum(Reg0))
                .addImm(RegInfo->getSEHRegNum(Reg1))
                .addImm(Imm * 8)
                .setMIFlag(Flag);
    break;
  }
  case AArch64::LDRDpost:
    Imm = -Imm;
    [[fallthrough]];
  case AArch64::STRDpre: {
    unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
    MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
              .addImm(Reg)
              .addImm(Imm)
              .setMIFlag(Flag);
    break;
  }
  case AArch64::LDRXpost:
    Imm = -Imm;
    [[fallthrough]];
  case AArch64::STRXpre: {
    unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
    MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
              .addImm(Reg)
              .addImm(Imm)
              .setMIFlag(Flag);
    break;
  }
  case AArch64::STPDi:
  case AArch64::LDPDi: {
    unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
    unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
    MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
              .addImm(Reg0)
              .addImm(Reg1)
              .addImm(Imm * 8)
              .setMIFlag(Flag);
    break;
  }
  case AArch64::STPXi:
  case AArch64::LDPXi: {
    Register Reg0 = MBBI->getOperand(0).getReg();
    Register Reg1 = MBBI->getOperand(1).getReg();
    if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
      MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
                .addImm(Imm * 8)
                .setMIFlag(Flag);
    else
      MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
                .addImm(RegInfo->getSEHRegNum(Reg0))
                .addImm(RegInfo->getSEHRegNum(Reg1))
                .addImm(Imm * 8)
                .setMIFlag(Flag);
    break;
  }
  case AArch64::STRXui:
  case AArch64::LDRXui: {
    int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
    MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
              .addImm(Reg)
              .addImm(Imm * 8)
              .setMIFlag(Flag);
    break;
  }
  case AArch64::STRDui:
  case AArch64::LDRDui: {
    unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
    MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
              .addImm(Reg)
              .addImm(Imm * 8)
              .setMIFlag(Flag);
    break;
  }
  }
  auto I = MBB->insertAfter(MBBI, MIB);
  return I;
}
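
// (Example of the mapping above, assuming an x19/x20 pair: the prologue
// store
//
//     stp x19, x20, [sp, #-32]!
//
// gets a SEH_SaveRegP_X pseudo (x19, x20, 32) inserted right after it, which
// the asm printer emits roughly as ".seh_save_regp_x x19, 32" so the Windows
// unwinder can undo both the save and the sp decrement in one step.)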

// Fix up the SEH opcode associated with the save/restore instruction.
static void fixupSEHOpcode(MachineBasicBlock::iterator MBBI,
                           unsigned LocalStackSize) {
  MachineOperand *ImmOpnd = nullptr;
  unsigned ImmIdx = MBBI->getNumOperands() - 1;
  switch (MBBI->getOpcode()) {
  default:
    llvm_unreachable("Fix the offset in the SEH instruction");
  case AArch64::SEH_SaveFPLR:
  case AArch64::SEH_SaveRegP:
  case AArch64::SEH_SaveReg:
  case AArch64::SEH_SaveFRegP:
  case AArch64::SEH_SaveFReg:
    ImmOpnd = &MBBI->getOperand(ImmIdx);
    break;
  }
  if (ImmOpnd)
    ImmOpnd->setImm(ImmOpnd->getImm() + LocalStackSize);
}

// Convert callee-save register save/restore instruction to do stack pointer
// decrement/increment to allocate/deallocate the callee-save stack area by
// converting store/load to use pre/post increment version.
static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
    const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc,
    bool NeedsWinCFI, bool *HasWinCFI, bool EmitCFI,
    MachineInstr::MIFlag FrameFlag = MachineInstr::FrameSetup,
    int CFAOffset = 0) {
  unsigned NewOpc;
  switch (MBBI->getOpcode()) {
  default:
    llvm_unreachable("Unexpected callee-save save/restore opcode!");
  case AArch64::STPXi:
    NewOpc = AArch64::STPXpre;
    break;
  case AArch64::STPDi:
    NewOpc = AArch64::STPDpre;
    break;
  case AArch64::STPQi:
    NewOpc = AArch64::STPQpre;
    break;
  case AArch64::STRXui:
    NewOpc = AArch64::STRXpre;
    break;
  case AArch64::STRDui:
    NewOpc = AArch64::STRDpre;
    break;
  case AArch64::STRQui:
    NewOpc = AArch64::STRQpre;
    break;
  case AArch64::LDPXi:
    NewOpc = AArch64::LDPXpost;
    break;
  case AArch64::LDPDi:
    NewOpc = AArch64::LDPDpost;
    break;
  case AArch64::LDPQi:
    NewOpc = AArch64::LDPQpost;
    break;
  case AArch64::LDRXui:
    NewOpc = AArch64::LDRXpost;
    break;
  case AArch64::LDRDui:
    NewOpc = AArch64::LDRDpost;
    break;
  case AArch64::LDRQui:
    NewOpc = AArch64::LDRQpost;
    break;
  }
  // Get rid of the SEH code associated with the old instruction.
  if (NeedsWinCFI) {
    auto SEH = std::next(MBBI);
    if (AArch64InstrInfo::isSEHInstruction(*SEH))
      SEH->eraseFromParent();
  }

  TypeSize Scale = TypeSize::Fixed(1);
  unsigned Width;
  int64_t MinOffset, MaxOffset;
  bool Success = static_cast<const AArch64InstrInfo *>(TII)->getMemOpInfo(
      NewOpc, Scale, Width, MinOffset, MaxOffset);
  (void)Success;
  assert(Success && "unknown load/store opcode");

  // If the first store isn't right where we want SP then we can't fold the
  // update in so create a normal arithmetic instruction instead.
  MachineFunction &MF = *MBB.getParent();
  if (MBBI->getOperand(MBBI->getNumOperands() - 1).getImm() != 0 ||
      CSStackSizeInc < MinOffset || CSStackSizeInc > MaxOffset) {
    emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
                    StackOffset::getFixed(CSStackSizeInc), TII, FrameFlag,
                    false, false, nullptr, EmitCFI,
                    StackOffset::getFixed(CFAOffset));

    return std::prev(MBBI);
  }

  MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(NewOpc));
  MIB.addReg(AArch64::SP, RegState::Define);

  // Copy all operands other than the immediate offset.
  unsigned OpndIdx = 0;
  for (unsigned OpndEnd = MBBI->getNumOperands() - 1; OpndIdx < OpndEnd;
       ++OpndIdx)
    MIB.add(MBBI->getOperand(OpndIdx));

  assert(MBBI->getOperand(OpndIdx).getImm() == 0 &&
         "Unexpected immediate offset in first/last callee-save save/restore "
         "instruction!");
  assert(MBBI->getOperand(OpndIdx - 1).getReg() == AArch64::SP &&
         "Unexpected base register in callee-save save/restore instruction!");
  assert(CSStackSizeInc % Scale == 0);
  MIB.addImm(CSStackSizeInc / (int)Scale);

  MIB.setMIFlags(MBBI->getFlags());
  MIB.setMemRefs(MBBI->memoperands());

  // Generate a new SEH code that corresponds to the new instruction.
  if (NeedsWinCFI) {
    *HasWinCFI = true;
    InsertSEH(*MIB, *TII, FrameFlag);
  }

  if (EmitCFI) {
    unsigned CFIIndex = MF.addFrameInst(
        MCCFIInstruction::cfiDefCfaOffset(nullptr, CFAOffset - CSStackSizeInc));
    BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
        .addCFIIndex(CFIIndex)
        .setMIFlags(FrameFlag);
  }

  return std::prev(MBB.erase(MBBI));
}
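
// (Concrete effect, sketched for a 16-byte callee-save area: the separate
// pair
//
//     sub sp, sp, #16
//     stp x29, x30, [sp]
//
// is folded into the single pre-increment store
//
//     stp x29, x30, [sp, #-16]!
//
// and on the epilogue side an ldp absorbs the matching add sp the same way.)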

// Fixup callee-save register save/restore instructions to take into account
// combined SP bump by adding the local stack size to the stack offsets.
static void fixupCalleeSaveRestoreStackOffset(MachineInstr &MI,
                                              uint64_t LocalStackSize,
                                              bool NeedsWinCFI,
                                              bool *HasWinCFI) {
  if (AArch64InstrInfo::isSEHInstruction(MI))
    return;

  unsigned Opc = MI.getOpcode();
  unsigned Scale;
  switch (Opc) {
  case AArch64::STPXi:
  case AArch64::STRXui:
  case AArch64::STPDi:
  case AArch64::STRDui:
  case AArch64::LDPXi:
  case AArch64::LDRXui:
  case AArch64::LDPDi:
  case AArch64::LDRDui:
    Scale = 8;
    break;
  case AArch64::STPQi:
  case AArch64::STRQui:
  case AArch64::LDPQi:
  case AArch64::LDRQui:
    Scale = 16;
    break;
  default:
    llvm_unreachable("Unexpected callee-save save/restore opcode!");
  }

  unsigned OffsetIdx = MI.getNumExplicitOperands() - 1;
  assert(MI.getOperand(OffsetIdx - 1).getReg() == AArch64::SP &&
         "Unexpected base register in callee-save save/restore instruction!");
  // Last operand is immediate offset that needs fixing.
  MachineOperand &OffsetOpnd = MI.getOperand(OffsetIdx);
  // All generated opcodes have scaled offsets.
  assert(LocalStackSize % Scale == 0);
  OffsetOpnd.setImm(OffsetOpnd.getImm() + LocalStackSize / Scale);

  if (NeedsWinCFI) {
    *HasWinCFI = true;
    auto MBBI = std::next(MachineBasicBlock::iterator(MI));
    assert(MBBI != MI.getParent()->end() && "Expecting a valid instruction");
    assert(AArch64InstrInfo::isSEHInstruction(*MBBI) &&
           "Expecting a SEH instruction");
    fixupSEHOpcode(MBBI, LocalStackSize);
  }
}

static bool isTargetWindows(const MachineFunction &MF) {
  return MF.getSubtarget<AArch64Subtarget>().isTargetWindows();
}

// Convenience function to determine whether I is an SVE callee save.
static bool IsSVECalleeSave(MachineBasicBlock::iterator I) {
  switch (I->getOpcode()) {
  default:
    return false;
  case AArch64::STR_ZXI:
  case AArch64::STR_PXI:
  case AArch64::LDR_ZXI:
  case AArch64::LDR_PXI:
    return I->getFlag(MachineInstr::FrameSetup) ||
           I->getFlag(MachineInstr::FrameDestroy);
  }
}

static bool needsShadowCallStackPrologueEpilogue(MachineFunction &MF) {
  if (!(llvm::any_of(
            MF.getFrameInfo().getCalleeSavedInfo(),
            [](const auto &Info) { return Info.getReg() == AArch64::LR; }) &&
        MF.getFunction().hasFnAttribute(Attribute::ShadowCallStack)))
    return false;

  if (!MF.getSubtarget<AArch64Subtarget>().isXRegisterReserved(18))
    report_fatal_error("Must reserve x18 to use shadow call stack");

  return true;
}

static void emitShadowCallStackPrologue(const TargetInstrInfo &TII,
                                        MachineFunction &MF,
                                        MachineBasicBlock &MBB,
                                        MachineBasicBlock::iterator MBBI,
                                        const DebugLoc &DL, bool NeedsWinCFI,
                                        bool NeedsUnwindInfo) {
  // Shadow call stack prolog: str x30, [x18], #8
  BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXpost))
      .addReg(AArch64::X18, RegState::Define)
      .addReg(AArch64::LR)
      .addReg(AArch64::X18)
      .addImm(8)
      .setMIFlag(MachineInstr::FrameSetup);

  // This instruction also makes x18 live-in to the entry block.
  MBB.addLiveIn(AArch64::X18);

  if (NeedsWinCFI)
    BuildMI(MBB, MBBI, DL, TII.get(AArch64::SEH_Nop))
        .setMIFlag(MachineInstr::FrameSetup);

  if (NeedsUnwindInfo) {
    // Emit a CFI instruction that causes 8 to be subtracted from the value of
    // x18 when unwinding past this frame.
    static const char CFIInst[] = {
        dwarf::DW_CFA_val_expression,
        18, // register
        2,  // length
        static_cast<char>(unsigned(dwarf::DW_OP_breg18)),
        static_cast<char>(-8) & 0x7f, // addend (sleb128)
    };
    unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createEscape(
        nullptr, StringRef(CFIInst, sizeof(CFIInst))));
    BuildMI(MBB, MBBI, DL, TII.get(AArch64::CFI_INSTRUCTION))
        .addCFIIndex(CFIIndex)
        .setMIFlag(MachineInstr::FrameSetup);
  }
}
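
// (Decoding the CFI escape above, for reference: DW_CFA_val_expression tells
// the unwinder the caller's x18 is recomputed rather than reloaded, and the
// two-byte expression {DW_OP_breg18, -8} evaluates to "current x18 minus 8",
// exactly undoing the post-increment "str x30, [x18], #8" at unwind time.)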

static void emitShadowCallStackEpilogue(const TargetInstrInfo &TII,
                                        MachineFunction &MF,
                                        MachineBasicBlock &MBB,
                                        MachineBasicBlock::iterator MBBI,
                                        const DebugLoc &DL) {
  // Shadow call stack epilog: ldr x30, [x18, #-8]!
  BuildMI(MBB, MBBI, DL, TII.get(AArch64::LDRXpre))
      .addReg(AArch64::X18, RegState::Define)
      .addReg(AArch64::LR, RegState::Define)
      .addReg(AArch64::X18)
      .addImm(-8)
      .setMIFlag(MachineInstr::FrameDestroy);

  if (MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF)) {
    unsigned CFIIndex =
        MF.addFrameInst(MCCFIInstruction::createRestore(nullptr, 18));
    BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
        .addCFIIndex(CFIIndex)
        .setMIFlags(MachineInstr::FrameDestroy);
  }
}
1366
1368 MachineBasicBlock &MBB) const {
1370 const MachineFrameInfo &MFI = MF.getFrameInfo();
1371 const Function &F = MF.getFunction();
1372 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1373 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1374 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1375 MachineModuleInfo &MMI = MF.getMMI();
1377 bool EmitCFI = AFI->needsDwarfUnwindInfo(MF);
1378 bool HasFP = hasFP(MF);
1379 bool NeedsWinCFI = needsWinCFI(MF);
1380 bool HasWinCFI = false;
1381 auto Cleanup = make_scope_exit([&]() { MF.setHasWinCFI(HasWinCFI); });
1382
1383 bool IsFunclet = MBB.isEHFuncletEntry();
1384
1385 // At this point, we're going to decide whether or not the function uses a
1386 // redzone. In most cases, the function doesn't have a redzone so let's
1387 // assume that's false and set it to true in the case that there's a redzone.
1388 AFI->setHasRedZone(false);
1389
1390 // Debug location must be unknown since the first debug location is used
1391 // to determine the end of the prologue.
1392 DebugLoc DL;
1393
1394 const auto &MFnI = *MF.getInfo<AArch64FunctionInfo>();
1396 emitShadowCallStackPrologue(*TII, MF, MBB, MBBI, DL, NeedsWinCFI,
1397 MFnI.needsDwarfUnwindInfo(MF));
1398
1399 if (MFnI.shouldSignReturnAddress(MF)) {
1400 if (MFnI.shouldSignWithBKey()) {
1401 BuildMI(MBB, MBBI, DL, TII->get(AArch64::EMITBKEY))
1403 }
1404
1405 // No SEH opcode for this one; it doesn't materialize into an
1406 // instruction on Windows.
1407 BuildMI(MBB, MBBI, DL,
1408 TII->get(MFnI.shouldSignWithBKey() ? AArch64::PACIBSP
1409 : AArch64::PACIASP))
1411
1412 if (EmitCFI) {
1413 unsigned CFIIndex =
1415 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1416 .addCFIIndex(CFIIndex)
1418 } else if (NeedsWinCFI) {
1419 HasWinCFI = true;
1420 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PACSignLR))
1422 }
1423 }
1424 if (EmitCFI && MFnI.isMTETagged()) {
1425 BuildMI(MBB, MBBI, DL, TII->get(AArch64::EMITMTETAGGED))
1427 }
1428
1429 // We signal the presence of a Swift extended frame to external tools by
1430 // storing FP with 0b0001 in bits 63:60. In normal userland operation a simple
1431 // ORR is sufficient, it is assumed a Swift kernel would initialize the TBI
1432 // bits so that is still true.
1433 if (HasFP && AFI->hasSwiftAsyncContext()) {
1436 if (Subtarget.swiftAsyncContextIsDynamicallySet()) {
1437 // The special symbol below is absolute and has a *value* that can be
1438 // combined with the frame pointer to signal an extended frame.
1439 BuildMI(MBB, MBBI, DL, TII->get(AArch64::LOADgot), AArch64::X16)
1440 .addExternalSymbol("swift_async_extendedFramePointerFlags",
1442 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXrs), AArch64::FP)
1443 .addUse(AArch64::FP)
1444 .addUse(AArch64::X16)
1445 .addImm(Subtarget.isTargetILP32() ? 32 : 0);
1446 break;
1447 }
1448 [[fallthrough]];
1449
1451 // ORR x29, x29, #0x1000_0000_0000_0000
1452 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXri), AArch64::FP)
1453 .addUse(AArch64::FP)
1454 .addImm(0x1100)
1456 break;
1457
1459 break;
1460 }
1461 }
1462
1463 // All calls are tail calls in GHC calling conv, and functions have no
1464 // prologue/epilogue.
1466 return;
1467
1468 // Set tagged base pointer to the requested stack slot.
1469 // Ideally it should match SP value after prologue.
1470 std::optional<int> TBPI = AFI->getTaggedBasePointerIndex();
1471 if (TBPI)
1473 else
1475
1476 const StackOffset &SVEStackSize = getSVEStackSize(MF);
1477
1478 // getStackSize() includes all the locals in its size calculation. We don't
1479 // include these locals when computing the stack size of a funclet, as they
1480 // are allocated in the parent's stack frame and accessed via the frame
1481 // pointer from the funclet. We only save the callee saved registers in the
1482 // funclet, which are really the callee saved registers of the parent
1483 // function, including the funclet.
1484 int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
1485 : MFI.getStackSize();
1486 if (!AFI->hasStackFrame() && !windowsRequiresStackProbe(MF, NumBytes)) {
1487 assert(!HasFP && "unexpected function without stack frame but with FP");
1488 assert(!SVEStackSize &&
1489 "unexpected function without stack frame but with SVE objects");
1490 // All of the stack allocation is for locals.
1491 AFI->setLocalStackSize(NumBytes);
1492 if (!NumBytes)
1493 return;
1494 // REDZONE: If the stack size is less than 128 bytes, we don't need
1495 // to actually allocate.
1496 if (canUseRedZone(MF)) {
1497 AFI->setHasRedZone(true);
1498 ++NumRedZoneFunctions;
1499 } else {
1500 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1501 StackOffset::getFixed(-NumBytes), TII,
1502 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
1503 if (EmitCFI) {
1504 // Label used to tie together the PROLOG_LABEL and the MachineMoves.
1505 MCSymbol *FrameLabel = MMI.getContext().createTempSymbol();
1506 // Encode the stack size of the leaf function.
1507 unsigned CFIIndex = MF.addFrameInst(
1508 MCCFIInstruction::cfiDefCfaOffset(FrameLabel, NumBytes));
1509 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1510 .addCFIIndex(CFIIndex)
1512 }
1513 }
1514
1515 if (NeedsWinCFI) {
1516 HasWinCFI = true;
1517 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1519 }
1520
1521 return;
1522 }
1523
1524 bool IsWin64 =
1526 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
1527
1528 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
1529 // All of the remaining stack allocations are for locals.
1530 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
1531 bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);
1532 bool HomPrologEpilog = homogeneousPrologEpilog(MF);
1533 if (CombineSPBump) {
1534 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
1535 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1536 StackOffset::getFixed(-NumBytes), TII,
1537 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI,
1538 EmitCFI);
1539 NumBytes = 0;
1540 } else if (HomPrologEpilog) {
1541 // Stack has been already adjusted.
1542 NumBytes -= PrologueSaveSize;
1543 } else if (PrologueSaveSize != 0) {
1545 MBB, MBBI, DL, TII, -PrologueSaveSize, NeedsWinCFI, &HasWinCFI,
1546 EmitCFI);
1547 NumBytes -= PrologueSaveSize;
1548 }
1549 assert(NumBytes >= 0 && "Negative stack allocation size!?");
1550
1551 // Move past the saves of the callee-saved registers, fixing up the offsets
1552 // and pre-inc if we decided to combine the callee-save and local stack
1553 // pointer bump above.
1555 while (MBBI != End && MBBI->getFlag(MachineInstr::FrameSetup) &&
1557 if (CombineSPBump)
1559 NeedsWinCFI, &HasWinCFI);
1560 ++MBBI;
1561 }
1562
1563 // For funclets the FP belongs to the containing function.
1564 if (!IsFunclet && HasFP) {
1565 // Only set up FP if we actually need to.
1566 int64_t FPOffset = AFI->getCalleeSaveBaseToFrameRecordOffset();
1567
1568 if (CombineSPBump)
1569 FPOffset += AFI->getLocalStackSize();
1570
1571 if (AFI->hasSwiftAsyncContext()) {
1572 // Before we update the live FP we have to ensure there's a valid (or
1573 // null) asynchronous context in its slot just before FP in the frame
1574 // record, so store it now.
1575 const auto &Attrs = MF.getFunction().getAttributes();
1576 bool HaveInitialContext = Attrs.hasAttrSomewhere(Attribute::SwiftAsync);
1577 if (HaveInitialContext)
1578 MBB.addLiveIn(AArch64::X22);
1579 BuildMI(MBB, MBBI, DL, TII->get(AArch64::StoreSwiftAsyncContext))
1580 .addUse(HaveInitialContext ? AArch64::X22 : AArch64::XZR)
1581 .addUse(AArch64::SP)
1582 .addImm(FPOffset - 8)
1584 }
1585
1586 if (HomPrologEpilog) {
1587 auto Prolog = MBBI;
1588 --Prolog;
1589 assert(Prolog->getOpcode() == AArch64::HOM_Prolog);
1590 Prolog->addOperand(MachineOperand::CreateImm(FPOffset));
1591 } else {
1592 // Issue sub fp, sp, FPOffset or
1593 // mov fp,sp when FPOffset is zero.
1594 // Note: All stores of callee-saved registers are marked as "FrameSetup".
1595 // This code marks the instruction(s) that set the FP also.
1596 emitFrameOffset(MBB, MBBI, DL, AArch64::FP, AArch64::SP,
1597 StackOffset::getFixed(FPOffset), TII,
1598 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
1599 if (NeedsWinCFI && HasWinCFI) {
1600 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1602 // After setting up the FP, the rest of the prolog doesn't need to be
1603 // included in the SEH unwind info.
1604 NeedsWinCFI = false;
1605 }
1606 }
1607 if (EmitCFI) {
1608 // Define the current CFA rule to use the provided FP.
1609 const int OffsetToFirstCalleeSaveFromFP =
1612 Register FramePtr = RegInfo->getFrameRegister(MF);
1613 unsigned Reg = RegInfo->getDwarfRegNum(FramePtr, true);
1614 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
1615 nullptr, Reg, FixedObject - OffsetToFirstCalleeSaveFromFP));
1616 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1617 .addCFIIndex(CFIIndex)
1619 }
1620 }
1621
1622 // Now emit the moves for whatever callee saved regs we have (including FP,
1623 // LR if those are saved). Frame instructions for SVE register are emitted
1624 // later, after the instruction which actually save SVE regs.
1625 if (EmitCFI)
1626 emitCalleeSavedGPRLocations(MBB, MBBI);
1627
1628 // Alignment is required for the parent frame, not the funclet
1629 const bool NeedsRealignment =
1630 NumBytes && !IsFunclet && RegInfo->hasStackRealignment(MF);
1631 int64_t RealignmentPadding =
1632 (NeedsRealignment && MFI.getMaxAlign() > Align(16))
1633 ? MFI.getMaxAlign().value() - 16
1634 : 0;
1635
1636 if (windowsRequiresStackProbe(MF, NumBytes + RealignmentPadding)) {
1637 uint64_t NumWords = (NumBytes + RealignmentPadding) >> 4;
1638 if (NeedsWinCFI) {
1639 HasWinCFI = true;
1640 // alloc_l can hold at most 256MB, so assume that NumBytes doesn't
1641 // exceed this amount. We need to move at most 2^24 - 1 into x15.
1642 // This is at most two instructions, MOVZ follwed by MOVK.
1643 // TODO: Fix to use multiple stack alloc unwind codes for stacks
1644 // exceeding 256MB in size.
1645 if (NumBytes >= (1 << 28))
1646 report_fatal_error("Stack size cannot exceed 256MB for stack "
1647 "unwinding purposes");
1648
1649 uint32_t LowNumWords = NumWords & 0xFFFF;
1650 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVZXi), AArch64::X15)
1651 .addImm(LowNumWords)
1654 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1656 if ((NumWords & 0xFFFF0000) != 0) {
1657 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVKXi), AArch64::X15)
1658 .addReg(AArch64::X15)
1659 .addImm((NumWords & 0xFFFF0000) >> 16) // High half
1662 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1664 }
1665 } else {
1666 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm), AArch64::X15)
1667 .addImm(NumWords)
1669 }
1670
1671 const char* ChkStk = Subtarget.getChkStkName();
1672 switch (MF.getTarget().getCodeModel()) {
1673 case CodeModel::Tiny:
1674 case CodeModel::Small:
1675 case CodeModel::Medium:
1676 case CodeModel::Kernel:
1677 BuildMI(MBB, MBBI, DL, TII->get(AArch64::BL))
1678 .addExternalSymbol(ChkStk)
1679 .addReg(AArch64::X15, RegState::Implicit)
1684 if (NeedsWinCFI) {
1685 HasWinCFI = true;
1686 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1688 }
1689 break;
1690 case CodeModel::Large:
1691 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVaddrEXT))
1692 .addReg(AArch64::X16, RegState::Define)
1693 .addExternalSymbol(ChkStk)
1694 .addExternalSymbol(ChkStk)
1696 if (NeedsWinCFI) {
1697 HasWinCFI = true;
1698 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1700 }
1701
1702 BuildMI(MBB, MBBI, DL, TII->get(getBLRCallOpcode(MF)))
1703 .addReg(AArch64::X16, RegState::Kill)
1709 if (NeedsWinCFI) {
1710 HasWinCFI = true;
1711 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1713 }
1714 break;
1715 }
1716
1717 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SUBXrx64), AArch64::SP)
1718 .addReg(AArch64::SP, RegState::Kill)
1719 .addReg(AArch64::X15, RegState::Kill)
1722 if (NeedsWinCFI) {
1723 HasWinCFI = true;
1724 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_StackAlloc))
1725 .addImm(NumBytes)
1727 }
1728 NumBytes = 0;
1729
1730 if (RealignmentPadding > 0) {
1731 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXri), AArch64::X15)
1732 .addReg(AArch64::SP)
1733 .addImm(RealignmentPadding)
1734 .addImm(0);
1735
1736 uint64_t AndMask = ~(MFI.getMaxAlign().value() - 1);
1737 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ANDXri), AArch64::SP)
1738 .addReg(AArch64::X15, RegState::Kill)
1740 AFI->setStackRealigned(true);
1741
1742 // No need for SEH instructions here; if we're realigning the stack,
1743 // we've set a frame pointer and already finished the SEH prologue.
1744 assert(!NeedsWinCFI);
1745 }
1746 }
1747
1748 StackOffset AllocateBefore = SVEStackSize, AllocateAfter = {};
1749 MachineBasicBlock::iterator CalleeSavesBegin = MBBI, CalleeSavesEnd = MBBI;
1750
1751 // Process the SVE callee-saves to determine what space needs to be
1752 // allocated.
1753 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
1754 // Find callee save instructions in frame.
1755 CalleeSavesBegin = MBBI;
1756 assert(IsSVECalleeSave(CalleeSavesBegin) && "Unexpected instruction");
1757 while (IsSVECalleeSave(MBBI) && MBBI != MBB.getFirstTerminator())
1758 ++MBBI;
1759 CalleeSavesEnd = MBBI;
1760
1761 AllocateBefore = StackOffset::getScalable(CalleeSavedSize);
1762 AllocateAfter = SVEStackSize - AllocateBefore;
1763 }
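// The SVE area is thus allocated in two bumps: AllocateBefore covers the
// scalable callee-save slots, AllocateAfter the remaining SVE locals.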
1764
1765 // Allocate space for the callee saves (if any).
1766 emitFrameOffset(
1767 MBB, CalleeSavesBegin, DL, AArch64::SP, AArch64::SP, -AllocateBefore, TII,
1768 MachineInstr::FrameSetup, false, false, nullptr,
1769 EmitCFI && !HasFP && AllocateBefore,
1770 StackOffset::getFixed((int64_t)MFI.getStackSize() - NumBytes));
1771
1772 if (EmitCFI)
1773 emitCalleeSavedSVELocations(MBB, CalleeSavesEnd);
1774
1775 // Finally allocate remaining SVE stack space.
1776 emitFrameOffset(MBB, CalleeSavesEnd, DL, AArch64::SP, AArch64::SP,
1777 -AllocateAfter, TII, MachineInstr::FrameSetup, false, false,
1778 nullptr, EmitCFI && !HasFP && AllocateAfter,
1779 AllocateBefore + StackOffset::getFixed(
1780 (int64_t)MFI.getStackSize() - NumBytes));
1781
1782 // Allocate space for the rest of the frame.
1783 if (NumBytes) {
1784 unsigned scratchSPReg = AArch64::SP;
1785
1786 if (NeedsRealignment) {
1787 scratchSPReg = findScratchNonCalleeSaveRegister(&MBB);
1788 assert(scratchSPReg != AArch64::NoRegister);
1789 }
1790
1791 // If we're a leaf function, try using the red zone.
1792 if (!canUseRedZone(MF)) {
1793 // FIXME: in the case of dynamic re-alignment, NumBytes doesn't have
1794 // the correct value here, as NumBytes also includes padding bytes,
1795 // which shouldn't be counted here.
1796 emitFrameOffset(
1797 MBB, MBBI, DL, scratchSPReg, AArch64::SP,
1798 StackOffset::getFixed(-NumBytes), TII, MachineInstr::FrameSetup,
1799 false, NeedsWinCFI, &HasWinCFI, EmitCFI && !HasFP,
1800 SVEStackSize +
1801 StackOffset::getFixed((int64_t)MFI.getStackSize() - NumBytes));
1802 }
1803 if (NeedsRealignment) {
1804 assert(MFI.getMaxAlign() > Align(1));
1805 assert(scratchSPReg != AArch64::SP);
1806
1807 // SUB X9, SP, NumBytes
1808 // -- X9 is a temporary register, so it shouldn't contain any live data
1809 // -- here and is free to use. This is already produced by emitFrameOffset above.
1810 // AND SP, X9, 0b11111...0000
1811 uint64_t AndMask = ~(MFI.getMaxAlign().value() - 1);
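// For example, MFI.getMaxAlign() == 64 gives AndMask == ~63, so the AND
// below rounds the scratch value down to a 64-byte boundary.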
1812
1813 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ANDXri), AArch64::SP)
1814 .addReg(scratchSPReg, RegState::Kill)
1815 .addImm(AArch64_AM::encodeLogicalImmediate(AndMask, 64));
1816 AFI->setStackRealigned(true);
1817
1818 // No need for SEH instructions here; if we're realigning the stack,
1819 // we've set a frame pointer and already finished the SEH prologue.
1820 assert(!NeedsWinCFI);
1821 }
1822 }
1823
1824 // If we need a base pointer, set it up here. It's whatever the value of the
1825 // stack pointer is at this point. Any variable size objects will be allocated
1826 // after this, so we can still use the base pointer to reference locals.
1827 //
1828 // FIXME: Clarify FrameSetup flags here.
1829 // Note: Use emitFrameOffset() like above for FP if the FrameSetup flag is
1830 // needed.
1831 // For funclets the BP belongs to the containing function.
1832 if (!IsFunclet && RegInfo->hasBasePointer(MF)) {
1833 TII->copyPhysReg(MBB, MBBI, DL, RegInfo->getBaseRegister(), AArch64::SP,
1834 false);
1835 if (NeedsWinCFI) {
1836 HasWinCFI = true;
1837 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1838 .setMIFlag(MachineInstr::FrameSetup);
1839 }
1840 }
1841
1842 // The very last FrameSetup instruction indicates the end of prologue. Emit a
1843 // SEH opcode indicating the prologue end.
1844 if (NeedsWinCFI && HasWinCFI) {
1845 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1846 .setMIFlag(MachineInstr::FrameSetup);
1847 }
1848
1849 // SEH funclets are passed the frame pointer in X1. If the parent
1850 // function uses the base register, then the base register is used
1851 // directly, and is not retrieved from X1.
1852 if (IsFunclet && F.hasPersonalityFn()) {
1853 EHPersonality Per = classifyEHPersonality(F.getPersonalityFn());
1854 if (isAsynchronousEHPersonality(Per)) {
1855 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::COPY), AArch64::FP)
1856 .addReg(AArch64::X1)
1857 .setMIFlag(MachineInstr::FrameSetup);
1858 MBB.addLiveIn(AArch64::X1);
1859 }
1860 }
1861}
1862
1863 static void InsertReturnAddressAuth(MachineFunction &MF, MachineBasicBlock &MBB,
1864 bool NeedsWinCFI, bool *HasWinCFI) {
1865 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
1866 if (!MFI.shouldSignReturnAddress(MF))
1867 return;
1868 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1869 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1870
1871 MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
1872 DebugLoc DL;
1873 if (MBBI != MBB.end())
1874 DL = MBBI->getDebugLoc();
1875
1876 // The AUTIASP instruction assembles to a hint instruction before v8.3a so
1877 // this instruction can safely be used for any v8-A architecture.
1878 // From v8.3a onwards there are optimised authenticate LR and return
1879 // instructions, namely RETA{A,B}, that can be used instead. In this case the
1880 // DW_CFA_AARCH64_negate_ra_state can't be emitted.
1881 if (Subtarget.hasPAuth() &&
1882 !MF.getFunction().hasFnAttribute(Attribute::ShadowCallStack) &&
1883 MBBI != MBB.end() && MBBI->getOpcode() == AArch64::RET_ReallyLR &&
1884 !NeedsWinCFI) {
1885 BuildMI(MBB, MBBI, DL,
1886 TII->get(MFI.shouldSignWithBKey() ? AArch64::RETAB : AArch64::RETAA))
1887 .copyImplicitOps(*MBBI);
1888 MBB.erase(MBBI);
1889 } else {
1890 BuildMI(
1891 MBB, MBBI, DL,
1892 TII->get(MFI.shouldSignWithBKey() ? AArch64::AUTIBSP : AArch64::AUTIASP))
1893 .setMIFlag(MachineInstr::FrameDestroy);
1894
1895 unsigned CFIIndex =
1896 MF.addFrameInst(MCCFIInstruction::createNegateRAState(nullptr));
1897 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1898 .addCFIIndex(CFIIndex)
1899 .setMIFlags(MachineInstr::FrameDestroy);
1900 if (NeedsWinCFI) {
1901 *HasWinCFI = true;
1902 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PACSignLR))
1903 .setMIFlag(MachineInstr::FrameDestroy);
1904 }
1905 }
1906}
1907
1908 static bool isFuncletReturnInstr(const MachineInstr &MI) {
1909 switch (MI.getOpcode()) {
1910 default:
1911 return false;
1912 case AArch64::CATCHRET:
1913 case AArch64::CLEANUPRET:
1914 return true;
1915 }
1916}
1917
1918 void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
1919 MachineBasicBlock &MBB) const {
1920 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
1921 MachineFrameInfo &MFI = MF.getFrameInfo();
1922 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1923 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1924 DebugLoc DL;
1925 bool NeedsWinCFI = needsWinCFI(MF);
1926 bool EmitCFI =
1927 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
1928 bool HasWinCFI = false;
1929 bool IsFunclet = false;
1930 auto WinCFI = make_scope_exit([&]() { assert(HasWinCFI == MF.hasWinCFI()); });
1931
1932 if (MBB.end() != MBBI) {
1933 DL = MBBI->getDebugLoc();
1934 IsFunclet = isFuncletReturnInstr(*MBBI);
1935 }
1936
1937 auto FinishingTouches = make_scope_exit([&]() {
1938 InsertReturnAddressAuth(MF, MBB, NeedsWinCFI, &HasWinCFI);
1939 if (needsShadowCallStackPrologueEpilogue(MF))
1940 emitShadowCallStackEpilogue(*TII, MF, MBB, MBB.getFirstTerminator(), DL);
1941 if (EmitCFI)
1942 emitCalleeSavedGPRRestores(MBB, MBB.getFirstTerminator());
1943 if (HasWinCFI)
1944 BuildMI(MBB, MBB.getFirstTerminator(), DL,
1945 TII->get(AArch64::SEH_EpilogEnd))
1946 .setMIFlag(MachineInstr::FrameDestroy);
1947 });
1948
1949 int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
1950 : MFI.getStackSize();
1951 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1952
1953 // All calls are tail calls in GHC calling conv, and functions have no
1954 // prologue/epilogue.
1955 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
1956 return;
1957
1958 // How much of the stack used by incoming arguments this function is expected
1959 // to restore in this particular epilogue.
1960 int64_t ArgumentStackToRestore = getArgumentStackToRestore(MF, MBB);
1961 bool IsWin64 =
1962 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv());
1963 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
1964
1965 int64_t AfterCSRPopSize = ArgumentStackToRestore;
1966 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
1967 // We cannot rely on the local stack size set in emitPrologue if the function
1968 // has funclets, as funclets have different local stack size requirements, and
1969 // the current value set in emitPrologue may be that of the containing
1970 // function.
1971 if (MF.hasEHFunclets())
1972 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
1973 if (homogeneousPrologEpilog(MF, &MBB)) {
1974 assert(!NeedsWinCFI);
1975 auto LastPopI = MBB.getFirstTerminator();
1976 if (LastPopI != MBB.begin()) {
1977 auto HomogeneousEpilog = std::prev(LastPopI);
1978 if (HomogeneousEpilog->getOpcode() == AArch64::HOM_Epilog)
1979 LastPopI = HomogeneousEpilog;
1980 }
1981
1982 // Adjust local stack
1983 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
1984 StackOffset::getFixed(AFI->getLocalStackSize()), TII,
1985 MachineInstr::FrameDestroy, false, NeedsWinCFI);
1986
1987 // SP has already been adjusted while restoring the callee-saved regs.
1988 // We have already bailed out of the case that adjusts SP for arguments.
1989 assert(AfterCSRPopSize == 0);
1990 return;
1991 }
1992 bool CombineSPBump = shouldCombineCSRLocalStackBumpInEpilogue(MBB, NumBytes);
1993 // Assume we can't combine the last pop with the sp restore.
1994
1995 bool CombineAfterCSRBump = false;
1996 if (!CombineSPBump && PrologueSaveSize != 0) {
1997 MachineBasicBlock::iterator Pop = std::prev(MBB.getFirstTerminator());
1998 while (Pop->getOpcode() == TargetOpcode::CFI_INSTRUCTION ||
1999 AArch64InstrInfo::isSEHInstruction(*Pop))
2000 Pop = std::prev(Pop);
2001 // Converting the last ldp to a post-index ldp is valid only if the last
2002 // ldp's offset is 0.
2003 const MachineOperand &OffsetOp = Pop->getOperand(Pop->getNumOperands() - 1);
2004 // If the offset is 0 and the AfterCSR pop is not actually trying to
2005 // allocate more stack for arguments (in space that an untimely interrupt
2006 // may clobber), convert it to a post-index ldp.
2007 if (OffsetOp.getImm() == 0 && AfterCSRPopSize >= 0) {
2008 convertCalleeSaveRestoreToSPPrePostIncDec(
2009 MBB, Pop, DL, TII, PrologueSaveSize, NeedsWinCFI, &HasWinCFI, EmitCFI,
2010 MachineInstr::FrameDestroy, PrologueSaveSize);
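// For example, "ldp x29, x30, [sp]" becomes
// "ldp x29, x30, [sp], #PrologueSaveSize", folding the final SP adjustment
// into the restore itself.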
2011 } else {
2012 // If not, make sure to emit an add after the last ldp.
2013 // We're doing this by transferring the size to be restored from the
2014 // adjustment *before* the CSR pops to the adjustment *after* the CSR
2015 // pops.
2016 AfterCSRPopSize += PrologueSaveSize;
2017 CombineAfterCSRBump = true;
2018 }
2019 }
2020
2021 // Move past the restores of the callee-saved registers.
2022 // If we plan on combining the sp bump of the local stack size and the callee
2023 // save stack size, we might need to adjust the CSR save and restore offsets.
2024 MachineBasicBlock::iterator LastPopI = MBB.getFirstTerminator();
2025 MachineBasicBlock::iterator Begin = MBB.begin();
2026 while (LastPopI != Begin) {
2027 --LastPopI;
2028 if (!LastPopI->getFlag(MachineInstr::FrameDestroy) ||
2029 IsSVECalleeSave(LastPopI)) {
2030 ++LastPopI;
2031 break;
2032 } else if (CombineSPBump)
2033 fixupCalleeSaveRestoreStackOffset(*LastPopI, AFI->getLocalStackSize(),
2034 NeedsWinCFI, &HasWinCFI);
2035 }
2036
2037 if (MF.hasWinCFI()) {
2038 // If the prologue didn't contain any SEH opcodes and didn't set the
2039 // MF.hasWinCFI() flag, assume the epilogue won't either, and skip the
2040 // EpilogStart - to avoid generating CFI for functions that don't need it.
2041 // (And as we didn't generate any prologue at all, it would be asymmetrical
2042 // to the epilogue.) By the end of the function, we assert that
2043 // HasWinCFI is equal to MF.hasWinCFI(), to verify this assumption.
2044 HasWinCFI = true;
2045 BuildMI(MBB, LastPopI, DL, TII->get(AArch64::SEH_EpilogStart))
2046 .setMIFlag(MachineInstr::FrameDestroy);
2047 }
2048
2049 if (hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2050 switch (MF.getTarget().Options.SwiftAsyncFramePointer) {
2051 case SwiftAsyncFramePointerMode::DeploymentBased:
2052 // Avoid the reload as it is GOT relative, and instead fall back to the
2053 // hardcoded value below. This allows a mismatch between the OS and
2054 // application without immediately terminating on the difference.
2055 [[fallthrough]];
2056 case SwiftAsyncFramePointerMode::Always:
2057 // We need to reset FP to its untagged state on return. Bit 60 is
2058 // currently used to show the presence of an extended frame.
2059
2060 // BIC x29, x29, #0x1000_0000_0000_0000
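// (0x10fe is the ANDXri logical-immediate encoding of ~(1ULL << 60),
// which clears exactly that bit.)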
2061 BuildMI(MBB, MBB.getFirstTerminator(), DL, TII->get(AArch64::ANDXri),
2062 AArch64::FP)
2063 .addUse(AArch64::FP)
2064 .addImm(0x10fe)
2065 .setMIFlag(MachineInstr::FrameDestroy);
2066 break;
2067
2068 case SwiftAsyncFramePointerMode::Never:
2069 break;
2070 }
2071 }
2072
2073 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2074
2075 // If there is a single SP update, insert it before the ret and we're done.
2076 if (CombineSPBump) {
2077 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
2078
2079 // When we are about to restore the CSRs, the CFA register is SP again.
2080 if (EmitCFI && hasFP(MF)) {
2081 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2082 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2083 unsigned CFIIndex =
2084 MF.addFrameInst(MCCFIInstruction::cfiDefCfa(nullptr, Reg, NumBytes));
2085 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2086 .addCFIIndex(CFIIndex)
2087 .setMIFlags(MachineInstr::FrameDestroy);
2088 }
2089
2090 emitFrameOffset(MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2091 StackOffset::getFixed(NumBytes + (int64_t)AfterCSRPopSize),
2092 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI,
2093 &HasWinCFI, EmitCFI, StackOffset::getFixed(NumBytes));
2094 return;
2095 }
2096
2097 NumBytes -= PrologueSaveSize;
2098 assert(NumBytes >= 0 && "Negative stack allocation size!?");
2099
2100 // Process the SVE callee-saves to determine what space needs to be
2101 // deallocated.
2102 StackOffset DeallocateBefore = {}, DeallocateAfter = SVEStackSize;
2103 MachineBasicBlock::iterator RestoreBegin = LastPopI, RestoreEnd = LastPopI;
2104 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2105 RestoreBegin = std::prev(RestoreEnd);
2106 while (RestoreBegin != MBB.begin() &&
2107 IsSVECalleeSave(std::prev(RestoreBegin)))
2108 --RestoreBegin;
2109
2110 assert(IsSVECalleeSave(RestoreBegin) &&
2111 IsSVECalleeSave(std::prev(RestoreEnd)) && "Unexpected instruction");
2112
2113 StackOffset CalleeSavedSizeAsOffset =
2114 StackOffset::getScalable(CalleeSavedSize);
2115 DeallocateBefore = SVEStackSize - CalleeSavedSizeAsOffset;
2116 DeallocateAfter = CalleeSavedSizeAsOffset;
2117 }
2118
2119 // Deallocate the SVE area.
2120 if (SVEStackSize) {
2121 // If we have stack realignment or variable sized objects on the stack,
2122 // restore the stack pointer from the frame pointer prior to SVE CSR
2123 // restoration.
2124 if (AFI->isStackRealigned() || MFI.hasVarSizedObjects()) {
2125 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2126 // Set SP to start of SVE callee-save area from which they can
2127 // be reloaded. The code below will deallocate the stack space
2128 // by moving FP -> SP.
2129 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::FP,
2130 StackOffset::getScalable(-CalleeSavedSize), TII,
2131 MachineInstr::FrameDestroy);
2132 }
2133 } else {
2134 if (AFI->getSVECalleeSavedStackSize()) {
2135 // Deallocate the non-SVE locals first before we can deallocate (and
2136 // restore callee saves) from the SVE area.
2137 emitFrameOffset(
2138 MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2139 StackOffset::getFixed(NumBytes), TII, MachineInstr::FrameDestroy,
2140 false, false, nullptr, EmitCFI && !hasFP(MF),
2141 SVEStackSize + StackOffset::getFixed(NumBytes + PrologueSaveSize));
2142 NumBytes = 0;
2143 }
2144
2145 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2146 DeallocateBefore, TII, MachineInstr::FrameDestroy, false,
2147 false, nullptr, EmitCFI && !hasFP(MF),
2148 SVEStackSize +
2149 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2150
2151 emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
2152 DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
2153 false, nullptr, EmitCFI && !hasFP(MF),
2154 DeallocateAfter +
2155 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2156 }
2157 if (EmitCFI)
2158 emitCalleeSavedSVERestores(MBB, RestoreEnd);
2159 }
2160
2161 if (!hasFP(MF)) {
2162 bool RedZone = canUseRedZone(MF);
2163 // If this was a redzone leaf function, we don't need to restore the
2164 // stack pointer (but we may need to pop stack args for fastcc).
2165 if (RedZone && AfterCSRPopSize == 0)
2166 return;
2167
2168 // Pop the local variables off the stack. If there are no callee-saved
2169 // registers, it means we are actually positioned at the terminator and can
2170 // combine stack increment for the locals and the stack increment for
2171 // callee-popped arguments into (possibly) a single instruction and be done.
2172 bool NoCalleeSaveRestore = PrologueSaveSize == 0;
2173 int64_t StackRestoreBytes = RedZone ? 0 : NumBytes;
2174 if (NoCalleeSaveRestore)
2175 StackRestoreBytes += AfterCSRPopSize;
2176
2177 emitFrameOffset(
2178 MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2179 StackOffset::getFixed(StackRestoreBytes), TII,
2180 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2181 StackOffset::getFixed((RedZone ? 0 : NumBytes) + PrologueSaveSize));
2182
2183 // If we were able to combine the local stack pop with the argument pop,
2184 // then we're done.
2185 if (NoCalleeSaveRestore || AfterCSRPopSize == 0) {
2186 return;
2187 }
2188
2189 NumBytes = 0;
2190 }
2191
2192 // Restore the original stack pointer.
2193 // FIXME: Rather than doing the math here, we should instead just use
2194 // non-post-indexed loads for the restores if we aren't actually going to
2195 // be able to save any instructions.
2196 if (!IsFunclet && (MFI.hasVarSizedObjects() || AFI->isStackRealigned())) {
2197 emitFrameOffset(
2198 MBB, LastPopI, DL, AArch64::SP, AArch64::FP,
2199 StackOffset::getFixed(-AFI->getCalleeSaveBaseToFrameRecordOffset()),
2200 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI);
2201 } else if (NumBytes)
2202 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2203 StackOffset::getFixed(NumBytes), TII,
2204 MachineInstr::FrameDestroy, false, NeedsWinCFI);
2205
2206 // When we are about to restore the CSRs, the CFA register is SP again.
2207 if (EmitCFI && hasFP(MF)) {
2208 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2209 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2210 unsigned CFIIndex = MF.addFrameInst(
2211 MCCFIInstruction::cfiDefCfa(nullptr, Reg, PrologueSaveSize));
2212 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2213 .addCFIIndex(CFIIndex)
2214 .setMIFlags(MachineInstr::FrameDestroy);
2215 }
2216
2217 // This must be placed after the callee-save restore code because that code
2218 // assumes the SP is at the same location as it was after the callee-save save
2219 // code in the prologue.
2220 if (AfterCSRPopSize) {
2221 assert(AfterCSRPopSize > 0 && "attempting to reallocate arg stack that an "
2222 "interrupt may have clobbered");
2223
2224 emitFrameOffset(
2225 MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2226 StackOffset::getFixed(AfterCSRPopSize), TII, MachineInstr::FrameDestroy,
2227 false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2228 StackOffset::getFixed(CombineAfterCSRBump ? PrologueSaveSize : 0));
2229 }
2230}
2231
2232/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
2233/// debug info. It's the same as what we use for resolving the code-gen
2234/// references for now. FIXME: This can go wrong when references are
2235/// SP-relative and simple call frames aren't used.
2236 StackOffset
2237 AArch64FrameLowering::getFrameIndexReference(const MachineFunction &MF, int FI,
2238 Register &FrameReg) const {
2239 return resolveFrameIndexReference(
2240 MF, FI, FrameReg,
2241 /*PreferFP=*/
2242 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress),
2243 /*ForSimm=*/false);
2244}
2245
2246 StackOffset
2247 AArch64FrameLowering::getNonLocalFrameIndexReference(const MachineFunction &MF,
2248 int FI) const {
2249 return StackOffset::getFixed(getSEHFrameIndexOffset(MF, FI));
2250}
2251
2252 static StackOffset getFPOffset(const MachineFunction &MF,
2253 int64_t ObjectOffset) {
2254 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2255 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2256 bool IsWin64 =
2257 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv());
2258 unsigned FixedObject =
2259 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
2260 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
2261 int64_t FPAdjust =
2262 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
2263 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
2264}
2265
2266 static StackOffset getStackOffset(const MachineFunction &MF,
2267 int64_t ObjectOffset) {
2268 const auto &MFI = MF.getFrameInfo();
2269 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
2270}
2271
2272 // TODO: This function currently does not work for scalable vectors.
2273 int AArch64FrameLowering::getSEHFrameIndexOffset(const MachineFunction &MF,
2274 int FI) const {
2275 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2276 MF.getSubtarget().getRegisterInfo());
2277 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
2278 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
2279 ? getFPOffset(MF, ObjectOffset).getFixed()
2280 : getStackOffset(MF, ObjectOffset).getFixed();
2281}
2282
2283 StackOffset AArch64FrameLowering::resolveFrameIndexReference(
2284 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
2285 bool ForSimm) const {
2286 const auto &MFI = MF.getFrameInfo();
2287 int64_t ObjectOffset = MFI.getObjectOffset(FI);
2288 bool isFixed = MFI.isFixedObjectIndex(FI);
2289 bool isSVE = MFI.getStackID(FI) == TargetStackID::ScalableVector;
2290 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, isSVE, FrameReg,
2291 PreferFP, ForSimm);
2292}
2293
2294 StackOffset AArch64FrameLowering::resolveFrameOffsetReference(
2295 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE,
2296 Register &FrameReg, bool PreferFP, bool ForSimm) const {
2297 const auto &MFI = MF.getFrameInfo();
2298 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2299 MF.getSubtarget().getRegisterInfo());
2300 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2301 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2302
2303 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
2304 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
2305 bool isCSR =
2306 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
2307
2308 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2309
2310 // Use frame pointer to reference fixed objects. Use it for locals if
2311 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
2312 // reliable as a base). Make sure useFPForScavengingIndex() does the
2313 // right thing for the emergency spill slot.
2314 bool UseFP = false;
2315 if (AFI->hasStackFrame() && !isSVE) {
2316 // We shouldn't prefer using the FP to access fixed-sized stack objects when
2317 // there are scalable (SVE) objects in between the FP and the fixed-sized
2318 // objects.
2319 PreferFP &= !SVEStackSize;
2320
2321 // Note: Keeping the following as multiple 'if' statements rather than
2322 // merging to a single expression for readability.
2323 //
2324 // Argument access should always use the FP.
2325 if (isFixed) {
2326 UseFP = hasFP(MF);
2327 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
2328 // References to the CSR area must use FP if we're re-aligning the stack
2329 // since the dynamically-sized alignment padding is between the SP/BP and
2330 // the CSR area.
2331 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
2332 UseFP = true;
2333 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
2334 // If the FPOffset is negative and we're producing a signed immediate, we
2335 // have to keep in mind that the available offset range for negative
2336 // offsets is smaller than for positive ones. If an offset is available
2337 // via the FP and the SP, use whichever is closest.
2338 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
2339 PreferFP |= Offset > -FPOffset && !SVEStackSize;
2340
2341 if (MFI.hasVarSizedObjects()) {
2342 // If we have variable sized objects, we can use either FP or BP, as the
2343 // SP offset is unknown. We can use the base pointer if we have one and
2344 // FP is not preferred. If not, we're stuck with using FP.
2345 bool CanUseBP = RegInfo->hasBasePointer(MF);
2346 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
2347 UseFP = PreferFP;
2348 else if (!CanUseBP) // Can't use BP. Forced to use FP.
2349 UseFP = true;
2350 // else we can use BP and FP, but the offset from FP won't fit.
2351 // That will make us scavenge registers which we can probably avoid by
2352 // using BP. If it won't fit for BP either, we'll scavenge anyway.
2353 } else if (FPOffset >= 0) {
2354 // Use SP or FP, whichever gives us the best chance of the offset
2355 // being in range for direct access. If the FPOffset is positive,
2356 // that'll always be best, as the SP will be even further away.
2357 UseFP = true;
2358 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
2359 // Funclets access the locals contained in the parent's stack frame
2360 // via the frame pointer, so we have to use the FP in the parent
2361 // function.
2362 (void) Subtarget;
2363 assert(
2364 Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv()) &&
2365 "Funclets should only be present on Win64");
2366 UseFP = true;
2367 } else {
2368 // We have the choice between FP and (SP or BP).
2369 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
2370 UseFP = true;
2371 }
2372 }
2373 }
2374
2375 assert(
2376 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
2377 "In the presence of dynamic stack pointer realignment, "
2378 "non-argument/CSR objects cannot be accessed through the frame pointer");
2379
2380 if (isSVE) {
2381 StackOffset FPOffset =
2382 StackOffset::get(-AFI->getCalleeSaveBaseToFrameRecordOffset(), ObjectOffset);
2383 StackOffset SPOffset =
2384 SVEStackSize +
2385 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
2386 ObjectOffset);
2387 // Always use the FP for SVE spills if available and beneficial.
2388 if (hasFP(MF) && (SPOffset.getFixed() ||
2389 FPOffset.getScalable() < SPOffset.getScalable() ||
2390 RegInfo->hasStackRealignment(MF))) {
2391 FrameReg = RegInfo->getFrameRegister(MF);
2392 return FPOffset;
2393 }
2394
2395 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
2396 : (unsigned)AArch64::SP;
2397 return SPOffset;
2398 }
2399
2400 StackOffset ScalableOffset = {};
2401 if (UseFP && !(isFixed || isCSR))
2402 ScalableOffset = -SVEStackSize;
2403 if (!UseFP && (isFixed || isCSR))
2404 ScalableOffset = SVEStackSize;
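// In other words: FP sits above the SVE area, so FP-relative accesses to
// non-fixed objects must step down past it, while SP-relative accesses to
// fixed/CSR objects must step up over it.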
2405
2406 if (UseFP) {
2407 FrameReg = RegInfo->getFrameRegister(MF);
2408 return StackOffset::getFixed(FPOffset) + ScalableOffset;
2409 }
2410
2411 // Use the base pointer if we have one.
2412 if (RegInfo->hasBasePointer(MF))
2413 FrameReg = RegInfo->getBaseRegister();
2414 else {
2415 assert(!MFI.hasVarSizedObjects() &&
2416 "Can't use SP when we have var sized objects.");
2417 FrameReg = AArch64::SP;
2418 // If we're using the red zone for this function, the SP won't actually
2419 // be adjusted, so the offsets will be negative. They're also all
2420 // within range of the signed 9-bit immediate instructions.
2421 if (canUseRedZone(MF))
2422 Offset -= AFI->getLocalStackSize();
2423 }
2424
2425 return StackOffset::getFixed(Offset) + ScalableOffset;
2426}
2427
2428static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
2429 // Do not set a kill flag on values that are also marked as live-in. This
2430 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
2431 // callee saved registers.
2432 // Omitting the kill flags is conservatively correct even if the live-in
2433 // is not used after all.
2434 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
2435 return getKillRegState(!IsLiveIn);
2436}
2437
2438 static bool produceCompactUnwindFrame(MachineFunction &MF) {
2439 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2440 AttributeList Attrs = MF.getFunction().getAttributes();
2441 return Subtarget.isTargetMachO() &&
2442 !(Subtarget.getTargetLowering()->supportSwiftError() &&
2443 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
2444 MF.getFunction().getCallingConv() != CallingConv::SwiftTail;
2445}
2446
2447static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2,
2448 bool NeedsWinCFI, bool IsFirst,
2449 const TargetRegisterInfo *TRI) {
2450 // If we are generating register pairs for a Windows function that requires
2451 // EH support, then pair consecutive registers only. There are no unwind
2452 // opcodes for saves/restores of non-consecutive register pairs.
2453 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
2454 // save_lrpair.
2455 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
2456
2457 if (Reg2 == AArch64::FP)
2458 return true;
2459 if (!NeedsWinCFI)
2460 return false;
2461 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
2462 return false;
2463 // If pairing a GPR with LR, the pair can be described by the save_lrpair
2464 // opcode. If this is the first register pair, it would end up with a
2465 // predecrement, but there's no save_lrpair_x opcode, so we can only do this
2466 // if LR is paired with something other than the first register.
2467 // The save_lrpair opcode requires the first register to be an odd one.
2468 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
2469 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
2470 return false;
2471 return true;
2472}
2473
2474/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
2475/// WindowsCFI requires that only consecutive registers can be paired.
2476/// LR and FP need to be allocated together when the frame needs to save
2477/// the frame-record. This means any other register pairing with LR is invalid.
2478static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2,
2479 bool UsesWinAAPCS, bool NeedsWinCFI,
2480 bool NeedsFrameRecord, bool IsFirst,
2481 const TargetRegisterInfo *TRI) {
2482 if (UsesWinAAPCS)
2483 return invalidateWindowsRegisterPairing(Reg1, Reg2, NeedsWinCFI, IsFirst,
2484 TRI);
2485
2486 // If we need to store the frame record, don't pair any register
2487 // with LR other than FP.
2488 if (NeedsFrameRecord)
2489 return Reg2 == AArch64::LR;
2490
2491 return false;
2492}
2493
2494namespace {
2495
2496struct RegPairInfo {
2497 unsigned Reg1 = AArch64::NoRegister;
2498 unsigned Reg2 = AArch64::NoRegister;
2499 int FrameIdx;
2500 int Offset;
2501 enum RegType { GPR, FPR64, FPR128, PPR, ZPR } Type;
2502
2503 RegPairInfo() = default;
2504
2505 bool isPaired() const { return Reg2 != AArch64::NoRegister; }
2506
2507 unsigned getScale() const {
2508 switch (Type) {
2509 case PPR:
2510 return 2;
2511 case GPR:
2512 case FPR64:
2513 return 8;
2514 case ZPR:
2515 case FPR128:
2516 return 16;
2517 }
2518 llvm_unreachable("Unsupported type");
2519 }
2520
2521 bool isScalable() const { return Type == PPR || Type == ZPR; }
2522};
2523
2524} // end anonymous namespace
2525
2526 static void computeCalleeSaveRegisterPairs(
2527 MachineFunction &MF, ArrayRef<CalleeSavedInfo> CSI,
2528 const TargetRegisterInfo *TRI, SmallVectorImpl<RegPairInfo> &RegPairs,
2529 bool NeedsFrameRecord) {
2530
2531 if (CSI.empty())
2532 return;
2533
2534 bool IsWindows = isTargetWindows(MF);
2535 bool NeedsWinCFI = needsWinCFI(MF);
2536 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2537 MachineFrameInfo &MFI = MF.getFrameInfo();
2538 CallingConv::ID CC = MF.getFunction().getCallingConv();
2539 unsigned Count = CSI.size();
2540 (void)CC;
2541 // MachO's compact unwind format relies on all registers being stored in
2542 // pairs.
2543 assert((!produceCompactUnwindFrame(MF) || CC == CallingConv::PreserveMost ||
2544 CC == CallingConv::CXX_FAST_TLS || CC == CallingConv::Win64 ||
2545 (Count & 1) == 0) &&
2546 "Odd number of callee-saved regs to spill!");
2547 int ByteOffset = AFI->getCalleeSavedStackSize();
2548 int StackFillDir = -1;
2549 int RegInc = 1;
2550 unsigned FirstReg = 0;
2551 if (NeedsWinCFI) {
2552 // For WinCFI, fill the stack from the bottom up.
2553 ByteOffset = 0;
2554 StackFillDir = 1;
2555 // As the CSI array is reversed to match PrologEpilogInserter, iterate
2556 // backwards, to pair up registers starting from lower numbered registers.
2557 RegInc = -1;
2558 FirstReg = Count - 1;
2559 }
2560 int ScalableByteOffset = AFI->getSVECalleeSavedStackSize();
2561 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
2562
2563 // When iterating backwards, the loop condition relies on unsigned wraparound.
2564 for (unsigned i = FirstReg; i < Count; i += RegInc) {
2565 RegPairInfo RPI;
2566 RPI.Reg1 = CSI[i].getReg();
2567
2568 if (AArch64::GPR64RegClass.contains(RPI.Reg1))
2569 RPI.Type = RegPairInfo::GPR;
2570 else if (AArch64::FPR64RegClass.contains(RPI.Reg1))
2571 RPI.Type = RegPairInfo::FPR64;
2572 else if (AArch64::FPR128RegClass.contains(RPI.Reg1))
2573 RPI.Type = RegPairInfo::FPR128;
2574 else if (AArch64::ZPRRegClass.contains(RPI.Reg1))
2575 RPI.Type = RegPairInfo::ZPR;
2576 else if (AArch64::PPRRegClass.contains(RPI.Reg1))
2577 RPI.Type = RegPairInfo::PPR;
2578 else
2579 llvm_unreachable("Unsupported register class.");
2580
2581 // Add the next reg to the pair if it is in the same register class.
2582 if (unsigned(i + RegInc) < Count) {
2583 Register NextReg = CSI[i + RegInc].getReg();
2584 bool IsFirst = i == FirstReg;
2585 switch (RPI.Type) {
2586 case RegPairInfo::GPR:
2587 if (AArch64::GPR64RegClass.contains(NextReg) &&
2588 !invalidateRegisterPairing(RPI.Reg1, NextReg, IsWindows,
2589 NeedsWinCFI, NeedsFrameRecord, IsFirst,
2590 TRI))
2591 RPI.Reg2 = NextReg;
2592 break;
2593 case RegPairInfo::FPR64:
2594 if (AArch64::FPR64RegClass.contains(NextReg) &&
2595 !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
2596 IsFirst, TRI))
2597 RPI.Reg2 = NextReg;
2598 break;
2599 case RegPairInfo::FPR128:
2600 if (AArch64::FPR128RegClass.contains(NextReg))
2601 RPI.Reg2 = NextReg;
2602 break;
2603 case RegPairInfo::PPR:
2604 case RegPairInfo::ZPR:
2605 break;
2606 }
2607 }
2608
2609 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
2610 // list to come in sorted by frame index so that we can issue the store
2611 // pair instructions directly. Assert if we see anything otherwise.
2612 //
2613 // The order of the registers in the list is controlled by
2614 // getCalleeSavedRegs(), so they will always be in-order, as well.
2615 assert((!RPI.isPaired() ||
2616 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
2617 "Out of order callee saved regs!");
2618
2619 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
2620 RPI.Reg1 == AArch64::LR) &&
2621 "FrameRecord must be allocated together with LR");
2622
2623 // Windows AAPCS has FP and LR reversed.
2624 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
2625 RPI.Reg2 == AArch64::LR) &&
2626 "FrameRecord must be allocated together with LR");
2627
2628 // MachO's compact unwind format relies on all registers being stored in
2629 // adjacent register pairs.
2630 assert((!produceCompactUnwindFrame(MF) || CC == CallingConv::PreserveMost ||
2631 CC == CallingConv::CXX_FAST_TLS || CC == CallingConv::Win64 ||
2632 (RPI.isPaired() &&
2633 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
2634 RPI.Reg1 + 1 == RPI.Reg2))) &&
2635 "Callee-save registers not saved as adjacent register pair!");
2636
2637 RPI.FrameIdx = CSI[i].getFrameIdx();
2638 if (NeedsWinCFI &&
2639 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
2640 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
2641
2642 int Scale = RPI.getScale();
2643
2644 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
2645 assert(OffsetPre % Scale == 0);
2646
2647 if (RPI.isScalable())
2648 ScalableByteOffset += StackFillDir * Scale;
2649 else
2650 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
2651
2652 // Swift's async context is directly before FP, so allocate an extra
2653 // 8 bytes for it.
2654 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
2655 RPI.Reg2 == AArch64::FP)
2656 ByteOffset += StackFillDir * 8;
2657
2658 assert(!(RPI.isScalable() && RPI.isPaired()) &&
2659 "Paired spill/fill instructions don't exist for SVE vectors");
2660
2661 // Round up size of non-pair to pair size if we need to pad the
2662 // callee-save area to ensure 16-byte alignment.
2663 if (NeedGapToAlignStack && !NeedsWinCFI &&
2664 !RPI.isScalable() && RPI.Type != RegPairInfo::FPR128 &&
2665 !RPI.isPaired() && ByteOffset % 16 != 0) {
2666 ByteOffset += 8 * StackFillDir;
2667 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
2668 // A stack frame with a gap looks like this, bottom up:
2669 // d9, d8. x21, gap, x20, x19.
2670 // Set extra alignment on the x21 object to create the gap above it.
2671 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
2672 NeedGapToAlignStack = false;
2673 }
2674
2675 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
2676 assert(OffsetPost % Scale == 0);
2677 // If filling top down (default), we want the offset after incrementing it.
2678 // If filling bottom up (WinCFI), we need the original offset.
2679 int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
2680
2681 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
2682 // Swift context can directly precede FP.
2683 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
2684 RPI.Reg2 == AArch64::FP)
2685 Offset += 8;
2686 RPI.Offset = Offset / Scale;
2687
2688 assert(((!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
2689 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
2690 "Offset out of bounds for LDP/STP immediate");
2691
2692 // Save the offset to frame record so that the FP register can point to the
2693 // innermost frame record (spilled FP and LR registers).
2694 if (NeedsFrameRecord && ((!IsWindows && RPI.Reg1 == AArch64::LR &&
2695 RPI.Reg2 == AArch64::FP) ||
2696 (IsWindows && RPI.Reg1 == AArch64::FP &&
2697 RPI.Reg2 == AArch64::LR)))
2698 AFI->setCalleeSaveBaseToFrameRecordOffset(Offset);
2699
2700 RegPairs.push_back(RPI);
2701 if (RPI.isPaired())
2702 i += RegInc;
2703 }
2704 if (NeedsWinCFI) {
2705 // If we need an alignment gap in the stack, align the topmost stack
2706 // object. A stack frame with a gap looks like this, bottom up:
2707 // x19, d8. d9, gap.
2708 // Set extra alignment on the topmost stack object (the first element in
2709 // CSI, which goes top down), to create the gap above it.
2710 if (AFI->hasCalleeSaveStackFreeSpace())
2711 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
2712 // We iterated bottom up over the registers; flip RegPairs back to top
2713 // down order.
2714 std::reverse(RegPairs.begin(), RegPairs.end());
2715 }
2716}
2717
2718 bool AArch64FrameLowering::spillCalleeSavedRegisters(
2719 MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
2720 ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
2721 MachineFunction &MF = *MBB.getParent();
2722 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
2723 bool NeedsWinCFI = needsWinCFI(MF);
2724 DebugLoc DL;
2725 SmallVector<RegPairInfo, 8> RegPairs;
2726
2727 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
2728
2729 const MachineRegisterInfo &MRI = MF.getRegInfo();
2730 if (homogeneousPrologEpilog(MF)) {
2731 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
2732 .setMIFlag(MachineInstr::FrameSetup);
2733
2734 for (auto &RPI : RegPairs) {
2735 MIB.addReg(RPI.Reg1);
2736 MIB.addReg(RPI.Reg2);
2737
2738 // Update register live in.
2739 if (!MRI.isReserved(RPI.Reg1))
2740 MBB.addLiveIn(RPI.Reg1);
2741 if (!MRI.isReserved(RPI.Reg2))
2742 MBB.addLiveIn(RPI.Reg2);
2743 }
2744 return true;
2745 }
2746 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
2747 unsigned Reg1 = RPI.Reg1;
2748 unsigned Reg2 = RPI.Reg2;
2749 unsigned StrOpc;
2750
2751 // Issue sequence of spills for cs regs. The first spill may be converted
2752 // to a pre-decrement store later by emitPrologue if the callee-save stack
2753 // area allocation can't be combined with the local stack area allocation.
2754 // For example:
2755 // stp x22, x21, [sp, #0] // addImm(+0)
2756 // stp x20, x19, [sp, #16] // addImm(+2)
2757 // stp fp, lr, [sp, #32] // addImm(+4)
2758 // Rationale: This sequence saves uop updates compared to a sequence of
2759 // pre-increment spills like stp xi,xj,[sp,#-16]!
2760 // Note: Similar rationale and sequence for restores in epilog.
2761 unsigned Size;
2762 Align Alignment;
2763 switch (RPI.Type) {
2764 case RegPairInfo::GPR:
2765 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
2766 Size = 8;
2767 Alignment = Align(8);
2768 break;
2769 case RegPairInfo::FPR64:
2770 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
2771 Size = 8;
2772 Alignment = Align(8);
2773 break;
2774 case RegPairInfo::FPR128:
2775 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
2776 Size = 16;
2777 Alignment = Align(16);
2778 break;
2779 case RegPairInfo::ZPR:
2780 StrOpc = AArch64::STR_ZXI;
2781 Size = 16;
2782 Alignment = Align(16);
2783 break;
2784 case RegPairInfo::PPR:
2785 StrOpc = AArch64::STR_PXI;
2786 Size = 2;
2787 Alignment = Align(2);
2788 break;
2789 }
2790 LLVM_DEBUG(dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
2791 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
2792 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2793 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
2794 dbgs() << ")\n");
2795
2796 assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
2797 "Windows unwdinding requires a consecutive (FP,LR) pair");
2798 // Windows unwind codes require consecutive registers if registers are
2799 // paired. Make the switch here, so that the code below will save (x,x+1)
2800 // and not (x+1,x).
2801 unsigned FrameIdxReg1 = RPI.FrameIdx;
2802 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2803 if (NeedsWinCFI && RPI.isPaired()) {
2804 std::swap(Reg1, Reg2);
2805 std::swap(FrameIdxReg1, FrameIdxReg2);
2806 }
2807 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2808 if (!MRI.isReserved(Reg1))
2809 MBB.addLiveIn(Reg1);
2810 if (RPI.isPaired()) {
2811 if (!MRI.isReserved(Reg2))
2812 MBB.addLiveIn(Reg2);
2813 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
2814 MIB.addMemOperand(MF.getMachineMemOperand(
2815 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2816 MachineMemOperand::MOStore, Size, Alignment));
2817 }
2818 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
2819 .addReg(AArch64::SP)
2820 .addImm(RPI.Offset) // [sp, #offset*scale],
2821 // where factor*scale is implicit
2822 .setMIFlag(MachineInstr::FrameSetup)
2823 .addMemOperand(MF.getMachineMemOperand(
2824 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2825 MachineMemOperand::MOStore, Size, Alignment));
2826 if (NeedsWinCFI)
2827 InsertSEH(MIB, TII, MachineInstr::FrameSetup);
2828
2829 // Update the StackIDs of the SVE stack slots.
2830 MachineFrameInfo &MFI = MF.getFrameInfo();
2831 if (RPI.Type == RegPairInfo::ZPR || RPI.Type == RegPairInfo::PPR)
2832 MFI.setStackID(RPI.FrameIdx, TargetStackID::ScalableVector);
2833
2834 }
2835 return true;
2836}
2837
2838 bool AArch64FrameLowering::restoreCalleeSavedRegisters(
2839 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
2840 MutableArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
2841 MachineFunction &MF = *MBB.getParent();
2842 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
2843 DebugLoc DL;
2844 SmallVector<RegPairInfo, 8> RegPairs;
2845 bool NeedsWinCFI = needsWinCFI(MF);
2846
2847 if (MBBI != MBB.end())
2848 DL = MBBI->getDebugLoc();
2849
2850 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
2851
2852 auto EmitMI = [&](const RegPairInfo &RPI) -> MachineBasicBlock::iterator {
2853 unsigned Reg1 = RPI.Reg1;
2854 unsigned Reg2 = RPI.Reg2;
2855
2856 // Issue sequence of restores for cs regs. The last restore may be converted
2857 // to a post-increment load later by emitEpilogue if the callee-save stack
2858 // area allocation can't be combined with the local stack area allocation.
2859 // For example:
2860 // ldp fp, lr, [sp, #32] // addImm(+4)
2861 // ldp x20, x19, [sp, #16] // addImm(+2)
2862 // ldp x22, x21, [sp, #0] // addImm(+0)
2863 // Note: see comment in spillCalleeSavedRegisters()
2864 unsigned LdrOpc;
2865 unsigned Size;
2866 Align Alignment;
2867 switch (RPI.Type) {
2868 case RegPairInfo::GPR:
2869 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
2870 Size = 8;
2871 Alignment = Align(8);
2872 break;
2873 case RegPairInfo::FPR64:
2874 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
2875 Size = 8;
2876 Alignment = Align(8);
2877 break;
2878 case RegPairInfo::FPR128:
2879 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
2880 Size = 16;
2881 Alignment = Align(16);
2882 break;
2883 case RegPairInfo::ZPR:
2884 LdrOpc = AArch64::LDR_ZXI;
2885 Size = 16;
2886 Alignment = Align(16);
2887 break;
2888 case RegPairInfo::PPR:
2889 LdrOpc = AArch64::LDR_PXI;
2890 Size = 2;
2891 Alignment = Align(2);
2892 break;
2893 }
2894 LLVM_DEBUG(dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
2895 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
2896 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2897 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
2898 dbgs() << ")\n");
2899
2900 // Windows unwind codes require consecutive registers if registers are
2901 // paired. Make the switch here, so that the code below will save (x,x+1)
2902 // and not (x+1,x).
2903 unsigned FrameIdxReg1 = RPI.FrameIdx;
2904 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2905 if (NeedsWinCFI && RPI.isPaired()) {
2906 std::swap(Reg1, Reg2);
2907 std::swap(FrameIdxReg1, FrameIdxReg2);
2908 }
2909 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2910 if (RPI.isPaired()) {
2911 MIB.addReg(Reg2, getDefRegState(true));
2912 MIB.addMemOperand(MF.getMachineMemOperand(
2913 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2914 MachineMemOperand::MOLoad, Size, Alignment));
2915 }
2916 MIB.addReg(Reg1, getDefRegState(true))
2917 .addReg(AArch64::SP)
2918 .addImm(RPI.Offset) // [sp, #offset*scale]
2919 // where factor*scale is implicit
2920 .setMIFlag(MachineInstr::FrameDestroy)
2921 .addMemOperand(MF.getMachineMemOperand(
2922 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2923 MachineMemOperand::MOLoad, Size, Alignment));
2924 if (NeedsWinCFI)
2925 InsertSEH(MIB, TII, MachineInstr::FrameDestroy);
2926
2927 return MIB->getIterator();
2928 };
2929
2930 // SVE objects are always restored in reverse order.
2931 for (const RegPairInfo &RPI : reverse(RegPairs))
2932 if (RPI.isScalable())
2933 EmitMI(RPI);
2934
2935 if (homogeneousPrologEpilog(MF, &MBB)) {
2936 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
2937 .setMIFlag(MachineInstr::FrameDestroy);
2938 for (auto &RPI : RegPairs) {
2939 MIB.addReg(RPI.Reg1, RegState::Define);
2940 MIB.addReg(RPI.Reg2, RegState::Define);
2941 }
2942 return true;
2943 }
2944
2945 if (ReverseCSRRestoreSeq) {
2946 MachineBasicBlock::iterator First = MBB.end();
2947 for (const RegPairInfo &RPI : reverse(RegPairs)) {
2948 if (RPI.isScalable())
2949 continue;
2950 MachineBasicBlock::iterator It = EmitMI(RPI);
2951 if (First == MBB.end())
2952 First = It;
2953 }
2954 if (First != MBB.end())
2955 MBB.splice(MBBI, &MBB, First);
2956 } else {
2957 for (const RegPairInfo &RPI : RegPairs) {
2958 if (RPI.isScalable())
2959 continue;
2960 (void)EmitMI(RPI);
2961 }
2962 }
2963
2964 return true;
2965}
2966
2967 void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
2968 BitVector &SavedRegs,
2969 RegScavenger *RS) const {
2970 // All calls are tail calls in GHC calling conv, and functions have no
2971 // prologue/epilogue.
2972 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
2973 return;
2974
2975 TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);
2976 const AArch64RegisterInfo *RegInfo = static_cast<const AArch64RegisterInfo *>(
2977 MF.getSubtarget().getRegisterInfo());
2978 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2979 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2980 unsigned UnspilledCSGPR = AArch64::NoRegister;
2981 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
2982
2983 MachineFrameInfo &MFI = MF.getFrameInfo();
2984 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
2985
2986 unsigned BasePointerReg = RegInfo->hasBasePointer(MF)
2987 ? RegInfo->getBaseRegister()
2988 : (unsigned)AArch64::NoRegister;
2989
2990 unsigned ExtraCSSpill = 0;
2991 // Figure out which callee-saved registers to save/restore.
2992 for (unsigned i = 0; CSRegs[i]; ++i) {
2993 const unsigned Reg = CSRegs[i];
2994
2995 // Add the base pointer register to SavedRegs if it is callee-save.
2996 if (Reg == BasePointerReg)
2997 SavedRegs.set(Reg);
2998
2999 bool RegUsed = SavedRegs.test(Reg);
3000 unsigned PairedReg = AArch64::NoRegister;
3001 if (AArch64::GPR64RegClass.contains(Reg) ||
3002 AArch64::FPR64RegClass.contains(Reg) ||
3003 AArch64::FPR128RegClass.contains(Reg))
3004 PairedReg = CSRegs[i ^ 1];
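// CSRegs lists pair members adjacently, so XOR-ing the index with 1
// selects the other element of the 2-aligned pair.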
3005
3006 if (!RegUsed) {
3007 if (AArch64::GPR64RegClass.contains(Reg) &&
3008 !RegInfo->isReservedReg(MF, Reg)) {
3009 UnspilledCSGPR = Reg;
3010 UnspilledCSGPRPaired = PairedReg;
3011 }
3012 continue;
3013 }
3014
3015 // MachO's compact unwind format relies on all registers being stored in
3016 // pairs.
3017 // FIXME: the usual format is actually better if unwinding isn't needed.
3018 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
3019 !SavedRegs.test(PairedReg)) {
3020 SavedRegs.set(PairedReg);
3021 if (AArch64::GPR64RegClass.contains(PairedReg) &&
3022 !RegInfo->isReservedReg(MF, PairedReg))
3023 ExtraCSSpill = PairedReg;
3024 }
3025 }
3026
3027 if (MF.getFunction().getCallingConv() == CallingConv::Win64 &&
3028 !Subtarget.isTargetWindows()) {
3029 // For Windows calling convention on a non-Windows OS, where X18 is treated
3030 // as reserved, back up X18 when entering non-windows code (marked with the
3031 // Windows calling convention) and restore when returning regardless of
3032 // whether the individual function uses it - it might call other functions
3033 // that clobber it.
3034 SavedRegs.set(AArch64::X18);
3035 }
3036
3037 // Calculate the callee-saved stack size.
3038 unsigned CSStackSize = 0;
3039 unsigned SVECSStackSize = 0;
3040 const TargetRegisterInfo *TRI = Subtarget.getRegisterInfo();
3041 const MachineRegisterInfo &MRI = MF.getRegInfo();
3042 for (unsigned Reg : SavedRegs.set_bits()) {
3043 auto RegSize = TRI->getRegSizeInBits(Reg, MRI) / 8;
3044 if (AArch64::PPRRegClass.contains(Reg) ||
3045 AArch64::ZPRRegClass.contains(Reg))
3046 SVECSStackSize += RegSize;
3047 else
3048 CSStackSize += RegSize;
3049 }
3050
3051 // Save number of saved regs, so we can easily update CSStackSize later.
3052 unsigned NumSavedRegs = SavedRegs.count();
3053
3054 // The frame record needs to be created by saving the appropriate registers
3055 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
3056 if (hasFP(MF) ||
3057 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
3058 SavedRegs.set(AArch64::FP);
3059 SavedRegs.set(AArch64::LR);
3060 }
3061
3062 LLVM_DEBUG(dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
3063 for (unsigned Reg
3064 : SavedRegs.set_bits()) dbgs()
3065 << ' ' << printReg(Reg, RegInfo);
3066 dbgs() << "\n";);
3067
3068 // If any callee-saved registers are used, the frame cannot be eliminated.
3069 int64_t SVEStackSize =
3070 alignTo(SVECSStackSize + estimateSVEStackObjectOffsets(MFI), 16);
3071 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
3072
3073 // The CSR spill slots have not been allocated yet, so estimateStackSize
3074 // won't include them.
3075 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
3076
3077 // We may address some of the stack above the canonical frame address, either
3078 // for our own arguments or during a call. Include that in calculating whether
3079 // we have complicated addressing concerns.
3080 int64_t CalleeStackUsed = 0;
3081 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
3082 int64_t FixedOff = MFI.getObjectOffset(I);
3083 if (FixedOff > CalleeStackUsed) CalleeStackUsed = FixedOff;
3084 }
3085
3086 // Conservatively always assume BigStack when there are SVE spills.
3087 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
3088 CalleeStackUsed) > EstimatedStackSizeLimit;
3089 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
3090 AFI->setHasStackFrame(true);
3091
3092 // Estimate if we might need to scavenge a register at some point in order
3093 // to materialize a stack offset. If so, either spill one additional
3094 // callee-saved register or reserve a special spill slot to facilitate
3095 // register scavenging. If we already spilled an extra callee-saved register
3096 // above to keep the number of spills even, we don't need to do anything else
3097 // here.
3098 if (BigStack) {
3099 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
3100 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
3101 << " to get a scratch register.\n");
3102 SavedRegs.set(UnspilledCSGPR);
3103 // MachO's compact unwind format relies on all registers being stored in
3104 // pairs, so if we need to spill one extra for BigStack, then we need to
3105 // store the pair.
3106 if (producePairRegisters(MF))
3107 SavedRegs.set(UnspilledCSGPRPaired);
3108 ExtraCSSpill = UnspilledCSGPR;
3109 }
3110
3111 // If we didn't find an extra callee-saved register to spill, create
3112 // an emergency spill slot.
3113 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
3114 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
3115 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
3116 unsigned Size = TRI->getSpillSize(RC);
3117 Align Alignment = TRI->getSpillAlign(RC);
3118 int FI = MFI.CreateStackObject(Size, Alignment, false);
3119 RS->addScavengingFrameIndex(FI);
3120 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
3121 << " as the emergency spill slot.\n");
3122 }
3123 }
3124
3125 // Add the size of any additional 64-bit GPR saves.
3126 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
3127
3128 // A Swift asynchronous context extends the frame record with a pointer
3129 // directly before FP.
3130 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
3131 CSStackSize += 8;
3132
3133 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
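// E.g. saving x19, x20 and LR (24 bytes of GPR spills) rounds up to 32
// bytes so SP keeps its mandatory 16-byte alignment.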
3134 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
3135 << EstimatedStackSize + AlignedCSStackSize
3136 << " bytes.\n");
3137
3138 assert((!MFI.isCalleeSavedInfoValid() ||
3139 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
3140 "Should not invalidate callee saved info");
3141
3142 // Round up to register pair alignment to avoid additional SP adjustment
3143 // instructions.
3144 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
3145 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
3146 AFI->setSVECalleeSavedStackSize(alignTo(SVECSStackSize, 16));
3147}
3148
3149 bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
3150 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
3151 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
3152 unsigned &MaxCSFrameIndex) const {
3153 bool NeedsWinCFI = needsWinCFI(MF);
3154 // To match the canonical windows frame layout, reverse the list of
3155 // callee saved registers to get them laid out by PrologEpilogInserter
3156 // in the right order. (PrologEpilogInserter allocates stack objects top
3157 // down. Windows canonical prologs store higher numbered registers at
3158 // the top, thus have the CSI array start from the highest registers.)
3159 if (NeedsWinCFI)
3160 std::reverse(CSI.begin(), CSI.end());
3161
3162 if (CSI.empty())
3163 return true; // Early exit if no callee saved registers are modified!
3164
3165 // Now that we know which registers need to be saved and restored, allocate
3166 // stack slots for them.
3167 MachineFrameInfo &MFI = MF.getFrameInfo();
3168 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3169
3170 bool UsesWinAAPCS = isTargetWindows(MF);
3171 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
3172 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
3173 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3174 if ((unsigned)FrameIdx < MinCSFrameIndex) MinCSFrameIndex = FrameIdx;
3175 if ((unsigned)FrameIdx > MaxCSFrameIndex) MaxCSFrameIndex = FrameIdx;
3176 }
3177
3178 for (auto &CS : CSI) {
3179 Register Reg = CS.getReg();
3180 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
3181
3182 unsigned Size = RegInfo->getSpillSize(*RC);
3183 Align Alignment(RegInfo->getSpillAlign(*RC));
3184 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
3185 CS.setFrameIdx(FrameIdx);
3186
3187 if ((unsigned)FrameIdx < MinCSFrameIndex) MinCSFrameIndex = FrameIdx;
3188 if ((unsigned)FrameIdx > MaxCSFrameIndex) MaxCSFrameIndex = FrameIdx;
3189
3190 // Grab 8 bytes below FP for the extended asynchronous frame info.
3191 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
3192 Reg == AArch64::FP) {
3193 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
3194 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3195 if ((unsigned)FrameIdx < MinCSFrameIndex) MinCSFrameIndex = FrameIdx;
3196 if ((unsigned)FrameIdx > MaxCSFrameIndex) MaxCSFrameIndex = FrameIdx;
3197 }
3198 }
3199 return true;
3200}
3201
3202 bool AArch64FrameLowering::enableStackSlotScavenging(
3203 const MachineFunction &MF) const {
3204 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3205 return AFI->hasCalleeSaveStackFreeSpace();
3206}
3207
3208 /// Returns true if there are any SVE callee saves.
3209 static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI,
3210 int &Min, int &Max) {
3211 Min = std::numeric_limits<int>::max();
3212 Max = std::numeric_limits<int>::min();
3213
3214 if (!MFI.isCalleeSavedInfoValid())
3215 return false;
3216
3217 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
3218 for (auto &CS : CSI) {
3219 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
3220 AArch64::PPRRegClass.contains(CS.getReg())) {
3221 assert((Max == std::numeric_limits<int>::min() ||
3222 Max + 1 == CS.getFrameIdx()) &&
3223 "SVE CalleeSaves are not consecutive");
3224
3225 Min = std::min(Min, CS.getFrameIdx());
3226 Max = std::max(Max, CS.getFrameIdx());
3227 }
3228 }
3229 return Min != std::numeric_limits<int>::max();
3230}
3231
3232// Process all the SVE stack objects and determine offsets for each
3233// object. If AssignOffsets is true, the offsets get assigned.
3234// Fills in the first and last callee-saved frame indices into
3235// Min/MaxCSFrameIndex, respectively.
3236// Returns the size of the stack.
3237 static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI,
3238 int &MinCSFrameIndex,
3239 int &MaxCSFrameIndex,
3240 bool AssignOffsets) {
3241#ifndef NDEBUG
3242 // First process all fixed stack objects.
3243 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
3245 "SVE vectors should never be passed on the stack by value, only by "
3246 "reference.");
3247#endif
3248
3249 auto Assign = [&MFI](int FI, int64_t Offset) {
3250 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
3251 MFI.setObjectOffset(FI, Offset);
3252 };
3253
3254 int64_t Offset = 0;
3255
3256 // Then process all callee saved slots.
3257 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
3258 // Assign offsets to the callee save slots.
3259 for (int I = MinCSFrameIndex; I <= MaxCSFrameIndex; ++I) {
3260 Offset += MFI.getObjectSize(I);
3261 Offset = alignTo(Offset, MFI.getObjectAlign(I));
3262 if (AssignOffsets)
3263 Assign(I, -Offset);
3264 }
3265 }
3266
3267 // Ensure that the callee-save area is aligned to 16 bytes.
3268 Offset = alignTo(Offset, Align(16U));
3269
3270 // Create a buffer of SVE objects to allocate and sort it.
3271 SmallVector<int, 8> ObjectsToAllocate;
3272 // If we have a stack protector, and we've previously decided that we have SVE
3273 // objects on the stack and thus need it to go in the SVE stack area, then it
3274 // needs to go first.
3275 int StackProtectorFI = -1;
3276 if (MFI.hasStackProtectorIndex()) {
3277 StackProtectorFI = MFI.getStackProtectorIndex();
3278 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
3279 ObjectsToAllocate.push_back(StackProtectorFI);
3280 }
3281 for (int I = 0, E = MFI.getObjectIndexEnd(); I != E; ++I) {
3282 unsigned StackID = MFI.getStackID(I);
3283 if (StackID != TargetStackID::ScalableVector)
3284 continue;
3285 if (I == StackProtectorFI)
3286 continue;
3287 if (MaxCSFrameIndex >= I && I >= MinCSFrameIndex)
3288 continue;
3289 if (MFI.isDeadObjectIndex(I))
3290 continue;
3291
3292 ObjectsToAllocate.push_back(I);
3293 }
3294
3295 // Allocate all SVE locals and spills
3296 for (unsigned FI : ObjectsToAllocate) {
3297 Align Alignment = MFI.getObjectAlign(FI);
3298 // FIXME: Given that the length of SVE vectors is not necessarily a power of
3299 // two, we'd need to align every object dynamically at runtime if the
3300 // alignment is larger than 16. This is not yet supported.
3301 if (Alignment > Align(16))
3303 "Alignment of scalable vectors > 16 bytes is not yet supported");
3304
3305 Offset = alignTo(Offset + MFI.getObjectSize(FI), Alignment);
3306 if (AssignOffsets)
3307 Assign(FI, -Offset);
3308 }
3309
3310 return Offset;
3311}
3312
3313int64_t AArch64FrameLowering::estimateSVEStackObjectOffsets(
3314 MachineFrameInfo &MFI) const {
3315 int MinCSFrameIndex, MaxCSFrameIndex;
3316 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex, false);
3317}
3318
3319int64_t AArch64FrameLowering::assignSVEStackObjectOffsets(
3320 MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex) const {
3321 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex,
3322 true);
3323}
3324
3325 void AArch64FrameLowering::processFunctionBeforeFrameFinalized(
3326 MachineFunction &MF, RegScavenger *RS) const {
3327 MachineFrameInfo &MFI = MF.getFrameInfo();
3328
3329 assert(getStackGrowthDirection() == TargetFrameLowering::StackGrowsDown &&
3330 "Upwards growing stack unsupported");
3331
3332 int MinCSFrameIndex, MaxCSFrameIndex;
3333 int64_t SVEStackSize =
3334 assignSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex);
3335
3336 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3337 AFI->setStackSizeSVE(alignTo(SVEStackSize, 16U));
3338 AFI->setMinMaxSVECSFrameIndex(MinCSFrameIndex, MaxCSFrameIndex);
3339
3340 // If this function isn't doing Win64-style C++ EH, we don't need to do
3341 // anything.
3342 if (!MF.hasEHFunclets())
3343 return;
3344 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
3345 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
3346
3347 MachineBasicBlock &MBB = MF.front();
3348 auto MBBI = MBB.begin();
3349 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
3350 ++MBBI;
3351
3352 // Create an UnwindHelp object.
3353 // The UnwindHelp object is allocated at the start of the fixed object area
3354 int64_t FixedObject =
3355 getFixedObjectSize(MF, AFI, /*IsWin64*/ true, /*IsFunclet*/ false);
3356 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8,
3357 /*SPOffset*/ -FixedObject,
3358 /*IsImmutable=*/false);
3359 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
3360
3361 // We need to store -2 into the UnwindHelp object at the start of the
3362 // function.
3363 DebugLoc DL;
3364 RS->enterBasicBlockEnd(MBB);
3365 RS->backward(std::prev(MBBI));
3366 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
3367 assert(DstReg && "There must be a free register after frame setup");
3368 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
3369 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
3370 .addReg(DstReg, getKillRegState(true))
3371 .addFrameIndex(UnwindHelpFI)
3372 .addImm(0);
3373}
3374
3375namespace {
3376struct TagStoreInstr {
3377 MachineInstr *MI;
3378 int64_t Offset, Size;
3379 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
3380 : MI(MI), Offset(Offset), Size(Size) {}
3381};
3382
3383class TagStoreEdit {
3384 MachineFunction *MF;
3385 MachineBasicBlock *MBB;
3386 MachineRegisterInfo *MRI;
3387 // Tag store instructions that are being replaced.
3388 SmallVector<TagStoreInstr, 8> TagStores;
3389 // Combined memref arguments of the above instructions.
3390 SmallVector<MachineMemOperand *, 8> CombinedMemRefs;
3392 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
3393 // FrameRegOffset + Size) with the address tag of SP.
3394 Register FrameReg;
3395 StackOffset FrameRegOffset;
3396 int64_t Size;
3397 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the end.
3398 std::optional<int64_t> FrameRegUpdate;
3399 // MIFlags for any FrameReg updating instructions.
3400 unsigned FrameRegUpdateFlags;
3401
3402 // Use zeroing instruction variants.
3403 bool ZeroData;
3404 DebugLoc DL;
3405
3406 void emitUnrolled(MachineBasicBlock::iterator InsertI);
3407 void emitLoop(MachineBasicBlock::iterator InsertI);
3408
3409public:
3410 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
3411 : MBB(MBB), ZeroData(ZeroData) {
3412 MF = MBB->getParent();
3413 MRI = &MF->getRegInfo();
3414 }
3415 // Add an instruction to be replaced. Instructions must be added in
3416 // ascending order of Offset and must be adjacent.
3417 void addInstruction(TagStoreInstr I) {
3418 assert((TagStores.empty() ||
3419 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
3420 "Non-adjacent tag store instructions.");
3421 TagStores.push_back(I);
3422 }
3423 void clear() { TagStores.clear(); }
3424 // Emit equivalent code at the given location, and erase the current set of
3425 // instructions. May skip if the replacement is not profitable. May invalidate
3426 // the input iterator and replace it with a valid one.
3427 void emitCode(MachineBasicBlock::iterator &InsertI,
3428 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
3429};
3430
3431void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
3432 const AArch64InstrInfo *TII =
3433 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3434
3435 const int64_t kMinOffset = -256 * 16;
3436 const int64_t kMaxOffset = 255 * 16;
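// Note (added for exposition): these bounds mirror the signed 9-bit
// immediate of STG/ST2G, which is scaled by the 16-byte MTE granule,
// i.e. offsets in [-256*16, 255*16].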
3437
3438 Register BaseReg = FrameReg;
3439 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
3440 if (BaseRegOffsetBytes < kMinOffset ||
3441 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
3442 // BaseReg can be FP, which is not necessarily aligned to 16 bytes. In
3443 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
3444 // is required for the offset of ST2G.
3445 BaseRegOffsetBytes % 16 != 0) {
3446 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3447 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
3448 StackOffset::getFixed(BaseRegOffsetBytes), TII);
3449 BaseReg = ScratchReg;
3450 BaseRegOffsetBytes = 0;
3451 }
3452
3453 MachineInstr *LastI = nullptr;
3454 while (Size) {
3455 int64_t InstrSize = (Size > 16) ? 32 : 16;
3456 unsigned Opcode =
3457 InstrSize == 16
3458 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
3459 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
3460 assert(BaseRegOffsetBytes % 16 == 0);
3461 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
3462 .addReg(AArch64::SP)
3463 .addReg(BaseReg)
3464 .addImm(BaseRegOffsetBytes / 16)
3465 .setMemRefs(CombinedMemRefs);
3466 // A store to [BaseReg, #0] should go last for an opportunity to fold the
3467 // final SP adjustment in the epilogue.
3468 if (BaseRegOffsetBytes == 0)
3469 LastI = I;
3470 BaseRegOffsetBytes += InstrSize;
3471 Size -= InstrSize;
3472 }
3473
3474 if (LastI)
3475 MBB->splice(InsertI, MBB, LastI);
3476}
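// Illustrative example (not part of the original source): for Size == 48 at
// a 16-byte aligned offset, emitUnrolled produces one ST2G (32 bytes)
// followed by one STG (16 bytes); if one of them writes to [BaseReg, #0], it
// is spliced to the end so a later SP adjustment can be folded into it.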
3477
3478void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
3479 const AArch64InstrInfo *TII =
3480 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3481
3482 Register BaseReg = FrameRegUpdate
3483 ? FrameReg
3484 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3485 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3486
3487 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
3488
3489 int64_t LoopSize = Size;
3490 // If the loop size is not a multiple of 32, split off one 16-byte store at
3491 // the end to fold the BaseReg update into.
3492 if (FrameRegUpdate && *FrameRegUpdate)
3493 LoopSize -= LoopSize % 32;
3494 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
3495 TII->get(ZeroData ? AArch64::STZGloop_wback
3496 : AArch64::STGloop_wback))
3497 .addDef(SizeReg)
3498 .addDef(BaseReg)
3499 .addImm(LoopSize)
3500 .addReg(BaseReg)
3501 .setMemRefs(CombinedMemRefs);
3502 if (FrameRegUpdate)
3503 LoopI->setFlags(FrameRegUpdateFlags);
3504
3505 int64_t ExtraBaseRegUpdate =
3506 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
3507 if (LoopSize < Size) {
3508 assert(FrameRegUpdate);
3509 assert(Size - LoopSize == 16);
3510 // Tag 16 more bytes at BaseReg and update BaseReg.
3511 BuildMI(*MBB, InsertI, DL,
3512 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
3513 .addDef(BaseReg)
3514 .addReg(BaseReg)
3515 .addReg(BaseReg)
3516 .addImm(1 + ExtraBaseRegUpdate / 16)
3517 .setMemRefs(CombinedMemRefs)
3518 .setMIFlags(FrameRegUpdateFlags);
3519 } else if (ExtraBaseRegUpdate) {
3520 // Update BaseReg.
3521 BuildMI(
3522 *MBB, InsertI, DL,
3523 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
3524 .addDef(BaseReg)
3525 .addReg(BaseReg)
3526 .addImm(std::abs(ExtraBaseRegUpdate))
3527 .addImm(0)
3528 .setMIFlags(FrameRegUpdateFlags);
3529 }
3530}
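// Note (added for exposition): STGloop_wback is a pseudo that is later
// expanded into a loop tagging 32 bytes per iteration (ST2G with
// post-increment) while advancing BaseReg; the optional post-indexed STG
// above covers the final 16 bytes and folds any remaining BaseReg
// adjustment into its writeback immediate.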
3531
3532 // Check if *II is a register update that can be merged into the STGloop that
3533 // ends at (Reg + Size). On success, *TotalOffset is set to the full offset of
3534 // the update; the post-loop remainder must fit an unshifted ADD/SUB immediate.
3535bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
3536 int64_t Size, int64_t *TotalOffset) {
3537 MachineInstr &MI = *II;
3538 if ((MI.getOpcode() == AArch64::ADDXri ||
3539 MI.getOpcode() == AArch64::SUBXri) &&
3540 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
3541 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
3542 int64_t Offset = MI.getOperand(2).getImm() << Shift;
3543 if (MI.getOpcode() == AArch64::SUBXri)
3544 Offset = -Offset;
3545 int64_t AbsPostOffset = std::abs(Offset - Size);
3546 const int64_t kMaxOffset =
3547 0xFFF; // Max encoding for unshifted ADDXri / SUBXri
3548 if (AbsPostOffset <= kMaxOffset && AbsPostOffset % 16 == 0) {
3549 *TotalOffset = Offset;
3550 return true;
3551 }
3552 }
3553 return false;
3554}
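// Illustrative example (not part of the original source): with an STGloop
// ending at (SP + 48), a following "ADD sp, sp, #48" gives Offset == 48 and
// AbsPostOffset == 0, so the whole SP update can be folded into the loop's
// writeback.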
3555
3556 void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
3557 SmallVectorImpl<MachineMemOperand *> &MemRefs) {
3558 MemRefs.clear();
3559 for (auto &TS : TSE) {
3560 MachineInstr *MI = TS.MI;
3561 // An instruction without memory operands may access anything. Be
3562 // conservative and return an empty list.
3563 if (MI->memoperands_empty()) {
3564 MemRefs.clear();
3565 return;
3566 }
3567 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
3568 }
3569}
3570
3571void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
3572 const AArch64FrameLowering *TFI,
3573 bool TryMergeSPUpdate) {
3574 if (TagStores.empty())
3575 return;
3576 TagStoreInstr &FirstTagStore = TagStores[0];
3577 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
3578 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
3579 DL = TagStores[0].MI->getDebugLoc();
3580
3581 Register Reg;
3582 FrameRegOffset = TFI->resolveFrameOffsetReference(
3583 *MF, FirstTagStore.Offset, false /*isFixed*/, false /*isSVE*/, Reg,
3584 /*PreferFP=*/false, /*ForSimm=*/true);
3585 FrameReg = Reg;
3586 FrameRegUpdate = std::nullopt;
3587
3588 mergeMemRefs(TagStores, CombinedMemRefs);
3589
3590 LLVM_DEBUG(dbgs() << "Replacing adjacent STG instructions:\n";
3591 for (const auto &Instr
3592 : TagStores) { dbgs() << " " << *Instr.MI; });
3593
3594 // Size threshold where a loop becomes shorter than a linear sequence of
3595 // tagging instructions.
3596 const int kSetTagLoopThreshold = 176;
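// Note (added for exposition): at 176 bytes the unrolled form already needs
// six stores (five ST2G plus one STG), which is roughly the point where the
// loop form becomes smaller.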
3597 if (Size < kSetTagLoopThreshold) {
3598 if (TagStores.size() < 2)
3599 return;
3600 emitUnrolled(InsertI);
3601 } else {
3602 MachineInstr *UpdateInstr = nullptr;
3603 int64_t TotalOffset = 0;
3604 if (TryMergeSPUpdate) {
3605 // See if we can merge base register update into the STGloop.
3606 // This is done in AArch64LoadStoreOptimizer for "normal" stores, but
3607 // STGloop is too unusual for that pass; it also only realistically
3608 // appears in the function epilogue. Besides, STGloop is expanded
3609 // before that pass runs.
3610 if (InsertI != MBB->end() &&
3611 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
3612 &TotalOffset)) {
3613 UpdateInstr = &*InsertI++;
3614 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
3615 << *UpdateInstr);
3616 }
3617 }
3618
3619 if (!UpdateInstr && TagStores.size() < 2)
3620 return;
3621
3622 if (UpdateInstr) {
3623 FrameRegUpdate = TotalOffset;
3624 FrameRegUpdateFlags = UpdateInstr->getFlags();
3625 }
3626 emitLoop(InsertI);
3627 if (UpdateInstr)
3628 UpdateInstr->eraseFromParent();
3629 }
3630
3631 for (auto &TS : TagStores)
3632 TS.MI->eraseFromParent();
3633}
3634
3635bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
3636 int64_t &Size, bool &ZeroData) {
3637 MachineFunction &MF = *MI.getParent()->getParent();
3638 const MachineFrameInfo &MFI = MF.getFrameInfo();
3639
3640 unsigned Opcode = MI.getOpcode();
3641 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
3642 Opcode == AArch64::STZ2Gi);
3643
3644 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
3645 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
3646 return false;
3647 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
3648 return false;
3649 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
3650 Size = MI.getOperand(2).getImm();
3651 return true;
3652 }
3653
3654 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
3655 Size = 16;
3656 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
3657 Size = 32;
3658 else
3659 return false;
3660
3661 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
3662 return false;
3663
3664 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
3665 16 * MI.getOperand(2).getImm();
3666 return true;
3667}
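// Illustrative example (not part of the original source): "STGi $sp, %fi(1),
// 2" tags 16 bytes starting 32 bytes past frame index 1, so Offset becomes
// getObjectOffset(1) + 32 and Size becomes 16.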
3668
3669// Detect a run of memory tagging instructions for adjacent stack frame slots,
3670// and replace them with a shorter instruction sequence:
3671// * replace STG + STG with ST2G
3672// * replace STGloop + STGloop with STGloop
3673// This code needs to run when stack slot offsets are already known, but before
3674// FrameIndex operands in STG instructions are eliminated.
3675 MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
3676 const AArch64FrameLowering *TFI,
3677 RegScavenger *RS) {
3678 bool FirstZeroData;
3679 int64_t Size, Offset;
3680 MachineInstr &MI = *II;
3681 MachineBasicBlock *MBB = MI.getParent();
3682 MachineBasicBlock::iterator NextI = ++II;
3683 if (&MI == &MBB->instr_back())
3684 return II;
3685 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
3686 return II;
3687
3688 SmallVector<TagStoreInstr, 8> Instrs;
3689 Instrs.emplace_back(&MI, Offset, Size);
3690
3691 constexpr int kScanLimit = 10;
3692 int Count = 0;
3693 for (MachineBasicBlock::iterator E = MBB->end();
3694 NextI != E && Count < kScanLimit; ++NextI) {
3695 MachineInstr &MI = *NextI;
3696 bool ZeroData;
3697 int64_t Size, Offset;
3698 // Collect instructions that update memory tags with a FrameIndex operand
3699 // and (when applicable) constant size, and whose output registers are dead
3700 // (the latter is almost always the case in practice). Since these
3701 // instructions effectively have no inputs or outputs, we are free to skip
3702 // any non-aliasing instructions in between without tracking used registers.
3703 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
3704 if (ZeroData != FirstZeroData)
3705 break;
3706 Instrs.emplace_back(&MI, Offset, Size);
3707 continue;
3708 }
3709
3710 // Only count non-transient, non-tagging instructions toward the scan
3711 // limit.
3712 if (!MI.isTransient())
3713 ++Count;
3714
3715 // Just in case, stop before the epilogue code starts.
3716 if (MI.getFlag(MachineInstr::FrameSetup) ||
3717 MI.getFlag(MachineInstr::FrameDestroy))
3718 break;
3719
3720 // Reject anything that may alias the collected instructions.
3721 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects())
3722 break;
3723 }
3724
3725 // New code will be inserted after the last tagging instruction we've found.
3726 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
3727 InsertI++;
3728
3729 llvm::stable_sort(Instrs,
3730 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
3731 return Left.Offset < Right.Offset;
3732 });
3733
3734 // Make sure that we don't have any overlapping stores.
3735 int64_t CurOffset = Instrs[0].Offset;
3736 for (auto &Instr : Instrs) {
3737 if (CurOffset > Instr.Offset)
3738 return NextI;
3739 CurOffset = Instr.Offset + Instr.Size;
3740 }
3741
3742 // Find contiguous runs of tagged memory and emit shorter instruction
3743 // sequences for them when possible.
3744 TagStoreEdit TSE(MBB, FirstZeroData);
3745 std::optional<int64_t> EndOffset;
3746 for (auto &Instr : Instrs) {
3747 if (EndOffset && *EndOffset != Instr.Offset) {
3748 // Found a gap.
3749 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
3750 TSE.clear();
3751 }
3752
3753 TSE.addInstruction(Instr);
3754 EndOffset = Instr.Offset + Instr.Size;
3755 }
3756
3757 const MachineFunction *MF = MBB->getParent();
3758 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
3759 TSE.emitCode(
3760 InsertI, TFI, /*TryMergeSPUpdate = */
3761 !MF->getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(*MF));
3762
3763 return InsertI;
3764}
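// Illustrative example (not part of the original source): four adjacent STGi
// instructions covering 64 bytes collapse into two ST2G stores via
// emitUnrolled, while a 256-byte run (above kSetTagLoopThreshold) becomes a
// single STGloop.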
3765} // namespace
3766
3767 void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
3768 MachineFunction &MF, RegScavenger *RS = nullptr) const {
3769 if (StackTaggingMergeSetTag)
3770 for (auto &BB : MF)
3771 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();)
3772 II = tryMergeAdjacentSTG(II, this, RS);
3773}
3774
3775/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
3776/// before the update. This is easily retrieved as it is exactly the offset
3777/// that is set in processFunctionBeforeFrameFinalized.
3778 StackOffset AArch64FrameLowering::getFrameIndexReferencePreferSP(
3779 const MachineFunction &MF, int FI, Register &FrameReg,
3780 bool IgnoreSPUpdates) const {
3781 const MachineFrameInfo &MFI = MF.getFrameInfo();
3782 if (IgnoreSPUpdates) {
3783 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
3784 << MFI.getObjectOffset(FI) << "\n");
3785 FrameReg = AArch64::SP;
3786 return StackOffset::getFixed(MFI.getObjectOffset(FI));
3787 }
3788
3789 // Go to common code if we cannot provide sp + offset.
3790 if (MFI.hasVarSizedObjects() ||
3791 MF.getInfo<AArch64FunctionInfo>()->getStackSizeSVE() ||
3792 MF.getSubtarget().getRegisterInfo()->hasStackRealignment(MF))
3793 return getFrameIndexReference(MF, FI, FrameReg);
3794
3795 FrameReg = AArch64::SP;
3796 return getStackOffset(MF, MFI.getObjectOffset(FI));
3797}
3798
3799/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
3800/// the parent's frame pointer
3801 unsigned AArch64FrameLowering::getWinEHParentFrameOffset(
3802 const MachineFunction &MF) const {
3803 return 0;
3804}
3805
3806/// Funclets only need to account for space for the callee saved registers,
3807/// as the locals are accounted for in the parent's stack frame.
3808 unsigned AArch64FrameLowering::getWinEHFuncletFrameSize(
3809 const MachineFunction &MF) const {
3810 // This is the size of the pushed CSRs.
3811 unsigned CSSize =
3812 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
3813 // This is the amount of stack a funclet needs to allocate.
3814 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
3815 getStackAlign());
3816}
3817
3818namespace {
3819struct FrameObject {
3820 bool IsValid = false;
3821 // Index of the object in MFI.
3822 int ObjectIndex = 0;
3823 // Group ID this object belongs to.
3824 int GroupIndex = -1;
3825 // This object should be placed first (closest to SP).
3826 bool ObjectFirst = false;
3827 // This object's group (which always contains the object with
3828 // ObjectFirst==true) should be placed first.
3829 bool GroupFirst = false;
3830};
3831
3832class GroupBuilder {
3833 SmallVector<int, 8> CurrentMembers;
3834 int NextGroupIndex = 0;
3835 std::vector<FrameObject> &Objects;
3836
3837public:
3838 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
3839 void AddMember(int Index) { CurrentMembers.push_back(Index); }
3840 void EndCurrentGroup() {
3841 if (CurrentMembers.size() > 1) {
3842 // Create a new group with the current member list. This might remove them
3843 // from their pre-existing groups. That's OK, dealing with overlapping
3844 // groups is too hard and unlikely to make a difference.
3845 LLVM_DEBUG(dbgs() << "group:");
3846 for (int Index : CurrentMembers) {
3847 Objects[Index].GroupIndex = NextGroupIndex;
3848 LLVM_DEBUG(dbgs() << " " << Index);
3849 }
3850 LLVM_DEBUG(dbgs() << "\n");
3851 NextGroupIndex++;
3852 }
3853 CurrentMembers.clear();
3854 }
3855};
3856
3857bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
3858 // Objects at a lower index are closer to FP; objects at a higher index are
3859 // closer to SP.
3860 //
3861 // For consistency in our comparison, all invalid objects are placed
3862 // at the end. This also allows us to stop walking when we hit the
3863 // first invalid item after it's all sorted.
3864 //
3865 // The "first" object goes first (closest to SP), followed by the members of
3866 // the "first" group.
3867 //
3868 // The rest are sorted by the group index to keep the groups together.
3869 // Higher numbered groups are more likely to be around longer (i.e. untagged
3870 // in the function epilogue and not at some earlier point). Place them closer
3871 // to SP.
3872 //
3873 // If all else equal, sort by the object index to keep the objects in the
3874 // original order.
3875 return std::make_tuple(!A.IsValid, A.ObjectFirst, A.GroupFirst, A.GroupIndex,
3876 A.ObjectIndex) <
3877 std::make_tuple(!B.IsValid, B.ObjectFirst, B.GroupFirst, B.GroupIndex,
3878 B.ObjectIndex);
3879}
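// Note (added for exposition): since true compares greater than false,
// entries with ObjectFirst/GroupFirst set sort toward the end of the array,
// which by the convention above is the end closest to SP.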
3880} // namespace
3881
3882 void AArch64FrameLowering::orderFrameObjects(
3883 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
3884 if (!OrderFrameObjects || ObjectsToAllocate.empty())
3885 return;
3886
3887 const MachineFrameInfo &MFI = MF.getFrameInfo();
3888 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
3889 for (auto &Obj : ObjectsToAllocate) {
3890 FrameObjects[Obj].IsValid = true;
3891 FrameObjects[Obj].ObjectIndex = Obj;
3892 }
3893
3894 // Identify stack slots that are tagged at the same time.
3895 GroupBuilder GB(FrameObjects);
3896 for (auto &MBB : MF) {
3897 for (auto &MI : MBB) {
3898 if (MI.isDebugInstr())
3899 continue;
3900 int OpIndex;
3901 switch (MI.getOpcode()) {
3902 case AArch64::STGloop:
3903 case AArch64::STZGloop:
3904 OpIndex = 3;
3905 break;
3906 case AArch64::STGi:
3907 case AArch64::STZGi:
3908 case AArch64::ST2Gi:
3909 case AArch64::STZ2Gi:
3910 OpIndex = 1;
3911 break;
3912 default:
3913 OpIndex = -1;
3914 }
3915
3916 int TaggedFI = -1;
3917 if (OpIndex >= 0) {
3918 const MachineOperand &MO = MI.getOperand(OpIndex);
3919 if (MO.isFI()) {
3920 int FI = MO.getIndex();
3921 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
3922 FrameObjects[FI].IsValid)
3923 TaggedFI = FI;
3924 }
3925 }
3926
3927 // If this is a stack tagging instruction for a slot that is not part of a
3928 // group yet, either start a new group or add it to the current one.
3929 if (TaggedFI >= 0)
3930 GB.AddMember(TaggedFI);
3931 else
3932 GB.EndCurrentGroup();
3933 }
3934 // Groups should never span multiple basic blocks.
3935 GB.EndCurrentGroup();
3936 }
3937
3938 // If the function's tagged base pointer is pinned to a stack slot, we want to
3939 // put that slot first when possible. This will likely place it at SP + 0,
3940 // and save one instruction when generating the base pointer because IRG does
3941 // not allow an immediate offset.
3942 const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
3943 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
3944 if (TBPI) {
3945 FrameObjects[*TBPI].ObjectFirst = true;
3946 FrameObjects[*TBPI].GroupFirst = true;
3947 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
3948 if (FirstGroupIndex >= 0)
3949 for (FrameObject &Object : FrameObjects)
3950 if (Object.GroupIndex == FirstGroupIndex)
3951 Object.GroupFirst = true;
3952 }
3953
3954 llvm::stable_sort(FrameObjects, FrameObjectCompare);
3955
3956 int i = 0;
3957 for (auto &Obj : FrameObjects) {
3958 // All invalid items are sorted at the end, so it's safe to stop.
3959 if (!Obj.IsValid)
3960 break;
3961 ObjectsToAllocate[i++] = Obj.ObjectIndex;
3962 }
3963
3964 LLVM_DEBUG(dbgs() << "Final frame order:\n"; for (auto &Obj
3965 : FrameObjects) {
3966 if (!Obj.IsValid)
3967 break;
3968 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
3969 if (Obj.ObjectFirst)
3970 dbgs() << ", first";
3971 if (Obj.GroupFirst)
3972 dbgs() << ", group-first";
3973 dbgs() << "\n";
3974 });
3975}