AArch64FrameLowering.cpp
1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function so that the particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) isn't created until the main
33// function body runs, after the prologue. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | (Win64 only) callee-saved SVE reg |
48// | |
49// |-----------------------------------|
50// | |
51// | callee-saved gpr registers | <--.
52// | | | On Darwin platforms these
53// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
54// | prev_lr | | (frame record first)
55// | prev_fp | <--'
56// | async context if needed |
57// | (a.k.a. "frame record") |
58// |-----------------------------------| <- fp(=x29)
59// | <hazard padding> |
60// |-----------------------------------|
61// | |
62// | callee-saved fp/simd/SVE regs |
63// | |
64// |-----------------------------------|
65// | |
66// | SVE stack objects |
67// | |
68// |-----------------------------------|
69// |.empty.space.to.make.part.below....|
70// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
71// |.the.standard.16-byte.alignment....| compile time; if present)
72// |-----------------------------------|
73// | local variables of fixed size |
74// | including spill slots |
75// | <FPR> |
76// | <hazard padding> |
77// | <GPR> |
78// |-----------------------------------| <- bp(not defined by ABI,
79// |.variable-sized.local.variables....| LLVM chooses X19)
80// |.(VLAs)............................| (size of this area is unknown at
81// |...................................| compile time)
82// |-----------------------------------| <- sp
83// | | Lower address
84//
85//
86// To access data in a frame at compile time, a constant offset must be
87// computable from one of the pointers (fp, bp, sp). The sizes of the areas
88// with a dotted background cannot be computed at compile time if they are
89// present, so all three of fp, bp and sp must be set up in order to be
90// able to access every frame area, assuming all of the frame areas are
91// non-empty.
92//
93// For most functions, some of the frame areas are empty. For those functions,
94// it may not be necessary to set up fp or bp:
95// * A base pointer is definitely needed when there are both VLAs and local
96// variables with more-than-default alignment requirements.
97// * A frame pointer is definitely needed when there are local variables with
98// more-than-default alignment requirements.
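// For example, a function like the following (an illustrative sketch, not
// taken from this file) ends up needing both fp and bp:
//
//   void f(int n) {
//     alignas(64) char buf[256];              // over-aligned local -> fp
//     char *p = (char *)__builtin_alloca(n);  // variable-sized     -> bp
//     ...
//   }
//
// After dynamic realignment the distance from fp down to 'buf' is not a
// compile-time constant, and the alloca moves sp by an unknown amount, so
// 'buf' is addressed relative to the base pointer (x19) instead.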
99//
100// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
101// callee-saved area, since the unwind encoding does not allow for encoding
102// this dynamically and existing tools depend on this layout. For other
103// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
104// area to allow SVE stack objects (allocated directly below the callee-saves,
105// if available) to be accessed directly from the framepointer.
106// The SVE spill/fill instructions have VL-scaled addressing modes such
107// as:
108// ldr z8, [fp, #-7 mul vl]
109// For SVE the vector length (VL) is not known at compile-time, so
110// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
111// layout, we don't need to add an unscaled offset to the framepointer before
112// accessing the SVE object in the frame.
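// For example, with this layout an SVE local can be reached in a single
// VL-scaled load from fp:
//   ldr z8, [fp, #-7 mul vl]
// whereas if a fixed-size region sat between fp and the SVE area (as it
// effectively does on Darwin), the unscaled part of the offset would have to
// be materialised into a scratch register first, e.g. (sketch):
//   sub x8, fp, #16
//   ldr z8, [x8, #-7 mul vl]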
113//
114// In some cases when a base pointer is not strictly needed, it is generated
115// anyway when offsets from the frame pointer to access local variables become
116// so large that the offset can't be encoded in the immediate fields of loads
117// or stores.
118//
119// Outgoing function arguments must be at the bottom of the stack frame when
120// calling another function. If we do not have variable-sized stack objects, we
121// can allocate a "reserved call frame" area at the bottom of the local
122// variable area, large enough for all outgoing calls. If we do have VLAs, then
123// the stack pointer must be decremented and incremented around each call to
124// make space for the arguments below the VLAs.
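// For example (sketch): with a reserved call frame the outgoing-argument space
// is carved out once by the prologue, but with VLAs each call site brackets
// the call instead:
//   sub sp, sp, #32      // make room for outgoing arguments
//   bl  callee
//   add sp, sp, #32      // give it back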
125//
126// FIXME: also explain the redzone concept.
127//
128// About stack hazards: Under some SME contexts, a coprocessor with its own
129// separate cache can be used for FP operations. This can create hazards if the CPU
130// and the SME unit try to access the same area of memory, including if the
131// access is to an area of the stack. To try to alleviate this we attempt to
132// introduce extra padding into the stack frame between FP and GPR accesses,
133// controlled by the aarch64-stack-hazard-size option. Without changing the
134// layout of the stack frame in the diagram above, a stack object of size
135// aarch64-stack-hazard-size is added between GPR and FPR CSRs. Another is added
136// to the stack objects section, and stack objects are sorted so that FPR >
137// Hazard padding slot > GPRs (where possible). Unfortunately some things are
138// not handled well (VLA area, arguments on the stack, objects with both GPR and
139// FPR accesses), but if those are controlled by the user then the entire stack
140// frame becomes GPR at the start/end with FPR in the middle, surrounded by
141// Hazard padding.
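// With hazard padding the intent is that, where possible, the frame ends up
// ordered roughly as follows (sketch only, higher addresses first):
//
//   | GPR callee-saves                  |
//   | <hazard padding>                  |
//   | FPR/SVE callee-saves and objects  |
//   | <hazard padding>                  |
//   | GPR locals / outgoing arguments   |
//
// keeping GPR and FPR accesses at least aarch64-stack-hazard-size bytes apart.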
142//
143// An example of the prologue:
144//
145// .globl __foo
146// .align 2
147// __foo:
148// Ltmp0:
149// .cfi_startproc
150// .cfi_personality 155, ___gxx_personality_v0
151// Leh_func_begin:
152// .cfi_lsda 16, Lexception33
153//
154// stp xa,bx, [sp, -#offset]!
155// ...
156// stp x28, x27, [sp, #offset-32]
157// stp fp, lr, [sp, #offset-16]
158// add fp, sp, #offset - 16
159// sub sp, sp, #1360
160//
161// The Stack:
162// +-------------------------------------------+
163// 10000 | ........ | ........ | ........ | ........ |
164// 10004 | ........ | ........ | ........ | ........ |
165// +-------------------------------------------+
166// 10008 | ........ | ........ | ........ | ........ |
167// 1000c | ........ | ........ | ........ | ........ |
168// +===========================================+
169// 10010 | X28 Register |
170// 10014 | X28 Register |
171// +-------------------------------------------+
172// 10018 | X27 Register |
173// 1001c | X27 Register |
174// +===========================================+
175// 10020 | Frame Pointer |
176// 10024 | Frame Pointer |
177// +-------------------------------------------+
178// 10028 | Link Register |
179// 1002c | Link Register |
180// +===========================================+
181// 10030 | ........ | ........ | ........ | ........ |
182// 10034 | ........ | ........ | ........ | ........ |
183// +-------------------------------------------+
184// 10038 | ........ | ........ | ........ | ........ |
185// 1003c | ........ | ........ | ........ | ........ |
186// +-------------------------------------------+
187//
188// [sp] = 10030 :: >>initial value<<
189// sp = 10020 :: stp fp, lr, [sp, #-16]!
190// fp = sp == 10020 :: mov fp, sp
191// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
192// sp == 10010 :: >>final value<<
193//
194// The frame pointer (w29) points to address 10020. If we use an offset of
195// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
196// for w27, and -32 for w28:
197//
198// Ltmp1:
199// .cfi_def_cfa w29, 16
200// Ltmp2:
201// .cfi_offset w30, -8
202// Ltmp3:
203// .cfi_offset w29, -16
204// Ltmp4:
205// .cfi_offset w27, -24
206// Ltmp5:
207// .cfi_offset w28, -32
208//
209//===----------------------------------------------------------------------===//
210
211#include "AArch64FrameLowering.h"
212#include "AArch64InstrInfo.h"
215#include "AArch64RegisterInfo.h"
216#include "AArch64Subtarget.h"
220#include "llvm/ADT/ScopeExit.h"
221#include "llvm/ADT/SmallVector.h"
239#include "llvm/IR/Attributes.h"
240#include "llvm/IR/CallingConv.h"
241#include "llvm/IR/DataLayout.h"
242#include "llvm/IR/DebugLoc.h"
243#include "llvm/IR/Function.h"
244#include "llvm/MC/MCAsmInfo.h"
245#include "llvm/MC/MCDwarf.h"
247#include "llvm/Support/Debug.h"
254#include <cassert>
255#include <cstdint>
256#include <iterator>
257#include <optional>
258#include <vector>
259
260using namespace llvm;
261
262#define DEBUG_TYPE "frame-info"
263
264static cl::opt<bool> EnableRedZone("aarch64-redzone",
265 cl::desc("enable use of redzone on AArch64"),
266 cl::init(false), cl::Hidden);
267
269 "stack-tagging-merge-settag",
270 cl::desc("merge settag instruction in function epilog"), cl::init(true),
271 cl::Hidden);
272
273static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
274 cl::desc("sort stack allocations"),
275 cl::init(true), cl::Hidden);
276
278 "homogeneous-prolog-epilog", cl::Hidden,
279 cl::desc("Emit homogeneous prologue and epilogue for the size "
280 "optimization (default = off)"));
281
282// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
284 StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
285 cl::Hidden);
286// Whether to insert padding into non-streaming functions (for testing).
287static cl::opt<bool>
288 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
289 cl::init(false), cl::Hidden);
290
292 "aarch64-disable-multivector-spill-fill",
293 cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false),
294 cl::Hidden);
295
296int64_t
297AArch64FrameLowering::getArgumentStackToRestore(MachineFunction &MF,
298 MachineBasicBlock &MBB) const {
299 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
301 bool IsTailCallReturn = (MBB.end() != MBBI)
303 : false;
304
305 int64_t ArgumentPopSize = 0;
306 if (IsTailCallReturn) {
307 MachineOperand &StackAdjust = MBBI->getOperand(1);
308
309 // For a tail-call in a callee-pops-arguments environment, some or all of
310 // the stack may actually be in use for the call's arguments; this is
311 // calculated during LowerCall and consumed here...
312 ArgumentPopSize = StackAdjust.getImm();
313 } else {
314 // ... otherwise the amount to pop is *all* of the argument space,
315 // conveniently stored in the MachineFunctionInfo by
316 // LowerFormalArguments. This will, of course, be zero for the C calling
317 // convention.
318 ArgumentPopSize = AFI->getArgumentStackToRestore();
319 }
320
321 return ArgumentPopSize;
322}
323
325 MachineFunction &MF);
326
327// Conservatively returns true if the function is likely to have SVE vectors
328// on the stack. It is safe to call this function before callee-saves or
329// object offsets have been determined.
331 const MachineFunction &MF) {
332 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
333 if (AFI->isSVECC())
334 return true;
335
336 if (AFI->hasCalculatedStackSizeSVE())
337 return bool(AFL.getSVEStackSize(MF));
338
339 const MachineFrameInfo &MFI = MF.getFrameInfo();
340 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd(); FI++) {
342 return true;
343 }
344
345 return false;
346}
347
348/// Returns true if homogeneous prolog or epilog code can be emitted
349/// for the size optimization. If possible, a frame helper call is injected.
350/// When an Exit block is given, this check is for the epilog.
351bool AArch64FrameLowering::homogeneousPrologEpilog(
352 MachineFunction &MF, MachineBasicBlock *Exit) const {
353 if (!MF.getFunction().hasMinSize())
354 return false;
356 return false;
357 if (EnableRedZone)
358 return false;
359
360 // TODO: Windows is not supported yet.
361 if (needsWinCFI(MF))
362 return false;
363
364 // TODO: SVE is not supported yet.
365 if (isLikelyToHaveSVEStack(*this, MF))
366 return false;
367
368 // Bail on stack adjustment needed on return for simplicity.
369 const MachineFrameInfo &MFI = MF.getFrameInfo();
370 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
371 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
372 return false;
373 if (Exit && getArgumentStackToRestore(MF, *Exit))
374 return false;
375
376 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
378 return false;
379
380 // If there is an odd number of GPRs before LR and FP in the CSRs list,
381 // they will not be paired into one RegPairInfo, which is incompatible with
382 // the assumption made by the homogeneous prolog epilog pass.
383 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
384 unsigned NumGPRs = 0;
385 for (unsigned I = 0; CSRegs[I]; ++I) {
386 Register Reg = CSRegs[I];
387 if (Reg == AArch64::LR) {
388 assert(CSRegs[I + 1] == AArch64::FP);
389 if (NumGPRs % 2 != 0)
390 return false;
391 break;
392 }
393 if (AArch64::GPR64RegClass.contains(Reg))
394 ++NumGPRs;
395 }
396
397 return true;
398}
399
400/// Returns true if CSRs should be paired.
401bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
402 return produceCompactUnwindFrame(*this, MF) || homogeneousPrologEpilog(MF);
403}
404
405/// This is the biggest offset to the stack pointer we can encode in aarch64
406/// instructions (without using a separate calculation and a temp register).
407/// Note that the exceptions here are vector stores/loads, which cannot encode any
408/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
409static const unsigned DefaultSafeSPDisplacement = 255;
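// For illustration: the unscaled LDUR/STUR forms encode a signed 9-bit byte
// offset, so the largest displacement that is always encodable is 255:
//   ldur x0, [sp, #255]   // encodable
//   ldur x0, [sp, #256]   // out of range; expansion needs a scratch register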
410
411/// Look at each instruction that references stack frames and return the stack
412/// size limit beyond which some of these instructions will require a scratch
413/// register during their expansion later.
415 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
416 // range. We'll end up allocating an unnecessary spill slot a lot, but
417 // realistically that's not a big deal at this stage of the game.
418 for (MachineBasicBlock &MBB : MF) {
419 for (MachineInstr &MI : MBB) {
420 if (MI.isDebugInstr() || MI.isPseudo() ||
421 MI.getOpcode() == AArch64::ADDXri ||
422 MI.getOpcode() == AArch64::ADDSXri)
423 continue;
424
425 for (const MachineOperand &MO : MI.operands()) {
426 if (!MO.isFI())
427 continue;
428
430 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
432 return 0;
433 }
434 }
435 }
437}
438
443
444unsigned
445AArch64FrameLowering::getFixedObjectSize(const MachineFunction &MF,
446 const AArch64FunctionInfo *AFI,
447 bool IsWin64, bool IsFunclet) const {
448 assert(AFI->getTailCallReservedStack() % 16 == 0 &&
449 "Tail call reserved stack must be aligned to 16 bytes");
450 if (!IsWin64 || IsFunclet) {
451 return AFI->getTailCallReservedStack();
452 } else {
453 if (AFI->getTailCallReservedStack() != 0 &&
454 !MF.getFunction().getAttributes().hasAttrSomewhere(
455 Attribute::SwiftAsync))
456 report_fatal_error("cannot generate ABI-changing tail call for Win64");
457 unsigned FixedObjectSize = AFI->getTailCallReservedStack();
458
459 // Var args are stored here in the primary function.
460 FixedObjectSize += AFI->getVarArgsGPRSize();
461
462 if (MF.hasEHFunclets()) {
463 // Catch objects are stored here in the primary function.
464 const MachineFrameInfo &MFI = MF.getFrameInfo();
465 const WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
466 SmallSetVector<int, 8> CatchObjFrameIndices;
467 for (const WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
468 for (const WinEHHandlerType &H : TBME.HandlerArray) {
469 int FrameIndex = H.CatchObj.FrameIndex;
470 if ((FrameIndex != INT_MAX) &&
471 CatchObjFrameIndices.insert(FrameIndex)) {
472 FixedObjectSize = alignTo(FixedObjectSize,
473 MFI.getObjectAlign(FrameIndex).value()) +
474 MFI.getObjectSize(FrameIndex);
475 }
476 }
477 }
478 // To support EH funclets we allocate an UnwindHelp object
479 FixedObjectSize += 8;
480 }
481 return alignTo(FixedObjectSize, 16);
482 }
483}
484
485/// Returns the size of the entire SVE stackframe (calleesaves + spills).
491
493 if (!EnableRedZone)
494 return false;
495
496 // Don't use the red zone if the function explicitly asks us not to.
497 // This is typically used for kernel code.
498 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
499 const unsigned RedZoneSize =
501 if (!RedZoneSize)
502 return false;
503
504 const MachineFrameInfo &MFI = MF.getFrameInfo();
506 uint64_t NumBytes = AFI->getLocalStackSize();
507
508 // If neither NEON nor SVE is available, a COPY from one Q-reg to
509 // another requires a spill -> reload sequence. We can do that
510 // using a pre-decrementing store/post-decrementing load, but
511 // if we do so, we can't use the Red Zone.
512 bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
513 !Subtarget.isNeonAvailable() &&
514 !Subtarget.hasSVE();
515
516 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
517 getSVEStackSize(MF) || LowerQRegCopyThroughMem);
518}
519
520/// hasFPImpl - Return true if the specified function should have a dedicated
521/// frame pointer register.
523 const MachineFrameInfo &MFI = MF.getFrameInfo();
524 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
526
527 // Win64 EH requires a frame pointer if funclets are present, as the locals
528 // are accessed off the frame pointer in both the parent function and the
529 // funclets.
530 if (MF.hasEHFunclets())
531 return true;
532 // Retain behavior of always omitting the FP for leaf functions when possible.
534 return true;
535 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
536 MFI.hasStackMap() || MFI.hasPatchPoint() ||
537 RegInfo->hasStackRealignment(MF))
538 return true;
539
540 // If we:
541 //
542 // 1. Have streaming mode changes
543 // OR:
544 // 2. Have a streaming body with SVE stack objects
545 //
546 // Then the value of VG restored when unwinding to this function may not match
547 // the value of VG used to set up the stack.
548 //
549 // This is a problem as the CFA can be described with an expression of the
550 // form: CFA = SP + NumBytes + VG * NumScalableBytes.
551 //
552 // If the value of VG used in that expression does not match the value used to
553 // set up the stack, an incorrect address for the CFA will be computed, and
554 // unwinding will fail.
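  // For example (illustrative numbers): with NumBytes = 32 and
  // NumScalableBytes = 16, a 128-bit vector length gives VG = 2 and
  // CFA = SP + 64, while a 256-bit vector length gives VG = 4 and
  // CFA = SP + 96, so evaluating the expression with a stale VG yields the
  // wrong CFA.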
555 //
556 // We work around this issue by ensuring the frame-pointer can describe the
557 // CFA in either of these cases.
558 if (AFI.needsDwarfUnwindInfo(MF) &&
560 (!AFI.hasCalculatedStackSizeSVE() || AFI.getStackSizeSVE() > 0)))
561 return true;
562 // With large call frames around, we may need to use FP to access the
563 // scavenging emergency spill slot.
564 //
565 // Unfortunately some calls to hasFP() like machine verifier ->
566 // getReservedReg() -> hasFP in the middle of global isel are too early
567 // to know the max call frame size. Hopefully conservatively returning "true"
568 // in those cases is fine.
569 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
570 if (!MFI.isMaxCallFrameSizeComputed() ||
572 return true;
573
574 return false;
575}
576
577/// Should the Frame Pointer be reserved for the current function?
579 const TargetMachine &TM = MF.getTarget();
580 const Triple &TT = TM.getTargetTriple();
581
582 // These OSes require that the frame chain be valid, even if the current frame does
583 // not use a frame pointer.
584 if (TT.isOSDarwin() || TT.isOSWindows())
585 return true;
586
587 // If the function has a frame pointer, it is reserved.
588 if (hasFP(MF))
589 return true;
590
591 // Frontend has requested to preserve the frame pointer.
592 if (TM.Options.FramePointerIsReserved(MF))
593 return true;
594
595 return false;
596}
597
598/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
599/// not required, we reserve argument space for call sites in the function
600/// immediately on entry to the current function. This eliminates the need for
601/// add/sub sp brackets around call sites. Returns true if the call frame is
602/// included as part of the stack frame.
604 const MachineFunction &MF) const {
605 // The stack probing code for the dynamically allocated outgoing arguments
606 // area assumes that the stack is probed at the top - either by the prologue
607 // code, which issues a probe if `hasVarSizedObjects` return true, or by the
608 // most recent variable-sized object allocation. Changing the condition here
609 // may need to be followed up by changes to the probe issuing logic.
610 return !MF.getFrameInfo().hasVarSizedObjects();
611}
612
616 const AArch64InstrInfo *TII =
617 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
618 const AArch64TargetLowering *TLI =
619 MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
620 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
621 DebugLoc DL = I->getDebugLoc();
622 unsigned Opc = I->getOpcode();
623 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
624 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
625
626 if (!hasReservedCallFrame(MF)) {
627 int64_t Amount = I->getOperand(0).getImm();
628 Amount = alignTo(Amount, getStackAlign());
629 if (!IsDestroy)
630 Amount = -Amount;
631
632 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
633 // doesn't have to pop anything), then the first operand will be zero too so
634 // this adjustment is a no-op.
635 if (CalleePopAmount == 0) {
636 // FIXME: in-function stack adjustment for calls is limited to 24-bits
637 // because there's no guaranteed temporary register available.
638 //
639 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
640 // 1) For offset <= 12-bit, we use LSL #0
641 // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
642 // LSL #0, and the other uses LSL #12.
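      //    For example, a 0x12345-byte adjustment can be materialised as:
      //      sub sp, sp, #0x12, lsl #12   // 0x12000 bytes
      //      sub sp, sp, #0x345           // remaining 0x345 bytes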
643 //
644 // Most call frames will be allocated at the start of a function so
645 // this is OK, but it is a limitation that needs dealing with.
646 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
647
648 if (TLI->hasInlineStackProbe(MF) &&
650 // When stack probing is enabled, the decrement of SP may need to be
651 // probed. We only need to do this if the call site needs 1024 bytes of
652 // space or more, because a region smaller than that is allowed to be
653 // unprobed at an ABI boundary. We rely on the fact that SP has been
654 // probed exactly at this point, either by the prologue or most recent
655 // dynamic allocation.
657 "non-reserved call frame without var sized objects?");
658 Register ScratchReg =
659 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
660 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
661 } else {
662 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
663 StackOffset::getFixed(Amount), TII);
664 }
665 }
666 } else if (CalleePopAmount != 0) {
667 // If the calling convention demands that the callee pops arguments from the
668 // stack, we want to add it back if we have a reserved call frame.
669 assert(CalleePopAmount < 0xffffff && "call frame too large");
670 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
671 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
672 }
673 return MBB.erase(I);
674}
675
677 MachineBasicBlock &MBB) const {
678
679 MachineFunction &MF = *MBB.getParent();
680 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
681 const auto &TRI = *Subtarget.getRegisterInfo();
682 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
683
684 CFIInstBuilder CFIBuilder(MBB, MBB.begin(), MachineInstr::NoFlags);
685
686 // Reset the CFA to `SP + 0`.
687 CFIBuilder.buildDefCFA(AArch64::SP, 0);
688
689 // Flip the RA sign state.
690 if (MFI.shouldSignReturnAddress(MF))
691 MFI.branchProtectionPAuthLR() ? CFIBuilder.buildNegateRAStateWithPC()
692 : CFIBuilder.buildNegateRAState();
693
694 // Shadow call stack uses X18, reset it.
695 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
696 CFIBuilder.buildSameValue(AArch64::X18);
697
698 // Emit .cfi_same_value for callee-saved registers.
699 const std::vector<CalleeSavedInfo> &CSI =
701 for (const auto &Info : CSI) {
702 MCRegister Reg = Info.getReg();
703 if (!TRI.regNeedsCFI(Reg, Reg))
704 continue;
705 CFIBuilder.buildSameValue(Reg);
706 }
707}
708
710 switch (Reg.id()) {
711 default:
712 // The called routine is expected to preserve r19-r28
713 // r29 and r30 are used as frame pointer and link register resp.
714 return 0;
715
716 // GPRs
717#define CASE(n) \
718 case AArch64::W##n: \
719 case AArch64::X##n: \
720 return AArch64::X##n
721 CASE(0);
722 CASE(1);
723 CASE(2);
724 CASE(3);
725 CASE(4);
726 CASE(5);
727 CASE(6);
728 CASE(7);
729 CASE(8);
730 CASE(9);
731 CASE(10);
732 CASE(11);
733 CASE(12);
734 CASE(13);
735 CASE(14);
736 CASE(15);
737 CASE(16);
738 CASE(17);
739 CASE(18);
740#undef CASE
741
742 // FPRs
743#define CASE(n) \
744 case AArch64::B##n: \
745 case AArch64::H##n: \
746 case AArch64::S##n: \
747 case AArch64::D##n: \
748 case AArch64::Q##n: \
749 return HasSVE ? AArch64::Z##n : AArch64::Q##n
750 CASE(0);
751 CASE(1);
752 CASE(2);
753 CASE(3);
754 CASE(4);
755 CASE(5);
756 CASE(6);
757 CASE(7);
758 CASE(8);
759 CASE(9);
760 CASE(10);
761 CASE(11);
762 CASE(12);
763 CASE(13);
764 CASE(14);
765 CASE(15);
766 CASE(16);
767 CASE(17);
768 CASE(18);
769 CASE(19);
770 CASE(20);
771 CASE(21);
772 CASE(22);
773 CASE(23);
774 CASE(24);
775 CASE(25);
776 CASE(26);
777 CASE(27);
778 CASE(28);
779 CASE(29);
780 CASE(30);
781 CASE(31);
782#undef CASE
783 }
784}
785
786void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
787 MachineBasicBlock &MBB) const {
788 // Insertion point.
790
791 // Fake a debug loc.
792 DebugLoc DL;
793 if (MBBI != MBB.end())
794 DL = MBBI->getDebugLoc();
795
796 const MachineFunction &MF = *MBB.getParent();
797 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
798 const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();
799
800 BitVector GPRsToZero(TRI.getNumRegs());
801 BitVector FPRsToZero(TRI.getNumRegs());
802 bool HasSVE = STI.isSVEorStreamingSVEAvailable();
803 for (MCRegister Reg : RegsToZero.set_bits()) {
804 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
805 // For GPRs, we only care to clear out the 64-bit register.
806 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
807 GPRsToZero.set(XReg);
808 } else if (AArch64InstrInfo::isFpOrNEON(Reg)) {
809 // For FPRs, clear out the widest register (Z if SVE is available, else Q).
810 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
811 FPRsToZero.set(XReg);
812 }
813 }
814
815 const AArch64InstrInfo &TII = *STI.getInstrInfo();
816
817 // Zero out GPRs.
818 for (MCRegister Reg : GPRsToZero.set_bits())
819 TII.buildClearRegister(Reg, MBB, MBBI, DL);
820
821 // Zero out FP/vector registers.
822 for (MCRegister Reg : FPRsToZero.set_bits())
823 TII.buildClearRegister(Reg, MBB, MBBI, DL);
824
825 if (HasSVE) {
826 for (MCRegister PReg :
827 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
828 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
829 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
830 AArch64::P15}) {
831 if (RegsToZero[PReg])
832 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
833 }
834 }
835}
836
837bool AArch64FrameLowering::windowsRequiresStackProbe(
838 const MachineFunction &MF, uint64_t StackSizeInBytes) const {
839 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
840 const AArch64FunctionInfo &MFI = *MF.getInfo<AArch64FunctionInfo>();
841 // TODO: When implementing stack protectors, take that into account
842 // for the probe threshold.
843 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
844 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
845}
846
848 const MachineBasicBlock &MBB) {
849 const MachineFunction *MF = MBB.getParent();
850 LiveRegs.addLiveIns(MBB);
851 // Mark callee saved registers as used so we will not choose them.
852 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
853 for (unsigned i = 0; CSRegs[i]; ++i)
854 LiveRegs.addReg(CSRegs[i]);
855}
856
858AArch64FrameLowering::findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB,
859 bool HasCall) const {
860 MachineFunction *MF = MBB->getParent();
861
862 // If MBB is an entry block, use X9 as the scratch register.
863 // preserve_none functions may be using X9 to pass arguments,
864 // so prefer to pick an available register below.
865 if (&MF->front() == MBB &&
867 return AArch64::X9;
868
869 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
870 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
871 LivePhysRegs LiveRegs(TRI);
872 getLiveRegsForEntryMBB(LiveRegs, *MBB);
873 if (HasCall) {
874 LiveRegs.addReg(AArch64::X16);
875 LiveRegs.addReg(AArch64::X17);
876 LiveRegs.addReg(AArch64::X18);
877 }
878
879 // Prefer X9 since it was historically used for the prologue scratch reg.
880 const MachineRegisterInfo &MRI = MF->getRegInfo();
881 if (LiveRegs.available(MRI, AArch64::X9))
882 return AArch64::X9;
883
884 for (unsigned Reg : AArch64::GPR64RegClass) {
885 if (LiveRegs.available(MRI, Reg))
886 return Reg;
887 }
888 return AArch64::NoRegister;
889}
890
892 const MachineBasicBlock &MBB) const {
893 const MachineFunction *MF = MBB.getParent();
894 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
895 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
896 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
897 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
899
900 if (AFI->hasSwiftAsyncContext()) {
901 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
902 const MachineRegisterInfo &MRI = MF->getRegInfo();
905 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
906 // available.
907 if (!LiveRegs.available(MRI, AArch64::X16) ||
908 !LiveRegs.available(MRI, AArch64::X17))
909 return false;
910 }
911
912 // Certain stack probing sequences might clobber flags, in which case we can't
913 // use the block as a prologue if the flags register is a live-in.
915 MBB.isLiveIn(AArch64::NZCV))
916 return false;
917
918 if (RegInfo->hasStackRealignment(*MF) || TLI->hasInlineStackProbe(*MF))
919 if (findScratchNonCalleeSaveRegister(TmpMBB) == AArch64::NoRegister)
920 return false;
921
922 // May need a scratch register (for the return value) if we are required to
923 // make a special call.
924 if (requiresSaveVG(*MF) ||
925 windowsRequiresStackProbe(*MF, std::numeric_limits<uint64_t>::max()))
926 if (findScratchNonCalleeSaveRegister(TmpMBB, true) == AArch64::NoRegister)
927 return false;
928
929 return true;
930}
931
933 const Function &F = MF.getFunction();
934 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
935 F.needsUnwindTableEntry();
936}
937
938bool AArch64FrameLowering::shouldSignReturnAddressEverywhere(
939 const MachineFunction &MF) const {
940 // FIXME: With WinCFI, extra care should be taken to place SEH_PACSignLR
941 // and SEH_EpilogEnd instructions in the correct order.
943 return false;
945 bool SignReturnAddressAll = AFI->shouldSignReturnAddress(/*SpillsLR=*/false);
946 return SignReturnAddressAll;
947}
948
949// Given a load or a store instruction, generate an appropriate unwinding SEH
950// code on Windows.
952AArch64FrameLowering::insertSEH(MachineBasicBlock::iterator MBBI,
953 const TargetInstrInfo &TII,
954 MachineInstr::MIFlag Flag) const {
955 unsigned Opc = MBBI->getOpcode();
956 MachineBasicBlock *MBB = MBBI->getParent();
957 MachineFunction &MF = *MBB->getParent();
958 DebugLoc DL = MBBI->getDebugLoc();
959 unsigned ImmIdx = MBBI->getNumOperands() - 1;
960 int Imm = MBBI->getOperand(ImmIdx).getImm();
962 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
963 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
964
965 switch (Opc) {
966 default:
967 report_fatal_error("No SEH Opcode for this instruction");
968 case AArch64::STR_ZXI:
969 case AArch64::LDR_ZXI: {
970 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
971 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveZReg))
972 .addImm(Reg0)
973 .addImm(Imm)
974 .setMIFlag(Flag);
975 break;
976 }
977 case AArch64::STR_PXI:
978 case AArch64::LDR_PXI: {
979 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
980 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SavePReg))
981 .addImm(Reg0)
982 .addImm(Imm)
983 .setMIFlag(Flag);
984 break;
985 }
986 case AArch64::LDPDpost:
987 Imm = -Imm;
988 [[fallthrough]];
989 case AArch64::STPDpre: {
990 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
991 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
992 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
993 .addImm(Reg0)
994 .addImm(Reg1)
995 .addImm(Imm * 8)
996 .setMIFlag(Flag);
997 break;
998 }
999 case AArch64::LDPXpost:
1000 Imm = -Imm;
1001 [[fallthrough]];
1002 case AArch64::STPXpre: {
1003 Register Reg0 = MBBI->getOperand(1).getReg();
1004 Register Reg1 = MBBI->getOperand(2).getReg();
1005 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1006 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1007 .addImm(Imm * 8)
1008 .setMIFlag(Flag);
1009 else
1010 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1011 .addImm(RegInfo->getSEHRegNum(Reg0))
1012 .addImm(RegInfo->getSEHRegNum(Reg1))
1013 .addImm(Imm * 8)
1014 .setMIFlag(Flag);
1015 break;
1016 }
1017 case AArch64::LDRDpost:
1018 Imm = -Imm;
1019 [[fallthrough]];
1020 case AArch64::STRDpre: {
1021 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1022 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1023 .addImm(Reg)
1024 .addImm(Imm)
1025 .setMIFlag(Flag);
1026 break;
1027 }
1028 case AArch64::LDRXpost:
1029 Imm = -Imm;
1030 [[fallthrough]];
1031 case AArch64::STRXpre: {
1032 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1033 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1034 .addImm(Reg)
1035 .addImm(Imm)
1036 .setMIFlag(Flag);
1037 break;
1038 }
1039 case AArch64::STPDi:
1040 case AArch64::LDPDi: {
1041 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1042 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1043 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1044 .addImm(Reg0)
1045 .addImm(Reg1)
1046 .addImm(Imm * 8)
1047 .setMIFlag(Flag);
1048 break;
1049 }
1050 case AArch64::STPXi:
1051 case AArch64::LDPXi: {
1052 Register Reg0 = MBBI->getOperand(0).getReg();
1053 Register Reg1 = MBBI->getOperand(1).getReg();
1054 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1055 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1056 .addImm(Imm * 8)
1057 .setMIFlag(Flag);
1058 else
1059 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1060 .addImm(RegInfo->getSEHRegNum(Reg0))
1061 .addImm(RegInfo->getSEHRegNum(Reg1))
1062 .addImm(Imm * 8)
1063 .setMIFlag(Flag);
1064 break;
1065 }
1066 case AArch64::STRXui:
1067 case AArch64::LDRXui: {
1068 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1069 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1070 .addImm(Reg)
1071 .addImm(Imm * 8)
1072 .setMIFlag(Flag);
1073 break;
1074 }
1075 case AArch64::STRDui:
1076 case AArch64::LDRDui: {
1077 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1078 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1079 .addImm(Reg)
1080 .addImm(Imm * 8)
1081 .setMIFlag(Flag);
1082 break;
1083 }
1084 case AArch64::STPQi:
1085 case AArch64::LDPQi: {
1086 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1087 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1088 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1089 .addImm(Reg0)
1090 .addImm(Reg1)
1091 .addImm(Imm * 16)
1092 .setMIFlag(Flag);
1093 break;
1094 }
1095 case AArch64::LDPQpost:
1096 Imm = -Imm;
1097 [[fallthrough]];
1098 case AArch64::STPQpre: {
1099 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1100 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1101 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1102 .addImm(Reg0)
1103 .addImm(Reg1)
1104 .addImm(Imm * 16)
1105 .setMIFlag(Flag);
1106 break;
1107 }
1108 }
1109 auto I = MBB->insertAfter(MBBI, MIB);
1110 return I;
1111}
1112
1115 if (!AFI->needsDwarfUnwindInfo(MF) || !AFI->hasStreamingModeChanges())
1116 return false;
1117 // For Darwin platforms we don't save VG for non-SVE functions, even if SME
1118 // is enabled with streaming mode changes.
1119 auto &ST = MF.getSubtarget<AArch64Subtarget>();
1120 if (ST.isTargetDarwin())
1121 return ST.hasSVE();
1122 return true;
1123}
1124
1125static bool isTargetWindows(const MachineFunction &MF) {
1127}
1128
1129static unsigned getStackHazardSize(const MachineFunction &MF) {
1130 return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
1131}
1132
1134 MachineFunction &MF) const {
1135 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1136 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1137
1138 auto EmitSignRA = [&](MachineBasicBlock &MBB) {
1139 DebugLoc DL; // Set debug location to unknown.
1141
1142 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1144 };
1145
1146 auto EmitAuthRA = [&](MachineBasicBlock &MBB) {
1147 DebugLoc DL;
1148 MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
1149 if (MBBI != MBB.end())
1150 DL = MBBI->getDebugLoc();
1151
1152 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_EPILOGUE))
1154 };
1155
1156 // This should be in sync with PEIImpl::calculateSaveRestoreBlocks.
1157 EmitSignRA(MF.front());
1158 for (MachineBasicBlock &MBB : MF) {
1159 if (MBB.isEHFuncletEntry())
1160 EmitSignRA(MBB);
1161 if (MBB.isReturnBlock())
1162 EmitAuthRA(MBB);
1163 }
1164}
1165
1167 MachineBasicBlock &MBB) const {
1168 AArch64PrologueEmitter PrologueEmitter(MF, MBB, *this);
1169 PrologueEmitter.emitPrologue();
1170}
1171
1173 MachineBasicBlock &MBB) const {
1174 AArch64EpilogueEmitter EpilogueEmitter(MF, MBB, *this);
1175 EpilogueEmitter.emitEpilogue();
1176}
1177
1180 MF.getInfo<AArch64FunctionInfo>()->needsDwarfUnwindInfo(MF);
1181}
1182
1184 return enableCFIFixup(MF) &&
1185 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
1186}
1187
1188/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
1189/// debug info. It's the same as what we use for resolving the code-gen
1190/// references for now. FIXME: This can go wrong when references are
1191/// SP-relative and simple call frames aren't used.
1194 Register &FrameReg) const {
1196 MF, FI, FrameReg,
1197 /*PreferFP=*/
1198 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
1199 MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
1200 /*ForSimm=*/false);
1201}
1202
1205 int FI) const {
1206 // This function serves to provide a comparable offset from a single reference
1207 // point (the value of SP at function entry) that can be used for analysis,
1208 // e.g. the stack-frame-layout analysis pass. It is not guaranteed to be
1209 // correct for all objects in the presence of VLA-area objects or dynamic
1210 // stack re-alignment.
1211
1212 const auto &MFI = MF.getFrameInfo();
1213
1214 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1215 StackOffset SVEStackSize = getSVEStackSize(MF);
1216
1217 // For VLA-area objects, just emit an offset at the end of the stack frame.
1218 // Whilst not quite correct, these objects do live at the end of the frame and
1219 // so it is more useful for analysis for the offset to reflect this.
1220 if (MFI.isVariableSizedObjectIndex(FI)) {
1221 return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
1222 }
1223
1224 // This is correct in the absence of any SVE stack objects.
1225 if (!SVEStackSize)
1226 return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
1227
1228 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1229 bool FPAfterSVECalleeSaves =
1231 if (MFI.getStackID(FI) == TargetStackID::ScalableVector) {
1232 if (FPAfterSVECalleeSaves &&
1233 -ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize())
1234 return StackOffset::getScalable(ObjectOffset);
1235 return StackOffset::get(-((int64_t)AFI->getCalleeSavedStackSize()),
1236 ObjectOffset);
1237 }
1238
1239 bool IsFixed = MFI.isFixedObjectIndex(FI);
1240 bool IsCSR =
1241 !IsFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1242
1243 StackOffset ScalableOffset = {};
1244 if (!IsFixed && !IsCSR) {
1245 ScalableOffset = -SVEStackSize;
1246 } else if (FPAfterSVECalleeSaves && IsCSR) {
1247 ScalableOffset =
1249 }
1250
1251 return StackOffset::getFixed(ObjectOffset) + ScalableOffset;
1252}
1253
1259
1260StackOffset AArch64FrameLowering::getFPOffset(const MachineFunction &MF,
1261 int64_t ObjectOffset) const {
1262 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1263 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1264 const Function &F = MF.getFunction();
1265 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
1266 unsigned FixedObject =
1267 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
1268 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
1269 int64_t FPAdjust =
1270 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
1271 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
1272}
1273
1274StackOffset AArch64FrameLowering::getStackOffset(const MachineFunction &MF,
1275 int64_t ObjectOffset) const {
1276 const auto &MFI = MF.getFrameInfo();
1277 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
1278}
1279
1280// TODO: This function currently does not work for scalable vectors.
1282 int FI) const {
1283 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
1284 MF.getSubtarget().getRegisterInfo());
1285 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
1286 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
1287 ? getFPOffset(MF, ObjectOffset).getFixed()
1288 : getStackOffset(MF, ObjectOffset).getFixed();
1289}
1290
1292 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
1293 bool ForSimm) const {
1294 const auto &MFI = MF.getFrameInfo();
1295 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1296 bool isFixed = MFI.isFixedObjectIndex(FI);
1297 bool isSVE = MFI.getStackID(FI) == TargetStackID::ScalableVector;
1298 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, isSVE, FrameReg,
1299 PreferFP, ForSimm);
1300}
1301
1303 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE,
1304 Register &FrameReg, bool PreferFP, bool ForSimm) const {
1305 const auto &MFI = MF.getFrameInfo();
1306 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
1307 MF.getSubtarget().getRegisterInfo());
1308 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1309 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1310
1311 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
1312 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
1313 bool isCSR =
1314 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1315
1316 const StackOffset &SVEStackSize = getSVEStackSize(MF);
1317
1318 // Use frame pointer to reference fixed objects. Use it for locals if
1319 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
1320 // reliable as a base). Make sure useFPForScavengingIndex() does the
1321 // right thing for the emergency spill slot.
1322 bool UseFP = false;
1323 if (AFI->hasStackFrame() && !isSVE) {
1324 // We shouldn't prefer using the FP to access fixed-sized stack objects when
1325 // there are scalable (SVE) objects in between the FP and the fixed-sized
1326 // objects.
1327 PreferFP &= !SVEStackSize;
1328
1329 // Note: Keeping the following as multiple 'if' statements rather than
1330 // merging to a single expression for readability.
1331 //
1332 // Argument access should always use the FP.
1333 if (isFixed) {
1334 UseFP = hasFP(MF);
1335 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
1336 // References to the CSR area must use FP if we're re-aligning the stack
1337 // since the dynamically-sized alignment padding is between the SP/BP and
1338 // the CSR area.
1339 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
1340 UseFP = true;
1341 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
1342 // If the FPOffset is negative and we're producing a signed immediate, we
1343 // have to keep in mind that the available offset range for negative
1344 // offsets is smaller than for positive ones. If an offset is available
1345 // via the FP and the SP, use whichever is closest.
1346 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
1347 PreferFP |= Offset > -FPOffset && !SVEStackSize;
1348
1349 if (FPOffset >= 0) {
1350 // If the FPOffset is positive, that'll always be best, as the SP/BP
1351 // will be even further away.
1352 UseFP = true;
1353 } else if (MFI.hasVarSizedObjects()) {
1354 // If we have variable sized objects, we can use either FP or BP, as the
1355 // SP offset is unknown. We can use the base pointer if we have one and
1356 // FP is not preferred. If not, we're stuck with using FP.
1357 bool CanUseBP = RegInfo->hasBasePointer(MF);
1358 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
1359 UseFP = PreferFP;
1360 else if (!CanUseBP) // Can't use BP. Forced to use FP.
1361 UseFP = true;
1362 // else we can use BP and FP, but the offset from FP won't fit.
1363 // That will make us scavenge registers which we can probably avoid by
1364 // using BP. If it won't fit for BP either, we'll scavenge anyway.
1365 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
1366 // Funclets access the locals contained in the parent's stack frame
1367 // via the frame pointer, so we have to use the FP in the parent
1368 // function.
1369 (void) Subtarget;
1370 assert(Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
1371 MF.getFunction().isVarArg()) &&
1372 "Funclets should only be present on Win64");
1373 UseFP = true;
1374 } else {
1375 // We have the choice between FP and (SP or BP).
1376 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
1377 UseFP = true;
1378 }
1379 }
1380 }
1381
1382 assert(
1383 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
1384 "In the presence of dynamic stack pointer realignment, "
1385 "non-argument/CSR objects cannot be accessed through the frame pointer");
1386
1387 bool FPAfterSVECalleeSaves =
1389
1390 if (isSVE) {
1391 StackOffset FPOffset =
1393 StackOffset SPOffset =
1394 SVEStackSize +
1395 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
1396 ObjectOffset);
1397 if (FPAfterSVECalleeSaves) {
1399 if (-ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1402 }
1403 }
1404 // Always use the FP for SVE spills if available and beneficial.
1405 if (hasFP(MF) && (SPOffset.getFixed() ||
1406 FPOffset.getScalable() < SPOffset.getScalable() ||
1407 RegInfo->hasStackRealignment(MF))) {
1408 FrameReg = RegInfo->getFrameRegister(MF);
1409 return FPOffset;
1410 }
1411
1412 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
1413 : (unsigned)AArch64::SP;
1414 return SPOffset;
1415 }
1416
1417 StackOffset ScalableOffset = {};
1418 if (FPAfterSVECalleeSaves) {
1419 // In this stack layout, the FP is in between the callee saves and other
1420 // SVE allocations.
1421 StackOffset SVECalleeSavedStack =
1423 if (UseFP) {
1424 if (isFixed)
1425 ScalableOffset = SVECalleeSavedStack;
1426 else if (!isCSR)
1427 ScalableOffset = SVECalleeSavedStack - SVEStackSize;
1428 } else {
1429 if (isFixed)
1430 ScalableOffset = SVEStackSize;
1431 else if (isCSR)
1432 ScalableOffset = SVEStackSize - SVECalleeSavedStack;
1433 }
1434 } else {
1435 if (UseFP && !(isFixed || isCSR))
1436 ScalableOffset = -SVEStackSize;
1437 if (!UseFP && (isFixed || isCSR))
1438 ScalableOffset = SVEStackSize;
1439 }
1440
1441 if (UseFP) {
1442 FrameReg = RegInfo->getFrameRegister(MF);
1443 return StackOffset::getFixed(FPOffset) + ScalableOffset;
1444 }
1445
1446 // Use the base pointer if we have one.
1447 if (RegInfo->hasBasePointer(MF))
1448 FrameReg = RegInfo->getBaseRegister();
1449 else {
1450 assert(!MFI.hasVarSizedObjects() &&
1451 "Can't use SP when we have var sized objects.");
1452 FrameReg = AArch64::SP;
1453 // If we're using the red zone for this function, the SP won't actually
1454 // be adjusted, so the offsets will be negative. They're also all
1455 // within range of the signed 9-bit immediate instructions.
1456 if (canUseRedZone(MF))
1457 Offset -= AFI->getLocalStackSize();
1458 }
1459
1460 return StackOffset::getFixed(Offset) + ScalableOffset;
1461}
1462
1463static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
1464 // Do not set a kill flag on values that are also marked as live-in. This
1465 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
1466 // callee saved registers.
1467 // Omitting the kill flags is conservatively correct even if the live-in
1468 // is not used after all.
1469 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
1470 return getKillRegState(!IsLiveIn);
1471}
1472
1474 MachineFunction &MF) {
1475 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1476 AttributeList Attrs = MF.getFunction().getAttributes();
1478 return Subtarget.isTargetMachO() &&
1479 !(Subtarget.getTargetLowering()->supportSwiftError() &&
1480 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
1482 !AFL.requiresSaveVG(MF) && !AFI->isSVECC();
1483}
1484
1485static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2,
1486 bool NeedsWinCFI, bool IsFirst,
1487 const TargetRegisterInfo *TRI) {
1488 // If we are generating register pairs for a Windows function that requires
1489 // EH support, then pair consecutive registers only. There are no unwind
1490 // opcodes for saves/restores of non-consecutive register pairs.
1491 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
1492 // save_lrpair.
1493 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
1494
1495 if (Reg2 == AArch64::FP)
1496 return true;
1497 if (!NeedsWinCFI)
1498 return false;
1499 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
1500 return false;
1501 // If pairing a GPR with LR, the pair can be described by the save_lrpair
1502 // opcode. If this is the first register pair, it would end up with a
1503 // predecrement, but there's no save_lrpair_x opcode, so we can only do this
1504 // if LR is paired with something other than the first register.
1505 // The save_lrpair opcode requires the first register to be an odd one.
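  // For example, (x19, lr) or (x21, lr) can be described with save_lrpair
  // (when not the first pair), whereas (x20, lr) cannot.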
1506 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
1507 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
1508 return false;
1509 return true;
1510}
1511
1512/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
1513/// WindowsCFI requires that only consecutive registers can be paired.
1514/// LR and FP need to be allocated together when the frame needs to save
1515/// the frame-record. This means any other register pairing with LR is invalid.
1516static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2,
1517 bool UsesWinAAPCS, bool NeedsWinCFI,
1518 bool NeedsFrameRecord, bool IsFirst,
1519 const TargetRegisterInfo *TRI) {
1520 if (UsesWinAAPCS)
1521 return invalidateWindowsRegisterPairing(Reg1, Reg2, NeedsWinCFI, IsFirst,
1522 TRI);
1523
1524 // If we need to store the frame record, don't pair any register
1525 // with LR other than FP.
1526 if (NeedsFrameRecord)
1527 return Reg2 == AArch64::LR;
1528
1529 return false;
1530}
1531
1532namespace {
1533
1534struct RegPairInfo {
1535 unsigned Reg1 = AArch64::NoRegister;
1536 unsigned Reg2 = AArch64::NoRegister;
1537 int FrameIdx;
1538 int Offset;
1539 enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
1540 const TargetRegisterClass *RC;
1541
1542 RegPairInfo() = default;
1543
1544 bool isPaired() const { return Reg2 != AArch64::NoRegister; }
1545
1546 bool isScalable() const { return Type == PPR || Type == ZPR; }
1547};
1548
1549} // end anonymous namespace
1550
1551unsigned findFreePredicateReg(BitVector &SavedRegs) {
1552 for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
1553 if (SavedRegs.test(PReg)) {
1554 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
1555 return PNReg;
1556 }
1557 }
1558 return AArch64::NoRegister;
1559}
1560
1561// The multivector LD/ST instructions are available only for SME or SVE2p1 targets.
1563 MachineFunction &MF) {
1565 return false;
1566
1567 SMEAttrs FuncAttrs = MF.getInfo<AArch64FunctionInfo>()->getSMEFnAttrs();
1568 bool IsLocallyStreaming =
1569 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
1570
1571 // SME2 instructions can only be used safely when in streaming mode.
1572 // It is not safe to use SME2 instructions when in streaming compatible or
1573 // locally streaming mode.
1574 return Subtarget.hasSVE2p1() ||
1575 (Subtarget.hasSME2() &&
1576 (!IsLocallyStreaming && Subtarget.isStreaming()));
1577}
1578
1580 MachineFunction &MF,
1582 const TargetRegisterInfo *TRI,
1584 bool NeedsFrameRecord) {
1585
1586 if (CSI.empty())
1587 return;
1588
1589 bool IsWindows = isTargetWindows(MF);
1590 bool NeedsWinCFI = AFL.needsWinCFI(MF);
1592 unsigned StackHazardSize = getStackHazardSize(MF);
1593 MachineFrameInfo &MFI = MF.getFrameInfo();
1595 unsigned Count = CSI.size();
1596 (void)CC;
1597 // MachO's compact unwind format relies on all registers being stored in
1598 // pairs.
1599 assert((!produceCompactUnwindFrame(AFL, MF) ||
1602 (Count & 1) == 0) &&
1603 "Odd number of callee-saved regs to spill!");
1604 int ByteOffset = AFI->getCalleeSavedStackSize();
1605 int StackFillDir = -1;
1606 int RegInc = 1;
1607 unsigned FirstReg = 0;
1608 if (NeedsWinCFI) {
1609 // For WinCFI, fill the stack from the bottom up.
1610 ByteOffset = 0;
1611 StackFillDir = 1;
1612 // As the CSI array is reversed to match PrologEpilogInserter, iterate
1613 // backwards, to pair up registers starting from lower numbered registers.
1614 RegInc = -1;
1615 FirstReg = Count - 1;
1616 }
1617 bool FPAfterSVECalleeSaves = IsWindows && AFI->getSVECalleeSavedStackSize();
1618 int ScalableByteOffset =
1619 FPAfterSVECalleeSaves ? 0 : AFI->getSVECalleeSavedStackSize();
1620 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
1621 Register LastReg = 0;
1622
1623 // When iterating backwards, the loop condition relies on unsigned wraparound.
1624 for (unsigned i = FirstReg; i < Count; i += RegInc) {
1625 RegPairInfo RPI;
1626 RPI.Reg1 = CSI[i].getReg();
1627
1628 if (AArch64::GPR64RegClass.contains(RPI.Reg1)) {
1629 RPI.Type = RegPairInfo::GPR;
1630 RPI.RC = &AArch64::GPR64RegClass;
1631 } else if (AArch64::FPR64RegClass.contains(RPI.Reg1)) {
1632 RPI.Type = RegPairInfo::FPR64;
1633 RPI.RC = &AArch64::FPR64RegClass;
1634 } else if (AArch64::FPR128RegClass.contains(RPI.Reg1)) {
1635 RPI.Type = RegPairInfo::FPR128;
1636 RPI.RC = &AArch64::FPR128RegClass;
1637 } else if (AArch64::ZPRRegClass.contains(RPI.Reg1)) {
1638 RPI.Type = RegPairInfo::ZPR;
1639 RPI.RC = &AArch64::ZPRRegClass;
1640 } else if (AArch64::PPRRegClass.contains(RPI.Reg1)) {
1641 RPI.Type = RegPairInfo::PPR;
1642 RPI.RC = &AArch64::PPRRegClass;
1643 } else if (RPI.Reg1 == AArch64::VG) {
1644 RPI.Type = RegPairInfo::VG;
1645 RPI.RC = &AArch64::FIXED_REGSRegClass;
1646 } else {
1647 llvm_unreachable("Unsupported register class.");
1648 }
1649
1650 // Add the stack hazard size as we transition from GPR->FPR CSRs.
1651 if (AFI->hasStackHazardSlotIndex() &&
1652 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
1653 AArch64InstrInfo::isFpOrNEON(RPI.Reg1))
1654 ByteOffset += StackFillDir * StackHazardSize;
1655 LastReg = RPI.Reg1;
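// For example, with -aarch64-stack-hazard-size=1024, the first FPR CSR that
// follows the GPR saves accounts for 1024 bytes of padding at this
// transition: top-down this subtracts 1024 from ByteOffset, bottom-up
// (WinCFI) it adds 1024.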
1656
1657 int Scale = TRI->getSpillSize(*RPI.RC);
1658 // Add the next reg to the pair if it is in the same register class.
1659 if (unsigned(i + RegInc) < Count && !AFI->hasStackHazardSlotIndex()) {
1660 MCRegister NextReg = CSI[i + RegInc].getReg();
1661 bool IsFirst = i == FirstReg;
1662 switch (RPI.Type) {
1663 case RegPairInfo::GPR:
1664 if (AArch64::GPR64RegClass.contains(NextReg) &&
1665 !invalidateRegisterPairing(RPI.Reg1, NextReg, IsWindows,
1666 NeedsWinCFI, NeedsFrameRecord, IsFirst,
1667 TRI))
1668 RPI.Reg2 = NextReg;
1669 break;
1670 case RegPairInfo::FPR64:
1671 if (AArch64::FPR64RegClass.contains(NextReg) &&
1672 !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
1673 IsFirst, TRI))
1674 RPI.Reg2 = NextReg;
1675 break;
1676 case RegPairInfo::FPR128:
1677 if (AArch64::FPR128RegClass.contains(NextReg))
1678 RPI.Reg2 = NextReg;
1679 break;
1680 case RegPairInfo::PPR:
1681 break;
1682 case RegPairInfo::ZPR:
1683 if (AFI->getPredicateRegForFillSpill() != 0 &&
1684 ((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1)) {
1685 // Calculate offset of register pair to see if pair instruction can be
1686 // used.
1687 int Offset = (ScalableByteOffset + StackFillDir * 2 * Scale) / Scale;
1688 if ((-16 <= Offset && Offset <= 14) && (Offset % 2 == 0))
1689 RPI.Reg2 = NextReg;
1690 }
1691 break;
1692 case RegPairInfo::VG:
1693 break;
1694 }
1695 }
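// For example, two consecutive GPR64 CSRs such as x20 and x19 form a single
// RegPairInfo and are later stored with one STP. A ZPR pair additionally
// requires the first register to have an even index (z8/z9 can pair, z9/z10
// cannot) and an even offset within [-16, 14] so that the two-vector
// ST1B/LD1B immediate form can address it.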
1696
1697 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
1698 // list to come in sorted by frame index so that we can issue the store
1699 // pair instructions directly. Assert if we see anything otherwise.
1700 //
1701 // The order of the registers in the list is controlled by
1702 // getCalleeSavedRegs(), so they will always be in-order, as well.
1703 assert((!RPI.isPaired() ||
1704 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
1705 "Out of order callee saved regs!");
1706
1707 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
1708 RPI.Reg1 == AArch64::LR) &&
1709 "FrameRecord must be allocated together with LR");
1710
1711 // Windows AAPCS has FP and LR reversed.
1712 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
1713 RPI.Reg2 == AArch64::LR) &&
1714 "FrameRecord must be allocated together with LR");
1715
1716 // MachO's compact unwind format relies on all registers being stored in
1717 // adjacent register pairs.
1718 assert((!produceCompactUnwindFrame(AFL, MF) ||
1719 CC == CallingConv::PreserveMost || CC == CallingConv::PreserveAll ||
1720 CC == CallingConv::CXX_FAST_TLS || CC == CallingConv::Win64 ||
1721 (RPI.isPaired() &&
1722 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
1723 RPI.Reg1 + 1 == RPI.Reg2))) &&
1724 "Callee-save registers not saved as adjacent register pair!");
1725
1726 RPI.FrameIdx = CSI[i].getFrameIdx();
1727 if (NeedsWinCFI &&
1728 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
1729 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
1730
1731 // Realign the scalable offset if necessary. This is relevant when
1732 // spilling predicates on Windows.
1733 if (RPI.isScalable() && ScalableByteOffset % Scale != 0) {
1734 ScalableByteOffset = alignTo(ScalableByteOffset, Scale);
1735 }
1736
1737 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1738 assert(OffsetPre % Scale == 0);
1739
1740 if (RPI.isScalable())
1741 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1742 else
1743 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1744
1745 // Swift's async context is directly before FP, so allocate an extra
1746 // 8 bytes for it.
1747 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1748 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1749 (IsWindows && RPI.Reg2 == AArch64::LR)))
1750 ByteOffset += StackFillDir * 8;
1751
1752 // Round up size of non-pair to pair size if we need to pad the
1753 // callee-save area to ensure 16-byte alignment.
1754 if (NeedGapToAlignStack && !NeedsWinCFI && !RPI.isScalable() &&
1755 RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
1756 ByteOffset % 16 != 0) {
1757 ByteOffset += 8 * StackFillDir;
1758 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
1759 // A stack frame with a gap looks like this, bottom up:
1760 // d9, d8. x21, gap, x20, x19.
1761 // Set extra alignment on the x21 object to create the gap above it.
1762 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
1763 NeedGapToAlignStack = false;
1764 }
1765
1766 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1767 assert(OffsetPost % Scale == 0);
1768 // If filling top down (default), we want the offset after incrementing it.
1769 // If filling bottom up (WinCFI) we need the original offset.
1770 int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
1771
1772 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
1773 // Swift context can directly precede FP.
1774 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1775 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1776 (IsWindows && RPI.Reg2 == AArch64::LR)))
1777 Offset += 8;
1778 RPI.Offset = Offset / Scale;
1779
1780 assert((!RPI.isPaired() ||
1781 (!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
1782 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
1783 "Offset out of bounds for LDP/STP immediate");
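// Worked example (top-down GPR saves, Scale == 8): with a 48-byte
// callee-save area, ByteOffset steps 48 -> 32 -> 16 -> 0 across three pairs,
// giving RPI.Offset values of 4, 2 and 0 -- the addImm(+4)/(+2)/(+0)
// immediates of the stp sequence shown in spillCalleeSavedRegisters() below.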
1784
1785 auto isFrameRecord = [&] {
1786 if (RPI.isPaired())
1787 return IsWindows ? RPI.Reg1 == AArch64::FP && RPI.Reg2 == AArch64::LR
1788 : RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP;
1789 // Otherwise, look for the frame record as two unpaired registers. This is
1790 // needed for -aarch64-stack-hazard-size=<val>, which disables register
1791 // pairing (as the padding may be too large for the LDP/STP offset). Note:
1792 // On Windows, this check works out as current reg == FP, next reg == LR,
1793 // and on other platforms current reg == FP, previous reg == LR. This
1794 // works out as the correct pre-increment or post-increment offsets
1795 // respectively.
1796 return i > 0 && RPI.Reg1 == AArch64::FP &&
1797 CSI[i - 1].getReg() == AArch64::LR;
1798 };
1799
1800 // Save the offset to frame record so that the FP register can point to the
1801 // innermost frame record (spilled FP and LR registers).
1802 if (NeedsFrameRecord && isFrameRecord())
1803 AFI->setCalleeSaveBaseToFrameRecordOffset(Offset);
1804
1805 RegPairs.push_back(RPI);
1806 if (RPI.isPaired())
1807 i += RegInc;
1808 }
1809 if (NeedsWinCFI) {
1810 // If we need an alignment gap in the stack, align the topmost stack
1811 // object. A stack frame with a gap looks like this, bottom up:
1812 // x19, d8. d9, gap.
1813 // Set extra alignment on the topmost stack object (the first element in
1814 // CSI, which goes top down), to create the gap above it.
1815 if (AFI->hasCalleeSaveStackFreeSpace())
1816 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
1817 // We iterated bottom up over the registers; flip RegPairs back to top
1818 // down order.
1819 std::reverse(RegPairs.begin(), RegPairs.end());
1820 }
1821}
1822
1823 bool AArch64FrameLowering::spillCalleeSavedRegisters(
1824 MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
1825 ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
1826 MachineFunction &MF = *MBB.getParent();
1827 auto &TLI = *MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
1828 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
1829 bool NeedsWinCFI = needsWinCFI(MF);
1830 DebugLoc DL;
1831 SmallVector<RegPairInfo, 8> RegPairs;
1832
1833 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
1834
1835 MachineRegisterInfo &MRI = MF.getRegInfo();
1836 // Refresh the reserved regs in case there are any potential changes since the
1837 // last freeze.
1838 MRI.freezeReservedRegs();
1839
1840 if (homogeneousPrologEpilog(MF)) {
1841 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
1842 .setMIFlag(MachineInstr::FrameSetup);
1843
1844 for (auto &RPI : RegPairs) {
1845 MIB.addReg(RPI.Reg1);
1846 MIB.addReg(RPI.Reg2);
1847
1848 // Update register live in.
1849 if (!MRI.isReserved(RPI.Reg1))
1850 MBB.addLiveIn(RPI.Reg1);
1851 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
1852 MBB.addLiveIn(RPI.Reg2);
1853 }
1854 return true;
1855 }
1856 bool PTrueCreated = false;
1857 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
1858 unsigned Reg1 = RPI.Reg1;
1859 unsigned Reg2 = RPI.Reg2;
1860 unsigned StrOpc;
1861
1862 // Issue sequence of spills for cs regs. The first spill may be converted
1863 // to a pre-decrement store later by emitPrologue if the callee-save stack
1864 // area allocation can't be combined with the local stack area allocation.
1865 // For example:
1866 // stp x22, x21, [sp, #0] // addImm(+0)
1867 // stp x20, x19, [sp, #16] // addImm(+2)
1868 // stp fp, lr, [sp, #32] // addImm(+4)
1869 // Rationale: This sequence saves uop updates compared to a sequence of
1870 // pre-increment spills like stp xi,xj,[sp,#-16]!
1871 // Note: Similar rationale and sequence for restores in epilog.
1872 unsigned Size = TRI->getSpillSize(*RPI.RC);
1873 Align Alignment = TRI->getSpillAlign(*RPI.RC);
1874 switch (RPI.Type) {
1875 case RegPairInfo::GPR:
1876 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
1877 break;
1878 case RegPairInfo::FPR64:
1879 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
1880 break;
1881 case RegPairInfo::FPR128:
1882 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
1883 break;
1884 case RegPairInfo::ZPR:
1885 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
1886 break;
1887 case RegPairInfo::PPR:
1888 StrOpc =
1889 Size == 16 ? AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO : AArch64::STR_PXI;
1890 break;
1891 case RegPairInfo::VG:
1892 StrOpc = AArch64::STRXui;
1893 break;
1894 }
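// For example, a paired GPR save from this switch becomes
// "stp x20, x19, [sp, #16]", while a paired ZPR save uses the two-register
// st1b form governed by the counted predicate (pn8-pn15) chosen in
// determineCalleeSaves(); unpaired ZPR/PPR saves fall back to STR_ZXI/STR_PXI
// (or the spill-to-ZPR-slot pseudo when predicate spills are ZPR-sized).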
1895
1896 unsigned X0Scratch = AArch64::NoRegister;
1897 auto RestoreX0 = make_scope_exit([&] {
1898 if (X0Scratch != AArch64::NoRegister)
1899 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), AArch64::X0)
1900 .addReg(X0Scratch)
1901 .setMIFlag(MachineInstr::FrameSetup);
1902 });
1903
1904 if (Reg1 == AArch64::VG) {
1905 // Find an available register to store value of VG to.
1906 Reg1 = findScratchNonCalleeSaveRegister(&MBB, true);
1907 assert(Reg1 != AArch64::NoRegister);
1908 if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
1909 BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
1910 .addImm(31)
1911 .addImm(1)
1912 .setMIFlag(MachineInstr::FrameSetup);
1913 } else {
1914 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
1915 if (any_of(MBB.liveins(),
1916 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
1917 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
1918 AArch64::X0, LiveIn.PhysReg);
1919 })) {
1920 X0Scratch = Reg1;
1921 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), X0Scratch)
1922 .addReg(AArch64::X0)
1923 .setMIFlag(MachineInstr::FrameSetup);
1924 }
1925
1926 RTLIB::Libcall LC = RTLIB::SMEABI_GET_CURRENT_VG;
1927 const uint32_t *RegMask =
1928 TRI->getCallPreservedMask(MF, TLI.getLibcallCallingConv(LC));
1929 BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
1930 .addExternalSymbol(TLI.getLibcallName(LC))
1931 .addRegMask(RegMask)
1932 .addReg(AArch64::X0, RegState::ImplicitDefine)
1933 .setMIFlag(MachineInstr::FrameSetup);
1934 Reg1 = AArch64::X0;
1935 }
1936 }
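// To summarise the VG save: with SVE available the current vector granule
// count is read directly with a CNTD into a scratch GPR and stored like any
// other register; without SVE it is obtained by calling the SME ABI's
// __arm_get_current_vg routine (RTLIB::SMEABI_GET_CURRENT_VG), preserving X0
// around the call if it is live here.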
1937
1938 LLVM_DEBUG(dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
1939 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
1940 dbgs() << ") -> fi#(" << RPI.FrameIdx;
1941 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
1942 dbgs() << ")\n");
1943
1944 assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
1945 "Windows unwinding requires a consecutive (FP,LR) pair");
1946 // Windows unwind codes require consecutive registers if registers are
1947 // paired. Make the switch here, so that the code below will save (x,x+1)
1948 // and not (x+1,x).
1949 unsigned FrameIdxReg1 = RPI.FrameIdx;
1950 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
1951 if (NeedsWinCFI && RPI.isPaired()) {
1952 std::swap(Reg1, Reg2);
1953 std::swap(FrameIdxReg1, FrameIdxReg2);
1954 }
1955
1956 if (RPI.isPaired() && RPI.isScalable()) {
1957 [[maybe_unused]] const AArch64Subtarget &Subtarget =
1958 MF.getSubtarget<AArch64Subtarget>();
1959 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1960 unsigned PnReg = AFI->getPredicateRegForFillSpill();
1961 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
1962 "Expects SVE2.1 or SME2 target and a predicate register");
1963#ifdef EXPENSIVE_CHECKS
1964 auto IsPPR = [](const RegPairInfo &c) {
1965 return c.Type == RegPairInfo::PPR;
1966 };
1967 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
1968 auto IsZPR = [](const RegPairInfo &c) {
1969 return c.Type == RegPairInfo::ZPR;
1970 };
1971 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
1972 assert(!(PPRBegin < ZPRBegin) &&
1973 "Expected callee save predicate to be handled first");
1974#endif
1975 if (!PTrueCreated) {
1976 PTrueCreated = true;
1977 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
1978 .setMIFlags(MachineInstr::FrameSetup);
1979 }
1980 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
1981 if (!MRI.isReserved(Reg1))
1982 MBB.addLiveIn(Reg1);
1983 if (!MRI.isReserved(Reg2))
1984 MBB.addLiveIn(Reg2);
1985 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
1986 MIB.addMemOperand(MF.getMachineMemOperand(
1987 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
1988 MachineMemOperand::MOStore, Size, Alignment));
1989 MIB.addReg(PnReg);
1990 MIB.addReg(AArch64::SP)
1991 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale],
1992 // where 2*vscale is implicit
1993 .setMIFlag(MachineInstr::FrameSetup);
1994 MIB.addMemOperand(MF.getMachineMemOperand(
1995 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
1996 MachineMemOperand::MOStore, Size, Alignment));
1997 if (NeedsWinCFI)
1998 insertSEH(MIB, TII, MachineInstr::FrameSetup);
1999 } else { // The case where the save is not a pair of ZRegs.
2000 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2001 if (!MRI.isReserved(Reg1))
2002 MBB.addLiveIn(Reg1);
2003 if (RPI.isPaired()) {
2004 if (!MRI.isReserved(Reg2))
2005 MBB.addLiveIn(Reg2);
2006 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
2007 MIB.addMemOperand(MF.getMachineMemOperand(
2008 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2009 MachineMemOperand::MOStore, Size, Alignment));
2010 }
2011 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
2012 .addReg(AArch64::SP)
2013 .addImm(RPI.Offset) // [sp, #offset*vscale],
2014 // where factor*vscale is implicit
2015 .setMIFlag(MachineInstr::FrameSetup);
2016 MIB.addMemOperand(MF.getMachineMemOperand(
2017 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2018 MachineMemOperand::MOStore, Size, Alignment));
2019 if (NeedsWinCFI)
2020 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2021 }
2022 // Update the StackIDs of the SVE stack slots.
2023 MachineFrameInfo &MFI = MF.getFrameInfo();
2024 if (RPI.Type == RegPairInfo::ZPR || RPI.Type == RegPairInfo::PPR) {
2025 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
2026 if (RPI.isPaired())
2027 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
2028 }
2029 }
2030 return true;
2031}
2032
2033 bool AArch64FrameLowering::restoreCalleeSavedRegisters(
2034 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
2035 MutableArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
2036 MachineFunction &MF = *MBB.getParent();
2037 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
2038 DebugLoc DL;
2039 SmallVector<RegPairInfo, 8> RegPairs;
2040 bool NeedsWinCFI = needsWinCFI(MF);
2041
2042 if (MBBI != MBB.end())
2043 DL = MBBI->getDebugLoc();
2044
2045 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
2046 if (homogeneousPrologEpilog(MF, &MBB)) {
2047 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
2048 .setMIFlag(MachineInstr::FrameDestroy);
2049 for (auto &RPI : RegPairs) {
2050 MIB.addReg(RPI.Reg1, RegState::Define);
2051 MIB.addReg(RPI.Reg2, RegState::Define);
2052 }
2053 return true;
2054 }
2055
2056 // For performance reasons, restore the SVE registers in increasing order.
2057 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
2058 auto PPRBegin = llvm::find_if(RegPairs, IsPPR);
2059 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
2060 std::reverse(PPRBegin, PPREnd);
2061 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
2062 auto ZPRBegin = llvm::find_if(RegPairs, IsZPR);
2063 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
2064 std::reverse(ZPRBegin, ZPREnd);
2065
2066 bool PTrueCreated = false;
2067 for (const RegPairInfo &RPI : RegPairs) {
2068 unsigned Reg1 = RPI.Reg1;
2069 unsigned Reg2 = RPI.Reg2;
2070
2071 // Issue sequence of restores for cs regs. The last restore may be converted
2072 // to a post-increment load later by emitEpilogue if the callee-save stack
2073 // area allocation can't be combined with the local stack area allocation.
2074 // For example:
2075 // ldp fp, lr, [sp, #32] // addImm(+4)
2076 // ldp x20, x19, [sp, #16] // addImm(+2)
2077 // ldp x22, x21, [sp, #0] // addImm(+0)
2078 // Note: see comment in spillCalleeSavedRegisters()
2079 unsigned LdrOpc;
2080 unsigned Size = TRI->getSpillSize(*RPI.RC);
2081 Align Alignment = TRI->getSpillAlign(*RPI.RC);
2082 switch (RPI.Type) {
2083 case RegPairInfo::GPR:
2084 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
2085 break;
2086 case RegPairInfo::FPR64:
2087 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
2088 break;
2089 case RegPairInfo::FPR128:
2090 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
2091 break;
2092 case RegPairInfo::ZPR:
2093 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
2094 break;
2095 case RegPairInfo::PPR:
2096 LdrOpc = Size == 16 ? AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO
2097 : AArch64::LDR_PXI;
2098 break;
2099 case RegPairInfo::VG:
2100 continue;
2101 }
2102 LLVM_DEBUG(dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
2103 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
2104 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2105 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
2106 dbgs() << ")\n");
2107
2108 // Windows unwind codes require consecutive registers if registers are
2109 // paired. Make the switch here, so that the code below will restore (x,x+1)
2110 // and not (x+1,x).
2111 unsigned FrameIdxReg1 = RPI.FrameIdx;
2112 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2113 if (NeedsWinCFI && RPI.isPaired()) {
2114 std::swap(Reg1, Reg2);
2115 std::swap(FrameIdxReg1, FrameIdxReg2);
2116 }
2117
2118 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2119 if (RPI.isPaired() && RPI.isScalable()) {
2120 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2121 MF.getSubtarget<AArch64Subtarget>();
2122 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2123 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2124 "Expects SVE2.1 or SME2 target and a predicate register");
2125#ifdef EXPENSIVE_CHECKS
2126 assert(!(PPRBegin < ZPRBegin) &&
2127 "Expected callee save predicate to be handled first");
2128#endif
2129 if (!PTrueCreated) {
2130 PTrueCreated = true;
2131 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2132 .setMIFlags(MachineInstr::FrameDestroy);
2133 }
2134 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2135 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
2136 getDefRegState(true));
2137 MIB.addMemOperand(MF.getMachineMemOperand(
2138 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2139 MachineMemOperand::MOLoad, Size, Alignment));
2140 MIB.addReg(PnReg);
2141 MIB.addReg(AArch64::SP)
2142 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale]
2143 // where 2*vscale is implicit
2144 .setMIFlag(MachineInstr::FrameDestroy);
2145 MIB.addMemOperand(MF.getMachineMemOperand(
2146 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2147 MachineMemOperand::MOLoad, Size, Alignment));
2148 if (NeedsWinCFI)
2149 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2150 } else {
2151 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2152 if (RPI.isPaired()) {
2153 MIB.addReg(Reg2, getDefRegState(true));
2154 MIB.addMemOperand(MF.getMachineMemOperand(
2155 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2156 MachineMemOperand::MOLoad, Size, Alignment));
2157 }
2158 MIB.addReg(Reg1, getDefRegState(true));
2159 MIB.addReg(AArch64::SP)
2160 .addImm(RPI.Offset) // [sp, #offset*vscale]
2161 // where factor*vscale is implicit
2162 .setMIFlag(MachineInstr::FrameDestroy);
2163 MIB.addMemOperand(MF.getMachineMemOperand(
2164 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2165 MachineMemOperand::MOLoad, Size, Alignment));
2166 if (NeedsWinCFI)
2167 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2168 }
2169 }
2170 return true;
2171}
2172
2173// Return the FrameID for a MMO.
2174static std::optional<int> getMMOFrameID(MachineMemOperand *MMO,
2175 const MachineFrameInfo &MFI) {
2176 auto *PSV =
2177 dyn_cast_or_null<FixedStackPseudoSourceValue>(MMO->getPseudoValue());
2178 if (PSV)
2179 return std::optional<int>(PSV->getFrameIndex());
2180
2181 if (MMO->getValue()) {
2182 if (auto *Al = dyn_cast<AllocaInst>(getUnderlyingObject(MMO->getValue()))) {
2183 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd();
2184 FI++)
2185 if (MFI.getObjectAllocation(FI) == Al)
2186 return FI;
2187 }
2188 }
2189
2190 return std::nullopt;
2191}
2192
2193// Return the FrameID for a Load/Store instruction by looking at the first MMO.
2194static std::optional<int> getLdStFrameID(const MachineInstr &MI,
2195 const MachineFrameInfo &MFI) {
2196 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
2197 return std::nullopt;
2198
2199 return getMMOFrameID(*MI.memoperands_begin(), MFI);
2200}
2201
2202// Check if a Hazard slot is needed for the current function, and if so create
2203// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
2204// which can be used to determine if any hazard padding is needed.
2205void AArch64FrameLowering::determineStackHazardSlot(
2206 MachineFunction &MF, BitVector &SavedRegs) const {
2207 unsigned StackHazardSize = getStackHazardSize(MF);
2208 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2209 if (StackHazardSize == 0 || StackHazardSize % 16 != 0 ||
2210 AFI->hasStackHazardSlotIndex())
2211 return;
2212
2213 // Stack hazards are only needed in streaming functions.
2214 SMEAttrs Attrs = AFI->getSMEFnAttrs();
2215 if (!StackHazardInNonStreaming && Attrs.hasNonStreamingInterfaceAndBody())
2216 return;
2217
2218 MachineFrameInfo &MFI = MF.getFrameInfo();
2219
2220 // Add a hazard slot if there are any CSR FPR registers, or are any fp-only
2221 // stack objects.
2222 bool HasFPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2223 return AArch64::FPR64RegClass.contains(Reg) ||
2224 AArch64::FPR128RegClass.contains(Reg) ||
2225 AArch64::ZPRRegClass.contains(Reg) ||
2226 AArch64::PPRRegClass.contains(Reg);
2227 });
2228 bool HasFPRStackObjects = false;
2229 if (!HasFPRCSRs) {
2230 std::vector<unsigned> FrameObjects(MFI.getObjectIndexEnd());
2231 for (auto &MBB : MF) {
2232 for (auto &MI : MBB) {
2233 std::optional<int> FI = getLdStFrameID(MI, MFI);
2234 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
2235 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
2236 AArch64InstrInfo::isFpOrNEON(MI))
2237 FrameObjects[*FI] |= 2;
2238 else
2239 FrameObjects[*FI] |= 1;
2240 }
2241 }
2242 }
2243 HasFPRStackObjects =
2244 any_of(FrameObjects, [](unsigned B) { return (B & 3) == 2; });
2245 }
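// In FrameObjects above, bit 0 records an access by a GPR (or other)
// instruction and bit 1 an access by an FPR/SVE instruction, so
// (B & 3) == 2 selects objects touched only by FPR/SVE code -- the objects
// worth separating from GPR accesses with hazard padding.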
2246
2247 if (HasFPRCSRs || HasFPRStackObjects) {
2248 int ID = MFI.CreateStackObject(StackHazardSize, Align(16), false);
2249 LLVM_DEBUG(dbgs() << "Created Hazard slot at " << ID << " size "
2250 << StackHazardSize << "\n");
2251 AFI->setStackHazardSlotIndex(ID);
2252 }
2253}
2254
2255 void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
2256 BitVector &SavedRegs,
2257 RegScavenger *RS) const {
2258 // All calls are tail calls in GHC calling conv, and functions have no
2259 // prologue/epilogue.
2260 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
2261 return;
2262
2263 TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);
2264 const AArch64RegisterInfo *RegInfo = static_cast<const AArch64RegisterInfo *>(
2265 MF.getSubtarget().getRegisterInfo());
2266 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2267 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2268 unsigned UnspilledCSGPR = AArch64::NoRegister;
2269 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
2270
2271 MachineFrameInfo &MFI = MF.getFrameInfo();
2272 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
2273
2274 unsigned BasePointerReg = RegInfo->hasBasePointer(MF)
2275 ? RegInfo->getBaseRegister()
2276 : (unsigned)AArch64::NoRegister;
2277
2278 unsigned ExtraCSSpill = 0;
2279 bool HasUnpairedGPR64 = false;
2280 bool HasPairZReg = false;
2281 BitVector UserReservedRegs = RegInfo->getUserReservedRegs(MF);
2282 BitVector ReservedRegs = RegInfo->getReservedRegs(MF);
2283
2284 // Figure out which callee-saved registers to save/restore.
2285 for (unsigned i = 0; CSRegs[i]; ++i) {
2286 const unsigned Reg = CSRegs[i];
2287
2288 // Add the base pointer register to SavedRegs if it is callee-save.
2289 if (Reg == BasePointerReg)
2290 SavedRegs.set(Reg);
2291
2292 // Don't save manually reserved registers set through +reserve-x#i,
2293 // even for callee-saved registers, as per GCC's behavior.
2294 if (UserReservedRegs[Reg]) {
2295 SavedRegs.reset(Reg);
2296 continue;
2297 }
2298
2299 bool RegUsed = SavedRegs.test(Reg);
2300 unsigned PairedReg = AArch64::NoRegister;
2301 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
2302 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
2303 AArch64::FPR128RegClass.contains(Reg)) {
2304 // Compensate for odd numbers of GP CSRs.
2305 // For now, all the known cases of odd number of CSRs are of GPRs.
2306 if (HasUnpairedGPR64)
2307 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
2308 else
2309 PairedReg = CSRegs[i ^ 1];
2310 }
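// For example, for i == 0 the candidate partner is CSRegs[1] and for i == 1
// it is CSRegs[0]: XOR-ing the index with 1 pairs each even-indexed CSR with
// its odd-indexed neighbour. Once an unpaired GPR has been seen, the pairing
// is shifted by one so the remaining GPRs can still pair up.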
2311
2312 // If the function requires all the GP registers to save (SavedRegs),
2313 // and there are an odd number of GP CSRs at the same time (CSRegs),
2314 // PairedReg could be in a different register class from Reg, which would
2315 // lead to a FPR (usually D8) accidentally being marked saved.
2316 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
2317 PairedReg = AArch64::NoRegister;
2318 HasUnpairedGPR64 = true;
2319 }
2320 assert(PairedReg == AArch64::NoRegister ||
2321 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
2322 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
2323 AArch64::FPR128RegClass.contains(Reg, PairedReg));
2324
2325 if (!RegUsed) {
2326 if (AArch64::GPR64RegClass.contains(Reg) && !ReservedRegs[Reg]) {
2327 UnspilledCSGPR = Reg;
2328 UnspilledCSGPRPaired = PairedReg;
2329 }
2330 continue;
2331 }
2332
2333 // Always save P4 when PPR spills are ZPR-sized and a predicate at or above
2334 // p8 is spilled. If all of p0-p3 are used as return values, p4 must be free
2335 // to reload p8-p15.
2336 if (RegInfo->getSpillSize(AArch64::PPRRegClass) == 16 &&
2337 AArch64::PPR_p8to15RegClass.contains(Reg)) {
2338 SavedRegs.set(AArch64::P4);
2339 }
2340
2341 // MachO's compact unwind format relies on all registers being stored in
2342 // pairs.
2343 // FIXME: the usual format is actually better if unwinding isn't needed.
2344 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
2345 !SavedRegs.test(PairedReg)) {
2346 SavedRegs.set(PairedReg);
2347 if (AArch64::GPR64RegClass.contains(PairedReg) &&
2348 !ReservedRegs[PairedReg])
2349 ExtraCSSpill = PairedReg;
2350 }
2351 // Check if there is a pair of ZRegs, so it can select PReg for spill/fill
2352 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
2353 SavedRegs.test(CSRegs[i ^ 1]));
2354 }
2355
2356 if (HasPairZReg && enableMultiVectorSpillFill(Subtarget, MF)) {
2358 // Find a suitable predicate register for the multi-vector spill/fill
2359 // instructions.
2360 unsigned PnReg = findFreePredicateReg(SavedRegs);
2361 if (PnReg != AArch64::NoRegister)
2362 AFI->setPredicateRegForFillSpill(PnReg);
2363 // If no free callee-save has been found, assign one.
2364 if (!AFI->getPredicateRegForFillSpill() &&
2365 MF.getFunction().getCallingConv() ==
2366 CallingConv::AArch64_SVE_VectorCall) {
2367 SavedRegs.set(AArch64::P8);
2368 AFI->setPredicateRegForFillSpill(AArch64::PN8);
2369 }
2370
2371 assert(!ReservedRegs[AFI->getPredicateRegForFillSpill()] &&
2372 "Predicate cannot be a reserved register");
2373 }
2374
2375 if (MF.getFunction().getCallingConv() == CallingConv::Win64 &&
2376 !Subtarget.isTargetWindows()) {
2377 // For Windows calling convention on a non-windows OS, where X18 is treated
2378 // as reserved, back up X18 when entering non-windows code (marked with the
2379 // Windows calling convention) and restore when returning regardless of
2380 // whether the individual function uses it - it might call other functions
2381 // that clobber it.
2382 SavedRegs.set(AArch64::X18);
2383 }
2384
2385 // Calculate the callee-saved stack size.
2386 unsigned CSStackSize = 0;
2387 unsigned SVECSStackSize = 0;
2388 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
2389 for (unsigned Reg : SavedRegs.set_bits()) {
2390 auto *RC = TRI->getMinimalPhysRegClass(Reg);
2391 assert(RC && "expected register class!");
2392 auto SpillSize = TRI->getSpillSize(*RC);
2393 if (AArch64::PPRRegClass.contains(Reg) ||
2394 AArch64::ZPRRegClass.contains(Reg))
2395 SVECSStackSize += SpillSize;
2396 else
2397 CSStackSize += SpillSize;
2398 }
2399
2400 // Save number of saved regs, so we can easily update CSStackSize later to
2401 // account for any additional 64-bit GPR saves. Note: After this point
2402 // only 64-bit GPRs can be added to SavedRegs.
2403 unsigned NumSavedRegs = SavedRegs.count();
2404
2405 // Increase the callee-saved stack size if the function has streaming mode
2406 // changes, as we will need to spill the value of the VG register.
2407 if (requiresSaveVG(MF))
2408 CSStackSize += 8;
2409
2410 // Determine if a Hazard slot should be used, and increase the CSStackSize by
2411 // StackHazardSize if so.
2412 determineStackHazardSlot(MF, SavedRegs);
2413 if (AFI->hasStackHazardSlotIndex())
2414 CSStackSize += getStackHazardSize(MF);
2415
2416 // If we must call __arm_get_current_vg in the prologue preserve the LR.
2417 if (requiresSaveVG(MF) && !Subtarget.hasSVE())
2418 SavedRegs.set(AArch64::LR);
2419
2420 // The frame record needs to be created by saving the appropriate registers
2421 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
2422 if (hasFP(MF) ||
2423 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
2424 SavedRegs.set(AArch64::FP);
2425 SavedRegs.set(AArch64::LR);
2426 }
2427
2428 LLVM_DEBUG({
2429 dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
2430 for (unsigned Reg : SavedRegs.set_bits())
2431 dbgs() << ' ' << printReg(Reg, RegInfo);
2432 dbgs() << "\n";
2433 });
2434
2435 // If any callee-saved registers are used, the frame cannot be eliminated.
2436 int64_t SVEStackSize =
2437 alignTo(SVECSStackSize + estimateSVEStackObjectOffsets(MFI), 16);
2438 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
2439
2440 // The CSR spill slots have not been allocated yet, so estimateStackSize
2441 // won't include them.
2442 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
2443
2444 // We may address some of the stack above the canonical frame address, either
2445 // for our own arguments or during a call. Include that in calculating whether
2446 // we have complicated addressing concerns.
2447 int64_t CalleeStackUsed = 0;
2448 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
2449 int64_t FixedOff = MFI.getObjectOffset(I);
2450 if (FixedOff > CalleeStackUsed)
2451 CalleeStackUsed = FixedOff;
2452 }
2453
2454 // Conservatively always assume BigStack when there are SVE spills.
2455 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
2456 CalleeStackUsed) > EstimatedStackSizeLimit;
2457 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
2458 AFI->setHasStackFrame(true);
2459
2460 // Estimate if we might need to scavenge a register at some point in order
2461 // to materialize a stack offset. If so, either spill one additional
2462 // callee-saved register or reserve a special spill slot to facilitate
2463 // register scavenging. If we already spilled an extra callee-saved register
2464 // above to keep the number of spills even, we don't need to do anything else
2465 // here.
2466 if (BigStack) {
2467 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
2468 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
2469 << " to get a scratch register.\n");
2470 SavedRegs.set(UnspilledCSGPR);
2471 ExtraCSSpill = UnspilledCSGPR;
2472
2473 // MachO's compact unwind format relies on all registers being stored in
2474 // pairs, so if we need to spill one extra for BigStack, then we need to
2475 // store the pair.
2476 if (producePairRegisters(MF)) {
2477 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
2478 // Failed to make a pair for compact unwind format, revert spilling.
2479 if (produceCompactUnwindFrame(*this, MF)) {
2480 SavedRegs.reset(UnspilledCSGPR);
2481 ExtraCSSpill = AArch64::NoRegister;
2482 }
2483 } else
2484 SavedRegs.set(UnspilledCSGPRPaired);
2485 }
2486 }
2487
2488 // If we didn't find an extra callee-saved register to spill, create
2489 // an emergency spill slot.
2490 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
2491 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
2492 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
2493 unsigned Size = TRI->getSpillSize(RC);
2494 Align Alignment = TRI->getSpillAlign(RC);
2495 int FI = MFI.CreateSpillStackObject(Size, Alignment);
2496 RS->addScavengingFrameIndex(FI);
2497 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
2498 << " as the emergency spill slot.\n");
2499 }
2500 }
2501
2502 // Adding the size of additional 64bit GPR saves.
2503 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
2504
2505 // A Swift asynchronous context extends the frame record with a pointer
2506 // directly before FP.
2507 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
2508 CSStackSize += 8;
2509
2510 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
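// For example, five 8-byte GPR saves give CSStackSize == 40, which is rounded
// up to 48 here; the 8 spare bytes are then recorded via
// setCalleeSaveStackHasFreeSpace() below so later code can reuse the gap.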
2511 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
2512 << EstimatedStackSize + AlignedCSStackSize << " bytes.\n");
2513
2514 assert((!MFI.isCalleeSavedInfoValid() ||
2515 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
2516 "Should not invalidate callee saved info");
2517
2518 // Round up to register pair alignment to avoid additional SP adjustment
2519 // instructions.
2520 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
2521 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
2522 AFI->setSVECalleeSavedStackSize(alignTo(SVECSStackSize, 16));
2523}
2524
2525 bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
2526 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
2527 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
2528 unsigned &MaxCSFrameIndex) const {
2529 bool NeedsWinCFI = needsWinCFI(MF);
2530 unsigned StackHazardSize = getStackHazardSize(MF);
2531 // To match the canonical windows frame layout, reverse the list of
2532 // callee saved registers to get them laid out by PrologEpilogInserter
2533 // in the right order. (PrologEpilogInserter allocates stack objects top
2534 // down. Windows canonical prologs store higher numbered registers at
2535 // the top, thus have the CSI array start from the highest registers.)
2536 if (NeedsWinCFI)
2537 std::reverse(CSI.begin(), CSI.end());
2538
2539 if (CSI.empty())
2540 return true; // Early exit if no callee saved registers are modified!
2541
2542 // Now that we know which registers need to be saved and restored, allocate
2543 // stack slots for them.
2544 MachineFrameInfo &MFI = MF.getFrameInfo();
2545 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2546
2547 bool UsesWinAAPCS = isTargetWindows(MF);
2548 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2549 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
2550 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2551 if ((unsigned)FrameIdx < MinCSFrameIndex)
2552 MinCSFrameIndex = FrameIdx;
2553 if ((unsigned)FrameIdx > MaxCSFrameIndex)
2554 MaxCSFrameIndex = FrameIdx;
2555 }
2556
2557 // Insert VG into the list of CSRs, immediately before LR if saved.
2558 if (requiresSaveVG(MF)) {
2559 CalleeSavedInfo VGInfo(AArch64::VG);
2560 auto It =
2561 find_if(CSI, [](auto &Info) { return Info.getReg() == AArch64::LR; });
2562 if (It != CSI.end())
2563 CSI.insert(It, VGInfo);
2564 else
2565 CSI.push_back(VGInfo);
2566 }
2567
2568 Register LastReg = 0;
2569 int HazardSlotIndex = std::numeric_limits<int>::max();
2570 for (auto &CS : CSI) {
2571 MCRegister Reg = CS.getReg();
2572 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
2573
2574 // Create a hazard slot as we switch between GPR and FPR CSRs.
2575 if (AFI->hasStackHazardSlotIndex() &&
2576 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
2577 AArch64InstrInfo::isFpOrNEON(Reg)) {
2578 assert(HazardSlotIndex == std::numeric_limits<int>::max() &&
2579 "Unexpected register order for hazard slot");
2580 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2581 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2582 << "\n");
2583 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2584 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
2585 MinCSFrameIndex = HazardSlotIndex;
2586 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
2587 MaxCSFrameIndex = HazardSlotIndex;
2588 }
2589
2590 unsigned Size = RegInfo->getSpillSize(*RC);
2591 Align Alignment(RegInfo->getSpillAlign(*RC));
2592 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
2593 CS.setFrameIdx(FrameIdx);
2594
2595 if ((unsigned)FrameIdx < MinCSFrameIndex)
2596 MinCSFrameIndex = FrameIdx;
2597 if ((unsigned)FrameIdx > MaxCSFrameIndex)
2598 MaxCSFrameIndex = FrameIdx;
2599
2600 // Grab 8 bytes below FP for the extended asynchronous frame info.
2601 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
2602 Reg == AArch64::FP) {
2603 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
2604 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2605 if ((unsigned)FrameIdx < MinCSFrameIndex)
2606 MinCSFrameIndex = FrameIdx;
2607 if ((unsigned)FrameIdx > MaxCSFrameIndex)
2608 MaxCSFrameIndex = FrameIdx;
2609 }
2610 LastReg = Reg;
2611 }
2612
2613 // Add hazard slot in the case where no FPR CSRs are present.
2614 if (AFI->hasStackHazardSlotIndex() &&
2615 HazardSlotIndex == std::numeric_limits<int>::max()) {
2616 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2617 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2618 << "\n");
2619 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2620 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
2621 MinCSFrameIndex = HazardSlotIndex;
2622 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
2623 MaxCSFrameIndex = HazardSlotIndex;
2624 }
2625
2626 return true;
2627}
2628
2629 bool AArch64FrameLowering::enableStackSlotScavenging(
2630 const MachineFunction &MF) const {
2631 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2632 // If the function has streaming-mode changes, don't scavenge a
2633 // spillslot in the callee-save area, as that might require an
2634 // 'addvl' in the streaming-mode-changing call-sequence when the
2635 // function doesn't use a FP.
2636 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
2637 return false;
2638 // Don't allow register salvaging with hazard slots, in case it moves objects
2639 // into the wrong place.
2640 if (AFI->hasStackHazardSlotIndex())
2641 return false;
2642 return AFI->hasCalleeSaveStackFreeSpace();
2643}
2644
2645 /// Returns true if there are any SVE callee saves.
2646 static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI,
2647 int &Min, int &Max) {
2648 Min = std::numeric_limits<int>::max();
2649 Max = std::numeric_limits<int>::min();
2650
2651 if (!MFI.isCalleeSavedInfoValid())
2652 return false;
2653
2654 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
2655 for (auto &CS : CSI) {
2656 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
2657 AArch64::PPRRegClass.contains(CS.getReg())) {
2658 assert((Max == std::numeric_limits<int>::min() ||
2659 Max + 1 == CS.getFrameIdx()) &&
2660 "SVE CalleeSaves are not consecutive");
2661
2662 Min = std::min(Min, CS.getFrameIdx());
2663 Max = std::max(Max, CS.getFrameIdx());
2664 }
2665 }
2666 return Min != std::numeric_limits<int>::max();
2667}
2668
2669// Process all the SVE stack objects and determine offsets for each
2670// object. If AssignOffsets is true, the offsets get assigned.
2671// Fills in the first and last callee-saved frame indices into
2672// Min/MaxCSFrameIndex, respectively.
2673 // Returns the size of the stack.
2674 static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI,
2675 int &MinCSFrameIndex,
2676 int &MaxCSFrameIndex,
2677 bool AssignOffsets) {
2678#ifndef NDEBUG
2679 // First process all fixed stack objects.
2680 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
2681 assert(MFI.getStackID(I) != TargetStackID::ScalableVector &&
2682 "SVE vectors should never be passed on the stack by value, only by "
2683 "reference.");
2684#endif
2685
2686 auto Assign = [&MFI](int FI, int64_t Offset) {
2687 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
2688 MFI.setObjectOffset(FI, Offset);
2689 };
2690
2691 int64_t Offset = 0;
2692
2693 // Then process all callee saved slots.
2694 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
2695 // Assign offsets to the callee save slots.
2696 for (int I = MinCSFrameIndex; I <= MaxCSFrameIndex; ++I) {
2697 Offset += MFI.getObjectSize(I);
2698 Offset = alignTo(Offset, MFI.getObjectAlign(I));
2699 if (AssignOffsets)
2700 Assign(I, -Offset);
2701 }
2702 }
2703
2704 // Ensure that the callee-save area is aligned to 16 bytes.
2705 Offset = alignTo(Offset, Align(16U));
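// For example, a single ZPR callee save of 16 (vscale-scaled) bytes is
// assigned offset -16; a following 2-byte PPR save brings Offset to 18, which
// this alignTo rounds up to 32. All of these offsets are later scaled by the
// runtime vscale when addresses are materialised.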
2706
2707 // Create a buffer of SVE objects to allocate and sort it.
2708 SmallVector<int, 8> ObjectsToAllocate;
2709 // If we have a stack protector, and we've previously decided that we have SVE
2710 // objects on the stack and thus need it to go in the SVE stack area, then it
2711 // needs to go first.
2712 int StackProtectorFI = -1;
2713 if (MFI.hasStackProtectorIndex()) {
2714 StackProtectorFI = MFI.getStackProtectorIndex();
2715 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
2716 ObjectsToAllocate.push_back(StackProtectorFI);
2717 }
2718 for (int I = 0, E = MFI.getObjectIndexEnd(); I != E; ++I) {
2719 unsigned StackID = MFI.getStackID(I);
2720 if (StackID != TargetStackID::ScalableVector)
2721 continue;
2722 if (I == StackProtectorFI)
2723 continue;
2724 if (MaxCSFrameIndex >= I && I >= MinCSFrameIndex)
2725 continue;
2726 if (MFI.isDeadObjectIndex(I))
2727 continue;
2728
2729 ObjectsToAllocate.push_back(I);
2730 }
2731
2732 // Allocate all SVE locals and spills
2733 for (unsigned FI : ObjectsToAllocate) {
2734 Align Alignment = MFI.getObjectAlign(FI);
2735 // FIXME: Given that the length of SVE vectors is not necessarily a power of
2736 // two, we'd need to align every object dynamically at runtime if the
2737 // alignment is larger than 16. This is not yet supported.
2738 if (Alignment > Align(16))
2739 report_fatal_error(
2740 "Alignment of scalable vectors > 16 bytes is not yet supported");
2741
2742 Offset = alignTo(Offset + MFI.getObjectSize(FI), Alignment);
2743 if (AssignOffsets)
2744 Assign(FI, -Offset);
2745 }
2746
2747 return Offset;
2748}
2749
2750int64_t AArch64FrameLowering::estimateSVEStackObjectOffsets(
2751 MachineFrameInfo &MFI) const {
2752 int MinCSFrameIndex, MaxCSFrameIndex;
2753 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex, false);
2754}
2755
2756int64_t AArch64FrameLowering::assignSVEStackObjectOffsets(
2757 MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex) const {
2758 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex,
2759 true);
2760}
2761
2762/// Attempts to scavenge a register from \p ScavengeableRegs given the used
2763 /// registers in \p UsedRegs.
2764 static Register tryScavengeRegister(LiveRegUnits const &UsedRegs,
2765 BitVector const &ScavengeableRegs,
2766 Register PreferredReg) {
2767 if (PreferredReg != AArch64::NoRegister && UsedRegs.available(PreferredReg))
2768 return PreferredReg;
2769 for (auto Reg : ScavengeableRegs.set_bits()) {
2770 if (UsedRegs.available(Reg))
2771 return Reg;
2772 }
2773 return AArch64::NoRegister;
2774}
2775
2776/// Propagates frame-setup/destroy flags from \p SourceMI to all instructions in
2777/// \p MachineInstrs.
2778static void propagateFrameFlags(MachineInstr &SourceMI,
2779 ArrayRef<MachineInstr *> MachineInstrs) {
2780 for (MachineInstr *MI : MachineInstrs) {
2781 if (SourceMI.getFlag(MachineInstr::FrameSetup))
2782 MI->setFlag(MachineInstr::FrameSetup);
2783 if (SourceMI.getFlag(MachineInstr::FrameDestroy))
2784 MI->setFlag(MachineInstr::FrameDestroy);
2785 }
2786}
2787
2788/// RAII helper class for scavenging or spilling a register. On construction
2789/// attempts to find a free register of class \p RC (given \p UsedRegs and \p
2790 /// AllocatableRegs); if no register can be found, spills \p SpillCandidate to
2791 /// \p MaybeSpillFI to free a register. The freed register is returned via the
2792 /// \p FreeReg output parameter. On destruction, if there is a spill, its
2793 /// previous value is reloaded. The spilling and scavenging are only valid at
2794 /// the insertion point \p MBBI; this class should _not_ be used in places that
2795/// create or manipulate basic blocks, moving the expected insertion point.
2796 struct ScopedScavengeOrSpill {
2797 ScopedScavengeOrSpill(const ScopedScavengeOrSpill &) = delete;
2798 ScopedScavengeOrSpill(ScopedScavengeOrSpill &&) = delete;
2799
2800 ScopedScavengeOrSpill(MachineFunction &MF, MachineBasicBlock &MBB,
2801 MachineBasicBlock::iterator MBBI,
2802 Register SpillCandidate, const TargetRegisterClass &RC,
2803 LiveRegUnits const &UsedRegs,
2804 BitVector const &AllocatableRegs,
2805 std::optional<int> *MaybeSpillFI,
2806 Register PreferredReg = AArch64::NoRegister)
2807 : MBB(MBB), MBBI(MBBI), RC(RC), TII(static_cast<const AArch64InstrInfo &>(
2808 *MF.getSubtarget().getInstrInfo())),
2809 TRI(*MF.getSubtarget().getRegisterInfo()) {
2810 FreeReg = tryScavengeRegister(UsedRegs, AllocatableRegs, PreferredReg);
2811 if (FreeReg != AArch64::NoRegister)
2812 return;
2813 assert(MaybeSpillFI && "Expected emergency spill slot FI information "
2814 "(attempted to spill in prologue/epilogue?)");
2815 if (!MaybeSpillFI->has_value()) {
2816 MachineFrameInfo &MFI = MF.getFrameInfo();
2817 *MaybeSpillFI = MFI.CreateSpillStackObject(TRI.getSpillSize(RC),
2818 TRI.getSpillAlign(RC));
2819 }
2820 FreeReg = SpillCandidate;
2821 SpillFI = MaybeSpillFI->value();
2822 TII.storeRegToStackSlot(MBB, MBBI, FreeReg, false, *SpillFI, &RC, &TRI,
2823 Register());
2824 }
2825
2826 bool hasSpilled() const { return SpillFI.has_value(); }
2827
2828 /// Returns the free register (found from scavenging or spilling a register).
2829 Register freeRegister() const { return FreeReg; }
2830
2831 Register operator*() const { return freeRegister(); }
2832
2834 if (hasSpilled())
2835 TII.loadRegFromStackSlot(MBB, MBBI, FreeReg, *SpillFI, &RC, &TRI,
2836 Register());
2837 }
2838
2839private:
2842 const TargetRegisterClass &RC;
2843 const AArch64InstrInfo &TII;
2844 const TargetRegisterInfo &TRI;
2845 Register FreeReg = AArch64::NoRegister;
2846 std::optional<int> SpillFI;
2847};
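// Typical use (as in the expansion helpers below): construct one of these at
// the point a scratch ZPR/PPR/GPR is needed; freeRegister() or operator*
// yields the register, and if a spill was required the destructor reloads the
// spilled value when the object goes out of scope.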
2848
2849/// Emergency stack slots for expanding SPILL_PPR_TO_ZPR_SLOT_PSEUDO and
2850/// FILL_PPR_FROM_ZPR_SLOT_PSEUDO.
2851 struct EmergencyStackSlots {
2852 std::optional<int> ZPRSpillFI;
2853 std::optional<int> PPRSpillFI;
2854 std::optional<int> GPRSpillFI;
2855};
2856
2857/// Registers available for scavenging (ZPR, PPR3b, GPR).
2858 struct ScavengeableRegs {
2859 BitVector ZPRRegs;
2860 BitVector PPR3bRegs;
2861 BitVector GPRRegs;
2862 };
2863
2864 static bool isInPrologueOrEpilogue(const MachineInstr &MI) {
2865 return MI.getFlag(MachineInstr::FrameSetup) ||
2866 MI.getFlag(MachineInstr::FrameDestroy);
2867 }
2868
2869/// Expands:
2870/// ```
2871/// SPILL_PPR_TO_ZPR_SLOT_PSEUDO $p0, %stack.0, 0
2872/// ```
2873/// To:
2874/// ```
2875/// $z0 = CPY_ZPzI_B $p0, 1, 0
2876/// STR_ZXI $z0, $stack.0, 0
2877/// ```
2878/// While ensuring a ZPR ($z0 in this example) is free for the predicate (
2879/// spilling if necessary).
2880 static void expandSpillPPRToZPRSlotPseudo(MachineBasicBlock &MBB,
2881 MachineInstr &MI,
2882 const TargetRegisterInfo &TRI,
2883 LiveRegUnits const &UsedRegs,
2884 ScavengeableRegs const &SR,
2885 EmergencyStackSlots &SpillSlots) {
2886 MachineFunction &MF = *MBB.getParent();
2887 auto *TII =
2888 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
2889
2890 ScopedScavengeOrSpill ZPredReg(
2891 MF, MBB, MI, AArch64::Z0, AArch64::ZPRRegClass, UsedRegs, SR.ZPRRegs,
2892 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.ZPRSpillFI);
2893
2894 SmallVector<MachineInstr *, 2> MachineInstrs;
2895 const DebugLoc &DL = MI.getDebugLoc();
2896 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::CPY_ZPzI_B))
2897 .addReg(*ZPredReg, RegState::Define)
2898 .add(MI.getOperand(0))
2899 .addImm(1)
2900 .addImm(0)
2901 .getInstr());
2902 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::STR_ZXI))
2903 .addReg(*ZPredReg)
2904 .add(MI.getOperand(1))
2905 .addImm(MI.getOperand(2).getImm())
2906 .setMemRefs(MI.memoperands())
2907 .getInstr());
2908 propagateFrameFlags(MI, MachineInstrs);
2909}
2910
2911/// Expands:
2912/// ```
2913/// $p0 = FILL_PPR_FROM_ZPR_SLOT_PSEUDO %stack.0, 0
2914/// ```
2915/// To:
2916/// ```
2917/// $z0 = LDR_ZXI %stack.0, 0
2918/// $p0 = PTRUE_B 31, implicit $vg
2919/// $p0 = CMPNE_PPzZI_B $p0, $z0, 0, implicit-def $nzcv, implicit-def $nzcv
2920/// ```
2921/// While ensuring a ZPR ($z0 in this example) is free for the predicate (
2922/// spilling if necessary). If the status flags are in use at the point of
2923/// expansion they are preserved (by moving them to/from a GPR). This may cause
2924/// an additional spill if no GPR is free at the expansion point.
2925 static bool expandFillPPRFromZPRSlotPseudo(
2926 MachineBasicBlock &MBB, MachineInstr &MI, const TargetRegisterInfo &TRI,
2927 LiveRegUnits const &UsedRegs, ScavengeableRegs const &SR,
2928 MachineInstr *&LastPTrue, EmergencyStackSlots &SpillSlots) {
2929 MachineFunction &MF = *MBB.getParent();
2930 auto *TII =
2931 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
2932
2933 ScopedScavengeOrSpill ZPredReg(
2934 MF, MBB, MI, AArch64::Z0, AArch64::ZPRRegClass, UsedRegs, SR.ZPRRegs,
2935 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.ZPRSpillFI);
2936
2937 ScopedScavengeOrSpill PredReg(
2938 MF, MBB, MI, AArch64::P0, AArch64::PPR_3bRegClass, UsedRegs, SR.PPR3bRegs,
2939 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.PPRSpillFI,
2940 /*PreferredReg=*/
2941 LastPTrue ? LastPTrue->getOperand(0).getReg() : AArch64::NoRegister);
2942
2943 // Elide NZCV spills if we know it is not used.
2944 bool IsNZCVUsed = !UsedRegs.available(AArch64::NZCV);
2945 std::optional<ScopedScavengeOrSpill> NZCVSaveReg;
2946 if (IsNZCVUsed)
2947 NZCVSaveReg.emplace(
2948 MF, MBB, MI, AArch64::X0, AArch64::GPR64RegClass, UsedRegs, SR.GPRRegs,
2949 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.GPRSpillFI);
2950 SmallVector<MachineInstr *, 4> MachineInstrs;
2951 const DebugLoc &DL = MI.getDebugLoc();
2952 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::LDR_ZXI))
2953 .addReg(*ZPredReg, RegState::Define)
2954 .add(MI.getOperand(1))
2955 .addImm(MI.getOperand(2).getImm())
2956 .setMemRefs(MI.memoperands())
2957 .getInstr());
2958 if (IsNZCVUsed)
2959 MachineInstrs.push_back(
2960 BuildMI(MBB, MI, DL, TII->get(AArch64::MRS))
2961 .addReg(NZCVSaveReg->freeRegister(), RegState::Define)
2962 .addImm(AArch64SysReg::NZCV)
2963 .addReg(AArch64::NZCV, RegState::Implicit)
2964 .getInstr());
2965
2966 // Reuse previous ptrue if we know it has not been clobbered.
2967 if (LastPTrue) {
2968 assert(*PredReg == LastPTrue->getOperand(0).getReg());
2969 LastPTrue->moveBefore(&MI);
2970 } else {
2971 LastPTrue = BuildMI(MBB, MI, DL, TII->get(AArch64::PTRUE_B))
2972 .addReg(*PredReg, RegState::Define)
2973 .addImm(31);
2974 }
2975 MachineInstrs.push_back(LastPTrue);
2976 MachineInstrs.push_back(
2977 BuildMI(MBB, MI, DL, TII->get(AArch64::CMPNE_PPzZI_B))
2978 .addReg(MI.getOperand(0).getReg(), RegState::Define)
2979 .addReg(*PredReg)
2980 .addReg(*ZPredReg)
2981 .addImm(0)
2982 .addReg(AArch64::NZCV, RegState::ImplicitDefine)
2983 .getInstr());
2984 if (IsNZCVUsed)
2985 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::MSR))
2986 .addImm(AArch64SysReg::NZCV)
2987 .addReg(NZCVSaveReg->freeRegister())
2988 .addReg(AArch64::NZCV, RegState::ImplicitDefine)
2989 .getInstr());
2990
2991 propagateFrameFlags(MI, MachineInstrs);
2992 return PredReg.hasSpilled();
2993}
2994
2995/// Expands all FILL_PPR_FROM_ZPR_SLOT_PSEUDO and SPILL_PPR_TO_ZPR_SLOT_PSEUDO
2996/// operations within the MachineBasicBlock \p MBB.
2997 static bool expandSMEPPRToZPRSpillPseudos(MachineBasicBlock &MBB,
2998 const TargetRegisterInfo &TRI,
2999 ScavengeableRegs const &SR,
3000 EmergencyStackSlots &SpillSlots) {
3001 LiveRegUnits UsedRegs(TRI);
3002 UsedRegs.addLiveOuts(MBB);
3003 bool HasPPRSpills = false;
3004 MachineInstr *LastPTrue = nullptr;
3005 for (MachineInstr &MI : make_early_inc_range(reverse(MBB))) {
3006 UsedRegs.stepBackward(MI);
3007 switch (MI.getOpcode()) {
3008 case AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO:
3009 if (LastPTrue &&
3010 MI.definesRegister(LastPTrue->getOperand(0).getReg(), &TRI))
3011 LastPTrue = nullptr;
3012 HasPPRSpills |= expandFillPPRFromZPRSlotPseudo(MBB, MI, TRI, UsedRegs, SR,
3013 LastPTrue, SpillSlots);
3014 MI.eraseFromParent();
3015 break;
3016 case AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO:
3017 expandSpillPPRToZPRSlotPseudo(MBB, MI, TRI, UsedRegs, SR, SpillSlots);
3018 MI.eraseFromParent();
3019 [[fallthrough]];
3020 default:
3021 LastPTrue = nullptr;
3022 break;
3023 }
3024 }
3025
3026 return HasPPRSpills;
3027}
3028
3029 void AArch64FrameLowering::processFunctionBeforeFrameFinalized(
3030 MachineFunction &MF, RegScavenger *RS) const {
3031
3032 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3033 const TargetSubtargetInfo &TSI = MF.getSubtarget();
3034 const TargetRegisterInfo &TRI = *TSI.getRegisterInfo();
3035
3036 // If predicates spills are 16-bytes we may need to expand
3037 // SPILL_PPR_TO_ZPR_SLOT_PSEUDO/FILL_PPR_FROM_ZPR_SLOT_PSEUDO.
3038 if (AFI->hasStackFrame() && TRI.getSpillSize(AArch64::PPRRegClass) == 16) {
3039 auto ComputeScavengeableRegisters = [&](unsigned RegClassID) {
3040 BitVector Regs = TRI.getAllocatableSet(MF, TRI.getRegClass(RegClassID));
3041 assert(Regs.count() > 0 && "Expected scavengeable registers");
3042 return Regs;
3043 };
3044
3045 ScavengeableRegs SR{};
3046 SR.ZPRRegs = ComputeScavengeableRegisters(AArch64::ZPRRegClassID);
3047 // Only p0-7 are possible as the second operand of cmpne (needed for fills).
3048 SR.PPR3bRegs = ComputeScavengeableRegisters(AArch64::PPR_3bRegClassID);
3049 SR.GPRRegs = ComputeScavengeableRegisters(AArch64::GPR64RegClassID);
3050
3051 EmergencyStackSlots SpillSlots;
3052 for (MachineBasicBlock &MBB : MF) {
3053 // In the case we had to spill a predicate (in the range p0-p7) to reload
3054 // a predicate (>= p8), additional spill/fill pseudos will be created.
3055 // These need an additional expansion pass. Note: There will only be at
3056 // most two expansion passes, as spilling/filling a predicate in the range
3057 // p0-p7 never requires spilling another predicate.
3058 for (int Pass = 0; Pass < 2; Pass++) {
3059 bool HasPPRSpills =
3060 expandSMEPPRToZPRSpillPseudos(MBB, TRI, SR, SpillSlots);
3061 assert((Pass == 0 || !HasPPRSpills) && "Did not expect PPR spills");
3062 if (!HasPPRSpills)
3063 break;
3064 }
3065 }
3066 }
3067
3068 MachineFrameInfo &MFI = MF.getFrameInfo();
3069
3070 assert(getStackGrowthDirection() == TargetFrameLowering::StackGrowsDown &&
3071 "Upwards growing stack unsupported");
3072
3073 int MinCSFrameIndex, MaxCSFrameIndex;
3074 int64_t SVEStackSize =
3075 assignSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex);
3076
3077 AFI->setStackSizeSVE(alignTo(SVEStackSize, 16U));
3078 AFI->setMinMaxSVECSFrameIndex(MinCSFrameIndex, MaxCSFrameIndex);
3079
3080 // If this function isn't doing Win64-style C++ EH, we don't need to do
3081 // anything.
3082 if (!MF.hasEHFunclets())
3083 return;
3084
3085 // Win64 C++ EH needs to allocate space for the catch objects in the fixed
3086 // object area right next to the UnwindHelp object.
3087 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
3088 int64_t CurrentOffset =
3089 AFI->getVarArgsGPRSize() + AFI->getTailCallReservedStack();
3090 for (WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
3091 for (WinEHHandlerType &H : TBME.HandlerArray) {
3092 int FrameIndex = H.CatchObj.FrameIndex;
3093 if ((FrameIndex != INT_MAX) && MFI.getObjectOffset(FrameIndex) == 0) {
3094 CurrentOffset =
3095 alignTo(CurrentOffset, MFI.getObjectAlign(FrameIndex).value());
3096 CurrentOffset += MFI.getObjectSize(FrameIndex);
3097 MFI.setObjectOffset(FrameIndex, -CurrentOffset);
3098 }
3099 }
3100 }
3101
3102 // Create an UnwindHelp object.
3103 // The UnwindHelp object is allocated at the start of the fixed object area
3104 int64_t UnwindHelpOffset = alignTo(CurrentOffset + 8, Align(16));
3105 assert(UnwindHelpOffset == getFixedObjectSize(MF, AFI, /*IsWin64*/ true,
3106 /*IsFunclet*/ false) &&
3107 "UnwindHelpOffset must be at the start of the fixed object area");
3108 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8, -UnwindHelpOffset,
3109 /*IsImmutable=*/false);
3110 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
3111
3112 MachineBasicBlock &MBB = MF.front();
3113 auto MBBI = MBB.begin();
3114 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
3115 ++MBBI;
3116
3117 // We need to store -2 into the UnwindHelp object at the start of the
3118 // function.
3119 DebugLoc DL;
3120 RS->enterBasicBlockEnd(MBB);
3121 RS->backward(MBBI);
3122 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
3123 assert(DstReg && "There must be a free register after frame setup");
3124 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
3125 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
3126 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
3127 .addReg(DstReg, getKillRegState(true))
3128 .addFrameIndex(UnwindHelpFI)
3129 .addImm(0);
3130}
3131
3132namespace {
3133 struct TagStoreInstr {
3134 MachineInstr *MI;
3135 int64_t Offset, Size;
3136 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
3137 : MI(MI), Offset(Offset), Size(Size) {}
3138};
3139
3140class TagStoreEdit {
3141 MachineFunction *MF;
3142 MachineBasicBlock *MBB;
3143 MachineRegisterInfo *MRI;
3144 // Tag store instructions that are being replaced.
3145 SmallVector<TagStoreInstr, 8> TagStores;
3146 // Combined memref arguments of the above instructions.
3147 SmallVector<MachineMemOperand *> CombinedMemRefs;
3148
3149 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
3150 // FrameRegOffset + Size) with the address tag of SP.
3151 Register FrameReg;
3152 StackOffset FrameRegOffset;
3153 int64_t Size;
3154 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
3155 // end.
3156 std::optional<int64_t> FrameRegUpdate;
3157 // MIFlags for any FrameReg updating instructions.
3158 unsigned FrameRegUpdateFlags;
3159
3160 // Use zeroing instruction variants.
3161 bool ZeroData;
3162 DebugLoc DL;
3163
3164 void emitUnrolled(MachineBasicBlock::iterator InsertI);
3165 void emitLoop(MachineBasicBlock::iterator InsertI);
3166
3167public:
3168 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
3169 : MBB(MBB), ZeroData(ZeroData) {
3170 MF = MBB->getParent();
3171 MRI = &MF->getRegInfo();
3172 }
3173 // Add an instruction to be replaced. Instructions must be added in the
3174 // ascending order of Offset, and have to be adjacent.
3175 void addInstruction(TagStoreInstr I) {
3176 assert((TagStores.empty() ||
3177 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
3178 "Non-adjacent tag store instructions.");
3179 TagStores.push_back(I);
3180 }
3181 void clear() { TagStores.clear(); }
3182 // Emit equivalent code at the given location, and erase the current set of
3183 // instructions. May skip if the replacement is not profitable. May invalidate
3184 // the input iterator and replace it with a valid one.
3185 void emitCode(MachineBasicBlock::iterator &InsertI,
3186 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
3187};
3188
3189void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
3190 const AArch64InstrInfo *TII =
3191 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3192
3193 const int64_t kMinOffset = -256 * 16;
3194 const int64_t kMaxOffset = 255 * 16;
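 // STG/ST2G take a signed 9-bit immediate scaled by 16 bytes (granule
 // offsets -256..255), which gives the [-4096, 4080] byte range above.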
3195
3196 Register BaseReg = FrameReg;
3197 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
3198 if (BaseRegOffsetBytes < kMinOffset ||
3199 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
3200 // BaseReg can be FP, which is not necessarily aligned to 16-bytes. In
3201 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
3202 // is required for the offset of ST2G.
3203 BaseRegOffsetBytes % 16 != 0) {
3204 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3205 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
3206 StackOffset::getFixed(BaseRegOffsetBytes), TII);
3207 BaseReg = ScratchReg;
3208 BaseRegOffsetBytes = 0;
3209 }
3210
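 // Emit one ST2G per remaining 32-byte chunk and a single STG for a trailing
 // 16 bytes; for example, Size == 80 expands to ST2G at byte offsets 0 and 32
 // followed by STG at byte offset 64.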
3211 MachineInstr *LastI = nullptr;
3212 while (Size) {
3213 int64_t InstrSize = (Size > 16) ? 32 : 16;
3214 unsigned Opcode =
3215 InstrSize == 16
3216 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
3217 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
3218 assert(BaseRegOffsetBytes % 16 == 0);
3219 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
3220 .addReg(AArch64::SP)
3221 .addReg(BaseReg)
3222 .addImm(BaseRegOffsetBytes / 16)
3223 .setMemRefs(CombinedMemRefs);
3224 // A store to [BaseReg, #0] should go last for an opportunity to fold the
3225 // final SP adjustment in the epilogue.
3226 if (BaseRegOffsetBytes == 0)
3227 LastI = I;
3228 BaseRegOffsetBytes += InstrSize;
3229 Size -= InstrSize;
3230 }
3231
3232 if (LastI)
3233 MBB->splice(InsertI, MBB, LastI);
3234}
3235
3236void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
3237 const AArch64InstrInfo *TII =
3238 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3239
3240 Register BaseReg = FrameRegUpdate
3241 ? FrameReg
3242 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3243 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3244
3245 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
3246
3247 int64_t LoopSize = Size;
3248 // If the loop size is not a multiple of 32, split off one 16-byte store at
3249 // the end to fold BaseReg update into.
3250 if (FrameRegUpdate && *FrameRegUpdate)
3251 LoopSize -= LoopSize % 32;
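 // For example, Size == 112 with a pending non-zero base register update
 // gives LoopSize == 96; the remaining 16 bytes are then tagged by the
 // STGPostIndex emitted below, which also folds in the pointer update.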
3252 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
3253 TII->get(ZeroData ? AArch64::STZGloop_wback
3254 : AArch64::STGloop_wback))
3255 .addDef(SizeReg)
3256 .addDef(BaseReg)
3257 .addImm(LoopSize)
3258 .addReg(BaseReg)
3259 .setMemRefs(CombinedMemRefs);
3260 if (FrameRegUpdate)
3261 LoopI->setFlags(FrameRegUpdateFlags);
3262
3263 int64_t ExtraBaseRegUpdate =
3264 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
3265 LLVM_DEBUG(dbgs() << "TagStoreEdit::emitLoop: LoopSize=" << LoopSize
3266 << ", Size=" << Size
3267 << ", ExtraBaseRegUpdate=" << ExtraBaseRegUpdate
3268 << ", FrameRegUpdate=" << FrameRegUpdate
3269 << ", FrameRegOffset.getFixed()="
3270 << FrameRegOffset.getFixed() << "\n");
3271 if (LoopSize < Size) {
3272 assert(FrameRegUpdate);
3273 assert(Size - LoopSize == 16);
3274 // Tag 16 more bytes at BaseReg and update BaseReg.
3275 int64_t STGOffset = ExtraBaseRegUpdate + 16;
3276 assert(STGOffset % 16 == 0 && STGOffset >= -4096 && STGOffset <= 4080 &&
3277 "STG immediate out of range");
3278 BuildMI(*MBB, InsertI, DL,
3279 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
3280 .addDef(BaseReg)
3281 .addReg(BaseReg)
3282 .addReg(BaseReg)
3283 .addImm(STGOffset / 16)
3284 .setMemRefs(CombinedMemRefs)
3285 .setMIFlags(FrameRegUpdateFlags);
3286 } else if (ExtraBaseRegUpdate) {
3287 // Update BaseReg.
3288 int64_t AddSubOffset = std::abs(ExtraBaseRegUpdate);
3289 assert(AddSubOffset <= 4095 && "ADD/SUB immediate out of range");
3290 BuildMI(
3291 *MBB, InsertI, DL,
3292 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
3293 .addDef(BaseReg)
3294 .addReg(BaseReg)
3295 .addImm(AddSubOffset)
3296 .addImm(0)
3297 .setMIFlags(FrameRegUpdateFlags);
3298 }
3299}
3300
3301// Check if *II is a register update that can be merged into STGloop that ends
3302// at (Reg + Size). RemainingOffset is the required adjustment to Reg after the
3303// end of the loop.
3304bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
3305 int64_t Size, int64_t *TotalOffset) {
3306 MachineInstr &MI = *II;
3307 if ((MI.getOpcode() == AArch64::ADDXri ||
3308 MI.getOpcode() == AArch64::SUBXri) &&
3309 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
3310 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
3311 int64_t Offset = MI.getOperand(2).getImm() << Shift;
3312 if (MI.getOpcode() == AArch64::SUBXri)
3313 Offset = -Offset;
3314 int64_t PostOffset = Offset - Size;
3315 // TagStoreEdit::emitLoop might emit either an ADD/SUB after the loop, or
3316 // an STGPostIndex which does the last 16 bytes of tag write. Which one is
3317 // chosen depends on the alignment of the loop size, but the difference
3318 // between the valid ranges for the two instructions is small, so we
3319 // conservatively assume that it could be either case here.
3320 //
3321 // Max offset of STGPostIndex, minus the 16 byte tag write folded into that
3322 // instruction.
3323 const int64_t kMaxOffset = 4080 - 16;
3324 // Max offset of SUBXri.
3325 const int64_t kMinOffset = -4095;
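 // For example, a loop that ends at Reg + 480 followed by "ADD Reg, Reg, #496"
 // gives PostOffset == 16, which is 16-byte aligned and within
 // [kMinOffset, kMaxOffset], so the update can be merged.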
3326 if (PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
3327 PostOffset % 16 == 0) {
3328 *TotalOffset = Offset;
3329 return true;
3330 }
3331 }
3332 return false;
3333}
3334
3335void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
3336 SmallVectorImpl<MachineMemOperand *> &MemRefs) {
3337 MemRefs.clear();
3338 for (auto &TS : TSE) {
3339 MachineInstr *MI = TS.MI;
3340 // An instruction without memory operands may access anything. Be
3341 // conservative and return an empty list.
3342 if (MI->memoperands_empty()) {
3343 MemRefs.clear();
3344 return;
3345 }
3346 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
3347 }
3348}
3349
3350void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
3351 const AArch64FrameLowering *TFI,
3352 bool TryMergeSPUpdate) {
3353 if (TagStores.empty())
3354 return;
3355 TagStoreInstr &FirstTagStore = TagStores[0];
3356 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
3357 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
3358 DL = TagStores[0].MI->getDebugLoc();
3359
3360 Register Reg;
3361 FrameRegOffset = TFI->resolveFrameOffsetReference(
3362 *MF, FirstTagStore.Offset, false /*isFixed*/, false /*isSVE*/, Reg,
3363 /*PreferFP=*/false, /*ForSimm=*/true);
3364 FrameReg = Reg;
3365 FrameRegUpdate = std::nullopt;
3366
3367 mergeMemRefs(TagStores, CombinedMemRefs);
3368
3369 LLVM_DEBUG({
3370 dbgs() << "Replacing adjacent STG instructions:\n";
3371 for (const auto &Instr : TagStores) {
3372 dbgs() << " " << *Instr.MI;
3373 }
3374 });
3375
3376 // Size threshold where a loop becomes shorter than a linear sequence of
3377 // tagging instructions.
3378 const int kSetTagLoopThreshold = 176;
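 // 176 bytes is 11 tag granules, i.e. roughly six ST2G/STG instructions when
 // unrolled; the threshold is a heuristic for when the loop form is expected
 // to be at least as compact.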
3379 if (Size < kSetTagLoopThreshold) {
3380 if (TagStores.size() < 2)
3381 return;
3382 emitUnrolled(InsertI);
3383 } else {
3384 MachineInstr *UpdateInstr = nullptr;
3385 int64_t TotalOffset = 0;
3386 if (TryMergeSPUpdate) {
3387 // See if we can merge base register update into the STGloop.
3388 // This is done in AArch64LoadStoreOptimizer for "normal" stores,
3389 // but STGloop is way too unusual for that, and also it only
3390 // realistically happens in function epilogue. Also, STGloop is expanded
3391 // before that pass.
3392 if (InsertI != MBB->end() &&
3393 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
3394 &TotalOffset)) {
3395 UpdateInstr = &*InsertI++;
3396 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
3397 << *UpdateInstr);
3398 }
3399 }
3400
3401 if (!UpdateInstr && TagStores.size() < 2)
3402 return;
3403
3404 if (UpdateInstr) {
3405 FrameRegUpdate = TotalOffset;
3406 FrameRegUpdateFlags = UpdateInstr->getFlags();
3407 }
3408 emitLoop(InsertI);
3409 if (UpdateInstr)
3410 UpdateInstr->eraseFromParent();
3411 }
3412
3413 for (auto &TS : TagStores)
3414 TS.MI->eraseFromParent();
3415}
3416
3417bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
3418 int64_t &Size, bool &ZeroData) {
3419 MachineFunction &MF = *MI.getParent()->getParent();
3420 const MachineFrameInfo &MFI = MF.getFrameInfo();
3421
3422 unsigned Opcode = MI.getOpcode();
3423 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
3424 Opcode == AArch64::STZ2Gi);
3425
3426 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
3427 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
3428 return false;
3429 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
3430 return false;
3431 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
3432 Size = MI.getOperand(2).getImm();
3433 return true;
3434 }
3435
3436 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
3437 Size = 16;
3438 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
3439 Size = 32;
3440 else
3441 return false;
3442
3443 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
3444 return false;
3445
3446 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
3447 16 * MI.getOperand(2).getImm();
3448 return true;
3449}
3450
3451// Detect a run of memory tagging instructions for adjacent stack frame slots,
3452// and replace them with a shorter instruction sequence:
3453// * replace STG + STG with ST2G
3454// * replace STGloop + STGloop with STGloop
3455// This code needs to run when stack slot offsets are already known, but before
3456// FrameIndex operands in STG instructions are eliminated.
3457 MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
3458 const AArch64FrameLowering *TFI,
3459 RegScavenger *RS) {
3460 bool FirstZeroData;
3461 int64_t Size, Offset;
3462 MachineInstr &MI = *II;
3463 MachineBasicBlock *MBB = MI.getParent();
3464 MachineBasicBlock::iterator NextI = ++II;
3465 if (&MI == &MBB->instr_back())
3466 return II;
3467 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
3468 return II;
3469
3470 SmallVector<TagStoreInstr, 8> Instrs;
3471 Instrs.emplace_back(&MI, Offset, Size);
3472
3473 constexpr int kScanLimit = 10;
3474 int Count = 0;
3475 for (MachineBasicBlock::iterator E = MBB->end();
3476 NextI != E && Count < kScanLimit; ++NextI) {
3477 MachineInstr &MI = *NextI;
3478 bool ZeroData;
3479 int64_t Size, Offset;
3480 // Collect instructions that update memory tags with a FrameIndex operand
3481 // and (when applicable) constant size, and whose output registers are dead
3482 // (the latter is almost always the case in practice). Since these
3483 // instructions effectively have no inputs or outputs, we are free to skip
3484 // any non-aliasing instructions in between without tracking used registers.
3485 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
3486 if (ZeroData != FirstZeroData)
3487 break;
3488 Instrs.emplace_back(&MI, Offset, Size);
3489 continue;
3490 }
3491
3492 // Only count non-transient, non-tagging instructions toward the scan
3493 // limit.
3494 if (!MI.isTransient())
3495 ++Count;
3496
3497 // Just in case, stop before the epilogue code starts.
3498 if (MI.getFlag(MachineInstr::FrameSetup) ||
3499 MI.getFlag(MachineInstr::FrameDestroy))
3500 break;
3501
3502 // Reject anything that may alias the collected instructions.
3503 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects() || MI.isCall())
3504 break;
3505 }
3506
3507 // New code will be inserted after the last tagging instruction we've found.
3508 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
3509
3510 // All the gathered stack tagging instructions are merged and placed after
3511 // the last tag store in the list. Before inserting, check whether the NZCV
3512 // flag is live at that point; otherwise it could be clobbered by any STG
3513 // loops that are emitted.
3514 
3515 // FIXME: This bail-out is conservative: the liveness check is performed
3516 // even when no STG loops remain after merging the insert list, where it
3517 // would not be needed.
3518 LivePhysRegs LiveRegs(*(MBB->getParent()->getSubtarget().getRegisterInfo()));
3519 LiveRegs.addLiveOuts(*MBB);
3520 for (auto I = MBB->rbegin();; ++I) {
3521 MachineInstr &MI = *I;
3522 if (MI == InsertI)
3523 break;
3524 LiveRegs.stepBackward(*I);
3525 }
3526 InsertI++;
3527 if (LiveRegs.contains(AArch64::NZCV))
3528 return InsertI;
3529
3530 llvm::stable_sort(Instrs,
3531 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
3532 return Left.Offset < Right.Offset;
3533 });
3534
3535 // Make sure that we don't have any overlapping stores.
3536 int64_t CurOffset = Instrs[0].Offset;
3537 for (auto &Instr : Instrs) {
3538 if (CurOffset > Instr.Offset)
3539 return NextI;
3540 CurOffset = Instr.Offset + Instr.Size;
3541 }
3542
3543 // Find contiguous runs of tagged memory and emit shorter instruction
3544 // sequences for them when possible.
3545 TagStoreEdit TSE(MBB, FirstZeroData);
3546 std::optional<int64_t> EndOffset;
3547 for (auto &Instr : Instrs) {
3548 if (EndOffset && *EndOffset != Instr.Offset) {
3549 // Found a gap.
3550 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
3551 TSE.clear();
3552 }
3553
3554 TSE.addInstruction(Instr);
3555 EndOffset = Instr.Offset + Instr.Size;
3556 }
3557
3558 const MachineFunction *MF = MBB->getParent();
3559 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
3560 TSE.emitCode(
3561 InsertI, TFI, /*TryMergeSPUpdate = */
3562 !MF->getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(*MF));
3563
3564 return InsertI;
3565}
3566} // namespace
3567
3568 void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
3569 MachineFunction &MF, RegScavenger *RS = nullptr) const {
3570 for (auto &BB : MF)
3571 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
3573 II = tryMergeAdjacentSTG(II, this, RS);
3574 }
3575
3576 // By the time this method is called, most of the prologue/epilogue code is
3577 // already emitted, whether its location was affected by the shrink-wrapping
3578 // optimization or not.
3579 if (!MF.getFunction().hasFnAttribute(Attribute::Naked) &&
3580 shouldSignReturnAddressEverywhere(MF))
3581 emitPacRetPlusLeafHardening(MF);
3582}
3583
3584/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
3585/// before the update. This is easily retrieved as it is exactly the offset
3586/// that is set in processFunctionBeforeFrameFinalized.
3588 const MachineFunction &MF, int FI, Register &FrameReg,
3589 bool IgnoreSPUpdates) const {
3590 const MachineFrameInfo &MFI = MF.getFrameInfo();
3591 if (IgnoreSPUpdates) {
3592 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
3593 << MFI.getObjectOffset(FI) << "\n");
3594 FrameReg = AArch64::SP;
3595 return StackOffset::getFixed(MFI.getObjectOffset(FI));
3596 }
3597
3598 // Go to common code if we cannot provide sp + offset.
3599 if (MFI.hasVarSizedObjects() ||
3602 return getFrameIndexReference(MF, FI, FrameReg);
3603
3604 FrameReg = AArch64::SP;
3605 return getStackOffset(MF, MFI.getObjectOffset(FI));
3606}
3607
3608/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
3609/// the parent's frame pointer
3611 const MachineFunction &MF) const {
3612 return 0;
3613}
3614
3615/// Funclets only need to account for space for the callee saved registers,
3616/// as the locals are accounted for in the parent's stack frame.
3618 const MachineFunction &MF) const {
3619 // This is the size of the pushed CSRs.
3620 unsigned CSSize =
3621 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
3622 // This is the amount of stack a funclet needs to allocate.
3623 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
3624 getStackAlign());
3625}
3626
3627namespace {
3628struct FrameObject {
3629 bool IsValid = false;
3630 // Index of the object in MFI.
3631 int ObjectIndex = 0;
3632 // Group ID this object belongs to.
3633 int GroupIndex = -1;
3634 // This object should be placed first (closest to SP).
3635 bool ObjectFirst = false;
3636 // This object's group (which always contains the object with
3637 // ObjectFirst==true) should be placed first.
3638 bool GroupFirst = false;
3639
3640 // Used to distinguish between FP and GPR accesses. The values are decided so
3641 // that they sort FPR < Hazard < GPR and they can be or'd together.
3642 unsigned Accesses = 0;
3643 enum { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };
3644};
3645
3646class GroupBuilder {
3647 SmallVector<int, 8> CurrentMembers;
3648 int NextGroupIndex = 0;
3649 std::vector<FrameObject> &Objects;
3650
3651public:
3652 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
3653 void AddMember(int Index) { CurrentMembers.push_back(Index); }
3654 void EndCurrentGroup() {
3655 if (CurrentMembers.size() > 1) {
3656 // Create a new group with the current member list. This might remove them
3657 // from their pre-existing groups. That's OK, dealing with overlapping
3658 // groups is too hard and unlikely to make a difference.
3659 LLVM_DEBUG(dbgs() << "group:");
3660 for (int Index : CurrentMembers) {
3661 Objects[Index].GroupIndex = NextGroupIndex;
3662 LLVM_DEBUG(dbgs() << " " << Index);
3663 }
3664 LLVM_DEBUG(dbgs() << "\n");
3665 NextGroupIndex++;
3666 }
3667 CurrentMembers.clear();
3668 }
3669};
3670
3671bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
3672 // Objects at a lower index are closer to FP; objects at a higher index are
3673 // closer to SP.
3674 //
3675 // For consistency in our comparison, all invalid objects are placed
3676 // at the end. This also allows us to stop walking when we hit the
3677 // first invalid item after it's all sorted.
3678 //
3679 // If we want to include a stack hazard region, order FPR accesses < the
3680 // hazard object < GPRs accesses in order to create a separation between the
3681 // two. For the Accesses field 1 = FPR, 2 = Hazard Object, 4 = GPR.
3682 //
3683 // Otherwise the "first" object goes first (closest to SP), followed by the
3684 // members of the "first" group.
3685 //
3686 // The rest are sorted by the group index to keep the groups together.
3687 // Higher numbered groups are more likely to be around longer (i.e. untagged
3688 // in the function epilogue and not at some earlier point). Place them closer
3689 // to SP.
3690 //
3691 // If all else equal, sort by the object index to keep the objects in the
3692 // original order.
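 // For example, with a hazard slot present, an FPR-only slot (Accesses == 1)
 // sorts before the hazard slot (2), which sorts before GPR slots (4), so FPR
 // locals end up nearer FP and GPR locals nearer SP, separated by the padding.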
3693 return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
3694 A.GroupIndex, A.ObjectIndex) <
3695 std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
3696 B.GroupIndex, B.ObjectIndex);
3697}
3698} // namespace
3699
3700 void AArch64FrameLowering::orderFrameObjects(
3701 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
3702 if (!OrderFrameObjects || ObjectsToAllocate.empty())
3703 return;
3705 const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
3704
3706 const MachineFrameInfo &MFI = MF.getFrameInfo();
3707 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
3708 for (auto &Obj : ObjectsToAllocate) {
3709 FrameObjects[Obj].IsValid = true;
3710 FrameObjects[Obj].ObjectIndex = Obj;
3711 }
3712
3713 // Identify FPR vs GPR slots for hazards, and stack slots that are tagged at
3714 // the same time.
3715 GroupBuilder GB(FrameObjects);
3716 for (auto &MBB : MF) {
3717 for (auto &MI : MBB) {
3718 if (MI.isDebugInstr())
3719 continue;
3720
3721 if (AFI.hasStackHazardSlotIndex()) {
3722 std::optional<int> FI = getLdStFrameID(MI, MFI);
3723 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
3724 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
3725 AArch64InstrInfo::isFpOrNEON(MI))
3726 FrameObjects[*FI].Accesses |= FrameObject::AccessFPR;
3727 else
3728 FrameObjects[*FI].Accesses |= FrameObject::AccessGPR;
3729 }
3730 }
3731
3732 int OpIndex;
3733 switch (MI.getOpcode()) {
3734 case AArch64::STGloop:
3735 case AArch64::STZGloop:
3736 OpIndex = 3;
3737 break;
3738 case AArch64::STGi:
3739 case AArch64::STZGi:
3740 case AArch64::ST2Gi:
3741 case AArch64::STZ2Gi:
3742 OpIndex = 1;
3743 break;
3744 default:
3745 OpIndex = -1;
3746 }
3747
3748 int TaggedFI = -1;
3749 if (OpIndex >= 0) {
3750 const MachineOperand &MO = MI.getOperand(OpIndex);
3751 if (MO.isFI()) {
3752 int FI = MO.getIndex();
3753 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
3754 FrameObjects[FI].IsValid)
3755 TaggedFI = FI;
3756 }
3757 }
3758
3759 // If this is a stack tagging instruction for a slot that is not part of a
3760 // group yet, either start a new group or add it to the current one.
3761 if (TaggedFI >= 0)
3762 GB.AddMember(TaggedFI);
3763 else
3764 GB.EndCurrentGroup();
3765 }
3766 // Groups should never span multiple basic blocks.
3767 GB.EndCurrentGroup();
3768 }
3769
3770 if (AFI.hasStackHazardSlotIndex()) {
3771 FrameObjects[AFI.getStackHazardSlotIndex()].Accesses =
3772 FrameObject::AccessHazard;
3773 // If a stack object is unknown or both GPR and FPR, sort it into GPR.
3774 for (auto &Obj : FrameObjects)
3775 if (!Obj.Accesses ||
3776 Obj.Accesses == (FrameObject::AccessGPR | FrameObject::AccessFPR))
3777 Obj.Accesses = FrameObject::AccessGPR;
3778 }
3779
3780 // If the function's tagged base pointer is pinned to a stack slot, we want to
3781 // put that slot first when possible. This will likely place it at SP + 0,
3782 // and save one instruction when generating the base pointer because IRG does
3783 // not allow an immediate offset.
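 // For example, with the slot at SP + 0 the tagged base pointer can be
 // materialized directly as "irg x0, sp"; at any other offset a separate ADD
 // would be needed first to form the address (x0 here is just illustrative).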
3784 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
3785 if (TBPI) {
3786 FrameObjects[*TBPI].ObjectFirst = true;
3787 FrameObjects[*TBPI].GroupFirst = true;
3788 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
3789 if (FirstGroupIndex >= 0)
3790 for (FrameObject &Object : FrameObjects)
3791 if (Object.GroupIndex == FirstGroupIndex)
3792 Object.GroupFirst = true;
3793 }
3794
3795 llvm::stable_sort(FrameObjects, FrameObjectCompare);
3796
3797 int i = 0;
3798 for (auto &Obj : FrameObjects) {
3799 // All invalid items are sorted at the end, so it's safe to stop.
3800 if (!Obj.IsValid)
3801 break;
3802 ObjectsToAllocate[i++] = Obj.ObjectIndex;
3803 }
3804
3805 LLVM_DEBUG({
3806 dbgs() << "Final frame order:\n";
3807 for (auto &Obj : FrameObjects) {
3808 if (!Obj.IsValid)
3809 break;
3810 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
3811 if (Obj.ObjectFirst)
3812 dbgs() << ", first";
3813 if (Obj.GroupFirst)
3814 dbgs() << ", group-first";
3815 dbgs() << "\n";
3816 }
3817 });
3818}
3819
3820/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
3821/// least every ProbeSize bytes. Returns an iterator of the first instruction
3822/// after the loop. The difference between SP and TargetReg must be an exact
3823/// multiple of ProbeSize.
3824 MachineBasicBlock::iterator
3825 AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
3826 MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
3827 Register TargetReg) const {
3828 MachineBasicBlock &MBB = *MBBI->getParent();
3829 MachineFunction &MF = *MBB.getParent();
3830 const AArch64InstrInfo *TII =
3831 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3832 DebugLoc DL = MBB.findDebugLoc(MBBI);
3833
3834 MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
3835 MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
3836 MF.insert(MBBInsertPoint, LoopMBB);
3837 MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
3838 MF.insert(MBBInsertPoint, ExitMBB);
3839
3840 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
3841 // in SUB).
3842 emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
3843 StackOffset::getFixed(-ProbeSize), TII,
3844 MachineInstr::FrameSetup);
3845 // STR XZR, [SP]
3846 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::STRXui))
3847 .addReg(AArch64::XZR)
3848 .addReg(AArch64::SP)
3849 .addImm(0)
3850 .setMIFlags(MachineInstr::FrameSetup);
3851 // CMP SP, TargetReg
3852 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
3853 AArch64::XZR)
3854 .addReg(AArch64::SP)
3855 .addReg(TargetReg)
3856 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
3857 .setMIFlags(MachineInstr::FrameSetup);
3858 // B.CC Loop
3859 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
3860 .addImm(AArch64CC::NE)
3861 .addMBB(LoopMBB)
3862 .setMIFlags(MachineInstr::FrameSetup);
3863
3864 LoopMBB->addSuccessor(ExitMBB);
3865 LoopMBB->addSuccessor(LoopMBB);
3866 // Synthesize the exit MBB.
3867 ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
3868 ExitMBB->transferSuccessorsAndUpdatePHIs(&MBB);
3869 MBB.addSuccessor(LoopMBB);
3870 // Update liveins.
3871 fullyRecomputeLiveIns({ExitMBB, LoopMBB});
3872
3873 return ExitMBB->begin();
3874}
3875
3876void AArch64FrameLowering::inlineStackProbeFixed(
3877 MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
3878 StackOffset CFAOffset) const {
3879 MachineBasicBlock *MBB = MBBI->getParent();
3880 MachineFunction &MF = *MBB->getParent();
3881 const AArch64InstrInfo *TII =
3882 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3883 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3884 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
3885 bool HasFP = hasFP(MF);
3886
3887 DebugLoc DL;
3888 int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
3889 int64_t NumBlocks = FrameSize / ProbeSize;
3890 int64_t ResidualSize = FrameSize % ProbeSize;
3891
3892 LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
3893 << NumBlocks << " blocks of " << ProbeSize
3894 << " bytes, plus " << ResidualSize << " bytes\n");
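 // For instance, with a 4096-byte probe size, FrameSize == 10000 gives
 // NumBlocks == 2 and ResidualSize == 1808.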
3895
3896 // Decrement SP by NumBlocks * ProbeSize bytes, using either an unrolled
3897 // sequence or an ordinary loop.
3898 if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
3899 for (int i = 0; i < NumBlocks; ++i) {
3900 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
3901 // encodable in a SUB).
3902 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
3903 StackOffset::getFixed(-ProbeSize), TII,
3904 MachineInstr::FrameSetup, false, false, nullptr,
3905 EmitAsyncCFI && !HasFP, CFAOffset);
3906 CFAOffset += StackOffset::getFixed(ProbeSize);
3907 // STR XZR, [SP]
3908 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
3909 .addReg(AArch64::XZR)
3910 .addReg(AArch64::SP)
3911 .addImm(0)
3912 .setMIFlag(MachineInstr::FrameSetup);
3913 }
3914 } else if (NumBlocks != 0) {
3915 // SUB ScratchReg, SP, #FrameSize (or equivalent if FrameSize is not
3916 // encodable in ADD). ScratchReg may temporarily become the CFA register.
3917 emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
3918 StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
3919 MachineInstr::FrameSetup, false, false, nullptr,
3920 EmitAsyncCFI && !HasFP, CFAOffset);
3921 CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
3922 MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
3923 MBB = MBBI->getParent();
3924 if (EmitAsyncCFI && !HasFP) {
3925 // Set the CFA register back to SP.
3926 CFIInstBuilder(*MBB, MBBI, MachineInstr::FrameSetup)
3927 .buildDefCFARegister(AArch64::SP);
3928 }
3929 }
3930
3931 if (ResidualSize != 0) {
3932 // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
3933 // in SUB).
3934 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
3935 StackOffset::getFixed(-ResidualSize), TII,
3936 MachineInstr::FrameSetup, false, false, nullptr,
3937 EmitAsyncCFI && !HasFP, CFAOffset);
3938 if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
3939 // STR XZR, [SP]
3940 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
3941 .addReg(AArch64::XZR)
3942 .addReg(AArch64::SP)
3943 .addImm(0)
3944 .setMIFlag(MachineInstr::FrameSetup);
3945 }
3946 }
3947}
3948
3949void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
3950 MachineBasicBlock &MBB) const {
3951 // Get the instructions that need to be replaced. We emit at most two of
3952 // these. Remember them in order to avoid complications coming from the need
3953 // to traverse the block while potentially creating more blocks.
3954 SmallVector<MachineInstr *, 4> ToReplace;
3955 for (MachineInstr &MI : MBB)
3956 if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
3957 MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
3958 ToReplace.push_back(&MI);
3959
3960 for (MachineInstr *MI : ToReplace) {
3961 if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
3962 Register ScratchReg = MI->getOperand(0).getReg();
3963 int64_t FrameSize = MI->getOperand(1).getImm();
3964 StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
3965 MI->getOperand(3).getImm());
3966 inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
3967 CFAOffset);
3968 } else {
3969 assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
3970 "Stack probe pseudo-instruction expected");
3971 const AArch64InstrInfo *TII =
3972 MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
3973 Register TargetReg = MI->getOperand(0).getReg();
3974 (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
3975 }
3976 MI->eraseFromParent();
3977 }
3978}
3979
3980 struct StackAccess {
3981 enum AccessType {
3982 NotAccessed = 0, // Stack object not accessed by load/store instructions.
3983 GPR = 1 << 0, // A general purpose register.
3984 PPR = 1 << 1, // A predicate register.
3985 FPR = 1 << 2, // A floating point/Neon/SVE register.
3986 };
3987
3988 int Idx;
3989 StackOffset Offset;
3990 int64_t Size;
3991 unsigned AccessTypes;
3992
3994
3995 bool operator<(const StackAccess &Rhs) const {
3996 return std::make_tuple(start(), Idx) <
3997 std::make_tuple(Rhs.start(), Rhs.Idx);
3998 }
3999
4000 bool isCPU() const {
4001 // Predicate register load and store instructions execute on the CPU.
4002 return AccessTypes & (AccessType::GPR | AccessType::PPR);
4003 }
4004 bool isSME() const { return AccessTypes & AccessType::FPR; }
4005 bool isMixed() const { return isCPU() && isSME(); }
4006
4007 int64_t start() const { return Offset.getFixed() + Offset.getScalable(); }
4008 int64_t end() const { return start() + Size; }
4009
4010 std::string getTypeString() const {
4011 switch (AccessTypes) {
4012 case AccessType::FPR:
4013 return "FPR";
4014 case AccessType::PPR:
4015 return "PPR";
4016 case AccessType::GPR:
4017 return "GPR";
4018 case AccessType::NotAccessed:
4019 return "NA";
4020 default:
4021 return "Mixed";
4022 }
4023 }
4024
4025 void print(raw_ostream &OS) const {
4026 OS << getTypeString() << " stack object at [SP"
4027 << (Offset.getFixed() < 0 ? "" : "+") << Offset.getFixed();
4028 if (Offset.getScalable())
4029 OS << (Offset.getScalable() < 0 ? "" : "+") << Offset.getScalable()
4030 << " * vscale";
4031 OS << "]";
4032 }
4033};
4034
4035static inline raw_ostream &operator<<(raw_ostream &OS, const StackAccess &SA) {
4036 SA.print(OS);
4037 return OS;
4038}
4039
4040void AArch64FrameLowering::emitRemarks(
4041 const MachineFunction &MF, MachineOptimizationRemarkEmitter *ORE) const {
4042
4043 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
4044 if (AFI->getSMEFnAttrs().hasNonStreamingInterfaceAndBody())
4045 return;
4046
4047 unsigned StackHazardSize = getStackHazardSize(MF);
4048 const uint64_t HazardSize =
4049 (StackHazardSize) ? StackHazardSize : StackHazardRemarkSize;
4050
4051 if (HazardSize == 0)
4052 return;
4053
4054 const MachineFrameInfo &MFI = MF.getFrameInfo();
4055 // Bail if function has no stack objects.
4056 if (!MFI.hasStackObjects())
4057 return;
4058
4059 std::vector<StackAccess> StackAccesses(MFI.getNumObjects());
4060
4061 size_t NumFPLdSt = 0;
4062 size_t NumNonFPLdSt = 0;
4063
4064 // Collect stack accesses via Load/Store instructions.
4065 for (const MachineBasicBlock &MBB : MF) {
4066 for (const MachineInstr &MI : MBB) {
4067 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
4068 continue;
4069 for (MachineMemOperand *MMO : MI.memoperands()) {
4070 std::optional<int> FI = getMMOFrameID(MMO, MFI);
4071 if (FI && !MFI.isDeadObjectIndex(*FI)) {
4072 int FrameIdx = *FI;
4073
4074 size_t ArrIdx = FrameIdx + MFI.getNumFixedObjects();
4075 if (StackAccesses[ArrIdx].AccessTypes == StackAccess::NotAccessed) {
4076 StackAccesses[ArrIdx].Idx = FrameIdx;
4077 StackAccesses[ArrIdx].Offset =
4078 getFrameIndexReferenceFromSP(MF, FrameIdx);
4079 StackAccesses[ArrIdx].Size = MFI.getObjectSize(FrameIdx);
4080 }
4081
4082 unsigned RegTy = StackAccess::AccessType::GPR;
4083 if (MFI.getStackID(FrameIdx) == TargetStackID::ScalableVector) {
4084 // SPILL_PPR_TO_ZPR_SLOT_PSEUDO and FILL_PPR_FROM_ZPR_SLOT_PSEUDO
4085 // spill/fill the predicate as a data vector (so are an FPR access).
4086 if (MI.getOpcode() != AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO &&
4087 MI.getOpcode() != AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO &&
4088 AArch64::PPRRegClass.contains(MI.getOperand(0).getReg())) {
4089 RegTy = StackAccess::PPR;
4090 } else
4091 RegTy = StackAccess::FPR;
4092 } else if (AArch64InstrInfo::isFpOrNEON(MI)) {
4093 RegTy = StackAccess::FPR;
4094 }
4095
4096 StackAccesses[ArrIdx].AccessTypes |= RegTy;
4097
4098 if (RegTy == StackAccess::FPR)
4099 ++NumFPLdSt;
4100 else
4101 ++NumNonFPLdSt;
4102 }
4103 }
4104 }
4105 }
4106
4107 if (NumFPLdSt == 0 || NumNonFPLdSt == 0)
4108 return;
4109
4110 llvm::sort(StackAccesses);
4111 llvm::erase_if(StackAccesses, [](const StackAccess &S) {
4112 return S.AccessTypes == StackAccess::NotAccessed;
4113 });
4114
4115 SmallVector<const StackAccess *> MixedObjects;
4116 SmallVector<std::pair<const StackAccess *, const StackAccess *>> HazardPairs;
4117
4118 if (StackAccesses.front().isMixed())
4119 MixedObjects.push_back(&StackAccesses.front());
4120
4121 for (auto It = StackAccesses.begin(), End = std::prev(StackAccesses.end());
4122 It != End; ++It) {
4123 const auto &First = *It;
4124 const auto &Second = *(It + 1);
4125
4126 if (Second.isMixed())
4127 MixedObjects.push_back(&Second);
4128
4129 if ((First.isSME() && Second.isCPU()) ||
4130 (First.isCPU() && Second.isSME())) {
4131 uint64_t Distance = static_cast<uint64_t>(Second.start() - First.end());
4132 if (Distance < HazardSize)
4133 HazardPairs.emplace_back(&First, &Second);
4134 }
4135 }
4136
4137 auto EmitRemark = [&](llvm::StringRef Str) {
4138 ORE->emit([&]() {
4139 auto R = MachineOptimizationRemarkAnalysis(
4140 "sme", "StackHazard", MF.getFunction().getSubprogram(), &MF.front());
4141 return R << formatv("stack hazard in '{0}': ", MF.getName()).str() << Str;
4142 });
4143 };
4144
4145 for (const auto &P : HazardPairs)
4146 EmitRemark(formatv("{0} is too close to {1}", *P.first, *P.second).str());
4147
4148 for (const auto *Obj : MixedObjects)
4149 EmitRemark(
4150 formatv("{0} accessed by both GP and FP instructions", *Obj).str());
4151}
unsigned const MachineRegisterInfo * MRI
static void getLiveRegsForEntryMBB(LivePhysRegs &LiveRegs, const MachineBasicBlock &MBB)
static Register tryScavengeRegister(LiveRegUnits const &UsedRegs, BitVector const &ScavengeableRegs, Register PreferredReg)
Attempts to scavenge a register from ScavengeableRegs given the used registers in UsedRegs.
static const unsigned DefaultSafeSPDisplacement
This is the biggest offset to the stack pointer we can encode in aarch64 instructions (without using ...
static bool isInPrologueOrEpilogue(const MachineInstr &MI)
static bool produceCompactUnwindFrame(const AArch64FrameLowering &, MachineFunction &MF)
static bool expandFillPPRFromZPRSlotPseudo(MachineBasicBlock &MBB, MachineInstr &MI, const TargetRegisterInfo &TRI, LiveRegUnits const &UsedRegs, ScavengeableRegs const &SR, MachineInstr *&LastPTrue, EmergencyStackSlots &SpillSlots)
Expands:
static cl::opt< bool > StackTaggingMergeSetTag("stack-tagging-merge-settag", cl::desc("merge settag instruction in function epilog"), cl::init(true), cl::Hidden)
bool enableMultiVectorSpillFill(const AArch64Subtarget &Subtarget, MachineFunction &MF)
static std::optional< int > getLdStFrameID(const MachineInstr &MI, const MachineFrameInfo &MFI)
static cl::opt< bool > StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming", cl::init(false), cl::Hidden)
void computeCalleeSaveRegisterPairs(const AArch64FrameLowering &AFL, MachineFunction &MF, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI, SmallVectorImpl< RegPairInfo > &RegPairs, bool NeedsFrameRecord)
static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex, bool AssignOffsets)
static cl::opt< bool > OrderFrameObjects("aarch64-order-frame-objects", cl::desc("sort stack allocations"), cl::init(true), cl::Hidden)
static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2, bool NeedsWinCFI, bool IsFirst, const TargetRegisterInfo *TRI)
static cl::opt< bool > DisableMultiVectorSpillFill("aarch64-disable-multivector-spill-fill", cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false), cl::Hidden)
static cl::opt< bool > EnableRedZone("aarch64-redzone", cl::desc("enable use of redzone on AArch64"), cl::init(false), cl::Hidden)
cl::opt< bool > EnableHomogeneousPrologEpilog("homogeneous-prolog-epilog", cl::Hidden, cl::desc("Emit homogeneous prologue and epilogue for the size " "optimization (default = off)"))
static bool expandSMEPPRToZPRSpillPseudos(MachineBasicBlock &MBB, const TargetRegisterInfo &TRI, ScavengeableRegs const &SR, EmergencyStackSlots &SpillSlots)
Expands all FILL_PPR_FROM_ZPR_SLOT_PSEUDO and SPILL_PPR_TO_ZPR_SLOT_PSEUDO operations within the Mach...
static bool isLikelyToHaveSVEStack(const AArch64FrameLowering &AFL, const MachineFunction &MF)
static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2, bool UsesWinAAPCS, bool NeedsWinCFI, bool NeedsFrameRecord, bool IsFirst, const TargetRegisterInfo *TRI)
Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
unsigned findFreePredicateReg(BitVector &SavedRegs)
static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg)
static void expandSpillPPRToZPRSlotPseudo(MachineBasicBlock &MBB, MachineInstr &MI, const TargetRegisterInfo &TRI, LiveRegUnits const &UsedRegs, ScavengeableRegs const &SR, EmergencyStackSlots &SpillSlots)
Expands:
static bool isTargetWindows(const MachineFunction &MF)
static unsigned estimateRSStackSizeLimit(MachineFunction &MF)
Look at each instruction that references stack frames and return the stack size limit beyond which so...
static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI, int &Min, int &Max)
returns true if there are any SVE callee saves.
static cl::opt< unsigned > StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0), cl::Hidden)
static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE)
static unsigned getStackHazardSize(const MachineFunction &MF)
static void propagateFrameFlags(MachineInstr &SourceMI, ArrayRef< MachineInstr * > MachineInstrs)
Propagates frame-setup/destroy flags from SourceMI to all instructions in MachineInstrs.
static std::optional< int > getMMOFrameID(MachineMemOperand *MMO, const MachineFrameInfo &MFI)
assert(UImm &&(UImm !=~static_cast< T >(0)) &&"Invalid immediate!")
This file contains the declaration of the AArch64PrologueEmitter and AArch64EpilogueEmitter classes,...
aarch64 promote const
static const int kSetTagLoopThreshold
MachineBasicBlock & MBB
MachineBasicBlock MachineBasicBlock::iterator DebugLoc DL
MachineBasicBlock MachineBasicBlock::iterator MBBI
This file contains the simple types necessary to represent the attributes associated with functions a...
#define CASE(ATTRNAME, AANAME,...)
static GCRegistry::Add< ErlangGC > A("erlang", "erlang-compatible garbage collector")
static GCRegistry::Add< CoreCLRGC > E("coreclr", "CoreCLR-compatible GC")
static GCRegistry::Add< OcamlGC > B("ocaml", "ocaml 3.10-compatible GC")
DXIL Forward Handle Accesses
const HexagonInstrInfo * TII
IRTranslator LLVM IR MI
static std::string getTypeString(Type *T)
Definition LLParser.cpp:67
This file implements the LivePhysRegs utility for tracking liveness of physical registers.
#define F(x, y, z)
Definition MD5.cpp:55
#define I(x, y, z)
Definition MD5.cpp:58
#define H(x, y, z)
Definition MD5.cpp:57
Register Reg
Register const TargetRegisterInfo * TRI
Promote Memory to Register
Definition Mem2Reg.cpp:110
uint64_t IntrinsicInst * II
#define P(N)
This file declares the machine register scavenger class.
unsigned OpIndex
static bool contains(SmallPtrSetImpl< ConstantExpr * > &Cache, ConstantExpr *Expr, Constant *C)
Definition Value.cpp:480
This file defines the make_scope_exit function, which executes user-defined cleanup logic at scope ex...
This file defines the SmallVector class.
#define LLVM_DEBUG(...)
Definition Debug.h:114
void processFunctionBeforeFrameIndicesReplaced(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameIndicesReplaced - This method is called immediately before MO_FrameIndex op...
MachineBasicBlock::iterator eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const override
This method is called during prolog/epilog code insertion to eliminate call frame setup and destroy p...
bool canUseAsPrologue(const MachineBasicBlock &MBB) const override
Check whether or not the given MBB can be used as a prologue for the target.
bool enableStackSlotScavenging(const MachineFunction &MF) const override
Returns true if the stack slot holes in the fixed and callee-save stack area should be used when allo...
bool spillCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
spillCalleeSavedRegisters - Issues instruction(s) to spill all callee saved registers and returns tru...
bool restoreCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, MutableArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
restoreCalleeSavedRegisters - Issues instruction(s) to restore all callee saved registers and returns...
bool enableFullCFIFixup(const MachineFunction &MF) const override
enableFullCFIFixup - Returns true if we may need to fix the unwind information such that it is accura...
StackOffset getFrameIndexReferenceFromSP(const MachineFunction &MF, int FI) const override
getFrameIndexReferenceFromSP - This method returns the offset from the stack pointer to the slot of t...
bool enableCFIFixup(const MachineFunction &MF) const override
Returns true if we may need to fix the unwind information for the function.
StackOffset getNonLocalFrameIndexReference(const MachineFunction &MF, int FI) const override
getNonLocalFrameIndexReference - This method returns the offset used to reference a frame index locat...
TargetStackID::Value getStackIDForScalableVectors() const override
Returns the StackID that scalable vectors should be associated with.
bool hasFPImpl(const MachineFunction &MF) const override
hasFPImpl - Return true if the specified function should have a dedicated frame pointer register.
void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override
emitProlog/emitEpilog - These methods insert prolog and epilog code into the function.
void resetCFIToInitialState(MachineBasicBlock &MBB) const override
Emit CFI instructions that recreate the state of the unwind information upon function entry.
bool hasReservedCallFrame(const MachineFunction &MF) const override
hasReservedCallFrame - Under normal circumstances, when a frame pointer is not required,...
bool canUseRedZone(const MachineFunction &MF) const
Can this function use the red zone for local allocations.
bool needsWinCFI(const MachineFunction &MF) const
bool isFPReserved(const MachineFunction &MF) const
Should the Frame Pointer be reserved for the current function?
void processFunctionBeforeFrameFinalized(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameFinalized - This method is called immediately before the specified function...
int getSEHFrameIndexOffset(const MachineFunction &MF, int FI) const
unsigned getWinEHFuncletFrameSize(const MachineFunction &MF) const
Funclets only need to account for space for the callee saved registers, as the locals are accounted f...
void orderFrameObjects(const MachineFunction &MF, SmallVectorImpl< int > &ObjectsToAllocate) const override
Order the symbols in the local stack frame.
void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override
void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS) const override
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
StackOffset getSVEStackSize(const MachineFunction &MF) const
Returns the size of the entire SVE stackframe (calleesaves + spills).
StackOffset getFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg) const override
getFrameIndexReference - Provide a base+offset reference to an FI slot for debug info.
StackOffset resolveFrameOffsetReference(const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE, Register &FrameReg, bool PreferFP, bool ForSimm) const
bool assignCalleeSavedSpillSlots(MachineFunction &MF, const TargetRegisterInfo *TRI, std::vector< CalleeSavedInfo > &CSI, unsigned &MinCSFrameIndex, unsigned &MaxCSFrameIndex) const override
assignCalleeSavedSpillSlots - Allows target to override spill slot assignment logic.
StackOffset getFrameIndexReferencePreferSP(const MachineFunction &MF, int FI, Register &FrameReg, bool IgnoreSPUpdates) const override
For Win64 AArch64 EH, the offset to the Unwind object is from the SP before the update.
StackOffset resolveFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP, bool ForSimm) const
unsigned getWinEHParentFrameOffset(const MachineFunction &MF) const override
The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve the parent's frame pointer...
bool requiresSaveVG(const MachineFunction &MF) const
void emitPacRetPlusLeafHardening(MachineFunction &MF) const
Harden the entire function with pac-ret.
AArch64FunctionInfo - This class is derived from MachineFunctionInfo and contains private AArch64-spe...
unsigned getCalleeSavedStackSize(const MachineFrameInfo &MFI) const
void setCalleeSaveBaseToFrameRecordOffset(int Offset)
bool shouldSignReturnAddress(const MachineFunction &MF) const
std::optional< int > getTaggedBasePointerIndex() const
bool needsDwarfUnwindInfo(const MachineFunction &MF) const
bool needsAsyncDwarfUnwindInfo(const MachineFunction &MF) const
void setMinMaxSVECSFrameIndex(int Min, int Max)
static bool isTailCallReturnInst(const MachineInstr &MI)
Returns true if MI is one of the TCRETURN* instructions.
static bool isFpOrNEON(Register Reg)
Returns whether the physical register is FP or NEON.
const AArch64RegisterInfo * getRegisterInfo() const override
bool isNeonAvailable() const
Returns true if the target has NEON and the function at runtime is known to have NEON enabled (e....
const AArch64InstrInfo * getInstrInfo() const override
const AArch64TargetLowering * getTargetLowering() const override
bool isSVEorStreamingSVEAvailable() const
Returns true if the target has access to either the full range of SVE instructions,...
bool isStreaming() const
Returns true if the function has a streaming body.
bool hasInlineStackProbe(const MachineFunction &MF) const override
True if stack clash protection is enabled for this functions.
unsigned getRedZoneSize(const Function &F) const
ArrayRef - Represent a constant reference to an array (0 or more elements consecutively in memory),...
Definition ArrayRef.h:41
size_t size() const
size - Get the array size.
Definition ArrayRef.h:147
bool empty() const
empty - Check if the array is empty.
Definition ArrayRef.h:142
bool test(unsigned Idx) const
Definition BitVector.h:461
BitVector & reset()
Definition BitVector.h:392
size_type count() const
count - Returns the number of bits which are set.
Definition BitVector.h:162
BitVector & set()
Definition BitVector.h:351
iterator_range< const_set_bits_iterator > set_bits() const
Definition BitVector.h:140
Helper class for creating CFI instructions and inserting them into MIR.
The CalleeSavedInfo class tracks the information need to locate where a callee saved register is in t...
A debug info location.
Definition DebugLoc.h:124
bool hasMinSize() const
Optimize this function for minimum size (-Oz).
Definition Function.h:703
CallingConv::ID getCallingConv() const
getCallingConv()/setCallingConv(CC) - These method get and set the calling convention of this functio...
Definition Function.h:270
AttributeList getAttributes() const
Return the attribute list for this Function.
Definition Function.h:352
bool isVarArg() const
isVarArg - Return true if this function takes a variable number of arguments.
Definition Function.h:227
bool hasFnAttribute(Attribute::AttrKind Kind) const
Return true if the function has the attribute.
Definition Function.cpp:727
A set of physical registers with utility functions to track liveness when walking backward/forward th...
A set of register units used to track register liveness.
bool available(MCRegister Reg) const
Returns true if no part of physical register Reg is live.
LLVM_ABI void stepBackward(const MachineInstr &MI)
Updates liveness when stepping backwards over the instruction MI.
LLVM_ABI void addLiveOuts(const MachineBasicBlock &MBB)
Adds registers living out of block MBB.
bool usesWindowsCFI() const
Definition MCAsmInfo.h:652
Wrapper class representing physical registers. Should be passed by value.
Definition MCRegister.h:33
LLVM_ABI void transferSuccessorsAndUpdatePHIs(MachineBasicBlock *FromMBB)
Transfers all the successors, as in transferSuccessors, and update PHI operands in the successor bloc...
LLVM_ABI iterator getFirstTerminator()
Returns an iterator to the first terminator instruction of this basic block.
LLVM_ABI void addSuccessor(MachineBasicBlock *Succ, BranchProbability Prob=BranchProbability::getUnknown())
Add Succ as a successor of this MachineBasicBlock.
const MachineFunction * getParent() const
Return the MachineFunction containing this basic block.
reverse_iterator rbegin()
iterator insertAfter(iterator I, MachineInstr *MI)
Insert MI into the instruction list after I.
void splice(iterator Where, MachineBasicBlock *Other, iterator From)
Take an instruction from MBB 'Other' at the position From, and insert it into this MBB right before '...
MachineInstrBundleIterator< MachineInstr > iterator
The MachineFrameInfo class represents an abstract stack frame until prolog/epilog code is inserted.
LLVM_ABI int CreateFixedObject(uint64_t Size, int64_t SPOffset, bool IsImmutable, bool isAliased=false)
Create a new object at a fixed location on the stack.
bool hasVarSizedObjects() const
This method may be called any time after instruction selection is complete to determine if the stack ...
const AllocaInst * getObjectAllocation(int ObjectIdx) const
Return the underlying Alloca of the specified stack object if it exists.
LLVM_ABI int CreateStackObject(uint64_t Size, Align Alignment, bool isSpillSlot, const AllocaInst *Alloca=nullptr, uint8_t ID=0)
Create a new statically sized stack object, returning a nonnegative identifier to represent it.
bool hasCalls() const
Return true if the current function has any function calls.
bool isFrameAddressTaken() const
This method may be called any time after instruction selection is complete to determine if there is a...
void setObjectOffset(int ObjectIdx, int64_t SPOffset)
Set the stack frame offset of the specified object.
uint64_t getMaxCallFrameSize() const
Return the maximum size of a call frame that must be allocated for an outgoing function call.
bool hasPatchPoint() const
This method may be called any time after instruction selection is complete to determine if there is a...
int getStackProtectorIndex() const
Return the index for the stack protector object.
LLVM_ABI int CreateSpillStackObject(uint64_t Size, Align Alignment)
Create a new statically sized stack object that represents a spill slot, returning a nonnegative iden...
LLVM_ABI uint64_t estimateStackSize(const MachineFunction &MF) const
Estimate and return the size of the stack frame.
void setStackID(int ObjectIdx, uint8_t ID)
bool isCalleeSavedInfoValid() const
Has the callee saved info been calculated yet?
Align getObjectAlign(int ObjectIdx) const
Return the alignment of the specified stack object.
int64_t getObjectSize(int ObjectIdx) const
Return the size of the specified object.
bool isMaxCallFrameSizeComputed() const
bool hasStackMap() const
This method may be called any time after instruction selection is complete to determine if there is a...
const std::vector< CalleeSavedInfo > & getCalleeSavedInfo() const
Returns a reference to call saved info vector for the current function.
unsigned getNumObjects() const
Return the number of objects.
int getObjectIndexEnd() const
Return one past the maximum frame object index.
bool hasStackProtectorIndex() const
bool hasStackObjects() const
Return true if there are any stack objects in this function.
uint8_t getStackID(int ObjectIdx) const
unsigned getNumFixedObjects() const
Return the number of fixed objects.
int64_t getObjectOffset(int ObjectIdx) const
Return the assigned stack offset of the specified object from the incoming stack pointer.
int getObjectIndexBegin() const
Return the minimum frame object index.
void setObjectAlignment(int ObjectIdx, Align Alignment)
setObjectAlignment - Change the alignment of the specified stack object.
bool isDeadObjectIndex(int ObjectIdx) const
Returns true if the specified index corresponds to a dead object.
const WinEHFuncInfo * getWinEHFuncInfo() const
getWinEHFuncInfo - Return information about how the current function uses Windows exception handling.
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineFrameInfo & getFrameInfo()
getFrameInfo - Return the frame info object for the current function.
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
Function & getFunction()
Return the LLVM function that this machine code represents.
BasicBlockListType::iterator iterator
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
const MachineBasicBlock & front() const
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineInstr - Allocate a new MachineInstr.
void insert(iterator MBBI, MachineBasicBlock *MBB)
const TargetMachine & getTarget() const
getTarget - Return the target machine this machine code is compiled with
const MachineInstrBuilder & setMemRefs(ArrayRef< MachineMemOperand * > MMOs) const
const MachineInstrBuilder & addExternalSymbol(const char *FnName, unsigned TargetFlags=0) const
const MachineInstrBuilder & setMIFlag(MachineInstr::MIFlag Flag) const
const MachineInstrBuilder & addImm(int64_t Val) const
Add a new immediate operand.
const MachineInstrBuilder & add(const MachineOperand &MO) const
const MachineInstrBuilder & addFrameIndex(int Idx) const
const MachineInstrBuilder & addRegMask(const uint32_t *Mask) const
const MachineInstrBuilder & addReg(Register RegNo, unsigned flags=0, unsigned SubReg=0) const
Add a new virtual register operand.
const MachineInstrBuilder & addMBB(MachineBasicBlock *MBB, unsigned TargetFlags=0) const
const MachineInstrBuilder & setMIFlags(unsigned Flags) const
const MachineInstrBuilder & addMemOperand(MachineMemOperand *MMO) const
MachineInstr * getInstr() const
If conversion operators fail, use this method to get the MachineInstr explicitly.
const MachineInstrBuilder & addDef(Register RegNo, unsigned Flags=0, unsigned SubReg=0) const
Add a virtual register definition operand.
Representation of each machine instruction.
void setFlags(unsigned flags)
bool getFlag(MIFlag Flag) const
Return whether an MI flag is set.
LLVM_ABI void eraseFromParent()
Unlink 'this' from the containing basic block and delete it.
const MachineOperand & getOperand(unsigned i) const
uint32_t getFlags() const
Return the MI flags bitvector.
LLVM_ABI void moveBefore(MachineInstr *MovePos)
Move the instruction before MovePos.
A description of a memory reference used in the backend.
const PseudoSourceValue * getPseudoValue() const
@ MOLoad
The memory access reads data.
@ MOStore
The memory access writes data.
const Value * getValue() const
Return the base address of the memory access.
MachineOperand class - Representation of each machine instruction operand.
int64_t getImm() const
Register getReg() const
getReg - Returns the register number.
bool isFI() const
isFI - Tests if this is a MO_FrameIndex operand.
LLVM_ABI void emit(DiagnosticInfoOptimizationBase &OptDiag)
Emit an optimization remark.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
LLVM_ABI Register createVirtualRegister(const TargetRegisterClass *RegClass, StringRef Name="")
createVirtualRegister - Create and return a new virtual register in the function with the specified r...
LLVM_ABI bool isLiveIn(Register Reg) const
LLVM_ABI const MCPhysReg * getCalleeSavedRegs() const
Returns list of callee saved registers.
LLVM_ABI bool isPhysRegUsed(MCRegister PhysReg, bool SkipRegMaskTest=false) const
Return true if the specified register is modified or read in this function.
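A hedged sketch of the queries above: prefer a physical scratch register unless the function already takes it as a live-in, otherwise create a fresh virtual register. The helper and its policy are illustrative only.

#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"

using namespace llvm;

static Register pickScratchReg(MachineFunction &MF, Register PhysCandidate,
                               const TargetRegisterClass &RC) {
  MachineRegisterInfo &MRI = MF.getRegInfo();
  // Reuse the candidate only if it is not a live-in of the function.
  if (PhysCandidate.isValid() && !MRI.isLiveIn(PhysCandidate))
    return PhysCandidate;
  return MRI.createVirtualRegister(&RC, "scratch");
}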
MutableArrayRef - Represent a mutable reference to an array (0 or more elements consecutively in memo...
Definition ArrayRef.h:303
Pass interface - Implemented by all 'passes'.
Definition Pass.h:99
void enterBasicBlockEnd(MachineBasicBlock &MBB)
Start tracking liveness from the end of basic block MBB.
Register FindUnusedReg(const TargetRegisterClass *RC) const
Find an unused register of the specified register class.
void backward()
Update internal register state and move MBB iterator backwards.
void addScavengingFrameIndex(int FI)
Add a scavenging frame index.
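A hedged sketch of the scavenging pattern these members support: seed the liveness state at the end of a block and ask for an unused register of a given class; backward() would then step that state up through the block one instruction at a time.

#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/RegisterScavenging.h"

using namespace llvm;

static Register findFreeRegAtBlockEnd(MachineBasicBlock &MBB,
                                      const TargetRegisterClass &RC) {
  RegScavenger RS;
  RS.enterBasicBlockEnd(MBB);   // liveness as of the end of MBB
  return RS.FindUnusedReg(&RC); // returns 0 (no register) if all are live
}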
Wrapper class representing virtual and physical registers.
Definition Register.h:19
SMEAttrs is a utility class to parse the SME ACLE attributes on functions.
bool hasStreamingInterface() const
bool hasNonStreamingInterfaceAndBody() const
bool hasStreamingBody() const
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition SetVector.h:168
A SetVector that performs no allocations if smaller than a certain size.
Definition SetVector.h:356
This class consists of common code factored out of the SmallVector class to reduce code duplication b...
reference emplace_back(ArgTypes &&... Args)
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
void push_back(const T &Elt)
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
StackOffset holds a fixed and a scalable offset in bytes.
Definition TypeSize.h:31
int64_t getFixed() const
Returns the fixed component of the stack.
Definition TypeSize.h:47
int64_t getScalable() const
Returns the scalable component of the stack.
Definition TypeSize.h:50
static StackOffset get(int64_t Fixed, int64_t Scalable)
Definition TypeSize.h:42
static StackOffset getScalable(int64_t Scalable)
Definition TypeSize.h:41
static StackOffset getFixed(int64_t Fixed)
Definition TypeSize.h:40
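A worked illustration of the accessors above: the fixed component is an ordinary byte count, while the scalable component is multiplied by the runtime vector-length factor (vscale); both can be carried in a single StackOffset.

#include "llvm/Support/TypeSize.h"

using namespace llvm;

static StackOffset combinedOffset() {
  StackOffset Fixed = StackOffset::getFixed(16);        // 16 bytes
  StackOffset Scalable = StackOffset::getScalable(-32); // -32 * vscale bytes
  StackOffset Sum = Fixed + Scalable;
  // Sum.getFixed() == 16 and Sum.getScalable() == -32.
  return StackOffset::get(Sum.getFixed(), Sum.getScalable());
}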
bool hasFP(const MachineFunction &MF) const
hasFP - Return true if the specified function should have a dedicated frame pointer register.
virtual void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS=nullptr) const
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
int getOffsetOfLocalArea() const
getOffsetOfLocalArea - This method returns the offset of the local area from the stack pointer on ent...
Align getStackAlign() const
getStackAlignment - This method returns the number of bytes to which the stack pointer must be aligne...
StackDirection getStackGrowthDirection() const
getStackGrowthDirection - Return the direction the stack grows.
virtual bool enableCFIFixup(const MachineFunction &MF) const
Returns true if we may need to fix the unwind information for the function.
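A simplified, hedged illustration of the kind of query behind hasFP(); the real AArch64 policy is more involved, but the building blocks are these hooks together with MachineFrameInfo and TargetRegisterInfo.

#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetOptions.h"

using namespace llvm;

static bool likelyNeedsFramePointer(const MachineFunction &MF) {
  const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
  // Disabled frame-pointer elimination, variable-sized objects, and stack
  // realignment are common reasons to require a dedicated frame pointer.
  return MF.getTarget().Options.DisableFramePointerElim(MF) ||
         MF.getFrameInfo().hasVarSizedObjects() ||
         TRI->hasStackRealignment(MF);
}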
TargetInstrInfo - Interface to description of machine instruction set.
Primary interface to the complete machine description for the target machine.
TargetOptions Options
const MCAsmInfo * getMCAsmInfo() const
Return target specific asm information.
LLVM_ABI bool DisableFramePointerElim(const MachineFunction &MF) const
DisableFramePointerElim - This returns true if frame pointer elimination optimization should be disab...
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
bool hasStackRealignment(const MachineFunction &MF) const
True if stack realignment is required and still possible.
TargetSubtargetInfo - Generic base class for all target subtargets.
virtual const TargetInstrInfo * getInstrInfo() const
virtual const TargetRegisterInfo * getRegisterInfo() const =0
Return the target's register information.
Triple - Helper class for working with autoconf configuration names.
Definition Triple.h:47
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition raw_ostream.h:53
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
static unsigned getShiftValue(unsigned Imm)
getShiftValue - Extract the shift value.
static unsigned getArithExtendImm(AArch64_AM::ShiftExtendType ET, unsigned Imm)
getArithExtendImm - Encode the extend type and shift amount for an arithmetic instruction: imm: 3-bit...
const unsigned StackProbeMaxLoopUnroll
Maximum number of iterations to unroll for a constant size probing loop.
const unsigned StackProbeMaxUnprobedStack
Maximum allowed number of unprobed bytes above SP at an ABI boundary.
constexpr char Align[]
Key for Kernel::Arg::Metadata::mAlign.
constexpr char Attrs[]
Key for Kernel::Metadata::mAttrs.
unsigned ID
LLVM IR allows the use of arbitrary numbers as calling convention identifiers.
Definition CallingConv.h:24
@ AArch64_SVE_VectorCall
Used between AArch64 SVE functions.
@ PreserveMost
Used for runtime calls that preserve most registers.
Definition CallingConv.h:63
@ CXX_FAST_TLS
Used for access functions.
Definition CallingConv.h:72
@ GHC
Used by the Glasgow Haskell Compiler (GHC).
Definition CallingConv.h:50
@ PreserveAll
Used for runtime calls that preserve (almost) all registers.
Definition CallingConv.h:66
@ PreserveNone
Used for runtime calls that preserve no general registers.
Definition CallingConv.h:90
@ Win64
The C convention as implemented on Windows/x86-64 and AArch64.
@ SwiftTail
This follows the Swift calling convention in how arguments are passed but guarantees tail calls will ...
Definition CallingConv.h:87
@ Implicit
Not emitted register (e.g. carry, or temporary result).
@ Define
Register definition.
initializer< Ty > init(const Ty &Val)
NodeAddr< InstrNode * > Instr
Definition RDFGraph.h:389
BaseReg
Stack frame base register. Bit 0 of FREInfo.Info.
Definition SFrame.h:77
This is an optimization pass for GlobalISel generic memory operations.
@ Offset
Definition DWP.cpp:477
void stable_sort(R &&Range)
Definition STLExtras.h:2038
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
int isAArch64FrameOffsetLegal(const MachineInstr &MI, StackOffset &Offset, bool *OutUseUnscaledOp=nullptr, unsigned *OutUnscaledOp=nullptr, int64_t *EmittableOffset=nullptr)
Check if the Offset is a valid frame offset for MI.
detail::scope_exit< std::decay_t< Callable > > make_scope_exit(Callable &&F)
Definition ScopeExit.h:59
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:649
iterator_range< early_inc_iterator_impl< detail::IterOfRange< RangeT > > > make_early_inc_range(RangeT &&Range)
Make a range that does early increment to allow mutation of the underlying range without disrupting i...
Definition STLExtras.h:634
@ AArch64FrameOffsetCannotUpdate
Offset cannot apply.
auto dyn_cast_or_null(const Y &Val)
Definition Casting.h:759
bool any_of(R &&range, UnaryPredicate P)
Provide wrappers to std::any_of which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1712
auto formatv(bool Validate, const char *Fmt, Ts &&...Vals)
auto reverse(ContainerTy &&C)
Definition STLExtras.h:408
void sort(IteratorTy Start, IteratorTy End)
Definition STLExtras.h:1624
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition Debug.cpp:207
void emitFrameOffset(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, unsigned DestReg, unsigned SrcReg, StackOffset Offset, const TargetInstrInfo *TII, MachineInstr::MIFlag=MachineInstr::NoFlags, bool SetNZCV=false, bool NeedsWinCFI=false, bool *HasWinCFI=nullptr, bool EmitCFAOffset=false, StackOffset InitialOffset={}, unsigned FrameReg=AArch64::SP)
emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg plus Offset.
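A hedged sketch of the call shape only (not code from this file): allocate 64 bytes of fixed stack by adjusting SP and tag the emitted instructions as frame setup. The first include is a backend-internal header, assumed to be visible only inside the AArch64 target.

#include "AArch64InstrInfo.h" // backend-internal header (assumed context)
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"

using namespace llvm;

static void allocateFixedStack(MachineBasicBlock &MBB,
                               MachineBasicBlock::iterator MBBI,
                               const DebugLoc &DL) {
  const TargetInstrInfo *TII = MBB.getParent()->getSubtarget().getInstrInfo();
  // SP := SP - 64; emitFrameOffset chooses the ADD/SUB forms (or a scratch
  // register sequence) needed to materialise the offset.
  emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
                  StackOffset::getFixed(-64), TII, MachineInstr::FrameSetup);
}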
LLVM_ABI void report_fatal_error(Error Err, bool gen_crash_diag=true)
Definition Error.cpp:167
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
@ First
Helpers to iterate all locations in the MemoryEffectsBase class.
Definition ModRef.h:71
unsigned getDefRegState(bool B)
unsigned getKillRegState(bool B)
uint16_t MCPhysReg
An unsigned integer type large enough to represent all physical registers, but not necessarily virtua...
Definition MCRegister.h:21
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition Alignment.h:155
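A tiny worked example of alignTo() with the 16-byte stack alignment used on AArch64; sizes that are already aligned are returned unchanged.

#include "llvm/Support/Alignment.h"
#include <cassert>

int main() {
  assert(llvm::alignTo(40, llvm::Align(16)) == 48);
  assert(llvm::alignTo(64, llvm::Align(16)) == 64);
  return 0;
}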
raw_ostream & operator<<(raw_ostream &OS, const APFixedPoint &FX)
auto find_if(R &&Range, UnaryPredicate P)
Provide wrappers to std::find_if which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1738
void erase_if(Container &C, UnaryPredicate P)
Provide a container algorithm similar to C++ Library Fundamentals v2's erase_if which is equivalent t...
Definition STLExtras.h:2100
LLVM_ABI const Value * getUnderlyingObject(const Value *V, unsigned MaxLookup=MaxLookupSearchDepth)
This method strips off any GEP address adjustments, pointer casts or llvm.threadlocal....
void fullyRecomputeLiveIns(ArrayRef< MachineBasicBlock * > MBBs)
Convenience function for recomputing live-ins for a set of MBBs until the computation converges.
LLVM_ABI Printable printReg(Register Reg, const TargetRegisterInfo *TRI=nullptr, unsigned SubIdx=0, const MachineRegisterInfo *MRI=nullptr)
Prints virtual and physical registers with or without a TRI instance.
void swap(llvm::BitVector &LHS, llvm::BitVector &RHS)
Implement std::swap in terms of BitVector swap.
Definition BitVector.h:853
Emergency stack slots for expanding SPILL_PPR_TO_ZPR_SLOT_PSEUDO and FILL_PPR_FROM_ZPR_SLOT_PSEUDO.
std::optional< int > PPRSpillFI
std::optional< int > GPRSpillFI
std::optional< int > ZPRSpillFI
Registers available for scavenging (ZPR, PPR3b, GPR).
RAII helper class for scavenging or spilling a register.
ScopedScavengeOrSpill(ScopedScavengeOrSpill &&)=delete
ScopedScavengeOrSpill(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, Register SpillCandidate, const TargetRegisterClass &RC, LiveRegUnits const &UsedRegs, BitVector const &AllocatableRegs, std::optional< int > *MaybeSpillFI, Register PreferredReg=AArch64::NoRegister)
Register freeRegister() const
Returns the free register (found from scavenging or spilling a register).
ScopedScavengeOrSpill(const ScopedScavengeOrSpill &)=delete
bool operator<(const StackAccess &Rhs) const
void print(raw_ostream &OS) const
int64_t start() const
std::string getTypeString() const
int64_t end() const
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:85
Pair of physical register and lane mask.
static LLVM_ABI MachinePointerInfo getFixedStack(MachineFunction &MF, int FI, int64_t Offset=0)
Return a MachinePointerInfo record that refers to the specified FrameIndex.
SmallVector< WinEHTryBlockMapEntry, 4 > TryBlockMap
SmallVector< WinEHHandlerType, 1 > HandlerArray