47531 – __builtin_ia32_readeflags_u64() unnecessarily forces a frame pointer

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 47531 - __builtin_ia32_readeflags_u64() unnecessarily forces a frame pointer

Summary: __builtin_ia32_readeflags_u64() unnecessarily forces a frame pointer

Status:	CONFIRMED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Backend: X86 (show other bugs)
Version:	trunk
Hardware:	PC Linux

Importance:	P enhancement
Assignee:	Nick Desaulniers

URL:
Keywords:

Depends on:
Blocks:	4068
	Show dependency tree

Reported:	2020-09-14 15:03 PDT by Andy Lutomirski
Modified:	2021-08-16 16:16 PDT (History)
CC List:	10 users (show)

See Also:
Fixed By Commit(s):

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Andy Lutomirski 2020-09-14 15:03:34 PDT

This code:

unsigned long read_eflags(void)
{
   return __builtin_ia32_readeflags_u64();
}

compiles to:

read_eflags:
        pushq   %rbp
        movq    %rsp, %rbp
        pushfq
        popq    %rax
        popq    %rbp
        retq

which has an unnecessary frame pointer.  (I'm somewhat puzzled as to why this happened.  Sure, the redzone slot clobbered by the pushfq can't be used, but I don't see what this has to do with a frame pointer.)

gcc generates:

read_eflags:
        pushfq
        popq    %rax
        ret

Comment 1 Nick Desaulniers 2020-09-18 11:54:45 PDT

Looks like:
llvm-project/llvm/lib/Target/X86/X86ISelLowering.cpp:25858

    case llvm::Intrinsic::x86_flags_read_u32:
    case llvm::Intrinsic::x86_flags_read_u64:
    case llvm::Intrinsic::x86_flags_write_u32:
    case llvm::Intrinsic::x86_flags_write_u64: {
      // We need a frame pointer because this will get lowered to a PUSH/POP
      // sequence.
      MachineFrameInfo &MFI = DAG.getMachineFunction().getFrameInfo();
      MFI.setHasCopyImplyingStackAdjustment(true);
      // Don't do anything here, we will expand these intrinsics out later
      // during FinalizeISel in EmitInstrWithCustomInserter.
      return Op;
    }

added by 861a0ae3492c8.  Commenting out that MFI operation fixes the test case in question.  Doing so will cause test failures in:

  LLVM :: CodeGen/X86/win64_frame.ll
  LLVM :: CodeGen/X86/x86-64-flags-intrinsics.ll
  LLVM :: CodeGen/X86/x86-flags-intrinsics.ll

I don't exactly follow the comment in the source.  Why does push/pop require a frame pointer?  Also, the tests seem to all use windows target triple; is there some requirement of the Windows ABI there?

Comment 2 Craig Topper 2020-09-18 12:07:16 PDT

Here are some older comments from before some other EFLAGS copying code was removed in 19618fc639978ac15086ab7448e529d4e3fbb49f


-/// This function checks if any of the users of EFLAGS copies the EFLAGS. We
-/// know that the code that lowers COPY of EFLAGS has to use the stack, and if
-/// we don't adjust the stack we clobber the first frame index.
-/// See X86InstrInfo::copyPhysReg.
-static bool hasCopyImplyingStackAdjustment(const MachineFunction &MF) {
-  const MachineRegisterInfo &MRI = MF.getRegInfo();
-  return any_of(MRI.reg_instructions(X86::EFLAGS),
-                [](const MachineInstr &RI) { return RI.isCopy(); });
-}
-
-void X86TargetLowering::finalizeLowering(MachineFunction &MF) const {
-  if (hasCopyImplyingStackAdjustment(MF)) {
-    MachineFrameInfo &MFI = MF.getFrameInfo();
-    MFI.setHasCopyImplyingStackAdjustment(true);
-  }
-
-  TargetLoweringBase::finalizeLowering(MF);
-}
-

Comment 3 Reid Kleckner 2020-09-21 11:48:20 PDT

At the time, Jan 2016, David was working on Windows EH stuff, and I think the main concern was that if you have PUSHF or POPF instructions in a function, the stack unwind info will be incorrect after that instruction. This is true regardless of whether DWARF or Windows unwind info is in use. We had just finished going to great lengths to making the Windows unwind info accurate at every instruction boundary, and this may have represented a bug to David. Writing flags is always super slow, so we weren't really concerned about introducing extra overhead for it. I can see the argument that users may not care about perfect unwind info and may prefer to read flags more quickly.

Comment 4 Nick Desaulniers 2020-12-04 14:41:24 PST

https://reviews.llvm.org/D92695

Comment 5 Nick Desaulniers 2020-12-04 14:57:31 PST

Noting: see also: https://bugs.llvm.org/show_bug.cgi?id=47530

Comment 6 Andy Lutomirski 2020-12-05 16:52:50 PST

(In reply to Nick Desaulniers from comment #4)
> https://reviews.llvm.org/D92695

I don't have a Phabricator account, but maybe a comment here will still be useful.

As you all discovered, on x86_64, a kernel essentially can't use a red-zone.  [0]  That being said, I would hope that LLVM would be sensible enough to generate correct code regardless of command line options.  For example, the eflags builtins could disable the redzone in the function in which they're used, or they could be even more clever and reserve the top redzone slot for themselves so that the rest of the redzone could be used.

But I bet that they will be much more common in kernel code than user code.  Kernel code can and does manipulate various interesting EFLAGS bits that can't or mostly wouldn't be written from user mode, in particular IF and AC.  The only reason I can think of for using this builtins from 64-bit user mode is to manipulate the EFLAGS.TF bit for single-stepping.  The RF bit is also interesting but IIRC can't be written using POPF in any useful way, and the arithmetic flags bits don't make any sense to manipulate like this in C code.  Authors of test cases that abuse the kernel or of DRM schemes might play with EFLAGS.TS, and maybe someone wants to fiddle with EFLAGS.AC, but those would be quite rare.  (32-bit user mode intended to work on very old CPUs uses the EFLAGS.ID to determine whether the CPUID instruction exists.  I doubt anyone cares about the performance of such code.)

[0] It's techincally possible but it's horrible for all kinds of reasons.  A user program that cooperates with the kernel to get direct-to-usermode delivery of interrupts also can't use a red-zone, but no one except exploit authors would do this.  There are hardware features^Wbugs that have historically been interesting attacks against hypervisors that involve programming the CPU like this, but no one does it for legitimate reasons.