16830 – X86: stack realignment, dynamic allocas, and inline assembly cause conflict over ebx / esi

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 16830 - X86: stack realignment, dynamic allocas, and inline assembly cause conflict over ebx / esi

Summary: X86: stack realignment, dynamic allocas, and inline assembly cause conflict o...

Status:	NEW

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Backend: X86 (show other bugs)
Version:	trunk
Hardware:	PC Windows NT

Importance:	P normal
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:

Duplicates (4):	22068 27183 30389 51100 (view as bug list)
Depends on:
Blocks:	21420 24345 13340
	Show dependency tree

Reported:	2013-08-07 17:39 PDT by Reid Kleckner
Modified:	2021-07-19 14:45 PDT (History)
CC List:	8 users (show)

See Also:	26883 50133
Fixed By Commit(s):

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Reid Kleckner 2013-08-07 17:39:23 PDT

So far as I can tell, the only register you can't touch in MS inline asm is ebp, but LLVM's x86 backend requires a BasePtr register which is separate from the frame pointer in ebp.  It happens to hard code the choice of esi in X86RegisterInfo.cpp:
  // Use a callee-saved register as the base pointer.  These registers must
  // not conflict with any ABI requirements.  For example, in 32-bit mode PIC
  // requires GOT in the EBX register before function calls via PLT GOT pointer.
  BasePtr = Is64Bit ? X86::RBX : X86::ESI;

This will blow up if the inline asm clobbers esi.

Test case straight from LLVM's own lib/Support/Host.cpp:

bool GetX86CpuIDAndInfo(unsigned value, unsigned *rEAX, unsigned *rEBX,
                        unsigned *rECX, unsigned *rEDX) {
  __asm {
    mov   eax,value
    cpuid
    mov   esi,rEAX
    mov   dword ptr [esi],eax
    mov   esi,rEBX
    mov   dword ptr [esi],ebx
    mov   esi,rECX
    mov   dword ptr [esi],ecx
    mov   esi,rEDX
    mov   dword ptr [esi],edx
  }
  return false;
}

This generates x86 asm like:

$ clang -cc1 -fms-compatibility t.cpp -o - -cxx-abi microsoft -S
...
"?GetX86CpuIDAndInfo@@YA_NIPAI000@Z":
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        pushl   %edi
        pushl   %esi
        subl    $28, %esp
        movl    %esp, %esi
...
        #APP
        .intel_syntax
        mov eax,dword ptr [esi + 24]
        cpuid
        mov esi,dword ptr [esi + 20]
        mov dword ptr [esi],eax
        mov esi,dword ptr [esi + 16]
        mov dword ptr [esi],ebx
        mov esi,dword ptr [esi + 12]
        mov dword ptr [esi],ecx
        mov esi,dword ptr [esi + 8]
        mov dword ptr [esi],edx
        .att_syntax
        #NO_APP

Comment 1 Reid Kleckner 2013-12-06 16:03:18 PST

MSVC uses ebx instead of esi, for roughly the same reasons, although the roles are swapped.  ebx points to the incoming arguments, and ebp points to the local variables.  ebx is callee saved in most CCs, so it can be used across the body of the function.

MSVC emits a warning when modifying ebp, and ebx if it had to use it due to stack realignment:

int main() {
  __declspec(align(8)) int a;
  __asm {
    push ebp
    mov ebx, esp
    and esp, -16
    mov a, edi
    mov esp, ebx
    pop ebp
  }
  return a;
}

t.cpp(5) : warning C4731: 'main' : frame pointer register 'ebx' modified by inline assembly code
t.cpp(9) : warning C4731: 'main' : frame pointer register 'ebp' modified by inline assembly code

If you remove the need for stack realignment, only the ebp warning remains.  You can modify ebp without warning, but only if you enable optimizations and the FP can be eliminated.

We could teach clang to emit a diagnostic like this.

Comment 2 Reid Kleckner 2014-12-30 18:33:36 PST

*** Bug 22068 has been marked as a duplicate of this bug. ***

Comment 3 Anton Korobeynikov 2015-01-01 08:56:42 PST

I think we can safely use ebx on win32 - PIC is implemented w/o GOT there, so we can easily use ebx.

Comment 4 Reid Kleckner 2015-01-12 18:56:23 PST

(In reply to comment #3)
> I think we can safely use ebx on win32 - PIC is implemented w/o GOT there,
> so we can easily use ebx.

That's just a workaround, though. We will still conflict with inline asm using ebx on Windows (rare, due to MSVC's use of ebx) and esi on Linux (not uncommon due to string instructions).

I think we need to reformulate the problem, rather than trying to pick an arbitrary hardcoded register that works for all situations. Stack objects (allocas, spill slots) with low alignment can be accessed via ebp, and stack objects with large alignment requirements can be accessed via a virtual register, which can be spilled via ebp. Then we can let the register allocator solve the problem.

Comment 5 Reid Kleckner 2015-01-13 15:04:07 PST

Re-titling since this is generic across OSs and asm style.

Comment 6 Reid Kleckner 2016-04-01 17:27:33 PDT

*** Bug 27183 has been marked as a duplicate of this bug. ***

Comment 7 Reid Kleckner 2016-09-14 16:11:28 PDT

*** Bug 30389 has been marked as a duplicate of this bug. ***

Comment 8 Eli Friedman 2021-07-15 00:33:42 PDT

*** Bug 51100 has been marked as a duplicate of this bug. ***

Comment 9 Eli Friedman 2021-07-15 01:00:42 PDT

I looked at this briefly since a duplicate was filed, bug 51100.

Apparently, gcc doesn't use a base pointer at all.  It uses different techniques on different targets; on x86, it emits an exotic prologue that actually stores the frame pointer below the realignment gap.  On ARM, gcc apparently just never realigns the stack at all.

Comment 10 Reid Kleckner 2021-07-19 11:33:10 PDT

To recap, there are three stack areas that the compiler may need to reference:
1. incoming parameters and fixed stack objects (return addr)
2. local variable stack objects (potentially highly aligned)
3. outgoing arguments

Overaligned variables make the offset from 1 to 2 dynamic, and allocas make the offset from 2 to 3 dynamic. LLVM's strategy today is to dedicate a physical register to all areas when needed.

Fixing this issue fully requires adding indirection to accesses to one of these areas. IMO, area #1, arguments, is the best option, especially given other LLVM design choices. To implement that, we should:
- mark argument stack objects as mutable (prevent remat of argument loads)
- assign virtual registers to each byval / inalloca argument
- treat access to byval/inalloca arguments like accessing a dynamic alloca

Comment 11 Eli Friedman 2021-07-19 12:18:59 PDT

If we're going to move the alignment gap so it isn't between the frame pointer and the stack pointer, we might as well just construct a single virtual register that pointers to the argument area, instead of using a separate virtual register for each argument.

I suspect adding an indirection to access overaligned locals is actually the easiest approach for most targets, at least ones that don't need overaligned spill slots.  We already have LocalStackSlotAllocation, which does a similar transform.

Comment 12 Reid Kleckner 2021-07-19 14:45:45 PDT

Sure, it's easy to penalize access to overaligned locals, but I think they tend to be performance critical, so IMO it's better to penalize accesses to arguments. I assume the base pointer work was specifically done to address this issue.