Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

X86: stack realignment, dynamic allocas, and inline assembly cause conflict over ebx / esi #17204

Open
rnk opened this issue Aug 8, 2013 · 19 comments · May be fixed by #81048
Open

X86: stack realignment, dynamic allocas, and inline assembly cause conflict over ebx / esi #17204

rnk opened this issue Aug 8, 2013 · 19 comments · May be fixed by #81048
Labels
backend:X86 bugzilla Issues migrated from bugzilla

Comments

@rnk
Copy link
Collaborator

rnk commented Aug 8, 2013

Bugzilla Link 16830
Version trunk
OS Windows NT
Blocks #21794 #24719 #13712
CC @asl,@efriedma-quic,@duck-37

Extended Description

So far as I can tell, the only register you can't touch in MS inline asm is ebp, but LLVM's x86 backend requires a BasePtr register which is separate from the frame pointer in ebp. It happens to hard code the choice of esi in X86RegisterInfo.cpp:
// Use a callee-saved register as the base pointer. These registers must
// not conflict with any ABI requirements. For example, in 32-bit mode PIC
// requires GOT in the EBX register before function calls via PLT GOT pointer.
BasePtr = Is64Bit ? X86::RBX : X86::ESI;

This will blow up if the inline asm clobbers esi.

Test case straight from LLVM's own lib/Support/Host.cpp:

bool GetX86CpuIDAndInfo(unsigned value, unsigned *rEAX, unsigned *rEBX,
unsigned *rECX, unsigned *rEDX) {
__asm {
mov eax,value
cpuid
mov esi,rEAX
mov dword ptr [esi],eax
mov esi,rEBX
mov dword ptr [esi],ebx
mov esi,rECX
mov dword ptr [esi],ecx
mov esi,rEDX
mov dword ptr [esi],edx
}
return false;
}

This generates x86 asm like:

$ clang -cc1 -fms-compatibility t.cpp -o - -cxx-abi microsoft -S
...
"?GetX86CpuIDAndInfo@@YA_NIPAI000@Z":
pushl %ebp
movl %esp, %ebp
pushl %ebx
pushl %edi
pushl %esi
subl $28, %esp
movl %esp, %esi
...
#APP
.intel_syntax
mov eax,dword ptr [esi + 24]
cpuid
mov esi,dword ptr [esi + 20]
mov dword ptr [esi],eax
mov esi,dword ptr [esi + 16]
mov dword ptr [esi],ebx
mov esi,dword ptr [esi + 12]
mov dword ptr [esi],ecx
mov esi,dword ptr [esi + 8]
mov dword ptr [esi],edx
.att_syntax
#NO_APP

@rnk
Copy link
Collaborator Author

rnk commented Dec 7, 2013

MSVC uses ebx instead of esi, for roughly the same reasons, although the roles are swapped. ebx points to the incoming arguments, and ebp points to the local variables. ebx is callee saved in most CCs, so it can be used across the body of the function.

MSVC emits a warning when modifying ebp, and ebx if it had to use it due to stack realignment:

int main() {
__declspec(align(8)) int a;
__asm {
push ebp
mov ebx, esp
and esp, -16
mov a, edi
mov esp, ebx
pop ebp
}
return a;
}

t.cpp(5) : warning C4731: 'main' : frame pointer register 'ebx' modified by inline assembly code
t.cpp(9) : warning C4731: 'main' : frame pointer register 'ebp' modified by inline assembly code

If you remove the need for stack realignment, only the ebp warning remains. You can modify ebp without warning, but only if you enable optimizations and the FP can be eliminated.

We could teach clang to emit a diagnostic like this.

@rnk
Copy link
Collaborator Author

rnk commented Dec 31, 2014

*** Bug llvm/llvm-bugzilla-archive#22068 has been marked as a duplicate of this bug. ***

@asl
Copy link
Collaborator

asl commented Jan 1, 2015

I think we can safely use ebx on win32 - PIC is implemented w/o GOT there, so we can easily use ebx.

@rnk
Copy link
Collaborator Author

rnk commented Jan 13, 2015

I think we can safely use ebx on win32 - PIC is implemented w/o GOT there,
so we can easily use ebx.

That's just a workaround, though. We will still conflict with inline asm using ebx on Windows (rare, due to MSVC's use of ebx) and esi on Linux (not uncommon due to string instructions).

I think we need to reformulate the problem, rather than trying to pick an arbitrary hardcoded register that works for all situations. Stack objects (allocas, spill slots) with low alignment can be accessed via ebp, and stack objects with large alignment requirements can be accessed via a virtual register, which can be spilled via ebp. Then we can let the register allocator solve the problem.

@rnk
Copy link
Collaborator Author

rnk commented Jan 13, 2015

Re-titling since this is generic across OSs and asm style.

@rnk
Copy link
Collaborator Author

rnk commented Apr 2, 2016

*** Bug llvm/llvm-bugzilla-archive#27183 has been marked as a duplicate of this bug. ***

@rnk
Copy link
Collaborator Author

rnk commented Sep 14, 2016

*** Bug llvm/llvm-bugzilla-archive#30389 has been marked as a duplicate of this bug. ***

@efriedma-quic
Copy link
Collaborator

*** Bug llvm/llvm-bugzilla-archive#51100 has been marked as a duplicate of this bug. ***

@efriedma-quic
Copy link
Collaborator

I looked at this briefly since a duplicate was filed, bug 51100.

Apparently, gcc doesn't use a base pointer at all. It uses different techniques on different targets; on x86, it emits an exotic prologue that actually stores the frame pointer below the realignment gap. On ARM, gcc apparently just never realigns the stack at all.

@rnk
Copy link
Collaborator Author

rnk commented Jul 19, 2021

To recap, there are three stack areas that the compiler may need to reference:

  1. incoming parameters and fixed stack objects (return addr)
  2. local variable stack objects (potentially highly aligned)
  3. outgoing arguments

Overaligned variables make the offset from 1 to 2 dynamic, and allocas make the offset from 2 to 3 dynamic. LLVM's strategy today is to dedicate a physical register to all areas when needed.

Fixing this issue fully requires adding indirection to accesses to one of these areas. IMO, area #​1, arguments, is the best option, especially given other LLVM design choices. To implement that, we should:

  • mark argument stack objects as mutable (prevent remat of argument loads)
  • assign virtual registers to each byval / inalloca argument
  • treat access to byval/inalloca arguments like accessing a dynamic alloca

@efriedma-quic
Copy link
Collaborator

If we're going to move the alignment gap so it isn't between the frame pointer and the stack pointer, we might as well just construct a single virtual register that pointers to the argument area, instead of using a separate virtual register for each argument.

I suspect adding an indirection to access overaligned locals is actually the easiest approach for most targets, at least ones that don't need overaligned spill slots. We already have LocalStackSlotAllocation, which does a similar transform.

@rnk
Copy link
Collaborator Author

rnk commented Jul 19, 2021

Sure, it's easy to penalize access to overaligned locals, but I think they tend to be performance critical, so IMO it's better to penalize accesses to arguments. I assume the base pointer work was specifically done to address this issue.

@rnk
Copy link
Collaborator Author

rnk commented Nov 26, 2021

mentioned in issue llvm/llvm-bugzilla-archive#17907

@stephenhines
Copy link
Collaborator

mentioned in issue #21794

@rnk
Copy link
Collaborator Author

rnk commented Nov 26, 2021

mentioned in issue llvm/llvm-bugzilla-archive#22068

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 26, 2021

mentioned in issue #24719

@rnk
Copy link
Collaborator Author

rnk commented Nov 26, 2021

mentioned in issue llvm/llvm-bugzilla-archive#27183

@rnk
Copy link
Collaborator Author

rnk commented Nov 26, 2021

mentioned in issue llvm/llvm-bugzilla-archive#30389

@efriedma-quic
Copy link
Collaborator

mentioned in issue llvm/llvm-bugzilla-archive#51100

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 9, 2021
weiguozhi added a commit to weiguozhi/llvm-project that referenced this issue Feb 7, 2024
This patch fixes llvm#17204.

If a base pointer is used in a function, and it is clobbered by an
instruction (typically an inline asm), current register allocator can't
handle this situation, so BP becomes garbage after those instructions.
It can also occur to FP in theory.

We can spill and reload FP/BP registers around those instructions. But
normal spill/reload instructions also use FP/BP, so we can't spill them
into normal spill slots, instead we spill them into the top of stack by
using SP register.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 bugzilla Issues migrated from bugzilla
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants