LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Bug 42439 - __int128 stack calling conventions are incorrect on x86-64
Summary: __int128 stack calling conventions are incorrect on x86-64
Status: NEW
Alias: None
Product: libraries
Classification: Unclassified
Component: Backend: X86
Version: trunk
Hardware: PC Linux
Importance: P enhancement
Assignee: Unassigned LLVM Bugs
URL:
Keywords: ABI
Depends on:
Blocks:
 
Reported: 2019-06-28 11:08 PDT by Stephen Hines
Modified: 2019-12-13 11:03 PST
CC List: 10 users

See Also:
Fixed By Commit(s):


Description Stephen Hines 2019-06-28 11:08:25 PDT
This is a bug report that was posted internally at Google, since the user can't register here on bugs.llvm.org:

Clang violates the x86-64 calling convention in the obscure case where an __int128 is passed on the stack.

Consider the following two functions:

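#include <stdint.h>

// x, y and z take all six integer argument registers, so a and c go on the stack.
__int128 foo1(__int128 x, __int128 y, __int128 z, uint64_t a, __int128 c) {
   return x + y + z + a + c;
}

// b is unused; it fills the 8-byte slot the psABI leaves as padding before c in foo1.
__int128 foo2(__int128 x, __int128 y, __int128 z, uint64_t a, uint64_t b, __int128 c) {
   return x + y + z + a + c;
}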

GCC generates identical code for these because __int128 arguments stored on the stack must be 16-byte aligned. The psABI says:

  Arguments of type __int128 offer the same operations as INTEGERs, yet they do not fit into one general purpose register but require two registers.

  For classification purposes __int128 is treated as if it were implemented as:
    typedef struct {
      long low, high;
    } __int128;
  with the exception that arguments of type __int128 that are stored in memory must be aligned on a 16-byte boundary.

Clang generates different code for these two functions:
  https://godbolt.org/z/ZKjb4Z

Note that both GCC and Clang correctly report the alignment of __int128 as 16 bytes; Clang ignores that alignment only when the value is passed as an argument on the stack.
Comment 1 Craig Topper 2019-12-12 16:03:52 PST
Do we also put half in a register and half in memory if we've already used 5 GPRs before we get to the i128?
Comment 3 Craig Topper 2019-12-13 11:03:08 PST
This patch might work. I based it on some similar code in the AArch64 backend.

diff --git a/llvm/lib/Target/X86/X86CallingConv.td b/llvm/lib/Target/X86/X86CallingConv.td
index 30d05c6..0b07033 100644
--- a/llvm/lib/Target/X86/X86CallingConv.td
+++ b/llvm/lib/Target/X86/X86CallingConv.td
@@ -516,6 +516,14 @@ def CC_X86_64_C : CallingConv<[

   // The first 6 integer arguments are passed in integer registers.
   CCIfType<[i32], CCAssignToReg<[EDI, ESI, EDX, ECX, R8D, R9D]>>,
+
+  // i128 is split into two i64s; never put just one half in R9.
+  CCIfType<[i64],
+           CCIfSplit<CCAssignToReg<[RDI, RSI, RDX, RCX, R8]>>>,
+
+  // i128 is split into two i64s, and its stack alignment is 16 bytes.
+  CCIfType<[i64], CCIfSplit<CCAssignToStackWithShadow<8, 16, [R9]>>>,
+
   CCIfType<[i64], CCAssignToReg<[RDI, RSI, RDX, RCX, R8 , R9 ]>>,

   // The first 8 MMX vector arguments are passed in XMM registers on Darwin.