This is a bug report that was posted internally at Google, since the user can't register here on bugs.llvm.org:

Clang violates the x86-64 calling convention in the obscure case when an __int128 is passed on the stack. Consider the following two functions:

    __int128 foo(__int128 x, __int128 y, __int128 z, uint64_t a, __int128 c) { return x + y + z + a + c; }
    __int128 foo(__int128 x, __int128 y, __int128 z, uint64_t a, uint64_t b, __int128 c) { return x + y + z + a + c; }

GCC generates identical code for these because an __int128 has to be stored 16-byte aligned on the stack. In both functions x, y and z use up all six integer argument registers (RDI through R9), so a is passed at stack offset 0, and c has to start at offset 16 whether the slot at offset 8 is padding (first function) or holds b (second function). The psABI says:

    Arguments of type __int128 offer the same operations as INTEGERs, yet they do not fit into one general purpose register but require two registers. For classification purposes __int128 is treated as if it were implemented as:

        typedef struct { long low, high; } __int128;

    with the exception that arguments of type __int128 that are stored in memory must be aligned on a 16-byte boundary.

Clang generates different code for these two functions: https://godbolt.org/z/ZKjb4Z

Note that both GCC and Clang report the alignment of __int128 as 16 bytes; Clang only ignores that alignment when the argument is passed on the stack.
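To make the expected layout concrete, here is a minimal C sketch of a simplified model of how the psABI places memory-passed arguments: each one starts at a multiple of its alignment (never less than 8) and occupies a multiple of 8 bytes. The helpers layout_stack_args and round_up are hypothetical and only compute offsets; they are not taken from GCC or LLVM. Since x, y and z fill RDI through R9 in both foo signatures, only the trailing stack-passed arguments are fed to the model.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t offset, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/*
 * Hypothetical, simplified model of the x86-64 SysV layout of arguments
 * passed in memory. sizes[i] and aligns[i] describe the i-th stack-passed
 * argument (in declaration order); offsets[i] receives its offset from the
 * start of the argument area. Each argument is aligned to max(8, align)
 * and its slot is rounded up to a whole number of eightbytes.
 * Returns the total size of the argument area.
 */
static size_t layout_stack_args(const size_t *sizes, const size_t *aligns,
                                size_t n, size_t *offsets) {
    size_t offset = 0;
    for (size_t i = 0; i < n; i++) {
        size_t align = aligns[i] < 8 ? 8 : aligns[i];
        offset = round_up(offset, align);
        offsets[i] = offset;
        offset += round_up(sizes[i], 8);
    }
    return offset;
}
```

With sizes {8, 16} and alignments {8, 16} for (a, c) the model puts c at offset 16, and with {8, 8, 16} for (a, b, c) it also puts c at 16, so both functions read c from the same place. Passing an alignment of 8 for c instead models what ignoring the 16-byte requirement does: c lands at offset 8, directly after a.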
Do we also put half in a register and half in memory if we've already used 5 GPRs before we get to the i128?
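For a conforming compiler, the psABI's classification algorithm covers this case: if there are no registers available for any eightbyte of an argument, the whole argument is passed on the stack, and any registers already assigned to it are reverted. The sketch below models only that rule for INTEGER-class arguments; assign_int_args, arg_location, int_regs and ARG_ON_STACK are hypothetical names, and every other argument class is ignored.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_INT_REGS 6
#define ARG_ON_STACK (-1)

/* Integer argument registers of the x86-64 SysV psABI, in assignment order. */
static const char *const int_regs[NUM_INT_REGS] = {
    "RDI", "RSI", "RDX", "RCX", "R8", "R9"};

/*
 * Hypothetical, simplified model of register assignment for INTEGER-class
 * arguments. eightbytes[i] is 1 for a 64-bit integer and 2 for __int128.
 * first_reg[i] receives the index into int_regs of the argument's first
 * eightbyte, or ARG_ON_STACK if the whole argument is passed in memory.
 * An argument is never split: if not all of its eightbytes fit in the
 * remaining registers, all of it goes to the stack. In this sketch a
 * register left over after such a spill stays free for later arguments.
 */
static void assign_int_args(const int *eightbytes, size_t n, int *first_reg) {
    int next = 0; /* index of the next free register */
    for (size_t i = 0; i < n; i++) {
        assert(eightbytes[i] == 1 || eightbytes[i] == 2);
        if (next + eightbytes[i] <= NUM_INT_REGS) {
            first_reg[i] = next;
            next += eightbytes[i];
        } else {
            first_reg[i] = ARG_ON_STACK;
        }
    }
}

/* Human-readable location of an argument's first eightbyte. */
static const char *arg_location(int first_reg) {
    return first_reg == ARG_ON_STACK ? "stack" : int_regs[first_reg];
}
```

For five uint64_t arguments followed by an __int128 ({1, 1, 1, 1, 1, 2}), the model assigns RDI through R8 to the first five and sends the __int128 to the stack as a whole rather than splitting it across R9 and memory. For the first foo signature ({2, 2, 2, 1, 2}) it puts x, y and z in RDI:RSI, RDX:RCX and R8:R9, and both a and c on the stack.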
These bugs are related: https://bugs.llvm.org/show_bug.cgi?id=19909 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92904
This patch might work. I based it on some similar code in AArch64:

diff --git a/llvm/lib/Target/X86/X86CallingConv.td b/llvm/lib/Target/X86/X86CallingConv.td
index 30d05c6..0b07033 100644
--- a/llvm/lib/Target/X86/X86CallingConv.td
+++ b/llvm/lib/Target/X86/X86CallingConv.td
@@ -516,6 +516,14 @@ def CC_X86_64_C : CallingConv<[
 
   // The first 6 integer arguments are passed in integer registers.
   CCIfType<[i32], CCAssignToReg<[EDI, ESI, EDX, ECX, R8D, R9D]>>,
+
+  // i128 is split into two i64s; we can't put just one half in register R9.
+  CCIfType<[i64],
+           CCIfSplit<CCAssignToReg<[RDI, RSI, RDX, RCX, R8]>>>,
+
+  // i128 is split into two i64s, and its stack alignment is 16 bytes.
+  CCIfType<[i64], CCIfSplit<CCAssignToStackWithShadow<8, 16, [R9]>>>,
+
   CCIfType<[i64], CCAssignToReg<[RDI, RSI, RDX, RCX, R8 , R9 ]>>,
 
   // The first 8 MMX vector arguments are passed in XMM registers on Darwin.