```c #include <inttypes.h> struct i128 { uint64_t a; uint64_t b; }; __attribute__((ms_abi, noinline)) struct i128 passthrough_a_s(struct i128 a) { return a; } __attribute__((ms_abi, noinline)) __int128 passthrough_a(__int128 a) { return a; } ``` This code compiles to assembly that looks like this: ```asm passthrough_a_s: mov rax, rcx ret passthrough_a: mov rax, rcx ret ``` As per these two documents: [msdn1]: https://msdn.microsoft.com/en-us/library/zthk2dkh.aspx [msdn2]: https://msdn.microsoft.com/en-us/library/7572ztz4.aspx Both of these are wrong and the generated assembly ought to look like this instead: ```asm passthrough_a_s: mov rax, rcx mov r9, QWORD PTR [rdx] mov r10, QWORD PTR [rdx+8] mov QWORD PTR [rcx], r9 mov QWORD PTR [rcx+8], r10 ret passthrough_a: mov rax, rcx mov r9, QWORD PTR [rdx] mov r10, QWORD PTR [rdx+8] mov QWORD PTR [rcx], r9 mov QWORD PTR [rcx+8], r10 ret ``` The relevant excerpts: > A scalar return value that can fit into 64 bits is returned through RAX—this includes __m64 types. Non-scalar types including floats, doubles, and vector types such as __m128, __m128i, __m128d are returned in XMM0. [snip] To be returned by value in RAX, user-defined types must have a length of 1, 2, 4, 8, 16, 32, or 64 bits. [snip] Otherwise, the caller assumes the responsibility of allocating memory and passing a pointer for the return value as the first argument. > __m128 types, arrays and strings are never passed by immediate value but rather a pointer is passed to memory allocated by the caller. Structs/unions of size 8, 16, 32, or 64 bits and __m64 are passed as if they were integers of the same size. Structs/unions other than these sizes are passed as a pointer to memory allocated by the caller. For these aggregate types passed as a pointer (including __m128), the caller-allocated temporary memory will be 16-byte aligned. Also from https://msdn.microsoft.com/en-us/library/ms235286.aspx: > There is no attempt to spread a single argument across multiple registers.
*** Bug 33834 has been marked as a duplicate of this bug. ***
*** Bug 29143 has been marked as a duplicate of this bug. ***
*** Bug 35814 has been marked as a duplicate of this bug. ***
Issue 35814 was tentatively marked as a release blocker, so let's mark this as such too. I don't know if anyone will have time to look into it though.
(In reply to Hans Wennborg from comment #4) > Issue 35814 was tentatively marked as a release blocker, so let's mark this > as such too. I don't know if anyone will have time to look into it though. Bug 35814 was specifically about clang now asserting after https://reviews.llvm.org/rL316416. It would be nice to fix at least that, somehow? Not sure if reverting r316416 is feasible, though.
I plan to look into this issue. Currently my understanding is that win64cc was never fully implemented (byval returns and arguments are not handled as specified in the convention). The solution seems to be to modify X86ISelLowering to correctly spill/fill byval arguments to addresses passed in RCX, RDX, R8 and R9 instead of stack if this CC is used. With byvals loaded and stored correctly the original problem form Bug 35814 should disappear as the X86CallFrameOptimization will not be removing the stores and loads anymore.
After investigation I reached conclusion that issue is caused by Clang. When clang is lowering function into IR on platform other than Windows standard x86_64 ABI rules are used with disregard to what is specified using __attribute__. This is caused by the fact that only compilation target is checked when when choosing ABI, calling convention of lowered function (which may be overwritten by attribute) is not checked. This caused clang to split 128-bit byval into 64-bit arguments when emiting IR. Even though the information about CC is available to the LLVM code generator there is no indication that the arguments were originally forming single byval argument, so code gen is not able to produce correct binary in this case. I've prepared a patch that is fixing this issue: https://reviews.llvm.org/D43016
There's now a fix landed in https://reviews.llvm.org/rL324594, it at least fixes the assertion from bug 35814, but can the original submitter of this bug also please independently verify that it fixes their issue? We should merge this as soon as it has baked a little in trunk, Hans, do you agree?
(In reply to Dimitry Andric from comment #8) > There's now a fix landed in https://reviews.llvm.org/rL324594, it at least > fixes the assertion from bug 35814, but can the original submitter of this > bug also please independently verify that it fixes their issue? > > We should merge this as soon as it has baked a little in trunk, Hans, do you > agree? Sounds good to me. Merged in r324718.