TLS wrapper functions use calling convention cxx_fast_tlscc, which per langref: """On X86-64 the callee preserves all general purpose registers, except for RDI and RAX.""" When calling a non-dso_local TLS wrapper function on darwin, we'll end up calling into dyld_stub_binder to to resolve the wrapper function. dyld_stub_binder clobbers r11: https://github.com/opensource-apple/dyld/blob/master/src/dyld_stub_binder.s#L203 (Also, the thunks inserted by lld and ld64 do so too, probably since they figure r11 is already overwritten by dyld_stub_binder) So we can't use cxx_fast_tlscc for non-dso_local TLS wrapper functions on darwin. -- That's of course unfortunate since cxx_fast_tlscc removes lots of stack traffic. So maybe in time we could change dyld_stub_binder to not clobber r11...somehow and make the linkers use rax in the stub code, and then use cxx_fast_tlscc if linkers and targeted macOS versions are new enough. But that needs changes to dyld, so someone at apple would have to drive this. -- Here's a standalone repro that shows the bug, but the summary above is really all you need. % cat tlvhost.cc extern thread_local int j; thread_local int j = 0; % clang -O2 tlvhost.cc -std=c++11 -shared -o tlvhost.dylib % cat tlv.cc extern thread_local int j; int f(int a, int b) { int c = a * b; int d = a + b; int e = a / b; int f = a - b; int g = a - 2 * b; int h = a - 3 * b; int i = a - 4 * b; return c / d * e / f + j + g * h + i; } % out/gn/bin/clang -O2 tlv.cc -std=c++11 -shared tlvhost.dylib -o tlv.dylib % cat main.cc #include <iostream> extern int f(int a, int b); int main(int argc, char* argv[]) { printf("%d\n", f(atoi(argv[1]), atoi(argv[2]))); } % clang main.cc tlv.dylib % ./a.out 1 2 -846192167 The output _should_ be 8, and it is 8 if I remove the `+ j +` bit in tlv.cc. (j is a tlv that's 0.) (The repro is with clang built at 9b6c8132d3785269512803ff51cb421f8d8bcf0e and it's dependent on the optimizer. Pasting the asm clang generated for me below, see how the same r11 issue happens there: % out/gn/bin/clang -O2 tlv.cc -std=c++11 -shared tlvhost.dylib -S -o - clang: warning: tlvhost.dylib: 'linker' input unused [-Wunused-command-line-argument] clang: warning: argument unused during compilation: '-shared' [-Wunused-command-line-argument] .section __TEXT,__text,regular,pure_instructions .build_version macos, 10, 15 .globl __Z1fii ## -- Begin function _Z1fii .p2align 4, 0x90 __Z1fii: ## @_Z1fii .cfi_startproc ## %bb.0: ## %entry pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset %rbp, -16 movq %rsp, %rbp .cfi_def_cfa_register %rbp pushq %rbx pushq %rax .cfi_offset %rbx, -24 ## kill: def $esi killed $esi def $rsi movl %edi, %ecx movl %esi, %r10d imull %edi, %r10d leal (%rsi,%rcx), %r9d movl %edi, %eax cltd idivl %esi movl %eax, %r8d subl %esi, %edi movl %ecx, %r11d subl %esi, %r11d subl %esi, %r11d leal (%rsi,%rsi,2), %eax movl %ecx, %ebx subl %eax, %ebx shll $2, %esi movl %r10d, %eax cltd idivl %r9d imull %r8d, %eax cltd idivl %edi movl %eax, %edx # Calling into dyld_stub_binder here: callq __ZTW1j # Using the now-clobbered r11 register right after: imull %r11d, %ebx subl %esi, %ecx addl %ebx, %ecx addl %ecx, %edx addl (%rax), %edx movl %edx, %eax addq $8, %rsp popq %rbx popq %rbp retq .cfi_endproc ## -- End function .subsections_via_symbols )
rjmccall, kledzik: This is arguably a dyld bug. Let me know if you (or anyone in your area) is interested in the long-term fix mentioned after "--" in comment 0.
(Upstream bug where we ran into this: https://bugs.chromium.org/p/chromium/issues/detail?id=1243375#c20)
Can we not just say that the CC doesn't preserve r11 on Apple platforms? LLVM langref doesn't dictate Apple platform ABI.
A'ight: https://reviews.llvm.org/D109112