LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Bug 51703 - References to extern thread_local variables clobber r11 on darwin
Status: NEW
Alias: None
Product: clang
Classification: Unclassified
Component: LLVM Codegen
Version: trunk
Hardware: PC All
Importance: P enhancement
Assignee: Unassigned Clang Bugs
 
Reported: 2021-09-01 07:25 PDT by Nico Weber
Modified: 2021-09-01 16:50 PDT

Description Nico Weber 2021-09-01 07:25:05 PDT
TLS wrapper functions use calling convention cxx_fast_tlscc, which per langref:

"""On X86-64 the callee preserves all general purpose registers, except for RDI and RAX."""

When calling a non-dso_local TLS wrapper function on darwin, we'll end up calling into dyld_stub_binder to resolve the wrapper function.

dyld_stub_binder clobbers r11: https://github.com/opensource-apple/dyld/blob/master/src/dyld_stub_binder.s#L203

(Also, the thunks inserted by lld and ld64 clobber r11 too, probably because they assume dyld_stub_binder has already overwritten it.)

So we can't use cxx_fast_tlscc for non-dso_local TLS wrapper functions on darwin.
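
For readers unfamiliar with TLS wrapper functions, here is a rough C++ sketch of what a reference to an extern thread_local conceptually turns into. The names tls_wrapper_j, read_j and demo are hypothetical stand-ins (the real wrapper is the compiler-generated __ZTW1j), and plain C++ cannot express cxx_fast_tlscc, so this only illustrates the call-then-load shape, not the register-preservation contract.

```cpp
#include <cassert>

// In the real repro this definition lives in another dylib; keeping it in
// the same translation unit is an assumption made so the sketch is
// self-contained.
thread_local int j = 0;

// Hypothetical hand-written equivalent of the compiler-generated wrapper:
// it returns the address of the calling thread's copy of j.
int* tls_wrapper_j() { return &j; }

// What a use of `j` from another translation unit conceptually becomes:
// a call to the wrapper, then a load through the returned pointer.
int read_j() { return *tls_wrapper_j(); }

// Small demonstration: a write through the variable is visible through
// the wrapper on the same thread.
void demo() {
  j = 5;
  assert(read_j() == 5);
}
```

The caller keeps other values live in registers across that wrapper call because the calling convention promises they survive. When the call goes through a lazy-binding stub, that promise is what breaks for r11.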

--

That's of course unfortunate since cxx_fast_tlscc removes lots of stack traffic. So maybe in time we could change dyld_stub_binder to not clobber r11 (somehow), make the linkers use rax in the stub code, and then use cxx_fast_tlscc if the linkers and targeted macOS versions are new enough. But that needs changes to dyld, so someone at Apple would have to drive this.

--

Here's a standalone repro that shows the bug, but the summary above is really all you need.

% cat tlvhost.cc
extern thread_local int j;
thread_local int j = 0;

% clang -O2 tlvhost.cc  -std=c++11 -shared -o tlvhost.dylib

% cat tlv.cc
extern thread_local int j;

int f(int a, int b) {
  int c = a * b;
  int d = a + b;
  int e = a / b;
  int f = a - b;
  int g = a - 2 * b;
  int h = a - 3 * b;
  int i = a - 4 * b;
  return c / d * e / f + j + g * h + i;
}

% out/gn/bin/clang -O2 tlv.cc  -std=c++11 -shared tlvhost.dylib -o tlv.dylib

% cat main.cc
#include <stdio.h>
#include <stdlib.h>

extern int f(int a, int b);

int main(int argc, char* argv[]) {
  printf("%d\n", f(atoi(argv[1]), atoi(argv[2])));
}

% clang main.cc tlv.dylib

% ./a.out 1 2
-846192167

The output _should_ be 8, and it is 8 if I remove the `+ j +` bit in tlv.cc. (j is a thread-local variable whose value is 0.)
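
As a sanity check on the expected value, the arithmetic can be worked through with j defined locally, so no cross-dylib wrapper call is involved. This is only a sketch: f_expected and check_expected are hypothetical names, the local `f` from tlv.cc is renamed to diff for readability, and the per-line values in the comments are for a = 1, b = 2.

```cpp
#include <cassert>

// Stand-in for the thread_local `j` from tlvhost.cc. Defining it in the
// same translation unit is an assumption made for this sketch; it avoids
// the call through the lazily bound TLS wrapper.
thread_local int j = 0;

// Same arithmetic as f() in tlv.cc.
int f_expected(int a, int b) {
  int c = a * b;      // 2
  int d = a + b;      // 3
  int e = a / b;      // 0
  int diff = a - b;   // -1
  int g = a - 2 * b;  // -3
  int h = a - 3 * b;  // -5
  int i = a - 4 * b;  // -7
  // ((2 / 3) * 0) / -1 is 0, j is 0, g * h is 15, i is -7.
  return c / d * e / diff + j + g * h + i;
}

// Confirms the value the repro should print for `./a.out 1 2`.
void check_expected() { assert(f_expected(1, 2) == 8); }
```

The g * h term is the product computed by the `imull %r11d, %ebx` instruction in the assembly, which is why a clobbered r11 changes the result.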




(The repro was built with clang at 9b6c8132d3785269512803ff51cb421f8d8bcf0e, and it depends on the optimizer. Here is the asm clang generated for me; note how the same r11 issue shows up there:

 % out/gn/bin/clang -O2 tlv.cc  -std=c++11 -shared tlvhost.dylib -S -o -
clang: warning: tlvhost.dylib: 'linker' input unused [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-shared' [-Wunused-command-line-argument]
	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 10, 15
	.globl	__Z1fii                         ## -- Begin function _Z1fii
	.p2align	4, 0x90
__Z1fii:                                ## @_Z1fii
	.cfi_startproc
## %bb.0:                               ## %entry
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	pushq	%rbx
	pushq	%rax
	.cfi_offset %rbx, -24
                                        ## kill: def $esi killed $esi def $rsi
	movl	%edi, %ecx
	movl	%esi, %r10d
	imull	%edi, %r10d
	leal	(%rsi,%rcx), %r9d
	movl	%edi, %eax
	cltd
	idivl	%esi
	movl	%eax, %r8d
	subl	%esi, %edi
	movl	%ecx, %r11d
	subl	%esi, %r11d
	subl	%esi, %r11d
	leal	(%rsi,%rsi,2), %eax
	movl	%ecx, %ebx
	subl	%eax, %ebx
	shll	$2, %esi
	movl	%r10d, %eax
	cltd
	idivl	%r9d
	imull	%r8d, %eax
	cltd
	idivl	%edi
	movl	%eax, %edx

        # Calling into dyld_stub_binder here:
	callq	__ZTW1j

        # Using the now-clobbered r11 register right after:
	imull	%r11d, %ebx
	subl	%esi, %ecx
	addl	%ebx, %ecx
	addl	%ecx, %edx
	addl	(%rax), %edx
	movl	%edx, %eax
	addq	$8, %rsp
	popq	%rbx
	popq	%rbp
	retq
	.cfi_endproc
                                        ## -- End function
.subsections_via_symbols
)
Comment 1 Nico Weber 2021-09-01 07:29:20 PDT
rjmccall, kledzik: This is arguably a dyld bug. Let me know if you (or anyone in your area) are interested in the long-term fix mentioned after "--" in comment 0.
Comment 2 Nico Weber 2021-09-01 07:30:32 PDT
(Upstream bug where we ran into this: https://bugs.chromium.org/p/chromium/issues/detail?id=1243375#c20)
Comment 3 John McCall 2021-09-01 13:08:53 PDT
Can we not just say that the CC doesn't preserve r11 on Apple platforms?  LLVM langref doesn't dictate Apple platform ABI.
Comment 4 Nico Weber 2021-09-01 16:50:52 PDT
A'ight: https://reviews.llvm.org/D109112