New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clang no longer bootstraps itself on AArch64 linux #22782
Comments
Chandler, It seems that newer linkers may be able to bootstrap Clang, so that falls back on the issue of deprecating old tools, in this case, the linker. If we need those features, we should at least wait for the next release, and not back-port this into 3.6. cheers, |
It seems that some of the TLS relocations that clang produces right now just aren't supported by ld.bfd, and maybe some of them may also not be supported by ld.gold. My understanding is that the GNU toolchain currently supports only the It seems that the most practical solution for the time being (until linkers available in the most common distributions do support more TLS relocations), is to make sure that clang only produces what ld.bfd & ld.gold support on ELF-based systems. In an attempt to see what gcc actually produces, and how clang produces different code, I've created & run the following short python script: This produces the .s files in attachment. Looking at the main difference between the gcc- and clang-produce .s files, it seems that the main difference is in function f3/f4 accessing gcc produces: clang produces: My understanding is the the sequence produced by clang allows up to 4GiB-sized TLS areas, whereas the code produced by gcc After just having had a very quick glance, it seems that maybe the code generation in clang could be made more in line SDValue TLSModel::Model Model = getTargetMachine().getTLSModel(GA->getGlobal()); SDValue TPOff; SDValue ThreadBase = DAG.getNode(AArch64ISD::THREAD_POINTER, DL, PtrVT); if (Model == TLSModel::LocalExec) {
The relevant regression tests seem to be: |
This seems like more of a GNU bug: these relocations are in the ABI. That said, we could probably work around it while improving LLVM at the same time. Instead of ripping out the existing code, we should make it the switchable option that was intended. Changing the default to match GCC wouldn't be unreasonable either. IA-64 seems to have a similar issue and use "-mtls-size" (not in LLVM, obviously). I was sure there was another platform with the issue too (PPC?), but can't find anything searching now. Maybe I'm just remembering -mpic vs -mPIC. |
FURTHER ANALYSIS: One further reason why thread_local variables don't work on AArch64-linux for "The compiler is not allowed to schedule the sequences below. This restriction is In other words, current linkers are allowed to assume that accesses to TLS Where ... appears in a code sequence the compiler may insert zero or more General Dynamic:
Local Dynamic: Initial Exec:
Local Exec:
At the moment, LLVM schedules the instruction sequences produced, leading to C code:
objdump -rd of the clang-produced object file. The marked lines show that part
ld then misoptimizes/mis-relaxes this code by assuming that always register x0
Bottom line: the specification states that the instruction sequences must not |
I think that scheduling issue is something we should push back extremely firmly about. First, it defeats the entire purpose separate ELF relocations. If they have to be in a block, the specification should have just created an R_AARCH64_HORRIFIC_TLS_HACK covering the 5 instructions (or whatever) and used that. Second, we have a real compiler optimisation this is preventing, in the name of some hypothetical future linker change that there aren't even plans for. One implemented based on real public specifications. Basically, I think that document should never see the light of day. |
Would that help? |
It looks like that covers the extra relocations we emit (though Kristof has more recent experience with exhaustive testing of this). |
(Of course, there's also GAS support if you care about such things). |
Actually, that was too hasty. The linker relaxation appears to be a very real optimisation too, possibly greater than the compiler's in some cases. I think the correct way to word it would be a suggestion though ("compilers are advised to produce this exact sequence to aid linker optimisation; linkers that see this sequence may..."). There's still no excuse for not processing relocations correctly as seen. It's a buggy optimisation plain and simple. It's like us telling programmers to declare their variables volatile because "we've not yet fully determined what optimisations may be possible" and then blaming them when they get bad code for non-volatiles. The linker should bail if the sequence isn't exactly as written (the data-flow is completely self-contained, so this is possible with a local analysis). |
In the linker TLSDESC optimization, linker uses register r0 because the result of the TLSDESC access sequence must be held on r0 (returned by blr). Ideally, if linker is able to put the resulting "movz; movk" instructions in the place of original "blr", then it can accommodate any access sequence. However, for current aarch64 linker, it is almost impossible to squeeze 2 instructions into one slot without changing linker's work flow. Linker can check TLS access sequence before doing TLS relaxation. It does so on GD->LE relaxation. We can add checks on TLSDESC too. However, the bad news is, when linker finds the tls access sequence is not all right, it will fail and exit, reporting an error. It can not go back and do relocation without TLS relaxation. The reason is that linker allocates GOT and PLT spaces at very early stage, where the TLS relaxation decision has to be made. When linker comes to do the real relocation work, it is too late to change the decision since this symbol does not have a space in GOT or PLT at all. Clang generates different TLS access sequences than GCC. Before I was aware of this post, I have seen "adrp; ldr; add; blr", or "adrp; add; adrp; ldr; blr", or "adrp; ldr; adrp; add; blr". Linker can accommodate them well, no matter what register adrp is using. IMHO, both compiler and linker should work together to achieve better performance in a whole. Considering the nature of linker flow, compiler output has to match the TLS access patterns assumed by linker. If there are new patterns that make sense, we can add those patterns into linker. |
I still very firmly disagree. If the linker rejects a valid ELF file because it's got itself in a twist and can't undo that, that's a linker bug. Maybe the least bad solution is for the compiler to accommodate the linker, but let's be under no illusions here: the reason will be that the linker has a broken design that's more difficult to fix than the compiler. Not that it's better for everyone concerned. |
Oh, and if you can't actually fix the linker an error is strongly preferred to a silent codegen failure (rule #0 of compiler development). |
I share Tim's opinion. However, even fixing this bug today in the linker (it seems the fix is in progress), it still won't work for the next distributions to come, that use current linkers. Would it be possible to do the hack in the compiler today (with a very big FIXME), set this bug as a blocker to 3.7 and make sure we can re-enable it in the future when the linker is fixed? If necessary, I can liaise with the Linaro GNU developers to backport that fix onto the next stable binutils, so that distros that use our toolchain could benefit without having to wait several months to get the proper fix. |
Extended Description
When building clang/llvm on aarch64-linux, using a clang compiler, the following error is produced:
/usr/bin/ld: lib/libLLVMSupport.a(PrettyStackTrace.cpp.o): unrecognized relocation (0x20c) in section `.text._ZN4llvm21PrettyStackTraceEntryC2Ev'
/usr/bin/ld: final link failed: Bad value
clang-3.7: error: linker command failed with exit code 1 (use -v to see invocation)
This is because llvm recently started making use of a thread_local variable in PrettyStackTrace.cpp, and the AArch64 backend in llvm produces TLS relocations that are not supported by gnu ld.bfd.
The text was updated successfully, but these errors were encountered: