The LLVM Bytecode format, AsmParser, and Module class need to be extended to support robust target identification and a list of needed shared libraries. Currently we use endianness and pointer size to auto-select a code generator to use with a bytecode file. This is obviously really limited (i.e., we can't distinguish between i386 and i686, or PPC and SparcV8, for example), and not complete enough. Instead, we should allow the front-end to encode a standard GNU-style "target-triple" in the .s file, and propagate it through the compilation and optimization steps. From this target triple, we can robustly identify a code generator or TargetMachine to use. Even when target-triple support is added though, we should still keep the endianness/pointer size bits around, as they are useful for extracting information about unknown targets. Also, if a front-end generates portable code, it should obviously leave the target triple blank, indicating it works with any target. While this is being added, it would also make sense to add support for remembering the shared libraries that a module depends on. Currently when 'gccld' links a program, it statically links in any libraries in LLVM form, then forgets the rest. This requires the "user" to remember which libraries must be used when compiling to an exe file or running with the JIT. To fix this, a module should be able to depend on external "libraries" of code, either in LLVM form or in native form. This would allow us to "dynamically link" libstdc++, for example, to C++ programs. When the JIT starts doing off-line caching and neat stuff like that, it could just load the native code for a library that is already compiled, instead of JIT compiling the whole library every time an app uses it. Though it would be nice to have this before 1.2, it looks unlikely that this will happen. I'm just adding this bug so it doesn't get forgotten. -Chris
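To make the target-triple idea above concrete, here is a minimal sketch of splitting a GNU-style triple into its fields. `TargetTriple` and `parseTriple` are hypothetical names invented for illustration, not LLVM classes, and the sketch assumes the full three-field `arch-vendor-os` form (triples that omit the vendor would need extra handling).

```cpp
#include <string>

// Hypothetical holder for the pieces of a GNU-style triple such as
// "i686-pc-linux-gnu"; this is an illustration, not an LLVM class.
struct TargetTriple {
  std::string arch;    // e.g. "i686", "powerpc", "sparc"
  std::string vendor;  // e.g. "pc", "apple", "sun"
  std::string os;      // e.g. "linux-gnu", "solaris2.8"

  // A blank triple marks portable code that works with any target.
  bool isPortable() const { return arch.empty(); }
};

// Split on the first two '-' characters; everything after the second one
// is kept together as the OS (and optional environment) field.
TargetTriple parseTriple(const std::string &text) {
  TargetTriple t;
  std::string::size_type first = text.find('-');
  if (first == std::string::npos) {
    t.arch = text;  // No separators at all (also covers the blank triple).
    return t;
  }
  t.arch = text.substr(0, first);
  std::string::size_type second = text.find('-', first + 1);
  if (second == std::string::npos) {
    t.vendor = text.substr(first + 1);
    return t;
  }
  t.vendor = text.substr(first + 1, second - first - 1);
  t.os = text.substr(second + 1);
  return t;
}
```

With this, `parseTriple("i386-pc-linux-gnu")` and `parseTriple("i686-pc-linux-gnu")` differ in their `arch` field, which is exactly the distinction that endianness and pointer size alone cannot express.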
Interestingly enough, the GCC people are starting to realize that they need something very similar to the library support described in this PR: http://gcc.gnu.org/ml/gcc/2004-06/msg00116.html -Chris
The "depend on library" feature is great and should be added. However, I have no idea how a compiler would determine the library dependencies based on its input (i.e. the morass of C/C++ header files require what libraries?). The feature can be added to the bytecode/AsmWriter/Module but it's unclear how it gets used from there. As for the target triples, I'm thinking this is a bad idea for bytecode. One of the design goals for bytecode should be target independence. You should be able to move a .bc file to any target LLVM supports and run it and get correct results. This is extremely important to LLVM's design, I believe. If we encode the target triple into a .bc file, what purpose does it serve? To record what platform the .bc file was generated on? Who cares? What is needed is a way to specify a target triple to the code generators to indicate what kind of (native) code they should generate. This should even support cross compilation.
> However, I have no idea how a compiler would determine the library dependencies based on its input (i.e. the morass of C/C++ header files require what libraries?).
In MSIL, each external function specifies which library it comes from. In C land, this information is presented to the linker. The idea is to remember it after gccld runs. Also, consider if you link a library X... we want to remember all of the Y & Z libraries that X depends on, so when we link X to an application, we also know about Y and Z.
> The feature can be added to the bytecode/AsmWriter/Module but it's unclear how it gets used from there.
It gets used by the JIT (to dlopen native .so's), and by the mythical magical compiler driver, to link the output of llc.
> As for the target triples, I'm thinking this is a bad idea for bytecode. One of the design goals for bytecode should be target independence.
One of the nice things about LLVM bytecode is that if the source language is target-independent, so is the LLVM bytecode. However, C and C++ are not, and there is no way to guarantee target-independent bytecode. For example, int X[sizeof(void*)]; cannot be compiled to something that is target independent.
> If we encode the target triple into a bc file, what purpose does it serve?
There are a couple of things, but the most important is the ability to pick target machines, and the ability to support target-specific features like calling conventions (fastcall, cdecl, thiscall, pascal, -mregparm, etc). We cannot replace GCC unless we can operate as a great target-specific compiler as well as a target-independent compiler.
-Chris
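The "remember Y and Z when linking X" point can be sketched roughly as follows. `ModuleSketch`, `addDependentLib`, and `linkInto` are hypothetical names made up for this illustration; they only model the dependency list, not the real Module class or linker.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a module; only the dependency list is modeled.
struct ModuleSketch {
  std::string name;
  std::vector<std::string> dependentLibs;  // e.g. {"stdc++", "m"}
};

// Record a library once, preserving first-seen order.
void addDependentLib(ModuleSketch &m, const std::string &lib) {
  if (std::find(m.dependentLibs.begin(), m.dependentLibs.end(), lib) ==
      m.dependentLibs.end())
    m.dependentLibs.push_back(lib);
}

// Linking src into dest carries src's dependencies along, so they are
// still known after the link step instead of being forgotten.
void linkInto(ModuleSketch &dest, const ModuleSketch &src) {
  for (const std::string &lib : src.dependentLibs)
    addDependentLib(dest, lib);
}
```

After `linkInto(app, X)`, the application's list contains X's libraries as well, so a JIT or compiler driver could consult a single list rather than relying on the user to remember them.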
Mine
Fixed.
Erm, not quite fixed. The Bytecode, AsmWriter, and AsmParser parts of this bug are done and tested. What remains is the portion that actually uses the information in the Linker and JIT. I will leave this to others more knowledgeable about that code.
The C/C++ front-end is now producing shared library and target triple info.
All the linking code has been consolidated into lib/Linker and all three linkers now use this library. Furthermore, the dependent libraries feature is now being used by lib/Linker to automatically resolve dependent libraries. This bug is 1/2 complete. The target-triple support still needs to be added.
Scheduled for 1.5
The linker now handles TT support: http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20041206/022070.html What is left to close this bug? -Chris
Methinks to close this bug, targets need to be selected based on pattern-matching the Module's target triple.
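One possible shape for that pattern matching, as a rough sketch only: the function names and the returned backend names ("x86", "powerpc", "sparc") are assumptions chosen for illustration and do not describe how the actual targets register or match themselves.

```cpp
#include <string>

// Return true if `text` begins with `prefix`.
static bool startsWith(const std::string &text, const std::string &prefix) {
  return text.compare(0, prefix.size(), prefix) == 0;
}

// Match "i386", "i486", "i586", "i686": an 'i', one digit 3-6, then "86".
static bool isX86Arch(const std::string &arch) {
  return arch.size() == 4 && arch[0] == 'i' && arch[1] >= '3' &&
         arch[1] <= '6' && arch.compare(2, 2, "86") == 0;
}

// Hypothetical mapping from a triple to a backend name; "" means no match.
std::string targetForTriple(const std::string &triple) {
  // The architecture is everything before the first '-'.
  std::string arch = triple.substr(0, triple.find('-'));
  if (isX86Arch(arch)) return "x86";
  if (startsWith(arch, "powerpc")) return "powerpc";
  if (startsWith(arch, "sparc")) return "sparc";
  return "";
}
```

An empty result would leave the decision to whatever fallback applies, for example the endianness and pointer-size bits mentioned in the original report.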
Not only that, but code generation needs to take into account sub-targets. Re-read the initial posting on this bug. We should be able to generate code that is suitable for a 386 on up. Same thing with variants of PowerPC and Sparc. I don't think we have sub-target support yet. Providing the target-triple is just the tip of the iceberg from my perspective. Perhaps the sub-target support and *using* the target-triple is another task. There's something that still bothers me about all this. While we need to support front ends that are machine specific (e.g. C) and the target-triple seems to do that, why should that affect the way code is generated? I.e. don't we really need two things here? One is the target-triple and the other is the actual machine for which code should be generated?
> Not only that, but code generation needs to take into account sub-targets.
No, that is a separate issue. This bug is just about getting the information into LLVM so we CAN do that.
> There's something that still bothers me about all this. While we need to support front ends that are machine specific (e.g. C) and the target-triple seems to do that, why should that affect the way code is generated?
This information is really only used for one thing: making cross compilers transparent. The heuristic for target selection is:
1. If -march is specified, use it.
2. If a target triple is specified, use it.
3. Otherwise, pick the appropriate target based on the host.
-Chris
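The three-step precedence above could be written down as the following sketch. `TargetChoice` and `chooseTarget` are hypothetical names, and the inputs are plain strings standing in for the -march option, the module's target triple, and the host description; an empty string means "not specified".

```cpp
#include <string>

// Hypothetical result: the chosen description and where it came from.
struct TargetChoice {
  std::string value;
  std::string source;  // "march", "triple", or "host"
};

// Precedence: explicit -march first, then the module's target triple,
// and finally the host as the fallback.
TargetChoice chooseTarget(const std::string &march,
                          const std::string &moduleTriple,
                          const std::string &hostTriple) {
  if (!march.empty()) return {march, "march"};
  if (!moduleTriple.empty()) return {moduleTriple, "triple"};
  return {hostTriple, "host"};
}
```

Under this ordering, a module with a blank triple (portable code) falls through to the host, while an explicit -march always overrides whatever the module recorded.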
Fixed. The last piece of this was making targets autoselect themselves based on the target triple. This is implemented here: http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20041206/022136.html .. http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20041206/022139.html -Chris