We want llvmc to be more configurable by including target-specific as well as language-specific configuration information. Additionally, the configuration data needs to be compiled in by default so we're not reading a whole bunch of text files on every llvmc invocation. That said, the text config files should be retained, and compiled configuration information should also be loadable dynamically. Consequently, the following enhancements to llvmc's configuration are requested:

1. Enhance llvmc to handle target-specific configuration data as well. Target triples should be used to identify the platform/machine. The configuration data for targets would specify things like native linker and assembler command line invocation options for a variety of tasks.
2. Create a set of C++ classes to represent the configuration of the driver. These classes must provide constructors that take strings or enums so that it is possible to write a highly readable configuration file in C++.
3. Compile in a set of instances of the C++ configuration classes (#2) as the standard configuration that supports known languages and targets.
4. Allow dynamic libraries of those configuration classes to be loaded at runtime via a command line option that specifies either a file or a directory full of files. This allows unsupported or experimental targets and languages to have their own compiled configuration module to go along with the target's or language's extension to LLVM.
5. Allow a textual representation of the configurations to be parsed, converting the text into corresponding C++ object instances that are merged with the other configuration data provided. The text configuration file is specified with a command line argument.
6. Allow an object representation of configuration data to print itself in the textual format. Use this to permit reproduction of llvmc's configuration data in a human-readable format.
7. Consider making the entire configuration mechanism a separate library or a support utility rather than placing it in llvmc directly.
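A minimal sketch of items 2 and 6, assuming a hypothetical `Config` class whose constructor takes an enum and a string, with chained `add()` calls for each phase's command and a `print()` method that writes the object back out as text. The `Phase` values, the `%in%`/`%out%` placeholders, and the text syntax are all illustrative assumptions, not llvmc's actual format.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical compilation phases a configuration entry can describe.
enum class Phase { Preprocess, Translate, Optimize, Assemble, Link };

// Keyword used for each phase in the (hypothetical) textual format.
inline const char* phaseName(Phase p) {
  switch (p) {
    case Phase::Preprocess: return "preprocess";
    case Phase::Translate:  return "translate";
    case Phase::Optimize:   return "optimize";
    case Phase::Assemble:   return "assemble";
    case Phase::Link:       return "link";
  }
  return "unknown";
}

// One tool invocation: the phase it implements and its command line template.
struct Action {
  Phase phase;
  std::string command;  // %in% / %out% are illustrative placeholders
  Action(Phase p, std::string cmd) : phase(p), command(std::move(cmd)) {}
};

// Configuration for one language or one target triple.
class Config {
 public:
  enum class Kind { Language, Target };

  Config(Kind k, std::string name) : kind_(k), name_(std::move(name)) {}

  // Returns *this so entries can be chained into a readable declaration.
  Config& add(Phase p, std::string cmd) {
    assert(!cmd.empty() && "a phase needs a command");
    actions_.emplace_back(p, std::move(cmd));
    return *this;
  }

  // Item 6: write the object in a human-readable textual form.
  void print(std::ostream& os) const {
    os << (kind_ == Kind::Language ? "language " : "target ") << name_ << " {\n";
    for (const Action& a : actions_)
      os << "  " << phaseName(a.phase) << " = \"" << a.command << "\"\n";
    os << "}\n";
  }

  // Convenience wrapper returning the printed text as a string.
  std::string text() const {
    std::ostringstream os;
    print(os);
    return os.str();
  }

 private:
  Kind kind_;
  std::string name_;
  std::vector<Action> actions_;
};
```

A target entry would then read as `Config x86(Config::Kind::Target, "i686-pc-linux-gnu"); x86.add(Phase::Assemble, "as %in% -o %out%");`, and `x86.text()` returns the same entry in the textual form that a parser for item 5 could read back.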
Here's a radical idea: use TableGen. TableGen's facilities could be used to generate code that can be compiled and linked with a small "configuration library" to produce a loadable module that provides the configuration information for a particular language. Users could either create those modules separately and ship them with their front end, or the TableGen input could be given to llvmc on the command line, in which case llvmc would compile it (with TableGen and gcc), link it (perhaps with mklib), and then load it, saving the linked module for the next invocation. This would give us a powerful configuration language plus the speed of compiled code for the configuration information. Standard languages would be provided with llvmc and compiled directly into it.
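The "small configuration library" could be as simple as a process-wide registry that generated code fills in from static constructors; the same pattern works whether the module is compiled into llvmc or loaded at runtime, since a shared library's static constructors typically run when it is loaded. The sketch below assumes that approach. `ConfigRegistry`, `RegisterLanguage`, `LanguageInfo`, and the emitted `registerC` entry are hypothetical names, and the TableGen backend that would emit such entries is not shown.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-language record that a TableGen backend might emit.
struct LanguageInfo {
  std::string name;
  std::vector<std::string> extensions;  // source file suffixes handled by this language
  std::string frontEnd;                 // command used to compile to bitcode (illustrative)
};

// Hypothetical "configuration library": a process-wide registry keyed by suffix.
class ConfigRegistry {
 public:
  static ConfigRegistry& instance() {
    static ConfigRegistry registry;  // built on first use, avoiding init-order issues
    return registry;
  }
  void add(const LanguageInfo& info) {
    for (const std::string& ext : info.extensions) byExtension_[ext] = info;
  }
  const LanguageInfo* lookup(const std::string& ext) const {
    auto it = byExtension_.find(ext);
    return it == byExtension_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::string, LanguageInfo> byExtension_;
};

// Generated code instantiates one of these at namespace scope; its constructor
// runs during static initialization of the module and registers the entry.
struct RegisterLanguage {
  explicit RegisterLanguage(const LanguageInfo& info) {
    assert(!info.name.empty() && "a language needs a name");
    ConfigRegistry::instance().add(info);
  }
};

// What generated output for one language might look like (hypothetical).
static RegisterLanguage registerC({"c", {".c"}, "llvm-gcc -emit-llvm -c"});
```

A driver would then map an input file's suffix to its language with `ConfigRegistry::instance().lookup(".c")`. Keeping the registry in a function-local static means registrars in any module can safely use it, regardless of static initialization order.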
Some brief thoughts:

1. The different intermediate steps, and the tools that go from one step to the next, should be configurable. They should not have to be based on file extension.
2. It would be nice to make llvmc able to replace libtool in the simpler modes, e.g. have a "create dso" mode, a "compile .c to .o" mode, etc.
3. I would like bugpoint/toolrunner to turn into a small wrapper around llvmc.

The basic way I see implementing this is to treat it as a graph. The nodes in the graph are the different intermediate points (e.g. a .c file, a .s file, a .o file, a .so file, an executable). The edges in the graph are commands that get from one step to another: run cc1 to go from .c to .s, run llc to go from .bc to .s, run llc -file-type=obj to go from .bc to .o, run as to go from .s to .o, etc. The edges of this graph should be annotated with target triple regexes, so that a different assembler could be used for x86 than for Itanium.

The edges in this graph are often redundant: e.g. if we have a Mach-O writer, you can get from .bc to .o with llc directly or by going through the system as. llvmc should be concerned with determining the shortest path through this graph and running the appropriate tools. As in its current design, it should not be concerned with maintaining backwards compatibility with existing tools' command line options: sanity is more important.

-Chris
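The shortest-path idea can be sketched as a breadth-first search over file kinds that skips any edge whose target-triple regex does not match the current triple. This assumes every tool invocation costs the same, so "shortest" means fewest steps; `CompilationGraph`, `Edge`, and the tool strings are hypothetical names for illustration.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <regex>
#include <string>
#include <vector>

// One edge: a tool that turns one file kind into another, usable only when
// the target triple fully matches `triplePattern` (ECMAScript regex syntax).
struct Edge {
  std::string from;           // e.g. ".bc"
  std::string to;             // e.g. ".s"
  std::string tool;           // command to run (illustrative)
  std::string triplePattern;  // e.g. "x86_64-.*"
};

class CompilationGraph {
 public:
  void addEdge(const Edge& e) {
    assert(!e.from.empty() && !e.to.empty() && "edges need both endpoints");
    edges_[e.from].push_back(e);
  }

  // BFS from `start`; returns the tools along a fewest-steps path to `goal`,
  // or an empty vector if there is no path (or start == goal).
  std::vector<std::string> shortestPath(const std::string& start,
                                        const std::string& goal,
                                        const std::string& triple) const {
    std::map<std::string, const Edge*> via;  // node -> edge first used to reach it
    std::queue<std::string> work;
    via[start] = nullptr;
    work.push(start);
    while (!work.empty()) {
      std::string node = work.front();
      work.pop();
      if (node == goal) break;
      auto it = edges_.find(node);
      if (it == edges_.end()) continue;
      for (const Edge& e : it->second) {
        if (via.count(e.to)) continue;  // already reached in no more steps
        if (!std::regex_match(triple, std::regex(e.triplePattern))) continue;
        via[e.to] = &e;
        work.push(e.to);
      }
    }
    std::vector<std::string> tools;
    if (!via.count(goal)) return tools;
    // Walk the recorded edges back from goal to start, prepending each tool.
    for (const Edge* e = via.at(goal); e != nullptr; e = via.at(e->from))
      tools.insert(tools.begin(), e->tool);
    return tools;
  }

 private:
  std::map<std::string, std::vector<Edge>> edges_;
};
```

With edges `.bc -> .s` (llc), `.s -> .o` (as), and a direct `.bc -> .o` edge restricted to `x86_64-.*`, an x86-64 triple gets the single direct step, while any other triple falls back to llc followed by as. If tools should be weighted (for example by expected cost), Dijkstra's algorithm would replace the BFS.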
Another crazy idea: use SCons :) Surely we cannot use it directly, since it's big, complex and has many dependencies. But we can "steal" some good ideas from it. Actually, this approach is essentially a more implementation-oriented version of Chris's ideas. We can have 4 basic objects:

1. Node
2. Builder
3. Action
4. Driver

Each object's meaning is straightforward (modulo implementation details). We've discussed this approach briefly with Roman Samoilov, and he is going to prepare a draft proposal next week. I don't think we'll end up with something usable for 2.0, but something good can be done for the next versions.
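One possible reading of the four objects, sketched under assumptions: a `Node` is a file of some kind, an `Action` knows how to produce a command line, a `Builder` pairs an input kind and an output kind with an `Action`, and the `Driver` chains builders until the requested kind is reached. All names and the extension-swapping rule are hypothetical, and this greedy driver simply takes the first matching builder rather than searching for a shortest path.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Node: a file the driver knows about, identified by path and kind.
struct Node {
  std::string path;
  std::string kind;  // e.g. "c", "bc", "o"
};

// Action: the work to perform. Here it only renders a command line instead
// of running it, so the sketch stays side-effect free.
struct Action {
  std::string tool;
  std::string render(const Node& in, const Node& out) const {
    return tool + " " + in.path + " -o " + out.path;
  }
};

// Builder: which node kind it consumes and produces, and the Action to use.
struct Builder {
  std::string fromKind;
  std::string toKind;
  Action action;

  bool accepts(const Node& n) const { return n.kind == fromKind; }

  // Derive the output node by swapping the file extension (illustrative rule).
  Node outputFor(const Node& in) const {
    assert(accepts(in) && "builder applied to the wrong node kind");
    std::string stem = in.path.substr(0, in.path.rfind('.'));
    return Node{stem + "." + toKind, toKind};
  }
};

// Driver: repeatedly applies the first builder that accepts the current node
// until the goal kind is reached, collecting the commands it would run.
class Driver {
 public:
  void addBuilder(Builder b) { builders_.push_back(std::move(b)); }

  std::vector<std::string> plan(Node current, const std::string& goalKind) const {
    std::vector<std::string> commands;
    while (current.kind != goalKind) {
      // Guard against cycles: a useful plan never needs more steps than builders.
      if (commands.size() > builders_.size()) return {};
      const Builder* chosen = nullptr;
      for (const Builder& b : builders_)
        if (b.accepts(current)) { chosen = &b; break; }
      if (!chosen) return {};  // no builder can make progress
      Node next = chosen->outputFor(current);
      commands.push_back(chosen->action.render(current, next));
      current = next;
    }
    return commands;
  }

 private:
  std::vector<Builder> builders_;
};
```

Registering builders for `c -> bc`, `bc -> s`, and `s -> o` and planning `hello.c` to kind `o` yields three commands, ending with `as hello.s -o hello.o`.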
Anton, are you working on this (PR686)? If so, please assign it to yourself and let us know whether this is going to make it into 2.0. If not, could you let me know where you left off? Thanks, Reid
Reid, this definitely won't be ready before 2.0. Currently Roman is working on a description of the "core" classes and a first proof of concept. I'll present the first draft for discussion as soon as it is ready.
*** Bug 1732 has been marked as a duplicate of this bug. ***
I really think this can be closed now. Any objections? :)
go for it
The basic ideas listed here have actually been implemented, and the overall design looks pretty promising.