At the moment, the tools link against libLLVM*.a and pull in only the symbols they need. The problem comes when loading .so files that also depend on symbols from libLLVM*.a. If the .so is not linked with the .a then it will be missing symbols. The alternative, linking to the .a, doesn't quite work either. You will get the RegisterAnalysis or RegisterOpt in duplicate, triggering various errors:

$ analyze -load=./hypothesis.so -hypothesis hypo1.l
analyze: CommandLine Error: Argument 'track-memory' defined more than once!
analyze: CommandLine Error: Argument 'info-output-file' defined more than once!
analyze: CommandLine Error: Argument 'stats' defined more than once!
analyze: CommandLine Error: Argument 'debug' defined more than once!
analyze: CommandLine Error: Argument 'debug-only' defined more than once!
analyze: CommandLine Error: Argument 'help' defined more than once!
analyze: CommandLine Error: Argument 'help-hidden' defined more than once!
analyze: CommandLine Error: Argument 'version' defined more than once!
analyze: Pass.cpp:340: void llvm::RegisterPassBase::registerPass(): Assertion `PassInfoMap->find(PIObj.getTypeInfo()) == PassInfoMap->end() && "Pass already registered!"' failed.

The solution is probably to move from .a's to .so's. The other solutions include making the .a's more fine-grained and changing the modules to explicitly allow multiple entries.
To me, the right solution to this is probably to get rid of the last circular library dependency, then switch to .o files for opt/analyze/bugpoint. However, that gets tricky too, as we go back to "linking multiple versions of the same library". What symbols are missing? Maybe we just need to add them to things like "Transforms/LinkAllPasses.h"? In general, .a files without cyclic dependencies are the best way to distribute libraries... (IMHO) -Chris
Here's what I see for potential solutions:

1. Incorporate all modules from an archive library into any LLVM tool that needs any symbol from that archive.
   Advantage: no link problems
   Disadvantage: bloated executables
2. Break up certain .a libraries into smaller pieces so the pieces conform to just what a given tool needs.
   Advantage: no link problems (or fewer, anyway)
   Disadvantage: major rework of LLVM directories or makefiles
3. Build shared objects (DLLs) that contain all the code from logical groupings of functionality. Make both user and LLVM tools link against these shared libraries. We could do this on a case-by-case basis. Tools such as llvm-as and llvm-dis which don't support -load might want to statically link everything so there is no runtime linking from a .so. Any app that does have -load would link against the shared library.
   Advantage: no link problems, saves memory when multiple LLVM programs are running, resolves all the cyclic dependency problems as there are fewer things to link against.
   Disadvantage: slower loading times.

In all these choices, we should do at least three things in conjunction with any of them:

1. Resolve circular dependencies so that there is a simple tree of dependencies.
2. Strictly control what's public and what's not. There might be some things exposed as external symbols that should not be.
3. Provide as much choice and as few roadblocks as possible for the end user.

Reid.
#1 is basically "use relinked .o's for everything". As you say, badness...

#2 is impossible in general, because there are more llvm tools in the world than just those in mainline CVS, and no single person probably knows about all of them. :)

#3, as you mention, will lead to REALLY slow load times, on the order of our current link times. I don't think this is acceptable.

However, note that this problem is just for tools that have -load options. Things like gccas/gccld don't have this problem, and using minimal .a files is *really good* for them. As such, I propose #4:

4. Make the headers which are designed to suck in all symbols related to opts/analyses do their job. Continue to use these headers for opt/analyze/bugpoint.

-Chris
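For readers unfamiliar with how a header can "suck in" symbols, here is a minimal sketch of the general trick, not the actual contents of any LLVM header. It assumes a static object whose constructor references each factory function behind a condition that is never true at run time but that the compiler cannot fold away. The namespace, the two pass types, the factory functions and the counter are all hypothetical stand-ins. In this single-file sketch everything lives in one translation unit, so it only shows the shape of the idiom; the real effect depends on the factories living in separate archive members.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

namespace forcelink_demo {

// Hypothetical passes; in a real tree these would be defined in
// separate object files inside a static archive.
struct Pass { virtual ~Pass() = default; };
struct DeadCodePass : Pass {};
struct InlinerPass : Pass {};

inline Pass *createDeadCodePass() { return new DeadCodePass(); }
inline Pass *createInlinerPass() { return new InlinerPass(); }

// Counts how often the "impossible" branch below was taken (expected: never).
inline int &forceLinkCalls() {
  static int Count = 0;
  return Count;
}

struct ForceLinking {
  ForceLinking() {
    // getenv never returns the all-ones pointer, so the constructor returns
    // here at run time. The compiler cannot prove that, so the references
    // below stay in the object file and the linker must resolve them.
    if (std::getenv("bar") !=
        reinterpret_cast<char *>(static_cast<std::intptr_t>(-1)))
      return;
    ++forceLinkCalls();
    delete createDeadCodePass();
    delete createInlinerPass();
  }
};

// One instance per translation unit that includes this "header".
static ForceLinking ForceLinkingInstance;

// Usage check: the forcing code is referenced but never executed.
inline void example() { assert(forceLinkCalls() == 0); }

} // namespace forcelink_demo
```

A tool that includes such a header ends up with every referenced factory linked in, which is why the approach suits opt/analyze/bugpoint but would bloat tools that benefit from minimal .a linking.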
*** Bug 800 has been marked as a duplicate of this bug. ***
Mine.
Here are some initial commits in resolution of this bug:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060605/035424.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060605/035425.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060605/035426.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060605/035427.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060605/035428.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060605/035429.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060605/035432.html
This bug is resolved with all the patches between: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060605/035435.html and http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060605/035448.html
Nick, although I'm fairly confident this fix will work (because I checked linker maps to determine what was included in the executable), could you please verify that it fixes your situation and mark this bug verified if it does? Thanks, Reid.
Unfortunately, resolving the libLLVMCore references didn't solve the whole problem. There are now unresolved symbols for lib/Support and lib/System. The solution is to enhance lib/Support, lib/System, their corresponding header files and the LinkAllVMCore.h header file to ensure that all of lib/Support, lib/System, and lib/Support/bzip2 are incorporated. Then, a particular shared lib loadable by analyze or opt should not link with any of those libraries nor VMCore but let the symbols be resolved at runtime from the symbols in the executable. Maybe we should revisit the shared library idea?
Shared libraries are a cop-out. The major issue here is that we don't have a well-defined interface that plugins are allowed to use. What parts of libsupport aren't getting pulled in right?
SlowOperationInformer for one.
Created attachment 343: add more files to LinkAllVMCore

This patch adds the minimum files necessary to make my program load. However, it pulls in too much as I get an error with duplicate registered passes in the pass manager.
I'm really not sure what to do about this situation. We can now selectively link what's needed, but that's only 1/2 the problem. The registrations of passes/etc. should probably be separated into separate files, or possibly duplicate registration should just be ignored. Any ideas?
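To make the "ignore duplicate registration" option concrete, here is a minimal sketch of a registry that tolerates a second registration of the same name instead of asserting. The class, its methods and the pass name are hypothetical and are not LLVM's PassInfoMap or command-line machinery; the sketch only assumes that entries can be keyed by a name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

namespace registry_demo {

class PassRegistry {
  // Pass name -> number of registration attempts seen so far.
  std::map<std::string, int> Entries;

public:
  // Returns true for a first registration, false when a duplicate
  // was seen and silently ignored.
  bool registerPass(const std::string &Name) {
    auto Result = Entries.emplace(Name, 0);
    ++Result.first->second;
    return Result.second;
  }

  std::size_t size() const { return Entries.size(); }

  int attempts(const std::string &Name) const {
    auto It = Entries.find(Name);
    return It == Entries.end() ? 0 : It->second;
  }
};

// Usage: the executable and a loaded module both register "hypothesis".
inline void example() {
  PassRegistry R;
  bool First = R.registerPass("hypothesis");  // copy in the executable
  bool Second = R.registerPass("hypothesis"); // copy linked into the .so
  assert(First && !Second);
  assert(R.size() == 1);
  (void)First;
  (void)Second;
}

} // namespace registry_demo
```

The trade-off is that silently ignoring a duplicate can hide a genuine name clash between two unrelated passes, which is presumably why the existing code asserts.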
Nick, your patch has been committed, more or less. I had to work around a problem that caused a dependency cycle between lib/Support and lib/System. We're trying to eliminate dependency cycles from LLVM. This still isn't finished. The rest of lib/System needs to be included, as well as more of lib/Support.
All of lib/System is now available to loaded modules. The programs that #include LinkAllVMCore.h will automatically link all of lib/System. This was the easy part. What's a little more difficult is whether all of lib/Support should also be included in such programs. My inclination is to just do that, although it will (slightly) increase the size of the programs. These patches were applied: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060724/036175.html through http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060724/036185.html
Some ideas from IRC on linking whole archives, which would eliminate the "IncludeFile" bloat:

* darwin ld supports -all_load to load all members of .a file.
* gcc supports -u <symbol> to make <symbol> undefined
* gnu ld supports --whole-archive
* solaris ld supports "-z allextract"
Using the options to include the whole archive has several problems:

1. libtool rearranges the order of --whole-archive options so that they are ineffectual (unsolvable with reasonable effort). It always places these options before the libraries. If you use just --whole-archive you get a bunch of multiply defined symbols because "whole archive" applies to libraries such as -lpthread, -lltdl, -lm, and -lelf, never mind libgcc and libstdc++.
2. Linking directly with g++ and not with libtool makes the -rpath option fail the build (solvable, but not portably).
3. Linking directly with g++ increases the number of link errors. When linking analyze there are 348 multiple definitions and 46 unresolved symbols.

Not sure this approach is workable.
Another way to resolve this is to go back to generating both .o and .a files for certain libraries. For the tools that matter, we would link with the .o file (to get the entire content of that library). For others, they can use the .a to selectively link. Although this slows down our build a bit and makes LLVM's footprint larger, it's actually looking like a viable/reasonable alternative at this point.
I currently have a question in to the libtool email list about the correct usage of libtool to get "whole archive" behavior. There have been some responses, I'm waiting for more. You can read the discussion starting here: http://lists.gnu.org/archive/html/libtool/2006-08/msg00007.html Apparently, building a libtool "convenience archive" might give us what we want, but I'm unclear on the usage so further answers are needed. Reid.
Very cool. Thanks for following up on this Reid! -Chris
Okay, some help from the libtool email list has given us a solution that I've tested. The problem was that when you specify LOADABLE_MODULE in the makefile, it prevented the libraries from being linked into the module. I'm not quite sure how, but libtool seems to prevent duplicate symbol definitions if you do link the libraries in. So, the solution is to use:

LOADABLE_MODULE := 1
DONT_BUILD_RELINKED := 1
SHARED_LIBRARY := 1
LINK_LIBS_IN_SHARED := 1
USEDLIBS := ...

This will cause the "USEDLIBS" to actually be linked into the module (shared object) which will avoid all "undefined XYZ" messages. It makes the modules larger, but they shouldn't be using tons of stuff out of LLVM anyway. To make this easier to digest, we're going to make the only requirement the specification of "LOADABLE_MODULE". That will cause SHARED_LIBRARY, LINK_LIBS_IN_SHARED and DONT_BUILD_RELINKED all to be turned on.

Now, there are only two problems left:

* duplicate registration of command line options
* duplicate registration of LLVM passes
Duplicate registration of command line options and passes occurs when the loadable module has linked in something from LLVM that is already linked into the executable in which it is loaded. In general, loadable modules should not link in anything from LLVM, just use things from LLVM.
This has been fixed by Devang's change of the passmgr from using RTTI to using an explicit ID symbol for each pass.
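The change itself is not shown in this thread. As a rough sketch of the idea only, the following assumes each pass class carries a static ID member and that the registry is keyed by the address of that member rather than by std::type_info. The registry class, the PassInfo struct and both pass names are hypothetical and should not be read as the actual pass manager code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

namespace passid_demo {

// A pass is identified by the address of its ID symbol, not by RTTI.
using PassID = const void *;

struct PassInfo {
  std::string Name;
};

class Registry {
  std::map<PassID, PassInfo> Map;

public:
  // Returns false if this exact ID address is already registered.
  bool add(PassID ID, std::string Name) {
    return Map.emplace(ID, PassInfo{std::move(Name)}).second;
  }

  const PassInfo *lookup(PassID ID) const {
    auto It = Map.find(ID);
    return It == Map.end() ? nullptr : &It->second;
  }
};

// Hypothetical passes; only the address of ID matters, never its value.
struct HypothesisPass { static char ID; };
struct StatsPass { static char ID; };
char HypothesisPass::ID = 0;
char StatsPass::ID = 0;

// Usage: each pass is registered and found through its own ID address.
inline void example() {
  Registry R;
  bool A = R.add(&HypothesisPass::ID, "hypothesis");
  bool B = R.add(&StatsPass::ID, "stats");
  assert(A && B);
  assert(R.lookup(&HypothesisPass::ID)->Name == "hypothesis");
  assert(R.lookup(&StatsPass::ID)->Name == "stats");
  (void)A;
  (void)B;
}

} // namespace passid_demo
```

Because identity is an ordinary symbol address in this scheme, which copy of a pass gets registered is decided by normal symbol resolution between the executable and the loaded module rather than by comparing type_info objects.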