780 – tools only pull in some symbols from library archive

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Summary: tools only pull in some symbols from library archive

Status:	RESOLVED FIXED

Alias:	None

Product:	tools
Classification:	Unclassified
Component:	analyze
Version:	1.0
Hardware:	All All

Importance:	P normal
Assignee:	Reid Spencer

URL:
Keywords:

Duplicates (1):	800
Depends on:
Blocks:

Reported:	2006-05-15 17:56 PDT by Nick Lewycky
Modified:	2008-03-30 12:55 PDT
CC List:	3 users

See Also:
Fixed By Commit(s):

Attachments
add more files to LinkAllVMCore (7.98 KB, patch) 2006-06-10 13:46 PDT, Nick Lewycky

Description Nick Lewycky 2006-05-15 17:56:45 PDT

At the moment, the tools link against libLLVM*.a and pull in only the symbols
they need. The problem comes when loading .so files that also depend on symbols
from libLLVM*.a. If the .so is not linked with the .a then it will be missing
symbols.

The alternative, linking the .so against the .a, doesn't quite work either. You
get the RegisterAnalysis or RegisterOpt objects in duplicate, triggering various
errors:

$ analyze -load=./hypothesis.so -hypothesis hypo1.l
analyze: CommandLine Error: Argument 'track-memory' defined more than once!
analyze: CommandLine Error: Argument 'info-output-file' defined more than once!
analyze: CommandLine Error: Argument 'stats' defined more than once!
analyze: CommandLine Error: Argument 'debug' defined more than once!
analyze: CommandLine Error: Argument 'debug-only' defined more than once!
analyze: CommandLine Error: Argument 'help' defined more than once!
analyze: CommandLine Error: Argument 'help-hidden' defined more than once!
analyze: CommandLine Error: Argument 'version' defined more than once!
analyze: Pass.cpp:340: void llvm::RegisterPassBase::registerPass(): Assertion
`PassInfoMap->find(PIObj.getTypeInfo()) == PassInfoMap->end() && "Pass already
registered!"' failed.

The solution is probably to move from .a's to .so's. The other solutions include
making the .a's more fine-grained and changing the modules to explicitly allow
multiple entries.
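To make the failure mode concrete, here is a minimal C++ sketch of static self-registration, assuming a simplified name-keyed registry. The names `RegisterOption`, `optionRegistry` and `registrationCount` are hypothetical and are not LLVM's actual `cl::opt` or `RegisterPass` code. Two static objects with the same name stand in for one copy of an archive member linked into the tool and a second copy linked into the loaded .so; both constructors run in the same process, so the second registration is reported as a duplicate.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <string>

// Hypothetical, simplified global registry (not LLVM's cl::opt machinery).
// A function-local static avoids static initialization order problems.
static std::map<std::string, int>& optionRegistry() {
  static std::map<std::string, int> Registry;
  return Registry;
}

// Stand-in for a static cl::opt / RegisterPass object: its constructor runs
// at load time and records the name in the global registry.
struct RegisterOption {
  explicit RegisterOption(const std::string& Name) {
    int Count = ++optionRegistry()[Name];
    if (Count > 1)
      std::cerr << "CommandLine Error: Argument '" << Name
                << "' defined more than once!\n";
  }
};

// Returns how many times Name has been registered (0 if never).
int registrationCount(const std::string& Name) {
  auto It = optionRegistry().find(Name);
  return It == optionRegistry().end() ? 0 : It->second;
}

// Simulates the same archive member linked twice: one copy in the tool and
// one in the plugin. Both static constructors run in one process.
static RegisterOption DebugFromTool("debug");
static RegisterOption DebugFromPlugin("debug");
static RegisterOption StatsFromTool("stats");
```

Running this prints a single "defined more than once" line for `debug`, mirroring the first lines of the log above.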

Comment 1 Chris Lattner 2006-05-15 20:21:49 PDT

To me, the right solution to this is probably to get rid of the last circular library dependency, then switch 
to .o files for opt/analyze/bugpoint.

However, that gets tricky too, as we go back to "linking multiple versions of the same library".

What symbols are missing?  Maybe we just need to add them to things like "Transforms/LinkAllPasses.h"?

In general, .a files without cyclic dependencies are the best way to distribute libraries... (IMHO)

-Chris

Comment 2 Reid Spencer 2006-05-15 20:35:55 PDT

Here's what I see for potential solutions:

1. Incorporate all modules from an archive library into any LLVM tool that needs
   any symbol from that archive.
   Advantage: no link problems
   Disadvantage: bloated executables

2. Break up certain .a libraries into smaller pieces so the pieces conform to
   just what a given tool needs.
   Advantage: no link problems (or fewer, anyway)
   Disadvantage: major rework of LLVM directories or makefiles

3. Build shared objects (DLLs) that contain all the code from logical groupings
   of functionality. Make both user and LLVM tools link against these shared
   libraries. We could do this on a case-by-case basis. Tools such as llvm-as
   and llvm-dis which don't support -load might want to statically link
   everything so there is no runtime linking from a .so. Any app that does have
   -load would link against the shared library.
   Advantage: no link problems, saves memory when multiple LLVM programs are
              running, resolves all the cyclic dependency problems as there are
              fewer things to link against.
   Disadvantage: slower loading times.

Whichever of these we choose, we should also do at least three things:
1. Resolve circular dependencies so that there is a simple tree of dependencies
2. Strictly control what's public and what's not. There might be some things
   exposed as external symbols that should not be.
3. Provide as much choice and as few roadblocks as possible for the end user.

Reid.

Comment 3 Chris Lattner 2006-05-15 21:39:52 PDT

#1 is basically "use relinked .o's for everything".  As you say, badness...

#2 is impossible in general, because there are more llvm tools in the world than just those in mainline 
CVS, and no single person probably knows about all of them. :)  

#3, as you mention, will lead to REALLY slow load times, on the order of our current link times.  I don't 
think this is acceptable.

However, note that this problem is just for tools that have -load options.  Things like gccas/gccld don't 
have this problem, and using minimal .a files is *really good* for them.

As such, I propose #4:

4. Make the headers which are designed to suck in all symbols related to opts/analyses do their job.  
Continue to use these headers for opt/analyze/bugpoint.

-Chris
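A rough C++ sketch of such a "link everything" header, modeled loosely on the idiom that Transforms/LinkAllPasses.h uses in later LLVM releases. The pass types and `create...Pass` factory functions are hypothetical placeholders; in a real build they would live in separate members of a static archive and are defined inline here only so the sketch compiles on its own. A static object's constructor references each factory behind a condition the compiler cannot prove false, so the linker must pull in the defining object files even though the calls never execute.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical passes and factories. In a real build each factory would be
// defined in its own object file inside a static archive.
struct Pass { virtual ~Pass() = default; };
struct HypothesisPass : Pass {};
struct SlowInfoPass : Pass {};
Pass* createHypothesisPass() { return new HypothesisPass(); }
Pass* createSlowInfoPass() { return new SlowInfoPass(); }

namespace {
// One instance is constructed at startup in every tool that includes the
// header, which is what forces the references below to exist.
struct ForcePassLinking {
  bool Ran = false;
  ForcePassLinking() {
    Ran = true;
    // getenv() never returns (char*)-1, so we always return here. The
    // compiler cannot prove that, so it keeps the calls below, and the
    // linker must resolve (and therefore pull in) each referenced factory.
    if (std::getenv("bar") != (char*)-1)
      return;
    delete createHypothesisPass();
    delete createSlowInfoPass();
  }
} ForcePassLinkingObj;
}  // namespace
```

The cost is the one Chris notes for the other options in reverse: every tool that includes such a header carries all the listed passes, which is why it is reserved for tools with -load.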

Comment 4 Reid Spencer 2006-06-05 11:03:30 PDT

*** Bug 800 has been marked as a duplicate of this bug. ***

Comment 5 Reid Spencer 2006-06-07 15:40:31 PDT

Mine.

Comment 6 Reid Spencer 2006-06-07 15:44:03 PDT

Here are some initial commits in resolution of this bug:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060605/035424.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060605/035425.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060605/035426.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060605/035427.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060605/035428.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060605/035429.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060605/035432.html

Comment 7 Reid Spencer 2006-06-07 18:05:40 PDT

This bug is resolved with all the patches between:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060605/035435.html

and

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060605/035448.html

Comment 8 Reid Spencer 2006-06-07 18:15:30 PDT

Nick,

Although I'm fairly confident this fix will work (because I checked linker maps
to determine what was included in the executable), could you please verify that
it fixes your situation and mark this bug verified if it does.

Thanks,

Reid.

Comment 9 Reid Spencer 2006-06-07 18:59:20 PDT

Unfortunately, resolving the libLLVMCore references didn't solve the whole
problem. There are now unresolved symbols for lib/Support and lib/System.

The solution is to enhance lib/Support, lib/System, their corresponding header
files and the LinkAllVMCore.h header file to ensure that all of lib/Support,
lib/System, and lib/Support/bzip2 are incorporated. Then, a particular shared
lib loadable by analyze or opt should not link with any of those libraries nor
VMCore but let the symbols be resolved at runtime from the symbols in the
executable.

Maybe we should revisit the shared library idea?

Comment 10 Chris Lattner 2006-06-07 19:14:09 PDT

Shared libraries are a cop-out.  The major issue here is that we don't have a well-defined interface that
plugins are allowed to use.  What parts of libsupport aren't getting pulled in right?

Comment 11 Reid Spencer 2006-06-07 19:38:06 PDT

SlowOperationInformer for one.

Comment 12 Nick Lewycky 2006-06-10 13:46:58 PDT

Created attachment 343
add more files to LinkAllVMCore

This patch adds the minimum files necessary to make my program load. However,
it pulls in too much as I get an error with duplicate registered passes in the
pass manager.

Comment 13 Reid Spencer 2006-06-19 12:07:23 PDT

I'm really not sure what to do about this situation. We can now selectively link
what's needed, but that's only half the problem. The registrations of passes, etc.
should probably be moved into separate files, or perhaps duplicate
registrations should simply be ignored.

Any ideas?
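The second option, ignoring duplicate registrations, could look roughly like this C++ sketch. `TolerantRegistry` and `PassInfo` here are hypothetical stand-ins, not LLVM's pass manager API: the first registration of a name wins, and later ones are reported to the caller and dropped instead of tripping an assertion.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <map>
#include <string>

// Hypothetical pass description (not LLVM's PassInfo class).
struct PassInfo {
  std::string Name;
  std::string Description;
};

// Keeps the first registration of each name and ignores (but reports)
// later duplicates instead of asserting.
class TolerantRegistry {
  std::map<std::string, PassInfo> Passes;

public:
  // Returns true if PI was newly registered, false if it was a duplicate.
  bool registerPass(const PassInfo& PI) {
    bool Inserted = Passes.emplace(PI.Name, PI).second;
    if (!Inserted)
      std::cerr << "warning: pass '" << PI.Name
                << "' already registered; ignoring duplicate\n";
    return Inserted;
  }

  std::size_t size() const { return Passes.size(); }

  // Returns the registered info for Name, or nullptr if it is unknown.
  const PassInfo* lookup(const std::string& Name) const {
    auto It = Passes.find(Name);
    return It == Passes.end() ? nullptr : &It->second;
  }
};
```

The catch is that this hides real conflicts: if the tool and the plugin carry different versions of the same pass, the plugin silently gets whichever copy registered first, and any global state in the plugin's own copy is still separate.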

Comment 14 Reid Spencer 2006-07-26 11:21:10 PDT

Nick, your patch has been committed, more or less. I had to work around a
problem that caused a dependency cycle between lib/Support and lib/System. We're
trying to eliminate dependency cycles from LLVM.

This still isn't finished. The rest of lib/System needs to be included, as well
as more of lib/Support.

Comment 15 Reid Spencer 2006-07-26 11:59:47 PDT

All of lib/System is now available to loaded modules. The programs that #include
LinkAllVMCore.h will automatically link all of lib/System.  

This was the easy part. What's a little more difficult is whether all of
lib/Support should also be included in such programs. My inclination is to just
do that, although it will (slightly) increase the size of the programs.

These patches were applied:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060724/036175.html
through
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060724/036185.html

Comment 16 Reid Spencer 2006-07-27 13:24:26 PDT

Some ideas from IRC on linking whole archives, which would eliminate the
"IncludeFile" bloat:

darwin ld supports -all_load to load all members of .a file.
gcc supports -u <symbol> to make <symbol> undefined
gnu ld supports --whole-archive
solaris ld supports "-z allextract"

Comment 17 Reid Spencer 2006-07-27 15:46:58 PDT

Using the options to include the whole archive has several problems:

1. libtool rearranges the order of --whole-archive options so that they are
   ineffectual (unsolvable with reasonable effort). It always places these
   options before the libraries. If you use just --whole-archive you get a bunch
   of multiply defined symbols because "whole archive" applies to libraries such
   as -lpthread, -lltdl, -lm, and -lelf, never mind the libgcc and libstdc++.
2. linking directly with g++ and not with libtool makes the -rpath option fail
   the build (solvable, but not portably)
3. linking directly with g++ increases the number of link errors.  When linking
   analyze there are 348 multiple definitions and 46 unresolved symbols.

Not sure this approach is workable.

Comment 18 Reid Spencer 2006-08-02 15:22:16 PDT

Another way to resolve this is to go back to generating both .o and .a files for
certain libraries. For the tools that matter, we would link with the .o file (to
get the entire content of that library). For others, they can use the .a to
selectively link.  Although this slows down our build a bit and makes LLVM's
footprint larger, it's actually looking like a viable/reasonable alternative at
this point.

Comment 19 Reid Spencer 2006-08-05 18:12:46 PDT

I currently have a question in to the libtool email list about the correct usage
of libtool to get "whole archive" behavior. There have been some responses; I'm
waiting for more. You can read the discussion starting here:

http://lists.gnu.org/archive/html/libtool/2006-08/msg00007.html

Apparently, building a libtool "convenience archive" might give us what we want,
but I'm unclear on the usage so further answers are needed.

Reid.

Comment 20 Chris Lattner 2006-08-05 18:14:10 PDT

Very cool.  Thanks for following up on this Reid!

-Chris

Comment 21 Reid Spencer 2006-08-07 17:30:55 PDT

Okay, some help from the libtool email list has given us a solution that I've
tested. The problem was that specifying LOADABLE_MODULE in the makefile
prevented the libraries from being linked into the module. I'm not quite
sure how, but libtool seems to prevent duplicate symbol definitions if you do
link the libraries in. So, the solution is to use:

LOADABLE_MODULE := 1
DONT_BUILD_RELINKED := 1
SHARED_LIBRARY := 1
LINK_LIBS_IN_SHARED := 1
USEDLIBS := ...

This will cause the "USEDLIBS" to actually be linked into the module (shared
object), which will avoid all "undefined XYZ" messages. It makes the modules
larger, but they shouldn't be using tons of stuff out of LLVM anyway.

To make this easier to digest, we're going to make the only requirement the
specification of "LOADABLE_MODULE". That will cause SHARED_LIBRARY,
LINK_LIBS_IN_SHARED and DONT_BUILD_RELINKED all to be turned on.

Now there are only two problems left:
  * duplicate registration of command line options
  * duplicate registration of LLVM passes.

Comment 22 Reid Spencer 2007-02-07 12:20:00 PST

Duplicate registration of command line options and passes occurs when the
loadable module has linked in something from LLVM that is already linked into
the executable in which it is loaded. In general, loadable modules should not
link in anything from LLVM, just use things from LLVM.

Comment 23 Chris Lattner 2008-03-30 12:55:43 PDT

This has been fixed by Devang's change of the passmgr from using RTTI to using an explicit ID symbol for each pass.
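The keying scheme described here can be illustrated with a short C++ sketch. This is a simplified model, not LLVM's actual pass registry API: each pass class owns a `static char ID`, and the registry uses the address of that object, rather than `typeid`/RTTI information, as the pass's identity. The class names are hypothetical, and the sketch only shows the keying scheme; it does not model how the dynamic linker resolves the `ID` symbols between a tool and a loaded plugin.

```cpp
#include <cassert>
#include <map>
#include <string>

// Base class that remembers the identity of the concrete pass.
struct Pass {
  explicit Pass(const void* ID) : PassID(ID) {}
  virtual ~Pass() = default;
  const void* PassID;
};

// Each pass owns a static char; only its address matters, not its value.
struct HypothesisPass : Pass {
  static char ID;
  HypothesisPass() : Pass(&ID) {}
};
char HypothesisPass::ID = 0;

struct DominatorsPass : Pass {
  static char ID;
  DominatorsPass() : Pass(&ID) {}
};
char DominatorsPass::ID = 0;

// Registry keyed by the address of each pass's ID symbol.
class PassRegistry {
  std::map<const void*, std::string> Names;

public:
  // Returns false if this ID was already registered.
  bool add(const void* ID, const std::string& Name) {
    return Names.emplace(ID, Name).second;
  }

  // Returns the registered name, or nullptr if the ID is unknown.
  const std::string* lookup(const void* ID) const {
    auto It = Names.find(ID);
    return It == Names.end() ? nullptr : &It->second;
  }
};
```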