[SymbolTable] Reconsider one symbol table for each type #783

lattner · 2004-07-28T05:59:16Z


Bugzilla Link	411
Resolution	FIXED
Resolved on	Feb 22, 2010 12:55
Version	1.0
OS	All
Blocks	#760

Extended Description

LLVM currently keeps one symbol table for each type plane in the program. There
are historical reasons for this, but they aren't particularly relevant any more.
I think the symbol table should just be a simple wrapper around two maps from
std::string to Value* and Type* (a one level map, not a two level one like we have).

Here are some of the problems that the current symbol table design causes:

This is really slow in the common case. In particular, to get to an element
in the symbol table, we need to go through two levels of maps (one from type
-> ValueMap, one from ValueMap -> Value*). In particular, when gccld'ing
252.eon, the top two killers are map operations which I believe are due
directly to the symbol table.
Important operations are really slow. In particular Module::getNamedFunction
takes linear time with size of the module because we don't know what type the
named function will have. This leads to the nastiness we see in the
Module::getMainFunction() implementation.
Linking is a nightmare, and because of this nightmare, we have to have things
like the function resolution pass, which is a certified disturbing hack.
I'm beginning to think that the function scoped symbol table and the module
scoped symbol table should be different classes entirely (once simplified).
In particular, the function-scope symbol table cannot have types in it.
Another capability that we want is to be able to completely disable the
function-level symbol table in certain cases, such as release mode. Names
inside of a function have no meaning, they are just there to make debugging
easier. Why should release mode have to pay for all of those map lookups?

Something that is important to point out is that LLVM is not serving the
front-end at all by having type-specific symbol table planes at the global
scope. In particular, even a language that wants to allow overloading based on
type, for example, will have to implement name mangling itself: the LLVM types
for a particular method are not necessarily going to be the distinct in the same
cases the source-level types would be (e.g. due to structural equivalence).

So anyway, here is the proposal:

Get rid of the type-sensitivity in the symbol table.
Split the SymbolTable class into ModuleSymbolTable and FunctionSymbolTable
Drop type-tracking from the FunctionSymbolTable.
In NDEBUG mode, FunctionSymbolTable would be a noop, simply ignoring the
names inserted into it (setName on a Instruction, Argument, or BasicBlock
would be a noop).

Thoughts?

-Chris

llvmbot · 2004-07-28T07:38:44Z

I won't miss type-sensitivity. Go for it.

llvmbot · 2004-07-28T14:54:56Z

All sounds good to me. I argued for removal of the type plane when I did the
symbol table rewrite, so this would be a welcome, simplifying change.

llvmbot · 2004-07-28T22:31:20Z

One thing to note: bytecode changes. If this symbol table change is made, it
will break the use of SymbolTable in the bcwriter and bcreader. That's fine, but
to also implement it properly, we should really clean up the symboltable bc file
format. So, if we're going to do this, I suggest we do it before 1.3 goes out so
we don't impact 1.3 users with yet another bc change in 1.4. I'd really like to
get the bc file format (relatively) stable in the 1.3 release so my file format
document is accurate (it currently needs updating).

Reid.

lattner · 2004-07-29T01:49:02Z

I think this change can be made without breaking either .ll or .bc formats. We
should definitely avoid breaking fundamental things, though perhaps for LLVM
2.0 we can simplify them too.

-Chris

lattner · 2004-08-04T03:23:48Z

Here is some more data. When running gccld on 252.eon with -disable-opt, I see
the following numbers with a profiling build:

% cumulative self self total
time seconds seconds calls s/call s/call name
35.11 2.76 2.76 45591495 0.00 0.00
std::_Rb_tree_base_iterator::_M_increment()
26.97 4.88 2.12 4833838 0.00 0.00 std::_Rb_tree<std::string
const, std::pair<std::string const, llvm:
:Value*>, std::_Select1st<std::pair<std::string const, llvm::Value*> >,
std::less<std::string const>, std::allocator<std:
:pair<std::string const, llvm::Value*> > >::find(std::string const&)

That is spending about 60% of the time in the symbol table and map operations. Ugh.

-Chris

llvmbot · 2004-08-04T07:17:16Z

When you say "get rid of type sensitivity", I'm assuming that you mean the
nested maps would go away. That is, the ModuleSymbolTable would contain simply a
std::mapstd::string,Value*. Is this correct? Furthermore, for the
FunctionSymbolTable, I'd like to make sure the NDEBUG interface to it compiles
to nothing. That is, the public interface is inline empty functions in NDEBUG mode.

I still think this needs to get done by 1.3 because of bytecode changes. Its not
a big deal but right now the bcwriter writes all types in the symbol table
first. This would become empty when this bug is implemented but there is still a
wasted (zero valued) length field that will occur in every function. Its not a
big deal but since we've changed so much bytecode in 1.3, I'd like to get this
cleanup changed. Also, in NDEBUG mode, we shouldn't write any function level
symbol tables to the bytecode at all.

lattner · 2004-08-04T07:39:18Z

When you say "get rid of type sensitivity", I'm assuming that you mean the
nested maps would go away. That is, the ModuleSymbolTable would contain simply a
std::mapstd::string,Value*. Is this correct?

Yes, exactly. I will post some more compelling numbers that show why this is
necessary soon. :)

Furthermore, for the FunctionSymbolTable, I'd like to make sure the NDEBUG
interface to it compiles to nothing. That is, the public interface is inline
empty functions in NDEBUG mode.

Yes, also true. The basic goal would be to make Instruction::setName() do as
little as possible. Unfortunately it still needs to be a virtual method
(exposed through Value) but maybe that could change in the future.

I still think this needs to get done by 1.3 because of bytecode changes.
... This would become empty when this bug is implemented but there is still
a wasted (zero valued) length field that will occur in every function.

I don't think this is the case... empty blocks that are opened in the bcwriter
but have no data emitted to them do not make it to the bc file.

but right now the bcwriter writes all types in the symbol table first

Why would this change?

Also, in NDEBUG mode, we shouldn't write any function level symbol tables
to the bytecode at all.

That is already implemented. If you -strip a bytecode file, it does not write
out function-level symbol tables at all.

-Chris

lattner · 2004-08-04T07:46:48Z

Here is some more exciting and compelling evidence for this bug. :) After
fixing other inefficiencies in the vmcore and bcreader, this problem is
magnified more and more. In a profiled gccld of eon with -disable-opt, the top
two lines of the profile are:

% cumulative self self total
time seconds seconds calls s/call s/call name
40.08 2.08 2.08 4833838 0.00 0.00 std::map<string, Value*>::find
13.49 2.78 0.70 5966948 0.00 0.00
std::_Rb_tree_base_iterator::_M_increment()

The big consumer of std::_Rb_tree_base_iterator::_M_increment() and find seem to
be due to "FindGlobalNamed" in VMCore/Linker.cpp, a horrible function that loops
over all of the symbol table planes and calls .find() on each one (gack!!!).
This would be fixed with a non-type-descriminating SymbolTable. Here is the
profile line in question:

0.58 4.05 0.03 7575 0.00 0.00 FindGlobalNamed

gccld takes 8.5s total on eon without optimization, so this function corresponds
to about 50% of the time to link the program. uggghhh!

-Chris

lattner · 2004-08-04T10:07:52Z

Okay, I hacked around the FindGlobalNamed problem, dramatically speeding up the
linker:
http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20040802/016776.html

However, this PR still should be fixed. :)

-Chris

llvmbot · 2004-08-04T17:04:28Z

I still think this needs to get done by 1.3 because of bytecode changes.
... This would become empty when this bug is implemented but there is still
a wasted (zero valued) length field that will occur in every function.

I don't think this is the case... empty blocks that are opened in the bcwriter
but have no data emitted to them do not make it to the bc file.

I mis-spoke a bit. Its not the length field that will be wasted, its the slot
number. The type planes are not blocks and therefore the "auto-eliding" feature
doesn't apply. What I'm talking about is the byte structure of the type planes
themselves. Each type plane is preceded by a count and a slot number for the
type. So, if we compress everything to a single type plane and retain the
current format, then we have the symbol table like this:

Block Header - Block Type & Length
VBR_UInt - number of types that follow
(String,SlotNum)* - the name/type pairs
VBR_UInt - number of values that follow
VBR_UInt - slot number of the type plane's type
(String,SlotNum)* - the name/value pairs
VBR_Uint - zero (indicates end of type planes)

So, what I'm advocating is that we eliminate 5 and 7 from the bytecode format.
Like I said, its not a big deal, but it could save us a few bytes and might as
well get cleaned up before we release 1.3

but right now the bcwriter writes all types in the symbol table first

Why would this change?

Functions don't contain types. Types are all written at the global level. They
aren't named inside functions either. However, we use the same bytecode format
for a ModuleSymbolTable and a FunctionSymbolTable. So, what I'm advocating is to
remove fields 2 and 3 (in above list) for functions since they will always be
zero/empty for functions. We could clean that up so we don't have this single
byte overhead for functions. In NDEBUG mode it wouldn't matter because the whole
symbol table would vaporize but we should clean this up so the bytecode
specification isn't confusing (i.e. specifying fields that never have any content).

lattner · 2004-08-04T20:05:52Z

I really don't think that this is a big deal (there are much better things for
your to spend your time on), but if you really want to do this, I guess ok.

Note that this is a big risk right before the release, so unless you feel
REALLY inclined to do this, I would prefer not to. Also, the bc format will
probably change for 1.4 whether we like it or not :)

-Chris

llvmbot · 2005-12-23T04:11:27Z

Going to give this one another crack.

lattner · 2005-12-23T04:14:17Z

Cool!
One thing to be aware of: we have some intrinsics that want to work with any type. For example, llvm.sqrt
wants to work with both float and double operands. I think the right solution to this is to change over to
using llvm.sqrt.f32 and llvm.sqrt.f64, and make the asmreader map llvm.sqrt to these two.

-Chris

llvmbot · 2006-01-10T12:07:00Z

These patches implement the TypeSymbolTable and ValueSymbolTable classes, a
first step towards the refactoring.

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060109/030628.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060109/030629.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060109/030630.html

llvmbot · 2006-01-12T23:29:15Z

Sabre,

There's a problem with this implementation. When reading a bytecode file, I
can't instantiate a Value without knowing its type because of the type planes in
the reader. If I flatten the reader's data structures to not use type planes,
there are two consequences: higher vbr slot values (bigger bc file size), and
complexity for backwards compatibility.

I think the correct thing to do is leave the type planes in the bytecode file
format. This will leave the bc file format unchanged but require some additional
code to generate the type planes when generating the byte code.

Any thoughts on this?

lattner · 2006-01-13T01:01:45Z

I think the correct thing to do is leave the type planes in the bytecode file
format. This will leave the bc file format unchanged but require some additional
code to generate the type planes when generating the byte code.

Sounds fine to me. This will make the writer a bit slower, but the reader fast. Sounds like a good
compromise, particularly for JIT applications.

Thanks for digging into this Reid!

-Chris

llvmbot · 2006-01-14T11:03:45Z

Chris,

Type planes have been retained in the bytecode file (but not the symtab) and
that part seems to be working fine.

I have gotten rid of many bugs in the asmparser but am now faced with a failing
test suite because diffs produce output because functions have been renamed. I'm
not entirely sure what to do about this. For example, consider
Feature/alignment.ll. Runing "make check" produces:

FAIL: /proj/llvm/build/../llvm/test/Feature/alignment.ll:
WARNING: Redefinition of names across types has been deprecated.
Consequently, 'test' has been renamed as 'test1'.
Other instances will not be reported.
W
6c6
< int* %test1() align 32 {

int* %test11() align 32 {
13c13
< int* %test22() {

int* %test222() {

The RUN lines look like:
; RUN: llvm-as %s -o - | llvm-dis > %t1.ll
; RUN: llvm-as %t1.ll -o - | llvm-dis > %t2.ll
; RUN: diff %t1.ll %t2.ll

These kinds of tests (numerous) all depend on the dis-assembler being able to
regurgitate its input verbatim. However, the new assembler uniquifies any
duplicate function names.

I'm trying to think of ways to not alter the test scripts, but I don't see a way
to do that.

Any ideas?

llvmbot · 2006-01-14T19:49:02Z

Okay, the re-naming issue turned out to be a red herring. I wasn't checking for
the case that the same value was being renamed, so it was getting a new name.
I'm back to just having an ilist increment off the end of the list.

lattner · 2006-01-14T22:41:22Z

yeah, these sorts of tests first asm/disasm the llvm file to get a canonical output (this part should do the
renaming). Second they asm/disasm that to make sure that the canonical output is matched. This
second pass shouldn't do any renaming.

-Chris

llvmbot · 2006-01-15T05:44:19Z

I'm now looking at the tests that use the same intrinsic with different argument
types (e.g. isunordered). Currently, I have one of them getting a "new name".
That's probably not desireable ;> I recall you wanting to split these out by
type so we'd have llvm.isunordered.float and llvm.isunordered.double. Caveats of
this are:

lots of changes throughout LLVM (thwarts incrementalism)
test cases have to change for new names of intrinsics.

I was thinking we could just make the assembler "smart" about intrinsics and
avoid renaming when an intrinsic name is used.

Got any other ideas?

llvmbot · 2006-01-17T00:29:47Z

lattner · 2006-08-15T00:37:50Z

Another idea: fixing this bug will take a concerted effort and is a nasty one to fix. OTOH, the real
performance part of the problem is due to the two level maps. We currently have (in SymbolTable):

typedef std::map<const std::string, Value *> ValueMap;
typedef std::map<const Type *, ValueMap> PlaneMap;

Things would be much more efficient (both in memory-use and compile-time) if we used this instead:

std::map<std::pair<const Type *, std::string>, ValueMap>

... and this should be a much easier change to make.

-Chris

llvmbot · 2006-08-15T01:59:31Z

I don't see how that helps. Instead of looking up the plane and then searching a
that plane's map for the corresponding name, you're now asking a (bigger) map to
search using comparisons of a std::pair. We can probably short-circuit the
comparison functor so that it amounts to about the same thing, but what does
this save us?

How about we switch this to the google dense hash map?

Reid.

lattner · 2006-08-15T04:10:55Z

I already clarified with reid, but for the record:

This is a size win, because it eliminates one std::map per type plane, a significant memory win.
This is a performance win, because it reduces the number of pointer traversals required to do a lookup.

Using a better map is another potentially useful change, but is orthogonal to this one. This is required no
matter what the implementation of map used.

-Chris

llvmbot · 2007-01-06T09:44:16Z

An incremental step has been taken with patches committed today by separating
types into their own symbol table. This unclutters the SymbolTable class and
paves the way for it to be replaced with ValueSymbolTable class, which has no
type planes.

Next Steps:

Revise the logic in bcwriter to compute the type planes without the help of
the SymbolTable class. We still want to retain type planes in the bc format
so that we don't have to write a bunch of type/value/name triples but instead
organize them by type and just write value/name pairs. This saves space but
also makes the bcreader faster.
While we're at it, also revise the logic in bcwriter to order Values by
usage. This gives the most frequently used values the lowest slot numbers in
their type plane. This ordering is not necessary for the symbol table
because there are no other consumers of the slot numbers in the symbol table.
Replace SymbolTable with ValueSymbolTable everywhere.
Adjust bcreader, asmparser, CloneModule, Linker, etc. for changes in the
symbol table class.
Simplify the code in LinkModules.cpp that deals with undoing the type planes
in the symbol table.

lattner · 2007-01-07T00:48:17Z

#1: excellent point
#2: also a good idea, but could be done at a later stage if it is a significant amount of work

-Chris

llvmbot · 2007-01-07T07:41:49Z

Chris,

With type planes gone the symbol table no longer will allow a global variable
and a function to have the same name. That's probably wise, but just wanted to
make sure with you that this is the intent. I.e. that all global names are
unique, regardless of type.

Reid.

llvmbot · 2007-01-07T08:02:02Z

For the record, Chris answered that this would be a good thing on IRC.

llvmbot · 2007-01-27T02:13:14Z

Chris,

I have llvm/llvm-bugzilla-archive#411 working except for a few FunctionResolve test cases. If I
understand that pass, it needs to go away. Is that correct? With llvm/llvm-bugzilla-archive#411 in place,
the situation it is testing for can't happen (two functions with the same name
but different prototypes). Just wanted to check with you before I removed that
pass and its uses.

Please confirm.

Reid.

lattner · 2007-01-27T02:31:09Z

yeah, ignore funcresolve, it can be removed as part of this bug.

just make sure the linker does the right thing when you link

a.c:
void foo();
bar() {foo(123); }

b.c:
void foo(int X) {}

The two foo's should get linked up and end up having a prototype that matches the definition.

-Chris

llvmbot · 2007-01-27T02:35:46Z

Right, will do.

llvmbot · 2007-01-27T05:02:17Z

Chris,

This linking is a bit tricky. I had assumed that LinkModules.cpp already handled
the case you described. But, looking into RecursiveResolveTypes, it seems to
only deal with resolution of OpaqueType. The linker fails on the test case you
provided because it fails to recognize the two functions as the same. I'm
writing code to deal with that particular situation but I have a few questions:

What if its void @foo(i32) and void @foo(i64) ?
Should that just be a "functions with same name but different type" error?
Obviously this only applies if both are external and only one is defined.
What if the module that declares the void @foo(...) passes arguments to it
in a way that the module that defines void @foo(i32) doesn't expect. For
example: call i32* (...)* @foo( i64* %ptr ). Should I track these down
by traversing the use list and issue errors or is there some way to
resolve this?
Shouldn't "declare void @foo()" be treated as a function with zero args
instead of a function that takes any args? Your example was in C so perhaps
the CFE translates "void foo()" into "declare @foo(...)"? Does it?

Reid.

lattner · 2007-01-27T06:22:59Z

Shouldn't "declare void @foo()" be treated as a function with zero args
instead of a function that takes any args?
No. LLVM is not C.

Your example was in C so perhaps the CFE translates "void foo()" into "declare @foo(...)"?
Yep exactly.

What if its void @foo(i32) and void @foo(i64) ?
Should that just be a "functions with same name but different type" error?
Obviously this only applies if both are external and only one is defined.

C doesn't require prototypes to match, but doesn't specify what should happen if they don't.

The way we should implement this in LLVM is to pick one function as the designated result (either the
one with a body if there is one, or an arbitrary proto if they are protos), this is already done. To coerce
the mismatch, the global which is eliminated should be replaced with a constant expr that bitcasts the
real result to the "wrong" type. instcombine will then eliminate the cast if possible.

In the example above, I'd expect the result to be something like this:

bar() { (void(*)(...))(foo) (123); }

void foo(int X) {}

llvmbot · 2007-02-05T23:23:20Z

w00t! This is finally done with a series of commits today.

…m#783) This PR set a proper addrspace attribute for LLVM globals. For simplicity, a temporary pointer type is created and consumed by our LLVM type converter. The correct address space is then extracted from the converted pointer type of LLVM.

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021

Centaurioun mentioned this issue Dec 14, 2022

[Snyk] Upgrade vsce from 1.103.1 to 2.14.0 Centaurioun/llvm-project#14

Open

baby636 mentioned this issue Feb 2, 2024

[Snyk] Fix for 14 vulnerabilities baby636/llvm-project#25

Open

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SymbolTable] Reconsider one symbol table for each type #783

[SymbolTable] Reconsider one symbol table for each type #783

lattner commented Jul 28, 2004

llvmbot commented Jul 28, 2004

llvmbot commented Jul 28, 2004

llvmbot commented Jul 28, 2004

lattner commented Jul 29, 2004

lattner commented Aug 4, 2004

llvmbot commented Aug 4, 2004

lattner commented Aug 4, 2004

lattner commented Aug 4, 2004

lattner commented Aug 4, 2004

llvmbot commented Aug 4, 2004

lattner commented Aug 4, 2004

llvmbot commented Dec 23, 2005

lattner commented Dec 23, 2005

llvmbot commented Jan 10, 2006

llvmbot commented Jan 12, 2006

lattner commented Jan 13, 2006

llvmbot commented Jan 14, 2006

llvmbot commented Jan 14, 2006

lattner commented Jan 14, 2006

llvmbot commented Jan 15, 2006

llvmbot commented Jan 17, 2006

lattner commented Aug 15, 2006

llvmbot commented Aug 15, 2006

lattner commented Aug 15, 2006

llvmbot commented Jan 6, 2007

lattner commented Jan 7, 2007

llvmbot commented Jan 7, 2007

llvmbot commented Jan 7, 2007

llvmbot commented Jan 27, 2007

lattner commented Jan 27, 2007

llvmbot commented Jan 27, 2007

llvmbot commented Jan 27, 2007

lattner commented Jan 27, 2007

llvmbot commented Feb 5, 2007

[SymbolTable] Reconsider one symbol table for each type #783

[SymbolTable] Reconsider one symbol table for each type #783

Comments

lattner commented Jul 28, 2004

Extended Description

llvmbot commented Jul 28, 2004

llvmbot commented Jul 28, 2004

llvmbot commented Jul 28, 2004

lattner commented Jul 29, 2004

lattner commented Aug 4, 2004

llvmbot commented Aug 4, 2004

lattner commented Aug 4, 2004

lattner commented Aug 4, 2004

lattner commented Aug 4, 2004

llvmbot commented Aug 4, 2004

lattner commented Aug 4, 2004

llvmbot commented Dec 23, 2005

lattner commented Dec 23, 2005

llvmbot commented Jan 10, 2006

llvmbot commented Jan 12, 2006

lattner commented Jan 13, 2006

llvmbot commented Jan 14, 2006

FAIL: /proj/llvm/build/../llvm/test/Feature/alignment.ll: WARNING: Redefinition of names across types has been deprecated. Consequently, 'test' has been renamed as 'test1'. Other instances will not be reported. W 6c6 < int* %test1() align 32 {

llvmbot commented Jan 14, 2006

lattner commented Jan 14, 2006

llvmbot commented Jan 15, 2006

llvmbot commented Jan 17, 2006

lattner commented Aug 15, 2006

llvmbot commented Aug 15, 2006

lattner commented Aug 15, 2006

llvmbot commented Jan 6, 2007

lattner commented Jan 7, 2007

llvmbot commented Jan 7, 2007

llvmbot commented Jan 7, 2007

llvmbot commented Jan 27, 2007

lattner commented Jan 27, 2007

llvmbot commented Jan 27, 2007

llvmbot commented Jan 27, 2007

lattner commented Jan 27, 2007

llvmbot commented Feb 5, 2007

FAIL: /proj/llvm/build/../llvm/test/Feature/alignment.ll:
WARNING: Redefinition of names across types has been deprecated.
Consequently, 'test' has been renamed as 'test1'.
Other instances will not be reported.
W
6c6
< int* %test1() align 32 {