LLVM currently keeps one symbol table for each type plane in the program. There are historical reasons for this, but they aren't particularly relevant any more. I think the symbol table should just be a simple wrapper around two maps from std::string to Value* and Type* (a one-level map, not a two-level one like we have). Here are some of the problems the current symbol table design causes:

1. It is really slow in the common case. To get to an element in the symbol table, we need to go through two levels of maps (one from type -> ValueMap, one from ValueMap -> Value*). When gccld'ing 252.eon, the top two killers are map operations, which I believe are due directly to the symbol table.

2. Important operations are really slow. In particular, Module::getNamedFunction takes time linear in the size of the module, because we don't know what type the named function will have. This leads to the nastiness we see in the Module::getMainFunction() implementation.

3. Linking is a nightmare, and because of this nightmare we have to have things like the function resolution pass, which is a certified disturbing hack.

4. I'm beginning to think that the function-scoped symbol table and the module-scoped symbol table should be different classes entirely (once simplified). In particular, the function-scope symbol table cannot have types in it.

5. Another capability that we want is to be able to completely disable the function-level symbol table in certain cases, such as release mode. Names *inside* of a function have no meaning; they are just there to make debugging easier. Why should release mode have to pay for all of those map lookups?

Something important to point out is that LLVM is not serving the front-end at all by having type-specific symbol table planes at the global scope.
In particular, even a language that wants to allow overloading based on type will have to implement name mangling itself: the LLVM types for a particular method are not necessarily going to be distinct in the same cases the source-level types would be (e.g. due to structural equivalence). So anyway, here is the proposal:

1. Get rid of the type-sensitivity in the symbol table.
2. Split the SymbolTable class into ModuleSymbolTable and FunctionSymbolTable.
3. Drop type-tracking from the FunctionSymbolTable.
4. In NDEBUG mode, make FunctionSymbolTable a no-op, simply ignoring the names inserted into it (setName on an Instruction, Argument, or BasicBlock would be a no-op).

Thoughts?

-Chris
I won't miss type-sensitivity. Go for it.
All sounds good to me. I argued for removal of the type plane when I did the symbol table rewrite, so this would be a welcome, simplifying change.
One thing to note: bytecode changes. If this symbol table change is made, it will break the use of SymbolTable in the bcwriter and bcreader. That's fine, but to implement it properly, we should really clean up the symbol table bc file format. So, if we're going to do this, I suggest we do it before 1.3 goes out so we don't impact 1.3 users with yet another bc change in 1.4. I'd really like to get the bc file format (relatively) stable in the 1.3 release so my file format document is accurate (it currently needs updating). Reid.
I think this change can be made without breaking either .ll or .bc formats. We should definitely avoid breaking fundamental things, though perhaps for LLVM 2.0 we can simplify them too. -Chris
Here is some more data. When running gccld on 252.eon with -disable-opt, I see the following numbers with a profiling build:

  %   cumulative   self              self    total
 time   seconds   seconds     calls  s/call   s/call  name
35.11      2.76      2.76  45591495    0.00     0.00  std::_Rb_tree_base_iterator::_M_increment()
26.97      4.88      2.12   4833838    0.00     0.00  std::_Rb_tree<std::string const, std::pair<std::string const, llvm::Value*>, std::_Select1st<std::pair<std::string const, llvm::Value*> >, std::less<std::string const>, std::allocator<std::pair<std::string const, llvm::Value*> > >::find(std::string const&)

That is spending about 60% of the time in the symbol table and map operations. Ugh. -Chris
When you say "get rid of type sensitivity", I'm assuming that you mean the nested maps would go away. That is, the ModuleSymbolTable would contain simply a std::map<std::string, Value*>. Is this correct?

Furthermore, for the FunctionSymbolTable, I'd like to make sure the NDEBUG interface to it compiles to nothing. That is, the public interface is inline empty functions in NDEBUG mode.

I still think this needs to get done by 1.3 because of bytecode changes. It's not a big deal, but right now the bcwriter writes all types in the symbol table first. This would become empty when this bug is fixed, but there is still a wasted (zero-valued) length field that will occur in every function. It's not a big deal, but since we've changed so much bytecode in 1.3, I'd like to get this cleaned up too. Also, in NDEBUG mode, we shouldn't write any function-level symbol tables to the bytecode at all.
> When you say "get rid of type sensitivity", I'm assuming that you mean the
> nested maps would go away. That is, the ModuleSymbolTable would contain simply
> a std::map<std::string, Value*>. Is this correct?

Yes, exactly. I will post some more compelling numbers that show why this is necessary soon. :)

> Furthermore, for the FunctionSymbolTable, I'd like to make sure the NDEBUG
> interface to it compiles to nothing. That is, the public interface is inline
> empty functions in NDEBUG mode.

Yes, also true. The basic goal would be to make Instruction::setName() do as little as possible. Unfortunately it still needs to be a virtual method (exposed through Value), but maybe that could change in the future.

> I still think this needs to get done by 1.3 because of bytecode changes.
> ... This would become empty when this bug is implemented but there is still
> a wasted (zero valued) length field that will occur in every function.

I don't think this is the case... empty blocks that are opened in the bcwriter but have no data emitted to them do not make it to the bc file.

> but right now the bcwriter writes all types in the symbol table first

Why would this change?

> Also, in NDEBUG mode, we shouldn't write any function level symbol tables
> to the bytecode at all.

That is already implemented. If you -strip a bytecode file, it does not write out function-level symbol tables at all.

-Chris
Here is some more exciting and compelling evidence for this bug. :) After fixing other inefficiencies in the vmcore and bcreader, this problem is magnified more and more. In a profiled gccld of eon with -disable-opt, the top two lines of the profile are:

  %   cumulative   self              self    total
 time   seconds   seconds     calls  s/call   s/call  name
40.08      2.08      2.08   4833838    0.00     0.00  std::map<string, Value*>::find
13.49      2.78      0.70   5966948    0.00     0.00  std::_Rb_tree_base_iterator::_M_increment()

The big consumers of std::_Rb_tree_base_iterator::_M_increment() and find seem to be due to FindGlobalNamed in VMCore/Linker.cpp, a horrible function that loops over all of the symbol table planes and calls .find() on each one (gack!!!). This would be fixed with a non-type-discriminating SymbolTable. Here is the profile line in question:

 0.58      4.05      0.03      7575    0.00     0.00  FindGlobalNamed

gccld takes 8.5s total on eon without optimization, so this function corresponds to about 50% of the time to link the program. uggghhh! -Chris
Okay, I hacked around the FindGlobalNamed problem, dramatically speeding up the linker: http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20040802/016776.html However, this PR still should be fixed. :) -Chris
> > I still think this needs to get done by 1.3 because of bytecode changes.
> > ... This would become empty when this bug is implemented but there is still
> > a wasted (zero valued) length field that will occur in every function.
>
> I don't think this is the case... empty blocks that are opened in the bcwriter
> but have no data emitted to them do not make it to the bc file.

I misspoke a bit. It's not the length field that will be wasted, it's the slot number. The type planes are not blocks, and therefore the "auto-eliding" feature doesn't apply. What I'm talking about is the byte structure of the type planes themselves. Each type plane is preceded by a count and a slot number for the type. So, if we compress everything to a single type plane and retain the current format, then the symbol table looks like this:

1. Block Header - block type & length
2. VBR_UInt - number of types that follow
3. (String,SlotNum)* - the name/type pairs
4. VBR_UInt - number of values that follow
5. VBR_UInt - slot number of the type plane's type
6. (String,SlotNum)* - the name/value pairs
7. VBR_UInt - zero (indicates end of type planes)

So, what I'm advocating is that we eliminate 5 and 7 from the bytecode format. Like I said, it's not a big deal, but it could save us a few bytes, and we might as well get it cleaned up before we release 1.3.

> > but right now the bcwriter writes all types in the symbol table first
>
> Why would this change?

Functions don't contain types. Types are all written at the global level; they aren't named inside functions either. However, we use the same bytecode format for a ModuleSymbolTable and a FunctionSymbolTable. So, what I'm advocating is to remove fields 2 and 3 (in the list above) for functions, since they will always be zero/empty for functions. We could clean that up so we don't have this single-byte overhead for functions.
In NDEBUG mode it wouldn't matter because the whole symbol table would vaporize, but we should clean this up so the bytecode specification isn't confusing (i.e. specifying fields that never have any content).
I really don't think that this is a big deal (there are much better things for you to spend your time on), but if you really want to do this, I guess ok. Note that this is a big risk right before the release, so unless you feel REALLY inclined to do this, I would prefer not to. Also, the bc format will probably change for 1.4 whether we like it or not :) -Chris
Going to give this one another crack.
Cool! One thing to be aware of: we have some intrinsics that want to work with any type. For example, llvm.sqrt wants to work with both float and double operands. I think the right solution to this is to change over to using llvm.sqrt.f32 and llvm.sqrt.f64, and make the asmreader map llvm.sqrt to these two. -Chris
These patches implement the TypeSymbolTable and ValueSymbolTable classes, a first step towards the refactoring. http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060109/030628.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060109/030629.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060109/030630.html
Sabre, There's a problem with this implementation. When reading a bytecode file, I can't instantiate a Value without knowing its type, because of the type planes in the reader. If I flatten the reader's data structures to not use type planes, there are two consequences: higher vbr slot values (bigger bc file size), and complexity for backwards compatibility. I think the correct thing to do is leave the type planes in the bytecode file format. This will leave the bc file format unchanged but require some additional code to generate the type planes when generating the bytecode. Any thoughts on this?
> I think the correct thing to do is leave the type planes in the bytecode file
> format. This will leave the bc file format unchanged but require some
> additional code to generate the type planes when generating the byte code.

Sounds fine to me. This will make the writer a bit slower, but the reader fast. Sounds like a good compromise, particularly for JIT applications. Thanks for digging into this, Reid! -Chris
Chris, Type planes have been retained in the bytecode file (but not the symtab) and that part seems to be working fine. I have gotten rid of many bugs in the asmparser but am now faced with a failing test suite, because diffs produce output because functions have been renamed. I'm not entirely sure what to do about this. For example, consider Feature/alignment.ll. Running "make check" produces:

FAIL: /proj/llvm/build/../llvm/test/Feature/alignment.ll:
WARNING: Redefinition of names across types has been deprecated. Consequently, 'test' has been renamed as 'test1'. Other instances will not be reported.
6c6
< int* %test1() align 32 {
---
> int* %test11() align 32 {
13c13
< int* %test22() {
---
> int* %test222() {

The RUN lines look like:

; RUN: llvm-as %s -o - | llvm-dis > %t1.ll
; RUN: llvm-as %t1.ll -o - | llvm-dis > %t2.ll
; RUN: diff %t1.ll %t2.ll

These kinds of tests (numerous) all depend on the disassembler being able to regurgitate its input verbatim. However, the new assembler uniquifies any duplicate function names. I'm trying to think of ways to not alter the test scripts, but I don't see a way to do that. Any ideas?
Okay, the re-naming issue turned out to be a red herring. I wasn't checking for the case that the same value was being renamed, so it was getting a new name. I'm back to just having an ilist increment off the end of the list.
yeah, these sorts of tests first asm/disasm the llvm file to get a canonical output (this part should do the renaming). Second they asm/disasm *that* to make sure that the canonical output is matched. This second pass shouldn't do any renaming. -Chris
I'm now looking at the tests that use the same intrinsic with different argument types (e.g. isunordered). Currently, I have one of them getting a "new name". That's probably not desirable ;> I recall you wanting to split these out by type so we'd have llvm.isunordered.float and llvm.isunordered.double. Caveats of this are:

1. lots of changes throughout LLVM (thwarts incrementalism)
2. test cases have to change for the new names of intrinsics

I was thinking we could just make the assembler "smart" about intrinsics and avoid renaming when an intrinsic name is used. Got any other ideas?
De-overloading of intrinsics has been implemented with these patches: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060116/030849.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060116/030850.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060116/030851.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060116/030852.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060116/030853.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060116/030854.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060116/030856.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060116/030859.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060116/030855.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060116/030858.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060116/030857.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060116/030860.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060116/030865.html
Another idea: fixing this bug will take a concerted effort and is a nasty one to fix. OTOH, the real *performance* part of the problem is due to the two-level maps. We currently have (in SymbolTable):

typedef std::map<const std::string, Value *> ValueMap;
typedef std::map<const Type *, ValueMap> PlaneMap;

Things would be much more efficient (both in memory use and compile time) if we used this instead:

std::map<std::pair<const Type *, std::string>, Value *>

... and this should be a *much* easier change to make. -Chris
I don't see how that helps. Instead of looking up the plane and then searching that plane's map for the corresponding name, you're now asking a (bigger) map to search using comparisons of a std::pair. We can probably short-circuit the comparison functor so that it amounts to about the same thing, but what does this save us? How about we switch this to the Google dense hash map? Reid.
I already clarified with Reid, but for the record: This is a size win, because it eliminates one std::map per type plane, a significant memory win. This is a performance win, because it reduces the number of pointer traversals required to do a lookup. Using a better map is another potentially useful change, but is orthogonal to this one. This is required no matter what map implementation is used. -Chris
An incremental step has been taken with patches committed today by separating types into their own symbol table. This unclutters the SymbolTable class and paves the way for it to be replaced with the ValueSymbolTable class, which has no type planes. Next steps:

1. Revise the logic in bcwriter to compute the type planes without the help of the SymbolTable class. We still want to retain type planes in the bc format so that we don't have to write a bunch of type/value/name triples, but instead organize them by type and just write value/name pairs. This saves space but also makes the bcreader faster.
2. While we're at it, also revise the logic in bcwriter to order Values by usage. This gives the most frequently used values the lowest slot numbers in their type plane. This ordering is not necessary for the symbol table because there are no other consumers of the slot numbers in the symbol table.
3. Replace SymbolTable with ValueSymbolTable everywhere.
4. Adjust bcreader, asmparser, CloneModule, Linker, etc. for changes in the symbol table class.
5. Simplify the code in LinkModules.cpp that deals with undoing the type planes in the symbol table.
#1: excellent point #2: also a good idea, but could be done at a later stage if it is a significant amount of work -Chris
Chris, With type planes gone the symbol table no longer will allow a global variable and a function to have the same name. That's probably wise, but just wanted to make sure with you that this is the intent. I.e. that all global names are unique, regardless of type. Reid.
For the record, Chris answered that this would be a good thing on IRC.
Chris, I have PR411 working except for a few FunctionResolve test cases. If I understand that pass, it needs to go away. Is that correct? With PR411 in place, the situation it is testing for can't happen (two functions with the same name but different prototypes). Just wanted to check with you before I removed that pass and its uses. Please confirm. Reid.
yeah, ignore funcresolve, it can be removed as part of this bug. just make sure the linker does the right thing when you link:

a.c:
void foo();
bar() { foo(123); }

b.c:
void foo(int X) {}

The two foo's should get linked up and end up having a prototype that matches the definition. -Chris
Right, will do.
Chris, This linking is a bit tricky. I had assumed that LinkModules.cpp already handled the case you described. But, looking into RecursiveResolveTypes, it seems to only deal with resolution of OpaqueType. The linker fails on the test case you provided because it fails to recognize the two functions as the same. I'm writing code to deal with that particular situation, but I have a few questions:

1. What if it's void @foo(i32) and void @foo(i64)? Should that just be a "functions with same name but different type" error? Obviously this only applies if both are external and only one is defined.
2. What if the module that declares void @foo(...) passes arguments to it in a way that the module that defines void @foo(i32) doesn't expect? For example: call i32* (...)* @foo( i64* %ptr ). Should I track these down by traversing the use list and issue errors, or is there some way to resolve this?
3. Shouldn't "declare void @foo()" be treated as a function with zero args instead of a function that takes any args? Your example was in C, so perhaps the CFE translates "void foo()" into "declare @foo(...)". Does it?

Reid.
> Shouldn't "declare void @foo()" be treated as a function with zero args
> instead of a function that takes any args?

No. LLVM is not C.

> Your example was in C so perhaps the CFE translates "void foo()" into
> "declare @foo(...)"?

Yep, exactly.

> What if it's void @foo(i32) and void @foo(i64)?
> Should that just be a "functions with same name but different type" error?
> Obviously this only applies if both are external and only one is defined.

C doesn't require prototypes to match, but doesn't specify what should happen if they don't. The way we should implement this in LLVM is to pick one function as the designated result (either the one with a body if there is one, or an arbitrary proto if they are all protos); this is already done. To coerce the mismatch, the global which is eliminated should be replaced with a constant expr that bitcasts the real result to the "wrong" type. instcombine will then eliminate the cast if possible. In the example above, I'd expect the result to be something like this:

bar() { ((void(*)(...))foo)(123); }
void foo(int X) {}
w00t! This is finally done with a series of commits today.