Bugzilla – Bug 95
SymbolTable::getUniqueName is very inefficient
Last modified: 2003-11-09 13:49:15
You need to log in before you can comment on or make changes to this bug.
The lowerswitches pass is taking 16s on Zion to run on the 253.perlbmk benchmark. This is crazy, because the entire runtime of LLC is 29s, including bytecode loading and the entire code generator. Fixing this performance problem will also speed up the JIT a LOT on this benchmark, because the JIT gets invoked over and over again as the benchmark runs. I'm probably not going to get a chance to look into this until after the 15th, so if anyone wants to dig into it, please feel free. :) -Chris
FWIW, I can provide a bytecode file to anyone interested in working on this.
Chris, I'll take over working on this. Please provide the bytecode file you mentioned. Reid.
Chris, I did a little work on this tonight but I think I need the file you mentioned. I located the source file in LLVM with the most number of switch statements (22): lib/ExecutionEngine/Interpreter/Execute.cpp. I compiled it with llvmgcc and ran llc on the resulting bytecode. Compilation took about 3 seconds. Not good, but not horrible either (otherwise idle 2GHz CPU). I checked the corresponding assembly code (Execute.ll) and made sure it had some nice juicy switch instructions in it. It did. So, I don't think I can make progress on this without either the SPEC2000 source for 253.perlbmk (which you probably can't give me because of licensing) or the corresponding bytecode. Please provide as I have a profiling build all rarin' to go :) Reid.
Sounds great (for others, I provided Reid with the file). Please let me know what you find: it's probably something very obvious, I just haven't had a chance to look at it. Thanks a lot! -Chris
Created an attachment (id=8) [details] GPROF output from "opt" pertaining to this bug (compressed) Shows that SymbolTable::getUniqueName is the culprit for this test case because it calls std::map::find too many times.
I have located the performance issue on this bug. It is in SymbolTable::getUniqueName. The problem is that the code calls std::map::find in a while loop that gets exponentially long. It is counting up the unique names to find the next unique integer that it can use. Since map::find is somewhat of a long operation, this is expensive. It is complicated by the fact that the test case has lots of large switch statements and functions with many switch statements. LLVM is attempting to generate a unique symbol for each case (I presume) and in doing so is counting. The fix will be related to adding a counter to the symbol table to keep track of the maximum unique value so it doesn't have to call map::find in a loop, but just increment an integer and return it. I'm reviewing the code in more detail to ascertain a fix. Patch pending. I've attached the GPROF results for your perusal. FYI: The fix to this should speed up more than just this test case :)
That sounds very logical. The symbol table class in general needs a lot of cleanup (deriving from std::map is pretty gross). Feel free to add a 'next unique' counter to the class or whatever you think makes sense. -Chris
Already in the works. I'm compiling it now. It raises a question, however. I've used a "long" for the "LastUnique" member variable to SymbolTable. LastUnique keeps track of the last value getUniqueName used. Is there any chance that 2^32 unique names could be required in any translation unit? I thought it was a reasonable limit so decided to go ahead with it but thought I'd ask.
Long should do exactly what you want. On a 32-bit machine, it will be 32 bits, and we obviously cannot have more elements in the symbol table than we can fit in memory. On most 64-bit machines, long will become 64 bits. -Chris
The fix is completed. SwitchInst now takes only 0.29 seconds and is second to the Dominator pass. Significant improvement over the previous 16 second execution time. Patch to follow shortly. The -time-passes statistics are attached.
Created an attachment (id=9) [details] Statistics output from "opt" Shows the LowerSwitchInst pass now taking only 0.29 seconds wall clock time.
*Mmm*, tasty! That's what it is supposed to look like! :) -Chris
Created an attachment (id=10) [details] gprof output after the fix Shows that the time spent in SymbolTable::getUniqueName is now negligible.
Created an attachment (id=11) [details] Patch to fix this problem
This is now fixed, big thanks fly out to Reid Spencer figuring this one out! http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20031103/009258.html http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20031103/009259.html -Chris
Problem Resolved. The change was fairly simple and involved only the SymbolTable class. A counter, LastUnique, was added to the class to prevent the getUniqueName from counting previously used names. This reduced the number of calls to std:map:find from 7.2 million to 34,944. That number could be chopped in half again if we remove the loop in SymbolTable::getUniqueName. In my fix, I wasn't sure if numbered symbol names could creep into the symbol table from other sources (besides getUniqueName) so I decided it was worth an additional std:map:find call to make sure the name really is unique. Someone with more knowledge than me might be able to prove that this situation couldn't occur so that the extra std:map:find call could be removed. As it is, the performance improvement is significant. getUniqueName went from 9.76 seconds (85.4%) to 0.02 seconds (1.6%) as reported by gprof.
Yes, the loop is required. This code gets invoked when we have something like this: %X = add int ... and we try to add a new variable named "X". It will rename it to X2, for example. However, if there is already an "X2" in the program, it must pick a different name, though it is pretty unlikely. The algorithm used by be worst case N^2, now it is worst case O(n), but in practice it should be constant time now. :) -Chris