First Last Prev Next    No search results available
Details
: SymbolTable::getUniqueName is very inefficient
Bug#: 95
: libraries
: Core LLVM classes
Status: RESOLVED
Resolution: FIXED
: All
: All
: 1.0
: P2
: normal
: 1.1

:
: quality-of-implementation
:
:
  Show dependency tree - Show dependency graph
People
Reporter: Chris Lattner <clattner@apple.com>
Assigned To: Reid Spencer <rspencer@reidspencer.com>

Attachments
GPROF output from "opt" pertaining to this bug (compressed) (66.08 KB, application/octet-stream)
2003-11-09 12:32, Reid Spencer
Details
Statistics output from "opt" (848 bytes, text/plain)
2003-11-09 13:25, Reid Spencer
Details
gprof output after the fix (65.50 KB, application/octet-stream)
2003-11-09 13:28, Reid Spencer
Details
Patch to fix this problem (2.42 KB, patch)
2003-11-09 13:31, Reid Spencer
Details


Note

You need to log in before you can comment on or make changes to this bug.

Related actions


Description:   Opened: 2003-11-06 13:55
The lowerswitches pass is taking 16s on Zion to run on the 253.perlbmk
benchmark.  This is crazy, because the entire runtime of LLC is 29s, including
bytecode loading and the entire code generator.  Fixing this performance problem
will also speed up the JIT a LOT on this benchmark, because the JIT gets invoked
over and over again as the benchmark runs.

I'm probably not going to get a chance to look into this until after the 15th,
so if anyone wants to dig into it, please feel free. :)

-Chris
------- Comment #1 From Chris Lattner 2003-11-08 23:43:20 -------
FWIW, I can provide a bytecode file to anyone interested in working on this.
------- Comment #2 From Reid Spencer 2003-11-09 02:19:55 -------
Chris, I'll take over working on this. Please provide the bytecode file you
mentioned.

Reid.
------- Comment #3 From Reid Spencer 2003-11-09 03:41:07 -------
Chris,

I did a little work on this tonight but I think I need the file you mentioned.

I located the source file in LLVM with the most number of switch statements
(22): lib/ExecutionEngine/Interpreter/Execute.cpp. I compiled it with llvmgcc
and ran llc on the resulting bytecode. Compilation took about 3 seconds. Not
good, but not horrible either (otherwise idle 2GHz CPU). I checked the
corresponding assembly code (Execute.ll) and made sure it had some nice juicy
switch instructions in it. It did. 

So, I don't think I can make progress on this without either the SPEC2000 source
for 253.perlbmk (which you probably can't give me because of licensing) or the
corresponding bytecode. Please provide as I have a profiling build all rarin' to
go :)

Reid.
------- Comment #4 From Chris Lattner 2003-11-09 09:32:21 -------
Sounds great (for others, I provided Reid with the file).  Please let me know
what you find: it's probably something very obvious, I just haven't had a chance
to look at it.

Thanks a lot!

-Chris
------- Comment #5 From Reid Spencer 2003-11-09 12:32:11 -------
Created an attachment (id=8) [details]
GPROF output from "opt" pertaining to this bug (compressed)

Shows that SymbolTable::getUniqueName is the culprit for this test case because
it calls std::map::find too many times.
------- Comment #6 From Reid Spencer 2003-11-09 12:35:31 -------
I have located the performance issue on this bug. 
It is in SymbolTable::getUniqueName.

The problem is that the code calls std::map::find in a while loop that gets
exponentially long. It is counting up the unique names to find the next unique
integer that it can use. Since map::find is somewhat of a long operation, this
is expensive. It is complicated by the fact that the test case has lots of large
switch statements and functions with many switch statements. LLVM is attempting
to generate a unique symbol for each case (I presume) and in doing so is counting.

The fix will be related to adding a counter to the symbol table to keep track of
the maximum unique value so it doesn't have to call map::find in a loop, but
just increment an integer and return it. I'm reviewing the code in more detail
to ascertain a fix.

Patch pending.

I've attached the GPROF results for your perusal.

FYI: The fix to this should speed up more than just this test case :)
------- Comment #7 From Chris Lattner 2003-11-09 12:38:01 -------
That sounds very logical.  The symbol table class in general needs a lot of
cleanup (deriving from std::map is pretty gross).  Feel free to add a 'next
unique' counter to the class or whatever you think makes sense.

-Chris
------- Comment #8 From Reid Spencer 2003-11-09 12:49:57 -------
Already in the works. I'm compiling it now.

It raises a question, however. I've used a "long" for the "LastUnique" member
variable to SymbolTable. LastUnique keeps track of the last value getUniqueName
used. Is there any chance that 2^32 unique names could be required in any
translation unit? I thought it was a reasonable limit so decided to go ahead
with it but thought I'd ask.
------- Comment #9 From Chris Lattner 2003-11-09 12:53:29 -------
Long should do exactly what you want.  On a 32-bit machine, it will be 32 bits,
and we obviously cannot have more elements in the symbol table than we can fit
in memory.  On most 64-bit machines, long will become 64 bits.

-Chris
------- Comment #10 From Reid Spencer 2003-11-09 13:23:37 -------
The fix is completed. SwitchInst now takes only 0.29 seconds and is second to
the Dominator pass. Significant improvement over the previous 16 second
execution time. Patch to follow shortly.

The -time-passes statistics are attached. 
------- Comment #11 From Reid Spencer 2003-11-09 13:25:00 -------
Created an attachment (id=9) [details]
Statistics output from "opt"

Shows the LowerSwitchInst pass now taking only 0.29 seconds wall clock time.
------- Comment #12 From Chris Lattner 2003-11-09 13:26:34 -------
*Mmm*, tasty!  That's what it is supposed to look like!  :)

-Chris
------- Comment #13 From Reid Spencer 2003-11-09 13:28:51 -------
Created an attachment (id=10) [details]
gprof output after the fix

Shows that the time spent in SymbolTable::getUniqueName is now negligible.
------- Comment #14 From Reid Spencer 2003-11-09 13:31:47 -------
Created an attachment (id=11) [details]
Patch to fix this problem
------- Comment #15 From Chris Lattner 2003-11-09 13:41:58 -------
This is now fixed, big thanks fly out to Reid Spencer figuring this one out!

http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20031103/009258.html
http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20031103/009259.html

-Chris
------- Comment #16 From Reid Spencer 2003-11-09 13:46:08 -------
Problem Resolved. 

The change was fairly simple and involved only the SymbolTable class. A counter,
LastUnique, was added to the class to prevent the getUniqueName from counting
previously used names. This reduced the number of calls to std:map:find from 7.2
million to 34,944. That number could be chopped in half again if we remove the
loop in SymbolTable::getUniqueName. In my fix, I wasn't sure if numbered symbol
names could creep into the symbol table from other sources (besides
getUniqueName) so I decided it was worth an additional std:map:find call to make
sure the name really is unique. Someone with more knowledge than me might be
able to prove that this situation couldn't occur so that the extra std:map:find
call could be removed.

As it is, the performance improvement is significant. getUniqueName went from
9.76 seconds (85.4%) to 0.02 seconds (1.6%) as reported by gprof. 
------- Comment #17 From Chris Lattner 2003-11-09 13:49:15 -------
Yes, the loop is required.  This code gets invoked when we have something like this:

%X = add int ...

and we try to add a new variable named "X".  It will rename it to X2, for
example.  However, if there is already an "X2" in the program, it must pick a
different name, though it is pretty unlikely.

The algorithm used by be worst case N^2, now it is worst case O(n), but in
practice it should be constant time now.  :)

-Chris

First Last Prev Next    No search results available