Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate elimination of the -raise pass #1444

Closed
lattner opened this issue Jan 4, 2007 · 32 comments
Closed

Investigate elimination of the -raise pass #1444

lattner opened this issue Jan 4, 2007 · 32 comments
Labels
bugzilla Issues migrated from bugzilla code-cleanup

Comments

@lattner
Copy link
Collaborator

lattner commented Jan 4, 2007

Bugzilla Link 1072
Resolution FIXED
Resolved on Feb 22, 2010 12:47
Version trunk
OS MacOS X

Extended Description

The -raise pass was mainly around to eliminate casts from int -> uint etc through some tricky
transformations. With the recent signless work, this pass should be mostly useless. Eliminating it is good
as it reduces compile time (one fewer pass), eliminates a bunch of ugly code (the raise pass is very nasty),
and potentially eliminating the LLVMTransforms library (ExprTypeConvert.cpp, LevelRaise.cpp, and
TransformInternals.cpp).

We should do performance analysis (e.g. spec2k and 2k6) to see whether this is completely dead, and if
not, reimplement the useful pieces in other passes.

-Chris

@lattner
Copy link
Collaborator Author

lattner commented Jan 6, 2007

When this happens, we can eliminate Type::canLosslesslyBitCastTo and probably some other stuff only
used by -raise.

-Chris

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 1, 2007

Is there anything preventing this from being eliminated now?

Can we use the stats to determine how much it fires (if at all)?

What pieces does it implement that might still be useful to retain?

@lattner
Copy link
Collaborator Author

lattner commented Feb 1, 2007

Nothing prevents it. We just need someone to do an llvm-test run with an without it to see if there are
any perf differences.

-Chris

@lattner
Copy link
Collaborator Author

lattner commented Feb 1, 2007

or even just run SPEC2K + SPEC2K6.

-Chris

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 1, 2007

I will set up overnight performance test run.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 2, 2007

I want this gone. :)

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 2, 2007

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 2, 2007

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 2, 2007

Performance Data Spreadsheet
This attachment contains a spreadsheet in Excel format that contains the raw
data and a comparison of the External tests with and without the -raise pass.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 2, 2007

Summary Of Performance Results (See the Spreadsheet)

Four SPEC test suites were compared: CINT2000, CFP2000, CINT2006, and CFP2006.

In general it looks positive to remove the -raise pass. The overall results
(sum of all tests) are as follows:

                        +--Compile-+   +----------Run Times--------+

ITEM GCCAS Bytecode LLC JIT GCC CBE LLC JIT
Delta 151.05 818298 -9.07 16.2 14.66 20.81 46.40 57.15
Percent 6.49% 1.95% -1.09% 3.09% 2.18% 3.14% 7.12% 4.04%

The "Delta" values shown are the difference in aggregate time between "with
-raise" and "without -raise" runs across all test cases. Negative values mean
the "without -raise" value was larger. Positive values mean the "without -raise"
value was smaller. Because these values represent the subtraction of the
"without -raise" from the "with -raise" data sets, positive values favor removal
of -raise while negative values favor keeping -raise.

The "Percent" values are computed by dividing the "Delta" value by the "with
-raise" sum. This gives an approximation of the magnitude of change introduced
by removing the -raise pass.

To interpret this correctly, one would say: "On average, the -raise pass
increases bytecode volumes by nearly 2% and LLC runtimes by over 7%".

The only negative impact of removing -raise seems to be a 1.09% increase in the
LLC compilation time. This is rather surprising given that there is less work
done by llc without -raise.

Note that these results account for only one run of the nightly tests.
Additional samples would be welcome.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 2, 2007

Performance Data Spreadsheet
This updates some formatting and gets rid of columns that weren't run in both
tests and rows that indicate test failures.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 2, 2007

Chris,

Please let me know if this is sufficient evidence to support removal of the
-raise pass.

Reid

@lattner
Copy link
Collaborator Author

lattner commented Feb 2, 2007

Yes, it looks like we are close to it. Things worth investigating:

  • What does -raise do on 401.bzip2 that shrinks the bc file 18K? Once understood, we can reimplement
    this in some other pass.

  • I have little confidence in the stabilities of these numbers, they seem very noisy. Can you see if the
    254.gap/LLC, 254.gap/CBE, 255.vortex/CBE, and 256.bzip2/cbe slowdowns reproduce? If so, we should
    investigate what is happening there too.

-Chris

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 3, 2007

  • What does -raise do on 401.bzip2 that shrinks the bc file 18K? Once
    understood, we can reimplement this in some other pass.

I ran 401.bzip2 using "head" from today and compared it against the "without
-raise" run I did last night. The bytecode sizes are identical. Consequently, I
don't think I was comparing apples with apples. I used the results of my nightly
run from a couple days ago for the "with -raise" data but my "without -raise"
also included the shift patch, which may have affected optimization (positively).

So, guess I get to do this again.

Devang, did you run this comparison on Darwin?

  • I have little confidence in the stabilities of these numbers, they seem very
    noisy.

I eliminated as much interference workload as I could by shutting down email,
xchat, etc. and just let it run. I think the issue here is the difference in the
builds. I'll run the comparison again and report new numbers. Unfortunately,
it'll have to wait until this evening for "quiet machine".

Can you see if the 254.gap/LLC, 254.gap/CBE, 255.vortex/CBE, and
256.bzip2/cbe slowdowns reproduce? If so, we should investigate what is
happening there too.

I'll defer this until after I get new numbers. If they still look problematic,
I'll investigate then.

@lattner
Copy link
Collaborator Author

lattner commented Feb 3, 2007

Makes sense. A lot has changed in a few days. I suspect the 'noise' is really measuring these changes :)

-Chris

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 3, 2007

Yes. I now have two environments that were updated at the same moment. The only
difference is that one has -raise and one doesn't. This should be good for
testing. I'm going to do Olden with GET_STABLE_NUMBERS=1 because its relatively
quick and a good test for stability. If that goes okay, I'll get to External
tests this weekend (7+ hours each).

@lattner
Copy link
Collaborator Author

lattner commented Feb 3, 2007

sounds great, thanks reid!

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 3, 2007

Olden Performance Comparison
This looks much more reasonable. The difference in the runs is in the small
"noise" range resulting from me using the computer while the test is ongoing.
These were done single threaded and I'll do the same as for the Externals.

Interesting note: there is 0 difference in bytecode sizes. Perhaps -raise
doesn't do anything at all on Olden?

@lattner
Copy link
Collaborator Author

lattner commented Feb 3, 2007

there is 0 difference in bytecode sizes. Perhaps -raise
doesn't do anything at all on Olden?

Cool. You could diff them to verify.

This is a good sign -raise isn't doing anything any more. :) SPEC will give more certainty though.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 3, 2007

[reid@bashful llvm-test-1]$ for f in
MultiSource/Benchmarks/Olden//Output/.llvm.bc ; do

cmp $f ../llvm-test-4/$f
done
[reid@bashful llvm-test-1]$ echo $?
0

Looks pretty identical to me.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 3, 2007

External Performance Comparison (Spreadsheet)
A comparison of the SPEC tests with and without -raise.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 3, 2007

Looking at the External test comparison, it doesn't look like -raise does much.

However a few things could be investigated:

255.vortex - Why does raise eliminate 733 bytes of bytecode?
433.milc - What does raise do to make llc 1.9 seconds faster?
456.hmmer - What does raise do to make llc 1.54 seconds fasters?
447.dealII - Why is "gccas" (opt) 2.93 seconds faster with -raise?
400.perlbench - What does raise do to make jit 1.04 seconds faster, especially
with 1133 more bytecode?

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 3, 2007

We should generate comparisons on Darwin/PPC as well just to make sure that
-raise doesn't assist any backend opts on that platform. I don't see why it
would, but we should check.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 3, 2007

For 255.vortex, here's what I found:

  1. The bytecode size difference is due to the -raise pass removing or
    altering the names of values. A diff of the llvm assembly corresponding to
    the .llvm.bc files from both cases produced 6000 lines of diffs. But, the
    only way they differed was in the names of values. I ran opt -strip on these
    files and re-ran the diff. It was vastly shorter. I'll attach it.

  2. The diff produces in #​1 showed that -raise will fold a GEP instruction into
    a store instruction using a GEP constant expression. Without -raise this
    doesn't happen. There is also some changes in the # of uses probably related
    to the GEP folding into the store.

Its not clear why -instcombine wouldn't do the same transform as #​2.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 3, 2007

255.vortex difference on stripped files
This file contains the llvm assembly differences for 255.vortex.llvm.bc after
opt -strip has been run on them.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 3, 2007

433.milc differences
For 433.milc the only difference is that -raise moved a GEP instruction down
one instruction in its basic block. Not sure how that can produce a 1.9 second
difference in execution time.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 3, 2007

456.hmmer differences
There are numerous differences on this test case. More analysis will be needed.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 3, 2007

447.dealII differences
Again, there's a lot of differences in this one. Many of them seem to be the
relocation of gep instructions.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 3, 2007

For 400.perlbench:

Unfortunately, my machine doesn't have enough memory to do a diff of two 2.7GB
LLVM assembly files.

@lattner
Copy link
Collaborator Author

lattner commented Feb 4, 2007

the 255.vortex differences aren't worth worrying about, we're good there.

@lattner
Copy link
Collaborator Author

lattner commented Feb 4, 2007

ok, go ahead and nuke it.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla code-cleanup
Projects
None yet
Development

No branches or pull requests

2 participants