Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DSE is really slow on this input #1795

Closed
llvmbot opened this issue May 15, 2007 · 17 comments
Closed

DSE is really slow on this input #1795

llvmbot opened this issue May 15, 2007 · 17 comments
Assignees
Labels
bugzilla Issues migrated from bugzilla slow-compile

Comments

@llvmbot
Copy link
Collaborator

llvmbot commented May 15, 2007

Bugzilla Link 1423
Resolution FIXED
Resolved on Jan 24, 2014 13:30
Version 1.0
OS All
Attachments iterative_me.bc, iterative_me.opt.bc, iterative_me.opt.s.bz2
Reporter LLVM Bugzilla Contributor
CC @isanbard

Extended Description

This is a very problematic function.

  • The opt takes more than 20 min.
  • The llc takes about 10 min.
  • The code generated for ARM is very poor.
@llvmbot
Copy link
Collaborator Author

llvmbot commented May 15, 2007

assigned to @lattner

@llvmbot
Copy link
Collaborator Author

llvmbot commented May 16, 2007

laurov@laurov-desktop:/tmp$ opt -time-passes -std-compile-opts iterative_me.bc
-o iterative_me.opt.bc -f
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 1555.4652 seconds (2394.6937 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- ---
Name ---
1025.0560 ( 65.9%) 0.4360 ( 56.4%) 1025.4920 ( 65.9%) 1746.6673 ( 72.9%)
Dead Store Elimination
527.5409 ( 33.9%) 0.3280 ( 42.4%) 527.8689 ( 33.9%) 645.5995 ( 26.9%)
Global Common Subexpression Elimination
0.2600 ( 0.0%) 0.0000 ( 0.0%) 0.2600 ( 0.0%) 0.2602 ( 0.0%)
Combine redundant instructions
0.2600 ( 0.0%) 0.0000 ( 0.0%) 0.2600 ( 0.0%) 0.2598 ( 0.0%)
Combine redundant instructions
0.2560 ( 0.0%) 0.0000 ( 0.0%) 0.2560 ( 0.0%) 0.2569 ( 0.0%)
Combine redundant instructions
0.2440 ( 0.0%) 0.0040 ( 0.5%) 0.2480 ( 0.0%) 0.2502 ( 0.0%)
Combine redundant instructions
0.1760 ( 0.0%) 0.0040 ( 0.5%) 0.1800 ( 0.0%) 0.1810 ( 0.0%)
Combine redundant instructions
0.0800 ( 0.0%) 0.0000 ( 0.0%) 0.0800 ( 0.0%) 0.1796 ( 0.0%)
Bitcode Writer
0.0600 ( 0.0%) 0.0000 ( 0.0%) 0.0600 ( 0.0%) 0.1587 ( 0.0%)
Aggressive Dead Code Elimination
0.1160 ( 0.0%) 0.0000 ( 0.0%) 0.1160 ( 0.0%) 0.1294 ( 0.0%)
Combine redundant instructions
0.0240 ( 0.0%) 0.0000 ( 0.0%) 0.0240 ( 0.0%) 0.1239 ( 0.0%) Find
Used Types
0.0440 ( 0.0%) 0.0000 ( 0.0%) 0.0440 ( 0.0%) 0.0529 ( 0.0%) Sparse
Conditional Constant Propagation
0.0520 ( 0.0%) 0.0000 ( 0.0%) 0.0520 ( 0.0%) 0.0516 ( 0.0%)
Loop-Closed SSA Form Pass
0.0440 ( 0.0%) 0.0000 ( 0.0%) 0.0440 ( 0.0%) 0.0474 ( 0.0%) Module
Verifier
0.0480 ( 0.0%) 0.0000 ( 0.0%) 0.0480 ( 0.0%) 0.0470 ( 0.0%) Module
Verifier
0.0440 ( 0.0%) 0.0000 ( 0.0%) 0.0440 ( 0.0%) 0.0421 ( 0.0%) Loop
Invariant Code Motion
0.0360 ( 0.0%) 0.0000 ( 0.0%) 0.0360 ( 0.0%) 0.0348 ( 0.0%) Scalar
Replacement of Aggregates
0.0320 ( 0.0%) 0.0000 ( 0.0%) 0.0320 ( 0.0%) 0.0295 ( 0.0%)
Canonicalize Induction Variables
0.0280 ( 0.0%) 0.0000 ( 0.0%) 0.0280 ( 0.0%) 0.0258 ( 0.0%)
Simplify the CFG
0.0240 ( 0.0%) 0.0000 ( 0.0%) 0.0240 ( 0.0%) 0.0225 ( 0.0%)
Reassociate expressions
0.0240 ( 0.0%) 0.0000 ( 0.0%) 0.0240 ( 0.0%) 0.0216 ( 0.0%)
Unswitch loops
0.0200 ( 0.0%) 0.0000 ( 0.0%) 0.0200 ( 0.0%) 0.0206 ( 0.0%)
Canonicalize natural loops
0.0160 ( 0.0%) 0.0000 ( 0.0%) 0.0160 ( 0.0%) 0.0159 ( 0.0%)
Simplify the CFG
0.0160 ( 0.0%) 0.0000 ( 0.0%) 0.0160 ( 0.0%) 0.0131 ( 0.0%) Dead
Global Elimination
0.0120 ( 0.0%) 0.0000 ( 0.0%) 0.0120 ( 0.0%) 0.0127 ( 0.0%)
Natural Loop Construction
0.0119 ( 0.0%) 0.0000 ( 0.0%) 0.0119 ( 0.0%) 0.0119 ( 0.0%)
Post-Dominator Tree Construction
0.0080 ( 0.0%) 0.0000 ( 0.0%) 0.0080 ( 0.0%) 0.0101 ( 0.0%)
Promote Memory to Register
0.0120 ( 0.0%) 0.0000 ( 0.0%) 0.0120 ( 0.0%) 0.0100 ( 0.0%)
Simplify the CFG
0.0080 ( 0.0%) 0.0000 ( 0.0%) 0.0080 ( 0.0%) 0.0099 ( 0.0%)
Loop-Closed SSA Form Pass
0.0080 ( 0.0%) 0.0000 ( 0.0%) 0.0080 ( 0.0%) 0.0095 ( 0.0%)
Simplify the CFG
0.0120 ( 0.0%) 0.0000 ( 0.0%) 0.0120 ( 0.0%) 0.0089 ( 0.0%) Break
critical edges in CFG
0.0079 ( 0.0%) 0.0000 ( 0.0%) 0.0079 ( 0.0%) 0.0087 ( 0.0%)
Natural Loop Construction
0.0080 ( 0.0%) 0.0000 ( 0.0%) 0.0080 ( 0.0%) 0.0080 ( 0.0%)
Dominator Tree Construction
0.0080 ( 0.0%) 0.0000 ( 0.0%) 0.0080 ( 0.0%) 0.0080 ( 0.0%)
Dominator Tree Construction
0.0079 ( 0.0%) 0.0000 ( 0.0%) 0.0079 ( 0.0%) 0.0073 ( 0.0%) Break
critical edges in CFG
0.0080 ( 0.0%) 0.0000 ( 0.0%) 0.0080 ( 0.0%) 0.0065 ( 0.0%)
Dominator Tree Construction
0.0040 ( 0.0%) 0.0000 ( 0.0%) 0.0040 ( 0.0%) 0.0065 ( 0.0%) Unroll
loops
0.0040 ( 0.0%) 0.0000 ( 0.0%) 0.0040 ( 0.0%) 0.0065 ( 0.0%)
Dominator Tree Construction
0.0040 ( 0.0%) 0.0000 ( 0.0%) 0.0040 ( 0.0%) 0.0064 ( 0.0%)
Dominator Tree Construction
0.0039 ( 0.0%) 0.0000 ( 0.0%) 0.0039 ( 0.0%) 0.0064 ( 0.0%)
Simplify the CFG
0.0080 ( 0.0%) 0.0000 ( 0.0%) 0.0080 ( 0.0%) 0.0063 ( 0.0%)
Dominator Tree Construction
0.0079 ( 0.0%) 0.0000 ( 0.0%) 0.0079 ( 0.0%) 0.0062 ( 0.0%)
Dominator Tree Construction
0.0039 ( 0.0%) 0.0000 ( 0.0%) 0.0039 ( 0.0%) 0.0044 ( 0.0%) Remove
unused exception handling info
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0033 ( 0.0%)
Function Integration/Inlining
0.0040 ( 0.0%) 0.0000 ( 0.0%) 0.0040 ( 0.0%) 0.0032 ( 0.0%) ET
Forest Construction
0.0040 ( 0.0%) 0.0000 ( 0.0%) 0.0040 ( 0.0%) 0.0030 ( 0.0%) ET
Forest Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0028 ( 0.0%)
Post-Dominance Frontier Construction
0.0040 ( 0.0%) 0.0000 ( 0.0%) 0.0040 ( 0.0%) 0.0026 ( 0.0%) ET
Forest Construction
0.0040 ( 0.0%) 0.0000 ( 0.0%) 0.0040 ( 0.0%) 0.0026 ( 0.0%)
Canonicalize natural loops
0.0040 ( 0.0%) 0.0000 ( 0.0%) 0.0040 ( 0.0%) 0.0025 ( 0.0%) ET
Forest Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0025 ( 0.0%)
Dominance Frontier Construction
0.0039 ( 0.0%) 0.0000 ( 0.0%) 0.0039 ( 0.0%) 0.0025 ( 0.0%) ET
Forest Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0025 ( 0.0%) ET
Forest Construction
0.0040 ( 0.0%) 0.0000 ( 0.0%) 0.0040 ( 0.0%) 0.0025 ( 0.0%) ET
Forest Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0021 ( 0.0%) Basic
CallGraph Construction
0.0040 ( 0.0%) 0.0000 ( 0.0%) 0.0040 ( 0.0%) 0.0020 ( 0.0%) Tail
Duplication
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0020 ( 0.0%)
Dominance Frontier Construction
0.0040 ( 0.0%) 0.0000 ( 0.0%) 0.0040 ( 0.0%) 0.0020 ( 0.0%)
Dominance Frontier Construction
0.0040 ( 0.0%) 0.0000 ( 0.0%) 0.0040 ( 0.0%) 0.0015 ( 0.0%)
Conditional Propagation
0.0040 ( 0.0%) 0.0000 ( 0.0%) 0.0040 ( 0.0%) 0.0012 ( 0.0%) Scalar
Evolution Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 0.0%) Tail
Call Elimination
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0010 ( 0.0%) Rotate
Loops
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.0%)
Conditional Propagation
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0005 ( 0.0%) Unify
function exit nodes
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Dead
Type Elimination
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%)
Simplify well-known library calls
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Dead
Argument Elimination
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Merge
Duplicate Global Constants
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%)
Promote 'by reference' arguments to scalars
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Global
Variable Optimizer
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) No
Alias Analysis (always returns 'may' alias)
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Scalar
Evolution Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Basic
Value Numbering (default GVN impl)
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Lower
Set Jump
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Scalar
Evolution Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Raise
allocations from calls to instructions
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Scalar
Evolution Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Scalar
Evolution Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%)
Interprocedural constant propagation
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Load
Value Numbering
1554.6931 (100.0%) 0.7720 (100.0%) 1555.4652 (100.0%) 2394.6937 (100.0%)
TOTAL

@asl
Copy link
Collaborator

asl commented May 16, 2007

*** This bug has been marked as a duplicate of 930 ***

@lattner
Copy link
Collaborator

lattner commented May 16, 2007

DSE is a known issue, but this actually triggers a horrible thing in sdisel as well. Reopening for that issue.

@lattner
Copy link
Collaborator

lattner commented May 16, 2007

Further, lauro, the code looks very strange. Assuming the .c file doesn't contain hundreds of loads, can
you file another bug with the .i file that contains iterative_me?

@lattner
Copy link
Collaborator

lattner commented May 16, 2007

Fixed, patch here:

This speeds up isel on this function from:

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
60.0536 ( 96.8%) 1.4269 ( 97.9%) 61.4805 ( 96.8%) 61.4842 ( 96.8%) ARM Instruction Selection
1.6086 ( 2.5%) 0.0115 ( 0.7%) 1.6202 ( 2.5%) 1.6203 ( 2.5%) Linear Scan Register Allocator

to:

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
3.9081 ( 66.9%) 1.6220 ( 98.1%) 5.5301 ( 73.8%) 5.5876 ( 74.0%) ARM Instruction Selection
1.5635 ( 26.7%) 0.0119 ( 0.7%) 1.5754 ( 21.0%) 1.5757 ( 20.8%) Linear Scan Register Allocator

a speedup of over 11x, in a release build.

-Chris

@lattner
Copy link
Collaborator

lattner commented May 16, 2007

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jul 6, 2009

It takes a lot of time to compile iterative_me.bc once again:

===-------------------------------------------------------------------------===
... Statistics Collected ...
===-------------------------------------------------------------------------===

10 dse - Number of other instrs removed
2 dse - Number of stores deleted

===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 139.6793 seconds (142.3439 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
138.7904 ( 99.9%) 0.8715 ( 99.9%) 139.6620 ( 99.9%) 142.3265 ( 99.9%) Dead Store Elimination
0.0157 ( 0.0%) 0.0003 ( 0.0%) 0.0161 ( 0.0%) 0.0161 ( 0.0%) Module Verifier
0.0009 ( 0.0%) 0.0001 ( 0.0%) 0.0010 ( 0.0%) 0.0011 ( 0.0%) Dominator Tree Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) Memory Dependence Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Preliminary module verification
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Basic Alias Analysis (default AA impl)
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Data Layout
138.8072 (100.0%) 0.8721 (100.0%) 139.6793 (100.0%) 142.3439 (100.0%) TOTAL

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jul 7, 2009

Shark says the culprit is MemoryDependenceAnalysis.cpp:208.

@lattner
Copy link
Collaborator

lattner commented Dec 23, 2009

Still slow, but now GVN has joined the act:
Total Execution Time: 96.7546 seconds (96.9454 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
47.9156 ( 49.6%) 0.0372 ( 33.0%) 47.9528 ( 49.6%) 48.1304 ( 49.6%) Global Value Numbering
47.7280 ( 49.4%) 0.0537 ( 47.7%) 47.7818 ( 49.4%) 47.7866 ( 49.3%) Dead Store Elimination
0.1606 ( 0.2%) 0.0026 ( 2.3%) 0.1632 ( 0.2%) 0.1669 ( 0.2%) Combine redundant instructions
0.1548 ( 0.2%) 0.0029 ( 2.6%) 0.1577 ( 0.2%) 0.1600 ( 0.2%) Combine redundant instructions

@lattner
Copy link
Collaborator

lattner commented Dec 23, 2009

Where does this testcase come from? IT is just a huge series of:

%tmp4110778.4030 = getelementptr %struct.SnowContext* %s, i32 0, i32 9, i32 4030 ; <i8*> [#uses=2]
%tmp14809 = load i8* %tmp4110778.4030 ; [#uses=1]
%tmp4110778.4031 = getelementptr %struct.SnowContext* %s, i32 0, i32 9, i32 4031 ; <i8*> [#uses=2]
%tmp14810 = load i8* %tmp4110778.4031 ; [#uses=1]

with different offsets? Please attach a .i file. Otherwise I'll just close this as "insane".

@llvmbot
Copy link
Collaborator Author

llvmbot commented Dec 23, 2009

From the name, it looks like it's the motion estimation function for the Snow video codec, which is part of FFmpeg.

@isanbard
Copy link
Contributor

The bitcode file doesn't parse anymore.

@isanbard
Copy link
Contributor

Marking this as "fixed" again. I'll also upload a .ll file so that it can be reproduced if necessary.

===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 0.6507 seconds (0.6710 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.1882 ( 29.7%) 0.0005 ( 2.8%) 0.1887 ( 29.0%) 0.1887 ( 28.1%) Dead Store Elimination
0.1508 ( 23.8%) 0.0019 ( 10.8%) 0.1527 ( 23.5%) 0.1536 ( 22.9%) Global Value Numbering
0.0231 ( 3.6%) 0.0015 ( 8.8%) 0.0246 ( 3.8%) 0.0307 ( 4.6%) Combine redundant instructions
0.0283 ( 4.5%) 0.0020 ( 11.4%) 0.0303 ( 4.7%) 0.0303 ( 4.5%) Value Propagation
0.0292 ( 4.6%) 0.0006 ( 3.3%) 0.0298 ( 4.6%) 0.0298 ( 4.4%) Value Propagation

@isanbard
Copy link
Contributor

@lattner
Copy link
Collaborator

lattner commented Nov 26, 2021

mentioned in issue llvm/llvm-bugzilla-archive#1424

@asl
Copy link
Collaborator

asl commented Nov 27, 2021

mentioned in issue #1302

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla slow-compile
Projects
None yet
Development

No branches or pull requests

4 participants