LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 1423 - DSE is really slow on this input
Summary: DSE is really slow on this input
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Scalar Optimizations (show other bugs)
Version: 1.0
Hardware: All All
: P normal
Assignee: Chris Lattner
URL:
Keywords: slow-compile
Depends on:
Blocks:
 
Reported: 2007-05-15 16:06 PDT by Lauro Venancio
Modified: 2014-01-24 13:30 PST (History)
4 users (show)

See Also:
Fixed By Commit(s):


Attachments
iterative_me.bc (262.46 KB, application/octet-stream)
2007-05-15 16:07 PDT, Lauro Venancio
Details
iterative_me.opt.bc (260.69 KB, application/octet-stream)
2007-05-15 16:08 PDT, Lauro Venancio
Details
iterative_me.opt.s.bz2 (56.18 KB, application/x-bzip)
2007-05-15 16:10 PDT, Lauro Venancio
Details
Non-bitcode version of the problematic file. The others no longer parse with newer LLVMs. (117.01 KB, text/plain)
2014-01-24 13:30 PST, Bill Wendling
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Lauro Venancio 2007-05-15 16:06:07 PDT
This is a very problematic function.
- The opt takes more than 20 min.
- The llc takes about 10 min.
- The code generated for ARM is very poor.
Comment 1 Lauro Venancio 2007-05-15 16:07:27 PDT
Created attachment 845 [details]
iterative_me.bc

original function
Comment 2 Lauro Venancio 2007-05-15 16:08:05 PDT
Created attachment 846 [details]
iterative_me.opt.bc

optimized function
Comment 3 Lauro Venancio 2007-05-15 16:10:12 PDT
Created attachment 847 [details]
iterative_me.opt.s.bz2

generated code
Comment 4 Lauro Venancio 2007-05-15 17:01:39 PDT
laurov@laurov-desktop:/tmp$ opt -time-passes -std-compile-opts iterative_me.bc
-o iterative_me.opt.bc -f
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 1555.4652 seconds (2394.6937 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---
Name ---
  1025.0560 ( 65.9%)   0.4360 ( 56.4%)  1025.4920 ( 65.9%)  1746.6673 ( 72.9%) 
Dead Store Elimination
  527.5409 ( 33.9%)   0.3280 ( 42.4%)  527.8689 ( 33.9%)  645.5995 ( 26.9%) 
Global Common Subexpression Elimination
   0.2600 (  0.0%)   0.0000 (  0.0%)   0.2600 (  0.0%)   0.2602 (  0.0%) 
Combine redundant instructions
   0.2600 (  0.0%)   0.0000 (  0.0%)   0.2600 (  0.0%)   0.2598 (  0.0%) 
Combine redundant instructions
   0.2560 (  0.0%)   0.0000 (  0.0%)   0.2560 (  0.0%)   0.2569 (  0.0%) 
Combine redundant instructions
   0.2440 (  0.0%)   0.0040 (  0.5%)   0.2480 (  0.0%)   0.2502 (  0.0%) 
Combine redundant instructions
   0.1760 (  0.0%)   0.0040 (  0.5%)   0.1800 (  0.0%)   0.1810 (  0.0%) 
Combine redundant instructions
   0.0800 (  0.0%)   0.0000 (  0.0%)   0.0800 (  0.0%)   0.1796 (  0.0%) 
Bitcode Writer
   0.0600 (  0.0%)   0.0000 (  0.0%)   0.0600 (  0.0%)   0.1587 (  0.0%) 
Aggressive Dead Code Elimination
   0.1160 (  0.0%)   0.0000 (  0.0%)   0.1160 (  0.0%)   0.1294 (  0.0%) 
Combine redundant instructions
   0.0240 (  0.0%)   0.0000 (  0.0%)   0.0240 (  0.0%)   0.1239 (  0.0%)  Find
Used Types
   0.0440 (  0.0%)   0.0000 (  0.0%)   0.0440 (  0.0%)   0.0529 (  0.0%)  Sparse
Conditional Constant Propagation
   0.0520 (  0.0%)   0.0000 (  0.0%)   0.0520 (  0.0%)   0.0516 (  0.0%) 
Loop-Closed SSA Form Pass
   0.0440 (  0.0%)   0.0000 (  0.0%)   0.0440 (  0.0%)   0.0474 (  0.0%)  Module
Verifier
   0.0480 (  0.0%)   0.0000 (  0.0%)   0.0480 (  0.0%)   0.0470 (  0.0%)  Module
Verifier
   0.0440 (  0.0%)   0.0000 (  0.0%)   0.0440 (  0.0%)   0.0421 (  0.0%)  Loop
Invariant Code Motion
   0.0360 (  0.0%)   0.0000 (  0.0%)   0.0360 (  0.0%)   0.0348 (  0.0%)  Scalar
Replacement of Aggregates
   0.0320 (  0.0%)   0.0000 (  0.0%)   0.0320 (  0.0%)   0.0295 (  0.0%) 
Canonicalize Induction Variables
   0.0280 (  0.0%)   0.0000 (  0.0%)   0.0280 (  0.0%)   0.0258 (  0.0%) 
Simplify the CFG
   0.0240 (  0.0%)   0.0000 (  0.0%)   0.0240 (  0.0%)   0.0225 (  0.0%) 
Reassociate expressions
   0.0240 (  0.0%)   0.0000 (  0.0%)   0.0240 (  0.0%)   0.0216 (  0.0%) 
Unswitch loops
   0.0200 (  0.0%)   0.0000 (  0.0%)   0.0200 (  0.0%)   0.0206 (  0.0%) 
Canonicalize natural loops
   0.0160 (  0.0%)   0.0000 (  0.0%)   0.0160 (  0.0%)   0.0159 (  0.0%) 
Simplify the CFG
   0.0160 (  0.0%)   0.0000 (  0.0%)   0.0160 (  0.0%)   0.0131 (  0.0%)  Dead
Global Elimination
   0.0120 (  0.0%)   0.0000 (  0.0%)   0.0120 (  0.0%)   0.0127 (  0.0%) 
Natural Loop Construction
   0.0119 (  0.0%)   0.0000 (  0.0%)   0.0119 (  0.0%)   0.0119 (  0.0%) 
Post-Dominator Tree Construction
   0.0080 (  0.0%)   0.0000 (  0.0%)   0.0080 (  0.0%)   0.0101 (  0.0%) 
Promote Memory to Register
   0.0120 (  0.0%)   0.0000 (  0.0%)   0.0120 (  0.0%)   0.0100 (  0.0%) 
Simplify the CFG
   0.0080 (  0.0%)   0.0000 (  0.0%)   0.0080 (  0.0%)   0.0099 (  0.0%) 
Loop-Closed SSA Form Pass
   0.0080 (  0.0%)   0.0000 (  0.0%)   0.0080 (  0.0%)   0.0095 (  0.0%) 
Simplify the CFG
   0.0120 (  0.0%)   0.0000 (  0.0%)   0.0120 (  0.0%)   0.0089 (  0.0%)  Break
critical edges in CFG
   0.0079 (  0.0%)   0.0000 (  0.0%)   0.0079 (  0.0%)   0.0087 (  0.0%) 
Natural Loop Construction
   0.0080 (  0.0%)   0.0000 (  0.0%)   0.0080 (  0.0%)   0.0080 (  0.0%) 
Dominator Tree Construction
   0.0080 (  0.0%)   0.0000 (  0.0%)   0.0080 (  0.0%)   0.0080 (  0.0%) 
Dominator Tree Construction
   0.0079 (  0.0%)   0.0000 (  0.0%)   0.0079 (  0.0%)   0.0073 (  0.0%)  Break
critical edges in CFG
   0.0080 (  0.0%)   0.0000 (  0.0%)   0.0080 (  0.0%)   0.0065 (  0.0%) 
Dominator Tree Construction
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0065 (  0.0%)  Unroll
loops
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0065 (  0.0%) 
Dominator Tree Construction
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0064 (  0.0%) 
Dominator Tree Construction
   0.0039 (  0.0%)   0.0000 (  0.0%)   0.0039 (  0.0%)   0.0064 (  0.0%) 
Simplify the CFG
   0.0080 (  0.0%)   0.0000 (  0.0%)   0.0080 (  0.0%)   0.0063 (  0.0%) 
Dominator Tree Construction
   0.0079 (  0.0%)   0.0000 (  0.0%)   0.0079 (  0.0%)   0.0062 (  0.0%) 
Dominator Tree Construction
   0.0039 (  0.0%)   0.0000 (  0.0%)   0.0039 (  0.0%)   0.0044 (  0.0%)  Remove
unused exception handling info
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0033 (  0.0%) 
Function Integration/Inlining
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0032 (  0.0%)  ET
Forest Construction
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0030 (  0.0%)  ET
Forest Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0028 (  0.0%) 
Post-Dominance Frontier Construction
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0026 (  0.0%)  ET
Forest Construction
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0026 (  0.0%) 
Canonicalize natural loops
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0025 (  0.0%)  ET
Forest Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0025 (  0.0%) 
Dominance Frontier Construction
   0.0039 (  0.0%)   0.0000 (  0.0%)   0.0039 (  0.0%)   0.0025 (  0.0%)  ET
Forest Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0025 (  0.0%)  ET
Forest Construction
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0025 (  0.0%)  ET
Forest Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0021 (  0.0%)  Basic
CallGraph Construction
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0020 (  0.0%)  Tail
Duplication
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0020 (  0.0%) 
Dominance Frontier Construction
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0020 (  0.0%) 
Dominance Frontier Construction
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0015 (  0.0%) 
Conditional Propagation
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0012 (  0.0%)  Scalar
Evolution Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.0%)  Tail
Call Elimination
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0010 (  0.0%)  Rotate
Loops
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.0%) 
Conditional Propagation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.0%)  Unify
function exit nodes
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Dead
Type Elimination
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%) 
Simplify well-known library calls
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Dead
Argument Elimination
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Merge
Duplicate Global Constants
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%) 
Promote 'by reference' arguments to scalars
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Global
Variable Optimizer
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  No
Alias Analysis (always returns 'may' alias)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Scalar
Evolution Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Basic
Value Numbering (default GVN impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Lower
Set Jump
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Scalar
Evolution Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Raise
allocations from calls to instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Scalar
Evolution Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Scalar
Evolution Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%) 
Interprocedural constant propagation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Load
Value Numbering
  1554.6931 (100.0%)   0.7720 (100.0%)  1555.4652 (100.0%)  2394.6937 (100.0%) 
TOTAL
Comment 5 Anton Korobeynikov 2007-05-15 17:06:55 PDT

*** This bug has been marked as a duplicate of 930 ***
Comment 6 Chris Lattner 2007-05-16 01:22:02 PDT
DSE is a known issue, but this actually triggers a horrible thing in sdisel as well.  Reopening for that issue.
Comment 7 Chris Lattner 2007-05-16 01:26:11 PDT
Further, lauro, the code looks very strange.  Assuming the .c file doesn't contain hundreds of loads, can 
you file another bug with the .i file that contains iterative_me?
Comment 8 Chris Lattner 2007-05-16 01:39:15 PDT
Fixed, patch here:

This speeds up isel on this function from:

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  60.0536 ( 96.8%)   1.4269 ( 97.9%)  61.4805 ( 96.8%)  61.4842 ( 96.8%)  ARM Instruction Selection
   1.6086 (  2.5%)   0.0115 (  0.7%)   1.6202 (  2.5%)   1.6203 (  2.5%)  Linear Scan Register Allocator

to:

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   3.9081 ( 66.9%)   1.6220 ( 98.1%)   5.5301 ( 73.8%)   5.5876 ( 74.0%)  ARM Instruction Selection
   1.5635 ( 26.7%)   0.0119 (  0.7%)   1.5754 ( 21.0%)   1.5757 ( 20.8%)  Linear Scan Register Allocator

a speedup of over 11x, in a release build.

-Chris
Comment 9 Chris Lattner 2007-05-16 01:45:10 PDT
no really, patch here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070514/049659.html
Comment 10 Jakub Staszak 2009-07-05 20:18:29 PDT
It takes a lot of time to compile iterative_me.bc once again:

===-------------------------------------------------------------------------===
                          ... Statistics Collected ...
===-------------------------------------------------------------------------===

10 dse - Number of other instrs removed
 2 dse - Number of stores deleted

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 139.6793 seconds (142.3439 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  138.7904 ( 99.9%)   0.8715 ( 99.9%)  139.6620 ( 99.9%)  142.3265 ( 99.9%)  Dead Store Elimination
   0.0157 (  0.0%)   0.0003 (  0.0%)   0.0161 (  0.0%)   0.0161 (  0.0%)  Module Verifier
   0.0009 (  0.0%)   0.0001 (  0.0%)   0.0010 (  0.0%)   0.0011 (  0.0%)  Dominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Memory Dependence Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Preliminary module verification
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Basic Alias Analysis (default AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Data Layout
  138.8072 (100.0%)   0.8721 (100.0%)  139.6793 (100.0%)  142.3439 (100.0%)  TOTAL
Comment 11 Owen Anderson 2009-07-07 11:44:06 PDT
Shark says the culprit is MemoryDependenceAnalysis.cpp:208.
Comment 12 Chris Lattner 2009-12-22 20:05:13 PST
Still slow, but now GVN has joined the act:
  Total Execution Time: 96.7546 seconds (96.9454 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  47.9156 ( 49.6%)   0.0372 ( 33.0%)  47.9528 ( 49.6%)  48.1304 ( 49.6%)  Global Value Numbering
  47.7280 ( 49.4%)   0.0537 ( 47.7%)  47.7818 ( 49.4%)  47.7866 ( 49.3%)  Dead Store Elimination
   0.1606 (  0.2%)   0.0026 (  2.3%)   0.1632 (  0.2%)   0.1669 (  0.2%)  Combine redundant instructions
   0.1548 (  0.2%)   0.0029 (  2.6%)   0.1577 (  0.2%)   0.1600 (  0.2%)  Combine redundant instructions

Comment 13 Chris Lattner 2009-12-22 20:08:58 PST
Where does this testcase come from?  IT is just a huge series of:

  %tmp4110778.4030 = getelementptr %struct.SnowContext* %s, i32 0, i32 9, i32 4030 ; <i8*> [#uses=2]
  %tmp14809 = load i8* %tmp4110778.4030           ; <i8> [#uses=1]
  %tmp4110778.4031 = getelementptr %struct.SnowContext* %s, i32 0, i32 9, i32 4031 ; <i8*> [#uses=2]
  %tmp14810 = load i8* %tmp4110778.4031           ; <i8> [#uses=1]

with different offsets?  Please attach a .i file.  Otherwise I'll just close this as "insane".
Comment 14 Owen Anderson 2009-12-23 00:37:31 PST
From the name, it looks like it's the motion estimation function for the Snow video codec, which is part of FFmpeg.
Comment 15 Bill Wendling 2013-10-22 03:14:24 PDT
The bitcode file doesn't parse anymore.
Comment 16 Bill Wendling 2014-01-24 13:28:35 PST
Marking this as "fixed" again. I'll also upload a .ll file so that it can be reproduced if necessary.

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 0.6507 seconds (0.6710 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.1882 ( 29.7%)   0.0005 (  2.8%)   0.1887 ( 29.0%)   0.1887 ( 28.1%)  Dead Store Elimination
   0.1508 ( 23.8%)   0.0019 ( 10.8%)   0.1527 ( 23.5%)   0.1536 ( 22.9%)  Global Value Numbering
   0.0231 (  3.6%)   0.0015 (  8.8%)   0.0246 (  3.8%)   0.0307 (  4.6%)  Combine redundant instructions
   0.0283 (  4.5%)   0.0020 ( 11.4%)   0.0303 (  4.7%)   0.0303 (  4.5%)  Value Propagation
   0.0292 (  4.6%)   0.0006 (  3.3%)   0.0298 (  4.6%)   0.0298 (  4.4%)  Value Propagation
Comment 17 Bill Wendling 2014-01-24 13:30:10 PST
Created attachment 11932 [details]
Non-bitcode version of the problematic file. The others no longer parse with newer LLVMs.