LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Bug 16365 - Extremely slow compilation in -O1 and -O2
Status: RESOLVED FIXED
Alias: None
Product: clang
Classification: Unclassified
Component: C++
Version: 3.2
Hardware: Macintosh All
Importance: P normal
Assignee: Andrew Trick
 
Reported: 2013-06-18 17:42 PDT by Fons Rademakers
Modified: 2014-04-07 16:31 PDT

Attachments
source file showing problem. (770.86 KB, application/octet-stream)
2013-06-18 17:42 PDT, Fons Rademakers

Description Fons Rademakers 2013-06-18 17:42:13 PDT
Created attachment 10703
source file showing problem.

Very slow compilation of large but very simple source file.

$ time clang++ -O2 -c biggraph.C

real	1m21.025s
user	1m19.286s
sys	0m1.734s

$ time g++-4 -O2 -c biggraph.C   [ g++ 4.8.1]

real	0m9.375s
user	0m9.191s
sys	0m0.176s
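
To put the two timings above side by side, here is a minimal Python sketch that parses the `real` line of shell `time` output and computes the ratio. The helper name `parse_real_seconds` is made up for this illustration; the two input strings are copied from the report. For these numbers the ratio comes out at roughly 8.6x.

```python
import re


def parse_real_seconds(time_output: str) -> float:
    """Return the 'real' (wall-clock) time in seconds from shell `time` output."""
    match = re.search(r"^real\s+(\d+)m([\d.]+)s", time_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'real' line found in time output")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Timings copied from the description above.
clang_output = "real\t1m21.025s\nuser\t1m19.286s\nsys\t0m1.734s\n"
gcc_output = "real\t0m9.375s\nuser\t0m9.191s\nsys\t0m0.176s\n"

clang_real = parse_real_seconds(clang_output)
gcc_real = parse_real_seconds(gcc_output)
slowdown = clang_real / gcc_real

print(f"clang++: {clang_real:.3f}s, g++: {gcc_real:.3f}s, ratio: {slowdown:.1f}x")
```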
Comment 1 Fons Rademakers 2013-06-18 17:51:06 PDT
The -ftime-report gives:

$ time clang++ -ftime-report -O1 -c biggraph.C
===-------------------------------------------------------------------------===
                              Register Allocation
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0052 seconds (0.0052 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0026 ( 56.2%)   0.0005 ( 74.2%)   0.0030 ( 58.4%)   0.0030 ( 58.4%)  Local Splitting
   0.0019 ( 41.7%)   0.0001 ( 22.2%)   0.0021 ( 39.4%)   0.0021 ( 39.4%)  Seed Live Regs
   0.0001 (  1.9%)   0.0000 (  1.5%)   0.0001 (  1.8%)   0.0001 (  1.8%)  Spiller
   0.0000 (  0.2%)   0.0000 (  2.1%)   0.0000 (  0.4%)   0.0000 (  0.3%)  Evict
   0.0046 (100.0%)   0.0006 (100.0%)   0.0052 (100.0%)   0.0052 (100.0%)  Total

===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 83.6459 seconds (83.6467 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  80.1686 ( 97.8%)   1.6332 ( 96.8%)  81.8019 ( 97.8%)  81.8027 ( 97.8%)  Instruction Scheduling
   1.3140 (  1.6%)   0.0162 (  1.0%)   1.3302 (  1.6%)   1.3301 (  1.6%)  Instruction Creation
   0.1647 (  0.2%)   0.0053 (  0.3%)   0.1700 (  0.2%)   0.1700 (  0.2%)  DAG Legalization
   0.1666 (  0.2%)   0.0024 (  0.1%)   0.1689 (  0.2%)   0.1689 (  0.2%)  Instruction Selection
   0.0629 (  0.1%)   0.0208 (  1.2%)   0.0837 (  0.1%)   0.0837 (  0.1%)  Vector Legalization
   0.0351 (  0.0%)   0.0019 (  0.1%)   0.0370 (  0.0%)   0.0370 (  0.0%)  DAG Combining 2
   0.0298 (  0.0%)   0.0002 (  0.0%)   0.0300 (  0.0%)   0.0300 (  0.0%)  Type Legalization
   0.0150 (  0.0%)   0.0018 (  0.1%)   0.0168 (  0.0%)   0.0168 (  0.0%)  DAG Combining 1
   0.0023 (  0.0%)   0.0051 (  0.3%)   0.0074 (  0.0%)   0.0074 (  0.0%)  Instruction Scheduling Cleanup
  81.9589 (100.0%)   1.6870 (100.0%)  83.6459 (100.0%)  83.6467 (100.0%)  Total

===-------------------------------------------------------------------------===
                                 DWARF Emission
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0020 seconds (0.0020 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0016 ( 83.6%)   0.0000 ( 72.7%)   0.0017 ( 83.5%)   0.0017 ( 83.4%)  DWARF Exception Writer
   0.0003 ( 16.4%)   0.0000 ( 27.3%)   0.0003 ( 16.5%)   0.0003 ( 16.6%)  DWARF Debug Writer
   0.0020 (100.0%)   0.0000 (100.0%)   0.0020 (100.0%)   0.0020 (100.0%)  Total

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 84.5942 seconds (84.5949 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  82.0843 ( 99.1%)   1.7079 ( 98.6%)  83.7922 ( 99.1%)  83.7930 ( 99.1%)  X86 DAG->DAG Instruction Selection
   0.3889 (  0.5%)   0.0018 (  0.1%)   0.3906 (  0.5%)   0.3906 (  0.5%)  Greedy Register Allocator
   0.0540 (  0.1%)   0.0023 (  0.1%)   0.0563 (  0.1%)   0.0562 (  0.1%)  Live Variable Analysis
   0.0510 (  0.1%)   0.0004 (  0.0%)   0.0514 (  0.1%)   0.0513 (  0.1%)  Machine Common Subexpression Elimination
   0.0442 (  0.1%)   0.0017 (  0.1%)   0.0459 (  0.1%)   0.0459 (  0.1%)  X86 AT&T-Style Assembly Printer
   0.0176 (  0.0%)   0.0078 (  0.4%)   0.0254 (  0.0%)   0.0254 (  0.0%)  Machine Function Analysis
   0.0194 (  0.0%)   0.0005 (  0.0%)   0.0199 (  0.0%)   0.0199 (  0.0%)  Simple Register Coalescing
   0.0129 (  0.0%)   0.0039 (  0.2%)   0.0167 (  0.0%)   0.0167 (  0.0%)  Live Interval Analysis
   0.0147 (  0.0%)   0.0000 (  0.0%)   0.0147 (  0.0%)   0.0147 (  0.0%)  Virtual Register Rewriter
   0.0117 (  0.0%)   0.0010 (  0.1%)   0.0128 (  0.0%)   0.0128 (  0.0%)  Two-Address instruction pass
   0.0116 (  0.0%)   0.0007 (  0.0%)   0.0124 (  0.0%)   0.0124 (  0.0%)  Peephole Optimizations
   0.0099 (  0.0%)   0.0022 (  0.1%)   0.0121 (  0.0%)   0.0121 (  0.0%)  Prologue/Epilogue Insertion & Frame Finalization
   0.0114 (  0.0%)   0.0000 (  0.0%)   0.0114 (  0.0%)   0.0114 (  0.0%)  Machine Copy Propagation Pass
   0.0113 (  0.0%)   0.0000 (  0.0%)   0.0113 (  0.0%)   0.0113 (  0.0%)  Combine redundant instructions
   0.0108 (  0.0%)   0.0001 (  0.0%)   0.0109 (  0.0%)   0.0109 (  0.0%)  Combine redundant instructions
   0.0106 (  0.0%)   0.0003 (  0.0%)   0.0109 (  0.0%)   0.0109 (  0.0%)  Combine redundant instructions
   0.0105 (  0.0%)   0.0000 (  0.0%)   0.0105 (  0.0%)   0.0105 (  0.0%)  Combine redundant instructions
   0.0104 (  0.0%)   0.0001 (  0.0%)   0.0105 (  0.0%)   0.0105 (  0.0%)  Calculate spill weights
   0.0103 (  0.0%)   0.0001 (  0.0%)   0.0103 (  0.0%)   0.0103 (  0.0%)  Combine redundant instructions
   0.0076 (  0.0%)   0.0000 (  0.0%)   0.0076 (  0.0%)   0.0076 (  0.0%)  Remove dead machine instructions
   0.0064 (  0.0%)   0.0010 (  0.1%)   0.0074 (  0.0%)   0.0074 (  0.0%)  Slot index numbering
   0.0058 (  0.0%)   0.0000 (  0.0%)   0.0059 (  0.0%)   0.0059 (  0.0%)  Slot index numbering
   0.0048 (  0.0%)   0.0001 (  0.0%)   0.0048 (  0.0%)   0.0048 (  0.0%)  Scalar Replacement of Aggregates (DT)
   0.0037 (  0.0%)   0.0001 (  0.0%)   0.0038 (  0.0%)   0.0038 (  0.0%)  Dead Store Elimination
   0.0036 (  0.0%)   0.0000 (  0.0%)   0.0036 (  0.0%)   0.0036 (  0.0%)  Post-RA pseudo instruction expansion pass
   0.0035 (  0.0%)   0.0000 (  0.0%)   0.0036 (  0.0%)   0.0036 (  0.0%)  Early CSE
   0.0033 (  0.0%)   0.0000 (  0.0%)   0.0033 (  0.0%)   0.0033 (  0.0%)  Optimize for code generation
   0.0032 (  0.0%)   0.0000 (  0.0%)   0.0032 (  0.0%)   0.0032 (  0.0%)  Early CSE
   0.0031 (  0.0%)   0.0000 (  0.0%)   0.0031 (  0.0%)   0.0031 (  0.0%)  Execution dependency fix
   0.0023 (  0.0%)   0.0002 (  0.0%)   0.0024 (  0.0%)   0.0024 (  0.0%)  Basic CallGraph Construction
   0.0021 (  0.0%)   0.0000 (  0.0%)   0.0021 (  0.0%)   0.0021 (  0.0%)  Sparse Conditional Constant Propagation
   0.0019 (  0.0%)   0.0000 (  0.0%)   0.0019 (  0.0%)   0.0019 (  0.0%)  Aggressive Dead Code Elimination
   0.0017 (  0.0%)   0.0000 (  0.0%)   0.0017 (  0.0%)   0.0017 (  0.0%)  Simplify the CFG
   0.0016 (  0.0%)   0.0000 (  0.0%)   0.0017 (  0.0%)   0.0017 (  0.0%)  Debug Variable Analysis
   0.0016 (  0.0%)   0.0000 (  0.0%)   0.0016 (  0.0%)   0.0016 (  0.0%)  X86 FP Stackifier
   0.0014 (  0.0%)   0.0000 (  0.0%)   0.0014 (  0.0%)   0.0014 (  0.0%)  Reassociate expressions
   0.0012 (  0.0%)   0.0000 (  0.0%)   0.0012 (  0.0%)   0.0012 (  0.0%)  Interprocedural Sparse Conditional Constant Propagation
   0.0009 (  0.0%)   0.0000 (  0.0%)   0.0009 (  0.0%)   0.0009 (  0.0%)  Tail Call Elimination
   0.0009 (  0.0%)   0.0000 (  0.0%)   0.0009 (  0.0%)   0.0009 (  0.0%)  Process Implicit Definitions
   0.0009 (  0.0%)   0.0000 (  0.0%)   0.0009 (  0.0%)   0.0009 (  0.0%)  Expand ISel Pseudo-instructions
   0.0008 (  0.0%)   0.0000 (  0.0%)   0.0008 (  0.0%)   0.0008 (  0.0%)  MemCpy Optimization
   0.0006 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Remove unused exception handling info
   0.0005 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Simplify well-known library calls
   0.0004 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)   0.0004 (  0.0%)  Simplify the CFG
   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)  Virtual Register Map
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)  Simplify the CFG
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)  Value Propagation
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)  Lower 'expect' Intrinsics
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)  Insert stack protectors
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)  Inliner for always_inline functions
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)  Simplify the CFG
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)  Simplify the CFG
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)  Global Variable Optimizer
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)  Value Propagation
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Live Register Matrix
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  X86 Maximal Stack Alignment Check
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Deduce function attributes
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Control Flow Optimizer
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine code sinking
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  MachineDominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Natural Loop Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Natural Loop Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Loop Invariant Code Motion
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Dead Argument Elimination
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Natural Loop Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Remove unreachable machine basic blocks
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Dominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Exception handling preparation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Eliminate PHI nodes for register allocation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  MachineDominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Block Frequency Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Tail Duplication
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Post RA top-down list latency scheduler
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Branch Probability Basic Block Placement
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  MachineDominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Dominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Branch Probability Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Dominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Local Stack Slot Allocation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Dominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Jump Threading
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Strip Unused Function Prototypes
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Scalar Evolution Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Dominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Bundle Machine CFG Edges
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Stack Slot Coloring
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Dominator Tree Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Memory Dependence Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Natural Loop Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Natural Loop Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Jump Threading
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Spill Code Placement Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Merge disjoint stack slots
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Tail Duplication
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Loop Invariant Code Motion
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Lower Garbage Collection Instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Analyze Machine Code For Garbage Collection
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Scalar Evolution Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Remove unreachable blocks from the CFG
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Optimize machine instruction PHIs
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Live Stack Slot Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Bundle Machine CFG Edges
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Scalar Replacement of Aggregates (SSAUp)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Lazy Value Information Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Create Garbage Collector Module Metadata
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Memory Dependence Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Delete Garbage Collector Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Pass Configuration
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Branch Probability Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Lazy Value Information Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Basic Alias Analysis (stateless AA impl)
  82.8616 (100.0%)   1.7326 (100.0%)  84.5942 (100.0%)  84.5949 (100.0%)  Total

===-------------------------------------------------------------------------===
                         Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  83.1012 ( 50.0%)   1.7594 ( 50.2%)  84.8606 ( 50.0%)  84.8660 ( 50.0%)  Clang front-end timer
  82.8838 ( 49.9%)   1.7383 ( 49.6%)  84.6221 ( 49.9%)  84.6275 ( 49.9%)  Code Generation Time
   0.0640 (  0.0%)   0.0064 (  0.2%)   0.0704 (  0.0%)   0.0704 (  0.0%)  LLVM IR Generation Time
  166.0490 (100.0%)   3.5041 (100.0%)  169.5531 (100.0%)  169.5639 (100.0%)  Total


real	1m24.886s
user	1m23.106s
sys	0m1.766s
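
With a report this long it can help to rank the rows mechanically. The following is a rough Python sketch, not an LLVM tool: it assumes each data row has exactly four `seconds (percent%)` columns followed by the timer name, as in the output above, and the names `ROW` and `parse_rows` are hypothetical. The sample rows are copied from two of the timer groups in this comment.

```python
import re

# One "seconds ( percent%)" column as printed by -ftime-report above.
ROW = re.compile(r"(\d+\.\d+) \(\s*\d+\.\d+%\)")


def parse_rows(report: str):
    """Return (timer name, wall-clock seconds) for each data row, skipping totals."""
    rows = []
    for line in report.splitlines():
        columns = list(ROW.finditer(line))
        if len(columns) != 4:
            continue  # header, separator, or blank line
        # The name is everything after the fourth column; it may contain parentheses.
        name = line[columns[-1].end():].strip()
        if name == "Total":
            continue
        rows.append((name, float(columns[3].group(1))))
    return rows


sample = """\
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  80.1686 ( 97.8%)   1.6332 ( 96.8%)  81.8019 ( 97.8%)  81.8027 ( 97.8%)  Instruction Scheduling
   1.3140 (  1.6%)   0.0162 (  1.0%)   1.3302 (  1.6%)   1.3301 (  1.6%)  Instruction Creation
   0.0048 (  0.0%)   0.0001 (  0.0%)   0.0048 (  0.0%)   0.0048 (  0.0%)  Scalar Replacement of Aggregates (DT)
  81.9589 (100.0%)   1.6870 (100.0%)  83.6459 (100.0%)  83.6467 (100.0%)  Total
"""

rows = parse_rows(sample)
ranked = sorted(rows, key=lambda row: row[1], reverse=True)
for name, wall in ranked:
    print(f"{wall:10.4f}  {name}")
```

On the rows shown here, "Instruction Scheduling" sorts first, which matches what the report already makes visible: almost all of the time is spent in the SelectionDAG scheduler.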
Comment 2 Eric Christopher 2013-06-18 19:18:06 PDT
Was just putting that in...
Comment 3 Andrew Trick 2013-06-18 19:49:00 PDT
I'm assuming -pre-RA-sched=source has the same problem.

I'd like to replace the SD scheduler pass completely with an SD serialization pass. That won't happen for at least another month. But when it does happen I'll be able to close this.
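
For anyone wanting to check the `-pre-RA-sched=source` assumption locally, here is a small Python sketch that builds the two command lines to compare. It assumes the option is forwarded to LLVM through clang's `-mllvm` flag and that the local build accepts it; `build_command` and `time_command` are hypothetical helper names, and nothing is executed unless `time_command` is called.

```python
import shlex
import subprocess
import time


def build_command(source, scheduler=None, compiler="clang++", opt="-O2"):
    """Build the compile command, optionally selecting a pre-RA scheduler."""
    cmd = [compiler, opt]
    if scheduler is not None:
        # Assumption: -mllvm passes the following option through to LLVM.
        cmd += ["-mllvm", f"-pre-RA-sched={scheduler}"]
    cmd += ["-c", source]
    return cmd


def time_command(cmd):
    """Run cmd and return elapsed wall-clock seconds (requires the compiler)."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


# Print the two command lines to compare; call time_command() on each to measure.
for scheduler in (None, "source"):
    print(shlex.join(build_command("biggraph.C", scheduler)))
```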
Comment 4 Eric Christopher 2013-06-18 19:59:37 PDT
Appears so and I'm good with that solution.

How about you take this and close then?
Comment 5 Axel Naumann 2014-04-04 12:04:45 PDT
Dear Andrew,

We are getting more and more reports of this. Do you have an updated estimate? Your first one ("at least another month") was already correct ;-) Or did you replace the pass and that didn't help?

Cheers, Axel.
Comment 6 Andrew Trick 2014-04-04 20:49:17 PDT
Thanks for pinging me on this! I have not been able to work on replacing the SD scheduler pass, so I should have committed a quick fix earlier. I'm waiting on benchmark results, but I should have a quick workaround checked in by Monday at the latest.
Comment 7 Andrew Trick 2014-04-07 16:31:49 PDT
Fixed in r205738. I added a workaround to ClusterNeighboringLoads.

$ time /b/fix/RA/bin/clang++ -O2 -c biggraph.C

real	0m6.143s
user	0m6.016s
sys	0m0.120s