Macroscopic Data Structure Analysis and Optimization
Chris Lattner, Ph.D. Thesis

Abstract:

Providing high performance for pointer-intensive programs on modern architectures is an increasingly difficult problem for compilers. Pointer-intensive programs are often bound by memory latency and cache performance, but traditional approaches to these problems usually fail: Pointer-intensive programs are often highly-irregular and the compiler has little control over the layout of heap allocated objects.

This thesis presents a new class of techniques named ``Macroscopic Data Structure Analyses and Optimizations'', which is a new approach to the problem of analyzing and optimizing pointer-intensive programs. Instead of analyzing individual load/store operations or structure definitions, this approach identifies, analyzes, and transforms entire memory structures as a unit. The foundation of the approach is an analysis named Data Structure Analysis and a transformation named Automatic Pool Allocation. Data Structure Analysis is a context-sensitive pointer analysis which identifies data structures on the heap and their important properties (such as type safety). Automatic Pool Allocation uses the results of Data Structure Analysis to segregate dynamically allocated objects on the heap, giving control over the layout of the data structure in memory to the compiler.

Based on these two foundation techniques, this thesis describes several performance improving optimizations for pointer-intensive programs. First, Automatic Pool Allocation itself provides important locality improvements for the program. Once the program is pool allocated, several pool-specific optimizations can be performed to reduce inter-object padding and pool overhead. Second, we describe an aggressive technique, Automatic Pointer Compression, which reduces the size of pointers on 64-bit targets to 32-bits or less, increasing effective cache capacity and memory bandwidth for pointer-intensive programs.

This thesis describes the approach, analysis, and transformation of programs with macroscopic techniques, and evaluates the net performance impact of the transformations. Finally, it describes a large class of potential applications for the work in fields such as heap safety and reliability, program understanding, distributed computing, and static garbage collection.

Published:

"Macroscopic Data Structure Analysis and Optimization", Chris Lattner.
Ph.D. Thesis, Computer Science Dept., University of Illinois at Urbana-Champaign, May 2005.

Download:

The "book form" is useful if you plan to print this out. Print the file out double sided, fold it in half, and staple it in the middle of the page. It dramatically reduces the number of pages of paper used.

BibTeX Entry:

  @PhdThesis{Lattner:PHD,
    author  = {Chris Lattner},
    title   = "{Macroscopic Data Structure Analysis and Optimization}",
    school  = "{Computer Science Dept., University of Illinois at Urbana-Champaign}",
    year    = {2005},
    address = {Urbana, IL},
    month   = {May},
    note    = {{\em See {\tt http://llvm.cs.uiuc.edu}.}}
  }

This thesis is also available from the UIUC Computer Science Department as tech report #UIUCDCS-R-2005-2536 or the UIUC Department of Engineering #UILU-ENG-2005-1728.