LLVM 3.2 Release Notes

The LLVM 3.2 distribution currently consists of production-quality code from the core LLVM repository, which roughly includes the LLVM optimizers, code generators and supporting tools, as well as Clang, DragonEgg and compiler-rt sub-project repositories. In addition to this code, the LLVM Project includes other sub-projects that are in development. Here we include updates on these sub-projects.

Clang: C/C++/Objective-C Frontend Toolkit

Clang is an LLVM front end for the C, C++, and Objective-C languages. Clang aims to provide a better user experience through expressive diagnostics, a high level of conformance to language standards, fast compilation, and low memory use. Like LLVM, Clang provides a modular, library-based architecture that makes it suitable for creating or integrating with other development tools.

In the LLVM 3.2 time-frame, the Clang team has made many improvements. Highlights include:

Improvements to Clang's diagnostics
Support for tls_model attribute
Type safety attributes

For more details about the changes to Clang since the 3.1 release, see the Clang 3.2 release notes.

If Clang rejects your code but another compiler accepts it, please take a look at the language compatibility guide to make sure this is not intentional or a known issue.

DragonEgg: GCC front-ends, LLVM back-end

DragonEgg is a gcc plugin that replaces GCC's optimizers and code generators with LLVM's. It works with gcc-4.5 and gcc-4.6 (and partially with gcc-4.7), can target the x86-32/x86-64 and ARM processor families, and has been successfully used on the Darwin, FreeBSD, KFreeBSD, Linux and OpenBSD platforms. It fully supports Ada, C, C++ and Fortran. It has partial support for Go, Java, Obj-C and Obj-C++.

The 3.2 release has the following notable changes:

Able to load LLVM plugins such as Polly.
Supports thread-local storage models.
Passes knowledge of variable lifetimes to the LLVM optimizers.
No longer requires GCC to be built with LTO support.

compiler-rt: Compiler Runtime Library

The LLVM compiler-rt project is a simple library that provides an implementation of the low-level target-specific hooks required by code generation and other runtime components. For example, when compiling for a 32-bit target, converting a double to a 64-bit unsigned integer is compiled into a runtime call to the __fixunsdfdi function. The compiler-rt library provides highly optimized implementations of this and other low-level routines (some are 3x faster than the equivalent libgcc routines).
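As a rough illustration of the conversion mentioned above, the following C function performs a double to 64-bit unsigned integer cast. The name double_to_u64 is made up for this example. Whether the cast actually becomes a call to __fixunsdfdi depends on the target and the compiler; on many 64-bit targets it is likely to be expanded inline instead.

```c
#include <stdint.h>

/* Hypothetical helper: on a 32-bit target without a native instruction
   for this conversion, the cast below is the kind of operation that may
   be lowered to a runtime call such as __fixunsdfdi. */
uint64_t double_to_u64(double x) {
    return (uint64_t)x;
}
```

Calling double_to_u64(4294967296.0) yields 4294967296, a value that does not fit in 32 bits, which is why a 32-bit target may need help from the runtime library.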

The 3.2 release has the following notable changes:

ThreadSanitizer (TSan) - data race detector run-time library for C/C++ has been added.
Improvements to AddressSanitizer including: better portability (OSX, Android NDK), support for cmake based builds, enhanced error reporting and lots of bug fixes.
Added support for A6 'Swift' CPU.
divsi3 function has been enhanced to take advantage of a hardware unsigned divide when it is available.
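The last item above concerns signed division built on top of an unsigned divide. The sketch below shows the general idea in portable C. It is not the compiler-rt source, the function name is hypothetical, and it assumes a two's complement target for the final conversion back to a signed value.

```c
#include <stdint.h>

/* Hypothetical sketch: signed 32-bit division expressed through one
   unsigned division, which a target with a hardware unsigned divide
   can execute directly. Behavior for b == 0 and for INT32_MIN / -1 is
   left undefined, as it is for the built-in operator. */
int32_t signed_div_via_unsigned(int32_t a, int32_t b) {
    /* Take magnitudes in unsigned arithmetic so INT32_MIN is handled. */
    uint32_t ua = a < 0 ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = b < 0 ? 0u - (uint32_t)b : (uint32_t)b;
    uint32_t q = ua / ub;                 /* the unsigned divide */
    int negative = (a < 0) != (b < 0);    /* signs differ */
    return negative ? (int32_t)(0u - q) : (int32_t)q;
}
```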

LLDB: Low Level Debugger

LLDB is a ground-up implementation of a command line debugger, as well as a debugger API that can be used from other applications. LLDB makes use of the Clang parser to provide high-fidelity expression parsing (particularly for C++) and uses the LLVM JIT for target support.

The 3.2 release has the following notable changes:

Linux build fixes for clang (see Building LLDB)
Some Linux stability and usability improvements
Switch expression evaluation to use MCJIT (from legacy JIT) on Linux

libc++: C++ Standard Library

Like compiler-rt, libc++ is now dual licensed under the MIT and UIUC licenses, allowing it to be used more permissively.

Within the LLVM 3.2 time-frame there were the following highlights:

C++11 shared_ptr atomic access API (20.7.2.5) has been implemented.
Applied noexcept and constexpr throughout library.
Improved C++11 conformance in associative container emplace.
Performance improvements in: std::rotate algorithm and I/O.
Operator new/delete and type_infos for exception types moved from libc++ to libc++abi.
Bug fixes in: <atomic>, vector<bool> algorithms, <future>, <tuple>, <type_traits>, <fstream>, <istream>, <iterator>, <condition_variable>, <complex>, as well as visibility fixes.

VMKit

The VMKit project is an implementation of a Java Virtual Machine (Java VM or JVM) that uses LLVM for static and just-in-time compilation.

The 3.2 release has the following notable changes:

Bug fixes only, no functional changes.

Polly: Polyhedral Optimizer

Polly is an experimental optimizer for data locality and parallelism. It currently provides high-level loop optimizations and automatic parallelization (using the OpenMP run time). Work in the area of automatic SIMD and accelerator code generation was started.

Within the LLVM 3.2 time-frame there were the following highlights:

isl, the integer set library used by Polly, was relicensed under the MIT license.
isl based code generation.
MIT licensed replacement for CLooG (LGPLv2).
Fine grained option handling (separation of core and border computations, control overhead vs. code size).
Support for FORTRAN and Dragonegg.
OpenMP code generation fixes.

Clang Static Analyzer

The Clang Static Analyzer is an advanced source code analysis tool integrated into Clang that performs a deep analysis of code to find potential bugs.

In the LLVM 3.2 release, the static analyzer has made significant improvements in many areas, with notable highlights such as:

Improved interprocedural analysis within a translation unit (see details below), which greatly amplified the analyzer's ability to find bugs.
New infrastructure to model "well-known" APIs, allowing the analyzer to do a much better job when modeling calls to such functions.
Significant improvements to the APIs to write static analyzer checkers, with a more unified way of representing function/method calls in the checker API. Details can be found in the Building a Checker in 24 hours talk.

The release specifically includes notable improvements for Objective-C analysis, including:

Interprocedural analysis for Objective-C methods.
Interprocedural analysis of calls to "blocks".
Precise modeling of GCD APIs such as dispatch_once and friends.
Improved support for recently added Objective-C constructs such as array and dictionary literals.

The release specifically includes notable improvements for C++ analysis, including:

Interprocedural analysis for C++ methods (within a translation unit).
More precise modeling of C++ initializers and destructors.

Finally, this release includes many small improvements to scan-build, which can be used to drive the analyzer from the command line or a continuous integration system. This includes a fix for a directory-traversal issue, which could cause potential security problems in some cases. We would like to acknowledge Tim Brown of Portcullis Computer Security Ltd for reporting this issue.

External Open Source Projects Using LLVM 3.2

An exciting aspect of LLVM is that it is used as an enabling technology for a lot of other language and tools projects. This section lists some of the projects that have already been updated to work with LLVM 3.2.

Crack

Crack aims to provide the ease of development of a scripting language with the performance of a compiled language. The language derives concepts from C++, Java and Python, incorporating object-oriented programming, operator overloading and strong typing.

EmbToolkit

EmbToolkit provides a Linux cross-compiler toolchain/SDK (GCC/binutils/C library (uclibc, eglibc, musl)), a build system for package cross-compilation and optionally various root file systems. It supports ARM and MIPS. There is an ongoing effort to provide a clang+llvm environment for the 3.2 releases.

FAUST

FAUST is a compiled language for real-time audio signal processing. The name FAUST stands for Functional AUdio STream. Its programming model combines two approaches: functional programming and block diagram composition. In addition to the C, C++, Java and JavaScript output formats, the Faust compiler can generate LLVM bitcode, and works with LLVM 2.7-3.2.

Glasgow Haskell Compiler (GHC)

GHC is an open source compiler and programming suite for Haskell, a lazy functional programming language. It includes an optimizing static compiler generating good code for a variety of platforms, together with an interactive system for convenient, quick development.

GHC 7.0 and onwards include an LLVM code generator, supporting LLVM 2.8 and later.

Julia

Julia is a high-level, high-performance dynamic language for technical computing. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. The compiler uses type inference to generate fast code without any type declarations, and uses LLVM's optimization passes and JIT compiler. The Julia Language is designed around multiple dispatch, giving programs a large degree of flexibility. It is ready for use on many kinds of problems.

LLVM D Compiler

LLVM D Compiler (LDC) is a compiler for the D programming Language. It is based on the DMD frontend and uses LLVM as backend.

Open Shading Language

Open Shading Language (OSL) is a small but rich language for programmable shading in advanced global illumination renderers and other applications, ideal for describing materials, lights, displacement, and pattern generation. It uses LLVM to JIT complex shader networks to x86 code at runtime.

OSL was developed by Sony Pictures Imageworks for use in its in-house renderer used for feature film animation and visual effects, and is distributed as open source software with the "New BSD" license. It has been used for all the shading on such films as The Amazing Spider-Man, Men in Black III, Hotel Transylvania, and many other films in progress, and has also been incorporated into several commercial and open source rendering products such as Blender, VRay, and Autodesk Beast.

Portable OpenCL (pocl)

In addition to producing an easily portable open source OpenCL implementation, another major goal of pocl is improving performance portability of OpenCL programs with compiler optimizations, reducing the need for target-dependent manual optimizations. An important part of pocl is a set of LLVM passes used to statically parallelize multiple work-items with the kernel compiler, even in the presence of work-group barriers. This enables static parallelization of the fine-grained static concurrency in the work groups in multiple ways (SIMD, VLIW, superscalar,...).

Pure

Pure is an algebraic/functional programming language based on term rewriting. Programs are collections of equations which are used to evaluate expressions in a symbolic fashion. The interpreter uses LLVM as a backend to JIT-compile Pure programs to fast native code. Pure offers dynamic typing, eager and lazy evaluation, lexical closures, a hygienic macro system (also based on term rewriting), built-in list and matrix support (including list and matrix comprehensions) and an easy-to-use interface to C and other programming languages (including the ability to load LLVM bitcode modules, and inline C, C++, Fortran and Faust code in Pure programs if the corresponding LLVM-enabled compilers are installed).

Pure version 0.56 has been tested and is known to work with LLVM 3.2 (and continues to work with older LLVM releases >= 2.5).

TTA-based Co-design Environment (TCE)

TCE is a toolset for designing application-specific processors (ASP) based on the Transport triggered architecture (TTA). The toolset provides a complete co-design flow from C/C++ programs down to synthesizable VHDL/Verilog and parallel program binaries. Processor customization points include the register files, function units, supported operations, and the interconnection network.

TCE uses Clang and LLVM for C/C++ language support, target independent optimizations and also for parts of code generation. It generates new LLVM-based code generators "on the fly" for the designed TTA processors and loads them into the compiler backend as runtime libraries to avoid per-target recompilation of larger parts of the compiler chain.

What's New in LLVM 3.2?

This release includes a huge number of bug fixes, performance tweaks and minor improvements. Some of the major improvements and new features are listed in this section.

Major New Features

LLVM 3.2 includes several major changes and big features:

Loop Vectorizer.
New implementation of SROA.
New NVPTX back-end (replacing existing PTX back-end) based on NVIDIA sources.

LLVM IR and Core Improvements

LLVM IR has several new features that better support new targets and expose new optimization opportunities:

Thread local variables may have a specified TLS model. See the Language Reference Manual.
The 'TYPE_CODE_FUNCTION_OLD' type code and the autoupgrade code for the old function attributes format have been removed.
Internal representation of the Attributes class has been converted into a pointer to an opaque object that's uniqued by and stored in the LLVMContext object. The Attributes class then becomes a thin wrapper around this opaque object.
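To make the TLS model item above more concrete, here is a minimal C sketch that requests a specific model through the GCC-style tls_model attribute, which Clang 3.2 also accepts. The variable and function names are hypothetical, and the example assumes a toolchain and target that support __thread and this attribute (typically an ELF platform).

```c
/* Hypothetical per-thread counter. The attribute asks the compiler to use
   the "initial-exec" TLS model for this variable instead of the default
   chosen from the compilation flags. */
static __thread int request_count __attribute__((tls_model("initial-exec")));

/* Increments the calling thread's counter and returns the new value. */
int bump_request_count(void) {
    return ++request_count;
}
```

In the generated IR we would expect the chosen model to appear on the thread-local global, as described in the Language Reference Manual.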

Optimizer Improvements

In addition to many minor performance tweaks and bug fixes, this release includes a few major enhancements and additions to the optimizers:

Loop Vectorizer - We've added a loop vectorizer and we are now able to vectorize small loops. The loop vectorizer is disabled by default and can be enabled using the -mllvm -vectorize-loops flag. The SIMD vector width can be specified using the flag -mllvm -force-vector-width=4. The default value is 0 which means auto-select.
We can now vectorize this function:

    unsigned sum_arrays(int *A, int *B, int start, int end) {
      unsigned sum = 0;
      for (int i = start; i < end; ++i)
        sum += A[i] + B[i] + i;

      return sum;
    }

We vectorize loops under the following conditions:

The innermost loops must have a single basic block.
The number of iterations is known before the loop starts to execute.
The loop counter needs to be incremented by one.
The loop trip count can be a variable.
Loops do not need to start at zero.
The induction variable can be used inside the loop.
Loop reductions are supported.
Arrays with affine access pattern do not need to be marked as 'noalias' and are checked at runtime.
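As a further sketch of a loop that appears to meet the conditions listed above, consider the function below. Its name and parameters are hypothetical. The loop body is a single basic block, the counter is incremented by one, the trip count is a variable, and the two pointers are not marked as non-aliasing, so we would expect the vectorizer to rely on a runtime overlap check when it is enabled with the -mllvm -vectorize-loops flag.

```c
/* Hypothetical example: writes in[i] + k to out[i] for i in [0, n). */
void add_constant(int *out, const int *in, int n, int k) {
    for (int i = 0; i < n; ++i)
        out[i] = in[i] + k;
}
```

Whether a given build actually vectorizes this loop depends on the target and the cost heuristics, so the result is best confirmed by inspecting the generated code.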

SROA - We’ve re-written SROA to be significantly more powerful and generate code which is much more friendly to the rest of the optimization pipeline. Previously this pass had scaling problems that required it to only operate on relatively small aggregates, and at times it would mistakenly replace a large aggregate with a single very large integer in order to make it a scalar SSA value. The result was a large number of i1024 and i2048 values representing any small stack buffer. These in turn slowed down many subsequent optimization paths.

The new SROA pass uses a different algorithm that allows it to only promote to scalars the pieces of the aggregate actively in use. Because of this it doesn’t require any thresholds. It also always deduces the scalar values from the uses of the aggregate rather than the specific LLVM type of the aggregate. These features combine both to optimize more code with the pass and to improve the compile time of many functions dramatically.
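The kind of code this description has in mind can be sketched in C as follows. The struct and function are hypothetical. Only two small fields of a large stack aggregate are used, so a pass that promotes just the pieces in use can plausibly turn s.x and s.y into scalar values without treating the whole aggregate as one very large integer.

```c
/* Hypothetical aggregate: two small fields plus a large unused buffer. */
struct Sample {
    int x;
    int y;
    char scratch[1024]; /* never read or written in sum_fields */
};

/* Only s.x and s.y are live; the rest of the aggregate is untouched. */
int sum_fields(int a, int b) {
    struct Sample s;
    s.x = a;
    s.y = b;
    return s.x + s.y;
}
```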

Branch weight metadata is preserved through more of the optimizer.

MC Level Improvements

The LLVM Machine Code (aka MC) subsystem was created to solve a number of problems in the realm of assembly, disassembly, object file format handling, and a number of other related areas that CPU instruction-set level tools work in. For more information, please see the Intro to the LLVM MC Project Blog Post.

Added support for the following assembler directives: .ifb, .ifnb, .ifc, .ifnc, .purgem, .rept and .version (ELF), as well as the Darwin-specific .pushsection, .popsection and .previous.
Enhanced handling of .lcomm directive.
MS style inline assembler: added implementation of the offset and TYPE operators.
Targets can specify minimum supported NOP size for NOP padding.
ELF improvements: added support for generating ELF objects on Windows.
MachO improvements: symbol-difference variables are marked as N_ABS, added direct-to-object attribute for data-in-code markers.
Added support for annotated disassembly output for x86 and arm targets.
ARM support has been improved by adding support for the ARM TARGET2 relocation and fixing handling of ARM-style "$d.*" labels.
Implemented local-exec TLS on PowerPC.

Target Independent Code Generator Improvements

Stack Coloring - We have implemented a new optimization pass to merge stack objects which are used in disjoint areas of the code. This optimization reduces the required stack space significantly, in cases where it is clear to the optimizer that the stack slot is not shared. We use the lifetime markers to tell the codegen that a certain alloca is used within a region.
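A minimal C sketch of a situation where stack coloring could apply is shown below. The function names are hypothetical. The two buffers live in different branches, so their lifetimes never overlap and a single stack slot could in principle serve both. Whether the slots are actually merged depends on the optimization level and on the lifetime markers the front end emits.

```c
#include <string.h>

/* Adds up the first n bytes of buf. */
static int sum_bytes(const unsigned char *buf, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += buf[i];
    return total;
}

/* Hypothetical example: `first` and `second` are never live at the same
   time, so they are candidates for sharing one 256-byte stack slot. */
int fill_and_sum(int use_first) {
    if (use_first) {
        unsigned char first[256];
        memset(first, 1, sizeof first);
        return sum_bytes(first, 256);
    } else {
        unsigned char second[256];
        memset(second, 2, sizeof second);
        return sum_bytes(second, 256);
    }
}
```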

We now merge consecutive loads and stores.
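For instance, a sequence of adjacent byte stores like the one in this hypothetical function is a natural candidate. On a target that tolerates the resulting access width and alignment, the four stores might be combined into a single 32-bit store, although this sketch makes no claim about what any particular target does.

```c
/* Hypothetical example: four stores to consecutive addresses. */
void clear4(unsigned char *p) {
    p[0] = 0;
    p[1] = 0;
    p[2] = 0;
    p[3] = 0;
}
```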

We have put a significant amount of work into the code generator infrastructure, which allows us to implement more aggressive algorithms and make it run faster:

We added new TableGen infrastructure to support bundling for Very Long Instruction Word (VLIW) architectures. TableGen can now automatically generate a deterministic finite automaton from a VLIW target's schedule description which can be queried to determine legal groupings of instructions in a bundle.

We have added a new target independent VLIW packetizer based on the DFA infrastructure to group machine instructions into bundles.

We have added new TableGen infrastructure to support relationship maps between instructions. This feature enables TableGen to automatically construct a set of relation tables and query functions that can be used to switch between various forms of instructions. For more information, please refer to How To Use Instruction Mappings.
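The DFA-based bundling described above can be pictured with a toy model in C. This is not the TableGen-generated automaton or any LLVM API. It assumes a made-up machine with three functional units where each instruction needs exactly one specific unit, and every name in it is hypothetical. The automaton state is the set of units already claimed in the current bundle, and a transition is legal only if the requested unit is still free.

```c
#include <stdint.h>

/* Toy model: each bit stands for one functional unit of a made-up machine. */
enum { UNIT_ALU0 = 1, UNIT_ALU1 = 2, UNIT_MEM = 4 };

/* Automaton state: the units already claimed in the current bundle. */
typedef uint32_t bundle_state;

/* Returns 1 and updates *state if `unit` is still free in the bundle,
   or returns 0 and leaves *state unchanged if the unit is taken. */
int try_add_to_bundle(bundle_state *state, uint32_t unit) {
    if (*state & unit)
        return 0;
    *state |= unit;
    return 1;
}
```

A packetizer built on such a query would start a new bundle, resetting the state to 0, whenever try_add_to_bundle reports that the next instruction does not fit.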

Basic Block Placement

A probability based block placement and code layout algorithm was added to LLVM's code generator. This layout pass supports probabilities derived from static heuristics as well as source code annotations such as __builtin_expect.
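A source annotation of the kind mentioned above might look like the following C sketch. The macro and function names are hypothetical; __builtin_expect itself is the GCC and Clang builtin. The annotation tells the compiler that the error branch is rarely taken, which a probability-based layout pass can use to keep the common path contiguous.

```c
/* Hypothetical helper macro wrapping the builtin. */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Divides a by b and stores the quotient in *out. Returns 0 on success
   and -1 when b is zero, which the annotation marks as the rare case. */
int checked_div(int a, int b, int *out) {
    if (UNLIKELY(b == 0))
        return -1;
    *out = a / b;
    return 0;
}
```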

X86-32 and X86-64 Target Improvements

New features and major changes in the X86 target include:

Small codegen optimizations, especially for AVX2.

ARM Target Improvements

New features of the ARM target include:

Support and performance tuning for the A6 'Swift' CPU.

ARM Integrated Assembler

The ARM target now includes a full featured macro assembler, including direct-to-object module support for clang. The assembler is currently enabled by default for Darwin only pending testing and any additional necessary platform specific support for Linux.

Full support is included for Thumb1, Thumb2 and ARM modes, along with sub-target and CPU specific extensions for VFP2, VFP3 and NEON.

The assembler is Unified Syntax only (see ARM Architectural Reference Manual for details). While there is some, and growing, support for pre-unified (divided) syntax, there are still significant gaps in that support.

MIPS Target Improvements

New features and major changes in the MIPS target include:

Integrated assembler support: MIPS32 works for both PIC and static code. A known limitation is PR14456, where the R_MIPS_GPREL16 relocation is generated with the wrong addend. MIPS64 support is incomplete; for example, exception handling is not working.
Support for fast calling convention has been added.
Support for Android MIPS toolchain has been added to clang driver.
Added clang driver support for MIPS N32 ABI through "-mabi=n32" option.
MIPS32 and MIPS64 disassembler has been implemented.
Support for compiling programs with large GOTs (exceeding 64kB in size) has been added through llc option "-mxgot".
Added experimental support for MIPS32 DSP intrinsics.
Experimental support for MIPS16 with the following limitations: only soft float is supported, C++ exceptions are not supported, large stack frames (> 32000 bytes) are not supported, and direct object code emission is not supported (only .s output).
Standalone assembler (llvm-mc): implementation is in progress and considered experimental.
All classic JIT and MCJIT tests pass on Little and Big Endian MIPS32 platforms.
Inline asm support: all common constraints and operand modifiers have been implemented.
Added tail call optimization support; use the llc option "-enable-mips-tail-calls" or the clang options "-mllvm -enable-mips-tail-calls" to enable it.
Improved register allocation by removing registers $fp, $gp, $ra and $at from the list of reserved registers.
Long branch expansion pass has been implemented, which expands branch instructions with offsets that do not fit in the 16-bit field.
Cavium Octeon II board is used for testing builds (llvm-mips-linux builder).

PowerPC Target Improvements

Many fixes and changes across LLVM (and Clang) for better compliance with the 64-bit PowerPC ELF Application Binary Interface, interoperability with GCC, and overall 64-bit PowerPC support. Some highlights include:

MCJIT support added.
PPC64 relocation support and (small code model) TOC handling added.
Parameter passing and return value fixes (alignment issues, padding, varargs support, proper register usage, odd-sized structure support, float support, extension of return values for i32 return values).
Fixes in spill and reload code for vector registers.
C++ exception handling enabled.
Changes to remediate double-rounding compatibility issues with respect to GCC behavior.
Refactoring to disentangle ppc64-elf-linux ABI from Darwin ppc64 ABI support.
Assorted new test cases and test case fixes (endian and word size issues).
Fixes for big-endian codegen bugs, instruction encodings, and instruction constraints.
Implemented -integrated-as support.
Additional support for Altivec compare operations.
IBM long double support.

There have also been code generation improvements for both 32- and 64-bit code. Instruction scheduling support for the Freescale e500mc and e5500 cores has been added.

PTX/NVPTX Target Improvements

The PTX back-end has been replaced by the NVPTX back-end, which is based on the LLVM back-end used by NVIDIA in their CUDA (nvcc) and OpenCL compiler. Some highlights include:

Compatibility with PTX 3.1 and SM 3.5
Support for NVVM intrinsics as defined in the NVIDIA Compiler SDK
Full compatibility with old PTX back-end, with much greater coverage of LLVM IR

Please submit any back-end bugs to the LLVM Bugzilla site.

Other Target Specific Improvements

Added support for custom names for library functions in TargetLibraryInfo.

Major Changes and Removed Features

If you're already an LLVM user or developer with out-of-tree changes based on LLVM 3.1, this section lists some "gotchas" that you may run into upgrading from the previous release.

llvm-ld and llvm-stub have been removed. llvm-ld functionality can be partially replaced by llvm-link | opt | {llc | as, llc -filetype=obj} | ld, or fully replaced by Clang.
MCJIT: added support for inline assembly (requires asm parser), added faux remote target execution to lli option '-remote-mcjit'.

Internal API Changes

In addition, many APIs have changed in this release. Some of the major LLVM API changes are:

We've added a new interface for allowing IR-level passes to access target-specific information. A new IR-level pass, called "TargetTransformInfo" provides a number of low-level interfaces. LSR and LowerInvoke already use the new interface.

The TargetData structure has been renamed to DataLayout and moved to VMCore to remove a dependency on Target.

Tools Changes

In addition, some tools have changed in this release. Some of the changes are:

opt: added support for '-mtriple' option.
llvm-mc: added '-disassemble' support for '-show-inst' and '-show-encoding' options, added '-edis' option to produce annotated disassembly output for X86 and ARM targets.
libprofile: allows the profile data file name to be specified by the LLVMPROF_OUTPUT environment variable.
llvm-objdump: has been changed to display available targets, '-arch' option accepts x86 and x86-64 as valid arch names.
llc and opt: added FMA formation from pairs of FADD + FMUL or FSUB + FMUL enabled by option '-enable-excess-fp-precision' or option '-enable-unsafe-fp-math', option '-fp-contract' controls the creation by optimizations of fused FP by selecting Fast, Standard, or Strict mode.
llc: object file output from llc is no longer considered experimental.
gold plugin: handles Position Independent Executables.
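The FMA formation item above concerns expressions of the following shape. The function name is hypothetical. When fusion is permitted by the options listed above and the target has a fused multiply-add instruction, the multiply and add in this function may be emitted as one operation with a single rounding step, which is why the behavior is gated behind floating-point precision options.

```c
/* Hypothetical example: a multiply feeding an add, the pattern that
   FMA formation looks for. */
double mul_add(double a, double b, double c) {
    return a * b + c;
}
```

For inputs whose product and sum are exactly representable, such as mul_add(2.0, 3.0, 1.0), the fused and unfused forms give the same result; differences can appear only when the intermediate product needs rounding.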

Known Problems

LLVM is generally a production quality compiler, and is used by a broad range of applications and shipping in many products. That said, not every subsystem is as mature as the aggregate, particularly the more obscure targets. If you run into a problem, please check the LLVM bug database and submit a bug if there isn't already one or ask on the LLVMdev list.

Known problem areas include:

The CellSPU, MSP430, and XCore backends are experimental, and the CellSPU backend will be removed in LLVM 3.3.
The integrated assembler, disassembler, and JIT are not supported by several targets. If an integrated assembler is not supported, then a system assembler is required. For more details, see the Target Features Matrix.

Additional Information

A wide variety of additional information is available on the LLVM web page, in particular in the documentation section. The web page also contains versions of the API documentation that are up to date with the Subversion version of the source code. You can access versions of these documents specific to this release by going into the "llvm/doc/" directory in the LLVM tree.

If you have any questions or comments about LLVM, please feel free to contact us via the mailing lists.

LLVM Compiler Infrastructure
Last modified: $Date: 2012-12-19 04:50:28 -0600 (Wed, 19 Dec 2012) $