The LLVM Compiler Infrastructure
2019 European LLVM Developers Meeting
About

The meeting serves as a forum for LLVM, Clang, LLDB and other LLVM project developers and users to get acquainted, learn how LLVM is used, and exchange ideas about LLVM and its (potential) applications.

The conference includes:

Keynote
MLIR: Multi-Level Intermediate Representation for Compiler Infrastructure [ Video ] [ Slides ]
Tatiana Shpeisman (Google), Chris Lattner (Google)

This talk will give an overview of the Multi-Level Intermediate Representation (MLIR), a new language-agnostic intermediate representation designed to be unified, flexible, and extensible, and to serve as the base of a compiler infrastructure. MLIR shares similarities with traditional CFG-based three-address SSA representations (including LLVM IR and SIL), but it also introduces notions from the polyhedral domain as first-class concepts. Dialects are a core concept of MLIR's extensibility, allowing multiple levels of abstraction to coexist in a single representation. MLIR supports continuous lowering from dataflow graphs to high-performance, target-specific code through partial specialization between dialects. In this talk we will illustrate how MLIR can be used to build an optimizing compiler infrastructure for deep learning applications.

MLIR supports multiple front- and back-ends and uses LLVM IR as one of its primary code generation targets. MLIR also relies heavily on design principles and practices developed by the LLVM community. For example, it depends on LLVM APIs and programming idioms to minimize IR size and maximize optimization efficiency. MLIR uses LLVM utilities such as FileCheck, to ensure robust functionality at every level of the compilation stack, and TableGen, to express IR invariants; it also leverages LLVM infrastructure such as dominance analysis to avoid implementing all the necessary compiler functionality from scratch. At the same time, it is a brand-new IR, both more restrictive and more general than LLVM IR in different aspects of its design. We believe that the LLVM community will find in MLIR a useful tool for developing new compilers, especially in machine learning and other high-performance domains.

Technical talks
Switching a Linux distribution's main toolchains to LLVM/Clang [ Video ] [ Slides ]
Bernhard Rosenkränzer (Linaro, OpenMandriva, LinDev)

OpenMandriva is the first general-purpose Linux distribution that has switched its primary toolchain to Clang. This talk will give an overview of what we did, what problems we faced, and where we still have problems (usually worked around by using GCC for some packages).

Just compile it: High-level programming on the GPU with Julia [ Video ] [ Slides ]
Tim Besard (Ghent University)

High-level programming languages often rely on interpretation or compilation schemes that are ill-suited for hardware accelerators like GPUs: These devices typically require statically compiled, straight-line code in order to reach acceptable performance. The high-level Julia programming language takes a different approach, by combining careful language design with an LLVM-based JIT compiler to generate high-quality machine code.

In this talk, I will show how we've used that capability to build a GPU back-end for the Julia language, and explain the underlying techniques that make it happen, including a high-level Julia wrapper for the LLVM libraries, and interfaces to share functionality with the existing Julia code generator. I will also demonstrate some of the powerful abstractions that we have built on top of this infrastructure.

The Future of AST Matcher-based Refactoring [ Video ] [ Slides ]
Stephen Kelly

In the last few years, Clang has opened up new possibilities in C++ tooling for the masses. Tools such as clang-tidy and clazy offer ready-to-use source-to-source transformations. Available transformations can be used to modernize (use newer C++ language features), improve readability (remove redundant constructs), or improve adherence to the C++ Core Guidelines.

However, when special needs arise, maintainers of large codebases need to learn some of the Clang APIs to create their own porting aids. The Clang APIs necessarily form a more exact picture of the structure of C++ code than most developers keep in their heads, and bridging the conceptual gap can be a daunting task.

This talk will show tools and features that make this task easier for developers, including:

  • Improvements to the clang-query interpreter
  • Improvements to the AST Matcher API
  • Information essential to creating clang-tidy checks
  • Debugging and profiling of AST Matchers
  • Advanced tooling

These features are at various stages on the way to being upstreamed to Clang. They enable new possibilities for large-scale refactoring in a reasonable timeframe by solving problems of API discovery and guiding users in creating working refactorings.

A compiler approach to Cyber-Security [ Video ] [ Slides ]
François de Ferrière (STMicroelectronics)

STMicroelectronics is developing LLVM-based compilation tools for its proprietary processors and for ARM cores. Applications, including a growing number of IoT developments, increasingly require security implemented in hardware, software, or both. To implement complex and reliable software countermeasures that can be deployed in a timely manner, we are adding specific cybersecurity code-generation features to our production LLVM compiler, which we present in this talk.

We give implementation details on how we worked within Clang and LLVM to implement these techniques, and we explain how they help reinforce software protection. We also detail how we can restrict these transformations to specific safety-critical regions of a program to meet the industrial constraints on performance and code size of our applications.

Compiler Optimizations for (OpenMP) Target Offloading to GPUs [ Video ] [ Slides ]
Johannes Doerfert (Argonne National Laboratory), Hal Finkel (Argonne National Laboratory)

The support of OpenMP target offloading in Clang is steadily increasing. However, when it comes to optimizing such code, LLVM is still doing a horrible job. Early separation into different modules and state machine generation are only two reasons why the middle end and backend have a hard time generating efficient code.

In this talk, we want to focus on code offloading to GPUs (through OpenMP), an increasingly important part of modern programming. We will first highlight different reasons for missing optimizations and poor code quality before we introduce new practical solutions. While our implementation is still experimental, early results suggest that there is enormous optimization potential in both manually written, and automatically generated, target offloading code.

In addition to the talk, we will, closer to the conference date, initiate a discussion on the LLVM mailing list and publish our implementation.

Handling massive concurrency: Development of a programming model for GPU and CPU [ Video ] [ Slides ]
Matthias Liedtke (SAP)

For efficient parallel execution, it is necessary to write massively concurrent algorithms and to optimize memory access. In this session we present a programming model that can execute the same concurrent algorithm efficiently on both GPUs and CPUs. Similar to OpenMP, it allows the programmer to describe concurrency and memory access declaratively, but it hides complexity such as memory transfers between the CPU and the GPU. Compared to OpenMP, our model provides a higher level of expressiveness, which enables us to reach performance comparable to OpenCL/CUDA.

Automated GPU Kernel Fusion with XLA [ Video ] [ Slides ]
Thomas Joerg (Google)

XLA (Accelerated Linear Algebra) is an optimizing compiler for linear algebra that accelerates TensorFlow computations. The XLA compiler lowers to LLVM IR and relies on LLVM for low-level optimization and code generation. XLA achieves significant performance gains on TensorFlow models. We observed speedups of up to 3x on internal models. The popular image classification model ResNet-50 trains 1.6x faster.

A key optimization performed by XLA is automated GPU kernel fusion. The idea is to combine multiple linear algebra operators into a single GPU kernel to reduce memory bandwidth requirements and kernel launch overhead. TensorFlow with XLA demonstrated competitive performance on MLPerf benchmarks (mlperf.org) compared to ML frameworks that rely on manually fused, hand-tuned GPU kernels.
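The fusion idea can be illustrated with a small CPU-side C++ sketch; this is not how XLA implements fusion, and the function names are hypothetical. The unfused version runs two separate loops (standing in for two kernels) and materializes an intermediate buffer, while the fused version computes the same elementwise expression `a * b + c` in a single pass, so the intermediate result never has to be written to and re-read from memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused: two separate "kernels"; the intermediate product is stored in
// a temporary buffer and read back by the second kernel.
std::vector<float> mul_add_unfused(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   const std::vector<float>& c) {
    assert(a.size() == b.size() && b.size() == c.size());
    std::vector<float> tmp(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) tmp[i] = a[i] * b[i];    // kernel 1
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = tmp[i] + c[i];  // kernel 2
    return out;
}

// Fused: a single "kernel" computes a*b + c per element, so there is no
// intermediate buffer and only one pass over memory.
std::vector<float> mul_add_fused(const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 const std::vector<float>& c) {
    assert(a.size() == b.size() && b.size() == c.size());
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] * b[i] + c[i];
    return out;
}
```

On a GPU the same transformation also saves a kernel launch, which is the second benefit the abstract mentions.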

The Helium Haskell compiler and its new LLVM backend [ Video ] [ Slides ]
Ivo Gabe de Wolff (University of Utrecht)

Helium, developed at the University of Utrecht, is a compiler for the lazy functional language Haskell. It is used for research on error diagnosis and for teaching. In this talk, however, we will focus on the new LLVM backend and the compilation of high-level features such as lambdas, laziness (call-by-need semantics), and currying (partial application). Furthermore, we discuss some high-level optimizations that cannot be done at the LLVM level.
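As a generic illustration of call-by-need semantics (not Helium's runtime representation, which the abstract does not describe), a suspended computation can be modelled as a thunk that is evaluated on first demand and then cached. The `Thunk` class below is a hypothetical C++17 sketch of that idea.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <utility>

// A call-by-need thunk: the suspended computation runs at most once,
// and its result is cached for every later force().
template <typename T>
class Thunk {
public:
    explicit Thunk(std::function<T()> compute) : compute_(std::move(compute)) {}

    const T& force() {
        if (!value_) {
            value_ = compute_();  // evaluate on first demand
            compute_ = nullptr;   // drop the closure so its captures can be released
        }
        return *value_;
    }

private:
    std::function<T()> compute_;
    std::optional<T> value_;
};
```

Call-by-name would rerun the computation on every `force()`; caching the result is what makes this call-by-need.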

Testing and Qualification of Optimizing Compilers for Functional Safety [ Video ] [ Slides ]
José Luis March Cabrelles (Solid Sands)

In the development of embedded applications, the compiler plays a crucial role in the translation from source to machine code. If the application is safety-critical, functional safety standards such as ISO 26262 for the automotive industry require that the user of the compiler develop confidence in the compiler's correct operation. In this presentation we will discuss the requirements that ISO 26262 places on tools such as LLVM compilers and how they can be met with a testing procedure that works well with the V-Model of engineering.

As the name implies, functional safety standards deal with the specified functionality of components. But what about the optimizations that an LLVM-based compiler applies to the program, sometimes even silently? Optimizations are not even mentioned in the C and C++ language standards; they are "non-functional" behavior of the compiler. As we will demonstrate, ignoring optimizations leads to significant holes in the compiler's test coverage. We will show how we developed a technique that achieves good results in optimization testing, and we will show some errors in Intel's well-regarded Clang-based compiler. To show the completeness of our method with respect to the requirements of functional safety, we have analyzed how the tests map to the various LLVM IR-level transformation passes they go through.

Improving Debug Information in LLVM to Recover Optimized-out Function Parameters [ Video ] [ Slides ]
Nikola Prica (RT-RK), Djordje Todorovic (RT-RK), Ananthakrishna Sowda (CISCO), Ivan Baev (CISCO)

Release builds of software products are typically compiled at optimization level -O2 or higher. When such a product crashes, it may produce a core file that is used to investigate the cause of the problem. Debug analysis usually starts from the call trace of the crash, and in such traces most parameters are reported as optimized out, for a variety of reasons. Some parameters really are optimized out, but the locations of others can be computed. Expert software developers can find the values that parameters had at the function entry point by searching for them in the disassembly of the caller's frame at the place of that particular function call. The automation of this technique is described by the DWARF 5 specification, and it has been implemented in GCC and GDB since 2011. The goal of this paper is to present the ideas, the implementation, and the problems we encountered while working on this feature in LLVM. We will also show the improvement by presenting recovered parameters in some call traces. This feature should improve debugging of optimized code built with LLVM by recovering optimized-out function parameters.

LLVM IR in GraalVM: Multi-Level, Polyglot Debugging with Sulong [ Video ] [ Slides ]
Jacob Kreindl (Johannes Kepler University Linz)

Sulong is an execution engine for LLVM bitcode that has support for debugging programs at the level of source code as well as textual LLVM IR. It is part of GraalVM, a polyglot virtual machine that can also execute programs written in multiple dynamic programming languages such as Ruby and Python. Sulong supports GraalVM's language-agnostic tooling interface to provide a rich debugging experience to developers. This includes source-level debugging of native extensions compiled to LLVM bitcode and the dynamic language programs that use them, together in the same debugger session and front-end. Sulong also enables developers to debug programs at the level of LLVM IR, including stepping through the textual IR and inspecting the symbols it contains.

In this talk we will describe different ways GraalVM enables users to debug programs that were compiled to LLVM bitcode. We will introduce the general features of GraalVM-based debuggers by demonstrating source-level debugging of a standalone C/C++ application. Building on this we will showcase GraalVM's ability to provide a truly integrated debugging experience for native extensions of dynamic language programs to users. We will further demonstrate Sulong's support for debugging programs at the LLVM-IR level.

LLDB Reproducers [ Video ] [ Slides ]
Jonas Devlieghere (Apple)

The debugger, like the compiler, is a complex piece of software where bugs are inevitable. When a bug is reported, one of the first steps in its life cycle is trying to reproduce the problem. Given the number of moving parts in the debugger, this can be quite challenging. Especially for more sophisticated problems, small changes in the environment, the binary, its dependencies, or the debug information might hide the problem. Getting this right puts a heavy burden on both the reporter and the developer.

Reproducers are a way to automate this process. They contain the information necessary for a bug to occur again with minimal interaction from the developer. For Clang, a reproducer consists of a script with the compiler invocation and a preprocessed source file. Doing the same thing for the debugger is much more complicated.

This talk discusses what was needed to have working reproducers for LLDB. It goes into detail about what information was needed, how it was captured and finally how the debugger uses it to reproduce an issue. The high level design is addressed as well as some of the challenges, such as dealing with low-level details, remote debugging, and the SB API. It concludes with an overview of what is possible and what isn't.

Sulong: An experience report of using the "other end" of LLVM in GraalVM. [ Video ] [ Slides ]
Roland Schatz (Oracle Labs), Josef Eisl (Oracle Labs)

The most common use case for LLVM is to reuse its back end to implement a compiler for a new programming language. In project Sulong, we take a different route: we use LLVM front ends and consume the resulting bitcode. Sulong is the LLVM bitcode execution engine of GraalVM, a polyglot virtual machine that executes JavaScript, Python, Ruby, R, and others. The goal of Sulong is to bring C, C++, Fortran, and other languages that compile to LLVM bitcode into the system, and to allow low-cost interoperability across language borders. The latter is crucial for efficiently supporting the existing native interfaces of dynamic languages.

In this talk, we want to share our experience with implementing an engine for executing LLVM IR in GraalVM. We will discuss how Sulong executes LLVM bitcode and why this allows high-performance interoperability between languages. We will show the challenges of implementing existing native interfaces in new runtime environments, and how we use the different parts of the LLVM project for solving them. We want to focus on situations we found challenging and where we think we can contribute to the project.

SYCL compiler: zero-cost abstraction and type safety for heterogeneous computing [ Video ] [ Slides ]
Andrew Savonichev (Intel)

SYCL is an abstraction layer for C++ that allows a developer to write heterogeneous programs in a "single source" model, where host and device code are written in the same file. Using modern C++ features, SYCL provides a way to develop type-safe and efficient programs for various accelerator devices.

Although SYCL is designed as an "extension-free" standard C++ API, some compiler extensions are needed to enable C++ code execution on accelerators. The SYCL compiler is responsible for "extracting" the device part of the code and compiling it to SPIR-V format or to a device-native binary. In addition, the compiler must emit auxiliary information, which the SYCL runtime uses to run device code via the OpenCL API.

This talk will go over technical details of the SYCL compiler, and the changes we need to make in order to bring full support for SYCL into upstream LLVM and Clang as described in the RFC: https://lists.llvm.org/pipermail/cfe-dev/2019-January/060811.html

Handling all Facebook requests with JITed C++ code [ Video ] [ Slides ]
Huapeng Zhou (Facebook), Yuhan Guo (Facebook)

Facebook needs an efficient scripting framework to enable fast iteration of HTTP request handling logic in our L7 reverse proxy. A C++ scripting engine and code deployment ecosystem was created to compile, link, and execute C++ scripts at run time, using Clang and the LLVM ORC APIs. The framework allows developers to write business logic and unit tests in C++ script, and to debug using GDB. Profiling with perf is also supported for PGO purposes. This new framework outperformed a previously used scripting language by up to 4X, measured in execution time.

To run C++ scripts in an ABI-compatible way, a PCH (pre-compiled header) is built statically to provide declarations and definitions of the necessary dependent types and methods. Clang APIs are then used at run time to transform source code to LLVM IR, which is later passed through LLVM ORC layers for linking and optimization. The Clang/LLVM toolchain is statically linked into the main binary to ensure compatibility between the PCH and the C++ scripts. As a result, scripts can be deployed in real time without any change to the main binary.

clang-scan-deps: Fast dependency scanning for explicit modules [ Video ] [ Slides ]
Alex Lorenz (Apple), Michael Spencer (Apple)

The dependency information that's provided by Clang can be used to implement a pre-scanning phase for a build system that uses Clang modules in an explicit manner, by discovering the required modules before compiling. However, the traditional approach of preprocessing all sources to find the required modular dependencies is typically not fast enough for a pre-scanning phase that must run for every build. This talk introduces clang-scan-deps, an optimized dependency discovery service that can provide a speedup of up to 10X over regular preprocessor-based scanning. This talk goes into detail about how this service is implemented and how it can be leveraged by the build system to implement a fast pre-scanning phase for explicit Clang modules.

Clang tools for implementing cryptographic protocols like OTRv4 [ Video ] [ Slides ]
Sofia Celi (Centro de Autonomia Digital)

OTRv4 is the newest version of the Off-The-Record protocol. It is a protocol where the newest academic research intertwines with real-world implementations: it provides end-to-end encryption, and offline and online deniability for interactive and non-interactive applications. As a real-world protocol, it needs an implementation that works for real-world users. For this, the OTRv4 team decided to implement it in C. But as we know, working in C can be challenging for several reasons.

To make OTRv4's implementation safer and more usable, we decided to use several Clang tools, such as clang-format, clang-tidy, and AddressSanitizer. By using these tools, we uncovered bugs, issues, and problems. In this talk, we aim to highlight the most interesting bugs we uncovered with these tools, comparing the results of static analysis with those of a fast memory error detector. We also aim to highlight the importance of using a specific code formatting style, as it makes an implementation much clearer and bugs easier to find. Finally, we want to highlight the importance of using these tools on real-world implementations that are going to be used by millions of users and that aim to provide the best security properties available.

Implementing the C++ Core Guidelines' Lifetime Safety Profile in Clang [ Video ] [ Slides ]
Gabor Horvath (Eotvos Lorand University), Matthias Gehre (Silexica GmbH), Herb Sutter (Microsoft)

This is an experience report of the Clang-based implementation of Herb Sutter's Lifetime safety profile for the C++ Core Guidelines, available online at cppx.godbolt.org.

We will cover the kinds of diagnostics supported by the checker and how they are implemented using Clang's control flow graph. We will discuss the main problems of the current prototype and what we are going to do to fix them. We also plan to discuss the upstreaming process. Some parts of the analysis might end up improving existing Clang warnings, some of which are on by default. We will also summarize early experience with performance on real-world code bases, including compile-time performance for LLVM sources with the checker.

Changes to the C++ standard library for C++20 [ Video ] [ Slides ]
Marshall Clow (CppAlliance)

The next version of the C++ standard will almost certainly be approved next year, and be called C++20. There will be many new features in the standard library in C++20, such as ranges, concepts, calendar support, and many others. In this talk, I'll give an overview of the new features and an update on the status of their implementation in libc++.

Adventures with RISC-V Vectors and LLVM [ Video ] [ Slides ]
Robin Kruppe (TU Darmstadt), Roger Espasa (Esperanto Technologies)

RISC-V is a free and open instruction set architecture (ISA) with an established LLVM backend and numerous open-source and proprietary hardware implementations. The work-in-progress vector extension adds standardized vector processing, taking lessons both from traditional long-vector machines and from packed-SIMD approaches that dominated industrial designs in the past few decades. The resulting architecture aims to excel at various scales, from small embedded cores to large HPC accelerators and everything in between.

In this talk you will learn about the RISC-V vector ISA as well as LLVM support for it: vectorizing loops without needing scalar remainder handling, vectors whose length is not known at compile time, a vector unit that can be dynamically reconfigured for increased efficiency, and more.
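The "no scalar remainder" point can be sketched in portable C++ by emulating a vector-length-agnostic (strip-mined) loop. This is an illustration under stated assumptions, not LLVM's RISC-V code generation: the parameter `vlmax` is a hypothetical stand-in for the hardware's maximum vector length, and the inner loop stands in for vector instructions operating on `vl` elements. In the vector extension the length request is made with a set-vector-length instruction (`vsetvli`), whose exact choice of `vl` is implementation-defined within limits; the sketch simply uses the minimum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// y[i] = a * x[i] + y[i], written as a vector-length-agnostic loop.
// Each trip requests vl = min(remaining, vlmax) elements, so the final
// partial chunk is handled by the same body and no remainder loop exists.
void saxpy_strip_mined(std::size_t n, float a, const float* x, float* y,
                       std::size_t vlmax) {
    assert(vlmax > 0);
    for (std::size_t i = 0; i < n;) {
        std::size_t vl = std::min(n - i, vlmax);  // emulates setting the vector length
        for (std::size_t j = 0; j < vl; ++j)      // emulates one vector op over vl lanes
            y[i + j] = a * x[i + j] + y[i + j];
        i += vl;
    }
}
```

With `n = 7` and `vlmax = 4`, the body runs once with `vl = 4` and once with `vl = 3`; the code never needs to know the vector length at compile time.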

A Tale of Two ABIs: ILP32 on AArch64 [ Video ] [ Slides ]
Tim Northover (Apple)

We faced the challenge of seamlessly running 32b application binaries on a new 64b S4 chip, which has no hardware support for running 32b binaries. Translating the ARM binaries directly to the new hardware would be hard, but when an application is available in bitcode format, the task is much more feasible. This talk offers an inside look at the decisions and steps taken to translate 32b bitcode for the new 64b hardware. It will discuss the many design, implementation, and verification challenges of introducing a new ABI, arm64_32, which guarantees that binaries for the new S4 chip are compatible with the original 32b applications.

LLVM Numerics Improvements [ Video ] [ Slides ]
Michael Berg (Apple), Steve Canon (Apple)

Some LLVM-based compilers currently provide two modes of floating-point code generation. The first mode, called fast-math, is where performance takes priority over numerical precision and accuracy. This mode does not strictly follow the IEEE-754 standard, but has proven useful for applications that do not require this level of precision. The second mode, called precise-math, is where the compiler carefully follows the subset of behavior defined in the IEEE standard that is applicable to conforming hardware targets. This mode is primarily used for compute workloads and wherever fast-math precision is inadequate; however, it runs much slower because it generally requires more instructions. In practice, neither of these modes is particularly desirable. The fast-math mode ignores a significant portion of the standard pertaining to the special values Not a Number (NaN) and Infinity (INF), which causes difficulties for workloads where the hardware target computes these values correctly and performance remains critical.

Until recently these two models were mutually exclusive; however, with the addition of IR flags they need not be. For instance, the FastMath metadata module flag, when enabled, drives behavior deemed numerically unsafe by indiscriminately enabling optimizations. With IR flags, this behavior can be enabled at much finer granularity, allowing fast and precise code to coexist in one module. We call this mixed-mode compilation. IR flags can be used individually or in combination to produce the desired floating-point behavior under specified constraints, with fine-grained control. Optimization passes have been modified to respect this new kind of control. This talk will describe the recent numerics work and discuss the implications for front ends and back ends built with LLVM.
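Reassociation is a simple example of why fast-math transformations are considered numerically unsafe. In LLVM IR, permission to reassociate is one of the per-instruction fast-math flags (`reassoc`), alongside flags such as `nnan` and `ninf` that let the optimizer assume no NaNs or infinities. The C++ sketch below, which assumes it is compiled without `-ffast-math`, shows two groupings of the same sum that give different IEEE-754 results; the function names are illustrative.

```cpp
#include <cassert>

// Under strict IEEE-754 semantics floating-point addition is not
// associative, so a compiler may only rewrite one grouping into the
// other when it has been given permission to reassociate.
float sum_left(float a, float b, float c) { return (a + b) + c; }
float sum_right(float a, float b, float c) { return a + (b + c); }
```

With `a = 1e20f`, `b = -1e20f`, `c = 1.0f`, the left grouping cancels the large terms first and yields `1.0f`, while in the right grouping `b + c` rounds back to `-1e20f` and the result is `0.0f`. A module mixing "fast" and "precise" code needs to keep this permission off for the sums where such differences matter.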

DOE Proxy Apps: Compiler Performance Analysis and Optimistic Annotation Exploration [ Video ] [ Slides ]
Brian Homerding (Argonne National Laboratory), Johannes Doerfert (Argonne National Laboratory)

The US Department of Energy proxy applications are simplified models of the key components of various scientific computing workloads. These proxy applications are useful for research and exploration in many areas, including software technology. We have conducted performance analysis of these proxy applications using Clang, GCC, and some vendor compilers. These results identified and motivated our work on modelling the memory access of math functions in Clang. We will discuss our design and our work to expose this ability to encode function information to the developer. Additionally, I will discuss my collaboration on a development tool designed to explore both the potential performance lost because of information the developer could have encoded (but did not) and the extent to which LLVM is able to make profitable use of this information.

Loop Fusion, Loop Distribution and their Place in the Loop Optimization Pipeline [ Video ] [ Slides ]
Kit Barton (IBM), Johannes Doerfert (Argonne National Lab), Hal Finkel (Argonne National Lab), Michael Kruse (Argonne National Lab)

Loop fusion and loop distribution are two key optimizations that typically feature prominently in a loop optimization pipeline. They are used both to improve the performance of applications and to enable other loop optimizations. For example, loop fusion can improve the performance of applications by increasing temporal data cache locality. It can also increase the scope of other optimizations by creating larger loop nests for intra-loop-nest optimizations to work on. Similarly, loop distribution is often used to improve performance directly by distributing loops that exceed hardware resources (e.g., register pressure). It is also frequently used to split a loop containing loop-carried dependencies into two loops, one with the loop-carried dependencies and one without; this enables other optimizations (e.g., vectorization) on the independent loop. Furthermore, these two optimizations work nicely together, as each can "undo" transformations done by the other. Thus, both optimizations must be implemented robustly, as they can both play an important role in a loop optimization pipeline.

This talk will be a follow-on to "Revisiting Loop Fusion, and its place in the loop transformation framework", presented at the 2018 LLVM Developers' Meeting. The patch implementing the basic loop fusion described in that talk is currently under review on Phabricator (https://reviews.llvm.org/D55851). We have prototypes that make loop fusion more aggressive by moving code from between two loops (making them adjacent); these will be posted for review once the basic loop fusion patch is accepted. We also plan to peel loops (to make their bounds conform) and to improve the dependence analysis between the two loop bodies. This talk will also include findings from our current analysis of the loop distribution pass in LLVM. It will summarize the strengths and limitations of loop distribution, as well as any improvements made prior to EuroLLVM 2019. Finally, the presentation will discuss how loop fusion and loop distribution can fit into the existing loop optimization pipeline in LLVM.
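Loop distribution, as described above, can be illustrated with a small hand-written C++ example. This is a sketch of the transformation's effect, not the LLVM pass itself, and the function names are hypothetical. The original loop mixes a statement with a loop-carried dependence and an independent one; the distributed version splits them so that the second loop has no loop-carried dependence. Read in reverse, fusing the two loops recovers the original.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original loop: S1 carries a dependence across iterations (a[i] reads
// a[i - 1]), while S2 is independent of every other iteration.
void original(std::vector<int>& a, std::vector<int>& c, const std::vector<int>& b) {
    assert(a.size() == b.size() && c.size() == b.size());
    for (std::size_t i = 1; i < a.size(); ++i) {
        a[i] = a[i - 1] + b[i];  // S1: loop-carried dependence
        c[i] = b[i] * 2;         // S2: no loop-carried dependence
    }
}

// Distributed: S1 and S2 get their own loops. The second loop has no
// loop-carried dependence, so it is a candidate for vectorization.
void distributed(std::vector<int>& a, std::vector<int>& c, const std::vector<int>& b) {
    assert(a.size() == b.size() && c.size() == b.size());
    for (std::size_t i = 1; i < a.size(); ++i) a[i] = a[i - 1] + b[i];  // S1 only
    for (std::size_t i = 1; i < a.size(); ++i) c[i] = b[i] * 2;         // S2 only
}
```

Distribution is legal here because neither statement reads anything the other writes, so splitting them into separate loops cannot change any value that is computed.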

Tutorials
Tutorial: Building a Compiler with MLIR [ Video ] [ Slides ]
Mehdi Amini (Google), Nicolas Vasilache (Google), Alex Zinenko (Google)

This tutorial will complement the technical talk about MLIR. We will implement a custom DSL for numerical processing and walk the audience step by step through using MLIR to support the lowering and optimization of such a DSL, targeting LLVM for lower-level optimizations and code generation or JIT execution.

Building an LLVM-based tool: lessons learned [ Video ] [ Slides ]
Alex Denisov

In this talk, I want to share my experience in building an LLVM-based tool.

For the last three years, I have been working on a tool for mutation testing. Currently, it works on Linux, macOS, and FreeBSD, and the source code is compatible with any LLVM version between 3.9 and 7.0. Anything that can run in parallel runs in parallel. I will cover the following topics:

  • Build system: on supporting multiple LLVM versions and building against sources or precompiled binary.
  • Parallelization: which parts of the tool can be parallelized and which should run in one thread
  • Testing: how to build a robust test suite for the tool
  • Bitcode: on several ways to convert a program into LLVM bitcode that the tool can use.

LLVM IR Tutorial - Phis, GEPs and other things, oh my! [ Video ] [ Slides ]
Vince Bridgers (Intel Corporation), Felipe de Azevedo Piovezan (Intel Corporation)

LLVM intermediate representation (IR) is the abstract description of machine operations used to translate the output of LLVM front ends into a form that is executable by a target machine. Optimizations and transformations are performed on the IR by the LLVM libraries to create executable images. This tutorial will introduce the IR syntax, describe basic tools for manipulating IR formats, and describe how common source code control structures map to IR. Tutorial materials with specific examples will be made available for the tutorial presentation and for offline review.

Student Research Competition
Safely Optimizing Casts between Pointers and Integers [ Video ] [ Slides ]
Juneyoung Lee (Seoul National University, Korea), Chung-Kil Hur (Seoul National University, Korea), Ralf Jung (MPI-SWS, Germany), Zhengyang Liu (University of Utah, USA), John Regehr (University of Utah, USA), Nuno P. Lopes (Microsoft Research, UK)

In this talk, we will present a list of optimizations that soundly remove casts between pointers and integers. In LLVM, a pointer is more than just an integer: LLVM allows a pointer to track its underlying object, and the rule for finding it is defined by the based-on relation. This allows LLVM to aggressively optimize loads and stores, but makes the meaning of pointer-integer casts complicated. This causes conflicts between existing optimizations, leading to long-standing miscompilation bugs such as bug 34548.

To fix this, we suggest disabling the folding of inttoptr(ptrtoint(p)) to p and using a safe workaround to remove such cast pairs instead. This folding matters because it removes a significant portion of such cast pairs. We'll show that even with the folding disabled, the majority of casts can be removed by carefully adding new optimizations and modifying existing ones. After these updates, performance remains comparable to the original LLVM.

An alternative OpenMP Backend for Polly [ Video ] [ Slides ]
Michael Halkenhäuser (TU Darmstadt)

LLVM's polyhedral optimization framework Polly can automatically exploit thread-level parallelism through OpenMP. Currently, the user can only influence the number of threads used, while other OpenMP parameters, such as the scheduling type and chunk size, are set to fixed values. This, in turn, limits the user's ability to adapt the optimization process to a given problem.

In this work, we present an alternative OpenMP backend for Polly that is based on the LLVM OpenMP runtime and provides additional customization options to the user. We evaluate the new backend and the influence of the new customization options on performance, and compare it with Polly's existing OpenMP backend.
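The two parameters mentioned above, scheduling type and chunk size, correspond at the source level to OpenMP's `schedule` clause. The sketch below shows them on a hand-written loop. It does not show Polly's own interface or the options the new backend exposes. If OpenMP is not enabled (for example, without `-fopenmp`), the pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* schedule(dynamic, 4): idle threads grab the next chunk of 4 iterations.
   The reduction clause gives each thread a private partial sum. */
long sum_of_squares(const int *v, size_t n) {
    assert(v != NULL || n == 0);
    long total = 0;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+ : total)
    for (size_t i = 0; i < n; ++i)
        total += (long)v[i] * v[i];
    return total;
}
```

Changing `dynamic` to `static` or `guided`, or changing the chunk size, affects only how iterations are distributed across threads, not the result. `schedule(runtime)` defers the choice to the `OMP_SCHEDULE` environment variable.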

Implementing SPMD control flow in LLVM using reconverging CFGs [ Video ] [ Slides ]
Fabian Wahlster (Technische Universität München), Nicolai Hähnle (Advanced Micro Devices)

Compiling programs for an SPMD execution model, e.g. for GPUs or for whole program vectorization on CPUs, requires a transform from the thread-level input program into a vectorized wave-level program in which the values of the original threads are stored in corresponding lanes of vectors. The main challenge of this transform is handling divergent control flow, where threads take different paths through the original CFG. A common approach, which is currently taken by the AMDGPU backend in LLVM, is to first structurize the program as a simplification for subsequent steps.

However, structurization is overly conservative. It can be avoided when control flow is uniform, i.e. not divergent. Even where control flow is divergent, structurization is often unnecessary. Moreover, LLVM's StructurizeCFG pass relies on region analysis, which limits the extent to which it can be evolved.

We propose a new approach to SPMD vectorization based on the notion of a reconverging CFG: a CFG is reconverging if, for every divergent branch, one of the successors post-dominates the branching block. This property is weaker than structuredness, and we show that it can be achieved while preserving uniform branches and inserting fewer new basic blocks than structurization requires. It is also sufficient for code generation, because it guarantees that threads which "leave" a wave at divergent branches will be able to rejoin it later.
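The reconverging property can be checked directly from its definition. First compute post-dominator sets, then verify that every divergent branch has a successor that post-dominates the branching block. The C sketch below does this for small CFGs. The `Cfg` encoding (bitmask sets, at most two successors, a unique exit block) is a hypothetical simplification, not the data structure used in the talk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BLOCKS 32

/* Hypothetical CFG encoding: blocks 0..num_blocks-1, at most two successors each. */
typedef struct {
    int num_blocks;
    int exit_block;               /* assumed to be the unique exit */
    int succ[MAX_BLOCKS][2];      /* -1 means "no successor" */
    bool divergent[MAX_BLOCKS];   /* block ends in a divergent branch */
} Cfg;

/* Post-dominator sets as bitmasks (bit d of pdom[b] set => d post-dominates b),
   computed by the usual iterative data-flow fixed point. */
static void compute_postdom(const Cfg *g, uint32_t pdom[MAX_BLOCKS]) {
    uint32_t all = g->num_blocks == 32 ? UINT32_MAX : (1u << g->num_blocks) - 1u;
    for (int b = 0; b < g->num_blocks; ++b)
        pdom[b] = (b == g->exit_block) ? (1u << b) : all;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int b = 0; b < g->num_blocks; ++b) {
            if (b == g->exit_block) continue;
            uint32_t meet = all;
            bool has_succ = false;
            for (int k = 0; k < 2; ++k) {
                int s = g->succ[b][k];
                if (s >= 0) { meet &= pdom[s]; has_succ = true; }
            }
            uint32_t next = (has_succ ? meet : 0u) | (1u << b);
            if (next != pdom[b]) { pdom[b] = next; changed = true; }
        }
    }
}

/* Reconverging: every divergent branch has a successor that post-dominates it.
   Uniform branches impose no constraint. */
bool is_reconverging(const Cfg *g) {
    assert(g->num_blocks > 0 && g->num_blocks <= MAX_BLOCKS);
    uint32_t pdom[MAX_BLOCKS];
    compute_postdom(g, pdom);
    for (int b = 0; b < g->num_blocks; ++b) {
        if (!g->divergent[b]) continue;
        bool ok = false;
        for (int k = 0; k < 2; ++k) {
            int s = g->succ[b][k];
            if (s >= 0 && ((pdom[b] >> s) & 1u)) ok = true;
        }
        if (!ok) return false;
    }
    return true;
}
```

Consider a diamond (A branches to B and C, both fall through to D) with a divergent branch in A. It is not reconverging, because neither B nor C post-dominates A. A triangle in which A branches to B and directly to the join block is reconverging, because the join block is itself a successor of A.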

Function Merging by Sequence Alignment [ Video ] [ Slides ]
Rodrigo Rocha (University of Edinburgh), Pavlos Petoumenos (University of Edinburgh), Zheng Wang (Lancaster University), Murray Cole (University of Edinburgh), Hugh Leather (University of Edinburgh)

Resource-constrained devices for embedded systems are becoming increasingly important. In such systems, memory is highly restrictive, making code size in most cases even more important than performance. Compared to more traditional platforms, memory is a larger part of the cost and code occupies much of it. Despite that, compilers make little effort to reduce code size. One key technique attempts to merge the bodies of similar functions. However, production compilers only apply this optimization to identical functions, while research compilers improve on that by merging the few functions with identical control-flow graphs and signatures. Overall, existing solutions are insufficient and we end up having to either increase cost by adding more memory or remove functionality from programs.

We introduce a novel technique that can merge arbitrary functions through sequence alignment, a bioinformatics algorithm for identifying regions of similarity between sequences. We combine this technique with an intelligent exploration mechanism to direct the search towards the most promising function pairs. Our approach is more than 2.4x better than the state-of-the-art, reducing code size by up to 25%, with an overall average of 6%, while introducing an average compilation-time overhead of only 15%. When aided by profiling information, this optimization can be deployed without any significant impact on the performance of the generated code.
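As an illustration of the sequence-alignment idea, the C sketch below scores two functions that have each been reduced to a sequence of opcodes. It uses a textbook Needleman-Wunsch global alignment, chosen here for illustration. The opcode encoding and the match, mismatch, and gap weights are assumptions, not the scoring scheme from the talk. A high score means long aligned runs of identical instructions, which are the regions a merged function could share.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LEN 64

/* Hypothetical opcode encoding: each instruction reduced to a small integer. */
enum { OP_LOAD, OP_ADD, OP_MUL, OP_STORE, OP_RET };

/* Needleman-Wunsch global alignment score with illustrative weights. */
int align_score(const int *a, size_t n, const int *b, size_t m) {
    assert(n <= MAX_LEN && m <= MAX_LEN);
    enum { MATCH = 2, MISMATCH = -1, GAP = -1 };
    int dp[MAX_LEN + 1][MAX_LEN + 1];
    for (size_t i = 0; i <= n; ++i) dp[i][0] = (int)i * GAP;
    for (size_t j = 0; j <= m; ++j) dp[0][j] = (int)j * GAP;
    for (size_t i = 1; i <= n; ++i)
        for (size_t j = 1; j <= m; ++j) {
            int diag = dp[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? MATCH : MISMATCH);
            int up = dp[i - 1][j] + GAP;     /* instruction only in a */
            int left = dp[i][j - 1] + GAP;   /* instruction only in b */
            int best = diag > up ? diag : up;
            dp[i][j] = best > left ? best : left;
        }
    return dp[n][m];
}
```

For `{LOAD, ADD, STORE, RET}` against `{LOAD, MUL, ADD, STORE, RET}`, the best alignment matches four instructions and opens one gap for the extra `MUL`. A merged function could guard that gap with a selector branch.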

Compilation and optimization with security annotations [ Video ] [ Slides ]
Son Tuan Vu (LIP6), Karine Heydemann (LIP6), Arnaud de Grandmaison (ARM), Albert Cohen (Google)

Program analysis and program transformation systems need to express additional program properties, to specify test and verification goals, and to enhance their effectiveness. Such annotations are typically inserted into the representation on which the tool operates; e.g., source level for establishing compliance with a specification, and binary level for the validation of secure code. While several annotation languages have been proposed, these typically target the expression of functional properties. For the purpose of implementing secure code, there has been little effort to support non-functional properties about side-channels or faults. Furthermore, analyses and transformations making use of such annotations may target different representations encountered along the compilation flow.

We extend an annotation language to express a wider range of functional and non-functional properties, enabling security-oriented analyses and influencing the application of code transformations along the compilation flow. We translate this language to the different compiler representations from abstract syntax down to binary code. We explore these concepts through the design and implementation of an optimizing, annotation-aware compiler, capturing annotations from the program source, propagating and emitting them in the binary, so that binary-level analysis tools can use them.

Adding support for C++ contracts to Clang [ Video ] [ Slides ]
Javier López-Gómez (University Carlos III of Madrid), J. Daniel García (University Carlos III of Madrid)

A language that supports contract checking allows programmers to detect programming errors. In addition, making this information available to the compiler may enable it to perform additional optimizations.

This paper presents our implementation of the P0542R5 technical specification (now part of the C++20 working draft).

Lightning talks
LLVM IR Timing Predictions: Fast Explorations via lli [ Video ] [ Slides ]
Alessandro Cornaglia (FZI - Research Center for Information Technology)

Many applications, especially in the embedded domain, have to be executed on different hardware target platforms. For these applications, it is necessary to evaluate both functional and non-functional properties, such as software execution time, in all their hardware/software combinations. Especially in the context of software product line engineering, it is not feasible to test all variants one-by-one. The intermediate representation of the source code offers an attractive opportunity for a single-run analysis, because it covers the software variability, while at the same time omitting the hardware-dependent optimizations.

We present an extension for the LLVM IR execution engines, which are part of the LLVM lli tool. The extension evaluates functional and non-functional properties on the fly for all hardware variants during a single lli execution. In particular, it is designed to evaluate the execution time of a program on multiple target platforms while considering different software variants. Both the interpreter and JIT execution modes are supported. In the future, our approach will be extended with further analysis techniques. With this approach, software variants can now be evaluated against multiple hardware platforms in a single lli run.

Simple Outer-Loop-Vectorization == LoopUnroll-And-Jam + SLP [ Video ] [ Slides ]
Dibyendu Das (AMD)

In this brief talk I will show how Outer-Loop-Vectorization (OLV), which is of great interest to the LLVM community, can be viewed as a combination of two transformations applied to a loop nest of interest: LoopUnrollAndJam and SLP. LoopUnrollAndJam is a fairly new addition to the LLVM loop-optimization repertoire. Combined with the powerful SLP vectorizer that LLVM supports today, it lets us vectorize the outer loop of several important kernels automatically, without any pragma. At present our implementation is a proof of concept (PoC) and does not use any rigorous cost model. While we understand that OLV is being implemented in the LoopVectorizer using the VPlan technique, this talk highlights a quick and cheap way to solve the same problem differently using two existing transforms.
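The effect of the two steps can be seen at the source level. The C sketch below shows a hypothetical column-wise dot-product kernel before and after unroll-and-jam of the outer loop by 2. It is a hand-written illustration, not the talk's implementation, and it does not show the SLP step itself. After the transformation, the inner body holds two isomorphic statements over adjacent columns, which is the shape SLP can pack into a 2-wide vector operation.

```c
#include <assert.h>

#define ROWS 16
#define COLS 8

static_assert(COLS % 2 == 0, "outer trip count must be a multiple of the unroll factor");

/* Original nest: the outer loop over i walks contiguous columns, while the
   inner loop strides across rows, so only the outer loop is unit-stride. */
void col_dot_ref(float a[ROWS][COLS], const float x[ROWS], float y[COLS]) {
    for (int i = 0; i < COLS; ++i) {
        float s = 0.0f;
        for (int j = 0; j < ROWS; ++j)
            s += a[j][i] * x[j];
        y[i] = s;
    }
}

/* After unroll-and-jam by 2: two copies of the outer body are fused into one
   inner loop, giving isomorphic statements on a[j][i] and a[j][i + 1]. */
void col_dot_uj2(float a[ROWS][COLS], const float x[ROWS], float y[COLS]) {
    for (int i = 0; i < COLS; i += 2) {
        float s0 = 0.0f, s1 = 0.0f;
        for (int j = 0; j < ROWS; ++j) {
            s0 += a[j][i] * x[j];
            s1 += a[j][i + 1] * x[j];
        }
        y[i] = s0;
        y[i + 1] = s1;
    }
}
```

Each column is still accumulated in the same order, so both versions compute the same values. Only the grouping of independent work changes.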

Clacc 2019: An Update on OpenACC Support for Clang and LLVM [ Video ] [ Slides ]
Joel E. Denny (Oak Ridge National Laboratory), Seyong Lee (Oak Ridge National Laboratory), Jeffrey S. Vetter (Oak Ridge National Laboratory)

We are developing production-quality, standard-conforming OpenACC [1] compiler and runtime support in Clang and LLVM for the US Exascale Computing Project [2][3]. A key strategy of Clacc's design is to translate OpenACC to OpenMP in order to leverage Clang's existing OpenMP compiler and runtime support and to minimize implementation divergence. To maximize reuse of the OpenMP implementation and to facilitate research and development into new source-level tools involving both the OpenACC and OpenMP levels, Clacc implements this translation in the Clang AST using Clang's TreeTransform facility. However, we are also following LLVM IR parallel extensions being developed by the community as a path to improve compiler optimizations and analyses.

The purpose of this talk is to provide an update on Clacc progress over the preceding year including early performance results, to present the plan for the year ahead, and to invite participation from others. Clacc's OpenACC support is still maturing and we have not yet offered it upstream. However, we have already upstreamed many mutually beneficial improvements from the Clacc project, including improvements to LLVM's testing infrastructure and to Clang and its OpenMP support. This talk will summarize those contributions as well.

[1] OpenACC standard: https://www.openacc.org/

[2] Clacc: Translating OpenACC to OpenMP in Clang. Joel E. Denny, Seyong Lee, and Jeffrey S. Vetter. 2018 IEEE/ACM 5th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC), Dallas, TX, USA, (2018).

[3] Clacc: OpenACC Support for Clang and LLVM. Joel E. Denny, Seyong Lee, and Jeffrey S. Vetter. 2018 European LLVM Developers Meeting (EuroLLVM 2018).

Targeting a statically compiled program repository with LLVM [ Video ] [ Slides ]
Phil Camp (SN Systems), Russell Gallop (SN Systems)

Following on from the 2016 talk "Demo of a repository for statically compiled programs", this lightning talk will present a brief overview of how LLVM was modified to target a program repository. This includes adding a new target output format and a new optimization pass to skip program elements already present in the repository. Reference: https://github.com/SNSystems/llvm-prepo

Does the win32 clang compiler executable really need to be over 21MB in size? [ Video ] [ Slides ]
Russell Gallop (SN Systems), Greg Bedwell (SN Systems)

The title of this lightning talk comes from a bug filed in the early days of the PS4 compiler. It noted that the LLVM-based PS4 compiler was more than three times larger than the PS3 compiler. Since then, it has almost doubled to over 40MB. For a compiler that targets a single system, this seems excessive. A large executable can hurt cache performance and costs time when it has to be transferred for distributed builds.

In this lightning talk I will look at where this comes from and how it can be managed.

Resolving the almost decade old checker dependency issue in the Clang Static Analyzer [ Video ] [ Slides ]
Kristóf Umann (Ericsson Hungary, Eötvös Loránd University)

As the number of checkers in the Static Analyzer grew, it was inevitable that some checkers would come to depend on one another. One particular problem, for example, is that MallocChecker, which despite its name performs all sorts of checks related to memory allocation, deallocation and reallocation, depends on CStringChecker to model calls to strcmp. While these checkers are completely separate entities, the Static Analyzer also contains large checker classes that in fact expose multiple checkers to the user: for example, IteratorChecker has a modeling part and exposes three iterator-related checkers, and enabling any of the three also enables the unexposed modeling part. Having both of these structures makes it difficult for the developer (or the experienced user) to see which checkers are enabled, as these dependencies are expressed only in the implementation.

This talk will discuss how these rather fragile checker structures can be preserved by declaring the dependencies in TableGen files, and how checker developers (and users) can ensure that only the requested checkers are enabled when the analyzer is invoked. It will also take a brief look at other features the analyzer gained as a result of resolving these issues.
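Once dependencies are declared explicitly, deciding what is enabled becomes a transitive closure: enabling a checker must also enable everything it depends on. The C sketch below computes that closure over a small hard-coded table. The checker names and the table are hypothetical stand-ins for the declarations the talk proposes to keep in TableGen, not the analyzer's actual data structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical checker identifiers modeled on the examples above. */
enum { ITER_MODELING, ITER_RANGE, ITER_MISMATCHED, ITER_INVALIDATED,
       CSTRING_MODELING, MALLOC_CHECKS, NUM_CHECKERS };

static_assert(NUM_CHECKERS <= 32, "bitmask representation needs at most 32 checkers");

/* deps[c] is a bitmask of the checkers that c directly depends on. */
static const uint32_t deps[NUM_CHECKERS] = {
    [ITER_RANGE]       = 1u << ITER_MODELING,
    [ITER_MISMATCHED]  = 1u << ITER_MODELING,
    [ITER_INVALIDATED] = 1u << ITER_MODELING,
    [MALLOC_CHECKS]    = 1u << CSTRING_MODELING,
};

/* Expand the user's requested set with all transitive dependencies,
   iterating until no new checker is added. */
uint32_t resolve_enabled(uint32_t requested) {
    uint32_t enabled = requested, prev;
    do {
        prev = enabled;
        for (int c = 0; c < NUM_CHECKERS; ++c)
            if ((enabled >> c) & 1u)
                enabled |= deps[c];
    } while (enabled != prev);
    return enabled;
}
```

Requesting only the iterator range checker yields that checker plus the modeling part and nothing else. Because the table is data rather than code, it can also be printed for the user.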

Adopting LLVM Binary Utilities in Toolchains [ Video ] [ Slides ]
Jordan Rupprecht (Google)

Although many projects have migrated from GCC-based toolchains to Clang-based ones, tools from the GNU Binutils collection are still widely used despite having equivalents in the LLVM project. The problems faced when attempting to use LLVM tools range anywhere from simple command line syntax differences to unimplemented or buggy features. In this talk, I will describe some of the types of challenges we faced when adopting LLVM tools, as well as some of the strategies we used to test the toolchain.

Multiplication and Division in the Range-Based Constraint Manager [ Video ] [ Slides ]
Ádám Balogh (Ericsson Hungary Ltd.)

The default constraint manager of the Clang Static Analyzer is a simple range-based constraint manager: it stores and manages the valid ranges for the values of symbolic expressions. Upon new assumptions it further constrains these ranges, which often results in an empty range, telling the analyzer that the assumption is impossible. Until now the constraint manager could handle basic assumptions: A <rel> m, A + n <rel> m and A - n <rel> m, where A is a symbolic expression, n and m are integer constants and <rel> is a relational operator. In the latter two cases, where a constant is added to or subtracted from the symbolic expression, the range of the additive expression is calculated by shifting the range circularly by the constant. However, it could not cope with multiplication and division, so not even the range of A*2 could be deduced from the range of A. This shortcoming led to both false positives and missed true positives.

To improve the true positive/false positive ratio of the analyzer, we extended the range-based constraint manager to handle expressions of the form A <mul> k <add> n <rel> m, where A is a symbolic expression, k, m and n are integer constants, <mul> is a multiplicative operator (* or /), <add> is an additive operator (+ or -) and <rel> is a relational operator. The main challenge in our work was to scale the ranges correctly in circular arithmetic: for example, for signed 8-bit types in A * 2 == 56, the value of A can be not only 28 but also -100. Similarly, in A / 3 == 4 the value of A is not necessarily 12, but anything in the range [12..14]. To ensure full correctness we also checked our solution exhaustively: first we generated every range for every constant in both 8-bit signed and unsigned arithmetic, then we tested whether the scaling algorithm calculates exactly the same ranges. Finally we generalized this algorithm to wider integer types and ported it to the range-based constraint manager. According to our measurements there is no significant change in performance, and in the talk we will present the number of eliminated false positives and new true positives.
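The two 8-bit examples above can be reproduced by brute force, in the spirit of the exhaustive validation the abstract describes. The C sketch below enumerates every `int8_t` value A and keeps those for which `A * k` or `A / k` equals `m` under 8-bit wraparound. It is a checking aid, not the analyzer's scaling algorithm, and the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All int8_t values a with a * k == m under 8-bit wraparound (modulo 256). */
size_t solve_mul_i8(int k, int m, int8_t out[256]) {
    size_t count = 0;
    for (int a = INT8_MIN; a <= INT8_MAX; ++a)
        if ((uint8_t)(a * k) == (uint8_t)m)   /* compare the low 8 bits only */
            out[count++] = (int8_t)a;
    return count;
}

/* All int8_t values a with a / k == m (C truncating division), same wraparound rule. */
size_t solve_div_i8(int k, int m, int8_t out[256]) {
    assert(k != 0);
    size_t count = 0;
    for (int a = INT8_MIN; a <= INT8_MAX; ++a)
        if ((uint8_t)(a / k) == (uint8_t)m)
            out[count++] = (int8_t)a;
    return count;
}
```

For `A * 2 == 56` this yields the two values -100 and 28, which are not one contiguous range. Scaling a range through multiplication can therefore split it. For `A / 3 == 4` it yields exactly 12, 13 and 14.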

Statistics Based Checkers in the Clang Static Analyzer [ Video ] [ Slides ]
Ádám Balogh (Ericsson Hungary Ltd.)

In almost every development project there are conventions requiring that the return values of certain external library functions be compared against an extremal value, such as zero. For example, many functions returning integers return a negative number on error, similarly to pointer-returning functions that return a null pointer. In a large project with many external functions, it is virtually impossible to formalize all these rules explicitly: they are either unwritten or exist only in natural language. To help enforce these rules, we created checkers in the Clang Static Analyzer that infer these rules statistically and check the code against them. We currently support two kinds of extremal values: negative numbers for functions returning integers, and null pointers for functions returning pointers.

Example:

int i = may_return_negative();

v[i]; // error: negative indexing

Exploring and checking these rules happens in two phases. In the first phase, we check every function call and create a summary for each function recording the percentage of calls whose return value is checked for negativeness (integer functions) or nullness (pointer functions). If this percentage is above a defined threshold (85% by default), we assume that the rule for the function exists. The second phase is the usual execution of the analyzer, in which a checker looks for violations of the rule: it splits the execution path into two branches at each call of a listed function, where the return value is an extremal value (negative for integers, null for pointers) in one branch and a non-extremal value in the other. Other checkers (e.g. the null-pointer dereference checker) are expected to find errors on the extremal-value branch if the code does not terminate that branch by checking for the extremal value. The performance impact of the state split is low: in at least 85% of the cases the extremal-value branch is terminated quickly, and in the remaining cases we expect another checker to create a sink node because of an error. The new checker is under evaluation on open-source projects. We found some false positives; however, their number can be reduced by including the arguments in the statistics.
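The first-phase decision reduces to a threshold test on each function's summary. The C sketch below encodes it with a hypothetical `ReturnCheckSummary` record. The field names are assumptions, and the comparison is strict to follow the "above a defined threshold" wording.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical first-phase summary for one external function. */
typedef struct {
    unsigned calls;          /* calls whose return value was observed */
    unsigned checked_calls;  /* calls whose return value was compared to the extremal value */
} ReturnCheckSummary;

/* A rule is assumed when the checked percentage is strictly above the threshold.
   Integer arithmetic avoids rounding: checked/calls > t/100. */
bool rule_exists(ReturnCheckSummary s, unsigned threshold_percent) {
    assert(s.checked_calls <= s.calls);
    if (s.calls == 0) return false;
    return 100u * s.checked_calls > threshold_percent * s.calls;
}
```

With the default threshold of 85, a function checked at 18 of 20 call sites (90%) gets a rule. One checked at exactly 17 of 20 (85%) does not, and the second phase then leaves its unchecked call sites alone.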

Flang Update [ Video ] [ Slides ]
Steve Scalpone (NVIDIA / PGI / Flang)

An update about the current state of Flang, including a report on OpenMP 4.5 target offload, Fortran performance and the new f18 front end.

Swinging Modulo Scheduling together with Register Allocation [ Video ] [ Slides ]
Lama Saba (Intel)

VLIW architectures rely heavily on Modulo Scheduling to optimize ILP in loops. Modulo Scheduling can be achieved today in LLVM using the MachinePipeliner pass, which implements a Swing Modulo Scheduler prior to register allocation [1]. For some VLIW architectures, such as those lacking hardware interlocks or the ability to spill registers onto a stack, the MachinePipeliner's decisions become crucial for the success of the register allocation phase, since they affect the latter's decisions to generate splits or spills, which in turn can result in an inefficient or even an unsuccessful resource allocation.

Although the MachinePipeliner aims to schedule with a minimal Initiation Interval, it is structured in a way that facilitates trying larger Initiation Intervals or a different ordering. This structure lends itself to alternative, possibly less aggressive, scheduling retries after more aggressive attempts have failed in register allocation.

This talk introduces this issue and explores how we can achieve successful modulo scheduling and register allocation for such architectures in LLVM by introducing a repetitive rollback-and-retry mechanism for altering scheduling decisions based on the register allocator's outcome, and how we can leverage such an approach to improve the scheduling of VLIW architectures in general.
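The rollback-and-retry idea can be sketched as a driver loop. It starts at the most aggressive Initiation Interval (II), discards any attempt where either scheduling or register allocation fails, and retries with a larger II. In the C sketch below, both phases are replaced by toy stand-ins, and everything in `LoopModel` is a hypothetical input for illustration. The scheduler accepts any II at or above the resource-constrained minimum ResMII = ceil(ops / units). The allocator succeeds when the average register pressure, ceil(total lifetime / II), fits in the register file. Neither stand-in is how the MachinePipeliner or LLVM's register allocators actually work.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy loop model; all fields are hypothetical inputs for illustration. */
typedef struct {
    int num_ops, num_units;   /* operations competing for one functional-unit class */
    int total_lifetime;       /* sum of value lifetimes, in cycles */
    int num_regs;             /* registers available, with no spilling possible */
} LoopModel;

typedef bool (*Stage)(const LoopModel *m, int ii);

/* Resource-constrained lower bound on the Initiation Interval. */
int res_mii(const LoopModel *m) {
    assert(m->num_units > 0);
    return (m->num_ops + m->num_units - 1) / m->num_units;
}

/* Stand-in scheduler: succeeds for any II at or above ResMII. */
bool toy_schedule(const LoopModel *m, int ii) { return ii >= res_mii(m); }

/* Stand-in allocator: average pressure ceil(lifetime / II) must fit the registers. */
bool toy_regalloc(const LoopModel *m, int ii) {
    return (m->total_lifetime + ii - 1) / ii <= m->num_regs;
}

/* Rollback-and-retry: on failure in either phase, discard the attempt and try II + 1. */
int schedule_with_retry(const LoopModel *m, int max_ii, Stage schedule, Stage regalloc) {
    int ii = res_mii(m);
    if (ii < 1) ii = 1;
    for (; ii <= max_ii; ++ii) {
        if (!schedule(m, ii)) continue;   /* no valid modulo schedule at this II */
        if (regalloc(m, ii)) return ii;   /* both phases succeeded */
    }
    return -1;  /* give up, e.g. fall back to the non-pipelined loop */
}
```

In this model, 8 operations on 2 units give ResMII = 4. With 30 cycles of total lifetime and only 6 registers, allocation fails at II = 4 (pressure 8) and succeeds at II = 5 (pressure 6). The driver therefore trades one cycle per iteration for a schedule that can actually be allocated.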

[1] An Implementation of Swing Modulo Scheduling in a Production Compiler - Brendon Cahoon - http://llvm.org/devmtg/2015-10/slides/Cahoon-SwingModuloScheduling.pdf

LLVM for the Apollo Guidance Computer [ Video ] [ Slides ]
Lewis Revill (University of Bath)

Nearly 50 years ago, on 20 July 1969, humans set foot on the moon for the first time. Among the many extraordinary engineering feats that made this possible was the Apollo Guidance Computer, an innovative processor for its time with an instruction set that was designed well before the advent of C. So, 50 years later, why not implement support for it in a modern compiler such as LLVM?

This talk will give a brief overview of some of the architectural features of the Apollo Guidance Computer followed by an account of my implementation of an LLVM target so far. The shortcomings of LLVM when it comes to implementing such an unusual architecture will be discussed along with the workarounds used to overcome them.

Catch dangling inner pointers with the Clang Static Analyzer [ Video ] [ Slides ]
Réka Kovács (Eötvös Loránd University)

C++ container classes provide methods that return a raw pointer to the container's inner buffer. When the container is destroyed, the inner buffer is deallocated. A common bug is to use such a raw pointer after deallocation, which may lead to crashes or other unexpected behavior.

This lightning talk will present a new Clang Static Analyzer checker designed to detect these bugs, implemented last year as a Google Summer of Code project. The checker has found serious problems in popular open source projects with a negligible false positive rate. Future plans include adding support for view-like constructs and non-STL containers.

Cross translation unit test case reduction [ Video ] [ Slides ]
Réka Kovács (Eötvös Loránd University)

C-Reduce, released by Regehr et al. in 2012, is an excellent tool for generating a minimal test case from a C/C++ file that has some specific property (e.g. it triggers a bug). One of the most interesting parts of C-Reduce is Clang Delta, a set of compiler-like transformations implemented using Clang libraries, such as turning a function parameter into a global variable.

With the introduction of the experimental cross translation unit analysis feature in the Clang Static Analyzer, there arose a need to investigate crashes, bugs, or false positive reports that spread across different translation units. Unfortunately, C-Reduce was designed to minimize one translation unit at a time, and some of the Clang Delta transformations cannot be applied to multiple TUs in their original form.

This talk/poster is a status report about a work in progress that aims to make it possible to use C-Reduce for cross translation unit test case reduction.
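The principle that C-Reduce builds on is property-preserving reduction. Repeatedly delete a piece of the test case, and keep the deletion only if an "interestingness" test still passes. The C sketch below applies that loop greedily to an array of chunks (think of lines or top-level declarations) and stops when no single chunk can be removed. This is a deliberately simplified illustration, not C-Reduce's algorithm, which combines many transformation passes such as those in Clang Delta. The example property `contains_3_and_7` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Interestingness test: true if the candidate still shows the property. */
typedef bool (*Interesting)(const int *chunks, size_t n);

/* Example property (hypothetical): the candidate still contains both 3 and 7. */
bool contains_3_and_7(const int *chunks, size_t n) {
    bool has3 = false, has7 = false;
    for (size_t i = 0; i < n; ++i) {
        if (chunks[i] == 3) has3 = true;
        if (chunks[i] == 7) has7 = true;
    }
    return has3 && has7;
}

/* Greedy reduction: try deleting each chunk; keep the deletion if the property
   survives, otherwise undo it. Repeat until a full pass removes nothing. */
size_t reduce(int *chunks, size_t n, Interesting still_interesting) {
    assert(still_interesting(chunks, n));
    bool progress = true;
    while (progress) {
        progress = false;
        size_t i = 0;
        while (i < n) {
            int removed = chunks[i];
            for (size_t k = i; k + 1 < n; ++k) chunks[k] = chunks[k + 1];
            if (still_interesting(chunks, n - 1)) {
                --n;              /* keep the deletion; re-test the chunk now at index i */
                progress = true;
            } else {
                for (size_t k = n - 1; k > i; --k) chunks[k] = chunks[k - 1];
                chunks[i] = removed;   /* undo and move on */
                ++i;
            }
        }
    }
    return n;
}
```

Reducing `{1, 3, 5, 7, 9}` with this property leaves `{3, 7}`. In the cross translation unit setting, a candidate spans several files, and the interestingness test must rebuild and re-analyze all of them together.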

BoFs
RFC: Towards Vector Predication in LLVM IR
Simon Moll (Saarland University), Sebastian Hack (Saarland University)

In this talk, we present the current state of the Explicit Vector Length (EVL) extension for LLVM IR. EVL is the first step towards proper predication and active vector length support in LLVM IR. There has been a recent surge in vector ISAs, be it the RISC-V V extension, ARM SVE or NEC SX-Aurora, all of which pose new demands on LLVM IR. Among their novel features are an active vector length, full predication on all vector instructions, and a register length that is unknown at compile time. EVL provides primitives that are practical for both backends and IR-level automatic vectorizers. At the same time, EVL is compatible with LLVM-SVE, and even existing short SIMD ISAs stand to benefit from its consistent handling of predication.
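The combination of a mask and an active vector length can be modelled lane by lane in scalar C. A lane participates only if its index is below the explicit vector length and its mask bit is set. The function name below is hypothetical. Leaving inactive lanes unchanged is a choice made for this sketch, not a statement about the EVL semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Scalar model of a predicated vector add with an explicit vector length.
   vlen is the full register length; only lanes i < evl with mask[i] set
   are computed. Inactive lanes keep dst's old value (assumption of this sketch). */
void vp_add_model(int *dst, const int *a, const int *b,
                  const bool *mask, size_t evl, size_t vlen) {
    assert(evl <= vlen);
    for (size_t i = 0; i < vlen; ++i)
        if (i < evl && mask[i])
            dst[i] = a[i] + b[i];
}
```

In a strip-mined loop, setting `evl` to the number of remaining elements on the last iteration handles the loop tail without a separate scalar remainder loop. This is one motivation for making the active vector length explicit in the IR.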

IPO --- Where are we, where do we want to go?
Johannes Doerfert (Argonne National Laboratory), Kit Barton (IBM Toronto Lab)

Interprocedural optimizations (IPOs) have historically been weak in LLVM. The strong reliance on inlining can be seen as either a consequence or a cause. Since inlining is not always possible (parallel programs) or beneficial (large functions), the effort to improve IPO has recently seen an upswing again [0,1,2]. To capitalize on this momentum, we would like to talk about the current situation in LLVM and goals for the immediate, but also distant, future.

This open-ended discussion is not aimed at a particular group of people. We expect to discuss potential problems with IPO as well as desirable analyses and optimizations; both experts and newcomers are welcome to attend.

[0] https://lists.llvm.org/pipermail/llvm-dev/2018-August/125537.html

[1] These links do not yet exist but will be added later on.

[2] One link will be an RFC outlining missing IPO capabilities, the other will point to a function attribute deduction rewrite patch (almost finished).

LLVM binutils [ Notes ] [ Slides ]
James Henderson (SN Systems), Jordan Rupprecht (Google)

LLVM has a suite of binary utilities that broadly mirror the GNU binutils suite, with tools such as llvm-readelf, llvm-nm, and llvm-objcopy. These tools are already widely used in testing the rest of LLVM, and are now starting to be adopted as full replacements for the GNU tools in production environments.

This discussion will focus on what more needs to be done to make this migration process easier, how far we need to go to make drop-in replacements for the GNU tools, and what features people want to prioritize. Finally, we will look at the broader future goals of these tools.

RFC: Reference OpenCL Runtime library for LLVM
Andrew Savonichev (Intel), Alexey Sachkov (Intel)

LLVM is used as the foundation for the majority of OpenCL compilers, thanks to the excellent support for the OpenCL C language in the Clang frontend and the modularity of LLVM. Unfortunately, a compiler is not the only component required to develop with OpenCL: users also need a runtime library that implements the OpenCL API. While several OpenCL runtime implementations exist, both open and proprietary, none has community-wide adoption. This leads to fragmentation and duplicated effort across the OpenCL community, and negatively impacts the OpenCL ecosystem in general.

The purpose of this BoF is to bring together all parties interested in a reference OpenCL runtime implementation in LLVM that is designed to be easily extended to support various accelerator devices (CPU/GPU/FPGA/DSP) and that allows users and compiler developers to rapidly prototype OpenCL-specific functionality in LLVM and Clang.

LLVM Interface Stability Guarantees BoF
Stephen Kelly

The goal of this BoF is to create the basis for a new page of documentation enumerating the stability guarantees of interfaces exposed from LLVM products.

Some interfaces are known to make no stability guarantees, such as the Clang C++ API. Others make strict API guarantees, such as the libclang C API, and still others, such as the LLVM IR API, are somewhere in between. Only the latter appears in the LLVM Developer Policy. The remaining interface stability guarantees are mostly tribal knowledge.

A centralized place in the documentation for this information would provide guidelines for developers to follow when changing various parts of LLVM code, and would tell consumers what they can expect and rely upon when using interfaces. This includes both code interfaces and command-line interfaces.

Clang Static Analyzer BoF
Devin Coughlin (Apple), Gabor Horvath (Eotvos Lorand University)

Let's discuss the present and future of the Clang Static Analyzer! We'll start with a brief overview of analyzer features the community has added over the last year. We'll then dive into a discussion of possible focus areas for the next year, including potential deeper integration with clang-tidy.

LLVM Numerics Improvements
Michael Berg (Apple), Steve Canon (Apple)

Some LLVM-based compilers currently provide two modes of floating-point code generation. The first mode, called fast-math, prioritizes performance over numerical precision and accuracy. This mode does not strictly follow the IEEE-754 standard, but has proven useful for applications that do not require this level of precision. The second mode, called precise-math, is where the compiler carefully follows the subset of behavior defined in the IEEE standard that is applicable to conforming hardware targets. This mode is primarily used for compute workloads and wherever fast-math precision is inadequate; however, it runs much slower, as it generally requires more instructions. In practice, neither of these modes is particularly desirable. The fast-math mode ignores a significant portion of the standard pertaining to the handling of special values such as Not a Number (NaN) and Infinity (INF), which causes difficulties for certain workloads when the hardware target computes these values correctly and performance remains critical.

Until recently these two models were mutually exclusive; however, with the addition of IR flags they need not be. For instance, the FastMath metadata module flag drives behavior deemed numerically unsafe when it is enabled, by indiscriminately enabling optimizations. With IR flags, this behavior can be enabled with much finer granularity, allowing fast and precise code forms to coexist in one module. We call this mixed-mode compilation. IR flags can be used individually or in combination to produce the desired floating-point behavior under specified constraints with fine-grained control. Optimization passes have been modified to respect this new kind of control. This talk will describe the recent numerics work and discuss the implications for front ends and back ends built with LLVM.
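Two small C functions show why the modes behave differently. IEEE-754 addition is not associative, so an optimizer that reassociates (as permitted by Clang's `-ffast-math`, or by the `reassoc` flag on an LLVM IR instruction) can change results. Under finite-math assumptions (`-ffinite-math-only`, or the `nnan` IR flag), a NaN self-comparison may legally be folded away. The example assumes it is compiled in precise mode, with no fast-math options.

```c
#include <assert.h>
#include <math.h>

/* Same three operands, different association. In precise mode the compiler
   must evaluate exactly the grouping written in the source. */
float sum_left(float a, float b, float c)  { return (a + b) + c; }
float sum_right(float a, float b, float c) { return a + (b + c); }

/* NaN is the only value that compares unequal to itself. If the compiler may
   assume no NaNs, it is allowed to fold this to 0. */
int is_nan(float x) { return x != x; }
```

With `a = 1e20f`, `b = -1e20f`, and `c = 1.0f`, the left grouping gives 1.0. In the right grouping, `b + c` rounds back to -1e20, so the result is 0.0. Per-instruction flags let a front end allow such rewrites in one function while keeping code like `is_nan` precise in the same module.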

LLVM Foundation BoF
LLVM Foundation Board of Directors

Ask the LLVM Foundation Board of Directors anything, get program updates.

Posters
Clava: C/C++ source-to-source from CMake using LARA [ Poster ]
João Bispo (FEUP/INESCTEC)

Clava is a Clang-based source-to-source compiler that executes scripts written in LARA, a superset of JavaScript with additional syntax for AST analysis and transformation.

Clava aims to improve on Clang's source-to-source capabilities by providing a more convenient and powerful way to analyze, transform and generate C/C++ code.

Although Clava is a stand-alone tool, we will present the Clava CMake plug-in, which makes it easy to apply LARA scripts to C/C++ CMake projects. Clava is open source and runs on Linux, Windows and macOS.

Safely Optimizing Casts between Pointers and Integers [ Poster ]
Juneyoung Lee (Seoul National University, Korea), Chung-Kil Hur (Seoul National University, Korea), Ralf Jung (MPI-SWS, Germany), Zhengyang Liu (University of Utah, USA), John Regehr (University of Utah, USA), Nuno P. Lopes (Microsoft Research, UK)

In this talk, we present a list of optimizations that soundly remove casts between pointers and integers. In LLVM, a pointer is more than just an integer: LLVM allows a pointer to track its underlying object, and the rule for finding that object is defined by the based-on relation. This allows LLVM to aggressively optimize loads and stores, but makes the meaning of pointer-integer casts complicated. The result is a conflict between existing optimizations, which causes long-standing miscompilation bugs such as 34548.

To fix this, we suggest disabling the folding of inttoptr(ptrtoint(p)) to p and using a safe workaround to remove such cast pairs instead. This folding matters because it removes a significant portion of such cast pairs. We'll show that even with the folding disabled, the majority of casts can be removed by carefully adding new optimizations and modifying existing ones. After these updates, performance remains comparable to the original LLVM.

Scalar Evolution Canon: Click! Canonicalize SCEV and validate it by Z3 SMT solver! [ Poster ]
Lin-Ya Yu (Xilinx), Alexandre Isoard (Xilinx)

A scalar evolution (SCEV) is an analyzed expression that represents how the value of a scalar variable changes as the code executes [0]. SCEV is implemented as an analysis pass and is widely used by many analyses and optimizations in LLVM, such as loop strength reduction, induction variable substitution, and memory access analysis. However, it is difficult to find a canonical form for SCEV that meets the needs of all other passes. Here, we develop SCEV Canon to perform canonicalization and further simplification on SCEV.

Z3, a satisfiability modulo theories (SMT) solver from Microsoft Research, is used in this work to verify the correctness of canonicalized SCEVs. Z3 can also help us check the equivalence of SCEVs produced by the SCEV implementations in different releases of LLVM. This poster shares the whole process of canonicalizing SCEV without modifying the scalar evolution pass, and of verifying and testing the generated SCEVs. We also aim to open a discussion about further simplifications that can be done on SCEV.

[0] https://subscription.packtpub.com/book/application_development/9781785280801/5/ch05lvl1sec36/scalar-evolution
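As a rough illustration of what checking the "equivalence of SCEVs" involves, the Python sketch below evaluates an add-recurrence {c0,+,c1,+,...} at iteration i using the closed form sum of c_j * C(i, j), folds the sum of two recurrences on the same loop into one, and checks that the folded and unfolded forms agree. It is not SCEV Canon or LLVM's ScalarEvolution API: the function names are hypothetical, i32 wraparound is modeled with a mask, and a bounded exhaustive check stands in for the Z3 query, which would instead prove the property for every iteration.

```python
from math import comb

MASK = (1 << 32) - 1  # model i32 wraparound


def eval_addrec(coeffs, i):
    """Value of the add-recurrence {c0,+,c1,+,...,+,ck} at iteration i.

    Uses the closed form sum_j c_j * C(i, j), reduced modulo 2^32.
    """
    return sum(c * comb(i, j) for j, c in enumerate(coeffs)) & MASK


def add_addrecs(a, b):
    """Fold {a0,+,a1,...} + {b0,+,b1,...} on the same loop into one recurrence."""
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


def equivalent(f, g, trip_count=1000):
    """Bounded stand-in for an SMT query: do f and g agree on every iteration?"""
    return all(f(i) == g(i) for i in range(trip_count))


def unfolded(i):
    # {3,+,2} + {1,+,5}, evaluated term by term
    return (eval_addrec([3, 2], i) + eval_addrec([1, 5], i)) & MASK


def folded(i):
    # the same sum after folding into a single recurrence {4,+,7}
    return eval_addrec(add_addrecs([3, 2], [1, 5]), i)


print(equivalent(unfolded, folded))  # True
# {0,+,1,+,1} matches i*(i+1)/2, but not the plain induction variable {0,+,1}
print(equivalent(lambda i: eval_addrec([0, 1, 1], i),
                 lambda i: (i * (i + 1) // 2) & MASK))  # True
print(equivalent(lambda i: eval_addrec([0, 1, 1], i),
                 lambda i: eval_addrec([0, 1], i)))  # False
```

A real Z3 encoding would represent i and the coefficients as 32-bit bit-vectors and ask the solver for a counterexample to f(i) == g(i); an unsatisfiable result proves equivalence for all i rather than only the first 1000 iterations.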

Splendid GVN: Partial Redundancy Elimination for Algebraic Simplification [ Poster ]
Li-An Her (National Tsing Hua University), Jenq-Kuen Lee (National Tsing Hua University)

Modern workloads such as neural network computation, GPS and Wi-Fi signal processing, and image processing depend heavily on enormous numbers of linear algebra operations. Algebraic simplification improves performance for increasingly complex computations such as convolutions in CNNs and the Sobel operator, and inner products in the discrete cosine transform and the FFT used in signal processing. LLVM provides several optimization passes for algebraic simplification, constant folding, copy propagation, etc.; one of them is global value numbering (GVN). These passes work well except when they encounter branches and non-local cases. One such case calls for partial redundancy elimination (PRE): two or more instructions are redundant or congruent, but they are in different blocks. Even though eliminating one of the redundant instructions would not introduce a logic error, the compiler lacks a rule for it and skips the elimination. Thus, algebraic simplification fails to optimize code when PRE opportunities occur. GVN provides a PRE mechanism based on lazy code motion, but it cannot provide more accurate congruence information because of loops and Φ-nodes. New GVN handles such cases and provides finer-grained congruence information, but it lacks a PRE mechanism and ignores such opportunities.

In this paper, we propose Splendid GVN, which adds a PRE mechanism to New GVN in LLVM 7.0.0. When a partial redundancy is found, our pass checks safety and applies code hoisting to eliminate it. The original GVN uses a less precise algorithm and can only perform lazy code motion, which risks increasing code size. The original Hoist GVN cannot handle PRE and builds on GVN instead of New GVN, so it cannot obtain finer-grained information in the presence of loops and may miss opportunities for further elimination. Experiments show that Splendid GVN performs hoisting for PRE on 2 qualifying PRE programs from the LLVM GVN test directory (available in the source code). Splendid GVN reduces total code size by 18.37% and 7% compared to the original 2 programs and to the New GVN results, respectively.
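To make the congruence-plus-hoisting idea concrete, here is a toy Python sketch of hash-based value numbering (commutative operands are put in canonical order, so a + b and b + a get the same number) and of hoisting a value computed on both arms of an if/else diamond into the branching block. It is not Splendid GVN or LLVM's NewGVN: Instr, ValueNumbering and hoist_common are hypothetical names, the safety check is reduced to "both operands are already available in the branching block", and all operations are assumed to be free of side effects.

```python
from dataclasses import dataclass

COMMUTATIVE = {"add", "mul"}


@dataclass(frozen=True)
class Instr:
    dest: str
    op: str
    lhs: str
    rhs: str


class ValueNumbering:
    """Hash-based value numbering: congruent expressions share a number."""

    def __init__(self):
        self.table = {}   # (op, vn_lhs, vn_rhs) -> value number
        self.vn_of = {}   # SSA name -> value number
        self.next_vn = 0

    def _fresh(self):
        self.next_vn += 1
        return self.next_vn

    def number_of(self, name):
        # Names not defined by a numbered instruction (e.g. arguments) get fresh numbers.
        if name not in self.vn_of:
            self.vn_of[name] = self._fresh()
        return self.vn_of[name]

    def number(self, instr):
        a, b = self.number_of(instr.lhs), self.number_of(instr.rhs)
        if instr.op in COMMUTATIVE:
            a, b = min(a, b), max(a, b)  # a+b and b+a hash the same
        key = (instr.op, a, b)
        if key not in self.table:
            self.table[key] = self._fresh()
        self.vn_of[instr.dest] = self.table[key]
        return self.table[key]


def hoist_common(then_block, else_block, available, vn):
    """Hoist values computed on both arms of an if/else diamond.

    Returns (hoisted, new_then, new_else, rename); rename maps each removed
    else-arm name to the hoisted name, so its uses can be rewritten.
    """
    then_vns = {vn.number(i): i for i in then_block}
    else_vns = {vn.number(i): i for i in else_block}
    hoisted, rename = [], {}
    new_then, new_else = list(then_block), list(else_block)
    for v, instr in then_vns.items():
        # Simplified safety check: operands must be available before the branch.
        if v in else_vns and {instr.lhs, instr.rhs} <= available:
            hoisted.append(instr)
            new_then.remove(instr)
            new_else.remove(else_vns[v])
            rename[else_vns[v].dest] = instr.dest
    return hoisted, new_then, new_else, rename


vn = ValueNumbering()
then_arm = [Instr("t1", "add", "a", "b"), Instr("t2", "mul", "t1", "c")]
else_arm = [Instr("e1", "add", "b", "a"), Instr("e2", "sub", "e1", "c")]
hoisted, new_then, new_else, rename = hoist_common(then_arm, else_arm, {"a", "b", "c"}, vn)
print([i.dest for i in hoisted], rename)  # ['t1'] {'e1': 't1'}
```

Here t1 = a + b and e1 = b + a receive the same value number, so t1 moves above the branch and e2 would read t1 instead of e1; t2 and e2 stay in their arms because their operations differ.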

An alternative OpenMP Backend for Polly [ Poster ]
Michael Halkenhäuser (TU Darmstadt)

Polly, LLVM's polyhedral optimization framework, can automatically exploit thread-level parallelism through OpenMP. Currently, the user can only influence the number of threads used, while other OpenMP parameters such as the scheduling type and chunk size are set to fixed values. This, in turn, limits a user's ability to adapt the optimization process to a given problem.

In this work, we present an alternative OpenMP backend for Polly, which provides additional customization options to the user and is based on the LLVM OpenMP runtime. We evaluate our new backend and the influence of the new customization options on performance, and compare it with Polly's existing OpenMP backend.
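The scheduling type and chunk size mentioned above change how loop iterations are distributed across threads. The Python sketch below models only the OpenMP rule for schedule(static, chunk): the iteration space is cut into chunks of `chunk` consecutive iterations, and the chunks are dealt out to threads round-robin in thread-number order. It does not reflect Polly's code generation or the LLVM OpenMP runtime's internals; static_schedule is a hypothetical helper.

```python
def static_schedule(n_iters, n_threads, chunk):
    """Iterations each thread runs under OpenMP schedule(static, chunk).

    Chunks of `chunk` consecutive iterations are assigned to threads
    round-robin, in order of thread number.
    """
    if chunk < 1:
        raise ValueError("chunk must be positive")
    assignment = {t: [] for t in range(n_threads)}
    for chunk_id, start in enumerate(range(0, n_iters, chunk)):
        thread = chunk_id % n_threads
        assignment[thread].extend(range(start, min(start + chunk, n_iters)))
    return assignment


# Same loop (10 iterations, 3 threads), three different chunk sizes.
for chunk in (1, 2, 5):
    print(chunk, static_schedule(10, 3, chunk))
```

With chunk=2, threads 0 and 1 each run four iterations while thread 2 runs only iterations 4 and 5; with chunk=5, thread 2 gets no work at all. This is the kind of load-balance effect that a fixed scheduling configuration leaves the user unable to tune.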

Does the win32 clang compiler executable really need to be over 21MB in size? [ Poster ]
Russell Gallop (SN Systems), Greg Bedwell (SN Systems)

The title of this lightning talk comes from a bug filed in the early days of the PS4 compiler. It noted that the LLVM-based PS4 compiler was more than 3 times larger than the PS3 compiler. Since then, it has almost doubled to over 40MB. For a compiler that targets one system, this seems excessive. A large executable can hurt cache performance and costs time when it must be transferred for distributed builds.

In this lightning talk I will look at where this size comes from and how it can be managed.

Enabling Multi- and Cross-Language Verification with LLVM [ Poster ]
Jack J. Garzella (University of Utah), Marek Baranowski (University of Utah), Shaobo He (University of Utah), Zvonimir Rakamaric (University of Utah)

Developers nowadays regularly use numerous programming languages with different characteristics and trade-offs. Unfortunately, implementing a software verifier for a new language from scratch is a large and tedious undertaking, requiring expert knowledge in multiple domains, such as compilers, verification, and constraint solving. Hence, only a tiny fraction of the languages in use have readily available software verifiers to aid in the development of correct programs. In the past decade, there has been a trend of leveraging popular compiler intermediate representations (IRs), such as LLVM IR, when implementing software verifiers. The main advantage is avoiding the implementation of large front-ends and instead relying on the typically simple, canonical format of an IR. In addition, processing IR promises out-of-the-box multi- and cross-language verification since, at least in theory, a verifier ought to be able to handle a program in any programming language (or combination of languages) that can be compiled into the IR. In practice, though, to the best of our knowledge, nobody has explored the feasibility and ease of such integration of new languages. This talk introduces a methodology for adding support for a new language to an IR-based verification toolflow. Using our methodology, we extend an existing verifier called SMACK with support for 7 additional languages. We assess the quality of our extensions and the proposed methodology through several case studies, and we describe the lessons we learned in the process.

Instruction Tracing and dynamic codegen analysis to identify unique llvm performance issues. [ Poster ]
Biplob (IBM)

Performance analysis of the machine code generated by a compiler can be carried out in different ways and can also depend on the application in question. Common methods use some form of profiling of a running program, which generally provides statistical information about certain data and events. While this method does give important insights into a performance problem, some issues are more clearly understood when the compiled application is actually run and the dynamic instructions of hot execution paths are traced and analyzed in a small execution window. Trace records contain instructions and data, memory addresses and other information, which provide complete visibility into the workings of an application.

While tracing is very useful in micro-architecture analysis, we will focus on how these traces can benefit compiler performance analysis. In this talk we will look at some code-gen issues that were better identified when applications compiled by LLVM and other compilers were traced for hot code sections on an IBM POWER9 processor.

Handling all Facebook requests with JITed C++ code [ Poster ]
Huapeng Zhou (Facebook), Yuhan Guo (Facebook)

Facebook needs an efficient scripting framework to enable fast iteration of HTTP request handling logic in our L7 reverse proxy. A C++ scripting engine and code deployment ecosystem was created to compile, link and execute C++ scripts at run-time, using Clang and the LLVM ORC APIs. The framework allows developers to write business logic and unit tests in C++ script, as well as debug using GDB. Profiling using perf is also supported for PGO. This new framework outperformed a previously used scripting language by up to 4X in execution time.

To run the C++ scripts in an ABI-compatible way, a PCH (pre-compiled header) is built statically to provide declarations and definitions of the necessary dependent types and methods. Clang APIs are then used at run-time to transform source code into LLVM IR, which is then passed through LLVM ORC layers for linking and optimization. The Clang/LLVM toolchain is statically linked into the main binary to ensure compatibility between the PCH and the C++ scripts. As a result, scripts can be deployed in real time without any change to the main binary.

Implementing SPMD control flow in LLVM using reconverging CFGs [ Poster ]
Fabian Wahlster (Technische Universität München), Nicolai Hähnle (Advanced Micro Devices)

Compiling programs for an SPMD execution model, e.g. for GPUs or for whole program vectorization on CPUs, requires a transform from the thread-level input program into a vectorized wave-level program in which the values of the original threads are stored in corresponding lanes of vectors. The main challenge of this transform is handling divergent control flow, where threads take different paths through the original CFG. A common approach, which is currently taken by the AMDGPU backend in LLVM, is to first structurize the program as a simplification for subsequent steps.

However, structurization is overly conservative. It can be avoided when control flow is uniform, i.e. not divergent. Even where control flow is divergent, structurization is often unnecessary. Moreover, LLVM's StructurizeCFG pass relies on region analysis, which limits the extent to which it can be evolved.

We propose a new approach to SPMD vectorization based on the notion of a reconverging CFG: a CFG is reconverging if, for every divergent branch, one of the successors is a post-dominator of the branching block. This property is weaker than structuredness, and we show that it can be achieved while preserving uniform branches and inserting fewer new basic blocks than structurization requires. It is also sufficient for code generation, because it guarantees that threads which "leave" a wave at divergent branches will be able to rejoin it later.
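The reconverging property can be checked directly from its definition. The Python sketch below computes post-dominator sets with the standard iterative dataflow equation pdom(b) = {b} ∪ ⋂ pdom(s) over the successors s of b, then tests whether every divergent branch has a successor that post-dominates the branching block. It assumes a CFG with a single exit block, given as an adjacency dict, and takes the set of divergent branches as input rather than computing it; the function names are hypothetical and this is not the AMDGPU backend's code.

```python
def post_dominators(succs, exit_block):
    """Post-dominator sets via the iterative dataflow equation.

    pdom(exit) = {exit}; pdom(b) = {b} | intersection of pdom(s) for s in succs[b].
    Assumes every block reaches the single exit block.
    """
    blocks = set(succs)
    pdom = {b: set(blocks) for b in blocks}   # start from "everything"
    pdom[exit_block] = {exit_block}
    changed = True
    while changed:
        changed = False
        for b in blocks - {exit_block}:
            succ_sets = [pdom[s] for s in succs[b]]
            if not succ_sets:
                raise ValueError(f"block {b} has no successors but is not the exit")
            new = {b} | set.intersection(*succ_sets)
            if new != pdom[b]:
                pdom[b] = new
                changed = True
    return pdom


def is_reconverging(succs, exit_block, divergent):
    """True if every divergent branch has a successor that post-dominates it."""
    pdom = post_dominators(succs, exit_block)
    return all(any(s in pdom[b] for s in succs[b]) for b in divergent)


triangle = {"A": ["B", "C"], "B": ["C"], "C": []}             # if-then
diamond = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}  # if-then-else
print(is_reconverging(triangle, "C", divergent={"A"}))  # True
print(is_reconverging(diamond, "D", divergent={"A"}))   # False
print(is_reconverging(diamond, "D", divergent=set()))   # True
```

The if-then triangle passes because its successor C post-dominates A. The diamond fails only when its branch is divergent, since neither arm post-dominates A; if the branch is uniform, it imposes no requirement, which matches the abstract's point that uniform branches can be preserved as they are.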

LLVM for the Apollo Guidance Computer [ Poster ]
Lewis Revill (University of Bath)

Nearly 50 years ago, on 20 July 1969, humans set foot on the Moon for the first time. Among the many extraordinary engineering feats that made this possible was the Apollo Guidance Computer, an innovative processor for its time with an instruction set that was devised well before the advent of C. So, 50 years later, why not implement support for it in a modern compiler such as LLVM?

This talk will give a brief overview of some of the architectural features of the Apollo Guidance Computer followed by an account of my implementation of an LLVM target so far. The shortcomings of LLVM when it comes to implementing such an unusual architecture will be discussed along with the workarounds used to overcome them.

LLVM Miner: Text Analytics based Static Knowledge Extractor [ Poster ]
Hameeza Ahmed (NED University of Engineering and Technology), Muhammad Ali Ismail (NED University of Engineering and Technology)

A compiler converts high-level language code into assembly language, applying optimizations along the way. A compiler has three phases, namely the front end, the middle end and the back end. Low Level Virtual Machine (LLVM) is an open-source framework that provides all three of these stages. One reason for LLVM's wide adoption is its powerful optimizer, or middle end. There are many opportunities to optimize the Intermediate Representation (IR) code generated by the front end. Before applying any optimization, significant effort is devoted to detailed analysis of the IR in order to extract static information hidden in the source code.

Currently, the standard mechanism for analyzing IR code is to use analysis passes written in LLVM itself. Each time some information is required from the IR, a pass is written or reused using LLVM's core APIs. This approach has proved complex for novice programmers who are unfamiliar with the LLVM coding style and its heavy use of advanced C++ concepts. As a result, a significant amount of time is spent on learning LLVM programming rather than on the required compile-time code analysis. An easier mechanism is therefore needed to perform static code analysis in LLVM.

In this work, we present LLVM Miner, which simplifies static IR-level analysis in the LLVM compiler toolchain. LLVM Miner performs text analytics in order to extract relevant information from given IR code. The IR generated by the front end is passed through the proposed miner, which easily extracts hidden static features. The approach has been tested using a set of 5 mixed benchmark codes, namely bfs, connected components, grep, histogram, and kmeans. The experiments are conducted using R scripts to determine instruction frequency and application trend. Instruction frequency shows the count of each instruction in the given IR code, represented by means of a bar graph and a word cloud. The application trend is then obtained by grouping individual instructions into categories such as branch, compute, function call, I/O read/write, memory consumption, and memory read/write operations. The application trend shows the proportion of different classes of operations in a given code using bar graphs, which lets us tell whether an application is compute-bound, memory-bound, I/O-bound, etc., from static code-level features. Analyzing LLVM IR with text mining techniques appears to be a promising direction for studying significant features hidden in source code. Text analytics of IR is expected to be an easier and less costly solution, in terms of both time and effort, than conventional LLVM analysis passes.
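A minimal Python sketch of the instruction-frequency and application-trend idea, assuming plain-text .ll input: it pulls the opcode out of each instruction line with a regex heuristic, counts opcodes, and groups them into categories. The category mapping is our own illustrative assumption (there is no I/O class, since that would require inspecting call targets), the parsing is deliberately approximate (for example, the case lines of a multi-line switch are miscounted), and the original work used R scripts rather than Python.

```python
import re
from collections import Counter

# Illustrative category mapping (an assumption, not LLVM Miner's own).
CATEGORY = {
    "br": "branch", "switch": "branch", "indirectbr": "branch", "ret": "branch",
    "call": "call", "invoke": "call",
    "load": "memory", "store": "memory", "alloca": "memory",
    "getelementptr": "memory",
}
COMPUTE = {"add", "sub", "mul", "udiv", "sdiv", "urem", "srem", "shl", "lshr",
           "ashr", "and", "or", "xor", "fadd", "fsub", "fmul", "fdiv",
           "icmp", "fcmp"}
ASSIGN = re.compile(r"^%[\w.\-]+\s*=\s*(.*)$")   # "%x = <instruction>"
SKIP_PREFIXES = ("define", "declare", "}", "]", "!", "@", "attributes",
                 "target", "source_filename")


def opcode(line):
    """Opcode of one textual IR instruction line, or None (heuristic)."""
    line = line.split(";", 1)[0].strip()          # drop trailing comments
    if not line or line.endswith(":") or line.startswith(SKIP_PREFIXES):
        return None                               # blank, label, or not an instruction
    match = ASSIGN.match(line)
    tokens = (match.group(1) if match else line).split()
    while tokens and tokens[0] in ("tail", "musttail", "notail"):
        tokens = tokens[1:]                       # "tail call" counts as "call"
    return tokens[0] if tokens else None


def category(op):
    return CATEGORY.get(op, "compute" if op in COMPUTE else "other")


def profile(ir_text):
    """Instruction frequency and per-category counts for a module's text."""
    ops = [op for op in map(opcode, ir_text.splitlines()) if op]
    return Counter(ops), Counter(category(op) for op in ops)


sample = """
define i32 @sum(ptr %a, i32 %n) {
entry:
  br label %loop
loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %acc = phi i32 [ 0, %entry ], [ %acc.next, %loop ]
  %p = getelementptr i32, ptr %a, i32 %i
  %v = load i32, ptr %p
  %acc.next = add i32 %acc, %v
  %i.next = add i32 %i, 1
  %done = icmp eq i32 %i.next, %n
  br i1 %done, label %exit, label %loop
exit:
  ret i32 %acc.next
}
"""
freq, trend = profile(sample)
total = sum(trend.values())
for cat, n in trend.most_common():
    print(f"{cat:8s} {n / total:.0%}")
```

On this small loop, branch and compute instructions each make up 30% of the ten instructions and memory instructions 20%, with the two phis falling into "other"; a real miner would need a more careful classification before drawing compute-bound versus memory-bound conclusions.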

Function Merging by Sequence Alignment [ Poster ]
Rodrigo Rocha (University of Edinburgh), Pavlos Petoumenos (University of Edinburgh), Zheng Wang (Lancaster University), Murray Cole (University of Edinburgh), Hugh Leather (University of Edinburgh)

Resource-constrained devices for embedded systems are becoming increasingly important. In such systems, memory is highly constrained, making code size in most cases even more important than performance. Compared to more traditional platforms, memory is a larger part of the cost, and code occupies much of it. Despite that, compilers make little effort to reduce code size. One key technique attempts to merge the bodies of similar functions. However, production compilers only apply this optimization to identical functions, while research compilers improve on that by merging the few functions with identical control-flow graphs and signatures. Overall, existing solutions are insufficient, and we end up having to either increase cost by adding more memory or remove functionality from programs.

We introduce a novel technique that can merge arbitrary functions through sequence alignment, a bioinformatics algorithm for identifying regions of similarity between sequences. We combine this technique with an intelligent exploration mechanism to direct the search towards the most promising function pairs. Our approach is more than 2.4x better than the state-of-the-art, reducing code size by up to 25%, with an overall average of 6%, while introducing an average compilation-time overhead of only 15%. When aided by profiling information, this optimization can be deployed without any significant impact on the performance of the generated code.
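The core of the alignment step can be sketched with the classic Needleman-Wunsch dynamic program over opcode sequences. In this hypothetical Python version, only identical opcodes may be paired, since those are the instructions a merged function could share; everything else is aligned against a gap (None) and would have to be guarded by a function-identifier parameter in the merged code. The scores (match=2, gap=-1) are arbitrary choices, and the sketch omits operand and type checks, the search for promising function pairs, and code generation for the merged function.

```python
def align(seq_a, seq_b, match=2, gap=-1):
    """Needleman-Wunsch global alignment of two opcode sequences.

    Only identical opcodes may be paired (the merge candidates); any other
    element is aligned against a gap, represented as None.
    """
    n, m = len(seq_a), len(seq_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(score[i - 1][j] + gap, score[i][j - 1] + gap)
            if seq_a[i - 1] == seq_b[j - 1]:
                best = max(best, score[i - 1][j - 1] + match)
            score[i][j] = best
    # Trace back from the bottom-right corner to recover the alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and seq_a[i - 1] == seq_b[j - 1]
                and score[i][j] == score[i - 1][j - 1] + match):
            pairs.append((seq_a[i - 1], seq_b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((seq_a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, seq_b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


f = ["load", "add", "mul", "store", "ret"]
g = ["load", "sub", "mul", "store", "ret"]
pairs = align(f, g)
shared = sum(1 for a, b in pairs if a is not None and b is not None)
print(shared, len(pairs))  # 4 6
```

The two functions differ only in one arithmetic instruction, so four of their five instructions can be shared and the merged body has 6 aligned slots instead of 10 separate instructions, before accounting for the extra branching that selects between add and sub.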

Compilation and optimization with security annotations [ Poster ]
Son Tuan Vu (LIP6), Karine Heydemann (LIP6), Arnaud de Grandmaison (ARM), Albert Cohen (Google)

Program analysis and program transformation systems need to express additional program properties, to specify test and verification goals, and to enhance their effectiveness. Such annotations are typically inserted into the representation on which the tool operates; e.g., source level for establishing compliance with a specification, and binary level for the validation of secure code. While several annotation languages have been proposed, these typically target the expression of functional properties. For the purpose of implementing secure code, there has been little effort to support non-functional properties about side-channels or faults. Furthermore, analyses and transformations making use of such annotations may target different representations encountered along the compilation flow.

We extend an annotation language to express a wider range of functional and non-functional properties, enabling security-oriented analyses and influencing the application of code transformations along the compilation flow. We translate this language to the different compiler representations from abstract syntax down to binary code. We explore these concepts through the design and implementation of an optimizing, annotation-aware compiler, capturing annotations from the program source, propagating and emitting them in the binary, so that binary-level analysis tools can use them.

Leveraging Polyhedral Compilation in Chapel Compiler [ Poster ]
Sahil Yerawar (IIT Hyderabad), Siddharth Bhat (IIIT Hyderabad), Michael Ferguson (Cray Inc.), Philip Pfaffe (Karlsruhe Institute of Technology), Ramakrishna Upadrasta (IIT Hyderabad)

Chapel is an emerging parallel programming language developed with the aim of providing better performance in High-Performance Computing as well as accessibility for newcomers. It relies on LLVM as one of its backends. This talk shows how the polyhedral compilation techniques available in Polly are utilized by the Chapel compiler. We will share with Polly and LLVM developers our experience of using Polly's loop optimizer in a new setting. In particular, the talk will discuss how the Chapel compiler can benefit from the optimizations available in Polly, including GPGPU code generation.

LLVM on AVR - textual IR as a powerful tool for making "impossible" compilers [ Poster ]
Carl Peto (Swift for Arduino/Petosoft)

I have built a prototype compiler for a subset of the Swift language targeting the Arduino UNO platform, a radically different use of the language; it will be demonstrated on stage and available to use and test. This works despite the Swift compiler and front end having limited support for such a different back end.

Key to the success was separating the compilation into two parts: first emitting textual LLVM IR (using a standard toolchain), then compiling the LLVM IR files into machine code using a custom-built llc. This approach improves debugging, especially of deployed products, as well as separation of concerns. Ultimately, it could be used as a template for other "impossible" compilers, such as Swift to WebAssembly, Go to OpenGL shaders and more.

Vectorizing Add/Sub Expressions with SLP [ Poster ]
Vasileios Porpodas (Intel Corporation, USA), Rodrigo C. O. Rocha (University of Edinburgh, UK), Evgueni Brevnov (Intel Corporation, USA), Luís F. W. Góes (PUC Minas, Brazil), Timothy Mattson (Intel Corporation, USA)

The SLP Vectorizer is LLVM's second vectorizer (after the Loop Vectorizer). It performs auto-vectorization of straight-line code. It works by first exploring the scalar code for vectorizable patterns (groups), and then by replacing each group with its vectorized form.

This talk presents the existing design of the SLP vectorizer and shows how it fails to vectorize simple IR inputs with Add/Sub (or Mul/Div) expression trees. We propose specific improvements to the current design that will let us effectively handle such code. We named this design SuperNode SLP (SN-SLP) because it extends the SLP graph to include new "fat" nodes that include multiple instructions. This talk also presents our detailed plan for upstreaming the bulk of this work in a sequence of patches.
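The Add/Sub problem can be illustrated on expression trees. If one lane computes A[0] - B[0] + C[0] and the next computes A[1] + C[1] - B[1], the trees are not isomorphic, so a plain SLP graph cannot pair them. The hypothetical Python sketch below flattens each add/sub tree into signed terms, rebuilds it in a canonical order (positive terms first, each group sorted by array name), and compares the resulting shapes. It only illustrates why reordering across add/sub chains helps; it is not the SN-SLP algorithm or LLVM's SLPVectorizer code. Such reordering is valid for integer arithmetic; for floating point it would require reassociation to be permitted (e.g. by fast-math flags).

```python
def flatten(expr, sign=+1):
    """Flatten an add/sub tree into a list of (sign, leaf) terms."""
    if isinstance(expr, str):
        return [(sign, expr)]
    op, lhs, rhs = expr
    rhs_sign = sign if op == "add" else -sign   # "sub" negates its right operand
    return flatten(lhs, sign) + flatten(rhs, rhs_sign)


def base(leaf):
    return leaf.split("[")[0]                   # "A[1]" -> "A"


def canonicalize(expr):
    """Rebuild as a left-leaning chain: positive terms first, each group sorted by base."""
    terms = flatten(expr)
    pos = sorted((t for s, t in terms if s > 0), key=base)
    neg = sorted((t for s, t in terms if s < 0), key=base)
    tree = pos[0] if pos else "0"
    for t in pos[1:]:
        tree = ("add", tree, t)
    for t in neg:
        tree = ("sub", tree, t)
    return tree


def shape(expr):
    """Opcode/operand-base skeleton; lanes with equal shapes are isomorphic."""
    if isinstance(expr, str):
        return base(expr)
    op, lhs, rhs = expr
    return (op, shape(lhs), shape(rhs))


lane0 = ("add", ("sub", "A[0]", "B[0]"), "C[0]")   # A[0] - B[0] + C[0]
lane1 = ("sub", ("add", "A[1]", "C[1]"), "B[1]")   # A[1] + C[1] - B[1]
print(shape(lane0) == shape(lane1))                              # False
print(shape(canonicalize(lane0)) == shape(canonicalize(lane1)))  # True
```

After canonicalization both lanes read A + C - B, so each position in the tree holds the same operation and the same array across lanes, which is what a vectorizer needs in order to emit one vector add followed by one vector sub.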

Adding support for C++ contracts to Clang [ Poster ]
Javier López-Gómez (University Carlos III of Madrid), J. Daniel García (University Carlos III of Madrid)

A language that supports contract checking makes it possible to detect programming errors. Making this information available to the compiler may also enable it to perform additional optimizations.

This paper presents our implementation of the P0542R5 proposal (now part of the C++20 working draft).

Optimizing Nondeterminacy: Exploiting Race Conditions in Parallel Programs [ Poster ]
William S. Moses (MIT CSAIL)

As computation moves towards parallel programming models, writing efficient parallel programs becomes paramount. As a result, there have been several efforts (Tapir, HPVM, among others) to augment serial compilers such as LLVM with a first-class representation of parallelism. Such representations, in principle, permit the compiler to both analyze and optimize parallel programs.

A major difference between serial and parallel programs is that in many parallel runtimes, one cannot make any assumptions about the ordering of various logical tasks. This nondeterminism creates an opportunity for the compiler: since any ordering is valid, the compiler can also reorder tasks if it deems this beneficial.

This talk will discuss how the compiler can take advantage of this nondeterminacy through a number of example optimizations, taking a look at their theoretical implications as well as how they perform when implemented atop the Tapir extension to LLVM.