2019 European LLVM Developers Meeting
About
The meeting serves as a forum for LLVM, Clang, LLDB and other LLVM project
developers and users to get acquainted, learn how LLVM is used, and exchange
ideas about LLVM and its (potential) applications.
The conference includes:
Keynote
MLIR: Multi-Level Intermediate Representation for Compiler Infrastructure
[ Video ]
[ Slides ]
Tatiana Shpeisman (Google), Chris Lattner (Google)
This talk will give an overview of the Multi-Level Intermediate Representation
(MLIR) - a new intermediate representation designed to be unified, flexible,
and extensible, to be language-agnostic, and to serve as a base compiler
infrastructure. MLIR shares similarities with
traditional CFG-based three-address SSA representations (including LLVM IR or
SIL), but it also introduces notions from the polyhedral domain as first class
concepts. The notion of dialects is a core concept of MLIR extensibility,
allowing multiple levels in a single representation. MLIR supports
continuous lowering from dataflow graphs to high-performance target-specific
code through partial specialization between dialects. We will illustrate in
this talk how MLIR can be used to build an optimizing compiler infrastructure
for deep learning applications.
MLIR supports multiple front- and back-ends and uses LLVM IR as one of its
primary code generation targets. MLIR also relies heavily on design principles
and practices developed by the LLVM community. For example, it depends on LLVM
APIs and programming idioms to minimize IR size and maximize optimization
efficiency. MLIR uses LLVM testing utilities such as FileCheck to ensure robust
functionality at every level of the compilation stack, TableGen to express IR
invariants, and it leverages LLVM infrastructure such as dominance analysis to
avoid implementing all the necessary compiler functionalities from scratch. At
the same time, it is a brand new IR, both more restrictive and more general
than LLVM IR in different aspects of its design. We believe that the LLVM
community will find in MLIR a useful tool for developing new compilers,
especially in machine learning and other high-performance domains.
|
Technical talks
Switching a Linux distribution's main toolchains to LLVM/Clang
[ Video ]
[ Slides ]
Bernhard Rosenkränzer (Linaro, OpenMandriva, LinDev)
OpenMandriva is the first general-purpose Linux distribution that has
switched its primary toolchain to Clang -- this talk will give an overview of
what we did, what problems we've faced, and where we're still having
problems (usually worked around by using gcc for some packages).
|
Just compile it: High-level programming on the GPU with Julia
[ Video ]
[ Slides ]
Tim Besard (Ghent University)
High-level programming languages often rely on interpretation or compilation
schemes that are ill-suited for hardware accelerators like GPUs: These devices
typically require statically compiled, straight-line code in order to reach
acceptable performance. The high-level Julia programming language takes a
different approach, by combining careful language design with an LLVM-based JIT
compiler to generate high-quality machine code.
In this talk, I will show how we've used that capability to build a GPU
back-end for the Julia language, and explain the underlying techniques that
make it happen, including a high-level Julia wrapper for the LLVM libraries,
and interfaces to share functionality with the existing Julia code generator. I
will also demonstrate some of the powerful abstractions that we have built on
top of this infrastructure.
|
The Future of AST Matcher-based Refactoring
[ Video ]
[ Slides ]
Stephen Kelly
In the last few years, Clang has opened up new possibilities in C++ tooling
for the masses. Tools such as clang-tidy and clazy offer ready-to-use
source-to-source transformations. Available transformations can be used to
modernize (use newer C++ language features), improve readability (remove
redundant constructs), or improve adherence to the C++ Core Guidelines.
However, when special needs arise, maintainers of large codebases need to
learn some of the Clang APIs to create their own porting aids. The Clang APIs
necessarily form a more-exact picture of the structure of C++ code than most
developers keep in their heads, and bridging the conceptual gap can be a
daunting task.
This talk will show tools and features which make this task easier for
developers, including:
- Improvements to the clang-query interpreter
- Improvements to the AST Matcher API
- Information essential to creating clang-tidy-checks
- Debugging and profiling of AST Matchers
- Advanced tooling
These features are in various stages along the way to being upstreamed to
Clang. They enable new possibilities for large-scale refactoring in a
reasonable timeframe by solving problems of API discovery and by guiding users
in creating working refactorings.
|
A compiler approach to Cyber-Security
[ Video ]
[ Slides ]
François de Ferrière (STMicroelectronics)
STMicroelectronics is developing LLVM-based compilation tools for its
proprietary processors and for ARM cores. Applications, including an
increasing number of IoT developments, require more and more security,
implemented in hardware, software, or both. To implement complex and
reliable software countermeasures that can be deployed in a timely manner, we
are adding specific cybersecurity code-generation features to our production
LLVM compiler, which we present in this talk.
We give implementation details on the changes we made in Clang and LLVM to
implement these techniques, and we explain how they contribute to reinforcing
the software protection. We also detail how we can restrict these transformations
to specific safety-critical regions of a program to meet the industrial
constraints on performance and code size of our applications.
|
Compiler Optimizations for (OpenMP) Target Offloading to GPUs
[ Video ]
[ Slides ]
Johannes Doerfert (Argonne National Laboratory), Hal Finkel (Argonne National Laboratory)
The support of OpenMP target offloading in Clang is steadily increasing.
However, when it comes to the optimization of such codes, LLVM is still doing a
horrible job. Early separation into different modules and state machine
generation are only two reasons why the middle and backend have a hard time
generating efficient code.
In this talk, we want to focus on code offloading to GPUs (through OpenMP),
an increasingly important part of modern programming. We will first highlight
different reasons for missing optimizations and poor code quality before we
introduce new practical solutions. While our implementation is still
experimental, early results suggest that there is enormous optimization
potential in both manually written, and automatically generated, target
offloading code.
In addition to the talk, we will, closer to the conference date, initiate a
discussion on the LLVM mailing list and publish our implementation.
|
Handling massive concurrency: Development of a programming model for GPU and CPU
[ Video ]
[ Slides ]
Matthias Liedtke (SAP)
For efficient parallel execution it is necessary to write massively
concurrent algorithms and to optimize memory access. In this session we show
our approach of a programming model that is able to execute the same concurrent
algorithm efficiently on GPUs and CPUs: Similar to OpenMP it allows the
programmer to describe concurrency and memory access declaratively but hides
complexity like memory transfers between the CPU and the GPU. In comparison to
OpenMP our model provides a higher level of expressiveness which enables us to
reach a performance comparable to OpenCL/CUDA.
|
Automated GPU Kernel Fusion with XLA
[ Video ]
[ Slides ]
Thomas Joerg (Google)
XLA (Accelerated Linear Algebra) is an optimizing compiler for linear
algebra that accelerates TensorFlow computations. The XLA compiler lowers to
LLVM IR and relies on LLVM for low-level optimization and code generation. XLA
achieves significant performance gains on TensorFlow models. We observed
speedups of up to 3x on internal models. The popular image classification model
ResNet-50 trains 1.6x faster.
A key optimization performed by XLA is automated GPU kernel fusion. The idea
is to combine multiple linear algebra operators into a single GPU kernel to
reduce memory bandwidth requirements and kernel launch overhead. TensorFlow
with XLA demonstrated competitive performance on MLPerf benchmarks (mlperf.org)
compared to ML frameworks that rely on manually fused, hand-tuned GPU
kernels.
|
The Helium Haskell compiler and its new LLVM backend
[ Video ]
[ Slides ]
Ivo Gabe de Wolff (University of Utrecht)
Helium, developed at the University of Utrecht, is a compiler for the
functional, lazy language Haskell. It is used for research on error diagnosis
and for teaching. In this talk, however, we will focus on the new LLVM backend
and the compilation of high-level features like lambdas, laziness (call-by-need
semantics), and currying (partial application). Furthermore, we discuss some
high-level optimizations which cannot be done at the LLVM level.
|
Testing and Qualification of Optimizing Compilers for Functional Safety
[ Video ]
[ Slides ]
José Luis March Cabrelles (Solid Sands)
In the development of embedded applications, the compiler plays a crucial
role in the translation from source to machine code. If the application is
safety-critical, functional safety standards such as ISO 26262 for the
automotive industry require that the user of the compiler develops confidence
in the compiler's correct operation. In this presentation we will discuss the
requirements that ISO 26262 places on tools such as LLVM-based compilers and
how they can be met with a testing procedure that works well with the V-Model
of engineering.
As the name implies, functional safety standards deal with the specified
functionality of components. But what about the optimizations that an
LLVM-based compiler applies to the program, sometimes even silently?
Optimizations are not even mentioned in the language standards for C and C++;
they are "non-functional" behavior of the compiler. As we will demonstrate,
ignoring optimizations leads to significant holes in the compiler's
test coverage. We will show how we have developed a technique that achieves
good results with optimization testing, and we have some errors found in
Intel's well-regarded Clang-based compiler to show. To demonstrate the
completeness of our method with respect to the requirements of functional
safety, we have analyzed how the tests match the various LLVM IR-level
transformation passes that they go through.
|
Improving Debug Information in LLVM to Recover Optimized-out Function Parameters
[ Video ]
[ Slides ]
Nikola Prica (RT-RK), Djordje Todorovic (RT-RK), Ananthakrishna Sowda (CISCO), Ivan Baev (CISCO)
Software release products are compiled at optimization level -O2 or
higher. Such products might produce a core file that is used to investigate
the cause of the problem that produced it. The first step in such debug
analysis is the call trace from the crash. In these traces, most of the
parameters are reported as optimized out, for a variety of reasons. Some
parameters really are optimized out, but the locations of others could still
be calculated. Expert software developers are able to find the values
parameters had at the function entry point by searching for those values in
the disassembly of the caller's frame at the site of that particular function
call. Automation of this technique is described by the DWARF 5 specification
and has been implemented in GCC and GDB since 2011. The goal of this paper is
to present the ideas, the implementation, and the problems that we encountered
while working on this feature in LLVM. We will also show the improvement by
presenting recovered parameters in some call traces. This feature should
improve debugging of optimized code built with LLVM by recovering optimized-out
function parameters.
|
LLVM IR in GraalVM: Multi-Level, Polyglot Debugging with Sulong
[ Video ]
[ Slides ]
Jacob Kreindl (Johannes Kepler University Linz)
Sulong is an execution engine for LLVM bitcode that has support for
debugging programs at the level of source code as well as textual LLVM IR. It
is part of GraalVM, a polyglot virtual machine that can also execute programs
written in multiple dynamic programming languages such as Ruby and Python.
Sulong supports GraalVM's language-agnostic tooling interface to provide a
rich debugging experience to developers. This includes source-level debugging
of native extensions compiled to LLVM bitcode and the dynamic language programs
that use them, together in the same debugger session and front-end. Sulong also
enables developers to debug programs at the level of LLVM IR, including
stepping through the textual IR and inspecting the symbols it contains.
In this talk we will describe different ways GraalVM enables users to debug
programs that were compiled to LLVM bitcode. We will introduce the general
features of GraalVM-based debuggers by demonstrating source-level debugging of
a standalone C/C++ application. Building on this we will showcase GraalVM's
ability to provide a truly integrated debugging experience for native
extensions of dynamic language programs to users. We will further demonstrate
Sulong's support for debugging programs at the LLVM-IR level.
|
LLDB Reproducers
[ Video ]
[ Slides ]
Jonas Devlieghere (Apple)
The debugger, like the compiler, is a complex piece of software where bugs
are inevitable. When a bug is reported, one of the first steps in its life
cycle is trying to reproduce the problem. Given the number of moving parts in
the debugger, this can be quite challenging. Especially for more sophisticated
problems, small changes in the environment, the binary, its dependencies, or
the debug information might hide the problem. Getting this right puts a heavy
burden on both the reporter and the developer.
Reproducers are a way to automate this process. They contain the necessary
information for a bug to occur again with minimal interaction from the
developer. For Clang, a reproducer consists of a script with the compiler
invocation and a preprocessed source file. Doing the same thing for the
debugger is much more complicated.
This talk discusses what was needed to have working reproducers for LLDB. It
goes into detail about what information was needed, how it was captured and
finally how the debugger uses it to reproduce an issue. The high level design
is addressed as well as some of the challenges, such as dealing with low-level
details, remote debugging, and the SB API. It concludes with an overview of
what is possible and what isn't.
|
Sulong: An experience report of using the "other end" of LLVM in GraalVM.
[ Video ]
[ Slides ]
Roland Schatz (Oracle Labs), Josef Eisl (Oracle Labs)
The most common use-case for LLVM is to re-use its back-end to implement a
compiler for new programming languages. In project Sulong, we are going a
different route: We use LLVM frontends, and consume the resulting bitcode.
Sulong is the LLVM bitcode execution engine of GraalVM, a polyglot virtual
machine that executes JavaScript, Python, Ruby, R, and others. The goal of
Sulong is to bring C, C++, Fortran, and other languages that compile to LLVM
bitcode into the system, and allow low-cost interoperability across language
borders. The latter is crucial for efficiently supporting existing native
interfaces of dynamic languages.
In this talk, we want to share our experience with implementing an engine
for executing LLVM IR in GraalVM. We will discuss how Sulong executes LLVM
bitcode and why this allows high-performance interoperability between
languages. We will show the challenges of implementing existing native
interfaces in new runtime environments, and how we use the different parts of
the LLVM project for solving them. We want to focus on situations we found
challenging and where we think we can contribute to the project.
|
SYCL compiler: zero-cost abstraction and type safety for heterogeneous computing
[ Video ]
[ Slides ]
Andrew Savonichev (Intel)
SYCL is an abstraction layer for C++ that allows a developer to write
heterogeneous programs in a "single source" model, where host and
device code are written in the same file. Utilizing modern C++ features, SYCL
provides a way to develop type-safe and efficient programs for various
accelerator devices.
Although SYCL is designed as an "extension-free" standard C++ API,
some compiler extensions are needed to enable C++ code execution on
accelerators. The SYCL compiler is responsible for "extracting" the device
part of the code and compiling it to the SPIR-V format or a device-native
binary. In addition, the compiler should also emit auxiliary information,
which is used by the SYCL runtime to run the device code via the OpenCL API.
This talk will go over technical details of the SYCL compiler, and the
changes we need to make in order to bring full support for SYCL into upstream
LLVM and Clang as described in the RFC:
https://lists.llvm.org/pipermail/cfe-dev/2019-January/060811.html
|
Handling all Facebook requests with JITed C++ code
[ Video ]
[ Slides ]
Huapeng Zhou (Facebook), Yuhan Guo (Facebook)
Facebook needs an efficient scripting framework to enable fast iteration of
HTTP request handling logic in our L7 reverse proxy. A C++ scripting engine and
code deployment ecosystem was created to compile, link, and execute C++ scripts
at run time, using Clang and the LLVM ORC APIs. The framework allows developers
to write business logic and unit tests in C++ scripts, as well as to debug
using GDB. Profiling using perf is also supported, for PGO purposes. This new
framework outperformed a previously used scripting language by up to 4X,
measured in execution time.
In order to run the C++ scripts in an ABI-compatible way, a PCH (pre-compiled
header) is built statically to provide the declarations and definitions of the
necessary dependent types and methods. Clang APIs are then used at run time to
transform source code into LLVM IR, which is later passed through the LLVM ORC
layers for linking and optimizing. The above Clang/LLVM toolchain is statically
linked into the main binary to ensure compatibility between the PCH and the C++
scripts. As a result, scripts can be deployed in real time without any change
to the main binary.
|
clang-scan-deps: Fast dependency scanning for explicit modules
[ Video ]
[ Slides ]
Alex Lorenz (Apple), Michael Spencer (Apple)
The dependency information that's provided by Clang can be used to
implement a pre-scanning phase for a build system that uses Clang modules in an
explicit manner, by discovering the required modules before compiling. However,
the traditional approach of preprocessing all sources to find the required
modular dependencies is typically not fast enough for a pre-scanning phase that
must run for every build. This talk introduces clang-scan-deps, an optimized
dependency discovery service that can provide a speedup of up to 10X over
regular preprocessor-based scanning. This talk goes into the details of how this
service is implemented and how it can be leveraged by the build system to
implement a fast pre-scanning phase for explicit Clang modules.
|
Clang tools for implementing cryptographic protocols like OTRv4
[ Video ]
[ Slides ]
Sofia Celi (Centro de Autonomia Digital)
OTRv4 is the newest version of the Off-The-Record protocol. It is a protocol
where the newest academic research intertwines with real-world implementations:
it provides end-to-end encryption, and offline and online deniability for
interactive and non-interactive applications. As a real-world protocol, it
needs an implementation that works for real-world users. For this,
the OTRv4 team decided to implement it in C. But as we know, working in C can
be challenging due to several factors.
In order to make OTRv4's implementation much safer and more usable, we decided
to use several Clang tools, such as clang-format, clang-tidy, and the
sanitizers. By using these tools, we uncovered bugs, issues, and problems. In
this talk, we aim to highlight the most interesting bugs we uncovered with
these tools, comparing the results of static analysis with those of a fast
memory error detector. We also aim to highlight the importance of using a
specific code formatting style, as it makes an implementation much clearer and
makes bugs easier to find. We also want to stress the importance of using these
tools on real-world implementations that will be used by millions of users and
that aim to provide the best security properties available.
|
Implementing the C++ Core Guidelines' Lifetime Safety Profile in Clang
[ Video ]
[ Slides ]
Gabor Horvath (Eotvos Lorand University), Matthias Gehre (Silexica GmbH),
Herb Sutter (Microsoft)
This is an experience report of the Clang-based implementation of Herb
Sutter's Lifetime safety profile for the C++ Core Guidelines, available
online at cppx.godbolt.org.
We will cover the kinds of diagnoses supported by the checker and how they
are implemented using Clang's control flow graph. We will discuss the
main problems of the current prototype and what we are going to do to fix
them. We also plan to discuss the upstreaming process. Some parts of the
analysis might end up improving existing Clang warnings, some of which are on
by default. We will also summarize early experience with performance on
real-world code bases, including compile-time performance for the LLVM sources
with the checker enabled.
|
Changes to the C++ standard library for C++20
[ Video ]
[ Slides ]
Marshall Clow (CppAlliance)
The next version of the C++ standard will almost certainly be approved next
year, and be called C++20. There will be many new features in the standard
library in C++20. Things like ranges, concepts, calendar support, and many
others. In this talk, I'll give an overview of the new features, and an
update on the status of their implementation in libc++.
|
Adventures with RISC-V Vectors and LLVM
[ Video ]
[ Slides ]
Robin Kruppe (TU Darmstadt), Roger Espasa (Esperanto Technologies)
RISC-V is a free and open instruction set architecture (ISA) with an
established LLVM backend and numerous open-source and proprietary hardware
implementations. The work-in-progress vector extension adds standardized vector
processing, taking lessons both from traditional long-vector machines and from
packed-SIMD approaches that dominated industrial designs in the past few
decades. The resulting architecture aims to excel at various scales, from small
embedded cores to large HPC accelerators and everything in between.
In this talk you will learn about the RISC-V vector ISA as well as LLVM
support for it: vectorizing loops without needing scalar remainder handling,
vectors whose length is not known at compile time, a vector unit that can be
dynamically reconfigured for increased efficiency, and more.
|
A Tale of Two ABIs: ILP32 on AArch64
[ Video ]
[ Slides ]
Tim Northover (Apple)
We faced the challenge of seamlessly running 32-bit application binaries on
the new 64-bit S4 chip, which has no hardware support for running 32-bit
binaries. Translating the ARM binaries directly to the new hardware would be
hard, but when an application is available in bitcode format, the task is much
more feasible.
This talk opens the curtain for an inside look into the decisions and steps
taken to translate 32-bit bitcode for the new 64-bit hardware. It will discuss
the many design, implementation, and verification challenges of introducing a
new ABI, arm64_32, which guarantees that the binaries for the new S4 chip are
compatible with the original 32-bit applications.
|
LLVM Numerics Improvements
[ Video ]
[ Slides ]
Michael Berg (Apple), Steve Canon (Apple)
Some LLVM-based compilers currently provide two modes of floating-point code
generation. The first mode, called fast-math, is where performance is the
primary consideration over numerical precision and accuracy. This mode does not
strictly follow the IEEE-754 standard, but has proven useful for applications
that do not require this level of precision. The second mode, called
precise-math, is where the compiler carefully follows the subset of behavior
defined in the IEEE standard that is applicable to conforming hardware targets.
This mode is primarily used for compute workloads and wherever fast-math
precision is inadequate; however, it runs much slower, as it generally
requires more instructions. In practice, neither of these modes is
particularly desirable. The fast-math mode ignores a significant portion of the
standard as pertains to handling undefined values described as Not a Number
(NaNs) and Infinities (INFs), resulting in difficulties for certain workloads
when the hardware target computes these values correctly and performance
remains critical.
Until recently these two modes were mutually exclusive; however, with the
addition of IR flags they need not be. For instance, the FastMath metadata
module flag drives behavior deemed numerically unsafe when it is enabled, by
indiscriminately enabling optimizations. With IR flags this behavior can be
enabled with much finer granularity, allowing various code forms to be fast or
precise together in one module. We call this mixed mode compilation. IR flags
can be used individually or paired to produce desired floating point behavior
under specified constraints with fine granularity of control. Optimization
passes have been modified under this new kind of control to produce this
behavior. This talk will describe the recent numerics work and discuss the
implications for front-ends and backends built with LLVM.
|
DOE Proxy Apps: Compiler Performance Analysis and Optimistic Annotation Exploration
[ Video ]
[ Slides ]
Brian Homerding (Argonne National Laboratory), Johannes Doerfert (Argonne National Laboratory)
The US Department of Energy proxy applications are simplified models of the
key components of various scientific computing workloads. These proxy
applications are useful for research and exploration in many areas, including
software technology. We have conducted performance analysis of these proxy
applications using Clang, GCC, and some vendor compilers. These results have
identified and motivated our work on modelling the memory access of math
functions in Clang. We will discuss our design and our work to expose this
ability to encode function information to the developer. Additionally, I will
discuss my collaboration on a development tool designed to explore both the
potential performance lost to knowledge the developer could have encoded (but
did not) and the extent to which LLVM is able to profitably make use of this
information.
|
Loop Fusion, Loop Distribution and their Place in the Loop Optimization Pipeline
[ Video ]
[ Slides ]
Kit Barton (IBM), Johannes Doerfert (Argonne National Lab), Hal Finkel
(Argonne National Lab), Michael Kruse (Argonne National Lab)
Loop fusion and loop distribution are two key optimizations that typically
are featured prominently in a loop optimization pipeline. They are used both to
improve performance of applications and also to enable other loop
optimizations. For example, loop fusion can improve the performance of
applications through increasing temporal data cache locality. It can also
increase the scope of other optimizations by creating larger loop nests for
intra-loop nest optimizations to work on. Similarly, loop distribution is often
used to improve performance directly by distributing loops that exceed hardware
resources (e.g., register pressure). It is also frequently used to distribute
loops containing loop-carried dependencies into two loops: one with loop
carried dependencies and the second with no loop carried dependencies; this
enables other optimizations (e.g., vectorization) on the independent loop.
Furthermore, these two optimizations can work nicely together, as they have the
ability to "undo" transformations done by the other. Thus, the
implementation of both of these optimizations must be robust as they can both
play an important role in a loop optimization pipeline.
This talk will be a follow-on to "Revisiting Loop Fusion, and its place
in the loop transformation framework", presented at the 2018 LLVM
Developers' Meeting. The patch to implement basic loop fusion described in
the talk is currently undergoing review on phabricator (https://reviews.llvm.org/D55851). We
have prototypes to make loop fusion more aggressive by moving code from between
two loops (making them adjacent) that will be posted for review once the basic
loop fusion patch is accepted. We also have plans to peel loops (to make
their bounds conform) and to improve the dependence analysis between the two loop
bodies. This talk will also include findings from our current analysis of the
loop distribution pass in LLVM. It will provide a summary of the strengths and
limitations of loop distribution, and summarize any improvements that are made
prior to EuroLLVM 2019. Finally, the presentation will discuss how loop fusion
and loop distribution can fit into the existing loop optimization pipeline in
LLVM.
|
Tutorials
Tutorial: Building a Compiler with MLIR
[ Video ]
[ Slides ]
Mehdi Amini (Google), Nicolas Vasilache (Google), Alex Zinenko (Google)
This tutorial will complement the technical talk about MLIR. We will
implement a custom DSL for numerical processing and walk the audience
step-by-step through the use of MLIR to support the lowering and the
optimization of such DSL, and target LLVM for lower level optimizations and
code generation or JIT execution.
|
Building an LLVM-based tool: lessons learned
[ Video ]
[ Slides ]
Alex Denisov
In this talk, I want to share my experience of building an LLVM-based tool.
For the last three years, I have been working on a tool for mutation testing.
Currently, it works on Linux, macOS, and FreeBSD, and the source code is
compatible with any LLVM version between 3.9 and 7.0. Anything that can run in
parallel runs in parallel. I will cover the following topics:
- Build system: supporting multiple LLVM versions and building against
sources or precompiled binaries.
- Parallelization: which parts of the tool can be parallelized and which
should run in one thread.
- Testing: how to build a robust test suite for the tool.
- Bitcode: several ways to convert a program into LLVM bitcode that can
then be used by the tool.
|
LLVM IR Tutorial - Phis, GEPs and other things, oh my!
[ Video ]
[ Slides ]
Vince Bridgers (Intel Corporation), Felipe de Azevedo Piovezan (Intel Corporation)
LLVM intermediate representation (IR) is the abstract description of machine
operations that LLVM front ends translate source code into, in a form that can
be lowered to code executable by a target machine. Optimizations and
transformations are performed on the IR by the LLVM libraries to create
executable images. This tutorial will introduce the IR syntax, describe basic
tools for manipulating IR, and describe the mappings of various common
source-code control structures to IR. Tutorial materials with specific
examples will be made available for the tutorial presentation and for offline
review.
|
Student Research Competition
Safely Optimizing Casts between Pointers and Integers
[ Video ]
[ Slides ]
Juneyoung Lee (Seoul National University, Korea), Chung-Kil Hur (Seoul
National University, Korea), Ralf Jung (MPI-SWS, Germany), Zhengyang Liu
(University of Utah, USA), John Regehr (University of Utah, USA), Nuno P.
Lopes (Microsoft Research, UK)
In this talk, a list of optimizations that soundly remove casts between
pointers and integers will be presented. In LLVM, a pointer is more than just
an integer: LLVM allows a pointer to track its underlying object, and the rule
for finding it is defined by the based-on relation. This allows LLVM to
aggressively optimize loads and stores, but it makes the meaning of
pointer-integer casts complicated, creating conflicts between existing
optimizations and causing long-standing miscompilation bugs such as 34548.
To fix this, we suggest disabling the folding of inttoptr(ptrtoint(p)) to p
and using a safe workaround to remove such casts. This fold matters because it
removes a significant portion of such cast pairs. We will show that even with
the fold disabled, the majority of casts can be removed by carefully adding new
optimizations and modifying existing ones. After these updates, performance
remains comparable to the original LLVM.
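The problematic pattern can be sketched in C++ (a hypothetical illustration, not code from the talk): the round trip below is what LLVM folds back to a plain use of p, even though in LLVM's provenance model the round-tripped pointer is not necessarily based on the same object.

```cpp
#include <cstdint>

// Hypothetical illustration: a pointer round-tripped through an integer.
// LLVM folds inttoptr(ptrtoint(p)) back to p; the talk argues this fold
// is unsound, because the round-tripped pointer's provenance (the object
// it is "based on") need not match p's in LLVM's memory model.
int read_through_roundtrip(int *p) {
  uintptr_t bits = reinterpret_cast<uintptr_t>(p); // ptrtoint
  int *q = reinterpret_cast<int *>(bits);          // inttoptr
  return *q; // naively the same as *p, but the fold to p is the issue
}
```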
|
An alternative OpenMP Backend for Polly
[ Video ]
[ Slides ]
Michael Halkenhäuser (TU Darmstadt)
LLVM's polyhedral infrastructure framework Polly may automatically
exploit thread-level parallelism through OpenMP. Currently, the user can only
influence the number of utilized threads, while other OpenMP parameters such as
the scheduling type and chunk size are set to fixed values. This, in turn,
limits a user's ability to adapt the optimization process to a given
problem.
In this work, we present an alternative OpenMP backend for Polly, which
provides additional customization options to the user and is based on the LLVM
OpenMP runtime. We evaluate our new backend and the influence of the new
customization options on performance, and compare it to Polly's existing
OpenMP backend.
|
Implementing SPMD control flow in LLVM using reconverging CFGs
[ Video ]
[ Slides ]
Fabian Wahlster (Technische Universität München), Nicolai
Hähnle (Advanced Micro Devices)
Compiling programs for an SPMD execution model, e.g. for GPUs or for whole
program vectorization on CPUs, requires a transform from the thread-level input
program into a vectorized wave-level program in which the values of the
original threads are stored in corresponding lanes of vectors. The main
challenge of this transform is handling divergent control flow, where threads
take different paths through the original CFG. A common approach, which is
currently taken by the AMDGPU backend in LLVM, is to first structurize the
program as a simplification for subsequent steps.
However, structurization is overly conservative. It can be avoided when
control flow is uniform, i.e. not divergent. Even where control flow is
divergent, structurization is often unnecessary. Moreover, LLVM's
StructurizeCFG pass relies on region analysis, which limits the extent to which
it can be evolved.
We propose a new approach to SPMD vectorization based on the notion of a
reconverging CFG: a CFG in which, for every divergent branch, one of the
successors is a post-dominator. This property is weaker than structuredness, and we show that
it can be achieved while preserving uniform branches and inserting fewer new
basic blocks than structurization requires. It is also sufficient for code
generation, because it guarantees that threads which "leave" a wave
at divergent branches will be able to rejoin it later.
|
Function Merging by Sequence Alignment
[ Video ]
[ Slides ]
Rodrigo Rocha (University of Edinburgh), Pavlos Petoumenos (University of
Edinburgh), Zheng Wang (Lancaster University), Murray Cole (University of
Edinburgh), Hugh Leather (University of Edinburgh)
Resource-constrained devices for embedded systems are becoming increasingly
important. In such systems, memory is highly restrictive, making code size in
most cases even more important than performance. Compared to more traditional
platforms, memory is a larger part of the cost and code occupies much of it.
Despite that, compilers make little effort to reduce code size. One key
technique attempts to merge the bodies of similar functions. However,
production compilers only apply this optimization to identical functions, while
research compilers improve on that by merging the few functions with identical
control-flow graphs and signatures. Overall, existing solutions are
insufficient and we end up having to either increase cost by adding more memory
or remove functionality from programs.
We introduce a novel technique that can merge arbitrary functions through
sequence alignment, a bioinformatics algorithm for identifying regions of
similarity between sequences. We combine this technique with an intelligent
exploration mechanism to direct the search towards the most promising function
pairs. Our approach is more than 2.4x better than the state-of-the-art,
reducing code size by up to 25%, with an overall average of 6%, while
introducing an average compilation-time overhead of only 15%. When aided by
profiling information, this optimization can be deployed without any
significant impact on the performance of the generated code.
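The core idea can be sketched with a classic alignment kernel. The following is a hypothetical simplification (the paper uses a full sequence-alignment algorithm with gaps and scoring, not this longest-common-subsequence variant): the aligned length between two instruction sequences estimates how much of the two function bodies could be shared by a merged function.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: longest-common-subsequence over two "instruction"
// sequences as a proxy for how well two function bodies align.
std::size_t aligned_length(const std::vector<std::string> &f,
                           const std::vector<std::string> &g) {
  std::vector<std::vector<std::size_t>> dp(
      f.size() + 1, std::vector<std::size_t>(g.size() + 1, 0));
  for (std::size_t i = 1; i <= f.size(); ++i)
    for (std::size_t j = 1; j <= g.size(); ++j)
      dp[i][j] = (f[i - 1] == g[j - 1])
                     ? dp[i - 1][j - 1] + 1 // instructions align
                     : std::max(dp[i - 1][j], dp[i][j - 1]); // gap
  return dp[f.size()][g.size()];
}
```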
|
Compilation and optimization with security annotations
[ Video ]
[ Slides ]
Son Tuan Vu (LIP6), Karine Heydemann (LIP6), Arnaud de Grandmaison (ARM),
Albert Cohen (Google)
Program analysis and program transformation systems need to express
additional program properties, to specify test and verification goals, and to
enhance their effectiveness. Such annotations are typically inserted into the
representation on which the tool operates, e.g., source level for establishing
compliance with a specification, and binary level for the validation of secure
code. While several annotation languages have been proposed, these typically
target the expression of functional properties. For the purpose of implementing
secure code, there has been little effort to support non-functional properties
about side-channels or faults. Furthermore, analyses and transformations making
use of such annotations may target different representations encountered along
the compilation flow.
We extend an annotation language to express a wider range of functional and
non-functional properties, enabling security-oriented analyses and influencing
the application of code transformations along the compilation flow. We
translate this language to the different compiler representations from abstract
syntax down to binary code. We explore these concepts through the design and
implementation of an optimizing, annotation-aware compiler, capturing
annotations from the program source, propagating and emitting them in the
binary, so that binary-level analysis tools can use them.
|
Adding support for C++ contracts to Clang
[ Video ]
[ Slides ]
Javier López-Gómez (University Carlos III of Madrid), J.
Daniel García (University Carlos III of Madrid)
A language supporting contract checking makes it possible to detect
programming errors. Moreover, making this information available to the
compiler may enable additional optimizations.
This paper presents our implementation of the P0542R5 technical
specification (now part of the C++20 working draft).
|
Lightning talks
LLVM IR Timing Predictions: Fast Explorations via lli
[ Video ]
[ Slides ]
Alessandro Cornaglia (FZI - Research Center for Information Technology)
Many applications, especially in the embedded domain, have to be executed on
different hardware target platforms. For these applications, it is necessary to
evaluate both functional and non-functional properties, such as software
execution time, in all their hardware/software combinations. Especially in the
context of software product line engineering, it is not feasible to test all
variants one-by-one. The intermediate representation of the source code offers
an attractive opportunity for a single-run analysis, because it covers the
software variability, while at the same time omitting the hardware-dependent
optimizations.
We present an extension for the LLVM IR execution engines, which are part of
the LLVM lli tool. The extension evaluates functional and non-functional
properties for all hardware variants on the fly during a single lli
execution. In particular, our extension is designed for the evaluation of the
execution time of a program for multiple target platforms considering different
software variants. Both the interpreter and JIT execution modes are supported.
Prospectively, our approach will be enriched with multiple analysis techniques.
Thanks to our approach, it is now possible to evaluate software variants with
regard to multiple hardware platforms in a single lli execution run.
|
Simple Outer-Loop-Vectorization == LoopUnroll-And-Jam + SLP
[ Video ]
[ Slides ]
Dibyendu Das (AMD)
In this brief talk I will show how Outer-Loop-Vectorization (OLV), which is
of great interest to the LLVM community, can be visualized as a combination of
two transformations applied to a loop-nest of interest. These two
transformations are LoopUnrollAndJam and SLP. LoopUnrollAndJam is a fairly new
addition to the LLVM loop-optimization repertoire. Combined with a fairly
powerful SLP that LLVM supports today, we are able to vectorize the outer loop
of several important kernels automatically without the support of any pragma.
At present our implementation is at the level of a PoC and does not employ a
rigorous cost model. While we understand that OLV is being implemented
in the LoopVectorizer using the VPlan technique, this paper highlights a quick
and cheap way to solve the same problem in a different manner using two
existing transforms.
|
Clacc 2019: An Update on OpenACC Support for Clang and LLVM
[ Video ]
[ Slides ]
Joel E. Denny (Oak Ridge National Laboratory), Seyong Lee (Oak Ridge
National Laboratory), Jeffrey S. Vetter (Oak Ridge National Laboratory)
We are developing production-quality, standard-conforming OpenACC [1]
compiler and runtime support in Clang and LLVM for the US Exascale Computing
Project [2][3]. A key strategy of Clacc's design is to translate OpenACC
to OpenMP in order to leverage Clang's existing OpenMP compiler and runtime
support and to minimize implementation divergence. To maximize reuse of the
OpenMP implementation and to facilitate research and development into new
source-level tools involving both the OpenACC and OpenMP levels, Clacc
implements this translation in the Clang AST using Clang's TreeTransform
facility. However, we are also following LLVM IR parallel extensions being
developed by the community as a path to improve compiler optimizations and
analyses.
The purpose of this talk is to provide an update on Clacc progress over the
preceding year including early performance results, to present the plan for the
year ahead, and to invite participation from others. Clacc's OpenACC
support is still maturing and we have not yet offered it upstream. However, we
have already upstreamed many mutually beneficial improvements from the Clacc
project, including improvements to LLVM's testing infrastructure and to
Clang and its OpenMP support. This talk will summarize those contributions as
well.
[1] OpenACC standard: https://www.openacc.org/
[2] Clacc: Translating OpenACC to OpenMP in Clang. Joel E. Denny, Seyong
Lee, and Jeffrey S. Vetter. 2018 IEEE/ACM 5th Workshop on the LLVM Compiler
Infrastructure in HPC (LLVM-HPC), Dallas, TX, USA, (2018).
[3] Clacc: OpenACC Support for Clang and LLVM. Joel E. Denny, Seyong Lee,
and Jeffrey S. Vetter. 2018 European LLVM Developers Meeting (EuroLLVM
2018).
|
Targeting a statically compiled program repository with LLVM
[ Video ]
[ Slides ]
Phil Camp (SN Systems), Russell Gallop (SN Systems)
Following on from the 2016 talk "Demo of a repository for statically
compiled programs", this lightning talk will present a brief overview of
how LLVM was modified to target a program repository. This includes adding a
new target output format and a new optimization pass to skip program elements
already present in the repository. Reference: https://github.com/SNSystems/llvm-prepo
|
Does the win32 clang compiler executable really need to be over 21MB in size?
[ Video ]
[ Slides ]
Russell Gallop (SN Systems), Greg Bedwell (SN Systems)
The title of this lightning talk is from a bug filed in the early days of the
PS4 compiler. It noted that the LLVM-based PS4 compiler was more than 3 times
larger than the PS3 compiler. Since then it has almost doubled, to over 40MB.
For a compiler which targets one system this seems excessive. A large
executable can hurt cache performance, and can cost time when binaries are
transferred for distributed builds.
In this lightning talk I will look at where this comes from and how it can
be managed.
|
Resolving the almost decade old checker dependency issue in the Clang Static Analyzer
[ Video ]
[ Slides ]
Kristóf Umann (Ericsson Hungary, Eötvös Loránd University)
As checkers grew in number in the Static Analyzer, the problem of certain
checkers depending on one another became inevitable. One particular problem,
for example, is that a checker called MallocChecker, which despite its name
does all sorts of memory allocation, deallocation and reallocation related
checks, depends on CStringChecker to model calls to strcmp. While these
checkers are completely separate entities, the Static Analyzer also contains
large checker classes that in fact expose multiple checkers to the user: for
example, IteratorChecker has a modeling part and exposes three iterator-related
checkers, enabling any of which also enables the unexposed modeling part.
Having both of these
structures makes it difficult to find a solution where the developer (or the
experienced user) can easily see what checkers are enabled, as these
dependencies are only expressed in the implementation.
This talk will discuss elegant solutions for preserving these rather
fragile checker structures by declaring the dependencies in TableGen files,
show how checker developers (and users) can ensure that when the analyzer is
invoked only the requested checkers are enabled, and take a very brief look at
other features the analyzer gained thanks to resolving these issues.
|
Adopting LLVM Binary Utilities in Toolchains
[ Video ]
[ Slides ]
Jordan Rupprecht (Google)
Although many projects have migrated from GCC-based toolchains to
Clang-based ones, tools from the GNU Binutils collection are still widely used
despite having equivalents in the LLVM project. The problems faced when
attempting to use LLVM tools range anywhere from simple command line syntax
differences to unimplemented or buggy features. In this talk, I will describe
some of the types of challenges we faced when adopting LLVM tools, as well as
some of the strategies we used to test the toolchain.
|
Multiplication and Division in the Range-Based Constraint Manager
[ Video ]
[ Slides ]
Ádám Balogh (Ericsson Hungary Ltd.)
The default constraint manager of the Clang Static Analyzer is a simple
range-based constraint manager: it stores and manages the valid ranges for the
values of symbolic expressions. Upon new assumptions it further constrains
these ranges, and an empty range tells the analyzer that the assumption is
impossible. Until now the constraint manager could handle basic assumptions:
A <rel> m, A + n <rel> m and A - n <rel> m, where A is a symbolic
expression, n and m are integer constants, and <rel> is a relational
operator. In the latter two cases, where a constant is added to or subtracted
from the symbolic expression, the range of the additive expression is
calculated by circularly adjusting the range of A by the constant. However,
the constraint manager could not cope with division and multiplication, so not
even the range for A*2 could be deduced from the range of A. This shortcoming
led to both false positives and missed true positives.
To improve the true positive/false positive ratio of the analyzer, we
extended the range-based constraint manager to handle expressions of the form
A <mul> k <add> n <rel> m, where A is a symbolic expression, k, m and n
are integer constants, <mul> is a multiplicative operator (* or /),
<add> is an additive operator (+ or -), and <rel> is a relational
operator. The main challenge in our work was to correctly scale the ranges in
the circular arithmetic: for example, for signed 8-bit types in A * 2 == 56
the value of A could be not only 28 but also -100. Similarly, in A / 3 == 4
the value of A is not necessarily 12, but anything in the range [12..14]. To
ensure full correctness we also verified our solution: first we generated
every range for every constant in both 8-bit signed and unsigned arithmetic,
then we tested whether the scaling algorithm calculates exactly the same
ranges. Finally we extrapolated this algorithm to wider integer types and
ported it to the range-based constraint manager. According to our measurements
there is no significant change in performance, and in the talk we will present
the numbers of eliminated false positives and newly found true positives.
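The circular-arithmetic examples above can be checked by brute force over the 8-bit signed range. The helpers below are a hypothetical sketch, not the analyzer's implementation:

```cpp
#include <cstdint>
#include <vector>

// Enumerate which 8-bit signed values A satisfy A * 2 == target under
// wrapping arithmetic, and A / 3 == target under truncating division.
std::vector<int> solutions_mul2_eq(int target) {
  std::vector<int> out;
  for (int a = -128; a <= 127; ++a)
    if (static_cast<int8_t>(a * 2) == target) // wrapping multiplication
      out.push_back(a);
  return out;
}

std::vector<int> solutions_div3_eq(int target) {
  std::vector<int> out;
  for (int a = -128; a <= 127; ++a)
    if (a / 3 == target) // truncating division
      out.push_back(a);
  return out;
}
// solutions_mul2_eq(56) yields {-100, 28}; solutions_div3_eq(4) yields
// {12, 13, 14}, matching the ranges described above.
```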
|
Statistics Based Checkers in the Clang Static Analyzer
[ Video ]
[ Slides ]
Ádám Balogh (Ericsson Hungary Ltd.)
In almost every development project there are conventions that the return
value of certain functions from an external library must be compared to some
extremal value, such as zero. For example, many integer functions return a
negative number in case of error, similarly to pointer functions returning
null pointers. In a large project with many external functions it is virtually
impossible to formalize all these rules explicitly: they are either unwritten
or exist only in natural language. To help enforce these rules, we created
checkers in the Clang Static Analyzer that discover these rules on a
statistical basis and check the code against them. We currently support two
kinds of extremal values: negative numbers for functions returning integers
and null pointers for functions returning pointers.
Example:
int i = may_return_negative();
v[i]; // error: negative indexing
Discovery and checking of these rules happen in two phases. In the first
phase we examine every function call and create a summary for each function,
recording the percentage of calls whose return value is checked for negativity
(integer functions) or nullness (pointer functions). If this percentage is
above a defined threshold (85% by default), we assume that the rule for the
function exists. The second phase is the usual execution of the analyzer,
where a checker looks for violations of the rule: it splits the execution path
into two branches at calls of the listed functions, where the return value is
an extremal value (negative for integers or null for pointers) on one branch
and a non-extremal value on the other. Other checkers (e.g. the null-pointer
dereference checker) are expected to find errors on the extremal-value branch
if that branch is not terminated by an explicit check for the extremal value
in the code. The performance impact of the state split is low: in at least 85%
of the cases the extremal-value branch is terminated quickly, and in the
remaining cases we expect another checker to create a sink node because of an
error. The new checker is under evaluation on open-source projects. We found
some false positives, but their number can be reduced by incorporating the
call arguments into the statistics.
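The phase-one summary logic can be sketched as follows (hypothetical types and names; the 85% default threshold is from the talk):

```cpp
// Hypothetical sketch of the phase-one summary: per function, count how
// often the return value is checked, and assume the "must check" rule
// once the ratio reaches the threshold (85% by default).
struct CallStats {
  unsigned calls = 0;   // total number of observed calls
  unsigned checked = 0; // calls whose return value was checked
};

bool rule_assumed(const CallStats &s, double threshold = 0.85) {
  return s.calls > 0 &&
         static_cast<double>(s.checked) / s.calls >= threshold;
}
```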
|
Flang Update
[ Video ]
[ Slides ]
Steve Scalpone (NVIDIA / PGI / Flang)
An update about the current state of Flang, including a report on OpenMP 4.5
target offload, Fortran performance and the new f18 front end.
|
Swinging Modulo Scheduling together with Register Allocation
[ Video ]
[ Slides ]
Lama Saba (Intel)
VLIW architectures rely heavily on Modulo Scheduling to optimize ILP in
loops. Modulo Scheduling can be achieved today in LLVM using the
MachinePipeliner pass, which implements a Swing Modulo Scheduler prior to
register allocation [1]. For some VLIW architectures, such as those lacking
hardware interlocks or the ability to spill registers onto a stack, the
MachinePipeliner's decisions become crucial for the success of the register
allocation phase, since they affect the latter's decisions to generate
splits or spills, which in turn can result in inefficient or even
unsuccessful register allocation.
Nevertheless, even though the MachinePipeliner aims to schedule with a
minimal Initiation Interval, it is structured in a way that facilitates trying
larger Initiation Intervals or a different ordering. This structure lends
itself to alternative, possibly less aggressive scheduling retries after more
aggressive attempts have failed in register allocation.
This talk introduces this issue and explores how we can achieve successful
modulo scheduling and register allocation for such architectures in LLVM by
introducing a repetitive rollback-and-retry mechanism for altering scheduling
decisions based on the register allocator's outcome, and how we can
leverage such an approach to improve the scheduling of VLIW architectures in
general.
[1] An Implementation of Swing Modulo Scheduling in a Production Compiler -
Brendon Cahoon -
http://llvm.org/devmtg/2015-10/slides/Cahoon-SwingModuloScheduling.pdf
|
LLVM for the Apollo Guidance Computer
[ Video ]
[ Slides ]
Lewis Revill (University of Bath)
Nearly 50 years ago on the 20th of July 1969 humans set foot on the moon for
the first time. Among the many extraordinary engineering feats that made this
possible was the Apollo Guidance Computer, an innovative processor for its time
with an instruction set that was thought up well before the advent of C. So 50
years later, why not implement support for it in a modern compiler such as
LLVM?
This talk will give a brief overview of some of the architectural features
of the Apollo Guidance Computer followed by an account of my implementation of
an LLVM target so far. The shortcomings of LLVM when it comes to implementing
such an unusual architecture will be discussed along with the workarounds used
to overcome them.
|
Catch dangling inner pointers with the Clang Static Analyzer
[ Video ]
[ Slides ]
Réka Kovács (Eötvös Loránd University)
C++ container classes provide methods that return a raw pointer to the
container's inner buffer. When the container is destroyed, the inner buffer
is deallocated. A common bug is to use such a raw pointer after deallocation,
which may lead to crashes or other unexpected behavior.
This lightning talk will present a new Clang Static Analyzer checker
designed to address the problems described above, implemented last year as a
Google Summer of Code project. The checker has found serious problems in
popular open source projects with a negligible false positive rate. Future
plans include adding support for view-like constructs and non-STL
containers.
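The bug pattern the checker targets can be sketched as follows (a hypothetical illustration; the dangerous variant is shown only in a comment, since executing it is undefined behavior):

```cpp
#include <string>

// The bug pattern, in a comment only (running it is a use-after-free):
//   const char *p;
//   {
//     std::string s = "temp";
//     p = s.c_str(); // p aliases s's inner buffer
//   }                // s destroyed, buffer freed
//   use(p);          // use-after-free the checker reports
//
// A safe alternative: copy the contents out before the container dies.
std::string capture_safely() {
  std::string s = "temporary";
  return std::string(s.c_str()); // copy; no raw pointer escapes
}
```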
|
Cross translation unit test case reduction
[ Video ]
[ Slides ]
Réka Kovács (Eötvös Loránd University)
C-Reduce, released by Regehr et al. in 2012, is an excellent tool designed
to generate a minimal test case from a C/C++ file that has some specific
property (e.g. triggers a bug). One of the most interesting parts of C-Reduce
is Clang Delta, a set of compiler-like transformations implemented using Clang
libraries, such as changing a function parameter to a global variable.
With the introduction of the experimental cross translation unit analysis
feature in the Clang Static Analyzer, there arose a need to investigate
crashes, bugs, or false positive reports that spread across different
translation units. Unfortunately, C-Reduce was designed to minimize one
translation unit at a time, and some of the Clang Delta transformations cannot
be applied to multiple TUs in their original form.
This talk/poster is a status report on work in progress that aims to make it
possible to use C-Reduce for cross translation unit test case reduction.
|
BoFs
RFC: Towards Vector Predication in LLVM IR
Simon Moll (Saarland University), Sebastian Hack (Saarland University)
In this talk, we present the current state of the Explicit Vector Length
(EVL) extension for LLVM. EVL is the first step towards proper predication and
active vector length support in LLVM IR. There has been a recent surge in
vector ISAs, be it the RISC-V V extension, ARM SVE or NEC SX-Aurora, all of
which pose new demands on LLVM IR. Among their novel features are an active
vector length, full predication on all vector instructions, and a register
length that is unknown at compile time. EVL provides primitives that are
practical for both backends and IR-level automatic vectorizers. At the same
time, EVL is compatible with LLVM-SVE, and even existing short SIMD ISAs stand
to benefit from its consistent handling of predication.
|
IPO --- Where are we, where do we want to go?
Johannes Doerfert (Argonne National Laboratory), Kit Barton (IBM Toronto Lab)
Interprocedural optimizations (IPOs) have historically been weak in LLVM.
The strong reliance on inlining can be seen as a consequence, or a cause.
Since inlining is not always possible (parallel programs) or beneficial (large
functions), the effort to improve IPO has recently seen an upswing again
[0,1,2]. In order to capitalize on this momentum, we would like to talk about
the current situation in LLVM and goals for the immediate, but also the
distant, future.
This open-ended discussion is not aimed at a particular group of people. We
expect to discuss potential problems with IPO as well as desirable analyses
and optimizations; both experts and newcomers are welcome to attend.
[0]
https://lists.llvm.org/pipermail/llvm-dev/2018-August/125537.html
[1] These links do not yet exist but will be added later on.
[2] One link will be an RFC outlining missing IPO capabilities, the other
will point to a function attribute deduction rewrite patch (almost
finished).
|
LLVM binutils
[ Notes ]
[ Slides ]
James Henderson (SN Systems), Jordan Rupprecht (Google)
LLVM has a suite of binary utilities that broadly mirror the GNU binutils
suite, with tools such as llvm-readelf, llvm-nm, and llvm-objcopy. These tools
are already widely used in testing the rest of LLVM, and are now starting to be
adopted as full replacements for the GNU tools in production environments.
This discussion will focus on what more needs to be done to make this
migration process easier, how far we need to go to make drop-in replacements
for the GNU tools, and what features people want to prioritize. Finally, we
will look at the broader future goals of these tools.
|
RFC: Reference OpenCL Runtime library for LLVM
Andrew Savonichev (Intel), Alexey Sachkov (Intel)
LLVM is used as a foundation for the majority of OpenCL compilers, thanks to
the excellent support for the OpenCL C language in the Clang frontend and the
modularity of LLVM. Unfortunately, a compiler is not the only component
required to develop with OpenCL: users need a runtime library that implements
the OpenCL API. While several implementations of the OpenCL runtime exist,
both open and proprietary, none of them has community-wide adoption. This
leads to fragmentation and effort duplication across the OpenCL community,
and negatively impacts the OpenCL ecosystem in general.
The purpose of this BoF is to bring together all parties interested in a
reference OpenCL runtime implementation in LLVM that is designed to be easily
extensible to support various accelerator devices (CPU/GPU/FPGA/DSP) and to
allow users and compiler developers to rapidly prototype OpenCL-specific
functionality in LLVM and Clang.
|
LLVM Interface Stability Guarantees BoF
Stephen Kelly
The goal of this BoF is to create the basis for a new page of documentation
enumerating the stability guarantees of interfaces exposed from LLVM
products.
There are some interfaces which are known to make no stability guarantees,
such as the Clang C++ API; others which make strict API guarantees, such as
the libclang C API; and still others, such as the LLVM IR API, which are
somewhere in between. Only the last appears in the LLVM Developer Policy;
most of the remaining interface stability guarantees are tribal knowledge.
A centralized location in the documentation would present guidelines for
developers to follow when changing various parts of LLVM code, and would
inform consumers what they can expect and rely upon when using these
interfaces. This includes both code interfaces and command line interfaces.
|
Clang Static Analyzer BoF
Devin Coughlin (Apple), Gabor Horvath (Eotvos Lorand University)
Let's discuss the present and future of the Clang Static Analyzer!
We'll start with a brief overview of analyzer features the community has
added over the last year. We'll then dive into a discussion of possible
focus areas for the next year, including potential deeper integration with
clang-tidy.
|
LLVM Numerics Improvements
Michael Berg (Apple), Steve Canon (Apple)
Some LLVM based compilers currently provide two modes of floating point code
generation. The first mode, called fast-math, makes performance the primary
consideration over numerical precision and accuracy. This mode does not
strictly follow the IEEE-754 standard, but has proven useful for applications
that do not require that level of precision. The second mode, called
precise-math, has the compiler carefully follow the subset of behavior
defined in the IEEE standard that is applicable to conforming hardware
targets. This mode is primarily used for compute workloads and wherever
fast-math precision is inadequate; however, it runs much slower, as it
generally requires a larger number of instructions. In practice, neither mode
is entirely desirable. The fast-math mode ignores a significant portion of
the standard as it pertains to handling undefined values described as Not a
Number (NaNs) and Infinities (INFs), which causes difficulties for workloads
where the hardware target computes these values correctly and performance
remains critical.
Until recently these two models were mutually exclusive, however with the
addition of IR flags they need not be. For instance, the FastMath metadata
module flag drives behavior deemed numerically unsafe when it is enabled, by
indiscriminately enabling optimizations. With IR flags this behavior can be
enabled with much finer granularity, allowing various code forms to be fast or
precise together in one module. We call this mixed mode compilation. IR flags
can be used individually or paired to produce desired floating point behavior
under specified constraints with fine granularity of control. Optimization
passes have been modified under this new kind of control to produce this
behavior. This talk will describe the recent numerics work and discuss the
implications for front-ends and backends built with LLVM.
|
LLVM Foundation BoF
LLVM Foundation Board of Directors
Ask the LLVM Foundation Board of Directors anything, get program updates.
|
Posters
Clava: C/C++ source-to-source from CMake using LARA
[ Poster ]
João Bispo (FEUP/INESCTEC)
Clava is a Clang-based source-to-source compiler that executes scripts
written in LARA, a superset of JavaScript with additional syntax for AST
analysis and transformation.
Clava intends to improve on Clang's source-to-source capabilities, by
providing a more convenient and powerful way to analyze, transform and generate
C/C++ code.
Although Clava is a stand-alone tool, we will present the Clava CMake
plug-in, which makes it easy to apply LARA scripts to C/C++ CMake projects.
Clava is open-source and runs on Linux, Windows and macOS.
|
Safely Optimizing Casts between Pointers and Integers
[ Poster ]
Juneyoung Lee (Seoul National University, Korea), Chung-Kil Hur (Seoul
National University, Korea), Ralf Jung (MPI-SWS, Germany), Zhengyang Liu
(University of Utah, USA), John Regehr (University of Utah, USA), Nuno P.
Lopes (Microsoft Research, UK)
In this talk, we present a list of optimizations that soundly remove casts
between pointers and integers. In LLVM, a pointer is more than just an
integer: LLVM allows a pointer to track its underlying object, and the rule
for finding that object is defined by the based-on relation. This allows LLVM
to aggressively optimize loads and stores, but it makes the meaning of
pointer-integer casts complicated, creating conflicts between existing
optimizations and causing long-standing miscompilation bugs such as bug 34548.
To fix this, we suggest disabling the folding of inttoptr(ptrtoint(p)) to p
and using a safe workaround to remove such casts instead. The folding matters
because it removes a significant portion of these cast pairs; we'll show that
even with it disabled, the majority of casts can still be removed by carefully
adding new optimizations and modifying existing ones. After these updates,
performance remains comparable to the original LLVM.
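The based-on issue behind this abstract can be sketched with a toy model. The following Python snippet is purely illustrative (it is not LLVM's actual semantics, and the `Ptr` type and `based_on` field are invented for the example): it shows why a round trip through an integer cannot simply be folded away, because the integer carries no provenance.

```python
# Toy model of pointer provenance, illustrating why folding
# inttoptr(ptrtoint(p)) to p is unsound: the round trip through an
# integer erases the based-on information that LLVM's alias analysis uses.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Ptr:
    addr: int                  # numeric address
    based_on: Optional[str]    # name of the underlying object, or None

def ptrtoint(p: Ptr) -> int:
    return p.addr              # the resulting integer keeps no provenance

def inttoptr(i: int) -> Ptr:
    return Ptr(i, None)        # provenance is unknown after the cast back

p = Ptr(0x1000, based_on="x")
q = inttoptr(ptrtoint(p))

# Same address, but q no longer tracks the object it is based on, so
# replacing q with p would let optimizations draw wrong aliasing conclusions.
assert q.addr == p.addr
assert q.based_on is None and p.based_on == "x"
```

Under this toy model, folding `q` back to `p` would incorrectly restore the `based_on` tag that the integer round trip discarded.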
|
Scalar Evolution Canon: Click! Canonicalize SCEV and validate it by Z3 SMT solver!
[ Poster ]
Lin-Ya Yu (Xilinx), Alexandre Isoard (Xilinx)
A scalar evolution (SCEV) is an analyzed expression that represents how the
value of a scalar variable changes as the program executes [0]. It is
implemented as a pass and is widely used by many analyses and optimizations in
LLVM, such as loop strength reduction, induction variable substitution, and
memory access analysis. However, it is difficult to define a canonical form
for SCEV that meets the needs of all other passes. Here, we develop SCEV Canon
to canonicalize and further simplify SCEV expressions.
Z3, a satisfiability modulo theories (SMT) solver from Microsoft Research, is
used in this work to verify the correctness of the canonicalized SCEVs.
Moreover, Z3 can also help us check the equivalence of SCEVs produced by
different SCEV implementations in different releases of LLVM. This poster
presents the whole process of canonicalizing SCEVs without modifying the
scalar evolution pass, and of verifying and testing the generated SCEVs. We
also aim to open a discussion about further simplifications that can be
performed on SCEVs.
[0]
https://subscription.packtpub.com/book/application_development/9781785280801/5/ch05lvl1sec36/scalar-evolution
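To make the idea concrete, here is a small illustrative sketch of what a SCEV add recurrence {start,+,step} captures, and of the equivalence question the poster hands to Z3. The function names are invented for the example, and plain range testing stands in for the actual SMT check:

```python
# Sketch of a SCEV add recurrence {start,+,step}: the value of an
# induction variable at iteration i has the closed form start + step * i.

def addrec(start, step, i):
    """Closed form of {start,+,step} at iteration i."""
    return start + step * i

def loop_values(start, step, n):
    """The same values computed the way the loop actually runs."""
    v, out = start, []
    for _ in range(n):
        out.append(v)
        v += step
    return out

assert [addrec(3, 2, i) for i in range(10)] == loop_values(3, 2, 10)

# Canonicalization must preserve meaning: a rewritten expression has to
# agree with the original everywhere. The poster verifies this with Z3;
# here we merely test over a range of inputs as a stand-in.
original  = lambda i: (i + 1) * 4 - 4   # {0,+,4} written awkwardly
canonical = lambda i: 4 * i             # canonical form of {0,+,4}
assert all(original(i) == canonical(i) for i in range(1000))
```

An SMT solver proves this equivalence for all inputs rather than a sampled range, which is what makes it suitable for validating canonicalizations.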
|
Splendid GVN: Partial Redundancy Elimination for Algebraic Simplification
[ Poster ]
Li-An Her (National Tsing Hua University), Jenq-Kuen Lee (National Tsing Hua University)
Modern workloads such as neural networks, GPS and Wi-Fi signal processing,
and image processing depend heavily on enormous numbers of linear algebra
operations. Algebraic simplification improves performance for increasingly
complicated computations such as convolutions for CNNs and the Sobel operator,
and inner products for the discrete cosine transform and the FFTs used in
signal processing. LLVM provides several optimization passes for algebraic
simplification, constant folding, copy propagation, and so on; one of them is
global value numbering (GVN). These passes work well except when they
encounter branches and non-local cases. One such case is partial redundancy
elimination (PRE): two or more instructions are redundant or congruent, but
they reside in different basic blocks. Even though eliminating one of the
redundant instructions would not introduce a logic error, the compiler lacks
such a rule and misses the elimination, so algebraic simplification fails to
optimize the code when a PRE opportunity occurs. GVN provides a PRE mechanism
based on lazy code motion, but it cannot provide accurate congruence
information in the presence of loops and Φ-nodes. NewGVN handles such cases
and provides more precise congruence information, but it lacks a PRE
mechanism.
In this work, we propose Splendid GVN, which adds a PRE mechanism to NewGVN
on LLVM 7.0.0. When a PRE opportunity arises, our pass checks that the
transformation is safe and applies hoisting code motion to eliminate the
partial redundancy. The original GVN uses a less precise algorithm and can
only perform lazy code motion, which risks increasing code size. The original
hoisting pass cannot handle PRE and builds on GVN rather than NewGVN, so it
provides less precise congruence information in the presence of loops and may
miss opportunities for further elimination. Experiments show that Splendid
GVN performs hoisting code motion for PRE on 2 qualifying PRE programs from
the LLVM test directory for GVN (available in the source code), reducing
total code size by 18.37% and 7% compared to the original programs and the
NewGVN results.
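The effect of eliminating a partial redundancy can be sketched in a few lines. The snippet below is only an illustration of the transformation's effect (not LLVM's actual pass): `a + b` is computed twice on one path before the transformation and exactly once on every path after hoisting, with an invented counter tracking evaluations.

```python
# Minimal illustration of partial redundancy elimination (PRE):
# `a + b` is computed on one branch and again after the join, so on that
# path it is evaluated twice. Hoisting it before the branch removes the
# redundancy without adding work to any other path.
evals = {"n": 0}

def add(a, b):
    evals["n"] += 1      # count how many times the expression is evaluated
    return a + b

def before_pre(a, b, cond):
    if cond:
        x = add(a, b)    # redundant with the later add when cond is true
    else:
        x = 0
    y = add(a, b)        # always computed
    return x + y

def after_pre(a, b, cond):
    t = add(a, b)        # hoisted: computed exactly once on every path
    x = t if cond else 0
    y = t
    return x + y

evals["n"] = 0
r1 = before_pre(2, 3, True)
n_before = evals["n"]

evals["n"] = 0
r2 = after_pre(2, 3, True)
n_after = evals["n"]

assert r1 == r2 == 10            # the transformation preserves the result
assert n_before == 2 and n_after == 1
```

The hoist is safe here because `a + b` was already computed on every path; PRE's real difficulty, which the poster addresses, is establishing that safety precisely in the presence of loops and Φ-nodes.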
|
An alternative OpenMP Backend for Polly
[ Poster ]
Michael Halkenhäuser (TU Darmstadt)
LLVM's polyhedral infrastructure framework Polly can automatically exploit
thread-level parallelism through OpenMP. Currently, the user can only
influence the number of threads used, while other OpenMP parameters, such as
the scheduling type and chunk size, are set to fixed values. This, in turn,
limits a user's ability to adapt the optimization process to a given problem.
In this work, we present an alternative OpenMP backend for Polly which
provides additional customization options to the user and is based on the
LLVM OpenMP runtime. We evaluate the new backend and the influence of the new
customization options on performance, and compare it to Polly's existing
OpenMP backend.
|
Does the win32 clang compiler executable really need to be over 21MB in size?
[ Poster ]
Russell Gallop (SN Systems), Greg Bedwell (SN Systems)
The title of this lightning talk comes from a bug filed in the early days of
the PS4 compiler, which noted that the LLVM-based PS4 compiler was more than
3 times larger than the PS3 compiler. Since then it has almost doubled, to
over 40MB. For a compiler that targets one system this seems excessive. A
large executable can hurt cache performance and costs time when it is
transferred for distributed builds.
In this lightning talk I will look at where this size comes from and how it
can be managed.
|
Enabling Multi- and Cross-Language Verification with LLVM
[ Poster ]
Jack J. Garzella (University of Utah), Marek Baranowski (University of Utah),
Shaobo He (University of Utah), Zvonimir Rakamaric (University of Utah)
Developers nowadays regularly use numerous programming languages with
different characteristics and trade-offs. Unfortunately, implementing a
software verifier for a new language from scratch is a large and tedious
undertaking, requiring expert knowledge in multiple domains, such as
compilers, verification, and constraint solving. Hence, only a tiny fraction
of the languages in use have readily available software verifiers to aid in the development of
correct programs. In the past decade, there has been a trend of leveraging
popular compiler intermediate representations (IRs), such as LLVM IR, when
implementing software verifiers. The main advantage is to avoid implementing
large front-ends, and instead rely on a typically simple canonical format of an
IR. In addition, processing IR promises out-of-the-box multi- and
cross-language verification since, at least in theory, a verifier ought to be
able to handle a program in any programming language (and their combination)
that can be compiled into the IR. In practice though, to the best of our
knowledge, nobody has explored the feasibility and ease of such integration of
new languages. This talk introduces a methodology for adding support for a new
language into an IR-based verification toolflow. Using our methodology, we
extend an existing verifier called SMACK with support for 7 additional
languages. We assess the quality of our extensions and the proposed methodology
through several case studies, and we describe the lessons we learned in the
process.
|
Instruction tracing and dynamic codegen analysis to identify unique LLVM performance issues
[ Poster ]
Biplob (IBM)
Performance analysis of the machine code generated by a compiler can be
carried out in different ways, and the choice can also depend on the
application in question. Common methods use some form of profiling on a
running program, which generally provides statistical information about
certain data and events. While this method does give important insights into
a performance problem, some issues are understood more clearly when the
compiled application is actually run and the dynamic instructions of hot code
execution paths are traced and analyzed in a small execution window. Trace
records contain instructions, data, memory addresses and other information
which provide complete visibility into the workings of an application.
While tracing is very useful in micro-architecture analysis, here we focus on
how traces can benefit compiler performance analysis. In this talk we will
look at some of the code-gen issues that were identified more easily when
running applications compiled by LLVM and other compilers were traced for hot
code sections on the IBM POWER9 processor.
|
Handling all Facebook requests with JITed C++ code
[ Poster ]
Huapeng Zhou (Facebook), Yuhan Guo (Facebook)
Facebook needs an efficient scripting framework to enable fast iteration of
HTTP request handling logic in our L7 reverse proxy. A C++ scripting engine
and code deployment ecosystem was created to compile, link and execute C++
scripts at run time, using Clang and the LLVM ORC APIs. The framework allows
developers to write business logic and unit tests in C++ scripts, as well as
to debug using GDB. Profiling with perf is also supported for PGO purposes.
This new framework outperformed the previously used scripting language by up
to 4x, measured in execution time.
In order to run the C++ scripts in an ABI-compatible way, a PCH (pre-compiled
header) is built statically to provide the declarations and definitions of
the necessary dependent types and methods. Clang APIs are then used at run
time to transform source code to LLVM IR, which is later passed through the
LLVM ORC layers for linking and optimizing. The above Clang/LLVM toolchain is
statically linked into the main binary to ensure compatibility between the
PCH and the C++ scripts. As a result, scripts can be deployed in real time
without any change to the main binary.
|
Implementing SPMD control flow in LLVM using reconverging CFGs
[ Poster ]
Fabian Wahlster (Technische Universität München), Nicolai Hähnle (Advanced Micro Devices)
Compiling programs for an SPMD execution model, e.g. for GPUs or for whole
program vectorization on CPUs, requires a transform from the thread-level input
program into a vectorized wave-level program in which the values of the
original threads are stored in corresponding lanes of vectors. The main
challenge of this transform is handling divergent control flow, where threads
take different paths through the original CFG. A common approach, which is
currently taken by the AMDGPU backend in LLVM, is to first structurize the
program as a simplification for subsequent steps.
However, structurization is overly conservative. It can be avoided when
control flow is uniform, i.e. not divergent. Even where control flow is
divergent, structurization is often unnecessary. Moreover, LLVM's
StructurizeCFG pass relies on region analysis, which limits the extent to which
it can be evolved.
We propose a new approach to SPMD vectorization based on the notion of a
reconverging CFG, one in which, for every divergent branch, one of the
successors is a post-dominator. This property is weaker than structuredness, and we show that
it can be achieved while preserving uniform branches and inserting fewer new
basic blocks than structurization requires. It is also sufficient for code
generation, because it guarantees that threads which "leave" a wave
at divergent branches will be able to rejoin it later.
|
LLVM for the Apollo Guidance Computer
[ Poster ]
Lewis Revill (University of Bath)
Nearly 50 years ago, on the 20th of July 1969, humans set foot on the moon for
the first time. Among the many extraordinary engineering feats that made this
possible was the Apollo Guidance Computer, an innovative processor for its time
with an instruction set that was thought up well before the advent of C. So 50
years later, why not implement support for it in a modern compiler such as
LLVM?
This talk will give a brief overview of some of the architectural features
of the Apollo Guidance Computer followed by an account of my implementation of
an LLVM target so far. The shortcomings of LLVM when it comes to implementing
such an unusual architecture will be discussed along with the workarounds used
to overcome them.
|
LLVM Miner: Text Analytics based Static Knowledge Extractor
[ Poster ]
Hameeza Ahmed (NED University of Engineering and Technology), Muhammad Ali
Ismail (NED University of Engineering and Technology)
A compiler converts high-level language code into assembly language while
applying optimizations. A compiler has three phases, namely the front end,
the middle end and the back end. LLVM is an open-source framework providing
all three stages. One of the reasons for LLVM's huge adoption is its powerful
optimizer, the middle-end stage. There are various opportunities to optimize
the Intermediate Representation (IR) code generated by the front end. Before
applying any optimization, significant effort is dedicated to detailed
analysis of the given IR in order to extract the static information hidden in
the source code.
Up to now, the standard mechanism for analyzing IR code has been to use
analysis passes written in LLVM itself. Each time some information is
required from the IR, a pass is written or reused in the LLVM core syntax.
This approach has proven complex for novice programmers who are unfamiliar
with the LLVM coding style and its advanced C++ concepts, so a significant
amount of time is spent learning LLVM programming rather than doing the
required compile-time code analysis. An easier mechanism is therefore needed
to perform static code analysis in LLVM.
In this work, LLVM Miner is presented to simplify static IR-level analysis in
the LLVM compiler toolchain. LLVM Miner performs text analytics to extract
relevant information from the given IR code. The IR generated by the front
end is passed through the proposed miner, where hidden static features are
extracted easily. The proposed approach has been tested on a set of 5 mixed
benchmark codes, namely bfs, connected components, grep, histogram, and
kmeans. The experiments are conducted using an R script to determine the
instruction frequency and the application trend. The instruction frequency
shows the count of each instruction in the given IR code, represented by bar
graphs and word clouds. The application trend is then obtained by clustering
individual instructions into categories such as branch, compute, function
call, I/O read/write, memory consumption, and memory read/write operations.
The application trend shows the proportion of the different classes of
operations in a given code using bar graphs, revealing whether the
application is compute-bound, memory-bound or I/O-bound from static
code-level features alone. Analyzing LLVM IR using text mining techniques
appears to be a promising direction for studying significant features hidden
in source code. Text analytics of the IR is expected to be an easier and less
costly solution, in both time and effort, than the conventional LLVM analysis
passes.
|
Function Merging by Sequence Alignment
[ Poster ]
Rodrigo Rocha (University of Edinburgh), Pavlos Petoumenos (University of Edinburgh), Zheng Wang (Lancaster University), Murray Cole (University of Edinburgh), Hugh Leather (University of Edinburgh)
Resource-constrained devices for embedded systems are becoming increasingly
important. In such systems, memory is highly restrictive, making code size in
most cases even more important than performance. Compared to more traditional
platforms, memory is a larger part of the cost and code occupies much of it.
Despite that, compilers make little effort to reduce code size. One key
technique attempts to merge the bodies of similar functions. However,
production compilers only apply this optimization to identical functions, while
research compilers improve on that by merging the few functions with identical
control-flow graphs and signatures. Overall, existing solutions are
insufficient and we end up having to either increase cost by adding more memory
or remove functionality from programs.
We introduce a novel technique that can merge arbitrary functions through
sequence alignment, a bioinformatics algorithm for identifying regions of
similarity between sequences. We combine this technique with an intelligent
exploration mechanism to direct the search towards the most promising function
pairs. Our approach is more than 2.4x better than the state-of-the-art,
reducing code size by up to 25%, with an overall average of 6%, while
introducing an average compilation-time overhead of only 15%. When aided by
profiling information, this optimization can be deployed without any
significant impact on the performance of the generated code.
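The core idea of merging non-identical functions can be sketched briefly. The snippet below is a hand-written illustration (the function names are invented, and the real work operates on LLVM IR, not Python): two functions whose bodies align everywhere except one operation are merged into a single body where the aligned code is shared and the mismatched operation is guarded by a selector.

```python
# Sketch of function merging: two similar functions become one merged
# function, with shared code emitted once and the differing operation
# selected by an extra parameter.

def sum_abs(xs):
    total = 0
    for x in xs:
        total += abs(x)        # differs from sum_sq only on this line
    return total

def sum_sq(xs):
    total = 0
    for x in xs:
        total += x * x         # differs from sum_abs only on this line
    return total

def merged(xs, variant):
    """Merged body: aligned code is shared, the mismatch is guarded."""
    total = 0
    for x in xs:
        if variant == 0:
            total += abs(x)
        else:
            total += x * x
    return total

data = [-3, 1, 4, -1]
assert merged(data, 0) == sum_abs(data) == 9
assert merged(data, 1) == sum_sq(data) == 27
```

Sequence alignment is what finds the shared and differing regions automatically, so the merge applies to arbitrary function pairs rather than only those with identical control-flow graphs and signatures.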
|
Compilation and optimization with security annotations
[ Poster ]
Son Tuan Vu (LIP6), Karine Heydemann (LIP6), Arnaud de Grandmaison (ARM),
Albert Cohen (Google)
Program analysis and program transformation systems need to express
additional program properties, to specify test and verification goals, and to
enhance their effectiveness. Such annotations are typically inserted into the
representation on which the tool operates; e.g., source level for establishing
compliance with a specification, and binary level for the validation of secure
code. While several annotation languages have been proposed, these typically
target the expression of functional properties. For the purpose of implementing
secure code, there has been little effort to support non-functional properties
about side-channels or faults. Furthermore, analyses and transformations making
use of such annotations may target different representations encountered along
the compilation flow.
We extend an annotation language to express a wider range of functional and
non-functional properties, enabling security-oriented analyses and influencing
the application of code transformations along the compilation flow. We
translate this language to the different compiler representations from abstract
syntax down to binary code. We explore these concepts through the design and
implementation of an optimizing, annotation-aware compiler, capturing
annotations from the program source, propagating and emitting them in the
binary, so that binary-level analysis tools can use them.
|
Leveraging Polyhedral Compilation in Chapel Compiler
[ Poster ]
Sahil Yerawar (IIT Hyderabad), Siddharth Bhat (IIIT Hyderabad), Michael Ferguson (Cray Inc.), Philip Pfaffe (Karlsruhe Institute of Technology), Ramakrishna Upadrasta (IIT Hyderabad)
Chapel is an emerging parallel programming language developed with the aim
of providing better performance in high-performance computing as well as
accessibility to newcomer programmers. It relies on LLVM as one of its
backends. This talk shows how the polyhedral compilation techniques available
in Polly are utilized by the Chapel compiler. We will share our experience of
using Polly's loop optimizer in a new setting with Polly and LLVM developers.
In particular, the talk will discuss how the Chapel compiler can benefit from
the optimizations available in Polly, including GPGPU code generation.
|
LLVM on AVR - textual IR as a powerful tool for making "impossible" compilers
[ Poster ]
Carl Peto (Swift for Arduino/Petosoft)
To be demonstrated on stage and available to use and test, I have built a
prototype compiler for a subset of the Swift language targeting the Arduino
UNO platform, a radically different use for the language, despite the Swift
compiler and front end having limited support for such a different back end.
Key to its success was separating the first part of the compilation into
textual LLVM IR (using a standard toolchain), followed by compilation from
the LLVM IR files into machine code using a custom-built llc. This approach
improves debugging, especially of deployed products, and the separation of
concerns. Ultimately it could be used as a template for other
"impossible" compilers such as Swift to WebAssembly, Go to OpenGL shaders
and more.
|
Vectorizing Add/Sub Expressions with SLP
[ Poster ]
Vasileios Porpodas (Intel Corporation, USA), Rodrigo C. O. Rocha (University
of Edinburgh, UK), Evgueni Brevnov (Intel Corporation, USA), Luís F.
W. Góes (PUC Minas, Brazil), Timothy Mattson (Intel Corporation,
USA)
The SLP Vectorizer is LLVM's second vectorizer (after the Loop
Vectorizer). It performs auto-vectorization of straight-line code. It works by
first exploring the scalar code for vectorizable patterns (groups), and then by
replacing each group with its vectorized form.
This talk presents the existing design of the SLP vectorizer and shows how
it fails to vectorize simple IR inputs with Add/Sub (or Mul/Div) expression
trees. We propose specific improvements to the current design that will let us
effectively handle such code. We named this design SuperNode SLP (SN-SLP)
because it extends the SLP graph to include new "fat" nodes that
include multiple instructions. This talk also presents our detailed plan for
upstreaming the bulk of this work in a sequence of patches.
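The shape of code in question can be illustrated without any vectorizer at all. The following scalar Python sketch (names invented; SLP itself works on LLVM IR) shows the lane structure: with a uniform opcode the four lanes form a single vectorizable group, while alternating add/sub lanes do not share one opcode, which is the pattern SN-SLP's "fat" supernodes are meant to capture.

```python
# Straight-line, four-lane code of the kind SLP vectorizes.

def all_adds(a, b):
    # Uniform opcode across lanes: one group, one vector add.
    return [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]

def addsub(a, b):
    # Alternating opcodes across lanes: not a single-opcode group, so
    # plain SLP gives up, even though many targets have a dedicated
    # addsub-style vector instruction for exactly this pattern.
    return [a[0] + b[0], a[1] - b[1], a[2] + b[2], a[3] - b[3]]

a, b = [10, 10, 10, 10], [1, 2, 3, 4]
assert all_adds(a, b) == [11, 12, 13, 14]
assert addsub(a, b) == [11, 8, 13, 6]
```

Grouping the mixed-opcode lanes into one supernode lets the vectorizer treat the alternating pattern as a single unit instead of four unrelated scalars.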
|
Adding support for C++ contracts to Clang
[ Poster ]
Javier López-Gómez (University Carlos III of Madrid), J.
Daniel García (University Carlos III of Madrid)
A language supporting contract checking allows programming errors to be
detected. Also, making this information available to the compiler may enable
it to perform additional optimizations.
This paper presents our implementation of the P0542R5 technical
specification (now part of the C++20 working draft).
|
Optimizing Nondeterminacy: Exploiting Race Conditions in Parallel Programs
[ Poster ]
William S. Moses (MIT CSAIL)
As computation moves towards parallel programming models, writing efficient
parallel programs becomes paramount. As a result, there have been several
efforts (Tapir, HPVM, among others) to augment serial compilers such as LLVM to
have a first-class representation of parallelism. Such representations
theoretically permit the compiler to both analyze and optimize parallel
programs.
A major difference between serial and parallel programs is that in many
parallel runtimes, one cannot make any assumptions about the ordering of
various logical tasks. This nondeterminism creates an opportunity for the
compiler: since any ordering is valid, the compiler can also reorder tasks
when it believes doing so is beneficial.
This talk will discuss how the compiler can take advantage of this
nondeterminacy through a number of example optimizations, taking a look at
their theoretical implications as well as how they perform when implemented
atop the Tapir extension to LLVM.
|
Diamond Sponsors:
Platinum Sponsors:
Gold Sponsors:
Corporate Supporters
Thank you to our sponsors!
|