Posted 20 Feb 2017
We are happy to announce that the list of accepted sessions is now available and can be browsed below. The schedule can be found here.
Special thanks to all authors who submitted a proposal, as well as to the program committee members who reviewed the proposals in time!
Keynotes
Argonne National Laboratory Keynote
LLVM for the future of Supercomputing - [pdf] [video]
LLVM is solidifying its foothold in high-performance computing, and as we look forward toward the exascale computing era, LLVM promises to be a cornerstone of our programming environments. In this talk, I'll discuss several of the ways in which we're working to improve LLVM in support of this vision. Ongoing work includes better handling of restrict-qualified pointers [2], optimization of OpenMP constructs [3], and extending LLVM's IR to support an explicit representation of parallelism [4]. We're exploring several ways in which LLVM can be better integrated with autotuning technologies, how we can improve optimization reporting and profiling, and a myriad of other ways we can help move LLVM forward. Much of this effort is now a part of the US Department of Energy's Exascale Computing Project [1]. This talk will start by presenting the big picture, in part discussing goals of performance portability and how those map into technical requirements, and then discuss details of current and planned development.
Max Planck Institute for Software Systems (MPI-SWS) Keynote
Weak Memory Concurrency in C/C++11 and LLVM - [pdf] [video] Which compiler optimizations are correct in a concurrent setting? How should C/C++11 atomics be compiled on architecture X? The answers to these questions are not unique, but depend very much on the concurrency model of the programming language and/or compiler. While such a model can act as the gold standard and be used to answer these questions, it is very challenging to define an appropriate concurrency model for almost any programming language. In this talk, I will focus on the C/C++11 concurrency model and the closely related LLVM model. I will discuss some of the serious flaws that we found in these models, ways of correcting them, and some remaining open problems.
Technical Talks
Apple Technical Talk
Adventures in Fuzzing Instruction Selection - [pdf] [video]
Recently there has been a lot of work on GlobalISel, which aims to entirely replace the existing instruction selectors for LLVM. In order to approach such a transition, we need an effective way to test instruction selection and evaluate the new selector compared to the older ones.
ARM Technical Talk
ARM Code Size Optimisations - [pdf] [video]
Over the last year, we have made considerable ARM code size optimisations in LLVM, as that is an area where LLVM was lacking; see also e.g. Samsung's and Intel's EuroLLVM talks. In this presentation, we want to present lessons learned and insights gained from our work, leading to about 200 commits. The areas we identified as most important for code size are: I) turning off specific optimisations when optimising for size, II) tuning optimisations, III) constants, and IV) bit twiddling.
Intel Technical Talk
AVX-512 Mask Registers Code Generation Challenges in LLVM - [pdf] [video]
In recent years LLVM has been extended to support Intel AVX-512 [1] [2] instructions. One of the features introduced by the AVX-512 architecture is the concept of masked operations. At the EuroLLVM 2015 developer meeting Intel presented the new masked vector intrinsics, which assist LLVM IR optimizations (e.g. the Loop Vectorizer) in selecting vector masked operations [3].
Oracle Technical Talk
Clank: Java-port of C/C++ compiler frontend - [pdf] [video]
Clang was written in a way that allows it to be used inside IDEs as a provider for various things - from navigation and code completion to refactorings. But is it possible to use it with a modern IDE written in pure Java? Our team spent some time porting Clang to Java and got "Clank - the Java equivalent of native Clang". We will tell you why we failed to use native Clang, how the porting to Java was done, what difficulties we faced, and what outcome we have at this point.
Ericsson Ltd. and Eötvös Loránd University, Faculty of Informatics, Dept of Programming Languages and Compilers Technical Talk
CodeCompass: An Open Software Comprehension Framework - [pdf] [video]
Bugfixing or new feature development requires a confident understanding of all details and consequences of the planned changes. For long-existing large telecom systems, where the code base has been developed and maintained for decades by fluctuating teams, the original intentions are lost, the documentation is untrustworthy or missing, and the only reliable information is the code itself. Code comprehension of such large software systems is an essential, but usually very challenging task. As the method of comprehension is fundamentally different from writing new code, ordinary development tools do not perform well. Over the years, programs of varying complexity and feature sets have been developed for code comprehension, but none of them fulfilled all requirements.
Ericsson and ELTE Technical Talk
Cross Translational Unit Analysis in Clang Static Analyzer: Prototype and Measurements - [pdf] [video] Today the Clang Static Analyzer [4] can perform (context-sensitive) interprocedural analysis for C, C++ and Objective-C files by inlining the called function into the caller's context. This means that the full calling context (assumptions about the values of function parameters, global variables) is passed when analyzing the called function, and then the assumptions about the returned value are passed back to the caller. This works well for function calls within a translation unit (TU), but when the symbolic execution reaches a function that is implemented in another TU, the analyzer engine skips the analysis of the called function definition. In particular, assumptions about references and pointers passed as function parameters get invalidated, and the return value of the function will be unknown. Losing information this way may lead to false positive and false negative findings. The cross translation unit (CTU) feature allows the analysis of called functions even if the definition of the function is external to the currently analyzed TU. This allows the detection of bugs in library functions stemming from incorrect usage (e.g. a library assumes that the user will free a memory block allocated by the library), and allows for more precise analysis of the caller in general when a TU-external function is invoked (by not losing assumptions). We implemented the Cross Translation Unit analysis feature for Clang SA (4.0), based on the prototype by A. Sidorin et al. [2], and evaluated its performance on various open source projects. In our presentation, we show that by using the CTU feature we found many new true positive reports and eliminated some false positives in real open source projects. We show that while the total analysis time increases by 2-3 times compared to the non-CTU analysis time, the execution remains scalable in the number of CPUs. We also point out changes in analysis coverage that may lead to the loss of some reports compared to the non-CTU baseline version.
Sony Interactive Entertainment (SIE) Technical Talk
Delivering Sample-based PGO for PlayStation(R)4 (and the impact on optimized debugging) - [pdf] [video]
Users of the PlayStation(R)4 toolchain have a number of expectations from their development tools: good runtime performance is vitally important, as is the ability to debug fully optimized code. The team at Sony Interactive Entertainment has been working on delivering a Profile Guided Optimization solution to our users to allow them to maximize their runtime performance. First we provided instrumentation-based PGO, which has been successfully used by a number of our users. More recently we followed this up by also providing a Sample-based PGO approach, built upon the work of, and in collaboration with, the LLVM community, and integrated with the PS4 SDK's profiling tools for a simple and seamless workflow.
Compiler Design Lab, Saarland University; German Research Center for Artificial Intelligence (DFKI); Intel Visual Computing Institute, Saarland University Technical Talk
Effective Compilation of Higher-Order Programs - [pdf] [video]
Many modern programming languages support both imperative and functional idioms. However, state-of-the-art SSA-based intermediate representations like LLVM's cannot natively represent crucial functional concepts like higher-order functions. On the other hand, functional intermediate representations like GHC's Core employ explicit scope nesting, which is cumbersome to maintain across certain transformations.
Azul Systems Technical Talk
Expressing high level optimizations within LLVM - [pdf] [video]
At Azul we are building a production quality, state of the art LLVM-based JIT compiler for Java. Originally targeted at C and C++, the LLVM IR is a rather low-level representation, which makes it challenging to represent and utilize high-level Java semantics in the optimizer. One approach is to perform all the high-level transformations over another IR before lowering the code to the LLVM IR, as is done in the Swift compiler. However, this involves building a new IR and related infrastructure. In our compiler we have opted to express all the information we need in the LLVM IR instead. In this talk we will outline the embedded high-level IR which enables us to perform high-level Java-specific optimizations over the LLVM IR. We will show the optimizations built on top of it and discuss some pros and cons of the approach we chose.
Max Planck Institute for Software Systems (MPI-SWS) Technical Talk
Formalizing the Concurrency Semantics of an LLVM Fragment - [pdf] [video]
The LLVM compiler follows closely the concurrency model of C/C++ 2011, but with a crucial difference. While in C/C++ a data race between a non-atomic read and a write is declared to be undefined behavior, in LLVM such a race has defined behavior: the read returns the special `undef' value. This subtle difference in the semantics of racy programs has profound consequences on the set of allowed program transformations, but it has not been formally studied before.
Intel Technical Talk
Introducing VPlan to the Loop Vectorizer - [pdf] [video]
This talk describes our efforts to refactor LLVM’s Loop Vectorizer following the RFC posted on the llvm-dev mailing list [1] and the presentation delivered at LLVM-US 2016 [2]. We describe the design and initial implementation of VPlan, which models the vectorized code and drives its transformation.
IBM Technical Talk
LLVM performance optimization for z Systems - [pdf] [video]
Since we initially added support for the IBM z Systems line of mainframe processors back in 2013, one of the main goals of ongoing LLVM back-end development work has been to improve the performance of generated code.
University of Illinois at Urbana-Champaign Technical Talk
LLVMTuner: An Autotuning Framework for LLVM [video] We present LLVMTuner, an autotuning framework targeting whole-program autotuning (instead of just small computation kernels). LLVMTuner significantly speeds up the search by extracting the hottest top-level loop nests into separate LLVM modules, along with private copies of the functions most frequently called from each such loop nest, and individually applying a search strategy to optimize each extracted module.
AMD Technical Talk
Path Invariance Based Partial Loop Un-switching - [pdf] [video] Loop un-switching is a well-known compiler optimization technique: it moves a conditional inside a loop outside of it by duplicating the loop's body and placing a version of it inside each of the if and else clauses of the conditional. Loop un-switching is inhibited in cases where a condition inside a loop is not loop-invariant, or is invariant along some of the conditional paths inside the loop but not along all of them. We propose here a novel, efficient technique to identify such partially invariant cases and optimize them using partial loop un-switching.
Swedish Institute of Computer Science and KTH Royal Institute of Technology Technical Talk
Register Allocation and Instruction Scheduling in Unison - [pdf] [video]
This talk presents Unison - a simple, flexible and potentially optimal tool that solves register allocation and instruction scheduling simultaneously. Unison is integrated with LLVM's code generator and can be used as a complement to the existing heuristic algorithms.
ARM and Poznan University of Technology Technical Talk
SPIR-V infrastructure and its place in the LLVM ecosystem - [pdf] [video]
SPIR-V is a new portable intermediate representation for parallel computing designed by the Khronos Group. Although its predecessor, SPIR, was based on the LLVM IR, there are many differences between the formats and the communities behind them.
Solid Sands B.V. Technical Talk
Using LLVM for Safety-Critical Applications [video]
Would you step into a car if you knew that the software for the brakes was compiled with LLVM? The question is not academic. Compiled code is used today for many of the safety-critical components in modern cars. For the development of autonomous driving systems, the car industry demands safety qualified, high performance compilers to compile image and radar signal processing libraries written in C++, among other things. Fortunately, there are international standards such as ISO 26262 that describe the requirements for electronic components, and their software, to be used in safety-critical systems.
SAP SE Technical Talk
Using LLVM in a scalable, highly available, in-memory database server - [pdf] [video]
In this presentation we would like to show you how we at SAP are using LLVM within our HANA database. We will show the benefits we gain from using LLVM, as well as the specific challenges of working in an in-memory database server. We will also explain the changes we had to make to the LLVM source, and why there is a significant delay before we can move to the latest LLVM version.
Google Inc. Technical Talk
XLA: Accelerated Linear Algebra [video] We'll introduce XLA, a domain-specific optimizing compiler and runtime for linear algebra. XLA compiles a graph of linear algebra operations to LLVM IR and then uses LLVM to compile the IR to CPU or GPU executables. We integrated XLA into TensorFlow, and XLA sped up a variety of internal and open-source TensorFlow benchmarks by up to 4.7x, with a geometric mean of 1.4x.
Student Research Competition (SRC)
CEA and LIP6 - Université Paris VI Student Research Competition (SRC)
Automated Combination of Tolerance and Control Flow Integrity Countermeasures against Multiple Fault Attacks - [pdf] [video]
Fault injection attacks are considered one of the most fearsome threats against secure embedded systems. Existing software countermeasures are applied either at the source code level, where care must be taken to prevent the compiler from altering the countermeasure during compilation, or at the assembly code level, where the code lacks semantic information, which limits the possibilities of code transformation and leads to significant overheads. Moreover, to protect against various fault models, countermeasures are usually applied incrementally, without taking into account the impact one can have on another.
University of Muenster and University of Edinburgh Student Research Competition (SRC)
Bringing Next Generation C++ to GPUs: The LLVM-based PACXX Approach - [pdf] [video]
In this paper, we describe PACXX -- our approach for programming Graphics Processing Units (GPUs) in C++. PACXX is based on Clang and LLVM and allows compiling arbitrary C++ code for GPU execution. PACXX enables developers to use all the convenient features of modern C++14: type deduction, lambda expressions, and algorithms from the Standard Template Library (STL). Using PACXX, a GPU program is written as a single C++ program, rather than as two distinct host and kernel programs as in CUDA or OpenCL. Using LLVM's just-in-time compilation capabilities, PACXX generates efficient GPU code at runtime.
Università della Svizzera Italiana Student Research Competition (SRC)
Data Reuse Analysis for Automated Synthesis of Custom Instructions in Sliding Window Applications - [pdf] [video]
The efficiency of accelerators supporting complex instructions is often limited by their input/output bandwidth requirements. To overcome this bottleneck, we herein introduce a novel methodology that, following a static code analysis approach, harnesses data reuse between multiple iterations of loop bodies to reduce the amount of data transfers. Our methodology, building upon the features offered by the LLVM-Polly framework, enables the automated design of fully synthesisable and highly efficient accelerators. Our approach is targeted towards sliding window kernels, which are employed in many applications in the signal and image processing domain.
University of Cambridge Student Research Competition (SRC)
ELF GOT Problems? CFI Can Help. Control-Flow Integrity (CFI) techniques make the deployment of malicious exploits harder by constraining the control flow of programs to that of a statically analyzed control-flow graph (CFG). Enforcing this is harder when position-independent dynamically shared objects are compiled separately and then linked together only at runtime by a dynamic linker. Deploying CFI only on statically linked objects ensures that control flow enters only the correct procedure linkage table (PLT) entry, not where that trampoline jumps to; it leaves a weak link at the boundaries of shared objects that attackers can use to gain control. We show that manipulation of the PLT GOT has a long history of exploitation, and is still being used today against real binaries - even with state of the art CFI enforcement. PLT-CFI is a CFI implementation for the ELF dynamic-linkage model, designed to work alongside existing CFI implementations that ensure correct control flow within a single dynamic shared object (DSO). We make modifications to the LLVM stack to insert dynamic checks into the PLT that ensure correct control flow even in the presence of an unknown base address of a dynamic library, while maintaining the ability to link in a lazy fashion and allowing new implementations (e.g., plug-ins) to be loaded at runtime. We make only minor ABI changes, and still offer full backwards compatibility with binaries compiled without our scheme. Furthermore, we deployed our CFI scheme for both AMD64 and AArch64 on the FreeBSD operating system and measured performance.
Stanford University Student Research Competition (SRC)
LifeJacket: Verifying Precise Floating-Point Optimizations in LLVM - [pdf] [video] Users depend on correct compiler optimizations but floating-point arithmetic is difficult to optimize transparently. Manually reasoning about all of floating-point arithmetic’s esoteric properties is error-prone and increases the cost of adding new optimizations. We present an approach to automate reasoning about precise floating-point optimizations using satisfiability modulo theories (SMT) solvers. We implement the approach in LifeJacket, a system for automatically verifying precise floating-point optimizations for the LLVM assembly language. We have used LifeJacket to verify 43 LLVM optimizations and to discover eight incorrect ones, including three previously unreported problems. LifeJacket is an open source extension of the Alive system for optimization verification.
University of Cambridge Student Research Competition (SRC)
Software Prefetching for Indirect Memory Accesses - [pdf] [video]
Many modern data processing and HPC workloads are heavily memory-latency bound. A tempting proposition to solve this is software prefetching, where special non-blocking loads are used to bring data into the cache hierarchy just before it is required. However, such prefetches are difficult to insert in a way that effectively improves performance, and techniques for automatic insertion are currently limited.
Lightning Talks
Università della Svizzera Italiana Lightning Talk
ClrFreqPrinter: A Tool for Frequency Annotated Control Flow Graphs Generation - [pdf] [web] [video] Recent LLVM distributions offer the option to print the Control Flow Graph (CFG) of functions at the Intermediate Representation (IR) level. This feature is fairly useful, as it enables the visualization of the CFG of a function, thus providing a better overview of the control flow among the Basic Blocks (BBs). On many occasions, though, more information than that is needed in order to quickly obtain an adequate high-level view of the execution of a function. One such desired attribute, which can lead to a better understanding, is the execution frequency of each Basic Block. We have developed our own LLVM analysis pass which makes use of the BB Frequency Info analysis pass methods, as well as the profiling information gathered by the use of the llvm-profdata tool. Our analysis pass gathers the execution frequency of each BB in every function of an application. Subsequently, the other part of our toolchain, exploiting the default LLVM CFG printer, makes use of this data and assigns a specific colour to each BB in the CFG of a function. The colour scheme was inspired by a typical weather map, as can be seen in Figure 1. An example of the generated colour-annotated CFG of a jpeg function can be seen in Figure 2. Our tool, ClrFreqPrinter, can be applied to any benchmark and can be used to provide instant intuition regarding the execution frequency of BBs inside a function - a feature that can be useful for any developer or researcher working with the LLVM framework.
Sony Interactive Entertainment (SIE) Lightning Talk
DIVA (Debug Information Visual Analyzer) - [pdf] [video] In this lightning talk, Phillip will present DIVA (Debug Information Visual Analyzer). DIVA is a new command line tool that processes DWARF debug information contained within ELF files and prints the semantics of that debug information. The DIVA output is designed to be understandable by software programmers without any low-level compiler or DWARF knowledge; as such, it can be used to report debug information bugs to the compiler provider. DIVA's output can also be used as the input to DWARF tests, to compare the debug information generated from multiple compilers, from different versions of the same compiler, from different compiler switches, and from the use of different DWARF specifications (i.e. DWARF 3, 4 and 5). DIVA will be open sourced in 2017 to be used in the LLVM project to test and validate the output of clang, to help improve the quality of the debug experience.
Sony Interactive Entertainment (SIE) Lightning Talk
Generalized API checkers for the Clang Static Analyzer - [pdf] [video] I present three modified API checkers that use external metadata to warn about improper function calls. We aim to upstream these checkers to replace existing hard-coded data and duplicated code. The goal is to allow anyone to check any API, using the Static Analyzer as a black box.
Red Hat Lightning Talk
LibreOffice loves LLVM - [pdf] [video]
LibreOffice (with its StarOffice/OpenOffice.org ancestry) is one of the behemoths in the open source C++ project zoo. On the one hand, we are always looking for tools that help us keep its code in shape and maintainable. On the other hand, the sheer size of the code base and its diversity are a welcome test bed for any tool to run against. Whatever clever static analysis feat you come up with, you'll be sure to find at least one hit in the LibreOffice code base.
Heidelberg Institute for Theoretical Studies (HITS) and KTH Royal Institute of Technology Lightning Talk
LLVM AMDGPU for High Performance Computing: are we competitive yet? [pdf] [web] [video]
Advances in the AMDGPU LLVM backend and the radeonsi Gallium compute stack for Radeon Graphics Core Next (GCN) GPUs have closed the feature gap between the open source and proprietary drivers. During 2016, we collaborated with AMDGPU developers to make GROMACS, a popular open source OpenCL-accelerated scientific software package for simulating molecular dynamics, run on Radeon GPUs using the Mesa graphics library, libclc, the Clang OpenCL compiler, and the AMDGPU LLVM backend. This is the first fully open source OpenCL stack that has ever run GROMACS, and possibly any similarly popular scientific software.
Poznan University of Technology Lightning Talk
Simple C++ reflection with a Clang plugin [video]
Static and dynamic reflection is a mechanism that can be used for various purposes: serialization of arbitrary data structures, scripting, remote procedure calls, etc. Currently, the C++ programming language lacks a standard solution for it, but it is not that difficult to implement a simple reflection framework as a library with a custom Clang plugin.
BoFs
AbsInt Angewandte Informatik GmbH BoF
Etherpad
Ericsson BoF
Clangd: A new Language Server Protocol implementation leveraging Clang - [pdf]
Etherpad
Linaro and ARM BoF
Etherpad
BoF
Etherpad
Posters
Politehnica University of Bucharest Poster
A Source-to-Source Vectorizer for the Connex SIMD Accelerator
We present the implementation of a CPU-portable automatic vectorization technique using the LLVM compiler, for the Connex SIMD processor. We achieve host-independent vectorization by using Opincaa, a runtime C++ assembler library for Connex. Source-to-source transformation is achieved by recovering C++ from the LLVM IR and replacing the vectorized loops in the source program with the compiled Opincaa Connex kernel code. Opincaa also allows assembling immediate operands at runtime from symbolic expressions, making it possible to run more expressive programs on the accelerator.
German Research Center for Artificial Intelligence (DFKI); Intel Visual Computing Institute, Saarland University; Bonn-Rhein-Sieg University of Applied Sciences; Compiler Design Lab, Saarland University Poster
AnyDSL: A Compiler-Framework for Domain-Specific Libraries (DSLs) - [pdf]
AnyDSL is a framework for the rapid development of domain-specific libraries (DSLs). AnyDSL's main ingredient is its intermediate representation, Thorin. In contrast to other intermediate representations, Thorin features certain abstractions which make it possible to maintain domain-specific types and control flow. On these grounds, a DSL compiler gains two major advantages:
ARM Ltd Poster
Binary Instrumentation of ELF Objects on ARM
Often application source code is not available to compiler engineers, which can make program analysis more difficult. Binary instrumentation is a process of binary modification, where code is inserted into an already existing binary, which can help in understanding how the program performs. We have created an LLVM-based binary instrumenter, building upon llvm-objdump, to enable us to gather static and runtime information about ELF binaries.
Ericsson and Eötvös Loránd University, Faculty of Informatics, Dept of Programming Languages and Compilers Poster
CodeCompass: An Open Software Comprehension Framework
Bugfixing or new feature development requires a confident understanding of all details and consequences of the planned changes. For long-existing large telecom systems, where the code base has been developed and maintained for decades by fluctuating teams, the original intentions are lost, the documentation is untrustworthy or missing, and the only reliable information is the code itself. Code comprehension of such large software systems is an essential, but usually very challenging task. As the method of comprehension is fundamentally different from writing new code, ordinary development tools do not perform well. Over the years, programs of varying complexity and feature sets have been developed for code comprehension, but none of them fulfilled all requirements.
National Tsing-Hua University Poster
Hydra LLVM: Instruction Selection with Threads - [pdf]
With the rise of program complexity and specific use cases like JIT (Just-In-Time) compilation, compilation speed has become more and more important in recent years.
Center for Scientific Computing, Deutsches Klimarechenzentrum, Universität Hamburg Poster
Intelligent selection of compiler options to optimize compile time and performance - [pdf]
The efficiency of the optimization process during compilation is crucial for the later execution behavior of the code. The achieved performance depends on the hardware architecture and the compiler's capabilities to extract this performance.
Inria and Lirmm Poster
LLVM-based silent stores optimization to reduce energy consumption on STT-RAM cache memory For the last few decades, energy consumption has become a significant metric for designers to take into account while developing high-performance systems and embedded systems. In on-chip architectures, the memory system, including processor caches, is an important contributor to energy consumption due to traditional memory technologies. New non-volatile memories are emerging with notable features and appear to be an interesting memory technology for on-chip cache memory. However, they suffer from high write latency and energy consumption. This makes them less favorable for first-level caches such as the L1 cache, compared to the usual SRAM memory. In this paper, we propose a compiler approach to attenuate the cost of write operations in an architecture that integrates magnetic memory such as the Spin Transfer Torque Random Access Memory (STT-RAM) technology for the L1 cache. We present an LLVM optimization to reduce the number of silent stores in memory, thereby mitigating the number of write transactions on STT-RAM memory. The results show the promising impact of our optimization on the total energy consumption of a cache.
KTH Royal Institute of Technology and Swedish Institute of Computer Science Poster
Modeling Universal Instruction Selection Instruction selection implements a program under compilation by selecting processor instructions, and has a tremendous impact on the performance of the code generated by a compiler. We have introduced a graph-based universal representation that unifies data and control flow for both programs and processor instructions. The representation is the essential prerequisite for a constraint model for instruction selection introduced in this paper. The model is demonstrated to be expressive in that it supports many processor features that are out of reach of state-of-the-art approaches, such as advanced branching instructions, multiple register banks, and SIMD instructions. The resulting model can be solved for small to medium size input programs and sophisticated processor instructions, and is competitive with LLVM in code quality. The model and representation are significant due to their expressiveness and their potential to be combined with models for other code generation tasks.
Argonne National Laboratory Poster
Preparing LLVM for the Future of Supercomputing LLVM is solidifying its foothold in high-performance computing, and as we look forward toward the exascale computing era, LLVM promises to be a cornerstone of our programming environments. In this talk, I'll discuss several of the ways in which we're working to improve LLVM in support of this vision. Ongoing work includes better handling of restrict-qualified pointers [2], optimization of OpenMP constructs [3], and extending LLVM's IR to support an explicit representation of parallelism [4]. We're exploring several ways in which LLVM can be better integrated with autotuning technologies, how we can improve optimization reporting and profiling, and a myriad of other ways we can help move LLVM forward. Much of this effort is now a part of the US Department of Energy's Exascale Computing Project [1]. This talk will start by presenting the big picture, in part discussing goals of performance portability and how those map into technical requirements, and then discuss details of current and planned development.