2022 LLVM Developers' Meeting
Visit the official event site for Registration: https://llvm.swoogo.com/2022devmtg/2359289
The LLVM Developers' Meeting is a bi-annual gathering of the entire LLVM Project community. The conference is organized by the LLVM Foundation and many volunteers within the LLVM community. Developers and users of LLVM, Clang, and related subprojects will enjoy attending interesting talks, impromptu discussions, and networking with the many members of our community. Whether you are new to the LLVM project or a long-time member, there is something for each attendee.
Where do you see the LLVM project 10 years from now? Intermediate representation (IR) plays a central role in this question. LLVM IR can be represented on top of MLIR's data structures, but in practice it uses its own data structures. That creates a barrier in compilation pipelines and has other downsides. Is there hope for unification on a single set of data structures? How can we move towards such a goal? Let me show you a framework for thinking about these questions and some concrete ideas for how we can move in the right direction.
Unlike its peer languages, Swift has made the deliberate decision to embrace a stable Application Binary Interface (ABI) along with native code compilation, such that separately-compiled software modules can evolve independently without breaking binary compatibility. Come learn about the impact that a stable ABI has on the design of a programming language and its implementation in LLVM.
The HLSL programming language has a rich library of built in types that model semantics which can't be written in HLSL. Clang's implementation of HLSL leverages existing extensions and abstractions with a few tweaks here and there to implement the unimplementable datatypes in valid Clang ASTs.
An alternative debug information representation for LLVM is proposed, which removes classes of redundant representations of semantically equivalent expressions and makes expression evaluation context-free. These changes open the possibility of general support for heterogeneous architectures, as well as more aggressive optimizations.
An introduction to the Reference Types and Garbage Collection proposal along with what already has been upstreamed and how we propose to integrate the trickier bits into Clang/LLVM.
Modified Condition/Decision Coverage (MC/DC) is a comprehensive code coverage criterion that is extremely useful in weeding out hidden bugs and guaranteeing robustness. MC/DC is very handy for average developers as well as those in the safety-critical embedded Industrial, Automotive, and Aviation markets where it is required. In this talk, I will show how we extended LLVM’s Source-based Code Coverage infrastructure to support MC/DC by tracking test vectors, which represent the sequential true/false evaluation of conditions through a boolean expression.
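To make the idea of test vectors concrete, here is a minimal Python sketch of the unique-cause form of MC/DC. It is an illustration only and not the LLVM implementation: the example decision `(a and b) or c`, the helper names `independence_pairs` and `mcdc_covered`, and the choice to ignore short-circuit evaluation are all assumptions made for this sketch.

```python
from itertools import product

def decision(a, b, c):
    # Hypothetical decision with three conditions.
    return (a and b) or c

def independence_pairs(fn, n):
    """For each condition index, collect pairs of test vectors that differ
    only in that condition and produce different decision outcomes."""
    pairs = {i: [] for i in range(n)}
    for vec in product([False, True], repeat=n):
        for i in range(n):
            if vec[i]:
                continue  # start from the False side so each pair is listed once
            flipped = vec[:i] + (True,) + vec[i + 1:]
            if fn(*vec) != fn(*flipped):
                pairs[i].append((vec, flipped))
    return pairs

def mcdc_covered(fn, n, executed):
    """Report, per condition, whether the executed test vectors contain at
    least one independence pair for that condition."""
    executed = set(executed)
    pairs = independence_pairs(fn, n)
    return {
        i: any(p in executed and q in executed for p, q in pairs[i])
        for i in range(n)
    }
```

Under these assumptions, the four vectors `(F,T,F)`, `(T,T,F)`, `(T,F,F)`, and `(T,F,T)` are enough to show that each of the three conditions independently affects the outcome, which matches the usual observation that MC/DC can often be reached with about one more test than there are conditions.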
Many of us have broken a Buildbot at least once, but do you know what goes into running them? Why are there so many configurations and who are the people behind it all? Attend this talk to see behind the scenes of one of the largest providers of LLVM Buildbots.
What if we could know the complete and reproducible artifact tree for every binary executable, shared object, container, etc – including all its dependencies – and we could efficiently cross-reference that against a database of known vulnerabilities before deployment? If we had had that information, could we have remediated vulnerabilities such as Log4Shell faster? Might it even help open-source maintainers identify at-risk dependencies sooner? GitBOM is an open-source initiative to construct a verifiable Artifact Dependency Graph (ADG) and enable automatic, verifiable artifact resolution. In this talk, we will explain GitBOM and demonstrate a use case on CVE detection using llvm-gitbom. Given a version of OpenSSL, we will show how we detect whether this version has any unfixed vulnerabilities and which, if any, have been fixed in that version.
We propose and build a framework that executes CUDA programs on non-NVIDIA devices without relying on any other programming languages. In particular, compared with existing CUDA-on-CPU frameworks, our framework achieves the highest coverage and performance on X86, AArch64, and RISC-V.
We present a definition of thread convergence that is reasonable for targets that execute threads in groups (e.g., GPUs). This is accompanied by a definition of uniformity (i.e., when do different threads compute the same value), and a *uniformity analysis* that extends the existing divergence analysis to cover irreducible control-flow.
In this presentation, we introduce a Content-Addressable Storage (CAS) library for LLVM and use it to create a compilation caching system for Clang. We isolate the functional computations from filesystem and execution environment and model input discovery explicitly, caching computations based on explicit inputs from the CAS. We increase cache hits between related compiler invocations by caching fine-grained actions/requests that prune and canonicalize their inputs. We also explore modeling object file contents, such as debug information, as a CAS graph, in order to deduplicate and reduce the redundancy in the output format, thus reducing the storage cost for the cached compilation artifacts.
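The caching idea in this abstract can be sketched in a few lines of Python. This is a toy model, not the LLVM CAS library: the `CAS` and `ActionCache` classes, the `canonicalize` step, and the `fake_compile` action are hypothetical names invented for illustration, and SHA-256 is simply an assumed hash.

```python
import hashlib

class CAS:
    """Toy content-addressable store: objects are keyed by the hash of their bytes."""
    def __init__(self):
        self._objects = {}

    def store(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._objects[digest] = data
        return digest

    def load(self, digest: str) -> bytes:
        return self._objects[digest]

class ActionCache:
    """Caches an action's result keyed by the action name and its explicit CAS inputs."""
    def __init__(self, cas: CAS):
        self.cas = cas
        self._results = {}
        self.misses = 0

    def run(self, action_name, fn, *input_ids):
        key = hashlib.sha256("\0".join((action_name,) + input_ids).encode()).hexdigest()
        if key not in self._results:
            self.misses += 1
            inputs = [self.cas.load(i) for i in input_ids]
            self._results[key] = self.cas.store(fn(*inputs))
        return self.cas.load(self._results[key])

def canonicalize(src: bytes) -> bytes:
    # Hypothetical pruning step: drop blank lines and trailing whitespace so
    # that trivially different inputs map to the same CAS object.
    lines = [ln.rstrip() for ln in src.decode().splitlines()]
    return "\n".join(ln for ln in lines if ln).encode()

def fake_compile(src: bytes) -> bytes:
    # Stand-in for a real compilation action.
    return b"OBJ:" + src.upper()
```

Because the cache key depends only on the action name and the identifiers of canonicalized inputs, two invocations whose sources differ only in whitespace share a single cached result in this sketch.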
In this talk, we will present a direct GPU compilation scheme that leverages the portable target offloading interface provided by LLVM/OpenMP. Utilizing this infrastructure allows us to compile an existing host application for the GPU and execute it there with only a minimal wrapper layer for the user code, command line arguments, and a compiler-provided GPU implementation of C/C++ standard library functions. The C/C++ library functions are partially implemented for direct device execution and otherwise fall back to remote procedure calls (RPC) to call host functions transparently. Our proposed prototype will allow users to quickly compile for, and test on, the GPU without explicitly handling kernel launches, data mapping, or host-device synchronization. We will demonstrate our implementation using three proxy applications with host OpenMP parallelism and three microbenchmarks to test the correctness of our prototype GPU compilation.
Modern mobile applications have grown rapidly in binary size, which restricts user growth and updates for existing users. Thus, reducing the binary size is important for application developers.
In this paper, we propose several novel optimization techniques that do not require significant customization to the build pipeline and reduce binary size with low build time overhead. As opposed to re-invoking the compiler during link time, we perform true linker optimization directly as optimization passes within the linker. The proposed optimizations are generic and could be incorporated into popular linkers as optimization passes.
Minotaur is a synthesis-based superoptimizer for the LLVM intermediate representation that focuses on optimizing LLVM’s portable vector operations as well as intrinsics specific to the Intel AVX extensions. The goal is to automatically discover transformation rules that are missing from LLVM, which are challenging due to the large number of intrinsics, their semantic complexity, and their counterintuitive costs. Minotaur has found many new transformations for vector instructions. We have evaluated Minotaur on various micro-benchmarks and real-world applications such as the GNU MP library. The micro-benchmarks optimized by Minotaur show speedups up to 1.4x, and the real-world applications show speedups up to 1.06x.
During early optimization passes, compilers often need to estimate hardware characteristics that become available only later, such as execution unit utilization, number of register spills, latency, and throughput. Often a hand-written static/analytical hardware cost model is built into the compiler, for example, LLVM's TTI. However, the need for more sophisticated and varied predictions has become more pronounced with the development of deep learning compilers, which need to optimize dataflow graphs. Such compilers usually employ a much higher-level MLIR form as their IR. A static/analytical cost model is cumbersome and error-prone for such systems. We develop a machine learning-based cost model for high-level MLIR which can predict different target variables of interest, such as CPU/GPU/xPU utilization, instructions executed, and register usage, by considering the incoming MLIR as a text input a la NLP models. The learnt model can be deployed to estimate the target variable of interest for a new computation graph. We report early work-in-progress results of developing such a model and show that these models can provide reasonably good estimates with low error bounds.
Program analysis has specific requirements for compiler toolchains that are usually unsatisfied. Ideally, an analysis tool would pick the best-fit representation that preserves interesting semantic features. Such a representation would know the precise relationships between low-level constructs in IR and the analyzed source code. LLVM IR is rarely the best fit representation for program analysis. In this talk, we will look at how we can improve the situation using an MLIR infrastructure called VAST. VAST is an MLIR library for multi-level C/C++ representation. With VAST, an analysis does not need to commit to a single best fit. Instead, an analysis can have simultaneous visibility into multiple progressions of the code, from very high-level down to very low-level.
In this talk, we discuss the implementation, upstreaming, and community concerns of adopting LLVM and MLIR within the Lean4 proof assistant, and more broadly, discuss takeaways for MLIR to have strong support for functional programming languages. We walk through the process of creating a new MLIR-based backend for Lean4, a dependently typed programming language. We demonstrate our MLIR dialect (https://arxiv.org/abs/2201.07272), which encodes core functional programming concepts within the SSA style. However, having a fully functional backend is not enough; we discuss the worries around MLIR adoption in the Lean4 community, and the discussions that led to Lean4 choosing to adopt LLVM for the time being. We discuss our current LLVM backend effort for Lean4 (https://github.com/leanprover/lean4/pull/1497), and end with a discussion of how the MLIR community could help with the adoption of MLIR for functional programming languages.
SPIR-V is a binary intermediate language commonly used for GPU computations and targeted by many projects (including OpenCL, OpenMP and SYCL). In this talk, we will discuss what it took to upstream the GlobalISel-based SPIR-V backend, present some of the issues stemming from the high-level design of the language, and explain the steps required to maintain the target in-tree. We will also talk briefly about extensibility, support for other APIs/SPIR-V flavors (e.g. Vulkan), and the ongoing effort to unify methods of lowering builtins across GPU targets.
We present IRDL, a dialect for representing IR definitions. IRDL lets users define dialects in a declarative style, allowing for a dynamic registration of dialects using dynamic dialects, which were recently introduced in MLIR. Additionally, we will present two lower-level dialects, IRDL-SSA and IRDL-Eval, and their respective lowerings, which enable interesting optimizations on the operation verifiers that ODS does not currently handle. We hope that IRDL will simplify the generation of dialects through metaprogramming or external languages, like Python.
We developed an automated bug-finding tool for LLVM’s AArch64 backend. Our prototype, ARM-TV, builds on Alive2, a bounded translation validator for LLVM’s optimization passes. Using ARM-TV, we have discovered and reported 17 new miscompilation bugs in the SelectionDAG and GlobalISel backends, most of which have been fixed. In this talk, we will describe the current state of our prototype and our plans for enhancing the tool.
Is your compiler stack built on LLVM and you're eyeing some of the goodness provided by MLIR, but can't justify rewriting your stack? Then we may have just the project for you! llvm-dialects is an add-on to LLVM that allows you to define dialects and gradually transition to their use within a compiler stack built on LLVM IR.
YARPGen is a generative (as opposed to mutation-based) compiler fuzzer that we developed. It previously focused on testing scalar optimizations, but after a recent substantial upgrade, it now supports a collection of strategies for stress-testing loop optimizations. To achieve this, we ensure that its tests contain optimization prerequisites and interesting data access patterns (e.g., stencils), which are necessary to trigger loop optimizations (e.g., GVN). YARPGen's internal intermediate representation allows us to lower generated tests to C, C++, DPC++, and ISPC. Along with an automated testing system, this new version of YARPGen has discovered more than 120 bugs in compilers such as Clang, GCC, the ISPC compiler, and the DPC++ compiler, in addition to finding a comparable number of bugs in proprietary compilers.
The 64-bit RISC-V target is the only one in-tree that does not have 32-bit sub-registers or i32 as a legal type. Many instructions have forms that sign extend their result by copying bit 31 into bits 63:32. Only loads are able to implicitly zero bits 63:32. Some instructions such as comparisons only operate on all 64 bits and require smaller values to be extended. The ABI also requires 32-bit arguments and return values to be sign extended to 64 bits. Making good use of the implicit sign extending instructions is important to generate optimal assembly from C code where 32-bit integers are prevalent.
This talk will discuss how this differs from other 64-bit targets, how single basic block SelectionDAG makes this difficult, and how optimizations that are good for other 64-bit targets may be harmful for RISC-V. It will cover the optimizations and custom passes that have been added to improve the generated code and ideas for future enhancements.
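The sign-extension behavior described in this abstract can be modeled with a small Python sketch. The functions below are simplified models of the RV64 `ADD` and `ADDW` instructions written for illustration; the helper names `sext32`, `add`, and `addw` are assumptions of this sketch, not LLVM or RISC-V tooling APIs.

```python
MASK64 = (1 << 64) - 1

def sext32(x: int) -> int:
    """Return the 64-bit register value whose bits 63:32 are copies of bit 31 of x."""
    x &= 0xFFFFFFFF
    return (x | 0xFFFFFFFF00000000) if x & 0x80000000 else x

def add(rs1: int, rs2: int) -> int:
    """Simplified model of RV64 ADD: a full 64-bit add."""
    return (rs1 + rs2) & MASK64

def addw(rs1: int, rs2: int) -> int:
    """Simplified model of RV64 ADDW: a 32-bit add whose result is sign-extended."""
    return sext32(rs1 + rs2)

def is_sign_extended_i32(reg: int) -> bool:
    """Check whether a 64-bit register holds a 32-bit value in sign-extended form."""
    return reg == sext32(reg)
```

In this model, adding `0x7FFFFFFF` and `1` with `addw` yields `0xFFFFFFFF80000000`, which is already in the sign-extended form the ABI expects for a 32-bit value, whereas `add` yields `0x80000000` and would need a separate sign extension before being returned or compared as a 32-bit integer.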
In this talk, we will start by showing multi-CPU-architecture IoT malware and why it is challenging to analyze such malware from the perspective of static and dynamic analysis. Then we will talk about how we made cross-architectural malware analysis possible through the LLVM interpreter by lifting the malware to LLVM IR. Next, we will explain a problem that could be a significant hurdle to being a practical analysis tool, namely slow execution, and how we resolved it by inventing execution domain transition. Finally, we will end our talk with a demo of our work.
LLVM's libc is a sanitizer friendly green field libc which will eventually serve as a full drop-in-replacement for the system libc. While it is not yet ready to be a drop-in-replacement, it has enough functionality that one can start using it in their projects and avail themselves of its benefits in production contexts. In this tutorial, we will talk about how we have used modern C++ to implement a sanitizer instrumentable libc which can be easily decomposed and custom tuned. We will also talk about how it is being used in production contexts at Google. There has been a lot of interest in the LLVM community in putting together an LLVM only toolchain. We will demonstrate how one can build and package the libc in order to put together such a toolchain and use it in their projects.
JITLink is a new JIT linker in LLVM developed to eliminate limitations in LLVM's JIT implementation. With JITLink, it is not required to use special compilation flags or workarounds to load code into the JIT, since most object file features, including the small code model and thread-local storage, are fully implemented. This tutorial will explain how to use JITLink by working on a Windows JIT application that just-in-time links to third-party static libraries. The tutorial will also dig into the internals of JITLink by working on a JITLink plugin that manages SEH exception tables.
Machine Learning Guided Optimizations (MLGO) in LLVM
The panel brings together: compiler engineers working on ML-guided optimizations in LLVM, product engineers applying such optimizations in production, and researchers exploring the area.
Panel discussion on Best practices with toolchain release and maintenance
With the proliferation of vendors shipping custom LLVM toolchains, it would be great to bring toolchain distributors together to share their experiences. We’ll focus the discussion on:
Static Analysis in Clang
The Clang ecosystem has multiple static analysis tools. The compiler can produce easy-to-understand error and warning messages. The Clang Static Analyzer (CSA) is capable of finding bugs that span multiple function calls using symbolic execution. Clang Tidy can help modernize large code bases using automatic code rewrites. While there are some out-of-tree Clang-based static analysis tools, CSA and Clang Tidy have been the go-to solutions for the static analysis needs of the community. However, during the last year, a couple of RFCs surfaced on the mailing list to add a dataflow analysis framework to Clang and introduce a new MLIR-based IR. Come and join this panel discussion to learn how to get involved in the ongoing static analysis projects, what the new proposals mean for our loved and proven tools, and what the future holds for static analysis in Clang. You will have the opportunity to ask questions of some of the code owners of these tools and the authors of the new proposals.
High-level IRs for a C/C++ Optimizing Compiler
Most C/C++ optimizing compilers employ multiple intermediate representations (IRs). LLVM IR has been the cornerstone of C/C++ LLVM-based compilers for many years. However, optimizations involving loop nests, data layout, or multidimensional arrays, for example, challenge the existing LLVM infrastructure.
The panelists will discuss higher-level (HL) IRs for optimizing compilers, primarily from a C/C++ and optimization/analysis perspective. We will ask our expert panel to share their experience and insights on:
What optimizations are easier to implement and maintain with HL IR?
Both experts and newcomers are welcome to attend. Send questions to the organizers prior to the conference to allow consideration.
Interested in expanding the LLVM community through education? Interested in better documentation, tutorials, and examples? Interested in sharing your knowledge to help other engineers grow? Come learn about the proposal for a new LLVM Education working group!
BOLT is a post-link optimizer built on top of LLVM. It achieves performance improvements by optimizing an application's code layout based on an execution profile gathered by a sampling profiler, such as the Linux perf tool. When the advanced hardware counters necessary for precise profiling are not available on a target platform, one may collect a profile by instrumenting the binary. In this talk, we will cover the changes essential for enabling instrumentation support in BOLT for a new target platform, using AArch64 as an example.
The string to float conversion functions are deceptively simple. You pass them a string of digits, and they return the floating point value closest to that string. The process of finding that value as quickly as possible is very complex, and in this talk I will describe how the implementation in LLVM’s libc works. The focus will be mainly on the three conversion algorithms used, specifically W.D Clinger’s Fast Path, the Eisel-Lemire fast_float algorithm, and Nigel Tao’s Simple Decimal Conversion. I will explain the overview of how they work and how they fit together to create a complete strto
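As an illustration of the first of these algorithms, the following Python sketch shows the core idea behind Clinger's fast path for double precision. It is not the LLVM libc implementation: the function name `clinger_fast_path` and its `(mantissa, exp10)` interface are hypothetical, and the sketch assumes a non-negative mantissa and that Python floats are IEEE 754 doubles evaluated without extended intermediate precision.

```python
MAX_EXACT_MANTISSA = 1 << 53  # integers up to 2**53 are exactly representable as doubles
MAX_EXACT_POW10 = 22          # 10**22 is the largest power of ten that is exact as a double

def clinger_fast_path(mantissa: int, exp10: int):
    """Return the double nearest to mantissa * 10**exp10, or None when the
    fast path does not apply and a slower algorithm is needed."""
    if mantissa > MAX_EXACT_MANTISSA or abs(exp10) > MAX_EXACT_POW10:
        return None
    m = float(mantissa)            # exact conversion
    p = float(10 ** abs(exp10))    # exact conversion
    # Both operands are exact, so a single correctly rounded multiply or
    # divide yields the correctly rounded result.
    return m * p if exp10 >= 0 else m / p
```

For example, the decimal string `123.45` would be parsed as mantissa `12345` with exponent `-2` and handled by one division, while an input such as `1e23` falls outside the fast path and returns `None` so that a fallback algorithm can take over.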
Bugpoint has long existed to assist in reducing LLVM IR test cases, but there was no equivalent tool for reducing test cases for code generation passes. Recently llvm-reduce gained support for reducing MIR. This talk will discuss the current status and future improvements, the difficulties MIR presents compared to the higher-level IR, and my experience using it to reduce register allocation failures in large test cases.
While we'd all prefer if programs never crashed, the logs captured from those crashes can help troubleshoot bugs and get your program up and running again. At Apple, diagnostic data gets captured into a crash report: a detailed textual representation of the program's state when it crashed. Thanks to the addition of interactive crashlogs, developers can now load crash reports into LLDB and interact with them like a regular lldb session, using all the techniques they're already familiar with to debug the issue.
This talk introduces clang-extract-api, a new tool to collect and serialize API information from header files, that enables downstream tooling, like documentation generation, to inspect API symbols without having to understand the clang AST.
LLVM libc's math routines aim to be both performant and correctly rounded according to the IEEE 754 standard. Modern CPU instruction sets include many useful instructions for mathematical computations. Effectively utilizing these instructions can boost the performance of your math functions' implementations significantly. In this talk, we will discuss how two families of such instructions, fused multiply-add (FMA) and floating point rounding, are used in LLVM's libc for x86-64 and ARMv8 architectures, allowing us to have comparable performance to glibc while achieving accuracy for all rounding modes.
Golang is a very specific language: it compiles to an architecture-specific binary, but also uses its own runtime library, which in turn uses version-specific data structures to support internal mechanisms like garbage collection, scheduling, reflection and others. BOLT is a post-link optimizer – it rearranges code and data locations in the output binary, so Golang-specific tables should also be updated according to the performed modifications. In this talk, we will cover the status of the current implementation of Golang support in BOLT, the achieved optimization effect, and the challenges of enabling optimization of Golang binaries by BOLT.
Inlining for size is critical in mobile apps as app size continues to grow. While a link-time optimization (LTO) largely minimizes the app size at minimum size optimization (-Oz), a scalable link-time optimization (ThinLTO) misses many inline opportunities because each module inliner works independently without modeling the size cost globally. We first show how to use the ModuleInliner with LTO. Then, we describe how to improve inlining with ThinLTO by extending the bitcode summary, followed by a global inline analysis. We also explain how to overcome import restrictions, often appearing in Objective-C or Swift, by pre-merging bitcode modules. We reduced the code size by 2.8% for SocialApp, 4.0% for ChatApp, and 3.0% for Clang, compared to -Oz with ThinLTO.
This short talk provides an example of how a feature newly introduced in real HW can be adopted into Clang and LLVM and thereby made easily available to the user. Indirect Memory Access Instructions (IMAI) can provide significant performance improvements, but their usability is limited by particular HW restrictions. This talk will present how we tried to reconcile HW limitations, the complexity of IMAI, and ease of use by handling a dedicated pragma in Clang and applying Complex Patterns in DAG in the LLVM backend.
Embedded-application systems have limited memory, so user control over placement of functions and variables is important. The programmer uses a linker script to define a memory configuration and specify placement constraints on input sections that contain function and variable definitions. With LTO enabled, it is critical that the compiler incorporate link-time placement information into the LTO recompile (Edler von Koch - LLVM 2017). This talk discusses a compiler and linker implementation that roughly follows the ideas presented in Edler von Koch, highlighting differences in our implementation that offer significant advantages.
LLVM's __builtin_expect, and a variant we recently added, __builtin_expect_with_probability, allow source code control over branch weights and can boost performance with or without PGO via hot/cold splitting. But in LLVM optimization, it's not always intuitive how to update branch weight metadata with control flow changes. We talk about recent issues with losing branch weights in SimplifyCFG and possible improvements to the infrastructure for maintaining branch weights.
In this talk we show that performance portability and interoperability are achievable goals even for existing (HPC) software. Through compiler and runtime augmentation, we can run off-the-shelf CUDA programs efficiently on AMD GPUs and further debug them on the host, all without modifications of the original source code. As a side-effect, a modern LLVM/Clang will provide a compilation environment in which CUDA and OpenMP offload are fully interoperable, allowing the use of both in the same project, even the same kernel, without intrinsic overheads.
In this short talk we will ramble about some of the discrepancies between GPU and CPU targets as well as the accompanying infrastructure. While we briefly mention ongoing efforts to rectify some of the problems, we'll mainly focus on the areas where solutions are sparse and efforts are required.
Fully Homomorphic Encryption (FHE) allows a third party to perform arbitrary computations on encrypted data, learning neither the inputs nor the computation results. However, the complexity of developing an efficient FHE application currently limits deploying FHE in practice. In this talk, we will first present the underlying challenges of FHE development that motivate the development of tools and compilers. We then discuss how MLIR has been used by three different efforts, including one led by us, to significantly advance the state of the art in FHE tooling. While MLIR has brought great benefits to the FHE community, we also want to highlight some of the challenges experienced when introducing the framework to a new domain. Finally, we conclude by discussing how the ongoing efforts could be combined and unified before potentially being up-streamed.
As part of registering for the 2021 LLVM dev meeting, participants were asked to answer a few questions about how the LLVM community could increase engagement and contributions. Out of the 450 people replying, the top 3 issues mentioned were "sometimes people aren't receiving detailed enough feedback on their proposals"; "people are worried to come across as an idiot when asking a question on the mailing list/on record"; "People cannot find where to start; where to find documentation; etc." These were discussed in the community.o workshop at the 2021 LLVM dev meeting, and a summary of that discussion was presented by Adelina Chalmers as a keynote session; see 2021 LLVM Dev Mtg "Deconstructing the Myth: Only real coders contribute to LLVM!? - Takeaways". One of the solutions suggested to help address those top identified barriers from the majority of participants is introducing the concept of "office hours". We have taken steps since then to make "office hours" a reality. In this lightning talk, I will talk about what issues "office hours" is aiming to address; how both newbies and experienced contributors can get a lot of value out of them; and where we are in implementing this concept and how you can help make them as effective as possible.
Fuzzing has been an effective method for testing software. However, even with libFuzzer, the LLVM backend is not sufficiently fuzzed today. The difficulties are twofold. First, we lack a good way to monitor program behavior: edge coverage is not effective when the backend relies heavily on target descriptors, where data flow is more important than control flow. Second, the mutation method is naive and ineffective. We designed a new tool to better fuzz the LLVM backend, and we have found numerous missing features inside AMD. We also found many bugs in upstream LLVM, eight of which have been confirmed and two of which have been fixed.
Interactive programming with Jupyter is a game changer for learning: it gives you the ability to have your code and documentation in one place, always up to date and extendable. See how this is being applied to a core part of LLVM, TableGen, and why we should embrace the concept.
In this talk we demonstrate a shared memory implementation and its performance improvements for most use cases of JITLink. We demonstrate the benefits of a separate executor process on top of the same underlying physical memory. We elaborate how this work will be useful to larger projects such as clang-repl and Cling.
Discrete Fourier Transform (DFT) libraries are one of the most critical software components for scientific computing. Inspired by FFTW, a widely used library for DFT HPC calculations, we apply compiler technologies to the development of HPC Fourier transform libraries. In this work, we introduce FFTc, a domain-specific language, based on Multi-Level Intermediate Representation (MLIR), for expressing Fourier Transform algorithms. FFTc is composed of a domain-specific abstraction level (FFT MLIR dialect), a domain-specific compilation pipeline, and a domain-specific runtime (work in progress). We present the initial design, implementation, and preliminary results of FFTc.
In this talk we outline the PTU-based error recovery capability implemented in Clang and available in Clang-Repl. We explain the challenges in error recovery of templated code. We demonstrate how to extend the error recovery facility to implement restoring the Clang infrastructure to a previous state. We demonstrate the `undo` command available in Clang-Repl and the changes required for its reliability.
We share our experiences with the first steps to implement GlobalISel for the PowerPC target.
We present a framework that allows non-standard floating point reductions in OpenMP, for example to ensure reproducibility, compute roundoff estimates, or exploit sparsity in array reductions.
In this presentation, we talk about the effort to implement type resugaring in Clang. This is an economical way to solve, for the majority of cases, diagnostic issues related to the canonicalization of template arguments during instantiation. The infamous 'std::basic_string' appearing in diagnostics when the user wrote 'std::string' is the classic example.
Using LLVM APIs from a different language than C++ has often been necessary to develop compilers and program analysis tools. However, LLVM headers rely on many C++ features, and most languages do not provide interoperability with C++. As part of the ongoing Swift/C++ interoperability effort, we have been creating Swift bindings for LLVM APIs that feel convenient and natural in Swift, with the purpose of using the bindings to implement parts of the Swift compiler in Swift. In this talk, I will present our current status and what we were able to accomplish so far.
IRPGO has a mode to collect function entry coverage, which can be used for dead code detection. When combined with Lightweight Instrumentation, the binary size and performance overhead should be small enough to be used in a production setting. Unfortunately, when building an instrumented binary with -Oz, the ".text" size overhead is much larger than what we’d expect from the injected instrumentation instructions alone. In fact, even if we block instrumentation for all functions we still get a 15% ".text" size overhead from extra passes added by IRPGO. This talk explores the flags we can use to create a function entry coverage instrumented binary with a ".text" size overhead of 4% or smaller.
We extend Polygeist/MLIR to succinctly represent, optimize, and transpile CPU and GPU parallel programs. Through the use of our new operations (e.g. memory effects-based barrier) and transformations, we can successfully transpile GPU Rodinia and PyTorch benchmarks to efficiently run on the CPU _faster_ than their existing CPU parallel versions.
DWARF expressions describe how to recover the location or value of a variable which has been optimized away. They are expressed in terms of postfix operations that operate on a stack machine. A DWARF program is encoded as a stream of operations, each consisting of an opcode followed by a variable number of literal operands. Some DWARF programs are difficult to interpret and check for correctness in their assembly-language format. Currently, checking a DWARF expression requires the building of an executable with debuginfo and running the executable in a debugger, such as LLDB. We propose and have begun a fun project to construct a small suite of tools to aid in construction and checking of non-trivial DWARF programs.
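To make the stack-machine model concrete, here is a minimal sketch of an evaluator for a handful of arithmetic DWARF operations. It is not the tool suite proposed in the talk. As a simplifying assumption, operations are given as Python tuples of an opcode name and its operands rather than as an encoded byte stream, and the function name `eval_dwarf` is hypothetical.

```python
def eval_dwarf(ops, word_bits=64):
    """Evaluate a tiny subset of DWARF operations on a stack machine.

    `ops` is a list of tuples: (opcode_name, *literal_operands).
    Values wrap modulo 2**word_bits, mimicking a fixed-width target word.
    """
    mask = (1 << word_bits) - 1
    stack = []
    for op, *operands in ops:
        if op == "DW_OP_constu":
            # Push an unsigned constant operand.
            stack.append(operands[0] & mask)
        elif op == "DW_OP_dup":
            # Duplicate the top of the stack.
            stack.append(stack[-1])
        elif op == "DW_OP_plus_uconst":
            # Add an unsigned constant operand to the top of the stack.
            stack.append((stack.pop() + operands[0]) & mask)
        elif op in ("DW_OP_plus", "DW_OP_minus", "DW_OP_mul"):
            top = stack.pop()
            second = stack.pop()
            if op == "DW_OP_plus":
                result = second + top
            elif op == "DW_OP_minus":
                # Subtract the former top from the former second entry.
                result = second - top
            else:
                result = second * top
            stack.append(result & mask)
        else:
            raise ValueError(f"unsupported operation: {op}")
    if not stack:
        raise ValueError("expression left an empty stack")
    # The result of the expression is the value on top of the stack.
    return stack[-1]

# Postfix form of (3 + 4) * 5.
program = [
    ("DW_OP_constu", 3),
    ("DW_OP_constu", 4),
    ("DW_OP_plus",),
    ("DW_OP_constu", 5),
    ("DW_OP_mul",),
]
print(eval_dwarf(program))  # 35
```

Real DWARF expressions also read registers and memory and use a compact binary encoding, which is part of what makes them hard to check by hand.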
The llvm-mca tool performs static performance analysis on basic blocks, and the llvm-mcad tool performs dynamic performance analysis on program traces. These tools allow us to gain insights into how sequences of instructions run on different subtargets.
In this talk, I will discuss the shortcomings of these tools when they are tasked to report on RISC-V programs containing vector instructions, how we have extended these tools to generate more accurate reports for RISC-V vector programs, and how these improved reports can be used to make meaningful improvements to scheduler models and assist performance analysis.
Advanced build configuration with BOLT for faster Clang
GraphCore is a mature and well-documented architecture that features a MIMD execution model. Unlike other players in the market, GraphCore systems are currently available, their compiler infrastructure is based on LLVM, and they allow direct compilation to the device. Furthermore, the Poplar SDK is a C++ library that can be directly used with the current OpenMP Offloading Runtime (i.e. libomptarget). In this short presentation, we describe the strategy we are currently using to explore OpenMP Offloading support for the GraphCore architecture.
Student Technical Talks:
In this talk, we will discuss Control-flow Melding (CFM) and its implementation in LLVM. CFM is a new compiler transformation that exploits both instruction and control-flow similarity to improve performance and reduce code size. CFM uses a hierarchical region and instruction alignment approach to merge common code fragments. CFM is implemented as an LLVM-IR transformation pass and our evaluation suggests its utility in multiple applications.
We developed a new fuzzer, Alive-mutate, that randomly alters an LLVM module and then invokes the Alive2 translation validation tool to see if the mutated module is optimized correctly. Alive-mutate achieves high throughput by avoiding the creation of invalid IR and also by running in the same address space as Alive2, keeping OS-related overhead out of our fuzzing loop. We support 9 different kinds of mutation and have used Alive-mutate to find 23 LLVM bugs including 10 miscompilation bugs in the AArch64 backend and 5 crashes in the instruction combiner.
This talk explores the application of Transformers to learning LLVM IR, which can open up new possibilities in optimization. Low-level program representations like LLVM IR tend to be more verbose than high-level languages in order to precisely specify program behavior and provide more details about microarchitecture, all of which makes them difficult for machine learning. We apply Transformer models to translate from C to both unoptimized (-O0) and optimized (-O1) LLVM IR and discuss various techniques that can boost model effectiveness. On the AnghaBench dataset, our model achieves a 49.57% verbatim match and BLEU score of 87.68 against Clang -O0 and 38.73% verbatim match and BLEU score of 77.03 against Clang -O1.
Automatic differentiation (AD) is a central algorithm in machine learning and optimization. This talk introduces LAGrad, a reverse-mode source-to-source AD system that differentiates tensor operations in the linalg, scf, and tensor dialects of MLIR. LAGrad leverages the value semantics of linalg-on-tensors in MLIR to simplify the analyses required to generate adjoint code that is efficient in terms of both run time and memory consumption. LAGrad also combines AD with MLIR’s type system to exploit structured sparsity patterns such as lower triangular tensors. We compare performance results to Enzyme, a state-of-the-art AD system, on Microsoft’s ADBench suite. Our results show speedups of up to 2x relative to Enzyme and, in some cases, 30x lower memory use.
Posters: (Coming Soon)