The LLVM Compiler Infrastructure
Ninth LLVM Performance Workshop at CGO
  • What: Ninth LLVM Performance Workshop at CGO
  • When: Saturday, March 1st, 2025
  • Where: Las Vegas, Nevada, United States
  • The Westin Las Vegas Hotel & Spa, 160 East Flamingo Road, Las Vegas, Nevada, USA, 89109 [In person]
  • Proposals should be submitted to: Easychair Submission
  • The deadline for receiving submissions is: February 1st, 2025 (extended from January 25th, 2025)
  • Speakers will be notified of acceptance or rejection by: February 2nd, 2025 (previously February 1st, 2025)
  • Note: Travel grants are available to eligible candidates upon request. Please reach out to the program committee if you need a travel grant.
  • Note: Invitation letters for visa application are available upon request. Please reach out to the program committee if you need an invitation letter.

The Ninth LLVM Performance Workshop will be held at CGO 2025. The workshop is co-located with CC, HPCA, and PPoPP. If you are interested in attending the workshop, please register at the CGO website. The LLVM workshop at CGO will be in-person.

Program Committee:

  • Aditya (hiraditya at msn.com)
  • Jose M Monsalve Diaz (jmonsalvediaz at anl.gov)
  • Shilei Tian (i at tianshilei.me)
  • Rafael Andres Herrera Guaitero (rafaelhg at udel.edu)
  • Kevin Sala (ksala at llnl.gov)

Schedule

Time (EDT) | Speaker | Title | Topic
8:30 - 9:30 (60 min) | Jose M Monsalve Diaz, Shilei Tian, Aditya, Rafael A Herrera Guaitero, Kevin Sala | Opening Remarks + RISC-V Work Discussion | Welcome and Introduction
9:30 - 10:00 (30 min) | Corbin Robeck, Yuanwei Fang, Keren Zhou | The Proton Dialect: An MLIR Dialect For AI Compiler GPU Kernel Profiling | Performance Analysis, Optimization, MLIR, GPU Profiling
10:00 - 10:30 (30 min) | - | Coffee Break | -
10:30 - 11:00 (30 min) | Rafael Andres Herrera Guaitero, Joseph B. Manzano Franco, Joshua D. Suetterlein, Xiaoming Li, Andres Marquez | CARTS: Enabling Event-Driven Task and Data Block Compilation for Distributed HPC | HPC, Compiler, OpenMP, MLIR, ARTS
11:00 - 11:30 (30 min) | Baodi Shan, Barbara Chapman, Johannes Doerfert | Fuzzlang: Leveraging Transformers and LLM Agents for Enhanced Compilation Error Repair | LLVM, LLM Agents, Compilation Errors
11:30 - 12:00 (30 min) | Miguel Romero Rosas, Rudolf Eigenmann | Effective Tuning of Automatically Parallelized OpenMP Applications Using Two Classical Optimizing Compilers | Performance Evaluation, Classical Optimizing Compilers, Automatic Tuning
12:00 - 13:00 (60 min) | - | Lunch Break | -
13:00 - 13:30 (30 min) | Yongtai Li, Chunyu Liao, Ji Qiu | Comparative Analysis of Compiler Performance for RISC-V on SPEC CPU 2017 | LLVM, GCC, Compiler Optimization, RISC-V, SPEC CPU 2017, Code Size, Dynamic Instruction Count, Auto-Vectorization
13:30 - 14:00 (30 min) | Kevin Sala, Johannes Doerfert | Instrumentor: Easily Customizable Code Instrumentation based on LLVM | Instrumentation, LLVM, Compiler
14:00 - 14:30 (30 min) | Ehsan Amiri, Rouzbeh Paktinatkeleshteri, Hao Jin, Eric Wang, Jose Nelson Amaral | Container Class Annotations in C++ Improve the Capability of Static Analysis in MLIR | Performance, Data Layout Optimization, Static Analysis, C++, MLIR, LLVM IR
14:30 - 15:00 (30 min) | Mahesh Ravishankar | IREE: Compiling ML Programs Using MLIR | ML Compilers, MLIR, LLVM
15:00 - 15:30 (30 min) | - | Coffee Break | -
15:30 - 16:00 (30 min) | Ivan R. Ivanov, William Moses, Emil Vatai, Toshio Endo, Jens Domke, Oleksandr Zinenko | Polyhedral Rescheduling of GPU Kernels To Exploit Async Memory Movement | GPU, Polyhedral, Scheduling, MLIR
16:00 - 16:30 (30 min) | Pawel Radtke, Johannes Doerfert | Optimizing Accelerator Memory Transfers within libomptarget | Target Offloading, Optimizing Memory Transfers, libomptarget
16:30 - 16:40 (10 min) | Jose M Monsalve Diaz, Shilei Tian, Aditya, Rafael A Herrera Guaitero, Kevin Sala | Closing Remarks | Getting Feedback

Abstracts

The Proton Dialect: An MLIR Dialect For AI Compiler GPU Kernel Profiling
▲ back to schedule

Corbin Robeck, Yuanwei Fang, Keren Zhou

Modern machine learning compilers make heavy use of MLIR to generate optimal code for complex AI operators (sophisticated matrix multiplications, flash attention, etc.). Targeting the latest generation of GPU accelerators with minimal user intervention, while making full use of the available hardware and software features (matrix/tensor cores, warp specialization, wave priority and scheduling, loop pipelining), requires specialized compiler passes that make performance-critical decisions to map domain-specific algorithms onto the underlying hardware resources. This makes performance analysis and profiling tools that integrate directly into the domain-specific language's (DSL) operations critical to achieving performance comparable to handwritten kernels.
In this talk we present the Proton Dialect: a compiler-integrated, intra-kernel MLIR operation set for performance analysis and optimization. The dialect approach integrates profiling and performance analysis features directly into both the upper-level ML compiler language (e.g. Python) operations and the various MLIR ops and lowerings, allowing operation-aware passes, such as loop pipelining, to handle the dialect like any other registered dialect's operations (e.g. loads, stores, control flow).
The dialect has been upstreamed into the popular machine learning compiler Triton; we describe its implementation details and walk through cross-platform (Nvidia and AMD) examples of instruction scheduling optimizations made in production-grade AI kernels, with the dialect interleaved directly with standard dialects (e.g. llvm, arith) and Triton DSL dialects (e.g. Triton IR, Triton AMDGPU IR, Triton Nvidia GPU IR).

CARTS: Enabling Event-Driven Task and Data Block Compilation for Distributed HPC
▲ back to schedule

Rafael Andres Herrera Guaitero, Joseph B. Manzano Franco, Joshua D. Suetterlein, Xiaoming Li, Andres Marquez

The increasing complexity and heterogeneity of high-performance computing (HPC) systems demand innovative compiler workflows. CARTS, a compiler framework for a scalable asynchronous many-task system called ARTS, addresses this need by integrating the extensibility of MLIR with the robustness of LLVM, enabling the development of task-centric compiler pipelines optimized for distributed-memory HPC environments. ARTS, a runtime developed at the Pacific Northwest National Laboratory, provides a scalable and efficient execution environment tailored for fine-grained, event-driven tasks across distributed systems. This paper introduces the ARTS dialect, designed to represent Event-Driven Tasks (EDTs) and several scalable abstract primitives for communication and synchronization between them. By bridging high-level programming models like OpenMP with low-level LLVM IR, CARTS enhances both developer productivity and execution efficiency, making it a critical advancement in the field of HPC. Additionally, CARTS's ability to integrate seamlessly with existing MLIR and LLVM ecosystems demonstrates its potential for widespread adoption across HPC domains, addressing current and future challenges in system optimization.
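To make the event-driven task (EDT) model concrete, here is a minimal Python sketch of the execution discipline the abstract describes: a task body runs only after every event it depends on has been satisfied. All class and method names below are hypothetical illustrations, not ARTS's actual API.

```python
from collections import defaultdict

class EDTRuntime:
    """Toy model of event-driven tasks (EDTs): a task body runs only once
    every event it depends on has been satisfied. Names are illustrative,
    not the real ARTS interface."""
    def __init__(self):
        self._waiting = {}                  # task name -> (pending events, body)
        self._subscribers = defaultdict(list)
        self.log = []                       # execution order, for inspection

    def create_edt(self, name, depends_on, body):
        self._waiting[name] = (set(depends_on), body)
        for ev in depends_on:
            self._subscribers[ev].append(name)
        if not depends_on:                  # no dependencies: runs immediately
            self._run(name)

    def satisfy(self, event):
        """Mark an event satisfied; fire any task whose last dependency it was."""
        for name in self._subscribers.pop(event, []):
            pending, _ = self._waiting[name]
            pending.discard(event)
            if not pending:
                self._run(name)

    def _run(self, name):
        _, body = self._waiting.pop(name)
        self.log.append(name)
        body(self)

rt = EDTRuntime()
# "consume" waits on two events produced by two independent tasks.
rt.create_edt("consume", {"a_ready", "b_ready"}, lambda rt: None)
rt.create_edt("produce_a", set(), lambda rt: rt.satisfy("a_ready"))
rt.create_edt("produce_b", set(), lambda rt: rt.satisfy("b_ready"))
print(rt.log)
```

The consumer fires only after both producers have run, which is the essence of the synchronization primitives the dialect represents.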

Fuzzlang: Leveraging Transformers and LLM Agents for Enhanced Compilation Error Repair
▲ back to schedule

Baodi Shan, Barbara Chapman, Johannes Doerfert

Compilers play a pivotal role in software development, evolving alongside the increasing complexity and diversity of programming languages. However, systematic research on the generation, classification, and reproduction of compilation errors remains sparse. Compiler developers often modify error diagnostics on a best-effort basis to meet user needs and adapt to language evolution, leaving gaps in error handling.
To address these challenges, we introduce Fuzzlang, a novel framework for generating extensive datasets of compiler errors, both in isolation and within real-world code contexts. Fuzzlang comprises the Fuzzlang Transformer, which systematically generates diverse compilation errors from existing code, and Fuzzlang Agent, which employs large language models (LLMs) to analyze and isolate complex errors from internal compiler tests. Together, Fuzzlang generates five times more independent error types than the DeepFix database and achieves 83.1% coverage of error conditions triggered by LLVM/Clang's internal testing.
Our evaluation demonstrates that fine-tuning LLMs with the Fuzzlang dataset substantially enhances their code repair capabilities. The precision of Llama3-8B improved from 37.22% to 93.97%, and GPT-4o-mini rose from 72.29% to 96.70%. These results highlight Fuzzlang's potential as an effective tool for advancing intelligent code repair and compiler diagnostics research through comprehensive dataset generation.
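As a toy illustration of the "transformer" idea, systematically deriving erroneous variants from valid code, consider the following Python sketch. The mutation rules and their names are hypothetical; Fuzzlang's real transformer operates at a far larger scale and pairs each mutant with the compiler's actual diagnostics.

```python
import re

# Hypothetical mutation rules: each takes valid C source text and returns
# a variant expected to trigger a specific class of compilation error.
MUTATION_RULES = {
    "drop_semicolon": lambda src: src.replace(";", "", 1),
    "undeclared_identifier": lambda src: re.sub(r"\bint\s+(\w+)\s*=", r"\1 =", src, count=1),
    "unbalanced_brace": lambda src: src[: src.rfind("}")] if "}" in src else src,
}

def generate_error_corpus(source):
    """Apply every rule to the seed program, keeping only mutants that
    actually differ from the original (i.e., the rule fired)."""
    corpus = {}
    for name, rule in MUTATION_RULES.items():
        mutant = rule(source)
        if mutant != source:
            corpus[name] = mutant
    return corpus

seed = "int main() { int x = 42; return x; }"
corpus = generate_error_corpus(seed)
for name, mutant in corpus.items():
    print(name, "->", mutant)
```

Feeding each mutant to a real compiler and recording the diagnostic it emits would yield (error class, code, diagnostic) triples of the kind a repair model can be fine-tuned on.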

Effective Tuning of Automatically Parallelized OpenMP Applications Using Two Classical Optimizing Compilers
▲ back to schedule

Miguel Romero Rosas, Rudolf Eigenmann

Automatic parallelization still cannot achieve the performance required to be considered a true alternative to hand parallelization. However, when combined with effective tuning techniques, it provides a promising alternative to manual parallelization of sequential programs by leveraging the computational potential of modern multi-core architectures. While automatic parallelization focuses on identifying potential parallelism in code, tuning systems refine performance by optimizing efficient parallel code segments and serializing inefficient ones based on runtime metrics.
This study investigates the performance gap between automatically and manually parallelized OpenMP applications, addressing whether this gap can be closed through compile-time solutions or if it necessitates user-interactive or dynamic approaches. We propose a novel tuning system that employs an efficient algorithm, Combined Elimination (CE), to partition and optimize program sections individually. CE demonstrates a significant advancement over existing methods by achieving equivalent performance while reducing tuning time to 57% of the closest alternative.
Our experimental evaluation, conducted on a 16-core system, utilizes the NAS Parallel Benchmark Suite and the Polybench Suite with both the GCC and Clang compilers. Results reveal that the tuned applications consistently outperform their original serial versions and, in several cases, exceed the performance of manually parallelized implementations.
This work stands out as one of the few approaches delivering an auto-parallelization system with guaranteed performance improvements across diverse programs, effectively eliminating the need for extensive user experimentation to achieve optimal runtimes.
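Combined Elimination itself is described in the literature (Pan and Eigenmann, CGO 2006); a compact Python sketch of its greedy flag-elimination loop, driven here by a made-up cost model instead of real runtime measurements, looks roughly like this:

```python
def combined_elimination(flags, measure):
    """Combined Elimination sketch: start from the full optimization set
    and iteratively drop flags whose removal improves the measured
    runtime, re-measuring after each removal. `measure(enabled)` returns
    a runtime; lower is better."""
    enabled = set(flags)
    baseline = measure(enabled)
    improved = True
    while improved:
        improved = False
        # Improvement from removing each flag individually.
        gains = {f: baseline - measure(enabled - {f}) for f in enabled}
        # Remove the most harmful flag first, then re-check the rest.
        for f in sorted((f for f in gains if gains[f] > 0),
                        key=lambda f: -gains[f]):
            candidate = enabled - {f}
            t = measure(candidate)
            if t < baseline:
                enabled, baseline = candidate, t
                improved = True
    return enabled, baseline

# Toy cost model standing in for real runtime measurements: flag "c"
# hurts performance, "a" and "b" help. These numbers are invented.
costs = {"a": -2.0, "b": -1.0, "c": +3.0}
def fake_measure(enabled):
    return 10.0 + sum(costs[f] for f in enabled)

best, runtime = combined_elimination(costs.keys(), fake_measure)
print(sorted(best), runtime)
```

In a real tuning system, `measure` would compile and run a program section with the candidate flag set; the batching of re-measurements is what gives CE its reduced tuning time relative to pure iterative elimination.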

Comparative Analysis of Compiler Performance for RISC-V on SPEC CPU 2017
▲ back to schedule

Yongtai Li, Chunyu Liao, Ji Qiu

This study presents a comparative analysis of LLVM and GCC compiler performance on RISC-V using SPEC CPU2017, focusing on code size and dynamic instruction count. Results show LLVM generates smaller binaries for most C/C++ workloads but lags significantly in Fortran code. GCC demonstrates superior dynamic instruction efficiency in integer workloads, while LLVM excels in floating-point auto-vectorization using RISC-V’s V-extension. A case study on 548.exchange2_r reveals LLVM’s optimization gaps, mitigated via PASS tuning.
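Comparisons of this kind typically aggregate per-benchmark ratios with a geometric mean. A minimal sketch with invented numbers (not the study's data):

```python
import math

# Hypothetical per-benchmark code sizes in bytes; NOT the study's data.
sizes = {
    "bench_a": {"llvm": 2.11e6, "gcc": 2.25e6},
    "bench_b": {"llvm": 8.40e6, "gcc": 8.10e6},
    "bench_c": {"llvm": 0.021e6, "gcc": 0.023e6},
}

def geomean_ratio(sizes, a="llvm", b="gcc"):
    """Geometric mean of per-benchmark size ratios a/b; a value below 1.0
    means toolchain `a` produces smaller binaries on average."""
    logs = [math.log(s[a] / s[b]) for s in sizes.values()]
    return math.exp(sum(logs) / len(logs))

print(geomean_ratio(sizes))
```

The same aggregation applies to dynamic instruction counts; the geometric mean is preferred over the arithmetic mean because it treats a 2x improvement and a 2x regression symmetrically.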

Instrumentor: Easily Customizable Code Instrumentation based on LLVM
▲ back to schedule

Kevin Sala, Johannes Doerfert

Code instrumentation is a widely used technique for tracking applications' behavior. Common uses of code instrumentation include debugging and error diagnosis, logging of certain events, monitoring resource usage, and analyzing performance for code optimization. Typically, instrumenting a program involves modifying its original code by inserting extra code to retrieve data regarding its runtime behavior. During the execution of the instrumented program, all this data is usually collected by a runtime component (e.g., a library) for online or offline processing.
However, despite instrumentation being such a common technique, compilers lack a generic mechanism for it. For instance, in the LLVM compiler infrastructure, numerous LLVM passes manually instrument LLVM IR code for different purposes. Each of these implements an ad hoc instrumentation, missing opportunities to improve code maintainability, reduce code replication, and simplify the development of new instrumentation-based tools.
In this talk, we will introduce the Instrumentor, an LLVM pass that allows instrumenting code in a simple and customizable way. The Instrumentor accepts a JSON file with predefined options describing which IR instructions and patterns need recording and what details are required. The Instrumentor pass then inserts function calls that a runtime component can implement to collect the forwarded information. For instance, a tool may instrument loads, stores, function calls, and memory allocations, and decide which information the runtime component receives. The Instrumentor aims to provide a unified and simple method for instrumenting code, reducing maintenance costs and code replication, as well as paving the path for future instrumentation-based tools. Furthermore, the Instrumentor implements several optional optimizations to reduce instrumentation overhead and can be built as a plugin for use in other LLVM-based compilers.
This technical presentation will cover the functionalities of the Instrumentor, which will be useful for compiler, runtime and tool developers. We will also show its versatility through several use cases, including a novel address sanitizer implemented using the new Instrumentor.
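The JSON-driven workflow can be pictured with a toy model: a configuration selects which "IR" events to record, and the pass inserts callbacks that a runtime component implements. The schema, callback names, and toy IR below are hypothetical illustrations, not the Instrumentor's real format.

```python
import json

# Hypothetical JSON configuration in the spirit of the Instrumentor:
# which events to record and which details to forward for each.
CONFIG = json.loads("""
{
  "instrument": {
    "load":  {"record": ["address", "size"]},
    "store": {"record": ["address", "size", "value"]},
    "call":  {"record": ["callee"]}
  }
}
""")

# Toy "IR": (opcode, operands) tuples standing in for LLVM IR instructions.
toy_ir = [
    ("load", {"address": "0x10", "size": 4}),
    ("add", {}),
    ("store", {"address": "0x10", "size": 4, "value": 7}),
    ("call", {"callee": "malloc"}),
]

def instrument(ir, config):
    """Emit a callback invocation for each instruction the config selects,
    forwarding only the requested fields -- mirroring how the pass inserts
    calls that a runtime component implements."""
    events = []
    wanted = config["instrument"]
    for opcode, operands in ir:
        if opcode in wanted:
            fields = {k: operands[k] for k in wanted[opcode]["record"]}
            events.append(("__instr_" + opcode, fields))
    return events

for callback, payload in instrument(toy_ir, CONFIG):
    print(callback, payload)
```

Changing the JSON alone changes what gets recorded, which is the maintainability win over per-pass ad hoc instrumentation.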

Container Class Annotations in C++ Improve the Capability of Static Analysis in MLIR
▲ back to schedule

Ehsan Amiri, Rouzbeh Paktinatkeleshteri, Hao Jin, Eric Wang, Jose Nelson Amaral

An important case that motivates using the higher-level MLIR in LLVM compilers for C++ is the recognition of standard-library containers and their functions (such as push_back(), etc.) by the compilers. Recognizing such functions enables optimizations that are difficult to implement in a lower-level representation [1,2]. This talk argues that this observation can be generalized. Instead of using standard-library containers, some programs implement their own container classes. However, currently, C++ does not have a mechanism for the programmer to declare that a class is a container or to provide high-level semantic information about member functions and member variables of the container class. We will present examples to argue that such a mechanism would help an MLIR C++ compiler to perform more aggressive optimizations. One example extracted from an actual workload demonstrates the hoisting of a member function of a container. Currently, this hoisting is blocked because of imprecision in the alias analysis. However, we will show that having extra information helps an analysis performed at a higher representation of the code, such as MLIR, to discover that the hoisting is a safe transformation. More complex optimizations can also use container information to create more robust code-transformation legality analysis. One example of such an optimization was presented in the 2023 LLVM Dev meeting [3]. Since then we have discovered more general forms for this optimization. We use some examples to argue that information about container classes is required for the legality analysis of these optimizations. Introducing a way to declare container classes in the language would make the implementation of a robust legality analysis easier. C++ has a “Container” named requirement, which is very similar to what is proposed here. 
It might be useful to have an attribute with a similar definition, plus an extra attribute that can specify common functions and variables in a container class (inserting an element, removing an element, allocating memory, etc.). In this talk we will discuss how such extensions to the language could help compiler optimizations.

IREE: Compiling ML Programs Using MLIR
▲ back to schedule

Mahesh Ravishankar

IREE is an open-source compiler stack designed to compile programs from ML (and similar) domains. It is built from the ground up to target multiple architectures. Built in conjunction with developments in MLIR, it is also a driver, and primary user, of many dialects and transformations developed in MLIR. In this talk we will discuss the design of the IREE compiler and the reasons behind it. We will also highlight how dialects and transformations built in MLIR are used in IREE, the current state of support for ML compilation using IREE, and the challenges faced by a compiler stack like IREE in today's ML landscape.

Polyhedral Rescheduling of GPU Kernels To Exploit Async Memory Movement
▲ back to schedule

Ivan R. Ivanov, William Moses, Emil Vatai, Toshio Endo, Jens Domke, Oleksandr Zinenko

Recent trends in high performance computing show an increase in compute power, while memory movement capabilities stagnate. A way to compensate for the growing difference between the two has been to introduce new features that enable more efficient data movement. An example of such a feature in NVIDIA GPUs is the capability of asynchronous copies from global to shared memory (available from the Ampere architecture onward). However, even though high-performance libraries such as cutlass make use of these instructions, their usage in general-purpose handwritten kernels is limited for reasons such as portability concerns and programming difficulty. In addition, program analysis and optimization have not kept up with these new capabilities.
We present a compilation flow which allows analysis and optimization of existing parallel GPU kernels in the polyhedral framework.
In addition, we introduce a notion of optimizing with asynchronous execution in polyhedral scheduling and show that it allows us to reschedule GPU kernels to make use of async features.
We focus specifically on the global memory to shared memory asynchronous copy capabilities of NVIDIA GPUs.
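The benefit of rescheduling for async copies can be estimated with a simple pipeline cost model: with double buffering, the copy of tile i+1 overlaps the compute of tile i. The sketch below is a toy model under the assumption of fixed per-tile copy and compute costs, not the paper's polyhedral formulation.

```python
def kernel_time(tiles, t_copy, t_compute, async_copy):
    """Toy cost model: a kernel processes `tiles` tiles, each needing a
    global->shared copy followed by compute. Synchronous copies serialize
    the two stages; async copies with double buffering overlap the copy
    of tile i+1 with the compute of tile i."""
    if not async_copy:
        return tiles * (t_copy + t_compute)
    # The first copy cannot be hidden; afterwards each step costs the
    # max of the overlapped copy and compute, plus the final compute.
    return t_copy + (tiles - 1) * max(t_copy, t_compute) + t_compute

sync_t = kernel_time(8, t_copy=2.0, t_compute=3.0, async_copy=False)
async_t = kernel_time(8, t_copy=2.0, t_compute=3.0, async_copy=True)
print(sync_t, async_t)
```

When compute dominates, the copies are almost fully hidden; the rescheduling work is about transforming kernels so that this overlap is legal.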

Optimizing Accelerator Memory Transfers within libomptarget
▲ back to schedule

Pawel Radtke, Johannes Doerfert

Offloading host computations to accelerators frequently incurs a substantial penalty in the form of memory transfer overhead. Although the traditional LLVM optimization pipeline offers robust static analyses, it cannot optimize offload transfers directly, due to architectural decisions that relocate transfer logic and metadata management to the runtime. This presentation examines both the potential and the constraints of optimizing offload memory transfers within LLVM's offloading runtime library, libomptarget, using the runtime metadata produced by OpenMP offloading directives. We evaluate the scope for eliminating redundant data transfers, removing unused data segments, and thereby reducing transfer overhead based on currently available runtime information, while also considering what additional gains might be realized through enhanced metadata availability.
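One of the redundancy-elimination opportunities mentioned above can be modeled in a few lines: skip a host-to-device copy when the mapped buffer is unchanged since the last transfer. The class below is a hypothetical illustration (a real runtime would likely use cheaper dirty tracking than hashing every buffer on every map):

```python
import hashlib

class TransferCache:
    """Toy model of a runtime-side optimization: elide a host-to-device
    copy when the mapped buffer's contents are unchanged since the last
    transfer. Names and mechanism are illustrative, not libomptarget's."""
    def __init__(self):
        self._last = {}   # buffer name -> digest of last transferred data
        self.copies = 0   # actual transfers performed

    def to_device(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        if self._last.get(name) == digest:
            return False          # redundant: contents already on device
        self._last[name] = digest
        self.copies += 1          # a real runtime would issue the copy here
        return True

cache = TransferCache()
a = b"\x00" * 1024
print(cache.to_device("a", a))                 # True: first transfer
print(cache.to_device("a", a))                 # False: unchanged, copy elided
print(cache.to_device("a", b"\x01" + a[1:]))   # True: data changed
print(cache.copies)
```

The trade-off this toy makes explicit is the one the talk explores: how much redundancy can be detected from information the runtime already has, versus what would require extra metadata from the compiler.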

Call for Speakers

We invite speakers from academia and industry to present their work on the following list of topics (including but not limited to):

  • Improving performance and code-size of applications built by LLVM toolchains
  • Improving performance of LLVM's runtime libraries
  • Improving the security of generated code
  • Any tools or products developed by using one of the libraries in LLVM infrastructure
  • Performance tracking over time
  • Compiler flags, annotations and remarks to understand and improve performance
  • Any other topic related to improving and maintaining the performance and quality of LLVM generated code

While the primary focus of the workshop is on these topics, we welcome any submission related to the LLVM project, its sub-projects (clang, mlir, lldb, Polly, lld, openmp, pstl, compiler-rt, etc.), as well as their use in industry and academia.

We are looking for:

  • keynote speakers (30-60 minutes),
  • technical presentations (30 minutes plus questions and discussion),
  • tutorials (30-60 minutes),
  • panels (30-60 minutes),
  • BOFs (30-60 minutes)

Proposals should provide sufficient information for the review committee to judge the quality of the submission. Proposals can be submitted in the form of an extended abstract, full paper, or slides. Accepted presentations will be presented in person. The presentations will be publicly available on https://llvm.org/devmtg/

In case of any queries please reach out to the workshop organizers: Aditya (hiraditya at msn.com), Jose M Monsalve Diaz (jmonsalvediaz at anl.gov), Shilei Tian (i at tianshilei.me), Rafael (rafaelhg at udel.edu), or Kevin Sala (kevin.sala at bsc.es).

What types of people attend?

  • Active developers of projects in the LLVM Umbrella (LLVM core, Clang, LLDB, libc++, compiler-rt, klee, lld, OpenMP, etc.).
  • Anyone interested in using these as part of another project.
  • Students and Researchers.
  • Compiler, programming language, and runtime enthusiasts.
  • Those interested in using compiler and toolchain technology in novel and interesting ways.

Panels

Panel sessions are guided discussions about a specific topic. The panel consists of ~3 developers who discuss a topic through prepared questions from a moderator. The audience is also given the opportunity to ask questions of the panel.

Birds of a Feather (BoF)

A BoF session is an informal meeting where attendees group together based on a shared interest and carry out discussions without any pre-planned agenda.

Technical Talks

These 20-30 minute talks cover all topics, from core infrastructure to projects using LLVM's infrastructure. Attendees will take away technical information that could be pertinent to their project or of general interest.

Tutorials

Tutorials are 30-60 minute sessions that dive deep into a technical topic. Expect in-depth examples and explanations.

Code of Conduct

The LLVM Foundation is dedicated to providing an inclusive and safe experience for everyone. We do not tolerate harassment of participants in any form. By registering for this event, we expect you to have read and agree to the LLVM Code of Conduct. We also adhere to the CGO Code of Conduct.