Open LLVM Projects
Welcome prospective Google Summer of Code 2021 Students! This document is your starting point for finding interesting and important projects for LLVM, Clang, and other related sub-projects. This list of projects is developed not only for Google Summer of Code; it also collects open projects that need developers to work on them and that would be very beneficial to the LLVM community. We encourage you to look through this list and see which projects excite you and match well with your skill set. We also invite proposals not on this list. You must propose your idea to the LLVM community through our developers' mailing list (llvm-dev@lists.llvm.org or the specific subproject mailing list). Feedback from the community is a requirement for your proposal to be considered and hopefully accepted. The LLVM project has participated in Google Summer of Code for several years and has had some very successful projects. We hope that this year is no different and look forward to hearing your proposals. For information on how to submit a proposal, please visit the Google Summer of Code main website.

LLVM participation in Google Summer of Code 2020 was very successful and resulted in many interesting projects contributed to LLVM. For the list of accepted and completed projects, please take a look at the Google Summer of Code website.

Description of the project: This is a short description; please reach out to Johannes (jdoerfert on IRC) if it sounds interesting. During GSoC'19 we built the Attributor framework to improve the inter-procedural capabilities of LLVM. This is useful on its own, but especially in situations where inlining is impossible or undesirable. In this GSoC project we will look at capabilities not yet available in the Attributor and at the potential to connect the Attributor with existing intra- and inter-procedural optimizations. In this project there is a lot of freedom to determine the actual tasks, but we will provide a pool of smaller and medium-sized tasks to choose from as well.
Preparation resources: The Attributor YouTube videos from the LLVM Developers Meeting 2019 and the recording of the IPO panel from the same meeting. The Attributor framework as well as other existing inter-procedural analyses and optimizations in LLVM.
Expected results: Measurably better IPO, especially visible in cases where inlining is not an option or is undesirable.
Confirmed Mentor: Johannes Doerfert
Desirable skills: Intermediate knowledge of C++, self-motivation.
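As a rough, source-level illustration of the kind of opportunity the inter-procedural analyses described above target (hypothetical example code, not taken from the Attributor itself): when a callee is never inlined, the caller can only be optimized if facts about the callee are derived and propagated across the call.

  // Hypothetical example. Assume 'normalize' is too large (or marked noinline)
  // to be inlined. The caller can still be optimized if the compiler derives,
  // inter-procedurally, that 'normalize' never writes memory and never returns
  // a negative value.
  __attribute__((noinline))
  int normalize(const int *data, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i)       // reads, but never writes, through 'data'
      sum += data[i] > 0 ? data[i] : 0;
    return sum;                       // always non-negative
  }

  int caller(int *data, int n) {
    int before = *data;
    int r = normalize(data, n);
    if (r < 0)                        // dead if 'r' is known to be non-negative
      return -1;
    return before + *data;            // no re-load of '*data' is needed if
                                      // 'normalize' is known not to write memory
  }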
Description of the project: This is a short description; please reach out to Johannes (jdoerfert on IRC) if it sounds interesting. With the OpenMPOpt pass (under review) we started to teach the LLVM optimization pipeline about OpenMP parallelism encoded as OpenMP runtime calls. In this GSoC project we will look at capabilities not yet available in the OpenMPOpt pass and at the potential to connect existing intra- and inter-procedural optimizations, e.g. the Attributor. In this project there is a lot of freedom to determine the actual tasks, but we will provide a pool of smaller and medium-sized tasks to choose from as well.
Preparation resources: The "Optimizing Indirections, using abstractions without remorse" video on YouTube from the LLVM Developers Meeting 2018. The papers "Compiler Optimizations for OpenMP" and "Compiler Optimizations For Parallel Programs", both by J. Doerfert and H. Finkel (the slides for these are potentially even more useful).
Expected results: Measurably better performance or program analysis results for parallel programs, with a focus on OpenMP.
Confirmed Mentor: Johannes Doerfert
Desirable skills: Intermediate knowledge of C++, self-motivation.
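For context, Clang lowers OpenMP constructs to calls into the OpenMP runtime; a parallel region, for example, has its body outlined into a helper function that is passed to __kmpc_fork_call. OpenMPOpt reasons about these runtime calls in the IR. A minimal example of source code that produces such calls (illustrative only; compile with clang++ -fopenmp, and use -emit-llvm -S to inspect the IR):

  #include <cstdio>
  #include <omp.h>

  int main() {
    int sum = 0;
  // The body of this parallel loop is outlined by Clang and executed through
  // OpenMP runtime calls (e.g. __kmpc_fork_call), which is the representation
  // that the OpenMPOpt pass analyzes and optimizes.
  #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < 1000; ++i)
      sum += i;
    std::printf("sum = %d (max threads: %d)\n", sum, omp_get_max_threads());
    return 0;
  }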
Description of the project: Generating debug information is one of the fundamental tasks a compiler typically fulfills. It is clear that the generated executable code should not depend on the presence of debug information.
Preparation resources:
Expected results:
Confirmed Mentors: Paul Robinson and David Tellenbach
Desirable skills: Intermediate knowledge of C++, some familiarity with general computer architecture, some familiarity with the x86 or Arm/AArch64 instruction set.

Description of the project: The MergeSimilarFunctions pass is able to merge not just identical functions, but also functions with small differences in their instructions, in order to reduce code size. It does this by inserting control flow and an additional argument in the merged function to account for the differences. This work was presented at the LLVM Dev Meeting in 2013, and a more detailed description was published in a paper at LCTES 2014. The code was released to the community at the time. Meanwhile, the pass has been in production use at QuIC for the past few years and has been actively maintained internally. In order to magnify the impact of MergeSimilarFunctions, it has been ported to ThinLTO and the patches have been upstreamed (see the stack of 5 patches mentioned below). But instead of replacing the existing MergeFunctions pass in LLVM upstream, the community suggested improving the existing pass with the ideas from MergeSimilarFunctions, and then leveraging ThinLTO on top of that. MergeSimilarFunctions used in ThinLTO gives impressive code-size reductions across a wide range of workloads, and the work was presented at LLVM-dev 2018. The LLVM project would greatly benefit from this code-size optimization, as most embedded-systems applications (think smartphones) are constrained on code size.
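To illustrate the idea at the source level (a hypothetical sketch; the pass itself operates on LLVM IR): two functions that differ in a single operation can share one merged body that takes an extra argument, with control flow selecting between the differing instructions.

  // Two nearly identical functions that differ in one operation.
  int sum_scaled(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
      s += a[i] * 2;                       // differs: multiply
    return s;
  }

  int sum_biased(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
      s += a[i] + 2;                       // differs: add
    return s;
  }

  // Conceptually, MergeSimilarFunctions turns both of the above into thin
  // wrappers around a single merged body; the extra argument and the added
  // control flow account for the one differing instruction.
  int sum_merged(const int *a, int n, bool scaled) {
    int s = 0;
    for (int i = 0; i < n; ++i)
      s += scaled ? a[i] * 2 : a[i] + 2;
    return s;
  }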
Preparation resources:
Expected results:
Confirmed Mentors: Aditya Kumar (hiraditya on IRC and phabricator), JF Bastien (jfb on phabricator)
Desirable skills: Course on compiler design, SSA representation, intermediate knowledge of C++, familiarity with LLVM Core.

Description of the project: LLVM provides a tool called yaml2obj which converts a YAML document into an object file, for various different file formats such as ELF, COFF and Mach-O, along with obj2yaml which does the inverse. The tool is commonly used to test parts of LLVM, as YAML is often easier to use to describe an object file than raw assembly and more maintainable than a pre-built binary. DWARF is a debugging file format commonly used by LLVM. Many of the tests for LLVM's DWARF emission are written in assembly, but it would be nicer to write them in YAML. However, yaml2obj does not properly support emission of DWARF sections. This project is to add functionality to yaml2obj to make writing test inputs for DWARF tests simpler, particularly for ELF objects.
Preparation resources: Reading up on the DWARF file format will be useful, in particular the standards available at http://dwarfstd.org/Download.php. Also, familiarising yourself with the basics of the ELF file format, as described here https://www.sco.com/developers/gabi/latest/contents.html, may be beneficial.
Expected results: The ability to use yaml2obj to generate DWARF sections for object files. Particularly important is ensuring the input YAML can be more easily understood than the equivalent assembly.
Confirmed Mentors: James Henderson
Desirable skills: Intermediate knowledge of C++.

Description of the project: Hot/cold splitting in LLVM is an IR-level function splitting transformation. The goal of hot/cold splitting is to improve the memory locality of code and to help reduce the startup working set. The splitting pass does this by identifying cold blocks and moving them into separate functions. Because it is implemented at the IR level, all backend targets benefit from it. It is a relatively new optimization; it was presented at the LLVM Dev Meeting in 2019 and the slides are here. Most of the benefit comes from outlining small blocks, e.g. calls to __assert_rtn. The goal of this project is to identify potential blocks via static analysis (e.g. exception handling code, optimizing personality functions), use a cost model to ensure outlining reduces the code size of the caller, and use tail calls whenever appropriate to save instructions.
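As a source-level sketch of what the transformation aims for (hypothetical code; the actual pass splits blocks at the LLVM IR level): a rarely executed error path inside a hot function is outlined into its own cold function, so the hot path stays small and dense in memory.

  #include <cstdio>
  #include <cstdlib>

  // Before splitting: the error-handling block sits inside the hot function
  // even though it is almost never executed.
  int process(int value) {
    if (value < 0) {                  // cold: practically never taken
      std::fprintf(stderr, "invalid value %d\n", value);
      std::abort();
    }
    return value * 2;                 // hot path
  }

  // After hot/cold splitting (conceptually): the cold block is outlined into a
  // separate function, keeping the hot function small and improving locality.
  [[noreturn]] void process_cold(int value) {
    std::fprintf(stderr, "invalid value %d\n", value);
    std::abort();
  }

  int process_split(int value) {
    if (value < 0)
      process_cold(value);            // ideally emitted as a tail call
    return value * 2;
  }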
Preparation resources:
Expected results:
Confirmed Mentors: Aditya Kumar (hiraditya on IRC and phabricator)
Desirable skills: Course on compiler design, SSA representation, intermediate knowledge of C++, familiarity with LLVM Core.

Description of the project: Selecting optimization passes for a given application is a very important but non-trivial problem because of the huge size of the compiler transformation space (including pass ordering). While the existing heuristics can provide high-performance code for certain applications, they cannot easily benefit a wide range of application codes. The goal of this project is to learn the interplay between LLVM transformation passes and code structures, and then to improve the existing heuristics (or replace them with machine-learning-based models) so that the LLVM compiler can provide a superior order of the passes, customized per application.
Expected results (possibilities):
Preparation resources:
Confirmed Mentors: EJ Park, Giorgis Georgakoudis, Johannes Doerfert
Desirable skills: C++, Python, experience with LLVM and learning-based prediction preferable.
Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
Description of the project: Current machine learning models for compiler optimization select the best optimization strategy for a function based on isolated, per-function analysis. In this approach, the constructed models are not aware of any relationships with the functions around it (callers or callees), which can be helpful in deciding the best optimization strategy for each function. In this project, we want to explore the SCC (Strongly Connected Components) call graph to add inter-procedural features when constructing machine-learning-based models to find the best optimization strategy per function. Moreover, we want to explore cases where it is helpful to group strongly related functions together and optimize them as a group, instead of per function.
Expected results (possibilities):
Preparation resources:
Confirmed Mentors: EJ Park, Giorgis Georgakoudis, Johannes Doerfert
Desirable skills: C++, Python, experience with LLVM and learning-based prediction preferable.

Description of the project: There is currently no easy way to use the result of PostDominatorTreeAnalysis in a loop pass, as PostDominatorTreeAnalysis is a function analysis and it is not included in LoopStandardAnalysisResults. If one adds PostDominatorTreeAnalysis to LoopStandardAnalysisResults, then all loop passes need to preserve it, meaning that all loop passes need to make sure the result is kept up to date. In this project, we want to modify some commonly used utilities to generate a list of updates which can be consumed by different updaters, e.g. DomTreeUpdater to update the DominatorTree and PostDominatorTree, and MSSAU to update MemorySSA, instead of only updating the DominatorTree. In addition, we want to change existing loop passes to preserve the PostDominatorTree. Finally, we want to add PostDominatorTree to LoopStandardAnalysisResults.
Expected results (possibilities): PostDominatorTree added to LoopStandardAnalysisResults, where it can be used by loop passes. Common utilities changed to generate a list of updates that can easily be consumed by different updaters.
Confirmed Mentors: Whitney Tsang, Ettore Tiotto, Bardia Mahjour
Desirable skills: Intermediate knowledge of C++, self-motivation.

Description of the project: Currently, if you want to write a pass that works on a loop nest, you have to pick either a function pass or a loop pass. If you choose to write it as a function pass, then you lose the ability to add loops dynamically back to the pipeline. If you decide to write it as a loop pass, then you waste compile time traversing to your pass and returning right away when the given loop is not the outermost loop. In this project, we want to create a LoopNestPass from which transformations intended for loop nests can inherit, with the same ability as a LoopPass to dynamically add loops to the pipeline. In addition, we want to create all the adaptors required to add loop nest passes at different points of the pass builder.
Expected results (possibilities): Transformations/analyses can be written as a LoopNestPass without compromising compile time or usability.
Confirmed Mentors: Whitney Tsang, Ettore Tiotto
Desirable skills: Intermediate knowledge of C++, self-motivation.

Description of the project: TableGen is flexible and allows the end user to define and set common properties of records (instructions). Every target has dozens or hundreds of such instruction properties. As target code evolves, the .td files become more and more complicated, and it becomes harder to see whether the setting of some properties is necessary, or even correct; e.g. is the hasSideEffects property correctly set on all instructions? One can manually search through the TableGen-generated files, or write a script to run TableGen and match the output for some specific properties, but a standalone utility that can dump and check instruction properties systematically (e.g. also allowing targets to define some verification rules) might be better from a build-process-management standpoint. This can help find hidden bugs and hence improve the overall quality of the codegen code. In addition, the utility can be used to write regression tests for instruction properties, which will increase the quality and precision of LLVM's regression tests.
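A very rough sketch of what such a utility might do, written against LLVM's C++ MC layer (the header locations and API calls reflect LLVM around 2020/2021 and are assumptions of this sketch; a real tool would likely be TableGen-driven and support target-defined verification rules):

  // Sketch: list instructions of a target whose hasSideEffects flag is set even
  // though they neither load nor store, as a starting point for property checks.
  #include "llvm/MC/MCInstrDesc.h"
  #include "llvm/MC/MCInstrInfo.h"
  #include "llvm/Support/TargetRegistry.h" // moved to llvm/MC in later releases
  #include "llvm/Support/TargetSelect.h"
  #include "llvm/Support/raw_ostream.h"
  #include <memory>
  #include <string>

  using namespace llvm;

  int main() {
    InitializeAllTargetInfos();
    InitializeAllTargetMCs();

    std::string Error;
    const std::string TripleName = "x86_64-unknown-linux-gnu";
    const Target *TheTarget = TargetRegistry::lookupTarget(TripleName, Error);
    if (!TheTarget) {
      errs() << "cannot look up target: " << Error << "\n";
      return 1;
    }

    std::unique_ptr<MCInstrInfo> MII(TheTarget->createMCInstrInfo());
    for (unsigned Opcode = 0, E = MII->getNumOpcodes(); Opcode != E; ++Opcode) {
      const MCInstrDesc &Desc = MII->get(Opcode);
      // Report instructions with unmodeled side effects that do not touch
      // memory; many are legitimate, but some may be incorrectly marked.
      if (Desc.hasUnmodeledSideEffects() && !Desc.mayLoad() && !Desc.mayStore())
        outs() << MII->getName(Opcode) << "\n";
    }
    return 0;
  }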
Expected results (possibilities): A standalone LLVM tool or utility that can dump and check instruction properties systematically.
Confirmed Mentors: Hal Finkel, Jinsong Ji, Qingshan Zhang
Desirable skills: Intermediate knowledge of C++, self-motivation.

Description of the project: Determining whether it is safe to move code around is implemented in several transformations in LLVM (e.g. canSinkOrHoistInst in LICM, or makeLoopInvariant in Loop). Each of these implementations may return different results for a given query, making code motion safety checks inconsistent and duplicated. On the other hand, the mechanism for doing the actual code motion is also different in each transformation. Code duplication causes maintenance problems and increases the time taken to write new transformations. In this project, we want to first identify all the existing ways in which loop transformations (whether function or loop passes) check whether code is safe to move and actually move code, and then create a standardized way to do so.
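To make the kind of query these utilities answer concrete (a hypothetical source-level example; the actual checks operate on LLVM IR):

  // Hoisting the load of '*p' out of the loop is only legal if the compiler
  // can prove, among other things, that 'p' is safe to dereference on entry
  // and that nothing executed inside the loop may write to '*p'. Today,
  // different loop transformations answer this kind of question with their
  // own, slightly different logic.
  int scale_sum(const int *p, const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
      s += a[i] * *p;    // '*p' is loop-invariant: a candidate for hoisting
    return s;
  }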
Expected results (possibilities): A standardized superset of the existing ways loop transformations check whether code is safe to move, and of the ways they move it.
Confirmed Mentors: Whitney Tsang, Ettore Tiotto, Bardia Mahjour
Desirable skills: Intermediate knowledge of C++, self-motivation.

All the items in the list of open projects are open to GSoC. Feel free to propose your own ideas as well on Discourse.

Description of the project: When instantiating a template, the template arguments are canonicalized before being substituted into the template pattern. Clang does not preserve type sugar when subsequently accessing members of the instantiation.

  std::vector<std::string> vs;
  int n = vs.front(); // bad diagnostic: [...] aka 'std::basic_string<char>' [...]

  template<typename T> struct Id { typedef T type; };
  Id<size_t>::type // just 'unsigned long', 'size_t' sugar has been lost

Clang should "re-sugar" the type when performing member access on a class template specialization, based on the type sugar of the accessed specialization. The type of vs.front() should be std::string, not std::basic_string<char, [...]>.

Suggested design approach: add a new type node to represent template argument sugar, and implicitly create an instance of this node whenever a member of a class template specialization is accessed. When performing a single-step desugar of this node, lazily create the desugared representation by propagating the sugared template arguments onto inner type nodes (and in particular, replacing Subst*Parm nodes with the corresponding sugar). When printing the type for diagnostic purposes, use the annotated type sugar to print the type as originally written. For good results, template argument deduction will also need to be able to deduce type sugar (and reconcile cases where the same type is deduced twice with different sugar).

Expected results: Diagnostics preserve type sugar even when accessing members of a template specialization. T<unsigned long> and T<size_t> are still the same type and the same template instantiation, but T<unsigned long>::type single-step desugars to 'unsigned long' and T<size_t>::type single-step desugars to 'size_t'.
Confirmed Mentors: Vassil Vassilev, Richard Smith
Desirable skills: Good knowledge of Clang's API and AST, intermediate knowledge of C++.

Description of the project: The Clang Static Analyzer already knows how to prevent crashes caused by null pointer dereference in arbitrary code; however, it often "gives up" when the code is too complicated. In particular, implementation details of C++ standard classes, even simple ones such as smart pointers or optionals, may be too convoluted for the Analyzer to fully understand. Moreover, the exact behavior depends on which implementation of the Standard Library is used (e.g. GNU libstdc++ or LLVM's own libc++). We can enable the Analyzer to find more bugs in modern C++ code by teaching it explicitly about the behavior of C++ standard classes, and therefore skipping the whole process in which the Analyzer tries to understand all the implementation details on its own. For example, we could teach it that a default-constructed smart pointer is null, and that any attempt to dereference it would result in a crash. The project would therefore consist of manually providing implementations for various methods of standard classes.
Expected results: We want the Static Analyzer to emit warnings when a null smart pointer dereference would occur in the code.
For example:

  #include <memory>

  int foo(bool flag) {
    std::unique_ptr<int> x; // note: Default constructor produces a null unique pointer
    if (flag)               // note: Assuming 'flag' is false
      return 0;             // note: Taking false branch
    return *x;              // warning: Dereferenced smart pointer 'x' is null
  }

We should be able to cover at least one class fully, for example std::unique_ptr, and then see if we can generalize our results to other classes, such as std::shared_ptr or the C++17 std::optional.
Confirmed Mentors: Artem Dergachev, Gábor Horváth
Desirable skills: Intermediate knowledge of C++.

Description of the project: LLDB's command line offers several convenience features inspired by features of UNIX shells, such as tab completion and a command history. One feature that is not yet implemented is 'autosuggestions'. These are suggestions for possible commands that the user might want to type; unlike tab completions, they are displayed directly behind the cursor while the user is typing a command. A good demonstration of how this could look is the autosuggestions feature implemented in the fish shell. This project is about implementing autosuggestions in LLDB's editline-based command shell.
Confirmed Mentors: Jonas Devlieghere and Raphael Isemann
Desirable skills: Intermediate knowledge of C++.

Description of the project: LLDB's command line offers several convenience features inspired by features of UNIX shells, such as tab completion for commands. These tab completions are implemented by a completion engine that is used not only by the command line interface of LLDB, but also by graphical interfaces for LLDB such as IDEs. While the tab completions in LLDB are really useful, they are currently not implemented for all commands and their respective arguments. This project is about implementing the remaining completions for the commands in LLDB, which will greatly improve the user experience of LLDB. Improving existing completions is also part of the project. Note that the completions are not a static list of strings but often require inspecting and understanding the internal state of LLDB. As LLDB commands and their tab completions cover all aspects of LLDB, this project offers a great way to get an overview of all the functionality in LLDB.
Confirmed Mentor: Raphael Isemann
Desirable skills: Intermediate knowledge of C++.

Description of the project: Just as LLVM is a library for building compilers, LLDB is a library for building debuggers. LLDB vends a stable, public SB API. For historic reasons the LLDB command line interface is currently implemented on top of LLDB's private API, and it duplicates a lot of functionality that is already implemented in the public API. Rewriting LLDB's command line interface on top of the public API would simplify the implementation, eliminate duplicate code, and, most importantly, reduce the testing surface. This work will also provide an opportunity to clean up the SB API of commands that have accrued too many overloads over time and to convert them to make use of option classes, both to gather up all the variants and to future-proof the APIs.
Confirmed Mentors: Adrian Prantl and Jim Ingham
Desirable skills: Intermediate knowledge of C++.

Description of the project: One of the tensions in the testsuite is that spinning up a process and getting it to some point is not a cheap operation, so you'd like to do a bunch of tests once you get there. But the current testsuite bails at the first failure, so you don't want to do many tests, since the failure of one fails all the others.
On the other hand, there are some individual test assertions where the failure of the assertion should cause the whole test to fail. For example, if you fail to stop at a breakpoint where you want to check some variable values, then the whole test should fail. But if your test then wants to check the value of five independent locals, it should be able to do all five, and then report how many of the five variable assertions failed. We could do this by adding Start and End markers for a batch of tests, doing all the tests in the batch without failing the whole test, and then reporting the error and failing the whole test if appropriate. There might also be a nice way to do this in Python using scoped objects for the test sections.
Confirmed Mentor: Jim Ingham
Desirable skills: Intermediate knowledge of Python.

Google Summer of Code 2019 contributed a lot to the LLVM project. For the list of accepted and completed projects, please take a look at the Google Summer of Code website.

Description of the project: Adding Debug Info (compiling with `clang -g`) shouldn't change the generated code at all. Unfortunately we have bugs. These are usually not too hard to fix and are a good way to discover new parts of the codebase! We suggest building object files both ways and disassembling the text sections, which will give cleaner diffs than comparing .s files.
Expected results: Reduced test cases, bug reports with analysis (e.g., which pass is responsible), possibly patches.
Confirmed Mentor: Paul Robinson
Desirable skills: Intermediate knowledge of C++, some familiarity with the x86 or ARM instruction set.

Description of the project: Clang contains an ASTImporter which allows moving declarations and statements from one Clang AST to another. This is for example used for static analysis across translation units and in LLDB's expression evaluator. The current ASTImporter works as intended when moving simple C code from one AST to another. However, more complicated declarations such as C++'s OOP features and templates are not fully implemented and can cause crashes or invalid AST nodes. The bug reports related to these crashes are often filed against LLDB's expression evaluator and are rarely submitted with a minimal reproducer. This makes improving ASTImporter a time-consuming and tedious task. This project is about writing a fuzzer to proactively discover these ASTImporter bugs and provide minimal reproducers which make understanding and fixing the underlying bug easier. A possible implementation of such a fuzzer and driver could look like this:
Confirmed Mentors: Raphael Isemann, Shafik Yaghmour
Desirable skills: Intermediate knowledge of C++.

Description of the project: Clang has a newly implemented autocompletion feature, the details of which can be found on the LLVM blog. We would like to improve this by adding more flags to autocompletion, supporting more shells (currently it supports only bash), and exporting this feature to other projects such as llvm-opt. The accepted student will be working on the Clang Driver, LLVM Options, and shell scripts.
Expected results: Autocompletion working on bash and zsh, support for llvm-opt options.
Confirmed Mentors: Yuka Takahashi and Vassil Vassilev
Desirable skills: Intermediate knowledge of C++ and shell scripting.

Description of the project: Clang diagnostics (warnings and errors) issued to the programmer are a critical feature of the compiler. Great diagnostics can have a significant impact on the user experience of the compiler and increase their productivity. Recent improvements in GCC 9.0 show that there is significant headroom to improve diagnostics (and user interactions in general). It would be a very impactful project to survey and identify all the possible improvements to Clang on this topic, and to start designing the next generation of our diagnostics.
Desirable skills: C++ coding experience.

Google Summer of Code 2018 contributed a lot to the LLVM project. For the list of accepted and completed projects, please take a look at the Google Summer of Code website. Google Summer of Code 2017 contributed a lot to the LLVM project. For the list of accepted and completed projects, please take a look at the Google Summer of Code website.

This document is meant to be a sort of "big TODO list" for LLVM. Each project in this document is something that would be useful for LLVM to have and would also be a great way to get familiar with the system. Some of these projects are small and self-contained and may be implemented in a couple of days; others are larger. Several of these projects may lead to interesting research projects in their own right. In any case, we welcome all contributions. If you are thinking about tackling one of these projects, please send a mail to the LLVM Developer's mailing list so that we know the project is being worked on. Additionally, this is a good way to get more information about a specific project or to suggest other projects to add to this page. The projects on this page are open-ended. More specific projects are filed as unassigned enhancements in the LLVM bug tracker. See the list of currently outstanding issues if you wish to help improve LLVM. In addition to hacking on the main LLVM project, LLVM has several subprojects, including Clang and others. If you are interested in working on these, please see their "Open projects" page:
Improvements to the current infrastructure are always very welcome and tend to be fairly straightforward to implement. Here are some of the key areas that can use improvement...

Currently, both Clang and LLVM have a separate target description infrastructure, with some features duplicated and others "shared" (in the sense that Clang has to create a full LLVM target description to query specific information). This separation has grown in parallel, since in the beginning they were quite different and served disparate purposes. But as the compiler evolved, more and more features had to be shared between the two so that the compiler would behave properly. An example is when targets have default features on specific configurations that there are no flags for. If the back end has a different "default" behaviour than the front end and the latter has no way of enforcing behaviour, it won't work. An alternative would be to create flags for all the little quirks, but first, Clang is not the only front end or tool that uses LLVM's middle/back ends, and second, that's what "default behaviour" is there for, so we'd be missing the point. Several ideas have been floating around to fix the Clang driver with respect to recognizing architectures, features and so on (TableGen it, user-specific configuration files, etc.), but none of them touch the critical issue: sharing that information with the back end. Recently, the idea of factoring out the target description infrastructure from both Clang and LLVM into its own library that both use has been floating around. This would make sure that all defaults, flags and behaviour are shared, but it would also reduce the complexity (and thus the cost of maintenance) a lot. That would also allow all tools (lli, llc, lld, lldb, etc.) to have the same behaviour across the board. The main challenges are:
The LLVM bug tracker occasionally has "code-cleanup" bugs filed in it. Taking one of these and fixing it is a good way to get your feet wet in the LLVM code and discover how some of its components work. Some of these include some major IR redesign work, which is high-impact because it can simplify a lot of things in the optimizer. Some specific ones that would be great to have:
Additionally, there are performance issues in LLVM that need to be fixed. These are marked with the slow-compile keyword. Use this Bugzilla query to find them.

The llvm-test testsuite is a large collection of programs we use for nightly testing of generated code performance, compile times, correctness, etc. Having a large testsuite gives us a lot of coverage of programs and enables us to spot and improve any problem areas in the compiler. One extremely useful task, which does not require in-depth knowledge of compilers, would be to extend our testsuite to include new programs and benchmarks. In particular, we are interested in CPU-intensive programs that have few library dependencies, produce some output that can be used for correctness testing, and are redistributable in source form. Many different programs are suitable; for example, see this list for some potential candidates. We are always looking for new testcases and benchmarks for use with LLVM. In particular, it is useful to try compiling your favorite C source code with LLVM. If it doesn't compile, try to figure out why or report it to the llvm-bugs list. If you get the program to compile, it would be extremely useful to convert the build system to be compatible with the LLVM Programs testsuite so that we can check it into SVN and the automated tester can use it to track progress of the compiler. When testing code, try running it with a variety of optimizations and with all the back ends: CBE, llc, and lli.

Find benchmarks, either using our test results or on your own, where the LLVM code generators do not produce optimal code or where another compiler produces better code. Try to minimize the test case that demonstrates the issue. Then either submit a bug with your testcase and the code that LLVM produces vs. the code that it should produce, or, even better, see if you can improve the code generator and submit a patch. The basic idea is that it's generally quite easy for us to fix performance problems if we know about them, but we generally don't have the resources to go finding out why performance is bad.

The LNT perf database has some nice features like detecting moving averages, standard deviations, variations, etc. But the report page gives too much emphasis to the individual variation (where noise can be higher than signal), e.g. this case. The first part of the project would be to create an analysis tool that would track moving averages and report:
The second part would be to create a web page which would show all related benchmarks (possibly configurable, like a dashboard) and show the basic statistics with red/yellow/green colour codes to indicate status, with links to more detailed analysis of each benchmark. A possible third part would be the ability to automatically cross-reference different builds, so that if you group them by architecture/compiler/number of CPUs, this automated tool would understand that the changes are more common to one particular group.

The LLVM Coverage Report has a nice interface to show what source lines are covered by the tests, but it doesn't mention which tests, which revision, and what architecture are covered. A project to renovate LCOV would involve:
Another idea is to enable the test suite to run against all built backends, not only the host architecture, so that the coverage report can be built on a fast machine and we can have one report per commit without needing to update the buildbots.
Sometimes creating new things is more fun than improving existing things. These projects tend to be more involved and perhaps require more work, but can also be very rewarding. Many proposed extensions and improvements to LLVM core are awaiting design and implementation.
We have a strong base for development of both pointer-analysis-based optimizations and pointer analyses themselves. We want to take advantage of this:
We now have a unified infrastructure for writing profile-guided transformations, which will work either at offline-compile-time or in the JIT, but we don't have many transformations. We would welcome new profile-guided transformations as well as improvements to the current profiling system. Ideas for profile-guided transformations:
Improvements to the existing support:
LLVM aggressively optimizes for performance, but does not yet optimize for code size. With a new ARM backend, there is increasing interest in using LLVM for embedded systems where code size is more of an issue. Someone interested in working on implementing code compaction in LLVM might want to read this article, which describes using link-time optimization for code-size reduction.
In addition to projects that enhance the existing LLVM infrastructure, there are projects that improve software that uses, but is not included with, the LLVM compiler infrastructure. These projects include open-source software projects and research projects that use LLVM. Like projects that enhance the core LLVM infrastructure, these projects are often challenging and rewarding.

At least one project (and probably more) needs to use analysis information (such as call graph analysis) from within a MachineFunctionPass; however, most analysis passes operate at the LLVM IR level. In some cases, a value (e.g., a function pointer) cannot be mapped from the MachineInstr level back to the LLVM IR level reliably, making the use of existing LLVM analysis passes from within a MachineFunctionPass impossible (or at least brittle). This project is to encode analysis information from the LLVM IR level into the MachineInstr IR when it is generated so that it is available to a MachineFunctionPass. The exemplar is call graph analysis (useful for control-flow integrity instrumentation, analysis of code-reuse defenses, and gadget compilers); however, other LLVM analyses may be useful.

Implement an on-demand function relocator in the LLVM JIT. This can help improve code locality using runtime profiling information. The idea is to use a relocation table for every function. The relocation entries need to be updated upon every function relocation (take a look at this article). A (per-function) basic block reordering would be a useful extension.

The goal of this project is to implement better data layout optimizations using the model of reference affinity. This paper provides some background information.

Slimmer is a prototype tool, built using LLVM, that uses dynamic analysis to find potential performance bugs in programs. Development on Slimmer started during Google Summer of Code in 2015 and resulted in an initial prototype, but evaluation of the prototype and improvements to make it portable and robust are still needed. This project would have a student pick up and finish the Slimmer work. The source code of Slimmer and its current documentation can be found at its GitHub web page.