#### RISC-V Support into LLVM's libc: Challenges and Solutions for 32-bit and 64-bit

Mikhail R. Gadelha, PhD

24th October 2024



#### The LLVM C Library

- LLVM-libc is a libc implementation under the LLVM project umbrella, targeting performance and modularity
- Includes support for various architectures, including x86\_64, aarch64, arm, and RISC-V (32 and 64-bit)
- Also supports building for GPUs, both AMD and NVIDIA
- Written in modern C++



#### **RISC-V** Overview

- An **open** instruction set architecture
- It's not a chip (Core i7)
- It's not a piece of IP (ARM Cortex-M)
- It's the ISA (x86, AArch64, PowerPC)





# Why RISC-V Support in LLVM libc?

I wanted to learn more about the architecture 😅

- I was looking for an LLVM project to contribute to, and libc was the perfect match
- Coincidentally, the first patch with initial RV64 support had landed one month prior:
  - Basic implementation of memory functions (memcmp, memmove, memcpy, etc.)
  - Partial implementation of ctype.h, string.h, math.h and syscalls
  - Missing crt, fenv, threads, longjmp/setjmp, and others



Basic support can be enabled by changing only three files:

- libc/cmake/modules/LLVMLibCArchitectures.cmake: queries arch name (e.g., riscv64)
- libc/config/linux/**riscv64**/entrypoints.txt: defines the supported functions
- libc/config/linux/**riscv64**/headers.txt: defines the system headers



libc/cmake/modules/LLVMLibCArchitectures.cmake: gets the target architecture from the compiler's default target triple

```
@@ -55,6 +55,8 @@ function(get_arch_and_system_from_triple triple arch_var sys_var)
     set(target_arch "x86_64")
   elseif(target_arch MATCHES "^(powerpc|ppc)")
     set(target_arch "power")
+  elseif(target_arch MATCHES "^riscv64")
+    set(target_arch "riscv64")
   else()
     return()
   endif()
@@ -146,6 +148,8 @@ elseif(LIBC_TARGET_ARCHITECTURE STREQUAL "aarch64")
   set(LIBC_TARGET_ARCHITECTURE_IS_AARCH64 TRUE)
 elseif(LIBC_TARGET_ARCHITECTURE STREQUAL "x86_64")
   set(LIBC_TARGET_ARCHITECTURE_IS_X86 TRUE)
+elseif(LIBC_TARGET_ARCHITECTURE STREQUAL "riscv64")
+  set(LIBC_TARGET_ARCHITECTURE_IS_RISCV64 TRUE)
 else()
   message(FATAL_ERROR
           "Unsupported libc target architecture ${LIBC_TARGET_ARCHITECTURE}")
```
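The CMake check above derives the architecture from the target triple. On the C++ side, the same decision can be made from the compiler's predefined macros. The sketch below is a hypothetical helper (`target_arch` is not an LLVM-libc function); it relies only on the `__x86_64__`, `__aarch64__`, `__riscv`/`__riscv_xlen` and `__arm__` macros that GCC and Clang predefine.

```cpp
// Hypothetical helper: identify the target architecture from compiler-predefined
// macros, mirroring what LLVMLibCArchitectures.cmake derives from the triple.
#include <string_view>

constexpr std::string_view target_arch() {
#if defined(__x86_64__)
  return "x86_64";
#elif defined(__aarch64__)
  return "aarch64";
#elif defined(__riscv) && __riscv_xlen == 64
  return "riscv64";
#elif defined(__riscv) && __riscv_xlen == 32
  return "riscv32";
#elif defined(__arm__)
  return "arm";
#else
  return "unknown";  // not one of the architectures listed above
#endif
}
```

For example, a file compiled with `--target=riscv64-linux-gnu` sees `__riscv` defined and `__riscv_xlen == 64`, so `target_arch()` yields `"riscv64"`.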



libc/config/linux/**riscv64**/entrypoints.txt: defines the supported functions

```
libc/config/linux/riscv64/entrypoints.txt
new file mode 100644
index 00000000000183cf1b66a88
@@ -0,0 +1,106 @@
+set(TARGET_LIBC_ENTRYPOINTS
+    # ctype.h entrypoints
+    libc.src.ctype.isalnum
+    libc.src.ctype.isalpha
+    libc.src.ctype.isascii
+    libc.src.ctype.isblank
+    libc.src.ctype.iscntrl
+    libc.src.ctype.isdigit
+    libc.src.ctype.isgraph
+    libc.src.ctype.islower
+    libc.src.ctype.isprint
```



libc/config/linux/**riscv64**/headers.txt: defines the system headers

```
libc/config/linux/riscv64/headers.txt
new file mode 100644
index 00000000000..cc436c7119f4
@@ -0,0 +1,8 @@
+set(TARGET_PUBLIC_HEADERS
+    libc.include.ctype
+    libc.include.errno
+    libc.include.inttypes
+    libc.include.math
+    libc.include.stdlib
+    libc.include.string
+)
```



Once basic support is done, it's a matter of:

- Adding more functions to entrypoints.txt and headers.txt
- Running tests
- Fixing bugs
- Rinse and repeat



# Challenges of RISC-V: emulators

- Hardware was not widely available, so we needed emulators: QEMU, Spike, and others
- Initially I used qemu-riscv64 (the user-space emulator), but several syscalls failed when we expected them to succeed, or succeeded when we expected them to fail




```
[=======] Running 2 tests from 1 test suite.
[ RUN ] LlvmLibcPosixMadviseTest.NoError
[ OK ] LlvmLibcPosixMadviseTest.NoError (414 us)
[ RUN ] LlvmLibcPosixMadviseTest.Error_BadPtr
/home/lnt/experiment/llvm-project/libc/test/src/sys/mman/linux/posix_madvise_test.cpp:49: FAILURE
Expected: __llvm_libc_20_0_0_git::posix_madvise(nullptr, 8, 2)
Which is: 0
To be equal to: 12
Which is: 12
[ FAILED ] LlvmLibcPosixMadviseTest.Error_BadPtr
Ran 2 tests. PASS: 1 FAIL: 1
```

posix\_madvise: gives advice about patterns of memory usage

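Returns 0 on success and an error number on failure (it does not set errno). If the first argument is an invalid address, it should return ENOMEM (12), but under qemu-riscv64 it returned 0.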
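To reproduce the failing check outside the test framework, the minimal sketch below calls `posix_madvise` directly. Unlike `madvise`, it reports errors through its return value instead of returning -1 and setting `errno`. The helper names are hypothetical, and `POSIX_MADV_SEQUENTIAL` (2 on Linux) mirrors the third argument in the log.

```cpp
// Sketch: posix_madvise returns 0 or an error number; it does not set errno.
#include <cerrno>      // errno, ENOMEM
#include <cstddef>     // size_t
#include <sys/mman.h>  // mmap, munmap, posix_madvise, POSIX_MADV_*
#include <unistd.h>    // sysconf

// Advice for a freshly mapped anonymous page: expected to succeed (0).
int advise_mapped_page() {
  const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  void *p = mmap(nullptr, page, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return errno;  // mmap itself does set errno
  int ret = posix_madvise(p, page, POSIX_MADV_SEQUENTIAL);
  munmap(p, page);
  return ret;
}

// The call from the failing test: advice for an address with nothing mapped.
// A native Linux kernel reports ENOMEM (12); the qemu-riscv64 run returned 0.
int advise_unmapped() {
  return posix_madvise(nullptr, 8, POSIX_MADV_SEQUENTIAL);
}
```

Running the same binary natively and under an emulator is a quick way to tell a libc bug from an emulator quirk.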


The solution was to use qemu-system (full-system emulation):

- For RV64 it is quite simple: several Linux distributions provide ready-made images.
- For RV32 you need to build your own image with Buildroot or Yocto; there is a tutorial on https://discourse.llvm.org/ on how to create one.
- I used Yocto since it can include GCC/Clang in the image.
- For some reason, RV32 images are restricted to 1 GB of RAM.



# Challenges of RISC-V: syscalls

RISC-V provides a different set of syscalls compared to other architectures.

- Its syscall table is simpler than x86's or ARM's, but it drops older syscalls that those architectures keep for backward compatibility, so libc must use the newer replacements, e.g.:
  - SYS\_open → SYS\_openat
  - SYS\_unlink → SYS\_unlinkat
  - SYS\_getdents → SYS\_getdents64
  - SYS\_sched\_rr\_get\_interval → SYS\_sched\_rr\_get\_interval\_time64 (RV32)
  - SYS\_wait, SYS\_waitpid, SYS\_wait3, SYS\_wait4 → SYS\_waitid
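Each replacement usually comes down to a compile-time check on the `SYS_*` macros, the pattern used throughout the patches in this section. As an illustration, here is a minimal sketch of `open` falling back to `SYS_openat`. The name `open_via_syscall` is hypothetical, and it uses the C library's `syscall()` wrapper (which returns -1 and sets `errno` on failure) rather than LLVM-libc's internal `syscall_impl`.

```cpp
// Hypothetical helper, not LLVM-libc code: open() on targets without SYS_open.
#include <fcntl.h>        // AT_FDCWD, O_* flags
#include <sys/syscall.h>  // SYS_* syscall numbers
#include <unistd.h>       // syscall(), close()

long open_via_syscall(const char *path, int flags, int mode) {
#ifdef SYS_open
  // Older ABIs (e.g. x86_64) still provide the legacy syscall.
  return syscall(SYS_open, path, flags, mode);
#elif defined(SYS_openat)
  // RISC-V only has openat; AT_FDCWD resolves relative paths against the
  // current working directory, which gives the same behavior as open().
  return syscall(SYS_openat, AT_FDCWD, path, flags, mode);
#else
#error "SYS_open and SYS_openat not available."
#endif
}
```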



```
 int fcntl(int fd, int cmd, void *arg) {
+#ifdef SYS_fcntl
+  constexpr auto FCNTL_SYSCALL_ID = SYS_fcntl;
+#elif defined(SYS_fcntl64)
+  constexpr auto FCNTL_SYSCALL_ID = SYS_fcntl64;
+#else
+#error "fcntl and fcntl64 syscalls not available."
+#endif
   switch (cmd) {
   case F_OFD_SETLKW: {
     struct flock *flk = reinterpret_cast<struct flock *>(arg);
@@ -33,7 +41,7 @@ int fcntl(int fd, int cmd, void *arg) {
     flk64.l_len = flk->l_len;
     flk64.l_pid = flk->l_pid;
     // create a syscall
-    return LIBC_NAMESPACE::syscall_impl<int>(SYS_fcntl, fd, cmd, &flk64);
+    return LIBC_NAMESPACE::syscall_impl<int>(FCNTL_SYSCALL_ID, fd, cmd, &flk64);
```

fcntl: manipulate a file descriptor



```
+// We use dup3 if dup2 is not available, similar to our implementation of dup2
 bool dup2(int fd, int newfd) {
+#ifdef SYS_dup2
   long ret = __llvm_libc::syscall_impl(SYS_dup2, fd, newfd);
+#elif defined(SYS_dup3)
+  long ret = __llvm_libc::syscall_impl(SYS_dup3, fd, newfd, 0);
+#else
+#error "SYS_dup2 and SYS_dup3 not available for the target."
+#endif
   return ret < 0 ? false : true;
 }
```

dup, dup2, dup3: duplicate a file descriptor
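One subtlety when emulating `dup2` with `dup3`: `dup3` fails with `EINVAL` when both descriptors are equal, whereas `dup2` returns `newfd` if `oldfd` is valid. The minimal sketch below handles that case separately. The function name is hypothetical, and it uses the C library's `syscall()` and `fcntl()` wrappers instead of LLVM-libc internals.

```cpp
// Hypothetical helper, not LLVM-libc code: dup2() for targets that only have SYS_dup3.
#include <cerrno>        // errno, EBADF
#include <fcntl.h>       // fcntl, F_GETFD, open, O_RDONLY
#include <sys/syscall.h> // SYS_dup3
#include <unistd.h>      // syscall, close

int dup2_via_dup3(int oldfd, int newfd) {
  if (oldfd == newfd) {
    // dup3 would fail with EINVAL here, but dup2 must succeed if oldfd is open.
    // F_GETFD is a cheap validity check: it fails with EBADF for a closed fd.
    if (fcntl(oldfd, F_GETFD) == -1)
      return -1;  // errno already set by fcntl
    return newfd;
  }
  // flags = 0 gives dup2 semantics (FD_CLOEXEC is not set on newfd).
  return static_cast<int>(syscall(SYS_dup3, oldfd, newfd, 0));
}
```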



```
@@ -23,14 +23,15 @@ LLVM_LIBC_FUNCTION(off_t, lseek, (int fd, off_t offset, int whence)) {
 #ifdef SYS_lseek
   int ret = __llvm_libc::syscall_impl<int>(SYS_lseek, fd, offset, whence);
   result = ret;
-#elif defined(SYS_llseek)
-  long ret = __llvm_libc::syscall_impl(SYS_llseek, fd,
-                                       (long)(((uint64_t)(offset)) >> 32),
-                                       (long)offset, &result, whence);
+#elif defined(SYS_llseek) || defined(SYS__llseek)
+#ifdef SYS_llseek
+  constexpr long LLSEEK_SYSCALL_NO = SYS_llseek;
 #elif defined(SYS__llseek)
-  int ret = __llvm_libc::syscall_impl<int>(SYS__llseek, fd, offset >> 32,
-                                            offset, &result, whence);
+  constexpr long LLSEEK_SYSCALL_NO = SYS__llseek;
+#endif
+  uint64_t offset_64 = static_cast<uint64_t>(offset);
+  int ret = __llvm_libc::syscall_impl<int>(
+      LLSEEK_SYSCALL_NO, fd, offset_64 >> 32, offset_64, &result, whence);
 #else
 #error "lseek, llseek and _llseek syscalls not available."
 #endif
```

lseek: reposition the read/write file offset
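The `llseek` variants take the 64-bit offset as two 32-bit halves (`offset_high`, `offset_low`), and the Linux kernel rebuilds it as `((loff_t)offset_high << 32) | offset_low`. The sketch below uses hypothetical helpers (not LLVM-libc code) to show the split and the reverse step. Converting to `uint64_t` first, as the patch does with `offset_64`, keeps the right shift well defined for negative offsets.

```cpp
// Sketch of the 64-bit offset split used by SYS_llseek/SYS__llseek on 32-bit targets.
#include <cstdint>

struct SplitOffset {
  uint32_t high;  // passed as offset_high
  uint32_t low;   // passed as offset_low
};

SplitOffset split_offset(int64_t offset) {
  // Cast first: right-shifting a negative signed value is
  // implementation-defined before C++20, while the unsigned shift is not.
  uint64_t offset_64 = static_cast<uint64_t>(offset);
  return {static_cast<uint32_t>(offset_64 >> 32),
          static_cast<uint32_t>(offset_64)};
}

// The reverse step, as performed on the kernel side.
int64_t join_offset(SplitOffset s) {
  uint64_t v = (static_cast<uint64_t>(s.high) << 32) | s.low;
  return static_cast<int64_t>(v);  // modular conversion (guaranteed since C++20)
}
```

For example, an offset of `0x123456789` travels as `high = 0x1` and `low = 0x23456789`.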



sched\_rr\_get\_interval: get the SCHED\_RR interval for the named process



```
LIBC_INLINE ErrorOr<pid_t> wait4impl(pid_t pid, int *wait_status, int options,
                                     /* ... */) {
#if SYS_wait4
  // ...
  if (pid == -1) {
    idtype = P_ALL;
  } else if (pid < -1) {
    // ...
  } else if (pid == 0) {
    // ...
  }
  // ...
  __llvm_libc::syscall_impl(SYS_waitid, idtype, pid, &info, options, usage);
  if (wait_status) {
    // ...
    case CLD_EXITED:
      *wait_status = W_EXITCODE(info.si_status, 0);
      break;
    case CLD_DUMPED:
      *wait_status = info.si_status | WCOREFLAG;
      break;
    case CLD_KILLED:
      *wait_status = info.si_status;
      break;
    case CLD_TRAPPED:
    case CLD_STOPPED:
      *wait_status = W_STOPCODE(info.si_status);
      break;
    case CLD_CONTINUED:
      *wait_status = W_CONTINUED;
      break;
    default:
      // ...
#else
#error "wait4 and waitid syscalls not available."
#endif
  // ...
  if (pid < 0)
  // ...
```

When SYS\_wait4 is not available, SYS\_waitid is used instead; it covers SYS\_wait, SYS\_waitpid, SYS\_wait3 and SYS\_wait4.
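The pid-to-idtype mapping and the status re-encoding in the excerpt above can be sketched as a self-contained function. This is a hypothetical helper built on the POSIX `waitid()` wrapper, not LLVM-libc's `wait4impl`. It ignores the `rusage` argument and encodes the status word by hand using the traditional Linux layout instead of the `W_EXITCODE`/`W_STOPCODE` macros.

```cpp
// Hypothetical sketch: waitpid-style wait emulated with waitid (rusage not handled).
#include <csignal>     // siginfo_t, CLD_* codes, kill
#include <sys/types.h> // pid_t, id_t
#include <sys/wait.h>  // waitid, idtype_t, P_*, W* options
#include <unistd.h>    // getpgrp, fork, pause, _exit

pid_t wait_via_waitid(pid_t pid, int *wait_status, int options) {
  idtype_t idtype;
  id_t id;
  if (pid < -1) {          // any child in process group |pid|
    idtype = P_PGID;
    id = static_cast<id_t>(-pid);
  } else if (pid == -1) {  // any child
    idtype = P_ALL;
    id = 0;
  } else if (pid == 0) {   // any child in the caller's process group
    idtype = P_PGID;
    id = static_cast<id_t>(getpgrp());
  } else {                 // one specific child
    idtype = P_PID;
    id = static_cast<id_t>(pid);
  }

  siginfo_t info{};  // zeroed so si_pid stays 0 if WNOHANG finds nothing
  // wait4 always reports terminated children; waitid needs WEXITED explicitly.
  // On Linux, WUNTRACED has the same value as WSTOPPED.
  if (waitid(idtype, id, &info, options | WEXITED) == -1)
    return -1;  // errno set by waitid

  if (wait_status != nullptr && info.si_pid != 0) {
    switch (info.si_code) {  // re-encode into the Linux wait-status layout
    case CLD_EXITED:    *wait_status = (info.si_status & 0xff) << 8; break;
    case CLD_KILLED:    *wait_status = info.si_status; break;
    case CLD_DUMPED:    *wait_status = info.si_status | 0x80; break;  // core flag
    case CLD_TRAPPED:
    case CLD_STOPPED:   *wait_status = (info.si_status << 8) | 0x7f; break;
    case CLD_CONTINUED: *wait_status = 0xffff; break;
    default:            *wait_status = 0; break;
    }
  }
  return info.si_pid;
}
```

The resulting status word can then be decoded with the usual `WIFEXITED`/`WEXITSTATUS`/`WIFSIGNALED` macros.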





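```
typedef struct {
  int si_signo;
  int si_code;
  int si_errno;
  ...
} siginfo_t;
```

MIPS layout; on other Linux architectures, including RISC-V, si\_errno comes before si\_code.

siginfo\_t: data structure containing signal information; it is passed as the second parameter to a user signal handler function (installed with SA\_SIGINFO), and waitid fills it in to report a child's state change.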
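Because libc reads fields such as `si_code` and `si_status` from the `siginfo_t` that `waitid` fills in, its definition has to match the kernel's layout for each target. A minimal sketch of a compile-time check on the host C library's `siginfo_t`; the helper name is hypothetical, and `__mips__` is the macro GCC and Clang predefine for MIPS targets.

```cpp
// Sketch: verify the order of the siginfo_t header fields at compile time.
#include <csignal>  // siginfo_t
#include <cstddef>  // offsetof

constexpr bool siginfo_code_before_errno() {
  return offsetof(siginfo_t, si_code) < offsetof(siginfo_t, si_errno);
}

#if defined(__mips__)
static_assert(siginfo_code_before_errno(), "MIPS: si_code precedes si_errno");
#else
static_assert(!siginfo_code_before_errno(), "generic: si_errno precedes si_code");
#endif
```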

# Challenges of RISC-V: others

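- Some tests cannot fail as expected because the replacement syscall behaves differently, so they need to be disabled; e.g., epoll\_create(0) should fail with EINVAL, but without SYS\_epoll\_create it is implemented with SYS\_epoll\_create1, which has no size argument to reject
- rand used the xorshift64star pseudo-random number generator, but its least significant bits were not uniform enough on 32-bit systems
- Implicit conversions, e.g., ssize\_t size = sizeof(AuxEntry);
- RV32 has a 128-bit long double but no int128\_t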
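The `epoll_create` case can be illustrated with a sketch (hypothetical function names, using the C library's `syscall()` wrapper). The legacy syscall rejects `size <= 0` with `EINVAL` even though the size hint itself has been ignored since Linux 2.6.8. `epoll_create1` only takes flags, so a fallback has nowhere to pass `size`, and an `epoll_create(0)` error test cannot fail.

```cpp
// Hypothetical sketch: the two epoll_create paths a libc may take.
#include <cerrno>         // errno, EINVAL, ENOSYS
#include <sys/syscall.h>  // SYS_epoll_create, SYS_epoll_create1
#include <unistd.h>       // syscall(), close()

// Path used where the legacy syscall exists (e.g. x86_64).
long epoll_create_legacy(int size) {
#ifdef SYS_epoll_create
  return syscall(SYS_epoll_create, size);  // kernel rejects size <= 0
#else
  (void)size;
  errno = ENOSYS;  // legacy syscall absent on this target
  return -1;
#endif
}

// Path used on RISC-V: only epoll_create1 exists, and it takes flags, not size.
long epoll_create_via_create1(int size) {
  (void)size;  // no way to forward the hint, so size <= 0 is not rejected
  return syscall(SYS_epoll_create1, 0);
}
```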



#### **RISC-V on libc Today**








\_Float16: https://github.com/llvm/llvm-project/issues/107607



#### Thank you!



