LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 41856 - CP15 barrier instructions should be emitted before the exclusives loops
Summary: CP15 barrier instructions should be emitted before the exclusives loops
Status: NEW
Alias: None
Product: libraries
Classification: Unclassified
Component: Backend: ARM (show other bugs)
Version: trunk
Hardware: Other Linux
: P enhancement
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-05-13 03:19 PDT by Robert Vojta
Modified: 2021-01-21 06:02 PST (History)
11 users (show)

See Also:
Fixed By Commit(s):


Attachments
Never ending loop (986.32 KB, image/png)
2019-05-13 03:19 PDT, Robert Vojta
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Robert Vojta 2019-05-13 03:19:51 PDT
Created attachment 21941 [details]
Never ending loop

## Symptoms

* Rustup hangs (unable to install Rust)
  * Tracked down to parking_lot issue: https://github.com/Amanieu/parking_lot/issues/130
* Not just Rust related, Swift has this problem as well
  * https://forums.balena.io/t/cloud-build-fails-but-local-device-build-works-on-raspberry-pi-zero/4994

strex never succeeds and is looping indefinitely.

## Environment

* Linux kernel with CP15_BARRIER_EMULATION=y
* abi.cp15_barrier set to 1 (emulate)
* arm-unknown-linux-gnueabihf toolchain

## CP15 barrier instructions

* They're deprecated since armv7
* Linux kernel can emulate or HW exec them: https://www.kernel.org/doc/Documentation/arm64/legacy_instructions.txt
  * abi.cp15_barrier is set to 2 (HW exec) -> there's no issue
    * The CPU must support them
    * ARMv8 in our case, which still supports them
  * abi.cp15_barrier is set to 1 (emulate) -> there's this issue

## Issue description

parking_lot author:

> This seems to be closer to an LLVM bug than a parking_lot bug. The source
> of the problem is the CP15 emulation in the kernel. Essentially the
> mcr p15, #0x0, r12, c7, c10, #0x5 is trapping to the kernel every time,
> which invalidates the exclusive monitor between the ldrex and strex
> instructions. This results in the strex never succeeding and looping
> indefinitely.

See attached loop.png image.

ARM engineer (Will Deacon) response on this:

> Hi again, Robert,
> 
> Just a quick update on this:
>
>  1. CP15 barriers remain deprecated in the Armv8 architecture, and so
>    may be removed entirely from future CPUs.
>
> 2. Because of (1), the kernel defaults to trap+emulate, so that it can
>    warn about the use of these instructions. I think this is the right
>    thing to do because, once the instructions have been removed, we
>    will have no choice but to trap+emulate (this happened for the SWP
>    instruction already). This trapping will prevent your exclusives loop
>    from ever succeeding.
>
> 3. The right place to address this issue is in LLVM, where atomic
>    read-modify-write operations with conditional release semantics (i.e.
>    release on success) should actually emit the CP15 barrier before the
>    exclusives loop. Assuming that contention is rare (which it kind of
>    needs to be for performant compare-and-swap anyway), I don't see this
>    having a meaningful impact on performance.
>
> I've reached out to one of our upstream LLVM developers, and I'll be talking
> with him face-to-face next week about getting this fixed.
>
> Will

## Solution

Will's third point:

> Atomic read-modify-write operations with conditional release semantics (i.e.
> release on success) should actually emit the CP15 barrier before the
> exclusives loop. Assuming that contention is rare (which it kind of
> needs to be for performant compare-and-swap anyway), I don't see this
> having a meaningful impact on performance.

## No fix

People aren't / won't be able to use Rust / Swift / ... on Linux with
CP15_BARRIER_EMULATION=y & abi.cp15_barrier=1 (emulation, default value)
& arm-unknown-linux-gnueabihf toolchain.
Comment 1 Robert Vojta 2019-05-13 03:24:22 PDT
Rust compiler issue: https://github.com/rust-lang/rust/issues/60605
Comment 2 Eli Friedman 2019-05-13 16:49:02 PDT
I'm not sure I understand why this is being deprecated, but it's not my decision, I guess.

To be clear, the issue here is specifically with weak cmpxchg with release semantics, where the compiler is targeting armv6?  (Next time, please include a testcase; it took me a few minutes to work that out based on just this report.)

If the barrier is going to trap to the kernel, we shouldn't be putting between an ldrex and an strex, yes.

If we're using a barrier we know won't trap, like dmb, we want to avoid the barrier where possible, for the sake of performance.  For example, if a "release" cmpxchg fails, we don't need a barrier at all.  This may not be due to contention; it might just be that the variable is in a state where the cmpxchg will always fail.  This is more likely to be relevant for a "strong" cmpxchg.

Independent of where we place the barrier, if we expect that cp15 barriers will trap on future Linux releases, should LLVM be changed so it never uses cp15 barriers on Linux? We can call __sync_synchronize instead.
Comment 3 Eli Friedman 2019-05-13 18:41:08 PDT
(In reply to Eli Friedman from comment #2)
> Independent of where we place the barrier, if we expect that cp15 barriers
> will trap on future Linux releases, should LLVM be changed so it never uses
> cp15 barriers on Linux? We can call __sync_synchronize instead.

Actually, thinking about this a bit more, this isn't really independent.  We probably should just expand all non-"monotonic" atomicrmw and cmpxchg operations to __sync_* on Linux if we have ldrex, but not dmb. If we have to call a helper for the barrier anyway, there isn't much point to expanding the atomic operation inline.
Comment 4 Peter Michael Green 2021-01-21 06:02:49 PST
Hi, I maintain the raspbian project I was recently directed to this bug as being the root cause of an issue that has been a major PITA in the raspbian project for some time. I decided to take a bit of a look.

> Actually, thinking about this a bit more, this isn't really independent.  We 
> probably should just expand all non-"monotonic" atomicrmw and cmpxchg operations 
> to __sync_* on Linux if we have ldrex, but not dmb. If we have to call a helper
> for the barrier anyway, there isn't much point to expanding the atomic operation > inline.

I agree, when people build code for "armv6" they generally expect it to work on "armv6 or better" and given the situation with these instructions going forward
avoiding them seems the safest option.

I decided to take a look at this myself, but I was rather out of my depth in terms of both rust and LLVM. After some searching the code I put the following patch in place, which I thought would disable the use of the cp15 barriers.


--- llvm-toolchain-11-11.0.0.orig/llvm/lib/Target/ARM/ARMSubtarget.h            
+++ llvm-toolchain-11-11.0.0/llvm/lib/Target/ARM/ARMSubtarget.h                 
@@ -647,7 +647,9 @@ public:                                                     
   bool hasAcquireRelease() const { return HasAcquireRelease; }                 
.                                                                               
   bool hasAnyDataBarrier() const {                                             
-    return HasDataBarrier || (hasV6Ops() && !isThumb());                       
+    //avoid using armv6 cp15 barriers, they cause more                         
+    //problems than they solve.                                                
+    return HasDataBarrier;// || (hasV6Ops() && !isThumb());                    
   }                                                                            
.                                                                               
   bool useMulOps() const { return UseMulOps; }  

Unfortunately after building and installing the new llvm and clearing the cargo caches, the parking lot testsuite still hangs, I don't know if this is because my change was ineffective or because there was some other problem.

Am I looking along the right lines?

Does anyone know how to reproduce this issue without involving rust or swift?