LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 38846 - LLVM lowers __atomic builtin to a function call when lowering __sync builtin to a single instruction with LOCK prefix
Summary: LLVM lowers __atomic builtin to a function call when lowering __sync builtin ...
Status: NEW
Alias: None
Product: libraries
Classification: Unclassified
Component: Common Code Generator Code (show other bugs)
Version: trunk
Hardware: PC Linux
: P enhancement
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
: 25335 (view as bug list)
Depends on:
Blocks:
 
Reported: 2018-09-05 09:15 PDT by Taewook Oh
Modified: 2021-07-28 06:43 PDT (History)
3 users (show)

See Also:
Fixed By Commit(s):


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Taewook Oh 2018-09-05 09:15:37 PDT
=== Test program ===

#include <cstdint>

template <typename T>
struct __attribute__((__packed__)) Foo {
    T a;
    void incAtomic() {
        __atomic_add_fetch(&a, 1, __ATOMIC_RELEASE);
    }
    void incSync() {

        __sync_add_and_fetch(&a, 1);
    }
};

int main(int argc, char** argv) {
    Foo<uint16_t> f;
    f.a = argc;
    f.incAtomic();
    f.incSync();
    return f.a;
}

=== Disassembled instructions ===
(compiled with clang++ --std=c++17 -O3 -Wall -DNDEBUG=1) 

main:
        push    rax
        mov     word ptr [rsp], di
        mov     rdi, rsp 
        mov     esi, 1
        mov     edx, 3
        call    __atomic_fetch_add_2
        lock            add     word ptr [rsp], 1 # __sync_add_and_fetch 
        movzx   eax, word ptr [rsp]
        pop     rcx
        ret

GCC's documentation for builtins (https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/_005f_005fatomic-Builtins.html) says that __atomic builtsins are intended to replace __sync builtins, so there should be no semantic difference between corresponding ones. 

I believe the difference comes from the alignment checking logic (https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CGAtomic.cpp#L768) only included in handling __atomic builtins. However, considering that LOCK prefix works for unaligned accesses as well, I'm not sure if that alignment check is necessary. LOCK for unaligned memory access is expensive, but not sure if it is more expensive than library calls.
Comment 1 Richard Smith 2018-09-05 12:14:47 PDT
I think your example has undefined behavior in both the __sync and __atomic forms. You're calling the versions of these builtins that take an unsigned short*, and passing a pointer that is not suitably aligned. This is UB, just as it would be if you called a function that performed a non-atomic load on such a pointer.

However, I think that's largely irrelevant for the purpose of this bug, and we usually do try to guarantee that cases like this work, when the alignment can be determined from the form of the expression. (I think the cause of the difference between the __sync_* builtins and the __atomic_* builtins is that the former make no attempt to make such cases work, whereas the latter do try to make this work.)

The same behavior can be seen for cases whose behavior is supposed to be defined, such as:

struct A {
  char c[4];
} *p, *q;
void f() { __atomic_load(p, q, 5); } // calls __atomic_load_4

struct B {
  int n;
} *r, *s;
void g() { __atomic_load(r, s, 5); } // emitted as a load

... so it looks like we use a libcall for unaligned accesses for __atomic_*, and do not try to support unaligned accesses for __sync_*. (Note that the __sync_* functions only take built-in integral and pointer types, for which alignment is implied, whereas the __atomic_* functions take more arbitrary types.)

Now, it's wrong to call __atomic_load_4 in the above example: the "optimized" library functions are only valid to call on properly-aligned pointers. We should be calling the generic __atomic_load library function instead.

Also, it's worth noting that choosing to use inlined code rather than a libcall is an ABI decision, not merely a performance choice: if some part of the program uses inlined code, the libcall implementation must not perform nonatomic accesses under a lock. It's always correct to conservatively use a libcall, but it's not correct to inline code if the libatomic implementation would use a lock for that object.

I would note that GCC generates a call to __atomic_load_16 for a 1-byte-aligned 16 byte load -- https://godbolt.org/z/ScqQB9 -- so we're not the only ones to get this wrong. Also, GCC's libatomic implementation of __atomic_load will use a lock to load an object of type 'struct A' above if it straddles a 16-byte-alignment boundary, so GCC + GNU libatomic gets the f() case wrong too by emitting a 'mov' rather than a libcall.

One final problem is that the compiler-rt implementation of __atomic_load does not even depend on the alignment of the pointer: when given a size for which a naturally-aligned load would be atomic, it just calls __c11_atomic_load, which will in general *not* be atomic when given an unaligned pointer. (GCC's libatomic gets this right, as far as I can see.)

I would conclude:

 * in principle and in practice, __sync_* atomics do not support unaligned pointers
 * in principle, __atomic_* atomics appear to be intended to support unaligned pointers; in practice, such support does not appear to work

(and the above seems to hold across both Clang and GCC).

The best advice I can give you is to avoid misaligned atomic operations.
Comment 2 Taewook Oh 2018-09-05 12:46:29 PDT
Thank you Richard for your detailed reply! Is unaligned atomic operation UB only for these builtins, or for std::atomic as well? Here https://gcc.godbolt.org/z/MZLRi8 I made std::atomic variable unaligned and seems that the generated code is just same as the one from __sync (in therms of atomic operation).
Comment 3 Richard Smith 2018-09-05 14:11:27 PDT
(In reply to Taewook Oh from comment #2)
> Thank you Richard for your detailed reply! Is unaligned atomic operation UB
> only for these builtins, or for std::atomic as well? Here
> https://gcc.godbolt.org/z/MZLRi8 I made std::atomic variable unaligned and
> seems that the generated code is just same as the one from __sync (in therms
> of atomic operation).

Generally, when you pass a T* to a function, that function is free to assume that the pointer value is suitably aligned for type T. As a result, you must be extremely careful when using pointers that point to members of a struct declared with __attribute__((packed)).

If you're using atomic operations on a packed struct as a whole, that should work fine. Both _Atomic(T) and std::atomic<T> will increase the alignment of the atomic type as necessary to make natural atomic operations work on the object, though, so atomic<packed_struct> will typically have an alignment greater than 1.

There's implementation divergence in your example: GCC ignores the __attribute__((packed)) on the member of type atomic<uint16_t> in Foo2 (and gives you a working program that performs a correct, naturally-aligned atomic access). Clang respects the request, which means you're making a member function call with a misaligned this pointer, which is UB.


You might also be interested in this greatly reduced example of what happens if you try to use __atomic ops on pointers that are not naturally aligned:  https://gcc.godbolt.org/z/LqXKrt

Note that GCC generates a plain 'mov' for this 1-byte-aligned 4 byte load. That's not atomic in general. Clang generates a call to __atomic_load_4, which in both compiler-rt and libatomic will result in a plain 'mov'.
Comment 4 Taewook Oh 2018-09-05 17:51:19 PDT
I see. Thank you for the explanation! Could you please point me where in the specification the alignment requirement for atomics is described? I tried to find it by myself but couldn't find one.
Comment 5 Richard Smith 2018-09-06 15:47:38 PDT
(In reply to Taewook Oh from comment #4)
> I see. Thank you for the explanation! Could you please point me where in the
> specification the alignment requirement for atomics is described?

For which form of atomics? (__sync_* or __atomic_* or _Atomic or std::atomic?)
Comment 6 Taewook Oh 2018-09-07 10:55:45 PDT
I meant std::atomic. (I guess there's no "standard" for builtins?)
Comment 7 Richard Smith 2018-09-07 13:28:36 PDT
(In reply to Taewook Oh from comment #6)
> I meant std::atomic. (I guess there's no "standard" for builtins?)

There is no alignment requirement in any specification for std::atomic; rather, the C++ standard requires it to work for an arbitrary trivially-copyable type, and implementations make that efficient by increasing the alignment where necessary.

In libc++ this is done by implementing it in terms of _Atomic (where available); Clang's implementation of _Atomic increases the alignment as necessary. (The alignment of _Atomic(T) *should* be specified by the psABI for the target, but in practice it is not.) However, when compiled with GCC, it just uses __atomic_* and doesn't increase the alignment, presumably resulting in miscompiles in the cases where GCC's __atomic_* functions fail to actually be atomic.

In libstdc++, the GCC bug gcc.gnu.org/PR87237 is worked around by increasing the alignment on std::atomic<T> when sizeof(T) is a power of 2 <= 16 and alignof(T) is less than sizeof(T):

https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/atomic#L189
Comment 8 Taewook Oh 2018-09-07 13:33:27 PDT
Got it. Thank you for the detailed explanation!
Comment 9 Richard Smith 2018-09-07 13:33:51 PDT
PR38871 filed for the bug that libc++'s std::atomic is not always atomic when compiling with GCC. (It's a GCC bug but libc++ needs to work around it.)
Comment 10 Richard Smith 2018-09-07 17:01:52 PDT
As of r341734, clang no longer calls the __builtin_foo_<n> lib functions for misaligned pointers.
Comment 11 Nikita Kniazev 2021-07-28 06:43:57 PDT
*** Bug 25335 has been marked as a duplicate of this bug. ***