llvm doesn't generate inline sequence for load/store of unaligned atomic variable on x86_64 #33695
Comments
Should we be using mov for unaligned atomic accesses in general, or only (as in this example) where we have extrinsic knowledge that the access is actually properly aligned?
I guess we want to generate mov for the unaligned case in general, since unaligned loads/stores cause only minor or no slowdown on modern x86 architectures.
But isn't that lowering incorrect -- don't we lose atomicity if the object is actually not aligned (in particular, if it crosses a cache line boundary)?
You are right, thanks. Looking at the asm more closely, GCC is able to align ip3 and ip4 to 8 bytes, but llvm only aligns them to 4 bytes. This is the root cause of the difference. So we had better increase the alignment of the atomic variable as an optimization, instead of generating mov for the unaligned access.
Looks like a frontend bug: struct AM { int f1, f2; }; results in an alignment-4 atomic load, which the frontend turns into a libcall. We're apparently only considering the type alignment and not the alignment of the actual atomic operand to the builtin.
The fix for this bug (https://reviews.llvm.org/rL314145) has strange side effects on 32-bit x86, which cause quite a number of problems in FreeBSD ports. For example, take this minimal test case:
// clang -m32 -march=i486 -O2 -fomit-frame-pointer -S atomic-test-2.c -o -
extern unsigned long long c;
void foo(void)
With clang r314144, this results in:
foo:                    # @foo
BB#0:                   # %entry
With clang r314145, it starts producing:
foo:                    # @foo
BB#0:                   # %entry
.LBB0_1:                # %atomicrmw.start
BB#2:                   # %atomicrmw.end
Why is this a problem? A lot of programs test for the availability of the __atomic primitives by compiling small test cases like the one above, and those tests now suddenly succeed because the libcalls to __atomic functions have been replaced by inline lock cmpxchg8b's. Note that these are not even valid with -march=i486, but clang does produce them!
In https://wmi@llvm.org/svn/llvm-project/cfe/trunk/lib/Basic/Targets/X86.h, there is a FIXME in class X86_32TargetInfo.
Now we need to fix it. I will prepare a patch for it.
Yes, please note that there is also this quite old review which attempted to solve the case more generally, but it got stalled: https://reviews.llvm.org/D29542
Wei, any update on that patch? :)
It was committed at https://reviews.llvm.org/rL323281. Could you verify its effectiveness?
It does work for the test cases here, but with e.g. -march=i586 or higher there are still sometimes calls to __atomic functions instead of inlined cmpxchg8b operations. I don't really know where that changed between clang 5.0 and 6.0.
Clang uses "Ptr.getAlignment() % sizeChars == 0" as the precondition for using inline atomics. On x86-32, unsigned long long is 8 bytes in size but may have only 4-byte alignment; although it is possible to do atomic loads/stores on data of that type using cmpxchg8b, clang will use a libcall for it. This seems to be a separate restriction in Clang. Note: for clang, an unsigned long long inside a structure uses 4-byte alignment; a standalone unsigned long long uses 8-byte alignment.
The implementation of __atomic_load in compiler-rt appears to be buggy in that it only looks at the size of the input when deciding whether to take a lock. It (and hence all the others) should probably fall back for misaligned atomics too. A locked cmpxchg is theoretically viable on x86, but it looks like that would break libatomic compatibility; it's certainly an ABI issue. Also, if you're using genuinely misaligned atomics, you probably deserve whatever you get. Finally, everyone seems to have decided that mixing libcalls and inlined atomics on the same object is OK. That's probably fine because GCC seems to agree, but be careful!
We just hit a build failure in Fedora caused by r314145: https://bugzilla.redhat.com/show_bug.cgi?id=1565766
I've reduced the test case from the Fedora bug even further, and it's another example where clang will no longer generate a native cmpxchg instruction:
// clang -O2 -c t.c
struct __attribute__((packed)) baz {
struct foo {
uint32_t foor(struct foo *p, uint32_t y)
Atomics: the bugs that keep on giving. It seems everyone also used to agree that misaligned atomics on x86 should be lock-free, which raises three issues: the size-only check in compiler-rt's __atomic_load, the libatomic ABI question around a locked cmpxchg, and the mixing of libcalls and inlined atomics on the same object, all discussed above.
mentioned in issue llvm/llvm-bugzilla-archive#36860
A bunch of stuff has changed here; I think this is mostly fixed. #38194 tracks the issue that LLVM and gcc disagree about what to do if the argument to __atomic_* is misaligned.
Extended Description
-------------------- 1.cc -----------------------
#include <atomic>
struct AM {
int f1;
int f2;
};
std::atomic<AM> ip3{AM()};
std::atomic<AM> ip4{AM()};
void foo() {
auto val = ip4.load(std::memory_order_relaxed);
ip3.store(val, std::memory_order_relaxed);
}
~/workarea/llvm-r309240/dbuild/bin/clang -O2 -std=c++11 -fno-exceptions -S 1.cc
.cfi_startproc
BB#0:                   # %entry
	pushq	%rax
.Lcfi0:
	.cfi_def_cfa_offset 16
movl $local_ip4, %edi
xorl %esi, %esi
callq __atomic_load_8
movl $local_ip3, %edi
xorl %edx, %edx
movq %rax, %rsi
popq %rax
jmp __atomic_store_8 # TAILCALL
.Lfunc_end0:
.size _Z3foov, .Lfunc_end0-_Z3foov
.cfi_endproc
g++ -O2 -std=c++11 -fno-exceptions -S 1.cc
.cfi_startproc
movq ip4(%rip), %rax
movq %rax, ip3(%rip)
ret
.cfi_endproc
If alignas(8) is added:
struct alignas(8) AM {
int f1;
int f2;
};
clang will generate an inline sequence for the atomic accesses:
.cfi_startproc
movq ip4(%rip), %rax
movq %rax, ip3(%rip)
retq