New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regression(r304267): clang miscompiles chrome on arm32 #33403
Comments
assigned to @MatzeB |
matzeb, do you happen to see how your change could've triggered this? |
(and GetTLSValue() is just pthread_getspecific() on iOS) |
And base::subtle::NoBarrier_Load() is: typedef volatile std::atomic* AtomicLocation32; |
matzeb, ping? |
I think argument promotion is the source of the under-aligned alloca: *** IR Dump After Deduce function attributes *** It should use the 'align' attribute to set the alignment of the alloca it creates, rather than relying on DataLayout to figure it out. |
Wrong bug :( |
The relevant commit changes the lowering of cmpxchg instructions, however the mentioned So I'd suspect the actual miscompilation happens somewhere else in the code. I'll look around a bit more, but ideally if you can compile the relevant test with and without this change and inspect the difference, I'd suspect it to be pretty small and it should hint at what function is miscompiled. |
fairly reduced [1/9] ../../../../llvm-build/bin/clang++ -MMD -MF obj/base/base_unittests_arch_executable_sources/test_support_ios.o.d -DSYSTEM_NATIVE_UTF8 -DV8_DEPRECATION_WARNINGS -DNO_TCMALLOC -DDISABLE_NACL -DCHROMIUM_BUILD -DFIELDTRIAL_TESTING_ENABLED -DCR_CLANG_REVISION="307486-1" -DCR_XCODE_VERSION=0833 -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D_DEBUG -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DWTF_USE_DYNAMIC_ANNOTATIONS=1 -I../.. -Igen -fno-strict-aliasing --param=ssp-buffer-size=4 -fstack-protector -Wno-builtin-macro-redefined -D__DATE__= -D__TIME__= -D__TIMESTAMP__= -fcolor-diagnostics -arch armv7 -Wall -Werror -Wextra -Wundeclared-selector -Wno-missing-field-initializers -Wno-unused-parameter -Wno-c++11-narrowing -Wno-covered-switch-default -Wno-unneeded-internal-declaration -Wno-inconsistent-missing-override -Wno-undefined-var-template -Wno-nonportable-include-path -Wno-address-of-packed-member -Wno-unused-lambda-capture -Wno-user-defined-warnings -O0 -fno-omit-frame-pointer -gdwarf-2 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS10.3.sdk -stdlib=libc++ -miphoneos-version-min=9.0 -fvisibility=hidden -Wheader-hygiene -Wstring-conversion -Wtautological-overlap-compare -Wno-undefined-bool-conversion -Wno-tautological-undefined-compare -std=c++11 -fobjc-call-cxx-cdtors -Wobjc-missing-property-synthesis -fno-rtti -fno-exceptions -fvisibility-inlines-hidden -c ../../base/test/test_support_ios.mm -o obj/base/base_unittests_arch_executable_sources/test_support_ios.o With r304267 applied, this test succeeds and the breakpoint is hit:
With it reverted, it doesn't succeed and the program quits without trouble. I'll also attach the final binary in a second (out/Release-iphoneos/base_unittests.app/base_unittests -- note that this is a debug build, despite the "Release" in the filename). The disassembly of the two binaries looks identical to me (except for offsets), except that __ZN4base6subtle24NoBarrier_CompareAndSwapEPViii has a bunch of additional
lines. |
slightly smaller repro |
pretty small repro typedef int32_t Atomic32; inline Atomic32 NoBarrier_CompareAndSwap(volatile Atomic32* ptr, inline Atomic32 NoBarrier_Load(volatile const Atomic32* ptr) {
|
Thanks a lot for providing the reduced program. Indeed zeroing out the status register is the only change I can see by inspecting the assembly and that is exactly one of the things that commit changed, and as far as I can see that zeroed register isn't even read by anyone. Digging further... |
Early in __Z24NoBarrier_CompareAndSwapPViii we have: 0000ba3 .byte 30 @ KIND_JUMP_TABLE8 The line marked XXX is present in the bad binary but not in the good. Two lines further down r1 is compared against r3. So that zeroed register does look like it's getting read to me (?) |
That part looks like this for me:
|
The problem is that the Thumb "movs" only accepts registers r0-r8. As luck would have it 9 % 8 == 1, so when LLVM is trying to encode a move to r9 it gets interpreted as a clobber of r1. |
Sorry, that should obviously be r0-r7. Anyway, until recently we could have used "t2MOVi" instead, but unfortunately v8M-baseline has ldrex/strex but no t2MOVi so we need to make sure that whatever we do works on (e.g.) Cortex-M23 too. |
Hah, thanks for that insight, Tim :-) MatzeB, if this is tricky to fix, can we revert for now? Having known miscompiles in the tree is suboptimal. |
I'll put something in this afternoon, I think we can go back to the way before my patch without the mov (while maintaining the parts of my patch that give correct live-in lists, which was my main motivation anyway). |
And for the record: Turns out that when you immediately compile to a .o file you do indeed get |
Fixed in r310534 |
@Nico Weber: Can you test and see if this fixes your issues. If it does, I'll merge the fix into the llvm 5.0 branch. |
Let's keep this open until we have the fix in llvm-5.0 |
Thanks, I can confirm that this fixes my reduced example, and most of our base_unittests now pass for me. The two that don't also fail with the new change and r304267 reverted. So this looks much better now. |
Merged to 5.0 in r310628. |
mentioned in issue llvm/llvm-bugzilla-archive#34212 |
Extended Description
Chrome and all of its test binaries fail when built in a debug build on arm32. We bisected this down to r304267.
https://bugs.chromium.org/p/chromium/issues/detail?id=742563#c15 has some analysis on what's wrong. The file mentioned there is https://cs.chromium.org/chromium/src/base/threading/thread_local_storage.cc?q=base/threading/thread_local_storage.cc&sq=package:chromium&dr&l=320
TlsVectorEntry* tls_data = static_cast<TlsVectorEntry*>(
PlatformThreadLocalStorage::GetTLSValue(
base::subtle::NoBarrier_Load(&g_native_tls_key)));
if (!tls_data)
tls_data = ConstructTlsVector();
DCHECK_NE(slot_, kInvalidSlotValue);
DCHECK_LT(slot_, kThreadLocalStorageSize);
tls_data[slot_].data = value;
tls_data[slot_].version = version_;
"First ThreadLocalStorage::StaticSlot::Set(void* value) gets called, then when ThreadLocalStorage::StaticSlot::Get() is called later on, tls_data is null, and ConstructTlsVector() is called again." (this is the bug)
The text was updated successfully, but these errors were encountered: