Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression(r304267): clang miscompiles chrome on arm32 #33403

Closed
nico opened this issue Aug 3, 2017 · 29 comments
Closed

Regression(r304267): clang miscompiles chrome on arm32 #33403

nico opened this issue Aug 3, 2017 · 29 comments
Assignees
Labels
backend:ARM bugzilla Issues migrated from bugzilla

Comments

@nico
Copy link
Contributor

nico commented Aug 3, 2017

Bugzilla Link 34056
Resolution FIXED
Resolved on Aug 10, 2017 10:13
Version trunk
OS Linux
Blocks #33196
CC @zmodem,@MatzeB,@rnk,@TNorthover

Extended Description

Chrome and all of its test binaries fail when built in a debug build on arm32. We bisected this down to r304267.

https://bugs.chromium.org/p/chromium/issues/detail?id=742563#c15 has some analysis on what's wrong. The file mentioned there is https://cs.chromium.org/chromium/src/base/threading/thread_local_storage.cc?q=base/threading/thread_local_storage.cc&sq=package:chromium&dr&l=320

TlsVectorEntry* tls_data = static_cast<TlsVectorEntry*>(
PlatformThreadLocalStorage::GetTLSValue(
base::subtle::NoBarrier_Load(&g_native_tls_key)));
if (!tls_data)
tls_data = ConstructTlsVector();
DCHECK_NE(slot_, kInvalidSlotValue);
DCHECK_LT(slot_, kThreadLocalStorageSize);
tls_data[slot_].data = value;
tls_data[slot_].version = version_;

"First ThreadLocalStorage::StaticSlot::Set(void* value) gets called, then when ThreadLocalStorage::StaticSlot::Get() is called later on, tls_data is null, and ConstructTlsVector() is called again." (this is the bug)

@nico
Copy link
Contributor Author

nico commented Aug 3, 2017

assigned to @MatzeB

@nico
Copy link
Contributor Author

nico commented Aug 3, 2017

matzeb, do you happen to see how your change could've triggered this?

@nico
Copy link
Contributor Author

nico commented Aug 3, 2017

(and GetTLSValue() is just pthread_getspecific() on iOS)

@nico
Copy link
Contributor Author

nico commented Aug 3, 2017

And base::subtle::NoBarrier_Load() is:

typedef volatile std::atomic* AtomicLocation32;
inline Atomic32 NoBarrier_Load(volatile const Atomic32* ptr) {
return ((AtomicLocation32)ptr)->load(std::memory_order_relaxed);
}

@nico
Copy link
Contributor Author

nico commented Aug 4, 2017

matzeb, ping?

@rnk
Copy link
Collaborator

rnk commented Aug 4, 2017

I think argument promotion is the source of the under-aligned alloca:

*** IR Dump After Deduce function attributes ***
; Function Attrs: inlinehint nounwind uwtable
define internal fastcc void @​foo(%struct.key* byval align 16 %end) unnamed_addr #​2 {
entry:
%0 = load i8* (%struct.obj*, i64, i32, i64, i64, %struct.key*), i8* (%struct.obj*, i64, i32, i64, i64, %struct.key*)* bitcast (%struct.obj* @​obj to i8* (%struc
t.obj*, i64, i32, i64, i64, %struct.key*)), align 8, !tbaa !​2
%1 = load i8
(%struct.obj
, i64, i32, i64, i64, %struct.key
), i8 (%struct.obj*, i64, i32, i64, i64, %struct.key*)** %0, align 8, !tbaa !​7
%call = call i8* %1(%struct.obj* nonnull @​obj, i64 0, i32 0, i64 0, i64 0, %struct.key* byval nonnull align 16 %end) #​3
ret void
}
*** IR Dump After Promote 'by reference' arguments to scalars ***
; Function Attrs: inlinehint nounwind uwtable
define internal fastcc void @​foo(i128 %end.0) unnamed_addr #​2 {
entry:
%end = alloca %struct.key
%.0 = getelementptr %struct.key, %struct.key* %end, i32 0, i32 0
store i128 %end.0, i128* %.0
%0 = load i8* (%struct.obj*, i64, i32, i64, i64, %struct.key*), i8* (%struct.obj*, i64, i32, i64, i64, %struct.key*)* bitcast (%struct.obj* @​obj to i8* (%struct.obj*, i64, i32, i64, i64, %struct.key*)), align 8, !tbaa !​2
%1 = load i8
(%struct.obj
, i64, i32, i64, i64, %struct.key
), i8 (%struct.obj*, i64, i32, i64, i64, %struct.key*)** %0, align 8, !tbaa !​7
%call = call i8* %1(%struct.obj* nonnull @​obj, i64 0, i32 0, i64 0, i64 0, %struct.key* byval nonnull align 16 %end) #​3
ret void
}

It should use the 'align' attribute to set the alignment of the alloca it creates, rather than relying on DataLayout to figure it out.

@rnk
Copy link
Collaborator

rnk commented Aug 4, 2017

Wrong bug :(

@MatzeB
Copy link
Contributor

MatzeB commented Aug 4, 2017

The relevant commit changes the lowering of cmpxchg instructions, however the mentioned ThreadLocalStorage::StaticSlot::Set function does not seem to contain any cmpxchgs (at least copying and pasting together the relevant snippets doesn't produce one for me and from reading the code I don't see one either).

So I'd suspect the actual miscompilation happens somewhere else in the code. I'll look around a bit more, but ideally if you can compile the relevant test with and without this change and inspect the difference, I'd suspect it to be pretty small and it should hint at what function is miscompiled.

@nico
Copy link
Contributor Author

nico commented Aug 8, 2017

fairly reduced
Down to one file. Compile flag (and a bunch of other goop that won't help you):

[1/9] ../../../../llvm-build/bin/clang++ -MMD -MF obj/base/base_unittests_arch_executable_sources/test_support_ios.o.d -DSYSTEM_NATIVE_UTF8 -DV8_DEPRECATION_WARNINGS -DNO_TCMALLOC -DDISABLE_NACL -DCHROMIUM_BUILD -DFIELDTRIAL_TESTING_ENABLED -DCR_CLANG_REVISION="307486-1" -DCR_XCODE_VERSION=0833 -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D_DEBUG -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DWTF_USE_DYNAMIC_ANNOTATIONS=1 -I../.. -Igen -fno-strict-aliasing --param=ssp-buffer-size=4 -fstack-protector -Wno-builtin-macro-redefined -D__DATE__= -D__TIME__= -D__TIMESTAMP__= -fcolor-diagnostics -arch armv7 -Wall -Werror -Wextra -Wundeclared-selector -Wno-missing-field-initializers -Wno-unused-parameter -Wno-c++11-narrowing -Wno-covered-switch-default -Wno-unneeded-internal-declaration -Wno-inconsistent-missing-override -Wno-undefined-var-template -Wno-nonportable-include-path -Wno-address-of-packed-member -Wno-unused-lambda-capture -Wno-user-defined-warnings -O0 -fno-omit-frame-pointer -gdwarf-2 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS10.3.sdk -stdlib=libc++ -miphoneos-version-min=9.0 -fvisibility=hidden -Wheader-hygiene -Wstring-conversion -Wtautological-overlap-compare -Wno-undefined-bool-conversion -Wno-tautological-undefined-compare -std=c++11 -fobjc-call-cxx-cdtors -Wobjc-missing-property-synthesis -fno-rtti -fno-exceptions -fvisibility-inlines-hidden -c ../../base/test/test_support_ios.mm -o obj/base/base_unittests_arch_executable_sources/test_support_ios.o
[2/9] touch obj/base/base_unittests_arch_executable_sources.stamp
[3/9] TOOL_VERSION=1479159887 ../../build/toolchain/mac/linker_driver.py ../../../../llvm-build/bin/clang++ -Xlinker -rpath -Xlinker @​executable_path/Frameworks -Xlinker -objc_abi_version -Xlinker 2 -arch armv7 -Werror -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS10.3.sdk -stdlib=libc++ -miphoneos-version-min=9.0 -Wl,-ObjC -o "obj/base/arm/base_unittests" -Wl,-filelist,"obj/base/arm/base_unittests.rsp" -framework UIKit -framework CoreFoundation -framework CoreGraphics -framework CoreText -framework Foundation
[4/9] touch obj/base/base_unittests_executable.inputdeps.stamp
[5/9] python ../../build/toolchain/mac/linker_driver.py xcrun lipo -create -output obj/base/base_unittests obj/base/arm/base_unittests
[6/9] touch obj/base/base_unittests_executable.stamp
[7/9] touch obj/base/base_unittests.codesigning.inputdeps.stamp
[8/9] python ../../build/config/ios/codesign.py code-sign-bundle -t=iphoneos -i=E4F29B16FFF03137BD658E870CABC14FD4961B7F -e=../../build/config/ios/entitlements.plist -b=obj/base/base_unittests base_unittests.app
[9/9] touch obj/base/base_unittests.stamp

With r304267 applied, this test succeeds and the breakpoint is hit:

if (this != MyGetTLSMessageLoop()->Get())
  asm("bkpt 0");

With it reverted, it doesn't succeed and the program quits without trouble. I'll also attach the final binary in a second (out/Release-iphoneos/base_unittests.app/base_unittests -- note that this is a debug build, despite the "Release" in the filename).

The disassembly of the two binaries looks identical to me (except for offsets), except that __ZN4base6subtle24NoBarrier_CompareAndSwapEPViii has a bunch of additional

movs	r1, #&#8203;0x0

lines.

@nico
Copy link
Contributor Author

nico commented Aug 8, 2017

@nico
Copy link
Contributor Author

nico commented Aug 8, 2017

@nico
Copy link
Contributor Author

nico commented Aug 8, 2017

smaller repro

@nico
Copy link
Contributor Author

nico commented Aug 8, 2017

slightly smaller repro

@nico
Copy link
Contributor Author

nico commented Aug 8, 2017

slightly smaller repro
(whoops, prior attachment didn't compile)

@nico
Copy link
Contributor Author

nico commented Aug 8, 2017

pretty small repro
Now 74 lines total. The relevant pieces:

typedef int32_t Atomic32;
typedef volatile std::atomic* AtomicLocation32;

inline Atomic32 NoBarrier_CompareAndSwap(volatile Atomic32* ptr,
Atomic32 old_value,
Atomic32 new_value) {
((AtomicLocation32)ptr)
->compare_exchange_strong(old_value,
new_value,
std::memory_order_relaxed,
std::memory_order_relaxed);
return old_value;
}

inline Atomic32 NoBarrier_Load(volatile const Atomic32* ptr) {
return ((AtomicLocattion32)ptr)->load(std::memory_order_relaxed);
}

  • (void)runTests {
    pthread_key_t key = NoBarrier_Load(&g_native_tls_key);
    if (key == TLS_KEY_OUT_OF_INDEXES) {
    pthread_key_create(&key, OnThreadExit);
    NoBarrier_CompareAndSwap(&g_native_tls_key, TLS_KEY_OUT_OF_INDEXES, key);
    pthread_setspecific(key, new int);
    }
    if (!pthread_getspecific(NoBarrier_Load(&g_native_tls_key)))
    asm("bkpt 0"); // <- this gets hit with r304267 applied
    }

@MatzeB
Copy link
Contributor

MatzeB commented Aug 8, 2017

Thanks a lot for providing the reduced program. Indeed zeroing out the status register is the only change I can see by inspecting the assembly and that is exactly one of the things that commit changed, and as far as I can see that zeroed register isn't even read by anyone. Digging further...

@nico
Copy link
Contributor Author

nico commented Aug 9, 2017

Thanks a lot for providing the reduced program. Indeed zeroing out the
status register is the only change I can see by inspecting the assembly and
that is exactly one of the things that commit changed, and as far as I can
see that zeroed register isn't even read by anyone. Digging further...

Early in __Z24NoBarrier_CompareAndSwapPViii we have:

0000ba3 .byte 30 @ KIND_JUMP_TABLE8
0000ba33 .byte 30 @ KIND_JUMP_TABLE8
0000ba34 .byte 35 @ KIND_JUMP_TABLE8
0000ba35 .byte 64 @ KIND_JUMP_TABLE8
0000ba36 .byte 70 @ KIND_JUMP_TABLE8
0000ba38 1c 98 ldr r0, [sp, #​0x70]
0000ba3a 01 68 ldr r1, [r0]
0000ba3c 23 9a ldr r2, [sp, #​0x8c]
0000ba3e dd f8 78 c0 ldr.w r12, [sp, #​0x78]
0000ba42 00 21 movs r1, #​0x0 # XXX
0000ba44 5c e8 00 3f ldrex r3, [r12]
0000ba48 0b 45 cmp r3, r1

The line marked XXX is present in the bad binary but not in the good. Two lines further down r1 is compared against r3. So that zeroed register does look like it's getting read to me (?)

@MatzeB
Copy link
Contributor

MatzeB commented Aug 9, 2017

That part looks like this for me:
ldr r0, [sp, #​112] @ 4-byte Reload
ldr r1, [r0]
ldr r2, [sp, #​140]
ldr.w r12, [sp, #​120] @ 4-byte Reload
LBB4_4: @ %monotonic.i
@ =>This Inner Loop Header: Depth=1

  •   movs    r9, #&#8203;0                 @ This appears/disappears depending on patch
      ldrex   r3, [r12]
      cmp     r3, r1
      bne     LBB4_6
    

@TNorthover
Copy link
Contributor

The problem is that the Thumb "movs" only accepts registers r0-r8. As luck would have it 9 % 8 == 1, so when LLVM is trying to encode a move to r9 it gets interpreted as a clobber of r1.

@TNorthover
Copy link
Contributor

Sorry, that should obviously be r0-r7.

Anyway, until recently we could have used "t2MOVi" instead, but unfortunately v8M-baseline has ldrex/strex but no t2MOVi so we need to make sure that whatever we do works on (e.g.) Cortex-M23 too.

@nico
Copy link
Contributor Author

nico commented Aug 9, 2017

Hah, thanks for that insight, Tim :-)

MatzeB, if this is tricky to fix, can we revert for now? Having known miscompiles in the tree is suboptimal.

@MatzeB
Copy link
Contributor

MatzeB commented Aug 9, 2017

I'll put something in this afternoon, I think we can go back to the way before my patch without the mov (while maintaining the parts of my patch that give correct live-in lists, which was my main motivation anyway).

@MatzeB
Copy link
Contributor

MatzeB commented Aug 9, 2017

And for the record:

Turns out that when you immediately compile to a .o file you do indeed get movs r1, #&#8203;0x0 when you go through an assembly file then you have movs r9, #&#8203;0 in the assembly file and the assembler turns that into movs.w r9, #&#8203;0x0 which works.

@MatzeB
Copy link
Contributor

MatzeB commented Aug 9, 2017

Fixed in r310534

@MatzeB
Copy link
Contributor

MatzeB commented Aug 9, 2017

@​Nico Weber: Can you test and see if this fixes your issues. If it does, I'll merge the fix into the llvm 5.0 branch.

@MatzeB
Copy link
Contributor

MatzeB commented Aug 9, 2017

Let's keep this open until we have the fix in llvm-5.0

@nico
Copy link
Contributor Author

nico commented Aug 9, 2017

Thanks, I can confirm that this fixes my reduced example, and most of our base_unittests now pass for me. The two that don't also fail with the new change and r304267 reverted. So this looks much better now.

@zmodem
Copy link
Collaborator

zmodem commented Aug 10, 2017

Merged to 5.0 in r310628.

@MatzeB
Copy link
Contributor

MatzeB commented Nov 26, 2021

mentioned in issue llvm/llvm-bugzilla-archive#34212

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:ARM bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

5 participants