--
$ cat memcmp.ll
target datalayout = "e-i8:8:8-i16:16:16"
target triple = "x86_64-unknown-unknown"

declare i32 @memcmp(i8* nocapture, i8* nocapture, i64)

define i32 @cmp2(i8* nocapture readonly %x, i8* nocapture readonly %y) {
  %call = tail call i32 @memcmp(i8* %x, i8* %y, i64 2)
  ret i32 %call
}

$ ./llvm/bin/opt -expandmemcmp -S -o - memcmp.ll
; ModuleID = 'memcmp.ll'
source_filename = "memcmp.ll"
target datalayout = "e-i8:8:8-i16:16:16"
target triple = "x86_64-unknown-unknown"

declare i32 @memcmp(i8* nocapture, i8* nocapture, i64)

define i32 @cmp2(i8* nocapture readonly %x, i8* nocapture readonly %y) {
  %1 = bitcast i8* %x to i16*
  %2 = bitcast i8* %y to i16*
  %3 = load i16, i16* %1
  %4 = load i16, i16* %2
  %5 = call i16 @llvm.bswap.i16(i16 %3)
  %6 = call i16 @llvm.bswap.i16(i16 %4)
  %7 = zext i16 %5 to i32
  %8 = zext i16 %6 to i32
  %9 = sub i32 %7, %8
  ret i32 %9
}

; Function Attrs: nounwind readnone speculatable willreturn
declare i16 @llvm.bswap.i16(i16) #0

attributes #0 = { nounwind readnone speculatable willreturn }
--

This is incorrect because the loads in the output have align 2 (a load with no explicit alignment is assumed to have the ABI alignment). If %x or %y is not 2-byte aligned, the optimized code has undefined behavior, whereas the source does not.
Unaligned loads are both valid and fast on X86, so we do expand memcmp. ARMv6, for example, does not allow unaligned loads, so we do not expand there.
(In reply to Clement Courbet from comment #1)
> Unaligned loads are both valid and fast on X86, so we do expand memcmp.
> ARMv6, for example, does not allow unaligned loads, so we do not expand there.

Hello Clement,

Thank you for the info - so unaligned loads are allowed in this case.
Would it make sense to give the two loads an explicit `align 1`?

```
%3 = load i16, i16* %1, align 1
%4 = load i16, i16* %2, align 1
```

If the alignment is not specified, the pointers are assumed to have alignment 2 in this case, which may not be correct.
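For readers more familiar with C than with LLVM IR, the intended semantics of the `align 1` version can be sketched roughly as follows. This is only an illustration, not the code the pass generates: the helper names `load16_unaligned`, `to_big_endian16`, and `cmp2_sketch` are hypothetical, and the `memcpy`-based load stands in for a load that makes no alignment assumption.

```c
#include <stdint.h>
#include <string.h>

/* Load 2 bytes without assuming any alignment of p.
   This plays the role of `load i16, ..., align 1` (hypothetical helper). */
static uint16_t load16_unaligned(const void *p) {
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Reinterpret the loaded value so that the first byte in memory is the most
   significant one. On a little-endian host this is a byte swap (the role of
   llvm.bswap.i16); on a big-endian host it is a no-op. */
static uint16_t to_big_endian16(uint16_t v) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    if (first == 1) /* little-endian host */
        return (uint16_t)((v >> 8) | (v << 8));
    return v;
}

/* Sketch of a 2-byte memcmp: zero-extend both values and subtract.
   The sign of the result matches memcmp(x, y, 2). */
static int32_t cmp2_sketch(const void *x, const void *y) {
    int32_t a = (int32_t)to_big_endian16(load16_unaligned(x));
    int32_t b = (int32_t)to_big_endian16(load16_unaligned(y));
    return a - b;
}
```

Because the load goes through `memcpy`, calling `cmp2_sketch` on an odd address is well defined, which is the property the explicit `align 1` is meant to preserve in the IR.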
I made a patch here: https://reviews.llvm.org/D76113

It turns out that expandmemcmp can also be activated for the aarch64 and powerpc targets (test/CodeGen/AArch64/bcmp-inline-small.ll, etc.). On aarch64, an alignment mismatch may cause a trap.
I was wrong actually: aarch64 does indeed expand even when strict alignment is required for loads. It only checks the strict-alignment requirement to decide whether overlapping loads are allowed; the underlying assumption here is that input buffers are always aligned :(
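To make "overlapping loads" concrete, here is a minimal C sketch of an equality-only comparison of 7 bytes using two 4-byte loads that overlap in the middle byte. It is an illustration under assumptions, not the pass's actual output: `load32_unaligned` and `bcmp7_overlapping` are hypothetical names, and the second load starts at offset 3, so it cannot be 4-byte aligned if the first one is.

```c
#include <stdint.h>
#include <string.h>

/* Load 4 bytes without assuming any alignment of p (hypothetical helper). */
static uint32_t load32_unaligned(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Returns nonzero if the first 7 bytes of x and y differ.
   Bytes 0..3 and bytes 3..6 are compared with two 4-byte loads each;
   byte 3 is covered twice, which is harmless for an equality test. */
static int bcmp7_overlapping(const void *x, const void *y) {
    const unsigned char *a = (const unsigned char *)x;
    const unsigned char *b = (const unsigned char *)y;
    uint32_t lo = load32_unaligned(a) ^ load32_unaligned(b);         /* bytes 0..3 */
    uint32_t hi = load32_unaligned(a + 3) ^ load32_unaligned(b + 3); /* bytes 3..6 */
    return (lo | hi) != 0;
}
```

The load at offset 3 is why this trick is only reasonable where unaligned access is permitted, and why the loads need to be treated as `align 1` rather than given the ABI alignment.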
Fixed via https://reviews.llvm.org/rGacdcd23b7b07 , thanks!
Sorry, wrong link. It is https://github.com/llvm/llvm-project/commit/7aecf2323c4ef007ed443d9a58703fe08815b805