New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
-expandmemcmp generates loads with incorrect alignment #43225
Comments
Unaligned loads are both valid and fast on X86, so we do expand memcmp. |
Hello Clement,
If they are not attached, the pointers are assumed to have alignment 2 in this case, which may not be correct. |
Made a patch here: https://reviews.llvm.org/D76113 Turns out that expandmemcmp can be activated at target aarch64 and powerpc (test/CodeGen/AArch64/bcmp-inline-small.ll, etc). In case of aarch64, mismatch in alignment may cause trap |
I was wrong actually, aarch64 does indeed expand even when strict loads are required. What it does is check strict loads to allow overlapping loads: the underlying assumption here is that input buffers are always aligned :( |
Fixed via https://reviews.llvm.org/rGacdcd23b7b07 , thanks! |
Sorry, wrong link. It is 7aecf23 |
Extended Description
--
$ cat memcmp.ll
target datalayout = "e-i8:8:8-i16:16:16"
target triple = "x86_64-unknown-unknown"
declare i32 @memcmp(i8* nocapture, i8* nocapture, i64)
define i32 @cmp2(i8* nocapture readonly %x, i8* nocapture readonly %y) {
%call = tail call i32 @memcmp(i8* %x, i8* %y, i64 2)
ret i32 %call
}
$ ./llvm/bin/opt -expandmemcmp -S -o - memcmp.ll
; ModuleID = 'memcmp.ll'
source_filename = "memcmp.ll"
target datalayout = "e-i8:8:8-i16:16:16"
target triple = "x86_64-unknown-unknown"
declare i32 @memcmp(i8* nocapture, i8* nocapture, i64)
define i32 @cmp2(i8* nocapture readonly %x, i8* nocapture readonly %y) {
%1 = bitcast i8* %x to i16*
%2 = bitcast i8* %y to i16*
%3 = load i16, i16* %1
%4 = load i16, i16* %2
%5 = call i16 @llvm.bswap.i16(i16 %3)
%6 = call i16 @llvm.bswap.i16(i16 %4)
%7 = zext i16 %5 to i32
%8 = zext i16 %6 to i32
%9 = sub i32 %7, %8
ret i32 %9
}
; Function Attrs: nounwind readnone speculatable willreturn
declare i16 @llvm.bswap.i16(i16) #0
attributes #0 = { nounwind readnone speculatable willreturn }
This is incorrect because the loads in the output has align 2 (omitted alignment in loads mean they have the ABI alignment). If %x or %y is not 2-byte aligned, the optimized code raises undefined behavior, where as the source wouldn't.
The text was updated successfully, but these errors were encountered: