
-expandmemcmp generates loads with incorrect alignment #43225

Closed
aqjune opened this issue Nov 2, 2019 · 6 comments
Labels
bugzilla Issues migrated from bugzilla llvm:codegen

Comments

@aqjune
Contributor

aqjune commented Nov 2, 2019

Bugzilla Link 43880
Resolution FIXED
Resolved on Mar 16, 2020 06:50
Version trunk
OS All
CC @legrosbuffle,@RKSimon,@nunoplopes,@rotateright

Extended Description

--
$ cat memcmp.ll
target datalayout = "e-i8:8:8-i16:16:16"
target triple = "x86_64-unknown-unknown"
declare i32 @​memcmp(i8* nocapture, i8* nocapture, i64)

define i32 @​cmp2(i8* nocapture readonly %x, i8* nocapture readonly %y) {
%call = tail call i32 @​memcmp(i8* %x, i8* %y, i64 2)
ret i32 %call
}

$ ./llvm/bin/opt -expandmemcmp -S -o - memcmp.ll
; ModuleID = 'memcmp.ll'
source_filename = "memcmp.ll"
target datalayout = "e-i8:8:8-i16:16:16"
target triple = "x86_64-unknown-unknown"

declare i32 @​memcmp(i8* nocapture, i8* nocapture, i64)

define i32 @​cmp2(i8* nocapture readonly %x, i8* nocapture readonly %y) {
%1 = bitcast i8* %x to i16*
%2 = bitcast i8* %y to i16*
%3 = load i16, i16* %1
%4 = load i16, i16* %2
%5 = call i16 @​llvm.bswap.i16(i16 %3)
%6 = call i16 @​llvm.bswap.i16(i16 %4)
%7 = zext i16 %5 to i32
%8 = zext i16 %6 to i32
%9 = sub i32 %7, %8
ret i32 %9
}

; Function Attrs: nounwind readnone speculatable willreturn
declare i16 @​llvm.bswap.i16(i16) #​0

attributes #​0 = { nounwind readnone speculatable willreturn }

This is incorrect because the loads in the output have align 2 (an omitted alignment on a load means it has the ABI alignment of the loaded type, here 2 for i16). If %x or %y is not 2-byte aligned, the optimized code has undefined behavior, whereas the source wouldn't.

@legrosbuffle
Contributor

Unaligned loads are both valid and fast on X86, so we do expand memcmp.
ARMv6, for example, does not allow unaligned loads, so we're not expanding there.

@aqjune
Contributor Author

aqjune commented Mar 9, 2020

> Unaligned loads are both valid and fast on X86, so we do expand memcmp.
> ARMv6, for example, does not allow unaligned loads, so we're not expanding there.

Hello Clement,
Thank you for the info - so unaligned loads are allowed in this case.
Would it make sense if the two loads are explicitly given align 1?

  %3 = load i16, i16* %1, align 1
  %4 = load i16, i16* %2, align 1

If the alignments are not attached, the pointers are assumed to have alignment 2 in this case, which may not be correct.

@aqjune
Contributor Author

aqjune commented Mar 15, 2020

Made a patch here: https://reviews.llvm.org/D76113

It turns out that expandmemcmp can also be activated for the aarch64 and powerpc targets (test/CodeGen/AArch64/bcmp-inline-small.ll, etc.). On aarch64, a mismatch in alignment may cause a trap.

@legrosbuffle
Contributor

I was actually wrong: aarch64 does indeed expand even when strictly aligned loads are required. What it does is check for strict alignment to allow overlapping loads; the underlying assumption here is that input buffers are always aligned :(

@aqjune
Contributor Author

aqjune commented Mar 16, 2020

Fixed via https://reviews.llvm.org/rGacdcd23b7b07 , thanks!

@aqjune
Contributor Author

aqjune commented Mar 16, 2020

Sorry, wrong link. It is 7aecf23

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.