LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 43880 - -expandmemcmp generates loads with incorrect alignment
Summary: -expandmemcmp generates loads with incorrect alignment
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Common Code Generator Code (show other bugs)
Version: trunk
Hardware: PC All
: P normal
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-11-01 22:18 PDT by Juneyoung Lee
Modified: 2020-03-16 06:50 PDT (History)
5 users (show)

See Also:
Fixed By Commit(s):


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Juneyoung Lee 2019-11-01 22:18:54 PDT
--
$ cat memcmp.ll
target datalayout = "e-i8:8:8-i16:16:16"
target triple = "x86_64-unknown-unknown"
declare i32 @memcmp(i8* nocapture, i8* nocapture, i64)

define i32 @cmp2(i8* nocapture readonly %x, i8* nocapture readonly %y)  {
  %call = tail call i32 @memcmp(i8* %x, i8* %y, i64 2)
  ret i32 %call
}

$ ./llvm/bin/opt -expandmemcmp -S -o - memcmp.ll
; ModuleID = 'memcmp.ll'
source_filename = "memcmp.ll"
target datalayout = "e-i8:8:8-i16:16:16"
target triple = "x86_64-unknown-unknown"

declare i32 @memcmp(i8* nocapture, i8* nocapture, i64)

define i32 @cmp2(i8* nocapture readonly %x, i8* nocapture readonly %y) {
  %1 = bitcast i8* %x to i16*
  %2 = bitcast i8* %y to i16*
  %3 = load i16, i16* %1
  %4 = load i16, i16* %2
  %5 = call i16 @llvm.bswap.i16(i16 %3)
  %6 = call i16 @llvm.bswap.i16(i16 %4)
  %7 = zext i16 %5 to i32
  %8 = zext i16 %6 to i32
  %9 = sub i32 %7, %8
  ret i32 %9
}

; Function Attrs: nounwind readnone speculatable willreturn
declare i16 @llvm.bswap.i16(i16) #0

attributes #0 = { nounwind readnone speculatable willreturn }
--

This is incorrect because the loads in the output has align 2 (omitted alignment in loads mean they have the ABI alignment). If %x or %y is not 2-byte aligned, the optimized code raises undefined behavior, where as the source wouldn't.
Comment 1 Clement Courbet 2020-03-09 03:35:05 PDT
Unaligned loads are both valid and fast on X86, so we do expand memcmp.
Arm v6. for example, does not allow unaligned loads, so we're not expanding.
Comment 2 Juneyoung Lee 2020-03-09 06:12:28 PDT
(In reply to Clement Courbet from comment #1)
> Unaligned loads are both valid and fast on X86, so we do expand memcmp.
> Arm v6. for example, does not allow unaligned loads, so we're not expanding.

Hello Clement,
Thank you for the info - so unaligned loads are allowed in this case.
Would it make sense if the two loads are explicitly given `align 1`?
```
  %3 = load i16, i16* %1, align 1
  %4 = load i16, i16* %2, align 1
```
If they are not attached, the pointers are assumed to have alignment 2 in this case, which may not be correct.
Comment 3 Juneyoung Lee 2020-03-15 03:52:07 PDT
Made a patch here: https://reviews.llvm.org/D76113

Turns out that expandmemcmp can be activated at target aarch64 and powerpc (test/CodeGen/AArch64/bcmp-inline-small.ll, etc). In case of aarch64, mismatch in alignment may cause trap
Comment 4 Clement Courbet 2020-03-16 00:47:41 PDT
I was wrong actually, aarch64 does indeed expand even when strict loads are required. What it does is check strict loads to allow overlapping loads: the underlying assumption here is that input buffers are always aligned :(
Comment 5 Juneyoung Lee 2020-03-16 06:49:13 PDT
Fixed via https://reviews.llvm.org/rGacdcd23b7b07 , thanks!
Comment 6 Juneyoung Lee 2020-03-16 06:50:51 PDT
Sorry, wrong link. It is https://github.com/llvm/llvm-project/commit/7aecf2323c4ef007ed443d9a58703fe08815b805