43880 – -expandmemcmp generates loads with incorrect alignment

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 43880 - -expandmemcmp generates loads with incorrect alignment

Summary: -expandmemcmp generates loads with incorrect alignment

Status:	RESOLVED FIXED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Common Code Generator Code (show other bugs)
Version:	trunk
Hardware:	PC All

Importance:	P normal
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:

Depends on:
Blocks:

Reported:	2019-11-01 22:18 PDT by Juneyoung Lee
Modified:	2020-03-16 06:50 PDT (History)
CC List:	5 users (show)

See Also:
Fixed By Commit(s):

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Juneyoung Lee 2019-11-01 22:18:54 PDT

--
$ cat memcmp.ll
target datalayout = "e-i8:8:8-i16:16:16"
target triple = "x86_64-unknown-unknown"
declare i32 @memcmp(i8* nocapture, i8* nocapture, i64)

define i32 @cmp2(i8* nocapture readonly %x, i8* nocapture readonly %y)  {
  %call = tail call i32 @memcmp(i8* %x, i8* %y, i64 2)
  ret i32 %call
}

$ ./llvm/bin/opt -expandmemcmp -S -o - memcmp.ll
; ModuleID = 'memcmp.ll'
source_filename = "memcmp.ll"
target datalayout = "e-i8:8:8-i16:16:16"
target triple = "x86_64-unknown-unknown"

declare i32 @memcmp(i8* nocapture, i8* nocapture, i64)

define i32 @cmp2(i8* nocapture readonly %x, i8* nocapture readonly %y) {
  %1 = bitcast i8* %x to i16*
  %2 = bitcast i8* %y to i16*
  %3 = load i16, i16* %1
  %4 = load i16, i16* %2
  %5 = call i16 @llvm.bswap.i16(i16 %3)
  %6 = call i16 @llvm.bswap.i16(i16 %4)
  %7 = zext i16 %5 to i32
  %8 = zext i16 %6 to i32
  %9 = sub i32 %7, %8
  ret i32 %9
}

; Function Attrs: nounwind readnone speculatable willreturn
declare i16 @llvm.bswap.i16(i16) #0

attributes #0 = { nounwind readnone speculatable willreturn }
--

This is incorrect because the loads in the output has align 2 (omitted alignment in loads mean they have the ABI alignment). If %x or %y is not 2-byte aligned, the optimized code raises undefined behavior, where as the source wouldn't.

Comment 1 Clement Courbet 2020-03-09 03:35:05 PDT

Unaligned loads are both valid and fast on X86, so we do expand memcmp.
Arm v6. for example, does not allow unaligned loads, so we're not expanding.

Comment 2 Juneyoung Lee 2020-03-09 06:12:28 PDT

(In reply to Clement Courbet from comment #1)
> Unaligned loads are both valid and fast on X86, so we do expand memcmp.
> Arm v6. for example, does not allow unaligned loads, so we're not expanding.

Hello Clement,
Thank you for the info - so unaligned loads are allowed in this case.
Would it make sense if the two loads are explicitly given `align 1`?
```
  %3 = load i16, i16* %1, align 1
  %4 = load i16, i16* %2, align 1
```
If they are not attached, the pointers are assumed to have alignment 2 in this case, which may not be correct.

Comment 3 Juneyoung Lee 2020-03-15 03:52:07 PDT

Made a patch here: https://reviews.llvm.org/D76113

Turns out that expandmemcmp can be activated at target aarch64 and powerpc (test/CodeGen/AArch64/bcmp-inline-small.ll, etc). In case of aarch64, mismatch in alignment may cause trap

Comment 4 Clement Courbet 2020-03-16 00:47:41 PDT

I was wrong actually, aarch64 does indeed expand even when strict loads are required. What it does is check strict loads to allow overlapping loads: the underlying assumption here is that input buffers are always aligned :(

Comment 5 Juneyoung Lee 2020-03-16 06:49:13 PDT

Fixed via https://reviews.llvm.org/rGacdcd23b7b07 , thanks!

Comment 6 Juneyoung Lee 2020-03-16 06:50:51 PDT

Sorry, wrong link. It is https://github.com/llvm/llvm-project/commit/7aecf2323c4ef007ed443d9a58703fe08815b805