typedef __SIZE_TYPE__ size_t;

size_t a[64];

size_t f(size_t b, size_t c) {
    return &a[b] - &a[c];
}

Clang:

f(unsigned long, unsigned long):                # @f(unsigned long, unsigned long)
        sub     rdi, rsi
        lea     rax, [8*rdi]
        sar     rax, 3
        ret
a:
        .zero   512

GCC:

f(unsigned long, unsigned long):
        mov     rax, rdi
        sub     rax, rsi
        ret
a:
        .zero   512

https://godbolt.org/z/7M8o5a
In this case, instcombine is converting the pointer math to straight integer math as:

define i64 @src(i64 %x, i64 %y) {
  %gep1 = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %x
  %gep2 = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %y
  %p1 = ptrtoint i64* %gep1 to i64
  %p2 = ptrtoint i64* %gep2 to i64
  %a = sub i64 %p1, %p2
  %r = sdiv exact i64 %a, 8
  ret i64 %r
}

define i64 @tgt(i64 %x, i64 %y) {
  %s = sub i64 %x, %y
  %sh = shl i64 %s, 3
  %r = ashr exact i64 %sh, 3
  ret i64 %r
}

https://alive2.llvm.org/ce/z/VFCZgB

And that's correct... but once we do that, it seems like there's no way to recover the lost information that we don't need the shifts. We need to get to the 'sub'-only form directly.
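To see why the shifts in @tgt can't simply be dropped once we're here: without nsw, the shl is allowed to wrap, so the shifted and unshifted forms disagree on large inputs. A worked example (values chosen by me for illustration):

; With %x = 2^61 (2305843009213693952) and %y = 0, and no nsw anywhere:
;   %s  = sub i64 2^61, 0      ; = 2^61
;   %sh = shl i64 2^61, 3      ; wraps to 0
;   %r  = ashr exact i64 0, 3  ; = 0
; A plain 'sub'-only version would return 2^61 instead, so removing the
; shl/ashr pair is only sound when the shl is known not to overflow (nsw).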
Yeah, I was playing with that, and it seems like we need to do it at the GEP level.
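For concreteness, an Alive2-style sketch of what a direct GEP-level fold would produce, going straight from the pointer difference to the 'sub'-only form (illustrative; this is not an existing instcombine transform):

@a = dso_local global [64 x i64] zeroinitializer

define i64 @src(i64 %x, i64 %y) {
  %gep1 = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %x
  %gep2 = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %y
  %p1 = ptrtoint i64* %gep1 to i64
  %p2 = ptrtoint i64* %gep2 to i64
  %a = sub i64 %p1, %p2
  %r = sdiv exact i64 %a, 8
  ret i64 %r
}

; desired single-step result:
define i64 @tgt(i64 %x, i64 %y) {
  %r = sub nsw i64 %x, %y
  ret i64 %r
}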
I would need to re-read the rules for 'inbounds', but if that translates to 'nuw nsw' on the mul/shift, then we should be able to get this in 2 steps. We seem to also miss this: https://alive2.llvm.org/ce/z/YQXKk5
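If I'm reading the LangRef right, 'inbounds' does guarantee at least nsw on the index scaling, so the offset computation implied by the GEP hand-expands to something like this (the decomposition below is my sketch, not compiler output):

; %gep = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %x
; is, offset-wise, roughly:
;   %off = shl nsw i64 %x, 3          ; index * sizeof(i64); inbounds says the
;                                     ; multiplication doesn't wrap in a signed sense
;   %p   = add <ptrtoint @a>, %off    ; result stays within the allocated object
; which is where the nsw on the shl/mul would come from.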
@a = dso_local global [64 x i64] zeroinitializer

define i64 @src(i64 %x, i64 %y) {
  %gep1 = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %x
  %gep2 = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %y
  %p1 = ptrtoint i64* %gep1 to i64
  %p2 = ptrtoint i64* %gep2 to i64
  %a = sub i64 %p1, %p2
  %r = sdiv exact i64 %a, 8
  ret i64 %r
}

define i64 @tgt(i64 %x, i64 %y) {
  %s = sub nsw i64 %x, %y
  %sh = shl nsw i64 %s, 3
  %r = ashr exact i64 %sh, 3
  ret i64 %r
}

So yeah, NSW is correct here. And then... this fold kicks in:

define i64 @src(i64 %x, i64 %y) {
  %s = sub nsw i64 %x, %y
  %sh = shl nsw i64 %s, 3
  %r = ashr exact i64 %sh, 3
  ret i64 %r
}

define i64 @tgt(i64 %x, i64 %y) {
  %r = sub nsw i64 %x, %y
  ret i64 %r
}

So to sum up, we just need to put nsw flags on those ops?
(In reply to David Bolvansky from comment #4)
> So yeah, NSW is correct here. And then... this fold kicks in.
> [...]
> So to sum up, we just need to put nsw flags on those ops?

That would be good, but I suspect that we still have multiple points of failure on this example - a first hack in "OptimizePointerDifference" didn't do the job for me.
Hm, that is strange. Current codegen + nsw on the shl:

define i64 @src(i64 %x, i64 %y) {
  %s = sub i64 %x, %y
  %sh = shl nsw i64 %s, 3
  %r = ashr exact i64 %sh, 3
  ret i64 %r
}

opt -instsimplify produces:

define i64 @src(i64 %x, i64 %y) local_unnamed_addr #0 {
  %s = sub i64 %x, %y
  ret i64 %s
}

https://godbolt.org/z/xdqb5b
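So the instsimplify side already handles it once the flag is there. If we go that route, it could be pinned down with a regression test along these lines (a sketch; the CHECK lines are what I'd expect based on the output above):

; RUN: opt -S -instsimplify < %s | FileCheck %s

define i64 @src(i64 %x, i64 %y) {
; CHECK-LABEL: @src(
; CHECK-NEXT:    %s = sub i64 %x, %y
; CHECK-NEXT:    ret i64 %s
  %s = sub i64 %x, %y
  %sh = shl nsw i64 %s, 3
  %r = ashr exact i64 %sh, 3
  ret i64 %r
}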
Probably won't do anything by itself, but here's one step: https://reviews.llvm.org/rG8b300679192b
And seemingly one step back: https://reviews.llvm.org/rG6aa3fc4a5b88
Re-did the wrapping flags patch:
https://reviews.llvm.org/rGab29f091eb64

And re-arranged the folds (we still miss the example in comment 3):
https://reviews.llvm.org/rG678b9c5dde0d

So that should take care of the problem in the description.