
Missing fold &a[i] - &a[j] => i - j #46774

Closed
davidbolvansky opened this issue Sep 5, 2020 · 9 comments
Labels
bugzilla Issues migrated from bugzilla

Comments

@davidbolvansky (Collaborator)

Bugzilla Link 47430
Resolution FIXED
Resolved on Nov 24, 2020 12:25
Version trunk
OS Linux
CC @fhahn,@jdoerfert,@LebedevRI,@RKSimon,@nikic,@rotateright
Fixed by commit(s) ab29f09, 678b9c5

Extended Description

```c
typedef SIZE_TYPE size_t;
size_t a[64];
size_t f(size_t b, size_t c)
{
    return &a[b] - &a[c];
}
```

Clang:

```asm
f(unsigned long, unsigned long):        # @f(unsigned long, unsigned long)
        sub     rdi, rsi
        lea     rax, [8*rdi]
        sar     rax, 3
        ret
a:
        .zero   512
```

GCC:

```asm
f(unsigned long, unsigned long):
        mov     rax, rdi
        sub     rax, rsi
        ret
a:
        .zero   512
```

https://godbolt.org/z/7M8o5a

@rotateright (Contributor)

In this case, instcombine is converting pointer math to straight math as:

```llvm
define i64 @src(i64 %x, i64 %y) {
  %gep1 = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %x
  %gep2 = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %y
  %p1 = ptrtoint i64* %gep1 to i64
  %p2 = ptrtoint i64* %gep2 to i64
  %a = sub i64 %p1, %p2
  %r = sdiv exact i64 %a, 8
  ret i64 %r
}

define i64 @tgt(i64 %x, i64 %y) {
  %s = sub i64 %x, %y
  %sh = shl i64 %s, 3
  %r = ashr exact i64 %sh, 3
  ret i64 %r
}
```

https://alive2.llvm.org/ce/z/VFCZgB

And that's correct... but once we do that, it seems like there's no way to recover the lost information that we don't need the shifts. We need to get to the 'sub'-only form directly.

@davidbolvansky (Collaborator, Author)

Yeah, I was playing with that, and it seems like we need to do it at the GEP level.

@rotateright (Contributor)

I would need to re-read the rules for 'inbounds', but if that translates to 'nuw nsw' on the mul/shift, then we should be able to get this in two steps.

We seem to also miss this:
https://alive2.llvm.org/ce/z/YQXKk5

@davidbolvansky (Collaborator, Author)

```llvm
@a = dso_local global [64 x i64] zeroinitializer

define i64 @src(i64 %x, i64 %y) {
  %gep1 = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %x
  %gep2 = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %y
  %p1 = ptrtoint i64* %gep1 to i64
  %p2 = ptrtoint i64* %gep2 to i64
  %a = sub i64 %p1, %p2
  %r = sdiv exact i64 %a, 8
  ret i64 %r
}

define i64 @tgt(i64 %x, i64 %y) {
  %s = sub nsw i64 %x, %y
  %sh = shl nsw i64 %s, 3
  %r = ashr exact i64 %sh, 3
  ret i64 %r
}
```

So yeah, NSW is correct here. And then... this fold kicks in.

```llvm
define i64 @src(i64 %x, i64 %y) {
  %s = sub nsw i64 %x, %y
  %sh = shl nsw i64 %s, 3
  %r = ashr exact i64 %sh, 3
  ret i64 %r
}

define i64 @tgt(i64 %x, i64 %y) {
  %r = sub nsw i64 %x, %y
  ret i64 %r
}
```

So to sum up, we just need to put nsw flags on those ops?

@rotateright (Contributor)

> So to sum up, we just need to put nsw flags on those ops?

That would be good, but I suspect that we still have multiple points of failure on this example - a first hack in "OptimizePointerDifference" didn't do the job for me.

@davidbolvansky (Collaborator, Author)

Hm, that is strange. Current codegen + nsw on shl:

```llvm
define i64 @src(i64 %x, i64 %y) {
  %s = sub i64 %x, %y
  %sh = shl nsw i64 %s, 3
  %r = ashr exact i64 %sh, 3
  ret i64 %r
}
```

opt -instsimplify produces:

```llvm
define i64 @src(i64 %x, i64 %y) local_unnamed_addr #0 {
  %s = sub i64 %x, %y
  ret i64 %s
}
```

https://godbolt.org/z/xdqb5b

@rotateright (Contributor)

Probably won't do anything by itself, but here's one step:
https://reviews.llvm.org/rG8b300679192b

@rotateright (Contributor)

And seemingly one step back:
https://reviews.llvm.org/rG6aa3fc4a5b88

@rotateright (Contributor)

Re-did the wrapping flags patch:
https://reviews.llvm.org/rGab29f091eb64

And re-arranged the folds (we still miss the example in comment 3):
https://reviews.llvm.org/rG678b9c5dde0d

So that should take care of the problem in the description.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.