
Missing fold &a[i] - &a[j] => i - j #46774

Closed
davidbolvansky opened this issue Sep 5, 2020 · 9 comments
Labels
bugzilla Issues migrated from bugzilla

Comments

@davidbolvansky (Collaborator)

Bugzilla Link 47430
Resolution FIXED
Resolved on Nov 24, 2020 12:25
Version trunk
OS Linux
CC @fhahn,@jdoerfert,@LebedevRI,@RKSimon,@nikic,@rotateright
Fixed by commit(s) ab29f09, 678b9c5

Extended Description

```c
typedef SIZE_TYPE size_t;
size_t a[64];
size_t f(size_t b, size_t c)
{
    return &a[b] - &a[c];
}
```

Clang:

```asm
f(unsigned long, unsigned long):        # @f(unsigned long, unsigned long)
        sub     rdi, rsi
        lea     rax, [8*rdi]
        sar     rax, 3
        ret
a:
        .zero   512
```

GCC:

```asm
f(unsigned long, unsigned long):
        mov     rax, rdi
        sub     rax, rsi
        ret
a:
        .zero   512
```

https://godbolt.org/z/7M8o5a

@rotateright (Contributor)

In this case, instcombine is converting pointer math to straight math as:

```llvm
define i64 @src(i64 %x, i64 %y) {
  %gep1 = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %x
  %gep2 = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %y
  %p1 = ptrtoint i64* %gep1 to i64
  %p2 = ptrtoint i64* %gep2 to i64
  %a = sub i64 %p1, %p2
  %r = sdiv exact i64 %a, 8
  ret i64 %r
}

define i64 @tgt(i64 %x, i64 %y) {
  %s = sub i64 %x, %y
  %sh = shl i64 %s, 3
  %r = ashr exact i64 %sh, 3
  ret i64 %r
}
```

https://alive2.llvm.org/ce/z/VFCZgB

And that's correct... but once we do that, it seems like there's no way to recover the lost information that we don't need the shifts. We need to get to the 'sub'-only form directly.

@davidbolvansky (Collaborator, Author)

Yeah, I was playing with that, and it seems like we need to do it at the GEP level.

@rotateright (Contributor)

I would need to re-read the rules for 'inbounds', but if that translates to 'nuw nsw' on the mul/shift, then we should be able to get this in two steps.

We seem to also miss this:
https://alive2.llvm.org/ce/z/YQXKk5

@davidbolvansky (Collaborator, Author)

```llvm
@a = dso_local global [64 x i64] zeroinitializer

define i64 @src(i64 %x, i64 %y) {
  %gep1 = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %x
  %gep2 = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %y
  %p1 = ptrtoint i64* %gep1 to i64
  %p2 = ptrtoint i64* %gep2 to i64
  %a = sub i64 %p1, %p2
  %r = sdiv exact i64 %a, 8
  ret i64 %r
}

define i64 @tgt(i64 %x, i64 %y) {
  %s = sub nsw i64 %x, %y
  %sh = shl nsw i64 %s, 3
  %r = ashr exact i64 %sh, 3
  ret i64 %r
}
```

So yeah, NSW is correct here. And then... this fold kicks in.

```llvm
define i64 @src(i64 %x, i64 %y) {
  %s = sub nsw i64 %x, %y
  %sh = shl nsw i64 %s, 3
  %r = ashr exact i64 %sh, 3
  ret i64 %r
}

define i64 @tgt(i64 %x, i64 %y) {
  %r = sub nsw i64 %x, %y
  ret i64 %r
}
```

So to sum up, we just need to put nsw flags on those ops?

@rotateright (Contributor)

> So to sum up, we just need to put nsw flags on those ops?

That would be good, but I suspect that we still have multiple points of failure on this example - a first hack in "OptimizePointerDifference" didn't do the job for me.

@davidbolvansky (Collaborator, Author)

Hm, that is strange. Current codegen + nsw on shl:

```llvm
define i64 @src(i64 %x, i64 %y) {
  %s = sub i64 %x, %y
  %sh = shl nsw i64 %s, 3
  %r = ashr exact i64 %sh, 3
  ret i64 %r
}
```

opt -instsimplify produces:

```llvm
define i64 @src(i64 %x, i64 %y) local_unnamed_addr #0 {
  %s = sub i64 %x, %y
  ret i64 %s
}
```

https://godbolt.org/z/xdqb5b

@rotateright (Contributor)

Probably won't do anything by itself, but here's one step:
https://reviews.llvm.org/rG8b300679192b

@rotateright (Contributor)

And seemingly one step back:
https://reviews.llvm.org/rG6aa3fc4a5b88

@rotateright (Contributor)

Re-did the wrapping flags patch:
https://reviews.llvm.org/rGab29f091eb64

And re-arranged the folds (we still miss the example in comment 3):
https://reviews.llvm.org/rG678b9c5dde0d

So that should take care of the problem in the description.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.