typedef __SIZE_TYPE__ size_t;

size_t a[64];

size_t f(size_t b, size_t c) {
    return &a[b] - &a[c];
}

Clang:

f(unsigned long, unsigned long):                # @f(unsigned long, unsigned long)
        sub     rdi, rsi
        lea     rax, [8*rdi]
        sar     rax, 3
        ret
a:
        .zero   512

GCC:

f(unsigned long, unsigned long):
        mov     rax, rdi
        sub     rax, rsi
        ret
a:
        .zero   512

https://godbolt.org/z/7M8o5a
In this case, instcombine is converting the pointer math to straight integer math as:

define i64 @src(i64 %x, i64 %y) {
  %gep1 = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %x
  %gep2 = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %y
  %p1 = ptrtoint i64* %gep1 to i64
  %p2 = ptrtoint i64* %gep2 to i64
  %a = sub i64 %p1, %p2
  %r = sdiv exact i64 %a, 8
  ret i64 %r
}

define i64 @tgt(i64 %x, i64 %y) {
  %s = sub i64 %x, %y
  %sh = shl i64 %s, 3
  %r = ashr exact i64 %sh, 3
  ret i64 %r
}

https://alive2.llvm.org/ce/z/VFCZgB

And that's correct... but once we do that, it seems like there's no way to recover the lost information that we don't need the shifts. We need to get to the 'sub'-only form directly.
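To see why the shifts in @tgt can't simply be dropped once we're here: without nsw, the shl is allowed to wrap, so the shifted and unshifted forms disagree on large inputs. A worked example (values chosen by me for illustration):

; With %x = 2^61 (2305843009213693952) and %y = 0, and no nsw anywhere:
;   %s  = sub i64 2^61, 0      ; = 2^61
;   %sh = shl i64 2^61, 3      ; wraps to 0
;   %r  = ashr exact i64 0, 3  ; = 0
; A plain 'sub'-only version would return 2^61 instead, so removing the
; shl/ashr pair is only sound when the shl is known not to overflow (nsw).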
Yeah, I was playing with that, and it seems like we need to do it at the GEP level.
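For concreteness, an Alive2-style sketch of what a direct GEP-level fold would produce, going straight from the pointer difference to the 'sub'-only form (illustrative; this is not an existing instcombine transform):

@a = dso_local global [64 x i64] zeroinitializer

define i64 @src(i64 %x, i64 %y) {
  %gep1 = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %x
  %gep2 = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %y
  %p1 = ptrtoint i64* %gep1 to i64
  %p2 = ptrtoint i64* %gep2 to i64
  %a = sub i64 %p1, %p2
  %r = sdiv exact i64 %a, 8
  ret i64 %r
}

; desired single-step result:
define i64 @tgt(i64 %x, i64 %y) {
  %r = sub nsw i64 %x, %y
  ret i64 %r
}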
I would need to re-read the rules for 'inbounds', but if that translates to 'nuw nsw' on the mul/shift, then we should be able to get this in 2 steps. We seem to also miss this: https://alive2.llvm.org/ce/z/YQXKk5
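If I'm reading the LangRef right, 'inbounds' does guarantee at least nsw on the index scaling, so the offset computation implied by the GEP hand-expands to something like this (the decomposition below is my sketch, not compiler output):

; %gep = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %x
; is, offset-wise, roughly:
;   %off = shl nsw i64 %x, 3          ; index * sizeof(i64); inbounds says the
;                                     ; multiplication doesn't wrap in a signed sense
;   %p   = add <ptrtoint @a>, %off    ; result stays within the allocated object
; which is where the nsw on the shl/mul would come from.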
@a = dso_local global [64 x i64] zeroinitializer

define i64 @src(i64 %x, i64 %y) {
  %gep1 = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %x
  %gep2 = getelementptr inbounds [64 x i64], [64 x i64]* @a, i64 0, i64 %y
  %p1 = ptrtoint i64* %gep1 to i64
  %p2 = ptrtoint i64* %gep2 to i64
  %a = sub i64 %p1, %p2
  %r = sdiv exact i64 %a, 8
  ret i64 %r
}

define i64 @tgt(i64 %x, i64 %y) {
  %s = sub nsw i64 %x, %y
  %sh = shl nsw i64 %s, 3
  %r = ashr exact i64 %sh, 3
  ret i64 %r
}

So yeah, NSW is correct here. And then... this fold kicks in:

define i64 @src(i64 %x, i64 %y) {
  %s = sub nsw i64 %x, %y
  %sh = shl nsw i64 %s, 3
  %r = ashr exact i64 %sh, 3
  ret i64 %r
}

define i64 @tgt(i64 %x, i64 %y) {
  %r = sub nsw i64 %x, %y
  ret i64 %r
}

So to sum up, we just need to put nsw flags on those ops?
(In reply to David Bolvansky from comment #4)
> So yeah, NSW is correct here. And then... this fold kicks in.
> [...]
> So to sum up, we just need to put nsw flags on those ops?

That would be good, but I suspect that we still have multiple points of failure on this example - a first hack in "OptimizePointerDifference" didn't do the job for me.
Hm, that is strange. Current codegen + nsw on the shl:

define i64 @src(i64 %x, i64 %y) {
  %s = sub i64 %x, %y
  %sh = shl nsw i64 %s, 3
  %r = ashr exact i64 %sh, 3
  ret i64 %r
}

opt -instsimplify produces:

define i64 @src(i64 %x, i64 %y) local_unnamed_addr #0 {
  %s = sub i64 %x, %y
  ret i64 %s
}

https://godbolt.org/z/xdqb5b
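So the instsimplify side already handles it once the flag is there. If we go that route, it could be pinned down with a regression test along these lines (a sketch; the CHECK lines are what I'd expect based on the output above):

; RUN: opt -S -instsimplify < %s | FileCheck %s

define i64 @src(i64 %x, i64 %y) {
; CHECK-LABEL: @src(
; CHECK-NEXT:    %s = sub i64 %x, %y
; CHECK-NEXT:    ret i64 %s
  %s = sub i64 %x, %y
  %sh = shl nsw i64 %s, 3
  %r = ashr exact i64 %sh, 3
  ret i64 %r
}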
Probably won't do anything by itself, but here's one step: https://reviews.llvm.org/rG8b300679192b
And seemingly one step back: https://reviews.llvm.org/rG6aa3fc4a5b88
Re-did the wrapping flags patch:
https://reviews.llvm.org/rGab29f091eb64

And re-arranged the folds (we still miss the example in comment 3):
https://reviews.llvm.org/rG678b9c5dde0d

So that should take care of the problem in the description.