Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InstCombine cannot blindly assume that inttoptr(ptrtoint x) -> x #33896

Open
nunoplopes opened this issue Sep 11, 2017 · 109 comments
Open

InstCombine cannot blindly assume that inttoptr(ptrtoint x) -> x #33896

nunoplopes opened this issue Sep 11, 2017 · 109 comments
Labels

Comments

@nunoplopes
Copy link
Member

Bugzilla Link 34548
Version trunk
OS All
Blocks #34577 #39193
CC @comex,@majnemer,@vns-mn,@dwblaikie,@efriedma-quic,@fhahn,@hfinkel,@jensmaurer,@dobbelaj-snps,@aqjune,@RKSimon,@Meinersbur,@sunfishcode,@MattPD,@uecker,@RalfJung,@regehr,@rnk,@sanjoy,@rotateright,@yuanfang-chen

Extended Description

Example of an end-to-end miscompilation by clang of the following code involving ptrtoint:

$ cat c.c
#include <stdio.h>

void f(int*, int*);

int main()
{
int a=0, y[1], x = 0;
uintptr_t pi = (uintptr_t) &x;
uintptr_t yi = (uintptr_t) (y+1);
uintptr_t n = pi != yi;

if (n) {
a = 100;
pi = yi;
}

if (n) {
a = 100;
pi = (uintptr_t) y;
}

*(int *)pi = 15;

printf("a=%d x=%d\n", a, x);

f(&x,y);

return 0;
}

$ cat b.c
void f(intx, inty) {}

$ clang -O2 c.c b.c -o foo

$ ./foo
a=0 x=0

This result is wrong. The two possible outcomes are: a=0 x=15, and a=100 x=0.

The bug is in Instcombine that treats inttoptr(ptrtoint(x)) == x, which is incorrect. This transformation can only be done if x is dereferenceable for the accesses through inttoptr.
Compare the following:
clang -O0 -S -emit-llvm -o - c.c | opt -S -sroa
clang -O0 -S -emit-llvm -o - c.c | opt -S -sroa -instcombine

Integer compares are replaces with pointer compares (wrong) and load/stores are changed from inttoptr to pointers directly (also wrong).

Test case by Gil Hur.

@rnk
Copy link
Collaborator

rnk commented Sep 11, 2017

Personally I get "a=100 x=0" for this test case.

Does the soundness issue go away if we stop optimizing 'n' to 1? That seems like a much better fix, IMO. Integer/pointer casts are far more important to optimize through than

@regehr
Copy link
Contributor

regehr commented Sep 11, 2017

Reid, try this change:

int a = 0, y[1], x = 0;
//int a=0, x = 0, y[1];

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 11, 2017

Reid, try this change:

int a = 0, y[1], x = 0;
//int a=0, x = 0, y[1];

Doesn't seem to repro here as well (x86-64_linux) [same happens with John's suggested change].

[davide@cupiditate bin]$ ./clang c.c b.c -O1 -o foo && ./foo
a=100 x=0
[davide@cupiditate bin]$ ./clang c.c b.c -O2 -o foo && ./foo
a=100 x=0
[davide@cupiditate bin]$ ./clang c.c b.c -O3 -o foo && ./foo
a=100 x=0

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 11, 2017

Also

[davide@cupiditate bin]$ ./clang c.c -O0 -S -emit-llvm -o - | ./opt -S -sroa > pre.ll
[davide@cupiditate bin]$ ./clang c.c -O0 -S -emit-llvm -o - | ./opt -S -sroa -instcombine > post.ll
[davide@cupiditate bin]$ diff -u pre.ll post.ll
[davide@cupiditate bin]$

Presumably this might have been fixed/hidden? Which revision are you at?
I'm at clang version 6.0.0 (trunk 312918) (llvm/trunk 312925)

Side note, I had to add <stdint.h> to the code to get it compile.

@regehr
Copy link
Contributor

regehr commented Sep 11, 2017

Weird, I see "a=0 x=0" at -O2 using both 4.0 and 5.0 on OS X and Ubuntu 16.04 on x86-64.

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 11, 2017

Weird, I see "a=0 x=0" at -O2 using both 4.0 and 5.0 on OS X and Ubuntu
16.04 on x86-64.

Can you verify this still happen on ToT ? In the meanwhile, I'm going to try with 5.0 on my machine (see if that applies, and bisect in case).

@david-xl
Copy link
Contributor

I could not reproduce the problem either (same IR)

@regehr
Copy link
Contributor

regehr commented Sep 11, 2017

Johns-MacBook-Pro-2:code regehr$ clang -v
clang version 6.0.0 (trunk 312932)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Users/regehr/llvm-install/bin

Johns-MacBook-Pro-2:code regehr$ clang -O2 alias4.c alias4-b.c
Johns-MacBook-Pro-2:code regehr$ ./a.out
a=0 x=0

Johns-MacBook-Pro-2:code regehr$ cat alias4.c
#include <stdint.h>
#include <stdio.h>

void f(int *, int *);

int main() {
int a = 0, y[1], x = 0;
//int a=0, x = 0, y[1];
uintptr_t pi = (uintptr_t)&x;
uintptr_t yi = (uintptr_t)(y + 1);
uintptr_t n = pi != yi;

if (n) {
a = 100;
pi = yi;
}

if (n) {
a = 100;
pi = (uintptr_t)y;
}

*(int *)pi = 15;

printf("a=%d x=%d\n", a, x);

f(&x, y);

return 0;
}

Johns-MacBook-Pro-2:code regehr$ cat alias4-b.c
void f(int *x, int *y) {}

@efriedma-quic
Copy link
Collaborator

Does the soundness issue go away if we stop optimizing 'n' to 1?

I don't think we do that? At least, I can't reproduce that with trunk clang... and it's pretty clearly not a legal transform. See https://reviews.llvm.org/rL249490 .

@nunoplopes
Copy link
Member Author

Ok, so I guess some of you cannot reproduce this because your clang has a default option to enable some sanitization.
Try with clang -cc1 (adjust paths as needed):

$ ./bin/clang -cc1 -triple x86_64-pc-linux-gnu -emit-obj -O2 -o c.o c.c -internal-externc-isystem /usr/include/ -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-isystem /home/nuno/llvm-build/lib/clang/6.0.0/include

$ ./bin/clang -o foo b.c c.o

$ ./foo

@rnk
Copy link
Collaborator

rnk commented Sep 15, 2017

I was able to repro with John's reordering of the variables, but that doesn't seem that important.

Why can't we assume inttoptr(ptrtoint(x)) is a no-op? What's wrong with that in principle? "a=0 x=0" seems like a real bug, but I'm not convinced this is the cause.

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 15, 2017

Can someone attach a ll file and the opt commandline to reproduce the problem consistently?

@efriedma-quic
Copy link
Collaborator

The problem really stems the following rule from http://llvm.org/docs/LangRef.html#pointer-aliasing-rules:

  • A pointer value formed by an inttoptr is based on all pointer values that contribute (directly or indirectly) to the computation of the pointer’s value.

I think Nuno wrote up a good description somewhere, but essentially the problem is that we end up erasing dependencies, so alias analysis returns the wrong result.

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 15, 2017

The problem really stems the following rule from
http://llvm.org/docs/LangRef.html#pointer-aliasing-rules:

  • A pointer value formed by an inttoptr is based on all pointer values that
    contribute (directly or indirectly) to the computation of the pointer’s
    value.

I think Nuno wrote up a good description somewhere, but essentially the
problem is that we end up erasing dependencies, so alias analysis returns
the wrong result.

Is this the problem:

inttoptr(ptrtoint(x)) + int_value

where the int_value is actually a pointer but x is not.

If the llvm folds the first, this becomes a getelementptr with x as the base, then the result value won't aliased with what int_value points to anymore?

@hfinkel
Copy link
Collaborator

hfinkel commented Sep 15, 2017

The problem really stems the following rule from
http://llvm.org/docs/LangRef.html#pointer-aliasing-rules:

  • A pointer value formed by an inttoptr is based on all pointer values that
    contribute (directly or indirectly) to the computation of the pointer’s
    value.

I think Nuno wrote up a good description somewhere, but essentially the
problem is that we end up erasing dependencies, so alias analysis returns
the wrong result.

I think that this is essentially correct. The problem here is that inttoptr(ptrtoint(x)) only appears be based on x. However, in this case, the inttoptr is also control dependent on a comparison between some other pointer values. The resulting pointer, therefore, is based on all of these pointers. AA won't see this, just based on inttoptr(ptrtoint(x)) == x, because it will only see x.

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 15, 2017

I can now reproduce the problem. An IR reproducible:

llc good.ll > good.s
clang good.s f.o

./a.out
100, 0

opt -early-cse-memssa -S < good.ll | llc > bad.s
clang bad.s f.o

./a.out
0,0

opt -early-cse-memssa -S < good.ll > bad.ll

The key diff between good.ll and bad.ll are

< %add.ptr.x = select i1 %cmp, i32* %add.ptr, i32* %x
17c20
< %pi.1.in = select i1 %cmp, i32* %arraydecay, i32* %add.ptr.x

%pi.1.in = select i1 %cmp, i32* %arraydecay, i32* %add.ptr

And:

< %2 = load i32, i32* %x, align 4, !tbaa !​2
< %call = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i64 0, i64 0), i32 %a.1, i32 %2)

%call = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i64 0, i64 0), i32 %a.1, i32 0)

Looks like Early CSE wrongly

  1. commonizes the address of x and y+1 and makes %add.ptr.x to be y + 1
  2. With 1), pi does not look like alias with x anymore, so initial value of x of 0 gets propagated to printf.

Does it have anything to do with intptr/ptrint?

// good.ll:

@.str = private unnamed_addr constant [11 x i8] c"a=%d x=%d\0A\00", align 1

define i32 @​main() local_unnamed_addr #​0 {
entry:
%x = alloca i32, align 4
%y = alloca [1 x i32], align 4
%0 = bitcast i32* %x to i8*
call void @​llvm.lifetime.start.p0i8(i64 4, i8* %0) #​4
store i32 0, i32* %x, align 4, !tbaa !​2
%1 = bitcast [1 x i32]* %y to i8*
call void @​llvm.lifetime.start.p0i8(i64 4, i8* %1) #​4
%arraydecay = getelementptr inbounds [1 x i32], [1 x i32]* %y, i64 0, i64 0
%add.ptr = getelementptr inbounds [1 x i32], [1 x i32]* %y, i64 0, i64 1
%cmp = icmp ne i32* %x, %add.ptr
%add.ptr.x = select i1 %cmp, i32* %add.ptr, i32* %x
%. = select i1 %cmp, i32 100, i32 0
%pi.1.in = select i1 %cmp, i32* %arraydecay, i32* %add.ptr.x
%a.1 = select i1 %cmp, i32 100, i32 %.
store i32 15, i32* %pi.1.in, align 4, !tbaa !​2
%2 = load i32, i32* %x, align 4, !tbaa !​2
%call = call i32 (i8*, ...) @​printf(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i64 0, i64 0), i32 %a.1, i32 %2)
call void @​Z1fPiS(i32* nonnull %x, i32* %arraydecay)
call void @​llvm.lifetime.end.p0i8(i64 4, i8* %1) #​4
call void @​llvm.lifetime.end.p0i8(i64 4, i8* %0) #​4
ret i32 0
}

; Function Attrs: argmemonly nounwind
declare void @​llvm.lifetime.start.p0i8(i64, i8* nocapture) #​1

; Function Attrs: nounwind
declare i32 @​printf(i8* nocapture readonly, ...) local_unnamed_addr #​2

declare void @​Z1fPiS(i32*, i32*) local_unnamed_addr #​3

; Function Attrs: argmemonly nounwind
declare void @​llvm.lifetime.end.p0i8(i64, i8* nocapture) #​1

attributes #​0 = { norecurse uwtable }
attributes #​1 = { argmemonly nounwind }
attributes #​2 = { nounwind }
attributes #​3 = { nounwind }
attributes #​4 = { nounwind }

!llvm.module.flags = !{#0}
!llvm.ident = !{#1}

!​0 = !{i32 1, !"wchar_size", i32 4}
!​1 = !{!"clang version 6.0.0 (trunk 312663)"}
!​2 = !{#3, !​3, i64 0}
!​3 = !{!"int", !​4, i64 0}
!​4 = !{!"omnipotent char", !​5, i64 0}
!​5 = !{!"Simple C++ TBAA"}

@hfinkel
Copy link
Collaborator

hfinkel commented Sep 16, 2017

I can now reproduce the problem. An IR reproducible:
...

Looks like Early CSE wrongly

  1. commonizes the address of x and y+1 and makes %add.ptr.x to be y + 1
  2. With 1), pi does not look like alias with x anymore, so initial value of
    x of 0 gets propagated to printf.

Does it have anything to do with intptr/ptrint?

I suspect the answer is still yes because an answer of no is worse. So %add.ptr.x is defined as:

%add.ptr = getelementptr inbounds [1 x i32], [1 x i32]* %y, i64 0, i64 1
%cmp = icmp ne i32* %x, %add.ptr
%add.ptr.x = select i1 %cmp, i32* %add.ptr, i32* %x

And CSE is just applying the identity:

q := (a != b ? b : a) => q := b

and that certainly seems reasonable to me. FWIW, as currently defined, the icmp is currently well defined even for operands that point to different objects (we currently define the result in terms of the result of ptrtoint on each operand).

@sanjoy
Copy link
Contributor

sanjoy commented Sep 16, 2017

(we currently define the result in terms of the result of ptrtoint
on each operand).

That's precisely the problem though -- given what you said, the expression is actually "q := (ptrtoint(a) != ptrtoint(b) ? b : a)", which cannot be safely transformed to "q := b".

@david-xl
Copy link
Contributor

I don't see the connection with ptrtoint/intptr.

Whether the conditional assignment is optimized into q := b seems irrelevant. Suppose we have a local pointer analysis that can see through this and will know pi's value is y+1 (after the first select).

I think the real problem is that the address y+1 is out of bounds of y, though there is no out of bounds access, the compiler can not assume that this address does not alias with other object, in this case 'x' (the second issue in CSE mentioned in comment #​16)

@hfinkel
Copy link
Collaborator

hfinkel commented Sep 16, 2017

(we currently define the result in terms of the result of ptrtoint
on each operand).

That's precisely the problem though -- given what you said, the expression
is actually "q := (ptrtoint(a) != ptrtoint(b) ? b : a)", which cannot be
safely transformed to "q := b".

Indeed. Good point. We could say that this transformation is unsafe unless we know that both a and b have the same underlying object.

@hfinkel
Copy link
Collaborator

hfinkel commented Sep 16, 2017

I don't see the connection with ptrtoint/intptr.

Whether the conditional assignment is optimized into q := b seems
irrelevant. Suppose we have a local pointer analysis that can see through
this and will know pi's value is y+1 (after the first select).

I think the real problem is that the address y+1 is out of bounds of y,
though there is no out of bounds access, the compiler can not assume that
this address does not alias with other object, in this case 'x' (the second
issue in CSE mentioned in comment #​16)

Yes, we could define things this way. It would make our AA much less powerful (because we'd need to assume that any pointer we could not prove in bounds could potentially alias with all other pointers). We should find a different solution.

@hfinkel
Copy link
Collaborator

hfinkel commented Sep 16, 2017

(we currently define the result in terms of the result of ptrtoint
on each operand).

That's precisely the problem though -- given what you said, the expression
is actually "q := (ptrtoint(a) != ptrtoint(b) ? b : a)", which cannot be
safely transformed to "q := b".

Indeed. Good point. We could say that this transformation is unsafe unless
we know that both a and b have the same underlying object.

But if ptrtoint(a) == ptrtoint(b) does not imply that a == b, that also seems unfortunate.

I think, however, that I'd still prefer to restrict the inttoptr(ptrtoint(x)) folding to this.

@david-xl
Copy link
Contributor

Re #​21. Suppressing the optimization for this cases looks like paper overing the real problem.

Regarding AA, I think in practice, the common case is that off-by-one address is used as sentinel value, so we only need to teach AA to handle this explicit out of bound case instead of making AA generally less conservative. Besides, we can teach AA to look at real references. If there are, we can assume OOB does not exist -- otherwise the behavior is UB.

@david-xl
Copy link
Contributor

Re #​21. Suppressing the optimization for this cases looks like paper overing
the real problem.

Regarding AA, I think in practice, the common case is that off-by-one
address is used as sentinel value, so we only need to teach AA to handle
this explicit out of bound case instead of making AA generally less
conservative.

less conservative --> more conservative.

Besides, we can teach AA to look at real references. If there
are, we can assume OOB does not exist -- otherwise the behavior is UB.

@sanjoy
Copy link
Contributor

sanjoy commented Sep 16, 2017

(we currently define the result in terms of the result of ptrtoint
on each operand).

That's precisely the problem though -- given what you said, the expression
is actually "q := (ptrtoint(a) != ptrtoint(b) ? b : a)", which cannot be
safely transformed to "q := b".

Indeed. Good point. We could say that this transformation is unsafe unless
we know that both a and b have the same underlying object.

But if ptrtoint(a) == ptrtoint(b) does not imply that a == b, that also
seems unfortunate.

That's pretty fundamental, unfortunately. If we want to do high level AA inferences (like pointers derived off of two distinct malloc()'s do not alias) then we have be okay with ptrtoint(a) == ptrtoint(b) != (a == b).

I think, however, that I'd still prefer to restrict the
inttoptr(ptrtoint(x)) folding to this.

I'm not sure if you meant "restricting inttoptr(ptrtoint(x))==x is more preferable" or the converse (the "however" threw me off :) ), but in any case, I think we have to restrict both. We can't assume ptrtoint(a) == ptrtoint(b) implies a == b for the reason stated above.

And if we allow inttoptr(ptrtoint(x)) == x (but not "ptrtoint(a) == ptrtoint(b) implies a == b") then we have:

i64 A = ...
i64 B = ...
i64 C = A == B ? A : B;

and we replace C with B (since we're dealing with integers here so our restriction does not apply), and the whole program was actually:

// X and Y are 1 byte allocas where the address X+1 is the same as Y
i64 A = ptrtoint(Y)
i64 B = ptrtoint(X+1)
i64 C = B // after the transformation
i8* C_ptr = inttoptr(C)
*C_ptr = 1

then the inttoptr(ptrtoint(x)) == x rule gives us:

// X and Y are 1 byte allocas where the address X+1 is the same as Y
i64 A = ptrtoint(Y)
i64 B = ptrtoint(X+1)
i64 C = B // after the transformation
i8* C_ptr = inttoptr(C)
*(X+1) = 1

which is an out of bounds access.

@hfinkel
Copy link
Collaborator

hfinkel commented Sep 16, 2017

That's pretty fundamental, unfortunately. If we want to do high level AA inferences (like pointers derived off of two distinct malloc()'s do not alias) then we have be okay with ptrtoint(a) == ptrtoint(b) != (a == b).

I'm still not seeing how this causes a problem. If I have two pointers, a and b, and they don't alias, but ptrtoint(a) == ptrtoint(b), then they must have non-overlapping lifetimes. To have non-overlapping lifetimes, there must be some mutually-aliasing code in between.

Note that our AA is only defined for pointers that are accessed. This is exactly why we can't use AA to reason about pointer comparisons.

In fact, it may be the case that our AA queries are only well defined in general if some semantics-preserving transformation could put the two accesses next to each other. I've often thought this to be true, and perhaps this is why.

@david-xl
Copy link
Contributor

(we currently define the result in terms of the result of ptrtoint
on each operand).

That's precisely the problem though -- given what you said, the expression
is actually "q := (ptrtoint(a) != ptrtoint(b) ? b : a)", which cannot be
safely transformed to "q := b".

Indeed. Good point. We could say that this transformation is unsafe unless
we know that both a and b have the same underlying object.

But if ptrtoint(a) == ptrtoint(b) does not imply that a == b, that also
seems unfortunate.

That's pretty fundamental, unfortunately. If we want to do high level AA
inferences (like pointers derived off of two distinct malloc()'s do not
alias) then we have be okay with ptrtoint(a) == ptrtoint(b) != (a == b).

I think, however, that I'd still prefer to restrict the
inttoptr(ptrtoint(x)) folding to this.

I'm not sure if you meant "restricting inttoptr(ptrtoint(x))==x is more
preferable" or the converse (the "however" threw me off :) ), but in any
case, I think we have to restrict both. We can't assume ptrtoint(a) ==
ptrtoint(b) implies a == b for the reason stated above.

And if we allow inttoptr(ptrtoint(x)) == x (but not "ptrtoint(a) ==
ptrtoint(b) implies a == b") then we have:

i64 A = ...
i64 B = ...
i64 C = A == B ? A : B;

and we replace C with B (since we're dealing with integers here so our
restriction does not apply), and the whole program was actually:

// X and Y are 1 byte allocas where the address X+1 is the same as Y
i64 A = ptrtoint(Y)
i64 B = ptrtoint(X+1)
i64 C = B // after the transformation
i8* C_ptr = inttoptr(C)
*C_ptr = 1

then the inttoptr(ptrtoint(x)) == x rule gives us:

// X and Y are 1 byte allocas where the address X+1 is the same as Y
i64 A = ptrtoint(Y)
i64 B = ptrtoint(X+1)
i64 C = B // after the transformation
i8* C_ptr = inttoptr(C)
*(X+1) = 1

which is an out of bounds access.

The compiler does not create out of bound access.

  1. when A != B, the original program has out of bound access
  2. and A == B, the generated code is sementially equivalent to the original program. It is like all locals can be accessed via the same base %rsp.

I think we should not suppress optimizations, but instead fix the real underlying problems.

@david-xl
Copy link
Contributor

That's pretty fundamental, unfortunately. If we want to do high level AA inferences (like pointers derived off of two distinct malloc()'s do not alias) then we have be okay with ptrtoint(a) == ptrtoint(b) != (a == b).

I'm still not seeing how this causes a problem. If I have two pointers, a
and b, and they don't alias, but ptrtoint(a) == ptrtoint(b), then they must
have non-overlapping lifetimes. To have non-overlapping lifetimes, there
must be some mutually-aliasing code in between.

agreed.

Note that our AA is only defined for pointers that are accessed. This is
exactly why we can't use AA to reason about pointer comparisons.

In fact, it may be the case that our AA queries are only well defined in
general if some semantics-preserving transformation could put the two
accesses next to each other. I've often thought this to be true, and perhaps
this is why.

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 16, 2017

Okay, so, i admit to feeling confused why we've defined inttoptr and ptrtoint this way (IE to quantumly entangle other pointers, etc).

I feel (probably wrongly), like the pointer aliasing definition should be something like "inttoptr only aliases the memory that is at the address of value passed to it".

IE inttoptr(ptrtoint(x)) is a no-op from aliasing perspective. Whatever i can prove about the aliasing or non-alias of those objects is up to me.

To prove anything about the aliasing result, i would need to take the pointer x, try to perform math on it, and see what the result is.

Note also the result gcc 7 gives for the original example:
╰─ gcc-7 -O3 foo.c b.c
╰─ ./a.out
a=0 x=0

bam.

This has been like this as far back as i was lazy enough to try ;)

ICC 2018 gives a correct result of a=100 x=0
TL;DR Given both gcc and clang fail at this, and have for years, i'm not sure anyone expects a correct result here :)

@regehr
Copy link
Contributor

regehr commented Sep 16, 2017

It's probably not a good idea but I wanted to float the possibility of having the compiler look for cases where a one-past-the-end pointer might cause trouble and either add a bit of padding or else lay out the stack slots in a different order.

@sanjoy
Copy link
Contributor

sanjoy commented Oct 5, 2017

It probably happens in practice (why
wouldn't it?). I agree it's a potentially risky thing, but I wouldn't claim
we cannot fold the comparisons to false.

Going back a step: why do we care about folding "p+n==q" to false? Does this
pattern come up often enough that constant folding it is necessary?

I don't think we do. I think the question is: if I have pointers p and q,
and p is based on a and q is based on b, and I have p == q, can I fold that
to false? If I need to prove first that p != a + n and q != b + m, then it
becomes harder.

(I feel like I may be rehashing things that have already been settled, please let me know if that's the case.)

That's what I was asking though -- are there cases where folding "p == q" to "false" (assuming we can prove based_on(p) != based_on(q)) is important? Why can't we say that since we can't do provenance based comparisons at runtime, pointer comparisons are equivalent to comparing their bitwise representations at runtime? This prevents propagateEquality on pointer types though, but it looks like we're fine with that?

@RalfJung
Copy link
Contributor

By defining the comparison of p+n==q to return a non-deterministic value, it
allows the compiler to fold the comparison to false, and the hardware to return
true or false.

Going back a step: why do we care about folding "p+n==q" to false? Does this
pattern come up often enough that constant folding it is necessary?

I tried to catch LLVM in the act of making pointer comparison non-deterministic, but actually it seems to try hard to not do so. In particular, while it will fold "p+n == q" to false if that's the only use of p and qo, it no longer does so if p or q are escaped: https://godbolt.org/g/RD91og.

So, is this "p+n == q --> false" actually handled similar to foldAllocaCmp? (I've always been wondering why that one should only handle alloca, anyway.)

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 15, 2017

It probably happens in practice (why
wouldn't it?). I agree it's a potentially risky thing, but I wouldn't claim
we cannot fold the comparisons to false.

Going back a step: why do we care about folding "p+n==q" to false? Does this
pattern come up often enough that constant folding it is necessary?

I don't think we do. I think the question is: if I have pointers p and q,
and p is based on a and q is based on b, and I have p == q, can I fold that
to false? If I need to prove first that p != a + n and q != b + m, then it
becomes harder.

(I feel like I may be rehashing things that have already been settled,
please let me know if that's the case.)

That's what I was asking though -- are there cases where folding "p == q" to
"false" (assuming we can prove based_on(p) != based_on(q)) is important?

Are you assuming you can never determine q is a constant?
IE these are always variable pointers?

Or are you also considering the case where q is constant?

Because the obvious answer when q is constant nullptr is "null pointer checks".

Past that, this is generally how value profiling would work to remove aliasing, etc, so we should be careful there is some way to make that work (IE comparison of p or q with constant pointers or ranges, usually)

Past all that, when p and q is "guaranteed" to be non-constant, yeah, don't care in practice.

@nunoplopes
Copy link
Member Author

For reference, these slides give an illustrated explanation on this bug: http://llvm.org/devmtg/2018-04/slides/Lopes-Sbirlea-Pointers,%20Alias%20and%20ModRef%20Analyses.pdf

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 6, 2020

Any small example test case about how the magic pointer can break things?

There is now a bunch of examples in bug 44313 and bug 44374. Mostly they provide full simple examples for aspects already touched in this bug in some way or another. But the example with restrict adds a surprising twist to the whole mess.

I tried to catch LLVM in the act of making pointer comparison
non-deterministic, but actually it seems to try hard to not do so.

It did in the past but that was fixed in 2014 -- see bug 13507 and bug 21327.

as I talked with Reid and Richard, it seems that the devirtualization has
the same problem right now:

https://godbolt.org/g/LS9jc4

It seems there was an effort[1][2] to fix the problem as it applies to devirtualization independently from the general case, and the particular testcase was fixed between clang 6 and 7. I've just filed bug 44472 with new testcases.

[1] https://docs.google.com/document/d/16GVtCpzK8sIHNc2qZz6RN8amICNBtvjWUod2SujZVEo
[2] https://www.youtube.com/watch?v=Dt4UehzzcsE

@aqjune
Copy link
Contributor

aqjune commented Jan 7, 2020

Any small example test case about how the magic pointer can break things?

There is now a bunch of examples in bug 44313 and bug 44374. Mostly they
provide full simple examples for aspects already touched in this bug in some
way or another. But the example with restrict adds a surprising twist to
the whole mess.

There was also another bug that involves restrict and int-ptr casting that were reported by us: #39193
Conclusion was that we need an intrinsic something like @​llvm.ptr.also_derived_from(base, a, b, c, ...) that can be used for safely replacing a pointer based on pointer equality.
#39193 #c8
That was on my to-do list, but I was busy with all other things including development of alive2 with Nuno (https://github.com/aliveToolkit/alive2 ), so its priority was low.
Actually, what we've recently found with alive2 was that it is safe to replace a pointer with another in certain cases, e.g.

define i8* @​f(i8* nocapture %p, i8* %q) {
if (%p == %q) ret %p // this can be optimized to ret %q
ret %q
}

(alive2 is still in progress; it does not support inttoptr and noalias yet, so wouldn't check your bug and this inttoptr bug, but I'll be happy if this tool can help determine when it's safe to replace pointer in the future).

I tried to catch LLVM in the act of making pointer comparison
non-deterministic, but actually it seems to try hard to not do so.

It did in the past but that was fixed in 2014 -- see bug 13507 and bug 21327.

This is very interesting. So LLVM (at least in 2014) regards such behavior as buggy.

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 10, 2020

There are many thoughtful and thought-provoking comments here, thanks for this! But one comment has far-reaching consequences and especially resonated with my older thoughts:

To put it another way, if I have a comparison that gives one answer to some
observers and other answers to other observers, there may be nothing that
prevents that logical contradiction from infecting the remainder of the
execution.

This is a quite powerful principle.

  1. An illustration for its application to comparisons of the form &x == &y + 1 in gcc should be here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502#c42. If clang is ever modified to make such comparisons unstable again I guess it would be applicable to clang too. (But it's straightforward to demonstrate the same with bug 44342 for now if anyone wants.)

  2. Is it possible to have unfrozen undef at all? See bug 44512.

@Meinersbur
Copy link
Member

I came here from https://blog.regehr.org/archives/1737 and can maybe add another perspective.

It seems that the bug is the result of mixing two concepts of a pointer. First, as a symbolic reference to another value, second as the bit-pattern of a memory address. The former could also be represented as a tuple (baseptr,offset), the baseptr being the pointer's provenance. Pointers with different provenance cannot alias.

The problem being that there exist identical address bit-patterns that represent different logical pointers, namely the address of the pointer one past the baseptr's memory allocation can be another allocation.[*]

In the model of pointer as a logical reference, ptr2int is an implementation-defined non-injective (non-surjective) functions. That is, p!=q does not imply ptr2int(p)!=ptr2int(q) because of the validity of one-past-the-end pointers. Therefore, int2ptr is ambiguous: we cannot infer an unique (baseptr,offset) from the integer value. The conversion int2ptr(ptr2int(p)) therefore is 'lossy'.

AliasAnalysis is linked to pointer comparison in that if we can deduce p+n < q || q+n < p to be unconditionally false, then they do not not alias.

I hope this formalizes a bit what is also called 'magic', 'opaque' and 'non-deterministic' in this thread. Probably it's not just me who got confused.

Clearly, we want the transformation (a != b ? b : a) == b (comment #​17) to eventually take place on the address bit-pattern. I don't see why the final program needs to run slower on pointers than on integers. Note that, on the IR level, this mixes-up provenance regardless whether a or be are the result of int2ptr (comment #​43), so a solution that only handles int2ptr will be insufficient, at least for front-end languages that don't make pointer comparison with different provenance undefined.

I can think of two solutions:

  1. We introduce two phases into the pass pipeline. In the first phase, pointers are logical references with provenance. The identity of pointers must be kept distinct unless it can be shown that they have the same provenance, especially within GVN. Operations such as q - p are undefined (poison?) if q and p have different provenance.

In the second phase, pointers are assumed to be address bit-patterns, i.e. ptr2int is assumed on any pointers. This is a kind of lowering. AliasAnalysis may not assume provenance anymore.

  1. We track provenance explicitly (comment #​54), e.g. using metadata inserted by e.g. the front-end:

    %x = alloca !provenance !​0
    %add.ptr = getelementptr inbounds [1 x i32], [1 x i32]* %y, i64 0, i64 1 !provenance !​1
    %cmp = icmp ne i32* %x, %add.ptr
    %add.ptr.x = select i1 %cmp, i32* %add.ptr, i32* %x

Since provenance is language concept, it makes sense to not just assume the same thing in the IR, especially since there seem to be differences in C++ and C (comment #​77). If my interpretation is correct, the icmp above is required to be an integer comparison in C11, but can be assumed to be false in C++. In any case, the provenance of %add.ptr.x as to be cleared ("assumed unknown") since it merges pointers of different provenance to a single pointer. Alternatively, provenance could be a set.
Stripping all the provenance matadata would be equivalent to entering phase two of the previous solution.
Unfortunately, this solution increases the overhead into the direction of ScopedAA.

Note that there is the special case of std::less: To make std::set/std::map work, it must be defined on pointers of different provenance. https://en.cppreference.com/w/cpp/utility/functional/less says:

If the function call operator of the specialization std::less calls a built-in operator comparing pointers, it yields a strict total order even if the built-in operator< does not.

In libc++, I don't see any special mechanism to make on all pointers (https://github.com/llvm/llvm-project/blob/master/libcxx/include/__functional_base#L54). This case might be interesting:

https://godbolt.org/z/mU5nZp

 int x[1], y[1];
 std::set<int*> MySet;
 MySet.insert(&x[1]);
 MySet.insert(&y[0]);
 printf("count = %d\n", (int)MySet.size());

[*] We ignore that within a segmented memory model, or virtual page tables, multiple virtual memory addresses can be mapped to the same physical memory.

@Meinersbur
Copy link
Member

Note that to just allocate another byte past the last element of arrays (comment #​30) for stack variables and globals (and malloc implementations, in particular BumpPtrAllocator) would probably be easiest: It makes the int2ptr function injective. Similar the reason why the size of an empty struct is not zero and malloc(0) returns an unique allocation.

@regehr
Copy link
Contributor

regehr commented Jun 12, 2020

Michael, my guess is that adding a byte (which of course becomes more than a byte when there are alignment requirements) is a showstopper for memory-constrained embedded use cases. Also note that (I think!) this would have to be done even for stack variables and globals that are not arrays.

@Meinersbur
Copy link
Member

Just mentioning it would be easiest and there is precedence. For stack variables, I initially expected it to only necessary whenever there is pointer arithmetic involving non-constants which is only common for e.g. buffers. However, you're right it might be necessary for any captured variable; especially in C where there a pointer of a scalar variable is an array of size one and pass-by-reference is done by passing the pointer.

There is an aspect of AliasAnalysis that I did not consider: it might only care about dereferenceable pointers. Since the one-past-the-last pointer is not dereferenceable, int2ptr is injective over the set of derefenceable pointers.
However, in the original test case, AliasAnalysis is queried about &x and one-past &y. &x is dereferenceable, but one-past &y is not which BasicAA does not consider.

Not sure whether LLVM's AliasAnalysis interface is only defined for valid pointer representations (ie including one-past-the-last) of just dereferenceable pointers. The latter would be hard for analysis since one had to proof whether a pointer is dereferenceable before an AA query.

@RalfJung
Copy link
Contributor

Note that to just allocate another byte past the last element of arrays (comment #​30) for stack variables and globals (and malloc implementations, in particular BumpPtrAllocator) would probably be easiest: It makes the int2ptr function injective. Similar the reason why the size of an empty struct is not zero and malloc(0) returns an unique allocation.

getelementptr without inbounds is allowed to create pointers that are more than "one past the end", and it is likewise allowed to cast those to integers and back. So, in LLVM, I don't think there is any way to make int2ptr injective.

Also see https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf for a detailed discussion of LLVM pointer semantics.

@Meinersbur
Copy link
Member

The LLVM reference (except getelementptr inbounds) seem to miss concepts of valid pointer representation, pointer provenance or partial ordering of pointers. The ability of BasicAA to deduce pointers as non-aliasing based on provenance seem to be inferred from C/C++ semantics.

From the semantics of icmp:

If the operands are pointer typed, the pointer values are compared as if they were integers.

Sanjoy Das seem to have worked on non-integral pointer types which cannot be created "from thin air" by disallowing integer conversion instructions.
http://llvm.org/docs/LangRef.html#nointptrtype

@efriedma-quic
Copy link
Collaborator

Not sure whether LLVM's AliasAnalysis interface is only defined for valid
pointer representations (ie including one-past-the-last) of just
dereferenceable pointers. The latter would be hard for analysis since one
had to proof whether a pointer is dereferenceable before an AA query.

The alias() API is defined for any pair of pointers which are both defined at some position in the program. The question is essentially, given two pointer values, for any location in the program where both pointer values are defined, is it possible that a store to one pointer would conflict with a load or store to the other pointer? The possible answers are essentially yes, no, or don't know.

If it in't legal to dereference a pointer, it trivially doesn't alias anything: there are no memory accesses through the pointer, so there isn't any possible conflict. Similarly, if a pointer points to a constant, it doesn't alias anything: there are no stores through it.

The LLVM reference (except getelementptr inbounds) seem to miss concepts of
valid pointer representation, pointer provenance or partial ordering of
pointers. The ability of BasicAA to deduce pointers as non-aliasing based on
provenance seem to be inferred from C/C++ semantics.

icmp is defined to just take the raw pointer bits as an integer. This is described in LangRef. If the text of LangRef describing icmp isn't sufficiently clear, suggestions welcome. If some transform isn't consistent with this, please file a bug.

For provenance, the rules we have are described at http://llvm.org/docs/LangRef.html#pointeraliasing . The problem described here is that the rules don't actually work.

See also http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2311.pdf .

@RalfJung
Copy link
Contributor

icmp is defined to just take the raw pointer bits as an integer. This is described in LangRef.

Doesn't LLVM also assume that a fresh pointer returned by malloc does not equal any older pointer, and optimize icmp accordingly? They could physically be equal if addresses are reused (and that easily happens e.g. with jemalloc). C has a clause that pointers become undef when you deallocate the block they point to, but I have not seen such a clause in the LVLM semantics (but I might have just missed it).

@efriedma-quic
Copy link
Collaborator

icmp is defined to just take the raw pointer bits as an integer. This is described in LangRef.

Doesn't LLVM also assume that a fresh pointer returned by malloc does not
equal any older pointer, and optimize icmp accordingly?

There's a class of optimizations where LLVM implicitly replaces malloc/new with an alloca, and we can optimize comparisons against that. In cases where that optimization isn't legal, LLVM doesn't make any assumptions, as far as I know.

@RalfJung
Copy link
Contributor

If some transform isn't consistent with this, please file a bug.

llvm/llvm-bugzilla-archive#46380

@uecker
Copy link
Mannequin

uecker mannequin commented Apr 3, 2021

For those who have not seen this. WG14 now has a draft TS which addresses this topic:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2676.pdf

@RalfJung
Copy link
Contributor

mentioned in issue #34577

@aqjune
Copy link
Contributor

aqjune commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#37469

@aqjune
Copy link
Contributor

aqjune commented Nov 27, 2021

mentioned in issue #39193

@RalfJung
Copy link
Contributor

mentioned in issue llvm/llvm-bugzilla-archive#46380

@dwblaikie
Copy link
Collaborator

mentioned in issue llvm/llvm-bugzilla-archive#52570

@aqjune
Copy link
Contributor

aqjune commented Apr 20, 2022

FWIW: opt now has a --disable-i2p-p2i-opt flag that disables the roundtrip cast.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests