Example of an end-to-end miscompilation by clang of the following code involving ptrtoint: $ cat c.c #include <stdio.h> void f(int*, int*); int main() { int a=0, y[1], x = 0; uintptr_t pi = (uintptr_t) &x; uintptr_t yi = (uintptr_t) (y+1); uintptr_t n = pi != yi; if (n) { a = 100; pi = yi; } if (n) { a = 100; pi = (uintptr_t) y; } *(int *)pi = 15; printf("a=%d x=%d\n", a, x); f(&x,y); return 0; } $ cat b.c void f(int*x, int*y) {} $ clang -O2 c.c b.c -o foo $ ./foo a=0 x=0 This result is wrong. The two possible outcomes are: a=0 x=15, and a=100 x=0. The bug is in Instcombine that treats inttoptr(ptrtoint(x)) == x, which is incorrect. This transformation can only be done if x is dereferenceable for the accesses through inttoptr. Compare the following: clang -O0 -S -emit-llvm -o - c.c | opt -S -sroa clang -O0 -S -emit-llvm -o - c.c | opt -S -sroa -instcombine Integer compares are replaces with pointer compares (wrong) and load/stores are changed from inttoptr to pointers directly (also wrong). Test case by Gil Hur.
Personally I get "a=100 x=0" for this test case. Does the soundness issue go away if we stop optimizing 'n' to 1? That seems like a much better fix, IMO. Integer/pointer casts are far more important to optimize through than
Reid, try this change: int a = 0, y[1], x = 0; //int a=0, x = 0, y[1];
(In reply to John Regehr from comment #2) > Reid, try this change: > > int a = 0, y[1], x = 0; > //int a=0, x = 0, y[1]; Doesn't seem to repro here as well (x86-64_linux) [same happens with John's suggested change]. [davide@cupiditate bin]$ ./clang c.c b.c -O1 -o foo && ./foo a=100 x=0 [davide@cupiditate bin]$ ./clang c.c b.c -O2 -o foo && ./foo a=100 x=0 [davide@cupiditate bin]$ ./clang c.c b.c -O3 -o foo && ./foo a=100 x=0
Also [davide@cupiditate bin]$ ./clang c.c -O0 -S -emit-llvm -o - | ./opt -S -sroa > pre.ll [davide@cupiditate bin]$ ./clang c.c -O0 -S -emit-llvm -o - | ./opt -S -sroa -instcombine > post.ll [davide@cupiditate bin]$ diff -u pre.ll post.ll [davide@cupiditate bin]$ Presumably this might have been fixed/hidden? Which revision are you at? I'm at clang version 6.0.0 (trunk 312918) (llvm/trunk 312925) Side note, I had to add <stdint.h> to the code to get it compile.
Weird, I see "a=0 x=0" at -O2 using both 4.0 and 5.0 on OS X and Ubuntu 16.04 on x86-64.
(In reply to John Regehr from comment #5) > Weird, I see "a=0 x=0" at -O2 using both 4.0 and 5.0 on OS X and Ubuntu > 16.04 on x86-64. Can you verify this still happen on ToT ? In the meanwhile, I'm going to try with 5.0 on my machine (see if that applies, and bisect in case).
I could not reproduce the problem either (same IR)
Johns-MacBook-Pro-2:code regehr$ clang -v clang version 6.0.0 (trunk 312932) Target: x86_64-apple-darwin16.7.0 Thread model: posix InstalledDir: /Users/regehr/llvm-install/bin Johns-MacBook-Pro-2:code regehr$ clang -O2 alias4.c alias4-b.c Johns-MacBook-Pro-2:code regehr$ ./a.out a=0 x=0 Johns-MacBook-Pro-2:code regehr$ cat alias4.c #include <stdint.h> #include <stdio.h> void f(int *, int *); int main() { int a = 0, y[1], x = 0; //int a=0, x = 0, y[1]; uintptr_t pi = (uintptr_t)&x; uintptr_t yi = (uintptr_t)(y + 1); uintptr_t n = pi != yi; if (n) { a = 100; pi = yi; } if (n) { a = 100; pi = (uintptr_t)y; } *(int *)pi = 15; printf("a=%d x=%d\n", a, x); f(&x, y); return 0; } Johns-MacBook-Pro-2:code regehr$ cat alias4-b.c void f(int *x, int *y) {}
(In reply to Reid Kleckner from comment #1) > Does the soundness issue go away if we stop optimizing 'n' to 1? I don't think we do that? At least, I can't reproduce that with trunk clang... and it's pretty clearly not a legal transform. See https://reviews.llvm.org/rL249490 .
Ok, so I guess some of you cannot reproduce this because your clang has a default option to enable some sanitization. Try with clang -cc1 (adjust paths as needed): $ ./bin/clang -cc1 -triple x86_64-pc-linux-gnu -emit-obj -O2 -o c.o c.c -internal-externc-isystem /usr/include/ -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-isystem /home/nuno/llvm-build/lib/clang/6.0.0/include $ ./bin/clang -o foo b.c c.o $ ./foo
I was able to repro with John's reordering of the variables, but that doesn't seem that important. Why can't we assume inttoptr(ptrtoint(x)) is a no-op? What's wrong with that in principle? "a=0 x=0" seems like a real bug, but I'm not convinced this is the cause.
Can someone attach a ll file and the opt commandline to reproduce the problem consistently?
The problem really stems the following rule from http://llvm.org/docs/LangRef.html#pointer-aliasing-rules: - A pointer value formed by an inttoptr is based on all pointer values that contribute (directly or indirectly) to the computation of the pointer’s value. I think Nuno wrote up a good description somewhere, but essentially the problem is that we end up erasing dependencies, so alias analysis returns the wrong result.
(In reply to Eli Friedman from comment #13) > The problem really stems the following rule from > http://llvm.org/docs/LangRef.html#pointer-aliasing-rules: > > - A pointer value formed by an inttoptr is based on all pointer values that > contribute (directly or indirectly) to the computation of the pointer’s > value. > > I think Nuno wrote up a good description somewhere, but essentially the > problem is that we end up erasing dependencies, so alias analysis returns > the wrong result. Is this the problem: inttoptr(ptrtoint(x)) + int_value where the int_value is actually a pointer but x is not. If the llvm folds the first, this becomes a getelementptr with x as the base, then the result value won't aliased with what int_value points to anymore?
(In reply to Eli Friedman from comment #13) > The problem really stems the following rule from > http://llvm.org/docs/LangRef.html#pointer-aliasing-rules: > > - A pointer value formed by an inttoptr is based on all pointer values that > contribute (directly or indirectly) to the computation of the pointer’s > value. > > I think Nuno wrote up a good description somewhere, but essentially the > problem is that we end up erasing dependencies, so alias analysis returns > the wrong result. I think that this is essentially correct. The problem here is that inttoptr(ptrtoint(x)) only appears be based on x. However, in this case, the inttoptr is also control dependent on a comparison between some other pointer values. The resulting pointer, therefore, is based on all of these pointers. AA won't see this, just based on inttoptr(ptrtoint(x)) == x, because it will only see x.
I can now reproduce the problem. An IR reproducible: llc good.ll > good.s clang good.s f.o ./a.out 100, 0 opt -early-cse-memssa -S < good.ll | llc > bad.s clang bad.s f.o ./a.out 0,0 opt -early-cse-memssa -S < good.ll > bad.ll The key diff between good.ll and bad.ll are < %add.ptr.x = select i1 %cmp, i32* %add.ptr, i32* %x 17c20 < %pi.1.in = select i1 %cmp, i32* %arraydecay, i32* %add.ptr.x --- > %pi.1.in = select i1 %cmp, i32* %arraydecay, i32* %add.ptr And: < %2 = load i32, i32* %x, align 4, !tbaa !2 < %call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i64 0, i64 0), i32 %a.1, i32 %2) --- > %call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i64 0, i64 0), i32 %a.1, i32 0) Looks like Early CSE wrongly 1) commonizes the address of x and y+1 and makes %add.ptr.x to be y + 1 2) With 1), pi does not look like alias with x anymore, so initial value of x of 0 gets propagated to printf. Does it have anything to do with intptr/ptrint? // good.ll: @.str = private unnamed_addr constant [11 x i8] c"a=%d x=%d\0A\00", align 1 define i32 @main() local_unnamed_addr #0 { entry: %x = alloca i32, align 4 %y = alloca [1 x i32], align 4 %0 = bitcast i32* %x to i8* call void @llvm.lifetime.start.p0i8(i64 4, i8* %0) #4 store i32 0, i32* %x, align 4, !tbaa !2 %1 = bitcast [1 x i32]* %y to i8* call void @llvm.lifetime.start.p0i8(i64 4, i8* %1) #4 %arraydecay = getelementptr inbounds [1 x i32], [1 x i32]* %y, i64 0, i64 0 %add.ptr = getelementptr inbounds [1 x i32], [1 x i32]* %y, i64 0, i64 1 %cmp = icmp ne i32* %x, %add.ptr %add.ptr.x = select i1 %cmp, i32* %add.ptr, i32* %x %. = select i1 %cmp, i32 100, i32 0 %pi.1.in = select i1 %cmp, i32* %arraydecay, i32* %add.ptr.x %a.1 = select i1 %cmp, i32 100, i32 %. store i32 15, i32* %pi.1.in, align 4, !tbaa !2 %2 = load i32, i32* %x, align 4, !tbaa !2 %call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i64 0, i64 0), i32 %a.1, i32 %2) call void @_Z1fPiS_(i32* nonnull %x, i32* %arraydecay) call void @llvm.lifetime.end.p0i8(i64 4, i8* %1) #4 call void @llvm.lifetime.end.p0i8(i64 4, i8* %0) #4 ret i32 0 } ; Function Attrs: argmemonly nounwind declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1 ; Function Attrs: nounwind declare i32 @printf(i8* nocapture readonly, ...) local_unnamed_addr #2 declare void @_Z1fPiS_(i32*, i32*) local_unnamed_addr #3 ; Function Attrs: argmemonly nounwind declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1 attributes #0 = { norecurse uwtable } attributes #1 = { argmemonly nounwind } attributes #2 = { nounwind } attributes #3 = { nounwind } attributes #4 = { nounwind } !llvm.module.flags = !{!0} !llvm.ident = !{!1} !0 = !{i32 1, !"wchar_size", i32 4} !1 = !{!"clang version 6.0.0 (trunk 312663)"} !2 = !{!3, !3, i64 0} !3 = !{!"int", !4, i64 0} !4 = !{!"omnipotent char", !5, i64 0} !5 = !{!"Simple C++ TBAA"}
(In reply to Xinliang David Li from comment #16) > I can now reproduce the problem. An IR reproducible: ... > > > Looks like Early CSE wrongly > 1) commonizes the address of x and y+1 and makes %add.ptr.x to be y + 1 > 2) With 1), pi does not look like alias with x anymore, so initial value of > x of 0 gets propagated to printf. > > > Does it have anything to do with intptr/ptrint? > I suspect the answer is still yes because an answer of no is worse. So %add.ptr.x is defined as: %add.ptr = getelementptr inbounds [1 x i32], [1 x i32]* %y, i64 0, i64 1 %cmp = icmp ne i32* %x, %add.ptr %add.ptr.x = select i1 %cmp, i32* %add.ptr, i32* %x And CSE is just applying the identity: q := (a != b ? b : a) => q := b and that certainly seems reasonable to me. FWIW, as currently defined, the icmp is currently well defined even for operands that point to different objects (we currently define the result in terms of the result of ptrtoint on each operand).
(In reply to Hal Finkel from comment #17) > (we currently define the result in terms of the result of ptrtoint > on each operand). That's precisely the problem though -- given what you said, the expression is actually "q := (ptrtoint(a) != ptrtoint(b) ? b : a)", which cannot be safely transformed to "q := b".
I don't see the connection with ptrtoint/intptr. Whether the conditional assignment is optimized into q := b seems irrelevant. Suppose we have a local pointer analysis that can see through this and will know pi's value is y+1 (after the first select). I think the real problem is that the address y+1 is out of bounds of y, though there is no out of bounds access, the compiler can not assume that this address does not alias with other object, in this case 'x' (the second issue in CSE mentioned in comment #16)
(In reply to Sanjoy Das from comment #18) > (In reply to Hal Finkel from comment #17) > > (we currently define the result in terms of the result of ptrtoint > > on each operand). > > That's precisely the problem though -- given what you said, the expression > is actually "q := (ptrtoint(a) != ptrtoint(b) ? b : a)", which cannot be > safely transformed to "q := b". Indeed. Good point. We could say that this transformation is unsafe unless we know that both a and b have the same underlying object.
(In reply to David (Xinliang) Li from comment #19) > I don't see the connection with ptrtoint/intptr. > > Whether the conditional assignment is optimized into q := b seems > irrelevant. Suppose we have a local pointer analysis that can see through > this and will know pi's value is y+1 (after the first select). > > I think the real problem is that the address y+1 is out of bounds of y, > though there is no out of bounds access, the compiler can not assume that > this address does not alias with other object, in this case 'x' (the second > issue in CSE mentioned in comment #16) Yes, we could define things this way. It would make our AA much less powerful (because we'd need to assume that any pointer we could not prove in bounds could potentially alias with all other pointers). We should find a different solution.
(In reply to Hal Finkel from comment #20) > (In reply to Sanjoy Das from comment #18) > > (In reply to Hal Finkel from comment #17) > > > (we currently define the result in terms of the result of ptrtoint > > > on each operand). > > > > That's precisely the problem though -- given what you said, the expression > > is actually "q := (ptrtoint(a) != ptrtoint(b) ? b : a)", which cannot be > > safely transformed to "q := b". > > Indeed. Good point. We could say that this transformation is unsafe unless > we know that both a and b have the same underlying object. But if ptrtoint(a) == ptrtoint(b) does not imply that a == b, that also seems unfortunate. I think, however, that I'd still prefer to restrict the inttoptr(ptrtoint(x)) folding to this.
Re #21. Suppressing the optimization for this cases looks like paper overing the real problem. Regarding AA, I think in practice, the common case is that off-by-one address is used as sentinel value, so we only need to teach AA to handle this explicit out of bound case instead of making AA generally less conservative. Besides, we can teach AA to look at real references. If there are, we can assume OOB does not exist -- otherwise the behavior is UB.
(In reply to David (Xinliang) Li from comment #23) > Re #21. Suppressing the optimization for this cases looks like paper overing > the real problem. > > Regarding AA, I think in practice, the common case is that off-by-one > address is used as sentinel value, so we only need to teach AA to handle > this explicit out of bound case instead of making AA generally less > conservative. less conservative --> more conservative. > Besides, we can teach AA to look at real references. If there > are, we can assume OOB does not exist -- otherwise the behavior is UB.
(In reply to Hal Finkel from comment #22) > (In reply to Hal Finkel from comment #20) > > (In reply to Sanjoy Das from comment #18) > > > (In reply to Hal Finkel from comment #17) > > > > (we currently define the result in terms of the result of ptrtoint > > > > on each operand). > > > > > > That's precisely the problem though -- given what you said, the expression > > > is actually "q := (ptrtoint(a) != ptrtoint(b) ? b : a)", which cannot be > > > safely transformed to "q := b". > > > > Indeed. Good point. We could say that this transformation is unsafe unless > > we know that both a and b have the same underlying object. > > But if ptrtoint(a) == ptrtoint(b) does not imply that a == b, that also > seems unfortunate. That's pretty fundamental, unfortunately. If we want to do high level AA inferences (like pointers derived off of two distinct malloc()'s do not alias) then we have be okay with ptrtoint(a) == ptrtoint(b) != (a == b). > I think, however, that I'd still prefer to restrict the > inttoptr(ptrtoint(x)) folding to this. I'm not sure if you meant "restricting inttoptr(ptrtoint(x))==x is more preferable" or the converse (the "however" threw me off :) ), but in any case, I think we have to restrict both. We can't assume ptrtoint(a) == ptrtoint(b) implies a == b for the reason stated above. And if we allow inttoptr(ptrtoint(x)) == x (but not "ptrtoint(a) == ptrtoint(b) implies a == b") then we have: i64 A = ... i64 B = ... i64 C = A == B ? A : B; and we replace C with B (since we're dealing with integers here so our restriction does not apply), and the whole program was actually: // X and Y are 1 byte allocas where the address X+1 is the same as Y i64 A = ptrtoint(Y) i64 B = ptrtoint(X+1) i64 C = B // after the transformation i8* C_ptr = inttoptr(C) *C_ptr = 1 then the inttoptr(ptrtoint(x)) == x rule gives us: // X and Y are 1 byte allocas where the address X+1 is the same as Y i64 A = ptrtoint(Y) i64 B = ptrtoint(X+1) i64 C = B // after the transformation i8* C_ptr = inttoptr(C) *(X+1) = 1 which is an out of bounds access.
> That's pretty fundamental, unfortunately. If we want to do high level AA inferences (like pointers derived off of two distinct malloc()'s do not alias) then we have be okay with ptrtoint(a) == ptrtoint(b) != (a == b). I'm still not seeing how this causes a problem. If I have two pointers, a and b, and they don't alias, but ptrtoint(a) == ptrtoint(b), then they must have non-overlapping lifetimes. To have non-overlapping lifetimes, there must be some mutually-aliasing code in between. Note that our AA is only defined for pointers that are accessed. This is exactly why we can't use AA to reason about pointer comparisons. In fact, it may be the case that our AA queries are only well defined in general if some semantics-preserving transformation could put the two accesses next to each other. I've often thought this to be true, and perhaps this is why.
(In reply to Sanjoy Das from comment #25) > (In reply to Hal Finkel from comment #22) > > (In reply to Hal Finkel from comment #20) > > > (In reply to Sanjoy Das from comment #18) > > > > (In reply to Hal Finkel from comment #17) > > > > > (we currently define the result in terms of the result of ptrtoint > > > > > on each operand). > > > > > > > > That's precisely the problem though -- given what you said, the expression > > > > is actually "q := (ptrtoint(a) != ptrtoint(b) ? b : a)", which cannot be > > > > safely transformed to "q := b". > > > > > > Indeed. Good point. We could say that this transformation is unsafe unless > > > we know that both a and b have the same underlying object. > > > > But if ptrtoint(a) == ptrtoint(b) does not imply that a == b, that also > > seems unfortunate. > > That's pretty fundamental, unfortunately. If we want to do high level AA > inferences (like pointers derived off of two distinct malloc()'s do not > alias) then we have be okay with ptrtoint(a) == ptrtoint(b) != (a == b). > > > I think, however, that I'd still prefer to restrict the > > inttoptr(ptrtoint(x)) folding to this. > > I'm not sure if you meant "restricting inttoptr(ptrtoint(x))==x is more > preferable" or the converse (the "however" threw me off :) ), but in any > case, I think we have to restrict both. We can't assume ptrtoint(a) == > ptrtoint(b) implies a == b for the reason stated above. > > And if we allow inttoptr(ptrtoint(x)) == x (but not "ptrtoint(a) == > ptrtoint(b) implies a == b") then we have: > > i64 A = ... > i64 B = ... > i64 C = A == B ? A : B; > > and we replace C with B (since we're dealing with integers here so our > restriction does not apply), and the whole program was actually: > > // X and Y are 1 byte allocas where the address X+1 is the same as Y > i64 A = ptrtoint(Y) > i64 B = ptrtoint(X+1) > i64 C = B // after the transformation > i8* C_ptr = inttoptr(C) > *C_ptr = 1 > > then the inttoptr(ptrtoint(x)) == x rule gives us: > > // X and Y are 1 byte allocas where the address X+1 is the same as Y > i64 A = ptrtoint(Y) > i64 B = ptrtoint(X+1) > i64 C = B // after the transformation > i8* C_ptr = inttoptr(C) > *(X+1) = 1 > > which is an out of bounds access. The compiler does not create out of bound access. 1) when A != B, the original program has out of bound access 2) and A == B, the generated code is sementially equivalent to the original program. It is like all locals can be accessed via the same base %rsp. I think we should not suppress optimizations, but instead fix the real underlying problems.
(In reply to Hal Finkel from comment #26) > > That's pretty fundamental, unfortunately. If we want to do high level AA inferences (like pointers derived off of two distinct malloc()'s do not alias) then we have be okay with ptrtoint(a) == ptrtoint(b) != (a == b). > > I'm still not seeing how this causes a problem. If I have two pointers, a > and b, and they don't alias, but ptrtoint(a) == ptrtoint(b), then they must > have non-overlapping lifetimes. To have non-overlapping lifetimes, there > must be some mutually-aliasing code in between. agreed. > > Note that our AA is only defined for pointers that are accessed. This is > exactly why we can't use AA to reason about pointer comparisons. > > In fact, it may be the case that our AA queries are only well defined in > general if some semantics-preserving transformation could put the two > accesses next to each other. I've often thought this to be true, and perhaps > this is why.
Okay, so, i admit to feeling confused why we've defined inttoptr and ptrtoint this way (IE to quantumly entangle other pointers, etc). I feel (probably wrongly), like the pointer aliasing definition should be something like "inttoptr only aliases the memory that is at the address of value passed to it". IE inttoptr(ptrtoint(x)) is a no-op from aliasing perspective. Whatever i can prove about the aliasing or non-alias of those objects is up to me. To prove anything about the aliasing result, i would need to take the pointer x, try to perform math on it, and see what the result is. Note also the result gcc 7 gives for the original example: ╰─ gcc-7 -O3 foo.c b.c ╰─ ./a.out a=0 x=0 bam. This has been like this as far back as i was lazy enough to try ;) ICC 2018 gives a correct result of a=100 x=0 TL;DR Given both gcc and clang fail at this, and have for years, i'm not sure anyone expects a correct result here :)
It's probably not a good idea but I wanted to float the possibility of having the compiler look for cases where a one-past-the-end pointer might cause trouble and either add a bit of padding or else lay out the stack slots in a different order.
Regarding "anyone expects a correct result here" is it a reasonable reading of the "Add a new paragraph" part of DR 260 that the compiler is sort of allowed to do the wrong thing? http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm
(In reply to John Regehr from comment #31) > Regarding "anyone expects a correct result here" is it a reasonable reading > of the "Add a new paragraph" part of DR 260 that the compiler is sort of > allowed to do the wrong thing? > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm So, DR260 came about because of similar cases in gcc land. You can find cases that don't directly implicate it but are similar things. There is a whole slew of bugs (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49330 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54945 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502 and others that i forget). In fact, i believe gcc has determined that they couldn't even propagate conditional equivalences for pointers *or* integers, if they wanted this to work in all cases. See, e.g.,https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65752 the later comments. As richard points out: "You can expose the same issue by piecewise decomposing the pointer to chars, and having them equivalency propagated in a bogus way, then reconstruct the pointer from the chars. So it's not enough to disable pointer and uintptr_t propagations either." You can also split the equivalency propagation across as many functions as you want. So i believe i come to the same conclusion Richard does: If you want this stuff to work in all cases, we'd probably have to give up on conditional equivalences. Given all this, i'll be honest, my temptation is to ignore this problem, or: give a very well defined way that people can have this work that we can mark in the frontend and emit, and give up optimizing those or something. Yay for C/C++
I think Hal nails the problem: both data-flow and control-flow may contribute to the value of the value of ptr2int. This is because integers can be freely swapped (e.g. by GVN) and so data-flow dependencies can be lost. Therefore a pointer produced by int2ptr may alias with other objects rather than only the one found through data-flow dependencies. That's why int2ptr(ptr2int(x)) cannot always be replaced with x -- it certainly can in some cases. BTW, regarding gcc, the only bug we were able to find was in AA with != comparisons only: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82177
> Given all this, i'll be honest, my temptation is to ignore this problem, or: give a very well defined way that people can have this work that we can mark in the frontend and emit, and give up optimizing those or something. > > Yay for C/C++ I believe that we could define the current state of things to be correct by saying: Any programs with side effects dependent on the absolute or relative addresses of objects is ill defined. However, while "most" code is probably fine with that, I suspect that you can't write an OS kernel, a sanitier runtime, or the like, under those restrictions. Even if you restrict the UB to objects on the stack, which I suppose you could do, it's still not clear to me how practical this will be. I don't want to just ignore the problem, but either we define self-consistent semantics with which we're happy that makes this ill defined, or we fix it (either by default or, as suggested, under some pedantic flag).
(In reply to Nuno Lopes from comment #33) > I think Hal nails the problem: both data-flow and control-flow may > contribute to the value of the value of ptr2int. > This is because integers can be freely swapped (e.g. by GVN) and so > data-flow dependencies can be lost. Sure. > > Therefore a pointer produced by int2ptr may alias with other objects rather > than only the one found through data-flow dependencies. That's why > int2ptr(ptr2int(x)) cannot always be replaced with x -- it certainly can in > some cases. > > BTW, regarding gcc, the only bug we were able to find was in AA with != > comparisons only: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82177 As richard says in that bug (and i pointed out elsewhere), it will completely screw up pretty much any aliasing induced by conditional equivalences (which are the other bugs i cited).
(In reply to Hal Finkel from comment #34) > > Given all this, i'll be honest, my temptation is to ignore this problem, or: give a very well defined way that people can have this work that we can mark in the frontend and emit, and give up optimizing those or something. > > > > Yay for C/C++ > > I believe that we could define the current state of things to be correct by > saying: Any programs with side effects dependent on the absolute or relative > addresses of objects is ill defined. Sure > > However, while "most" code is probably fine with that, I suspect that you > can't write an OS kernel, a sanitier runtime, or the like, under those > restrictions. Regardless of what we do, i would like to challenge this, because it implies the current state is very broken. IE i think we could define the current state to be fine (maybe not the way you have) and be okay, from what i've seen. Apple ships a clang built kernel. Google has built and tested (heavily) production linux kernels with clang. Same with android and chromeos kernels (IE a very different situation). Obviously, the current sanitizer runtimes are also built with clang. So far, not a single bug or crash has been tracked down or caused by this or related issues. I feel like this is empirical proof that you *can* write an OS kernel, sanitizer runtime, or the like, that works okay. Maybe it'll stop working in the future. But it has survived at least a year of llvm/clang development so far :) GCC has not stopped propagated any conditional equivalences (IE of the form in 17, 18, etc), and is the default compiler for the kernel/sanitier runtime for everyone else. It has been propagating them like this for at least 10 years. It has had bugs, including this one (and similar related ones) for 10+ years. That to me suggests the this world is not that bad. > Even if you restrict the UB to objects on the stack, which I > suppose you could do, it's still not clear to me how practical this will be. > > I don't want to just ignore the problem, but either we define > self-consistent semantics with which we're happy that makes this ill > defined, or we fix it (either by default or, as suggested, under some > pedantic flag). Though to be clear, i'm fine with whatever.
(In reply to Daniel Berlin from comment #36) > (In reply to Hal Finkel from comment #34) > > > Given all this, i'll be honest, my temptation is to ignore this problem, or: give a very well defined way that people can have this work that we can mark in the frontend and emit, and give up optimizing those or something. > > > > > > Yay for C/C++ > > > > I believe that we could define the current state of things to be correct by > > saying: Any programs with side effects dependent on the absolute or relative > > addresses of objects is ill defined. > > Sure > > > > > However, while "most" code is probably fine with that, I suspect that you > > can't write an OS kernel, a sanitier runtime, or the like, under those > > restrictions. > > Regardless of what we do, i would like to challenge this, because it implies > the current state is very broken. IE i think we could define the current > state to be fine (maybe not the way you have) and be okay, from what i've > seen. I agree. I meant to imply that we probably can't use something as general as I had suggested. > > Apple ships a clang built kernel. > Google has built and tested (heavily) production linux kernels with clang. > Same with android and chromeos kernels (IE a very different situation). > Obviously, the current sanitizer runtimes are also built with clang. > > So far, not a single bug or crash has been tracked down or caused by this or > related issues. > > I feel like this is empirical proof that you *can* write an OS kernel, > sanitizer runtime, or the like, that works okay. > > Maybe it'll stop working in the future. But it has survived at least a year > of llvm/clang development so far :) > > GCC has not stopped propagated any conditional equivalences (IE of the form > in 17, 18, etc), and is the default compiler for the kernel/sanitier runtime > for everyone else. > It has been propagating them like this for at least 10 years. It has had > bugs, including this one (and similar related ones) for 10+ years. > > That to me suggests the this world is not that bad. To be clear, I didn't mean to imply that our current implementation, as far as this goes, is all that bad in practice. I mean that, if you define the UB broadly as I suggested above, you'd likely end up with problems with that kind of code. It does leave open, however, how we'd actually define things to make this okay. > > > Even if you restrict the UB to objects on the stack, which I > > suppose you could do, it's still not clear to me how practical this will be. > > > > I don't want to just ignore the problem, but either we define > > self-consistent semantics with which we're happy that makes this ill > > defined, or we fix it (either by default or, as suggested, under some > > pedantic flag). > > Though to be clear, i'm fine with whatever.
Great, sounds like we are on the same page. Unfortunately for LLVM, GCC has it easier in a lot of ways here. It is mainly based on a constraint-based points-to analysis, which means you can add any constraints you want (we can do the equivalent in our points-to, but we don't rely on a real points-to at all). So for example, for this bug, they can just add constraints based on conditionals, or whatever else they want. This is true even though they are really *uses* and not defs. LLVM's BasicAA only looks at the def chain when determining what has happened to a thing. This is not enough if you are allowed to use conditionals to induce aliasing relationships. Fixing the multi-file case is pretty much impossible without giving up, you'd have to assume that all escaped pointers may be equal after a function call, etc.
(In reply to Nuno Lopes from comment #33) > I think Hal nails the problem: both data-flow and control-flow may > contribute to the value of the value of ptr2int. > This is because integers can be freely swapped (e.g. by GVN) and so > data-flow dependencies can be lost. > > Therefore a pointer produced by int2ptr may alias with other objects rather > than only the one found through data-flow dependencies. That's why > int2ptr(ptr2int(x)) cannot always be replaced with x -- it certainly can in > some cases. > > BTW, regarding gcc, the only bug we were able to find was in AA with != > comparisons only: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82177 I wanted to point out this issue has nothing to do with inttoptr. I can reproduce the problem with a program that does not use any of intptr/ptrint: Fundamentally, what happens is that with address comparisons optimization pass(es) convert the program which is semantically equivalent but changes the underlying object of what a type points to. This greatly confuses AA (which assumes out of bound pointer does not point to neighboring stack objects or unknown pointer can not point to non-escaped locals etc) and leads to latter optimization pass to do bad things. Capturing the alias relationship with predicate/assertion propagation is the way to solve it as pointed out by Danny, but there are just too many situations to handle to make it worthwhile. We can however special case the one-pass-the-end situation which is not uncommon IMO. // program: #include <stdio.h> void f(int*, int*); int main() { int a=0, x = 0, y[1]; // int a=0, y[1], x =0 ; int *pi = &x; int *yi = &y[1]; bool n = pi != yi; if (n) { a = 100; pi = yi; } if (n) { a = 100; pi = &y[0]; } *pi = 15; printf("a=%d x=%d\n", a, x); f(&x,y); return 0; }
(In reply to Hal Finkel from comment #26) > > That's pretty fundamental, unfortunately. If we want to do high level AA inferences (like pointers derived off of two distinct malloc()'s do not alias) then we have be okay with ptrtoint(a) == ptrtoint(b) != (a == b). > > I'm still not seeing how this causes a problem. If I have two pointers, a > and b, and they don't alias, but ptrtoint(a) == ptrtoint(b), then they must > have non-overlapping lifetimes. To have non-overlapping lifetimes, there > must be some mutually-aliasing code in between. I was thinking of this situation: A = malloc(); free(A); B = malloc(); *B = 0 C = select (ptrtoint(A) == ptrtoint(B)), B, A ;; S0 r = *C free(B); If I apply the select rule you mention to S0 then we get: A = malloc(); free(A); B = malloc(); *B = 0 C = A r = *C free(B); Now since A and B are distinct mallocs, BasicAA (https://github.com/llvm-mirror/llvm/blob/master/lib/Analysis/BasicAliasAnalysis.cpp#L1579) will tell us that the store to B and load from C (=A) can be reordered. But in the original program, if ptrtoint(A) == ptrtoint(B) then r should have seen the store "*B = 0", so the reordering is not legal. One way out of this is to make our memory model "flat" -- i.e. model the heap as a large array and make it well defined to offset from one allocation to another. But that may regress alias analysis (I'm not sure by how much, though). > Note that our AA is only defined for pointers that are accessed. This is > exactly why we can't use AA to reason about pointer comparisons. > > In fact, it may be the case that our AA queries are only well defined in > general if some semantics-preserving transformation could put the two > accesses next to each other. I've often thought this to be true, and perhaps > this is why.
(In reply to David (Xinliang) Li from comment #27) > The compiler does not create out of bound access. > 1) when A != B, the original program has out of bound access Agreed -- if A != B then any transform is valid since the source has UB. > 2) and A == B, the generated code is sementially equivalent to the original > program. It is like all locals can be accessed via the same base %rsp. I don't think this is true, if I understand you correctly. For instance, if you could access one alloca by offsetting from another (I presume that's what you meant by "all locals can be accessed via the same base %rsp"), mem2reg would not be legal. E.g. mem2reg would not be able to transform define i32 @f() { %a = alloca i32 %b = alloca i32 store i32 0, i32* %a call void @unknown(i32* %b) %v = load i32, i32* %a ret i32 %v } to define i32 @f() { %b = alloca i32 call void @unknown(i32* %b) ret i32 0 } (which it does today) since @unknown //may have// written something other than 0 to %a by offsetting from %b. > I think we should not suppress optimizations, but instead fix the real > underlying problems.
(In reply to Sanjoy Das from comment #41) > (In reply to David (Xinliang) Li from comment #27) > > The compiler does not create out of bound access. > > 1) when A != B, the original program has out of bound access > > Agreed -- if A != B then any transform is valid since the source has UB. > > > 2) and A == B, the generated code is sementially equivalent to the original > > program. It is like all locals can be accessed via the same base %rsp. > > I don't think this is true, if I understand you correctly. For instance, if > you could access one alloca by offsetting from another (I presume that's > what you meant by "all locals can be accessed via the same base %rsp"), > mem2reg would not be legal. E.g. mem2reg would not be able to transform > > define i32 @f() { > %a = alloca i32 > %b = alloca i32 > store i32 0, i32* %a > call void @unknown(i32* %b) > %v = load i32, i32* %a > ret i32 %v > } > > to > > define i32 @f() { > %b = alloca i32 > call void @unknown(i32* %b) > ret i32 0 > } > > (which it does today) since @unknown //may have// written something other > than 0 to %a by offsetting from %b. > > > I think we should not suppress optimizations, but instead fix the real > > underlying problems. The underlying object being accessed is not changed regardless of what base pointer is used -- whether AA can handle it currently is a different story. For instance, suppose we want to access locality of local objects, we will need to introduce an optimization pass that lays out stack object in an optimal way. After this is done, all the local objects will now have a fixed offset from the same base. AA needs to be taught to handle such scenarios (e.g. annotate the accesses with unique local ids).
(In reply to David (Xinliang) Li from comment #39) > (In reply to Nuno Lopes from comment #33) > > I think Hal nails the problem: both data-flow and control-flow may > > contribute to the value of the value of ptr2int. > > This is because integers can be freely swapped (e.g. by GVN) and so > > data-flow dependencies can be lost. > > > > Therefore a pointer produced by int2ptr may alias with other objects rather > > than only the one found through data-flow dependencies. That's why > > int2ptr(ptr2int(x)) cannot always be replaced with x -- it certainly can in > > some cases. > > > > BTW, regarding gcc, the only bug we were able to find was in AA with != > > comparisons only: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82177 > > I wanted to point out this issue has nothing to do with inttoptr. I can > reproduce the problem with a program that does not use any of intptr/ptrint: > > > Fundamentally, what happens is that with address comparisons optimization > pass(es) convert the program which is semantically equivalent but changes > the underlying object of what a type points to. This greatly confuses AA > (which assumes out of bound pointer does not point to neighboring stack > objects or unknown pointer can not point to non-escaped locals etc) and > leads to latter optimization pass to do bad things. > > Capturing the alias relationship with predicate/assertion propagation is the > way to solve it as pointed out by Danny, but there are just too many > situations to handle to make it worthwhile. We can however special case > the one-pass-the-end situation which is not uncommon IMO. > > > > // program: > > #include <stdio.h> > void f(int*, int*); > > int main() > { > int a=0, x = 0, y[1]; > // int a=0, y[1], x =0 ; > int *pi = &x; > int *yi = &y[1]; > bool n = pi != yi; > > if (n) { > a = 100; > pi = yi; > } > > if (n) { > a = 100; > pi = &y[0]; > } > > *pi = 15; > > printf("a=%d x=%d\n", a, x); > > f(&x,y); > > return 0; > } Yes, but this program has UB because it compares pointers into two different objects. It was exactly the ptrtoint/inttoptr that make the program (potentially) well defined. Thus, even though you can replicate the behavior without the inttoptr/ptrtoint, it is the presence of those casts that is essential to the potential behavior being a miscompile.
(In reply to David (Xinliang) Li from comment #42) > > I don't think this is true, if I understand you correctly. For instance, if > > you could access one alloca by offsetting from another (I presume that's > > what you meant by "all locals can be accessed via the same base %rsp"), > > mem2reg would not be legal. E.g. mem2reg would not be able to transform > > > > define i32 @f() { > > %a = alloca i32 > > %b = alloca i32 > > store i32 0, i32* %a > > call void @unknown(i32* %b) > > %v = load i32, i32* %a > > ret i32 %v > > } > > > > to > > > > define i32 @f() { > > %b = alloca i32 > > call void @unknown(i32* %b) > > ret i32 0 > > } > > > > (which it does today) since @unknown //may have// written something other > > than 0 to %a by offsetting from %b. > > > > > I think we should not suppress optimizations, but instead fix the real > > > underlying problems. > > The underlying object being accessed is not changed regardless of what base > pointer is used -- whether AA can handle it currently is a different story. This ^ semantic is *one* correct interpretation of what happened -- since X and Y happen to lie adjacent to one another, using X+1 is a legitimate way to access Y (so the select xform did nothing wrong). This is close to the "flat" memory model I referred to earlier: pointers are just numerical values, *how* you got hold of a pointer does not matter as long as it has the right numerical value. However, this semantic is problematic because it disallows basic optimizations we'd like to be able to do, like mem2reg, but of course, that does not make the semantic incorrect, just inconvenient. However, there are other semantics too with different trade-offs. For instance, one possible semantic is that X+1 is **not** a legitimate way to access Y, even if numerically X+1 is the same as Y; in other words pointers are more than just their numerical values. In this semantic mem2reg is legal because no one can create a pointer to an unescaped alloca "out of thin air" by offsetting from some *other* alloca they have access to. However, in this semantic, the select folding is wrong -- it miscompiled and introduced UB into a well defined program. I'm sure there are other more creative semantics with more, should I say subtle :), tradeoffs. We just need to pick one and be consistent about it. > For instance, suppose we want to access locality of local objects, we will > need to introduce an optimization pass that lays out stack object in an > optimal way. After this is done, all the local objects will now have a fixed > offset from the same base. AA needs to be taught to handle such scenarios > (e.g. annotate the accesses with unique local ids). Are you talking about a pass that replaces all alloca's with offsets into a giant "master" alloca? If so, in the flat-memory-model semantic this is a no-op, but in the nonflat-memory-model semantic this is an information losing transform. But this is correct in both the semantics.
(In reply to Sanjoy Das from comment #44) > (In reply to David (Xinliang) Li from comment #42) > > > I don't think this is true, if I understand you correctly. For instance, if > > > you could access one alloca by offsetting from another (I presume that's > > > what you meant by "all locals can be accessed via the same base %rsp"), > > > mem2reg would not be legal. E.g. mem2reg would not be able to transform > > > > > > define i32 @f() { > > > %a = alloca i32 > > > %b = alloca i32 > > > store i32 0, i32* %a > > > call void @unknown(i32* %b) > > > %v = load i32, i32* %a > > > ret i32 %v > > > } > > > > > > to > > > > > > define i32 @f() { > > > %b = alloca i32 > > > call void @unknown(i32* %b) > > > ret i32 0 > > > } > > > > > > (which it does today) since @unknown //may have// written something other > > > than 0 to %a by offsetting from %b. > > > > > > > I think we should not suppress optimizations, but instead fix the real > > > > underlying problems. > > > > The underlying object being accessed is not changed regardless of what base > > pointer is used -- whether AA can handle it currently is a different story. > > This ^ semantic is *one* correct interpretation of what happened -- since X > and Y happen to lie adjacent to one another, using X+1 is a legitimate way > to access Y (so the select xform did nothing wrong). This is close to the > "flat" memory model I referred to earlier: pointers are just numerical > values, *how* you got hold of a pointer does not matter as long as it has > the right numerical value. However, this semantic is problematic because it > disallows basic optimizations we'd like to be able to do, like mem2reg, but > of course, that does not make the semantic incorrect, just inconvenient. > > However, there are other semantics too with different trade-offs. For > instance, one possible semantic is that X+1 is **not** a legitimate way to > access Y, even if numerically X+1 is the same as Y; in other words pointers > are more than just their numerical values. In this semantic mem2reg is > legal because no one can create a pointer to an unescaped alloca "out of > thin air" by offsetting from some *other* alloca they have access to. > However, in this semantic, the select folding is wrong -- it miscompiled and > introduced UB into a well defined program. This is always the case at user level -- user can not assume layout relationship from declaration order. Compiler, on the other hand, can do it because it is also responsible for laying out the objects. The tricky situation is that if user write code with runtime checks of relative positions of objects, when the condition is true, can compiler make such transformation that one object is accessed via base pointer of another object? I think most compiler does that -- but if AA is not maintained/updated, mis-compile can happen. > > I'm sure there are other more creative semantics with more, should I say > subtle :), tradeoffs. We just need to pick one and be consistent about it. > > > For instance, suppose we want to access locality of local objects, we will > > need to introduce an optimization pass that lays out stack object in an > > optimal way. After this is done, all the local objects will now have a fixed > > offset from the same base. AA needs to be taught to handle such scenarios > > (e.g. annotate the accesses with unique local ids). > > Are you talking about a pass that replaces all alloca's with offsets into a > giant "master" alloca? Yes, not just that. It is more common to do this for global variable layouts. Global variables can be laid out for better cache utilization; it also help reduce the number of GOT entries and GOT accesses if global accesses are commonned to the same base. > If so, in the flat-memory-model semantic this is a > no-op, but in the nonflat-memory-model semantic this is an information > losing transform. But this is correct in both the semantics. Whether the information is lost or not depends on the implementation -- it is possible to have the AA info not lost
Of the three ways we've discussed fixing this, my preference is for making CSE less powerful. This device has been used in many LLVM soundness counterexamples. It was an existential problem that we were never able to solve when designing de-virtualization during Piotr's (prazek@) internship a few years ago. If LLVM has these kinds of "based-on" aliasing rules, then we can't use the fact that "p == q" to replace q with p or p with q. That equality has to do with numeric pointer values, which we should try to keep separate from our aliasing model. I suspect that most of the value of CSE/equality propagation is from discovering that a variable is constant in some region of code (i.e. propagating null inside a null check), which probably doesn't suffer from these kinds of bugs where we replace one pointer with another valid object pointer without considering aliasing.
I would be pretty strongly against making CSE less powerful :) Besides it being very difficult (there a bunch of things that go looking for these sorts of equalities besides earlycse, gvn, and newgvn). You'd also have to give up a lot, because it's not limited to pointer equalities, it's all non-floating point equalities. I suspect you'd be giving up a tremendous amount of performance to make a very small amount of very uncommon and complex code work ;) (and that, as mentioned, doesn't work in other compilers either). I know from numbers that the equality propagation that instsimplify, instcombine, cse, and gvn provide is at least 10% of total perf on apps i looked at. I'd rather us define semantics and a well lit path here for people to accomplish what they need, and declare victory.
(In reply to Sanjoy Das from comment #18) > (In reply to Hal Finkel from comment #17) > > (we currently define the result in terms of the result of ptrtoint > > on each operand). > > That's precisely the problem though -- given what you said, the expression > is actually "q := (ptrtoint(a) != ptrtoint(b) ? b : a)", which cannot be > safely transformed to "q := b". Actually, I wonder if that's only one way to define this. The LangRef says that the comparisons are done as integers, but that does not necessarily mean that we need to define the action of the comparison directly in terms of ptrtoint. Maybe we'd like to define cmp in another way. There might be advantages to that - it would be closer, at least, to C/C++ comparisons, and we always can generate explicit ptrtoint instructions if we need those semantics.
as I talked with Reid and Richard, it seems that the devirtualization has the same problem right now: https://godbolt.org/g/LS9jc4 #include <memory> struct A { A(); virtual void foo(); }; struct B : A{ B(); virtual void foo(); }; void weird() { A *a = new A; a->foo(); A * b = new(a) B; if (a == b) b->foo(); } Because gvn will replace b with a, it will cause the wron call to be called. My question is: how important is replecement of one variable to another (non constant) based on comparision? Right now GVN just picks the variable having longer lifetime. If the solution to this is to ignore the problem, then there should not be objections to turn on devirtualization by default (as it probably have the same odds of miscompiling)
(In reply to Piotr Padlewski from comment #49) > as I talked with Reid and Richard, it seems that the devirtualization has > the same problem right now: > > https://godbolt.org/g/LS9jc4 > > #include <memory> > struct A { > A(); > virtual void foo(); > }; > > struct B : A{ > B(); > virtual void foo(); > }; > > void weird() { > A *a = new A; > a->foo(); > A * b = new(a) B; > if (a == b) > b->foo(); > } > > Because gvn will replace b with a, it will cause the wron call to be called. > > My question is: how important is replecement of one variable to another (non > constant) based on comparision? Right now GVN just picks the variable having > longer lifetime. > > If the solution to this is to ignore the problem, then there should not be > objections to turn on devirtualization by default (as it probably have the > same odds of miscompiling) Still looks like a AA problem. a->vptr field should be clobbered by the placement new call, so it should be reloaded before the second foo call.
(In reply to Piotr Padlewski from comment #49) > as I talked with Reid and Richard, it seems that the devirtualization has > the same problem right now: > > https://godbolt.org/g/LS9jc4 > > #include <memory> > struct A { > A(); > virtual void foo(); > }; > > struct B : A{ > B(); > virtual void foo(); > }; > > void weird() { > A *a = new A; > a->foo(); > A * b = new(a) B; > if (a == b) > b->foo(); > } > > Because gvn will replace b with a, it will cause the wron call to be called. > This is not the same bug at all. This is about where the lifetime of memory ends, and how that is expressed in the IR. The others are about the semantics of ptr2int and int2ptr. Here, if your IR lets GVN do this, your IR is buggy. It should either be reloading the vptr before the b->foo() call, or it should be expressing that "A" is dead (lifetime.start/end are optional, but if you used them, they would be *required* here for correctness. So the semantics would have to be modified, or new intrinsics developed). > My question is: how important is replecement of one variable to another (non > constant) based on comparision? Right now GVN just picks the variable having > longer lifetime. It's not about importance. You can't fix this by telling GVN to stop replacing perfectly legal equivalences. You have to make IR where it will conclude they aren't equivalent at this point in the first place. I think this needs a tremendous amount of emphasis. As our optimizations get more powerful, this will simply show up in more and more ways. Whatever solution chosen has to express, in IR, such that it is obvious what is legal and what is not. GVN, NewGVN, etc, depend on all sorts of underlying infrastructure that may also enable it to discover the same thing. It already does. simplifyinstruction, for example, will use computeknownbits and proving things about icmps (IE trying to create an icmp and see if it simplifies) of your variables in ways that will do the *same thing as gvn* is doing here. Trying to neuter optimizations at point-of-elimination is, IMHO, a recipe for disaster. You have to make *all things* able to conclude it's not equivalent in the first place. The only way you can do that is by expressing it in the IR.
Currently, in LLVM, pointers are clearly "magic", as in, icmp eq i8* %x, %y does not imply %x and %y are equivalent; there's also an invisible base pointer attached to any value of pointer type. This is reflected in a lot of places. noalias only makes sense with "magic" pointers (since otherwise, it's impossible to prove any relationship between the value and the marking). inbounds only makes sense with "magic" pointers (otherwise, you basically can't use it to prove anything). And alias analysis heavily depends on the assumption that it can ignore offsets in GEPs. The testcase in comment 49 is related, in that the "llvm.invariant.group.barrier" intrinsic depends on "magic" pointers: LangRef says the returned value has different aliasing semantics from the argument, even though it has the same value. This isn't to say that the rules for how exactly base pointers work are particularly clear, or even self-consistent, but whatever the exact rules are, they clearly aren't consistent with GVN generating equivalences for pointers based solely on the result of an icmp.
(In reply to Eli Friedman from comment #52) > Currently, in LLVM, pointers are clearly "magic", as in, icmp eq i8* %x, %y > does not imply %x and %y are equivalent; It implies they are value equivalent. It does not imply they are aliasing equivalent. > there's also an invisible base > pointer attached to any value of pointer type. This is reflected in a lot > of places. And is pretty easy to generate testcases that break each of these in interesting and different ways. I believe Zhendong has proven this at this point. > noalias only makes sense with "magic" pointers (since otherwise, > it's impossible to prove any relationship between the value and the > marking). You could just make it explicit, actually. Instead of an invisible/implicit thing, you can pass along a token. pointers are only equivalent if they have the same name and token. > inbounds only makes sense with "magic" pointers (otherwise, you > basically can't use it to prove anything). And alias analysis heavily > depends on the assumption that it can ignore offsets in GEPs. > > The testcase in comment 49 is related, in that the > "llvm.invariant.group.barrier" intrinsic depends on "magic" pointers: > LangRef says the returned value has different aliasing semantics from the > argument, even though it has the same value. > > This isn't to say that the rules for how exactly base pointers work are > particularly clear, or even self-consistent, but whatever the exact rules > are, they clearly aren't consistent with GVN generating equivalences for > pointers based solely on the result of an icmp. As mentioned, it's not just GVN. It's just easy to generate a testcase GVN screws up. We *really* need to start getting out of "magic" mode. You aren't going to be able to hack away all this stuff by, for example, disabling pointer equivalences in GVN. Because in turn, simplifyinstruction does this on icmp's of pointers " // Simplify comparisons of related pointers using a powerful, recursive // GEP-walk when we have target data available." while certainly, the calls in there try to be conservative to handle these cases, the point is more: building on top of a house of cards and hoping it doesn't do a certain thing is a bad play. At some point, you need to start building something that isn't a house of cards. The best way i know how to do that is to make the implicit, explicit.
"pointers are only equivalent if they have the same name and token. " same value and token, obviously
(and of course, literally nothing you do with pointers in piotr's testcase will fix anything related to this particular bug, since it's about generating equivalences from what look like perfectly normal integers)
(In reply to Eli Friedman from comment #52) > Currently, in LLVM, pointers are clearly "magic", as in, icmp eq i8* %x, %y > does not imply %x and %y are equivalent; there's also an invisible base > pointer attached to any value of pointer type. This is reflected in a lot > of places. noalias only makes sense with "magic" pointers (since otherwise, > it's impossible to prove any relationship between the value and the > marking). inbounds only makes sense with "magic" pointers (otherwise, you > basically can't use it to prove anything). And alias analysis heavily > depends on the assumption that it can ignore offsets in GEPs. > > The testcase in comment 49 is related, in that the > "llvm.invariant.group.barrier" intrinsic depends on "magic" pointers: > LangRef says the returned value has different aliasing semantics from the > argument, even though it has the same value. > > This isn't to say that the rules for how exactly base pointers work are > particularly clear, or even self-consistent, but whatever the exact rules > are, they clearly aren't consistent with GVN generating equivalences for > pointers based solely on the result of an icmp. Note that the CSEed load is tagged with the same invariant.group as the available load expression -- even though there is a dominating invariant.barrier. Removing that !invariant.group annotation in the IR, GVN produces correct result.
(In reply to Xinliang David Li from comment #56) > (In reply to Eli Friedman from comment #52) > > Currently, in LLVM, pointers are clearly "magic", as in, icmp eq i8* %x, %y > > does not imply %x and %y are equivalent; there's also an invisible base > > pointer attached to any value of pointer type. This is reflected in a lot > > of places. noalias only makes sense with "magic" pointers (since otherwise, > > it's impossible to prove any relationship between the value and the > > marking). inbounds only makes sense with "magic" pointers (otherwise, you > > basically can't use it to prove anything). And alias analysis heavily > > depends on the assumption that it can ignore offsets in GEPs. > > > > The testcase in comment 49 is related, in that the > > "llvm.invariant.group.barrier" intrinsic depends on "magic" pointers: > > LangRef says the returned value has different aliasing semantics from the > > argument, even though it has the same value. > > > > This isn't to say that the rules for how exactly base pointers work are > > particularly clear, or even self-consistent, but whatever the exact rules > > are, they clearly aren't consistent with GVN generating equivalences for > > pointers based solely on the result of an icmp. > > Note that the CSEed load is tagged with the same invariant.group as the > available load expression -- even though there is a dominating > invariant.barrier. > > > Removing that !invariant.group annotation in the IR, GVN produces correct > result. It kinda doesn't matter. Eli is right in the sense that there is implicit info in pointers right now that can generate wrong things in the face of equivalences. Even if you fix the clear bug you point out, you could still generate a testcase where gvn does the wrong thing :) Same with CVP, anything using LVI, etc. It's just "harder" to get them to optimize it. You'll note that GCC is able to propagate equivalences and get roughly all of these right. This is in part because MemorySSA in GCC being there all the time makes it easy to express the aliasing constraints in a way that nothing breaks them. But this is just convenience, we could achieve the same goal without memoryssa all the time if we wanted to, by having pointers carry along their magic instead of it being implicit. Then only a thing that understands the magic can propagate the tokens. I suspect such a change would be larger than making memoryssa exist everywhere though. (this eliminates the class of issues with using pointer values, but leaves the class of issues with pointer-in-integer form values). The larger problem with the original testcase is, even if you went as far as to introduce a pointercompare instruction, and use it on uintptr_t and real pointer types, and never get equivalences from pointercompare's, it wouldn't help unless you told people "this will only work with visible uintptr_t's, otherwise, you are boned".
(In reply to Daniel Berlin from comment #57) > (In reply to Xinliang David Li from comment #56) > > (In reply to Eli Friedman from comment #52) > > > Currently, in LLVM, pointers are clearly "magic", as in, icmp eq i8* %x, %y > > > does not imply %x and %y are equivalent; there's also an invisible base > > > pointer attached to any value of pointer type. This is reflected in a lot > > > of places. noalias only makes sense with "magic" pointers (since otherwise, > > > it's impossible to prove any relationship between the value and the > > > marking). inbounds only makes sense with "magic" pointers (otherwise, you > > > basically can't use it to prove anything). And alias analysis heavily > > > depends on the assumption that it can ignore offsets in GEPs. > > > > > > The testcase in comment 49 is related, in that the > > > "llvm.invariant.group.barrier" intrinsic depends on "magic" pointers: > > > LangRef says the returned value has different aliasing semantics from the > > > argument, even though it has the same value. > > > > > > This isn't to say that the rules for how exactly base pointers work are > > > particularly clear, or even self-consistent, but whatever the exact rules > > > are, they clearly aren't consistent with GVN generating equivalences for > > > pointers based solely on the result of an icmp. > > > > Note that the CSEed load is tagged with the same invariant.group as the > > available load expression -- even though there is a dominating > > invariant.barrier. > > > > > > Removing that !invariant.group annotation in the IR, GVN produces correct > > result. > > It kinda doesn't matter. > Eli is right in the sense that there is implicit info in pointers right now > that can generate wrong things in the face of equivalences. Even if you fix > the clear bug you point out, you could still generate a testcase where gvn > does the wrong thing :) > Any small example test case about how the magic pointer can break things? Or is it similar to the one in comment #39? Put it this way. In this case, if the 'invariant' annotation is removed, the optimizer will behave correctly -- this means that either IR is wrong, or the way how invariant groups are handled is wrong. For reference, here is the complete IR test case (build it with -gvn -S) struct.A = type { i32 (...)** } %struct.B = type { %struct.A } @_ZTV1A = available_externally unnamed_addr constant { [3 x i8*] } { [3 x i8*] [i8* null, i8* bitcast (i8** @_ZTI1A to i8*), i8* bitcast (void (%struct.A*)* @_ZN1A3fooEv to i8*)] }, align 8 @_ZTV1B = available_externally unnamed_addr constant { [3 x i8*] } { [3 x i8*] [i8* null, i8* bitcast (i8** @_ZTI1B to i8*), i8* bitcast (void (%struct.B*)* @_ZN1B3fooEv to i8*)] }, align 8 @_ZTI1A = external constant i8* @_ZTI1B = external constant i8* ;@_ZTV1A = external unnamed_addr constant { [3 x i8*] }, align 8 ;@_ZTV1B = external unnamed_addr constant { [3 x i8*] }, align 8 define void @_Z5weirdv() local_unnamed_addr #0 { %1 = tail call i8* @_Znwm(i64 8) #5 %2 = bitcast i8* %1 to %struct.A* tail call void @_ZN1AC1Ev(%struct.A* nonnull %2) #3 %3 = bitcast i8* %1 to i8*** %4 = load i8**, i8*** %3, align 8, !tbaa !5, !invariant.group !8 %5 = icmp eq i8** %4, getelementptr inbounds ({ [3 x i8*] }, { [3 x i8*] }* @_ZTV1A, i64 0, inrange i32 0, i64 2) tail call void @llvm.assume(i1 %5) %6 = bitcast i8** %4 to void (%struct.A*)** %7 = load void (%struct.A*)*, void (%struct.A*)** %6, align 8, !invariant.load !8 tail call void %7(%struct.A* nonnull %2) #3 %8 = tail call i8* @llvm.invariant.group.barrier(i8* nonnull %1) %9 = bitcast i8* %8 to %struct.B* tail call void @_ZN1BC1Ev(%struct.B* %9) #3 %10 = bitcast i8* %8 to i8*** %11 = load i8**, i8*** %10, align 8, !tbaa !5, !invariant.group !8 %12 = icmp eq i8** %11, getelementptr inbounds ({ [3 x i8*] }, { [3 x i8*] }* @_ZTV1B, i64 0, inrange i32 0, i64 2) tail call void @llvm.assume(i1 %12) %13 = icmp eq i8* %1, %8 br i1 %13, label %14, label %19 ; <label>:14: ; preds = %0 %15 = bitcast i8* %8 to %struct.A* %16 = bitcast i8* %8 to void (%struct.A*)*** %17 = load void (%struct.A*)**, void (%struct.A*)*** %16, align 8, !tbaa !5, !invariant.group !8 %18 = load void (%struct.A*)*, void (%struct.A*)** %17, align 8, !invariant.load !8 tail call void %18(%struct.A* %15) #3 br label %19 ; <label>:19: ; preds = %14, %0 ret void } ; Function Attrs: nobuiltin declare noalias nonnull i8* @_Znwm(i64) local_unnamed_addr #1 declare void @_ZN1AC1Ev(%struct.A*) unnamed_addr #2 ; Function Attrs: nounwind declare void @llvm.assume(i1) #3 ; Function Attrs: argmemonly nounwind readonly declare i8* @llvm.invariant.group.barrier(i8*) #4 declare void @_ZN1BC1Ev(%struct.B*) unnamed_addr #2 declare void @_ZN1A3fooEv(%struct.A*) unnamed_addr #2 declare void @_ZN1B3fooEv(%struct.B*) unnamed_addr #2 attributes #0 = { nounwind uwtable} attributes #1 = { nobuiltin } attributes #2 = { nounwind } attributes #3 = { nounwind } attributes #4 = { argmemonly nounwind readonly } attributes #5 = { builtin nounwind } !llvm.module.flags = !{!0, !1, !3} !llvm.ident = !{!4} !0 = !{i32 1, !"StrictVTablePointers", i32 1} !1 = !{i32 3, !"StrictVTablePointersRequirement", !2} !2 = !{!"StrictVTablePointers", i32 1} !3 = !{i32 1, !"wchar_size", i32 4} !4 = !{!"clang version 6.0.0 (trunk 312928)"} !5 = !{!6, !6, i64 0} !6 = !{!"vtable pointer", !7, i64 0} !7 = !{!"Simple C++ TBAA"} !8 = !{} > Same with CVP, anything using LVI, etc. It's just "harder" to get them to > optimize it. > > You'll note that GCC is able to propagate equivalences and get roughly all > of these right. This is in part because MemorySSA in GCC being there all > the time makes it easy to express the aliasing constraints in a way that > nothing breaks them. But this is just convenience, we could achieve the > same goal without memoryssa all the time if we wanted to, by having pointers > carry along their magic instead of it being implicit. Then only a thing > that understands the magic can propagate the tokens. > > I suspect such a change would be larger than making memoryssa exist > everywhere though. > > (this eliminates the class of issues with using pointer values, but leaves > the class of issues with pointer-in-integer form values). > > The larger problem with the original testcase is, even if you went as far as > to introduce a pointercompare instruction, and use it on uintptr_t and real > pointer types, and never get equivalences from pointercompare's, it wouldn't > help unless you told people "this will only work with visible uintptr_t's, > otherwise, you are boned".
A bunch of things worth mentioning: As Piotr says, you can get GVN to choose the other pointer just by switching the preference it has: diff --git a/lib/Transforms/Scalar/GVN.cpp b/lib/Transforms/Scalar/GVN.cpp index 593aad74bd1..8bd0ba2756a 100644 --- a/lib/Transforms/Scalar/GVN.cpp +++ b/lib/Transforms/Scalar/GVN.cpp @@ -1703,7 +1703,7 @@ bool GVN::propagateEquality(Value *LHS, Value *RHS, const BasicBlockEdge &Root, // Move the 'oldest' value to the right-hand side, using the value number // as a proxy for age. uint32_t RVN = VN.lookupOrAdd(RHS); - if (LVN < RVN) { + if (LVN >= RVN) { std::swap(LHS, RHS); LVN = RVN; } As mentioned, piotr's example happens to involve a barrier bug. Certainly you can see that you can generate a near infinite set of examples where it will propagate past implicit lifetimes that affect aliasing. (i believe on another bug, sanjoy has demonstrated large numbers of issues using free) It's also worth mentioning: The invariant barrier intrinsics are only defined to affect the invariant group info, and stop a pointer from carrying it. You should be able to drop all invariant group info and barriers and get the same issue. Here, that issue it is directly expressed in the IR, as if you remove the invariant barrier and propagate its argument, it already reuses the same pointer past the end of its lifetime, and nothing stops it.
(In reply to Daniel Berlin from comment #59) > A bunch of things worth mentioning: > > As Piotr says, you can get GVN to choose the other pointer just by switching > the preference it has: > diff --git a/lib/Transforms/Scalar/GVN.cpp b/lib/Transforms/Scalar/GVN.cpp > index 593aad74bd1..8bd0ba2756a 100644 > --- a/lib/Transforms/Scalar/GVN.cpp > +++ b/lib/Transforms/Scalar/GVN.cpp > @@ -1703,7 +1703,7 @@ bool GVN::propagateEquality(Value *LHS, Value *RHS, > const BasicBlockEdge &Root, > // Move the 'oldest' value to the right-hand side, using the value > number > // as a proxy for age. > uint32_t RVN = VN.lookupOrAdd(RHS); > - if (LVN < RVN) { > + if (LVN >= RVN) { > std::swap(LHS, RHS); > LVN = RVN; > } > > > As mentioned, piotr's example happens to involve a barrier bug. > Certainly you can see that you can generate a near infinite set of examples > where it will propagate past implicit lifetimes that affect aliasing. > (i believe on another bug, sanjoy has demonstrated large numbers of issues > using free) You mean the example in comment #40? A = malloc(); free(A); B = malloc(); *B = 0 C = select (ptrtoint(A) == ptrtoint(B)), B, A ;; S0 r = *C free(B); That program is ill formed -- There is potential use after free error (for referencing A at S0 and beyond. Making compiler to well-behave in this case is possible -- e.g, by make AA more flow sensitive (to honor the implicit barrier of 'free'), but why would we do that for ill-formed program (even though there is a runtime path that is not ill-formed)? > > It's also worth mentioning: > The invariant barrier intrinsics are only defined to affect the invariant > group info, and stop a pointer from carrying it. > You should be able to drop all invariant group info and barriers and get the > same issue. > > Here, that issue it is directly expressed in the IR, as if you remove the > invariant barrier and propagate its argument, it already reuses the same > pointer past the end of its lifetime, and nothing stops it.
The issue with invariant.group is that by replacing one pointer with another we skip the invariant.group and reference the dead pointer. By fix I meant to not perform replacement based on the comparision if the operands are not constant. dropping invariant.groups would be also valid after such optimization, but the problem here is that you can't really do that across different calls if (a == b) call(b); // this might have invariant.group loads inside. This is not Alias Analysis issue, aliasing does not do anything here - we do not replace pointer with another based on aliasing info.
(In reply to Piotr Padlewski from comment #61) > The issue with invariant.group is that by replacing one pointer with another > we skip the invariant.group and reference the dead pointer. By fix I meant > to not perform replacement based on the comparision if the operands are not > constant. This will not actually fix anything, particularly when you mix it with inttoptr, because that constant may in fact be a constant that is really a pointer :) (if you don't believe it's possible to make that bug still happen, really, just look at the original testcase and be createive). This is pretty well established by GCC bugs (and our bugs) at this point. The only thing you could do to avoid it in all cases would be to drop any equivalences gleaned from comparisons in all passes and simplifications (GCC came to the same conclusion). IMHo, this is a super-bad idea, as they are responsible for a lot of performance. As mentioned previously, if we view the "replace pointer with pointer" and "replace thing that was/becomes a pointer through inttoptr" as two different issues - the first is pretty easy to solve by either 1. passing along the implicit info so we accurately reflect the aliasing/lifetime of the pointed-to object. or 2. require memoryssa everywhere and only allow pointer variables to be propagated when memory state they depend on is the same or 3. introduce a pointercmp instruction. I'd prefer all of these over trying to special case and keep up to date every place in the compiler that gleans info on icmps such that they all special case pointer icmps. But i'm also not going to fix any of these "bugs", so i'll defer to whoever tries to fix it, as long as they do it in a way that doesn't cause massive performance regression relative to GCC (which, again, happily propagates equivalences). The latter intotptr problem (and the original testcase on this bug) is impossible to solve "in full generality" except by dropping literally all non-floating equivalences gleaned from comparisons. I can split up any pointer into as many pieces, equivalence them in ways we break right now, and then reassemble the pointer. Given how uncommon it is, i'm still of the belief that a well lit path for people to follow who want to do it. At least in the initial performance testing i did, dropping equivalences except for those involving constants was a significant loss. So i'm pretty sure "meet halfway" wouldn't work well here. (But i welcome other testing).
(In reply to Piotr Padlewski from comment #61) > The issue with invariant.group is that by replacing one pointer with another > we skip the invariant.group and reference the dead pointer. The problem is not that the compiler replaces the reference to the dead pointer. The real problem is that the original annotation based on the replaced pointer is no longer valid. In this case, it looks like the invariant property gets extended across the barrier because the invariant group annotation and a new base pointer. > By fix I meant > to not perform replacement based on the comparision if the operands are not > constant. Blocking compiler from doing optimization based on value equivalence is not acceptable, as pointed out by Danny. > > dropping invariant.groups would be also valid after such optimization, but > the problem here is that you can't really do that across different calls > > > if (a == b) > call(b); // this might have invariant.group loads inside. This does matter. The original loads in 'call' are through the function parameter which is a different IR entity. Do you see how it can gets miscompile without dropping the annotation? > > This is not Alias Analysis issue, aliasing does not do anything here - we do > not replace pointer with another based on aliasing info. It is directly related to AA. Without the wrong invariance, AA can tell that the call to B's ctor will alias with the load.
(In reply to Xinliang David Li from comment #63) > (In reply to Piotr Padlewski from comment #61) > > The issue with invariant.group is that by replacing one pointer with another > > we skip the invariant.group and reference the dead pointer. > > The problem is not that the compiler replaces the reference to the dead > pointer. The real problem is that the original annotation based on the > replaced pointer is no longer valid. In this case, it looks like the > invariant property gets extended across the barrier because the invariant > group annotation and a new base pointer. > > > > > By fix I meant > > to not perform replacement based on the comparision if the operands are not > > constant. > > Blocking compiler from doing optimization based on value equivalence is not > acceptable, as pointed out by Danny. > > > > > > dropping invariant.groups would be also valid after such optimization, but > > the problem here is that you can't really do that across different calls > > > > > > if (a == b) > > call(b); // this might have invariant.group loads inside. > > This does matter. Meant to say 'does not matter' > The original loads in 'call' are through the function > parameter which is a different IR entity. Do you see how it can gets > miscompile without dropping the annotation? > > > > > > This is not Alias Analysis issue, aliasing does not do anything here - we do > > not replace pointer with another based on aliasing info. > > > It is directly related to AA. Without the wrong invariance, AA can tell that > the call to B's ctor will alias with the load.
Okay, so i kind of need to try to push this forward a bit and see if there is any consensus. One of the things on my plate is to match the equivalence handling of gvn in newgvn, in a bunch of cases. A bunch depend on equality/inequality of variables. If we are going to rip out all the equivalence handling in the compiler to make these cases work (last i looked, it was in lvi, simplifyinstruction, gvn, and newgvn, but i didn't look harder), i don't want to spend time to do that work :) (Otherwise, i want to know where we want to go, in particular: 1. Do nothing at all. Make sad face. 2. Do nothing with equivalences, work on memory model 3. Rip out equivalences from comparisons between variables and ban them 4. Rip out all equivalence handling. Note that #3 working in all cases depends on the compiler never discovering the integer value of a pointer and propagating it. It also relies on never detecting equivalences due to, for example, equal singleton ranges. IE If you derive an equivalence from the following that the variables are equal, things break just the same: if (pi > 48 && pi < 50) if (yi > 48 && yi < 50) (pi == yi inside here) same with the anti-range version: if (pi < 48 || pi > 50) if (yi < 48 || yi > 50) (pi != yi inside here) I would not put money on us being able to prevent that from happening.
We've been working on a memory model for LLVM and GCC, and while it's not fully finished, we already have a reasonable degree of confidence in it. The changes that are required to LLVM are quite simple: - Disable blind folding of int2ptr(ptr2int(x)) -> x - Disable some type punning through load/store (implicit ptr2int casts, essentially). We don't have a great solution for this yet; we just know that what LLVM is doing right now isn't correct. (I'll skip this issue for now) In this model, you can keep propagating integer equalities as before. But you need to be careful when propagating pointer equalities, because just because "p == q", it doesn't mean that you can replace p with q or vice-versa. p,q have equal value if "p == q", but other attributes (like provenance, liveness) are not necessarily equal. A safe version for GVN (that we are benchmarking ATM) is to do the following: if (p == q) { use(p) use(q) } => v = ptr2int(p) w = ptr2int(q) if (v == w) { p' = int2ptr(v) use(p') use(p') } The casts can be elided if the elected pointer is provably dereferenceable for all memory accesses in "use(p)" and "use(q)". There are several reasons for this constraint: 1) either p,q might have been freed already and thus memory access through them is UB; 2) if p = a + n, then "p == q" yields a non-deterministic value (per the C++ semantics: http://eel.is/c++draft/expr.eq#2.1), so we cannot replace q with "a+n" because the non-deterministic value may evaluate to true. Another option for GVN is to introduce a new "union p, q" instruction and replace use(p) with use(union p, q). This new instruction would do the meet operation between p, q (think of the pointer lattice with provenances). The solution above with int2ptr/ptr2int round-trip essentially returns Top from this lattice (i.e., the returned pointer can alias with all the pointers that have been ptr2int'ed so far that appear in the control/data-flow reaching p,q). So this union instruction yields a more precise result, but I dunno how important that is in practice (under study). In summary, if we have: p = (int*)malloc(8) v = ptr2int(p) + 4; q = int2ptr(v) *q = 3; we can optimize into: p[1] = 3 This is sound because we can see through the casts that q must point to the object starting at p at that object is still alive. If we don't know anything about p, then the optimization can't be done. (GCC is more conservative than LLVM right now with respect ptr2int, but it also needs similar tweaks to become correct)
(In reply to Nuno Lopes from comment #66) > We've been working on a memory model for LLVM and GCC, and while it's not > fully finished, we already have a reasonable degree of confidence in it. > The changes that are required to LLVM are quite simple: > - Disable blind folding of int2ptr(ptr2int(x)) -> x > - Disable some type punning through load/store (implicit ptr2int casts, > essentially). We don't have a great solution for this yet; we just know that > what LLVM is doing right now isn't correct. (I'll skip this issue for now) > > In this model, you can keep propagating integer equalities as before. But > you need to be careful when propagating pointer equalities, because just > because "p == q", it doesn't mean that you can replace p with q or > vice-versa. p,q have equal value if "p == q", but other attributes (like > provenance, liveness) are not necessarily equal. > A safe version for GVN (that we are benchmarking ATM) is to do the following: > if (p == q) { > use(p) > use(q) > } > => > v = ptr2int(p) > w = ptr2int(q) > if (v == w) { > p' = int2ptr(v) > use(p') > use(p') > } > > The casts can be elided if the elected pointer is provably dereferenceable > for all memory accesses in "use(p)" and "use(q)". > There are several reasons for this constraint: 1) either p,q might have been > freed already and thus memory access through them is UB; 2) if p = a + n, > then "p == q" yields a non-deterministic value (per the C++ semantics: > http://eel.is/c++draft/expr.eq#2.1), so we cannot replace q with "a+n" > because the non-deterministic value may evaluate to true. if p points to one pass the last byte of one object and q points to another object, the comparison is UB. When it is UB, why can't compiler do anything it pleases? > > Another option for GVN is to introduce a new "union p, q" instruction and > replace use(p) with use(union p, q). This new instruction would do the meet > operation between p, q (think of the pointer lattice with provenances). The > solution above with int2ptr/ptr2int round-trip essentially returns Top from > this lattice (i.e., the returned pointer can alias with all the pointers > that have been ptr2int'ed so far that appear in the control/data-flow > reaching p,q). > So this union instruction yields a more precise result, but I dunno how > important that is in practice (under study). I think you hit the essence of the issue -- it is not GVN, but lack of IR support to update static properties/attributes. Your first suggestion to use int2ptr/ptr2int round trip is like 'giving up' those properties completely. The 'union' instruction idea is more precise but too heavy weight. There might be other ways to merge value properties .. > > In summary, if we have: > p = (int*)malloc(8) > v = ptr2int(p) + 4; > q = int2ptr(v) > *q = 3; > > we can optimize into: > p[1] = 3 > > This is sound because we can see through the casts that q must point to the > object starting at p at that object is still alive. If we don't know > anything about p, then the optimization can't be done. > > (GCC is more conservative than LLVM right now with respect ptr2int, but it > also needs similar tweaks to become correct)
(In reply to David (Xinliang) Li from comment #67) > > The casts can be elided if the elected pointer is provably dereferenceable > > for all memory accesses in "use(p)" and "use(q)". > > There are several reasons for this constraint: 1) either p,q might have been > > freed already and thus memory access through them is UB; 2) if p = a + n, > > then "p == q" yields a non-deterministic value (per the C++ semantics: > > http://eel.is/c++draft/expr.eq#2.1), so we cannot replace q with "a+n" > > because the non-deterministic value may evaluate to true. > > if p points to one pass the last byte of one object and q points to another > object, the comparison is UB. When it is UB, why can't compiler do anything > it pleases? I thought http://eel.is/c++draft/expr.eq#2.1 said it was unspecified, but I may have misunderstood. In any case, we could probably define it that way, but then icmp instructions would have UB in some cases and we won't be able to freely move them (trivially hoist out of loops etc.). Same situation if we have it return poison or something like poison.
(In reply to Sanjoy Das from comment #68) > I thought http://eel.is/c++draft/expr.eq#2.1 said it was unspecified, but I > may > have misunderstood. > > In any case, we could probably define it that way, but then icmp instructions > would have UB in some cases and we won't be able to freely move them > (trivially > hoist out of loops etc.). Same situation if we have it return poison or > something like poison. I agree, I don't think we can make comparisons of pointers to different objects UB, the standard says that the result is "unspecified". Consider that std::map<T*> needs pointers to separate objects to be comparable. It uses std::less, but in practice that lowers to LLVM pointer comparisons.
(In reply to Reid Kleckner from comment #69) > (In reply to Sanjoy Das from comment #68) > > I thought http://eel.is/c++draft/expr.eq#2.1 said it was unspecified, but I > > may > > have misunderstood. > > > > In any case, we could probably define it that way, but then icmp instructions > > would have UB in some cases and we won't be able to freely move them > > (trivially > > hoist out of loops etc.). Same situation if we have it return poison or > > something like poison. > > I agree, I don't think we can make comparisons of pointers to different > objects UB, the standard says that the result is "unspecified". Consider > that std::map<T*> needs pointers to separate objects to be comparable. It > uses std::less, but in practice that lowers to LLVM pointer comparisons. It does not say comparison of pointers to different objects is unspecified, but comparison of pointer to one complete object with another pointer representing address *one past the last* element of a different complete object unspecified.
(In reply to Nuno Lopes from comment #66) > We've been working on a memory model for LLVM and GCC, and while it's not > fully finished, we already have a reasonable degree of confidence in it. > The changes that are required to LLVM are quite simple: > - Disable blind folding of int2ptr(ptr2int(x)) -> x > - Disable some type punning through load/store (implicit ptr2int casts, > essentially). We don't have a great solution for this yet; we just know that > what LLVM is doing right now isn't correct. (I'll skip this issue for now) > > In this model, you can keep propagating integer equalities as before. But > you need to be careful when propagating pointer equalities, because just > because "p == q", it doesn't mean that you can replace p with q or > vice-versa. p,q have equal value if "p == q", but other attributes (like > provenance, liveness) are not necessarily equal. > A safe version for GVN (that we are benchmarking ATM) is to do the following: > if (p == q) { > use(p) > use(q) > } > => > v = ptr2int(p) > w = ptr2int(q) > if (v == w) { > p' = int2ptr(v) > use(p') > use(p') > } As a technical matter, I'm strongly opposed to any transformations introducing new ptr2int/int2ptr instructions. Unless, that is, we teach AA to deal with them and also deal with the associated integer arithmetic as pointer arithmetic. If we want to go down this route nonetheless, we could use some new instruction that returned some kind of opaque value. The key here is to prevent the introduction of arithmetic on these values and allow us to teach AA about them. This, in some sense, may be equivalent to the union-instruction idea you mentioned, but using an implicit, instead of explicit, representation.
So, in this model, does this literally mean if the equivalence is not between two pointer type arguments, i can use it? (IE if i ignore equivalences where Op1->getType()->isPointerTy() && Op2->getType()->isPointerTy(), is that sufficient, or am i misunderstanding?)
(In reply to Sanjoy Das from comment #68) > (In reply to David (Xinliang) Li from comment #67) > > > The casts can be elided if the elected pointer is provably dereferenceable > > > for all memory accesses in "use(p)" and "use(q)". > > > There are several reasons for this constraint: 1) either p,q might have been > > > freed already and thus memory access through them is UB; 2) if p = a + n, > > > then "p == q" yields a non-deterministic value (per the C++ semantics: > > > http://eel.is/c++draft/expr.eq#2.1), so we cannot replace q with "a+n" > > > because the non-deterministic value may evaluate to true. > > > > if p points to one pass the last byte of one object and q points to another > > object, the comparison is UB. When it is UB, why can't compiler do anything > > it pleases? > > I thought http://eel.is/c++draft/expr.eq#2.1 said it was unspecified, but I > may > have misunderstood. It says that. So we can do what we wish. It would be reasonable, I think, to always fold the comparisons to false. > > In any case, we could probably define it that way, but then icmp instructions > would have UB in some cases and we won't be able to freely move them > (trivially > hoist out of loops etc.). Same situation if we have it return poison or > something like poison. This is not a binary thing. It is UB in the standard, but I suspect that we can define the behavior in this case in a somewhat more benign way.
(In reply to Daniel Berlin from comment #72) > So, in this model, does this literally mean if the equivalence is not > between two pointer type arguments, i can use it? > > (IE if i ignore equivalences where Op1->getType()->isPointerTy() && > Op2->getType()->isPointerTy(), is that sufficient, or am i misunderstanding?) Yes, that's correct. (well, not sure about the && part, feels like it should be ||, but probably you cannot propagate equivalences between values of different types?) That was one of the design goals: don't make the common case (propagate/fold integers) more complicated. (In reply to Hal Finkel from comment #73) > > > if p points to one pass the last byte of one object and q points to another > > > object, the comparison is UB. When it is UB, why can't compiler do anything > > > it pleases? > > > > I thought http://eel.is/c++draft/expr.eq#2.1 said it was unspecified, but I > > may > > have misunderstood. > > It says that. So we can do what we wish. It would be reasonable, I think, to > always fold the comparisons to false. We can *try* to fold it to false whenever we spot this case, but we cannot guarantee that it will always return false. Otherwise we would need to instrument the code to detect this case and return false (since otherwise the comparison in hardware will return true because we'll do a simple integer comparison). By defining the comparison of p+n==q to return a non-deterministic value, it allows the compiler to fold the comparison to false, and the hardware to return true or false. (In reply to Hal Finkel from comment #71) > As a technical matter, I'm strongly opposed to any transformations > introducing new ptr2int/int2ptr instructions. Unless, that is, we teach AA > to deal with them and also deal with the associated integer arithmetic as > pointer arithmetic. I agree with that: we should avoid introducing ptr2int, since it's a very strong construct. We are working on a patch that includes improvements to AA to support ptr2int. We'll share the code & benchmarks results in a few weeks hopefully. > If we want to go down this route nonetheless, we could use some new > instruction that returned some kind of opaque value. The key here is to > prevent the introduction of arithmetic on these values and allow us to teach > AA about them. Funny, we have been thinking about opaque values as well :) We don't have a patch ready for this yet nor a formal model of it yet. We'll share more details about it later.
(In reply to Nuno Lopes from comment #74) > (In reply to Daniel Berlin from comment #72) > > So, in this model, does this literally mean if the equivalence is not > > between two pointer type arguments, i can use it? > > > > (IE if i ignore equivalences where Op1->getType()->isPointerTy() && > > Op2->getType()->isPointerTy(), is that sufficient, or am i misunderstanding?) > > Yes, that's correct. (well, not sure about the && part, feels like it should > be ||, but probably you cannot propagate equivalences between values of > different types?) > That was one of the design goals: don't make the common case (propagate/fold > integers) more complicated. > Thanks, that's very helpful to know where to go. As for && vs ||, icmp requires types of operands both be the same anyway. So it would actually suffice to just check one :)
The comparision between pointers from different array is Unspecified Behavior, not UB. I spoked about it with Richard a few days ago and he said that they changed wording in a new standard, but the intention was always to have it as Unspecified Behavior. It also just makes sense, compiler should not be legal to insert unreachable when it finds out that pointers from different arrays are compared.
(In reply to Nuno Lopes from comment #74) > (In reply to Daniel Berlin from comment #72) > > So, in this model, does this literally mean if the equivalence is not > > between two pointer type arguments, i can use it? > > > > (IE if i ignore equivalences where Op1->getType()->isPointerTy() && > > Op2->getType()->isPointerTy(), is that sufficient, or am i misunderstanding?) > > Yes, that's correct. (well, not sure about the && part, feels like it should > be ||, but probably you cannot propagate equivalences between values of > different types?) > That was one of the design goals: don't make the common case (propagate/fold > integers) more complicated. > > > (In reply to Hal Finkel from comment #73) > > > > if p points to one pass the last byte of one object and q points to another > > > > object, the comparison is UB. When it is UB, why can't compiler do anything > > > > it pleases? > > > > > > I thought http://eel.is/c++draft/expr.eq#2.1 said it was unspecified, but I > > > may > > > have misunderstood. > > > > It says that. So we can do what we wish. It would be reasonable, I think, to > > always fold the comparisons to false. > > We can *try* to fold it to false whenever we spot this case, but we cannot > guarantee that it will always return false. That's what I meant. > Otherwise we would need to > instrument the code to detect this case and return false (since otherwise > the comparison in hardware will return true because we'll do a simple > integer comparison). > By defining the comparison of p+n==q to return a non-deterministic value, it > allows the compiler to fold the comparison to false, and the hardware to > return true or false. Exactly. This is what I'd prefer. This may be a nice thing we can't have, however. My concern is that there's some reason we can't do that and still end up with a model that makes sense. To put it another way, if I have a comparison that gives one answer to some observers and other answers to other observers, there may be nothing that prevents that logical contradiction from infecting the remainder of the execution. Moreover, there may be no reasonably way to program the remainder of the program such that actual UB is avoided in the presence of arbitrary logical contractions. I'll also note that there is some related history here. In lib/Analysis/InstructionSimplify.cpp we have: // A significant optimization not implemented here is assuming that alloca // addresses are not equal to incoming argument values. They don't *alias*, // as we say, but that doesn't mean they aren't equal, so we take a // conservative approach. // // This is inspired in part by C++11 5.10p1: // "Two pointers of the same type compare equal if and only if they are both // null, both point to the same function, or both represent the same // address." // // This is pretty permissive. // // It's also partly due to C11 6.5.9p6: // "Two pointers compare equal if and only if both are null pointers, both are // pointers to the same object (including a pointer to an object and a // subobject at its beginning) or function, both are pointers to one past the // last element of the same array object, or one is a pointer to one past the // end of one array object and the other is a pointer to the start of a // different array object that happens to immediately follow the first array // object in the address space.) // // C11's version is more restrictive, however there's no reason why an argument // couldn't be a one-past-the-end value for a stack object in the caller and be // equal to the beginning of a stack object in the callee. // // If the C and C++ standards are ever made sufficiently restrictive in this // area, it may be possible to update LLVM's semantics accordingly and reinstate // this optimization. static Constant * computePointerICmp(const DataLayout &DL, const TargetLibraryInfo *TLI, ... // Various optimizations for (in)equality comparisons. if (Pred == CmpInst::ICMP_EQ || Pred == CmpInst::ICMP_NE) { // Different non-empty allocations that exist at the same time have // different addresses (if the program can tell). Global variables always // exist, so they always exist during the lifetime of each other and all // allocas. Two different allocas usually have different addresses... // // However, if there's an @llvm.stackrestore dynamically in between two // allocas, they may have the same address. It's tempting to reduce the // scope of the problem by only looking at *static* allocas here. That would // cover the majority of allocas while significantly reducing the likelihood // of having an @llvm.stackrestore pop up in the middle. However, it's not // actually impossible for an @llvm.stackrestore to pop up in the middle of // an entry block. Also, if we have a block that's not attached to a // function, we can't tell if it's "static" under the current definition. // Theoretically, this problem could be fixed by creating a new kind of // instruction kind specifically for static allocas. Such a new instruction // could be required to be at the top of the entry block, thus preventing it // from being subject to a @llvm.stackrestore. Instcombine could even // convert regular allocas into these special allocas. It'd be nifty. // However, until then, this problem remains open. // // So, we'll assume that two non-empty allocas have different addresses // for now. // // With all that, if the offsets are within the bounds of their allocations // (and not one-past-the-end! so we can't use inbounds!), and their // allocations aren't the same, the pointers are not equal. // // Note that it's not necessary to check for LHS being a global variable // address, due to canonicalization and constant folding. ... > > > (In reply to Hal Finkel from comment #71) > > As a technical matter, I'm strongly opposed to any transformations > > introducing new ptr2int/int2ptr instructions. Unless, that is, we teach AA > > to deal with them and also deal with the associated integer arithmetic as > > pointer arithmetic. > > I agree with that: we should avoid introducing ptr2int, since it's a very > strong construct. We are working on a patch that includes improvements to AA > to support ptr2int. We'll share the code & benchmarks results in a few weeks > hopefully. Different topic, but my immediate question is going to be: can we canonicalize away from the ptr2int/int2ptr instead of teaching AA about them? > > > > If we want to go down this route nonetheless, we could use some new > > instruction that returned some kind of opaque value. The key here is to > > prevent the introduction of arithmetic on these values and allow us to teach > > AA about them. > > Funny, we have been thinking about opaque values as well :) > We don't have a patch ready for this yet nor a formal model of it yet. We'll > share more details about it later.
(In reply to Piotr Padlewski from comment #76) > The comparision between pointers from different array is Unspecified > Behavior, not UB. I spoked about it with Richard a few days ago and he said > that they changed wording in a new standard, but the intention was always to > have it as Unspecified Behavior. It also just makes sense, compiler should > not be legal to insert unreachable when it finds out that pointers from > different arrays are compared. You're right. I am not sure that unspecified behavior is strong enough to let us fold these comparisons away.
(In reply to Hal Finkel from comment #78) > (In reply to Piotr Padlewski from comment #76) > > The comparision between pointers from different array is Unspecified > > Behavior, not UB. I spoked about it with Richard a few days ago and he said > > that they changed wording in a new standard, but the intention was always to > > have it as Unspecified Behavior. It also just makes sense, compiler should > > not be legal to insert unreachable when it finds out that pointers from > > different arrays are compared. > > You're right. I am not sure that unspecified behavior is strong enough to > let us fold these comparisons away. It does not. By this standard says "language does not give you guarantee that things will be in right order in memory so that the comparision will always produce the same value". It could not be Implementation Defined, because most of the impmentation does not give this guarantee either.
> > Otherwise we would need to > > instrument the code to detect this case and return false (since otherwise > > the comparison in hardware will return true because we'll do a simple > > integer comparison). > > By defining the comparison of p+n==q to return a non-deterministic value, it > > allows the compiler to fold the comparison to false, and the hardware to > > return true or false. > > Exactly. This is what I'd prefer. This may be a nice thing we can't have, > however. My concern is that there's some reason we can't do that and still > end up with a model that makes sense. To put it another way, if I have a > comparison that gives one answer to some observers and other answers to > other observers, there may be nothing that prevents that logical > contradiction from infecting the remainder of the execution. Moreover, there > may be no reasonably way to program the remainder of the program such that > actual UB is avoided in the presence of arbitrary logical contractions. (...) This is pretty cool info; thanks for the pointers. So far I was focusing only on C++ standard, hoping that the C standard would be equal, but apparently not. I agree with your points regarding not being possible to fold comparisons to false if you read the C++ standard to the letter. It's funny that both GCC and LLVM ignore the standard, but MSVC and ICC do not: https://godbolt.org/g/Xbybrz IMHO, there should be a difference between comparing pointers and pointers casted to integer. And there is in fact. If we want a memory model based on objects and so on, it makes sense to yield false if we compare 2 different objects. > > (In reply to Hal Finkel from comment #71) > > > As a technical matter, I'm strongly opposed to any transformations > > > introducing new ptr2int/int2ptr instructions. Unless, that is, we teach AA > > > to deal with them and also deal with the associated integer arithmetic as > > > pointer arithmetic. > > > > I agree with that: we should avoid introducing ptr2int, since it's a very > > strong construct. We are working on a patch that includes improvements to AA > > to support ptr2int. We'll share the code & benchmarks results in a few weeks > > hopefully. > > Different topic, but my immediate question is going to be: can we > canonicalize away from the ptr2int/int2ptr instead of teaching AA about them? I only have 2 alternatives at the moment: ptr2int or introduce this new union instruction. We will benchmark both and then report back. There's no way of escaping some sort of int2ptr instruction, because it exists to widen the provenance of pointers: normal pointers can only point to one object, while pointers returned by int2ptr may alias 1, 2, or more. Whatever way we "fix" this, we will always want/need to teach AA about this multiple provenance of some pointers, but that still don't alias with everybody.
(In reply to Hal Finkel from comment #78) > (In reply to Piotr Padlewski from comment #76) > > The comparision between pointers from different array is Unspecified > > Behavior, not UB. I spoked about it with Richard a few days ago and he said > > that they changed wording in a new standard, but the intention was always to > > have it as Unspecified Behavior. It also just makes sense, compiler should > > not be legal to insert unreachable when it finds out that pointers from > > different arrays are compared. > > You're right. I am not sure that unspecified behavior is strong enough to > let us fold these comparisons away. Reading the C++17 standard again, I think it is: http://eel.is/c++draft/defns.unspecified unspecified behavior: behavior, for a well-formed program construct and correct data, that depends on the implementation. (my translation: you can return any value; but you cannot trigger UB otherwise it wouldn't be well-formed) http://eel.is/c++draft/expr.eq#2.1 (p == q+n) is unspecified Since it's up to the implementation, I can define the semantics to be "return true/false arbitrarily". We cannot return undef, since it can take different values on each use, but we can return a non-deterministic value (i.e., freeze(poison) in our model). The caveat is that theoretically we need to be careful when duplicating icmps. If the C++ program has a single comparison and then we duplicate it in the IR for some reason, then both of the comparisons would have to return the same value. So if we fold one of the icmps, we would need to guarantee that we will fold all the duplicates. It probably happens in practice (why wouldn't it?). I agree it's a potentially risky thing, but I wouldn't claim we cannot fold the comparisons to false.
(In reply to Nuno Lopes from comment #81) > (In reply to Hal Finkel from comment #78) > > (In reply to Piotr Padlewski from comment #76) > > > The comparision between pointers from different array is Unspecified > > > Behavior, not UB. I spoked about it with Richard a few days ago and he said > > > that they changed wording in a new standard, but the intention was always to > > > have it as Unspecified Behavior. It also just makes sense, compiler should > > > not be legal to insert unreachable when it finds out that pointers from > > > different arrays are compared. > > > > You're right. I am not sure that unspecified behavior is strong enough to > > let us fold these comparisons away. > > Reading the C++17 standard again, I think it is: > http://eel.is/c++draft/defns.unspecified > unspecified behavior: behavior, for a well-formed program construct and > correct data, that depends on the implementation. > > (my translation: you can return any value; but you cannot trigger UB > otherwise it wouldn't be well-formed) > > > http://eel.is/c++draft/expr.eq#2.1 > (p == q+n) is unspecified > > Since it's up to the implementation, I can define the semantics to be > "return true/false arbitrarily". We cannot return undef, since it can take > different values on each use, but we can return a non-deterministic value > (i.e., freeze(poison) in our model). > The caveat is that theoretically we need to be careful when duplicating > icmps. If the C++ program has a single comparison and then we duplicate it > in the IR for some reason, then both of the comparisons would have to return > the same value. So if we fold one of the icmps, we would need to guarantee > that we will fold all the duplicates. It probably happens in practice (why > wouldn't it?). I agree it's a potentially risky thing, but I wouldn't claim > we cannot fold the comparisons to false. Yes sorry, you are right, I was thinking of deleting the compare instead of folding. If the optimizer figures out the value of address, then I don't see why it would not be able to fold the compare to constant.
(In reply to Nuno Lopes from comment #81) > (In reply to Hal Finkel from comment #78) > > (In reply to Piotr Padlewski from comment #76) > > > The comparision between pointers from different array is Unspecified > > > Behavior, not UB. I spoked about it with Richard a few days ago and he said > > > that they changed wording in a new standard, but the intention was always to > > > have it as Unspecified Behavior. It also just makes sense, compiler should > > > not be legal to insert unreachable when it finds out that pointers from > > > different arrays are compared. > > > > You're right. I am not sure that unspecified behavior is strong enough to > > let us fold these comparisons away. > > Reading the C++17 standard again, I think it is: > http://eel.is/c++draft/defns.unspecified > unspecified behavior: behavior, for a well-formed program construct and > correct data, that depends on the implementation. > > (my translation: you can return any value; but you cannot trigger UB > otherwise it wouldn't be well-formed) > > > http://eel.is/c++draft/expr.eq#2.1 > (p == q+n) is unspecified > > Since it's up to the implementation, I can define the semantics to be > "return true/false arbitrarily". We cannot return undef, since it can take > different values on each use, but we can return a non-deterministic value > (i.e., freeze(poison) in our model). > The caveat is that theoretically we need to be careful when duplicating > icmps. If the C++ program has a single comparison and then we duplicate it > in the IR for some reason, then both of the comparisons would have to return > the same value. So if we fold one of the icmps, we would need to guarantee > that we will fold all the duplicates. This is exactly the thing that I don't know how to guarantee, at least not without making them strictly no-duplicate (which would be far worse than not folding in this case). I could have the icmp in a function that is called in a loop. We could then unroll the loop and only inline some of the call sites. Maybe for the inlined call sites we fold the comparison, and for the others we don't. There seem to be lots of inlining/unrolling situations in which this could happen. Maybe we should do this anyway, but I think some thought will be required to define what this means semantically. > It probably happens in practice (why > wouldn't it?). I agree it's a potentially risky thing, but I wouldn't claim > we cannot fold the comparisons to false.
(In reply to Hal Finkel from comment #83) > > It probably happens in practice (why > > wouldn't it?). I agree it's a potentially risky thing, but I wouldn't claim > > we cannot fold the comparisons to false. Going back a step: why do we care about folding "p+n==q" to false? Does this pattern come up often enough that constant folding it is necessary?
(In reply to Sanjoy Das from comment #84) > (In reply to Hal Finkel from comment #83) > > > It probably happens in practice (why > > > wouldn't it?). I agree it's a potentially risky thing, but I wouldn't claim > > > we cannot fold the comparisons to false. > > Going back a step: why do we care about folding "p+n==q" to false? Does this > pattern come up often enough that constant folding it is necessary? I don't think we do. I think the question is: if I have pointers p and q, and p is based on a and q is based on b, and I have p == q, can I fold that to false? If I need to prove first that p != a + n and q != b + m, then it becomes harder.
(In reply to Hal Finkel from comment #85) > (In reply to Sanjoy Das from comment #84) > > (In reply to Hal Finkel from comment #83) > > > > It probably happens in practice (why > > > > wouldn't it?). I agree it's a potentially risky thing, but I wouldn't claim > > > > we cannot fold the comparisons to false. > > > > Going back a step: why do we care about folding "p+n==q" to false? Does this > > pattern come up often enough that constant folding it is necessary? > > I don't think we do. I think the question is: if I have pointers p and q, > and p is based on a and q is based on b, and I have p == q, can I fold that > to false? If I need to prove first that p != a + n and q != b + m, then it > becomes harder. (I feel like I may be rehashing things that have already been settled, please let me know if that's the case.) That's what I was asking though -- are there cases where folding "p == q" to "false" (assuming we can prove based_on(p) != based_on(q)) is important? Why can't we say that since we can't do provenance based comparisons at runtime, pointer comparisons are equivalent to comparing their bitwise representations at runtime? This prevents propagateEquality on pointer types though, but it looks like we're fine with that?
> By defining the comparison of p+n==q to return a non-deterministic value, it > allows the compiler to fold the comparison to false, and the hardware to return > true or false. > Going back a step: why do we care about folding "p+n==q" to false? Does this > pattern come up often enough that constant folding it is necessary? I tried to catch LLVM in the act of making pointer comparison non-deterministic, but actually it seems to try hard to not do so. In particular, while it will fold "p+n == q" to false if that's the only use of p and qo, it no longer does so if p or q are escaped: <https://godbolt.org/g/RD91og>. So, is this "p+n == q --> false" actually handled similar to foldAllocaCmp? (I've always been wondering why that one should only handle alloca, anyway.)
(In reply to Sanjoy Das from comment #86) > (In reply to Hal Finkel from comment #85) > > (In reply to Sanjoy Das from comment #84) > > > (In reply to Hal Finkel from comment #83) > > > > > It probably happens in practice (why > > > > > wouldn't it?). I agree it's a potentially risky thing, but I wouldn't claim > > > > > we cannot fold the comparisons to false. > > > > > > Going back a step: why do we care about folding "p+n==q" to false? Does this > > > pattern come up often enough that constant folding it is necessary? > > > > I don't think we do. I think the question is: if I have pointers p and q, > > and p is based on a and q is based on b, and I have p == q, can I fold that > > to false? If I need to prove first that p != a + n and q != b + m, then it > > becomes harder. > > (I feel like I may be rehashing things that have already been settled, > please let me know if that's the case.) > > That's what I was asking though -- are there cases where folding "p == q" to > "false" (assuming we can prove based_on(p) != based_on(q)) is important? Are you assuming you can never determine q is a constant? IE these are always variable pointers? Or are you also considering the case where q is constant? Because the obvious answer when q is constant nullptr is "null pointer checks". Past that, this is generally how value profiling would work to remove aliasing, etc, so we should be careful there is some way to make that work (IE comparison of p or q with constant pointers or ranges, usually) Past all that, when p and q is "guaranteed" to be non-constant, yeah, don't care in practice.
For reference, these slides give an illustrated explanation on this bug: http://llvm.org/devmtg/2018-04/slides/Lopes-Sbirlea-Pointers,%20Alias%20and%20ModRef%20Analyses.pdf
(In reply to Xinliang David Li from comment #58) > Any small example test case about how the magic pointer can break things? There is now a bunch of examples in bug 44313 and bug 44374. Mostly they provide full simple examples for aspects already touched in this bug in some way or another. But the example with `restrict` adds a surprising twist to the whole mess. (In reply to Ralf Jung from comment #87) > I tried to catch LLVM in the act of making pointer comparison > non-deterministic, but actually it seems to try hard to not do so. It did in the past but that was fixed in 2014 -- see bug 13507 and bug 21327. (In reply to Piotr Padlewski from comment #49) > as I talked with Reid and Richard, it seems that the devirtualization has > the same problem right now: > > https://godbolt.org/g/LS9jc4 It seems there was an effort[1][2] to fix the problem as it applies to devirtualization independently from the general case, and the particular testcase was fixed between clang 6 and 7. I've just filed bug 44472 with new testcases. [1] https://docs.google.com/document/d/16GVtCpzK8sIHNc2qZz6RN8amICNBtvjWUod2SujZVEo [2] https://www.youtube.com/watch?v=Dt4UehzzcsE
(In reply to Alexander Cherepanov from comment #90) > (In reply to Xinliang David Li from comment #58) > > Any small example test case about how the magic pointer can break things? > > There is now a bunch of examples in bug 44313 and bug 44374. Mostly they > provide full simple examples for aspects already touched in this bug in some > way or another. But the example with `restrict` adds a surprising twist to > the whole mess. > There was also another bug that involves restrict and int-ptr casting that were reported by us: https://bugs.llvm.org/show_bug.cgi?id=39846 Conclusion was that we need an intrinsic something like @llvm.ptr.also_derived_from(base, a, b, c, ...) that can be used for safely replacing a pointer based on pointer equality. https://bugs.llvm.org/show_bug.cgi?id=39846#c8 That was on my to-do list, but I was busy with all other things including development of alive2 with Nuno (https://github.com/aliveToolkit/alive2 ), so its priority was low. Actually, what we've recently found with alive2 was that it is safe to replace a pointer with another in certain cases, e.g. define i8* @f(i8* nocapture %p, i8* %q) { if (%p == %q) ret %p // this can be optimized to ret %q ret %q } (alive2 is still in progress; it does not support inttoptr and noalias yet, so wouldn't check your bug and this inttoptr bug, but I'll be happy if this tool can help determine when it's safe to replace pointer in the future). > (In reply to Ralf Jung from comment #87) > > I tried to catch LLVM in the act of making pointer comparison > > non-deterministic, but actually it seems to try hard to not do so. > > It did in the past but that was fixed in 2014 -- see bug 13507 and bug 21327. > This is very interesting. So LLVM (at least in 2014) regards such behavior as buggy.
There are many thoughtful and thought-provoking comments here, thanks for this! But one comment has far-reaching consequences and especially resonated with my older thoughts: (In reply to Hal Finkel from comment #77) > To put it another way, if I have a comparison that gives one answer to some > observers and other answers to other observers, there may be nothing that > prevents that logical contradiction from infecting the remainder of the > execution. This is a quite powerful principle. 1. An illustration for its application to comparisons of the form `&x == &y + 1` in gcc should be here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502#c42. If clang is ever modified to make such comparisons unstable again I guess it would be applicable to clang too. (But it's straightforward to demonstrate the same with bug 44342 for now if anyone wants.) 2. Is it possible to have unfrozen `undef` at all? See bug 44512.
I came here from https://blog.regehr.org/archives/1737 and can maybe add another perspective. It seems that the bug is the result of mixing two concepts of a pointer. First, as a symbolic reference to another value, second as the bit-pattern of a memory address. The former could also be represented as a tuple (baseptr,offset), the baseptr being the pointer's provenance. Pointers with different provenance cannot alias. The problem being that there exist identical address bit-patterns that represent different logical pointers, namely the address of the pointer one past the baseptr's memory allocation can be another allocation.[*] In the model of pointer as a logical reference, ptr2int is an implementation-defined non-injective (non-surjective) functions. That is, p!=q does not imply ptr2int(p)!=ptr2int(q) because of the validity of one-past-the-end pointers. Therefore, int2ptr is ambiguous: we cannot infer an unique (baseptr,offset) from the integer value. The conversion int2ptr(ptr2int(p)) therefore is 'lossy'. AliasAnalysis is linked to pointer comparison in that if we can deduce `p+n < q || q+n < p` to be unconditionally false, then they do not not alias. I hope this formalizes a bit what is also called 'magic', 'opaque' and 'non-deterministic' in this thread. Probably it's not just me who got confused. Clearly, we want the transformation `(a != b ? b : a) == b` (comment #17) to eventually take place on the address bit-pattern. I don't see why the final program needs to run slower on pointers than on integers. Note that, on the IR level, this mixes-up provenance regardless whether a or be are the result of int2ptr (comment #43), so a solution that only handles int2ptr will be insufficient, at least for front-end languages that don't make pointer comparison with different provenance undefined. I can think of two solutions: 1. We introduce two phases into the pass pipeline. In the first phase, pointers are logical references with provenance. The identity of pointers must be kept distinct unless it can be shown that they have the same provenance, especially within GVN. Operations such as `q - p` are undefined (poison?) if q and p have different provenance. In the second phase, pointers are assumed to be address bit-patterns, i.e. ptr2int is assumed on any pointers. This is a kind of lowering. AliasAnalysis may not assume provenance anymore. 2. We track provenance explicitly (comment #54), e.g. using metadata inserted by e.g. the front-end: %x = alloca !provenance !0 %add.ptr = getelementptr inbounds [1 x i32], [1 x i32]* %y, i64 0, i64 1 !provenance !1 %cmp = icmp ne i32* %x, %add.ptr %add.ptr.x = select i1 %cmp, i32* %add.ptr, i32* %x Since provenance is language concept, it makes sense to not just assume the same thing in the IR, especially since there seem to be differences in C++ and C (comment #77). If my interpretation is correct, the icmp above is required to be an integer comparison in C11, but can be assumed to be false in C++. In any case, the provenance of %add.ptr.x as to be cleared ("assumed unknown") since it merges pointers of different provenance to a single pointer. Alternatively, provenance could be a set. Stripping all the provenance matadata would be equivalent to entering phase two of the previous solution. Unfortunately, this solution increases the overhead into the direction of ScopedAA. Note that there is the special case of std::less: To make std::set/std::map work, it must be defined on pointers of different provenance. https://en.cppreference.com/w/cpp/utility/functional/less says: > If the function call operator of the specialization std::less<void> calls a built-in operator comparing pointers, it yields a strict total order even if the built-in operator< does not. In libc++, I don't see any special mechanism to make on all pointers (https://github.com/llvm/llvm-project/blob/master/libcxx/include/__functional_base#L54). This case might be interesting: https://godbolt.org/z/mU5nZp int x[1], y[1]; std::set<int*> MySet; MySet.insert(&x[1]); MySet.insert(&y[0]); printf("count = %d\n", (int)MySet.size()); [*] We ignore that within a segmented memory model, or virtual page tables, multiple virtual memory addresses can be mapped to the same physical memory.
Note that to just allocate another byte past the last element of arrays (comment #30) for stack variables and globals (and malloc implementations, in particular BumpPtrAllocator) would probably be easiest: It makes the int2ptr function injective. Similar the reason why the size of an empty struct is not zero and malloc(0) returns an unique allocation.
Michael, my guess is that adding a byte (which of course becomes more than a byte when there are alignment requirements) is a showstopper for memory-constrained embedded use cases. Also note that (I think!) this would have to be done even for stack variables and globals that are not arrays.
Just mentioning it would be easiest and there is precedence. For stack variables, I initially expected it to only necessary whenever there is pointer arithmetic involving non-constants which is only common for e.g. buffers. However, you're right it might be necessary for any captured variable; especially in C where there a pointer of a scalar variable is an array of size one and pass-by-reference is done by passing the pointer. There is an aspect of AliasAnalysis that I did not consider: it might only care about dereferenceable pointers. Since the one-past-the-last pointer is not dereferenceable, int2ptr is injective over the set of derefenceable pointers. However, in the original test case, AliasAnalysis is queried about &x and one-past &y. &x is dereferenceable, but one-past &y is not which BasicAA does not consider. Not sure whether LLVM's AliasAnalysis interface is only defined for valid pointer representations (ie including one-past-the-last) of just dereferenceable pointers. The latter would be hard for analysis since one had to proof whether a pointer is dereferenceable before an AA query.
> Note that to just allocate another byte past the last element of arrays (comment #30) for stack variables and globals (and malloc implementations, in particular BumpPtrAllocator) would probably be easiest: It makes the int2ptr function injective. Similar the reason why the size of an empty struct is not zero and malloc(0) returns an unique allocation. `getelementptr` without `inbounds` is allowed to create pointers that are more than "one past the end", and it is likewise allowed to cast those to integers and back. So, in LLVM, I don't think there is any way to make int2ptr injective. Also see https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf for a detailed discussion of LLVM pointer semantics.
The LLVM reference (except getelementptr inbounds) seem to miss concepts of valid pointer representation, pointer provenance or partial ordering of pointers. The ability of BasicAA to deduce pointers as non-aliasing based on provenance seem to be inferred from C/C++ semantics. From the semantics of icmp: > If the operands are pointer typed, the pointer values are compared as if they were integers. Sanjoy Das seem to have worked on non-integral pointer types which cannot be created "from thin air" by disallowing integer conversion instructions. http://llvm.org/docs/LangRef.html#nointptrtype
(In reply to Michael Kruse from comment #96) > Not sure whether LLVM's AliasAnalysis interface is only defined for valid > pointer representations (ie including one-past-the-last) of just > dereferenceable pointers. The latter would be hard for analysis since one > had to proof whether a pointer is dereferenceable before an AA query. The alias() API is defined for any pair of pointers which are both defined at some position in the program. The question is essentially, given two pointer values, for any location in the program where both pointer values are defined, is it possible that a store to one pointer would conflict with a load or store to the other pointer? The possible answers are essentially yes, no, or don't know. If it in't legal to dereference a pointer, it trivially doesn't alias anything: there are no memory accesses through the pointer, so there isn't any possible conflict. Similarly, if a pointer points to a constant, it doesn't alias anything: there are no stores through it. (In reply to Michael Kruse from comment #98) > The LLVM reference (except getelementptr inbounds) seem to miss concepts of > valid pointer representation, pointer provenance or partial ordering of > pointers. The ability of BasicAA to deduce pointers as non-aliasing based on > provenance seem to be inferred from C/C++ semantics. icmp is defined to just take the raw pointer bits as an integer. This is described in LangRef. If the text of LangRef describing icmp isn't sufficiently clear, suggestions welcome. If some transform isn't consistent with this, please file a bug. For provenance, the rules we have are described at http://llvm.org/docs/LangRef.html#pointeraliasing . The problem described here is that the rules don't actually work. See also http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2311.pdf .
> icmp is defined to just take the raw pointer bits as an integer. This is described in LangRef. Doesn't LLVM also assume that a fresh pointer returned by malloc does not equal any older pointer, and optimize icmp accordingly? They could physically be equal if addresses are reused (and that easily happens e.g. with jemalloc). C has a clause that pointers become undef when you deallocate the block they point to, but I have not seen such a clause in the LVLM semantics (but I might have just missed it).
(In reply to Ralf Jung from comment #100) > > icmp is defined to just take the raw pointer bits as an integer. This is described in LangRef. > > Doesn't LLVM also assume that a fresh pointer returned by malloc does not > equal any older pointer, and optimize icmp accordingly? There's a class of optimizations where LLVM implicitly replaces malloc/new with an alloca, and we can optimize comparisons against that. In cases where that optimization isn't legal, LLVM doesn't make any assumptions, as far as I know.
> If some transform isn't consistent with this, please file a bug. https://bugs.llvm.org/show_bug.cgi?id=46380
For those who have not seen this. WG14 now has a draft TS which addresses this topic: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2676.pdf