New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
InstCombiner::foldAllocaCmp does not check whether icmp is not in a loop #34450
Comments
FWIW, for your first example, GCC generates the same exact code as LLVM. For the second, GCC generates this: foo(int*): |
I think the idea in foldAllocaCmp is not to fold only one cmp, but to make sure we fold all compares. Checking that there is only one icmp on an alloca is just one way of making sure we fold all compares in the same way (and e.g. we don't fold both Here is a comment from the source code: // The |
GCC does this optimization more aggressively, but I think InstCombiner::foldAllocaCmp does this This is valid if the comparison |
In this concrete example, GCC can rely on C's rule that casting an integer to a pointer is UB if that pointer is out-of-bounds. (I assume C++ has a similar clause.) However, AFAIK, inttoptr in LLVM does not have a similar rule, so there should be no UB in the following LLVM snippet (taken from the IR of unreachable.cpp): %5 = load i64, i64* %2, align 8, !dbg !29 |
I see. In this model, do you have a problem with programs like this too (I've discussed this with Nuno in person): // This program as written is not practical uintptr_t i = only_32_bit_integer_not_in_set(addrs_seen); |
Right, it has to be true. Defining correctness of optimizations when the memory is allowed to be exhaustively used is really hard, though. |
But then we can't do the optimization at all today right, even if there is just one instance of the compare outside a loop?
IMO we should avoid removing optimizations until we have a proper roadmap of what the final semantics will look like. Otherwise we risk leaving the source in an inconsistent state that will confuse readers, though you can argue that it already is inconsistent and confusing. :) |
I'd agree that it's worth thinking more before changing the compiler to behave differently for test cases like this one, unless we can show that the optimization in question really doesn't help for any program we care about. |
Rich Felker was pointing out on twitter that you can remove a potential UB (and thus a potential justification for the optimization) by doing the comparison in the integer domain instead of the pointer domain: static void compare_to_all2(uintptr_t i) { At this point this optimization looks very wrong. The compiler is effectively arguing that there's a value of i that is different from all possible values of uintrptr_t. |
Oops I had a useless cast. static void compare_to_all2(uintptr_t i) { |
I tried a few variations of the integer version of this back when I constructed this example, but could not get LLVM to optimize it the same way as it does for the pointer version. Indeed your code does not trigger the problematic optimization: https://godbolt.org/g/5kQX2g However, the implementation does not actually seem to check for pointer types, so maybe that is just a pass ordering artifact? |
Can we interpret the above ^ as saying "ptrtoint(X) == ptrtoint(Y)" is not the same as "X == Y" and that we can allow the foldAllocaCmp optimization only on pointer types? |
I believe this is reasonable but would like to hear from Nuno. Sorry to have implied that the optimization fires in the integer domain! As Ralf says LLVM does not seem to do this. |
It is already not the same for other reasons, right? The first compares integers, which is always deterministic, while the second compares pointers, which can be non-deterministic. |
Extended Description
Compiling this code with -O3 generates following assembly code:
-- unreachable.cpp --
--
-- unreachable.s --
--
InstCombiner::foldAllocaCmp folds
(int*)i == (int*)cur
into false after checking that there exists only one icmp instruction which uses alloca i.If the purpose of this optimization was to fold single pointer equality comparison into false, syntactically checking the number of icmp isn't enough. It should also check whether the icmp is not in a loop. In following code (test2.c), this optimization folds the pointer comparison in foo(), but not in gee().
-- test2.c --
void foo(int* p) {
int i;
for (int j = 0; j < 2; j++) {
if (&i == p+j) {
printf("ok\n");
}
}
}
void gee(int* p) {
int i;
if (&i == p+0) {
printf("ok\n");
}
if (&i == p+1) {
printf("ok\n");
}
}
unreachable.cpp written by Ralf Jung, test2.c written by Gil Hur
The text was updated successfully, but these errors were encountered: