Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InstCombiner::foldAllocaCmp does not check whether icmp is not in a loop #34450

Open
aqjune opened this issue Oct 27, 2017 · 14 comments
Open

InstCombiner::foldAllocaCmp does not check whether icmp is not in a loop #34450

aqjune opened this issue Oct 27, 2017 · 14 comments
Labels
bugzilla Issues migrated from bugzilla

Comments

@aqjune
Copy link
Contributor

aqjune commented Oct 27, 2017

Bugzilla Link 35102
Version trunk
OS Linux
Attachments cpp code that removes for loop away
CC @majnemer,@vns-mn,@aqjune,@nunoplopes,@RalfJung,@regehr,@sanjoy,@rotateright

Extended Description

Compiling this code with -O3 generates following assembly code:

-- unreachable.cpp --

#include <stdint.h>
void unreachable();
static void compare_to_all(uintptr_t i) {
    uintptr_t cur = 0;
    while(true) {
        if ((int*)i == (int*)cur) return;
        if (cur == UINTPTR_MAX) break;
        ++cur;
    };
    unreachable();
}
void test() {
    int i = 1;
    compare_to_all((uintptr_t)&i);
}

--

-- unreachable.s --

test(): # @test()
  jmp unreachable() # TAILCALL

--

InstCombiner::foldAllocaCmp folds (int*)i == (int*)cur into false after checking that there exists only one icmp instruction which uses alloca i.
If the purpose of this optimization was to fold single pointer equality comparison into false, syntactically checking the number of icmp isn't enough. It should also check whether the icmp is not in a loop. In following code (test2.c), this optimization folds the pointer comparison in foo(), but not in gee().

-- test2.c --
void foo(int* p) {
int i;

for (int j = 0; j < 2; j++) {
if (&i == p+j) {
printf("ok\n");
}
}
}

void gee(int* p) {
int i;

if (&i == p+0) {
printf("ok\n");
}

if (&i == p+1) {
printf("ok\n");
}
}

unreachable.cpp written by Ralf Jung, test2.c written by Gil Hur

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 27, 2017

FWIW, for your first example, GCC generates the same exact code as LLVM.

For the second, GCC generates this:

foo(int*):
rep ret
gee(int*):
rep ret

@sanjoy
Copy link
Contributor

sanjoy commented Oct 27, 2017

I think the idea in foldAllocaCmp is not to fold only one cmp, but to make sure we fold all compares. Checking that there is only one icmp on an alloca is just one way of making sure we fold all compares in the same way (and e.g. we don't fold both alloca == val and alloca != val to true, for instance).

Here is a comment from the source code:

// The
// single-comparison-use condition ensures that we're trivially folding all
// comparisons against the alloca consistently

@aqjune
Copy link
Contributor Author

aqjune commented Oct 29, 2017

GCC does this optimization more aggressively, but
ICC 17 is more conservative:
https://godbolt.org/g/JBpouG

I think InstCombiner::foldAllocaCmp does this
optimization because it assumes that alloca block
can be placed appropriately so the comparison can
be folded into false.
Here is relevant comment from foldAllocaCmp.
// But LLVM doesn't specify where allocas get their memory, so if the alloca
// doesn't escape we can argue that it's impossible to guess its value, and we
// can therefore act as if any such guesses are wrong.

This is valid if the comparison
is done only once; but if every possible integer
address is investigated (like compare_to_all in the first example),
this is not true anymore because such alloca does not exist.

@RalfJung
Copy link
Contributor

FWIW, for your first example, GCC generates the same exact code as LLVM.

In this concrete example, GCC can rely on C's rule that casting an integer to a pointer is UB if that pointer is out-of-bounds. (I assume C++ has a similar clause.) However, AFAIK, inttoptr in LLVM does not have a similar rule, so there should be no UB in the following LLVM snippet (taken from the IR of unreachable.cpp):

%5 = load i64, i64* %2, align 8, !dbg !​29
%6 = inttoptr i64 %5 to i32*, !dbg !​32
%7 = load i64, i64* %3, align 8, !dbg !​33
%8 = inttoptr i64 %7 to i32*, !dbg !​34
%9 = icmp eq i32* %6, %8, !dbg !​35
br i1 %9, label %10, label %11, !dbg !​36

@sanjoy
Copy link
Contributor

sanjoy commented Oct 30, 2017

I think InstCombiner::foldAllocaCmp does this
optimization because it assumes that alloca block
can be placed appropriately so the comparison can
be folded into false.
Here is relevant comment from foldAllocaCmp.
// But LLVM doesn't specify where allocas get their memory, so if the
alloca
// doesn't escape we can argue that it's impossible to guess its value,
and we
// can therefore act as if any such guesses are wrong.

This is valid if the comparison
is done only once; but if every possible integer
address is investigated (like compare_to_all in the first example),
this is not true anymore because such alloca does not exist.

I see. In this model, do you have a problem with programs like this too (I've discussed this with Nuno in person):

// This program as written is not practical
std::set<uintptr_t> addrs_seen;
// Assume 32 bit pointers
while (addrs_seen.size() != ((1 << 32) - 1)) {
x = alloca i8
addrs_seen.insert((intptr_t)x);
}

uintptr_t i = only_32_bit_integer_not_in_set(addrs_seen);
x = alloca i8
assert(i == (uintptr_t)x); // has to be true

@aqjune
Copy link
Contributor Author

aqjune commented Oct 30, 2017

Right, it has to be true.

Defining correctness of optimizations when the memory is allowed to be exhaustively used is really hard, though.

@sanjoy
Copy link
Contributor

sanjoy commented Oct 30, 2017

Right, it has to be true.

But then we can't do the optimization at all today right, even if there is just one instance of the compare outside a loop?

Defining correctness of optimizations when the memory is allowed to be
exhaustively used is really hard, though.

IMO we should avoid removing optimizations until we have a proper roadmap of what the final semantics will look like. Otherwise we risk leaving the source in an inconsistent state that will confuse readers, though you can argue that it already is inconsistent and confusing. :)

@regehr
Copy link
Contributor

regehr commented Oct 30, 2017

I'd agree that it's worth thinking more before changing the compiler to behave differently for test cases like this one, unless we can show that the optimization in question really doesn't help for any program we care about.

@regehr
Copy link
Contributor

regehr commented Nov 1, 2017

Rich Felker was pointing out on twitter that you can remove a potential UB (and thus a potential justification for the optimization) by doing the comparison in the integer domain instead of the pointer domain:

static void compare_to_all2(uintptr_t i) {
uintptr_t cur = 0;
while(1) {
if (i == (uintptr_t)cur) return;
if (cur == UINTPTR_MAX) break;
++cur;
};
unreachable();
}
void test2(void) {
int i = 1;
compare_to_all((uintptr_t)&i);
}

At this point this optimization looks very wrong. The compiler is effectively arguing that there's a value of i that is different from all possible values of uintrptr_t.

@regehr
Copy link
Contributor

regehr commented Nov 1, 2017

Oops I had a useless cast.

static void compare_to_all2(uintptr_t i) {
uintptr_t cur = 0;
while(1) {
if (i == cur) return;
if (cur == UINTPTR_MAX) break;
++cur;
};
unreachable();
}
void test2(void) {
int i = 1;
compare_to_all((uintptr_t)&i);
}

@RalfJung
Copy link
Contributor

RalfJung commented Nov 2, 2017

I tried a few variations of the integer version of this back when I constructed this example, but could not get LLVM to optimize it the same way as it does for the pointer version. Indeed your code does not trigger the problematic optimization: https://godbolt.org/g/5kQX2g

However, the implementation does not actually seem to check for pointer types, so maybe that is just a pass ordering artifact?

@sanjoy
Copy link
Contributor

sanjoy commented Nov 2, 2017

Rich Felker was pointing out on twitter that you can remove a potential UB
(and thus a potential justification for the optimization) by doing the
comparison in the integer domain instead of the pointer domain:

static void compare_to_all2(uintptr_t i) {
uintptr_t cur = 0;
while(1) {
if (i == (uintptr_t)cur) return;
if (cur == UINTPTR_MAX) break;
++cur;
};
unreachable();
}
void test2(void) {
int i = 1;
compare_to_all((uintptr_t)&i);
}

At this point this optimization looks very wrong. The compiler is
effectively arguing that there's a value of i that is different from all
possible values of uintrptr_t.

Can we interpret the above ^ as saying "ptrtoint(X) == ptrtoint(Y)" is not the same as "X == Y" and that we can allow the foldAllocaCmp optimization only on pointer types?

@regehr
Copy link
Contributor

regehr commented Nov 2, 2017

Can we interpret the above ^ as saying "ptrtoint(X) == ptrtoint(Y)" is not
the same as "X == Y" and that we can allow the foldAllocaCmp optimization
only on pointer types?

I believe this is reasonable but would like to hear from Nuno.

Sorry to have implied that the optimization fires in the integer domain! As Ralf says LLVM does not seem to do this.

@RalfJung
Copy link
Contributor

RalfJung commented Nov 2, 2017

"ptrtoint(X) == ptrtoint(Y)" is not the same as "X == Y"

It is already not the same for other reasons, right? The first compares integers, which is always deterministic, while the second compares pointers, which can be non-deterministic.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

5 participants