InstCombiner::foldAllocaCmp does not check whether icmp is not in a loop #34450

aqjune · 2017-10-27T00:45:14Z


Bugzilla Link	35102
Version	trunk
OS	Linux
Attachments	cpp code that removes for loop away
CC	@majnemer,@vns-mn,@aqjune,@nunoplopes,@RalfJung,@regehr,@sanjoy,@rotateright

Extended Description

Compiling this code with -O3 generates following assembly code:

-- unreachable.cpp --

#include <stdint.h>
void unreachable();
static void compare_to_all(uintptr_t i) {
    uintptr_t cur = 0;
    while(true) {
        if ((int*)i == (int*)cur) return;
        if (cur == UINTPTR_MAX) break;
        ++cur;
    };
    unreachable();
}
void test() {
    int i = 1;
    compare_to_all((uintptr_t)&i);
}

--

-- unreachable.s --

test(): # @test()
  jmp unreachable() # TAILCALL

--

InstCombiner::foldAllocaCmp folds (int*)i == (int*)cur into false after checking that there exists only one icmp instruction which uses alloca i.
If the purpose of this optimization was to fold single pointer equality comparison into false, syntactically checking the number of icmp isn't enough. It should also check whether the icmp is not in a loop. In following code (test2.c), this optimization folds the pointer comparison in foo(), but not in gee().

-- test2.c --
void foo(int* p) {
int i;

for (int j = 0; j < 2; j++) {
if (&i == p+j) {
printf("ok\n");
}
}
}

void gee(int* p) {
int i;

if (&i == p+0) {
printf("ok\n");
}

if (&i == p+1) {
printf("ok\n");
}
}

unreachable.cpp written by Ralf Jung, test2.c written by Gil Hur

The text was updated successfully, but these errors were encountered:

llvmbot · 2017-10-27T01:20:20Z

FWIW, for your first example, GCC generates the same exact code as LLVM.

For the second, GCC generates this:

foo(int*):
rep ret
gee(int*):
rep ret

sanjoy · 2017-10-27T16:15:20Z

I think the idea in foldAllocaCmp is not to fold only one cmp, but to make sure we fold all compares. Checking that there is only one icmp on an alloca is just one way of making sure we fold all compares in the same way (and e.g. we don't fold both alloca == val and alloca != val to true, for instance).

Here is a comment from the source code:

// The
// single-comparison-use condition ensures that we're trivially folding all
// comparisons against the alloca consistently

aqjune · 2017-10-29T04:23:40Z

GCC does this optimization more aggressively, but
ICC 17 is more conservative:
https://godbolt.org/g/JBpouG

I think InstCombiner::foldAllocaCmp does this
optimization because it assumes that alloca block
can be placed appropriately so the comparison can
be folded into false.
Here is relevant comment from foldAllocaCmp.
// But LLVM doesn't specify where allocas get their memory, so if the alloca
// doesn't escape we can argue that it's impossible to guess its value, and we
// can therefore act as if any such guesses are wrong.

This is valid if the comparison
is done only once; but if every possible integer
address is investigated (like compare_to_all in the first example),
this is not true anymore because such alloca does not exist.

RalfJung · 2017-10-29T10:43:36Z

FWIW, for your first example, GCC generates the same exact code as LLVM.

In this concrete example, GCC can rely on C's rule that casting an integer to a pointer is UB if that pointer is out-of-bounds. (I assume C++ has a similar clause.) However, AFAIK, inttoptr in LLVM does not have a similar rule, so there should be no UB in the following LLVM snippet (taken from the IR of unreachable.cpp):

%5 = load i64, i64* %2, align 8, !dbg !29
%6 = inttoptr i64 %5 to i32*, !dbg !32
%7 = load i64, i64* %3, align 8, !dbg !33
%8 = inttoptr i64 %7 to i32*, !dbg !34
%9 = icmp eq i32* %6, %8, !dbg !35
br i1 %9, label %10, label %11, !dbg !36

sanjoy · 2017-10-30T06:50:22Z

I think InstCombiner::foldAllocaCmp does this
optimization because it assumes that alloca block
can be placed appropriately so the comparison can
be folded into false.
Here is relevant comment from foldAllocaCmp.
// But LLVM doesn't specify where allocas get their memory, so if the
alloca
// doesn't escape we can argue that it's impossible to guess its value,
and we
// can therefore act as if any such guesses are wrong.

This is valid if the comparison
is done only once; but if every possible integer
address is investigated (like compare_to_all in the first example),
this is not true anymore because such alloca does not exist.

I see. In this model, do you have a problem with programs like this too (I've discussed this with Nuno in person):

// This program as written is not practical
std::set<uintptr_t> addrs_seen;
// Assume 32 bit pointers
while (addrs_seen.size() != ((1 << 32) - 1)) {
x = alloca i8
addrs_seen.insert((intptr_t)x);
}

uintptr_t i = only_32_bit_integer_not_in_set(addrs_seen);
x = alloca i8
assert(i == (uintptr_t)x); // has to be true

aqjune · 2017-10-30T20:38:48Z

Right, it has to be true.

Defining correctness of optimizations when the memory is allowed to be exhaustively used is really hard, though.

sanjoy · 2017-10-30T21:02:28Z

Right, it has to be true.

But then we can't do the optimization at all today right, even if there is just one instance of the compare outside a loop?

Defining correctness of optimizations when the memory is allowed to be
exhaustively used is really hard, though.

IMO we should avoid removing optimizations until we have a proper roadmap of what the final semantics will look like. Otherwise we risk leaving the source in an inconsistent state that will confuse readers, though you can argue that it already is inconsistent and confusing. :)

regehr · 2017-10-30T22:23:35Z

I'd agree that it's worth thinking more before changing the compiler to behave differently for test cases like this one, unless we can show that the optimization in question really doesn't help for any program we care about.

regehr · 2017-11-01T19:25:04Z

Rich Felker was pointing out on twitter that you can remove a potential UB (and thus a potential justification for the optimization) by doing the comparison in the integer domain instead of the pointer domain:

static void compare_to_all2(uintptr_t i) {
uintptr_t cur = 0;
while(1) {
if (i == (uintptr_t)cur) return;
if (cur == UINTPTR_MAX) break;
++cur;
};
unreachable();
}
void test2(void) {
int i = 1;
compare_to_all((uintptr_t)&i);
}

At this point this optimization looks very wrong. The compiler is effectively arguing that there's a value of i that is different from all possible values of uintrptr_t.

regehr · 2017-11-01T19:29:29Z

Oops I had a useless cast.

static void compare_to_all2(uintptr_t i) {
uintptr_t cur = 0;
while(1) {
if (i == cur) return;
if (cur == UINTPTR_MAX) break;
++cur;
};
unreachable();
}
void test2(void) {
int i = 1;
compare_to_all((uintptr_t)&i);
}

RalfJung · 2017-11-02T08:16:58Z

I tried a few variations of the integer version of this back when I constructed this example, but could not get LLVM to optimize it the same way as it does for the pointer version. Indeed your code does not trigger the problematic optimization: https://godbolt.org/g/5kQX2g

However, the implementation does not actually seem to check for pointer types, so maybe that is just a pass ordering artifact?

sanjoy · 2017-11-02T08:33:19Z

Rich Felker was pointing out on twitter that you can remove a potential UB
(and thus a potential justification for the optimization) by doing the
comparison in the integer domain instead of the pointer domain:

static void compare_to_all2(uintptr_t i) {
uintptr_t cur = 0;
while(1) {
if (i == (uintptr_t)cur) return;
if (cur == UINTPTR_MAX) break;
++cur;
};
unreachable();
}
void test2(void) {
int i = 1;
compare_to_all((uintptr_t)&i);
}

At this point this optimization looks very wrong. The compiler is
effectively arguing that there's a value of i that is different from all
possible values of uintrptr_t.

Can we interpret the above ^ as saying "ptrtoint(X) == ptrtoint(Y)" is not the same as "X == Y" and that we can allow the foldAllocaCmp optimization only on pointer types?

regehr · 2017-11-02T15:12:47Z

Can we interpret the above ^ as saying "ptrtoint(X) == ptrtoint(Y)" is not
the same as "X == Y" and that we can allow the foldAllocaCmp optimization
only on pointer types?

I believe this is reasonable but would like to hear from Nuno.

Sorry to have implied that the optimization fires in the integer domain! As Ralf says LLVM does not seem to do this.

RalfJung · 2017-11-02T15:18:13Z

"ptrtoint(X) == ptrtoint(Y)" is not the same as "X == Y"

It is already not the same for other reasons, right? The first compares integers, which is always deterministic, while the second compares pointers, which can be non-deterministic.

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

InstCombiner::foldAllocaCmp does not check whether icmp is not in a loop #34450

InstCombiner::foldAllocaCmp does not check whether icmp is not in a loop #34450

aqjune commented Oct 27, 2017 •

edited by nikic

llvmbot commented Oct 27, 2017

sanjoy commented Oct 27, 2017

aqjune commented Oct 29, 2017

RalfJung commented Oct 29, 2017

sanjoy commented Oct 30, 2017

aqjune commented Oct 30, 2017

sanjoy commented Oct 30, 2017

regehr commented Oct 30, 2017

regehr commented Nov 1, 2017

regehr commented Nov 1, 2017

RalfJung commented Nov 2, 2017

sanjoy commented Nov 2, 2017

regehr commented Nov 2, 2017

RalfJung commented Nov 2, 2017

InstCombiner::foldAllocaCmp does not check whether icmp is not in a loop #34450

InstCombiner::foldAllocaCmp does not check whether icmp is not in a loop #34450

Comments

aqjune commented Oct 27, 2017 • edited by nikic

Extended Description

if (&i == p+1) { printf("ok\n"); } }

llvmbot commented Oct 27, 2017

sanjoy commented Oct 27, 2017

aqjune commented Oct 29, 2017

RalfJung commented Oct 29, 2017

sanjoy commented Oct 30, 2017

aqjune commented Oct 30, 2017

sanjoy commented Oct 30, 2017

regehr commented Oct 30, 2017

regehr commented Nov 1, 2017

regehr commented Nov 1, 2017

RalfJung commented Nov 2, 2017

sanjoy commented Nov 2, 2017

regehr commented Nov 2, 2017

RalfJung commented Nov 2, 2017

aqjune commented Oct 27, 2017 •

edited by nikic

if (&i == p+1) {
printf("ok\n");
}
}