Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong code bug: load removed by GVN #33692

Closed
mikaelholmen opened this issue Aug 28, 2017 · 12 comments
Closed

Wrong code bug: load removed by GVN #33692

mikaelholmen opened this issue Aug 28, 2017 · 12 comments
Labels
bugzilla Issues migrated from bugzilla llvm:optimizations

Comments

@mikaelholmen
Copy link
Collaborator

Bugzilla Link 34344
Version unspecified
OS Linux
Attachments Reproducer
CC @bjope,@topperc,@efriedma-quic,@hfinkel,@nunoplopes,@sanjoy

Extended Description

Reproduce with:
opt -S -gvn -o - gvn_input.ll

And you get:

target datalayout = "p:8:8"

define void @​foo() {
entry:
%array = alloca [130 x i8], align 1
br label %for.body

for.body: ; preds = %for.body, %entry
%i = phi i8 [ 0, %entry ], [ %inc, %for.body ]
%idx = zext i8 %i to i16
%addr = getelementptr inbounds [130 x i8], [130 x i8]* %array, i16 0, i16 %idx
store i8 %i, i8* %addr, align 1
%inc = add nuw i8 %i, 1
%exitcond = icmp ne i8 %inc, -126
br i1 %exitcond, label %for.body, label %for.cond.cleanup

for.cond.cleanup: ; preds = %for.body
%addr2 = getelementptr inbounds [130 x i8], [130 x i8]* %array, i16 0, i16 129
tail call void @​bar(i8 undef)
ret void
}

declare void @​bar(i8)

Note that now we pass undef to bar instead of the value loaded from index 129
in the local array. The array is initialized in the loop, from index 0 to (but
not including) index 130.

For some reason GVN fails to realize that the loop indeed writes at index 129,
the only dependency it sees when examining the load, is the alloca, and thus
it thinks we read from uninitialized memory, and the load is removed and replaced
with undef.

As far as I can tell, this bug is not new. It was at least present in November 2016.

(Originally found on my out-of-tree target where we have 16b pointers. Loading from
index 32768 then gives the same problems as index 129 in the example above.)

@efriedma-quic
Copy link
Collaborator

From LangRef:

"If the inbounds keyword is present, the result value of the getelementptr is a poison value if the base pointer is not an in bounds address of an allocated object, or if any of the addresses that would be formed by successive addition of the offsets implied by the indices to the base address with infinitely precise signed arithmetic are not an in bounds address of that allocated object. The in bounds addresses for an allocated object are all the addresses that point into the object, plus the address one byte past the end."

Note the "signed" part; this might lead to unexpected results if you're allocating objects larger than half the address-space (since "i8 129" is actually "i8 -127").

In theory, it should be fine without the inbounds keyword, but that isn't really well-tested; there are probably bugs lurking.

@mikaelholmen
Copy link
Collaborator Author

From LangRef:

"If the inbounds keyword is present, the result value of the getelementptr
is a poison value if the base pointer is not an in bounds address of an
allocated object, or if any of the addresses that would be formed by
successive addition of the offsets implied by the indices to the base
address with infinitely precise signed arithmetic are not an in bounds
address of that allocated object. The in bounds addresses for an allocated
object are all the addresses that point into the object, plus the address
one byte past the end."

Note the "signed" part; this might lead to unexpected results if you're
allocating objects larger than half the address-space (since "i8 129" is
actually "i8 -127").

But I don't use "i8 129" anywhere in the input program.

The store address is calculated with

%idx = zext %ty %i to i16
%addr = getelementptr inbounds %arrty, %arrty* %array, i16 0, i16 %idx

and the load address

%addr2 = getelementptr inbounds %arrty, %arrty* %array, i16 0, i16 129

So both indices use i16, just to avoid any sign problems.

This is not enough to make it work?

In theory, it should be fine without the inbounds keyword, but that isn't
really well-tested; there are probably bugs lurking.

@mikaelholmen
Copy link
Collaborator Author

From LangRef:

"If the inbounds keyword is present, the result value of the getelementptr
is a poison value if the base pointer is not an in bounds address of an
allocated object, or if any of the addresses that would be formed by
successive addition of the offsets implied by the indices to the base
address with infinitely precise signed arithmetic are not an in bounds
address of that allocated object. The in bounds addresses for an allocated
object are all the addresses that point into the object, plus the address
one byte past the end."

Note the "signed" part; this might lead to unexpected results if you're
allocating objects larger than half the address-space (since "i8 129" is
actually "i8 -127").

But I don't use "i8 129" anywhere in the input program.

The store address is calculated with

%idx = zext %ty %i to i16
%addr = getelementptr inbounds %arrty, %arrty* %array, i16 0, i16 %idx

and the load address

%addr2 = getelementptr inbounds %arrty, %arrty* %array, i16 0, i16 129

So both indices use i16, just to avoid any sign problems.

This is not enough to make it work?

I also saw this in langref now:

"If the offsets have a different width from the pointer, they are sign-extended or truncated to the width of the pointer."

So using larger types on the offsets doesn't matter?

I also noticed that instcombine seems to replace large index types with types
with the same size as the pointer:

// Index type should have the same width as IntPtr
Type *IndexTy = (*I)->getType();
Type *NewIndexType = IndexTy->isVectorTy() ?
  VectorType::get(IntPtrTy, IndexTy->getVectorNumElements()) : IntPtrTy;

So it is really pointless to use larger index types?

In theory, it should be fine without the inbounds keyword, but that isn't
really well-tested; there are probably bugs lurking.

The load is removed by GVN even if I remove the inbounds.

So this means that we can't address the upper half of the memory space or what am I missing here? This seems like a major deficiency?

@efriedma-quic
Copy link
Collaborator

Oh, right, your testcase doesn't use i8. In that case, I guess it's supposed to work...?

That said, this sort of thing gets very little testing. You need a single allocation (alloca/malloc/etc.) larger than half the address-space to run into this problem, and that's basically impossible on a normal desktop or server.

@mikaelholmen
Copy link
Collaborator Author

Oh, right, your testcase doesn't use i8. In that case, I guess it's
supposed to work...?

You tell me :)

So you would indeed consider this a bug in GVN then?

I don't know, but as I wrote, instcombine changes the index type to the width of the pointer, so if it's supposed to work with i16 129 then I suppose instcombine has a bug too then? Should I write a PR on that too maybe...

And langref says

"If the offsets have a different width from the pointer, they are sign-extended or truncated to the width of the pointer."

but it does so in a paragraph talking about non-inbounds geps, so I'm not sure if it's also relevant when we use inbounds?

"If the inbounds keyword is not present, the offsets are added to the base address with silently-wrapping two’s complement arithmetic. If the offsets have a different width from the pointer, they are sign-extended or truncated to the width of the pointer."

But here it really sounds like it doesn't matter if we use larger index types, they will be truncated (just like instcombine does) anyway? So then this is no bug?

That said, this sort of thing gets very little testing. You need a single
allocation (alloca/malloc/etc.) larger than half the address-space to run
into this problem, and that's basically impossible on a normal desktop or
server.

Yes, I understand it's not very common. It is however possible in my world :/

@efriedma-quic
Copy link
Collaborator

Yes, GVN bug (or more likely, a bug in the alias analysis used by GVN).

So using larger types on the offsets doesn't matter?

The LLVM rules for GEPs without inbounds are basically this: a GEP is lowered to arithmetic in the size of the pointer. The indexes are forced to the correct side, and overflow isn't relevant. The only restriction is that you can only dereference points based on the correct object (http://llvm.org/docs/LangRef.html#pointeraliasing).

inbounds introduces extra restriction: essentially, overflow in the GEP math is undefined behavior. What exactly "overflow" means is kind of complicated if you're dealing with objects larger than half the address-space; I'm not sure the way LangRef describes it really matches reality.

@mikaelholmen
Copy link
Collaborator Author

Yes, GVN bug (or more likely, a bug in the alias analysis used by GVN).

I haven't totally gone to the bottom with this and understood everything but at least I've found who is saying that the load and the store don't alias for the non-inbounds and the inbounds cases.

For the non-inbounds case we come to BasicAAResult::aliasGEP and there is code that does a lot of fiddling with "Modulo" and then we get to:

// If we know all the variables are positive, then GEP1 >= GEP1BasePtr.
// If GEP1BasePtr > V2 (GEP1BaseOffset > 0) then we know the pointers
// don't alias if V2Size can fit in the gap between V2 and GEP1BasePtr.
if (AllPositive && GEP1BaseOffset > 0 && V2Size <= (uint64_t)GEP1BaseOffset)
  return NoAlias;

and this check determines that there is no aliasing and we're done.

For inbounds we end up in BasicAAResult::DecomposeGEPExpression which does

// Take care of wrap-arounds
if (GepHasConstantOffset) {
  Decomposed.StructOffset =
      adjustToPointerSize(Decomposed.StructOffset, PointerSize);
  Decomposed.OtherOffset =
      adjustToPointerSize(Decomposed.OtherOffset, PointerSize);
}

which effectively wraps the i16 129 offset to i8 -127. Then just after that
in BasicAAResult::aliasGEP we do:

// If the GEP's offset relative to its base is such that the base would
// fall below the start of the object underlying V2, then the GEP and V2
// cannot alias.
if (!GEP1MaxLookupReached && !GEP2MaxLookupReached &&
isGEPBaseAtNegativeOffset(GEP1, DecompGEP1, DecompGEP2, V2Size))
return NoAlias;

And then of course returns NoAlias.

@efriedma-quic
Copy link
Collaborator

For the non-inbounds case we come to BasicAAResult::aliasGEP and there is
code that does a lot of fiddling with "Modulo" and then we get to:

// If we know all the variables are positive, then GEP1 >= GEP1BasePtr.
// If GEP1BasePtr > V2 (GEP1BaseOffset > 0) then we know the pointers
// don't alias if V2Size can fit in the gap between V2 and GEP1BasePtr.
if (AllPositive && GEP1BaseOffset > 0 && V2Size <=

(uint64_t)GEP1BaseOffset)
return NoAlias;

and this check determines that there is no aliasing and we're done.

It looks like this code isn't taking wrapping into account correctly.

For inbounds we end up in BasicAAResult::DecomposeGEPExpression which does

// Take care of wrap-arounds
if (GepHasConstantOffset) {
  Decomposed.StructOffset =
      adjustToPointerSize(Decomposed.StructOffset, PointerSize);
  Decomposed.OtherOffset =
      adjustToPointerSize(Decomposed.OtherOffset, PointerSize);
}

which effectively wraps the i16 129 offset to i8 -127. Then just after that
in BasicAAResult::aliasGEP we do:

// If the GEP's offset relative to its base is such that the base would
// fall below the start of the object underlying V2, then the GEP and V2
// cannot alias.
if (!GEP1MaxLookupReached && !GEP2MaxLookupReached &&
isGEPBaseAtNegativeOffset(GEP1, DecompGEP1, DecompGEP2, V2Size))
return NoAlias;

And then of course returns NoAlias.

I'm not sure how we should be handling this, just glancing quickly at the code. Maybe BasicAAResult::isGEPBaseAtNegativeOffset should be checking the size of the object in question (and just return false if it's too large)?

@hfinkel
Copy link
Collaborator

hfinkel commented Sep 2, 2017

For the non-inbounds case we come to BasicAAResult::aliasGEP and there is
code that does a lot of fiddling with "Modulo" and then we get to:

// If we know all the variables are positive, then GEP1 >= GEP1BasePtr.
// If GEP1BasePtr > V2 (GEP1BaseOffset > 0) then we know the pointers
// don't alias if V2Size can fit in the gap between V2 and GEP1BasePtr.
if (AllPositive && GEP1BaseOffset > 0 && V2Size <=

(uint64_t)GEP1BaseOffset)
return NoAlias;

and this check determines that there is no aliasing and we're done.

It looks like this code isn't taking wrapping into account correctly.

What are the values of these various variables when this returns NoAlias?

For inbounds we end up in BasicAAResult::DecomposeGEPExpression which does

// Take care of wrap-arounds
if (GepHasConstantOffset) {
  Decomposed.StructOffset =
      adjustToPointerSize(Decomposed.StructOffset, PointerSize);
  Decomposed.OtherOffset =
      adjustToPointerSize(Decomposed.OtherOffset, PointerSize);
}

which effectively wraps the i16 129 offset to i8 -127. Then just after that
in BasicAAResult::aliasGEP we do:

// If the GEP's offset relative to its base is such that the base would
// fall below the start of the object underlying V2, then the GEP and V2
// cannot alias.
if (!GEP1MaxLookupReached && !GEP2MaxLookupReached &&
isGEPBaseAtNegativeOffset(GEP1, DecompGEP1, DecompGEP2, V2Size))
return NoAlias;

And then of course returns NoAlias.

I'm not sure how we should be handling this, just glancing quickly at the
code. Maybe BasicAAResult::isGEPBaseAtNegativeOffset should be checking the
size of the object in question (and just return false if it's too large)?

Or it just needs to be rewritten to use an unsigned comparison. Right now it does:

return (GEPBaseOffset >= ObjectBaseOffset + (int64_t)ObjectAccessSize);

would it work correctly if we wrote instead?

return (GEPBaseOffset - ObjectBaseOffset) >= ObjectAccessSize;

@mikaelholmen
Copy link
Collaborator Author

Hi Eli and Hal,

Thank you for your replies!

For the non-inbounds case we come to BasicAAResult::aliasGEP and there is
code that does a lot of fiddling with "Modulo" and then we get to:

// If we know all the variables are positive, then GEP1 >= GEP1BasePtr.
// If GEP1BasePtr > V2 (GEP1BaseOffset > 0) then we know the pointers
// don't alias if V2Size can fit in the gap between V2 and GEP1BasePtr.
if (AllPositive && GEP1BaseOffset > 0 && V2Size <=

(uint64_t)GEP1BaseOffset)
return NoAlias;

and this check determines that there is no aliasing and we're done.

It looks like this code isn't taking wrapping into account correctly.

What are the values of these various variables when this returns NoAlias?

1361 if (AllPositive && GEP1BaseOffset > 0 && V2Size <= (uint64_t)GEP1BaseOffset)
(gdb) p AllPositive
$1 = true
(gdb) p GEP1BaseOffset
$2 = 127
(gdb) p V2Size
$3 = 1
(gdb) p GEP1BaseOffset
$4 = 127
(gdb)

GEP1BaseOffset is 127 since it has earlier been calculated as

GEP1BaseOffset -= GEP2BaseOffset;

where GEP1BaseOffset was 0 and GEP2BaseOffset was -127 (since DecomposeGEPExpression wrapped 129 into -127).

For inbounds we end up in BasicAAResult::DecomposeGEPExpression which does

// Take care of wrap-arounds
if (GepHasConstantOffset) {
  Decomposed.StructOffset =
      adjustToPointerSize(Decomposed.StructOffset, PointerSize);
  Decomposed.OtherOffset =
      adjustToPointerSize(Decomposed.OtherOffset, PointerSize);
}

which effectively wraps the i16 129 offset to i8 -127. Then just after that
in BasicAAResult::aliasGEP we do:

// If the GEP's offset relative to its base is such that the base would
// fall below the start of the object underlying V2, then the GEP and V2
// cannot alias.
if (!GEP1MaxLookupReached && !GEP2MaxLookupReached &&
isGEPBaseAtNegativeOffset(GEP1, DecompGEP1, DecompGEP2, V2Size))
return NoAlias;

And then of course returns NoAlias.

I'm not sure how we should be handling this, just glancing quickly at the
code. Maybe BasicAAResult::isGEPBaseAtNegativeOffset should be checking the
size of the object in question (and just return false if it's too large)?

Or it just needs to be rewritten to use an unsigned comparison. Right now it
does:

return (GEPBaseOffset >= ObjectBaseOffset + (int64_t)ObjectAccessSize);

would it work correctly if we wrote instead?

return (GEPBaseOffset - ObjectBaseOffset) >= ObjectAccessSize;

The compiler complains on this:

../lib/Analysis/BasicAliasAnalysis.cpp:1147:45: error: comparison of integers of different signs: 'long' and 'uint64_t' (aka 'unsigned long') [-Werror,-Wsign-compare]

But even if I add explicit casts to make the comparison in uin64_t it returns
true for me.

The involved variables are:
(gdb) p GEPBaseOffset
$11 = 0
(gdb) p ObjectBaseOffset
$12 = -127
(gdb) p ObjectAccessSize
$13 = 1

@bjope
Copy link
Collaborator

bjope commented Apr 11, 2019

https://reviews.llvm.org/D59648 claims to fix this. I haven't verified it myself.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@bjope
Copy link
Collaborator

bjope commented Aug 14, 2022

As pre previous comment this was solved by https://reviews.llvm.org/D59648 (the original reproducer from https://bugs.llvm.org/show_bug.cgi?id=34344 was added as a lit test). So I'll just close this as it seem to have been working for a couple of years now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla llvm:optimizations
Projects
None yet
Development

No branches or pull requests

5 participants