Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EmitGEPOffset() incorrectly adds NUW to multiplications #42044

Closed
nunoplopes opened this issue Jul 21, 2019 · 4 comments
Closed

EmitGEPOffset() incorrectly adds NUW to multiplications #42044

nunoplopes opened this issue Jul 21, 2019 · 4 comments
Labels
bugzilla Issues migrated from bugzilla

Comments

@nunoplopes
Copy link
Member

Bugzilla Link 42699
Resolution FIXED
Resolved on Oct 17, 2019 02:12
Version trunk
OS All
CC @aqjune,@preames,@RKSimon,@miyuki,@regehr,@sanjoy

Extended Description

The following transformation in test/Transforms/InstCombine/sub.ll exposes a bug in EmitGEPOffset() that dates back to 2011.
It assumes that 'gep inbounds' guarantees that %i * 4 doesn't overflow unsigned, which isn't true since %i could be negative.
Changing nuw to nsw makes it correct.

Alive's report:

define i64 @​test30(* %foo, i64 %i, i64 %j) {
%bit = bitcast * %foo to *
%gep1 = gep inbounds * %bit, 4 x i64 %i
%gep2 = gep inbounds * %foo, 1 x i64 %j
%cast1 = ptrtoint * %gep1 to i64
%cast2 = ptrtoint * %gep2 to i64
%sub = sub i64 %cast1, %cast2
ret i64 %sub
}
=>
define i64 @​test30(* %foo, i64 %i, i64 %j) {
%gep1.idx = shl nuw i64 %i, 2
%1 = sub i64 %gep1.idx, %j
ret i64 %1
}
Transformation doesn't verify!
ERROR: Target is more poisonous than source

Example:

  • %foo = pointer(non-local, block_id=0, offset=8)
    i64 %i = -1
    i64 %j = 0

Source:

  • %bit = pointer(non-local, block_id=0, offset=8)
  • %gep1 = pointer(non-local, block_id=0, offset=4)
  • %gep2 = pointer(non-local, block_id=0, offset=8)
    i64 %sub = -4

Target:
i64 %gep1.idx = poison
i64 %1 = poison

Source value: -4
Target value: poison

@llvmbot
Copy link
Collaborator

llvmbot commented Aug 22, 2019

I hit this problem using an out of tree backend.
A single libcxx test related to std::vector was failing.

@preames
Copy link
Collaborator

preames commented Aug 23, 2019

The wording from the LangRef is bit unclear about this case.

"If the inbounds keyword is present, the result value of the getelementptr is a poison value if the base pointer is not an in bounds address of an allocated object, or if any of the addresses that would be formed by successive addition of the offsets implied by the indices to the base address with infinitely precise signed arithmetic are not an in bounds address of that allocated object. The in bounds addresses for an allocated object are all the addresses that point into the object, plus the address one byte past the end. The only in bounds address for a null pointer in the default address-space is the null pointer itself. In cases where the base is a vector of pointers the inbounds keyword applies to each of the computations element-wise."

"If the inbounds keyword is not present, the offsets are added to the base address with silently-wrapping two’s complement arithmetic. If the offsets have a different width from the pointer, they are sign-extended or truncated to the width of the pointer. The result value of the getelementptr may be outside the object pointed to by the base pointer. The result value may not necessarily be used to access memory though, even if it happens to point into allocated storage. See the Pointer Aliasing Rules section for more information."

What does "infinitely precise signed arithmetic" mean? Does it mean interpretation as abstract integers? Or something more restrictive?

Secondly, we never actually describe how to compute an inbounds GEP. :) I'm pretty sure the second paragraph should apply to all geps, not just inbound ones. It feels very strange to have any difference in address computation between the inbounds and not-inbounds variants.

Looking at the implementation doesn't add much clarity. SelectionDAG interprets inbounds as meaning that only the additions (not the multiplications) implied by the addressing are NUW if offsets can be proven non-negative. The code in Analysis/Local.h EmitGEPOffset inteprets the NUW as applying to the muls and nt the adds. So, no consistency there...

Anyone want to make a guess as to what the LangRef should say?

@nunoplopes
Copy link
Member Author

Given the lowering done and that indexes and considered signed integers, my take on the semantics for inbounds is:

  • idx * sizeof(*ptr) cannot overflow signed (NSW)
  • ptr + (idx * sizeof(*ptr)) cannot overflow signed (NSW)

LLVM assumes that objsize it at most half of the address space (in several places, including here). So none of the above constraints limits expressiblility of valid programs.
The infinite precision argument is not needed because each of the add/mul operations can be done with the same bitwidth as the pointer without loss of precision. E.g. for 64 bits we have -2^63 <= idx * sizeof(*ptr) <= 2^63 (well, technically we cannot have 1 past the object if we allow obj size to go up to 2^63-1; it needs to be 2^63-2)

@nunoplopes
Copy link
Member Author

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

3 participants