EmitGEPOffset() incorrectly adds NUW to multiplications #42044

nunoplopes · 2019-07-21T10:27:39Z


Bugzilla Link	42699
Resolution	FIXED
Resolved on	Oct 17, 2019 02:12
Version	trunk
OS	All
CC	@aqjune,@preames,@RKSimon,@miyuki,@regehr,@sanjoy

Extended Description

The following transformation in test/Transforms/InstCombine/sub.ll exposes a bug in EmitGEPOffset() that dates back to 2011.
It assumes that 'gep inbounds' guarantees that %i * 4 doesn't overflow unsigned, which isn't true since %i could be negative.
Changing nuw to nsw makes it correct.

Alive's report:

define i64 @test30(* %foo, i64 %i, i64 %j) {
%bit = bitcast * %foo to *
%gep1 = gep inbounds * %bit, 4 x i64 %i
%gep2 = gep inbounds * %foo, 1 x i64 %j
%cast1 = ptrtoint * %gep1 to i64
%cast2 = ptrtoint * %gep2 to i64
%sub = sub i64 %cast1, %cast2
ret i64 %sub
}
=>
define i64 @test30(* %foo, i64 %i, i64 %j) {
%gep1.idx = shl nuw i64 %i, 2
%1 = sub i64 %gep1.idx, %j
ret i64 %1
}
Transformation doesn't verify!
ERROR: Target is more poisonous than source

Example:

%foo = pointer(non-local, block_id=0, offset=8)
i64 %i = -1
i64 %j = 0

Source:

%bit = pointer(non-local, block_id=0, offset=8)
%gep1 = pointer(non-local, block_id=0, offset=4)
%gep2 = pointer(non-local, block_id=0, offset=8)
i64 %sub = -4

Target:
i64 %gep1.idx = poison
i64 %1 = poison

Source value: -4
Target value: poison

llvmbot · 2019-08-22T20:51:07Z

I hit this problem using an out of tree backend.
A single libcxx test related to std::vector was failing.

preames · 2019-08-23T19:02:01Z

The wording from the LangRef is bit unclear about this case.

"If the inbounds keyword is present, the result value of the getelementptr is a poison value if the base pointer is not an in bounds address of an allocated object, or if any of the addresses that would be formed by successive addition of the offsets implied by the indices to the base address with infinitely precise signed arithmetic are not an in bounds address of that allocated object. The in bounds addresses for an allocated object are all the addresses that point into the object, plus the address one byte past the end. The only in bounds address for a null pointer in the default address-space is the null pointer itself. In cases where the base is a vector of pointers the inbounds keyword applies to each of the computations element-wise."

"If the inbounds keyword is not present, the offsets are added to the base address with silently-wrapping two’s complement arithmetic. If the offsets have a different width from the pointer, they are sign-extended or truncated to the width of the pointer. The result value of the getelementptr may be outside the object pointed to by the base pointer. The result value may not necessarily be used to access memory though, even if it happens to point into allocated storage. See the Pointer Aliasing Rules section for more information."

What does "infinitely precise signed arithmetic" mean? Does it mean interpretation as abstract integers? Or something more restrictive?

Secondly, we never actually describe how to compute an inbounds GEP. :) I'm pretty sure the second paragraph should apply to all geps, not just inbound ones. It feels very strange to have any difference in address computation between the inbounds and not-inbounds variants.

Looking at the implementation doesn't add much clarity. SelectionDAG interprets inbounds as meaning that only the additions (not the multiplications) implied by the addressing are NUW if offsets can be proven non-negative. The code in Analysis/Local.h EmitGEPOffset inteprets the NUW as applying to the muls and nt the adds. So, no consistency there...

Anyone want to make a guess as to what the LangRef should say?

nunoplopes · 2019-08-27T13:20:06Z

Given the lowering done and that indexes and considered signed integers, my take on the semantics for inbounds is:

idx * sizeof(*ptr) cannot overflow signed (NSW)
ptr + (idx * sizeof(*ptr)) cannot overflow signed (NSW)

LLVM assumes that objsize it at most half of the address space (in several places, including here). So none of the above constraints limits expressiblility of valid programs.
The infinite precision argument is not needed because each of the add/mul operations can be done with the same bitwidth as the pointer without loss of precision. E.g. for 64 bits we have -2^63 <= idx * sizeof(*ptr) <= 2^63 (well, technically we cannot have 1 past the object if we allow obj size to go up to 2^63-1; it needs to be 2^63-2)

nunoplopes · 2019-10-17T09:12:44Z

Fixed in https://reviews.llvm.org/rGb6534b2a26fa94e4d09d271faf538b1e4b19ab5d
Thanks!

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EmitGEPOffset() incorrectly adds NUW to multiplications #42044

EmitGEPOffset() incorrectly adds NUW to multiplications #42044

nunoplopes commented Jul 21, 2019

llvmbot commented Aug 22, 2019

preames commented Aug 23, 2019

nunoplopes commented Aug 27, 2019

nunoplopes commented Oct 17, 2019

EmitGEPOffset() incorrectly adds NUW to multiplications #42044

EmitGEPOffset() incorrectly adds NUW to multiplications #42044

Comments

nunoplopes commented Jul 21, 2019

Extended Description

llvmbot commented Aug 22, 2019

preames commented Aug 23, 2019

nunoplopes commented Aug 27, 2019

nunoplopes commented Oct 17, 2019