45152 – Instcombine incorrectly transforms store i64 -> store double

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 45152 - Instcombine incorrectly transforms store i64 -> store double

Summary: Instcombine incorrectly transforms store i64 -> store double

Status:	NEW

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Scalar Optimizations (show other bugs)
Version:	trunk
Hardware:	All All

Importance:	P normal
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:	miscompilation

Depends on:
Blocks:

Reported:	2020-03-09 08:33 PDT by Nuno Lopes
Modified:	2021-03-20 10:14 PDT (History)
CC List:	14 users (show)

See Also:
Fixed By Commit(s):

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Nuno Lopes 2020-03-09 08:33:20 PDT

The unit test "test/Transforms/InstCombine/bitcast-phi-uselistorder.ll" shows an incorrect transformation from load+store i64 into load/store double. These are not equivalent because NaN values can be canonicalized by the CPU so the store double can write a different bit-pattern than store i64.

Alive2's counterexample:
@Q = global 8 bytes, align 8

define double @test(i1 %c, * %p) {
%entry:
  br i1 %c, label %if, label %end

%if:
  %__constexpr_0 = bitcast * @Q to *
  %load = load i64, * %__constexpr_0, align 8
  br label %end

%end:
  %phi = phi i64 [ 0, %entry ], [ %load, %if ]
  store i64 %phi, * %p, align 8
  %cast = bitcast i64 %phi to double
  ret double %cast
}
=>
@Q = global 8 bytes, align 8

define double @test(i1 %c, * %p) {
%entry:
  br i1 %c, label %if, label %end

%if:
  %load1 = load double, * @Q, align 8
  br label %end

%end:
  %0 = phi double [ 0.000000, %entry ], [ %load1, %if ]
  %1 = bitcast * %p to *
  store double %0, * %1, align 8
  ret double %0
}
Transformation doesn't verify!
ERROR: Mismatch in memory

Example:
i1 %c = #x1 (1)
* %p = pointer(non-local, block_id=2, offset=64)

Source:
* %__constexpr_0 = pointer(non-local, block_id=1, offset=0)
i64 %load = #x7ff0000001000000 (9218868437244182528)
i64 %phi = #x7ff0000001000000 (9218868437244182528)
double %cast = NaN

Target:
double %load1 = NaN
double %0 = NaN
* %1 = pointer(non-local, block_id=2, offset=64)

Mismatch in pointer(non-local, block_id=2, offset=64)
Source value: #x7ff0000001000000
Target value: #x7ff0000000020000

Comment 1 Roman Lebedev 2020-03-09 08:42:35 PDT

(In reply to Nuno Lopes from comment #0)
> NaN values can be canonicalized by
> the CPU so the store double can write a different bit-pattern than store i64.

Citation needed?

Comment 2 Eli Friedman 2020-03-09 09:34:22 PDT

I think the only target LLVM supports that has loads that canonicalize floats is 32-bit x86 (using non-SSE operations).

How we want to deal with that case is sort of complicated.  On the one hand, we don't want to forbid hoisting loads with float type.  On the other hand, it's impossible for a function with a double return type to return an SNaN in the standard x86 calling convention, so we'd need some special rule for that.

I'd prefer to say that we have to preserve SNaN patterns across non-arithmetic operations, and make the x86 backend deal with whatever complexity results from that.

Comment 3 Nuno Lopes 2020-03-09 09:45:27 PDT

I'm fairly sure I've seen instcombine rewrites that take advantage of NaN bits being don't care. That's why I implemented this semantics in Alive2.
So if we decide that a specific bit pattern must be preserved across NaN we need to work out the details. We would also need to specify what's the bit pattern produced by each operations that returns NaN. E.g. what's the value of (bitcast-to-i32 (fdiv 1.0 0.0))? It's target specific the very least, and in Alive2 is a non-deterministic value that covers all possible bit-patterns for NaN.

Comment 4 JF Bastien 2020-03-09 12:34:06 PDT

This seems like a bad bug, and I've seen something like it trigger before in GCC (funnily enough, GCC miscompiled clang):
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58416

It seems like this transform of i64 -> double shouldn't occur if the target canonicalizes NaNs. I don't think we need a particularly complex solution for this.

Comment 5 Eli Friedman 2020-03-09 13:45:48 PDT

I guess there are two models you could reason about. The model I was thinking about works something like the following:

1. Non-arithmetic operations (load/store/select/phi/call without fast-math flags) preserve the exact bit pattern of a value, even if it's a NaN.
2. Arithmetic operations that have a NaN as input produce an unspecified NaN as output.


I guess there's an alternative model, like you're proposing:

1. NaN values have an unspecified bit pattern.
2. Storing/bitcasting a NaN produces a NaN bit pattern, but the chosen NaN pattern is unspecified, and may be different for each operation.


Really, the semantic results of either rule is pretty similar.  It's just a question of *when* the NaN pattern is fixed: when an instruction produces the value, or when a store/bitcast forces it to be fixed.

It's probably worth noting that any rule that means a "load" is allowed to raise an FP exception is going to make strict fp support a lot more complicated.

Comment 6 Nuno Lopes 2020-03-10 03:01:33 PDT

I agree with your summary, Eli. I would just add that in semantics 1), if an operation produces NaN when the operands are not, it produces an unspecified NaN bit-pattern.

The main reason I prefer to consider NaN to have an unspecified bit-pattern is because it makes it easier to support chips where load of an integer is not the same as load of a float. If this is not an issue, then we should document it and change Alive2 to match the semantics. And then remove any optimization that doesn't respect those semantics.

Bottom line: is the optimization present in this bug report important or can it be removed?

Comment 7 Eli Friedman 2020-03-11 12:51:21 PDT

This particular optimization probably isn't that valuable on its own; I mean, we want to avoid bitcasts where we can, but there are other ways of addressing this particular situation on most targets.

I have three concerns with making NaN values indeterminate in registers:

1. We have a bunch of optimizations over memory operations that are essentially type-agnostic: they don't care what type is loaded/stored.  We'd have to add a bunch of new checks.
2. Destroying NaN-patterns would be very unfriendly to SIMD intrinsics; for example, if we decided that you can't use _mm_shuffle_ps on integer vectors.
3. Allowing LLVM IR loads to raise FP exceptions probably isn't compatible with strict fp; we would need strictfp load intrinsics, and I don't think anyone wants to deal with that. If we do in fact forbid loads from raising an FP exception, every target needs to support some way to lower FP loads in a way that doesn't raise an exception.  And if we have that support anyway, we can just use it unconditionally.  (This should be possible on any target, at some cost to performance.)

Comment 8 Nuno Lopes 2020-03-12 05:47:53 PDT

Ok, give me some time and I'll compile a list of optimizations that are broken if we assume that specific NaN patterns get propagated.

Comment 9 Nuno Lopes 2020-03-30 09:04:28 PDT

Ok, I've implemented the semantics we discussed here in Alive2 for testing. This keeps all values in bit-vectors all the time, except when there's a float operation. Operands are converted from a bit-vector into float type, the operation is performed, and the result converted back to a bit-vector pattern.
The conversion from float->bit-vector produces all bit patterns for NaN.

The result is that there is only 1 regression in the LLVM test suite. On the other hand, only 2 tests get fixed.

The new test failure is this:
define i64 @All11(i64 %in) {
; CHECK-NEXT:    ret i64 0
  %out = and i64 %in, xor (i64 bitcast (<2 x float> bitcast (i64 -1 to <2 x float>) to i64), i64 -1)
  ret i64 %out
}

The problem for this test is that "bitcast (i64 -1 to <2 x float>)" shows up as llvm::ConstantFP, and thus Alive2's interpretation of the IR above is:
define i64 @All11(i64 %in) {
  %__constexpr_1 = bitcast <2 x float> { -nan, -nan } to i64
  %__constexpr_0 = xor i64 %__constexpr_1, -1
  %out = and i64 %in, %__constexpr_0
  ret i64 %out
}

So the -1 bit pattern disappears. So not sure there's a way to distinguish a bit-pattern that happens to represent NaN from a float NaN (which we interpret as any bit-pattern that represents NaN).

Anyway, assuming we reach a conclusion on what to do with this test, the question is then about the backends. Are all backends ok for LLVM to assume that read/write of data into float registers doesn't change the bits?
(this is not true in one of our chips, but I think we can live with this)

Comment 10 Eli Friedman 2020-03-30 10:17:14 PDT

Probably under any model where floating-point "operations" are special, bitcast shouldn't count as a floating-point operation; otherwise, we lose the equivalence between bitcast and store+load.

Again, I think the only in-tree target with non-bit-preserving load/store operations is 32-bit x86 (using x87 operations to load float/double values; oddly, long double load/store are bit-preserving).

Comment 11 Nuno Lopes 2020-04-09 02:36:08 PDT

There's a related test failure in Transforms/InstCombine/minmax-fold.ll:
define <4 x i32> @bitcasts_fcmp_1(<2 x i64> %a, <2 x i64> %b) {
  %t0 = bitcast <2 x i64> %a to <4 x float>
  %t1 = bitcast <2 x i64> %b to <4 x float>
  %t2 = fcmp olt <4 x i1> %t1, %t0
  %t3 = bitcast <2 x i64> %a to <4 x i32>
  %t4 = bitcast <2 x i64> %b to <4 x i32>
  %t5 = select <4 x i1> %t2, <4 x i32> %t3, <4 x i32> %t4
  ret <4 x i32> %t5
}
=>
define <4 x i32> @bitcasts_fcmp_1(<2 x i64> %a, <2 x i64> %b) {
  %t0 = bitcast <2 x i64> %a to <4 x float>
  %t1 = bitcast <2 x i64> %b to <4 x float>
  %t2 = fcmp olt <4 x i1> %t1, %t0
  %1 = select <4 x i1> %t2, <4 x float> %t0, <4 x float> %t1
  %t5 = bitcast <4 x float> %1 to <4 x i32>
  ret <4 x i32> %t5
}


In source, there's only a bitcast between integers, while in target there's bitcast of int->float->int.  If the bit representation in NaN, then this roundtrip may change the bits (e.g. canonicalize the NaN).

Comment 12 Ralf Jung 2020-06-15 23:55:01 PDT

In Rust, we are also struggling with what exactly LLVM's NaN semantics are: https://github.com/rust-lang/rust/issues/73328. Would be good to get a more precise documentation of those semantics -- though it seems that's still work-in-progress if I understand the discussion here correctly?

> I'm fairly sure I've seen instcombine rewrites that take advantage of NaN bits being don't care. That's why I implemented this semantics in Alive2.

What was the semantics you implemented first, i.e., how is it different from the adjusted one you described later?  Is it what Eli calls the "alternative model"?  So, float/double LLVM variables actually have a different range of possible values than a memory location, and conversion happens on load/store/bitcast?

> 2. Arithmetic operations that have a NaN as input produce an unspecified NaN as output.

Note that this means that these operations are non-deterministic! So, duplicating them would be an illegal transformation, similar to "freeze".  Is that something LLVM is treating properly?

(In the "alternative model", likewise stores/bitcasts would be non-deterministic and must not be duplicated.)

Comment 13 Nuno Lopes 2020-06-16 03:52:50 PDT

I see 2 models:
 1) when you move a value in/out of a float register, the CPU canonicalizes the NaN value, so the original bit pattern may not be preserved. This is the semantics implemented ATM in Alive. When a float is stored to memory or bit-casted to int, we allow all don't care NaN bits to be arbitrary. This allows loads to be duplicated (bits are arbitrary, but fixed). The reasoning is that CPUs may canonicalize the NaN bits, but they always do it in the same way.

 2) CPUs are not allowed to change the NaN bits, hence they preserve the bit-pattern of the input. The non-determinism is moved to the float operations: if they produce NaN, they make all their NaN bits non-deterministic.


Each has pros and cons, as usual. The first one is required if people care about such processors. The second one allows some optimizations like the ones shown in this bug report.

Comment 14 Ralf Jung 2020-06-16 10:13:43 PDT

> When a float is stored to memory or bit-casted to int, we allow all don't care NaN bits to be arbitrary. This allows loads to be duplicated (bits are arbitrary, but fixed). The reasoning is that CPUs may canonicalize the NaN bits, but they always do it in the same way.

(The store would be the problem with duplication here, not the load.)
Oh I see, so basically there is a fixed global parameter of the semantics which determines the NaN bit pattern? Or is it allowed to depend on some other factors?

The second one seems to disallow e.g. turning "float f = x+x; if ((int)f == (int)f)" into "if ((int)(x+x) == (int)(x+x))" as that would duplicate the non-determinism. This particular transformation likely makes little sense, but there might be other conditions under which recomputing a result could be beneficial.