
APFloat::fusedMultiplyAdd is broken, leading to incorrect constant folding for libc fmal calls. #21102

Closed
lhames opened this issue Aug 21, 2014 · 9 comments
Labels: bugzilla (Issues migrated from bugzilla)


lhames commented Aug 21, 2014

Bugzilla Link 20728
Resolution FIXED
Resolved on Nov 21, 2014 00:25
Version trunk
OS All
CC @adibiagio,@dexonsmith,@gregbedwell,@hfinkel,@gottesmm,@rotateright

Extended Description

APFloat::multiplySignificand generates incorrect results when called with Addend != 0 (i.e. from APFloat::fusedMultiplyAdd). This leads to incorrect results from libc fmal calls with constant arguments. I have tested this on X86, but I believe it will hit any architecture that uses APFloat to handle long double constants.

The output is only incorrect for some constants.
E.g. fmal(3.0L, 4.0L, 5.0L) == 1.0L instead of 17.0L.
However fmal(-3.0L, 4.0L, 5.0L) == -7.0L as expected.

Compiling the following program with 'clang -O3 -o foo foo.c' on X86 demonstrates the bug:

#include <math.h>
#include <stdio.h>

int main(void) {
  printf("%Lf\n", fmal(3.0L, 4.0L, 5.0L));
  return 0;
}

The misbehavior for the specific constants above was introduced by r181715. The commit message for that revision was: "Fix a bug that APFloat::fusedMultiplyAdd() mistakenly evaluate "14.5f * -14.5f + 225.0f" to 225.0f.". I have confirmed that rolling back to r181714 causes "fmal(3.0L, 4.0L, 5.0L)" to generate correct results, but breaks "14.5f * -14.5f + 225.0f".


lhames commented Aug 21, 2014

assigned to @lhames


lhames commented Sep 5, 2014

Reproduced locally. I'll investigate shortly.


lhames commented Sep 5, 2014

Totally the wrong bug. Disregard my last comment. :)


lhames commented Nov 15, 2014

APFloat::fusedMultiplyAdd failure case
Trivial test case derived from:

fmal(1.0L, 1.0L, 3.0L)

To see the failure run:

opt -constprop -S -o - testcase.ll

Note: the constant to be returned should be 4.0L (0xK40018000000000000000), but instead it's 0.0L (0xK00000000000000000000).


lhames commented Nov 17, 2014

I've tracked this down to APFloat::multiplySignificand. It's due to the way APFloat computes the number of "parts" (uint64 buckets) required to hold the significand. Most methods seem to reference APFloat::partCount() to determine part count:

unsigned int
APFloat::partCount() const
{
  return partCountForBits(semantics->precision + 1);
}

static inline unsigned int
partCountForBits(unsigned int bits)
{
  return ((bits) + integerPartWidth - 1) / integerPartWidth;
}

However, in multiplySignificand we have:

newPartsCount = partCountForBits(precision * 2);

For an fp80, which has 64 bits of significand, partCount() will return 3 during the FMA (the semantics are temporarily extended to hold the 128-bit intermediate product, so partCount() sees 128 + 1 bits), whereas multiplySignificand will compute newPartsCount = partCountForBits(128) = 2.

Both answers are small enough that for the duration of multiplySignificand, the significand is held in a 4 x uint64 scratch space, which is more than enough to never overflow.

In the block that adds the addend to the intermediate result (look for "if (addend && addend->isNonZero())" ...) we call

lost_fraction = addOrSubtractSignificand(extendedAddend, false);

Then addOrSubtractSignificand calls:

return APInt::tcAdd(parts, rhs.significandParts(), 0, partCount());

But now we're telling APInt to do an addition with a partCount of 3, rather than 2. For the above constants, fmal(1.0L, 1.0L, 3.0L), the result of the intermediate multiplication is 0x0.8p1, and we have 3.0L = 0x1.8p1, so the resulting significands are (both 128 bits wide):

    010...0
  + 110...0
This addition overflows into bit 128, but APInt::tcAdd doesn't treat this as an overflow, because we told it that these ints were 3 x 64 bits wide. On return to multiplySignificand, lost_fraction is therefore 0, but all subsequent calculations in multiplySignificand work with the internally calculated newPartsCount, so the overflowed bits are silently dropped.

So - that's what's going wrong. The next question is how it should be fixed. It seems very strange to me that so many internal calculations in APFloat are using an extra part. I tried dropping the "+ 1" in partCount(), i.e.:

unsigned int
APFloat::partCount() const
{
  return partCountForBits(semantics->precision);
}

This caused a huge number of regression failures, which seem to be predominantly caused by the use of default-initialized (empty) APFloats, which evidently still want at least one "part" allocated.

Switching to:

unsigned int
APFloat::partCount() const
{
  return partCountForBits(std::max(semantics->precision, 1U));
}

fixed most of these bugs, but there are still a few failures left.

If this is the right way forward (i.e. if APFloat should only be allocating enough bits to hold the significand and no more) then I just need to track down the remaining bugs. If, on the other hand, APFloat is supposed to allocate an extra part (for overflow?), then we need to teach multiplySignificand to be aware of that when doing the addition.

Any APFloat experts out there want to weigh in on which approach is the right one?


lhames commented Nov 19, 2014

Fix for APFloat::multiplySignificand
Attached is a patch that fixes the testcase I posted earlier. It passes all regression tests and the nightly test suite on x86.

The approach I took was to reserve an extra bit for the significand in multiplySignificand into which the add operation can overflow. (i.e. the intermediate significand is held in 2 * precision + 1 bits, where before it had only been 2 * precision bits).

This patch will be submitted to llvm-commits. Commentary should be made there, at least for now, in order to keep the discussion open to people who aren't specifically following this bug.


lhames commented Nov 19, 2014

Fixed in r222374.


llvmbot commented Nov 21, 2014

*** Bug llvm/llvm-bugzilla-archive#21623 has been marked as a duplicate of this bug. ***


llvmbot commented Nov 26, 2021

mentioned in issue llvm/llvm-bugzilla-archive#21623

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 9, 2021
This issue was closed.