LLVM API Documentation
//== llvm/Support/APFloat.h - Arbitrary Precision Floating Point -*- C++ -*-==//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file declares a class to represent arbitrary precision floating
// point values and provide a variety of arithmetic operations on them.
//
//===----------------------------------------------------------------------===//

/*  A self-contained host- and target-independent arbitrary-precision
    floating-point software implementation.  It uses bignum integer
    arithmetic as provided by static functions in the APInt class.
    The library will work with bignum integers whose parts are any
    unsigned type at least 16 bits wide, but 64 bits is recommended.

    Written for clarity rather than speed, in particular with a view
    to use in the front-end of a cross compiler so that target
    arithmetic can be correctly performed on the host.  Performance
    should nonetheless be reasonable, particularly for its intended
    use.  It may be useful as a base implementation for a run-time
    library during development of a faster target-specific one.

    All 5 rounding modes in the IEEE-754R draft are handled correctly
    for all implemented operations.  Currently implemented operations
    are add, subtract, multiply, divide, fused-multiply-add,
    conversion-to-float, conversion-to-integer and
    conversion-from-integer.  New rounding modes (e.g. away from zero)
    can be added with three or four lines of code.

    Four formats are built-in: IEEE single precision, double
    precision, quadruple precision, and x87 80-bit extended double
    (when operating with full extended precision).  Adding a new
    format that obeys IEEE semantics only requires adding two lines of
    code: a declaration and definition of the format.

    All operations return the status of that operation as an exception
    bit-mask, so multiple operations can be done consecutively with
    their results or-ed together.  The returned status can be useful
    for compiler diagnostics; e.g., inexact, underflow and overflow
    can be easily diagnosed on constant folding, and compiler
    optimizers can determine what exceptions would be raised by
    folding operations and optimize, or perhaps not optimize,
    accordingly.

    At present, underflow tininess is detected after rounding; it
    should be straightforward to add support for the before-rounding
    case too.

    The library reads hexadecimal floating point numbers as per C99,
    and correctly rounds if necessary according to the specified
    rounding mode.  Syntax is required to have been validated by the
    caller.  It also converts floating point numbers to hexadecimal
    text as per the C99 %a and %A conversions.  The output precision
    (or alternatively the natural minimal precision) can be specified;
    if the requested precision is less than the natural precision the
    output is correctly rounded for the specified rounding mode.

    It also reads decimal floating point numbers and correctly rounds
    according to the specified rounding mode.

    Conversion to decimal text is not currently implemented.

    Non-zero finite numbers are represented internally as a sign bit,
    a 16-bit signed exponent, and the significand as an array of
    integer parts.  After normalization of a number of precision P the
    exponent is within the range of the format, and if the number is
    not denormal the P-th bit of the significand is set as an explicit
    integer bit.  For denormals the most significant bit is shifted
    right so that the exponent is maintained at the format's minimum,
    so that the smallest denormal has just the least significant bit
    of the significand set.  The sign of zeroes and infinities is
    significant; the exponent and significand of such numbers are not
    stored, but have a known implicit (deterministic) value: 0 for the
    significands, 0 for zero exponent, all 1 bits for infinity
    exponent.  For NaNs the sign and significand are deterministic,
    although not really meaningful, and preserved in non-conversion
    operations.  The exponent is implicitly all 1 bits.

    TODO
    ====

    Some features that may or may not be worth adding:

    Binary to decimal conversion (hard).

    Optional ability to detect underflow tininess before rounding.

    New formats: x87 in single and double precision mode (IEEE apart
    from extended exponent range) (hard).

    New operations: sqrt, IEEE remainder, C90 fmod, nextafter,
    nexttoward.
*/

#ifndef LLVM_ADT_APFLOAT_H
#define LLVM_ADT_APFLOAT_H

// APInt contains static functions implementing bignum arithmetic.
#include "llvm/ADT/APInt.h"

namespace llvm {

  /* Exponents are stored as signed numbers.  */
  typedef signed short exponent_t;

  struct fltSemantics;
  class APSInt;
  class StringRef;

  /* When bits of a floating point number are truncated, this enum is
     used to indicate what fraction of the LSB those bits represented.
     It essentially combines the roles of guard and sticky bits.  */
  enum lostFraction {           // Example of truncated bits:
    lfExactlyZero,              // 000000
    lfLessThanHalf,             // 0xxxxx  x's not all zero
    lfExactlyHalf,              // 100000
    lfMoreThanHalf              // 1xxxxx  x's not all zero
  };

  class APFloat {
  public:

    /* We support the following floating point semantics.  */
    static const fltSemantics IEEEhalf;
    static const fltSemantics IEEEsingle;
    static const fltSemantics IEEEdouble;
    static const fltSemantics IEEEquad;
    static const fltSemantics PPCDoubleDouble;
    static const fltSemantics x87DoubleExtended;
    /* And this pseudo, used to construct APFloats that cannot
       conflict with anything real. */
    static const fltSemantics Bogus;

    static unsigned int semanticsPrecision(const fltSemantics &);

    /* Floating point numbers have a four-state comparison relation.  */
    enum cmpResult {
      cmpLessThan,
      cmpEqual,
      cmpGreaterThan,
      cmpUnordered
    };

    /* IEEE-754R gives five rounding modes.  */
    enum roundingMode {
      rmNearestTiesToEven,
      rmTowardPositive,
      rmTowardNegative,
      rmTowardZero,
      rmNearestTiesToAway
    };

    // Operation status.  opUnderflow or opOverflow are always returned
    // or-ed with opInexact.
    enum opStatus {
      opOK          = 0x00,
      opInvalidOp   = 0x01,
      opDivByZero   = 0x02,
      opOverflow    = 0x04,
      opUnderflow   = 0x08,
      opInexact     = 0x10
    };

    // Category of internally-represented number.
    enum fltCategory {
      fcInfinity,
      fcNaN,
      fcNormal,
      fcZero
    };

    enum uninitializedTag {
      uninitialized
    };

    // Constructors.
    APFloat(const fltSemantics &); // Default construct to 0.0
    APFloat(const fltSemantics &, StringRef);
    APFloat(const fltSemantics &, integerPart);
    APFloat(const fltSemantics &, fltCategory, bool negative);
    APFloat(const fltSemantics &, uninitializedTag);
    APFloat(const fltSemantics &, const APInt &);
    explicit APFloat(double d);
    explicit APFloat(float f);
    APFloat(const APFloat &);
    ~APFloat();

    // Convenience "constructors"
    static APFloat getZero(const fltSemantics &Sem, bool Negative = false) {
      return APFloat(Sem, fcZero, Negative);
    }
    static APFloat getInf(const fltSemantics &Sem, bool Negative = false) {
      return APFloat(Sem, fcInfinity, Negative);
    }

    /// getNaN - Factory for QNaN values.
    ///
    /// \param Negative - True iff the NaN generated should be negative.
    /// \param type - The unspecified fill bits for creating the NaN, 0 by
    /// default.  The value is truncated as necessary.
    static APFloat getNaN(const fltSemantics &Sem, bool Negative = false,
                          unsigned type = 0) {
      if (type) {
        APInt fill(64, type);
        return getQNaN(Sem, Negative, &fill);
      } else {
        return getQNaN(Sem, Negative, 0);
      }
    }

    /// getQNaN - Factory for QNaN values.
    static APFloat getQNaN(const fltSemantics &Sem, bool Negative = false,
                           const APInt *payload = 0) {
      return makeNaN(Sem, false, Negative, payload);
    }

    /// getSNaN - Factory for SNaN values.
    static APFloat getSNaN(const fltSemantics &Sem, bool Negative = false,
                           const APInt *payload = 0) {
      return makeNaN(Sem, true, Negative, payload);
    }

    /// getLargest - Returns the largest finite number in the given
    /// semantics.
    ///
    /// \param Negative - True iff the number should be negative
    static APFloat getLargest(const fltSemantics &Sem, bool Negative = false);

    /// getSmallest - Returns the smallest (by magnitude) finite number
    /// in the given semantics.  Might be denormalized, which implies a
    /// relative loss of precision.
    ///
    /// \param Negative - True iff the number should be negative
    static APFloat getSmallest(const fltSemantics &Sem, bool Negative = false);

    /// getSmallestNormalized - Returns the smallest (by magnitude)
    /// normalized finite number in the given semantics.
    ///
    /// \param Negative - True iff the number should be negative
    static APFloat getSmallestNormalized(const fltSemantics &Sem,
                                         bool Negative = false);

    /// getAllOnesValue - Returns a float which is bitcast from
    /// an all-ones integer value.
    ///
    /// \param BitWidth - Select float type
    /// \param isIEEE   - If 128 bit number, select between PPC and IEEE
    static APFloat getAllOnesValue(unsigned BitWidth, bool isIEEE = false);

    /// Profile - Used to insert APFloat objects, or objects that contain
    ///  APFloat objects, into FoldingSets.
    void Profile(FoldingSetNodeID &NID) const;

    /// @brief Used by the Bitcode serializer to emit APFloats to Bitcode.
    void Emit(Serializer &S) const;

    /// @brief Used by the Bitcode deserializer to deserialize APFloats.
    static APFloat ReadVal(Deserializer &D);

    /* Arithmetic.  */
    opStatus add(const APFloat &, roundingMode);
    opStatus subtract(const APFloat &, roundingMode);
    opStatus multiply(const APFloat &, roundingMode);
    opStatus divide(const APFloat &, roundingMode);
    /* IEEE remainder. */
    opStatus remainder(const APFloat &);
    /* C fmod, or llvm frem. */
    opStatus mod(const APFloat &, roundingMode);
    opStatus fusedMultiplyAdd(const APFloat &, const APFloat &, roundingMode);
    opStatus roundToIntegral(roundingMode);

    /* Sign operations.  */
    void changeSign();
    void clearSign();
    void copySign(const APFloat &);

    /* Conversions.  */
    opStatus convert(const fltSemantics &, roundingMode, bool *);
    opStatus convertToInteger(integerPart *, unsigned int, bool, roundingMode,
                              bool *) const;
    opStatus convertToInteger(APSInt &, roundingMode, bool *) const;
    opStatus convertFromAPInt(const APInt &, bool, roundingMode);
    opStatus convertFromSignExtendedInteger(const integerPart *, unsigned int,
                                            bool, roundingMode);
    opStatus convertFromZeroExtendedInteger(const integerPart *, unsigned int,
                                            bool, roundingMode);
    opStatus convertFromString(StringRef, roundingMode);
    APInt bitcastToAPInt() const;
    double convertToDouble() const;
    float convertToFloat() const;

    /* The definition of equality is not straightforward for floating point,
       so we won't use operator==.  Use one of the following, or write
       whatever it is you really mean. */
    bool operator==(const APFloat &) const LLVM_DELETED_FUNCTION;

    /* IEEE comparison with another floating point number (NaNs
       compare unordered, 0==-0). */
    cmpResult compare(const APFloat &) const;

    /* Bitwise comparison for equality (QNaNs compare equal, 0!=-0). */
    bool bitwiseIsEqual(const APFloat &) const;

    /* Write out a hexadecimal representation of the floating point
       value to DST, which must be of sufficient size, in the C99 form
       [-]0xh.hhhhp[+-]d.  Return the number of characters written,
       excluding the terminating NUL.  */
    unsigned int convertToHexString(char *dst, unsigned int hexDigits,
                                    bool upperCase, roundingMode) const;

    /* Simple queries.  */
    fltCategory getCategory() const { return category; }
    const fltSemantics &getSemantics() const { return *semantics; }
    bool isZero() const { return category == fcZero; }
    bool isNonZero() const { return category != fcZero; }
    bool isNormal() const { return category == fcNormal; }
    bool isNaN() const { return category == fcNaN; }
    bool isInfinity() const { return category == fcInfinity; }
    bool isNegative() const { return sign; }
    bool isPosZero() const { return isZero() && !isNegative(); }
    bool isNegZero() const { return isZero() && isNegative(); }
    bool isDenormal() const;

    APFloat &operator=(const APFloat &);

    /// \brief Overload to compute a hash code for an APFloat value.
    ///
    /// Note that the use of hash codes for floating point values is in general
    /// fraught with peril. Equality is hard to define for these values. For
    /// example, should negative and positive zero hash to different codes? Are
    /// they equal or not? This hash value implementation specifically
    /// emphasizes producing different codes for different inputs in order to
    /// be used in canonicalization and memoization. As such, equality is
    /// bitwiseIsEqual, and 0 != -0.
    friend hash_code hash_value(const APFloat &Arg);

    /// Converts this value into a decimal string.
    ///
    /// \param FormatPrecision The maximum number of digits of
    ///   precision to output.  If there are fewer digits available,
    ///   zero padding will not be used unless the value is
    ///   integral and small enough to be expressed in
    ///   FormatPrecision digits.  0 means to use the natural
    ///   precision of the number.
    /// \param FormatMaxPadding The maximum number of zeros to
    ///   consider inserting before falling back to scientific
    ///   notation.  0 means to always use scientific notation.
    ///
    /// Number       Precision    MaxPadding      Result
    /// ------       ---------    ----------      ------
    /// 1.01E+4              5             2       10100
    /// 1.01E+4              4             2       1.01E+4
    /// 1.01E+4              5             1       1.01E+4
    /// 1.01E-2              5             2       0.0101
    /// 1.01E-2              4             2       0.0101
    /// 1.01E-2              4             1       1.01E-2
    void toString(SmallVectorImpl<char> &Str, unsigned FormatPrecision = 0,
                  unsigned FormatMaxPadding = 3) const;

    /// getExactInverse - If this value has an exact multiplicative inverse,
    /// store it in inv and return true.
    bool getExactInverse(APFloat *inv) const;

  private:

    /* Trivial queries.  */
    integerPart *significandParts();
    const integerPart *significandParts() const;
    unsigned int partCount() const;

    /* Significand operations.  */
    integerPart addSignificand(const APFloat &);
    integerPart subtractSignificand(const APFloat &, integerPart);
    lostFraction addOrSubtractSignificand(const APFloat &, bool subtract);
    lostFraction multiplySignificand(const APFloat &, const APFloat *);
    lostFraction divideSignificand(const APFloat &);
    void incrementSignificand();
    void initialize(const fltSemantics *);
    void shiftSignificandLeft(unsigned int);
    lostFraction shiftSignificandRight(unsigned int);
    unsigned int significandLSB() const;
    unsigned int significandMSB() const;
    void zeroSignificand();

    /* Arithmetic on special values.  */
    opStatus addOrSubtractSpecials(const APFloat &, bool subtract);
    opStatus divideSpecials(const APFloat &);
    opStatus multiplySpecials(const APFloat &);
    opStatus modSpecials(const APFloat &);

    /* Miscellany.  */
    static APFloat makeNaN(const fltSemantics &Sem, bool SNaN, bool Negative,
                           const APInt *fill);
    void makeNaN(bool SNaN = false, bool Neg = false, const APInt *fill = 0);
    opStatus normalize(roundingMode, lostFraction);
    opStatus addOrSubtract(const APFloat &, roundingMode, bool subtract);
    cmpResult compareAbsoluteValue(const APFloat &) const;
    opStatus handleOverflow(roundingMode);
    bool roundAwayFromZero(roundingMode, lostFraction, unsigned int) const;
    opStatus convertToSignExtendedInteger(integerPart *, unsigned int, bool,
                                          roundingMode, bool *) const;
    opStatus convertFromUnsignedParts(const integerPart *, unsigned int,
                                      roundingMode);
    opStatus convertFromHexadecimalString(StringRef, roundingMode);
    opStatus convertFromDecimalString(StringRef, roundingMode);
    char *convertNormalToHexString(char *, unsigned int, bool,
                                   roundingMode) const;
    opStatus roundSignificandWithExponent(const integerPart *, unsigned int, int,
                                          roundingMode);

    APInt convertHalfAPFloatToAPInt() const;
    APInt convertFloatAPFloatToAPInt() const;
    APInt convertDoubleAPFloatToAPInt() const;
    APInt convertQuadrupleAPFloatToAPInt() const;
    APInt convertF80LongDoubleAPFloatToAPInt() const;
    APInt convertPPCDoubleDoubleAPFloatToAPInt() const;
    void initFromAPInt(const fltSemantics *Sem, const APInt &api);
    void initFromHalfAPInt(const APInt &api);
    void initFromFloatAPInt(const APInt &api);
    void initFromDoubleAPInt(const APInt &api);
    void initFromQuadrupleAPInt(const APInt &api);
    void initFromF80LongDoubleAPInt(const APInt &api);
    void initFromPPCDoubleDoubleAPInt(const APInt &api);

    void assign(const APFloat &);
    void copySignificand(const APFloat &);
    void freeSignificand();

    /* What kind of semantics does this value obey?  */
    const fltSemantics *semantics;

    /* Significand - the fraction with an explicit integer bit.  Must be
       at least one bit wider than the target precision.  */
    union Significand {
      integerPart part;
      integerPart *parts;
    } significand;

    /* The exponent - a signed number.  */
    exponent_t exponent;

    /* What kind of floating point number this is.  */
    /* Only 2 bits are required, but VisualStudio incorrectly sign extends
       it.  Using the extra bit keeps it from failing under VisualStudio.  */
    fltCategory category : 3;

    /* The sign bit of this number.  */
    unsigned int sign : 1;
  };

  // See friend declaration above. This additional declaration is required in
  // order to compile LLVM with IBM xlC compiler.
  hash_code hash_value(const APFloat &Arg);
} /* namespace llvm */

#endif /* LLVM_ADT_APFLOAT_H */
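
The lostFraction enum and the five rounding modes declared in this header meet whenever a significand is shifted right and the dropped bits have to be rounded. The block below is a minimal standalone sketch of that interaction on a plain 64-bit integer. It is not the library's implementation: the enums are local copies so the block compiles without LLVM, and the names LostFraction, RoundingMode, droppedBits, halfUnit, classifyDropped, roundsAwayFromZero and shiftRightRounded are hypothetical. It assumes a shift count below 64.

```cpp
#include <cstdint>

// Local stand-ins for the enumerators declared in the header (assumption:
// copied here only so that this sketch is self-contained).
enum LostFraction { lfExactlyZero, lfLessThanHalf, lfExactlyHalf, lfMoreThanHalf };
enum RoundingMode {
  rmNearestTiesToEven, rmTowardPositive, rmTowardNegative,
  rmTowardZero, rmNearestTiesToAway
};

// The low `bits` bits of `value`, i.e. what a right shift by `bits` drops.
constexpr std::uint64_t droppedBits(std::uint64_t value, unsigned bits) {
  return bits == 0 ? 0 : value & ((std::uint64_t(1) << bits) - 1);
}

// Half of the new least significant bit, expressed in dropped-bit units.
constexpr std::uint64_t halfUnit(unsigned bits) {
  return bits == 0 ? 0 : std::uint64_t(1) << (bits - 1);
}

// Classify the dropped bits as a fraction of the new LSB.
constexpr LostFraction classifyDropped(std::uint64_t value, unsigned bits) {
  return droppedBits(value, bits) == 0               ? lfExactlyZero
       : droppedBits(value, bits) < halfUnit(bits)   ? lfLessThanHalf
       : droppedBits(value, bits) == halfUnit(bits)  ? lfExactlyHalf
                                                     : lfMoreThanHalf;
}

// Decide whether the truncated magnitude must be incremented.
constexpr bool roundsAwayFromZero(RoundingMode mode, LostFraction lost,
                                  bool negative, bool lsbIsOdd) {
  return lost == lfExactlyZero ? false   // exact result: nothing to round
       : mode == rmNearestTiesToEven
           ? (lost == lfMoreThanHalf || (lost == lfExactlyHalf && lsbIsOdd))
       : mode == rmNearestTiesToAway
           ? (lost == lfExactlyHalf || lost == lfMoreThanHalf)
       : mode == rmTowardPositive ? !negative
       : mode == rmTowardNegative ? negative
                                  : false;  // rmTowardZero: always truncate
}

// Shift a magnitude right by `bits` and round it under `mode`.
constexpr std::uint64_t shiftRightRounded(std::uint64_t value, unsigned bits,
                                          RoundingMode mode, bool negative) {
  return (value >> bits) +
         (roundsAwayFromZero(mode, classifyDropped(value, bits), negative,
                             ((value >> bits) & 1) != 0) ? 1 : 0);
}

// 44 is binary 101100: shifting by 3 drops 100, exactly half an LSB.
static_assert(classifyDropped(44, 3) == lfExactlyHalf, "dropped bits are 100");
// The truncated value 5 is odd, so ties-to-even rounds up to 6.
static_assert(shiftRightRounded(44, 3, rmNearestTiesToEven, false) == 6,
              "tie goes to the even neighbour");
```

Keeping the classification separate from the rounding decision mirrors the split the header suggests, where shiftSignificandRight returns a lostFraction and roundAwayFromZero consumes one. A new rounding mode then only needs one more branch in the decision function.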