LLVM API Documentation

APFloat.h
00001 //== llvm/ADT/APFloat.h - Arbitrary Precision Floating Point -----*- C++ -*-==//
00002 //
00003 //                     The LLVM Compiler Infrastructure
00004 //
00005 // This file is distributed under the University of Illinois Open Source
00006 // License. See LICENSE.TXT for details.
00007 //
00008 //===----------------------------------------------------------------------===//
00009 //
00010 // This file declares a class to represent arbitrary precision floating
00011 // point values and provide a variety of arithmetic operations on them.
00012 //
00013 //===----------------------------------------------------------------------===//
00014 
00015 /*  A self-contained host- and target-independent arbitrary-precision
00016     floating-point software implementation.  It uses bignum integer
00017     arithmetic as provided by static functions in the APInt class.
00018     The library will work with bignum integers whose parts are any
00019     unsigned type at least 16 bits wide, but 64 bits is recommended.
00020 
00021     Written for clarity rather than speed, in particular with a view
00022     to use in the front-end of a cross compiler so that target
00023     arithmetic can be correctly performed on the host.  Performance
00024     should nonetheless be reasonable, particularly for its intended
00025     use.  It may be useful as a base implementation for a run-time
00026     library during development of a faster target-specific one.
00027 
00028     All 5 rounding modes in the IEEE-754R draft are handled correctly
00029     for all implemented operations.  Currently implemented operations
00030     are add, subtract, multiply, divide, IEEE remainder, fmod,
00031     fused-multiply-add, round-to-integral, conversion-to-float,
00032     conversion-to-integer and conversion-from-integer.  New rounding
00033     modes (e.g. away from zero) need only three or four lines of code.
00034 
00035     Six formats are built-in: IEEE half, single, double and quadruple
00036     precision, x87 80-bit extended double (when operating with full
00037     extended precision), and PowerPC double-double.  Adding a new
00038     format that obeys IEEE semantics only requires adding two lines of
00039     code: a declaration and definition of the format.
00040 
00041     All operations return the status of that operation as an exception
00042     bit-mask, so multiple operations can be done consecutively with
00043     their results or-ed together.  The returned status can be useful
00044     for compiler diagnostics; e.g., inexact, underflow and overflow
00045     can be easily diagnosed on constant folding, and compiler
00046     optimizers can determine what exceptions would be raised by
00047     folding operations and optimize, or perhaps not optimize,
00048     accordingly.
00049 
00050     At present, underflow tininess is detected after rounding; it
00051     should be straightforward to add support for the before-rounding
00052     case too.
00053 
00054     The library reads hexadecimal floating point numbers as per C99,
00055     and correctly rounds if necessary according to the specified
00056     rounding mode.  Syntax is required to have been validated by the
00057     caller.  It also converts floating point numbers to hexadecimal
00058     text as per the C99 %a and %A conversions.  The output precision
00059     (or alternatively the natural minimal precision) can be specified;
00060     if the requested precision is less than the natural precision the
00061     output is correctly rounded for the specified rounding mode.
00062 
00063     It also reads decimal floating point numbers and correctly rounds
00064     according to the specified rounding mode.
00065 
00066     Conversion to decimal text is available through toString().
00067 
00068     Non-zero finite numbers are represented internally as a sign bit,
00069     a 16-bit signed exponent, and the significand as an array of
00070     integer parts.  After normalization of a number of precision P the
00071     exponent is within the range of the format, and if the number is
00072     not denormal the P-th bit of the significand is set as an explicit
00073     integer bit.  For denormals the most significant bit is shifted
00074     right so that the exponent is maintained at the format's minimum,
00075     so that the smallest denormal has just the least significant bit
00076     of the significand set.  The sign of zeroes and infinities is
00077     significant; the exponent and significand of such numbers are not
00078     stored, but have known implicit (deterministic) values: 0 for the
00079     significands, 0 for zero exponent, all 1 bits for infinity
00080     exponent.  For NaNs the sign and significand are deterministic,
00081     although not really meaningful, and preserved in non-conversion
00082     operations.  The exponent is implicitly all 1 bits.
00083 
00084     TODO
00085     ====
00086 
00087     Some features that may or may not be worth adding:
00088 
00089     Binary to decimal conversion (hard).
00090 
00091     Optional ability to detect underflow tininess before rounding.
00092 
00093     New formats: x87 in single and double precision mode (IEEE apart
00094     from extended exponent range) (hard).
00095 
00096     New operations: sqrt, nextafter, nexttoward.  (IEEE remainder and
00097     C fmod are already declared as remainder() and mod().)
00098 */
00099 
00100 #ifndef LLVM_ADT_APFLOAT_H
00101 #define LLVM_ADT_APFLOAT_H
00102 
00103 // APInt contains static functions implementing bignum arithmetic.
00104 #include "llvm/ADT/APInt.h"
00105 
00106 namespace llvm {
00107 
00108 /* Exponents are stored as signed numbers.  */
00109 typedef signed short exponent_t;
00110 
00111 struct fltSemantics;
00112 class APSInt;
00113 class StringRef;
00114 
00115 /* When bits of a floating point number are truncated, this enum is
00116    used to indicate what fraction of the LSB those bits represented.
00117    It essentially combines the roles of guard and sticky bits.  */
00118 enum lostFraction { // Example of truncated bits:
00119   lfExactlyZero,    // 000000
00120   lfLessThanHalf,   // 0xxxxx  x's not all zero
00121   lfExactlyHalf,    // 100000
00122   lfMoreThanHalf    // 1xxxxx  x's not all zero
00123 };
00124 
00125 class APFloat {
00126 public:
00127 
00128   /* We support the following floating point semantics.  */
00129   static const fltSemantics IEEEhalf;
00130   static const fltSemantics IEEEsingle;
00131   static const fltSemantics IEEEdouble;
00132   static const fltSemantics IEEEquad;
00133   static const fltSemantics PPCDoubleDouble;
00134   static const fltSemantics x87DoubleExtended;
00135   /* And this pseudo-format, used to construct APFloats that cannot
00136      conflict with anything real. */
00137   static const fltSemantics Bogus;
00138 
00139   static unsigned int semanticsPrecision(const fltSemantics &);
00140 
00141   /* Floating point numbers have a four-state comparison relation.  */
00142   enum cmpResult {
00143     cmpLessThan,
00144     cmpEqual,
00145     cmpGreaterThan,
00146     cmpUnordered
00147   };
00148 
00149   /* IEEE-754R gives five rounding modes.  */
00150   enum roundingMode {
00151     rmNearestTiesToEven,
00152     rmTowardPositive,
00153     rmTowardNegative,
00154     rmTowardZero,
00155     rmNearestTiesToAway
00156   };
00157 
00158   // Operation status.  opUnderflow or opOverflow are always returned
00159   // or-ed with opInexact.
00160   enum opStatus {
00161     opOK = 0x00,
00162     opInvalidOp = 0x01,
00163     opDivByZero = 0x02,
00164     opOverflow = 0x04,
00165     opUnderflow = 0x08,
00166     opInexact = 0x10
00167   };
00168 
00169   // Category of internally-represented number.
00170   enum fltCategory {
00171     fcInfinity,
00172     fcNaN,
00173     fcNormal,
00174     fcZero
00175   };
00176 
00177   enum uninitializedTag {
00178     uninitialized
00179   };
00180 
00181   // Constructors.
00182   APFloat(const fltSemantics &); // Default construct to 0.0
00183   APFloat(const fltSemantics &, StringRef);
00184   APFloat(const fltSemantics &, integerPart);
00185   APFloat(const fltSemantics &, fltCategory, bool negative);
00186   APFloat(const fltSemantics &, uninitializedTag);
00187   APFloat(const fltSemantics &, const APInt &);
00188   explicit APFloat(double d);
00189   explicit APFloat(float f);
00190   APFloat(const APFloat &);
00191   ~APFloat();
00192 
00193   // Convenience "constructors"
00194   static APFloat getZero(const fltSemantics &Sem, bool Negative = false) {
00195     return APFloat(Sem, fcZero, Negative);
00196   }
00197   static APFloat getInf(const fltSemantics &Sem, bool Negative = false) {
00198     return APFloat(Sem, fcInfinity, Negative);
00199   }
00200 
00201   /// getNaN - Factory for QNaN values.
00202   ///
00203   /// \param Negative - True iff the NaN generated should be negative.
00204   /// \param type - The unspecified fill bits for creating the NaN, 0 by
00205   /// default.  The value is truncated as necessary.
00206   static APFloat getNaN(const fltSemantics &Sem, bool Negative = false,
00207                         unsigned type = 0) {
00208     if (type) {
00209       APInt fill(64, type);
00210       return getQNaN(Sem, Negative, &fill);
00211     } else {
00212       return getQNaN(Sem, Negative, 0);
00213     }
00214   }
00215 
00216   /// getQNaN - Factory for QNaN values.
00217   static APFloat getQNaN(const fltSemantics &Sem, bool Negative = false,
00218                          const APInt *payload = 0) {
00219     return makeNaN(Sem, false, Negative, payload);
00220   }
00221 
00222   /// getSNaN - Factory for SNaN values.
00223   static APFloat getSNaN(const fltSemantics &Sem, bool Negative = false,
00224                          const APInt *payload = 0) {
00225     return makeNaN(Sem, true, Negative, payload);
00226   }
00227 
00228   /// getLargest - Returns the largest finite number in the given
00229   /// semantics.
00230   ///
00231   /// \param Negative - True iff the number should be negative
00232   static APFloat getLargest(const fltSemantics &Sem, bool Negative = false);
00233 
00234   /// getSmallest - Returns the smallest (by magnitude) finite number
00235   /// in the given semantics.  Might be denormalized, which implies a
00236   /// relative loss of precision.
00237   ///
00238   /// \param Negative - True iff the number should be negative
00239   static APFloat getSmallest(const fltSemantics &Sem, bool Negative = false);
00240 
00241   /// getSmallestNormalized - Returns the smallest (by magnitude)
00242   /// normalized finite number in the given semantics.
00243   ///
00244   /// \param Negative - True iff the number should be negative
00245   static APFloat getSmallestNormalized(const fltSemantics &Sem,
00246                                        bool Negative = false);
00247 
00248   /// getAllOnesValue - Returns a float that is bitcast from an
00249   /// all-ones integer value.
00250   ///
00251   /// \param BitWidth - Select float type
00252   /// \param isIEEE   - If 128 bit number, select between PPC and IEEE
00253   static APFloat getAllOnesValue(unsigned BitWidth, bool isIEEE = false);
00254 
00255   /// Profile - Used to insert APFloat objects, or objects that contain
00256   ///  APFloat objects, into FoldingSets.
00257   void Profile(FoldingSetNodeID &NID) const;
00258 
00259   /// @brief Used by the Bitcode serializer to emit APFloats to Bitcode.
00260   void Emit(Serializer &S) const;
00261 
00262   /// @brief Used by the Bitcode deserializer to deserialize APFloats.
00263   static APFloat ReadVal(Deserializer &D);
00264 
00265   /* Arithmetic.  */
00266   opStatus add(const APFloat &, roundingMode);
00267   opStatus subtract(const APFloat &, roundingMode);
00268   opStatus multiply(const APFloat &, roundingMode);
00269   opStatus divide(const APFloat &, roundingMode);
00270   /* IEEE remainder. */
00271   opStatus remainder(const APFloat &);
00272   /* C fmod, or llvm frem. */
00273   opStatus mod(const APFloat &, roundingMode);
00274   opStatus fusedMultiplyAdd(const APFloat &, const APFloat &, roundingMode);
00275   opStatus roundToIntegral(roundingMode);
00276 
00277   /* Sign operations.  */
00278   void changeSign();
00279   void clearSign();
00280   void copySign(const APFloat &);
00281 
00282   /* Conversions.  */
00283   opStatus convert(const fltSemantics &, roundingMode, bool *);
00284   opStatus convertToInteger(integerPart *, unsigned int, bool, roundingMode,
00285                             bool *) const;
00286   opStatus convertToInteger(APSInt &, roundingMode, bool *) const;
00287   opStatus convertFromAPInt(const APInt &, bool, roundingMode);
00288   opStatus convertFromSignExtendedInteger(const integerPart *, unsigned int,
00289                                           bool, roundingMode);
00290   opStatus convertFromZeroExtendedInteger(const integerPart *, unsigned int,
00291                                           bool, roundingMode);
00292   opStatus convertFromString(StringRef, roundingMode);
00293   APInt bitcastToAPInt() const;
00294   double convertToDouble() const;
00295   float convertToFloat() const;
00296 
00297   /* The definition of equality is not straightforward for floating point,
00298      so we won't use operator==.  Use one of the following, or write
00299      whatever it is you really mean. */
00300   bool operator==(const APFloat &) const LLVM_DELETED_FUNCTION;
00301 
00302   /* IEEE comparison with another floating point number (NaNs
00303      compare unordered, 0==-0). */
00304   cmpResult compare(const APFloat &) const;
00305 
00306   /* Bitwise comparison for equality (QNaNs compare equal, 0!=-0). */
00307   bool bitwiseIsEqual(const APFloat &) const;
00308 
00309   /* Write out a hexadecimal representation of the floating point
00310      value to DST, which must be of sufficient size, in the C99 form
00311      [-]0xh.hhhhp[+-]d.  Return the number of characters written,
00312      excluding the terminating NUL.  */
00313   unsigned int convertToHexString(char *dst, unsigned int hexDigits,
00314                                   bool upperCase, roundingMode) const;
00315 
00316   /* Simple queries.  */
00317   fltCategory getCategory() const { return category; }
00318   const fltSemantics &getSemantics() const { return *semantics; }
00319   bool isZero() const { return category == fcZero; }
00320   bool isNonZero() const { return category != fcZero; }
00321   bool isNormal() const { return category == fcNormal; }
00322   bool isNaN() const { return category == fcNaN; }
00323   bool isInfinity() const { return category == fcInfinity; }
00324   bool isNegative() const { return sign; }
00325   bool isPosZero() const { return isZero() && !isNegative(); }
00326   bool isNegZero() const { return isZero() && isNegative(); }
00327   bool isDenormal() const;
00328 
00329   APFloat &operator=(const APFloat &);
00330 
00331   /// \brief Overload to compute a hash code for an APFloat value.
00332   ///
00333   /// Note that the use of hash codes for floating point values is in general
00334   /// fraught with peril. Equality is hard to define for these values. For
00335   /// example, should negative and positive zero hash to different codes? Are
00336   /// they equal or not? This hash value implementation specifically
00337   /// emphasizes producing different codes for different inputs in order to
00338   /// be used in canonicalization and memoization. As such, equality is
00339   /// bitwiseIsEqual, and 0 != -0.
00340   friend hash_code hash_value(const APFloat &Arg);
00341 
00342   /// Converts this value into a decimal string.
00343   ///
00344   /// \param FormatPrecision The maximum number of digits of
00345   ///   precision to output.  If there are fewer digits available,
00346   ///   zero padding will not be used unless the value is
00347   ///   integral and small enough to be expressed in
00348   ///   FormatPrecision digits.  0 means to use the natural
00349   ///   precision of the number.
00350   /// \param FormatMaxPadding The maximum number of zeros to
00351   ///   consider inserting before falling back to scientific
00352   ///   notation.  0 means to always use scientific notation.
00353   ///
00354   /// Number       Precision    MaxPadding      Result
00355   /// ------       ---------    ----------      ------
00356   /// 1.01E+4              5             2       10100
00357   /// 1.01E+4              4             2       1.01E+4
00358   /// 1.01E+4              5             1       1.01E+4
00359   /// 1.01E-2              5             2       0.0101
00360   /// 1.01E-2              4             2       0.0101
00361   /// 1.01E-2              4             1       1.01E-2
00362   void toString(SmallVectorImpl<char> &Str, unsigned FormatPrecision = 0,
00363                 unsigned FormatMaxPadding = 3) const;
00364 
00365   /// getExactInverse - If this value has an exact multiplicative inverse,
00366   /// store it in inv and return true.
00367   bool getExactInverse(APFloat *inv) const;
00368 
00369 private:
00370 
00371   /* Trivial queries.  */
00372   integerPart *significandParts();
00373   const integerPart *significandParts() const;
00374   unsigned int partCount() const;
00375 
00376   /* Significand operations.  */
00377   integerPart addSignificand(const APFloat &);
00378   integerPart subtractSignificand(const APFloat &, integerPart);
00379   lostFraction addOrSubtractSignificand(const APFloat &, bool subtract);
00380   lostFraction multiplySignificand(const APFloat &, const APFloat *);
00381   lostFraction divideSignificand(const APFloat &);
00382   void incrementSignificand();
00383   void initialize(const fltSemantics *);
00384   void shiftSignificandLeft(unsigned int);
00385   lostFraction shiftSignificandRight(unsigned int);
00386   unsigned int significandLSB() const;
00387   unsigned int significandMSB() const;
00388   void zeroSignificand();
00389 
00390   /* Arithmetic on special values.  */
00391   opStatus addOrSubtractSpecials(const APFloat &, bool subtract);
00392   opStatus divideSpecials(const APFloat &);
00393   opStatus multiplySpecials(const APFloat &);
00394   opStatus modSpecials(const APFloat &);
00395 
00396   /* Miscellany.  */
00397   static APFloat makeNaN(const fltSemantics &Sem, bool SNaN, bool Negative,
00398                          const APInt *fill);
00399   void makeNaN(bool SNaN = false, bool Neg = false, const APInt *fill = 0);
00400   opStatus normalize(roundingMode, lostFraction);
00401   opStatus addOrSubtract(const APFloat &, roundingMode, bool subtract);
00402   cmpResult compareAbsoluteValue(const APFloat &) const;
00403   opStatus handleOverflow(roundingMode);
00404   bool roundAwayFromZero(roundingMode, lostFraction, unsigned int) const;
00405   opStatus convertToSignExtendedInteger(integerPart *, unsigned int, bool,
00406                                         roundingMode, bool *) const;
00407   opStatus convertFromUnsignedParts(const integerPart *, unsigned int,
00408                                     roundingMode);
00409   opStatus convertFromHexadecimalString(StringRef, roundingMode);
00410   opStatus convertFromDecimalString(StringRef, roundingMode);
00411   char *convertNormalToHexString(char *, unsigned int, bool,
00412                                  roundingMode) const;
00413   opStatus roundSignificandWithExponent(const integerPart *, unsigned int, int,
00414                                         roundingMode);
00415 
00416   APInt convertHalfAPFloatToAPInt() const;
00417   APInt convertFloatAPFloatToAPInt() const;
00418   APInt convertDoubleAPFloatToAPInt() const;
00419   APInt convertQuadrupleAPFloatToAPInt() const;
00420   APInt convertF80LongDoubleAPFloatToAPInt() const;
00421   APInt convertPPCDoubleDoubleAPFloatToAPInt() const;
00422   void initFromAPInt(const fltSemantics *Sem, const APInt &api);
00423   void initFromHalfAPInt(const APInt &api);
00424   void initFromFloatAPInt(const APInt &api);
00425   void initFromDoubleAPInt(const APInt &api);
00426   void initFromQuadrupleAPInt(const APInt &api);
00427   void initFromF80LongDoubleAPInt(const APInt &api);
00428   void initFromPPCDoubleDoubleAPInt(const APInt &api);
00429 
00430   void assign(const APFloat &);
00431   void copySignificand(const APFloat &);
00432   void freeSignificand();
00433 
00434   /* What kind of semantics does this value obey?  */
00435   const fltSemantics *semantics;
00436 
00437   /* Significand - the fraction with an explicit integer bit.  Must be
00438      at least one bit wider than the target precision.  */
00439   union Significand {
00440     integerPart part;
00441     integerPart *parts;
00442   } significand;
00443 
00444   /* The exponent - a signed number.  */
00445   exponent_t exponent;
00446 
00447   /* What kind of floating point number this is.  */
00448   /* Only 2 bits are required, but Visual Studio incorrectly sign extends
00449      it.  Using the extra bit keeps it from failing under Visual Studio.  */
00450   fltCategory category : 3;
00451 
00452   /* The sign bit of this number.  */
00453   unsigned int sign : 1;
00454 };
00455 
00456 // See friend declaration above. This additional declaration is required in
00457 // order to compile LLVM with IBM xlC compiler.
00458 hash_code hash_value(const APFloat &Arg);
00459 } /* namespace llvm */
00460 
00461 #endif /* LLVM_ADT_APFLOAT_H */
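
The header comments describe two ideas that a small standalone example may make easier to follow: how truncated bits are summarized by lostFraction, and how each operation reports its outcome as an or-able opStatus mask. The sketch below is not the APFloat implementation. It copies the three enums from the listing so that it compiles without LLVM, and the helpers classifyLostBits, shouldRoundAway and shiftAndRound are hypothetical names invented for this illustration, operating on a plain 64-bit magnitude rather than a multi-part significand.

```cpp
#include <cassert>
#include <cstdint>

// Local copies of the enums from the listing, so the sketch needs no LLVM.
enum lostFraction { lfExactlyZero, lfLessThanHalf, lfExactlyHalf, lfMoreThanHalf };
enum roundingMode { rmNearestTiesToEven, rmTowardPositive, rmTowardNegative,
                    rmTowardZero, rmNearestTiesToAway };
enum opStatus { opOK = 0x00, opInvalidOp = 0x01, opDivByZero = 0x02,
                opOverflow = 0x04, opUnderflow = 0x08, opInexact = 0x10 };

// Hypothetical helper: classify the low `bits` bits of `value` that a right
// shift would discard.  Assumes 0 < bits < 64.
lostFraction classifyLostBits(uint64_t value, unsigned bits) {
  uint64_t lost = value & ((uint64_t(1) << bits) - 1);
  uint64_t half = uint64_t(1) << (bits - 1);
  if (lost == 0)
    return lfExactlyZero;
  if (lost < half)
    return lfLessThanHalf;
  if (lost == half)
    return lfExactlyHalf;
  return lfMoreThanHalf;
}

// Hypothetical helper: should the truncated magnitude be incremented?
// `lsbSet` is the least significant bit of the magnitude after truncation.
bool shouldRoundAway(roundingMode rm, lostFraction lf, bool negative,
                     bool lsbSet) {
  if (lf == lfExactlyZero)
    return false; // Nothing was lost, so the result is already exact.
  switch (rm) {
  case rmNearestTiesToAway:
    return lf == lfExactlyHalf || lf == lfMoreThanHalf;
  case rmNearestTiesToEven:
    if (lf == lfMoreThanHalf)
      return true;
    return lf == lfExactlyHalf && lsbSet; // Break ties toward an even LSB.
  case rmTowardZero:
    return false;
  case rmTowardPositive:
    return !negative;
  case rmTowardNegative:
    return negative;
  }
  return false;
}

// Hypothetical helper: drop `bits` low bits from a sign-magnitude value,
// round according to `rm`, and report opInexact when any set bit was lost.
opStatus shiftAndRound(uint64_t &magnitude, unsigned bits, bool negative,
                       roundingMode rm) {
  lostFraction lf = classifyLostBits(magnitude, bits);
  magnitude >>= bits;
  if (shouldRoundAway(rm, lf, negative, (magnitude & 1) != 0))
    ++magnitude;
  return lf == lfExactlyZero ? opOK : opInexact;
}
```

A caller would typically start from an int initialized to opOK, or in the result of each call, and then test individual bits such as opInexact once a sequence of operations is complete, which is the usage pattern the header comment describes for constant folding.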