1043 – Arbitrary bitwidth integers

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 1043 - Arbitrary bitwidth integers

Summary: Arbitrary bitwidth integers

Status:	RESOLVED FIXED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Core LLVM classes (show other bugs)
Version:	trunk
Hardware:	All All

Importance:	P enhancement
Assignee:	Sheng Zhou

URL:
Keywords:	new-feature

Depends on:	1120
Blocks:
	Show dependency tree

Reported:	2006-12-11 02:43 PST by Sheng Zhou
Modified:	2010-02-22 12:46 PST (History)
CC List:	4 users (show)

See Also:
Fixed By Commit(s):

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Sheng Zhou 2006-12-11 02:43:06 PST

This PR is a follow-up of pr950. It extends the LLVM core libraries
to accept Bit Accurate Types (BATs) into the IR. The BAT is a extension to
integral types. It supports not only traditional integral types like
BoolTy, ByteTy, IntTy, and ShortTy, LongTy (these common cases will be 
renamed into Int1Ty, Int8Ty like), but also non-byte-width BATs
like Int13. The LLVM core libraries will include BATs without
modifying the existing types except rename. To do this, a subclass of 
Type, saying IntType will be added into LLVM to represent BATs. Bytecode 
Reader and LLVM Assembly parser will also be updated to handle BATs. 

Since this is a fairly extensive change to LLVM, this bug will be used
to trackthe progress of the work as it proceeds.

Comment 1 Sheng Zhou 2006-12-11 03:22:11 PST

Basic Work Plan:

Each item below will be considered an iteration. An iteration consists of making
a small(ish) change across llvm and llvm-gcc4 and testing it with llvm-test and
deja-gnu. Development of new test cases will also occur as the instructions are
modified.

The iterations planned are the following (in roughly this order):

1. Write an IntType class that is a subclass of Type and add it to
include/llvm/Type.h. 
      This class will represent the BATs and must have at least the following
methods:
      IntType* get(uint16_t numbits); // Returns an IntType for "numbits" bits
      unsigned getNumBits() const;    // Returns the number of bits in the type
2. Convert the static type instances in the Type class (BoolTy, ByteTy, IntTy,
ShortTy, 
      LongTy) into corresponding instances of IntType, like:
      static Type *BoolTy;   ->   static IntType *Int1Ty;
3. Revise the Bytecode reader and writer to handle integer types of any bit
width. The 
      common cases (8,16,32,and 64 bits) will retain their (low-order) values in
the type 
      index. Non-byte-width BATs will be assigned unique type identifiers based
on their 
      bit width. This retains the compaction of the common case values.
4. Update the bytecode documentation to reflect the new bytecode layout.
5. Revise the LLVM Assembly parser to handle integer types of any bit width. 
      The assembler will support a new notation for specifying integer types
using a regex 
      pattern of i[09]+ so that i4, i7, i32, i64 will all be recognized as
integer type 
      names.
6. Revise the LLVM Assembly writer to use only the new notation for integers it
prints. 
      That is, an input using the short type will generate an output of i16.
Similarly for 
      the other keywords.
7. Update the LLVM Language Reference document to describe the new syntax.
8. Adjust lib/VMCore/ConstantFolding.{h,cpp} to fold integers of any bit width and 
      perform correct integer arithmetic computations in constant expressions
regardless 
      of integer bit width.
9. Update all the LLVM Feature and Regression tests, as needed, to fix broken tests 
      (e.g. those that look for keywords like short or int in the output) and
ensure that 
      there are no regressions.
10. Write new LLVM Feature tests for non-standard bit widths to ensure that
assembly, 
        bytecode and constant folding can work with these types. Use these tests
to verify 
        correctness of the BAT implementation.

Comment 2 Chris Lattner 2006-12-12 12:18:21 PST

Related to #8, you'll have to change ConstantInt to be able to hold arbitrary bitwidth constants.  Before 
you do that, you probably want to merge ConstantBool and ConstantIntegral into ConstantInt.

To specifically implement #8 and ConstantInt, you probably want to make a very simple library for 
representing and manipulating arbitrary bit-width values.  This will be important when changing things 
that do arithmetic on constants (like instcombine).  Currently we usually do stuff like:

int64_t X = V->getZExtValue();
X = (X << 1) & 47;
 Y = ConstantInt::get(SomeType, X);

The random arithmetic will have to be done with the library instead of with inline C types and math.

-Chris

Comment 3 Sheng Zhou 2006-12-12 20:48:09 PST

For this PR, i'd like to just handle bit width between 1 and 64bits. This means
arbitrary bit-width will be like i13, i47 which <= 64 bits.
For > 64 bits case, we can open a new PR for it. In that PR, we can add a new
Constant subclass, say LargeConstantInt as Reid suggested. i think we can still
keep ConstantBool, and let LargeConstantInt handle all the manipulation stuff of
the arbitrary bit-width values.

Comment 4 Chris Lattner 2006-12-12 21:51:20 PST

I think that handling <= 64 bits before arbitrary width integers makes sense.

You'll still need a basic constant manipulation library to handle constant folding of arbitrary width 
integers, even if the widths are limited to 64-bits.  Also, please consider merging ConstantBool and 
ConstantIntegral into ConstantInt first, as a separate step.  It will make things simpler in the long-run.


-Chris

Comment 5 Sheng Zhou 2006-12-21 21:15:22 PST

As Chris suggested, ConstantIntegral and ConstantBool will be merged into
ConstantInt after #2 has completed.

Comment 6 Sheng Zhou 2006-12-21 21:18:13 PST

The inital design of BitSet is as following:

class BitSet {

private:

  typedef uint64_t _WordTy;

  std::vector<_WordTy> WordVec;

  inline size_t BITSET_WORDS(uint16_t __n) {
    return __n < 1 ? 0 : (__n + BITSET_BITS_PER_WORD - 1) / 
                         BITSET_BITS_PER_WORD;
  }

  static size_t whichWord(size_t pos)
  { return pos / BITSET_BITS_PER_WORD; }

  static size_t whichByte(size_t pos)
  { return (pos % BITSET_BITS_PER_WORD) / 8; }

  static size_t whichBit(size_t pos)
  { return pos % BITSET_BITS_PER_WORD; }

  static _WordTy maskBit(size_t pos)
  { return (static_cast<_WordTy>(1)) << whichBit(pos); }

  _WordTy& getWord(size_t pos)
  { return WordVec[whichWord(pos)]; }

  _WordTy  getWord(size_t pos) const
  { return WordVec[whichWord(pos)]; }

public:

  enum {
    BITSET_BITS_PER_WORD = sizeof(uint64_t) * 8
  };

  /// Create a new BitSet of numbits bit-width, and initalized as all zeros.
  ///
  BitSet(unsigned short numbits = 1) : WordVec(BITSET_WORDS(numbits), 0) {}

  /// Create a new BitSet with Integer Value translated from a string.
  /// Argument Radix helps the interpretation of the string. Radix can only be
  /// 2, 8, 10, 16.
  ///
  BitSet(std::string& Val, uint8_t radix = 10);

  BitSet(const BitSet& BS) : WordVec(BS.WordVec) {}

  /// Initial bits bitwise-copied from a single word.
  BitSet(const _WordTy val) : WordVec(1, val) {}

  BitSet& operator=(const BitSet& RHS)
  { WordVec.assign(RHS.WordVec.begin(), RHS.WordVec.end()); }

  const BitSet operator++(int);
  BitSet& operator++();

  const BitSet operator--(int);
  BitSet& operator--();

  BitSet& operator&=(const BitSet& RHS);

  BitSet& operator|=(const BitSet& RHS);

  BitSet& operator^=(const BitSet& RHS);

  BitSet& operator<<=(size_t shiftAmt);

  BitSet& operator>>=(size_t shiftAmt);

  BitSet& operator~() const;

  BitSet& operator*=(const BitSet& RHS);
  BitSet& operator/=(const BitSet& RHS);
  BitSet& operator+=(const BitSet& RHS);
  BitSet& operator-=(const BitSet& RHS);
  BitSet& operator%=(const BitSet& RHS);

  /// getMaxValue - This is only for bitwidth <= 64 and
  /// returns a uint64_t.
  /// @brief Gets max value of the BitSet with bitwidth <= 64.
  uint64_t getMaxValue();

  /// getMinValue - This is only for bitwidth <= 64 and
  /// returns a uint64_t.
  /// @brief Gets min value of the BitSet with bitwidth <= 64.
  uint64_t getMinValue();

  /// @brief Set every bit to 1.
  ///
  BitSet& setAll();

  /// Set the given bit to 1 whose poition is given as "pos"
  /// @brief Set a given bit to 1.
  ///
  BitSet& set(size_t pos);

  /// @brief Set every bit to 0.
  ///
  BitSet& clearAll();

  /// Set the given bit to 0 whose position is given as "pos"
  /// @brief Set a given bit to 0.
  ///
  BitSet& clear(size_t pos);

  /// @brief Toggle every bit to its opposite value.
  ///
  BitSet& flip();

  /// Toggle a given bit to its opposite value whose position is given as "pos"
  /// @brief Toggles a given bit to its opposite value.
  BitSet& flip(size_t pos);

  /// @brief  Array-indexing support.
  ///
  bool operator[](size_t pos);

  /// Retuns a numerical interpretation of the BitSet if its bit-width <= 64
  ///
  operator uint64_t() const;

  /// Retuns a character interpretation of the BitSet.
  ///
  std::string to_string(uint8_t radix = 10) const;

  /// Returns the number of bits which are set.
  size_t count() const;

  /// Returns the total number of bits;
  size_t getBitsNum() const;

  /// Comparison of two BitSets.
  bool operator==(const BitSet& RHS) const;
  bool operator!=(const BitSet& RHS) const;

  /// Return the index of the first "on" bit or BitsNum if all zero.
  ///
  size_t findFirst() const;

  BitSet operator<<(size_t shiftAmt) const;
  BitSet operator>>(size_t shiftAmt) const;

};

  BitSet operator&(const BitSet& BS1, const BitSet& BS2);
  BitSet operator|(const BitSet& BS1, const BitSet& BS2);
  BitSet operator^(const BitSet& BS1, const BitSet& BS2);

  BitSet operator*(const BitSet& BS1, const BitSet& BS2);
  BitSet operator/(const BitSet& BS1, const BitSet& BS2);
  BitSet operator+(const BitSet& BS1, const BitSet& BS2);
  BitSet operator-(const BitSet& BS1, const BitSet& BS2);
  BitSet operator%(const BitSet& BS1, const BitSet& BS2);

Comment 7 Chris Lattner 2006-12-22 16:10:56 PST

This looks right-minded, but I have concerns about efficiency.  Instead, I'd like to see an approach that 
looks like this:

1. ConstantInt should be modified to use the "variable sized array" trick from C to hold the integer data 
immediately after the ConstantInt instance.  This allows each ConstantInt to stay a single allocation, 
and also allows them to be fully general (holding arbitrary width integers).

2. A new value class (AInt for arbitrary width integer?) like BitSet should be introduced to represent 
arbitrary width values when they need to be manipulated.  This is the thing that should have the 
overloaded operators etc as your BitSet does.  To implement this, I'd suggest having it contain three 
fields: a bit width, a pointer to uint64_t (the array of bits) and a single uint64_t value (the later two can 
be in a union if desired).  ConstantInt::getValue() should be changed to return an instance of this class.

The basic design of #2 permits AInt (or whatever) to be constructed on the stack with no additional 
memory allocations as longs as the type is 64-bits or less.  We can then slowly migrate all the 
transformations over to use ConstantInt::getValue() instead of getZExtValue/getSExtValue.

-Chris

Comment 8 Chris Lattner 2006-12-22 16:13:23 PST

As a meta issue, please feel free to split this up into sub-bugs for each separate issue.  For example, 
you could file one for:

1. Merging ConstantInt ConstantIntegral ConstantBool.
2. Introduce "AInt" (or whatever it should be called), and adding getValue to constantint
3. Make ConstantInt be able to hold large values.
4. Migrate code to use AInt instead of uint64_t for various bit manipulations.
5. Update the .ll/.bc formats for variable width stuff,
etc

These can all be marked as blocking this bug.

Comment 9 Reid Spencer 2007-01-11 15:53:53 PST

Current committed progress:

ConstantIntegral, ConstantBool and ConstantInt have been merged to a single
ConstantInt class. 

BoolTy has been renamed Int1Ty

Comment 10 Reid Spencer 2007-01-12 01:38:09 PST

The arbitrary bitwidth integers have been implemented.

This is what's left for this enhancement

1. ConstantFolding (Sheng)
2. Full test suite (Leo)
3. Documentation updates (Reid)

Comment 11 Chris Lattner 2007-01-12 01:45:13 PST

4. Verifying that all the optimizations work with BATs, even if to just avoid breaking when confronted with 
one.

-Chris

Comment 12 Reid Spencer 2007-01-12 10:26:36 PST

That (#4) was intended to be convered under #2. The test suite will consist of
both assembly and C/C++ programs.

Comment 13 Reid Spencer 2007-01-16 21:46:17 PST

There are now two remaining items, both being worked on by Sheng:

1. bug 1120 (which this one now depends on). This bug makes some cosmetic
   adjustments to use IntegerType as opposed to Type. 
2. Double checking that constant folding works with various bit widths by
   reviewing Leo's test case for coverage.

Once these two items are done, Sheng, this bug can be resolved.

Comment 14 Reid Spencer 2007-01-19 15:16:34 PST

All tasks for this PR have been completed.

Comment 15 Chris Lattner 2007-01-19 20:00:44 PST

Should this really be closed even when simple things like constant folding are broken?

Comment 16 Chris Lattner 2007-01-19 20:01:19 PST

Actually, I retract that, constant folding may not be broken.  Have you verified that it works though?

Comment 17 Reid Spencer 2007-01-20 01:01:55 PST

Yes, there's a test suite for it. It passes. Is there something you know that I
don't ?

Comment 18 Chris Lattner 2007-01-20 14:24:37 PST

Nope, I just wanted to make sure you tested it.  You tried stuff like     'ashr i7 -1, 3' -> -1  etc?

Comment 19 Reid Spencer 2007-01-20 14:33:27 PST

Not that particular test case, but many similar ones. I just added that one to
test/Integer/a7.ll and it passed. If you're curious about the tests we've 
developed, please see test/Integer .. its reasonably comprehensive. Leo developed
this test suite and Sheng reviewed it for completeness and accuracy. He found a 
couple instructions that weren't tested well and updated the tests. He also found
a bug in the handling of i1 -1 and fixed it. At this point we don't know of any 
constant folding bugs in the arbitrary precision integer arithmetic.

Comment 20 Chris Lattner 2007-01-20 14:40:30 PST

ok, sounds great!