Arbitrary bitwidth integers #1415

llvmbot · 2006-12-11T10:43:06Z


Bugzilla Link	1043
Resolution	FIXED
Resolved on	Feb 22, 2010 12:46
Version	trunk
OS	All
Depends On	llvm/llvm-bugzilla-archive#1120
Reporter	LLVM Bugzilla Contributor
CC	@lattner

Extended Description

This PR is a follow-up of pr950. It extends the LLVM core libraries
to accept Bit Accurate Types (BATs) into the IR. The BAT is a extension to
integral types. It supports not only traditional integral types like
BoolTy, ByteTy, IntTy, and ShortTy, LongTy (these common cases will be
renamed into Int1Ty, Int8Ty like), but also non-byte-width BATs
like Int13. The LLVM core libraries will include BATs without
modifying the existing types except rename. To do this, a subclass of
Type, saying IntType will be added into LLVM to represent BATs. Bytecode
Reader and LLVM Assembly parser will also be updated to handle BATs.

Since this is a fairly extensive change to LLVM, this bug will be used
to trackthe progress of the work as it proceeds.

llvmbot · 2006-12-11T11:22:11Z

Basic Work Plan:

Each item below will be considered an iteration. An iteration consists of making
a small(ish) change across llvm and llvm-gcc4 and testing it with llvm-test and
deja-gnu. Development of new test cases will also occur as the instructions are
modified.

The iterations planned are the following (in roughly this order):

Write an IntType class that is a subclass of Type and add it to
include/llvm/Type.h.
This class will represent the BATs and must have at least the following
methods:
IntType* get(uint16_t numbits); // Returns an IntType for "numbits" bits
unsigned getNumBits() const; // Returns the number of bits in the type
Convert the static type instances in the Type class (BoolTy, ByteTy, IntTy,
ShortTy,
LongTy) into corresponding instances of IntType, like:
static Type *BoolTy; -> static IntType *Int1Ty;
Revise the Bytecode reader and writer to handle integer types of any bit
width. The
common cases (8,16,32,and 64 bits) will retain their (low-order) values in
the type
index. Non-byte-width BATs will be assigned unique type identifiers based
on their
bit width. This retains the compaction of the common case values.
Update the bytecode documentation to reflect the new bytecode layout.
Revise the LLVM Assembly parser to handle integer types of any bit width.
The assembler will support a new notation for specifying integer types
using a regex
pattern of i[09]+ so that i4, i7, i32, i64 will all be recognized as
integer type
names.
Revise the LLVM Assembly writer to use only the new notation for integers it
prints.
That is, an input using the short type will generate an output of i16.
Similarly for
the other keywords.
Update the LLVM Language Reference document to describe the new syntax.
Adjust lib/VMCore/ConstantFolding.{h,cpp} to fold integers of any bit width and
perform correct integer arithmetic computations in constant expressions
regardless
of integer bit width.
Update all the LLVM Feature and Regression tests, as needed, to fix broken tests
(e.g. those that look for keywords like short or int in the output) and
ensure that
there are no regressions.
Write new LLVM Feature tests for non-standard bit widths to ensure that
assembly,
bytecode and constant folding can work with these types. Use these tests
to verify
correctness of the BAT implementation.

lattner · 2006-12-12T20:18:21Z

Related to #8, you'll have to change ConstantInt to be able to hold arbitrary bitwidth constants. Before
you do that, you probably want to merge ConstantBool and ConstantIntegral into ConstantInt.

To specifically implement #8 and ConstantInt, you probably want to make a very simple library for
representing and manipulating arbitrary bit-width values. This will be important when changing things
that do arithmetic on constants (like instcombine). Currently we usually do stuff like:

int64_t X = V->getZExtValue();
X = (X << 1) & 47;
Y = ConstantInt::get(SomeType, X);

The random arithmetic will have to be done with the library instead of with inline C types and math.

-Chris

llvmbot · 2006-12-13T04:48:09Z

For this PR, i'd like to just handle bit width between 1 and 64bits. This means
arbitrary bit-width will be like i13, i47 which <= 64 bits.
For > 64 bits case, we can open a new PR for it. In that PR, we can add a new
Constant subclass, say LargeConstantInt as Reid suggested. i think we can still
keep ConstantBool, and let LargeConstantInt handle all the manipulation stuff of
the arbitrary bit-width values.

lattner · 2006-12-13T05:51:20Z

I think that handling <= 64 bits before arbitrary width integers makes sense.

You'll still need a basic constant manipulation library to handle constant folding of arbitrary width
integers, even if the widths are limited to 64-bits. Also, please consider merging ConstantBool and
ConstantIntegral into ConstantInt first, as a separate step. It will make things simpler in the long-run.

-Chris

llvmbot · 2006-12-22T05:15:22Z

As Chris suggested, ConstantIntegral and ConstantBool will be merged into
ConstantInt after #2 has completed.

llvmbot · 2006-12-22T05:18:13Z

The inital design of BitSet is as following:

class BitSet {

private:

typedef uint64_t _WordTy;

std::vector<_WordTy> WordVec;

inline size_t BITSET_WORDS(uint16_t __n) {
return __n < 1 ? 0 : (__n + BITSET_BITS_PER_WORD - 1) /
BITSET_BITS_PER_WORD;
}

static size_t whichWord(size_t pos)
{ return pos / BITSET_BITS_PER_WORD; }

static size_t whichByte(size_t pos)
{ return (pos % BITSET_BITS_PER_WORD) / 8; }

static size_t whichBit(size_t pos)
{ return pos % BITSET_BITS_PER_WORD; }

static _WordTy maskBit(size_t pos)
{ return (static_cast<_WordTy>(1)) << whichBit(pos); }

_WordTy& getWord(size_t pos)
{ return WordVec[whichWord(pos)]; }

_WordTy getWord(size_t pos) const
{ return WordVec[whichWord(pos)]; }

public:

enum {
BITSET_BITS_PER_WORD = sizeof(uint64_t) * 8
};

/// Create a new BitSet of numbits bit-width, and initalized as all zeros.
///
BitSet(unsigned short numbits = 1) : WordVec(BITSET_WORDS(numbits), 0) {}

/// Create a new BitSet with Integer Value translated from a string.
/// Argument Radix helps the interpretation of the string. Radix can only be
/// 2, 8, 10, 16.
///
BitSet(std::string& Val, uint8_t radix = 10);

BitSet(const BitSet& BS) : WordVec(BS.WordVec) {}

/// Initial bits bitwise-copied from a single word.
BitSet(const _WordTy val) : WordVec(1, val) {}

BitSet& operator=(const BitSet& RHS)
{ WordVec.assign(RHS.WordVec.begin(), RHS.WordVec.end()); }

const BitSet operator++(int);
BitSet& operator++();

const BitSet operator--(int);
BitSet& operator--();

BitSet& operator&=(const BitSet& RHS);

BitSet& operator|=(const BitSet& RHS);

BitSet& operator^=(const BitSet& RHS);

BitSet& operator<<=(size_t shiftAmt);

BitSet& operator>>=(size_t shiftAmt);

BitSet& operator~() const;

BitSet& operator*=(const BitSet& RHS);
BitSet& operator/=(const BitSet& RHS);
BitSet& operator+=(const BitSet& RHS);
BitSet& operator-=(const BitSet& RHS);
BitSet& operator%=(const BitSet& RHS);

/// getMaxValue - This is only for bitwidth <= 64 and
/// returns a uint64_t.
/// @brief Gets max value of the BitSet with bitwidth <= 64.
uint64_t getMaxValue();

/// getMinValue - This is only for bitwidth <= 64 and
/// returns a uint64_t.
/// @brief Gets min value of the BitSet with bitwidth <= 64.
uint64_t getMinValue();

/// @brief Set every bit to 1.
///
BitSet& setAll();

/// Set the given bit to 1 whose poition is given as "pos"
/// @brief Set a given bit to 1.
///
BitSet& set(size_t pos);

/// @brief Set every bit to 0.
///
BitSet& clearAll();

/// Set the given bit to 0 whose position is given as "pos"
/// @brief Set a given bit to 0.
///
BitSet& clear(size_t pos);

/// @brief Toggle every bit to its opposite value.
///
BitSet& flip();

/// Toggle a given bit to its opposite value whose position is given as "pos"
/// @brief Toggles a given bit to its opposite value.
BitSet& flip(size_t pos);

/// @brief Array-indexing support.
///
bool operator[](size_t pos);

/// Retuns a numerical interpretation of the BitSet if its bit-width <= 64
///
operator uint64_t() const;

/// Retuns a character interpretation of the BitSet.
///
std::string to_string(uint8_t radix = 10) const;

/// Returns the number of bits which are set.
size_t count() const;

/// Returns the total number of bits;
size_t getBitsNum() const;

/// Comparison of two BitSets.
bool operator==(const BitSet& RHS) const;
bool operator!=(const BitSet& RHS) const;

/// Return the index of the first "on" bit or BitsNum if all zero.
///
size_t findFirst() const;

BitSet operator<<(size_t shiftAmt) const;
BitSet operator>>(size_t shiftAmt) const;

};

BitSet operator&(const BitSet& BS1, const BitSet& BS2);
BitSet operator|(const BitSet& BS1, const BitSet& BS2);
BitSet operator^(const BitSet& BS1, const BitSet& BS2);

BitSet operator*(const BitSet& BS1, const BitSet& BS2);
BitSet operator/(const BitSet& BS1, const BitSet& BS2);
BitSet operator+(const BitSet& BS1, const BitSet& BS2);
BitSet operator-(const BitSet& BS1, const BitSet& BS2);
BitSet operator%(const BitSet& BS1, const BitSet& BS2);

lattner · 2006-12-23T00:10:56Z

This looks right-minded, but I have concerns about efficiency. Instead, I'd like to see an approach that
looks like this:

ConstantInt should be modified to use the "variable sized array" trick from C to hold the integer data
immediately after the ConstantInt instance. This allows each ConstantInt to stay a single allocation,
and also allows them to be fully general (holding arbitrary width integers).
A new value class (AInt for arbitrary width integer?) like BitSet should be introduced to represent
arbitrary width values when they need to be manipulated. This is the thing that should have the
overloaded operators etc as your BitSet does. To implement this, I'd suggest having it contain three
fields: a bit width, a pointer to uint64_t (the array of bits) and a single uint64_t value (the later two can
be in a union if desired). ConstantInt::getValue() should be changed to return an instance of this class.

The basic design of #2 permits AInt (or whatever) to be constructed on the stack with no additional
memory allocations as longs as the type is 64-bits or less. We can then slowly migrate all the
transformations over to use ConstantInt::getValue() instead of getZExtValue/getSExtValue.

-Chris

lattner · 2006-12-23T00:13:23Z

As a meta issue, please feel free to split this up into sub-bugs for each separate issue. For example,
you could file one for:

Merging ConstantInt ConstantIntegral ConstantBool.
Introduce "AInt" (or whatever it should be called), and adding getValue to constantint
Make ConstantInt be able to hold large values.
Migrate code to use AInt instead of uint64_t for various bit manipulations.
Update the .ll/.bc formats for variable width stuff,
etc

These can all be marked as blocking this bug.

llvmbot · 2007-01-11T23:53:53Z

Current committed progress:

ConstantIntegral, ConstantBool and ConstantInt have been merged to a single
ConstantInt class.

BoolTy has been renamed Int1Ty

llvmbot · 2007-01-12T09:38:09Z

The arbitrary bitwidth integers have been implemented.

This is what's left for this enhancement

ConstantFolding (Sheng)
Full test suite (Leo)
Documentation updates (Reid)

lattner · 2007-01-12T09:45:13Z

Verifying that all the optimizations work with BATs, even if to just avoid breaking when confronted with
one.

-Chris

llvmbot · 2007-01-12T18:26:36Z

That (#4) was intended to be convered under #2. The test suite will consist of
both assembly and C/C++ programs.

llvmbot · 2007-01-17T05:46:17Z

There are now two remaining items, both being worked on by Sheng:

bug 1120 (which this one now depends on). This bug makes some cosmetic
adjustments to use IntegerType as opposed to Type.
Double checking that constant folding works with various bit widths by
reviewing Leo's test case for coverage.

Once these two items are done, Sheng, this bug can be resolved.

llvmbot · 2007-01-19T23:16:34Z

All tasks for this PR have been completed.

lattner · 2007-01-20T04:00:44Z

Should this really be closed even when simple things like constant folding are broken?

lattner · 2007-01-20T04:01:19Z

Actually, I retract that, constant folding may not be broken. Have you verified that it works though?

llvmbot · 2007-01-20T09:01:55Z

Yes, there's a test suite for it. It passes. Is there something you know that I
don't ?

lattner · 2007-01-20T22:24:37Z

Nope, I just wanted to make sure you tested it. You tried stuff like 'ashr i7 -1, 3' -> -1 etc?

llvmbot · 2007-01-20T22:33:27Z

Not that particular test case, but many similar ones. I just added that one to
test/Integer/a7.ll and it passed. If you're curious about the tests we've
developed, please see test/Integer .. its reasonably comprehensive. Leo developed
this test suite and Sheng reviewed it for completeness and accuracy. He found a
couple instructions that weren't tested well and updated the tests. He also found
a bug in the handling of i1 -1 and fixed it. At this point we don't know of any
constant folding bugs in the arbitrary precision integer arithmetic.

lattner · 2007-01-20T22:40:30Z

ok, sounds great!

nlewycky · 2021-11-26T16:04:03Z

mentioned in issue llvm/llvm-bugzilla-archive#1120

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021

Endilll added enhancement and removed new-feature labels Aug 15, 2023

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Arbitrary bitwidth integers #1415

Arbitrary bitwidth integers #1415

llvmbot commented Dec 11, 2006

llvmbot commented Dec 11, 2006

lattner commented Dec 12, 2006

llvmbot commented Dec 13, 2006

lattner commented Dec 13, 2006

llvmbot commented Dec 22, 2006

llvmbot commented Dec 22, 2006

lattner commented Dec 23, 2006

lattner commented Dec 23, 2006

llvmbot commented Jan 11, 2007

llvmbot commented Jan 12, 2007

lattner commented Jan 12, 2007

llvmbot commented Jan 12, 2007

llvmbot commented Jan 17, 2007

llvmbot commented Jan 19, 2007

lattner commented Jan 20, 2007

lattner commented Jan 20, 2007

llvmbot commented Jan 20, 2007

lattner commented Jan 20, 2007

llvmbot commented Jan 20, 2007

lattner commented Jan 20, 2007

nlewycky commented Nov 26, 2021

Arbitrary bitwidth integers #1415

Arbitrary bitwidth integers #1415

Comments

llvmbot commented Dec 11, 2006

Extended Description

llvmbot commented Dec 11, 2006

lattner commented Dec 12, 2006

llvmbot commented Dec 13, 2006

lattner commented Dec 13, 2006

llvmbot commented Dec 22, 2006

llvmbot commented Dec 22, 2006

lattner commented Dec 23, 2006

lattner commented Dec 23, 2006

llvmbot commented Jan 11, 2007

llvmbot commented Jan 12, 2007

lattner commented Jan 12, 2007

llvmbot commented Jan 12, 2007

llvmbot commented Jan 17, 2007

llvmbot commented Jan 19, 2007

lattner commented Jan 20, 2007

lattner commented Jan 20, 2007

llvmbot commented Jan 20, 2007

lattner commented Jan 20, 2007

llvmbot commented Jan 20, 2007

lattner commented Jan 20, 2007

nlewycky commented Nov 26, 2021