Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arbitrary bitwidth integers #1415

Closed
llvmbot opened this issue Dec 11, 2006 · 21 comments
Closed

Arbitrary bitwidth integers #1415

llvmbot opened this issue Dec 11, 2006 · 21 comments
Labels
bugzilla Issues migrated from bugzilla enhancement Improving things as opposed to bug fixing, e.g. new or missing feature llvm:core

Comments

@llvmbot
Copy link
Collaborator

llvmbot commented Dec 11, 2006

Bugzilla Link 1043
Resolution FIXED
Resolved on Feb 22, 2010 12:46
Version trunk
OS All
Depends On llvm/llvm-bugzilla-archive#1120
Reporter LLVM Bugzilla Contributor
CC @lattner

Extended Description

This PR is a follow-up of pr950. It extends the LLVM core libraries
to accept Bit Accurate Types (BATs) into the IR. The BAT is a extension to
integral types. It supports not only traditional integral types like
BoolTy, ByteTy, IntTy, and ShortTy, LongTy (these common cases will be
renamed into Int1Ty, Int8Ty like), but also non-byte-width BATs
like Int13. The LLVM core libraries will include BATs without
modifying the existing types except rename. To do this, a subclass of
Type, saying IntType will be added into LLVM to represent BATs. Bytecode
Reader and LLVM Assembly parser will also be updated to handle BATs.

Since this is a fairly extensive change to LLVM, this bug will be used
to trackthe progress of the work as it proceeds.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Dec 11, 2006

Basic Work Plan:

Each item below will be considered an iteration. An iteration consists of making
a small(ish) change across llvm and llvm-gcc4 and testing it with llvm-test and
deja-gnu. Development of new test cases will also occur as the instructions are
modified.

The iterations planned are the following (in roughly this order):

  1. Write an IntType class that is a subclass of Type and add it to
    include/llvm/Type.h.
    This class will represent the BATs and must have at least the following
    methods:
    IntType* get(uint16_t numbits); // Returns an IntType for "numbits" bits
    unsigned getNumBits() const; // Returns the number of bits in the type
  2. Convert the static type instances in the Type class (BoolTy, ByteTy, IntTy,
    ShortTy,
    LongTy) into corresponding instances of IntType, like:
    static Type *BoolTy; -> static IntType *Int1Ty;
  3. Revise the Bytecode reader and writer to handle integer types of any bit
    width. The
    common cases (8,16,32,and 64 bits) will retain their (low-order) values in
    the type
    index. Non-byte-width BATs will be assigned unique type identifiers based
    on their
    bit width. This retains the compaction of the common case values.
  4. Update the bytecode documentation to reflect the new bytecode layout.
  5. Revise the LLVM Assembly parser to handle integer types of any bit width.
    The assembler will support a new notation for specifying integer types
    using a regex
    pattern of i[0­9]+ so that i4, i7, i32, i64 will all be recognized as
    integer type
    names.
  6. Revise the LLVM Assembly writer to use only the new notation for integers it
    prints.
    That is, an input using the short type will generate an output of i16.
    Similarly for
    the other keywords.
  7. Update the LLVM Language Reference document to describe the new syntax.
  8. Adjust lib/VMCore/ConstantFolding.{h,cpp} to fold integers of any bit width and
    perform correct integer arithmetic computations in constant expressions
    regardless
    of integer bit width.
  9. Update all the LLVM Feature and Regression tests, as needed, to fix broken tests
    (e.g. those that look for keywords like short or int in the output) and
    ensure that
    there are no regressions.
  10. Write new LLVM Feature tests for non-standard bit widths to ensure that
    assembly,
    bytecode and constant folding can work with these types. Use these tests
    to verify
    correctness of the BAT implementation.

@lattner
Copy link
Collaborator

lattner commented Dec 12, 2006

Related to #​8, you'll have to change ConstantInt to be able to hold arbitrary bitwidth constants. Before
you do that, you probably want to merge ConstantBool and ConstantIntegral into ConstantInt.

To specifically implement #​8 and ConstantInt, you probably want to make a very simple library for
representing and manipulating arbitrary bit-width values. This will be important when changing things
that do arithmetic on constants (like instcombine). Currently we usually do stuff like:

int64_t X = V->getZExtValue();
X = (X << 1) & 47;
Y = ConstantInt::get(SomeType, X);

The random arithmetic will have to be done with the library instead of with inline C types and math.

-Chris

@llvmbot
Copy link
Collaborator Author

llvmbot commented Dec 13, 2006

For this PR, i'd like to just handle bit width between 1 and 64bits. This means
arbitrary bit-width will be like i13, i47 which <= 64 bits.
For > 64 bits case, we can open a new PR for it. In that PR, we can add a new
Constant subclass, say LargeConstantInt as Reid suggested. i think we can still
keep ConstantBool, and let LargeConstantInt handle all the manipulation stuff of
the arbitrary bit-width values.

@lattner
Copy link
Collaborator

lattner commented Dec 13, 2006

I think that handling <= 64 bits before arbitrary width integers makes sense.

You'll still need a basic constant manipulation library to handle constant folding of arbitrary width
integers, even if the widths are limited to 64-bits. Also, please consider merging ConstantBool and
ConstantIntegral into ConstantInt first, as a separate step. It will make things simpler in the long-run.

-Chris

@llvmbot
Copy link
Collaborator Author

llvmbot commented Dec 22, 2006

As Chris suggested, ConstantIntegral and ConstantBool will be merged into
ConstantInt after #​2 has completed.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Dec 22, 2006

The inital design of BitSet is as following:

class BitSet {

private:

typedef uint64_t _WordTy;

std::vector<_WordTy> WordVec;

inline size_t BITSET_WORDS(uint16_t __n) {
return __n < 1 ? 0 : (__n + BITSET_BITS_PER_WORD - 1) /
BITSET_BITS_PER_WORD;
}

static size_t whichWord(size_t pos)
{ return pos / BITSET_BITS_PER_WORD; }

static size_t whichByte(size_t pos)
{ return (pos % BITSET_BITS_PER_WORD) / 8; }

static size_t whichBit(size_t pos)
{ return pos % BITSET_BITS_PER_WORD; }

static _WordTy maskBit(size_t pos)
{ return (static_cast<_WordTy>(1)) << whichBit(pos); }

_WordTy& getWord(size_t pos)
{ return WordVec[whichWord(pos)]; }

_WordTy getWord(size_t pos) const
{ return WordVec[whichWord(pos)]; }

public:

enum {
BITSET_BITS_PER_WORD = sizeof(uint64_t) * 8
};

/// Create a new BitSet of numbits bit-width, and initalized as all zeros.
///
BitSet(unsigned short numbits = 1) : WordVec(BITSET_WORDS(numbits), 0) {}

/// Create a new BitSet with Integer Value translated from a string.
/// Argument Radix helps the interpretation of the string. Radix can only be
/// 2, 8, 10, 16.
///
BitSet(std::string& Val, uint8_t radix = 10);

BitSet(const BitSet& BS) : WordVec(BS.WordVec) {}

/// Initial bits bitwise-copied from a single word.
BitSet(const _WordTy val) : WordVec(1, val) {}

BitSet& operator=(const BitSet& RHS)
{ WordVec.assign(RHS.WordVec.begin(), RHS.WordVec.end()); }

const BitSet operator++(int);
BitSet& operator++();

const BitSet operator--(int);
BitSet& operator--();

BitSet& operator&=(const BitSet& RHS);

BitSet& operator|=(const BitSet& RHS);

BitSet& operator^=(const BitSet& RHS);

BitSet& operator<<=(size_t shiftAmt);

BitSet& operator>>=(size_t shiftAmt);

BitSet& operator~() const;

BitSet& operator*=(const BitSet& RHS);
BitSet& operator/=(const BitSet& RHS);
BitSet& operator+=(const BitSet& RHS);
BitSet& operator-=(const BitSet& RHS);
BitSet& operator%=(const BitSet& RHS);

/// getMaxValue - This is only for bitwidth <= 64 and
/// returns a uint64_t.
/// @​brief Gets max value of the BitSet with bitwidth <= 64.
uint64_t getMaxValue();

/// getMinValue - This is only for bitwidth <= 64 and
/// returns a uint64_t.
/// @​brief Gets min value of the BitSet with bitwidth <= 64.
uint64_t getMinValue();

/// @​brief Set every bit to 1.
///
BitSet& setAll();

/// Set the given bit to 1 whose poition is given as "pos"
/// @​brief Set a given bit to 1.
///
BitSet& set(size_t pos);

/// @​brief Set every bit to 0.
///
BitSet& clearAll();

/// Set the given bit to 0 whose position is given as "pos"
/// @​brief Set a given bit to 0.
///
BitSet& clear(size_t pos);

/// @​brief Toggle every bit to its opposite value.
///
BitSet& flip();

/// Toggle a given bit to its opposite value whose position is given as "pos"
/// @​brief Toggles a given bit to its opposite value.
BitSet& flip(size_t pos);

/// @​brief Array-indexing support.
///
bool operator[](size_t pos);

/// Retuns a numerical interpretation of the BitSet if its bit-width <= 64
///
operator uint64_t() const;

/// Retuns a character interpretation of the BitSet.
///
std::string to_string(uint8_t radix = 10) const;

/// Returns the number of bits which are set.
size_t count() const;

/// Returns the total number of bits;
size_t getBitsNum() const;

/// Comparison of two BitSets.
bool operator==(const BitSet& RHS) const;
bool operator!=(const BitSet& RHS) const;

/// Return the index of the first "on" bit or BitsNum if all zero.
///
size_t findFirst() const;

BitSet operator<<(size_t shiftAmt) const;
BitSet operator>>(size_t shiftAmt) const;

};

BitSet operator&(const BitSet& BS1, const BitSet& BS2);
BitSet operator|(const BitSet& BS1, const BitSet& BS2);
BitSet operator^(const BitSet& BS1, const BitSet& BS2);

BitSet operator*(const BitSet& BS1, const BitSet& BS2);
BitSet operator/(const BitSet& BS1, const BitSet& BS2);
BitSet operator+(const BitSet& BS1, const BitSet& BS2);
BitSet operator-(const BitSet& BS1, const BitSet& BS2);
BitSet operator%(const BitSet& BS1, const BitSet& BS2);

@lattner
Copy link
Collaborator

lattner commented Dec 23, 2006

This looks right-minded, but I have concerns about efficiency. Instead, I'd like to see an approach that
looks like this:

  1. ConstantInt should be modified to use the "variable sized array" trick from C to hold the integer data
    immediately after the ConstantInt instance. This allows each ConstantInt to stay a single allocation,
    and also allows them to be fully general (holding arbitrary width integers).

  2. A new value class (AInt for arbitrary width integer?) like BitSet should be introduced to represent
    arbitrary width values when they need to be manipulated. This is the thing that should have the
    overloaded operators etc as your BitSet does. To implement this, I'd suggest having it contain three
    fields: a bit width, a pointer to uint64_t (the array of bits) and a single uint64_t value (the later two can
    be in a union if desired). ConstantInt::getValue() should be changed to return an instance of this class.

The basic design of #​2 permits AInt (or whatever) to be constructed on the stack with no additional
memory allocations as longs as the type is 64-bits or less. We can then slowly migrate all the
transformations over to use ConstantInt::getValue() instead of getZExtValue/getSExtValue.

-Chris

@lattner
Copy link
Collaborator

lattner commented Dec 23, 2006

As a meta issue, please feel free to split this up into sub-bugs for each separate issue. For example,
you could file one for:

  1. Merging ConstantInt ConstantIntegral ConstantBool.
  2. Introduce "AInt" (or whatever it should be called), and adding getValue to constantint
  3. Make ConstantInt be able to hold large values.
  4. Migrate code to use AInt instead of uint64_t for various bit manipulations.
  5. Update the .ll/.bc formats for variable width stuff,
    etc

These can all be marked as blocking this bug.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 11, 2007

Current committed progress:

ConstantIntegral, ConstantBool and ConstantInt have been merged to a single
ConstantInt class.

BoolTy has been renamed Int1Ty

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 12, 2007

The arbitrary bitwidth integers have been implemented.

This is what's left for this enhancement

  1. ConstantFolding (Sheng)
  2. Full test suite (Leo)
  3. Documentation updates (Reid)

@lattner
Copy link
Collaborator

lattner commented Jan 12, 2007

  1. Verifying that all the optimizations work with BATs, even if to just avoid breaking when confronted with
    one.

-Chris

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 12, 2007

That (#4) was intended to be convered under #​2. The test suite will consist of
both assembly and C/C++ programs.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 17, 2007

There are now two remaining items, both being worked on by Sheng:

  1. bug 1120 (which this one now depends on). This bug makes some cosmetic
    adjustments to use IntegerType as opposed to Type.
  2. Double checking that constant folding works with various bit widths by
    reviewing Leo's test case for coverage.

Once these two items are done, Sheng, this bug can be resolved.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 19, 2007

All tasks for this PR have been completed.

@lattner
Copy link
Collaborator

lattner commented Jan 20, 2007

Should this really be closed even when simple things like constant folding are broken?

@lattner
Copy link
Collaborator

lattner commented Jan 20, 2007

Actually, I retract that, constant folding may not be broken. Have you verified that it works though?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 20, 2007

Yes, there's a test suite for it. It passes. Is there something you know that I
don't ?

@lattner
Copy link
Collaborator

lattner commented Jan 20, 2007

Nope, I just wanted to make sure you tested it. You tried stuff like 'ashr i7 -1, 3' -> -1 etc?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Jan 20, 2007

Not that particular test case, but many similar ones. I just added that one to
test/Integer/a7.ll and it passed. If you're curious about the tests we've
developed, please see test/Integer .. its reasonably comprehensive. Leo developed
this test suite and Sheng reviewed it for completeness and accuracy. He found a
couple instructions that weren't tested well and updated the tests. He also found
a bug in the handling of i1 -1 and fixed it. At this point we don't know of any
constant folding bugs in the arbitrary precision integer arithmetic.

@lattner
Copy link
Collaborator

lattner commented Jan 20, 2007

ok, sounds great!

@nlewycky
Copy link
Contributor

mentioned in issue llvm/llvm-bugzilla-archive#1120

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021
@Endilll Endilll added enhancement Improving things as opposed to bug fixing, e.g. new or missing feature and removed new-feature labels Aug 15, 2023
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla enhancement Improving things as opposed to bug fixing, e.g. new or missing feature llvm:core
Projects
None yet
Development

No branches or pull requests

4 participants