First Last Prev Next    No search results available
Details
: LLVM cannot handle structures with more than 256 elements
Bug#: 82
: libraries
: Core LLVM classes
Status: RESOLVED
Resolution: FIXED
: All
: All
: 1.0
: P2
: major
: 1.3

:
: compile-fail
:
:
  Show dependency tree - Show dependency graph
People
Reporter: Chris Lattner <clattner@apple.com>
Assigned To: Chris Lattner <clattner@apple.com>
:

Attachments
Partial Patch - Just Gets Started On This Bug (17.96 KB, patch)
2003-11-18 21:45, Reid Spencer
Details


Note

You need to log in before you can comment on or make changes to this bug.

Related actions


Description:   Opened: 2003-11-02 17:33
Currently, the getelementptr instruction uses constant 'ubyte' indexes to index
into structure fields.  This means that we can't address more than 2^8 fields.

The fix for this is to change the ubyte index to be a ulong.  Doing this however
will require auditing all of LLVM, looking for passes walking getelementptrs. 
Also, we'll have to bump the bytecode version to have the readers autotranslate
the indexes.

This is certainly not an impossible thing to fix: in fact it's been on my TODO
list for a long time now - it's just that it will take time.  The urgency of
fixing this is now greater than before, however, because it turns out that X
uses large structures, making compilation of X programs unlikely to work out if
they use the "wrong" structures.  This should be fixed for the 1.1 release.

-Chris
------- Comment #1 From Reid Spencer 2003-11-17 23:11:51 -------
Chris,

Do you mind if I take a crack at this one? Since it requires an audit of LLVM,
its another one that's good for me to get an understanding of the code (well, at
least I'll learn about how structures are handled).
------- Comment #2 From Chris Lattner 2003-11-17 23:19:03 -------
I was going to attack this tommorow, but if you want to look into it, feel free.

The major problem is that currently, we have the following constraints on
getelementptr instructions:

1. "ubyte" operands are necessarily constants, and are necessarily structure
indexes.
2. "long" operands are necessarily the initial pointer index, or subsequent
array indexes.

I want to completely abandon this notion, allowing the array to be an arbitrary
integer type (this is important for 32 bit targets, because right now, every
array index must cast the index to "long" for the getelementptr), and the
structure index to be an arbitrary unsigned integer type.

This requires an audit and update of all code that loops through the
getelementptr indices.  If you'd like to start fixing them to DTRT, please, be
my guest!!  This can be done _before_ the actual change is made.

-Chris
------- Comment #3 From Reid Spencer 2003-11-18 00:30:03 -------
DTRT? <acronym not comprehended here>

I'd be happy to do this. I've already been looking at it for a few hours. I
think I see where it needs to go. Fortunately, most of the use of the index
operands of the GEP instruction are hidden inside its own methods. I've looked
at a few analyses and they don't seem to fetch Operand(1) .. but I have many
more to look at.

I was going to suggest we go from ubyte to ushort rather than ubyte to ulong.
But, if I understand you correctly, you want to allow both array and structure
indices to accept any integer expression.  I would have thought it should be any
unsigned expression, but I can see your reasoning.  Opening up the indices to
negative values may mean some additional checking needs to be put in. Could you
confirm you meant "any integer" rather than "any unsigned"?

In any event, I'll start on this tonight.
------- Comment #4 From Chris Lattner 2003-11-18 00:35:30 -------
Sorry, DTRT == Do The Right Thing.  I probably just made that up.  Sorry...

> Fortunately, most of the use of the index
> operands of the GEP instruction are hidden inside its own methods. I've looked
> at a few analyses and they don't seem to fetch Operand(1) .. but I have many
> more to look at.

The hard part will be updating code like this (cut from BasicAliasAnalysis.cpp):

  for (unsigned i = 1; i != FirstConstantOper; ++i)
    if (GEP1->getOperand(i)->getType() == Type::UByteTy)
      Indices1.push_back(GEP1->getOperand(i));
    else
      Indices1.push_back(Constant::getNullValue(Type::LongTy));

... Which assumes that UByteTy means struct index.

> I was going to suggest we go from ubyte to ushort rather than ubyte to ulong.

There is no reason to go to a shorter format (doesn't even save bytecode space).
 Might as well be fully general.

> But, if I understand you correctly, you want to allow both array and
> structure indices to accept any integer expression.  I would have thought
> it should be any unsigned expression, but I can see your reasoning. 

Nope, your thoughts are right.  I think structure indexes should be limited to
be any unsigned type.  Array indexes can be any integer type though.

-Chris
------- Comment #5 From Chris Lattner 2003-11-18 18:16:00 -------
Reid,

Have you started working on this?  I think that perhaps a reasonable way to do
this is to write a new "gep_type_iterator" class, which will walk through the
indexed types, making the clients easier to convert.  What do you think?  If you
haven't started converting over users of gep types, I can do this.

-Chris
------- Comment #6 From Reid Spencer 2003-11-18 18:21:55 -------
I have nearly completed this. However, my patch will be a partial patch that
you'll need to review carefully. There are several places where I was unsure of
exactly the correct replacement to use.  So, the patch will be a "starting
place" for you.

An iterator is a good idea. For ease of convenience in replacement, I created
two new functions on GetElementPtrInst:

indexIsForArray(unsigned);
indexIsForStructure(unsigned);

This is used to replace things like:

GEP->getOperand(i)->getType() == ULongTy

The new methods simply encode the above, but could be changed to a more
meaningful expression later on. My original plan was to augment Value with some
notion of whether the value was used as a structure or array or pointer index.
But, that turned out to be problematic than it was worth.

I'll have the patch done later tonight (I'm multi-tasking).

Reid.
------- Comment #7 From Chris Lattner 2003-11-18 18:23:31 -------
Ok, sounds great.  No hurry, there are plenty of other bugs that need fixing. 
:)

-Chris
------- Comment #8 From Reid Spencer 2003-11-18 21:45:47 -------
Created an attachment (id=28) [details]
Partial Patch - Just Gets Started On This Bug

Rather than try to figure out some of the more complex things, I've opted to
just post this patch in the hope that it is useful. This task would require me
to learn far more than the short time to Release 1.1 would allow.  So, the
patch may or may not be useful. It is the result of looking for the following
patterns in all the source files:

GetElementPtr, 
GEP
UByteTy
ULongTy
getOperand.*getType

Hopefully the patch at least locates some of the places that need to be fixed.
------- Comment #9 From Reid Spencer 2003-11-18 22:07:23 -------
P.S. The patch was derived from the head that was updated just before the patch
was created.

A subsequent compile made it through most of LLVM but failed here:

gmake[3]: Entering directory `/proj/work/llvmobj/runtime/GCCLibraries/crtend'
Compiling listend.ll to bytecode
Linking crtend bytecode library
While deleting: uint%
Use still stuck around after Def is destroyed:  add uint <badref>, 36          
; <uint>:<badref> [#uses=1]
 
gccld: /proj/work/llvm/lib/VMCore/Value.cpp:52: virtual llvm::Value::~Value():
Assertion `Uses.begin() == Uses.end() &&"Uses remain when a value is
destroyed!"' failed.
gcc: Internal error: Aborted (program gccld)
Please submit a full bug report.
See <URL:http://llvm.cs.uiuc.edu> for instructions.


I didn't see an easy way to fix this and it didn't seem related to the changes
I'd made, but it could be.
------- Comment #10 From Chris Lattner 2003-11-19 16:29:02 -------
Ok, the problem with your patch is here:

+ bool GetElementPtrInst::indexIsForArray( unsigned i ) const {
+   assert( i > 0 && "Operand(0) is not an index.");
+   return this->getOperand(i)->getType()->isInteger();
+ }
+ 
+ bool GetElementPtrInst::indexIsForStructure( unsigned i ) const {
+   assert( i > 0 && "Operand(0) is not an index.");
+   return this->getOperand(i)->getType()->isUnsigned();
+ }

The problem is that indexIsForArray will currently return true for structure
indexes.  The problem with figuring out what "type" of index we are dealing with
is that it depends on all of the GEP indexes before it.  I'll probably cobble
together the gep_type_iterator class which will allow us to do this efficiently.

-Chris
------- Comment #11 From Reid Spencer 2003-11-19 16:36:31 -------
Yes, the iterator sounds more useful that what I've come up with.

Like I said, I had my doubts about the utility of the patch but at least it will
provide you with a hint of the places in the source you need to use the iterator.

Sorry I couldn't be more helpful on this.
------- Comment #12 From Chris Lattner 2003-11-19 16:39:02 -------
Yes, it does look like you found a majority of the places that need to be fixed.
 Thanks a lot!  This will definitely help.

And don't worry about not getting it exactly right.  It takes time to get
familiar with a big system and this is a _nontrivial_ change...

-Chris
------- Comment #13 From Chris Lattner 2003-11-20 13:04:41 -------
Here is another testcase for this bug:
struct pine {
    unsigned read_predicted:1;
    char cur_folder[4000];
    int dlevel;
};

extern struct pine *ps_global;
void dump_some_debugging(char *message) {
   ps_global->dlevel = 1;
}
------- Comment #14 From Reid Spencer 2003-11-22 15:32:34 -------
After writing Stacker, I would like to add my voice to getting this bug fixed.
It has tremendously limiting consequences that can even affect performance. For
example, because array indices in LLVM are required to be Type::LongTy, any
global index to an array must also be LongTy, or casted to one. The use of
LongTy can mean (on some platforms) that we'll have threading issues down the
road because you can't gaurantee atomic assignment as with a smaller sized
integer. Furthermore, loading and manipulating 64-bit values on many platforms
implies calls to library functions to handle the 64-bit entities. This can lead
to a significant degradtion in performance if its done a lot (as is the case
with Stacker).  Finally, not being able to use Byte, Short, or Int directly but
having to cast to Long increases the compiler writers task.  These aspects are
in *addition* to the limited structure sizes that LLVM now supports. 

If there's any way I can help again to get this task done, please let me know.
------- Comment #15 From Chris Lattner 2003-11-22 19:59:31 -------
I totally understand, and agree.  I will try to get this fixed over the next
week.  LLVM development is probably going to slow down over the next couple of
weeks (thanksgiving and a conference), but I will try to get this one done.

Note that this really shouldn't cause any performance problems at all.  The X86
backend, for example, already folds the 'cast int -> long, getelementptr'
sequences into a direct use of the source integer (and the sparc backend is
natively 64-bit).

Despite the lack of performance implications, it SHOULD be fixed, and will be.  :)

-Chris
------- Comment #16 From Chris Lattner 2003-11-25 15:23:36 -------
Progress made:
I checked in the gep_type_iterator class that I mentioned before, and converted
the places Reid found to use it instead of looking for Long/UByte indexes
directly.  Next up is to add support to the bytecode reader/writer & asmparser
support for this.  Finally, I will convert LLVM code that creates structure
indexes to use UInts instead of ubytes.

-Chris
------- Comment #17 From Chris Lattner 2003-12-06 19:55:45 -------
This change is going to be potentially destabilizing, so it makes sense to do it
immediately _after_ a release, not right before it.  Unfortunately this means
this won't get addressed until 1.2 though.  :(

-Chris
------- Comment #18 From Chris Lattner 2004-01-08 17:09:37 -------
Just a note: I'm currently bogged down working on the final submission of a
paper, but I hope to get to this bug soon after it's off my plate.  I hope this
to be in about 1-2 weeks.

-Chris
------- Comment #19 From Chris Lattner 2004-01-08 23:44:09 -------
*** Note that when this is fixed all places that contain the string "PR82" in
comments should be updated.
------- Comment #20 From Reid Spencer 2004-01-09 00:18:47 -------
Chris: I have some "compile time" waits that I could use to help on this. It
seems like you're marking things PR86. If you can quickly describe what you're
looking for, I could make a pass through all the code in my "spare" time.  Or,
do something else that furthers this along since you >should< be writing a paper :)

Reid.
------- Comment #21 From Chris Lattner 2004-01-09 00:21:20 -------
Yeah yeah yeah, I'm procrastinating right now.  When my brain is mush, I do
mechanical things. :)

If you want to go through and find things that are constructing getelementptr
indices with explicit Long and UByte values, they should be marked.

If you see something that is traversing a GEP index list, using the types of the
indices to determine whether it's indexing into a structure or an array, those
should be changed to use the geptypeiterator thing, but I think I've found them all.

Thanks Reid!

-Chris
------- Comment #22 From Chris Lattner 2004-03-08 18:29:55 -------
Retargetting to 1.3
------- Comment #23 From Chris Lattner 2004-04-01 11:19:26 -------
Just a note that this is related to Bug 309.

-Chris
------- Comment #24 From Chris Lattner 2004-04-04 20:38:18 -------
This bug is now fixed.  The 'getelementptr' instruction now requires uint
constants to specify structure fields, not ubyte's.

-Chris

First Last Prev Next    No search results available