1350 – Vreg subregs support

Status:	RESOLVED FIXED

Alias:	None

Product:	new-bugs
Classification:	Unclassified
Component:	new bugs
Version:	unspecified
Hardware:	All All

Importance:	P enhancement
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:

Duplicates (1):	769
Depends on:
Blocks:	642

Reported:	2007-04-24 00:59 PDT by Christopher Lamb
Modified:	2018-11-07 00:22 PST
CC List:	4 users

See Also:
Fixed By Commit(s):

Description Christopher Lamb 2007-04-24 00:59:11 PDT

From the dev mailing list:

The issue I'm having is that there is no extract/insert
instruction in the ISA, it's simply based on using subregister
operands in subsequent/preliminary instructions. At the point of
custom lowering register allocation has not yet been done, so I
don't have a way to communicate the dependency.

Ok.

If I have a register v4r0 with subregisters {r0, r1, r2, r3} and a
DAG that looks like

load v4si <- extract_element 2 <- add -> load i32

I'd like to be able to generate

load v4r0
load r10
add r11, r10, r2 <== subregister 2 of v4r0

Nice ISA.  That is entirely too logical. :)

We have a similar problem on X86.  In particular, an integer truncate or 
an extend (e.g. i16 -> i8) wants to make use of subregisters.  Consider 
code like this:

   t1 = load i16
   t2 = truncate i16 t1 to i8
   t3 = add i8 t2, 42

What we would really want to generate is something like this at the 
machine instr level:

   r1024 = X86_LOADi16 ...     ;; r1024 is i16
   r1026 = ADDi8 r1024[subreg #0], 42

More specifically, we want to be able to define, for each register class, 
a set of subregister classes.  In the X86 world, the 64-bit register 
classes could have subregclass0 = i8 parts, subregclass1 = i16 parts, 
subregclass2 = i32 parts.  Each <physreg, subreg#> pair should map to 
another physreg (e.g. <RAX,1> -> AX).

The idea of this is that the register allocator allocates registers like 
normal, but when it does the rewriting pass, when it replaces vregs with 
pregs (e.g. r1024 with CX in this example), it rewrites r1024[subreg0] 
with CL instead of CX.  This would give us this code:

   CX = X86_LOADi16 ...
   DL = ADDi8 CL, 42

In your case, you'd define your vector register class with 4 subregs, one 
for each piece.
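
A minimal C++ sketch of the rewriting step described above. Everything in it is a hypothetical stand-in rather than an LLVM API: the `Operand` struct, the `SubRegTable` map, and `rewriteOperand` are invented for illustration, and only a few x86 registers are modeled, using the numbering from the example (0 = i8 part, 1 = i16 part, 2 = i32 part).

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical operand: a register name plus an optional subregister index.
struct Operand {
  std::string Reg;              // e.g. "r1024" (virtual) or "CX" (physical)
  std::optional<unsigned> Sub;  // subregister index, if the operand has one
};

// Hypothetical table mapping a <physreg, subreg#> pair to another physreg.
using SubRegTable = std::map<std::pair<std::string, unsigned>, std::string>;

inline SubRegTable makeX86Table() {
  return {
      {{"RAX", 0}, "AL"}, {{"RAX", 1}, "AX"}, {{"RAX", 2}, "EAX"},
      {{"CX", 0}, "CL"},
  };
}

// Replace a virtual register with its assigned physical register, then
// resolve the subregister index (if any) through the table.
inline std::string
rewriteOperand(const Operand &Op,
               const std::map<std::string, std::string> &Assignment,
               const SubRegTable &Table) {
  const std::string &Phys = Assignment.at(Op.Reg);
  if (!Op.Sub)
    return Phys;
  return Table.at({Phys, *Op.Sub});
}
```

With r1024 assigned to CX, an operand written as r1024[subreg0] resolves to CL, while a plain r1024 resolves to CX.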


Unfortunately, none of this exists yet :(.  To handle truncates and 
extends on X86, we currently emulate this by generating machineinstrs 
like:

   r1024 = X86_LOADi16 ...
   r1025 = TRUNCATE_i16_to_i8 r1024
   r1026 = ADDi8 r1025, 42

In the asmprinter, we print TRUNCATE_i16_to_i8 as a commented out noop if 
the register allocator happens to allocate 1024 and 1025 to the same 
register.  If not, it uses an asmprinter hack to print this as a copy 
instruction.  This is horrible, and doesn't produce good code.  OTOH, 
before Evan improved this, we always copied into AX and out of AL for each 
i16->i8 truncate, which was much worse :)

I see that Evan has added getSubRegisters()/getSuperRegisters() to
MRegisterInfo. This is what's needed in order to implement the
register allocation constraint, but there's no way yet to pass the
constraint through the operands from the DAG. There would need to be
some way to specify that the SDOperand is referencing a subvalue of
the produced value (perhaps a subclass of SDOperand?). This would
allow the register allocator to try to use the sub/super register
sets to perform the insert/extract.

Right.  Evan is currently focusing on getting the late stages of the code 
generator (e.g. livevars) to be able to understand arbitrary machine 
instrs in the face of physreg subregs.  This lays the groundwork for 
handling vreg subregs, but won't solve it directly.

Is any of this kind of work planned? The addition of those
MRegisterInfo functions has me curious...

This is on our mid-term plan, which means we'll probably tackle it over 
the next year or so, but we don't have any concrete plans in the immediate 
future.  If you are interested, this should be a pretty reasonable project 
that will give you a chance to become more familiar with various pieces of 
the early code generator. :)

Comment 1 Christopher Lamb 2007-04-24 23:56:43 PDT

Chris Lattner sez:

I think these are the major pieces needed.  These are 
all relatively small and independent pieces, so we can tackle these one at 
a time.

1. As you say, we need MRegisterInfo::getSubRegisterForIndex that, given a
    preg/subreg# pair, returns a preg.
2. We need tblgen syntax/registerinfoemitter support to generate tables
    for #1.
3. Register MachineOperands need a subregister number.  We should probably
    use 0 to denote "no subreg".
4. The DAG scheduler pass (which creates machine instrs from dag nodes)
    currently thinks of register operands as simple unsigned's for vreg
    #'s.  This needs to be extended to be vreg+subreg pairs (see
    'CreateVirtualRegisters').
5. We need to decide how to represent subregs in the DAG.  Your
    SDSubOperand idea is fine, but I don't think it needs to be an actual
    new subclass of SDOperand.  Instead, it could just be a binary SDNode,
    where the LHS is the register input and the RHS is a TargetConstant
    specifying the subreg#.
6. [optional] We would like syntax to create these things for writing
    patterns in the .td file instead of requiring custom matching code. 
7. The register allocator needs to rewrite subreg references using
    #1.  This should be very simple.
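
Items 3 through 5 could be sketched roughly as follows. This is an illustration under stated assumptions, not the real SelectionDAG: the `Node` type, the `Opcode` enum, and the helper functions are hypothetical. A subreg reference is modeled as a binary node whose LHS is the register input and whose RHS is a constant holding the subreg#, and it flattens to a vreg+subreg pair in which 0 denotes "no subreg".

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical node kinds; the real DAG has many more.
enum class Opcode { Register, TargetConstant, SubRegRef };

struct Node {
  Opcode Op;
  unsigned Value = 0;                      // vreg number or constant value
  std::vector<std::shared_ptr<Node>> Ops;  // operands, empty for leaves
};

using NodePtr = std::shared_ptr<Node>;

inline NodePtr makeRegister(unsigned VReg) {
  return std::make_shared<Node>(Node{Opcode::Register, VReg, {}});
}

inline NodePtr makeTargetConstant(unsigned C) {
  return std::make_shared<Node>(Node{Opcode::TargetConstant, C, {}});
}

// Binary node: LHS is the register input, RHS is the subreg# constant.
inline NodePtr makeSubRegRef(NodePtr Reg, unsigned SubIdx) {
  return std::make_shared<Node>(
      Node{Opcode::SubRegRef, 0, {std::move(Reg), makeTargetConstant(SubIdx)}});
}

// Flatten a node into the (vreg, subreg#) pair a machine operand would carry.
// A subreg# of 0 means "no subreg", following item 3.
inline std::pair<unsigned, unsigned> toRegOperand(const Node &N) {
  if (N.Op == Opcode::SubRegRef)
    return {N.Ops[0]->Value, N.Ops[1]->Value};
  return {N.Value, 0};
}
```

Reserving 0 for "no subreg" means real subregister indices start at 1 in this sketch, unlike the subreg #0 used in the description's x86 example.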

Comment 2 Nate Begeman 2007-05-01 00:58:41 PDT

1, 2, and 3 are done.  4-7 left to go.

Comment 3 Chris Lattner 2007-05-01 01:01:04 PDT

Note: when this is completed, we should remove Target/Sparc/FPMover.cpp and implement it with 
subregs.

Comment 4 Christopher Lamb 2007-06-13 12:03:36 PDT

Chris's proposal for 4,5,7 will work for multiple uses but only single defs of the super register. For 
instance take a register class like those on x86 for 8-bit and 16-bit values. The current proposal could 
support a partial def of a 16-bit vreg by defining one 8-bit subreg, but it doesn't address it being 
defined by both the lower and upper 8-bit vregs.

In this way subregs are very much like vectors, we need a way to represent the subreg equivalent of 
insert, extract, scalar_to_vector (subreg_to_reg), and build_vector (build_from_subregs) in order to 
cover all the use/def cases. The tricky part is that LiveIntervals needs to be taught about subregs too, 
as it has to be able to discern partial defs and partial uses of a register.
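
To make the analogy with vectors concrete, here is a small C++ model of the four operations on a value with four sub-parts. It only models the value semantics; the type names and function names are hypothetical, and zero-filling the undefined parts in `subregToReg` is an assumption made so the sketch is deterministic.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Sub = std::uint32_t;        // one subregister's worth of data
using Super = std::array<Sub, 4>; // a super-register with four parts

// extract: a partial use, reads one part of the super-register.
inline Sub extractSubreg(const Super &R, std::size_t Idx) { return R.at(Idx); }

// insert: a partial def. It consumes the old super-register value and
// produces a new one, which is why it looks like a second def of the same
// register once values are mapped onto registers.
inline Super insertSubreg(Super R, Sub V, std::size_t Idx) {
  R.at(Idx) = V;
  return R;
}

// subreg_to_reg: like scalar_to_vector. The remaining parts are zeroed here
// purely for the sake of the model.
inline Super subregToReg(Sub V) { return Super{V, 0, 0, 0}; }

// build_from_subregs: like build_vector, defines every part at once.
inline Super buildFromSubregs(Sub A, Sub B, Sub C, Sub D) {
  return Super{A, B, C, D};
}
```

Only `insertSubreg` takes the previous super-register value as an input, which is the partial-def case that LiveIntervals would have to understand.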

Comments?

Comment 5 Chris Lattner 2007-06-13 12:41:15 PDT

Yep, that sounds right.  Excellent point.

Comment 6 Christopher Lamb 2007-06-13 17:25:15 PDT

First part here:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070611/050456.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070611/050457.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070611/050458.html

Comment 7 Christopher Lamb 2007-06-14 14:24:33 PDT

After diving into LiveIntervals and ScheduleDAG for a few days, I've decided that trying to teach them
about subregs by tracking subreg indices wherever vregs are tracked is probably a bad idea. The data
structures and code make really fundamental assumptions that vregs are unsigned numbers. Making
them a pair breaks all sorts of things, including assumptions about SSA of vregs, which would no
longer hold.

So ima gonna try a different approach. Physical registers have information on super registers, sub
registers (by index), and aliased registers. These relationships allow LiveIntervals to update information
for all other values related to a specific physical register. It seems that to support vreg subregs these
relationships need to exist between super, sub (indexed), and aliasing virtual registers as well, so that
LI can update information on related values just like it can for physical registers. The difference with
vregs is that the relationship between them is determined at compile time, rather than by the layout of
the target's register file. So the aliasing information needs to be generated at compile time from subreg
nodes in the DAG.

The plan would be thus:
ScheduleDAG - Create new virtual registers for subreg values. Generate subreg/superreg maps from
subreg nodes in the DAG that denote the relationship between virtual registers.
LiveIntervals - Maintain subreg/superreg relationships as joining is done.
RegAlloc - Use subreg/superreg mappings during allocation to choose correct registers.
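
A rough C++ sketch of the kind of shared map this plan implies. The class and its methods are hypothetical and exist only to illustrate the proposal in this comment: relations between virtual registers are recorded when subreg nodes are scheduled, queried in both directions, and renamed when one vreg is joined into another.

```cpp
#include <optional>
#include <vector>

// One "Sub is part Index of Super" relation between virtual registers.
struct SubRegRelation {
  unsigned Super;
  unsigned Index;
  unsigned Sub;
};

// Hypothetical map shared by ScheduleDAG, LiveIntervals and RegAlloc.
class VRegSubRegMap {
  std::vector<SubRegRelation> Relations;

public:
  // ScheduleDAG: record a relation derived from a subreg node.
  void add(unsigned Super, unsigned Index, unsigned Sub) {
    Relations.push_back({Super, Index, Sub});
  }

  // RegAlloc: find the vreg holding part Index of Super, if any.
  std::optional<unsigned> getSubReg(unsigned Super, unsigned Index) const {
    for (const SubRegRelation &R : Relations)
      if (R.Super == Super && R.Index == Index)
        return R.Sub;
    return std::nullopt;
  }

  // RegAlloc: find the super vreg that Sub is a part of, if any.
  std::optional<unsigned> getSuperReg(unsigned Sub) const {
    for (const SubRegRelation &R : Relations)
      if (R.Sub == Sub)
        return R.Super;
    return std::nullopt;
  }

  // LiveIntervals: after joining From into To, keep the relations current.
  void replaceVReg(unsigned From, unsigned To) {
    for (SubRegRelation &R : Relations) {
      if (R.Super == From)
        R.Super = To;
      if (R.Sub == From)
        R.Sub = To;
    }
  }
};
```

A linear scan keeps the sketch short; a real implementation would presumably index by vreg number.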

I have 2 questions:
1. Where is the best place to maintain these maps? SSARegMap? All of the other maps I've seen are local
to a specific part of CodeGen and aren't shared across the three stages above.
2. Is there a locus of code that is responsible for keeping vregs up to date as intervals are joined, etc?
My understanding of LI is fresh and any pointers would be appreciated.

Comment 8 Evan Cheng 2007-06-17 00:46:01 PDT

I am confused. What problem are you trying to solve by your proposal? It seems
to be creating a whole lot of complexity when the passes are already overly
complicated. It's making me uncomfortable.

Isn't ScheduleDAG already creating new virtual registers?

The problematic example you sent me via email:

        %reg1024 = SLLI_2 %V2R0, 0
        %reg1025 = SLLI_2 %V2R1, 0
        %reg1026 = ADD_1 %reg1025:2, %reg1024:1
        %R0 = SLLI_1 %reg1026, 0
        RETL %R63<imp-use>

I am surely over-simplifying things. But it seems to me, if we had already
rewritten it to eliminate the SLLI_2 before live variables as this (it's always
possible to eliminate subreg extraction instruction and propagate the
extraction, no?):

        %reg1026 = ADD_1 %r3, %r0
        %R0 = SLLI_1 %reg1026, 0
        RETL %R63<imp-use>

It would work just right.

So my whole point is, eliminate these instructions early rather than relying on
copy propagation to do it because teaching that old dog new tricks isn't worth
the effort.
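
A C++ sketch of the early elimination suggested here, under several assumptions: the instruction model is hypothetical, a generic `COPY` opcode stands in for the `SLLI_2 ..., 0` copies in the example, the code handles a single basic block, every `COPY` source is a physical super-register, and that physical register is not redefined between the copy and its uses. Subregister indices are 1-based as in the example, with 0 meaning the whole register.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Use {
  std::string Reg;
  unsigned SubIdx; // 0 = whole register, otherwise a subregister index
};

struct Instr {
  std::string Opcode;
  std::string Def;
  std::vector<Use> Uses;
};

// Hypothetical physical subregister table, e.g. <V2R0,1> -> r0.
using SubRegTable = std::map<std::pair<std::string, unsigned>, std::string>;

// Forward each "vreg = COPY physreg" into subregister uses of vreg, then
// drop the copies that are left without any whole-register users.
inline std::vector<Instr> propagateExtracts(const std::vector<Instr> &In,
                                            const SubRegTable &Table) {
  std::map<std::string, std::string> CopyOf; // vreg -> physical super-register
  for (const Instr &I : In)
    if (I.Opcode == "COPY" && I.Uses.size() == 1 && I.Uses[0].SubIdx == 0)
      CopyOf[I.Def] = I.Uses[0].Reg;

  std::map<std::string, unsigned> WholeUses; // remaining uses per copied vreg
  std::vector<Instr> Rewritten;
  for (Instr I : In) {
    for (Use &U : I.Uses) {
      auto C = CopyOf.find(U.Reg);
      if (C == CopyOf.end())
        continue;
      if (U.SubIdx != 0)
        U = {Table.at({C->second, U.SubIdx}), 0}; // use the physical subreg
      else
        ++WholeUses[U.Reg]; // the copy is still needed
    }
    Rewritten.push_back(std::move(I));
  }

  std::vector<Instr> Out;
  for (Instr &I : Rewritten) {
    bool DeadCopy = I.Opcode == "COPY" && CopyOf.count(I.Def) != 0 &&
                    WholeUses[I.Def] == 0;
    if (!DeadCopy)
      Out.push_back(std::move(I));
  }
  return Out;
}
```

Applied to the example, the two copies disappear and the add reads %r3 and %r0 directly.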

Am I totally missing the point? :-)

Comment 9 Christopher Lamb 2007-06-17 20:41:43 PDT

I've come to the same conclusion about the complexity of live variables and live intervals. There be
dragons... And I agree, that subreg extraction can be propagated quite easily. The issue I have had is
handling subreg inserts and builds, which inherently violate SSA form (multiple defs of a reg).

In my proposal I was assuming that this was a hard requirement to preserve SSA form for RegAlloc,
and so I ended up trying to add new vregs for each subvalue with appropriate implicit defs and uses
and essentially cross-wiring the value and subvalue vregs. The problem with this is that it is really
complex, and requires teaching all of the complex code in LV, LI, and RegAlloc about aliasing vregs.
Yuck!

Eventually I realized that the implicit use/def combo needed for insert and build on subregs is very
much like the register constraint for TwoAddress instructions, and went to see how that was
implemented and discovered that because of TwoAddress the register allocator _could_ handle non-SSA
form. So my current approach ditches all of the special data structures and modifications to LV,LI, and
RegAlloc I posted previously.

My current tactic is to try to always allocate vregs as superregs if a subreg is involved, leading to stuff
like this for inserts:
%reg1024 = SLLI_4 %V4R0, 0
%reg1024.4 = MVUI 0
%V4R0 = SLLI_4 %reg1024, 0
RETL %R63<imp-use>

Live variables doesn't seem to like the multiple defs of a register, so I have a MF pass that adds that
info to LV (much like TwoAddress) after LV has run and before LI.

Then I have another MF pass that needs to be registered preEmit that assumes all vregs have been
allocated and rewrites the MachineOperands with subregs as the appropriate physical subregister.

So far this has worked out quite well with only tiny tinkering to the LV,LI, RegAlloc trifecta. I've yet to
run this strategy through all its paces, and I'm still concerned about the spiller rewriting stuff correctly,
but I'm essentially banking on the same infrastructure that TwoAddress depends on, so I'm hopeful.

Comment 10 Evan Cheng 2007-06-19 13:48:03 PDT

How would build_from_subregs be used? Please give me an example.

I think you are over-thinking this. The scheme you described is unnecessarily
complex. One thing to keep in mind is you can have a def to sub-reg followed by
a use of its super-reg (and vice versa). That's perfectly legal, lv knows how to
deal with that.

extract_subreg and insert_subreg whose source operands are physical registers
should not exist in machineinstruction passes. ScheduleDAG should be responsible
for propagating the physical registers (similar to how we deal with CopyToReg
and CopyFromReg). So...

%reg1024 = op
insert_subreg %V2R0, %reg1024, 0
%reg1026 = opv %V2R0, %reg1025
copytoreg %V2R1, %reg1026
%reg1027 = extract_subreg %V2R1, 1
%reg1029 = op %reg1027, %reg1028

=>

%r0 = op
%v2R1 = opv %v2R0, %reg1025
%reg1029 = op %r3, %reg1028

lv can handle this just fine.


insert_subreg and extract_subreg that operate on virtual registers can exist in
machineinstr passes. For now, we should not worry about performance. So the
source and destination registers will have different (and potentially
overlapping) live intervals. e.g.

%reg1025 = subreg_extract %reg1024, 0

Suppose %reg1024 is allocated %v2r0 and %reg1025 %r3. Then regallocator rewrites
this into a move instruction 

%r3 = mov %r0

We can worry about coalescing as the next step.
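
The post-allocation rewrite of an extract could be sketched in C++ like this. The structs and `lowerExtract` are hypothetical, the table uses the 0-based index from the example just above, and skipping the move when the destination already is the requested subregister is an added assumption standing in for what coalescing would later achieve.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical physical subregister table, e.g. <v2r0,0> -> r0.
using SubRegTable = std::map<std::pair<std::string, unsigned>, std::string>;

// An extract whose operands have already been assigned physical registers.
struct ExtractSubreg {
  std::string Dst;  // e.g. "r3"
  std::string Src;  // e.g. "v2r0"
  unsigned SubIdx;  // which part of Src to read
};

struct Move {
  std::string Dst;
  std::string Src;
};

// Rewrite the extract into a plain register move, or into nothing when the
// destination already is the requested subregister.
inline std::optional<Move> lowerExtract(const ExtractSubreg &I,
                                        const SubRegTable &Table) {
  const std::string &Part = Table.at({I.Src, I.SubIdx});
  if (Part == I.Dst)
    return std::nullopt; // value is already in place, no instruction needed
  return Move{I.Dst, Part};
}
```

For the example, an extract of part 0 of v2r0 into r3 becomes a move from r0 to r3.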

>My current tactic is to try to always allocate vregs as superregs if a subreg
>is involved, leading to stuff 
>like this for inserts:
>        %reg1024 = SLLI_4 %V4R0, 0
>        %reg1024.4 = MVUI 0
>        %V4R0 = SLLI_4 %reg1024, 0
>        RETL %R63<imp-use>

Is MVUI subreg_insert? Is it inserting a 0 into the 4th part of %V4R0? If so,
then after scheduling this should just be:

%r3 = mov 0

Comment 11 Christopher Lamb 2007-07-26 03:26:58 PDT

The first stage of the MachineInstr based approach discussed at the meeting at Apple is complete. There are now DAG nodes and target independent MachineInstr's to represent subreg insert and extract. The register .td file and the DAG scheduler have the needed hooks to allocate correct vregs for the subreg instructions and there is a pass to lower any un-coalesced subreg instructions to register copies.

The current framework still requires custom isel code to use subregs (no tablegen support yet) and there are currently no tests, as no public targets currently use subregs.

See (in chronological order):
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070723/052244.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070723/052246.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070723/052247.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070723/052248.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070723/052249.html


Work remaining:
Teach the coalescer how to remove subreg instructions.
Add tablegen syntax support for selecting to subreg nodes in .td files.

Comment 12 Chris Lattner 2007-09-26 00:51:31 PDT

This bug (at least the subject line) is already implemented.  Can you please close this and file new bugs for the specific work remaining?  Thanks,

-Chris

Comment 13 Chris Lattner 2007-09-26 01:14:02 PDT

*** Bug 769 has been marked as a duplicate of this bug. ***