Vreg subregs support #1722

Closed
llvmbot opened this issue Apr 24, 2007 · 15 comments
Labels
bugzilla Issues migrated from bugzilla

Comments

@llvmbot
Collaborator

llvmbot commented Apr 24, 2007

Bugzilla Link 1350
Resolution FIXED
Resolved on Nov 07, 2018 00:22
Version unspecified
OS All
Blocks #1014
Reporter LLVM Bugzilla Contributor
CC @lattner

Extended Description

From the dev mailing list:

The issue I'm having is that there is no extract/insert
instruction in the ISA, it's simply based on using subregister
operands in subsequent/preliminary instructions. At the point of
custom lowering register allocation has not yet been done, so I
don't have a way to communicate the dependency.

Ok.

If I have a register v4r0 with subregisters {r0, r1, r2, r3} and a
DAG that looks like

load v4si <- extract_element 2 <- add -> load i32

I'd like to be able to generate

load v4r0
load r10
add r11, r10, r2 <== subregister 2 of v4r0

Nice ISA. That is entirely too logical. :)

We have a similar problem on X86. In particular, an integer truncate or
an extend (e.g. i16 -> i8) wants to make use of subregisters. Consider
code like this:

t1 = load i16
t2 = truncate i16 t1 to i8
t3 = add i8 t2, 42

What we would really want to generate is something like this at the
machine instr level:

r1024 = X86_LOADi16 ... ;; r1024 is i16
r1026 = ADDi8 r1024[subreg #0], 42

More specifically, we want to be able to define, for each register class,
a set of subregister classes. In the X86 world, the 64-bit register
classes could have subregclass0 = i8 parts, subregclass1 = i16 parts,
subregclass2 = i32 parts. Each <physreg, subreg#> pair should map to
another physreg (e.g. <RAX,1> -> AX).
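A minimal sketch of such a <physreg, subreg#> table, assuming a hypothetical register enum and the index numbering used in this paragraph (0 = i8 part, 1 = i16 part, 2 = i32 part). The names are illustrative only and are not LLVM's actual MRegisterInfo API.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <utility>

// Hypothetical physical register numbers; not LLVM's real X86 enum.
enum PhysReg { AL, AX, EAX, RAX, CL, CX, ECX, RCX };

// Assumed subregister indices: 0 = i8 part, 1 = i16 part, 2 = i32 part.
using SubRegTable = std::map<std::pair<PhysReg, unsigned>, PhysReg>;

// Each <physreg, subreg#> pair maps to another physreg.
const SubRegTable &subRegTable() {
  static const SubRegTable table = {
      {{RAX, 0}, AL}, {{RAX, 1}, AX}, {{RAX, 2}, EAX},
      {{RCX, 0}, CL}, {{RCX, 1}, CX}, {{RCX, 2}, ECX},
  };
  return table;
}

// Returns the physical subregister, or nothing if the pair has no entry.
std::optional<PhysReg> getSubReg(PhysReg reg, unsigned index) {
  const SubRegTable &table = subRegTable();
  auto it = table.find({reg, index});
  if (it == table.end())
    return std::nullopt;
  return it->second;
}
```

With this table, getSubReg(RAX, 1) yields AX, mirroring the <RAX,1> -> AX example, while a pair with no entry (such as <AL,0>) yields no register.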

The idea of this is that the register allocator allocates registers like
normal, but when it does the rewriting pass, when it replaces vregs with
pregs (e.g. r1024 with CX in this example), it rewrites r1024[subreg0]
with CL instead of CX. This would give us this code:

CX = X86_LOADi16 ...
DL = ADDi8 CL, 42
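The rewriting step can be sketched as follows, assuming a hypothetical operand type that carries an optional subregister index, an allocation map from vregs to pregs, and a table in which index 0 selects the low byte (CX -> CL). Only the rewrite logic follows the text; every name here is illustrative.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical register operand: a virtual register plus an optional subreg index.
struct RegOperand {
  unsigned vreg;
  std::optional<unsigned> subReg;
};

// Hypothetical tables: vreg -> assigned physreg, and <physreg, subreg#> -> physreg.
using Allocation = std::map<unsigned, std::string>;
using SubRegTable = std::map<std::pair<std::string, unsigned>, std::string>;

// Replace a vreg operand with its physreg, narrowing to the subregister if one
// was requested (e.g. r1024[subreg 0] with r1024 -> CX becomes CL).
std::string rewriteOperand(const RegOperand &op, const Allocation &alloc,
                           const SubRegTable &subRegs) {
  const std::string &preg = alloc.at(op.vreg);
  if (!op.subReg)
    return preg;
  return subRegs.at({preg, *op.subReg});
}

// Rewrite every register operand of one instruction.
std::vector<std::string> rewriteOperands(const std::vector<RegOperand> &ops,
                                         const Allocation &alloc,
                                         const SubRegTable &subRegs) {
  std::vector<std::string> out;
  for (const RegOperand &op : ops)
    out.push_back(rewriteOperand(op, alloc, subRegs));
  return out;
}
```

For r1026 = ADDi8 r1024[subreg #0], 42 with r1024 -> CX and r1026 -> DL, the register operands {r1026, r1024[0]} become {DL, CL}, which matches the DL = ADDi8 CL, 42 output above.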

In your case, you'd define your vector register class with 4 subregs, one
for each piece.

Unfortunately, none of this exists yet :(. To handle truncates and
extends on X86, we currently emulate this by generating machineinstrs
like:

r1024 = X86_LOADi16 ...
r1025 = TRUNCATE_i16_to_i8 r1024
r1026 = ADDi8 r1025, 42

In the asmprinter, we print TRUNCATE_i16_to_i8 as a commented out noop if
the register allocator happens to allocate 1024 and 1025 to the same
register. If not, it uses an asmprinter hack to print this as a copy
instruction. This is horrible, and doesn't produce good code. OTOH,
before Evan improved this, we always copied into AX and out of AL for each
i16->i8 truncate, which was much worse :)

I see that Evan has added getSubRegisters()/getSuperRegisters() to
MRegisterInfo. This is what's needed in order to implement the
register allocation constraint, but there's no way yet to pass the
constraint through the operands from the DAG. There would need to be
some way to specify that the SDOperand is referencing a subvalue of
the produced value (perhaps a subclass of SDOperand?). This would
allow the register allocator to try to use the sub/super register
sets to perform the insert/extract.

Right. Evan is currently focusing on getting the late stages of the code
generator (e.g. livevars) to be able to understand arbitrary machine
instrs in the face of physreg subregs. This lays the groundwork for
handling vreg subregs, but won't solve it directly.

Is any of this kind of work planned? The addition of those
MRegisterInfo functions has me curious...

This is on our mid-term plan, which means we'll probably tackle it over
the next year or so, but we don't have any concrete plans in the immediate
future. If you are interested, this should be a pretty reasonable project
that will give you a chance to become more familiar with various pieces of
the early code generator. :)

@llvmbot
Collaborator Author

llvmbot commented Apr 25, 2007

Chris Lattner sez:

I think these are the major pieces needed. These are
all relatively small and independent pieces, so we can tackle these one at
a time.

  1. As you say, we need MRegisterInfo::getSubRegisterForIndex that, given a
    preg/subreg# pair, returns a preg.
  2. We need tblgen syntax/registerinfoemitter support to generate tables
    for item 1.
  3. Register MachineOperands need a subregister number. We should probably
    use 0 to denote "no subreg".
  4. The DAG scheduler pass (which creates machine instrs from dag nodes)
    currently thinks of register operands as simple unsigned's for vreg
    #'s. This needs to be extended to be vreg+subreg pairs (see
    'CreateVirtualRegisters').
  5. We need to decide how to represent subregs in the DAG. Your
    SDSubOperand idea is fine, but I don't think it needs to be an actual
    new subclass of SDOperand. Instead, it could just be a binary SDNode,
    where the LHS is the register input and the RHS is a TargetConstant
    specifying the subreg#.
  6. [optional] We would like syntax to create these things for writing
    patterns in the .td file instead of requiring custom matching code.
  7. The register allocator needs to rewrite subreg references using
    item 1. This should be very simple.
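Items 3 to 5 can be sketched with two toy types, assuming hypothetical names: a register machine operand whose subregister number uses 0 for "no subreg", and a subregister DAG node whose LHS is the register input and whose RHS is a constant subreg#. These are illustrative stand-ins, not LLVM's MachineOperand or SDNode classes.

```cpp
#include <cassert>
#include <memory>

// Item 3: a register MachineOperand extended with a subregister number.
// 0 is reserved to mean "no subreg", so real indices start at 1.
struct ToyRegOperand {
  unsigned reg = 0;
  unsigned subReg = 0;
  bool hasSubReg() const { return subReg != 0; }
};

// Item 5: instead of a new SDOperand subclass, a subregister reference is an
// ordinary binary node: LHS is the register value, RHS is a target constant.
enum class Opcode { Register, TargetConstant, SubReg };

struct ToyNode {
  Opcode opcode;
  unsigned value = 0;                // register number or constant value
  std::shared_ptr<ToyNode> lhs, rhs; // only set for SubReg nodes
};

std::shared_ptr<ToyNode> makeSubRegNode(unsigned reg, unsigned subRegIdx) {
  auto regNode = std::make_shared<ToyNode>(ToyNode{Opcode::Register, reg, nullptr, nullptr});
  auto idxNode = std::make_shared<ToyNode>(ToyNode{Opcode::TargetConstant, subRegIdx, nullptr, nullptr});
  return std::make_shared<ToyNode>(ToyNode{Opcode::SubReg, 0, regNode, idxNode});
}

// Item 4 in miniature: when emitting a machine operand for a node, a SubReg
// node becomes a (vreg, subreg#) pair instead of a bare unsigned.
ToyRegOperand emitOperand(const ToyNode &node) {
  if (node.opcode == Opcode::SubReg)
    return ToyRegOperand{node.lhs->value, node.rhs->value};
  assert(node.opcode == Opcode::Register && "expected a register node");
  return ToyRegOperand{node.value, 0};
}
```

Keeping the subreg# as an ordinary constant operand means the DAG needs no new operand kind; only the emitter has to learn to turn the pair into one machine operand.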

@llvmbot
Collaborator Author

llvmbot commented May 1, 2007

1, 2, and 3 are done. 4-7 left to go.

@lattner
Collaborator

lattner commented May 1, 2007

Note: when this is completed, we should remove Target/Sparc/FPMover.cpp and implement it with
subregs.

@llvmbot
Collaborator Author

llvmbot commented Jun 13, 2007

Chris's proposal for 4,5,7 will work for multiple uses but only single defs of the super register. For
instance take a register class like those on x86 for 8-bit and 16-bit values. The current proposal could
support a partial def of a 16-bit vreg by defining one 8-bit subreg, but it doesn't address it being
defined by both the lower and upper 8-bit vregs.

In this way subregs are very much like vectors, we need a way to represent the subreg equivalent of
insert, extract, scalar_to_vector (subreg_to_reg), and build_vector (build_from_subregs) in order to
cover all the use/def cases. The tricky part is that LiveIntervals needs to be taught about subregs too,
as it has to be able to discern partial defs and partial uses of a register.
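To make the vector analogy concrete, here is a value-level sketch that treats a 16-bit register as a two-element vector of 8-bit subregisters (index 0 = low byte, 1 = high byte, an assumed layout). The helper names follow the operations listed above; they model only what each operation computes, not how LiveIntervals would track the resulting partial defs and uses.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: subreg index 0 = low 8 bits, 1 = high 8 bits.
constexpr unsigned shiftFor(unsigned idx) { return idx * 8; }

// extract: a partial use of the super register.
uint8_t extractSubreg(uint16_t super, unsigned idx) {
  assert(idx < 2);
  return static_cast<uint8_t>(super >> shiftFor(idx));
}

// insert: a partial def. The other half of the old value survives, which is
// why an insert both reads and writes the super register.
uint16_t insertSubreg(uint16_t super, uint8_t sub, unsigned idx) {
  assert(idx < 2);
  const uint16_t mask = static_cast<uint16_t>(0xFFu << shiftFor(idx));
  return static_cast<uint16_t>((super & ~mask) | (uint16_t(sub) << shiftFor(idx)));
}

// subreg_to_reg (scalar_to_vector): defines one half. By analogy with
// scalar_to_vector the other half is unspecified; this sketch zeroes it.
uint16_t subregToReg(uint8_t sub, unsigned idx) {
  return insertSubreg(0, sub, idx);
}

// build_from_subregs (build_vector): the super register is defined entirely
// by the lower and upper 8-bit values, the two-def case raised above.
uint16_t buildFromSubregs(uint8_t lo, uint8_t hi) {
  return insertSubreg(subregToReg(lo, 0), hi, 1);
}
```

For example, buildFromSubregs(0x34, 0x12) produces 0x1234, and inserting 0xAB at index 1 of 0x1234 produces 0xAB34, a def that must also read the untouched low byte.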

Comments?

@lattner
Collaborator

lattner commented Jun 13, 2007

Yep, that sounds right. Excellent point.

@llvmbot
Collaborator Author

llvmbot commented Jun 14, 2007

After diving into LiveIntervals and ScheduleDAG for a few days, I've decided that trying to teach them
about subregs by tracking subreg indices wherever vregs are tracked is probably a bad idea. The data
structures and code make really fundamental assumptions that vregs are unsigned numbers. Making
them a pair breaks all sorts of things, including assumptions about SSA of vregs, which would no
longer hold.

So I'm going to try a different approach. Physical registers have information on super registers, sub
registers (by index), and aliased registers. These relationships allow LiveIntervals to update information
for all other values related to a specific physical register. It seems that to support vreg subregs these
relationships need to exist between super, sub (indexed), and aliasing virtual registers as well, so that
LI can update information on related values just like it can for physical registers. The difference with
vregs is that the relationship between them is determined at compile time, rather than by the layout of
the target's register file. So the aliasing information needs to be generated at compile time from subreg
nodes in the DAG.

The plan would be thus:
ScheduleDAG - Create new virtual registers for subreg values. Generate subreg/superreg maps from
subreg nodes in the DAG that denote the relationship between virtual registers.
LiveIntervals - Maintain subreg/superreg relationships as joining is done.
RegAlloc - Use subreg/superreg mappings during allocation to choose correct registers.
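The bookkeeping this plan implies can be sketched as a hypothetical side table, assuming ScheduleDAG records each sub-value vreg together with its super vreg and index, and a later pass asks for every vreg related to a given one (as LiveIntervals does with physreg sub/super/alias sets). The class and method names are invented for illustration; a later comment in this thread drops this approach.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>

// Hypothetical side table relating virtual registers the way the target's
// register file relates physical ones.
class VRegSubRegMap {
  std::map<unsigned, std::pair<unsigned, unsigned>> superOf; // sub -> (super, idx)
  std::map<unsigned, std::map<unsigned, unsigned>> subsOf;   // super -> idx -> sub

public:
  // Called when a subreg node in the DAG is given its own vreg.
  void addSubReg(unsigned superVReg, unsigned idx, unsigned subVReg) {
    superOf[subVReg] = {superVReg, idx};
    subsOf[superVReg][idx] = subVReg;
  }

  // Every vreg whose value overlaps `vreg`: itself, its super register, and
  // its subregisters (only one level deep in this sketch).
  std::set<unsigned> related(unsigned vreg) const {
    std::set<unsigned> out{vreg};
    if (auto it = superOf.find(vreg); it != superOf.end())
      out.insert(it->second.first);
    if (auto it = subsOf.find(vreg); it != subsOf.end())
      for (const auto &entry : it->second)
        out.insert(entry.second);
    return out;
  }
};
```

Every pass that updates liveness for one vreg would have to consult related() as well, which is the cross-cutting complexity the follow-up comments object to.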

I have 2 questions:

  1. Where is the best place to maintain these maps? SSARegMap? All of the other maps I've seen are local
    to a specific part of CodeGen and aren't shared across the three stages above.
  2. Is there a locus of code that is responsible for keeping vregs up to date as intervals are joined, etc?
    My understanding of LI is fresh and any pointers would be appreciated.

@llvmbot
Collaborator Author

llvmbot commented Jun 17, 2007

I am confused. What problem are you trying to solve by your proposal? It seems
to be creating a whole lot of complexity when the passes are already overly
complicated. It's making me uncomfortable.

Isn't ScheduleDAG already creating new virtual registers?

The problematic example you sent me via email:

    %reg1024 = SLLI_2 %V2R0, 0
    %reg1025 = SLLI_2 %V2R1, 0
    %reg1026 = ADD_1 %reg1025:2, %reg1024:1
    %R0 = SLLI_1 %reg1026, 0
    RETL %R63<imp-use>

I am surely over-simplifying things. But it seems to me, if we had already
rewritten it to eliminate the SLLI_2 before live variables as this (it's always
possible to eliminate subreg extraction instructions and propagate the
extraction, no?):

    %reg1026 = ADD_1 %r3, %r0
    %R0 = SLLI_1 %reg1026, 0
    RETL %R63<imp-use>

It would work just right.

So my whole point is, eliminate these instructions early rather than relying on
copy propagation to do it because teaching that old dog new tricks isn't worth
the effort.

Am I totally missing the point? :-)
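The early elimination proposed here can be sketched on a hypothetical toy instruction list: when a vreg is a copy of a physical super register, each later subreg use of that vreg is replaced by the physical subregister, and copies left unread are dropped. The "COPY" opcode stands in for the SLLI_n ..., 0 moves in the dump, and the table assumes %V2R0 = {%r0, %r1} and %V2R1 = {%r2, %r3} at indices 1 and 2; both are assumptions chosen to reproduce the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy operand: a register name plus a subreg index (0 = none).
struct Operand {
  std::string reg;
  unsigned subReg = 0;
};

struct Instr {
  std::string opcode; // "COPY" stands in for the SLLI_n ..., 0 moves
  Operand def;
  std::vector<Operand> uses;
};

// Names starting with "%reg" are virtual registers, everything else physical.
bool isPhysical(const std::string &reg) { return reg.rfind("%reg", 0) != 0; }

using SubRegTable = std::map<std::pair<std::string, unsigned>, std::string>;

// Forward physical super registers through COPYs into subreg uses, then drop
// COPYs whose results are no longer read. Assumes SSA vregs and that the
// physical super register is not redefined between the copy and its uses.
std::vector<Instr> propagateSubRegExtracts(std::vector<Instr> code,
                                           const SubRegTable &subRegs) {
  std::map<std::string, std::string> copiedFrom; // vreg -> physical super reg
  for (Instr &mi : code) {
    for (Operand &use : mi.uses) {
      auto it = copiedFrom.find(use.reg);
      if (it != copiedFrom.end() && use.subReg != 0)
        use = Operand{subRegs.at({it->second, use.subReg}), 0};
    }
    if (mi.opcode == "COPY" && !mi.uses.empty() && !isPhysical(mi.def.reg) &&
        isPhysical(mi.uses[0].reg) && mi.uses[0].subReg == 0)
      copiedFrom[mi.def.reg] = mi.uses[0].reg;
  }
  std::vector<Instr> out;
  for (const Instr &mi : code) {
    bool dead = mi.opcode == "COPY" && copiedFrom.count(mi.def.reg) != 0;
    for (const Instr &other : code)
      for (const Operand &use : other.uses)
        if (dead && use.reg == mi.def.reg)
          dead = false;
    if (!dead)
      out.push_back(mi);
  }
  return out;
}
```

Run on the three instructions that feed ADD_1 in the example, this leaves a single ADD_1 whose operands are %r3 and %r0, the same result as the hand-rewritten version.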

@llvmbot
Collaborator Author

llvmbot commented Jun 18, 2007

I've come to the same conclusion about the complexity of live variables and live intervals. There be
dragons... And I agree, that subreg extraction can be propagated quite easily. The issue I have had is
handling subreg inserts and builds, which inherently violate SSA form (multiple defs of a reg).

In my proposal I was assuming that this was a hard requirement to preserve SSA form for RegAlloc,
and so I ended up trying to add new vregs for each subvalue with appropriate implicit defs and uses
and essentially cross-wiring the value and subvalue vregs. The problem with this is that it is really
complex, and requires teaching all of the complex code in LV, LI, and RegAlloc about aliasing vregs.
Yuck!

Eventually I realized that the implicit use/def combo needed for insert and build on subregs is very
much like the register constraint for TwoAddress instructions, and went to see how that was
implemented and discovered that because of TwoAddress the register allocator could handle non-SSA
form. So my current approach ditches all of the special data structures and modifications to LV, LI, and
RegAlloc I posted previously.

My current tactic is to try to always allocate vregs as superregs if a subreg is involved, leading to stuff
like this for inserts:
%reg1024 = SLLI_4 %V4R0, 0
%reg1024.4 = MVUI 0
%V4R0 = SLLI_4 %reg1024, 0
RETL %R63

Live variables doesn't seem to like the multiple defs of a register, so I have a MF pass that adds that
info to LV (much like TwoAddress) after LV has run and before LI.

Then I have another MF pass that needs to be registered preEmit that assumes all vregs have been
allocated and rewrites the MachineOperands with subregs as the appropriate physical subregister.

So far this has worked out quite well with only tiny tinkering to the LV, LI, RegAlloc trifecta. I've yet to
run this strategy through all its paces, and I'm still concerned about the spiller rewriting stuff correctly,
but I'm essentially banking on the same infrastructure that TwoAddress depends on, so I'm hopeful.

@llvmbot
Collaborator Author

llvmbot commented Jun 19, 2007

How would build_from_subregs be used? Please give me an example.

I think you are over-thinking this. The scheme you described is unnecessarily
complex. One thing to keep in mind is you can have a def to sub-reg followed by
a use of its super-reg (and vice versa). That's perfectly legal, lv knows how to
deal with that.

extract_subreg and insert_subreg whose source operands are physical registers
should not exist in machineinstruction passes. ScheduleDAG should be responsible
for propagating the physical registers (similar to how we deal with CopyToReg
and CopyFromReg). So...

%reg1024 = op
insert_subreg %V2R0, %reg1024, 0
%reg1026 = opv %V2R0, %reg1025
copytoreg %V2R1, %reg1026
%reg1027 = extract_subreg %V2R1, 1
%reg1029 = op %reg1027, %reg1028

=>

%r0 = op
%v2R1 = opv %v2R0, %reg1025
%reg1029 = op %r3, %reg1028

lv can handle this just fine.

insert_subreg and extract_subreg that operate on virtual registers can exist in
machineinstr passes. For now, we should not worry about performance. So the
source and destination registers will have different (and potentially
overlapping) live intervals. e.g.

%reg1025 = subreg_extract %reg1024, 0

Suppose %reg1024 is allocated %v2r0 and %reg1025 %r3. Then regallocator rewrites
this into a move instruction

%r3 = mov %r0

We can worry about coalescing as the next step.
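After allocation, the lowering described here can be sketched as follows, assuming a hypothetical table in which %v2r0 holds {%r0, %r1} at indices 0 and 1, the numbering this example uses. The extraction becomes a move from the physical subregister, or nothing at all when the allocator already placed the destination in that same subregister (the case coalescing would aim for).

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

using SubRegTable = std::map<std::pair<std::string, unsigned>, std::string>;

// Hypothetical lowering of "dst = subreg_extract src, idx" once dst and src
// have physical registers. Returns the move to emit, or nothing if the
// extraction is already a no-op.
std::optional<std::string> lowerExtract(const std::string &dstPhys,
                                        const std::string &srcPhys,
                                        unsigned idx,
                                        const SubRegTable &subRegs) {
  const std::string &srcSub = subRegs.at({srcPhys, idx});
  if (srcSub == dstPhys)
    return std::nullopt; // the value already sits in the right register
  return dstPhys + " = mov " + srcSub;
}
```

With %reg1024 -> %v2r0 and %reg1025 -> %r3, lowerExtract("%r3", "%v2r0", 0, table) produces "%r3 = mov %r0", matching the move above.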

My current tactic is to try to always allocate vregs as superregs if a subreg
is involved, leading to stuff
like this for inserts:
%reg1024 = SLLI_4 %V4R0, 0
%reg1024.4 = MVUI 0
%V4R0 = SLLI_4 %reg1024, 0
RETL %R63

Is MVUI subreg_insert? Is it inserting a 0 into the 4th part of %V4R0? If so,
then after scheduling this should just be:

%r3 = mov 0

@llvmbot
Collaborator Author

llvmbot commented Jul 26, 2007

The first stage of the MachineInstr-based approach discussed at the meeting at Apple is complete. There are now DAG nodes and target-independent MachineInstrs to represent subreg insert and extract. The register .td file and the DAG scheduler have the needed hooks to allocate correct vregs for the subreg instructions, and there is a pass to lower any un-coalesced subreg instructions to register copies.

The current framework still requires custom isel code to use subregs (no tablegen support yet) and there are currently no tests, as no public targets currently use subregs.

See (in chronological order):
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070723/052244.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070723/052246.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070723/052247.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070723/052248.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070723/052249.html

Work remaining:
Teach the coalescer how to remove subreg instructions.
Add tablegen syntax support for selecting to subreg nodes in .td files.

@lattner
Collaborator

lattner commented Sep 26, 2007

This bug (at least the subject line) is already implemented. Can you please close this and file new bugs for the specific work remaining? Thanks,

-Chris

@lattner
Collaborator

lattner commented Sep 26, 2007

*** Bug #1141 has been marked as a duplicate of this bug. ***

@lattner
Collaborator

lattner commented Nov 27, 2021

mentioned in issue #1014

@lattner
Collaborator

lattner commented Nov 27, 2021

mentioned in issue #1141

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021
This issue was closed.