Vreg subregs support #1722
Comments
Chris Lattner sez: I think these are the major pieces needed. These are
1, 2, and 3 are done. 4-7 left to go.
Note: when this is completed, we should remove Target/Sparc/FPMover.cpp and implement it with |
Chris's proposal for 4,5,7 will work for multiple uses but only single defs of the super register. For In this way subregs are very much like vectors, we need a way to represent the subreg equivalent of Comments? |
Yep, that sounds right. Excellent point.
After diving into LiveIntervals and ScheduleDAG for a few days, I've decided that trying to teach them So ima gonna try a different approach. Physical registers have information on super registers, sub The plan would be thus: I have 2 questions:
I am confused. What problem are you trying to solve by your proposal? It seems Isn't ScheduleDAG already creating new virtual registers? The problematic example you sent me via email:
I am surely over-simplifying things. But it seems to me, if we had already
It would work just right. So my whole point is, eliminate these instructions early rather than relying on Am I totally missing the point? :-) |
I've come to the same conclusion about the complexity of live variables and live intervals. There be In my proposal I was assuming that this was a hard requirement to preserve SSA form for RegAlloc, Eventually I realized that the implicit use/def combo needed for insert and build on subregs is very My current tactic is to try to always allocate vregs as superregs if a subreg is involved, leading to stuff Live variables doesn't seem to like the multiple defs of a register, so I have a MF pass that adds that Then I have another MF pass that needs to be registered preEmit that assumes all vregs have been So far this has worked out quite well with only tiny tinkering to the LV, LI, RegAlloc trifecta. I've yet to
How would build_from_subregs be used? Please give me an example. I think you are over-thinking this. The scheme you described is unnecessarily extract_subreg and insert_subreg whose source operands are physical registers %reg1024 = op => %r0 = op lv can handle this just fine. insert_subreg and extract_subreg that operate on virtual registers can exist in %reg1025 = subreg_extract %reg1024, 0 Suppose %reg1024 is allocated %v2r0 and %reg1025 %r3. Then regallocator rewrites %r3 = mov %r0 We can worry about coalescing as the next step.
Is MVUI subreg_insert? Is it inserting a 0 into the 4th part of %V4R0? If so, %r3 = mov 0 |
The first stage of the MachineInstr based approach discussed at the meeting at Apple is complete. There are now DAG nodes and target independent MachineInstr's to represent subreg insert and extract. The register .td file and the DAG scheduler have the needed hooks to allocate correct vregs for the subreg instructions and there is a pass to lower any un-coalesced subreg instructions to register copies. The current framework still requires custom isel code to use subregs (no tablegen support yet) and there are currently no tests, as no public targets currently use subregs. See (in chronological order): Work remaining: |
This bug (at least the subject line) is already implemented. Can you please close this and file new bugs for the specific work remaining? Thanks, -Chris
*** Bug #1141 has been marked as a duplicate of this bug. ***
mentioned in issue #1014
mentioned in issue #1141
Extended Description
From the dev mailing list:
The issue I'm having is that there is no extract/insert
instruction in the ISA, it's simply based on using subregister
operands in subsequent/preliminary instructions. At the point of
custom lowering register allocation has not yet been done, so I
don't have a way to communicate the dependency.
Ok.
If I have a register v4r0 with subregisters {r0, r1, r2, r3} and a
DAG that looks like
    load v4si <- extract_element 2 <- add -> load i32
I'd like to be able to generate
    load v4r0
    load r10
    add r11, r10, r2    <== subregister 2 of v4r0
Nice ISA. That is entirely too logical. :)
We have a similar problem on X86. In particular, an integer truncate or
an extend (e.g. i16 -> i8) wants to make use of subregisters. Consider
code like this:
    t1 = load i16
    t2 = truncate i16 t1 to i8
    t3 = add i8 t2, 42
What we would really want to generate is something like this at the
machine instr level:
    r1024 = X86_LOADi16 ...     ;; r1024 is i16
    r1026 = ADDi8 r1024[subreg #0], 42
More specifically, we want to be able to define, for each register class,
a set of subregister classes. In the X86 world, the 64-bit register
classes could have subregclass0 = i8 parts, subregclass1 = i16 parts,
subregclass2 = i32 parts. Each <physreg, subreg#> pair should map to
another physreg (e.g. <RAX,1> -> AX).
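The <physreg, subreg#> mapping described above can be pictured as a small lookup table. The following is only a sketch under stated assumptions: `subRegTable` and `getSubReg` are hypothetical names invented for illustration, registers are plain strings, and none of this is the actual MRegisterInfo interface.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical key: (physical register name, subregister index).
// Index 0 = i8 part, 1 = i16 part, 2 = i32 part, as in the text above.
typedef std::pair<std::string, unsigned> SubRegKey;

// Illustrative table for two 64-bit registers only; a real target would
// generate this from its register description.
inline const std::map<SubRegKey, std::string> &subRegTable() {
  static const std::map<SubRegKey, std::string> Table = {
      {{"RAX", 0}, "AL"}, {{"RAX", 1}, "AX"}, {{"RAX", 2}, "EAX"},
      {{"RCX", 0}, "CL"}, {{"RCX", 1}, "CX"}, {{"RCX", 2}, "ECX"},
  };
  return Table;
}

// Returns the physical subregister for the pair, or an empty string if the
// table has no entry for it.
inline std::string getSubReg(const std::string &PhysReg, unsigned Idx) {
  std::map<SubRegKey, std::string>::const_iterator It =
      subRegTable().find(SubRegKey(PhysReg, Idx));
  return It == subRegTable().end() ? std::string() : It->second;
}
```

With this table, `getSubReg("RAX", 1)` yields `AX`, matching the <RAX,1> -> AX example.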
The idea of this is that the register allocator allocates registers like
normal, but when it does the rewriting pass, when it replaces vregs with
pregs (e.g. r1024 with CX in this example), it rewrites r1024[subreg0]
with CL instead of CX. This would give us this code:
    CX = X86_LOADi16 ...
    DL = ADDi8 CL, 42
In your case, you'd define your vector register class with 4 subregs, one
for each piece.
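As a rough illustration of the rewriting step, the sketch below replaces a vreg operand with its assigned physreg and narrows it to the subregister when the operand carries a subreg index. `Operand`, `RewriteTables`, `rewriteOperand`, and `exampleTables` are all hypothetical names, and the vreg number 2000 for the vector case is made up; this is not how any existing LLVM pass is written.

```cpp
#include <map>
#include <string>
#include <utility>

// A virtual-register operand, optionally tagged with a subregister index.
// HasSubReg == false means the operand names the whole register.
struct Operand {
  unsigned VReg;
  bool HasSubReg;
  unsigned SubIdx;
};

// Hypothetical allocator output plus target tables, for illustration only.
struct RewriteTables {
  std::map<unsigned, std::string> Assignment;  // vreg -> physreg
  std::map<std::pair<std::string, unsigned>, std::string> SubRegs;
};

// Replace a vreg operand with its physreg, narrowing to the subregister when
// the operand has a subreg index. Returns "" if no mapping is known.
inline std::string rewriteOperand(const RewriteTables &T, const Operand &Op) {
  std::map<unsigned, std::string>::const_iterator A = T.Assignment.find(Op.VReg);
  if (A == T.Assignment.end())
    return std::string();
  if (!Op.HasSubReg)
    return A->second;
  std::map<std::pair<std::string, unsigned>, std::string>::const_iterator S =
      T.SubRegs.find(std::make_pair(A->second, Op.SubIdx));
  return S == T.SubRegs.end() ? std::string() : S->second;
}

// Tables matching the examples in the text: r1024 -> CX with subreg 0 = CL,
// and a vector register v4r0 with four subregs r0..r3.
inline RewriteTables exampleTables() {
  RewriteTables T;
  T.Assignment[1024] = "CX";
  T.Assignment[2000] = "v4r0";  // made-up vreg number for the vector case
  T.SubRegs[std::make_pair(std::string("CX"), 0u)] = "CL";
  const char *Parts[4] = {"r0", "r1", "r2", "r3"};
  for (unsigned I = 0; I < 4; ++I)
    T.SubRegs[std::make_pair(std::string("v4r0"), I)] = Parts[I];
  return T;
}
```

Under these tables, the operand r1024[subreg0] is rewritten to `CL`, a plain r1024 to `CX`, and subreg 2 of the vector register to `r2`.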
Unfortunately, none of this exists yet :(. To handle truncates and
extends on X86, we currently emulate this by generating machineinstrs
like:
    r1024 = X86_LOADi16 ...
    r1025 = TRUNCATE_i16_to_i8 r1024
    r1026 = ADDi8 r1025, 42
In the asmprinter, we print TRUNCATE_i16_to_i8 as a commented out noop if
the register allocator happens to allocate r1024 and r1025 to the same
register. If not, it uses an asmprinter hack to print this as a copy
instruction. This is horrible, and doesn't produce good code. OTOH,
before Evan improved this, we always copied into AX and out of AL for each
i16->i8 truncate, which was much worse :)
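The workaround just described might look roughly like the sketch below. It assumes "the same register" means that the i8 destination is the low byte of the i16 source. `low8` and `printTruncate` are hypothetical helpers and the output strings are illustrative, not what the real X86 asmprinter emits.

```cpp
#include <cassert>
#include <map>
#include <string>

// Low 8-bit part of a 16-bit register. Only four registers are covered here;
// anything else returns an empty string.
inline std::string low8(const std::string &Reg16) {
  static const std::map<std::string, std::string> Low = {
      {"AX", "AL"}, {"BX", "BL"}, {"CX", "CL"}, {"DX", "DL"}};
  std::map<std::string, std::string>::const_iterator It = Low.find(Reg16);
  return It == Low.end() ? std::string() : It->second;
}

// Print the truncate pseudo-instruction: a commented-out no-op when the
// destination already is the low byte of the source, otherwise a copy.
inline std::string printTruncate(const std::string &Dst8,
                                 const std::string &Src16) {
  const std::string SrcLow = low8(Src16);
  assert(!SrcLow.empty() && "unknown 16-bit register in this sketch");
  if (SrcLow == Dst8)
    return "# truncate " + Src16 + " -> " + Dst8 + " (no-op)";
  return "mov " + Dst8 + ", " + SrcLow;
}
```

The copy in the second case is the cost that a subreg-aware rewriter would avoid by referring to the low byte of the source register directly.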
I see that Evan has added getSubRegisters()/getSuperRegisters() to
MRegisterInfo. This is what's needed in order to implement the
register allocation constraint, but there's no way yet to pass the
constraint through the operands from the DAG. There would need to be
some way to specify that the SDOperand is referencing a subvalue of
the produced value (perhaps a subclass of SDOperand?). This would
allow the register allocator to try to use the sub/super register
sets to perform the insert/extract.
Right. Evan is currently focusing on getting the late stages of the code
generator (e.g. livevars) to be able to understand arbitrary machine
instrs in the face of physreg subregs. This lays the groundwork for
handling vreg subregs, but won't solve it directly.
Is any of this kind of work planned? The addition of those
MRegisterInfo functions has me curious...
This is on our mid-term plan, which means we'll probably tackle it over
the next year or so, but we don't have any concrete plans in the immediate
future. If you are interested, this should be a pretty reasonable project
that will give you a chance to become more familiar with various pieces of
the early code generator. :)