From the dev mailing list:

The issue I'm having is that there is no extract/insert instruction in the ISA; it's simply based on using subregister operands in subsequent/preliminary instructions. At the point of custom lowering, register allocation has not yet been done, so I don't have a way to communicate the dependency.

Ok.

If I have a register v4r0 with subregisters {r0, r1, r2, r3} and a DAG that looks like

    load v4si <- extract_element 2 <- add -> load i32

I'd like to be able to generate

    load v4r0
    load r10
    add r11, r10, r2   <== subregister 2 of v4r0

Nice ISA. That is entirely too logical. :)

We have a similar problem on X86. In particular, an integer truncate or an extend (e.g. i16 -> i8) wants to make use of subregisters. Consider code like this:

    t1 = load i16
    t2 = truncate i16 t1 to i8
    t3 = add i8 t2, 42

What we would really want to generate is something like this at the machine instr level:

    r1024 = X86_LOADi16 ...             ;; r1024 is i16
    r1026 = ADDi8 r1024[subreg #0], 42

More specifically, we want to be able to define, for each register class, a set of subregister classes. In the X86 world, the 64-bit register classes could have subregclass0 = i8 parts, subregclass1 = i16 parts, subregclass2 = i32 parts. Each <physreg, subreg#> pair should map to another physreg (e.g. <RAX,1> -> AX).

The idea is that the register allocator allocates registers like normal, but when it does the rewriting pass and replaces vregs with pregs (e.g. r1024 with CX in this example), it rewrites r1024[subreg0] with CL instead of CX. This would give us this code:

    CX = X86_LOADi16 ...
    DL = ADDi8 CL, 42

In your case, you'd define your vector register class with 4 subregs, one for each piece.

Unfortunately, none of this exists yet :(. To handle truncates and extends on X86, we currently emulate this by generating machineinstrs like:

    r1024 = X86_LOADi16 ...
    r1025 = TRUNCATE_i16_to_i8 r1024
    r1026 = ADDi8 r1025, 42

In the asmprinter, we print TRUNCATE_i16_to_i8 as a commented out noop if the register allocator happens to allocate 1024 and 1025 to the same register. If not, it uses an asmprinter hack to print this as a copy instruction. This is horrible, and doesn't produce good code. OTOH, before Evan improved this, we always copied into AX and out of AL for each i16->i8 truncate, which was much worse :)

I see that Evan has added getSubRegisters()/getSuperRegisters() to MRegisterInfo. This is what's needed in order to implement the register allocation constraint, but there's no way yet to pass the constraint through the operands from the DAG. There would need to be some way to specify that the SDOperand is referencing a subvalue of the produced value (perhaps a subclass of SDOperand?). This would allow the register allocator to try to use the sub/super register sets to perform the insert/extract.

Right. Evan is currently focusing on getting the late stages of the code generator (e.g. livevars) to be able to understand arbitrary machine instrs in the face of physreg subregs. This lays the groundwork for handling vreg subregs, but won't solve it directly.

Is any of this kind of work planned? The addition of those MRegisterInfo functions has me curious...

This is on our mid-term plan, which means we'll probably tackle it over the next year or so, but we don't have any concrete plans in the immediate future. If you are interested, this should be a pretty reasonable project that will give you a chance to become more familiar with various pieces of the early code generator. :)
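The rewriting step described above (replace each vreg with its assigned preg, and each vreg[subreg#] with the matching physical subregister) can be sketched roughly as follows. This is only an illustration in Python, not LLVM code: the table contents, the tuple encoding of instructions, and the names SUBREG_TABLE, rewrite_operand, and rewrite are all made up for the example. Here None stands for "no subreg", and index 0 selects the i8 part as in the email.

```python
# Hypothetical table: (physical register, subreg index) -> physical subregister.
# For the 64-bit entry the indices follow the email: 0 = i8, 1 = i16, 2 = i32.
SUBREG_TABLE = {
    ("RAX", 0): "AL", ("RAX", 1): "AX", ("RAX", 2): "EAX",
    ("CX", 0): "CL",
}

def rewrite_operand(op, assignment):
    """Turn a (vreg, subidx) operand into a physical register name."""
    if not isinstance(op, tuple):
        return str(op)                       # immediates pass through unchanged
    vreg, subidx = op
    preg = assignment[vreg]                  # register chosen by the allocator
    if subidx is None:
        return preg                          # plain vreg reference
    return SUBREG_TABLE[(preg, subidx)]      # vreg[subidx] -> physical subregister

def rewrite(instrs, assignment):
    """Rewrite (dst, opcode, sources) triples into printable instructions."""
    out = []
    for dst, opcode, srcs in instrs:
        d = rewrite_operand(dst, assignment)
        s = ", ".join(rewrite_operand(o, assignment) for o in srcs)
        out.append(f"{d} = {opcode} {s}".rstrip())
    return out

# The example from the email: r1024 is allocated CX, r1026 is allocated DL.
program = [
    ((1024, None), "X86_LOADi16", []),
    ((1026, None), "ADDi8", [(1024, 0), 42]),
]
result = rewrite(program, {1024: "CX", 1026: "DL"})
```

With that assignment the second instruction comes out as an ADDi8 reading CL, with no copy or TRUNCATE pseudo-instruction in between.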
Chris Lattner sez: I think these are the major pieces needed. These are all relatively small and independent pieces, so we can tackle these one at a time.

1. As you say, we need MRegisterInfo::getSubRegisterForIndex that, given a preg/subreg# pair, returns a preg.
2. We need tblgen syntax/registerinfoemitter support to generate tables for #1.
3. Register MachineOperands need a subregister number. We should probably use 0 to denote "no subreg".
4. The DAG scheduler pass (which creates machine instrs from dag nodes) currently thinks of register operands as simple unsigned's for vreg #'s. This needs to be extended to be vreg+subreg pairs (see 'CreateVirtualRegisters').
5. We need to decide how to represent subregs in the DAG. Your SDSubOperand idea is fine, but I don't think it needs to be an actual new subclass of SDOperand. Instead, it could just be a binary SDNode, where the LHS is the register input and the RHS is a TargetConstant specifying the subreg#.
6. [optional] We would like syntax to create these things for writing patterns in the .td file instead of requiring custom matching code.
7. The register allocator needs to rewrite subreg references using #1. This should be very simple.
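To make items 1-3 concrete, here is a rough Python sketch of how they fit together. It is not the MRegisterInfo interface or tblgen output: REGISTER_DEFS, build_subreg_table, get_sub_register_for_index, and RegOperand are hypothetical stand-ins, and the register layout is the v4r0 example from the first comment. Indices start at 1 so that 0 can mean "no subreg", as item 3 suggests.

```python
from dataclasses import dataclass

# Hypothetical register description, standing in for what a .td file would list.
REGISTER_DEFS = {"V4R0": ["R0", "R1", "R2", "R3"]}

def build_subreg_table(defs):
    """Item 2: generate the (preg, subreg#) -> preg table; indices start at 1."""
    return {
        (sup, i): sub
        for sup, subs in defs.items()
        for i, sub in enumerate(subs, start=1)
    }

TABLE = build_subreg_table(REGISTER_DEFS)

def get_sub_register_for_index(preg, subidx):
    """Item 1: map a preg/subreg# pair to a preg; 0 returns the register itself."""
    if subidx == 0:
        return preg
    return TABLE[(preg, subidx)]

@dataclass(frozen=True)
class RegOperand:
    """Item 3: a register operand carrying a subregister number (0 = none)."""
    reg: str
    subreg: int = 0

whole = RegOperand("V4R0")
third = RegOperand("V4R0", 3)
```

Item 7 then reduces to calling the lookup on every register operand once the allocator has picked physical registers.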
1, 2, and 3 are done. 4-7 left to go.
Note: when this is completed, we should remove Target/Sparc/FPMover.cpp and implement it with subregs.
Chris's proposal for 4,5,7 will work for multiple uses but only single defs of the super register. For instance, take a register class like those on x86 for 8-bit and 16-bit values. The current proposal could support a partial def of a 16-bit vreg by defining one 8-bit subreg, but it doesn't address it being defined by both the lower and upper 8-bit vregs.

In this way subregs are very much like vectors: we need a way to represent the subreg equivalent of insert, extract, scalar_to_vector (subreg_to_reg), and build_vector (build_from_subregs) in order to cover all the use/def cases.

The tricky part is that LiveIntervals needs to be taught about subregs too, as it has to be able to discern partial defs and partial uses of a register.

Comments?
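As a value-level illustration of the vector analogy, the operations can be modelled on a 16-bit register with two 8-bit subregs. This is a toy Python model, not a proposed representation: the function names just echo the node names above, and index 0 = low byte, 1 = high byte is an assumption of the sketch.

```python
def extract_subreg(value16, idx):
    """Read one 8-bit subreg (idx 0 = low byte, 1 = high byte)."""
    return (value16 >> (8 * idx)) & 0xFF

def insert_subreg(value16, idx, byte):
    """Partial def: replace one 8-bit subreg, leaving the other one intact."""
    mask = 0xFF << (8 * idx)
    return (value16 & ~mask & 0xFFFF) | ((byte & 0xFF) << (8 * idx))

def build_from_subregs(low, high):
    """Define the whole 16-bit value from both 8-bit parts (two partial defs)."""
    return insert_subreg(insert_subreg(0, 0, low), 1, high)
```

For example, insert_subreg(0x1234, 1, 0xAB) keeps the low byte and replaces the high one. Note that build_from_subregs is two inserts into the same register, which is exactly the multiple-def case the single-def proposal does not cover.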
Yep, that sounds right. Excellent point.
First part here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070611/050456.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070611/050457.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070611/050458.html
After diving into LiveIntervals and ScheduleDAG for a few days, I've decided that trying to teach them about subregs by tracking subreg indices wherever vregs are tracked is probably a bad idea. The data structures and code make really fundamental assumptions that vregs are unsigned numbers. Making them a pair breaks all sorts of things, including assumptions about SSA of vregs, which would no longer hold. So ima gonna try a different approach.

Physical registers have information on super registers, sub registers (by index), and aliased registers. These relationships allow LiveIntervals to update information for all other values related to a specific physical register. It seems that to support vreg subregs these relationships need to exist between super, sub (indexed), and aliasing virtual registers as well, so that LI can update information on related values just like it can for physical registers.

The difference with vregs is that the relationship between them is determined at compile time, rather than by the layout of the target's register file. So the aliasing information needs to be generated at compile time from subreg nodes in the DAG.

The plan would be thus:

ScheduleDAG - Create new virtual registers for subreg values. Generate subreg/superreg maps from subreg nodes in the DAG that denote the relationship between virtual registers.
LiveIntervals - Maintain subreg/superreg relationships as joining is done.
RegAlloc - Use subreg/superreg mappings during allocation to choose correct registers.

I have 2 questions:

1. Where is the best place to maintain these maps? SSARegMap? All of the other maps I've seen are local to a specific part of CodeGen and aren't shared across the three stages above.
2. Is there a locus of code that is responsible for keeping vregs up to date as intervals are joined, etc?

My understanding of LI is fresh and any pointers would be appreciated.
I am confused. What problem are you trying to solve by your proposal? It seems to be creating a whole lot of complexity when the passes are already overly complicated. It's making me uncomfortable.

Isn't ScheduleDAG already creating new virtual registers? The problematic example you sent me via email:

    %reg1024 = SLLI_2 %V2R0, 0
    %reg1025 = SLLI_2 %V2R1, 0
    %reg1026 = ADD_1 %reg1025:2, %reg1024:1
    %R0 = SLLI_1 %reg1026, 0
    RETL %R63<imp-use>

I am surely over-simplifying things. But it seems to me, if we had already rewritten it to eliminate the SLLI_2 before live variables as this (it's always possible to eliminate subreg extraction instruction and propagate the extraction, no?):

    %reg1026 = ADD_1 %r3, %r0
    %R0 = SLLI_1 %reg1026, 0
    RETL %R63<imp-use>

It would work just right.

So my whole point is, eliminate these instructions early rather than relying on copy propagation to do it, because teaching that old dog new tricks isn't worth the effort.

Am I totally missing the point? :-)
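A small sketch of the early elimination suggested here, in Python rather than as a real MachineFunction pass. It rests on assumptions the comment does not state: SLLI_2 with a shift of 0 is treated as a copy from a physical super-register, the subregister layout is inferred from the before/after listings (%V2R0 = {%r0, %r1}, %V2R1 = {%r2, %r3}, 1-based indices), and every use of the copied vreg is a subreg use, so the copy can simply be dropped.

```python
# Assumed layout, inferred from the example: ":1" is the first subreg.
SUBREGS = {"%V2R0": ["%r0", "%r1"], "%V2R1": ["%r2", "%r3"]}

def propagate_extracts(instrs):
    """Drop copies from physical super-registers and rewrite vreg:idx uses."""
    copies = {}   # vreg -> physical super-register it was copied from
    out = []
    for dst, opcode, srcs in instrs:
        # Treat "SLLI_2 <physical super-register>, 0" as a plain copy.
        if opcode == "SLLI_2" and srcs[0] in SUBREGS and srcs[1] == 0:
            copies[dst] = srcs[0]
            continue
        new_srcs = []
        for s in srcs:
            if isinstance(s, str) and ":" in s:
                vreg, idx = s.split(":")
                if vreg in copies:
                    # Propagate the extraction straight to the physical subreg.
                    s = SUBREGS[copies[vreg]][int(idx) - 1]
            new_srcs.append(s)
        out.append((dst, opcode, new_srcs))
    return out

program = [
    ("%reg1024", "SLLI_2", ["%V2R0", 0]),
    ("%reg1025", "SLLI_2", ["%V2R1", 0]),
    ("%reg1026", "ADD_1", ["%reg1025:2", "%reg1024:1"]),
    ("%R0", "SLLI_1", ["%reg1026", 0]),
]
rewritten = propagate_extracts(program)
```

On the listing above this leaves the ADD_1 reading %r3 and %r0 directly, followed by the unchanged SLLI_1.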
I've come to the same conclusion about the complexity of live variables and live intervals. There be dragons... And I agree that subreg extraction can be propagated quite easily. The issue I have had is handling subreg inserts and builds, which inherently violate SSA form (multiple defs of a reg).

In my proposal I was assuming that it was a hard requirement to preserve SSA form for RegAlloc, and so I ended up trying to add new vregs for each subvalue with appropriate implicit defs and uses and essentially cross-wiring the value and subvalue vregs. The problem with this is that it is really complex, and requires teaching all of the complex code in LV, LI, and RegAlloc about aliasing vregs. Yuck!

Eventually I realized that the implicit use/def combo needed for insert and build on subregs is very much like the register constraint for TwoAddress instructions, and went to see how that was implemented and discovered that because of TwoAddress the register allocator _could_ handle non-SSA form. So my current approach ditches all of the special data structures and modifications to LV, LI, and RegAlloc I posted previously.

My current tactic is to try to always allocate vregs as superregs if a subreg is involved, leading to stuff like this for inserts:

    %reg1024 = SLLI_4 %V4R0, 0
    %reg1024.4 = MVUI 0
    %V4R0 = SLLI_4 %reg1024, 0
    RETL %R63<imp-use>

Live variables doesn't seem to like the multiple defs of a register, so I have an MF pass that adds that info to LV (much like TwoAddress) after LV has run and before LI. Then I have another MF pass that needs to be registered preEmit that assumes all vregs have been allocated and rewrites the MachineOperands with subregs as the appropriate physical subregister. So far this has worked out quite well with only tiny tinkering to the LV, LI, RegAlloc trifecta.
I've yet to run this strategy through all its paces, and I'm still concerned about the spiller rewriting stuff correctly, but I'm essentially banking on the same infrastructure that TwoAddress depends on, so I'm hopeful.
How would build_from_subregs be used? Please give me an example.

I think you are over-thinking this. The scheme you described is unnecessarily complex. One thing to keep in mind is you can have a def to sub-reg followed by a use of its super-reg (and vice versa). That's perfectly legal, lv knows how to deal with that.

extract_subreg and insert_subreg whose source operands are physical registers should not exist in machineinstruction passes. ScheduleDAG should be responsible for propagating the physical registers (similar to how we deal with CopyToReg and CopyFromReg). So...

    %reg1024 = op
    insert_subreg %V2R0, %reg1024, 0
    %reg1026 = opv %V2R0, %reg1025
    copytoreg %V2R1, %reg1026
    %reg1027 = extract_subreg %V2R1, 1
    %reg1029 = op %reg1027, %reg1028
=>
    %r0 = op
    %v2R1 = opv %v2R0, %reg1025
    %reg1029 = op %r3, %reg1028

lv can handle this just fine.

insert_subreg and extract_subreg that operate on virtual registers can exist in machineinstr passes. For now, we should not worry about performance. So the source and destination registers will have different (and potentially overlapping) live intervals. e.g.

    %reg1025 = subreg_extract %reg1024, 0

Suppose %reg1024 is allocated %v2r0 and %reg1025 %r3. Then regallocator rewrites this into a move instruction

    %r3 = mov %r0

We can worry about coalescing as the next step.

>My current tactic is to try to always allocate vregs as superregs if a subreg
>is involved, leading to stuff like this for inserts:
>    %reg1024 = SLLI_4 %V4R0, 0
>    %reg1024.4 = MVUI 0
>    %V4R0 = SLLI_4 %reg1024, 0
>    RETL %R63<imp-use>

Is MVUI subreg_insert? Is it inserting a 0 into the 4th part of %V4R0? If so, then after scheduling this should just be:

    %r3 = mov 0
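The virtual-register case (a subreg_extract that survives until after allocation and is then rewritten into a move) might look roughly like this. It is a Python sketch under assumptions: the SUBREGS layout and the lower_extract helper are hypothetical, the index is 0-based to match the example in this comment, and the "same register, emit nothing" branch is what a later coalescing step would aim for rather than something described here.

```python
# Assumed layout for the example; index 0 of %v2r0 is %r0.
SUBREGS = {"%v2r0": ["%r0", "%r1"], "%v2r1": ["%r2", "%r3"]}

def lower_extract(dst_vreg, src_vreg, idx, assignment):
    """Lower 'dst = subreg_extract src, idx' after register allocation."""
    src_phys = SUBREGS[assignment[src_vreg]][idx]   # physical subreg being read
    dst_phys = assignment[dst_vreg]
    if dst_phys == src_phys:
        return None                                 # already in place, no move needed
    return f"{dst_phys} = mov {src_phys}"

# %reg1024 is allocated %v2r0 and %reg1025 is allocated %r3, as in the comment.
move = lower_extract("%reg1025", "%reg1024", 0,
                     {"%reg1024": "%v2r0", "%reg1025": "%r3"})
```

If the allocator had instead placed %reg1025 in %r0, the helper would return None and no instruction would be emitted.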
The first stage of the MachineInstr based approach discussed at the meeting at Apple is complete. There are now DAG nodes and target independent MachineInstr's to represent subreg insert and extract. The register .td file and the DAG scheduler have the needed hooks to allocate correct vregs for the subreg instructions, and there is a pass to lower any un-coalesced subreg instructions to register copies.

The current framework still requires custom isel code to use subregs (no tablegen support yet) and there are currently no tests, as no public targets currently use subregs.

See (in chronological order):
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070723/052244.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070723/052246.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070723/052247.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070723/052248.html
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070723/052249.html

Work remaining:
Teach the coalescer how to remove subreg instructions.
Add tablegen syntax support for selecting to subreg nodes in .td files.
This bug (at least the subject line) is already implemented. Can you please close this and file new bugs for the specific work remaining? Thanks, -Chris
*** Bug 769 has been marked as a duplicate of this bug. ***