New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Early tail dup can cause problems to register allocator #10468
Comments
I tried building all of firefox with -Xclang -mllvm -Xclang -disable-early-taildup -Xclang -mllvm -Xclang -tail-dup-size=8 The results are amazing. With -O3 -g (as before) jsinterp.o builds in 0m32.132s A dromaeo run gets 1689.14runs/s (Total), which is the fastest run on this laptop so far :-) I am tempted to propose shifting a bit of responsibility from the early tail dup to the one that runs after ra. The early one would only duplicate small (2 instructions, no calls, no rets) blocks. The last one would be the one responsible for the more aggressive optimizations like duplicating blocks with indirect gotos. |
We are duplicating the indirectbr block to help the branch predictor. It may be a bigger deal on ARM cores, but Intel cores should benefit too. In this case, something goes horribly wrong with coalescing and register pressure explodes. If we can fix that, I would expect even better results with the duplicated indirectbr block. |
Not that the indirectbr was duplicated, just that it was duplicated in the second pass. |
I found some problems in the tail dup pass by stress testing it with a max tail size of 8 and checking on. I am currently working on those. |
Updated patch. |
remaining parts |
I implemented an IL level indirectbr duplicator to try to isolate the problem. The two attached IL files are very similar, test5.ll having one more instruction duplicated. The log files are the output of llc testN.ll -relocation-model=pic -disable-early-taildup -print-machineinstrs. Note how the outputs are mostly in sync until we reach the register allocator. The end result is that test4.o has a 323 byte __TEXT and test5.o has a 565 __TEXT. |
The problem is with coalescing failing to merge some of the copies created by phielim. I am debugging why. |
test4.phielim |
test5.phielim |
The problem is that in coalescing when trying to decide if we can combine the live ranges of A and B, we only keep track of copies between A and B. Because of this we conclude that vreg40 and vreg42 cannot be merged when we see:
I have a fairly ad hoc patch that checks for both registers being defined with copies from a common third one. It fixes the reduced test case, but it looks like I am not updating live ranges correctly...(fails an assert in make check) Perhaps we could solve this problem by reverting the algorithm order: Start by assuming that all candidates can be merged, and start splitting the groups until they can be split no further. This should handle cyclic cases like
and vregX and vregY can be coalesced but we fail to notice because of
|
This is a tricky problem. The live ranges must still be in SSA form after coalescing. That is why you can't just merge ranges, even when you know they have the same value. Normally, we merge the overlapping value numbers into one, but that only works when one def dominates the other. |
This is after phi elimination, so by SSA you mean that each definition must dominate all its uses, right? I think this would still be fine, since we can rewrite
as
and vreg40 dominates its new use. Since now vreg42 is defined in terms of vreg40, the existing logic works and we merge the two. |
Extended Description
I am reporting a bug to keep track of the recent discussions on llvm-commits.
With the attached bitcode file:
$ time ./bin/llc -disable-early-taildup ~/llvm/master2.bc -o master.o -tail-dup-size=8 -filetype=obj
real 0m1.667s
user 0m1.640s
sys 0m0.024s
$ ls -l master.o
-rw-rw-r--. 1 espindola espindola 126256 Jun 7 01:19 master.o
$ time ./bin/llc ~/llvm/master2.bc -o master.o -filetype=obj
real 0m24.317s
user 0m24.204s
sys 0m0.063s
$ ls -l master.o
-rw-rw-r--. 1 espindola espindola 675912 Jun 7 01:20 master.o
The text was updated successfully, but these errors were encountered: