Early tail dup can cause problems to register allocator #10468

llvmbot · 2011-06-07T07:17:36Z


Bugzilla Link	10096
Resolution	FIXED
Resolved on	Jun 30, 2011 20:12
Version	trunk
OS	Linux
Attachments	master2.bc
Reporter	LLVM Bugzilla Contributor
CC	@atrick,@stoklund

Extended Description

I am reporting a bug to keep track of the recent discussions on llvm-commits.

With the attached bitcode file:

$ time ./bin/llc -disable-early-taildup ~/llvm/master2.bc -o master.o -tail-dup-size=8 -filetype=obj

real 0m1.667s
user 0m1.640s
sys 0m0.024s

$ ls -l master.o
-rw-rw-r--. 1 espindola espindola 126256 Jun 7 01:19 master.o

$ time ./bin/llc ~/llvm/master2.bc -o master.o -filetype=obj

real 0m24.317s
user 0m24.204s
sys 0m0.063s
$ ls -l master.o
-rw-rw-r--. 1 espindola espindola 675912 Jun 7 01:20 master.o

llvmbot · 2011-06-07T16:04:14Z

I tried building all of firefox with

-Xclang -mllvm -Xclang -disable-early-taildup -Xclang -mllvm -Xclang -tail-dup-size=8

The results are amazing. With -O3 -g (as before) jsinterp.o builds in 0m32.132s
(from 63s) and is 1490840 bytes (from 1876288 or 168448 with the clang patch).

A dromaeo run gets 1689.14runs/s (Total), which is the fastest run on this laptop so far :-)

I am tempted to propose shifting a bit of responsibility from the early tail dup to the one that runs after ra. The early one would only duplicate small (2 instructions, no calls, no rets) blocks. The last one would be the one responsible for the more aggressive optimizations like duplicating blocks with indirect gotos.

stoklund · 2011-06-07T17:22:34Z

We are duplicating the indirectbr block to help the branch predictor. It may be a bigger deal on ARM cores, but Intel cores should benefit too.

In this case, something goes horribly wrong with coalescing and register pressure explodes.

If we can fix that, I would expect even better results with the duplicated indirectbr block.

llvmbot · 2011-06-07T19:23:51Z

Not that the indirectbr was duplicated, just that it was duplicated in the second pass.

llvmbot · 2011-06-08T05:01:22Z

patch I am trying

llvmbot · 2011-06-09T13:29:17Z

I found some problems in the tail dup pass by stress testing it with a max tail size of 8 and checking on. I am currently working on those.

llvmbot · 2011-06-09T17:55:46Z

Updated patch.
This one survived a bootstrap with a dup limit of 8. Will clean it up and commit the refactoring bits.

llvmbot · 2011-06-09T17:57:36Z

rebased patch

llvmbot · 2011-06-10T02:01:55Z

remaining parts
I have committed the refactoring and fixes bits. Back to benchmarking..

llvmbot · 2011-06-25T21:09:09Z

I implemented an IL level indirectbr duplicator to try to isolate the problem. The two attached IL files are very similar, test5.ll having one more instruction duplicated.

The log files are the output of llc testN.ll -relocation-model=pic -disable-early-taildup -print-machineinstrs.

Note how the outputs are mostly in sync until we reach the register allocator.

The end result is that test4.o has a 323 byte __TEXT and test5.o has a 565 __TEXT.

llvmbot · 2011-06-25T21:10:00Z

test4.ll

llvmbot · 2011-06-25T21:10:16Z

test5.ll

llvmbot · 2011-06-25T21:10:34Z

test4.log

llvmbot · 2011-06-25T21:11:20Z

test5.log

llvmbot · 2011-06-26T21:56:00Z

The problem is with coalescing failing to merge some of the copies created by phielim. I am debugging why.

llvmbot · 2011-06-26T21:56:46Z

test4.phielim
Output just after coalescing.

llvmbot · 2011-06-26T21:57:27Z

test5.phielim
Output just after coalescing.

llvmbot · 2011-06-27T06:37:41Z

The problem is that in coalescing when trying to decide if we can combine the live ranges of A and B, we only keep track of copies between A and B. Because of this we conclude that vreg40 and vreg42 cannot be merged when we see:

%vreg40<def> = COPY %vreg45
%vreg42<def> = COPY %vreg45

I have a fairly ad hoc patch that checks for both registers being defined with copies from a common third one. It fixes the reduced test case, but it looks like I am not updating live ranges correctly...(fails an assert in make check)

Perhaps we could solve this problem by reverting the algorithm order: Start by assuming that all candidates can be merged, and start splitting the groups until they can be split no further. This should handle cyclic cases like

%vreg40<def> = COPY %vregX
%vreg42<def> = COPY %vregY

and vregX and vregY can be coalesced but we fail to notice because of

%vregX<def> = COPY %vreg40
%vregY<def> = COPY %vreg42

llvmbot · 2011-06-27T06:38:22Z

test4.ll

llvmbot · 2011-06-27T06:38:48Z

test5.ll

llvmbot · 2011-06-27T06:39:06Z

test4.phielim

llvmbot · 2011-06-27T06:39:29Z

test5.phielim

stoklund · 2011-06-27T07:21:54Z

This is a tricky problem. The live ranges must still be in SSA form after coalescing. That is why you can't just merge ranges, even when you know they have the same value. Normally, we merge the overlapping value numbers into one, but that only works when one def dominates the other.

llvmbot · 2011-06-27T14:21:46Z

This is after phi elimination, so by SSA you mean that each definition must dominate all its uses, right?

I think this would still be fine, since we can rewrite

%vreg40<def> = COPY %vreg45
%vreg42<def> = COPY %vreg45

as

%vreg40<def> = COPY %vreg45
%vreg42<def> = COPY %vreg40

and vreg40 dominates its new use. Since now vreg42 is defined in terms of vreg40, the existing logic works and we merge the two.

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Early tail dup can cause problems to register allocator #10468

Early tail dup can cause problems to register allocator #10468

llvmbot commented Jun 7, 2011

llvmbot commented Jun 7, 2011

stoklund mannequin commented Jun 7, 2011

llvmbot commented Jun 7, 2011

llvmbot commented Jun 8, 2011

llvmbot commented Jun 9, 2011

llvmbot commented Jun 9, 2011

llvmbot commented Jun 9, 2011

llvmbot commented Jun 10, 2011

llvmbot commented Jun 25, 2011

llvmbot commented Jun 25, 2011

llvmbot commented Jun 25, 2011

llvmbot commented Jun 25, 2011

llvmbot commented Jun 25, 2011

llvmbot commented Jun 26, 2011

llvmbot commented Jun 26, 2011

llvmbot commented Jun 26, 2011

llvmbot commented Jun 27, 2011

llvmbot commented Jun 27, 2011

llvmbot commented Jun 27, 2011

llvmbot commented Jun 27, 2011

llvmbot commented Jun 27, 2011

stoklund mannequin commented Jun 27, 2011

llvmbot commented Jun 27, 2011

Early tail dup can cause problems to register allocator #10468

Early tail dup can cause problems to register allocator #10468

Comments

llvmbot commented Jun 7, 2011

Extended Description

llvmbot commented Jun 7, 2011

stoklund mannequin commented Jun 7, 2011

llvmbot commented Jun 7, 2011

llvmbot commented Jun 8, 2011

llvmbot commented Jun 9, 2011

llvmbot commented Jun 9, 2011

llvmbot commented Jun 9, 2011

llvmbot commented Jun 10, 2011

llvmbot commented Jun 25, 2011

llvmbot commented Jun 25, 2011

llvmbot commented Jun 25, 2011

llvmbot commented Jun 25, 2011

llvmbot commented Jun 25, 2011

llvmbot commented Jun 26, 2011

llvmbot commented Jun 26, 2011

llvmbot commented Jun 26, 2011

llvmbot commented Jun 27, 2011

llvmbot commented Jun 27, 2011

llvmbot commented Jun 27, 2011

llvmbot commented Jun 27, 2011

llvmbot commented Jun 27, 2011

stoklund mannequin commented Jun 27, 2011

llvmbot commented Jun 27, 2011