Early tail dup still not as good as duplicating indirectbr in clang #10623

Open
llvmbot opened this issue Jul 2, 2011 · 6 comments
Labels
bugzilla Issues migrated from bugzilla llvm:codegen

Comments

@llvmbot
Collaborator

llvmbot commented Jul 2, 2011

Bugzilla Link 10251
Version trunk
OS All
Attachments indirect.patch, jsinterp.ii.bz2
Reporter LLVM Bugzilla Contributor
CC @asl

Extended Description

Things got a lot better with the new improvements to the coalescer, but duplicating indirectbr in clang still produces better results.

In jsinterp.o, the current trunk produces:

171419 3904 0 0 175323 2acdb

And duplicating indirectbr in clang produces:

117275 3904 0 0 121179 1d95b

@llvmbot
Collaborator Author

llvmbot commented Jul 3, 2011

I reduced this a bit. Tail duplication is still causing problems for the register allocator, but this time it is not something the coalescer can help with.

The indirectgoto bb has code that looks like

....
JMP64m %vreg25, 8, %vreg26, 0, %noreg

and vreg25 is defined in a loop preheader:

%vreg25 = LEA64r %RIP, 1, %noreg, ga:@_ZZN2js9InterpretEP9JSContextPNS_10StackFrameENS_10InterpModeEE15normalJumpTable, %noreg

We duplicate indirectgoto into the preheader and then duplicate the preheader itself. This turns vreg25 into a phi where all operands are identical, but by that point it is too late for anything to clean it up.
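To make "a phi where all operands are identical" concrete, here is a minimal sketch in C. It is not LLVM code: the function name `common_phi_operand` and the modelling of virtual registers as plain integers are assumptions made only for illustration. A phi for which this check succeeds could be replaced by its single incoming value, which is the cleanup that is missed here.

```c
#include <stddef.h>

/* Hypothetical helper, not an LLVM API: virtual registers are modelled
 * as plain integers (e.g. 25 stands for %vreg25).
 * Returns the common incoming value if every phi operand is the same
 * register, or -1 if the operands differ or the list is empty. */
int common_phi_operand(const int *operands, size_t count) {
    if (count == 0)
        return -1;
    for (size_t i = 1; i < count; i++) {
        if (operands[i] != operands[0])
            return -1; /* genuinely merges different values */
    }
    return operands[0]; /* the phi is redundant */
}
```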

Some possible fixes/improvements:

*) Teach MachineCode taildup the same trick the IL one knows about: moving code to a common dominator instead of copying it.
*) Make early tail duplication a bit less aggressive so that we don't duplicate a loop preheader.
*) Move it earlier, so that other passes can clean up.

I will give the second option a try.

@llvmbot
Collaborator Author

llvmbot commented Jul 3, 2011

testcase

@llvmbot
Collaborator Author

llvmbot commented Jul 4, 2011

patch I will benchmark

@llvmbot
Collaborator Author

llvmbot commented Jul 4, 2011

The previous patch helped with Firefox's jsinterp.o, but it hurt WebKit's interpreter, which probably benefits from the extra tail duplication.

Trying the "move it earlier" option.

@llvmbot
Collaborator Author

llvmbot commented Jul 4, 2011

On size at least we are getting close. With indirectbr duplicated in clang __TEXT is 119536 and without it is 126656 (after 134372).

There is still some performance difference with gcc. I am running the benchmarks again, but duplicating indirectbr in clang used to produce the best results.

One difference I noticed is that trunk produces:

   leaq    (%r14,%rax,8), %r8

....
jmpq *(%r8)

Duplicating indirectbr in clang produces:

jmpq *(%rdi,%r9,8)

which suggests that tail duplication should be done at the IL level, or at least that CodeGenPrepare needs to be more aggressive. The code for indirectgoto looks like:

%indirect.goto.dest.in = phi i8** ...
%indirect.goto.dest = load i8** %indirect.goto.dest.in, align 8
indirectbr i8* %indirect.goto.dest, [...]

Of the 230 arguments of the indirect.goto.dest.in phi, only one is not a getelementptr.
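For readers less familiar with where this IR comes from, here is a minimal sketch of the source pattern, assuming a GNU C computed-goto interpreter (a toy three-opcode bytecode invented for illustration, not the jsinterp code). Each handler ends in its own `goto *table[...]`; the table lookup in each one is the kind of address computation that feeds the phi, and all of them funnel into the single indirectgoto block. "Duplicating indirectbr" means giving each handler its own copy of that dispatch instead.

```c
#include <stddef.h>

/* Hypothetical three-opcode bytecode, used only for illustration. */
enum { OP_INC, OP_DEC, OP_HALT };

/* Interprets `code` until OP_HALT and returns the accumulator.
 * Requires the GNU C "labels as values" extension (gcc or clang). */
int run(const unsigned char *code) {
    /* Table of label addresses, analogous to the jump table in the issue. */
    static void *const table[] = { &&do_inc, &&do_dec, &&do_halt };
    int acc = 0;
    size_t pc = 0;

    /* Initial dispatch. */
    goto *table[code[pc++]];

do_inc:
    acc++;
    goto *table[code[pc++]]; /* per-handler dispatch in the source */
do_dec:
    acc--;
    goto *table[code[pc++]]; /* another contributor to the shared phi */
do_halt:
    return acc;
}
```

In the source every handler has its own dispatch, so the per-handler form `jmpq *(%rdi,%r9,8)` is what one would hope to get back after duplication.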

@llvmbot
Collaborator Author

llvmbot commented Jul 4, 2011

master.bc

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021