Early tail dup still not as good as duplicating indirectbr in clang #10623
Comments
I reduced this a bit. The taildup is still causing problems for the register allocator, but this time it is not something the coalescer can help with. The indirectgoto bb has code that looks like .... and vreg25 is defined in a loop preheader:

%vreg25 = LEA64r %RIP, 1, %noreg, ga:@_ZZN2js9InterpretEP9JSContextPNS_10StackFrameENS_10InterpModeEE15normalJumpTable, %noreg

We duplicate indirectgoto into the preheader and then duplicate the preheader itself. This turns vreg25 into a phi whose operands are all identical, but by then it is too late to fix it.

Some possible fixes/improvements:
*) Teach MachineCode taildup the same trick the IL one knows about: moving code to a common dominator instead of copying it.

I will give the second option a try.
The previous patch helped with Firefox's jsinterp.o, but it hurt WebKit's interpreter, which probably benefits from the extra tail duplication. Trying the "move it earlier" option.
On size at least we are getting close. With indirectbr duplicated in clang __TEXT is 119536 and without it is 126656 (after 134372). There is still some performance difference with gcc. I am running the benchmarks again, but duplicating indirectbr in clang used to produce the best results. One difference I noticed is that trunk produces:
.... whereas duplicating indirectbr in clang produces jmpq *(%rdi,%r9,8), which suggests that the tail duplication should be done at the IL level, or at least that codegenprepare needs to be more aggressive. The code for indirectgoto looks like %indirect.goto.dest.in = phi i8** ... Of all the 230 arguments of the indirect.goto.dest.in phi, only one is not a getelementptr.
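For readers unfamiliar with the two control-flow shapes being compared, here is a minimal sketch in C using the GNU computed-goto extension (accepted by both clang and gcc). The three-opcode bytecode and the names `run_shared` and `run_duplicated` are hypothetical and are not taken from jsinterp. The first function funnels every handler through one dispatch block, which is the shape the 230-argument phi above suggests; the second repeats the dispatch at the tail of each handler, which is the shape tail duplication tries to recover. The optimizer is free to turn either source shape into the other, so this only illustrates the CFGs, not what any particular compiler version emits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-opcode bytecode, for illustration only. */
enum { OP_INC, OP_DEC, OP_HALT };

/* Shape 1: every handler jumps back to one shared dispatch block,
   so the source contains a single indirect branch. */
int run_shared(const unsigned char *pc) {
    static void *const table[] = { &&do_inc, &&do_dec, &&do_halt };
    int acc = 0;
    assert(pc != NULL);
dispatch:
    goto *table[*pc++];
do_inc:
    acc++;
    goto dispatch;
do_dec:
    acc--;
    goto dispatch;
do_halt:
    return acc;
}

/* Shape 2: the dispatch is repeated at the tail of each handler,
   so each handler ends in its own indirect branch. */
int run_duplicated(const unsigned char *pc) {
    static void *const table[] = { &&do_inc, &&do_dec, &&do_halt };
    int acc = 0;
    assert(pc != NULL);
    goto *table[*pc++];
do_inc:
    acc++;
    goto *table[*pc++];
do_dec:
    acc--;
    goto *table[*pc++];
do_halt:
    return acc;
}
```

Both functions compute the same result for any program that ends in `OP_HALT`; the difference is only in how many indirect branches the source spells out, which is what affects code size and branch prediction in a large interpreter loop.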
Extended Description
Things got a lot better with the new improvements to the coalescer, but duplicating indirectbr in clang still produces better results.
In jsinterp.o, the current trunk produces:
171419 3904 0 0 175323 2acdb
And duplicating indirectbr in clang produces:
117275 3904 0 0 121179 1d95b