-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Code generator compiles back-to-back fixed-size dynamic allocas into really bad code #1334
Comments
I found a link to this PR in TRE and I found it amusing so I started looking. define void @patatino(i1 %c) { next: out: declare void @test(i32*, i32*, i32*, i32*) |
What does that codegen into? |
patatino: BB#2: # %out
.Lfunc_end0: This is just running I guess some I assume you don't have the original code that triggered this anymore :( |
I don't have the original code. Here's a trick to generate test cases: just change the code that detects this to dump out the function name then abort. With that compiler, just build the test suite. Any test case the compiler crashes on is one you want to look at :-) The generated assembly shows the issue: those are multiple dynamic allocations back to back. This is bad for a number of reasons, not least of which is that frame pointer elimination is defeated, and that every time through the loop, more stack space is allocated. Compared to 2006, LLVM now has the stack save/restore intrinsics which would help at least some of these issues. -Chris |
Extended Description
The tail recursion elimination pass can produce functions with allocas not in the entry block, if it wants
them to live across iterations of the tail-recursion-eliminated code. A simple example of the produced
code is:
void %foo(bool %c) {
br label %Next
Next:
%A = alloca int
%B = alloca int
%C = alloca int
%D = alloca int
store int 1, int* %A
store int 1, int* %B
store int 1, int* %C
store int 1, int* %D
call void %test(int* %A, int* %B, int* %C, int* %D)
br bool %c, label %Next, label %Out
Out:
ret void
}
declare void %test(int*,int*,int*,int*)
It does this on the assumption that a block of allocas will be compiled into marginally efficient code, on
the order of performance as a normal prolog. However, this isn't the case. The code above compiles
into:
LBB1_1: ;Next
mr r2, r1
addi r3, r2, -16
mr r1, r3
mr r7, r1
addi r4, r7, -16
mr r1, r4
mr r8, r1
addi r5, r8, -16
mr r1, r5
mr r9, r1
addi r6, r9, -16
mr r1, r6
li r10, 1
stw r10, -16(r2)
stw r10, -16(r7)
stw r10, -16(r8)
stw r10, -16(r9)
rlwinm r29, r30, 0, 31, 31
addi r1, r1, -64
bl L_test$stub
cmplwi cr0, r29, 0
addi r1, r1, 64
bne cr0, LBB1_1 ;Next
At the top, that is a ton of back-to-back copies into the stack pointer and out. On X86, doing this sort
of thing quickly causes spill code to be generated.
It would be far better to see these fixed size dynamic allocas and merge them together in the code
generator, doing a single stack adjustment and then treating the suballocas as offsets from the base
pointer.
-Chris
The text was updated successfully, but these errors were encountered: