Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code generator compiles back-to-back fixed-size dynamic allocas into really bad code #1334

Open
lattner opened this issue Oct 22, 2006 · 4 comments
Labels
bugzilla Issues migrated from bugzilla code-quality llvm:codegen

Comments

@lattner
Copy link
Collaborator

lattner commented Oct 22, 2006

Bugzilla Link 962
Version 1.8
OS All

Extended Description

The tail recursion elimination pass can produce functions with allocas not in the entry block, if it wants
them to live across iterations of the tail-recursion-eliminated code. A simple example of the produced
code is:

void %foo(bool %c) {
br label %Next
Next:
%A = alloca int
%B = alloca int
%C = alloca int
%D = alloca int
store int 1, int* %A
store int 1, int* %B
store int 1, int* %C
store int 1, int* %D
call void %test(int* %A, int* %B, int* %C, int* %D)
br bool %c, label %Next, label %Out
Out:
ret void
}
declare void %test(int*,int*,int*,int*)

It does this on the assumption that a block of allocas will be compiled into marginally efficient code, on
the order of performance as a normal prolog. However, this isn't the case. The code above compiles
into:

LBB1_1: ;Next
mr r2, r1
addi r3, r2, -16
mr r1, r3
mr r7, r1
addi r4, r7, -16
mr r1, r4
mr r8, r1
addi r5, r8, -16
mr r1, r5
mr r9, r1
addi r6, r9, -16
mr r1, r6
li r10, 1
stw r10, -16(r2)
stw r10, -16(r7)
stw r10, -16(r8)
stw r10, -16(r9)
rlwinm r29, r30, 0, 31, 31
addi r1, r1, -64
bl L_test$stub
cmplwi cr0, r29, 0
addi r1, r1, 64
bne cr0, LBB1_1 ;Next

At the top, that is a ton of back-to-back copies into the stack pointer and out. On X86, doing this sort
of thing quickly causes spill code to be generated.

It would be far better to see these fixed size dynamic allocas and merge them together in the code
generator, doing a single stack adjustment and then treating the suballocas as offsets from the base
pointer.

-Chris

@llvmbot
Copy link
Collaborator

llvmbot commented Jul 18, 2017

I found a link to this PR in TRE and I found it amusing so I started looking.
This is an updated IR dump which works with 4.0 (and trunk)

define void @​patatino(i1 %c) {
entry:
br label %next

next:
%A = alloca i32
%B = alloca i32
%C = alloca i32
%D = alloca i32
store i32 1, i32* %A
store i32 1, i32* %B
store i32 1, i32* %C
store i32 1, i32* %D
call void @​test(i32* %A, i32* %B, i32* %C, i32* %D)
br i1 %c, label %next, label %out

out:
ret void
}

declare void @​test(i32*, i32*, i32*, i32*)

@lattner
Copy link
Collaborator Author

lattner commented Jul 19, 2017

What does that codegen into?

@llvmbot
Copy link
Collaborator

llvmbot commented Jul 19, 2017

patatino:
[...]
.p2align 4, 0x90
.LBB0_1: # %next
# =>This Inner Loop Header: Depth=1
movq %rsp, %r8
leaq -16(%r8), %rdi
movq %rdi, %rsp
movq %rsp, %r9
leaq -16(%r9), %rsi
movq %rsi, %rsp
movq %rsp, %rax
leaq -16(%rax), %rdx
movq %rdx, %rsp
movq %rsp, %rbx
leaq -16(%rbx), %rcx
movq %rcx, %rsp
movl $1, -16(%r8)
movl $1, -16(%r9)
movl $1, -16(%rax)
movl $1, -16(%rbx)
callq test
testb $1, %r14b
jne .LBB0_1

BB#2: # %out

    leaq    -16(%rbp), %rsp
    popq    %rbx
    popq    %r14
    popq    %rbp
    retq

.Lfunc_end0:
.size patatino, .Lfunc_end0-patatino

This is just running llc on that IR. I note that if I run -O3 on the input IR only loop unswitch triggers, i.e. the loop body is duplicated (but the codegen is the same for each BB as expected, except it's duplicated).

I guess some mov could still be eliminated but it's not clear to me whether we should prevent TRE in case of dynamic allocas? FWIW, I see TRE kicking in not very often, and many things changed since this PR was opened, including the new SROA.

I assume you don't have the original code that triggered this anymore :(

@lattner
Copy link
Collaborator Author

lattner commented Jul 19, 2017

I don't have the original code. Here's a trick to generate test cases: just change the code that detects this to dump out the function name then abort. With that compiler, just build the test suite. Any test case the compiler crashes on is one you want to look at :-)

The generated assembly shows the issue: those are multiple dynamic allocations back to back. This is bad for a number of reasons, not least of which is that frame pointer elimination is defeated, and that every time through the loop, more stack space is allocated.

Compared to 2006, LLVM now has the stack save/restore intrinsics which would help at least some of these issues.

-Chris

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021
kitano-metro pushed a commit to RIKEN-RCCS/llvm-project that referenced this issue Feb 14, 2023
kitano-metro pushed a commit to RIKEN-RCCS/llvm-project that referenced this issue Feb 14, 2023
kitano-metro pushed a commit to RIKEN-RCCS/llvm-project that referenced this issue Feb 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla code-quality llvm:codegen
Projects
None yet
Development

No branches or pull requests

2 participants