Code generator compiles back-to-back fixed-size dynamic allocas into really bad code #1334

lattner · 2006-10-22T20:20:59Z


Bugzilla Link	962
Version	1.8
OS	All

Extended Description

The tail recursion elimination pass can produce functions with allocas not in the entry block, if it wants
them to live across iterations of the tail-recursion-eliminated code. A simple example of the produced
code is:

void %foo(bool %c) {
br label %Next
Next:
%A = alloca int
%B = alloca int
%C = alloca int
%D = alloca int
store int 1, int* %A
store int 1, int* %B
store int 1, int* %C
store int 1, int* %D
call void %test(int* %A, int* %B, int* %C, int* %D)
br bool %c, label %Next, label %Out
Out:
ret void
}
declare void %test(int*,int*,int*,int*)

It does this on the assumption that a block of allocas will be compiled into marginally efficient code, on
the order of performance as a normal prolog. However, this isn't the case. The code above compiles
into:

LBB1_1: ;Next
mr r2, r1
addi r3, r2, -16
mr r1, r3
mr r7, r1
addi r4, r7, -16
mr r1, r4
mr r8, r1
addi r5, r8, -16
mr r1, r5
mr r9, r1
addi r6, r9, -16
mr r1, r6
li r10, 1
stw r10, -16(r2)
stw r10, -16(r7)
stw r10, -16(r8)
stw r10, -16(r9)
rlwinm r29, r30, 0, 31, 31
addi r1, r1, -64
bl L_test$stub
cmplwi cr0, r29, 0
addi r1, r1, 64
bne cr0, LBB1_1 ;Next

At the top, that is a ton of back-to-back copies into the stack pointer and out. On X86, doing this sort
of thing quickly causes spill code to be generated.

It would be far better to see these fixed size dynamic allocas and merge them together in the code
generator, doing a single stack adjustment and then treating the suballocas as offsets from the base
pointer.

-Chris

llvmbot · 2017-07-18T15:53:13Z

I found a link to this PR in TRE and I found it amusing so I started looking.
This is an updated IR dump which works with 4.0 (and trunk)

define void @patatino(i1 %c) {
entry:
br label %next

next:
%A = alloca i32
%B = alloca i32
%C = alloca i32
%D = alloca i32
store i32 1, i32* %A
store i32 1, i32* %B
store i32 1, i32* %C
store i32 1, i32* %D
call void @test(i32* %A, i32* %B, i32* %C, i32* %D)
br i1 %c, label %next, label %out

out:
ret void
}

declare void @test(i32*, i32*, i32*, i32*)

lattner · 2017-07-19T01:42:28Z

What does that codegen into?

llvmbot · 2017-07-19T01:59:03Z

patatino:
[...]
.p2align 4, 0x90
.LBB0_1: # %next
# =>This Inner Loop Header: Depth=1
movq %rsp, %r8
leaq -16(%r8), %rdi
movq %rdi, %rsp
movq %rsp, %r9
leaq -16(%r9), %rsi
movq %rsi, %rsp
movq %rsp, %rax
leaq -16(%rax), %rdx
movq %rdx, %rsp
movq %rsp, %rbx
leaq -16(%rbx), %rcx
movq %rcx, %rsp
movl $1, -16(%r8)
movl $1, -16(%r9)
movl $1, -16(%rax)
movl $1, -16(%rbx)
callq test
testb $1, %r14b
jne .LBB0_1

BB#2: # %out

    leaq    -16(%rbp), %rsp
    popq    %rbx
    popq    %r14
    popq    %rbp
    retq

.Lfunc_end0:
.size patatino, .Lfunc_end0-patatino

This is just running llc on that IR. I note that if I run -O3 on the input IR only loop unswitch triggers, i.e. the loop body is duplicated (but the codegen is the same for each BB as expected, except it's duplicated).

I guess some mov could still be eliminated but it's not clear to me whether we should prevent TRE in case of dynamic allocas? FWIW, I see TRE kicking in not very often, and many things changed since this PR was opened, including the new SROA.

I assume you don't have the original code that triggered this anymore :(

lattner · 2017-07-19T05:12:33Z

I don't have the original code. Here's a trick to generate test cases: just change the code that detects this to dump out the function name then abort. With that compiler, just build the test suite. Any test case the compiler crashes on is one you want to look at :-)

The generated assembly shows the issue: those are multiple dynamic allocations back to back. This is bad for a number of reasons, not least of which is that frame pointer elimination is defeated, and that every time through the loop, more stack space is allocated.

Compared to 2006, LLVM now has the stack save/restore intrinsics which would help at least some of these issues.

-Chris

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021

kitano-metro pushed a commit to RIKEN-RCCS/llvm-project that referenced this issue Feb 14, 2023

refs llvm#1334 SwplTransformLindaのlit対応

bf89aa7

kitano-metro pushed a commit to RIKEN-RCCS/llvm-project that referenced this issue Feb 14, 2023

refs llvm#1334 SwplTransformLindaのlit対応

3ee88e7

kitano-metro pushed a commit to RIKEN-RCCS/llvm-project that referenced this issue Feb 14, 2023

refs llvm#1334 SwplTransformLindaのlit対応

6a87fe3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Code generator compiles back-to-back fixed-size dynamic allocas into really bad code #1334

Code generator compiles back-to-back fixed-size dynamic allocas into really bad code #1334

lattner commented Oct 22, 2006

llvmbot commented Jul 18, 2017

lattner commented Jul 19, 2017

llvmbot commented Jul 19, 2017

lattner commented Jul 19, 2017

Code generator compiles back-to-back fixed-size dynamic allocas into really bad code #1334

Code generator compiles back-to-back fixed-size dynamic allocas into really bad code #1334

Comments

lattner commented Oct 22, 2006

Extended Description

llvmbot commented Jul 18, 2017

lattner commented Jul 19, 2017

llvmbot commented Jul 19, 2017

BB#2: # %out

lattner commented Jul 19, 2017