1226 – scalarrepl should be able to scalarrepl aggregates with memcpy uses

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Summary: scalarrepl should be able to scalarrepl aggregates with memcpy uses

Status:	RESOLVED FIXED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Scalar Optimizations
Version:	1.0
Hardware:	All, OS: All

Importance:	P enhancement
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:	code-quality

Depends on:
Blocks:	452

Reported:	2007-02-25 14:35 PST by Chris Lattner
Modified:	2010-02-22 12:56 PST (History)
CC List:	1 user

Description Chris Lattner 2007-02-25 14:35:25 PST

Consider:

#include <tr1/functional>
#include <algorithm>
void assign( long* variable, long v) {
        std::transform( variable, variable + 1, variable,
                std::tr1::bind( std::plus< long >(), 0L, v ) );
}

This compiles to a single store on x86-32, but to a long sequence of stack stores and reloads on x86-64. This is because the
temporary structs are larger on x86-64 (long is 64 bits there), so EmitAggregateCopy in llvm-gcc copies them with a memcpy
instead of with scalar transfers.
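The two copy strategies can be sketched in C. This is an illustration under stated assumptions, not llvm-gcc's EmitAggregateCopy: `struct pair`, `copy_with_memcpy` and `copy_fieldwise` are hypothetical names, with `struct pair` standing in for a temporary whose fields are longs. Both functions leave identical bytes in `dst`; the difference is what later passes see, an opaque memcpy versus one scalar load/store per field.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical aggregate standing in for one of the bind temporaries. */
struct pair { long first; long second; };

/* Strategy 1: copy the aggregate as an opaque block of bytes.
   scalarrepl must look through the memcpy before it can split the struct. */
void copy_with_memcpy(struct pair *dst, const struct pair *src) {
  /* memcpy must not see overlapping buffers; two distinct
     struct pair objects never overlap. */
  assert(dst != src);
  memcpy(dst, src, sizeof(struct pair));
}

/* Strategy 2: copy each field as a separate scalar load and store.
   Every access names one field, so promotion to registers is direct. */
void copy_fieldwise(struct pair *dst, const struct pair *src) {
  dst->first = src->first;
  dst->second = src->second;
}
```

Either function can be called as `copy_fieldwise(&a, &s)`; only the second form was friendly to scalarrepl before this bug was fixed.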

The problem is that the memcpy later prevents scalarrepl from promoting the structs to registers, which causes much worse
codegen:

__Z6assignRll:    # x86-32
        movl 8(%esp), %eax
        movl 4(%esp), %ecx
        movl %eax, (%ecx)
        ret

__Z6assignRll:   # x86-64
        subq $88, %rsp
        movb $0, 64(%rsp)
        movq $0, 72(%rsp)
        movq %rsi, 80(%rsp)
        movq %rsi, 48(%rsp)
        movq 72(%rsp), %rax
        movq %rax, 40(%rsp)
        movq 64(%rsp), %rax
        movq %rax, 32(%rsp)
        movq 40(%rsp), %rax
        movq %rax, 8(%rsp)
        movq 48(%rsp), %rax
        movq %rax, 16(%rsp)
        movq 32(%rsp), %rax
        movq %rax, (%rsp)
        movq 16(%rsp), %rax
        addq 8(%rsp), %rax
        movq %rax, (%rdi)
        addq $88, %rsp
        ret

-Chris

Comment 1 Chris Lattner 2007-02-25 14:36:28 PST

Repro with:
$ llvm-g++ t.cc -O3 -S -o - -fno-exceptions -fomit-frame-pointer -m64

Comment 2 Chris Lattner 2007-03-04 18:25:26 PST

Here's a reduced testcase:

#include <string.h>
struct foo { int A, B; };
int test(struct foo *P) {
  struct foo L;
  memcpy(&L, P, sizeof(struct foo));
  return L.A;
}

        %struct.foo = type { i32, i32 }

implementation   ; Functions:

define i32 @test(%struct.foo* %P) {
entry:
        %L = alloca %struct.foo, align 8                ; <%struct.foo*> [#uses=2]
        %L2 = bitcast %struct.foo* %L to i8*            ; <i8*> [#uses=1]
        %tmp13 = bitcast %struct.foo* %P to i8*         ; <i8*> [#uses=1]
        call void @llvm.memcpy.i32( i8* %L2, i8* %tmp13, i32 8, i32 4 )
        %tmp4 = getelementptr %struct.foo* %L, i32 0, i32 0             ; <i32*> [#uses=1]
        %tmp5 = load i32* %tmp4         ; <i32> [#uses=1]
        ret i32 %tmp5
}
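As a rough picture of what scalarrepl should achieve here, the C sketch below hand-applies the transformation to the reduced testcase: the alloca'd `L` becomes one scalar per field, the 8-byte memcpy becomes one copy per field, and the copy into the unused field `B` is dead. `test_scalarized` is a hypothetical name used only for illustration; the real pass rewrites the IR, not the C source.

```c
#include <assert.h>
#include <string.h>

struct foo { int A, B; };

/* The memcpy of sizeof(struct foo) bytes covers exactly A and B. */
static_assert(sizeof(struct foo) == 2 * sizeof(int), "struct foo has no padding");

/* Original form: copy the whole struct through memcpy, then read one field. */
int test(struct foo *P) {
  struct foo L;
  memcpy(&L, P, sizeof(struct foo));
  return L.A;
}

/* Hypothetical hand-scalarized form: L is replaced by one scalar per field
   and the memcpy by one copy per field. */
int test_scalarized(struct foo *P) {
  int L_A = P->A; /* field 0 of the copy */
  int L_B = P->B; /* field 1: never read, so a later pass can delete it */
  (void)L_B;
  return L_A;
}
```

After the dead copy is removed, what is left is a single load of `P->A`.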

Comment 3 Chris Lattner 2007-03-05 01:55:29 PST

This patch contains the (disabled) code to do the SROA.  Before this can be enabled, mem2reg needs to be
able to promote scalars targeted by memcpy/memset etc.

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070305/045540.html

-Chris

Comment 4 Chris Lattner 2007-03-08 00:56:15 PST

Second half committed here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070305/045718.html

Third 'half' will come later.

Comment 5 Chris Lattner 2007-03-08 01:09:45 PST

For my notes:

The main part of this is implemented, but the testcase in this bug is not yet handled.  The reason
is that the lowered structure starts with an array of bytes, so the memcpy operand is a "gep x, 0,
0" instead of a bitcast of x.  Two issues:

1. Make xform work with GEP.
2. Figure out why we're getting the array of bytes in this trivial case.
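Issue 1 rests on the fact that both pointer forms name the same address: indexing to element 0 of a leading byte array yields the struct's own address, just as a bitcast would. A C analogy, assuming a hypothetical `struct byte_struct` that stands in for a lowered struct whose first member is a byte array; `via_gep` and `via_bitcast` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lowered type whose first member is an array of bytes. */
struct byte_struct { unsigned char bytes[8]; };

/* C guarantees the first member sits at offset 0. */
static_assert(offsetof(struct byte_struct, bytes) == 0, "first member at offset 0");

/* Analogue of "getelementptr x, 0, 0": step into the struct, then the array. */
unsigned char *via_gep(struct byte_struct *x) { return &x->bytes[0]; }

/* Analogue of "bitcast x to i8*": reinterpret the struct pointer directly. */
unsigned char *via_bitcast(struct byte_struct *x) { return (unsigned char *)x; }
```

Because the two pointers compare equal, the transform can accept a GEP of all-zero indices wherever it already accepts a bitcast.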

More testcases:

#include <string.h>

struct foo { int A, B; };

struct bar{ struct foo x; long long y; double D; };

int test1(struct foo *P) {
  struct foo L;
  memcpy(&L, P, sizeof(struct foo));
  return L.A;
}

int test2() {
  struct foo L[4];
  memset(L, 0, sizeof(struct foo)*4);
  return L[0].A;
}

int test3() {
  struct bar B;
  memset(&B, 1, sizeof(B));

  B.x.A = 1;
  B.D = 10.0;
  return B.x.B;
}
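For the memset cases, scalarrepl has to give each field the value the memset would have left in its bytes, i.e. the fill byte repeated across the field's width. A sketch of that computation for integer fields, assuming a hypothetical helper `splat_byte`, checked against test3 above, where memset with 1 leaves `B.x.B` as 0x01010101 (test2, with fill byte 0, gives 0 for every field).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the value an unsigned integer field of `width` bytes
   holds after a memset with `byte` covers it. Endianness does not matter
   because every byte is identical. */
uint64_t splat_byte(unsigned char byte, unsigned width) {
  assert(width >= 1 && width <= 8);
  uint64_t value = 0;
  for (unsigned i = 0; i < width; ++i)
    value = (value << 8) | byte;
  return value;
}

struct foo { int A, B; };
struct bar { struct foo x; long long y; double D; };

/* test3 from the testcase above. */
int test3(void) {
  struct bar B;
  memset(&B, 1, sizeof(B));
  B.x.A = 1;
  B.D = 10.0;
  return B.x.B;
}
```

A scalarized test3 would therefore reduce to returning the constant `splat_byte(1, sizeof(int))`.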

-Chris

Comment 6 Chris Lattner 2007-03-18 19:19:56 PDT

Here is the final piece:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070312/045911.html

Testcases here:
Transforms/ScalarRepl/memset-aggregate.ll
Transforms/ScalarRepl/memset-aggregate-byte-leader.ll

-Chris