LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Bug 1226 - scalarrepl should be able to scalarrepl aggregates with memcpy uses
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Scalar Optimizations
Version: 1.0
Hardware: All All
Importance: P enhancement
Assignee: Unassigned LLVM Bugs
URL:
Keywords: code-quality
Depends on:
Blocks: 452
Reported: 2007-02-25 14:35 PST by Chris Lattner
Modified: 2010-02-22 12:56 PST
Description Chris Lattner 2007-02-25 14:35:25 PST
Consider:

#include <tr1/functional>
#include <algorithm>
void assign( long* variable, long v) {
        std::transform( variable, variable + 1, variable,
                std::tr1::bind( std::plus< long >(), 0L, v ) );
}

This compiles to a single store on x86, but a whole ton of code on x86-64.  This is because the 
temporary structs are larger on x86-64, so EmitAggregateCopy in llvm-gcc emits them as a memcpy 
instead of scalar transfers.

The problem is that this later blocks scalarrepl from promoting the structs, causing much worse 
codegen:

__Z6assignRll:    # x86-32
        movl 8(%esp), %eax
        movl 4(%esp), %ecx
        movl %eax, (%ecx)
        ret

__Z6assignRll:   # x86-64
        subq $88, %rsp
        movb $0, 64(%rsp)
        movq $0, 72(%rsp)
        movq %rsi, 80(%rsp)
        movq %rsi, 48(%rsp)
        movq 72(%rsp), %rax
        movq %rax, 40(%rsp)
        movq 64(%rsp), %rax
        movq %rax, 32(%rsp)
        movq 40(%rsp), %rax
        movq %rax, 8(%rsp)
        movq 48(%rsp), %rax
        movq %rax, 16(%rsp)
        movq 32(%rsp), %rax
        movq %rax, (%rsp)
        movq 16(%rsp), %rax
        addq 8(%rsp), %rax
        movq %rax, (%rdi)
        addq $88, %rsp
        ret

-Chris
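
To make the distinction above concrete, here is a minimal C sketch of the two ways an aggregate copy can be emitted. The struct `temp` is a hypothetical stand-in for one of the bind temporaries (its real libstdc++ layout is not shown in this report), and all function names are illustrative only.

```c
#include <string.h>

/* Hypothetical stand-in for a bind temporary; not the real layout. */
struct temp { char tag; long a; long b; };

/* Aggregate copy emitted as scalar transfers: one load/store per field.
   Every access names a single field, so each field can live in a register. */
static void copy_fields(struct temp *dst, const struct temp *src) {
    dst->tag = src->tag;
    dst->a   = src->a;
    dst->b   = src->b;
}

/* Aggregate copy emitted as one block transfer. The struct is only seen
   through a byte pointer here, which is the pattern that kept the
   temporaries in memory in the x86-64 output. */
static void copy_block(struct temp *dst, const struct temp *src) {
    memcpy(dst, src, sizeof *dst);
}

long sum_via_fields(long v) {
    struct temp t = { 0, 0L, v };
    struct temp u;
    copy_fields(&u, &t);
    return u.a + u.b;
}

long sum_via_block(long v) {
    struct temp t = { 0, 0L, v };
    struct temp u;
    copy_block(&u, &t);
    return u.a + u.b;
}
```

Both functions compute the same value; the only difference is whether the copy is visible to the optimizer as per-field accesses or as a single memcpy call.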
Comment 1 Chris Lattner 2007-02-25 14:36:28 PST
Repro with:
$ llvm-g++ t.cc -O3 -S -o - -fno-exceptions -fomit-frame-pointer -m64
Comment 2 Chris Lattner 2007-03-04 18:25:26 PST
Here's a reduced testcase:

#include <string.h>
struct foo { int A, B; };
int test(struct foo *P) {
  struct foo L;
  memcpy(&L, P, sizeof(struct foo));
  return L.A;
}

        %struct.foo = type { i32, i32 }

implementation   ; Functions:

define i32 @test(%struct.foo* %P) {
entry:
        %L = alloca %struct.foo, align 8                ; <%struct.foo*> [#uses=2]
        %L2 = bitcast %struct.foo* %L to i8*            ; <i8*> [#uses=1]
        %tmp13 = bitcast %struct.foo* %P to i8*         ; <i8*> [#uses=1]
        call void @llvm.memcpy.i32( i8* %L2, i8* %tmp13, i32 8, i32 4 )
        %tmp4 = getelementptr %struct.foo* %L, i32 0, i32 0             ; <i32*> [#uses=1]
        %tmp5 = load i32* %tmp4         ; <i32> [#uses=1]
        ret i32 %tmp5
}
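
As a rough sketch of the desired outcome, written in C rather than IR: if the memcpy is split into one transfer per struct element, `L` no longer needs to exist in memory and the function reduces to a single load. The name `test_scalarized` and the local names are hypothetical; this illustrates the intended effect and is not the output of any particular LLVM version.

```c
#include <string.h>

struct foo { int A, B; };

/* The reduced testcase, unchanged. */
int test(struct foo *P) {
    struct foo L;
    memcpy(&L, P, sizeof(struct foo));
    return L.A;
}

/* Hand-written equivalent after splitting the memcpy per element.
   L_A and L_B are plain scalars, so they can be promoted to registers. */
int test_scalarized(struct foo *P) {
    int L_A = P->A;   /* element 0 of the copy */
    int L_B = P->B;   /* element 1 of the copy; dead, so removable */
    (void)L_B;
    return L_A;
}
```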
Comment 3 Chris Lattner 2007-03-05 01:55:29 PST
This patch contains the (disabled) code to do the SROA.  Before this can be enabled, mem2reg needs to be 
able to promote scalars targeted by memcpy/memset etc.

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070305/045540.html

-Chris
Comment 4 Chris Lattner 2007-03-08 00:56:15 PST
Second half committed here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070305/045718.html

Third 'half' will come later.
Comment 5 Chris Lattner 2007-03-08 01:09:45 PST
For my notes:

The main part of this is implemented, but the testcase in this bug is not yet handled.  The reason 
for this is that the lowered structure starts with an array of bytes, so we get memcpy("gep x, 0, 
0") instead of memcpy(bitcast).  Two issues:

1. Make xform work with GEP.
2. Figure out why we're getting the array of bytes in this trivial case.
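
A small C sketch of why the two pointer forms matter: when a struct begins with a byte array, the address of the array's first element and the address of the struct itself cast to a byte pointer are the same address, but they are written differently (an element address versus a cast). The struct `lead` and the function names are hypothetical and only illustrate the shape of the problem.

```c
#include <stddef.h>

/* Hypothetical struct whose first member is an array of bytes. */
struct lead { char bytes[4]; int value; };

/* Returns 1 if both spellings of "pointer to the start of s" agree. */
int same_address(void) {
    struct lead s;
    char *via_element = &s.bytes[0];  /* analogous to "gep x, 0, 0" */
    char *via_cast    = (char *)&s;   /* analogous to a bitcast     */
    return via_element == via_cast;
}

/* The leading member sits at offset 0, which is why the two agree. */
size_t leading_offset(void) {
    return offsetof(struct lead, bytes);
}
```

A transform that only recognizes the cast form would miss the element-address form even though both refer to the whole aggregate.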

Another testcase:

#include <string.h>

struct foo { int A, B; };

struct bar{ struct foo x; long long y; double D; };

int test1(struct foo *P) {
  struct foo L;
  memcpy(&L, P, sizeof(struct foo));
  return L.A;
}

int test2() {
  struct foo L[4];
  memset(L, 0, sizeof(struct foo)*4);
  return L[0].A;
}

int test3() {
  struct bar B;
  memset(&B, 1, sizeof(B));

  B.x.A = 1;
  B.D = 10.0;
  return B.x.B;
}

-Chris
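
For reference, a sketch of what scalarized forms of `test2` and `test3` could look like, assuming a 4-byte `int`. A memset of a scalar element becomes a store of the fill byte repeated across the element's width, so a fill byte of 1 yields 0x01010101 for an `int`. The `_scalarized` functions and their local names are hypothetical hand-written equivalents, not compiler output.

```c
#include <string.h>

struct foo { int A, B; };
struct bar { struct foo x; long long y; double D; };

/* test2 and test3 from the comment above, unchanged apart from (void). */
int test2(void) {
    struct foo L[4];
    memset(L, 0, sizeof(struct foo) * 4);
    return L[0].A;
}

int test3(void) {
    struct bar B;
    memset(&B, 1, sizeof(B));
    B.x.A = 1;
    B.D = 10.0;
    return B.x.B;
}

/* memset with 0: every element becomes 0; only L[0].A is used. */
int test2_scalarized(void) {
    int L0_A = 0;
    return L0_A;
}

/* memset with 1: each int element becomes the byte 0x01 repeated
   four times (assumes sizeof(int) == 4). */
int test3_scalarized(void) {
    unsigned fill  = 1u;
    unsigned splat = fill * 0x01010101u;
    int x_A = (int)splat;   /* overwritten by the store below */
    int x_B = (int)splat;
    x_A = 1;                /* B.x.A = 1; result is unused */
    (void)x_A;
    return x_B;
}
```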
Comment 6 Chris Lattner 2007-03-18 19:19:56 PDT
Here is the final piece:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070312/045911.html

Testcases here:
Transforms/ScalarRepl/memset-aggregate.ll
Transforms/ScalarRepl/memset-aggregate-byte-leader.ll

-Chris