Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scalarrepl should be able to scalarrepl aggregates with memcpy uses #1598

Closed
lattner opened this issue Feb 25, 2007 · 7 comments
Closed

scalarrepl should be able to scalarrepl aggregates with memcpy uses #1598

lattner opened this issue Feb 25, 2007 · 7 comments
Labels
bugzilla Issues migrated from bugzilla code-quality

Comments

@lattner
Copy link
Collaborator

lattner commented Feb 25, 2007

Bugzilla Link 1226
Resolution FIXED
Resolved on Feb 22, 2010 12:56
Version 1.0
OS All
Blocks #824

Extended Description

Consider:

#include <tr1/functional>
#include
void assign( long* variable, long v) {
std::transform( variable, variable + 1, variable,
std::tr1::bind( std::plus< long >(), 0L, v ) );
}

This compiles to a single store on x86, but a whole ton of code on x86-64. This is because the
temporary structs are larger on x86-64, so EmitAggregateCopy in llvm-gcc emits them as a memcpy
instead of scalar transfers.

The problem is that this later blocks scalarrepl from promoting the structs, causing much worse
codegen:

__Z6assignRll: # x86-32
movl 8(%esp), %eax
movl 4(%esp), %ecx
movl %eax, (%ecx)
ret

__Z6assignRll: # x86-64
subq $88, %rsp
movb $0, 64(%rsp)
movq $0, 72(%rsp)
movq %rsi, 80(%rsp)
movq %rsi, 48(%rsp)
movq 72(%rsp), %rax
movq %rax, 40(%rsp)
movq 64(%rsp), %rax
movq %rax, 32(%rsp)
movq 40(%rsp), %rax
movq %rax, 8(%rsp)
movq 48(%rsp), %rax
movq %rax, 16(%rsp)
movq 32(%rsp), %rax
movq %rax, (%rsp)
movq 16(%rsp), %rax
addq 8(%rsp), %rax
movq %rax, (%rdi)
addq $88, %rsp
ret

-Chris

@lattner
Copy link
Collaborator Author

lattner commented Feb 25, 2007

Repro with:
$ llvm-g++ t.cc -O3 -S -o - -fno-exceptions -fomit-frame-pointer -m64

@lattner
Copy link
Collaborator Author

lattner commented Mar 5, 2007

Here's a reduced testcase:

#include <string.h>
struct foo { int A, B; };
int test(struct foo *P) {
struct foo L;
memcpy(&L, P, sizeof(struct foo));
return L.A;
}

    %struct.foo = type { i32, i32 }

implementation ; Functions:

define i32 @​test(%struct.foo* %P) {
entry:
%L = alloca %struct.foo, align 8 ; <%struct.foo*> [#uses=2]
%L2 = bitcast %struct.foo* %L to i8* ; <i8*> [#uses=1]
%tmp13 = bitcast %struct.foo* %P to i8* ; <i8*> [#uses=1]
call void @​llvm.memcpy.i32( i8* %L2, i8* %tmp13, i32 8, i32 4 )
%tmp4 = getelementptr %struct.foo* %L, i32 0, i32 0 ; <i32*> [#uses=1]
%tmp5 = load i32* %tmp4 ; [#uses=1]
ret i32 %tmp5
}

@lattner
Copy link
Collaborator Author

lattner commented Mar 5, 2007

This patch contains the (disabled) code to do the SROA. Before this can be enabled, mem2reg needs to be
able to promote scalars targetted by memcpy/memset etc.

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070305/045540.html

-Chris

@lattner
Copy link
Collaborator Author

lattner commented Mar 8, 2007

Second half committed here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070305/045718.html

Third 'half' will come later.

@lattner
Copy link
Collaborator Author

lattner commented Mar 8, 2007

For my notes:

The main part of this is implemented, but the testcase in this bug is not yet implemented. The reason
for this is that the lowered structure starts out with an array of bytes, so we get a memcpy ("gep x, 0,
0") instead of memcpy(bitcast). Two issues:

  1. Make xform work with GEP.
  2. Figure out why we're getting the array of bytes in this trivial case.

Another testcase:

#include <string.h>

struct foo { int A, B; };

struct bar{ struct foo x; long long y; double D; };

int test1(struct foo *P) {
struct foo L;
memcpy(&L, P, sizeof(struct foo));
return L.A;
}

int test2() {
struct foo L[4];
memset(L, 0, sizeof(struct foo)*4);
return L[0].A;
}

int test3() {
struct bar B;
memset(&B, 1, sizeof(B));

B.x.A = 1;
B.D = 10.0;
return B.x.B;
}

-Chris

@lattner
Copy link
Collaborator Author

lattner commented Mar 19, 2007

Here is the final piece:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070312/045911.html

Testcases here:
Transforms/ScalarRepl/memset-aggregate.ll
Transforms/ScalarRepl/memset-aggregate-byte-leader.ll

-Chris

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 27, 2021

mentioned in issue #824

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021
troelsbjerre pushed a commit to troelsbjerre/llvm-project that referenced this issue Jan 10, 2024
Teach SROA to handle allocas with more than one dbg.declare.
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla code-quality
Projects
None yet
Development

No branches or pull requests

2 participants