Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluate X86TargetLowering::MaxStoresPerMemcpyOptSize #27789

Open
zmodem opened this issue Apr 19, 2016 · 2 comments
Open

Evaluate X86TargetLowering::MaxStoresPerMemcpyOptSize #27789

zmodem opened this issue Apr 19, 2016 · 2 comments
Labels
backend:X86 bugzilla Issues migrated from bugzilla

Comments

@zmodem
Copy link
Collaborator

zmodem commented Apr 19, 2016

Bugzilla Link 27415
Version trunk
OS Linux
Blocks #26673

Extended Description

r265836, which causes more memcpys to be expanded, grew chrome_child.dll by 240 KB.

I suspect that maybe X86TargetLowering::MaxStoresPerMemcpyOptSize should be reduced, at least on 32-bit where we lower calls with smaller code than we used to.

@zmodem
Copy link
Collaborator Author

zmodem commented Apr 22, 2016

Some notes:

I tried bootstrapping Clang on 32-bit x86 with -Os and different values for MaxStoresPerMemcpyOptSize and MaxStoresPerMemmoveOptSize

Size MaxStoresPerMemcpyOptSize
67964194 1
6800925 2
68029730 3
68037922 4

(4 is the default)

So it seems setting it to 1 would be a size win.

But there might be more going on here. Consider the following example:

#include <string.h>
void f(char *dst, const char *src) {
memcpy(dst, src, 9);
}

At -O2 we will lower it with loads and stores:
00000000 :
0: 8b 44 24 04 mov 0x4(%esp),%eax
4: 8b 4c 24 08 mov 0x8(%esp),%ecx
8: 8a 51 08 mov 0x8(%ecx),%dl
b: 88 50 08 mov %dl,0x8(%eax)
e: f2 0f 10 01 movsd (%ecx),%xmm0
12: f2 0f 11 00 movsd %xmm0,(%eax)
16: c3 ret

At -Os, with the threshold set to 1, we make the call, and save one byte:

00000000 :
0: 83 ec 10 sub $0x10,%esp
3: 6a 09 push $0x9
5: ff 74 24 1c pushl 0x1c(%esp)
9: ff 74 24 1c pushl 0x1c(%esp)
d: e8 fc ff ff ff call e <f+0xe>
12: 83 c4 1c add $0x1c,%esp
15: c3 ret

However, if dst and src in the source were "int" instead of "char", we'd use rep;movs to copy two 4-byte words, and do one 1-byte move afterwards, causing much larger code:

00000000 :
0: 57 push %edi
1: 56 push %esi
2: 8b 44 24 0c mov 0xc(%esp),%eax
6: 8b 54 24 10 mov 0x10(%esp),%edx
a: b9 02 00 00 00 mov $0x2,%ecx
f: 89 c7 mov %eax,%edi
11: 89 d6 mov %edx,%esi
13: f3 a5 rep movsl %ds:(%esi),%es:(%edi)
15: 8a 4a 08 mov 0x8(%edx),%cl
18: 88 48 08 mov %cl,0x8(%eax)
1b: 5e pop %esi
1c: 5f pop %edi
1d: c3 ret

The rep;movs lowering comes from X86SelectionDAGInfo::EmitTargetCodeForMemcpy. If we were to just 1-byte rep;movs instead, the code would be:

00000000 :
0: 57 push %edi
1: 56 push %esi
2: 8b 7c 24 0c mov 0xc(%esp),%edi
6: 8b 74 24 10 mov 0x10(%esp),%esi
a: b9 09 00 00 00 mov $0x9,%ecx
f: f3 a4 rep movsb %ds:(%esi),%es:(%edi)
11: 5e pop %esi
12: 5f pop %edi
13: c3 ret

Which is the smallest yet.

Maybe at -Os we should try to avoid ending up with remaining bytes after rep;movs and instead work with a smaller type, or just make the library call. Perhaps doing this would be better than trying to lower MaxStoresPerMemcpyOptSize.

(Mostly notes to self)
CMake invocation:
$ CC=/work/llvm/build.release/bin/clang CXX=/work/llvm/build.release/bin/clang++ CFLAGS="-m32" CXXFLAGS="-m32" cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS_RELEASE="-Os" -DCMAKE_CXX_FLAGS_RELEASE="-Os" -GNinja ..

Needed to apply this patch to avoid crashing:

--- a/lib/Target/X86/X86SelectionDAGInfo.cpp
+++ b/lib/Target/X86/X86SelectionDAGInfo.cpp
@@ -241,6 +241,8 @@ SDValue X86SelectionDAGInfo::EmitTargetCodeForMemcpy(

unsigned UBytes = AVT.getSizeInBits() / 8;
unsigned CountVal = SizeVal / UBytes;

  • if (!CountVal)
  • return SDValue();
    SDValue Count = DAG.getIntPtrConstant(CountVal, dl);
    unsigned BytesLeft = SizeVal % UBytes;

@zmodem
Copy link
Collaborator Author

zmodem commented Apr 23, 2016

Some measurements on a 32-bit Chrome build on Windows:

Threshold Size (chrome_child.dll)
4 50,785,280
2 50,761,728
1 50,719,232

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

1 participant