poor optimization in presence of bitcast'ed pointer #2991

llvmbot · 2008-07-31T18:35:17Z


Bugzilla Link	2619
Resolution	FIXED
Resolved on	Feb 03, 2009 12:40
Version	trunk
OS	All
Blocks	llvm/llvm-bugzilla-archive#3473
Attachments	failing testcase, Original input, pre-optimization
Reporter	LLVM Bugzilla Contributor
CC	@lattner

Extended Description

The attached test case (foo.ll) should optimize down to a constant return. I believe the presence of the bitcast is inhibiting further optimization though.

A relatively simple fix (I believe) would be to reduce the stores through a bitcasted pointer to separate stores into the proper type when obvious.

Alternatively, this code arises from a lowered memset, which could be changed to lower in a type specific way. The original .ll is also attached.

llvmbot · 2008-07-31T18:37:52Z

Please add the #if-ed out code in clang:test/CodeGen/2008-07-30-implicit-initialization.c back when fixed.

llvmbot · 2008-08-01T03:26:26Z

Looks like something Owen might be interested in.

llvmbot · 2008-08-06T00:05:20Z

GVN deliberately does not catch this case in order to improve compile time. We don't want to have to reason about the size of stores to make sure that the loads are completely contained within them. I don't anticipate GVN supporting this anytime in the near future.

llvmbot · 2008-08-06T04:09:32Z

Okay. Its still a very straightforward optimization to change the writes through bitcasts to type safe stores in this particular case. Are you saying its not worth it or just that GVN shouldn't be the one to do it?

llvmbot · 2008-08-06T21:44:54Z

Another test case which should reduce to a constant

llvmbot · 2008-08-06T21:46:00Z

Added another test case which should reduce to a constant. This case is coming from just a bitfield access not a memset.

llvmbot · 2008-09-13T03:09:20Z

Here is another test case which produces really horrible
code (IR) on x86-32. Not the same as the title but the
same -- or a similar -- underlying issue I think.

--
struct s { short a, b, c, d; }
struct s foo(struct s x) {
return x;
}

This time when compiled with llvm-gcc:

ddunbar@ddunbar2:ABITest$ llvm-gcc -O3 -c --emit-llvm -o - bad.c | llvm-dis
; ModuleID = ''
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128"
target triple = "i386-apple-darwin9"
%struct.s = type { i16, i16, i16, i16 }

define i64 @foo(%struct.s* byval align 4 %x) nounwind {
entry:
%tmp2 = getelementptr %struct.s* %x, i32 0, i32 0 ; <i16*> [#uses=1]
%tmp3 = load i16* %tmp2, align 2 ; [#uses=1]
%tmp5 = getelementptr %struct.s* %x, i32 0, i32 1 ; <i16*> [#uses=1]
%tmp6 = load i16* %tmp5, align 2 ; [#uses=1]
%tmp8 = getelementptr %struct.s* %x, i32 0, i32 2 ; <i16*> [#uses=1]
%tmp9 = load i16* %tmp8, align 2 ; [#uses=1]
%tmp11 = getelementptr %struct.s* %x, i32 0, i32 3 ; <i16*> [#uses=1]
%tmp12 = load i16* %tmp11, align 2 ; [#uses=1]
%tmp1537 = zext i16 %tmp3 to i64 ; [#uses=1]
%tmp1834 = zext i16 %tmp6 to i64 ; [#uses=1]
%tmp183435 = shl i64 %tmp1834, 16 ; [#uses=1]
%tmp2131 = zext i16 %tmp9 to i64 ; [#uses=1]
%tmp213132 = shl i64 %tmp2131, 32 ; [#uses=1]
%tmp2428 = zext i16 %tmp12 to i64 ; [#uses=1]
%tmp242829 = shl i64 %tmp2428, 48 ; [#uses=1]
%tmp183435.ins = or i64 %tmp183435, %tmp1537 ; [#uses=1]
%tmp213132.ins = or i64 %tmp183435.ins, %tmp213132 ; [#uses=1]
%tmp242829.ins = or i64 %tmp213132.ins, %tmp242829 ; [#uses=1]
ret i64 %tmp242829.ins
}

ddunbar@ddunbar2:ABITest$ llvm-gcc -fomit-frame-pointer -O3 -S -c bad.c -o -

.text
.align	4,0x90
.globl	_foo

_foo:
movzwl 10(%esp), %eax
shll $16, %eax
movzwl 8(%esp), %edx
orl %eax, %edx
movzwl 6(%esp), %eax
shll $16, %eax
movzwl 4(%esp), %ecx
orl %ecx, %eax
ret

.subsections_via_symbols

ddunbar@ddunbar2:ABITest$ gcc -fomit-frame-pointer -O3 -S -c bad.c -o -
.text
.align 4,0x90
.globl _foo
_foo:
movl 4(%esp), %eax
movl 8(%esp), %edx
ret
.subsections_via_symbols

lattner · 2009-01-09T20:26:47Z

Comment #8 is a different issue: lack of load merging in the code generator (turning two i16 loads into an i32 load when safe).

The recent SROA improvements don't cover the original testcase because it is only setting two elements of the array not the entire thing. I've been pondering some more aggressive changes that could handle it, but I don't know when/if I'll ever get to them.

lattner · 2009-02-03T20:30:33Z

With recent SROA improvements, the example in comment8 is optimized properly for X86-64, but we still get ugly code on x86-32. The issue is that llvm-gcc is passing the struct with 'byval' instead of as an i64 with x86-32, so there isn't much the optimizer can do (other than load merging in the code generator).

lattner · 2009-02-03T20:32:54Z

clang has the same problem fwiw.

The example in comment #6 is handle properly.

lattner · 2009-02-03T20:34:28Z

The original example is also optimized now. I'm going to close this bug and file a follow-on for the remaining issue.

lattner · 2009-02-03T20:40:00Z

I filed Bug 3473 with the remaining codegen issue. The SROA issue is fixed.

lattner · 2021-11-26T23:49:09Z

mentioned in issue llvm/llvm-bugzilla-archive#3473

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

poor optimization in presence of bitcast'ed pointer #2991

poor optimization in presence of bitcast'ed pointer #2991

llvmbot commented Jul 31, 2008

llvmbot commented Jul 31, 2008

llvmbot commented Aug 1, 2008

llvmbot commented Aug 6, 2008

llvmbot commented Aug 6, 2008

llvmbot commented Aug 6, 2008

llvmbot commented Aug 6, 2008

llvmbot commented Sep 13, 2008

lattner commented Jan 9, 2009

lattner commented Feb 3, 2009

lattner commented Feb 3, 2009

lattner commented Feb 3, 2009

lattner commented Feb 3, 2009

lattner commented Nov 26, 2021

poor optimization in presence of bitcast'ed pointer #2991

poor optimization in presence of bitcast'ed pointer #2991

Comments

llvmbot commented Jul 31, 2008

Extended Description

llvmbot commented Jul 31, 2008

llvmbot commented Aug 1, 2008

llvmbot commented Aug 6, 2008

llvmbot commented Aug 6, 2008

llvmbot commented Aug 6, 2008

llvmbot commented Aug 6, 2008

llvmbot commented Sep 13, 2008

-- struct s { short a, b, c, d; } struct s foo(struct s x) { return x; }

This time when compiled with llvm-gcc:

ddunbar@ddunbar2:ABITest$ gcc -fomit-frame-pointer -O3 -S -c bad.c -o - .text .align 4,0x90 .globl _foo _foo: movl 4(%esp), %eax movl 8(%esp), %edx ret .subsections_via_symbols

lattner commented Jan 9, 2009

lattner commented Feb 3, 2009

lattner commented Feb 3, 2009

lattner commented Feb 3, 2009

lattner commented Feb 3, 2009

lattner commented Nov 26, 2021

--
struct s { short a, b, c, d; }
struct s foo(struct s x) {
return x;
}

ddunbar@ddunbar2:ABITest$ gcc -fomit-frame-pointer -O3 -S -c bad.c -o -
.text
.align 4,0x90
.globl _foo
_foo:
movl 4(%esp), %eax
movl 8(%esp), %edx
ret
.subsections_via_symbols