Created attachment 1874 [details]
failing testcase

The attached test case (foo.ll) should optimize down to a constant return. I believe the presence of the bitcast is inhibiting further optimization, though. A relatively simple fix, I believe, would be to split stores through a bitcasted pointer into separate stores of the proper type when that is obvious. Alternatively, this code arises from a lowered memset, which could be changed to lower in a type-specific way. The original .ll is also attached.
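To make the pattern concrete, here is a hypothetical reduction (illustrative names and types in the typed-pointer IR of the time, not the attached foo.ll itself): the lowered memset writes through a bitcast of the alloca, and the later element load is never folded to a constant even though the store fully covers it.
--
; hypothetical reduction of the pattern; not the attached file
define i16 @pattern() nounwind {
entry:
  %a = alloca [2 x i16]                              ; <[2 x i16]*>
  %p = bitcast [2 x i16]* %a to i32*                 ; lowered memset writes
  store i32 0, i32* %p                               ; through the bitcast
  %e0 = getelementptr [2 x i16]* %a, i32 0, i32 0    ; <i16*>
  %v = load i16* %e0                                 ; not forwarded from the i32 store
  ret i16 %v                                         ; should fold to ret i16 0
}
--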
Created attachment 1875 [details] Original input, pre-optimization
Please add the #if'd-out code in clang:test/CodeGen/2008-07-30-implicit-initialization.c back when this is fixed.
Looks like something Owen might be interested in.
GVN deliberately does not catch this case in order to improve compile time. We don't want to have to reason about the size of stores to make sure that the loads are completely contained within them. I don't anticipate GVN supporting this anytime in the near future.
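To be concrete about what "reason about the size of stores" entails, forwarding here would mean proving the narrow load lies entirely inside the earlier, wider store and then synthesizing the extraction itself, roughly (hypothetical snippet, little-endian layout assumed):
--
store i32 %v, i32* %p            ; writes bytes 0..3
%x = load i16* %q                ; reads bytes 0..1 of the same memory
; forwarding would have to turn the load into something like:
%x.fwd = trunc i32 %v to i16     ; plus an lshr first if the load hits the upper half
--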
Okay. It's still a very straightforward optimization to change the writes through bitcasts into type-safe stores in this particular case. Are you saying it's not worth it, or just that GVN shouldn't be the one to do it?
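To be clear about what I mean, a rough before/after sketch (hypothetical IR, little-endian, constant store value assumed): split the single i32 store through the bitcast into per-element i16 stores, after which store-to-load forwarding is trivial.
--
; before: one wide store through the bitcast
%p = bitcast [2 x i16]* %a to i32*
store i32 65537, i32* %p                             ; 0x00010001
; after: one store per element, in the element's own type
%e0 = getelementptr [2 x i16]* %a, i32 0, i32 0
store i16 1, i16* %e0                                ; low half
%e1 = getelementptr [2 x i16]* %a, i32 0, i32 1
store i16 1, i16* %e1                                ; high half
--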
Created attachment 1899 [details] Another test case which should reduce to a constant
Added another test case which should reduce to a constant. This one comes from just a bitfield access, not a memset.
Here is another test case which produces really horrible code (IR) on x86-32. Not the same as the title, but the same -- or a similar -- underlying issue, I think.
--
struct s { short a, b, c, d; };

struct s foo(struct s x) { return x; }
--
This time when compiled with llvm-gcc:
--
ddunbar@ddunbar2:ABITest$ llvm-gcc -O3 -c --emit-llvm -o - bad.c | llvm-dis
; ModuleID = '<stdin>'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128"
target triple = "i386-apple-darwin9"
%struct.s = type { i16, i16, i16, i16 }

define i64 @foo(%struct.s* byval align 4 %x) nounwind {
entry:
  %tmp2 = getelementptr %struct.s* %x, i32 0, i32 0       ; <i16*> [#uses=1]
  %tmp3 = load i16* %tmp2, align 2                        ; <i16> [#uses=1]
  %tmp5 = getelementptr %struct.s* %x, i32 0, i32 1       ; <i16*> [#uses=1]
  %tmp6 = load i16* %tmp5, align 2                        ; <i16> [#uses=1]
  %tmp8 = getelementptr %struct.s* %x, i32 0, i32 2       ; <i16*> [#uses=1]
  %tmp9 = load i16* %tmp8, align 2                        ; <i16> [#uses=1]
  %tmp11 = getelementptr %struct.s* %x, i32 0, i32 3      ; <i16*> [#uses=1]
  %tmp12 = load i16* %tmp11, align 2                      ; <i16> [#uses=1]
  %tmp1537 = zext i16 %tmp3 to i64                        ; <i64> [#uses=1]
  %tmp1834 = zext i16 %tmp6 to i64                        ; <i64> [#uses=1]
  %tmp183435 = shl i64 %tmp1834, 16                       ; <i64> [#uses=1]
  %tmp2131 = zext i16 %tmp9 to i64                        ; <i64> [#uses=1]
  %tmp213132 = shl i64 %tmp2131, 32                       ; <i64> [#uses=1]
  %tmp2428 = zext i16 %tmp12 to i64                       ; <i64> [#uses=1]
  %tmp242829 = shl i64 %tmp2428, 48                       ; <i64> [#uses=1]
  %tmp183435.ins = or i64 %tmp183435, %tmp1537            ; <i64> [#uses=1]
  %tmp213132.ins = or i64 %tmp183435.ins, %tmp213132      ; <i64> [#uses=1]
  %tmp242829.ins = or i64 %tmp213132.ins, %tmp242829      ; <i64> [#uses=1]
  ret i64 %tmp242829.ins
}

ddunbar@ddunbar2:ABITest$ llvm-gcc -fomit-frame-pointer -O3 -S -c bad.c -o -
  .text
  .align 4,0x90
  .globl _foo
_foo:
  movzwl 10(%esp), %eax
  shll $16, %eax
  movzwl 8(%esp), %edx
  orl %eax, %edx
  movzwl 6(%esp), %eax
  shll $16, %eax
  movzwl 4(%esp), %ecx
  orl %ecx, %eax
  ret
  .subsections_via_symbols

ddunbar@ddunbar2:ABITest$ gcc -fomit-frame-pointer -O3 -S -c bad.c -o -
  .text
  .align 4,0x90
  .globl _foo
_foo:
  movl 4(%esp), %eax
  movl 8(%esp), %edx
  ret
  .subsections_via_symbols
--
Comment #8 is a different issue: lack of load merging in the code generator (turning two i16 loads into an i32 load when safe). The recent SROA improvements don't cover the original testcase because it only sets two elements of the array, not the entire thing. I've been pondering some more aggressive changes that could handle it, but I don't know when/if I'll ever get to them.
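In IR terms, the missing load merging would look roughly like this (hypothetical sketch; the real transformation would live in the code generator, not as an IR rewrite): two adjacent, sufficiently aligned i16 loads that are only zext/shl/or'd back together can be replaced by a single i32 load.
--
; before: adjacent halves loaded separately and reassembled
%lo = load i16* %p0, align 4                 ; bytes 0..1
%hi = load i16* %p1, align 2                 ; bytes 2..3, %p1 == %p0 + 2
%lo32 = zext i16 %lo to i32
%hi32 = zext i16 %hi to i32
%hi32.sh = shl i32 %hi32, 16
%word = or i32 %hi32.sh, %lo32
; after: a single i32 load of the same four bytes (little-endian)
%p32 = bitcast i16* %p0 to i32*
%word.merged = load i32* %p32, align 4
--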
With recent SROA improvements, the example in comment #8 is optimized properly for x86-64, but we still get ugly code on x86-32. The issue is that llvm-gcc is passing the struct with 'byval' instead of as an i64 on x86-32, so there isn't much the optimizer can do (other than load merging in the code generator).
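For comparison, the two lowerings of the parameter in question (hypothetical signatures only; the suffixed names are just for illustration): with byval the optimizer only ever sees loads from a pointer to a caller-side copy, whereas coercing to an integer hands it the bits directly.
--
; x86-32 today (what llvm-gcc emits): the struct arrives as a pointer to a copy
declare i64 @foo.byval(%struct.s* byval align 4)
; a lowering the optimizer could see through: pass the bits themselves
declare i64 @foo.coerced(i64)
--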
Clang has the same problem, FWIW. The example in comment #6 is handled properly.
The original example is also optimized now. I'm going to close this bug and file a follow-on for the remaining issue.
I filed Bug 3473 for the remaining codegen issue. The SROA issue is fixed.