Not so hot code produced for Ada array traversal: sintzero #1660

Closed
llvmbot opened this issue Mar 29, 2007 · 15 comments
Labels: bugzilla (issues migrated from Bugzilla), code-quality

Comments

llvmbot (Collaborator) commented Mar 29, 2007

Bugzilla Link: 1288
Resolution: FIXED
Resolved on: Feb 22, 2010 12:46
Version: 1.0
OS: All
Attachments: the unoptimized code, the optimized code
Reporter: LLVM Bugzilla Contributor

Extended Description

The attached .ll file contains unoptimized code produced by the Ada front-end
for the operation "loop over all elements of an array of ints, setting them to
zero", for several different array types. The arrays are all of fixed size,
and differ only in the type of the indexing variable:
@sbytezero: zero an array indexed by a signed i8 (index range -128..127)
@ubytezero: zero an array indexed by an unsigned i8 (index range 0..255)
@sintzero: zero an array indexed by a signed i32
@uintzero: zero an array indexed by an unsigned i32
@srangezero: zero an array with index range -10..10
@urangezero: zero an array with index range 10..30

Thus all of these routines could in theory be optimized to a single memset.
That doesn't happen, but let's not ask for the moon! They do get fairly
well optimized, but some optimizations are missed. I think this matters
because these kinds of array traversals (only more complicated) come up all
the time in Ada code. The main problems seem to come from code like this
(extracted from @sbytezero):
%tmp = load i8* %i ; [#uses=1]
%tmp1 = sext i8 %tmp to i32 ; [#uses=5]
%tmp3 = sub i32 %tmp1, -128 ; [#uses=1]
%tmp4 = getelementptr [256 x i32]* %tmp2, i32 0, i32 %tmp3 ; <i32*> [#uses=1]
Here %i is the array index variable. It runs from -128 to 127. The sext to
i32 is generated by the Ada front-end, presumably to avoid overflow when
subtracting off the array lower bound of -128, which occurs in the next line.
This gets optimized to:
%tmp8 = add i8 %indvar, -127 ; [#uses=1]
%tmp115 = sext i8 %tmp8 to i32 ; [#uses=1]
%tmp316 = add i32 %tmp115, 128 ; [#uses=1]
%tmp417 = getelementptr [256 x i32]* %a, i32 0, i32 %tmp316 ; <i32*> [#uses=1]
Here %indvar runs from 0 to 254 (nice optimization!), which really corresponds
to indices 1..255 in the [256 x i32] array (the first value was peeled off).
Thus %tmp8 is between -127 and 127, so %tmp115 is also between -127 and 127,
and %tmp316 is between 1 and 255. It is just %tmp8+1 zero-extended to i32, so
the adds of -127 and 128 could have been turned into a single addition of 1.
In fact it would have been better not to peel off the first loop value at all:
have %indvar go from 0..255, then just zero-extend it for use in the getelementptr.
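
A minimal sketch of that preferred form (hypothetical IR, not actual optimizer
output; the function name is made up):
define void @sbytezero_ideal([256 x i32]* %a) {
entry:
br label %loop
loop:
%indvar = phi i8 [ 0, %entry ], [ %indvar.next, %loop ] ; runs 0..255
%idx = zext i8 %indvar to i32 ; single zext, no sext/add dance
%p = getelementptr [256 x i32]* %a, i32 0, i32 %idx
store i32 0, i32* %p
%indvar.next = add i8 %indvar, 1
%done = icmp eq i8 %indvar.next, 0 ; the i8 wraps to 0 after 255
br i1 %done, label %exit, label %loop
exit:
ret void
}
With no peeled iteration, the index arithmetic disappears entirely and the
loop is much easier to recognize as a memset.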

PS: there are a bunch of pointless compares in the unoptimized code, like
icmp sle i32 -128, %tmp1 ; :1 [#uses=0]
icmp sle i32 %tmp1, 127 ; :2 [#uses=0]
This is me playing with assertion generation in llvm-convert, i.e. one day
these may become
iassert sle i32 -128, %tmp1
iassert sle i32 %tmp1, 127
if we introduce an assert instruction.
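
PPS: for concreteness, here is a hedged sketch of the single-memset form
mentioned above (using @sbytezero's [256 x i32] type; the function name is
made up):
declare void @llvm.memset.i32(i8*, i8, i32, i32)
define void @sbytezero_memset([256 x i32]* %a) {
%p = bitcast [256 x i32]* %a to i8*
call void @llvm.memset.i32(i8* %p, i8 0, i32 1024, i32 4) ; 256 i32s = 1024 bytes, align 4
ret void
}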

lattner (Collaborator) commented Apr 1, 2007

Interesting. I was just looking at some similar examples that can happen in C code. For the record, please
attach a .s file produced by LLC for your favorite target. I will see if I can improve the generated code over
the next week or so.

lattner (Collaborator) commented Apr 1, 2007

X86 Codegen

lattner (Collaborator) commented Apr 1, 2007

Old PPC Codegen

lattner (Collaborator) commented Apr 1, 2007

Old ARM Codegen

lattner (Collaborator) commented Apr 1, 2007

Perhaps the most amusing is sintzero. On X86 we get:

.LBB3_1: #cond_next
movl $0, 4(%eax,%ecx,4)
incl %ecx
cmpl $4294967295, %ecx
jne .LBB3_1 #cond_next

(yay)

on ARM we get:

.LBB3_1: @cond_next
add r2, r0, r3, lsl #2 <--- boo hiss
mov r1, #0
str r1, [r2, #+4]
add r3, r3, #1
cmn r3, #1
bne .LBB3_1 @cond_next

On PPC, we get:

BB3_1: #cond_next
add 6, 5, 4
addi 4, 4, 1
li 7, 0
slwi 6, 6, 2
stwx 7, 3, 6
xoris 6, 4, 65535 # r6 = r4 ^ 0xffff0000
cmplwi 0, 6, 65535 # r6 == 0xffff iff r4 == -1 (cmplwi only takes a 16-bit immediate)
bne 0, BB3_1 #cond_next

say what? :)

lattner (Collaborator) commented Apr 1, 2007

Okay, final request. Please attach the machine code generated by GCC with -O3 -fomit-frame-pointer.

-Chris

llvmbot (Collaborator, Author) commented Apr 1, 2007

x86 codegen (i686-pc-linux-gnu). The functions kinds__TsbytearrayBIP etc. can
be ignored (I stripped them out of the original example).

llvmbot (Collaborator, Author) commented Apr 1, 2007

llvmbot (Collaborator, Author) commented Apr 1, 2007

llvmbot (Collaborator, Author) commented Apr 1, 2007

llvmbot (Collaborator, Author) commented Apr 1, 2007

If the type of the induction variable had been chosen to be i32, rather
than the original type, some of these problems would go away. As for
sintzero, you've got to admit that this
%tmp7 = add i32 %indvar, -2147483647 ; [#uses=1]
%tmp214 = xor i32 %tmp7, -2147483648 ; [#uses=1]
is a funny way to add one!

lattner (Collaborator) commented Apr 2, 2007

That is a funny way to add one. We can definitely instcombine this:

define i32 @test(i32 %indvar) {
%tmp7 = add i32 %indvar, -2147483647 ; [#uses=1]
%tmp214 = xor i32 %tmp7, -2147483648 ; [#uses=1]
ret i32 %tmp214
}

into add of one, but I'd like to fix whatever is creating that in the first place :)
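
For reference, the fold is sound because xor with -2147483648 flips only the
sign bit, and flipping the sign bit is the same as adding 0x80000000 modulo
2^32; so the pair amounts to adding 0x80000001 + 0x80000000 = 1. The expected
instcombine result is just:
define i32 @test(i32 %indvar) {
%tmp = add i32 %indvar, 1
ret i32 %tmp
}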

lattner (Collaborator) commented Apr 2, 2007

Progress: with several patches, we now compile sintzero to:
LBB1_1: #cond_next
movl $0, (%eax,%ecx,4)
incl %ecx
cmpl $4294967295, %ecx
jne LBB1_1 #cond_next
on X86

LBB1_1: @cond_next
mov r2, #0
str r2, [r0, +r3, lsl #2]
add r3, r3, #1
cmn r3, #1
bne LBB1_1 @cond_next
on ARM (the mov to R2 is already covered by Bug 893)

and to:

LBB1_1: ;cond_next
li r4, 0
addi r2, r2, 1
stw r4, 0(r3)
addi r3, r3, 4
cmpwi cr0, r2, -1
bne cr0, LBB1_1 ;cond_next
on PPC (which needs to use postinc, but that's another issue).

These are all good. The only issue now is that this code isn't correct.

Duncan, can you file the other cases as separate bugzilla bugs?

-Chris

lattner (Collaborator) commented Apr 2, 2007

The sintzero function is now optimized. Duncan is going to open other bugs for the other cases.

This one testcase exposed several issues on multiple targets, yay :)

-Chris

llvmbot (Collaborator, Author) commented Apr 2, 2007

The cases where the index variables are not i32s have been split off into
llvm/llvm-bugzilla-archive#1301.

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021
This issue was closed.