New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Not so hot code produced for Ada array traversal: sintzero #1660
Comments
Interesting. I was just looking at some similar examples that can happen in C code. For the record, please |
Perhaps the most amusing is sintzero. One X86 we get: .LBB3_1: #cond_next (yay) on ARM we get: .LBB3_1: @cond_next On PPC, we get: BB3_1: #cond_next say what? :) |
Okay, final request. Please attach the machine code generated by GCC with -O3 -fomit-frame-pointer. -Chris |
x86 codegen (i686-pc-linux-gnu) |
If the type of the induction variable had been chosen to be i32, rather |
That is a funny way to add one. We can definitely instcombine this: define i32 @test(i32 %indvar) { into add of one, but I'd like to fix whatever is creating that in the first place :) |
Progress: with several patches, we now compile sintzero to: LBB1_1: @cond_next and to: LBB1_1: ;cond_next These are all good. The only issue now is that this code isn't correct. Duncan, can you file the other cases as other buzilla bugs? -Chris |
The sintzero function is now optimized. Duncan is going to open other bugs for the other cases. This one testcase exposed several issues on multiple targets, yay :) -Chris |
The cases where the index variables are not i32s have been split off into |
Extended Description
The attached .ll file contains unoptimized code produced by the Ada front-end
for the operation "loop over all elements of an array of ints, setting them to
zero", for several different array types. The arrays are all of fixed size,
and differ only in the type of the indexing variable:
@sbytezero: zero an array indexed by a signed i8 (index range -128..127)
@ubytezero: zero an array indexed by an unsigned i8 (index range 0..255)
@sintzero: zero an array indexed by a signed i32
@uintzero: zero an array indexed by an unsigned i32
@srangezero: zero an array with index range -10..10
@urangezero: zero an array with index range 10..30
Thus all of these routines could in theory be optimized to a single memset.
That doesn't happen, but let's not ask for the moon! They do get fairly
well optimized but some optimizations are missed. I think this matters
because these kind of array traversals (only more complicated) come up all
the time in Ada code. The main problems seem to come from code like this
(extracted from @sbytezero):
%tmp = load i8* %i ; [#uses=1]
%tmp1 = sext i8 %tmp to i32 ; [#uses=5]
%tmp3 = sub i32 %tmp1, -128 ; [#uses=1]
%tmp4 = getelementptr [256 x i32]* %tmp2, i32 0,
i32 %tmp3 ; <i32*> [#uses=1]
Here %i is the array index variable. It runs from -128 to 127. The sext to
i32 is generated by the Ada f-e, presumably to avoid overflow when subtracting
off the array lower bound of -128, which occurs in the next line. This gets
optimized to:
%tmp8 = add i8 %indvar, -127 ; [#uses=1]
%tmp115 = sext i8 %tmp8 to i32 ; [#uses=1]
%tmp316 = add i32 %tmp115, 128 ; [#uses=1]
%tmp417 = getelementptr [256 x i32]* %a, i32 0,
i32 %tmp316 ; <i32*> [#uses=1]
Here %indvar runs from 0 to 254 (nice optimization!), which really corresponds
to indices 1..255 in the [256 x i32] array (the first value was peeled off).
Thus %tmp8 is between -127 and 127. Thus %tmp115 is also between -127 and 127.
Thus %tmp316 is between 1 and 255. It is just %tmp8+1 zextended to i32. Thus
the adding of -127 and 128 could have be turned into a single addition of 1.
In fact it would have been better not to peel off the first loop value and
have %indvar go from 0..255, then just zextend it for use in the getelementptr.
PS: there are a bunch of pointless compares in the unoptimized code, like
icmp sle i32 -128, %tmp1 ; :1 [#uses=0]
icmp sle i32 %tmp1, 127 ; :2 [#uses=0]
This is me playing with assertion generation in llvm-convert, i.e. one day
these may become
iassert sle i32 -128, %tmp1
iassert sle i32 %tmp1, 127
if we introduce an assert instruction.
The text was updated successfully, but these errors were encountered: