First Last Prev Next    No search results available
Details
: Invalid alignment for SSE2 code
Bug#: 1649
: libraries
: Backend: X86
Status: RESOLVED
Resolution: FIXED
: PC
: Linux
: trunk
: P2
: normal
: ---

:
:
:
:
  Show dependency tree - Show dependency graph
People
Reporter: Anton Korobeynikov <asl@math.spbu.ru>
Assigned To: Evan Cheng <evan.cheng@apple.com>
:

Attachments
Function extracted (4.80 KB, application/octet-stream)
2007-09-11 02:09, Anton Korobeynikov
Details


Note

You need to log in before you can comment on or make changes to this bug.

Related actions


Description:   Opened: 2007-09-11 02:09
Created an attachment (id=1113) [details]
Function extracted

Consider attached testcase.

In bb79:

bb79:           ; preds = %bb57
        %tmp81 = load double* %tmp68, align 16          ; <double> [#uses=1]
        %tmp82 = sub double -0.000000e+00, %tmp81               ; <double>
[#uses=1]
        store double %tmp82, double* %tmp68, align 16
        %tmp84 = getelementptr [3 x double]* %Raw_Normal, i32 0, i32 1         
; <double*> [#uses=2]
        %tmp85 = load double* %tmp84, align 8           ; <double> [#uses=1]
        %tmp86 = sub double -0.000000e+00, %tmp85               ; <double>
[#uses=1]
        store double %tmp86, double* %tmp84, align 8
        %tmp88 = getelementptr [3 x double]* %Raw_Normal, i32 0, i32 2         
; <double*> [#uses=2]
        %tmp89 = load double* %tmp88, align 8           ; <double> [#uses=1]
        %tmp90 = sub double -0.000000e+00, %tmp89               ; <double>
[#uses=1]
        store double %tmp90, double* %tmp88, align 8
        br label %bb92

This is codegen'ed (-disable-fp-elim) into:

        movsd   .LCPI1_0, %xmm0
        movapd  %xmm0, %xmm1
        xorpd   -104(%ebp), %xmm1
        movsd   %xmm1, -104(%ebp)
        movsd   -96(%ebp), %xmm1
        xorpd   %xmm0, %xmm1
        movsd   %xmm1, -96(%ebp)
        movsd   -88(%ebp), %xmm1
        xorpd   %xmm0, %xmm1
        movsd   %xmm1, -88(%ebp)

We definitely see a problem: even if %ebp is 16-bytes aligned, the access of
temporaries is not aligned at all.

This causes povray miscompilation at -O2.
------- Comment #1 From Evan Cheng 2007-09-13 14:04:21 -------
        xorpd   -104(%ebp), %xmm1
        movsd   %xmm1, -104(%ebp)
        movsd   -96(%ebp), %xmm1
        movsd   %xmm1, -96(%ebp)
        movsd   -88(%ebp), %xmm1
        movsd   %xmm1, -88(%ebp)

Only xorpd -104(%ebp) has to be aligned on 16-byte. movsd only require 8-byte
alignment.

Furthermore, the prologue looks like this:

        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        pushl   %edi
        pushl   %esi
        subl    $172, %esp

Assuming %esp was 16-byte aligned before the call. The call push 4-byte and the
first pushl %ebp decrement another 4 bytes from %esp. So %ebp = old %esp - 8 =
new %esp + 184.

-104(%ebp) is 16-byte aligned. This is correct assuming %esp is 16-byte
aligned. But is that true? Linux stack alignment requirement is 8-byte only.

Anton, can you tell me the value of %esp upon entry?
------- Comment #2 From Anton Korobeynikov 2007-09-13 14:18:33 -------
Hrm, it seems you're right.

%ebp is 0xbfe24bf8 for me :( Any ideas how this can be fixed? Can result of
PR1636 be used to solve this issue?
------- Comment #3 From Evan Cheng 2007-09-13 15:55:27 -------
We need to dynamically align the stack ptr in the prologue. This means using fp
to reference objects in the caller's frame and using sp to reference local
objects. If alloca() are used, that means keeping another pointer to reference
local objects? I am not sure.

I probably won't be getting to this any time soon. Anton, would you like to
take a crack at it? :-)
------- Comment #4 From Anton Korobeynikov 2007-09-14 06:24:45 -------
This seems to be exactly the task of PR1636. So, if we'll have sane stack
realignment facility both PRs will be fixed.

I'll try to dig into this (hope really soon).
------- Comment #5 From Anton Korobeynikov 2007-09-15 16:17:44 -------
*** Bug 1671 has been marked as a duplicate of this bug. ***
------- Comment #6 From Chris Lattner 2007-10-03 17:09:32 -------
The GCC folks are discussing stack realignment, some interesting links here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16660
------- Comment #7 From Anton Korobeynikov 2007-10-04 00:00:35 -------
Another link:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28074
------- Comment #8 From Chris Lattner 2008-02-10 02:11:14 -------
I think this is fixed on mainline, by generating movups instead of movaps when
referencing the stack.  Is this true, and if so, is this a code quality bug?
------- Comment #9 From Evan Cheng 2008-02-10 22:00:28 -------
I don't think this is fixed. I believe we can still generate load from stack
slots which are *believed* to be 16-byte aligned but are not. The problem is
it's isel lowering is assuming the stack ptr is properly aligned upon entry
into the function. But this is not true on some x86 targets. For those, e.g.
cygwin, the stack ptr must be dynamically aligned if 16-byte aligned stack
objects are created.
------- Comment #10 From Anton Korobeynikov 2008-04-23 13:45:33 -------
This should be fixed by series of patches r50152-r50173.

Stack realignment is currently enabled in 'easy' cases, when no dynamic-sized
objects are present.

First Last Prev Next    No search results available