LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 1649 - Invalid alignment for SSE2 code
Summary: Invalid alignment for SSE2 code
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Backend: X86 (show other bugs)
Version: trunk
Hardware: PC Linux
: P normal
Assignee: Evan Cheng
URL:
Keywords:
: 1671 (view as bug list)
Depends on:
Blocks:
 
Reported: 2007-09-11 02:09 PDT by Anton Korobeynikov
Modified: 2008-04-23 13:45 PDT (History)
4 users (show)

See Also:
Fixed By Commit(s):


Attachments
Function extracted (4.80 KB, application/octet-stream)
2007-09-11 02:09 PDT, Anton Korobeynikov
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Anton Korobeynikov 2007-09-11 02:09:57 PDT
Created attachment 1113 [details]
Function extracted

Consider attached testcase.

In bb79:

bb79:           ; preds = %bb57
        %tmp81 = load double* %tmp68, align 16          ; <double> [#uses=1]
        %tmp82 = sub double -0.000000e+00, %tmp81               ; <double> [#uses=1]
        store double %tmp82, double* %tmp68, align 16
        %tmp84 = getelementptr [3 x double]* %Raw_Normal, i32 0, i32 1          ; <double*> [#uses=2]
        %tmp85 = load double* %tmp84, align 8           ; <double> [#uses=1]
        %tmp86 = sub double -0.000000e+00, %tmp85               ; <double> [#uses=1]
        store double %tmp86, double* %tmp84, align 8
        %tmp88 = getelementptr [3 x double]* %Raw_Normal, i32 0, i32 2          ; <double*> [#uses=2]
        %tmp89 = load double* %tmp88, align 8           ; <double> [#uses=1]
        %tmp90 = sub double -0.000000e+00, %tmp89               ; <double> [#uses=1]
        store double %tmp90, double* %tmp88, align 8
        br label %bb92

This is codegen'ed (-disable-fp-elim) into:

        movsd   .LCPI1_0, %xmm0
        movapd  %xmm0, %xmm1
        xorpd   -104(%ebp), %xmm1
        movsd   %xmm1, -104(%ebp)
        movsd   -96(%ebp), %xmm1
        xorpd   %xmm0, %xmm1
        movsd   %xmm1, -96(%ebp)
        movsd   -88(%ebp), %xmm1
        xorpd   %xmm0, %xmm1
        movsd   %xmm1, -88(%ebp)

We definitely see a problem: even if %ebp is 16-bytes aligned, the access of temporaries is not aligned at all.

This causes povray miscompilation at -O2.
Comment 1 Evan Cheng 2007-09-13 14:04:21 PDT
        xorpd   -104(%ebp), %xmm1
        movsd   %xmm1, -104(%ebp)
        movsd   -96(%ebp), %xmm1
        movsd   %xmm1, -96(%ebp)
        movsd   -88(%ebp), %xmm1
        movsd   %xmm1, -88(%ebp)

Only xorpd -104(%ebp) has to be aligned on 16-byte. movsd only require 8-byte alignment.

Furthermore, the prologue looks like this:

        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        pushl   %edi
        pushl   %esi
        subl    $172, %esp

Assuming %esp was 16-byte aligned before the call. The call push 4-byte and the first pushl %ebp decrement another 4 bytes from %esp. So %ebp = old %esp - 8 = new %esp + 184.

-104(%ebp) is 16-byte aligned. This is correct assuming %esp is 16-byte aligned. But is that true? Linux stack alignment requirement is 8-byte only.

Anton, can you tell me the value of %esp upon entry?
Comment 2 Anton Korobeynikov 2007-09-13 14:18:33 PDT
Hrm, it seems you're right.

%ebp is 0xbfe24bf8 for me :( Any ideas how this can be fixed? Can result of PR1636 be used to solve this issue?
Comment 3 Evan Cheng 2007-09-13 15:55:27 PDT
We need to dynamically align the stack ptr in the prologue. This means using fp to reference objects in the caller's frame and using sp to reference local objects. If alloca() are used, that means keeping another pointer to reference local objects? I am not sure.

I probably won't be getting to this any time soon. Anton, would you like to take a crack at it? :-)

Comment 4 Anton Korobeynikov 2007-09-14 06:24:45 PDT
This seems to be exactly the task of PR1636. So, if we'll have sane stack realignment facility both PRs will be fixed.

I'll try to dig into this (hope really soon).
Comment 5 Anton Korobeynikov 2007-09-15 16:17:44 PDT
*** Bug 1671 has been marked as a duplicate of this bug. ***
Comment 6 Chris Lattner 2007-10-03 17:09:32 PDT
The GCC folks are discussing stack realignment, some interesting links here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16660
Comment 7 Anton Korobeynikov 2007-10-04 00:00:35 PDT
Another link:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28074
Comment 8 Chris Lattner 2008-02-10 02:11:14 PST
I think this is fixed on mainline, by generating movups instead of movaps when referencing the stack.  Is this true, and if so, is this a code quality bug?
Comment 9 Evan Cheng 2008-02-10 22:00:28 PST
I don't think this is fixed. I believe we can still generate load from stack slots which are *believed* to be 16-byte aligned but are not. The problem is it's isel lowering is assuming the stack ptr is properly aligned upon entry into the function. But this is not true on some x86 targets. For those, e.g. cygwin, the stack ptr must be dynamically aligned if 16-byte aligned stack objects are created.
Comment 10 Anton Korobeynikov 2008-04-23 13:45:33 PDT
This should be fixed by series of patches r50152-r50173.

Stack realignment is currently enabled in 'easy' cases, when no dynamic-sized objects are present.