LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 13392 - SROA of doubles confuses ARM codegen
Summary: SROA of doubles confuses ARM codegen
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Backend: ARM (show other bugs)
Version: trunk
Hardware: All All
: P enhancement
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-07-18 13:29 PDT by Jakob Stoklund Olesen
Modified: 2012-11-19 19:17 PST (History)
6 users (show)

See Also:
Fixed By Commit(s):


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jakob Stoklund Olesen 2012-07-18 13:29:15 PDT
I added a test case to the nightly test suite: SingleSource/Benchmarks/Misc/matmul_f64_4x4.c

A function multiplies two matrices and puts the result in a local buffer before copying it out:

static void mul4(double *Out, const double A[4][4], const double B[4][4]) {
  double Res[16];
  Res[0] = ...
  ...
  Res[15] = ...
  for (n = 0; n < 16; ++n)
    Out[n] = Res[n];
}

SROA converts the double[16] buffer to an i1024:

 %188 = fadd double %184, %187
 %189 = bitcast double %188 to i64
 %190 = zext i64 %189 to i1024
 %191 = shl nuw nsw i1024 %190, 768
 …
 %ins = or i1024 %mask, %191
 %222 = bitcast double* %c to i1024*
 store i1024 %ins, i1024* %222, align 4
 ret void

This completely confuses ARM codegen which is unable to recover the original stores. Instead it copies the doubles to the GPRs and stores i32s:

        vmov    r12, r9, d13
        str.w   r12, [r0, #96]
        str.w   r9, [r0, #100]

We wanted a simple f64 store:

        vstr   d13, [r0, #96]

The problem seems to begin with LegalizeTypes. It breaks the i1024 store into i32 stores because i64 is not a legal ARM type.
Comment 1 Dan Gohman 2012-07-18 15:36:06 PDT
It looks like the problem is in SROA. It isn't supposed to be creating crazy things like i1024 unless it has no better options. In this testcase, "regular" SROA -- splitting double[16] into 16 doubles -- seems to be a much better option.
Comment 2 Jakob Stoklund Olesen 2012-07-18 17:38:06 PDT
So it seems there are two problems.

SROA: Don't do that.
LegalizeTypes: Consider legal floating point and vector types before breaking up giant integers.
Comment 3 Chris Lattner 2012-07-18 19:37:47 PDT
This should be handled at the IR level.  We shouldn't be making SelectionDAG more complicated to try to paper over this.  Selection dag only sees things one basic block at a time, so it can't fix the general case.
Comment 4 Chandler Carruth 2012-11-19 19:17:49 PST
The core issue here (SROA doing bad things to matmul) was fixed by the SROA rewrite.

Note that it in turn exposed some register allocation bugs that caused the benchmark to *slow down* in some rare cases. ;] Anyways, I think this bug is done.