Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SROA of doubles confuses ARM codegen #13764

Closed
stoklund mannequin opened this issue Jul 18, 2012 · 4 comments
Closed

SROA of doubles confuses ARM codegen #13764

stoklund mannequin opened this issue Jul 18, 2012 · 4 comments
Labels
backend:ARM bugzilla Issues migrated from bugzilla

Comments

@stoklund
Copy link
Mannequin

stoklund mannequin commented Jul 18, 2012

Bugzilla Link 13392
Resolution FIXED
Resolved on Nov 19, 2012 19:17
Version trunk
OS All
CC @asl,@chandlerc,@lattner,@sunfishcode

Extended Description

I added a test case to the nightly test suite: SingleSource/Benchmarks/Misc/matmul_f64_4x4.c

A function multiplies two matrices and puts the result in a local buffer before copying it out:

static void mul4(double *Out, const double A[4][4], const double B[4][4]) {
double Res[16];
Res[0] = ...
...
Res[15] = ...
for (n = 0; n < 16; ++n)
Out[n] = Res[n];
}

SROA converts the double[16] buffer to an i1024:

%188 = fadd double %184, %187
%189 = bitcast double %188 to i64
%190 = zext i64 %189 to i1024
%191 = shl nuw nsw i1024 %190, 768

%ins = or i1024 %mask, %191
%222 = bitcast double* %c to i1024*
store i1024 %ins, i1024* %222, align 4
ret void

This completely confuses ARM codegen which is unable to recover the original stores. Instead it copies the doubles to the GPRs and stores i32s:

    vmov    r12, r9, d13
    str.w   r12, [r0, #&#8203;96]
    str.w   r9, [r0, #&#8203;100]

We wanted a simple f64 store:

    vstr   d13, [r0, #&#8203;96]

The problem seems to begin with LegalizeTypes. It breaks the i1024 store into i32 stores because i64 is not a legal ARM type.

@sunfishcode
Copy link
Member

It looks like the problem is in SROA. It isn't supposed to be creating crazy things like i1024 unless it has no better options. In this testcase, "regular" SROA -- splitting double[16] into 16 doubles -- seems to be a much better option.

@stoklund
Copy link
Mannequin Author

stoklund mannequin commented Jul 19, 2012

So it seems there are two problems.

SROA: Don't do that.
LegalizeTypes: Consider legal floating point and vector types before breaking up giant integers.

@lattner
Copy link
Collaborator

lattner commented Jul 19, 2012

This should be handled at the IR level. We shouldn't be making SelectionDAG more complicated to try to paper over this. Selection dag only sees things one basic block at a time, so it can't fix the general case.

@chandlerc
Copy link
Member

The core issue here (SROA doing bad things to matmul) was fixed by the SROA rewrite.

Note that it in turn exposed some register allocation bugs that caused the benchmark to slow down in some rare cases. ;] Anyways, I think this bug is done.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:ARM bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

3 participants