31 – CWriter generated code performs poorly

Summary: CWriter generated code performs poorly

Status:	RESOLVED FIXED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Backend: C
Version:	1.0
Hardware:	PC Linux

Importance:	P normal
Assignee:	Chris Lattner

Reported:	2003-10-11 23:11 PDT by Chris Lattner
Modified:	2010-02-22 12:47 PST

Description Chris Lattner 2003-10-11 23:11:35 PDT

The C Writer looks bad, especially on FP-intensive codes.  For example, in the
oopack test (with options: Max=300000 Matrix=4000 Complex=200000
Iterator=600000), the native build gives this:

                         Seconds       Mflops         
Test       Iterations     C    OOP     C    OOP  Ratio
----       ----------  -----------  -----------  -----
Max            300000    1.4   1.4  209.8 209.8    1.0
Matrix           4000    1.3   1.3  781.2 751.9    1.0
Complex        200000    1.3  10.5 1250.0 152.8    8.2
Iterator       600000    1.4   1.4  863.3 857.1    1.0

And the CWriter produces this:

                         Seconds       Mflops         
Test       Iterations     C    OOP     C    OOP  Ratio
----       ----------  -----------  -----------  -----
Max            300000    1.5   1.5  202.7 205.5    1.0
Matrix           4000    2.5   2.5  408.2 393.7    1.0
Complex        200000    5.2   7.2  310.7 223.5    1.4
Iterator       600000    1.4   1.4  863.3 857.1    1.0

Note that the Matrix and Complex C timings are especially bad: Matrix goes from
1.3 to 2.5 seconds and Complex from 1.3 to 5.2 seconds.

Both are compiled with GCC 3.3, -O3 -fomit-frame-pointer.  Note that the LLVM
optimizer is being fairly aggressive with C++ code, so the OOP numbers are not
directly comparable, but the C numbers are.

Comment 1 Chris Lattner 2003-10-11 23:38:41 PDT

Applied this patch:
http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20031006/007966.html

This improves the numbers to this:

                         Seconds       Mflops         
Test       Iterations     C    OOP     C    OOP  Ratio
----       ----------  -----------  -----------  -----
Max            300000    1.5   1.5  202.7 204.1    1.0
Matrix           4000    2.3   2.5  436.7 393.7    1.1
Complex        200000    1.3   2.7 1221.4 599.3    2.0
Iterator       600000    2.9   1.4  415.2 863.3    0.5

It is not clear why the Iterator/C benchmark dropped so much; the generated C
code is almost identical.  I'm inclined to think this is just weirdness in the
GCC code generator, but will look into it.  The Iterator/OOP benchmark didn't drop.

Comment 2 Chris Lattner 2003-10-11 23:48:09 PDT

Upon inspection of the Iterator regression, it just looks like the GCC code
generator is doing something really stupid with the C version (interactions with
the X86 FP stack go poorly); it's nothing that is LLVM's fault.

Comment 3 Chris Lattner 2003-10-12 14:38:13 PDT

The problem with the matrix test appears to be due to the fact that the CFG
simplification pass is turning three nicely nested loops:

    for( int i=0; i<L; i++ )
        for( int j=0; j<L; j++ ) {
            double sum = 0;
            for( int k=0; k<L; k++ )
                sum += C[L*i+k]*D[L*k+j];
            E[L*i+j] = sum;
        }

into a single natural loop with three backedges.  This should be fixed; matrix
multiplication is sorta important for some people. :)
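
To make the shape of the problem concrete, here is a rough, hand-written sketch
in C.  It is not the code LLVM or the CWriter actually emitted; the function
names matmul_nested and matmul_flat are made up for illustration, and the
flattened form is only an assumption about what "one natural loop with three
backedges" looks like once it is printed back out as C.  Both functions compute
the same result, but the second one gives the downstream compiler a single loop
header to reason about instead of three nested loops.

```c
/* Reference form: three nested loops, as in the oopack Matrix kernel. */
void matmul_nested(int L, const double *C, const double *D, double *E) {
    for (int i = 0; i < L; i++)
        for (int j = 0; j < L; j++) {
            double sum = 0;
            for (int k = 0; k < L; k++)
                sum += C[L*i + k] * D[L*k + j];
            E[L*i + j] = sum;
        }
}

/* Hypothetical flattened form: one loop header and three backedges.
 * This is an illustration of the control-flow shape only, not actual
 * compiler output. */
void matmul_flat(int L, const double *C, const double *D, double *E) {
    int i = 0, j = 0, k = 0;
    double sum = 0;
    if (L <= 0)
        return;
header:
    sum += C[L*i + k] * D[L*k + j];
    if (++k < L)
        goto header;            /* backedge 1: next k */
    E[L*i + j] = sum;           /* row/column dot product is complete */
    sum = 0;
    k = 0;
    if (++j < L)
        goto header;            /* backedge 2: next j */
    j = 0;
    if (++i < L)
        goto header;            /* backedge 3: next i */
}
```

Calling both functions on the same L x L inputs should fill E identically.  The
point of the sketch is that the nesting, and with it the obvious inner loop over
k, is no longer visible in the second form.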

Comment 4 Chris Lattner 2003-10-12 20:13:06 PDT

All of the C backend specific problems have now been addressed.  The generic
LLVM improvement necessary to get good performance on the Matrix test has been
refiled as Bug 35.