Especially on FP-intensive codes, the code produced by the CWriter (the C backend) looks bad. For example, in the oopack test (with options Max=300000 Matrix=4000 Complex=200000 Iterator=600000), the native build gives:

                              Seconds         Mflops
    Test       Iterations      C    OOP        C      OOP  Ratio
    ----       ----------  ------------  ----------------  -----
    Max            300000    1.4    1.4    209.8    209.8    1.0
    Matrix           4000    1.3    1.3    781.2    751.9    1.0
    Complex        200000    1.3   10.5   1250.0    152.8    8.2
    Iterator       600000    1.4    1.4    863.3    857.1    1.0

and the CWriter build gives:

                              Seconds         Mflops
    Test       Iterations      C    OOP        C      OOP  Ratio
    ----       ----------  ------------  ----------------  -----
    Max            300000    1.5    1.5    202.7    205.5    1.0
    Matrix           4000    2.5    2.5    408.2    393.7    1.0
    Complex        200000    5.2    7.2    310.7    223.5    1.4
    Iterator       600000    1.4    1.4    863.3    857.1    1.0

Note that the Matrix and Complex timings in particular are horrible. Both are compiled with GCC 3.3, -O3 -fomit-frame-pointer. The LLVM optimizer is fairly aggressive with C++ code, so the OOP numbers are not directly comparable, but the C numbers are.
Applied this patch: http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20031006/007966.html

This improves the numbers to:

                              Seconds         Mflops
    Test       Iterations      C    OOP        C      OOP  Ratio
    ----       ----------  ------------  ----------------  -----
    Max            300000    1.5    1.5    202.7    204.1    1.0
    Matrix           4000    2.3    2.5    436.7    393.7    1.1
    Complex        200000    1.3    2.7   1221.4    599.3    2.0
    Iterator       600000    2.9    1.4    415.2    863.3    0.5

It is not clear why the Iterator/C benchmark dropped so much; the generated C code is almost identical. I'm inclined to think this is just weirdness in the GCC code generator, but will look into it. The Iterator/OOP benchmark didn't drop.
Upon inspection of the Iterator regression, it looks like the GCC code generator is just doing something really stupid with the C version (its interactions with the X86 FP stack go poorly); it is not LLVM's fault.
The problem with the Matrix test appears to be that the CFG simplification pass is turning three nicely nested loops:

    for( int i=0; i<L; i++ )
        for( int j=0; j<L; j++ ) {
            double sum = 0;
            for( int k=0; k<L; k++ )
                sum += C[L*i+k]*D[L*k+j];
            E[L*i+j] = sum;
        }

into a single natural loop with three backedges. This should be fixed; matrix multiplication is sorta important for some people. :)
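To make the "single natural loop with three backedges" shape concrete, here is a hand-written C sketch. It is an illustration under stated assumptions, not the actual output of the CFG simplification pass. `matmul_nested` and `matmul_flattened` are hypothetical names: the first is the loop nest from the Matrix test, the second computes the same product but routes all three back edges to one label, `header`. Because a natural loop is identified by its header, back edges that share a header are treated as one loop, so the i/j/k nest is no longer visible to loop optimizations.

```c
/* The loop nest from the oopack Matrix test (row-major L x L matrices). */
void matmul_nested(int L, const double *C, const double *D, double *E) {
    for (int i = 0; i < L; i++)
        for (int j = 0; j < L; j++) {
            double sum = 0;
            for (int k = 0; k < L; k++)
                sum += C[L*i + k] * D[L*k + j];
            E[L*i + j] = sum;
        }
}

/* Hypothetical goto rendering of the same computation as ONE natural loop:
   every back edge targets the single block labeled `header`. */
void matmul_flattened(int L, const double *C, const double *D, double *E) {
    int i = 0, j = 0, k = 0;
    double sum = 0;
    if (L <= 0)
        return;            /* loop guard: the body below runs at least once */
header:                    /* the only loop header */
    sum += C[L*i + k] * D[L*k + j];
    if (++k < L)
        goto header;       /* back edge 1: former k loop */
    E[L*i + j] = sum;
    sum = 0;
    k = 0;
    if (++j < L)
        goto header;       /* back edge 2: former j loop */
    j = 0;
    if (++i < L)
        goto header;       /* back edge 3: former i loop */
}
```

Given the same inputs, both functions write the same E (for example, with L = 2, C = {1, 2, 3, 4} and D = {5, 6, 7, 8}, both yield E = {19, 22, 43, 50}); only the loop structure seen by the optimizer differs.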
All of the C backend specific problems have now been addressed. The generic LLVM improvement necessary to get good performance on the Matrix test has been refiled as Bug 35.