Code for executing the destructors of objects on the stack is duplicated for each throw point, when it could be largely shared. For example:

struct F { ~F(); };
void couldThrow();
void test() {
  F A;
  couldThrow();
  F B;
  couldThrow();
  F C;
  couldThrow();
}
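To make the sharing opportunity concrete, here is a small runnable variant of the testcase. It is only a sketch: the `name` member, the `destroyed` log, the `throwAt` counter and the `destructionOrder` helper are hypothetical additions for illustration and are not part of the original testcase. It records which destructors run when each of the three calls throws.

```cpp
#include <stdexcept>
#include <string>

// Log of destructor calls, so the unwind path is observable (illustrative only).
static std::string destroyed;

struct F {
    char name;                        // hypothetical member, used only for logging
    explicit F(char n) : name(n) {}
    ~F() { destroyed += name; }
};

// Hypothetical stand-in for couldThrow(): throws on call number throwAt.
static int callCount = 0;
static int throwAt = 0;

void couldThrow() {
    if (++callCount == throwAt) throw std::runtime_error("thrown by couldThrow");
}

void test() {
    F A('A'); couldThrow();
    F B('B'); couldThrow();
    F C('C'); couldThrow();
}

// Runs test() with the throw at call number n (0 means never throw)
// and returns the order in which the destructors ran.
std::string destructionOrder(int n) {
    destroyed.clear();
    callCount = 0;
    throwAt = n;
    try {
        test();
    } catch (const std::runtime_error&) {
        // Swallow the exception; only the destructor order matters here.
    }
    return destroyed;
}
```

Calling `destructionOrder(1)`, `destructionOrder(2)` and `destructionOrder(3)` yields "A", "BA" and "CBA". Each cleanup sequence is a tail of the next one, which is why the cleanup for a later throw point can, in principle, branch into the cleanup of an earlier one instead of repeating it.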
I'm upgrading this to 'normal' severity, and targeting it to 1.1 (despite it being a QOI issue), because this bug makes the C++ front-end almost useless for some programs.
A great stress test for LLVM in general, and for the C++ front-end specifically, is g++.dg/eh/cleanup1.C in the GCC testsuite.

-Chris
Another test this should speed up is: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8361
On the 8361 bug, I get this:

$ time llvmg++ generate-3.4.ii -S -ftime-report

Execution times (seconds)
 preprocessing :   0.15 ( 0%) usr   0.02 ( 1%) sys   0.34 ( 0%) wall
 parser        :  12.56 ( 7%) usr   0.53 (31%) sys  21.29 ( 9%) wall
 name lookup   :   2.86 ( 2%) usr   0.82 (48%) sys   5.64 ( 2%) wall
 expand        : 163.80 (91%) usr   0.26 (15%) sys 207.74 (88%) wall
 varconst      :   0.20 ( 0%) usr   0.03 ( 2%) sys   0.33 ( 0%) wall
 integration   :   0.28 ( 0%) usr   0.00 ( 0%) sys   0.21 ( 0%) wall
 TOTAL         : 180.16           1.72          235.91
180.162u 1.736s 3:55.88 77.1% 0+0k 0+0io 1035pf+0w

... and a 26M .s file. :)

-Chris
For the record, this is the code currently generated for the simple testcase:

void %_Z4testv() {
entry:
        %A = alloca %struct.F           ; <%struct.F*> [#uses=6]
        %B = alloca %struct.F           ; <%struct.F*> [#uses=4]
        %C = alloca %struct.F           ; <%struct.F*> [#uses=2]
        invoke void %_Z10couldThrowv( )
                        to label %invoke_cont.0 except label %invoke_catch.0

invoke_catch.0:         ; preds = %entry
        invoke void %_ZN1FD1Ev( %struct.F* %A )
                        to label %rethrow except label %terminate

invoke_cont.0:          ; preds = %entry
        invoke void %_Z10couldThrowv( )
                        to label %invoke_cont.1 except label %invoke_catch.1

invoke_catch.1:         ; preds = %invoke_cont.0
        invoke void %_ZN1FD1Ev( %struct.F* %B )
                        to label %invoke_cont.11 except label %terminate

invoke_cont.11:         ; preds = %invoke_catch.1
        invoke void %_ZN1FD1Ev( %struct.F* %A )
                        to label %rethrow except label %terminate

invoke_cont.1:          ; preds = %invoke_cont.0
        invoke void %_Z10couldThrowv( )
                        to label %invoke_cont.2 except label %invoke_catch.2

invoke_catch.2:         ; preds = %invoke_cont.1
        invoke void %_ZN1FD1Ev( %struct.F* %C )
                        to label %invoke_cont.8 except label %terminate

invoke_cont.8:          ; preds = %invoke_catch.2
        invoke void %_ZN1FD1Ev( %struct.F* %B )
                        to label %invoke_cont.9 except label %terminate

invoke_cont.9:          ; preds = %invoke_cont.8
        invoke void %_ZN1FD1Ev( %struct.F* %A )
                        to label %rethrow except label %terminate

invoke_cont.2:          ; preds = %invoke_cont.1
        invoke void %_ZN1FD1Ev( %struct.F* %C )
                        to label %invoke_cont.3 except label %invoke_catch.3

invoke_catch.3:         ; preds = %invoke_cont.2
        invoke void %_ZN1FD1Ev( %struct.F* %B )
                        to label %invoke_cont.6 except label %terminate

invoke_cont.6:          ; preds = %invoke_catch.3
        invoke void %_ZN1FD1Ev( %struct.F* %A )
                        to label %rethrow except label %terminate

invoke_cont.3:          ; preds = %invoke_cont.2
        invoke void %_ZN1FD1Ev( %struct.F* %B )
                        to label %invoke_cont.4 except label %invoke_catch.4

invoke_catch.4:         ; preds = %invoke_cont.3
        invoke void %_ZN1FD1Ev( %struct.F* %A )
                        to label %rethrow except label %terminate

invoke_cont.4:          ; preds = %invoke_cont.3
        call void %_ZN1FD1Ev( %struct.F* %A )
        ret void

rethrow:                ; preds = %invoke_catch.0, %invoke_cont.11, %invoke_cont.9, %invoke_cont.6, %invoke_catch.4
        unwind

terminate:              ; preds = %invoke_catch.0, %invoke_catch.1, %invoke_cont.11, %invoke_catch.2, %invoke_cont.8, %invoke_cont.9, %invoke_catch.3, %invoke_cont.6, %invoke_catch.4
        call void %__llvm_cxxeh_call_terminate( )
        ret void
}
Created attachment 40 [details]
Patch to simplify and unify all of the EH "cleanup" handling machinery

This patch is a prerequisite for the actual grunt work required by Bug 11. This should not affect functionality in any significant way, though a few dead blocks are no longer emitted.

-Chris
Ok, this follow-on patch is also useful:

$ diff -u llvm-expand.c~ llvm-expand.c
--- llvm-expand.c~      2003-11-26 19:16:43.000000000 -0600
+++ llvm-expand.c       2003-11-26 22:29:20.000000000 -0600
@@ -739,7 +739,12 @@
      */
     switch (thisblock->desc) {
     case BLOCK_NESTING:
-      DeleteThisFixup = (f->target_bb == thisblock->x.block.StartBlock);
+      /* If the header for this block is not shared with an outer block, then
+       * the fixup is done if it is branching to the start block.
+       */
+      if (!thisblock->next ||
+          thisblock->next->x.block.StartBlock != thisblock->x.block.StartBlock)
+        DeleteThisFixup = (f->target_bb == thisblock->x.block.StartBlock);
       break;
     case LOOP_NESTING:
Created attachment 41 [details]
Fix a bug with the previous patch, start sharing cleanups finally!!

This patch builds on the previous ones posted on this bug to actually start sharing cleanups. With this patch, we reduce the testcase down to the code below, which can still be improved but is better. On testcases other than this one, it makes a huge difference. I will continue working on this until the testcase below is as good as I can get it.

void %_Z4testv() {
entry:
        %A = alloca %struct.F           ; <%struct.F*> [#uses=4]
        %B = alloca %struct.F           ; <%struct.F*> [#uses=3]
        %C = alloca %struct.F           ; <%struct.F*> [#uses=2]
        invoke void %_Z10couldThrowv( )
                        to label %invoke_cont.0 except label %invoke_catch.0

invoke_catch.0:         ; preds = %entry, %invoke_cont.3
        invoke void %_ZN1FD1Ev( %struct.F* %A )
                        to label %rethrow except label %terminate

invoke_cont.0:          ; preds = %entry
        invoke void %_Z10couldThrowv( )
                        to label %invoke_cont.1 except label %invoke_catch.1

invoke_catch.1:         ; preds = %invoke_cont.0, %invoke_cont.2
        invoke void %_ZN1FD1Ev( %struct.F* %B )
                        to label %invoke_cont.8 except label %terminate

invoke_cont.8:          ; preds = %invoke_catch.1
        invoke void %_ZN1FD1Ev( %struct.F* %A )
                        to label %rethrow except label %terminate

invoke_cont.1:          ; preds = %invoke_cont.0
        invoke void %_Z10couldThrowv( )
                        to label %invoke_cont.2 except label %invoke_catch.2

invoke_catch.2:         ; preds = %invoke_cont.1
        invoke void %_ZN1FD1Ev( %struct.F* %C )
                        to label %invoke_cont.5 except label %terminate

invoke_cont.5:          ; preds = %invoke_catch.2
        invoke void %_ZN1FD1Ev( %struct.F* %B )
                        to label %invoke_cont.6 except label %terminate

invoke_cont.6:          ; preds = %invoke_cont.5
        invoke void %_ZN1FD1Ev( %struct.F* %A )
                        to label %rethrow except label %terminate

invoke_cont.2:          ; preds = %invoke_cont.1
        invoke void %_ZN1FD1Ev( %struct.F* %C )
                        to label %invoke_cont.3 except label %invoke_catch.1

invoke_cont.3:          ; preds = %invoke_cont.2
        invoke void %_ZN1FD1Ev( %struct.F* %B )
                        to label %invoke_cont.4 except label %invoke_catch.0

invoke_cont.4:          ; preds = %invoke_cont.3
        call void %_ZN1FD1Ev( %struct.F* %A )
        ret void

rethrow:                ; preds = %invoke_catch.0, %invoke_cont.8, %invoke_cont.6
        unwind

terminate:              ; preds = %invoke_catch.0, %invoke_catch.1, %invoke_cont.8, %invoke_catch.2, %invoke_cont.5, %invoke_cont.6
        call void %__llvm_cxxeh_call_terminate( )
        ret void
}
FWIW, on the GCC 8361 bug, I now get this:

$ time llvmg++ generate-3.4.ii -S -ftime-report

Execution times (seconds)
 preprocessing :   0.12 ( 0%) usr   0.04 ( 3%) sys   0.20 ( 0%) wall
 parser        :  12.05 ( 7%) usr   0.48 (33%) sys  12.49 ( 7%) wall
 name lookup   :   2.93 ( 2%) usr   0.80 (55%) sys   3.75 ( 2%) wall
 expand        : 152.52 (91%) usr   0.07 ( 5%) sys 152.66 (90%) wall
 varconst      :   0.22 ( 0%) usr   0.01 ( 1%) sys   0.19 ( 0%) wall
 integration   :   0.17 ( 0%) usr   0.01 ( 1%) sys   0.22 ( 0%) wall
 TOTAL         : 168.29           1.45          169.92
168.292u 1.464s 2:49.90 99.9% 0+0k 0+0io 1042pf+0w

... and a 24M .s file. This is a 7% speedup in the expand phase, and an 8% reduction in .s file size.

-Chris
Created attachment 42 [details] Final cleanup sharing patch This patch builds on the previous two to eliminate as much duplication in the cleanup emission as is possible right now. In the simple testcase, we get optimal results, and you can't complain about that. :)
With all three patches applied (over 1180 lines of diff in aggregate), I now get the function below on the simple testcase, which cannot be improved any further. It now contains 11 basic blocks total, compared to the 17 I started out with (and it should scale linearly now, not N^2). Though it's not a huge win on this testcase, it is the minimum, and it helps more on other, larger examples. For example, on the large 8361 testcase, the G++ front-end speeds up by 8% and produces a 10% smaller .s file.

The C front-end cleanup handling code has been much improved, and is now a lot simpler than it was when I started out on this odyssey.

Simple N^2 example output (contrast with http://llvm.cs.uiuc.edu/bugs/show_bug.cgi?id=11#c5):

void %_Z4testv() {
entry:
        %A = alloca %struct.F           ; <%struct.F*> [#uses=2]
        %B = alloca %struct.F           ; <%struct.F*> [#uses=2]
        %C = alloca %struct.F           ; <%struct.F*> [#uses=2]
        invoke void %_Z10couldThrowv( )
                        to label %invoke_cont.0 except label %invoke_catch.0

invoke_catch.0:         ; preds = %entry, %invoke_catch.1, %invoke_cont.3
        invoke void %_ZN1FD1Ev( %struct.F* %A )
                        to label %rethrow except label %terminate

invoke_cont.0:          ; preds = %entry
        invoke void %_Z10couldThrowv( )
                        to label %invoke_cont.1 except label %invoke_catch.1

invoke_catch.1:         ; preds = %invoke_cont.0, %invoke_catch.2, %invoke_cont.2
        invoke void %_ZN1FD1Ev( %struct.F* %B )
                        to label %invoke_catch.0 except label %terminate

invoke_cont.1:          ; preds = %invoke_cont.0
        invoke void %_Z10couldThrowv( )
                        to label %invoke_cont.2 except label %invoke_catch.2

invoke_catch.2:         ; preds = %invoke_cont.1
        invoke void %_ZN1FD1Ev( %struct.F* %C )
                        to label %invoke_catch.1 except label %terminate

invoke_cont.2:          ; preds = %invoke_cont.1
        invoke void %_ZN1FD1Ev( %struct.F* %C )
                        to label %invoke_cont.3 except label %invoke_catch.1

invoke_cont.3:          ; preds = %invoke_cont.2
        invoke void %_ZN1FD1Ev( %struct.F* %B )
                        to label %invoke_cont.4 except label %invoke_catch.0

invoke_cont.4:          ; preds = %invoke_cont.3
        call void %_ZN1FD1Ev( %struct.F* %A )
        ret void

rethrow:                ; preds = %invoke_catch.0
        unwind

terminate:              ; preds = %invoke_catch.0, %invoke_catch.1, %invoke_catch.2
        call void %__llvm_cxxeh_call_terminate( )
        ret void
}
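The 17 and 11 block counts, and the N^2 versus linear scaling, can be reproduced with a small counting model. This is only a sketch read off the listings in this bug, assuming a function shaped like the testcase (n locals with destructors, each followed by one call that may throw). The function names are hypothetical and nothing here is computed by the front-end itself.

```cpp
// Blocks common to both schemes: entry, one continuation block per invoke
// (n couldThrow() calls plus n-1 invoked destructors on the normal path),
// and the shared rethrow and terminate blocks. Assumes n >= 1.
unsigned commonBlocks(unsigned n) {
    return 1 + (2 * n - 1) + 2;
}

// Without sharing, a throw point with k live objects gets its own private
// chain of k cleanup blocks.
unsigned unsharedBlocks(unsigned n) {
    unsigned cleanups = 0;
    // The k-th couldThrow() call throws with k objects live.
    for (unsigned k = 1; k <= n; ++k) cleanups += k;
    // A normal-path destructor throws with k objects still live, k = 1..n-1.
    for (unsigned k = 1; k + 1 <= n; ++k) cleanups += k;
    return commonBlocks(n) + cleanups;
}

// With sharing, there is one cleanup block per object, and each block
// branches into the cleanup block of the previously constructed object.
unsigned sharedBlocks(unsigned n) {
    return commonBlocks(n) + n;
}
```

For n = 3 this gives 17 and 11, matching the two listings. Under this model the unshared count is n^2 + 2n + 2 and the shared count is 3n + 2, so for n = 10 the difference is already 122 blocks versus 32.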