33668 – Excessive memory and CPU use in tail duplication and associated passes due to critical edge splitting

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 33668 - Excessive memory and CPU use in tail duplication and associated passes due to critical edge splitting

Summary: Excessive memory and CPU use in tail duplication and associated passes due to...

Status:	NEW

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Common Code Generator Code (show other bugs)
Version:	trunk
Hardware:	PC Linux

Importance:	P enhancement
Assignee:	Kyle Butt

URL:
Keywords:	slow-compile

Depends on:
Blocks:	release-5.0.1
	Show dependency tree

Reported:	2017-07-01 03:50 PDT by Jörg Sonnenberger
Modified:	2018-02-02 01:58 PST (History)
CC List:	8 users (show)

See Also:
Fixed By Commit(s):

Attachments
Compile for AMD64 with -O2 (238.28 KB, text/x-csrc) 2017-07-01 03:50 PDT, Jörg Sonnenberger	Details
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Jörg Sonnenberger 2017-07-01 03:50:38 PDT

Created attachment 18740 [details]
Compile for AMD64 with -O2

The attached (unreduced) test case needs 1.6GB peak RAM and ~3min on my laptop after r296416, before it took 100MB and 1s.

Comment 1 Michael Kuperstein 2017-07-06 11:25:41 PDT

I think critical edge splitting is doing the right thing here, and splitting the critical edges enables (early) tail duplication of the computed goto.

Now, in theory, that also seems like the right thing to do, since this allows the computed gotos to be better predicted, which is pretty much the point of the whole exercise. But it looks like actually performing the tail duplication is slow, and regalloc can't really handle the result well:

  73.0840 ( 43.7%)   0.1280 ( 12.3%)  73.2120 ( 43.5%)  73.2136 ( 43.5%)  Simple Register Coalescing
  37.5720 ( 22.5%)   0.6480 ( 62.3%)  38.2200 ( 22.7%)  38.2208 ( 22.7%)  Tail Duplication
  32.3400 ( 19.4%)   0.1080 ( 10.4%)  32.4480 ( 19.3%)  32.4459 ( 19.3%)  Eliminate PHI nodes for register allocation

Passing -disable-early-taildup makes this go away, so regalloc is fine with the edges just being split, without duplication:
   3.5280 ( 66.2%)   0.0000 (  0.0%)   3.5280 ( 65.3%)   3.5313 ( 65.4%)  Branch Probability Basic Block Placement
   0.4120 (  7.7%)   0.0040 (  5.3%)   0.4160 (  7.7%)   0.4154 (  7.7%)  Simple Register Coalescing
   0.3080 (  5.8%)   0.0120 ( 15.8%)   0.3200 (  5.9%)   0.3196 (  5.9%)  Machine Block Frequency Analysis

I'm not really sure what we want to do here, but I'd say the thing that should be bailing early here is taildup, not edge splitting. Adding Kyle as the resident taildup expert.

Comment 2 Hans Wennborg 2017-08-23 15:33:32 PDT

Kyle, did you have a chance to look into this?

Comment 3 Hans Wennborg 2017-08-29 10:02:54 PDT

Unblocking 5.0.0 as there seems to be no interest in fixing :-/