39545 – Make @llvm.eh.typeid.for outlining-friendly

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 39545 - Make @llvm.eh.typeid.for outlining-friendly

Summary: Make @llvm.eh.typeid.for outlining-friendly

Status:	RESOLVED LATER

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Interprocedural Optimizations (show other bugs)
Version:	trunk
Hardware:	PC All

Importance:	P normal
Assignee:	Vedant Kumar

URL:
Keywords:

Duplicates (1):	39917 (view as bug list)
Depends on:
Blocks:

Reported:	2018-11-02 16:32 PDT by Vedant Kumar
Modified:	2019-01-17 14:19 PST (History)
CC List:	11 users (show)

See Also:
Fixed By Commit(s):

Attachments
IR with hot/cold splitting (28.42 KB, text/plain) 2018-11-02 16:32 PDT, Vedant Kumar	Details
IR without hot/cold splitting (26.28 KB, text/plain) 2018-11-02 16:33 PDT, Vedant Kumar	Details
Patches to lift EH type tables into IR and share them (7.21 KB, application/x-gzip) 2019-01-17 11:42 PST, Vedant Kumar	Details
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Vedant Kumar 2018-11-02 16:32:53 PDT

Created attachment 21077 [details]
IR with hot/cold splitting

When running this test program with -hot-cold-split=true (with https://reviews.llvm.org/D53887 applied), we’re supposed to see this output:

```
Caught exception: 0: base class
Caught exception: 1: base class
Caught exception: 2: base class
Caught exception: 3: base class
Caught exception: 4: base class
Caught exception: 5: base class
Caught exception: 6: base class
Caught exception: 7: base class
Caught exception: 8: base class
Caught exception: 9: base class
Caught exception: 0: derived class
Caught exception: 1: derived class
Caught exception: 2: derived class
Caught exception: 3: derived class
Caught exception: 4: derived class
Caught exception: 5: derived class
Caught exception: 6: derived class
Caught exception: 7: derived class
Caught exception: 8: derived class
Caught exception: 9: derived class
Caught exception: 0: base class
Caught exception: std::exception
Caught exception: std::exception
Caught unknown exception
Caught unknown exception
```

Instead, we see this:

```
Caught unknown exception
Caught unknown exception
Caught unknown exception
Caught unknown exception
Caught unknown exception
Caught unknown exception
Caught unknown exception
Caught unknown exception
Caught unknown exception
Caught unknown exception
Caught exception: 0: derived class
Caught exception: 1: derived class
Caught exception: 2: derived class
Caught exception: 3: derived class
Caught exception: 4: derived class
Caught exception: 5: derived class
Caught exception: 6: derived class
Caught exception: 7: derived class
Caught exception: 8: derived class
Caught exception: 9: derived class
Caught unknown exception
Caught exception: std::exception
Caught exception: std::exception
<segfault>
```

Here’s where we crash:

Regression-C++-class_hierarchy`main.cold.2:
    0x100001d47 <+0>:  pushq  %rbp
    0x100001d48 <+1>:  movq   %rsp, %rbp
    0x100001d4b <+4>:  pushq  %rbx
    0x100001d4c <+5>:  pushq  %rax
    0x100001d4d <+6>:  cmpl   $0x1, %edi
    0x100001d50 <+9>:  jne    0x100001d6c               ; <+37>
    0x100001d52 <+11>: movq   %rsi, %rdi
    0x100001d55 <+14>: callq  0x100001dcc               ; symbol stub for: __cxa_begin_catch
    0x100001d5a <+19>: movb   0x8(%rax), %cl
    0x100001d5d <+22>: addb   $0x30, %cl
    0x100001d60 <+25>: movq   0x10(%rax), %rdx
->  0x100001d64 <+29>: movb   %cl, (%rdx)
 
In the outlined catch path, we pass 0x101 (why?) to __cxa_begin_catch:

(lldb) reg read $rdi
     rdi = 0x0000000000000101

__cxa_begin_catch then returns a pointer to:

(lldb) x/8 $rax
0x100202e60: 0x00001f04 0x00000001 0x00000000 0x00000000
0x100202e70: 0x00000000 0x00000000 0x00000000 0x00000000

Clearly [$rax+16] is null, and we crash when we attempt to dereference that. But something looks like it's gone wrong by the point __cxa_begin_catch is called.

Looking at the IR before and after outlining, the issue isn't immediately jumping out at me. I've filed this because I'm worried this isn't just an issue with D53887, i.e. that there might be a more general problem with how CodeExtractor handles landingpads. I've attached the IR here.

Comment 1 Vedant Kumar 2018-11-02 16:33:28 PDT

Created attachment 21078 [details]
IR without hot/cold splitting

Comment 2 hiraditya 2018-11-02 22:09:09 PDT

Looking at the IR I think Code Extractor (CE) isn't handling Multiple Exit Regions correctly. Do we have examples/testcases for ME regions which work? How does CE handle branches outside the region, for e.g, does it insert return statements properly? Also, how about the side-effects (stores etc.); for ME the side-effects also flow from multiple paths and CE may not be handling those correctly.

Comment 3 Vedant Kumar 2018-11-04 14:53:53 PST

(In reply to hiraditya from comment #2)
> Looking at the IR I think Code Extractor (CE) isn't handling Multiple Exit
> Regions correctly.

I missed this. What do you think looks suspect?

> Do we have examples/testcases for ME regions which work?

Here's one: https://github.com/llvm-mirror/llvm/blob/master/test/Transforms/HotColdSplit/multiple-exits.ll

> How does CE handle branches outside the region, for e.g, does it insert
> return statements properly?

Skimming emitCallAndSwitchStatement, it looks like CE finds all exits which are outside of the to-be-extracted region and creates stub blocks for them in the outlined function. Branches are rewritten to jump to the right exit stub, and the exit stub returns a number which the caller can switch on. There's some logic to simplify/eliminate the switch in simple cases.

> Also, how about the side-effects (stores etc.);
> for ME the side-effects also flow from multiple paths and CE may not be
> handling those correctly.

I'm not sure whether there's an issue here, but fwiw, I'm seeing the incorrect results from single-exit outlined regions containing __cxa_throw.

Comment 4 Vedant Kumar 2018-11-04 17:09:43 PST

I'm starting to suspect that the underlying bug here is in the lowering of @llvm.eh.typeid.for.

In SelectionDAG, we look up type id's within a MachineFunction:

```
  case Intrinsic::eh_typeid_for: {
    // Find the type id for the given typeinfo.
    GlobalValue *GV = ExtractTypeInfo(I.getArgOperand(0));
    unsigned TypeID = DAG.getMachineFunction().getTypeIDFor(GV);
    Res = DAG.getConstant(TypeID, sdl, MVT::i32);
    setValue(&I, Res);
    return nullptr;
  }
```

And MachineFunction gives us the next available ID unless it's seen the type info already:

```
unsigned MachineFunction::getTypeIDFor(const GlobalValue *TI) {
  for (unsigned i = 0, N = TypeInfos.size(); i != N; ++i)
    if (TypeInfos[i] == TI) return i + 1;

  TypeInfos.push_back(TI);
  return TypeInfos.size();
}
```

But, I think this breaks down when you enable outlining. We're passing a type id retrieved from a landingpad in a parent function, to an outlined thunk:

```
  %3 = tail call i32 @llvm.eh.typeid.for(i8* bitcast ({ i8*, i8* }* @_ZTI3foo to i8*)) #11
  ...
  call void @main.cold.1(i32 %2, i8* %1, i32 %i.056) #8
```

The outlined thunk tries to compare a passed-in type id to one synthesized within the outlined code:

```
define internal void @main.cold.1(i32, i8*, i32 %i.056) #7 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
newFuncRoot:
  %2 = tail call i32 @llvm.eh.typeid.for(i8* bitcast (i8** @_ZTIi to i8*)) #11
  %matches1 = icmp eq i32 %0, %2
  ...
```

At which point it's comparing the wrong ids, because the outlined thunk's MachineFunction has an incompatible TypeInfos list. I think the fix here would be to ensure that an outlined thunk has the same type info -> ID map as its parent.

Full disclosure -- I've been looking at a failure in SingleSource/Regression/C++/EH/throw_rethrow_test.cpp to debug this. It's the only other test-suite failure with D53887 applied, and it's simpler to understand.

The reference output is:

```
0: 1
1: 1
2: 1
3: 2
4: 2
5: 2
6: 3
7: 3
8: 3
exit 0
```

With D53887, the actual output is:

```
0: 2
1: 2
2: 2
3: 1
4: 1
5: 1
6: 3
7: 3
8: 3
exit 0
```

Comment 5 Reid Kleckner 2018-11-05 10:24:55 PST

(In reply to Vedant Kumar from comment #4)
> I'm starting to suspect that the underlying bug here is in the lowering of
> @llvm.eh.typeid.for.
> 
> In SelectionDAG, we look up type id's within a MachineFunction:
> 
> ...
> 
> And MachineFunction gives us the next available ID unless it's seen the type
> info already:
> ...
> But, I think this breaks down when you enable outlining. We're passing a
> type id retrieved from a landingpad in a parent function, to an outlined
> thunk:

Yes, as designed, landingpad + llvm.eh.typeid.for aren't going to work in the presence of outlining. They were designed with the goal of making *inlining* easy, and I guess outlining was out of scope. :) llvm.eh.typeid.for produces a selector index into a table in the LSDA of the function it is contained in. Moving it to a different function will give you an index into a different catch info table.

John, any ideas for how to address this?

Here's my (probably bad) idea. Make a new intrinsic, llvm.eh.outlined.typeid.for, that takes two parameters: the function whose LSDA we want to get the selector index from, and the typeinfo whose index we want. So:
  call i32 @llvm.eh.outlined.typeid.for(i8* bitcast ... @main, i8* bitcast ... @_ZTI3foo)

Right now we lower these intrinsics to plain immediate i32s. This new intrinsic would lower to materialize an MCSymbol into an i32, kind of like we do for llvm.localescape / llvm.localrecover, but more efficient since we promise it's an i32. We'd try to generate:
  MOV32ri $.Leh_selector_main_ZTI3foo, %EDX

And when we compile main, we'd define the MCSymbol so the assembler can fix it up for us later:
  .Leh_selector_main_ZTI3foo = 3 # or .set depending on your assembly dialect

At least, that was my preferred way to pass information from one MachineFunction compilation to the next: by abusing MC, since the assembler is the last whole-module thing that LLVM does.

Another idea would be to emit a table of i32s on the side that does translation from the selectors of one function to the selectors of any other. We'd emit them at the end of the module after we've codegened all instances of this intrinsic.

Comment 6 John McCall 2018-11-05 10:58:26 PST

This is IR-level outlining?  And you're outlining part of the landing pad, but not the original call?

So, from a high level, the best way to "outline" this — to achieve better code density/locality of the non-EH code — is almost certainly just to emit all the code only reachable via landing pads in a separate top-level object which can be placed far away from the normal function.  That would, of course, have to be done in the code generator, not as an IR outliner, so it's not directly applicable to your question here.

I see two reasonable approaches for this, both of which require extra work during outlining:

- Adopt Reid's @llvm.eh.outlined.typeid.for idea, which I think would require typeid assignments to be recorded globally.
- Add some sort of @llvm.eh.typeid.declare intrinsic or attribute which assigns specific integer values to specific types and then replaces @llvm.eh.typeid.for calls with the assigned constant.

The second idea is probably cleaner.  The big disadvantage is that it's totally incompatible with further inlining.  The advantage is that the assignment can happen in its own pass, which would allow ordinary intraprocedural optimizations to be run after it, such as switch formation.  The outliner then only has to (1) copy the assignment into the outlined function and (2) refuse to outline @llvm.eh.typeid.for calls if for some reason they still exist.

Comment 7 Vedant Kumar 2018-11-05 11:34:57 PST

(In reply to John McCall from comment #6)
> This is IR-level outlining?  And you're outlining part of the landing pad,
> but not the original call?

¥es.


> So, from a high level, the best way to "outline" this — to achieve better
> code density/locality of the non-EH code — is almost certainly just to emit
> all the code only reachable via landing pads in a separate top-level object
> which can be placed far away from the normal function.  That would, of
> course, have to be done in the code generator, not as an IR outliner, so
> it's not directly applicable to your question here.

Why is this out of scope for an IR outliner? With a small linker change, it's possible to order outlined functions towards the end of the __text section.


> I see two reasonable approaches for this, both of which require extra work
> during outlining:
> 
> - Adopt Reid's @llvm.eh.outlined.typeid.for idea, which I think would
> require typeid assignments to be recorded globally.
> - Add some sort of @llvm.eh.typeid.declare intrinsic or attribute which
> assigns specific integer values to specific types and then replaces
> @llvm.eh.typeid.for calls with the assigned constant.

Thanks for these suggestions!

I tried a different approach, and I'm wondering now whether it's too naive. I made the type info -> type ID mapping global (i.e. shared via MachineModuleInfo). This fixes the bug, but I wonder -- would that bloat the EH tables too much?

Comment 8 John McCall 2018-11-05 11:50:29 PST

(In reply to Vedant Kumar from comment #7)
> (In reply to John McCall from comment #6)
> > So, from a high level, the best way to "outline" this — to achieve better
> > code density/locality of the non-EH code — is almost certainly just to emit
> > all the code only reachable via landing pads in a separate top-level object
> > which can be placed far away from the normal function.  That would, of
> > course, have to be done in the code generator, not as an IR outliner, so
> > it's not directly applicable to your question here.
> 
> Why is this out of scope for an IR outliner? With a small linker change,
> it's possible to order outlined functions towards the end of the __text
> section.

I was suggesting that the landing pads themselves be moved into the other "function", so that the EH table would tell the unwinder to just resume directly in that function; that's not structurally expressible in IR.  Also, EH code sometimes needs to be able to branch back to blocks that are reachable from non-EH blocks, e.g. if there's a catch.

> > I see two reasonable approaches for this, both of which require extra work
> > during outlining:
> > 
> > - Adopt Reid's @llvm.eh.outlined.typeid.for idea, which I think would
> > require typeid assignments to be recorded globally.
> > - Add some sort of @llvm.eh.typeid.declare intrinsic or attribute which
> > assigns specific integer values to specific types and then replaces
> > @llvm.eh.typeid.for calls with the assigned constant.
> 
> Thanks for these suggestions!
> 
> I tried a different approach, and I'm wondering now whether it's too naive.
> I made the type info -> type ID mapping global (i.e. shared via
> MachineModuleInfo). This fixes the bug, but I wonder -- would that bloat the
> EH tables too much?

Certainly in theory, yeah.  A specific function's EH table would only need to contain entries up to the last type it actually uses, but you could end up with a function that has to include dozens of unnecessary entries just because they appear elsewhere in the module.

Comment 9 Vedant Kumar 2018-11-05 15:15:46 PST

(In reply to John McCall from comment #8)
> (In reply to Vedant Kumar from comment #7)
> > (In reply to John McCall from comment #6)
> > > So, from a high level, the best way to "outline" this — to achieve better
> > > code density/locality of the non-EH code — is almost certainly just to emit
> > > all the code only reachable via landing pads in a separate top-level object
> > > which can be placed far away from the normal function.  That would, of
> > > course, have to be done in the code generator, not as an IR outliner, so
> > > it's not directly applicable to your question here.
> > 
> > Why is this out of scope for an IR outliner? With a small linker change,
> > it's possible to order outlined functions towards the end of the __text
> > section.
> 
> I was suggesting that the landing pads themselves be moved into the other
> "function", so that the EH table would tell the unwinder to just resume
> directly in that function; that's not structurally expressible in IR.  Also,
> EH code sometimes needs to be able to branch back to blocks that are
> reachable from non-EH blocks, e.g. if there's a catch.

That's a good point. The IR outliner gets some mileage out of extracting blocks dominated by a landing pad, but it could do more.

Could we specify an outlined unwind target in IR (say, by adding `unwind function` syntax to the `invoke` instruction)? For code that needs to branch back to non-EH blocks, we might use blockaddress+indirectbr.


> > > I see two reasonable approaches for this, both of which require extra work
> > > during outlining:
> > > 
> > > - Adopt Reid's @llvm.eh.outlined.typeid.for idea, which I think would
> > > require typeid assignments to be recorded globally.
> > > - Add some sort of @llvm.eh.typeid.declare intrinsic or attribute which
> > > assigns specific integer values to specific types and then replaces
> > > @llvm.eh.typeid.for calls with the assigned constant.
> > 
> > Thanks for these suggestions!
> > 
> > I tried a different approach, and I'm wondering now whether it's too naive.
> > I made the type info -> type ID mapping global (i.e. shared via
> > MachineModuleInfo). This fixes the bug, but I wonder -- would that bloat the
> > EH tables too much?
> 
> Certainly in theory, yeah.  A specific function's EH table would only need
> to contain entries up to the last type it actually uses, but you could end
> up with a function that has to include dozens of unnecessary entries just
> because they appear elsewhere in the module.

Makes sense. Doing this increases the aggregate size of __gcc_except_tab sections across *.o's in the test-suite by 7.1% (or 39.5 KB). That's not too bad compared to the aggregate size of __text (41.8 MB). Can we land that change as a bug fix until we implement something like @llvm.eh.typeid.declare?

Actually, if every function has the same type info -> ID mapping, wouldn't it be better to share the mappings? I suppose that would require changing the format for __gcc_except_tab and updating things that read it. That might be more involved than adding a new intrinsic, but could possibly extra code size savings.

Comment 10 John McCall 2018-11-05 15:51:48 PST

(In reply to Vedant Kumar from comment #9)
> Could we specify an outlined unwind target in IR (say, by adding `unwind
> function` syntax to the `invoke` instruction)? For code that needs to branch
> back to non-EH blocks, we might use blockaddress+indirectbr.

That would be a pretty major change to IR, but yeah, if you wanted to (1) outline at the IR level and (2) share an outlined landing pad across functions, you would need something like this.  Such a landing pad would have to be something without edges back into the main function body, though.

> Makes sense. Doing this increases the aggregate size of __gcc_except_tab
> sections across *.o's in the test-suite by 7.1% (or 39.5 KB). That's not too
> bad compared to the aggregate size of __text (41.8 MB). Can we land that
> change as a bug fix until we implement something like
> @llvm.eh.typeid.declare?

The trade-off seems reasonable, but the engineering approach is not.  We should not be landing "short-term" changes that regress general code-gen for the benefit of an experimental pass.  You should just not outline these intrinsic calls, and if you want the benefits of being able to outline them, you should implement the IR features to make that possible.

> Actually, if every function has the same type info -> ID mapping, wouldn't
> it be better to share the mappings? I suppose that would require changing
> the format for __gcc_except_tab and updating things that read it. That might
> be more involved than adding a new intrinsic, but could possibly extra code
> size savings.

I would love to make a new personality function to optimize code size, but the type info table is probably the least important component of that.

Comment 11 Vedant Kumar 2018-11-06 11:20:09 PST

I've disabled extracting calls to eh_typeid_for in r346256 to address the immediate miscompile.

IMO it'd be ideal if we could find a simple way of sharing type info -> ID mappings in gcc_except_tab without regressing its size. Would that be possible if we used DW_EH_PE_absptr as the TType encoding? I'll try it out and report back; if it doesn't pan out, I'll try one of John or Reid's suggestions.

Comment 12 Vedant Kumar 2018-11-06 13:27:13 PST

Haven’t had a chance to experiment yet, but here’s another simple solution with only slightly worse codegen: hoist the call to eh.typeid.for and pass it in.

Comment 13 Vedant Kumar 2019-01-17 11:33:32 PST

*** Bug 39917 has been marked as a duplicate of this bug. ***

Comment 14 Vedant Kumar 2019-01-17 11:41:17 PST

I prototyped a ModulePass which lifts EH type tables into IR, and taught CodeExtractor how to insert intrinsics into extracted functions s.t type tables are shared.

The idea is basically what Heejin described in llvm.org/PR39917, and what John alluded to here: 1) declare the EH type tables in IR, and 2) introduce an intrinsic which allows functions to share type tables. A module pass is needed to identify connected components of functions which all must share the same type tables.

I'll attach the patches here for posterity. Ultimately, what I found is that this does not substantially improve the efficacy of hot/cold splitting. On our stack, it's possible to achieve 98% of the results without actually splitting out landingpads / eh_typeid_for. What's more, the code size growth imposed by shared type tables is greater than the cost of simply treating eh_typeid_for as an input to the extraction region.

In light of this data, I think 'later' is a reasonable resolution for this bug: we can resurrect this idea if extracting landingpads at the IR-level becomes important down the road.

Comment 15 Vedant Kumar 2019-01-17 11:42:52 PST

Created attachment 21347 [details]
Patches to lift EH type tables into IR and share them

Comment 16 Heejin Ahn 2019-01-17 14:08:00 PST

I understand you are not planning to land it now, but anyway I'm curious: Would your patch using ModulePass work when we compile files separately and link them later using linker?

Comment 17 Vedant Kumar 2019-01-17 14:19:16 PST

(In reply to Heejin Ahn from comment #16)
> I understand you are not planning to land it now, but anyway I'm curious:
> Would your patch using ModulePass work when we compile files separately and
> link them later using linker?

Yes, I think so. If eh_typetable_adopt isn't used, then all of the information needed to perform ISel is fully contained within a single function.

If eh_typetable_adopt *is* used, there is a restriction. The calling function must have access to the definition of the function which donates its EH type table. In practice this seems to be the case.