Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make @llvm.eh.typeid.for outlining-friendly #38893

Open
vedantk opened this issue Nov 2, 2018 · 22 comments
Open

Make @llvm.eh.typeid.for outlining-friendly #38893

vedantk opened this issue Nov 2, 2018 · 22 comments
Assignees
Labels
bugzilla Issues migrated from bugzilla ipo Interprocedural optimizations

Comments

@vedantk
Copy link
Collaborator

vedantk commented Nov 2, 2018

Bugzilla Link 39545
Resolution LATER
Resolved on Jan 17, 2019 14:19
Version trunk
OS All
Attachments IR with hot/cold splitting, IR without hot/cold splitting
CC @aheejin,@dwblaikie,@fhahn,@francisvm,@hiraditya,@ornata,@rjmccall,@rnk,@sebpop

Extended Description

When running this test program with -hot-cold-split=true (with https://reviews.llvm.org/D53887 applied), we’re supposed to see this output:

Caught exception: 0: base class
Caught exception: 1: base class
Caught exception: 2: base class
Caught exception: 3: base class
Caught exception: 4: base class
Caught exception: 5: base class
Caught exception: 6: base class
Caught exception: 7: base class
Caught exception: 8: base class
Caught exception: 9: base class
Caught exception: 0: derived class
Caught exception: 1: derived class
Caught exception: 2: derived class
Caught exception: 3: derived class
Caught exception: 4: derived class
Caught exception: 5: derived class
Caught exception: 6: derived class
Caught exception: 7: derived class
Caught exception: 8: derived class
Caught exception: 9: derived class
Caught exception: 0: base class
Caught exception: std::exception
Caught exception: std::exception
Caught unknown exception
Caught unknown exception

Instead, we see this:

Caught unknown exception
Caught unknown exception
Caught unknown exception
Caught unknown exception
Caught unknown exception
Caught unknown exception
Caught unknown exception
Caught unknown exception
Caught unknown exception
Caught unknown exception
Caught exception: 0: derived class
Caught exception: 1: derived class
Caught exception: 2: derived class
Caught exception: 3: derived class
Caught exception: 4: derived class
Caught exception: 5: derived class
Caught exception: 6: derived class
Caught exception: 7: derived class
Caught exception: 8: derived class
Caught exception: 9: derived class
Caught unknown exception
Caught exception: std::exception
Caught exception: std::exception
<segfault>

Here’s where we crash:

Regression-C++-class_hierarchy`main.cold.2:
0x100001d47 <+0>: pushq %rbp
0x100001d48 <+1>: movq %rsp, %rbp
0x100001d4b <+4>: pushq %rbx
0x100001d4c <+5>: pushq %rax
0x100001d4d <+6>: cmpl $0x1, %edi
0x100001d50 <+9>: jne 0x100001d6c ; <+37>
0x100001d52 <+11>: movq %rsi, %rdi
0x100001d55 <+14>: callq 0x100001dcc ; symbol stub for: __cxa_begin_catch
0x100001d5a <+19>: movb 0x8(%rax), %cl
0x100001d5d <+22>: addb $0x30, %cl
0x100001d60 <+25>: movq 0x10(%rax), %rdx
-> 0x100001d64 <+29>: movb %cl, (%rdx)

In the outlined catch path, we pass 0x101 (why?) to __cxa_begin_catch:

(lldb) reg read $rdi
rdi = 0x0000000000000101

__cxa_begin_catch then returns a pointer to:

(lldb) x/8 $rax
0x100202e60: 0x00001f04 0x00000001 0x00000000 0x00000000
0x100202e70: 0x00000000 0x00000000 0x00000000 0x00000000

Clearly [$rax+16] is null, and we crash when we attempt to dereference that. But something looks like it's gone wrong by the point __cxa_begin_catch is called.

Looking at the IR before and after outlining, the issue isn't immediately jumping out at me. I've filed this because I'm worried this isn't just an issue with D53887, i.e. that there might be a more general problem with how CodeExtractor handles landingpads. I've attached the IR here.

@vedantk
Copy link
Collaborator Author

vedantk commented Nov 2, 2018

assigned to @vedantk

@hiraditya
Copy link
Collaborator

Looking at the IR I think Code Extractor (CE) isn't handling Multiple Exit Regions correctly. Do we have examples/testcases for ME regions which work? How does CE handle branches outside the region, for e.g, does it insert return statements properly? Also, how about the side-effects (stores etc.); for ME the side-effects also flow from multiple paths and CE may not be handling those correctly.

@vedantk
Copy link
Collaborator Author

vedantk commented Nov 4, 2018

Looking at the IR I think Code Extractor (CE) isn't handling Multiple Exit
Regions correctly.

I missed this. What do you think looks suspect?

Do we have examples/testcases for ME regions which work?

Here's one: https://github.com/llvm-mirror/llvm/blob/master/test/Transforms/HotColdSplit/multiple-exits.ll

How does CE handle branches outside the region, for e.g, does it insert
return statements properly?

Skimming emitCallAndSwitchStatement, it looks like CE finds all exits which are outside of the to-be-extracted region and creates stub blocks for them in the outlined function. Branches are rewritten to jump to the right exit stub, and the exit stub returns a number which the caller can switch on. There's some logic to simplify/eliminate the switch in simple cases.

Also, how about the side-effects (stores etc.);
for ME the side-effects also flow from multiple paths and CE may not be
handling those correctly.

I'm not sure whether there's an issue here, but fwiw, I'm seeing the incorrect results from single-exit outlined regions containing __cxa_throw.

@vedantk
Copy link
Collaborator Author

vedantk commented Nov 5, 2018

I'm starting to suspect that the underlying bug here is in the lowering of @​llvm.eh.typeid.for.

In SelectionDAG, we look up type id's within a MachineFunction:

  case Intrinsic::eh_typeid_for: {
    // Find the type id for the given typeinfo.
    GlobalValue *GV = ExtractTypeInfo(I.getArgOperand(0));
    unsigned TypeID = DAG.getMachineFunction().getTypeIDFor(GV);
    Res = DAG.getConstant(TypeID, sdl, MVT::i32);
    setValue(&I, Res);
    return nullptr;
  }

And MachineFunction gives us the next available ID unless it's seen the type info already:

unsigned MachineFunction::getTypeIDFor(const GlobalValue *TI) {
  for (unsigned i = 0, N = TypeInfos.size(); i != N; ++i)
    if (TypeInfos[i] == TI) return i + 1;

  TypeInfos.push_back(TI);
  return TypeInfos.size();
}

But, I think this breaks down when you enable outlining. We're passing a type id retrieved from a landingpad in a parent function, to an outlined thunk:

  %3 = tail call i32 @&#8203;llvm.eh.typeid.for(i8* bitcast ({ i8*, i8* }* @&#8203;_ZTI3foo to i8*)) #&#8203;11
  ...
  call void @&#8203;main.cold.1(i32 %2, i8* %1, i32 %i.056) #&#8203;8

The outlined thunk tries to compare a passed-in type id to one synthesized within the outlined code:

define internal void @&#8203;main.cold.1(i32, i8*, i32 %i.056) #&#8203;7 personality i8* bitcast (i32 (...)* @&#8203;__gxx_personality_v0 to i8*) {
newFuncRoot:
  %2 = tail call i32 @&#8203;llvm.eh.typeid.for(i8* bitcast (i8** @&#8203;_ZTIi to i8*)) #&#8203;11
  %matches1 = icmp eq i32 %0, %2
  ...

At which point it's comparing the wrong ids, because the outlined thunk's MachineFunction has an incompatible TypeInfos list. I think the fix here would be to ensure that an outlined thunk has the same type info -> ID map as its parent.

Full disclosure -- I've been looking at a failure in SingleSource/Regression/C++/EH/throw_rethrow_test.cpp to debug this. It's the only other test-suite failure with D53887 applied, and it's simpler to understand.

The reference output is:

0: 1
1: 1
2: 1
3: 2
4: 2
5: 2
6: 3
7: 3
8: 3
exit 0

With D53887, the actual output is:

0: 2
1: 2
2: 2
3: 1
4: 1
5: 1
6: 3
7: 3
8: 3
exit 0

@rnk
Copy link
Collaborator

rnk commented Nov 5, 2018

I'm starting to suspect that the underlying bug here is in the lowering of
@​llvm.eh.typeid.for.

In SelectionDAG, we look up type id's within a MachineFunction:

...

And MachineFunction gives us the next available ID unless it's seen the type
info already:
...
But, I think this breaks down when you enable outlining. We're passing a
type id retrieved from a landingpad in a parent function, to an outlined
thunk:

Yes, as designed, landingpad + llvm.eh.typeid.for aren't going to work in the presence of outlining. They were designed with the goal of making inlining easy, and I guess outlining was out of scope. :) llvm.eh.typeid.for produces a selector index into a table in the LSDA of the function it is contained in. Moving it to a different function will give you an index into a different catch info table.

John, any ideas for how to address this?

Here's my (probably bad) idea. Make a new intrinsic, llvm.eh.outlined.typeid.for, that takes two parameters: the function whose LSDA we want to get the selector index from, and the typeinfo whose index we want. So:
call i32 @​llvm.eh.outlined.typeid.for(i8* bitcast ... @​main, i8* bitcast ... @​_ZTI3foo)

Right now we lower these intrinsics to plain immediate i32s. This new intrinsic would lower to materialize an MCSymbol into an i32, kind of like we do for llvm.localescape / llvm.localrecover, but more efficient since we promise it's an i32. We'd try to generate:
MOV32ri $.Leh_selector_main_ZTI3foo, %EDX

And when we compile main, we'd define the MCSymbol so the assembler can fix it up for us later:
.Leh_selector_main_ZTI3foo = 3 # or .set depending on your assembly dialect

At least, that was my preferred way to pass information from one MachineFunction compilation to the next: by abusing MC, since the assembler is the last whole-module thing that LLVM does.

Another idea would be to emit a table of i32s on the side that does translation from the selectors of one function to the selectors of any other. We'd emit them at the end of the module after we've codegened all instances of this intrinsic.

@rjmccall
Copy link
Contributor

rjmccall commented Nov 5, 2018

This is IR-level outlining? And you're outlining part of the landing pad, but not the original call?

So, from a high level, the best way to "outline" this — to achieve better code density/locality of the non-EH code — is almost certainly just to emit all the code only reachable via landing pads in a separate top-level object which can be placed far away from the normal function. That would, of course, have to be done in the code generator, not as an IR outliner, so it's not directly applicable to your question here.

I see two reasonable approaches for this, both of which require extra work during outlining:

  • Adopt Reid's @​llvm.eh.outlined.typeid.for idea, which I think would require typeid assignments to be recorded globally.
  • Add some sort of @​llvm.eh.typeid.declare intrinsic or attribute which assigns specific integer values to specific types and then replaces @​llvm.eh.typeid.for calls with the assigned constant.

The second idea is probably cleaner. The big disadvantage is that it's totally incompatible with further inlining. The advantage is that the assignment can happen in its own pass, which would allow ordinary intraprocedural optimizations to be run after it, such as switch formation. The outliner then only has to (1) copy the assignment into the outlined function and (2) refuse to outline @​llvm.eh.typeid.for calls if for some reason they still exist.

@vedantk
Copy link
Collaborator Author

vedantk commented Nov 5, 2018

This is IR-level outlining? And you're outlining part of the landing pad,
but not the original call?

¥es.

So, from a high level, the best way to "outline" this — to achieve better
code density/locality of the non-EH code — is almost certainly just to emit
all the code only reachable via landing pads in a separate top-level object
which can be placed far away from the normal function. That would, of
course, have to be done in the code generator, not as an IR outliner, so
it's not directly applicable to your question here.

Why is this out of scope for an IR outliner? With a small linker change, it's possible to order outlined functions towards the end of the __text section.

I see two reasonable approaches for this, both of which require extra work
during outlining:

  • Adopt Reid's @​llvm.eh.outlined.typeid.for idea, which I think would
    require typeid assignments to be recorded globally.
  • Add some sort of @​llvm.eh.typeid.declare intrinsic or attribute which
    assigns specific integer values to specific types and then replaces
    @​llvm.eh.typeid.for calls with the assigned constant.

Thanks for these suggestions!

I tried a different approach, and I'm wondering now whether it's too naive. I made the type info -> type ID mapping global (i.e. shared via MachineModuleInfo). This fixes the bug, but I wonder -- would that bloat the EH tables too much?

@rjmccall
Copy link
Contributor

rjmccall commented Nov 5, 2018

So, from a high level, the best way to "outline" this — to achieve better
code density/locality of the non-EH code — is almost certainly just to emit
all the code only reachable via landing pads in a separate top-level object
which can be placed far away from the normal function. That would, of
course, have to be done in the code generator, not as an IR outliner, so
it's not directly applicable to your question here.

Why is this out of scope for an IR outliner? With a small linker change,
it's possible to order outlined functions towards the end of the __text
section.

I was suggesting that the landing pads themselves be moved into the other "function", so that the EH table would tell the unwinder to just resume directly in that function; that's not structurally expressible in IR. Also, EH code sometimes needs to be able to branch back to blocks that are reachable from non-EH blocks, e.g. if there's a catch.

I see two reasonable approaches for this, both of which require extra work
during outlining:

  • Adopt Reid's @​llvm.eh.outlined.typeid.for idea, which I think would
    require typeid assignments to be recorded globally.
  • Add some sort of @​llvm.eh.typeid.declare intrinsic or attribute which
    assigns specific integer values to specific types and then replaces
    @​llvm.eh.typeid.for calls with the assigned constant.

Thanks for these suggestions!

I tried a different approach, and I'm wondering now whether it's too naive.
I made the type info -> type ID mapping global (i.e. shared via
MachineModuleInfo). This fixes the bug, but I wonder -- would that bloat the
EH tables too much?

Certainly in theory, yeah. A specific function's EH table would only need to contain entries up to the last type it actually uses, but you could end up with a function that has to include dozens of unnecessary entries just because they appear elsewhere in the module.

@vedantk
Copy link
Collaborator Author

vedantk commented Nov 5, 2018

So, from a high level, the best way to "outline" this — to achieve better
code density/locality of the non-EH code — is almost certainly just to emit
all the code only reachable via landing pads in a separate top-level object
which can be placed far away from the normal function. That would, of
course, have to be done in the code generator, not as an IR outliner, so
it's not directly applicable to your question here.

Why is this out of scope for an IR outliner? With a small linker change,
it's possible to order outlined functions towards the end of the __text
section.

I was suggesting that the landing pads themselves be moved into the other
"function", so that the EH table would tell the unwinder to just resume
directly in that function; that's not structurally expressible in IR. Also,
EH code sometimes needs to be able to branch back to blocks that are
reachable from non-EH blocks, e.g. if there's a catch.

That's a good point. The IR outliner gets some mileage out of extracting blocks dominated by a landing pad, but it could do more.

Could we specify an outlined unwind target in IR (say, by adding unwind function syntax to the invoke instruction)? For code that needs to branch back to non-EH blocks, we might use blockaddress+indirectbr.

I see two reasonable approaches for this, both of which require extra work
during outlining:

  • Adopt Reid's @​llvm.eh.outlined.typeid.for idea, which I think would
    require typeid assignments to be recorded globally.
  • Add some sort of @​llvm.eh.typeid.declare intrinsic or attribute which
    assigns specific integer values to specific types and then replaces
    @​llvm.eh.typeid.for calls with the assigned constant.

Thanks for these suggestions!

I tried a different approach, and I'm wondering now whether it's too naive.
I made the type info -> type ID mapping global (i.e. shared via
MachineModuleInfo). This fixes the bug, but I wonder -- would that bloat the
EH tables too much?

Certainly in theory, yeah. A specific function's EH table would only need
to contain entries up to the last type it actually uses, but you could end
up with a function that has to include dozens of unnecessary entries just
because they appear elsewhere in the module.

Makes sense. Doing this increases the aggregate size of __gcc_except_tab sections across *.o's in the test-suite by 7.1% (or 39.5 KB). That's not too bad compared to the aggregate size of __text (41.8 MB). Can we land that change as a bug fix until we implement something like @​llvm.eh.typeid.declare?

Actually, if every function has the same type info -> ID mapping, wouldn't it be better to share the mappings? I suppose that would require changing the format for __gcc_except_tab and updating things that read it. That might be more involved than adding a new intrinsic, but could possibly extra code size savings.

@rjmccall
Copy link
Contributor

rjmccall commented Nov 5, 2018

Could we specify an outlined unwind target in IR (say, by adding unwind function syntax to the invoke instruction)? For code that needs to branch
back to non-EH blocks, we might use blockaddress+indirectbr.

That would be a pretty major change to IR, but yeah, if you wanted to (1) outline at the IR level and (2) share an outlined landing pad across functions, you would need something like this. Such a landing pad would have to be something without edges back into the main function body, though.

Makes sense. Doing this increases the aggregate size of __gcc_except_tab
sections across *.o's in the test-suite by 7.1% (or 39.5 KB). That's not too
bad compared to the aggregate size of __text (41.8 MB). Can we land that
change as a bug fix until we implement something like
@​llvm.eh.typeid.declare?

The trade-off seems reasonable, but the engineering approach is not. We should not be landing "short-term" changes that regress general code-gen for the benefit of an experimental pass. You should just not outline these intrinsic calls, and if you want the benefits of being able to outline them, you should implement the IR features to make that possible.

Actually, if every function has the same type info -> ID mapping, wouldn't
it be better to share the mappings? I suppose that would require changing
the format for __gcc_except_tab and updating things that read it. That might
be more involved than adding a new intrinsic, but could possibly extra code
size savings.

I would love to make a new personality function to optimize code size, but the type info table is probably the least important component of that.

@vedantk
Copy link
Collaborator Author

vedantk commented Nov 6, 2018

I've disabled extracting calls to eh_typeid_for in r346256 to address the immediate miscompile.

IMO it'd be ideal if we could find a simple way of sharing type info -> ID mappings in gcc_except_tab without regressing its size. Would that be possible if we used DW_EH_PE_absptr as the TType encoding? I'll try it out and report back; if it doesn't pan out, I'll try one of John or Reid's suggestions.

@vedantk
Copy link
Collaborator Author

vedantk commented Nov 6, 2018

Haven’t had a chance to experiment yet, but here’s another simple solution with only slightly worse codegen: hoist the call to eh.typeid.for and pass it in.

@vedantk
Copy link
Collaborator Author

vedantk commented Jan 17, 2019

*** Bug llvm/llvm-bugzilla-archive#39917 has been marked as a duplicate of this bug. ***

@vedantk
Copy link
Collaborator Author

vedantk commented Jan 17, 2019

I prototyped a ModulePass which lifts EH type tables into IR, and taught CodeExtractor how to insert intrinsics into extracted functions s.t type tables are shared.

The idea is basically what Heejin described in llvm/llvm-bugzilla-archive#39917 , and what John alluded to here: 1) declare the EH type tables in IR, and 2) introduce an intrinsic which allows functions to share type tables. A module pass is needed to identify connected components of functions which all must share the same type tables.

I'll attach the patches here for posterity. Ultimately, what I found is that this does not substantially improve the efficacy of hot/cold splitting. On our stack, it's possible to achieve 98% of the results without actually splitting out landingpads / eh_typeid_for. What's more, the code size growth imposed by shared type tables is greater than the cost of simply treating eh_typeid_for as an input to the extraction region.

In light of this data, I think 'later' is a reasonable resolution for this bug: we can resurrect this idea if extracting landingpads at the IR-level becomes important down the road.

@vedantk
Copy link
Collaborator Author

vedantk commented Jan 17, 2019

@aheejin
Copy link
Member

aheejin commented Jan 17, 2019

I understand you are not planning to land it now, but anyway I'm curious: Would your patch using ModulePass work when we compile files separately and link them later using linker?

@vedantk
Copy link
Collaborator Author

vedantk commented Jan 17, 2019

I understand you are not planning to land it now, but anyway I'm curious:
Would your patch using ModulePass work when we compile files separately and
link them later using linker?

Yes, I think so. If eh_typetable_adopt isn't used, then all of the information needed to perform ISel is fully contained within a single function.

If eh_typetable_adopt is used, there is a restriction. The calling function must have access to the definition of the function which donates its EH type table. In practice this seems to be the case.

@vedantk
Copy link
Collaborator Author

vedantk commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#39917

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@hiraditya hiraditya changed the title Make @&#8203;llvm.eh.typeid.for outlining-friendly Make @llvm.eh.typeid.for outlining-friendly Jul 31, 2023
@hiraditya
Copy link
Collaborator

Duplicate of: #39264

@EugeneZelenko
Copy link
Contributor

Other issue is marked as duplicate of this one.

@EugeneZelenko EugeneZelenko reopened this Jul 31, 2023
@hiraditya hiraditya assigned hiraditya and unassigned vedantk Jul 31, 2023
@hiraditya
Copy link
Collaborator

Adding attachments from: #39264 as this is an active bug.

no-outlining.ll.txt
with-outlining.bad.S.txt
with-outlining.good.S.txt
with-outlining.ll.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla ipo Interprocedural optimizations
Projects
None yet
Development

No branches or pull requests

6 participants