New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make @llvm.eh.typeid.for outlining-friendly #38893
Comments
assigned to @vedantk |
Looking at the IR I think Code Extractor (CE) isn't handling Multiple Exit Regions correctly. Do we have examples/testcases for ME regions which work? How does CE handle branches outside the region, for e.g, does it insert return statements properly? Also, how about the side-effects (stores etc.); for ME the side-effects also flow from multiple paths and CE may not be handling those correctly. |
I missed this. What do you think looks suspect?
Here's one: https://github.com/llvm-mirror/llvm/blob/master/test/Transforms/HotColdSplit/multiple-exits.ll
Skimming emitCallAndSwitchStatement, it looks like CE finds all exits which are outside of the to-be-extracted region and creates stub blocks for them in the outlined function. Branches are rewritten to jump to the right exit stub, and the exit stub returns a number which the caller can switch on. There's some logic to simplify/eliminate the switch in simple cases.
I'm not sure whether there's an issue here, but fwiw, I'm seeing the incorrect results from single-exit outlined regions containing __cxa_throw. |
I'm starting to suspect that the underlying bug here is in the lowering of @llvm.eh.typeid.for. In SelectionDAG, we look up type id's within a MachineFunction:
And MachineFunction gives us the next available ID unless it's seen the type info already:
But, I think this breaks down when you enable outlining. We're passing a type id retrieved from a landingpad in a parent function, to an outlined thunk:
The outlined thunk tries to compare a passed-in type id to one synthesized within the outlined code:
At which point it's comparing the wrong ids, because the outlined thunk's MachineFunction has an incompatible TypeInfos list. I think the fix here would be to ensure that an outlined thunk has the same type info -> ID map as its parent. Full disclosure -- I've been looking at a failure in SingleSource/Regression/C++/EH/throw_rethrow_test.cpp to debug this. It's the only other test-suite failure with D53887 applied, and it's simpler to understand. The reference output is:
With D53887, the actual output is:
|
Yes, as designed, landingpad + llvm.eh.typeid.for aren't going to work in the presence of outlining. They were designed with the goal of making inlining easy, and I guess outlining was out of scope. :) llvm.eh.typeid.for produces a selector index into a table in the LSDA of the function it is contained in. Moving it to a different function will give you an index into a different catch info table. John, any ideas for how to address this? Here's my (probably bad) idea. Make a new intrinsic, llvm.eh.outlined.typeid.for, that takes two parameters: the function whose LSDA we want to get the selector index from, and the typeinfo whose index we want. So: Right now we lower these intrinsics to plain immediate i32s. This new intrinsic would lower to materialize an MCSymbol into an i32, kind of like we do for llvm.localescape / llvm.localrecover, but more efficient since we promise it's an i32. We'd try to generate: And when we compile main, we'd define the MCSymbol so the assembler can fix it up for us later: At least, that was my preferred way to pass information from one MachineFunction compilation to the next: by abusing MC, since the assembler is the last whole-module thing that LLVM does. Another idea would be to emit a table of i32s on the side that does translation from the selectors of one function to the selectors of any other. We'd emit them at the end of the module after we've codegened all instances of this intrinsic. |
This is IR-level outlining? And you're outlining part of the landing pad, but not the original call? So, from a high level, the best way to "outline" this — to achieve better code density/locality of the non-EH code — is almost certainly just to emit all the code only reachable via landing pads in a separate top-level object which can be placed far away from the normal function. That would, of course, have to be done in the code generator, not as an IR outliner, so it's not directly applicable to your question here. I see two reasonable approaches for this, both of which require extra work during outlining:
The second idea is probably cleaner. The big disadvantage is that it's totally incompatible with further inlining. The advantage is that the assignment can happen in its own pass, which would allow ordinary intraprocedural optimizations to be run after it, such as switch formation. The outliner then only has to (1) copy the assignment into the outlined function and (2) refuse to outline @llvm.eh.typeid.for calls if for some reason they still exist. |
¥es.
Why is this out of scope for an IR outliner? With a small linker change, it's possible to order outlined functions towards the end of the __text section.
Thanks for these suggestions! I tried a different approach, and I'm wondering now whether it's too naive. I made the type info -> type ID mapping global (i.e. shared via MachineModuleInfo). This fixes the bug, but I wonder -- would that bloat the EH tables too much? |
I was suggesting that the landing pads themselves be moved into the other "function", so that the EH table would tell the unwinder to just resume directly in that function; that's not structurally expressible in IR. Also, EH code sometimes needs to be able to branch back to blocks that are reachable from non-EH blocks, e.g. if there's a catch.
Certainly in theory, yeah. A specific function's EH table would only need to contain entries up to the last type it actually uses, but you could end up with a function that has to include dozens of unnecessary entries just because they appear elsewhere in the module. |
That's a good point. The IR outliner gets some mileage out of extracting blocks dominated by a landing pad, but it could do more. Could we specify an outlined unwind target in IR (say, by adding
Makes sense. Doing this increases the aggregate size of __gcc_except_tab sections across *.o's in the test-suite by 7.1% (or 39.5 KB). That's not too bad compared to the aggregate size of __text (41.8 MB). Can we land that change as a bug fix until we implement something like @llvm.eh.typeid.declare? Actually, if every function has the same type info -> ID mapping, wouldn't it be better to share the mappings? I suppose that would require changing the format for __gcc_except_tab and updating things that read it. That might be more involved than adding a new intrinsic, but could possibly extra code size savings. |
That would be a pretty major change to IR, but yeah, if you wanted to (1) outline at the IR level and (2) share an outlined landing pad across functions, you would need something like this. Such a landing pad would have to be something without edges back into the main function body, though.
The trade-off seems reasonable, but the engineering approach is not. We should not be landing "short-term" changes that regress general code-gen for the benefit of an experimental pass. You should just not outline these intrinsic calls, and if you want the benefits of being able to outline them, you should implement the IR features to make that possible.
I would love to make a new personality function to optimize code size, but the type info table is probably the least important component of that. |
I've disabled extracting calls to eh_typeid_for in r346256 to address the immediate miscompile. IMO it'd be ideal if we could find a simple way of sharing type info -> ID mappings in gcc_except_tab without regressing its size. Would that be possible if we used DW_EH_PE_absptr as the TType encoding? I'll try it out and report back; if it doesn't pan out, I'll try one of John or Reid's suggestions. |
Haven’t had a chance to experiment yet, but here’s another simple solution with only slightly worse codegen: hoist the call to eh.typeid.for and pass it in. |
*** Bug llvm/llvm-bugzilla-archive#39917 has been marked as a duplicate of this bug. *** |
I prototyped a ModulePass which lifts EH type tables into IR, and taught CodeExtractor how to insert intrinsics into extracted functions s.t type tables are shared. The idea is basically what Heejin described in llvm/llvm-bugzilla-archive#39917 , and what John alluded to here: 1) declare the EH type tables in IR, and 2) introduce an intrinsic which allows functions to share type tables. A module pass is needed to identify connected components of functions which all must share the same type tables. I'll attach the patches here for posterity. Ultimately, what I found is that this does not substantially improve the efficacy of hot/cold splitting. On our stack, it's possible to achieve 98% of the results without actually splitting out landingpads / eh_typeid_for. What's more, the code size growth imposed by shared type tables is greater than the cost of simply treating eh_typeid_for as an input to the extraction region. In light of this data, I think 'later' is a reasonable resolution for this bug: we can resurrect this idea if extracting landingpads at the IR-level becomes important down the road. |
I understand you are not planning to land it now, but anyway I'm curious: Would your patch using ModulePass work when we compile files separately and link them later using linker? |
Yes, I think so. If eh_typetable_adopt isn't used, then all of the information needed to perform ISel is fully contained within a single function. If eh_typetable_adopt is used, there is a restriction. The calling function must have access to the definition of the function which donates its EH type table. In practice this seems to be the case. |
mentioned in issue llvm/llvm-bugzilla-archive#39917 |
Duplicate of: #39264 |
Other issue is marked as duplicate of this one. |
Adding attachments from: #39264 as this is an active bug. no-outlining.ll.txt |
Extended Description
When running this test program with
-hot-cold-split=true
(with https://reviews.llvm.org/D53887 applied), we’re supposed to see this output:Instead, we see this:
Here’s where we crash:
Regression-C++-class_hierarchy`main.cold.2:
0x100001d47 <+0>: pushq %rbp
0x100001d48 <+1>: movq %rsp, %rbp
0x100001d4b <+4>: pushq %rbx
0x100001d4c <+5>: pushq %rax
0x100001d4d <+6>: cmpl $0x1, %edi
0x100001d50 <+9>: jne 0x100001d6c ; <+37>
0x100001d52 <+11>: movq %rsi, %rdi
0x100001d55 <+14>: callq 0x100001dcc ; symbol stub for: __cxa_begin_catch
0x100001d5a <+19>: movb 0x8(%rax), %cl
0x100001d5d <+22>: addb $0x30, %cl
0x100001d60 <+25>: movq 0x10(%rax), %rdx
-> 0x100001d64 <+29>: movb %cl, (%rdx)
In the outlined catch path, we pass 0x101 (why?) to __cxa_begin_catch:
(lldb) reg read $rdi
rdi = 0x0000000000000101
__cxa_begin_catch then returns a pointer to:
(lldb) x/8 $rax
0x100202e60: 0x00001f04 0x00000001 0x00000000 0x00000000
0x100202e70: 0x00000000 0x00000000 0x00000000 0x00000000
Clearly [$rax+16] is null, and we crash when we attempt to dereference that. But something looks like it's gone wrong by the point __cxa_begin_catch is called.
Looking at the IR before and after outlining, the issue isn't immediately jumping out at me. I've filed this because I'm worried this isn't just an issue with D53887, i.e. that there might be a more general problem with how CodeExtractor handles landingpads. I've attached the IR here.
The text was updated successfully, but these errors were encountered: